Ngram Networks

Chris Bail, Duke University
SICSS, Day 2

Basics of Social Network Analysis

From Social Networks to Text Networks

Ngram Networks

1) Instead of treating people as nodes, treat people's shared use of words as edges

2) The strength of these edges can be determined via various NLP methods (e.g. Term Frequency-Inverse Document Frequency)

3) Group people (or documents if you like) using various centrality/community detection techniques
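The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method used in the papers cited later: the toy corpus, the function names (`tfidf`, `edges`, `groups`) and the `min_weight` cutoff are all hypothetical, edge strength is taken to be the summed product of tf-idf weights on shared terms, and connected components stand in for a real community detection algorithm.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: one short text per author (or document).
docs = {
    "ann": "vaccine safety research vaccine",
    "bob": "vaccine research funding",
    "cat": "school support services",
    "dan": "school services funding support",
}

def tfidf(docs):
    """Return {doc: {term: weight}} using relative term frequency and log idf."""
    counts = {d: Counter(text.lower().split()) for d, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        d: {t: (f / sum(c.values())) * math.log(n_docs / doc_freq[t])
            for t, f in c.items()}
        for d, c in counts.items()
    }

def edges(weights):
    """Step 1 and 2: link documents that share terms, weighted by tf-idf."""
    result = {}
    for a, b in combinations(sorted(weights), 2):
        shared = set(weights[a]) & set(weights[b])
        w = sum(weights[a][t] * weights[b][t] for t in shared)
        if w > 0:
            result[(a, b)] = w
    return result

def groups(nodes, edge_weights, min_weight=0.0):
    """Step 3 (simplified): connected components over edges above min_weight."""
    neighbours = {n: set() for n in nodes}
    for (a, b), w in edge_weights.items():
        if w > min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, found = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        found.append(sorted(component))
    return found

weights = tfidf(docs)
network = edges(weights)
clusters = groups(docs, network, min_weight=0.05)
print(network)
print(clusters)
```

In this toy corpus the weak tie created by the single shared word "funding" falls below the cutoff, so the authors split into a health-related pair and a school-related pair. In practice the cutoff and the grouping algorithm are research design choices.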

What is an Ngram?
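An n-gram is a contiguous sequence of n tokens taken from a text: single words are unigrams, pairs are bigrams, triples are trigrams. The short sketch below shows the idea; the `ngrams` helper is a hypothetical name, and splitting on whitespace is a simplifying assumption (real pipelines usually also strip punctuation and stop words).

```python
def ngrams(text, n):
    """Return the n-grams in text as tuples of lowercase tokens."""
    tokens = text.lower().split()  # naive whitespace tokenization
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "the state of the union"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
print(bigrams)
```

A text with k tokens yields k - n + 1 n-grams, so very short texts produce few or none for larger n.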

Constructing Ngram Networks with Noun Phrases

The Ngram Networks that link ASD Advocacy Orgs on Facebook

State of the Union Addresses

Advantages of Ngram Networks

1) Recognizes the relational nature of meaning (meaning is construed via the relationships between various symbols)

2) Less sensitive to the document length (word count) limits that constrain topic models

3) Better equipped to handle shifts over time?

4) Better validation methods (e.g. optimal modularity)

5) More parsimonious and transparent?
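To make the modularity point in item 4 concrete: modularity Q compares the share of edge weight that falls inside groups with the share expected if edges were placed at random given each node's total edge weight. A grouping with higher Q fits the network better, which gives a numeric criterion for choosing among candidate partitions. The sketch below computes Newman's weighted modularity for an undirected network; the function name `modularity` and the toy network of two triangles joined by one bridge are hypothetical.

```python
def modularity(edge_weights, membership):
    """Newman's weighted modularity Q for an undirected network.

    edge_weights: {(a, b): weight}, each undirected edge listed once.
    membership:   {node: community label}.
    """
    total = sum(edge_weights.values())  # m, the total edge weight
    if total == 0:
        return 0.0
    strength = {n: 0.0 for n in membership}  # weighted degree per node
    internal = {}                            # edge weight inside each community
    for (a, b), w in edge_weights.items():
        strength[a] += w
        strength[b] += w
        if membership[a] == membership[b]:
            c = membership[a]
            internal[c] = internal.get(c, 0.0) + w
    community_strength = {}
    for n, c in membership.items():
        community_strength[c] = community_strength.get(c, 0.0) + strength[n]
    # Q = sum over communities of (internal / m) - (strength / 2m)^2
    return sum(
        internal.get(c, 0.0) / total - (s / (2 * total)) ** 2
        for c, s in community_strength.items()
    )

# Hypothetical toy network: two triangles joined by a single bridge (c-d).
toy = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in by_triangle}
mismatched = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

q_triangle = modularity(toy, by_triangle)
q_one = modularity(toy, one_group)
q_mismatched = modularity(toy, mismatched)
print(q_triangle, q_one, q_mismatched)
```

Splitting the toy network by triangle scores highest, a single all-inclusive group scores zero, and a grouping that cuts across the triangles scores below zero.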

Coding Ngram Networks

Worked Example on SICSS webpage

Bail, Christopher A. 2016. “Combining Network Analysis and Natural Language Processing to Examine how Advocacy Organizations Stimulate Conversation on Social Media.” Proceedings of the National Academy of Sciences, 113(42): 11823-11828.

Rule, Alix, Jean-Philippe Cointet, and Peter Bearman. 2015. “Lexical Shifts, Substantive Changes, and Continuity in State of the Union Discourse, 1790–2014.”

Group Exercise

1) Visit the list of text datasets here.

2) Put your name next to the dataset that interests you most.

3) Find the rest of the people who chose that dataset.

4) As a group, discuss a) what interesting research questions can be asked with this data; and b) which types of quantitative text analysis would be most useful to study this question.

5) Write code together

6) No presentations.