Asking new questions, part 3 – RezoViz

This is the third analysis review in this series (after the first and second), and the third to look at a tool produced by Voyant.

RezoViz takes a corpus of texts, identifies which terms appear together in the same document, and weights each link by how often that happens. For example, a document containing the words Istanbul, Erdogan, and Minister and another containing Istanbul, Erdogan, and CHP will each add 1 unit of weight to the link between Istanbul and Erdogan. Across many documents, you can begin to see which topics are discussed together. Below is an example of a graph that can be drawn from this data.

[Image: Why Turkey not mentioned in these]

The green lines represent links between terms, i.e. they connect terms that appear in the same document. The numbers in circles represent the ‘weight’ of each link, i.e. how many documents the two terms appear in together. In this image, the terms displayed are the ones with the most links (those above the chosen threshold). I’ve highlighted the term ‘Turkey’, so that the terms linked with ‘Turkey’ appear in red.
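The weighting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not RezoViz's actual implementation: the function names `cooccurrence_weights` and `linked_terms` and the toy documents are assumptions made for the example, and each document is assumed to be already reduced to the list of terms found in it.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents):
    """Count, for every pair of terms, how many documents contain both."""
    weights = Counter()
    for terms in documents:
        # A set ensures a term repeated within one document counts once;
        # sorting gives each pair a single, predictable key order.
        for pair in combinations(sorted(set(terms)), 2):
            weights[pair] += 1
    return weights


def linked_terms(weights, term):
    """Return the terms linked to `term`, with the weight of each link."""
    return {
        (b if a == term else a): w
        for (a, b), w in weights.items()
        if term in (a, b)
    }


# Hypothetical toy corpus mirroring the example in the text.
documents = [
    ["Istanbul", "Erdogan", "Minister"],
    ["Istanbul", "Erdogan", "CHP"],
]

weights = cooccurrence_weights(documents)
print(weights[("Erdogan", "Istanbul")])   # weight of the Istanbul-Erdogan link
print(linked_terms(weights, "Istanbul"))  # what highlighting 'Istanbul' would show
```

Here each of the two documents adds 1 to the Istanbul-Erdogan link, so that link ends up with weight 2, while the links to Minister and CHP each have weight 1.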

Several types of questions can be derived from this kind of representation. Are there unexpected links between terms? Are there no links between terms that would be expected to have links? What can the weights of each link tell us about the importance of links?

Turkey is known for a very nationalistic press, so terms like ‘Turkey’ are expected. But why don’t terms like ‘Taksim solidarity’, ‘Kurdish Communities Union’ or ‘Mutlu’ (the then governor of Istanbul province) appear with mentions of ‘Turkey’? To research this further, we would have to dig deeper into the contexts in which these terms arise; a tool for that could be the one linked here.

The picture below depicts the same data, but shows only the most frequent terms. This means that even terms which have no links to others on the graph are shown. The ones floating alone have a high frequency, but are not linked with other terms that also have a high frequency.

[Image: Screen Shot 2015-07-07 at 14.34.35 (2)]

This can be all the more revealing, because the ‘floaters’ aren’t linked with popular topics, yet the newspaper still chooses to publish on them regularly. Why are topics that are unassociated with popular topics published so frequently?
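One way to look for such ‘floaters’ outside the tool is sketched below. It is a rough approximation under stated assumptions, not how RezoViz itself works: ‘frequency’ is taken to mean the number of documents a term appears in, `find_floaters` and `top_n` are hypothetical names, and the corpus is invented toy data.

```python
from collections import Counter
from itertools import combinations


def find_floaters(documents, top_n):
    """Return the top_n most frequent terms that share no document
    with any other term in the top_n."""
    doc_freq = Counter()   # number of documents each term appears in
    weights = Counter()    # number of documents each pair appears in together
    for terms in documents:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        for pair in combinations(unique, 2):
            weights[pair] += 1

    top = {term for term, _ in doc_freq.most_common(top_n)}

    # A term is 'linked' only if its partner is also among the top terms.
    linked = set()
    for a, b in weights:
        if a in top and b in top:
            linked.update((a, b))

    return sorted(top - linked)


# Hypothetical toy corpus: 'Galatasaray' is frequent, but never appears
# alongside the other frequent terms.
documents = [
    ["Erdogan", "Istanbul"],
    ["Erdogan", "Istanbul"],
    ["Erdogan", "Istanbul"],
    ["Galatasaray"],
    ["Galatasaray"],
    ["Galatasaray", "Ankara"],
]

floaters = find_floaters(documents, top_n=3)
print(floaters)
```

In this toy data ‘Galatasaray’ does have a link (to ‘Ankara’), but not to any other high-frequency term, which is the situation described above for the terms floating alone.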

Or this example:

[Image: Why CHP Erdogan and Kilicdaroglu same]

CHP is the main opposition party to the AKP. Yet when I highlight CHP, the leaders of both parties are also highlighted, and with the same weight. Does this mean that the newspaper speaks about the AKP in every article on the CHP, and always in terms of the parties’ leaders?

Below is the raw analyzed data from the corpus of text, in this case from after the height of the protests. RezoViz classifies each frequent term as a person, an organization, or a location, without any extra effort from the user. This may, however, limit the kinds of things it draws links between: it could still be revealing to see, for example, the links between figures in government and the kind of language that surrounds them. In effect, it is a kind of collocate tool that only looks at people, places, and organizations, and counts them as linked when they appear anywhere in the same document.

[Images: Screen Shot 2015-07-07 at 14.11.59; Screen Shot 2015-07-07 at 14.24.21]

Below is a graph showing the frequency of each term and which “type” it has been assigned. As you can see, the program uses a Stanford database of people and places, but it guesses for the rest and is not always right.

[Image: After evenly distributed]
