I’m not one to say that any one day is more important than another, but today may dictate the future for millions of migrants seeking asylum outside their countries, and for millions more who have seen the refugee route as a way to safety from war-torn countries. Today hinges on Turkey.
I have completed my first very rough text analysis on my first section of my first newspaper for my first event. There are myriad ways to improve and new directions to go, but I am at a stifling disadvantage compared to other DH projects. As far as I know, this project is the first of its kind.
I am thus confronted by uncharted territory. I can approach a large-scale DH news article analysis any way I like. I can decide that the structure of news articles is more important than the number of notable figures in a piece. Or that the presence of a picture makes a news article more important than others, so it should be weighted proportionally more.
But the potential for crippling limitations is intimidating. Any one of the assumptions I’m making could send my project down the drain. I probably don’t know half of the assumptions I’m making. What if authors’ personal styles play a much larger role than I recognize?
Keyness is one tool within the free utility AntConc, and in this last post in the series on the first part of the text analysis, I’ll cover what I’ve done with it. It’s a simple but powerful tool that has the potential to be the most revealing.
Keyness essentially measures how “unusual” a word is in a corpus of text compared to a “reference” corpus. The tool lets anyone load a reference text, which it then treats as the norm. When you upload your own corpus, the tool applies a simple statistical formula to the word frequencies in each corpus. By comparing frequencies, it quantifies how strange the appearance of one word is compared to the norm.
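To make that concrete, here is a minimal sketch of one common keyness statistic, Dunning’s log-likelihood (which, as far as I know, is AntConc’s default measure); the word counts below are invented for illustration:

```python
import math

def keyness_ll(freq_study, size_study, freq_ref, size_ref):
    """Dunning's log-likelihood keyness for a single word: how far the
    word's frequency in the study corpus departs from what the combined
    corpora would predict."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    # Expected counts if the word were equally common in both corpora.
    exp_study = size_study * combined / total
    exp_ref = size_ref * combined / total
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / exp_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / exp_ref)
    return 2 * ll

# Invented numbers: 'police' appears 120 times in a 50,000-word study
# corpus but only 40 times in a 100,000-word reference corpus.
print(round(keyness_ll(120, 50_000, 40, 100_000), 2))  # → 116.16
```

The bigger the score, the more “unusual” the word; a score of 0 means the word is exactly as common in both corpora.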
The results allow us to see which topics are discussed in one period and not another. A newspaper’s publications naturally vary in topic over the course of even a week, so the big names will almost certainly change over time. Sometimes it is quite difficult to discern whether a term is tied to a specific story being covered, or whether it is being used more often in other contexts.
Our results, however, have shown that proper nouns are not the only words whose frequency changes over time. How other, less distinctive nouns, and even verbs and transitional phrases, shift in frequency can give us real insight into what change really means.
The fourth part in this series of analysis reviews covers Clusters and N-Grams, both tools used within the AntConc program. They are brilliant for learning more about notable cases surfaced by the Voyant tools.
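For a sense of what the N-Grams tool counts, here is a minimal sketch (the sentence is made up, and AntConc’s actual implementation will differ):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word sequences in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the police fired tear gas and the police moved in".split()
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))  # → [('the police', 2)]
```

Ranking the most frequent n-grams around a word like ‘police’ is what lets you see how the language surrounding a term shifts from one period to the next.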
So if you remember when I talked about TermsRadio, there was a significant spike in the number of terms related to protests and to the square in Istanbul where the Gezi protests happened. This, however, does not tell us much about how each term is portrayed. How would we know whether the language around these topics had even changed? It turns out that the Gezi protests seem to have had a lasting effect on the way ‘police’ was discussed.
RezoViz takes a corpus of text, identifies which words appear in documents together, and assigns a weight according to the frequency of that appearance. For example, one document that has the words Istanbul, Erdogan, and Minister and a document with Istanbul, Erdogan, and CHP will add 1 unit of weight to the link between Istanbul and Erdogan. Across many documents, you can begin to see which topics are discussed together. Below is an example of a graph that can be drawn given this data.
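That example can be sketched in a few lines (my guess at the counting logic, not RezoViz’s actual code):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(docs):
    """For each pair of terms, count how many documents contain both."""
    weights = Counter()
    for terms in docs:
        # sorted() so (a, b) and (b, a) count as the same pair
        for pair in combinations(sorted(set(terms)), 2):
            weights[pair] += 1
    return weights

docs = [
    {"Istanbul", "Erdogan", "Minister"},
    {"Istanbul", "Erdogan", "CHP"},
]
weights = cooccurrence_weights(docs)
print(weights[("Erdogan", "Istanbul")])  # → 2 (both documents)
print(weights[("CHP", "Istanbul")])      # → 1
```

Each pair’s count becomes the weight of an edge in the graph, so terms that are repeatedly discussed together end up strongly linked.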
The green lines represent links between words, i.e. between words that appear in the same document. The numbers in circles represent the ‘weight’ of each link, i.e. how many documents the two terms appear in together. In this image, the terms displayed are the ones with the most links (above a given threshold). I’ve highlighted the term ‘Turkey’, so the terms linked to ‘Turkey’ appear in red.
Several kinds of questions can be derived from this representation. Are there unexpected links between terms? Are links missing between terms that would be expected to have them? What can the weight of each link tell us about its importance?
This is the second tool review and text analysis in this series. The first is here.
ScatterPlot irritates me and fascinates me at the same time. It purports to measure how each word corresponds to other words in terms of frequencies. It uses “statistical analyses” to set each text input as a dimension and then condense them into an easily visualized 3D or 2D space. I have no idea what that means. I’m not sure anybody is meant to understand what those “statistical analyses” are. Below is an example, using the first several days of my corpus.
If you look at the axes of this ‘3D’ graph, there are no labels. While the tool declares that it plots word frequencies, it seems strange that a frequency can be -1. In the upper left, three percentages indicate how much each term is explained by each of the three dimensions, represented by the two axes and the fill of the blue-labeled terms (I assume).
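My best guess, and it is only a guess, is that the “statistical analyses” are something like principal component analysis (PCA) on a document-term frequency matrix. That would explain both oddities: the percentages would be each dimension’s share of the variance, and negative coordinates appear because frequencies are centered before projection, so the axes no longer measure raw frequency. A sketch with invented counts:

```python
import numpy as np

# Invented document-term matrix: rows = documents, columns = terms.
counts = np.array([
    [12.,  0., 3., 5.],
    [10.,  1., 4., 6.],
    [ 0.,  9., 8., 1.],
    [ 1., 11., 7., 0.],
])

centered = counts - counts.mean(axis=0)   # centering introduces negatives
u, s, vt = np.linalg.svd(centered, full_matrices=False)
coords = u * s                            # document coordinates (PCA scores)
explained = s**2 / (s**2).sum()           # variance share per dimension

print(np.round(explained * 100, 1))       # percentages like ScatterPlot's
print(np.round(coords[:, :2], 2))         # 2D positions, some negative
```

If this is roughly what the tool does, the axes are abstract directions of maximum variation in the corpus, not frequencies at all, which would explain the missing labels.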
The kicker about this tool is that it may be graphically displaying how many and which topics are covered in a given period. But we need to know more about it.
Digital humanities work allows us to ask new questions, questions that could not previously have been asked.
After getting the conceptualization of my project down, I picked a newspaper and an event and began collecting data for analysis. My data are 1065 articles from the domestic section of Today’s Zaman from May 18 to July 15. I chose these dates because the range spans from two weeks before to two weeks after the height of the Gezi Park protests in Istanbul and Turkey in general. The protests became an extreme example of a crackdown on the news media in Turkey.
Another part of the project attempts to show visually the relationships between media freedom violations, their effects, measures of variables that affect media freedom, and press freedom indices. Statistics for things like corruption, journalist killings and arrests, newspaper ad revenue, circulation, and readership were easy to find. The majority of this first step of research thus consisted of the text analyses.
I searched through many tools to analyze my corpus (the 1065 articles) but ended up focusing on a few. Many, such as ScatterPlot, looked interesting but proved untrustworthy or unrevealing. This post covers the first part of the analysis, and the first tool, that began the exciting stuff.
Introduction and Abstract to Press Freedom DH
A comparison between Turkey and Finland
Many measures of press freedom developed by advocacy groups, such as Reporters Without Borders and Freedom House, are simple aggregations of point scores derived from extensive questionnaires. The values that are shown on these indices are thus the results of several calculations of very different indicators of press freedom. This is to ensure that press freedom scores can be compared across countries and time periods.
However, the image this creates skews interpretation. It portrays countries with similar scores as suffering from the same problems. For example, as RWB itself recognizes, “Africa’s newest country [South Sudan] is torn by civil war and has an extremely polarized press. In Afghanistan, it is the state’s ability to guarantee media safety that is lacking.” Despite the differing causes of their low press freedom measures, South Sudan and Afghanistan have almost identical scores: 38.04 and 37.44, respectively. When searching for solutions, it does not help, and is in fact counterproductive, to believe that all countries suffer from the same problems.
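The aggregation problem can be illustrated with a toy example. Everything here is invented — the categories, weights, and numbers are not RWB’s actual methodology — but it shows how two countries with opposite problems can land on nearly the same total:

```python
def index_score(subscores, weights):
    """Weighted sum of questionnaire sub-scores (higher = worse,
    as in the RWB index). Categories and weights are invented."""
    return sum(subscores[cat] * weights[cat] for cat in weights)

weights = {"legal_environment": 0.3, "violence": 0.4, "pluralism": 0.3}

# Country A: journalists are physically safe, but laws are restrictive.
a = {"legal_environment": 75, "violence": 15, "pluralism": 30}
# Country B: permissive laws, but journalists face serious violence.
b = {"legal_environment": 15, "violence": 60, "pluralism": 30}

print(index_score(a, weights), index_score(b, weights))  # both ≈ 37.5
```

Identical totals, completely different diagnoses — which is exactly why a single score hides what a country actually needs to fix.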
Several pundits and non-profit advocacy groups nevertheless have no problem pontificating about why the best countries occupy the top positions on the index. Finland has been a hot topic in press freedom circles because, in the 13 years that RWB has produced its index, the Nordic country has come out on top in 11 of them. In the debate over why, these reasons pop up:
Last Tuesday saw events that have since reverberated and escalated in Turkey. Two armed men claiming to be with an outlawed leftist political party called DHKP-C entered the largest courthouse in Europe, a couple of blocks from my apartment. They took hostage the prosecutor pursuing the case of a teenager who was hit by a tear gas canister during the Gezi protests in 2013. The boy fell into a coma and died nine months later. The militants claimed the prosecutor was dragging his feet on the case. Their actions did not do much to ease public perception of the alleged terrorist group.
One of the first things the militants did was hack a Twitter account and post a picture of the terrified prosecutor, a gun pressed aggressively against his head. The walls of the room had been draped with the DHKP-C flag. It was clearly meant to terrorize and to send a message beyond this one case. (I’m not providing a link to the picture, for reasons I will explain.) They demanded that all police officers, accused or not, record their confessions to the boy’s murder within three hours. If their demands were not met by 3:36 pm, the prosecutor would pay the price.
Several news publications published the picture from Twitter on their platforms. The Turkish government immediately imposed a media blackout. Many have been too quick to judge this as another example of outrageous censorship.