The last blog post was about my first few weeks as the Digital Humanities Intern and the process that my supervisors and I went through to create a research project. In that vein, this week’s post will talk about the first step that I took to answer my research questions in early Fall 2018.
Edmon Low Library at Oklahoma State University is home to the Oklahoma Oral History Research Program, as well as its many digital oral history collections, which are highly informative and reflective of the state’s industry, history, and culture. My supervisors decided that these collections might help answer some of our research questions, and that I could learn some useful technology along the way. I began scouring these collections for people who had lived in Stillwater or attended Oklahoma State during the Great Depression. I came up with a list of key words (such as the college’s name, Stillwater, etc.), which I entered in the oral history collection’s search engine. Often, the search yielded a few transcripts, which I saved. Unfortunately, there were a few issues. While many of the interviews talked about the school, they discussed a different period. Many of the other transcripts just mentioned OSU in passing. I ended up with very little information, but we decided to continue, as the tools I would learn how to use are useful to know.
For this project, I learned how to use AntConc, which lists all the words in a set of texts from most used to least used. There are multiple uses for this (it’s more useful than it sounds), but essentially, this program gives you an unobstructed view of what the interviewees were talking about. Take, for instance, the oral histories from the Dust, Drought, and Dreams Gone Dry oral history collection. Based on the list of words generated by the software, the most used word was ‘house,’ and family-related words were also very popular. These words give us a glimpse into what was important to the interviewees.
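The core of what AntConc does here can be sketched in a few lines of Python. This is only an illustration, not AntConc itself, and the sample transcripts below are invented stand-ins for the real oral histories:

```python
from collections import Counter
import re

def word_frequencies(texts):
    """Count how often each word appears across a set of transcripts,
    listed from most used to least used -- a rough sketch of the word
    list a concordancer like AntConc produces."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters (keeping apostrophes)
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(words)
    return counts.most_common()

# Invented sample "transcripts" standing in for real interview text
transcripts = [
    "We lived in a small house. The house had one room.",
    "My family moved when the dust came. Family was everything.",
]
print(word_frequencies(transcripts))
```

In this toy example, ‘house’ and ‘family’ rise to the top of the list, just as family- and home-related words did in the real collection.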
To use AntConc, I first had to ‘clean’ the transcripts. This involved removing the interviewer’s questions, the names of the speakers, and information in brackets, because they would skew the data. I also deleted answers that wouldn’t add any pertinent information, such as “Yes, yeah, no.” These phrases only affirm what the interviewer has asked, so the phrase loses its context and is subsequently useless. Unfortunately, there were times that sizable text sections were deleted because of these rules. Additionally, I created a stop word list. Stop words are words like “a,” “am,” and “and”; while they are useful for knowing the context of a noun in a sentence, they add unnecessary words to our word list. Adding a stop word list makes AntConc ignore these words.
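The cleaning steps above could be sketched in Python like this. The speaker labels, stop word list, and sample transcript are all hypothetical; real transcripts name the actual speakers, and a real stop word list is much longer:

```python
import re

# A tiny illustrative stop word list; a real one would be far longer.
STOP_WORDS = {"a", "am", "an", "and", "the", "in", "of", "to", "was", "is"}

# Short filler answers that only affirm the question and carry no content.
FILLERS = {"yes", "yeah", "no", "uh-huh", "mm-hmm"}

def clean_transcript(raw):
    """Keep only the interviewee's answers, dropping speaker labels,
    interviewer questions, bracketed notes, filler answers, and stop words."""
    kept = []
    for line in raw.splitlines():
        # Assume lines look like "SPEAKER: text" (hypothetical format).
        match = re.match(r"(INTERVIEWER|INTERVIEWEE):\s*(.*)", line.strip())
        if not match or match.group(1) == "INTERVIEWER":
            continue  # skip the interviewer's questions entirely
        answer = re.sub(r"\[.*?\]", "", match.group(2))  # remove bracketed notes
        words = re.findall(r"[a-z']+", answer.lower())
        if all(w in FILLERS for w in words):
            continue  # drop answers that are pure affirmation, like "Yes, yeah."
        kept.extend(w for w in words if w not in STOP_WORDS)
    return kept

sample = """INTERVIEWER: Did you live in town?
INTERVIEWEE: Yes, yeah.
INTERVIEWEE: We lived in a small house [points to photo] near the school."""
print(clean_transcript(sample))
```

Only the substantive answer survives the cleaning, with its stop words stripped out, ready to be counted.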
As I was finishing this project and looking at my data, I began to think about the idea of ‘cold hard facts.’ Most people, unfortunately, think that history, science, and computer-generated information are sets of ‘cold hard facts.’ What I learned from this project is that computer-generated information is often as reflective of the person entering the information as it is of the content being analyzed. The information is selected by a human, the questions that the interviewer asks shape the answers, and how the information is interpreted is defined by me. While there’s certainly less room for human error, it is still there. I think this shows us that a computer can’t tell history for us: sometimes we must accept that even though we can’t tell history perfectly, we are the only ones who can tell history. And that, I think, is kind of awesome.