
Text Analysis through Voyant

Voyant is a text analysis tool that creates data visualizations of documents through word clouds and word-frequency calculations. At a glance, those visualizations can show a reader the main topics of a document and how its language is used. Text analysis software has limitations; for example, a scanned source must first be converted into machine-readable text by optical character recognition (OCR). Even so, it can produce an interactive and visually impressive display for the reader. In my quest to create digital history, I have been experimenting with Voyant to analyze texts on how the loss of territory affected the economic status of Native Americans in the 20th century. To learn the tool, I will be following SDSU’s workshop, which you can find here.

The workshop began with a question-and-answer portion that gave a brief overview of Voyant as a data visualization tool and of the pros and cons of text analysis. Scholars who engage in text mining should be prepared to interpret the data the computer produces. Computers are very good at counting and quantifying sources, but they cannot read in the human sense: they register how often words appear, not what those words mean. Where context matters, computation should not be the only form of analysis. Another limitation is that modern software relies on contemporary word banks, while the meanings of words may have shifted since the period a source was written. Despite these limitations, computational text analysis remains a potent tool for digital historians to use in understanding and interpreting texts.

Getting a document ready for text analysis requires that only plain text is fed into the software. Plain text is just that: plain. There should be no bold, underlining, italics, or anything else besides the words and punctuation. For online sources, that means stripping away the HTML and other excess code to leave clean text. An advantage of Voyant is that it is ‘language agnostic.’ In plain English, Voyant does not restrict its text analysis capabilities to certain languages: you can load text written in many different languages and Voyant will still be able to count its words and build useful visualizations from them.
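As a minimal sketch of that cleanup step, assuming the source has been saved locally as an HTML file, Python's standard-library `html.parser` can pull out just the visible text. The function name `html_to_plain_text` and the list of block-level tags are my own choices for illustration; they are not part of Voyant.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script and style blocks."""

    SKIP = {"script", "style"}
    # Hypothetical choice: tags treated as paragraph breaks.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into plain characters
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Bold, italic, and link tags are simply dropped; only their text is kept.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_plain_text(html: str) -> str:
    """Return plain text with one paragraph per line and no markup."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of spaces inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = ("<p>The <b>El Capitan</b> band was <i>removed</i>.</p>"
              "<p>Viejas</p><script>var x = 1;</script>")
    print(html_to_plain_text(sample))
```

The output keeps the words and punctuation but none of the formatting, which is the kind of clean text Voyant expects.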

Once your document is prepared, you can upload it to Voyant or, for an online source, paste its URL. In my case, the source is an article from the San Diego History Journal, so I was able to paste its URL into Voyant’s text field. The article I chose is “The Removal of the Indians of El Capitan to Viejas: Confrontation and Change in San Diego Indian Affairs in the 1930s,” by Tanis C. Thorne.
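Pasting the URL into the text field is the simplest route. For anyone who wants a reusable link, here is a small sketch that builds one. It assumes Voyant accepts a document address through an `input` query parameter on `voyant-tools.org`; that is my understanding of Voyant's URL options, so check the current Voyant documentation before relying on it. The example address is a hypothetical placeholder, not the article's real URL.

```python
from urllib.parse import urlencode

# Assumption: the public Voyant instance and its 'input' query parameter.
VOYANT_BASE = "https://voyant-tools.org/"


def voyant_url(source_url: str) -> str:
    """Build a link asking Voyant to load the document at source_url.

    urlencode percent-escapes the address so it survives inside the query string.
    """
    return VOYANT_BASE + "?" + urlencode({"input": source_url})


if __name__ == "__main__":
    # Hypothetical placeholder address.
    print(voyant_url("https://example.org/journal/article.html"))
```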

The initial view that Voyant creates is divided into five sections: a word cloud, a document reader, a word trends graph, a summary of frequently used words, and a contexts view that shows each occurrence of a word alongside its surrounding text. Each section can be adjusted on its own to include, exclude, or analyze word groups and individual words. In my case, I can remove ‘San’ and ‘Diego’ from the analysis. Voyant counts them as two separate words rather than a single place name, and because the whole article concerns San Diego County, including ‘San Diego’ would add nothing meaningful to my data visualization. To make that change, I can open ‘define options for this tool’ and edit the list of ‘stopwords,’ the words Voyant leaves out of its counts.
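To show why ‘San’ and ‘Diego’ appear as separate entries, and what adding them to the stopwords does, here is a minimal Python sketch of word counting with a stopword list. It is not Voyant's actual tokenizer or stopword list; `BASE_STOPWORDS`, `top_terms`, and the sample sentence are invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; Voyant ships its own, much longer lists.
BASE_STOPWORDS = {"the", "of", "and", "to", "in", "a", "was"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters.

    [^\\W\\d_] matches Unicode letters, so accented words stay intact.
    Like most simple tokenizers, this turns 'San Diego' into two tokens.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def top_terms(text: str, extra_stopwords=(), n: int = 5):
    """Count words, skipping stopwords, and return the n most frequent."""
    stop = BASE_STOPWORDS | {w.lower() for w in extra_stopwords}
    counts = Counter(t for t in tokenize(text) if t not in stop)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ("San Diego County officials moved the El Capitan band to Viejas. "
              "San Diego newspapers covered the removal of the band.")
    print(top_terms(sample, n=3))
    print(top_terms(sample, extra_stopwords=["San", "Diego"], n=3))
```

In the first call, ‘san’ and ‘diego’ tie with ‘band’ at the top of the list; once they are added as stopwords, ‘band’ leads and other words move up, which mirrors the effect of editing the stopword list in Voyant.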

Each tool in Voyant can be adjusted on its own or as part of a larger project across the entire window. The five default tools can all be swapped for other visualizations; from graphs and charts to maps and word clouds, Voyant offers many forms of textual analysis. Each tool can also be exported or embedded individually. This flexibility lets the digital historian choose the visualization that best interprets the original source material. While Voyant has limits in the scope of its abilities, and its results depend on the quality of the OCR behind the source text, it offers an accessible and simple workspace for historians to make meaningful interpretations of source material.