{"id":197,"date":"2022-02-20T21:46:01","date_gmt":"2022-02-21T05:46:01","guid":{"rendered":"https:\/\/wp.csusm.edu\/rsheehan\/?p=197"},"modified":"2022-05-22T12:47:53","modified_gmt":"2022-05-22T19:47:53","slug":"text-analysis-through-voyant","status":"publish","type":"post","link":"https:\/\/wp.csusm.edu\/rsheehan\/2022\/02\/20\/text-analysis-through-voyant\/","title":{"rendered":"Text Analysis through Voyant"},"content":{"rendered":"<p>Voyant is a text analysis system that can create data visualizations of documents through word maps and word-frequency calculations. Those visualizations can show a reader, at a glance, the main topics of a document as well as how its language is used. While text analysis software has limitations (the text itself must be machine-readable, which for scanned documents means first running it through optical character recognition, or OCR), it can create an interactive and visually impressive display for the reader. In my quest to create digital history, I have been experimenting with Voyant to analyze texts on Native American economic status and the loss of territory in the 20th century. To learn the software, I am following SDSU\u2019s workshop, which you can find <a href=\"https:\/\/youtu.be\/EqtA9Mel9KY\">here<\/a>.<\/p>\n<p><span style=\"font-weight: 400\">The workshop began with a question-and-answer session that gave a brief overview of Voyant as a data visualization tool and of the pros and cons of text analysis. Scholars who engage in text mining should be prepared to interpret the data the computer produces. Computers are very good at computation and at quantifying sources, but they cannot read for meaning. Where context matters, computation should not be the only form of analysis. Another limitation of text analysis is that modern software relies on contemporary word banks, while the meaning and usage of words may have shifted since the period in which a source was written. 
Despite these limitations, computational text analysis remains a potent tool that digital historians can use to understand and interpret texts.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Preparing a document for text analysis requires feeding only its plain text into the software. Plain text is just that: plain. It contains no bold, underlined, or italicized words, only the text itself and its punctuation. For online sources, that means stripping away the surrounding code or HTML to leave clean text. One advantage of Voyant is that it is \u2018language agnostic.\u2019 In other words, Voyant does not restrict its text analysis capabilities to particular languages. You could load text in almost any written language, and Voyant would be able to recognize it and create useful data visualizations from it.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Once your document is prepared, you can upload it to Voyant or, for an online source, paste in its URL. In my case, the source comes from The Journal of San Diego History, so I could paste its URL into the input field. The <a href=\"https:\/\/sandiegohistory.org\/journal\/v56-1\/v56-1thorne.pdf\">article<\/a> I chose is &#8220;The Removal of the Indians of El Capitan to Viejas: Confrontation and Change in San Diego Indian Affairs in the 1930s,&#8221; by Tanis C. 
Thorne.<\/span><\/p>\n<p><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter wp-image-201 size-large\" src=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-1024x683.png\" alt=\"\" width=\"640\" height=\"427\" srcset=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-1024x683.png 1024w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-300x200.png 300w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-768x512.png 768w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-1536x1024.png 1536w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-2048x1365.png 2048w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-3-600x400.png 600w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/p>\n<p><span style=\"font-weight: 400\">The initial view that Voyant creates is divided into five sections: a word cloud, a digital reader, a word-trends graph, a summary of frequently used words, and finally a contexts view that shows words within their surrounding text. Each section can be manipulated individually to include, exclude, or analyze word groups and individual words. In my case, I can eliminate \u2018San\u2019 and \u2018Diego\u2019 from the analysis because Voyant counts them as two separate words rather than as a single place name. Excluding them also loses nothing: because the article focuses on San Diego County, including \u2018San Diego\u2019 in my data visualization would yield no meaningful analysis. 
To make that elimination, I can open the \u2018define options for this tool\u2019 menu and edit the list of \u2018stopwords,\u2019 the words Voyant leaves out of its counts and visualizations.<\/span><\/p>\n<p><img decoding=\"async\" class=\"size-medium wp-image-200 alignnone\" src=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-300x200.png\" alt=\"\" width=\"300\" height=\"200\" srcset=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-300x200.png 300w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-1024x683.png 1024w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-768x512.png 768w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-1536x1024.png 1536w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2048x1365.png 2048w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-600x400.png 600w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/> <img decoding=\"async\" class=\"size-medium wp-image-199 alignnone\" src=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-300x200.png\" alt=\"\" width=\"300\" height=\"200\" srcset=\"https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-300x200.png 300w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-1024x683.png 1024w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-768x512.png 768w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-1536x1024.png 1536w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-2048x1365.png 2048w, https:\/\/wp.csusm.edu\/rsheehan\/wp-content\/uploads\/sites\/29\/2022\/02\/2022-02-20-2-600x400.png 600w\" sizes=\"(max-width: 300px) 100vw, 300px\" 
\/><\/p>\n<p><span style=\"font-weight: 400\">Each tool in Voyant can be manipulated on its own or as part of a larger project spanning the entire window. Any of the original five tools can be replaced with a different visualization: from graphs and charts to maps and word clouds, Voyant can present a text in many forms. Each tool can also be exported or embedded individually. This flexibility lets the digital historian choose the visualization that offers the best interpretation of the original source material. While Voyant has limits in the scope of its abilities, and OCR has problems of its own, it provides an accessible and simple workspace in which historians can make meaningful interpretations of source material.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Voyant is a text analysis system that can create data visualizations of documents through word maps and frequency of use calculations. Those visualizations can 
show<\/p>\n","protected":false},"author":64,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[1],"tags":[],"class_list":["post-197","post","type-post","status-publish","format-standard","hentry","category-blog"],"_links":{"self":[{"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/posts\/197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/users\/64"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/comments?post=197"}],"version-history":[{"count":0,"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/posts\/197\/revisions"}],"wp:attachment":[{"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/media?parent=197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/categories?post=197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.csusm.edu\/rsheehan\/wp-json\/wp\/v2\/tags?post=197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}