This week, I selected a chapter from the thrilling 1877 page-turner The Past and Present of Lake County, Illinois, specifically “Abstract of Illinois State Laws.”
When I looked at the plain text from Google Books, I wasn’t really surprised by its clarity and accuracy. The typeface seemed pretty straightforward, so it made sense to me that the plain text rendering reflected it fairly accurately. There were some odd spots, like where a page ended and the next one began with page numbers (I think…). The layout of the text was also odd. It was really closely spaced, with lines almost overlapping at points. I don’t know if that was a function of the OCR or the book or what, but it could affect people with poor vision or dyslexia since the letters are so close together. Overall, it was pretty easy for me to analyze the text and identify where the weird errors might be coming from, but someone with learning or vision difficulties might not agree with me.
Since this is a legal document, the word cloud produced by Voyant also did not surprise me. The chapter lays out different laws in a certain order, which explains the numeric term. The fact that “widow” appears so often might be surprising, since the document predates the advent of the modern welfare system and comes from a time when widows were often left to their own devices. But since part of the document deals with wills and inheritance, it does make sense that it talks a lot about widows, husbands, and descendants. It tells people how to decide who gets what from a deceased person’s estate.
The subject of inheritance is reinforced by the phrase function in the bottom corner. I liked the word cloud more than the word count in that corner, but the addition of the phrase counter was interesting and useful. I wouldn’t have thought to look for the way words and phrases occur together if it hadn’t been there.
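For anyone curious what a word count and a phrase counter are doing behind the scenes, here is a rough sketch in Python. It is not how Voyant itself is implemented, just my guess at the basic idea: split the text into words, then count single words and runs of neighboring words. The function names and the sample sentences are made up for illustration and are not quoted from the chapter.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_counts(tokens, n=2):
    """Count every run of n consecutive words."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Invented sample text (an assumption, not taken from the 1877 chapter).
sample = ("The widow shall receive one third. "
          "The widow shall have the homestead.")

tokens = tokenize(sample)
word_counts = Counter(tokens)          # what a word cloud is sized by
two_word = phrase_counts(tokens, n=2)  # what a phrase counter tallies

print(word_counts.most_common(3))
print(two_word.most_common(3))
```

A real tool would also drop very common words like “the” before drawing the cloud, which this sketch does not do.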
In another situation, I think the context function could have been interesting as well. “Shall” was the most used word in the chapter, which is logical for a legal document. As a result, the context view shows basically the same construction over and over again, both before and after the word. I’m interested to see how the context function worked for other students with different texts.
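The context function is essentially a keyword-in-context listing: every occurrence of a word shown with a few words on either side. Here is a minimal sketch of that idea in Python, again with a hypothetical function name and invented sample sentences rather than anything from Voyant or the chapter itself.

```python
import re

def keyword_in_context(text, keyword, width=3):
    """Return (left, match, right) for each occurrence of keyword,
    with up to `width` words of context on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Invented sample text (an assumption, not taken from the 1877 chapter).
sample = ("The widow shall receive one third. "
          "The executor shall file an inventory.")

rows = keyword_in_context(sample, "shall", width=2)
for left, word, right in rows:
    print(f"{left:>15} | {word} | {right}")
```

With a word like “shall,” nearly every row has the same shape (someone, “shall,” do something), which is why the listing felt so repetitive for this chapter.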
I entered the words “shall,” “wife,” and “descendants” into the N-Gram viewer. I was really surprised by the rapid decline in the usage of the word “shall.” It does seem old-fashioned, but apparently it truly is…
I had trouble using the HathiTrust site since my ISP wouldn’t let me on, giving me an error message that said the site could be under attack. I have used it in the past, though, so this is based on my recollections. I don’t think its OCR was as good as Google’s, but few things in this world are as well-funded as Google, so that isn’t a total shock. Because of that, it could have more trouble indexing the text of its books than the N-Gram function does.
After working with both the digital images and plain text, I’m not sure which one I prefer. When fonts are really hard to read, the plain text is definitely helpful, but sometimes the layout and the errors are distracting enough to make it hard to read. As I mentioned at the beginning of the post, I do wonder what people with learning and vision difficulties think about each option. I would hate to give a student a resource that ends up being really hard for them to use, but I don’t know what the alternative is if both of them are bad.
Voyant could be a really interesting resource for fiction, I think. It would make it easier to track concepts and phrases throughout a lengthy text. I can see it being used well in conjunction with something like SparkNotes, but don’t tell the high schoolers of America that. Running the most common words and phrases through N-Gram or Bookworm would help researchers see concepts over time, too. I could see that being a great tool for people looking through a lot of newspaper or magazine articles and wondering when something was most important to a certain group of people. I would recommend Voyant to people studying trends in literature, and a combination of Voyant and a word usage visualizer to historians, sociologists, or anthropologists trying to chart something over time.