The Old Scholar's Historical Thoughts

November 29, 2011

HI 731 Trials

I have completed a draft of my paper and welcome feedback from everyone. As I completed it, I became more intrigued than ever by why England didn’t have a revolution in the nineteenth century. This “last revolution” would seem to have provided all the ammunition people needed to distrust the government and clamor for change. Yet it did not happen. I know this topic has been studied extensively, and working on this paper makes me want to read and investigate those studies.

May 14, 2010

My Project

November 21, 2009

Open Access is a funny thing

I just have a lot of random thoughts this week.

Looking at the articles on Open Access News, I saw that the British Library had digitized its 500,000th item, but it charges you to look at its collections. On the other hand, the British National Archives are free. Many of the collections at the Library of Congress are free, but our National Archives is having someone else do the digitizing, and it will cost you a fee to see the data. Digitized US maps are free; British maps cost money. Government logic and consistency seem to be mutually exclusive.

By the way, did everybody notice the note on the Open Access News site saying that the blog would not be kept as up to date as the Open Access Tracking Project? The Tracking Project is a wiki with updates about OA.

A problem with Open Access is the chaos that occurs when there are no “accepted” places to go. It may be limiting to have only the 40 or so journals on Victorian England to look at for research, but that is a much smaller problem than trying to link up every scholar’s site from every university and make sense of them all. If you want to see chaos, just try to track down all the standards for data sharing that people propose, which Willinsky talks about. He references the Open Archives Initiative, but that is only one of many. Which ones are being maintained? I have worked with the W3C before, so I know they are a standards body with clout, but how about MINH? Who follows them, and who uses them?
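To make the Open Archives Initiative a little more concrete: its harvesting protocol, OAI-PMH, boils down to plain HTTP requests with a `verb` parameter that return XML, usually carrying simple Dublin Core (`oai_dc`) metadata. Below is a minimal sketch, assuming a hypothetical repository at `https://example.org/oai` and an invented sample record; it builds a `ListRecords` request URL and pulls the titles out of a response without touching the network. The function names are my own, not part of any library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace for Dublin Core elements inside an oai_dc record
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # optional YYYY-MM-DD lower bound on datestamps
    return f"{base_url}?{urlencode(params)}"

def titles_from_response(xml_text):
    """Return every Dublin Core title found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//dc:title", NS)]

# An invented, minimal response used only for illustration
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical pamphlet on the Reform Bill</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(titles_from_response(SAMPLE_RESPONSE))
```

The appeal of a shared protocol is that any repository speaking it can be harvested by the same few lines of code; the chaos starts when every site invents its own.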

Now take that chaos, magnify it millions of times, and try to sift through scholarly output using the methods proposed by Wineburg, discussed in chapter 11. Where did this come from? Who is the guy who wrote it? How can I trust what I found on the web?

One of the great things about coming back to school, for me, has been the access to journals I get through the library. I have found some great information. The other day I went looking for an article and found an abstract of exactly what I was looking for, but I could not access the article through the Library Catalog System. I went to the library, and Mike was working there, so I asked for some help; after all, he’s in CLIO-1 like me, so he must know all the answers. Well, we both learned something that day. The George Mason Library subscribes to the Oxford Journals, but not to issues published before 1996. For those you have to try to get an inter-library loan, or you can pay $36 for one day’s use of an article. Needless to say, I did not pay the $36 to see how the one-day use was enforced, and I was not able to use the research someone else had done. So this is another form of Open Access that needs to go into Appendix 1 of Willinsky’s book. I think it is called “Open – but not for you”.

November 14, 2009

Another Project Idea

Professor Cohen pointed me to some sites that are doing some of the same things I am trying to do with my data standardization project. He suggested that I try to cover why data standardization has failed in the past and why people do not use the standards that are available.

That has got me thinking about including in my proposal a workshop that brings together the commander of the Joint Interoperability Test Command, the people in charge of conformance testing for imagery and intelligence projects, and people from the Humanities. This workshop would discuss the problems the Department of Defense has tackled in setting, enforcing, and encouraging the development of standards. And instead of including just government and academia, it would be nice to get the people who set the standards for business-to-business (B2B) communication to participate. America’s entire business model depends upon these systems exchanging unambiguous data. I doubt we can put bar codes on historical artifacts, but I can see some areas where the problem space is the same. Interoperability of data standards is a problem that has NOT been solved in many places, and getting different viewpoints may benefit everyone. The output of the workshop could be a list of the top 10 stumbling blocks and plans of action to tackle them within the Humanities.

Does anybody else think this is a good idea? The Humanities have many different standards for specific problem spaces, e.g., TEI, museum artifact standards, etc. What I am proposing is that these different areas step back and get ideas from outside their own problem space. I am immersed in this stuff all day and think I see the relevance. Do you think all these people from different organizations would even want to work together? I think the DoD would be willing to work on this; after all, the Internet began as a defense project. Lessons learned by organizations dedicated to interoperability may be useful as the Humanities tackle the same problem space.

Words of wisdom

There are times when it is better to remain quiet and be thought a fool, than open your mouth and remove all doubt. (Attributed to Abraham Lincoln, but I can’t find the source)

This week I am going to sort of take this to heart. There are two great posts, one on Lynn’s site and one on Carl’s, that discuss data visualization very well. Instead of highlighting my foolishness on this site, I will remain quiet and post my observations as comments on theirs.

I will, however, comment on Professor Cohen’s article about trying to make sense of digital research with data mining techniques. We are at the forefront of data mining in the digital age. In the past, historians were limited by time and distance in what they could review and try to correlate. Looking at the court records of 19th-century Britain is possible; correlating the data between jurisdictions and developing a time analysis is daunting. Taking this court data and correlating it with social data, such as parish records, or economic data, such as tax receipts, is impossible except in isolated cases. If all this data were digitized and correctly tagged, historians could write queries asking for correlations between data sets for whole regions or all of Britain, and trends for the whole country could be reviewed. Even if the data weren’t tagged, an API like the Google API for filtering queries within parameters, or a tool like H-bot, could let the data yield new correlations that historians have not yet theorized or investigated. We are lucky to be at the forefront of this trend, but only if we take advantage of what is there and get involved in setting the direction for future historians.
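Here is a minimal sketch of that kind of query, assuming two data sets that have already been tagged with the same keys (a parish name and a year). The parish figures are invented purely for illustration, and the function names are hypothetical. It joins the two sets on the keys they share and computes a Pearson correlation coefficient.

```python
from math import sqrt

# Invented toy figures keyed by (parish, year); NOT real historical data
court_cases = {
    ("St Giles", 1840): 12, ("St Giles", 1841): 15,
    ("Islington", 1840): 4, ("Islington", 1841): 6,
}
tax_receipts = {
    ("St Giles", 1840): 300, ("St Giles", 1841): 280,
    ("Islington", 1840): 900, ("Islington", 1841): 950,
    ("Hackney", 1840): 700,  # no matching court data, so it drops out of the join
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (spread_x * spread_y)

def correlate_on_shared_keys(a, b):
    """Inner-join two tagged data sets on their keys, then correlate the values."""
    keys = sorted(a.keys() & b.keys())
    if len(keys) < 2:
        raise ValueError("need at least two shared keys")
    return keys, pearson([a[k] for k in keys], [b[k] for k in keys])

keys, r = correlate_on_shared_keys(court_cases, tax_receipts)
print(len(keys), round(r, 2))
```

A number like this is only a starting point and says nothing about causation; the real work is getting jurisdictions and dates tagged consistently so that the join is possible at all.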

Turning to the problem of Abundance or Scarcity, it seems as if historians will have some great tools for born-digital data. For instance, the British court system now digitizes all its records and even has a link that allows researchers to sign up for and use these digital records. Not only are some records born digital; they are being prepared for the researchers of the future.


OOPS! I guess I removed all doubt about my being a fool. 😉

November 11, 2009

Just a quick note – Readings vs Wordle

Moretti (p. 51) said there is a change in how people used pronouns in novels that showed an increased sense of community. He references Elizabeth Gaskell’s Cranford, saying it starts off with the word “Our” and ends with “Us.” However, if you put the text into Wordle and accept common words (Cranford Wordleized), plural pronouns are small and singular pronouns are large.

I think Moretti was just seeing what he wanted to see and not what was really there.
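Wordle sizes words by how often they occur, so a more direct way to check Moretti’s claim is to count the pronouns yourself. Because he is talking about a change over the course of the novel, the minimal sketch below also splits the text into parts and counts each one. The pronoun lists are my own rough choice and the sample sentence is invented; to try it on the real novel you would load the full text of Cranford (for example from Project Gutenberg) into `text`.

```python
import re

# Rough pronoun lists (an assumption); "her" counts once even though it can be object or possessive
SINGULAR = {"i", "me", "my", "mine", "he", "him", "his", "she", "her", "hers"}
PLURAL = {"we", "us", "our", "ours", "they", "them", "their", "theirs"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_pronouns(tokens):
    """Return (singular, plural) pronoun counts for a list of tokens."""
    singular = sum(1 for t in tokens if t in SINGULAR)
    plural = sum(1 for t in tokens if t in PLURAL)
    return singular, plural

def pronouns_by_part(text, parts=2):
    """Split the text into roughly equal chunks and count pronouns in each."""
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // parts))  # ceiling division
    return [count_pronouns(tokens[i:i + size]) for i in range(0, len(tokens), size)]

text = "We said our goodbyes, and she took her leave of us."
print(count_pronouns(tokenize(text)))
print(pronouns_by_part(text, parts=2))
```

If the plural share rises from the first part of the book to the last, that would support Moretti’s reading; if it doesn’t, it backs up what the Wordle suggested.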

November 8, 2009

Another possible source of background material

Google Reader gives me suggested blogs that I might be interested in, based on what I have been searching for and reading in the reader. Today it gave me a link to a project called DARIAH, which stands for Digital Research Infrastructure for the Arts and Humanities. The DARIAH project is very interesting to me, though probably not so much to most everybody else.

But what is interesting is the Arts and Humanities.Net web site that they originally pointed me to. You can go to its Tools tab and get a rundown of the different types of tools used to digitize artifacts, store data, etc. You can go to Projects and look up projects that do things close to what you are doing for your own project. For each one, they discuss the project and the different areas it touches, and then give you a link to the project’s web site. There were some interesting projects that combine the digitization of old records with Web 2.0 features. For instance, there was The Old Bailey, which digitized the criminal proceedings of London courts (digitization), added keyword searches (data mining tools), and set up a wiki for people to add information about the people discussed in the proceedings (Web 2.0).
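Keyword search over digitized records usually rests on an inverted index, which maps each word to the documents that contain it. The sketch below is a generic illustration of that technique, not The Old Bailey’s actual implementation; the trial IDs and summaries are invented.

```python
import re
from collections import defaultdict

def words(text):
    """Lowercase the text and pull out alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in words(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every word in the query (AND search)."""
    terms = words(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Invented trial summaries keyed by hypothetical IDs
trials = {
    "t-001": "Theft of a silver watch from a shop in Holborn",
    "t-002": "Theft of a handkerchief",
    "t-003": "Assault in a public house",
}
index = build_index(trials)
print(sorted(search(index, "theft watch")))
```

Building the index once and then intersecting small sets is what makes searching thousands of transcribed trials feel instant.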

There were several map projects, one of which looked at medieval town plots in Wales and created 3D renditions of them. There were projects that put up artworks with discussions about them. This might be a place where you can find more information about the types of projects already being done for your paper. The site was

November 6, 2009

Elementary my dear data miner, elementary

The readings this week were very interesting. I was fascinated by the talk by Norvig. A long time ago I did some AI work for Military Intelligence using neural nets and “computer learning.” We were trying to determine the bad guys’ future course of action from what our sensors were telling us in the present. Norvig’s section on Google feeding in raw data and then attempting to develop classes of similar words and concepts through statistical computation takes that thinking to a much higher level. The software does not have to be programmed with preconceived notions; pattern matching and statistical algorithms transform raw data into abstract concepts.
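Norvig doesn’t give Google’s actual code, so the sketch below is only a toy stand-in for the general idea, assuming a simple technique called distributional similarity: describe each word by counts of the words that appear near it, then treat words with similar context counts (measured by cosine similarity) as members of the same class. The four sentences are invented.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(sentences, window=2):
    """Count, for each word, the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(vectors, word, top=3):
    """Rank the other words by how closely their contexts match `word`'s."""
    target = vectors.get(word, Counter())
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top]]

# Invented toy corpus
SENTENCES = [
    "the mill workers went on strike",
    "the dock workers went on strike",
    "the mill owners cut wages",
    "the dock owners cut wages",
]
vectors = context_vectors(SENTENCES)
print(most_similar(vectors, "mill", top=1))
```

Nobody told the program that a mill and a dock are both workplaces; it groups them only because they keep the same company. Norvig’s point is that with billions of words the same kind of counting starts to surface real concepts.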

As Norvig pointed out in the drug example, what the publishers of web pages provide is different from what searchers are looking for. Wouldn’t it be neat to take a corpus of work such as the Victorian British parliamentary debates and develop candidate classes of data using Google’s algorithms? Then you could take British newspapers from the same time period and see what similar classes they would yield. Finally, you could take pamphlets from trade unions, religious sermons, or popular novels and correlate all of the different classes. Comparing classes generated from different data sources could provide insights into controversy and tension between groups that people have not yet researched. It might reveal links between people and organizations that no one realized were there.
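One simple way to compare classes generated from two sources, assuming each class is just a set of words, is Jaccard overlap: the number of shared words divided by the number of distinct words in both. The sketch below matches each class from one corpus with its closest class in another. The class names and word lists are invented, and a real project would first have to generate the classes from the texts.

```python
def jaccard(a, b):
    """Shared words divided by all distinct words across two classes (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_matches(classes_a, classes_b):
    """For each class in corpus A, find the most similar class in corpus B."""
    if not classes_b:
        return {}
    matches = {}
    for name_a, words_a in classes_a.items():
        name_b, words_b = max(classes_b.items(), key=lambda item: jaccard(words_a, item[1]))
        matches[name_a] = (name_b, jaccard(words_a, words_b))
    return matches

# Invented word classes from two hypothetical sources
debates = {"labour": {"wages", "hours", "factory", "strike"}}
newspapers = {
    "unrest": {"wages", "strike", "riot"},
    "religion": {"church", "chapel", "sermon"},
}
print(best_matches(debates, newspapers))
```

A class whose best match is weak, where Parliament and the press share few words on what looks like the same topic, is exactly the kind of gap that might point to tension between groups worth researching.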

Of course, this all depends upon the amount of data you can process; Norvig wants billions of data points. As Leary pointed out, the Victorians on the web can give us more data than other time periods because we don’t have to worry about copyright. But Victorian data is also limited because relatively few primary sources have been scanned. As he said, only The Scotsman and The Times are fully available, and most others are not.

Using data mining on a robust set of works could give a researcher a starting point for further research. From that starting point, the researcher would still need to go to collections of letters, background notes, and other sources to put context around the candidate data classes. I think it would be a lot of fun figuring out the puzzle of why the algorithm found different relationships depending upon the data sources. We could all play Sherlock Holmes.

Fascinating, fascinating way to look at research.

An issue from class last week

I think it was Hal who asked why there was not a standard for playing video, to which I replied, “Bill Gates would love for there to be a standard, and he would like to set it.” Then Dr. Cohen said that the new HTML5 was defining a video element that all HTML5 browsers would have to support. I wanted to know more about the whole controversy, so I looked up HTML5 to try to understand why there wasn’t a standard. A great article by Paul Ryan discusses HTML5.

Basically, the standards committee still cannot get everyone to agree on which video format HTML5 should require, so it has left that question open.

It boils down to Apple and Google supporting one standard (H.264) and Mozilla and Opera favoring Ogg Theora (I don’t know where they come up with these names). When you read further, you find that these standards wars hinge on two things. The first is the supposed technical superiority of one (H.264) over the other. The second is open source vs. non-open-source software. Mozilla and Opera cannot put H.264 in their browsers because it is covered by patents. If they included the patented routines, they would violate the open source licenses that let them share their code freely. Apple and Google say you don’t know what patents may be lurking in Ogg Theora, so you can’t trust that either. I’m glad we had the class discussion of Open Source and how you could not fence off part of the work from the patented work; otherwise these explanations would not have made sense to me.

It seems as if Google prefers H.264 for Chrome but will support Ogg Theora. Apple will be a bad guy this time, supporting only a patented technology. Microsoft gets around the issue by saying it simply is not going to support HTML5 completely. Ryan said, “My inner pessimist suspects that Microsoft will finally get around to implementing HTML 5 video at the same time that the H.264 patents expire, in roughly 2025.”

If you want to see how passionate people get about these debates, here is a quote I cut from one of the comments about this controversy from (I am leaving out the really offensive things). He said that companies make decisions about standards because they don’t want to be sued by “some scum sucking, syphillitic pus-drinking, rotting corpse-devouring and worm-infested defecation-eating patent troll.” I assume he is somewhat of a fan of the Open Source movement.

November 2, 2009


I looked at Open Library and Google Books and agree with many of the observations of the other people in the class. Last semester, for another class, I had actually downloaded a PDF copy of Greater Britain by Charles Dilke (1899) from Google Books. I found it interesting because I was able to use my full edition of Adobe Acrobat to take notes, highlight, and index the book to the things that were interesting to me. I linked that into my Zotero library, which puts my research right there so I don’t have to search my bookshelf for the book and then rummage through my notes to find what I am looking for. I find it much easier to read an honest-to-goodness book, but the ease of keeping track of things for research makes eBooks very useful.

Another source for eBooks is Project Gutenberg. This site allows the community to contribute to the collection, and it has a tools area where you can see the different tools people use to scan the books. The interface is not as slick as Google’s, but it does offer audio versions of some books. The books come in different formats: written text (plain text, PDF) and audio (MP3, Apple formats, etc.). Since I drive an hour or so per day to get to work, I’m amazed at how many books I have been able to “read” in audio format. Getting them from the library or online from Project Gutenberg is a great time and money saver. Books under copyright I can get from the library for free; books out of copyright I can get from sites like this.

