The Old Scholar's Historical Thoughts

November 6, 2009

Elementary my dear data miner, elementary

Filed under: Readings — theoldscholar @ 9:01 pm

The readings this week were very interesting. I was fascinated by Norvig’s talk. A long time ago I did some AI work for Military Intelligence using neural nets and “computer learning.” We were trying to predict the bad guys’ future course of action from what our sensors were telling us in the present. The section of Norvig’s talk where he describes Google feeding in raw data and then developing classes of similar words and concepts through statistical computation takes that thinking to a much higher level. The software does not have to be programmed with preconceived notions. Pattern matching and statistical algorithms transform raw data into abstract concepts.
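To make the idea of statistically derived classes a little more concrete, here is a toy sketch in Python. It is not Google’s method, which the talk did not spell out in code; it only assumes the common idea that words appearing in similar contexts belong together. The function names (context_vectors, cosine, most_similar) and the window size are my own illustrative choices.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, vectors, top=3):
    """Rank the other words by how closely their contexts match `word`."""
    target = vectors.get(word, Counter())
    scores = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return [other for _, other in sorted(scores, reverse=True)[:top]]
```

Fed a handful of sentences about ministers speaking in parliament and bishops speaking in church, a sketch like this would rank “bishop” as the word most similar to “minister” without ever being told that both are kinds of people. A candidate class is just a group of words that score close to one another this way; with billions of data points the groups become much more trustworthy.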

As Norvig pointed out in the drug example, what the publishers of web pages provide is different from what searchers are looking for. Wouldn’t it be neat to take a corpus such as the Victorian British parliamentary debates and develop candidate classes of data using Google’s algorithms? Then you could take the British newspapers from the same time period and see what classes they yield. Finally, you could take pamphlets from trade unions, religious sermons, or popular novels and correlate all of the different classes. Comparing the classes generated from different data sources could provide insights into controversies and tensions between groups that no one has yet researched. It might reveal links between people and organizations that no one has realized were there.
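Here is a minimal sketch of what that kind of comparison might look like, again in Python. It assumes each corpus is simply a list of sentences, and it uses a deliberately crude measure: the words that most often share a sentence with a term in each source, and how much those two sets overlap. The names neighbours and overlap are hypothetical, and a real study would need stop-word filtering and far more text.

```python
from collections import Counter


def neighbours(term, sentences, top=5):
    """Return the `top` words that most often share a sentence with `term`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return {word for word, _ in counts.most_common(top)}


def overlap(term, corpus_a, corpus_b, top=5):
    """Jaccard overlap (0.0 to 1.0) between a term's neighbours in two corpora."""
    a = neighbours(term, corpus_a, top)
    b = neighbours(term, corpus_b, top)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

A low overlap for a word like “reform” between the debates and the trade union pamphlets would not prove anything by itself, but it would flag a place where the two groups seem to be talking about the same word in different company, which is exactly the kind of lead worth chasing.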

Of course this all depends upon the amount of data you can process. Norvig wants billions of data points. As Leary pointed out, the Victorians on the web can give us more data than other time periods because we don’t have to worry about copyright. But Victorian data is also limited by how few primary sources have been scanned. As he said, only the Scotsman and the Times are fully available; most others are not.

Data mining a robust set of works could give a researcher a starting point for further research. From that starting point, the researcher would still need to go to collections of letters, background notes, and other sources to put context around the candidate data classes. I think it would be a lot of fun figuring out the puzzle of why the algorithm found different relationships depending upon the data sources. We could all play Sherlock Holmes.

Fascinating, fascinating way to look at research.

4 Comments »

  1. I liked your post. It helped me to understand quite a bit of what Norvig was presenting. There were still a few things that I am not sure about though. Maybe you could explain them to me.
    What do you mean by ‘classes’? I liked the idea of scanning all of the Victorian resources (parliamentary debates, sermons, pamphlets, and newspapers) and finding patterns, but I don’t know what candidate classes are. I feel like you have something really neat here. Then again, you may have been explaining something that 5th graders are learning this year; I just don’t know this kind of stuff.

    Comment by hbarthold — November 10, 2009 @ 5:35 pm | Reply

  2. Let me clarify that last one. I understand that classes mean subjects or themes; I just don’t know how they are created.

    Comment by hbarthold — November 10, 2009 @ 5:45 pm | Reply

  3. John –

    I agree, data mining is a crucial element emerging from these tools. You nailed it. The issue is how to best set mining tools in a way that maximizes efficiency, and as Norvig pointed out, connects the right users with the right seekers.

    DGQ

    Comment by DeadGuyQuotes — November 10, 2009 @ 6:04 pm | Reply

  4. test

    Comment by DeadGuyQuotes — November 14, 2009 @ 2:21 pm | Reply

