The Old Scholar's Historical Thoughts

September 12, 2009

So much history I can’t understand

Filed under: Readings — theoldscholar @ 2:30 pm

The article by Rosenzweig, "Scarcity or Abundance? Preserving the Past in a Digital Era," referenced both by Cohen in the Interchange article and by Turkel in his post, highlights the problem historians in the future will face: far too much data that they will not be able to read. How many of us still have 3½ inch floppies at home that our computers can no longer read? (Don't tell anyone – I still have an 8 inch floppy at my house.)

This topic really hit home for me because of where I work. We are required, by law, to keep all of our test data on military systems, and we collect huge amounts of it. On one test we filled three warehouses with magnetic tapes – in a unique format that can only be read by an IBM tape drive attached to an IBM 360, with the data encoded in EBCDIC. We have the data; there are no machines left that can read it.

I also worked with Fitzsimons Army Hospital in the 1990s, when they were considering transferring all of their records to CDs. They wanted to be able to access data quickly through searches, and it was imperative that the records last 150 years because of lawsuits. (They had just been sued by the descendants of a man who died at age 72; the descendants claimed his medical problems stemmed from a doctor's mistake when he was born at Fitzsimons 72 years earlier.) After studying the state of the art at the time, we concluded that the least risky format was to keep the paper records, even though the cost of storage would be considerable. The next best format was microfiche, which had been shown to last longer than CDs. Storage would cost less, but converting the records had its own cost, and they still would not get the easy search and retrieval they wanted.

One area discussed in the Digital History book has bothered me for the longest time: the taxonomy of data and how it is accessed. There is a very good article on this topic called The Petabyte Problem by Alex Antunes. Here is his statement of the problem:

Let’s look at the example of the starship Enterprise from Star Trek.  It has a nigh omniscient computer and a vast array of sensors.  Commander Data wants to find a hidden Romulan ship.  He talks to the computer to use it.  Here is his technique [paraphrased]:

“Computer, visual onscreen now.”  [no ship seen]
“Computer, show thermal signatures.”  [still no ship seen]
“Computer, show subspace anomalies.” [voila!  the outline of a Romulan ship appears!]

This is a terrible model for attacking a large data set!  A good system would work like this:

“Computer, show me all ship-like objects, in any profile.  Ah, there it is.”

Why should a scientist have to drill down through all possibilities?  The problem becomes worse when scientists access data from different domains.  A sociologist may want to explore ‘temperature’ with ‘crime rate’ for different regions of a city, to see if there is a connection.
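Antunes's point is easy to make concrete. Below is a toy sketch – the sensor data and the "ship classifier" are entirely invented, nothing from his article – contrasting the drill-down style of query with one question asked across every profile at once:

# Hypothetical sensor scans: each channel maps to the object IDs it detected.
scans = {
    "visual":   {"asteroid-7", "debris-12"},
    "thermal":  {"asteroid-7"},
    "subspace": {"asteroid-7", "contact-romulan-1"},
}

def drill_down(scans, channels, looks_like_ship):
    """The Enterprise approach: try one channel at a time until something shows."""
    for channel in channels:
        hits = {obj for obj in scans[channel] if looks_like_ship(obj)}
        if hits:
            return channel, hits
    return None, set()

def query_all_profiles(scans, looks_like_ship):
    """The better approach: one question posed across every channel at once."""
    return {obj for objects in scans.values() for obj in objects
            if looks_like_ship(obj)}

# A toy stand-in for real pattern recognition.
looks_like_ship = lambda obj: "contact" in obj

print(drill_down(scans, ["visual", "thermal", "subspace"], looks_like_ship))
# ('subspace', {'contact-romulan-1'}) – but only after two wasted queries
print(query_all_profiles(scans, looks_like_ship))
# {'contact-romulan-1'} – found in a single pass

The second function asks the whole data set one question; the first makes the researcher guess which channel to try next.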

As historians, we are always trying to look at data in different ways. Defining the taxonomy of data and providing access to it in new and different ways is what allows us to have historical discussions; if we couldn't interpret data differently, there would be no historical discussions and a lot of history professors would be out of jobs. Antunes proposes the creation of a role he calls a "Data Researcher," someone who develops maps of data – not unlike the map at the Smithsonian site that shows patterns in the data depending on how the user wants to see it. I believe that is what historians already do; New Media needs to be harnessed so we can do that job better.

3 Comments

  1. An interesting example of the classification of objects and searching for information is the Bronte Museum site (http://bronte.adlibsoft.com/). The site is very stark and uninviting for anyone except someone doing primary research, but it has some interesting features in its advanced search and expert search areas. In the advanced search area they give you a list of terms that can be used for "source" or "Object name"; if you partially fill in the box and hit the icon, it positions the list at that point – not an automatic fill-in, but pretty useful. The expert search area allows a technical wizard to create a unique Boolean query for the SQL search (a sketch of what such a query might look like is at the end of this comment). So a researcher's computer expertise will influence the ease of getting information.

    Wouldn't it be great if there were a real person – an expert on the collection and how it is cataloged – to whom you could give a general outline of what you were looking for, and who could manipulate the collection to show you things that might have a bearing? The quest for semantic understanding in query building goes on!
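    In the meantime, here is roughly what that expert-search Boolean approach looks like. This is a hypothetical sketch – the table and field names are my invention, not the site's actual schema:

    # Hypothetical catalog; the schema and records are invented for illustration.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (object_name TEXT, source TEXT, year INTEGER)")
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", [
        ("letter",     "Charlotte Bronte", 1847),
        ("manuscript", "Emily Bronte",     1846),
        ("letter",     "Anne Bronte",      1848),
    ])

    # The Boolean query the "technical wizard" writes by hand:
    # letters OR manuscripts, but only those dated before 1848.
    rows = conn.execute(
        "SELECT object_name, source, year FROM objects "
        "WHERE (object_name = 'letter' OR object_name = 'manuscript') "
        "AND year < 1848"
    ).fetchall()
    print(rows)
    # [('letter', 'Charlotte Bronte', 1847), ('manuscript', 'Emily Bronte', 1846)]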

    Comment by theoldscholar — September 13, 2009 @ 11:05 am

  2. John –

    There ARE a lot of historians out of a job. Your point is well made; however, there will be more if we don't grapple with this better. You are also onto something in the query "Wouldn't it be great if there was a real person who was an expert on the collection.. [to] manipulate the collection…"

    There is an element of technical expertise and familiarity with the data that suggests historical data engineers as an emerging field. This person, for example, would have some idea of what is in the mountain of Clinton White House emails and would be able to surface strains of logic or events that can be ascertained only through familiarity with the data.

    Unfortunately, historians are expected to sit down with the shoe-boxes (digital or analog) full of old letters and sort out rationale, events, reason, emotion, motive, etc. on behalf of the writer and the recipient. This expectation assumes that no person (or thing/computer) is available to draw inferences and relationships. Further, I can't imagine an economic reason to create that job.

    Query building opens up another chasm of opportunity. A skilled database developer has to have an understanding of the data when building queries… some sense of "what right looks like" that will expose errors in the query logic OR errors in the data. Attempting a solution would demand fully normalized historical data, complete with detailed meta-tags/data/descriptors (a rough sketch of what that might look like follows below). Solutions are available, but they are labor intensive and can easily run amok. Look at the search engine on the National Security Archive at GWU, for example: http://www.gwu.edu/~nsarchiv/
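    To make "fully normalized with descriptors" concrete, here is that rough sketch – entirely hypothetical tables, columns, and records, not any real archive's schema:

    # Hypothetical normalized historical records with role descriptors.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE documents (
            doc_id     INTEGER PRIMARY KEY,
            title      TEXT,
            doc_date   TEXT,   -- one fact per column, ISO dates
            repository TEXT
        );
        CREATE TABLE persons (
            person_id  INTEGER PRIMARY KEY,
            name       TEXT
        );
        -- The link table carries the meta-descriptors: who relates to
        -- which document, and in what role.
        CREATE TABLE document_persons (
            doc_id    INTEGER REFERENCES documents(doc_id),
            person_id INTEGER REFERENCES persons(person_id),
            role      TEXT    -- e.g. 'author', 'recipient', 'mentioned'
        );
    """)
    conn.execute("INSERT INTO documents VALUES (1, 'Memo on staffing', '1994-03-02', 'e-mail corpus')")
    conn.execute("INSERT INTO persons VALUES (1, 'J. Doe')")
    conn.execute("INSERT INTO document_persons VALUES (1, 1, 'author')")

    # Once the data is normalized, "everything J. Doe authored in 1994"
    # is a simple join instead of text archaeology.
    rows = conn.execute("""
        SELECT d.title, d.doc_date
        FROM documents d
        JOIN document_persons dp ON dp.doc_id = d.doc_id
        JOIN persons p ON p.person_id = dp.person_id
        WHERE p.name = 'J. Doe' AND dp.role = 'author'
          AND d.doc_date LIKE '1994%'
    """).fetchall()
    print(rows)  # [('Memo on staffing', '1994-03-02')]

    Every one of those tags has to be entered by someone – which is exactly where the labor cost runs amok.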

    Comment by DeadGuyQuotes — September 14, 2009 @ 6:43 pm

    • You are right – there is no economic reason for creating that job. There is an economic reason for developing an automated, natural-language, semantically aware search engine that could work with raw data – not normalized. I was in a discussion today with a firm that uses parallel grid computing to solve computationally intensive problems, and we discussed that exact problem. My contact had experience in the intelligence community and knew exactly what I was talking about, but had not thought of it in terms of historical research. (A toy sketch of the idea is at the end of this reply.)

      However, if the understanding of language and nuance could be programmed, what need would there be for historians all coming up with their own versions of "what really happened"?
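      Here is that toy sketch. Hand-rolled TF-IDF weighting stands in – very crudely – for real semantic awareness; the "documents" and the query are invented:

      import math
      from collections import Counter

      # Raw, un-normalized text: no schema, no tags, no descriptors.
      docs = {
          "letter-01": "the regiment marched south before the spring floods",
          "diary-14":  "floods ruined the spring planting near the fort",
          "memo-03":   "supply shortages reported by the southern regiment",
      }

      tokenized = {name: text.split() for name, text in docs.items()}
      n_docs = len(tokenized)
      # Document frequency: how many documents contain each word?
      df = Counter(w for words in tokenized.values() for w in set(words))

      def vectorize(words):
          """TF-IDF weights for a bag of words, using the corpus's IDF."""
          tf = Counter(words)
          return {w: tf[w] * math.log(n_docs / df[w]) for w in tf if w in df}

      def cosine(a, b):
          dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
          na = math.sqrt(sum(wt * wt for wt in a.values()))
          nb = math.sqrt(sum(wt * wt for wt in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      vectors = {name: vectorize(words) for name, words in tokenized.items()}
      query = vectorize("spring floods".split())
      ranked = sorted(vectors, key=lambda n: cosine(query, vectors[n]), reverse=True)
      print(ranked)  # the two flood documents rank ahead of the supply memo

      A real engine would also need to understand synonyms, context, and nuance – which is exactly the part that has not been programmed yet.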

      Comment by theoldscholar — September 14, 2009 @ 8:23 pm

