Archive for March, 2006

A Model for Semantically Encoded Scholarship?

Three stages:

  1. Source
  2. Interpretation
  3. Statement

For the source, we have TEI. This lets us mark up different parts of the source document according to what they are. What is also needed is some way of encoding the inferences you draw from the source document about the world – a way of saying “I think this part of this document means that…”. In other words, a schema for encoding statements of knowledge. In other words, RDF.
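For example, a single statement of knowledge might look like this in RDF/XML (a minimal, hypothetical sketch; the ex: vocabulary and the URIs are invented for illustration):

<!-- One hypothetical statement: the book identified by this URI was sold on a given date -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/books/1">
    <ex:soldOn>1687-05-12</ex:soldOn>
  </rdf:Description>
</rdf:RDF>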

Interpretation

Then, what is needed is a way of mapping between Source and Statement, a way of saying:

This part (or ‘these parts’) of this document (or ‘these documents’) leads me to make this (or these) Statement(s) of Knowledge.

As a language designed for transforming one XML document into another, XSL seems pretty ideal for this. Of course, you could use any scripting language, but the interpretation stage is both data and a set of instructions, so the fact that XSL is itself XML lends it to the task. It means you can use the same set of tools for creating, storing and querying Source, Interpretation, and Statement.

An example of how this might work in practice:

You have a TEI-encoded version of a Book Auction Catalogue. It consists of lists (listBibl) of books (bibl).
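A fragment of the source might look something like this (a hypothetical, much-simplified structure; a real catalogue would be richer):

<!-- Hypothetical, much-simplified source fragment (TEI P5 namespace assumed) -->
<listBibl xmlns="http://www.tei-c.org/ns/1.0">
  <bibl xml:id="lot-1">First book in the catalogue</bibl>
  <bibl xml:id="lot-2">Second book in the catalogue</bibl>
</listBibl>

You think this list means something, so you write a template: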

<!-- For each book in the catalogue's list, state when it was sold -->
<xsl:template match="tei:listBibl/tei:bibl">
  <xsl:text>This Book, </xsl:text>
  <xsl:apply-templates/>
  <xsl:text>, was sold on </xsl:text>
  <!-- the auction date sits elsewhere in the document, hence the // -->
  <xsl:value-of select="//tei:date[@xml:id='this-is-the-date-of-the-auction']"/>
</xsl:template>

Note: This is a simplified example. The idea is to output RDF rather than plain text (which I have used here because I am not yet sure how an equivalent machine-readable statement would best be expressed). You would also want to include many more details in your statement, pulled in from other parts of the document, and perhaps from other documents too: details identifying you, the source(s), place, probability. The point here is that, using XSL, you can define your interpretation of the document(s).
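To give a flavour of what the RDF-emitting version might look like (a hypothetical sketch only; the ex: vocabulary is invented, and exactly how such statements should be modelled is the open question of the next section):

<!-- Hypothetical sketch: emit an RDF description per book instead of prose.
     The ex: namespace is invented for illustration. -->
<xsl:template match="tei:listBibl/tei:bibl">
  <rdf:Description rdf:about="#{@xml:id}"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:ex="http://example.org/terms#">
    <ex:soldOn>
      <xsl:value-of select="//tei:date[@xml:id='this-is-the-date-of-the-auction']"/>
    </ex:soldOn>
  </rdf:Description>
</xsl:template>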

XSLT simultaneously produces and documents your interpretation. If you then discover that all the books in the catalogue were previously owned by a Rev. John Smith, you can edit the interpretative XSLT file to represent that too (dutifully documenting your change of interpretation in some way). And because you are (hopefully) generating your Statements of Knowledge dynamically from your interpretative XSLT file, when your interpretation changes, your statements change accordingly.

Statements of Knowledge

OK, we use RDF for representing statements of knowledge – but how? What might these statements look like?

Well, one starting point (at least) could be to look at the W6 vocabulary invented by Danny Ayers in 2004.
The W6 is

an ontology to allow resource descriptions and reasoning based on the 6 questions: who, why, what, when, where, how

Does that cover every statement you might want to make about something?

As I see it, by nesting statements, you have quite a powerful syntax of expression. The top-level statement could encompass the act of the interpreter interpreting the source (referenced within the What element) into the child statement in the How element.
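Roughly like this, perhaps (a loose sketch: the element names just echo the six questions, the identifiers are invented, namespace declarations are omitted, and the actual W6 syntax may well spell things differently):

<!-- Outer statement: the act of interpretation itself -->
<Statement>
  <who rdf:resource="#the-interpreter"/>
  <what rdf:resource="#the-source-document"/>
  <how>
    <!-- Inner statement: what the source is taken to mean -->
    <Statement>
      <what rdf:resource="#this-book"/>
      <when>the date of the auction</when>
    </Statement>
  </how>
</Statement>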

What might need to be added is a way to identify the source and the interpreter for the statements – although if there’s a neat way I’m not seeing to do that within the current W6 syntax, then all the better. The syntax needs to let us specify, for example, one source for What happened and another for When it happened.

… But I don’t want to get into too much detail here.
So to recap:

  1. encode source in TEI
  2. write XSL to transform (read: ‘interpret’) source into:
  3. RDF statements of knowledge

Tools for semantically encoding scholarship?

It’s probably too much to expect every practising scholar to be thoroughly conversant with XML and TEI, happily hacking away at XSL stylesheets in vim.

What would an application that lets you make machine-readable statements of knowledge, without hurting your eyes with angle brackets, look like?

Well, the Mangrove project at the University of Washington has an interesting graphical tagger application that lets you semantically tag existing documents.

I imagine something like that. You load in your TEI, and you see it in the window, probably with some CSS applied so that you don’t see the tags, but the elements are styled according to whether they refer to a person, thing, place, or date – each in a different colour, and in bold (for example). Just some easy visual way for you to pick out these elements from the rest of the text. Then when you select these elements, you get a dialogue box for building the statement. The statement might already be partly filled out, because you set up some template statements for the document(s) you’re working with. And one of the W6 questions is answered depending on which type of element you selected.

You also need to be able to set up macros, for repetitive interpretations of consistently structured documents. For instance, you might have a diary, and want to set up a tentative rule that says the writer of the diary knew every person mentioned within. You could then go through each instance (in a spell-checker-esque interface), adding further information on each relationship, and telling it to ignore mentions of people whom the writer knew only by repute.
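Under the hood, such a rule might boil down to something like this (a hypothetical sketch; the ex:knew property and the #diarist identifier are invented for illustration):

<!-- Tentative rule: for every person mentioned in the diary, state
     that the diarist knew them. Exceptions get weeded out interactively. -->
<xsl:template match="tei:persName">
  <rdf:Description rdf:about="#diarist"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:ex="http://example.org/terms#">
    <ex:knew rdf:resource="{@ref}"/>
  </rdf:Description>
</xsl:template>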


Well, that’s it: one broad, vague idea for a three-step model for digital scholarship, with the odd over-specific digression thrown in for good measure.


Native to a Web of Data

I’ve been meaning to post this for a while:
MP3 of Tom Coates’ presentation

An excellent presentation from Tom Coates, who used to work for the BBC but has now moved on to Yahoo. His talk is aimed at web developers about to develop Web 2.0 apps.
Those building or contemplating building web apps for humanities (or any other) purposes should bear his excellent advice in mind:

  1. Look to add value to the aggregate web of data
  2. Build for normal users, developers and machines
  3. Start designing with data (objects), not with pages
  4. Identify your first order objects and make them addressable
  5. Use readable, reliable, and hackable URLs
  6. Correlate with external identifier schemes (or coin a new standard)
  7. Build list views and batch manipulation interfaces
  8. Create parallel data representations
  9. Make your data as discoverable as possible

The future looks good if everyone follows his advice.


TitleZ: a tool for studying the modern book trade?

Having done my MA in the Book Studies department at Leiden, I can’t help but mention this new web service:

TitleZ

According to their website:

TitleZ makes it easy to see how a book or group of books has performed over time, relative to other books on the market. Simply enter a search phrase, book title, or author, and TitleZ returns a comprehensive listing of books from Amazon along with our historical sales rank data.

So, although intended for publishing professionals, this could be highly interesting for scholars of the (online) book trade. And it’s currently free (whilst in beta). Perhaps they could be persuaded to offer academic licences?
