(Seen on Digital Medievalist)
What Digital Humanities tools could take from Web 2.0:
Give users tools to visualise and network their own data. And make it easy.
A good example is Last.FM. You run a program they give you that uploads the data about the songs you listen to, as you are listening to them. You can then see stats about your listening habits, and are linked with people with similar listening habits. The key thing is that you don’t have to do extra work.
Another example is LibraryThing, which makes it easy to visualise and network data about your book collection. It can’t be as automatic as last.fm, but it does let you import any file you might happen to have with ISBNs in it.
Compare this to a Digital Humanities project: The Reading Experience Database, which aims to accumulate records of
reading experiences. They ask that if you come across any reading experiences in your research, you note them down and submit them to the database with their online form (there are two – a four-page form and a shorter one-page form if you can’t be bothered with four pages of forms).
I’m not out to disparage the RED here – in many ways it is a fine endeavour. But I do want to criticise the conceptual model of how it accumulates data:
It requires that you, as a researcher, do your normal work, and then go and fill in (ideally) four pages of web forms for every reading experience that you have found (and possibly already documented elsewhere). Do you like filling out forms? I don’t. Worst of all, you don’t get any kind of access to the data – yours, or anyone else’s (you just have to trust that they will eventually get around to coding a search page).
This doesn’t help you to do your work now.
Which brings me to my next point…
Harness the self-interest of your users
You need them to use you, so make it worth their while. Don’t ask for their help, help them!
One problem, I think, is when projects start from a research interest. They want to gather data on that topic, so they ask other researchers to help them by filling out web forms.
A better approach to gathering data, I suggest, is to help users with their own research interests as a first priority. Interestingly, the guy who built del.icio.us said that he primarily wanted his users to tag bookmarks with the keywords that suited them best personally – to tag out of pure self-interest. The network effect of their tagging is a huge side benefit, but it doesn’t need to be the reason that people use del.icio.us. The end result is something more anarchic, more used, and more useful than something like dmoz.org.
del.icio.us doesn’t say “I’m interested in French Renaissance Poetry, please fill in these forms.” It gives you a tool to keep track of your bookmarks. It lets you import bookmarks you already have, and it lets you export your data too.
Have an API
You don’t know what you’ve got until you give it away.
SOAP is good, but it doesn’t even need to be that complicated. Make sure that search results are retrievable through a URL, and presented as semantic XHTML, and your data is already much more sharable (listen to Tom Coates’ presentation on the Web of Data).
Sharing data in a machine-readable and retrievable format is the most important feature. It lets other people build features for you.
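To make the point concrete, here is a minimal sketch (in Python, with an invented page structure and made-up class names) of how little code a third party needs to reuse search results published as plain semantic XHTML at a stable URL:

```python
# Hypothetical sketch: the page markup and the "record"/"title" class
# names are invented for illustration, not taken from any real project.
from html.parser import HTMLParser

class RecordParser(HTMLParser):
    """Collects the text of every element whose class is 'title'."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

# In practice this string would be fetched from the project's search URL.
page = '<ul><li class="record"><span class="title">Paradise Lost</span></li></ul>'
p = RecordParser()
p.feed(page)
print(p.titles)  # the extracted titles
```

Swap the hard-coded string for a urllib fetch of the project’s search URL and you have the beginnings of a mashup – no SOAP toolkit required.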
Back in March, Dan Cohen lamented the lack of non-commercial APIs suitable for the humanities hacker. And it’s odd – humanities scholarship is a community that you would think would want to facilitate access to and reuse of its data – yet the only useful APIs Dan Cohen could find (via programmableweb.org) were from the Library of Congress and the BBC. (It’s not quite as bad as that: commercial APIs are potentially useful too, and there’s also COPAC for querying UK research libraries, and of course Wikipedia.)
There are a ton of digital projects stored away in repositories, such as those provided by the AHDS, but few are much more accessible or usable in their digital form than in print.
I read that the ESTC is going to be made freely available through the British Library’s website later this year – imagine the historical mashups that could be done – the information that could be mined and visualised – if they would provide a developers’ API.
Embrace the chaos of knowledge
The exciting thing about the folksonomy approach of tagging, and the user creation and maintenance of knowledge of Wikipedia, is that they have shown that a bottom-up method of knowledge representation can be more powerful and more accurate than traditional top-down methods.
It’s a messy, flawed, pragmatic, flexible, useful, and realistic system for representing knowledge.
What do you think?
Some projects already do, and have done, some of these things for quite some time (please comment with examples!).
Perhaps it is wrong to try to apply lessons from commercial/mainstream web apps too closely to digital humanities projects, which after all, have different aims and priorities?
There are also different types of projects (some more like resources, others more like tools?), some of which might find these points inappropriate.
What other principles (and web trends) do you think digital humanities projects should be thinking about?
For the source, we have TEI. This lets us mark-up different parts of the source document according to what they are. What is also needed is some way of encoding what inferences you draw from the source document about the world – a way of saying “I think this part of this document means that…”. In other words, a schema for encoding statements of knowledge. In other words, RDF.
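To give a flavour of what such a statement might look like, here is a hypothetical RDF/XML fragment – every URI and the little “booktrade” vocabulary are invented for illustration, not taken from any existing schema:

```xml
<!-- Hypothetical sketch: "this part of this catalogue means that this
     book was sold on this date". All URIs and property names invented. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/booktrade#">
  <ex:Sale rdf:about="http://example.org/sales/1">
    <ex:item rdf:resource="http://example.org/books/1"/>
    <ex:date>1687-05-02</ex:date>
    <ex:inferredFrom rdf:resource="http://example.org/sources/catalogue#p12"/>
  </ex:Sale>
</rdf:RDF>
```

The ex:inferredFrom property is the crucial (invented) part: it points back at the bit of the source document that licensed the inference.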
Then, what is needed is a way of mapping between Source and Statement, a way of saying:
This part (or ‘these parts’) of this document (or ‘these documents’) lead me to make this (or these) Statement(s) of Knowledge.
As a language designed for transforming one XML document into another, XSL seems pretty ideal for this. Of course, you might use any scripting language to do this, but the interpretation stage is both data, and a set of instructions, so the fact that XSL is XML too, lends it to this task. This means you can use the same set of tools for creating, storing and querying Source, Interpretation, and Statement.
An example of how this might work in practice:
You have a TEI encoded version of a Book Auction Catalogue. It consists of lists (listBibl) of books (bibl). You think this list means something, so you write a template:
<xsl:template match="tei:bibl">
  <xsl:text>This book, </xsl:text>
  <xsl:value-of select="tei:title"/>
  <xsl:text>, was sold on </xsl:text>
  <xsl:value-of select="//tei:date[@xml:id='this-is-the-date-of-the-auction']"/>
</xsl:template>
Note: This is a simplified example. The idea is to output RDF rather than plain text (which I have used here because I am not exactly sure yet how an equivalent machine readable statement would best be expressed). You would also want to include many more details in your statement, pulled in from other parts of the document, as well as other documents perhaps – details identifying you, the source(s), place, probability. The point here, is that using XSL, you can define your interpretation of the document(s).
XSLT simultaneously produces and documents your interpretation. If you then discover that all the books in the catalogue were previously owned by a Rev. John Smith, you can edit the interpretive XSLT file to represent that too (dutifully documenting your change in interpretation in some way). And because you are (hopefully) generating your Statements of Knowledge dynamically from your interpretative XSLT file, when your interpretation changes, your statements change accordingly.
Statements of Knowledge
OK, we use RDF for representing statements of knowledge – but how? What might these statements look like?
Well, one starting point (at least) could be to look at the W6 vocabulary invented by Danny Ayers in 2004.
The W6 is
an ontology to allow resource descriptions and reasoning based on the 6 questions : who, why, what, when, where, how
Does that cover any statement you would want to make about something?
As I see it, by nesting statements, you have quite a powerful syntax of expression. The top-level statement could encompass the act of the interpreter interpreting the source (referenced within the What element) into the child statement in the How element.
What might need to be added, is a way to identify the source and the interpreter for the statements – although if there’s a neat way I’m not seeing to do that within the current W6 syntax, then all the better. The syntax needs to let us specify, for example, one source for What happened, and another for When it happened.
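By way of illustration only – I am guessing at the element names from the six questions, so the real W6 syntax may well differ – a nested statement might look something like this:

```xml
<!-- Guessed syntax: element names follow the six questions; the
     namespace URI and all values are invented for illustration. -->
<w6:statement xmlns:w6="http://example.org/w6#">
  <w6:who>http://example.org/people/me</w6:who>            <!-- the interpreter -->
  <w6:what>http://example.org/sources/catalogue</w6:what>  <!-- the source -->
  <w6:how>
    <!-- the interpretation itself, as a nested statement -->
    <w6:statement>
      <w6:what>the sale of a book</w6:what>
      <w6:when>1687-05-02</w6:when>
    </w6:statement>
  </w6:how>
</w6:statement>
```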
… But I don’t want to get into too much detail here.
So to recap:
- encode source in TEI
- write XSL to transform (read: ‘interpret’) source into:
- RDF statements of knowledge
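As a toy illustration of the whole pipeline – with Python’s standard library standing in for a real XSLT processor, and with an invented TEI snippet and invented URIs – the three steps might be wired together like this:

```python
# Toy version of the three-step model. The TEI fragment and every URI
# below are invented for illustration.
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Step 1: the source, encoded in TEI.
source = ET.fromstring(
    '<listBibl xmlns="http://www.tei-c.org/ns/1.0">'
    '  <bibl><title>Paradise Lost</title></bibl>'
    '  <bibl><title>Leviathan</title></bibl>'
    '</listBibl>'
)

# Step 2: the interpretation - each bibl is read as evidence of a sale.
def interpret(doc):
    for bibl in doc.iter(TEI + "bibl"):
        title = bibl.findtext(TEI + "title")
        # Step 3: emit a statement of knowledge (N-Triples-style line).
        yield ('<http://example.org/books/%s> '
               '<http://example.org/vocab#soldAt> '
               '<http://example.org/auctions/1> .'
               % title.replace(" ", "-"))

statements = list(interpret(source))
print(statements[0])
```

In a real pipeline the interpret step would be an XSLT stylesheet producing RDF; the point is only that source, interpretation, and statements stay cleanly separated, so changing the interpretation regenerates the statements.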
Tools for semantically encoding scholarship?
It’s probably too much to expect every practising scholar to be thoroughly conversant with XML and TEI, happily hacking away at XSL stylesheets in vim.
What would an application that lets you make machine-readable statements of knowledge, without hurting your eyes with angle brackets, look like?
Well, the Mangrove project at the University of Washington has an interesting graphical tagger application that lets you semantically tag up existing documents.
I imagine something like that. You load in your TEI, and you see it in the window, probably with some CSS applied so that you don’t see the tags, but the elements are styled according to whether they refer to a person, thing, place, or date – each is a different colour, and in bold (for example). Just some easy visual way for you to pick out these elements from the rest of the text. Then when you select these elements, you get a dialogue box for building the statement. The statement is maybe already partly filled out because you set up some template statements for the document(s) you’re working with. And one of the W6 questions is answered depending on which type of element you selected.
You also need to be able to set up macros, for repetitive interpretations of consistently structured documents. For instance, you might have a diary, and want to set up a tentative rule that says the writer of the diary knew every person mentioned within. You could then go through each instance (in a spell checker-esque interface) adding further information on each relationship, and telling it to ignore mentions of people that the writer knew only by repute.
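A minimal sketch of how such a rule might work under the hood – the diarist, the names, and the “knew” predicate are all invented for illustration:

```python
# Tentative rule: the diarist knew every person mentioned in the diary,
# minus an ignore list built up interactively (spell-checker style).
diarist = "Rev. John Smith"
mentioned = ["Mary Smith", "the King", "William Jones"]
known_only_by_repute = {"the King"}  # excluded by the user on review

# Apply the rule, producing one (subject, predicate, object) triple per
# person who survives the ignore list.
knew = [(diarist, "knew", person)
        for person in mentioned
        if person not in known_only_by_repute]
print(knew)
```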
Well, that’s it: one broad, vague idea for a three-step model for digital scholarship, with the odd over-specific digression thrown in for good measure.
I’ve been meaning to post this for a while.
MP3 of Tom Coates’ presentation
An excellent presentation from Tom Coates, who used to work for the BBC but has now moved on to Yahoo. His talk is aimed at web developers about to build Web 2.0 apps.
Those building or contemplating building web apps for humanities (or any other) purposes should bear his excellent advice in mind:
- Look to add value to the Aggregate web of data
- Build for normal users, developers and machines
- Start designing with data (objects), not with pages
- Identify your first order objects and make them addressable
- Use readable, reliable, and hackable URLs
- Correlate with external identifier schemes (or coin a new standard)
- Build list views and batch manipulation interfaces
- Parallel data representations
- Make your data as discoverable as possible
The future looks good if everyone follows his advice.
Having done my MA in the Book Studies department at Leiden, I can’t help but mention this new web service:
According to their web-site:
TitleZ makes it easy to see how a book or group of books has performed over time, relative to other books on the market. Simply enter a search phrase, book title, or author, and TitleZ returns a comprehensive listing of books from Amazon along with our historical sales rank data.
So, although intended for publishing professionals, this could be highly interesting for scholars of the (online) book trade. And currently, it’s free (whilst in beta). Perhaps they could be persuaded to offer academic licenses?
Talking with Talis is a podcast you might be interested in listening to. Talis is a service provider to libraries in the UK, and they’ve set up a podcast interviewing various luminaries worldwide in the field of information management. The latest one has a lively discussion of ‘Library 2.0’.