Authoring Born TEI

I wrote my thesis in TEI. Before I began, I searched Google (mainly in vain) for advice and examples of writing born-TEI – that is, documents originally written in TEI, not encoded in it afterwards. So, for the benefit of others who are also thinking of authoring in TEI, here is some of what I took away from the experience.

  • Writing (and thinking) Digitally and Semantically Can Be Quite Different from Writing (and thinking) in the conventions of the Printed Page

    For one thing, I was tempted down the path of DRY (Don’t Repeat Yourself). So, for example, when citing a book, instead of writing a footnote with the bibliographic details, I used a <ptr/> element whose target attribute points to the id of the corresponding <bibl/> in my bibliography. Not having to repeat yourself is nice.

    Another thing you can do is write your notes inline with the text, and then transform them when it comes to presentation, replacing the text with a reference number, and moving the text to the foot of the page or the end of the document. Not only is this breaking out of the mindset of print, it is easier than having to shoot down to a notes section in the document every time you want to write one.

    You could also do the same thing with citations, I suppose. Instead of targeting items in a bibliography section, you might simply write the bibliographic details out inline with your main text, point back to them in subsequent citations, and move everything down into a bibliography (and notes) at the presentational stage. However, you may want to have a bibliography in the TEI as well, for those books and articles that you don’t refer to directly but nonetheless want to acknowledge. You may also prefer to write your bibliography before you start writing the main text. It depends how you work.

  • You (probably) Still Have to Present It in the Conventions of the Printed Page

    It can be pretty annoying having to batter your born-digital document back into the typographical conventions you tried so hard to think and write outside of. The great advantage, of course, is that you can present your text in many different forms without touching the original document. Unfortunately, most new documents, such as university dissertations, only really have to be presented in one form, so this advantage didn’t console me much.

  • TEI offers too many different ways to fulfil common tasks

    Not that we need less choice, but it would be good if there were ‘microformats’ for authoring in TEI, so that you didn’t have to develop so many mini principles of best practice as you wrote.

    An example: in your bibliography, you have some URLs. Scholarly practice dictates that you include a ‘last accessed on’ date, but how do you mark it up? This is a situation where you have to follow a convention anyway, so it would be really useful if there were a conventional way to mark it up. If we all write <date type="lastAccessed">2004-10-16</date> then we can share stylesheets and other tools. And that would be nice.

  • HTML is pretty un-semantic

    There is a lot of talk in the web-dev community about the importance of semantic (X)HTML. And it’s true, HTML authors should mark up as semantically as they can. Transforming from a really semantic markup language like TEI, though, you realise how little meaning you can actually give text with HTML. Of course, it is a good thing that HTML has a far smaller tag set – imagine the success of the web if every homepage-jockey had to wade through the TEI Guidelines to publish their poetry and pet photography. But it really puzzles me why HTML has so many tags for presenting programming stuff – kbd, samp, var, code – but no tags for marking up the stuff that programmers really care about, like dates and names.

    So, if you are going to transform your TEI into HTML, you also have to decide how semantic your HTML is going to be, how much presentation you are going to do with XSLT (or the scripting language of your choice), and how much with CSS. This probably depends heavily on the browsers you need to support. CSS3 is quite powerful, but it ain’t gonna work in Internet Explorer. CSS is also, I find, a bit easier to read and work with than XSLT, but you will need to make up for HTML’s small tag set with plenty of classed spans and divs, and it can get quite time-consuming switching between XSL and CSS files trying to locate and solve various presentational glitches (did I do this in CSS, or XSL?).

    One answer is to skip the HTML stage. Style your TEI with CSS, and use only a sprinkling of scripting/XSLT to re-order and copy chunks of content. This has the advantage that your document will retain its semantics right up till it hits the printer ribbon. The disadvantage is that it loses the functionality of HTML: you won’t have hyperlinks, and it will only really work in the newest, most standards-compliant browsers, so it won’t be terribly accessible.
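
To make the authoring conventions from the first and third points concrete, here is a minimal sketch of a TEI fragment that uses a pointer citation, an inline note, and a typed ‘last accessed’ date. The xml:id values, the author name, the titles, and the example.org address are all invented for illustration, and the TEI P5 namespace is an assumption. Treat it as one possible convention, not a prescribed form.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>Semantic markup changes how you draft
      <!-- Inline note: written where it applies, moved to the foot of the
           page (or the end of the document) at presentation time -->
      <note>A hypothetical note, kept next to the sentence it comments on.</note>
      and how you cite
      <!-- Citation: a pointer to the bibliography entry, so the
           bibliographic details are written only once -->
      <ptr target="#example-book"/>.
    </p>
  </body>
  <back>
    <listBibl>
      <!-- Hypothetical entries; only the xml:id matters to the pointer above -->
      <bibl xml:id="example-book">
        <author>A. N. Author</author>,
        <title>An Example Book</title>,
        <date>2001</date>.
      </bibl>
      <bibl xml:id="example-site">
        <title>An Example Web Page</title>,
        <ref target="http://example.org/">http://example.org/</ref>
        (last accessed <date type="lastAccessed">2004-10-16</date>).
      </bibl>
    </listBibl>
  </back>
</text>
```

A stylesheet could then resolve each <ptr/> against the <bibl/> with the matching xml:id, and pull each <note> out of the running text when it builds the presentational version.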

Mapping TEI to HTML

One of the annoying differences between TEI and valid (X)HTML is that in TEI, lists and quotes can occur inside paragraphs, but in HTML they can’t. So I thought it might be helpful to put my solution to this here as well. The following template assumes that quotes longer than 130 characters are blockquotes, whilst shorter quotes are inline. Lists that are part of a paragraph’s text (i.e. a comma-separated list) cannot be transformed into a <ul> or an <ol> (well, maybe they can if you split the paragraph in two and fiddle enough with the CSS, but that is probably less semantic than transforming the list into plain text). I have marked up these lists in the TEI with @type='inline'.

<xsl:template match="tei:p[child::tei:list[not(@type='inline')]|child::tei:cit[string-length(tei:quote) > 130]|child::tei:listBibl]">
  <!-- First paragraph: everything that comes before the block-level element -->
  <p>
    <xsl:if test="@xml:id">
      <xsl:attribute name="id">
        <xsl:value-of select="@xml:id"/>
      </xsl:attribute>
    </xsl:if>
    <xsl:attribute name="class">
      <xsl:value-of select="string('preblock')"/>
    </xsl:attribute>
    <xsl:for-each select="node()[following-sibling::tei:cit[string-length(tei:quote) > 130]|following-sibling::tei:list[not(@type='inline')]|following-sibling::tei:listBibl]">
      <xsl:apply-templates select="current()"/>
    </xsl:for-each>
  </p>
  <!-- The block-level element itself (inline lists stay in the surrounding text) -->
  <xsl:apply-templates select="tei:list[not(@type='inline')]|tei:cit[string-length(tei:quote) > 130]|tei:listBibl"/>
  <!-- Second paragraph: everything that comes after the block-level element -->
  <p class="postblock">
    <xsl:for-each select="node()[preceding-sibling::tei:list[not(@type='inline')]|preceding-sibling::tei:listBibl|preceding-sibling::tei:cit[string-length(tei:quote) > 130]]">
      <xsl:apply-templates select="current()"/>
    </xsl:for-each>
  </p>
</xsl:template>
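
As an illustration of the kind of input the template above is aimed at, here is a hypothetical TEI paragraph containing both an inline list and a block-level list. The xml:id and the item text are made up for the example, and the inline items carry no separators on the assumption that the stylesheet supplies them.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p-example">
  Text before the block, including an inline list of
  <!-- Marked type="inline", so it is meant to stay in the running text -->
  <list type="inline">
    <item>apples</item>
    <item>pears</item>
    <item>plums</item>
  </list>
  which belongs to the sentence itself:
  <!-- No type="inline": this is the block-level list that splits the paragraph -->
  <list>
    <item>a block-level item</item>
    <item>another block-level item</item>
  </list>
  Text after the block.
</p>
```

With this input, the text and inline list before the block-level list are intended to land in the p with class preblock, the block-level list is handed to its own template, and the trailing text lands in the p with class postblock.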

NB: XHTML 2.0, when it comes, will allow lists within paragraphs.
Also, if you don't already know, #tei-c at is a good place to ask, argue and discuss TEI.

Comments are (as always) most welcome.



  1. […] I came across a post about semantic markup and accessibility citing a remark I had made on this blog about how, for all the talk about semantic markup in the web-dev community, HTML isn’t a very semantic markup language. […]

  2. […] Tags: Mills Kelly thought they could be used to subvert the archive. TEI: Keith Alexander talked about authoring directly in it. Timeline: the SIMILE group released a “DHTML-based AJAXy widget.” […]

