I wrote my thesis in TEI. Before I began, I searched Google (mostly in vain) for advice on, and examples of, born-TEI writing – that is, documents originally written in TEI, not encoded in it afterwards. So, for the benefit of others who are also thinking of authoring in TEI, here is some of what I took away from the experience.
Writing (and thinking) Digitally and Semantically Can Be Quite Different from Writing (and thinking) in the conventions of the Printed Page
For one thing, I was tempted down the path of DRY (don’t repeat yourself). So, for example, when citing a book, instead of writing a footnote with the bibliographic details, I used a <ptr/> element whose target attribute points to the xml:id of the <bibl/> in my bibliography. Not having to repeat yourself is nice.
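As a rough sketch of what that looks like in the markup (the id and the bibliographic details are placeholders, not taken from my thesis):

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <!-- The citation is just a pointer; the details live in one place only -->
    <p>The argument is made at length elsewhere <ptr target="#author-example"/>.</p>
  </body>
  <back>
    <div type="bibliography">
      <listBibl>
        <!-- Placeholder entry: every name and date here is illustrative -->
        <bibl xml:id="author-example">
          <author>A. N. Author</author>,
          <title level="m">An Example Monograph</title>
          (<pubPlace>Place</pubPlace>: <publisher>Publisher</publisher>, <date>2000</date>).
        </bibl>
      </listBibl>
    </div>
  </back>
</text>
```

The stylesheet can then follow the pointer and print a short or full citation, whichever the house style wants.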
Another thing you can do is write your notes inline with the text, and then transform them when it comes to presentation, replacing the text with a reference number, and moving the text to the foot of the page or the end of the document. Not only is this breaking out of the mindset of print, it is easier than having to shoot down to a notes section in the document every time you want to write one.
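Here is a minimal sketch of that transformation in XSLT 1.0. It assumes the notes are encoded as <note> elements inside the running text and that the output is html; the class and id names are made up for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- In the running text: replace each inline note with a numbered link -->
  <xsl:template match="tei:note">
    <xsl:variable name="n">
      <xsl:number level="any" count="tei:note"/>
    </xsl:variable>
    <a class="noteref" href="#note{$n}">
      <xsl:value-of select="$n"/>
    </a>
  </xsl:template>

  <!-- At the end of the document: output the text of each note -->
  <xsl:template match="tei:note" mode="endnotes">
    <xsl:variable name="n">
      <xsl:number level="any" count="tei:note"/>
    </xsl:variable>
    <li id="note{$n}">
      <xsl:apply-templates/>
    </li>
  </xsl:template>

  <!-- Body first, then the collected notes (only if there are any) -->
  <xsl:template match="tei:body">
    <div class="body">
      <xsl:apply-templates/>
    </div>
    <xsl:if test=".//tei:note">
      <ol class="notes">
        <xsl:apply-templates select=".//tei:note" mode="endnotes"/>
      </ol>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because both templates number the notes the same way, the reference in the text and the entry in the list stay in step without any numbers being stored in the TEI itself.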
You could also do the same thing with citations I suppose. Instead of targeting items in a bibliography section, you might simply write the bibliographic details out inline with your main text, targeting back in subsequent citations, and moving everything down into a bibliography (and notes) in the presentational stage. However, you may want to have a bibliography in the TEI as well, for those books and articles that you don’t refer to directly, but nonetheless want to acknowledge. You may also prefer to write your bibliography before you start writing the main text. It depends how you work.
You (probably) Still Have to Present It in the Conventions of the Printed Page
It can be pretty annoying having to batter your born-digital document back into the typographical conventions you tried so hard to think and write outside of. The great advantage, of course, is that you can present your text in many different forms without touching the original document. Unfortunately, most new documents, such as university dissertations, only really have to be presented in one form, so this advantage didn’t console me much.
TEI offers too many different ways to fulfil common tasks
Not that we need less choice, but it would be good if there were ‘microformats’ for authoring in TEI, so that you didn’t have to develop so many mini principles of best practice as you wrote.
An example: in your bibliography, you have some URLs. Scholarly practice dictates that you include a ‘last accessed on’ date, but how do you mark it up? This is a situation where you have to follow a convention anyway, so it would be really useful if you could follow a conventional way to mark it up. If we all do <date type="lastAccessed">2004-10-16</date> then we can share stylesheets and other tools. And that would be nice.
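In context, such an entry might look something like this. The title and URL are placeholders, and the type value lastAccessed is only the convention proposed here, not something the TEI Guidelines define; the when attribute is an optional extra that gives stylesheets a machine-readable date to work with.

```xml
<bibl xmlns="http://www.tei-c.org/ns/1.0" xml:id="example-web-page">
  <!-- Placeholder title and URL, for illustration only -->
  <title level="a">An Example Web Page</title>,
  <ref target="http://www.example.org/">http://www.example.org/</ref>
  (last accessed <date type="lastAccessed" when="2004-10-16">16 October 2004</date>).
</bibl>
```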
HTML is pretty un-semantic
There is a lot of talk in the web-dev community about the importance of semantic (x)html. And it’s true, html authors should try to mark up their pages as semantically as possible. Transforming from a really semantic mark-up language like TEI, though, you realise how little meaning html actually lets you give to text. Of course, it is a good thing that html has a far smaller tag set – imagine the success of the web if every homepage-jockey had to wade through the TEI guidelines to publish their poetry and pet photography. But it really puzzles me why in html we have so many tags for presenting programming stuff – kbd, samp, var, code – but not tags for marking-up the stuff that programmers really care about, like dates and names.
So, if you are going to transform your TEI into html, you also have to decide how semantic your html is going to be, and how much presentation you are going to do with XSLT (or scripting language of your choice), and how much you are going to do with CSS. This probably depends heavily on the browsers you need to support. CSS3 is quite powerful, but it ain’t gonna work in Internet Explorer. CSS is also, I find, a bit easier to read and work with than XSLT, but you will need to stop-gap html’s small tag set with plenty of classed spans and divs, and it can get quite time consuming switching between xsl and css files trying to locate and solve various presentational glitches (did I do this in css, or xsl?).
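To illustrate the classed-span stop-gap, here is a small sketch in XSLT 1.0. The choice of elements (date, persName, placeName) is just an example, and using the TEI element name as the class name is one possible convention, not the only one.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- html has no element for these, so keep the TEI name as a class
       and leave all the presentation to the css -->
  <xsl:template match="tei:date | tei:persName | tei:placeName">
    <span class="{local-name()}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

With this approach the xsl only carries the semantics across, and a rule such as one for span.persName in the css file decides what it looks like, which at least answers the "did I do this in css, or xsl?" question.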
One answer is to skip the html stage. Style your TEI with css, and use only a mere sprinkling of scripting/xslt to re-order and copy chunks of content. This has the advantage that your document will retain its semantics right up till it hits the printer ribbon. The disadvantage is that it loses the functionality of html – you won’t have hyperlinks, and it will only really work in the newest, most standards-compliant browsers, so won’t be terribly accessible.
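If you want to try this, the stylesheet is attached with an xml-stylesheet processing instruction at the top of the TEI file. A minimal sketch, where thesis.css is a hypothetical file name:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="thesis.css"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted to keep the sketch short -->
  <text>
    <body>
      <!-- The browser knows nothing about TEI, so thesis.css has to say
           how every element is displayed, starting with making p a block -->
      <p>A paragraph styled directly from the TEI source.</p>
    </body>
  </text>
</TEI>
```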
Mapping TEI to HTML
One of the annoying differences between TEI and valid (x)html is that in TEI, lists and quotes can occur in paragraphs, but in html they can’t. So I thought it might be helpful to put my solution to this here as well. The following template assumes that quotes longer than 130 characters are blockquotes, whilst shorter quotes will be inline. Lists that are part of a paragraph’s text (i.e. a comma-separated list) cannot be transformed to a <ul> or an <ol> (well, maybe they can if you split the paragraph in two and fiddle enough with the css, but it’s probably less semantic than transforming it into plain text). I have marked up these lists in the TEI with @type=’inline’.
<!-- Split a paragraph around a block-level child: a list that is not inline,
     a quote longer than 130 characters, or a listBibl.
     Assumes at most one such child per paragraph. -->
<xsl:template match="tei:p[child::tei:list[not(@type='inline')]|child::tei:cit[string-length(tei:quote) > 130]|child::tei:listBibl]">
  <!-- Everything before the block goes into a first paragraph -->
  <p>
    <xsl:if test="@xml:id">
      <xsl:attribute name="id">
        <xsl:value-of select="@xml:id"/>
      </xsl:attribute>
    </xsl:if>
    <xsl:attribute name="class">
      <xsl:value-of select="string('preblock')"/>
    </xsl:attribute>
    <xsl:for-each select="node()[following-sibling::tei:cit[string-length(tei:quote) > 130]|following-sibling::tei:list[not(@type='inline')]|following-sibling::tei:listBibl]">
      <xsl:apply-templates select="current()"/>
    </xsl:for-each>
  </p>
  <!-- The block itself is output between the two paragraphs;
       inline lists are excluded here so they are not output twice -->
  <xsl:apply-templates select="tei:list[not(@type='inline')]|tei:cit[string-length(tei:quote) > 130]|tei:listBibl"/>
  <!-- Everything after the block goes into a second paragraph -->
  <p class="postblock">
    <xsl:for-each select="node()[preceding-sibling::tei:list[not(@type='inline')]|preceding-sibling::tei:listBibl|preceding-sibling::tei:cit[string-length(tei:quote) > 130]]">
      <xsl:apply-templates select="current()"/>
    </xsl:for-each>
  </p>
</xsl:template>
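The template above leaves lists marked @type='inline' inside the paragraph, so they need a template of their own to become plain text. A minimal sketch, assuming each entry is an <item> and that a comma and a space is the separator you want:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <!-- Render an inline list as running text: item, item, item -->
  <xsl:template match="tei:list[@type='inline']">
    <xsl:for-each select="tei:item">
      <xsl:apply-templates/>
      <!-- Separator after every item except the last -->
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

If you would rather end the list with "and", the last separator can be special-cased in the same xsl:if.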
NB: XHTML 2.0, when it comes, will allow lists within paragraphs.
Also, if you don't already know, #tei-c at irc.freenode.net is a good place to ask about, argue over and discuss TEI.
Comments are, as always, most welcome.