Saturday, 20 November 2010

HOWTO: Deep linking into the NZETC site

As the heaving mass of activity that is the mixandmash competition heats up, I have come to realise that I should have better documented a feature of the NZETC site: the ability to extract the TEI XML annotated with the IDs needed for deep linking.

Our content's archival form is TEI XML, which we massage for various output formats. There is a link from the top level of every document to the TEI for the document, which people are welcome to use in their mashups and remixes. Unfortunately, between that TEI and our HTML output is a deep magic that involves moving footnotes, moving page breaks, breaking pages into nicely browsable chunks, floating marginal notes, etc., and this makes it hard to deep link back to the website from anything derived from that TEI.

There is another form of the TEI available, annotated with whether or not each structural element maps 1:1 to an HTML page (the nzetc:has-text attribute) and what the ID of that page is (the nzetc:id attribute). This annotated XML is found by replacing 'tei-source' in the URL with 'etexts'.
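For scripted mashups, the URL swap can be done programmatically. The following is a minimal Python sketch, assuming 'tei-source' appears as a whole path segment of the raw TEI URL; the host and file name in the usage example are hypothetical placeholders, not real NZETC addresses.

```python
from urllib.parse import urlsplit, urlunsplit


def annotated_tei_url(raw_tei_url: str) -> str:
    """Turn a raw TEI URL into the annotated ('etexts') TEI URL.

    Assumption: 'tei-source' is a complete path segment in the URL.
    """
    parts = urlsplit(raw_tei_url)
    segments = parts.path.split("/")
    if "tei-source" not in segments:
        raise ValueError(f"no 'tei-source' segment in {raw_tei_url!r}")
    # Replace whole path segments only, so a file name that merely
    # contains the text 'tei-source' is left untouched.
    segments = ["etexts" if s == "tei-source" else s for s in segments]
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(annotated_tei_url("https://nzetc.example/tei-source/GorLaws.xml"))
```

Matching whole segments rather than doing a plain string replace is a deliberate choice: it fails loudly on URLs that do not follow the expected pattern instead of silently producing a wrong link.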

Thus for The Laws of England, Compiled and translated into the Māori language at there is the raw TEI at and the annotated TEI at

Looking in the annotated TEI at we see for example:

<div xml:id="t1-g1-t1-front1-tp1" xml:lang="en" rend="center" type="titlePage" nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1" nzetc:depth="5" nzetc:string-length="200" nzetc:has-text="true">

This means that this div has its own page (because it has nzetc:has-text="true") and that the ID of that page is tei-GorLaws-t1-g1-t1-front1-tp1 (because of nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1"). The ID can be plugged into:<ID>.html to get a URL for the HTML. Thus the URL for this div is This process should work for both text and figures.
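The lookup described above can be automated over a whole document. Below is a minimal Python sketch using only the standard library. Since the nzetc namespace URI is not given here, the code discovers whatever URI the document binds to the nzetc prefix rather than hard-coding one; the base URL and the namespace URI in the sample are hypothetical placeholders, and the second div in the sample is invented purely to show that elements without their own page are skipped.

```python
import io
import xml.etree.ElementTree as ET


def nzetc_namespace(xml_bytes: bytes) -> str:
    """Return the namespace URI bound to the 'nzetc' prefix in the document."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        if prefix == "nzetc":
            return uri
    raise ValueError("document does not declare an 'nzetc' prefix")


def deep_links(xml_bytes: bytes, base_url: str) -> list[tuple[str, str]]:
    """List (page ID, HTML URL) for every element that has its own page.

    base_url is an assumption: the caller supplies whatever prefix
    goes in front of '<ID>.html'.
    """
    ns = nzetc_namespace(xml_bytes)
    has_text_attr = f"{{{ns}}}has-text"
    page_id_attr = f"{{{ns}}}id"
    links = []
    for elem in ET.fromstring(xml_bytes).iter():
        page_id = elem.get(page_id_attr)
        # Only elements marked nzetc:has-text="true" map to their own page.
        if elem.get(has_text_attr) == "true" and page_id:
            links.append((page_id, f"{base_url}{page_id}.html"))
    return links


# Hypothetical sample: the first div copies the example above,
# the second is made up and has no page of its own.
SAMPLE = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:nzetc="urn:example:nzetc-hypothetical">
  <text>
    <div xml:id="t1-g1-t1-front1-tp1" type="titlePage"
         nzetc:id="tei-GorLaws-t1-g1-t1-front1-tp1" nzetc:has-text="true"/>
    <div xml:id="t1-g1-t1-body1-d1"
         nzetc:id="tei-GorLaws-t1-g1-t1-body1-d1" nzetc:has-text="false"/>
  </text>
</TEI>"""

if __name__ == "__main__":
    for page_id, url in deep_links(SAMPLE, "https://nzetc.example/"):
        print(page_id, url)
```

With the sample above, only the title page div is reported, because the second div is marked nzetc:has-text="false".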

Happy remixing everyone!

1 comment:

Rebecca said...

Just thought I would add a note on how to link to item records on Timeframes ( or Find (

URLs for item records copied and pasted from the browser are not stable. I use the following format to get clean & stable links:

Replace the characters at the end (everything after &docId=) with the identifier string for the item you want. To get this:

Copy and paste the URL from the browser into Notepad. This will have a bunch of extra stuff in it, such as the search term. The item identifier is the value for &doc and may be in various formats depending on whether the item is a book, article, image, etc.

To make the link display the page in Timeframes layout, change the value for vid to TF (rather than NLNZ).

Hope this makes sense :-)
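The steps in the comment above can be sketched in Python with the standard library's URL tools. This is a minimal sketch under stated assumptions: the identifier is read from the 'doc' query parameter of the copied browser URL and written into the 'docId' parameter of a clean link template, and 'vid' is switched to TF for the Timeframes layout. Both URLs and the identifier in the example are hypothetical placeholders, since the real link format is not reproduced here.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def item_identifier(browser_url: str) -> str:
    """Pull the item identifier (the 'doc' parameter) out of a copied browser URL."""
    values = parse_qs(urlsplit(browser_url).query).get("doc")
    if not values:
        raise ValueError("no 'doc' parameter in URL")
    return values[0]


def stable_link(template_url: str, doc_id: str, timeframes: bool = False) -> str:
    """Fill docId in a clean link template; optionally set vid=TF for Timeframes."""
    parts = urlsplit(template_url)
    # dict keeps insertion order, so parameters stay in their original positions.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["docId"] = doc_id
    if timeframes:
        query["vid"] = "TF"
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical URLs and identifier, for illustration only.
    doc = item_identifier("https://find.example/display?doc=ABC123&vid=NLNZ&query=laws")
    print(stable_link("https://find.example/link?vid=NLNZ&docId=PLACEHOLDER", doc, timeframes=True))
```

Rebuilding the query string from parsed parameters, rather than cutting the URL after &docId=, keeps the link valid even if docId is not the last parameter.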