Thursday 23 June 2011

Unit testing framework for XSL transformations?

I'm part of the TEI community, which maintains an XML standard that is commonly transformed to HTML for presentation (more rarely to PDF). The TEI standard is relatively large but relatively well documented. The transformation to HTML, however, has so far been largely piecemeal (from a software engineering point of view) and not error-free.

Recently we've come under pressure to introduce significantly more complexity into our transformations, both to produce ePub (which is wrapped HTML bundled with media and metadata files) and HTML5 (which can represent more of the formal semantics in TEI). The software engineer in me sees unit testing as a way to reduce our errors while opening development up to a larger, more diverse group of people with a larger, more diverse set of features they want to see implemented.

The problem is that I can't seem to find a decent unit testing framework for XSLT. Does anyone know of one?

Our requirements are: XSLT 2.0 support; free to use; runnable on our Ubuntu build server; able to test the transformation with multiple arguments; etc.

We already use XSD, RNG, DTD and Schematron schemas, epubcheck, xmllint, standard HTML validators, etc. Having the framework drive these too would be useful.

The kinds of things we want to test include:
  1. Footnotes appear once and only once
  2. Footnotes are referenced in the text, and there's a back link from each footnote to the appropriate point in the text
  3. Internal references (tables of contents, indexes, etc.) point somewhere
  4. Language encoding using xml:lang survives from the TEI to the HTML
  5. All the paragraphs in the TEI appear at least once in the HTML
  6. Local links work
  7. Tables pass a sanity check
  8. Internal links within parallel texts work
  9. ....
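As a rough illustration of how items 1, 2, 3 and 6 might be expressed with a DOM library, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It assumes the XSLT emits well-formed XHTML, and that each footnote is rendered as an element whose class list contains "note" with an `id`, referenced by an anchor that has its own `id`. That markup convention is hypothetical (the real TEI stylesheets may differ), and `check_document` and `local_target` are illustrative names, not part of any existing framework.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the XSLT output is well-formed XHTML in the XHTML namespace.
XHTML_A = "{http://www.w3.org/1999/xhtml}a"
# Hypothetical convention: each footnote is an element whose class list contains "note".
NOTE_CLASS = "note"


def local_target(a):
    """Return the fragment of a same-document link (href="#frag"), else None."""
    href = a.get("href", "")
    return href[1:] if href.startswith("#") else None


def check_document(xhtml_text):
    """Return a list of problems found in one XHTML document (empty if none)."""
    root = ET.fromstring(xhtml_text)
    problems = []

    # Item 1: ids must be unique, so a footnote (or any other target) is not emitted twice.
    id_counts = Counter(el.get("id") for el in root.iter() if el.get("id"))
    for ident, n in sorted(id_counts.items()):
        if n > 1:
            problems.append(f"duplicate id {ident!r} ({n} times)")

    # Items 3 and 6: every same-document link must point at an existing id.
    links = list(root.iter(XHTML_A))
    for a in links:
        frag = local_target(a)
        if frag is not None and frag not in id_counts:
            problems.append(f"broken local link #{frag}")

    # Item 2: each footnote is referenced from outside itself, and links back
    # to at least one of the anchors that reference it.
    for note in root.iter():
        if NOTE_CLASS not in note.get("class", "").split():
            continue
        note_id = note.get("id")
        if not note_id:
            problems.append("footnote without an id")
            continue
        inner = list(note.iter(XHTML_A))
        inner_ids = {id(a) for a in inner}
        refs = [a for a in links if id(a) not in inner_ids and local_target(a) == note_id]
        ref_ids = {a.get("id") for a in refs if a.get("id")}
        if not refs:
            problems.append(f"footnote {note_id!r} is never referenced")
        elif not any(local_target(a) in ref_ids for a in inner):
            problems.append(f"footnote {note_id!r} has no back link")
    return problems
```

On a build server, one option would be to run this over every output file and fail the build whenever the returned list is non-empty.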
Any of a number of languages could be used to express these tests, but ideally the language should have a DOM library and be able to run it across entire directories of files. Most of our community speaks XML fluently, so leveraging that would be good.
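To sketch what running a DOM check across whole directories might look like, the example below pairs each TEI source with its HTML output and reports any language codes (item 4) that are declared in the TEI but missing from the HTML. It is a minimal sketch under several assumptions: a hypothetical naming convention where `NAME.xml` is transformed into `NAME.html`, HTML output that is well-formed XHTML, and an HTML side that carries language as either `lang` or `xml:lang`. The names `languages` and `check_directory` are illustrative only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ElementTree expands the reserved xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages(path):
    """Collect every language code declared in a document (xml:lang or plain lang)."""
    root = ET.parse(path).getroot()
    found = set()
    for el in root.iter():
        for attr in (XML_LANG, "lang"):
            value = el.get(attr)
            if value:
                found.add(value)
    return found


def check_directory(tei_dir, html_dir):
    """Compare each tei_dir/NAME.xml with html_dir/NAME.html (hypothetical layout).

    Returns a dict mapping TEI file names to lists of problems; empty means all passed.
    """
    report = {}
    for tei_path in sorted(Path(tei_dir).glob("*.xml")):
        html_path = Path(html_dir) / (tei_path.stem + ".html")
        if not html_path.exists():
            report[tei_path.name] = ["no HTML output"]
            continue
        lost = languages(tei_path) - languages(html_path)
        if lost:
            report[tei_path.name] = [f"xml:lang {code!r} lost" for code in sorted(lost)]
    return report
```

Because the report is keyed by TEI file name, an empty dict means every language code declared in every TEI file survived into its HTML counterpart.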