Monday 3 October 2016

How would we know when it was time to move from TEI/XML to TEI/JSON?

This post was inspired by TEI Next by Hugh Cayless.

How would we know when it was time to move from TEI/XML to TEI/JSON?

If we stand back and think about what it is we (the TEI community) need from the format:
  1. A common format for storing and communicating Texts and augmentations of Texts (Transcriptions, Manuscript Description, Critical Apparatus, Authority Control, etc, etc.).
  2. A body of documentation for shared use and understanding of that format.
  3. A method of validating Texts in the format as being in the format.
  4. A method of transforming Texts in the format for computation, display or migration.
  5. The ability to reuse the work of other communities so we don't have to build everything for ourselves (Unicode, IETF language tags, URIs, parsers, validators, outsourcing providers who are tooled up to at least have a conversation about what we're trying to do, etc)
[Everyone will have their slightly different priorities for a list like this, but I'm sure we can agree that a list of important functionality could be drawn up and expanded into a requirements list at a sufficiently granular level, so that we can assess different potential technologies against those items.]

If we really want to ponder whether TEI/JSON is the next step after TEI/XML, we need to compare the two approaches against such a list of requirements. Personally I'm confident that TEI/XML will come out in front right now. Whether JavaScript has the potential to replace XSLT as the preferred method for really exciting interfaces to TEI/XML docs is a much more open question, in my mind.

That's not to say that the criticisms of XML aren't true (they are) or valid (they are) or worth repeating (they are), but perfection is commonly the enemy of progress.

Sunday 2 October 2016

Whither TEI? The Next Thirty Years



This post is a direct response to some of the organisational issues raised in https://scalablereading.northwestern.edu/?p=477
I completely agree that we need to significantly broaden the base of the TEI. A 200 x 500 campaign is a great idea, but better is a 2,000 x 250 goal, or a 20,000 x 250 goal. If we can reduce the cost to the normal range of a hardback text, most libraries will have delegated signing authority to individuals in acquisitions and only one person will need to be convinced, rather than a chain of people.
But how could we scale to 20,000 institutions? To scale like that, we need to think (a) in terms of scale and (b) in terms of how to make it easy for members to be a part of us.

Scale (1)

A recent excellent innovation in the TEI community has been the appointment of a social media coordinator. This is a great thing and I’ve certainly learnt about happenings I would not have otherwise been exposed to. But by nature the concept of ‘a social media coordinator’ can’t scale (one person in one time zone with one set of priorities...). If we look at what mature large-scale open projects do for social media (Debian, Wikimedia, etc), planets are almost always part of the solution. A planet for TEI might include (in no particular order):
  1. 20x blog feeds from TEI-specific projects
  2. 20x blog feeds from TEI-using projects (limited to those posts tagged TEI)
  3. 1x RSS feed for changes to the TEI wiki (limited to one / day each)
  4. 1x RSS feed for the Jenkins server (limited to successful builds only; limited to one / day each; tweaked to include full context and links)
  5. 20x RSS feeds for GitHub repositories not covered by the Jenkins server (limited to one / day each)
  6. 10x RSS feeds for other sundry repositories (limited to one / day each)
  7. 50x blog feeds from TEI-people (limited to those posts tagged TEI)
  8. 15x RSS feeds from TEI-people’s Zotero bibliographic databases (limited to those bibs tagged TEI; limited to one / day each)
  9. 1x RSS feed for official TEI news
  10. 7x RSS feeds of edits to the TEI article on each language Wikipedia (limited to one / day each)
  11. 1x RSS feed of announcements from the JTEI
  12. 1x RSS feed of new papers in the JTEI
The diversity of the planet would be incredible compared to current views of the TEI community and it’s all generated as a byproduct of what people are already doing. There might be some pressure to improve commit messages in some repos, but that might not be all bad.
Of course the whole planet is available as an RSS feed and there are RSS-to-Facebook (and Twitter, Yammer, etc) converters if you wish to do TEI in your favourite social media. If the need for a curated Facebook feed remains, there is now a diverse, constant feed of items to select from.
This is a social media approach at scale.
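To make the per-feed limits in the list above concrete, here is a minimal sketch in Python of how a planet might apply them. It is an illustration only: the names `Entry`, `FeedRule` and `select_entries`, the feed names and the sample items are all hypothetical, and a real planet would read live RSS feeds rather than hand-built records.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """One item from a feed (hypothetical shape, not a real RSS library type)."""
    feed: str
    title: str
    published: datetime
    tags: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class FeedRule:
    """Per-feed limits: an optional required tag and/or one item per day."""
    require_tag: Optional[str] = None
    one_per_day: bool = False


def select_entries(entries: Iterable[Entry], rules: Dict[str, FeedRule]) -> List[Entry]:
    """Return entries newest first, applying each feed's rule."""
    seen_days = set()  # (feed, date) pairs that already contributed an item
    selected = []
    for entry in sorted(entries, key=lambda e: e.published, reverse=True):
        rule = rules.get(entry.feed, FeedRule())
        if rule.require_tag is not None:
            wanted = rule.require_tag.lower()
            if wanted not in {tag.lower() for tag in entry.tags}:
                continue  # e.g. a personal blog post that is not tagged TEI
        if rule.one_per_day:
            key = (entry.feed, entry.published.date())
            if key in seen_days:
                continue  # this feed already has a (newer) item for that day
            seen_days.add(key)
        selected.append(entry)
    return selected


# Made-up sample data: a busy wiki feed and a personal blog.
sample = [
    Entry("tei-wiki", "edit 1", datetime(2016, 10, 2, 9, 0)),
    Entry("tei-wiki", "edit 2", datetime(2016, 10, 2, 17, 0)),
    Entry("tei-wiki", "edit 3", datetime(2016, 10, 1, 12, 0)),
    Entry("someones-blog", "TEI post", datetime(2016, 10, 2, 10, 0), frozenset({"TEI"})),
    Entry("someones-blog", "Holiday post", datetime(2016, 10, 2, 11, 0), frozenset({"travel"})),
]
rules = {
    "tei-wiki": FeedRule(one_per_day=True),
    "someones-blog": FeedRule(require_tag="TEI"),
}
planet = select_entries(sample, rules)
```

In this sketch a noisy feed such as the wiki contributes at most one item per day (the newest), and personal blogs contribute only posts tagged TEI, so no single source can swamp the planet.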

Scale (2)

There is an annual international conference which is great to attend. There is a perception that engagement in the TEI community requires attendance at the said conference. It’s a huge barrier to entry for small projects, particularly those in far-away places (think global south / developing world / etc). The TEI community should seriously consider a policy for decision making that explicitly removes assumptions about attendance. Something as simple as requiring draft papers intended for submission and agendas to be published 30 days in advance of meetings, with a notice posted to TEI-L. That would allow for thoughtful global input, scaling community from those who can attend an annual international conference to a wider group of people who care about the TEI and have time to contribute.

Make it easy (1)

Libraries (at least the library I work in and libraries I talk to) buy resources based on suggestions and lobbying by faculty but renew resources based largely on usage. If we want 20,000 libraries to have TEI on automatic renewal we need usage statistics. The players in the field are SUSHI and COUNTER (SUSHI is a harvesting system for COUNTER).
Maybe the TEI offers members stats at 10 diverse TEI-using sites. It’s not clear to me without deep investigation whether the TEI could offer these stats to members at very little on-going cost to us, but it would be a member benefit that all acquisitions librarians, their supervisors and their auditors could understand and use to evaluate their TEI membership subscription. I believe that that comparison would be favourable.
Of course, the TEI-using sites generating the traffic are going to want at least some cut of the subs, even if it’s just a discount against their own membership (thus driving the number of participating sites up and the perceived member benefits up) and free support for the stats-generating infrastructure.
For the sake of clarity: I’m not suggesting charging for access to content, I’m suggesting charging institutions for access to statistics related to access to the content by their users.

Make it easy (2)

Academics using computers for research, whether or not they think of or call the field digital humanities, face a relatively large number of policies and rules imposed by their institutions, funders and governments. The TEI community can / should be selling itself as the approach to meet these.
  1. Copyright issues? Have some corpora that are available under a CC license.
  2. Need to prove academic outputs are archivable? Here’s the PRONOM entry (Note: I’m currently working on this)
  3. Management doesn’t think the department has the depth of TEI experience to enroll PhDs in TEI-centric work? Here’s a map of global TEI people to help you find local backups in case staff move on.
  4. Looking for a TEI consultant? A different facet of the same map gives you what you need.
  5. You’re a random academic who knows nothing about the TEI but has been assigned a TEI-centric paper as part of a national research assessment exercise? Here’s an outline of TEI’s academic credentials.
  6. ....

Make it easy (3)

Librarians love quality MARC / MARCXML records. Many of us have quality MARC / MARCXML records for our TEI-based web content. Might this be offered as a member benefit?

Make it easy (4)

As far as I can tell, the TEI community makes very little attempt to reach out to academic communities other than ‘literature departments and cognate humanities disciplines’. Attracting a more diverse range of skills and academics will increase our community in depth and breadth. Outreach could be:
  1. Something like CSS Zen Garden http://www.csszengarden.com/ only backed by TEI rather than HTML
  2. A list of ‘hard problems’ that we face that various divergent disciplines might want to set as second or third year projects. Each problem would have a brief description and pointers. Things like:
    1. Transformation for display of documents that have five levels of footnotes, multiple obscure scripts, non-Unicode characters, and so forth.
    2. Schema / ODD auto-generation from a corpus of documents
    3. ...
  3. Engaging with a group like http://software-carpentry.org/ to ubiquify TEI training
  4. ..

End Note

I'm not advocating that any particular approach is the cure-all for everything that might be ailing the TEI community, but the current status quo increasingly seems like benign neglect. We need to change the way we think about TEI as a community.