As someone who generates large numbers of ePubs from TEI for
http://nzetc.victoria.ac.nz/ I can tell you that neither ePub nor
TEI/XML is a magic bullet.
Originally we included both the TEI XML and the unscaled images in our
ePubs for archival purposes, but they were huge. Even after we stripped
out both, some of our ePubs are still > 80 megabytes.
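For anyone who wants to see where the bulk of an ePub sits, here is a minimal sketch in Python. It relies only on the fact that an ePub is a zip container; the function name largest_members and the example filename are hypothetical, and this is not the tooling we use to build our ePubs.

```python
import sys
import zipfile


def largest_members(epub_path, top=10):
    """Return (compressed_size, uncompressed_size, name) for the biggest entries.

    epub_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_path) as zf:
        # Sort by the space each entry actually takes up inside the container.
        infos = sorted(zf.infolist(), key=lambda i: i.compress_size, reverse=True)
        return [(i.compress_size, i.file_size, i.filename) for i in infos[:top]]


if __name__ == "__main__":
    # Hypothetical usage: python epub_sizes.py some-book.epub
    for compressed, raw, name in largest_members(sys.argv[1]):
        print(f"{compressed:>12} {raw:>12} {name}")
```

Images are usually compressed already, so they tend to dominate the compressed-size column.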
Navigation of reference works remains a crawling abomination on most ereaders.
Many ereaders have no 'search within page' functionality or one so poor
that most users never use it in practice. Many ereaders silently ignore
intra-page and intra-book links embedded in HTML. No ereaders that I'm
aware of have 'search whole ePub' functionality. No ereaders that I'm
aware of have the capability of linking from references to other epubs
on the same ereader that match that reference.
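To make the 'search whole ePub' point concrete, here is a rough sketch in Python of what such a search could look like outside an ereader. The name search_epub is hypothetical, the code assumes the content documents are UTF-8 files ending in .xhtml, .html or .htm, and it does not follow the spine order in the package file, so treat it as an illustration rather than a real implementation.

```python
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the character data of a document, ignoring the markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def search_epub(epub_path, term):
    """Return [(member_name, hit_count)] for content documents containing term.

    Matching is case-insensitive. Assumption: content documents are UTF-8.
    """
    needle = term.lower()
    hits = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue  # skip images, stylesheets, the package file, etc.
            parser = _TextExtractor()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            # Joining without a separator keeps words split by inline tags whole,
            # but can run adjacent blocks together if there is no whitespace.
            text = "".join(parser.chunks).lower()
            count = text.count(needle)
            if count:
                hits.append((name, count))
    return hits
```

Calling search_epub("some-book.epub", "Wellington") would return the content files that mention the term along with a count for each. Text inside script or style elements is also searched, which a more careful version would exclude.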
If you want to see for yourself, you're welcome to try
http://nzetc.victoria.ac.nz/tm/scholarly/tei-Cyc01Cycl.html (ePub link
in righthand column). This document has already been made smaller by
removing page images.
Having said that, I'm glad I'm working with ePub + TEI/XML rather than PDF.
On 01/08/12 10:41, Doug Moncur wrote:
> epub is a great format and being reflowable means it can easily be read
> and displayed on a variety of devices, and things like upping the font size
> for legibility are trivial. It's great for born digital documents that
> will only ever be digital.
> The problem really comes where one wants to preserve the structure of
> the document, perhaps because the printed document is in some way
> considered to be the true version. Page numbers, positioning of
> footnotes - any other structural information suddenly becomes difficult to
> TEI anyone ?
> On 01/08/2012, at 08:30 , Richard M. Davis wrote:
>> Hi Tim
>> On 27/07/2012 10:42, Tim Brody wrote:
>>> Of course the most successful format, available on by far the most
>>> platforms and most vendors, is HTML. As the Semantic Web/schema.org gain
>>> traction the amount of information stored in HTML will dwarf that in
>>> dead-tree formats like Word and PDF (if it doesn't already).
>> Not only that, but e-book formats are essentially HTML too, EPub
>> particularly. I'm optimistic that, as tablets and e-book readers
>> continue to gain traction, it will be successful, flexible and hopefully
>> semantically rich rendering on those devices, that will become the
>> benchmark for most publications, rather than pseudo-A4. Much like what
>> we've been striving for on the Web for 20 years, but this time as a real
>> substitute for print, not an adjunct.
>> Not that preserving *everything* someone might choose to package up in
>> an EPUB3 file is necessarily going to be a picnic, but at least we're in
>> familiar web archiving territory ;)
>> Richard M. Davis
>> Manager, Research Technologies Group
>> University of London Computer Centre (ULCC)
>> Senate House, Malet Street, London WC1E 7HU
>> t: +44 (0) 20 7863 1350
>> m: +44 (0) 79 3040 6197
>> w: http://www.ulcc.ac.uk/
>> b: http://dablog.ulcc.ac.uk/
>> c: http://tinyurl.com/richardscalendar
>> *Save electrons* "When replying to a message, include enough original
>> material to be understood but no more." (RFC 1855)
>> The University of London is an exempt charity in England and Wales
>> and a charity registered in Scotland (reg. no. SC041194)
> Doug Moncur
> Repository Manager
> Division of Information
> Level 4, RG Menzies Building (Building #2)
> The Australian National University
> Canberra ACT 0200
> t: +61 2 6125 0977
> m: +61 402 395 211
> w: http://information.anu.edu.au/information/
Library Technology Services http://www.victoria.ac.nz/library/