How wonderful, Martin, Joe and Tracy!  I'm sure you know that the Women Writers Project has been doing TEI-based work on early modern texts (including protocols for metadata) for over 25 years, and assume you have been in touch with the current Director, Julia Flanders--but I am copying her just in case.

This will be not only a wonderful tool, but a terrific teaching resource.

Warm regards and very best wishes,

Susanne

On Mon, Apr 11, 2016 at 9:10 PM, Martin Mueller <[log in to unmask]> wrote:
I hope I'm not abusing the privileges of this list by sending you the announcement below, which has more than casual roots in the Spenser edition of Joe Loewenstein, David Miller et al. And if any of you have an interest in the collaborative curation of Early Modern texts, I will be delighted to keep you in the loop on our progress.

The Early Modern Lab

 

We announce with delight and gratitude that the Andrew W. Mellon Foundation has contributed $200,000 to the “Early Modern Lab,” a project that will bring together students, faculty, librarians, and IT professionals from Northwestern University, the University of Notre Dame, and Washington University in St. Louis. The project’s primary objective is to build an environment for the collaborative curation and exploration of the large corpus of Early Modern English texts transcribed into a TEI format by the Text Creation Partnership (TCP).

The more than 60,000 TCP texts can be thought of as a substantially complete deduplicated library of Early English print culture. No other sample of the distant past provides comparable digital coverage in terms of size, density, diversity, and importance. The digital migration of these texts constitutes a big step towards a goal that Jerome McGann outlined when he argued in 2001 that “in the next fifty years the entirety of our inherited archive of cultural works will have to be re-edited within a network of digital storage, access, and dissemination.”  The magnitude of the step is not diminished by the fact that a quarter of the TCP transcriptions have many and well-known defects that can and should be repaired. Committed to the work of repair, the Early Modern Lab will transform the TCP archive into the foundation for a “Book of English,” defined as

·      A very large, growing, collaboratively curated, and public domain digital corpus

·      Of printed English since its earliest modern form

·      With full bibliographical detail

·      And light but consistent structural and linguistic annotation

The curation of digital textual data involves not only editorial tasks familiar from the print world but also the creation of machine-actionable metadata that support the linking of data across many texts, both for informal exploratory search and for machine-assisted “analytics” in which quantitative routines are applied to textual data and the results are typically visualized as graphs, tables, or word clouds.

While the Early Modern Lab will encourage activities across the whole spectrum of digital exploration, it is a cardinal assumption of this project that for many years to come “reading” will remain the first and most important way in which students of Early Modern culture encounter the primary sources that constitute the documentary infrastructure of their discipline. Getting the digitized primary sources into shipshape form for reading therefore remains a first-order task for digital curation. Digital surrogates should fully meet the expectations of a print-based world about what counts as a good enough text for scholarly reading.

For the past three years, undergraduates at Northwestern, Washington University, and more recently Amherst College have engaged in archival and editorial work in a computationally assisted environment. Working with some 500 Early Modern plays in the TCP corpus (0.5% of total word count), they reduced the rate of known defects per 10,000 words from 14 to 1.1 at the median, and from 62 to 3.5 at the 75th percentile of texts.
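As a rough illustration of how such figures can be derived, the sketch below computes defects per 10,000 words for each text and then takes the median and 75th percentile across texts. The counts are invented for the example and are not project data; the function name and the choice of the "inclusive" quantile method are assumptions, not a description of how the project computed its numbers.

```python
from statistics import median, quantiles


def defects_per_10k(defects: int, words: int) -> float:
    """Known defects normalized to a rate per 10,000 words."""
    return defects * 10_000 / words


# Hypothetical (defect count, word count) pairs, one per text.
texts = [(12, 20_000), (3, 15_000), (40, 25_000), (0, 18_000), (9, 30_000)]

rates = sorted(defects_per_10k(d, w) for d, w in texts)

# Median and third quartile (75th percentile) of the per-text rates.
med = median(rates)
q3 = quantiles(rates, n=4, method="inclusive")[2]

print(f"median: {med:.1f}, 75th percentile: {q3:.1f}")
```

Reporting both the median and the 75th percentile shows how the typical text fares and also how much the worse-off texts improve.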

The Early Modern Lab will extend this experiment, enlisting a larger team to correct and curate a much wider swath of the TCP corpus, with an initial focus on texts from the English Civil War and the Early American period. We will use machine-learning techniques to identify textual defects and offer probability-ranked choices that ease the human editors’ tasks. Central to the curatorial part of the project will be a web-based Annotation Module, built by eXist Solutions GMBH and integrated with a TEI Simple environment. The module will make annotation almost as easy as reading with a pencil, but the who, what, when, and where of each editorial suggestion is logged as a discrete transaction in a central store, where after editorial review it can be automatically integrated into the source texts. The Annotation Module will be built this spring and will be tested extensively by students in summer internships, with an initial focus on texts from the English Civil War and Colonial America. It should be available for general use by Fall 2016. We hope that it will appeal to teachers who like to integrate some editorial assignments into their pedagogical practice because students will enjoy work from which they learn and that is useful to others. But the tool will be helpful to individuals in any walk of life who would like to invest a little of their ‘cognitive surplus’ in helping improve the textual heritage of a critical period in the history of the English-speaking world. John Heywood’s Proverbs of 1546 include “Many handis make light warke.”
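The transaction model described above might be sketched roughly as follows. This is an illustration only and not the Annotation Module's design: the `Suggestion` record, the `(text id, token index)` addressing, the `approved` flag, and the `apply_approved` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Suggestion:
    """One editorial suggestion, logged as a discrete transaction."""
    who: str                 # editor identifier (hypothetical)
    what: str                # proposed corrected reading
    when: datetime           # time the suggestion was made
    where: tuple[str, int]   # (text id, token index), a hypothetical address
    approved: bool = False   # set to True after editorial review


def apply_approved(tokens: dict[str, list[str]],
                   log: list[Suggestion]) -> dict[str, list[str]]:
    """Return copies of the texts with approved suggestions integrated."""
    result = {text_id: list(toks) for text_id, toks in tokens.items()}
    # Apply in chronological order so later approved edits win.
    for s in sorted(log, key=lambda s: s.when):
        if s.approved:
            text_id, index = s.where
            result[text_id][index] = s.what
    return result


# Hypothetical source text with one damaged token.
source = {"text-1": ["Many", "h●ndis", "make", "light", "warke"]}
log = [
    Suggestion("student-a", "handis",
               datetime(2016, 6, 1, tzinfo=timezone.utc),
               ("text-1", 1), approved=True),
    Suggestion("student-b", "worke",
               datetime(2016, 6, 2, tzinfo=timezone.utc),
               ("text-1", 4), approved=False),
]

corrected = apply_approved(source, log)
print(" ".join(corrected["text-1"]))
```

Keeping suggestions in a separate log and leaving the source untouched until review means that every change stays attributable and reversible.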

Tracy Bergstrom, University of Notre Dame
Joe Loewenstein, Washington University in St. Louis
Martin Mueller, Northwestern University