Apologies for cross-posting.
The three-year Collaborative Electronic Records Project
(CERP) of the Smithsonian Institution Archives and the Rockefeller Archive
Center concluded in December 2008. One of the project's outcomes is the CERP Email
Parser, which we are pleased to offer to the archival and related
communities as an open source software tool for the preservation of email
accounts. The Email Parser (http://siarchives.si.edu/cerp/parserdownload.htm)
migrates an email account and its messages into a single XML file using the
Email Account XML Schema developed in collaboration with the North Carolina
State Archives and the EMCAP project.
The CERP Email Parser migrates an email account in MBOX
format into XML, using the schema to preserve the full body of messages,
together with their attachments, and to keep intact the account’s internal
organization (e.g., an Inbox containing subfolders labeled Policies, Special
Events, and Projects). The CERP team successfully preserved email accounts from
a variety of applications including Microsoft Outlook, Apple Mail, Lotus Notes,
and Netscape. All email messages retain their full header content, in contrast
to some tools produced in earlier research efforts.
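For readers unfamiliar with this kind of migration, here is a minimal Python sketch of the general idea. It is not the CERP Email Parser and it does not reproduce the Email Account XML Schema: the function name mbox_to_xml and the element names (Account, Folder, Message, Header, Body, Attachment) are hypothetical, chosen only for illustration. It assumes a single MBOX file that stands for one folder, keeps every header, and stores non-text parts as base64.

```python
import base64
import mailbox
import xml.etree.ElementTree as ET


def mbox_to_xml(mbox_path, folder_name="Inbox"):
    """Convert one MBOX file into an illustrative Account/Folder/Message tree.

    The element names are hypothetical, not the Email Account XML Schema.
    """
    account = ET.Element("Account")
    folder = ET.SubElement(account, "Folder", name=folder_name)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            message = ET.SubElement(folder, "Message")
            # Keep every header in its original order, not a selected subset.
            for name, value in msg.items():
                header = ET.SubElement(message, "Header", name=name)
                header.text = str(value)
            # Walk the MIME tree: text parts become Body, the rest Attachment.
            for part in msg.walk():
                if part.is_multipart():
                    continue
                payload = part.get_payload(decode=True) or b""
                filename = part.get_filename()
                ctype = part.get_content_type()
                if filename is None and part.get_content_maintype() == "text":
                    charset = part.get_content_charset() or "utf-8"
                    try:
                        text = payload.decode(charset, errors="replace")
                    except LookupError:
                        # Unknown charset label: fall back to UTF-8.
                        text = payload.decode("utf-8", errors="replace")
                    body = ET.SubElement(message, "Body", type=ctype)
                    body.text = text
                else:
                    att = ET.SubElement(
                        message, "Attachment",
                        name=filename or "", type=ctype, encoding="base64",
                    )
                    att.text = base64.b64encode(payload).decode("ascii")
    finally:
        box.close()
    return account
```

Calling ET.ElementTree(mbox_to_xml("Inbox.mbox")).write("account.xml", encoding="utf-8", xml_declaration=True) would write the result to a single XML file; the file names here are placeholders. Subfolders could be handled by nesting further Folder elements, one per MBOX file.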
The parser runs on a workstation in a virtual machine
environment compatible with Windows, Macintosh, Linux, and some Unix platforms.
CERP testing was limited to the Windows XP environment. The CERP Email Parser
is licensed as open source software so that it may be used, supported, and
enhanced by all organizations that adopt it.
The Email Parser is designed to address the task of
preserving bodies of email, such as an account, without requiring access to the
original email systems. Still, email accounts from active email systems may
also be preserved using this tool. The CERP Email Parser will be featured in
the pre-conference workshop “Achieving Email Account Preservation With XML” at
the Society of American Archivists 2009 Annual Meeting this August.
For more information and to download the parser, visit http://siarchives.si.edu/cerp/parserdownload.htm.
For more on the Collaborative Electronic Records Project, visit http://siarchives.si.edu/cerp/.
Please direct email inquiries to [log in to unmask].
Riccardo Ferrante
IT Archivist and Electronic Records Program Director
Smithsonian Institution Archives
600 Maryland Ave SW MRC 507
Washington, DC 20013-7012
[log in to unmask] | phone 202.633.5906 | fax 202.633.5928 | cell 202.341.4658