Dear List Members
On 6th August I posted asking for views on issues to consider for a proposed large-scale scanning project. With apologies for the delay, here is a synopsis of the views and ideas expressed.
The post received around 20 replies. Six gave detailed advice or information, while others said they were interested in the issue for future projects in their organisations. Some described experience of large-scale scanning as a stand-alone project, broadly the requirement I was asked to investigate, while more contacts described scanning as a component of a wider business change, such as installing an EDRMS and/or introducing a digital mailroom. Issues are grouped by broad topic below.
ESTIMATING AND METRICS
Views were mixed on whether there are reliable industry-wide metrics. Some suggested figures, but others argued it was not feasible to forecast volume simply from the file type; different content can mean one PDF takes N kb whereas another takes 2×N kb or more.
The only source who suggested a standard measure said that for an A4 typed document scanned in monochrome at 300 dpi into a PDF file, we should allow 40 kb per side of text. This was based on an image-over-text format: the scanned image on top, with OCR-searchable text behind it. However, another argued that a single-page 200 dpi TIFF can vary from 10 kb to more than 100 kb depending on print grain and point size, and if colour is needed, variations in compression or colour tone can take a single image from under 10 kb to over 1 MB.
Some contacts suggested we should not be too concerned about file size when the cost of disk storage is so low. No one mentioned cloud storage costs.
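As a rough sanity check on the 40 kb-per-side figure above, the arithmetic can be sketched as follows. The page count is an illustrative assumption, not a figure from any reply:

```python
# Rough storage estimate using the suggested allowance of 40 kB per
# scanned side (image-over-text PDF, A4 monochrome, 300 dpi).
# The page count below is an illustrative assumption.

KB_PER_SIDE = 40          # suggested allowance per side of text
pages = 1_000_000         # hypothetical volume, double-sided originals
sides = pages * 2

total_kb = sides * KB_PER_SIDE
total_gb = total_kb / (1024 ** 2)

print(f"{sides:,} sides at {KB_PER_SIDE} kB/side ≈ {total_gb:.1f} GB")
```

Even a million double-sided pages come out under 100 GB at this allowance, which supports the view that disk cost is unlikely to be the constraint.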
In terms of scanning throughput, it was suggested a project should handle at most 100,000 pages per month. While the sheet-feeder scanning element could run at "eye-watering speeds", the need for human input in preparation and indexing would limit the amount of paper that can be scanned each month.
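The practical consequence of that ceiling is that project duration is driven by human throughput, not scanner speed. A minimal sketch, with the total volume as an illustrative assumption:

```python
# Duration estimate driven by the suggested practical ceiling of
# ~100,000 pages per month (preparation and indexing, not scanner speed).
# The total project volume below is an illustrative assumption.

PAGES_PER_MONTH = 100_000   # suggested human-throughput ceiling
total_pages = 1_500_000     # hypothetical project volume

months = total_pages / PAGES_PER_MONTH
print(f"{total_pages:,} pages at {PAGES_PER_MONTH:,}/month ≈ {months:.0f} months")
```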
At least one correspondent strongly rejected the concept of benchmark metrics, arguing we would have to gather our own sample to estimate from. The sample would need to feature 'average' documents, and the exercise would need to count the keystrokes required to index them and the number of indexing fields.
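The keystrokes-and-fields approach above can be turned into a simple scaling calculation. All the figures here are illustrative assumptions, standing in for values that would come from a measured sample:

```python
# Sketch of scaling up indexing effort from a measured sample, per the
# keystrokes-and-fields suggestion above. Every figure here is an
# illustrative assumption, not a benchmark.

docs = 250_000               # hypothetical number of files to index
fields_per_doc = 4           # indexing fields captured per file
keystrokes_per_field = 12    # average measured from the sample
keystrokes_per_hour = 9_000  # assumed sustained data-entry rate

total_keystrokes = docs * fields_per_doc * keystrokes_per_field
hours = total_keystrokes / keystrokes_per_hour
print(f"≈ {hours:,.0f} hours of indexing effort")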
A range of variables needs considering in estimates: how much work is involved in removing staples, clips and dust covers? How much space is needed to store these while the contents are being scanned? How frequently will files be needed for core business during the project? If files are taken off-site (or even off-shore), what are suitable batch sizes for the available transport? Is the indexing data "objective" (in a standard place or standard form) or "subjective" (needing expertise to find and interpret)? How accurate must the indexing actually be?
Complexity is another important variable. Scanning a single record series is easier if the records all use the same "metadata set", which can be auto-populated by some (unspecified) software products.
PILOTING
This was the most common theme: nearly everyone called for a pilot, and estimating must be based on a sample trial. Several recommended contacting a choice of bureaux and including a sample trial as part of the bidding process. A practical trial of people and technologies will reveal qualitative factors, such as scan readability from a range of originals, as well as quantitative hard costs.
PROJECT MANAGEMENT AND BUSINESS ENGAGEMENT
Contacts stressed the need for clear objectives and senior management sponsorship, plus a project executive or manager actually driving the project forward. There was a strong recommendation to appoint a dedicated project manager and to produce and agree a clear specification for the work.
Some contacts said they had resourced the work in-house. It appears that if firms had spare accommodation, then leasing machines and deploying staff in-house could be considered. Most sources, however, recommended outsourcing as far as possible, with in-house resources quality-assuring a sample of the scans.
The need to spread effort across business process, people and technology was highlighted; otherwise the project risked being viewed as an "ICT initiative", and users are more likely to try to avoid making the change.
COMMERCIAL
Where EDRMS services are being introduced, many vendors will recommend a partner or preferred-supplier scanning bureau, although this may limit competition compared with an open marketplace.
A number of contacts mentioned firms used in the past. While I don't feel it appropriate to name these on an open list, I did note that everyone seemed to have had a good experience, expressing positive views. No one related bad experiences or suggested a firm to avoid.
In our own case, we have been directed to use a specific scanning firm that already has a framework contract. This has offered benefits in avoiding the overhead of external tendering, but meant we were not able to compare the technical capabilities and capacity of vendors in the wider market.
THE SCANNING PROCESS
Thanks to everyone who described the four-step process:
1 Cleansing – removing staples etc. and placing sheets in neat order
2 Scanning and Indexing
3 Quality Assurance – checking scans are legible and accurate, making sure double-sided scans are made of double-sided originals, etc.
4 Managing the paper originals – either returning them to storage adequately labelled or destroying them securely.
The scanning step itself appears the simplest, being almost completely automated, whereas significant human input is needed at all the other steps.
One contact suggested that reviewing and indexing manually from file header sheets was simpler than trying to capture metadata from content by OCR, particularly for administrative services (HR, student records). OCR was used in industries where search and discovery are paramount (such as mining).
Quality control can be on a sample basis. Early in the project, sample heavily; as confidence in quality grows (on both sides), the rate could reduce to 5%.
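A declining sampling schedule of this kind can be sketched as a simple step function. The early and mid-project rates and batch thresholds below are illustrative assumptions; only the 5% steady-state figure comes from the replies:

```python
# Sketch of a declining QC sampling schedule: inspect heavily early on,
# stepping down towards 5% as confidence grows. The 50%/20% rates and
# the batch thresholds are illustrative assumptions; only the 5%
# steady-state figure comes from the list replies.

def qc_sample_rate(batch_number: int) -> float:
    """Fraction of a batch to inspect, given how many batches have been done."""
    if batch_number <= 5:      # early batches: inspect half of each batch
        return 0.50
    elif batch_number <= 20:   # mid-project: step down
        return 0.20
    else:                      # steady state once both parties are confident
        return 0.05

for n in (1, 10, 50):
    print(f"batch {n}: sample {qc_sample_rate(n):.0%}")
```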
Preparing files and re-packaging them after scanning (if the paper is being kept) are normally the largest aspects of a project, and contacts suggested many projects under-estimate this element.
LOCATION AND LOGISTICS
A number of contacts said they had successfully run the operation in-house, or at least on their own premises. This offered better control and, for highly sensitive material, could prove safer, but it needed very accurate planning estimates. For a large project, there was a suggestion to set up parallel teams to generate some healthy competition and motivation. On the smallest projects, the view was to segregate tasks: for example, two colleagues on preparation, one running the scanner, and two more indexing and repackaging.
Other contacts argued against in-house operation, saying bulk scanning is a specialist activity, particularly at volumes approaching 0.25 million files. Even where the volume and duration of work could not be accurately forecast, contacts recommended shipping the material to an external bureau instead. A bureau should be able to help with specification analysis, as this is their core business.
Be sure of the logistics of getting materials to the bureau. You could have an ultra-fast scanning team, but they cannot excel if the transport company creates a bottleneck in getting goods to them. There was a preference for shipping files to a single central bureau rather than to multiple small bureaux near the files.
Make sure you have a means of accessing files needed for core business during the scanning project, before the scanning starts. Likewise, agree a process for handling new paperwork that comes in after scanning has started.
DIGITAL STORAGE
TIFF format was only suggested where there is a need for a high level of compliance.
As storage will be needed for the scanned images, most contacts reported they had implemented an EDRMS product (no one mentioned SharePoint) at the time of the scanning project, including those where EDRMS had not been the original driver for the scanning work.
Kofax users mentioned a ‘connector’ facility that would route scanned documents directly into an EDRMS fileplan.
For less complex operations, there was a suggestion to store images on network drives, subject to warning the network managers about the sudden increase in volume!
Many thanks to everyone who contributed. The information received has been valuable in helping to plan this project. The project is still at an early stage and could be down-scoped, as early estimates and liaison with the government-framework supplier suggest the original volume could take 12 months to complete. This may mean only files up to 8 years old being scanned, as these are the most accessed, with older files remaining in paper form unless or until there is increased audit attention on that earlier period. The resource demands of manual indexing also led to a decision to avoid complex indexing (either manual or OCR), with files being scanned at whole-box level (putting multiple files into one PDF) and then indexed and the PDFs split later.
Kind regards
Colin Tyc
To view the list archives go to: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=RECORDS-MANAGEMENT-UK
To unsubscribe from this list, send an email to [log in to unmask] with the words UNSUBSCRIBE RECORDS-MANAGEMENT-UK
For any technical queries re JISC please email [log in to unmask]
For any content based queries, please email [log in to unmask]