Hi Gio,
There are dozens, if not hundreds, of applications that will compare OCR
output (which is, after all, just text). An incomplete list:
https://en.wikipedia.org/wiki/Comparison_of_file_comparison_tools
To that list should be added: CollateX, DeltaXML, diff-match-patch,
Juxta, NDiff, Oxygen XML Editor, OpenOffice, Microsoft Word, Google
Docs, rdiff, Saktumiva, TAN Diff+, and online resources such as
diffchecker.com, text-compare.com, and textdiff.com.
If, like me, you need to rigorously compare multiple OCR configurations
against each other and a ground truth text (say five or more versions),
you might want to consider TAN Diff+ (http://textalign.net), which is
designed primarily for complex needs not handled by standard consumer
text-differencing software. It also requires some comfort with XSLT.
Here is a sample output applied to OCR:
https://textalign.net/output/diff-grc-2021-02-08-five-versions.html
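For a quick first check before committing to any of the tools above, a
minimal sketch using Python's standard difflib module (not one of the
applications listed, and not how TAN Diff+ works) might look like the
following. The function name diff_spans, the engine labels, and the
sample strings are all invented for illustration. It compares each OCR
version against a ground truth character by character and reports only
the places where they disagree:

```python
import difflib

def diff_spans(truth, ocr):
    """Return (operation, truth_fragment, ocr_fragment) for each disagreement."""
    # autojunk=False switches off difflib's heuristic that treats very
    # frequent characters as junk in long texts.
    matcher = difflib.SequenceMatcher(None, truth, ocr, autojunk=False)
    return [
        (tag, truth[i1:i2], ocr[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / delete / insert spans
    ]

# Hypothetical sample data: one ground truth, two OCR outputs.
ground_truth = "In the beginning was the word"
versions = {
    "engine_a": "In the beginning was the w0rd",
    "engine_b": "In the beginning was the word",
}

for name, text in versions.items():
    print(name, diff_spans(ground_truth, text))
```

Because the comparison runs over the stored sequence of characters
rather than their visual order, the same approach should apply to
right-to-left text, although displaying the differences legibly is a
separate problem.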
Best wishes,
jk
On 2022-03-22 07:35, DiRusso, Gio wrote:
> Hello all,
>
> Suppose I am running OCR on the same image/set of images with two
> different OCR models or engines (e.g. Tesseract and Kraken), or for
> that matter wish to compare OCR output with a human transcription of a
> given page. Is there any software or code that allows me to
> automatically compare the output and highlight lines or characters
> where the transcriptions disagree? I know such code would not be
> terribly difficult to write, but would prefer to use an already
> existing application if possible.
>
> If it matters, the use case I have in mind is actually for a
> right-to-left language, but processes that are designed for
> left-to-right languages could still be helpful.
>
> Many thanks for your time and help.
>
> Best wishes,
> -Gio
>
> -------------------------
>
> To unsubscribe from the DIGITALCLASSICIST list, click the following
> link:
> https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=DIGITALCLASSICIST&A=1
--
Joel Kalvesmaki
Director, Text Alignment Network
http://textalign.net
########################################################################
This message was issued to members of www.jiscmail.ac.uk/DIGITALCLASSICIST, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/