Hi David,
> Thanks to those of you who have replied. What is it
> about PDF documents that makes them so resistant to
> CAQDAS? If JPG files can be used in many programs
> (like Atlas.ti) why can't programs be made that use
> the ever more common PDF format?
From the programmer's point of view the problem is not the file type as such
but how its content is organised. JPG files are pictures, and if you want to
analyse pictures, this file type is fine. A JPG file can also contain text,
and you can read it, but for QDA programs, or text analysis software in
general, a JPG file is just a bunch of pixels. If you want to extract the
text from a picture, you have to use OCR (optical character recognition)
software, as other colleagues have already discussed in this thread.
PDF is a display format, mostly used for text, and it has a certain
structure that describes both the text and how the text is displayed. In
that respect PDF is similar to HTML code, but much more complicated. So if a
text analysis program reads a PDF file, it must extract the text and ignore
the formatting information. If the PDF file was generated from a word
processor, this will work.
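To make the "extract the text, ignore the formatting" step concrete, here is
a minimal sketch in Python. It assumes a toy, uncompressed page content
stream that I made up for illustration; the names CONTENT_STREAM and
extract_text are hypothetical. In real PDF files the streams are usually
compressed and the fonts may use custom encodings, so a real program needs
far more than this.

```python
import re

# Toy, uncompressed page content stream (an assumption for illustration).
# BT/ET bracket a text object, Tf selects a font, Td moves the text
# position, and Tj shows the string given in parentheses.
CONTENT_STREAM = rb"""
BT
/F1 12 Tf
72 720 Td
(Hello, world) Tj
0 -14 Td
(PDF mixes text \(like this\) with layout.) Tj
ET
"""

# A literal string in parentheses (backslash escapes allowed), followed by Tj.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text(stream):
    """Return the strings shown with Tj; all layout operators are ignored."""
    lines = []
    for match in TJ_PATTERN.finditer(stream):
        raw = match.group(1)
        # Undo the escapes \( \) and \\ inside the string.
        text = re.sub(rb"\\([()\\])", rb"\1", raw)
        lines.append(text.decode("latin-1"))
    return lines


if __name__ == "__main__":
    for line in extract_text(CONTENT_STREAM):
        print(line)
```

The font and position operators are simply skipped, which is the part that
corresponds to ignoring the formatting information.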
Some PDF files, however, contain scanned graphics rather than text, so an
OCR program has to be used. For a programmer this means re-inventing the
wheel.
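As a rough sketch of how a program might tell the two cases apart, the
Python snippet below checks a page content stream for text-showing
operators (Tj, TJ) versus an image being painted (Do). The function name
needs_ocr and both sample streams are hypothetical, the streams are assumed
to be uncompressed, and Do can also paint things other than scanned images,
so this is only a heuristic.

```python
import re

# Text-showing operators: "(string) Tj" or "[array] TJ".
TEXT_OP = re.compile(rb"\)\s*Tj|\]\s*TJ")
# Painting a named external object (typically an image): "/Name Do".
IMAGE_OP = re.compile(rb"/\w+\s+Do\b")


def needs_ocr(stream):
    """Heuristic: the page paints an object but shows no text at all."""
    has_text = TEXT_OP.search(stream) is not None
    has_image = IMAGE_OP.search(stream) is not None
    return has_image and not has_text


# Hypothetical sample pages, written out uncompressed for illustration.
typed_page = b"BT /F1 12 Tf 72 720 Td (Interview 1) Tj ET"
scanned_page = b"q 595 0 0 842 0 0 cm /Im0 Do Q"

if __name__ == "__main__":
    print("typed page needs OCR:", needs_ocr(typed_page))
    print("scanned page needs OCR:", needs_ocr(scanned_page))
```

A page that fails this check still has to go through a separate OCR step
before any text analysis can start.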
I hope this explains why JPGs and PDFs are causing problems.
Harald