Depends on what you mean by "meaningful". Obviously, you can take
a screenshot of the PDF page and import that into Word as one big
photo, but I imagine that is not what you want to do?
If you mean to reverse the "convert to PDF" process and get back
the original Word document you used to make the PDF, then I'm
afraid that is basically impossible. A lot of information is lost
in the "printing" process. PDF is actually a type-setting
language, so it contains things like 'there is a letter "a" at
coordinates x,y on page 1'. Or, worse, some PDFs are actually
just scanned images with no text in them at all. For these, you
need OCR: either run OCR software on the page images directly,
or print out the PDF and scan it back in with an OCR-capable
scanner. Sounds awful, but modern OCR is actually surprisingly
smart.
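To make the "letter at coordinates x,y" point concrete, here is a
minimal Python sketch of the guesswork a converter has to do when it
rebuilds words from positioned glyphs. The glyph tuples, the
group_into_words function and the gap thresholds are all made up for
illustration; real converters use more elaborate heuristics.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, x position, width), all on one text line.
Glyph = Tuple[str, float, float]


def group_into_words(glyphs: List[Glyph], gap_threshold: float) -> List[str]:
    """Start a new word wherever the horizontal gap between two
    neighbouring glyphs is larger than gap_threshold."""
    words: List[str] = []
    current = ""
    prev_end = None
    for char, x, width in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_threshold:
            words.append(current)
            current = ""
        current += char
        prev_end = x + width
    if current:
        words.append(current)
    return words


# Made-up positions for the text "to be": the real word gap (o to b) is
# 3.0 units, while loose spacing leaves a 1.2 unit gap between "b" and "e".
glyphs = [("t", 0.0, 5.0), ("o", 5.5, 5.0), ("b", 13.5, 5.0), ("e", 19.7, 5.0)]

good = group_into_words(glyphs, gap_threshold=2.0)    # splits only at the word gap
split = group_into_words(glyphs, gap_threshold=1.0)   # threshold too small: "be" breaks apart
merged = group_into_words(glyphs, gap_threshold=4.0)  # threshold too large: words run together
```

The same glyphs give different "words" depending on one guessed
threshold, which is why converted text picks up split or merged
words.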
I generally use the poppler-utils programs (pdftotext, pdfimages,
pdftops, pdftohtml, etc.), plus the separate pstotext program, to
extract computer-readable meaning from PDF documents. I find they
tend to do a pretty good job at figuring out the formatting. There
are various options and flags to choose from, and sometimes you
get better results using an intermediate format (pdftops followed
by pstotext), but your mileage may vary. Remember, since PDF is a
type-setting format, converting it into something else is
essentially an image-recognition problem. Usually, you will get a
word here or there that is split into two, or sometimes two words
get stuck together. So, be sure to spell check.
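As a rough starting point, the two most useful calls can be wrapped
in a few lines of Python. This is only a sketch: the function names
and the output file names (text.txt, fig) are my own invention, and
it assumes pdftotext and pdfimages from poppler-utils are on your
PATH. The -layout flag asks pdftotext to keep the physical layout,
and -png asks pdfimages to write PNG files.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdf: Path, out_txt: Path) -> List[str]:
    # -layout: try to preserve the physical layout of the page in the text output
    return ["pdftotext", "-layout", str(pdf), str(out_txt)]


def pdfimages_command(pdf: Path, image_prefix: Path) -> List[str]:
    # -png: write extracted images as PNG files named <prefix>-NNN.png
    return ["pdfimages", "-png", str(pdf), str(image_prefix)]


def extract(pdf: Path, out_dir: Path) -> None:
    """Dump the text and the embedded images of a PDF into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = (
        pdftotext_command(pdf, out_dir / "text.txt"),
        pdfimages_command(pdf, out_dir / "fig"),
    )
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found; is poppler-utils installed?")
        subprocess.run(cmd, check=True)


# Example call (not run here): extract(Path("paper.pdf"), Path("paper_extracted"))
```

Note that pdfimages only pulls out embedded bitmap images. Figures
drawn as vector graphics will not come out this way, so check what
you actually get.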
But yes, perhaps the easiest thing to do is load up the PDF in
Acrobat, copy-and-paste all the text into Word, and then
copy-and-paste each figure. Then spend some time re-formatting,
etc. Or, alternatively, you could spend some time trying to find
the original Word document. The latter is somewhat easier to
automate.
Sorry!
-James Holton
MAD Scientist
On 9/1/2012 3:48 AM, Rex Palmer wrote:
Dear CCP4BB
Does anyone know how to convert a .pdf file into a meaningful
Word file.
Any suggestions will be greatly appreciated. The pdf file has
numerous figures and tables.