Posted to microsoft.public.word.docmanagement
Graham Mayor
Subject: disable content copying in a pdf

You don't need a scanner! SnagIt will output to a graphics file that any
half-decent OCR software will read directly. And SnagIt will capture the
full page (the full document, even), not simply half the screen.

And yes, you are right that 99.9% of people will not want to jump through
hoops. It's the other 0.1% you should be worried about. I will repeat
(because you are deluding yourself if you think otherwise) that if you can
see it, you can copy it.

Just for the hell of it I converted a four-page PDF using this process, and
the only OCR read errors (FineReader 8) were two superscripted date ordinals
and two misread words. It took less than 10 minutes to produce a Word
document that was close to the original, and with a bit more time it could
have been made indistinguishable. It sounds as though you need better OCR
software.
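
Accuracy figures like the ones traded in this thread are easy to put a
number on if you still have the original text. The following is only a
rough sketch, assuming accuracy means the share of the original's words
that come back from OCR unchanged and in order. The function name
word_accuracy and the sample strings are made up for illustration and are
not part of FineReader, SnagIt, or any other OCR package.

```python
import difflib


def word_accuracy(reference: str, ocr_output: str) -> float:
    """Return the fraction of reference words the OCR text reproduces in order.

    Hypothetical helper for illustration: it is a simple word-level
    comparison, not the metric any particular OCR product reports.
    """
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    if not ref_words:
        # Nothing to get wrong in an empty reference.
        return 1.0
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    # Sum the sizes of all in-order runs of identical words.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


if __name__ == "__main__":
    original = "If you can see it you can copy it"
    scanned = "If you can see it yon can copy it"  # one misread word
    print(f"{word_accuracy(original, scanned):.1%}")
```

Run against the text of the source document and the text the OCR program
produced, this gives a figure that can be compared from one test to the
next, rather than an estimate by eye.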


--

Graham Mayor - Word MVP

My web site www.gmayor.com
Word MVP web site http://word.mvps.org




Don wrote:
"Graham Mayor" wrote:

It doesn't depend on anything of the kind. If you can see it, you can
copy it. These methods only slow the user down. Worst-case scenario:
screen capture the pdf one page at a time (SnagIt will do that
easily) and run it through OCR. The encryption won't help there!


Your method is less than effective.
I tested it on a seven-page PDF that is 100% pure text, with no numerals
(which present real issues, especially fractions).
Seven full pages amount to approximately 14 half-page
screen captures.

I saved the resulting JPGs at 100% quality, with no compression (an option
most people are not even aware of).

The resulting OCR was approximately 60% accurate, and all of this
while using the same spell checker that was used to create the initial
RTF from which the PDF was created. (My spell checker dictionary has
been supplemented extensively over a ten-year period on "my
widgets".) Another's dictionary, not similarly focused, would give
even poorer results.

99.99% of the population are simply not going to jump through this
many hoops for such ineffective results.
Hell! 90% of the population have moved their scanner to a back corner of
their desktop after attempting a solitary image and/or OCR.

One example of ineffective OCR is the set of TIF images, and their conversion
to text, made available by the Library of Congress American
Memory archives.