#1 Rebecca
Posted to microsoft.public.word.docmanagement
A Better Solution?

I apologize that this is not exactly a question (except for one in the last
paragraph), but it would be nice to hear some comments or suggestions.

Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an
entire book of 350 pages, which has many color pictures, using Acrobat 7.0.
The resulting PDF was 154 megabytes. I then saved this PDF as an HTM file,
opened it in MS Word 2003 or 2007 (same results in both), and saved it as a
Word document. The resulting DOC file is a teensy-weensy 21 KB, and the file
(which is broken up into 350 separate pages, just as in the original PDF) is
as readable as a PDF (and easier to navigate after I add some page numbers,
links, and the like; this process can be automated). OCR-ing takes too much
time (and I would have to proofread the results anyway), so just having images
of the book in one MS Word file is a workable solution. And with a Tablet PC,
the images can be inked for annotations and the like.

If I scan directly into Word or other programs, I get huge files, no matter
how much I fiddle with the resolutions, file types, or compression settings.
Using an ADF (automatic document feeder), I'm currently scanning all the
thousands of books and articles scattered here and there in my library (for
my personal use, so no copyright issues), and I will be able to carry my
portable (and searchable) library around on my (under 2-pound) Tablet PC.

I've tried importing JPEG images into other programs such as OneNote,
AskSam, UltraRecall, you name it, but the resulting files bloat to
intolerable sizes. PDF files take up too much space (and are slow to
navigate). Does anyone have a better solution than the one I mentioned
above?

#2 Tim in Ottawa

Just keep the big PDFs; they retain the most quality. Although these
files seem big now, as computers and software get faster and faster in the
coming years, these file sizes will seem irrelevant.

#3 Robert M. Franz (RMF)

Hi Rebecca

Rebecca wrote:
> Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an
> entire book of 350 pages, which has many color pictures, using Acrobat 7.0.
> The resulting PDF was 154 megabytes. I then saved this PDF as an HTM file,
> opened it in MS Word 2003 or 2007 (same results in both), and saved it as a
> Word document. The resulting DOC file is a teensy-weensy 21 KB, and the file


Wait a minute: how many high-color pictures are there in your 350-page
document? At 21 KB, I doubt there can be much text in a 350-page Word
document, and no pictures to speak of. When you save as HTML, the
pictures and other stuff are most probably stored externally (that's what
Word does, anyway, when you save a document as HTML).

Either there are a couple of other big files around, or your resulting
document cannot be much more than a mere text file ...
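Robert's hunch can be checked by measuring the converted document together with any folder sitting next to it. A minimal Python sketch, assuming the pictures live in a sibling folder named `images` (Word's own HTML export typically uses a `<name>_files` folder instead, so pass whichever name you actually see); `folder_size` and `total_footprint` are hypothetical helper names, not part of any Office tool:

```python
from pathlib import Path


def folder_size(folder: Path) -> int:
    """Total size in bytes of all regular files below `folder`."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def total_footprint(doc: Path, companion: str = "images") -> tuple[int, int]:
    """Return (bytes in the document itself, bytes in its companion folder).

    Assumption: linked pictures sit in a folder named `companion` next to
    the document. A missing folder counts as 0 bytes.
    """
    doc_bytes = doc.stat().st_size
    folder = doc.parent / companion
    pics_bytes = folder_size(folder) if folder.is_dir() else 0
    return doc_bytes, pics_bytes
```

For example, `total_footprint(Path("PYRAMIDS.doc"))` (a hypothetical file name) returning a few KB for the first number and hundreds of megabytes for the second would mean the pictures were never inside the .doc at all.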

BTW, have you tried saving as RTF from Acrobat?

Greetinx
Robert
--
 /"\  ASCII Ribbon Campaign  | MS
 \ /                         | MVP
  X   Against HTML           | for
 / \  in e-mail & news       | Word

#4 Rebecca

Yes, Robert, the scanned book contains dozens and dozens of color pictures,
and yes, that's the actual size of the file. I know, at first I thought it
was a bug (say, my computer was not reading the file size correctly) or I was
losing my eyesight or my mind.

It does seem impossible (and I've been experimenting with various scanned
images for years to get the file sizes down). Try it out and you'll see.
It's almost a miracle (if you've got a ton of scanned material in PDF files,
that is). And frankly, navigating PDF files in Acrobat is a pain (slow as
molasses, despite some nice features). But with a Tablet PC, you
can ink and do other things with the images in MS Word with no problem, and
it still does not increase the file size much (though I haven't been
highlighting that much yet).

I don't think there are other big (linked) files lurking somewhere on my
hard disk, and if there are, well, this would be a first, too. I saved the
HTM files as MS Word files, so go figure. But who knows, maybe you're right;
maybe there's a catch somewhere. But as I recommended, try it with a big
PDF in Acrobat, save it as an HTM file, open it in MS Word, and save it as an
MS Word doc. Voilà!

#5 Robert M. Franz (RMF)

Hi Rebecca

Rebecca wrote:
> Yes, Robert, the scanned book contains dozens and dozens of color pictures,
> and yes, that's the actual size of the file. I know, at first I thought it
> was a bug (say, my computer was not reading the file size correctly) or I was
> losing my eyesight or my mind.


I wasn't thinking about a bug, but I've seen my share of 300-page (and
much longer) files in Word, and I find it hard to believe that a 300-page
file, especially a converted one, could be less than 30 KB in size, and
that's without pictures! :-)


> It does seem impossible (and I've been experimenting with various scanned
> images for years to get the file sizes down). Try it out and you'll see.
> It's almost a miracle (if you've got a ton of scanned material in PDF files,
> that is). And frankly, navigating PDF files in Acrobat is a pain (slow as
> molasses, despite some nice features). But with a Tablet PC, you
> can ink and do other things with the images in MS Word with no problem, and
> it still does not increase the file size much (though I haven't been
> highlighting that much yet).


You are talking about the "full" Acrobat (not the Reader), right?
I don't have that one on any of the systems I'm working on these days,
unfortunately. But if the file is as small as you say it is, can you
send it to me for inspection? I'm _very_ dubious, I must admit. Sheer
information theory would rule out compression of the magnitude we're
discussing here (well, that's not quite right: you can compress the
whole Bible into 1 bit, but then the whole Bible text must be part of
the decompression algorithm, and I very much doubt Acrobat hacked the
Word executables ... ;-)).
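Robert's information-theory argument can be illustrated with a small Python experiment (an illustration, not a proof): general-purpose compression only works when the data still contains redundancy. Going from 154 MB to 21 KB would be a ratio of more than 7,000 to 1, on picture data that JPEG has already compressed. In this sketch, random bytes from `os.urandom` stand in for JPEG data, which is an assumption; real JPEG files may still shrink by a few percent, but nowhere near that ratio.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher = more redundant)."""
    return len(data) / len(zlib.compress(data, 9))


# 170,000 bytes of highly repetitive text: lots of redundancy to remove.
repetitive = b"In the beginning " * 10_000
# 170,000 random bytes, standing in (by assumption) for picture data that
# JPEG has already compressed: essentially no redundancy is left.
random_like = os.urandom(170_000)

print(f"repetitive text: {compression_ratio(repetitive):,.0f} : 1")
print(f"random bytes:    {compression_ratio(random_like):.3f} : 1")
```

The repetitive text shrinks by a factor of hundreds, while the random bytes come out at roughly 1 : 1 (zlib's framing can even make them a few bytes larger).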

Greetinx from good old Europe
Robert


#6 Rebecca

Robert,

Yes, I'm using Acrobat 7.0; it came with my Fujitsu ScanSnap scanner (an
incredibly useful scanner, by the way). And yes, the file sizes are correct,
because I sent the files to myself by e-mail. I couldn't find your e-mail
address; if you give it to me, I'll send you one with a lot of color pictures
(the whole book was scanned in color), but please remember it was scanned
for my personal use, and I want to avoid copyright entanglements. Did I
stumble upon a method to save an enormous amount of disk space? Like you, I
am still very dubious; it's too good to be true. Such compression is
absolutely impossible, as you said, and maybe you will be able to find out
what's really going on. The original PDF was 154 megabytes, and the
resulting MS Word .doc file is about 170 KB. As you implied, such radical
compression would be insane. Please see if you can get to the bottom of this.

#7 Robert M. Franz (RMF)

Hi Rebecca

Rebecca wrote:
> [..] I couldn't find your e-mail address; if you give it to me, I'll
> send you one with a lot of color pictures [..]


My email address should show up even in this MSFT web thingy (CDO)
you're using to access this group: robert.franz (at) mvps.org

I'm really curious now to see what you'll send me! :-) And have no fear,
I won't send this document elsewhere without your approval.

Greetinx
Robert

#8 Robert M. Franz (RMF)

Hi Rebecca

Robert M. Franz (RMF) wrote:
> [..] My email address should show up even in this MSFT web thingy (CDO)
> you're using to access this group: robert.franz (at) mvps.org
>
> I'm really curious now to see what you'll send me! :-) And have no fear,
> I won't send this document elsewhere without your approval.


OK, I got your file, and my scepticism seems to be justified: I see a
bunch of picture placeholders with a small red X.

When I switch to field-code view (Alt+F9), I see that the whole document
consists of INCLUDEPICTURE fields like this one:

INCLUDEPICTURE "images/PYRAMIDS_img_0.jpg" \* MERGEFORMAT \d

If you look at the syntax of INCLUDEPICTURE, the \d switch explicitly
prevents the picture data from being stored in the document; the field
only links to the file. So look in the "images" subfolder next to your
file, and there you go.
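If you copy the field codes out as plain text (for example from the Alt+F9 view), the check Robert did by eye can be automated. A minimal Python sketch, assuming one INCLUDEPICTURE field per string; `IncludePicture` and `parse_includepicture` are hypothetical names. It reports the quoted path and whether the \d switch (picture linked, not stored) is present:

```python
import re
from typing import NamedTuple, Optional


class IncludePicture(NamedTuple):
    """One parsed INCLUDEPICTURE field (hypothetical helper type)."""

    path: str
    switches: list[str]

    @property
    def linked_only(self) -> bool:
        # The \d switch tells Word not to store the picture data in the
        # document, so the file at `path` must travel with the document.
        return "d" in self.switches


# INCLUDEPICTURE "some/path.jpg" followed by optional switches.
_FIELD = re.compile(r'INCLUDEPICTURE\s+"([^"]+)"(.*)', re.IGNORECASE)
# A switch is a backslash followed by its name, e.g. \* or \d.
_SWITCH = re.compile(r"\\([^\s}]+)")


def parse_includepicture(code: str) -> Optional[IncludePicture]:
    """Parse the text of one INCLUDEPICTURE field code as shown with Alt+F9.

    Returns None if the text is not an INCLUDEPICTURE field.
    """
    m = _FIELD.search(code)
    if m is None:
        return None
    path, rest = m.groups()
    return IncludePicture(path, _SWITCH.findall(rest))
```

Fed the field code from Rebecca's document, it yields the path `images/PYRAMIDS_img_0.jpg` with `linked_only` set, i.e. the picture lives outside the small .doc file.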

Even Acrobat can't do magic, after all ... :-)

Greetinx
Robert

#9 Rebecca

Thanks, Robert.

Alas! When something is too good to be true, it usually is just that.
The images folder you mentioned was right there staring me in the face (which
is quite red right now). And the JPEGs there are all about 350-400 KB each.
Thank God Seagate is going to come out with a terabyte hard drive, because
I'll need one or two thanks to all these huge PDFs.
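As a rough consistency check, assuming one JPEG per scanned page (the thread does not actually say so), the linked pictures alone come close to the size of the original PDF:

```latex
350 \times 350\ \text{KB} = 122{,}500\ \text{KB} \approx 120\ \text{MB},
\qquad
350 \times 400\ \text{KB} = 140{,}000\ \text{KB} \approx 137\ \text{MB}
```

That is the same order of magnitude as the 154 MB PDF, so the space never went away; it simply moved out of the .doc file into the images folder.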

Copyright ©2004-2024 Microsoft Office Word Forum - WordBanter.
The comments are property of their posters.
 
