#1
Posted to microsoft.public.word.docmanagement
A Better Solution?
I apologize that this is not exactly a question (except for one in the last paragraph), but it would be nice to hear some comments or suggestions.

Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an entire book of 350 pages, which has many color pictures, using Acrobat 7.0. The resulting PDF was 154 megabytes. I then saved this PDF as an htm file, opened it in MS Word 2003 or 2007 (same results in both), and saved it as a Word document. The resulting doc size is a teensy-weensy 21 KB, and the file (which is broken up into 350 separate pages, just as in the original PDF) is as readable as a PDF (and easier to navigate after I add some page numbers, links, and the like -- this process can be automated).

OCR-ing takes too much time (and I have to proofread the files anyway), so just having images of the book in one MS Word file is a workable solution. And with a Tablet PC the images are inkable for annotations and the like. If I scan directly into Word or other programs, I get huge files, no matter how much I fiddle with the resolutions, file types, or compressions.

Using an ADF, I'm currently scanning all the thousands of books and articles scattered here and there in my library (for my personal use, so no copyright issues), and I will be able to carry my portable (and searchable) library around on my (under 2 pound) Tablet PC. I've tried to import JPEG images into other programs such as OneNote, AskSam, UltraRecall, you name it, but the resulting files bloat to intolerable sizes. PDF files take up too much space (and are slow when navigating).

Does anyone have a better solution than the one I mentioned above?
#2
Posted to microsoft.public.word.docmanagement
A Better Solution?
Just keep the big PDFs; this will retain the most quality. Although these files seem big now, as computers and software get faster and faster in the coming years, these file sizes will seem irrelevant.
#3
Posted to microsoft.public.word.docmanagement
A Better Solution?
Hi Rebecca

Rebecca wrote:
> Using a Fujitsu ScanSnap scanner, I scanned (at a "very good" resolution) an entire book of 350 pages, which has many color pictures, using Acrobat 7.0. The resulting PDF was 154 megabytes. I then saved this PDF as an htm file, opened it in MS Word 2003 or 2007 (same results in both), and saved it as a Word document. The resulting doc size is a teensy-weensy 21 KB, and the file [...]

Wait a minute: how many high-color pictures are there in your 350 page document? At 21 KByte, I doubt there can be much text in a 350 page Word document, and no pictures to speak of.

When you save as HTML, the pictures and other stuff are most probably external (that's what Word does, anyway, when you save a document to HTML there). There seems to be either a couple of other big files around, or your resulting document cannot be much more than a mere text file ...

BTW, have you tried saving as RTF from Acrobat?

Greetinx
Robert
--
ASCII Ribbon Campaign against HTML in e-mail & news | MS MVP for Word
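Robert's suspicion that the pictures live in separate files next to the saved HTML can be checked directly. The following is a minimal sketch, assuming the exported page references its images through ordinary `<img src="...">` tags with local paths; the function and class names are hypothetical, and `book.htm` in the usage note is a placeholder, not a file from the thread. It uses only the Python standard library and says nothing about how Acrobat or Word actually lay out their output.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def linked_image_bytes(html_path):
    """Return (size of the HTML file, total size of local images it links to)."""
    collector = ImageSrcCollector()
    with open(html_path, encoding="utf-8", errors="replace") as f:
        collector.feed(f.read())

    base_dir = os.path.dirname(os.path.abspath(html_path))
    seen = set()
    total = 0
    for src in collector.sources:
        parsed = urlparse(src)
        if parsed.scheme not in ("", "file"):
            continue  # skip http(s) links and inline data: URIs
        local = os.path.normpath(os.path.join(base_dir, unquote(parsed.path)))
        # Count each existing file once, even if several tags point to it.
        if local not in seen and os.path.isfile(local):
            seen.add(local)
            total += os.path.getsize(local)
    return os.path.getsize(html_path), total
```

Calling `linked_image_bytes("book.htm")` on the exported page returns both sizes. If the second number is close to the size of the original PDF, the scanned pages are sitting in external files, and the small document only points to them.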
#4
Posted to microsoft.public.word.docmanagement
A Better Solution?
Yes, Robert, the scanned book contains dozens and dozens of color pictures, and yes, that's the actual size of the file. I know, at first I thought it was a bug (say, my computer was not reading the file size correctly) or I was losing my eyesight or my mind. It does seem impossible (and I've been experimenting with various scanned images for years to get the file sizes down). Try it out and you'll see. It's almost a miracle (if you've got a ton of scanned material in PDF files, that is).

And frankly, navigating PDF files in Acrobat is a pain (slow as molasses, despite some nice functions). But with a Tablet PC, you can ink and do other things with the images in MS Word with no problem, and it still does not increase the file size too much (though I haven't been highlighting that much yet).

I don't think there are other big (connecting) files lurking somewhere on my hard disk, and if there are, well, this would be a first, too. I saved the htm files as MS Word files, so go figure. But who knows, maybe you're right -- maybe there's a catch somewhere. But as I recommended, try it with a big PDF in Acrobat, save it as an htm file, open it in MS Word, and save it as an MS Word doc. Voilà!
#5
Posted to microsoft.public.word.docmanagement
A Better Solution?
Hi Rebecca

Rebecca wrote:
> Yes, Robert, the scanned book contains dozens and dozens of color pictures, and yes, that's the actual size of the file. I know, at first I thought it was a bug (say, my computer was not reading the file size correctly) or I was losing my eyesight or my mind.

Wasn't thinking about a bug, but I've seen my share of 300 page (and lots more) files in Word, and a file with 300 pages, esp. if it is a converted thingy, being less than 30 KByte in size is unbelievable -- and that's w/o pictures! :-)

> It does seem impossible (and I've been experimenting with various scanned images for years to get the file sizes down). Try it out and you'll see. It's almost a miracle (if you've got a ton of scanned material in PDF files, that is). And frankly, navigating PDF files in Acrobat is a pain (slow as molasses, despite some nice functions). But with a Tablet PC, you can ink and do other things with the images in MS Word with no problem, and it still does not increase the file size too much (though I haven't been highlighting that much yet).

You are talking about the "full" Acrobat (not the Reader), right? Haven't got that one on any of the systems I'm working at these days, unfortunately. But if the file is as small as you say it is, can you send it to me for inspection?

I'm _very_ dubious, I must admit. Sheer information theory would prohibit compression of the magnitude we're discussing here (well, that's not quite right: you can compress the whole Bible into 1 bit, but then the whole Bible text must be part of the decompressing algorithm -- and I very much doubt Acrobat hacked the Word executables ... ;-)).

Greetinx from good old Europe
Robert
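The scale of the claim is easier to judge with the arithmetic written out. This is a small worked sketch using only the figures quoted in the thread (154 MB PDF, 21 KB Word document, 350 pages). The binary units (1 KB = 1024 bytes) and the helper names are assumptions made for illustration.

```python
# Figures quoted in the thread; binary units (1 KB = 1024 bytes) are an assumption.
PDF_BYTES = 154 * 1024 * 1024   # the 154 MB scanned PDF
DOC_BYTES = 21 * 1024           # the 21 KB Word document
PAGES = 350


def compression_ratio(original_bytes, result_bytes):
    """How many times smaller the result is than the original."""
    return original_bytes / result_bytes


def bytes_per_page(total_bytes, pages):
    """Average number of bytes available for each page."""
    return total_bytes / pages


if __name__ == "__main__":
    print(f"ratio: about {compression_ratio(PDF_BYTES, DOC_BYTES):.0f} to 1")
    print(f"doc: about {bytes_per_page(DOC_BYTES, PAGES):.0f} bytes per page")
```

Under these assumptions the ratio comes out at roughly 7,500 to 1, and the Word file has about 61 bytes per page. That is less than a single line of plain text, let alone a color scan, which is consistent with the doubt that the page images are stored inside the document at all.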
#6
Posted to microsoft.public.word.docmanagement
A Better Solution?
Robert,

Yes, I'm using Acrobat 7.0 -- it came with my Fujitsu ScanSnap scanner (an incredibly useful scanner, by the way). And yes, the file sizes are correct because I sent them to myself by e-mail. (I couldn't find your e-mail address -- if you give it to me, I'll send you one with a lot of color pictures [the whole book was scanned in color], but please remember it was scanned for my personal use -- I want to avoid copyright entanglements.)

Did I stumble upon a method to save an enormous amount of disk space? Like you I am still very dubious -- it's too good to be true. Such compression is absolutely impossible, as you said, and maybe you will be able to find out what's really going on. The original PDF was 154 megabytes, and the resulting MS doc file is about 170 KB. As you implied, such radical compression would be insane. Please see if you can get to the bottom of this.
#7
Posted to microsoft.public.word.docmanagement
A Better Solution?
Hi Rebecca

Rebecca wrote:
> [...] I couldn't find your e-mail address -- if you give it to me, I'll send you one with a lot of color pictures [...]
> Please see if you can get to the bottom of this.

My email address should show up even in this MSFT web thingy (CDO) you're using to access this group: robert.franz (at) mvps.org

I'm really curious now to see what you'll send me! :-) And have no fear, I won't send this document elsewhere without your approval.

Greetinx
Robert