#1
Posted to microsoft.public.word.newusers
File size and HTML
Hi,
I am using Word 2007 to edit a fairly large bilingual dictionary. When I strip all superfluous HTML tags, the size is around 6 MB. The file produced by Word used to be around 1 MB larger, about 7 MB. I use a DOS32 program to strip the file of its superfluous tags for advanced processing.

However, lately the file size has increased enormously. Under Word 2007 (before, I used Word 2000) the file size has increased from 6 MB to approx. 15.9 MB. For instance, the header now contains a list of all available fonts (several hundred, while I use only two: Times New Roman and Symbol). Also, every two or three words the file contains totally superfluous information about the font, language and font size.

How can I bring the file size back to something more normal? Word slows down considerably with a file of this size.

Thanks for your help,
Rob in Amsterdam.
#2
Posted to microsoft.public.word.newusers
File size and HTML
Hi Rob,
Word 2007's new features (language-neutral architecture, quick style sets, font pairs in themes...) can put quite a bit of information into a Word web document so that it can be restored to a .doc or .docx/.docm file from the web page. If you use Office Button > Save As > Other File Types > Web Page, Filtered, you may see quite a bit of that removed.

What is the DOS utility you're using to filter the HTML output?

--
Bob Buckland ?:-)
MS Office System Products MVP
*Courtesy is not expensive and can pay big dividends*
#3
Posted to microsoft.public.word.newusers
File size and HTML
Hi Bob,
Thanks. I followed your advice and got a file which is 11.975.311 bytes in size, i.e. around 4 MB smaller than the one I had, but not nearly as small as the file made by Word 2000. Here is a sample of the code I get.

Fragment of the header:

<!-- /* Font Definitions */
@font-face {font-family:Helvetica; panose-1:2 11 5 4 2 2 2 2 2 4;}
@font-face {font-family:Courier; panose-1:2 7 4 9 2 2 5 2 4 4;}
@font-face {font-family:"Tms Rmn"; panose-1:2 2 6 3 4 5 5 2 3 4;}
@font-face {font-family:Helv; panose-1:2 11 6 4 2 2 2 3 2 4;}
@font-face {font-family:"New York"; panose-1:2 4 5 3 6 5 6 2 3 4;}
@font-face {font-family:System; panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face {font-family:"MS Mincho"; panose-1:2 2 6 9 4 2 5 8 3 4;}

Fragment of the body of the file:

<p class=MsoNormal><b><span lang=PT-BR style='font-size:9.0pt'>acak</span></b><span lang=PT-BR style='font-size:9.0pt'>-<b>acakan</b> II bi: ongeordend, verward, wanordelijk, rommelig Tw {<i>Sapa wani kandha yèn aku nyambutgawé acak-acakan?</i> Tr253}·</span></p>
<p class=MsoNormal><b><span lang=PT-BR style='font-size:9.0pt'>acak</span></b><span lang=PT-BR style='font-size:9.0pt'>-<b>acak</b> III Gun: meevragen, vragen om mee te komen {\<i>Lha mbah Nan ki yahéné wis acak-acak ki pité jawané...</i> Ros3}; zo <i>ajak</i>·</span></p>
<p class=MsoNormal><b><span lang=PT-BR style='font-size:9.0pt'>acala</span></b><span lang=PT-BR style='font-size:9.0pt'> bt: berg·</span></p>
<p class=MsoNormal><i><span lang=PT-BR style='font-size:9.0pt'>ora</span></i><span lang=PT-BR style='font-size:9.0pt'>, <i>durung</i> <b>acan</b> gw: helemaal (nog) niet·</span></p>

As you see, the font definitions take space, but most of the space is used by tags in the body of the text. The whole text is 9 pt Times, with a few arrows from the Symbol font strewn in between (also 9 pt). The language setting also does not change anywhere in the file. (It is only relevant to the key code setting, I suppose.) So there is no need at all to repeat it every few words.
The program I use to remove superfluous HTML code is STRIPHTM.EXE, which I wrote in Stonybrook Modula-2. If you wish, I can mail you a copy.

Kind regards,
Rob, Amsterdam.

(In reply to Bob Buckland's post of Sun, 20 Jul 2008 09:56:44 -0700.)
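The repeated lang and font-size information shown in the body fragment could, in principle, be removed by a small post-processing pass. What follows is only a minimal Python sketch of that idea, not STRIPHTM.EXE (whose source is not shown in this thread). The names SpanCleaner and clean_html are hypothetical, and the sketch assumes that the lang attribute and the font-size declaration are redundant everywhere in the document, as described above. Spans left with no attributes are unwrapped; spans that still carry something else (for instance a Symbol font-family) are kept.

```python
import html
from html.parser import HTMLParser

# Assumption: these are redundant throughout the document, as described above.
DROP_ATTRS = {"lang"}
DROP_STYLE_PROPS = {"font-size"}


class SpanCleaner(HTMLParser):
    """Copies HTML through, dropping redundant <span> attributes (hypothetical helper)."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_kept = []  # one flag per open <span>: was its start tag emitted?

    def _filter_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if name == "style" and value:
                props = [p.strip() for p in value.split(";") if p.strip()]
                props = [p for p in props
                         if p.split(":")[0].strip().lower() not in DROP_STYLE_PROPS]
                if not props:
                    continue  # nothing left in the style attribute
                value = ";".join(props)
            kept.append((name, value))
        return kept

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())  # pass through unchanged
            return
        kept = self._filter_attrs(attrs)
        self.span_kept.append(bool(kept))
        if kept:
            rendered = " ".join(
                name if value is None else f'{name}="{html.escape(value)}"'
                for name, value in kept)
            self.out.append(f"<span {rendered}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Skip the </span> that closes a span whose start tag was dropped.
        if tag == "span" and self.span_kept and not self.span_kept.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_html(html_text):
    parser = SpanCleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Applied to a paragraph like the ones in the fragment, clean_html would leave the p, b and i tags in place and drop the span wrappers that carry nothing but lang and font-size. It does not touch the @font-face list in the header, and it has not been tried on a real 12 MB Word export, so treat it as a starting point only.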