#1
Posted to microsoft.public.word.newusers
Rob van Albada
File size and HTML

Hi,

I am using Word 2007 to edit a fairly large bilingual dictionary.
When I strip all superfluous HTML tags, the size is around 6 MB.
The file produced by Word used to be around 1 MB larger, about 7 MB.
I use a DOS32 program to strip the file of its superfluous tags for
advanced processing.
Lately, however, the file size has increased enormously.
Under Word 2007 (I used Word 2000 before), the file size has increased
from 6 MB to approximately 15.9 MB.
For instance, the header now contains a list of all available fonts
(several hundred, while I use only two: Times New Roman and Symbol).
Also, every two or three words the file contains totally superfluous
information about the font, language and font size.
How can I bring the file size back to something more normal?
Word slows down considerably with a file of this size.
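
One rough way to see how much of such a file is markup rather than dictionary text is to count the bytes of visible text and compare them with the total. The following is only a minimal sketch using Python's standard `html.parser`; the names `TextSizeCounter` and `markup_overhead` are made up for illustration, and it assumes the saved page can be read into a single string.

```python
from html.parser import HTMLParser


class TextSizeCounter(HTMLParser):
    """Counts bytes of visible text; tags, comments and <style>/<script> bodies are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_bytes = 0
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside <style>/<script> counts as dictionary content.
        if not self._skip_depth:
            self.text_bytes += len(data.encode("utf-8"))


def markup_overhead(html: str) -> tuple[int, int]:
    """Return (total_bytes, text_bytes) for an HTML string."""
    counter = TextSizeCounter()
    counter.feed(html)
    counter.close()  # flush any text still buffered by the parser
    return len(html.encode("utf-8")), counter.text_bytes
```

Calling `markup_overhead(open("dictionary.htm", encoding="utf-8").read())` (the file name and encoding are assumptions) gives the two numbers whose difference is the space taken by tags and style definitions.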

Thanks for your help,

Rob in Amsterdam.






#2
Bob Buckland ?:-)

Hi Rob,

Word 2007's new features (language-neutral architecture, quick style sets, font pairs in themes...) can put quite a bit of
information into a Word web document so that the web page can be restored to a .doc, .docx or .docm file.

If you use Office Button => Save As => Other File Types => Web Page, Filtered
you may see quite a bit of that removed.

What is the DOS utility you're using to filter the HTML output?

--

Bob Buckland ?:-)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*







#3
Rob van Albada

Hi Bob,

Thanks. I followed your advice and got a file of 11,975,311 bytes,
i.e. around 4 MB smaller than the one I had, but not nearly as small
as the file made by Word 2000.

Here is a sample of the code I get:

Fragment of the header:


<!--
/* Font Definitions */
@font-face
{font-family:Helvetica;
panose-1:2 11 5 4 2 2 2 2 2 4;}
@font-face
{font-family:Courier;
panose-1:2 7 4 9 2 2 5 2 4 4;}
@font-face
{font-family:"Tms Rmn";
panose-1:2 2 6 3 4 5 5 2 3 4;}
@font-face
{font-family:Helv;
panose-1:2 11 6 4 2 2 2 3 2 4;}
@font-face
{font-family:"New York";
panose-1:2 4 5 3 6 5 6 2 3 4;}
@font-face
{font-family:System;
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"MS Mincho";
panose-1:2 2 6 9 4 2 5 8 3 4;}


Fragment of the body of the file:


<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acak</span></b><span
lang=PT-BR style='font-size:9.0pt'>-<b>acakan</b> II bi: ongeordend,
verward,
wanordelijk, rommelig Tw {<i>Sapa wani kandha yèn aku nyambutgawé
acak-acakan?</i>
Tr253}·</span></p>

<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acak</span></b><span
lang=PT-BR style='font-size:9.0pt'>-<b>acak</b> III Gun: meevragen,
vragen om
mee te komen {\<i>Lha mbah Nan ki yahéné wis acak-acak ki pité
jawané...</i>
Ros3}; zo <i>ajak</i>·</span></p>

<p class=MsoNormal><b><span lang=PT-BR
style='font-size:9.0pt'>acala</span></b><span
lang=PT-BR style='font-size:9.0pt'> bt: berg·</span></p>

<p class=MsoNormal><i><span lang=PT-BR
style='font-size:9.0pt'>ora</span></i><span
lang=PT-BR style='font-size:9.0pt'>, <i>durung</i> <b>acan</b> gw:
helemaal
(nog) niet·</span></p>

As you see, the font definitions take space, but most space is used by
tags in the body of the text. The whole text is 9pt Times, with a few
arrows which are from the Symbol font strewn in between (also 9 pt.)
The language setting also does not change anywhere in the file. (It is
only relevant to the key code setting, I suppose.) So there is no need
at all to repeat it every few words.
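
The two kinds of redundancy described above could be removed along the following lines. This is a minimal sketch in Python, not the STRIPHTM.EXE logic: the function names and the `DEFAULTS` table are assumptions made for illustration. It drops `@font-face` rules (in the sample they carry only a family name and a panose-1 hint) and unwraps every `<span>` whose attributes merely repeat the document-wide language and font size. It assumes those defaults are declared once elsewhere, for example in the `p.MsoNormal` style, and that attribute values contain no `>` character.

```python
import re

# "@font-face" followed by a single {...} block, as in the header sample.
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}\s*")

# Any opening or closing span tag; group 1 holds the raw attribute text.
SPAN_TAG_RE = re.compile(r"<span\b([^>]*)>|</span\s*>", re.IGNORECASE)

# name=value with single-quoted, double-quoted or unquoted values.
ATTR_RE = re.compile(r"([\w-]+)\s*=\s*(?:'([^']*)'|\"([^\"]*)\"|([^\s>]+))")

# Assumed document-wide defaults, taken from the body sample.
DEFAULTS = {"lang": "PT-BR", "style": "font-size:9.0pt"}


def strip_font_faces(css: str) -> str:
    """Remove every @font-face rule from a block of CSS."""
    return FONT_FACE_RE.sub("", css)


def unwrap_default_spans(html: str, defaults: dict = DEFAULTS) -> str:
    """Drop <span> tags whose attributes only restate the defaults.

    A stack records, for each open span, whether its opening tag was
    dropped, so that exactly the matching </span> is dropped as well.
    """
    stack = []

    def repl(match):
        tag = match.group(0)
        if tag.startswith("</"):
            dropped = stack.pop() if stack else False
            return "" if dropped else tag
        attrs = {
            name.lower(): (single or double or bare)
            for name, single, double, bare in ATTR_RE.findall(match.group(1))
        }
        # Redundant when every attribute equals the corresponding default.
        dropped = all(defaults.get(key) == value for key, value in attrs.items())
        stack.append(dropped)
        return "" if dropped else tag

    return SPAN_TAG_RE.sub(repl, html)
```

Spans that carry anything else, such as the Symbol font for the arrows, are left untouched, because at least one of their attributes differs from the defaults.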

The program I use to remove superfluous HTML code is STRIPHTM.EXE,
which I wrote in Stonybrook Modula-2.
If you wish I can mail you a copy.

Kind regards,

Rob, Amsterdam.





(In reply to Bob Buckland's message of Sun, 20 Jul 2008 09:56:44 -0700.)







