#1
Posted to microsoft.public.word.newusers
Saving a Word DOC as HTML
I've discovered that Saving a document As HTML preserves attributes such as italic (<I> ... </I>) and bold, and certain characters, such as &#268; (Č, a Czech accented letter). If the document is a table, the content of the cells is saved inside HTML tag pairs like <td> ... </td>. In fact, Save As HTML gives me a text file which I can analyze and process with external programs. So far so good.

The problem is that the Save As conversion also gives me a lot of trash that is of no use to me, such as:

<p class=MsoPlainText style='margin-left:.25 in. etc etc
<span style='font-size:12 etc ... </span> etc

I can search and replace some of these strings, and I have developed filters that can take care of others. But it's a laborious process involving various software. It would be better not to get the "trash" in the first place. The Save As XML option is even worse.

Is there a way to make a simple conversion using Word or 3rd party software?
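The kind of external processing described above can be sketched in Python using only the standard library. This is a minimal illustration, not anything from Word or Microsoft: the names `CellExtractor` and `extract_cells` are hypothetical, and the sketch assumes simple, non-nested tables. It pulls the text out of each <td> ... </td> pair, ignores Word's extra p and span markup, and decodes references such as &#268; into the actual character.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#268; before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attributes (class, style, ...) are simply ignored.
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            if self._row is None:
                self._row = []
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_cells(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = CellExtractor()
    parser.feed(html_text)
    parser.close()
    if parser._row:  # flush a final row that was never closed with </tr>
        parser.rows.append(parser._row)
    return parser.rows
```

Calling `extract_cells` on the text of a saved .htm file returns plain cell text, so the p and span "trash" never has to be searched and replaced by hand for this particular use.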
#2
One man's trash is another's treasure trove. The short answer to your question is no, because only you know which bits of formatting you want to interrogate and which you want to ignore.

That said, I believe there is a way to remove some of the bloat that Word adds to HTML documents. I can't remember it off the top of my head, but I think most of what it removes is relatively easy to identify yourself.

--
Enjoy,
Tony

"Peter Rooney" wrote in message ink.net...
"Is there a way to make a simple conversion using Word or 3rd party software?"
#3
I've not used them but, according to an article at http://techrepublic.com.com/5100-1035_11-5197013.html , there are a couple of free utilities you can download from Microsoft that'll clear out the gubbins that Word introduces into HTML (links in the article).

HTH

Steve

Tony Jollans wrote:
"That said, I believe there is a way to remove some of the bloat that Word adds to HTML documents."
#4
Thanks. The "Office 2000 HTML Filter" (downloaded as MSOHTMF2.EXE) seems to be just what I'm looking for, aside from the limitation that you need to have Office 2000 on your system to install it; Office 2003 won't do. It's a ridiculous limitation, but it can be circumvented.

Stephen Glynn wrote:
"according to an article at http://techrepublic.com.com there are a couple of free utilities you can download from Microsoft that'll clear out the gubbins that Word introduces into HTML"
#5
Posted to microsoft.public.word.newusers
|
|||
|
|||
Saving a Word DOC as HTML
Hi Peter,

In Word 2002 and Word 2003, using File=>Save As Web Page-Filtered will do basically the same thing as the Office 2000 HTML Filter (at least the part used inside of Word, which the MSFilter.DOT add-in lists as File=>Save as Compact HTML). In Word 2002 and 2003 the 'Filtered' content will take into consideration the settings you have in Tools=>Options=>General=>[Web Options].

You can still use the standalone Office 2000 MSFilter.exe tool to batch process already created Word HTML files, and it will remove the CSS style formatting from the filtered HTML pages as well. You can also use apps such as HTMLTidy to process the files.

Creating 'public use' web pages wasn't really the design goal for Word's HTML files after Word 97. The goal was rather to create a 'browser viewable' version of a Word document while retaining all of the parts of a .doc file that a browser didn't support, so that you could turn it back into a .doc file when it was opened in Word from a browser ('roundtripping'). For 'web pages', MS Office FrontPage was the app targeted.

=======
"Peter Rooney" wrote in message ink.net...
Thanks. The "Office 2000 HTML Filter" - downloaded as MSOHTMF2.EXE - seems to be just what I'm looking for (aside from the limitation that you need to have Office 2000 on your system to install it - Office 2003 won't do). It's a ridiculous limitation, but can be circumvented.

--
Let us know if this helped you,
Bob Buckland ?:-)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*

For Everyday MS Office tips to "use right away" -
http://microsoft.com/events/series/a...andtricks.mspx
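For anyone who would rather script the cleanup than install a filter, here is a rough Python sketch of the general idea discussed in this thread. It is not a reimplementation of the Office 2000 HTML Filter or of HTMLTidy, and it makes no claim to match their output. The names `WordHtmlCleaner`, `clean_word_html`, `DROP_TAGS` and `DROP_ATTRS` are hypothetical, and the choice of what to drop is an assumption you would adjust to your own documents. It keeps tags such as i, b and td, leaves character references such as &#268; untouched, and discards span wrappers, class/style attributes and comments.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists of what counts as "trash"; adjust to taste.
DROP_TAGS = {"span", "font", "o:p"}      # tags removed, their inner text kept
DROP_ATTRS = {"class", "style", "lang"}  # attributes removed from every tag


class WordHtmlCleaner(HTMLParser):
    """Re-emit HTML without Word-specific wrappers and attributes (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False so that &#268; and &nbsp; pass through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            return
        parts = [tag]
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            # The parser unescapes attribute values, so escape them again on output.
            parts.append(name if value is None
                         else '%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> instead of emitting a separate closing tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    # Comments and declarations are not overridden, so they are silently dropped.


def clean_word_html(html_text):
    """Return html_text with the assumed Word-specific markup stripped."""
    cleaner = WordHtmlCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

Because the parser lowercases tag names and re-quotes attribute values, the output is normalized rather than byte-identical to the input. Running `clean_word_html` over each saved file in a loop gives a simple form of batch processing.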