#1
Posted to microsoft.public.word.docmanagement
How can I validate a Word document to find out if it is corrupt?
Hi!
I am working with an application called Doc2Help that takes a Word file as input. I am having some problems, and Doc2Help support staff say my Word file is "corrupt". I have no problems opening the file in Word 2003, converting it to PDF, etc., so I don't quite buy their explanation.
Is there some clever little application that will validate my document to see whether it is corrupt or not, or, even better, fix whatever is wrong? By the way, what on earth does "corrupt" mean?
I have tried some recovery applications, but they all tend to just take your document and extract all the text and pictures. All formatting, page layout, etc. is then lost, which sort of defeats the purpose. With that approach, I might as well copy all the content to a text file and then create a new Word document with that content. That's a lot of work, though.
Thanks,
John, Sweden
#2
The application that can best tell you whether the document is corrupt is, as you have already worked out, Word itself. If the document opens correctly in Word, displays correctly, and prints correctly, then it is, by definition, well-formed.
If you're getting this kind of response from the Doc2Help people, run a mile from them. Things can only go downhill. What they are actually saying, I suspect, is that your document is structured in a way that their application can't handle. That's their problem, not yours.
#3
John, I agree with Jezebel's assessment. I just wanted to answer your question 'what on earth does "corrupt" mean?' so you can deal with Doc2Help support on a reasonable basis.
The Word document file format is not a linear stream of text with embedded formatting tags like that of WordPerfect. It's a very complex structure known as "OLE structured storage". The text and graphics for the main text, headers, footers, footnotes, etc. are stored in multiple separate "containers", and the formatting is stored separately from the text. Pointers (numeric locators) indicate where each piece of text and formatting goes. The Word program interprets all these things to determine what goes where on each page.
Sometimes things go wrong (a Save operation writes the wrong pointers, or the file becomes damaged while it's attached to an email, or gremlins work on it) and one or more pointers don't point to the right place. When the Word program tries to interpret the file, the errors become apparent. That's corruption.
Because corruption can cause almost any kind of incorrect operation, depending on what is corrupted and what (if anything) the incorrect value means to Word, it's easy to blame all sorts of misbehavior on corruption. As Jezebel said, though, if Word can interpret the file correctly, then it isn't corrupt.
The article at http://word.mvps.org/FAQs/AppErrors/CorruptDoc.htm discusses this topic and offers some tips for fixing files that are corrupt.
--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit.
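Jay's "pointers that don't point to the right place" can be made concrete with a small check. The sketch below is a hypothetical helper (`check_cfb_header` is not a tool anyone in this thread mentioned). It assumes the file uses the Compound File Binary layout (the container behind "OLE structured storage", which Microsoft has since published as the [MS-CFB] specification) that Word 97-2003 .doc files use. It reads only the 512-byte header: the signature, the version and byte-order fields, and one pointer, the sector where the directory of streams begins. A pointer that lands past the end of the file is exactly the kind of damage Jay describes. A clean result proves very little, though: the header can be fine while the Word-specific structures inside the streams are broken, which is why Word itself remains the better test.

```python
import struct
import sys

# Every Compound File Binary (OLE structured storage) file starts with these 8 bytes.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512


def check_cfb_header(data: bytes) -> list:
    """Return a list of problems found in a CFB header; an empty list means none found.

    Only the header is inspected. An empty result does NOT prove the file is intact:
    the Word-specific streams inside the container can still be damaged.
    """
    if len(data) < HEADER_SIZE:
        return ["file is shorter than the 512-byte header"]
    if data[:8] != CFB_SIGNATURE:
        # Word 2007+ .docx files are ZIP packages and will also land here.
        return ["no compound-file signature (not an OLE .doc file?)"]

    problems = []
    # Offsets 24-33: minor version, major version, byte order, sector shift, mini sector shift.
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 24)
    if byte_order != 0xFFFE:
        problems.append(f"unexpected byte-order mark 0x{byte_order:04X}")
    if (major, sector_shift) not in ((3, 9), (4, 12)):
        problems.append(f"major version {major} does not match sector shift {sector_shift}")
    if mini_shift != 6:
        problems.append(f"unexpected mini sector shift {mini_shift}")

    # Offset 48: sector number where the directory (the table of streams) begins.
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    if sector_shift in (9, 12):
        sector_size = 1 << sector_shift
        # The header occupies the first sector-sized slot, so sector N spans
        # bytes (N + 1) * size up to (N + 2) * size.
        dir_end = (first_dir_sector + 2) * sector_size
        if dir_end > len(data):
            problems.append(
                f"directory pointer (sector {first_dir_sector}) points past the end of the file"
            )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        issues = check_cfb_header(f.read())
    print("\n".join(issues) if issues else "header looks sane (this is not a full check)")
```

Saved under a hypothetical name such as `check_cfb.py`, it would be run as `python check_cfb.py mydocument.doc`. The check deliberately stops at the header: following the sector allocation chains and then decoding Word's own structures (the text, formatting and pointer tables Jay mentions) is a far larger job, and that is the part third-party tools tend to get wrong.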
#4
I'll throw my paltry weight behind the experts' opinions, but I might just point out that the *only* way to be sure one is processing a Word document correctly is to use Word. If Word can open the file and Doc2Help is working via a Word interface, then all should be well, and if it isn't, document corruption is about the least likely cause.
If, on the other hand, Doc2Help is trying to process the document in some other way, then it is likely to have problems. OLE structured storage is a proprietary format and not fully documented outside Microsoft (and maybe not inside it either, I don't know). Anyone trying to interpret it is relying on, at best, good guesswork and could come unstuck at any moment.
--
Enjoy,
Tony
#5
Hello all, and thanks for your replies!
I feel I now have a stronger case in the never-ending battle with D2H support. (I am actually being a bit tough on them; they ARE being very friendly and doing their best.) However, there might be some truth to what they say about the Word file being corrupt.
For those of you unacquainted with Doc2Help (D2H), it is an application that takes a Word file and generates a Help file on the fly, with table of contents, index, and all. Very handy. One nice feature is that it takes cross-references in Word like 'for more info see "Further Information" on page 33' and transforms them into hyperlinks in the Help file, looking like this: 'for more info see _Further Information_'.
This is the feature that OCCASIONALLY does not work. In a few cases, the hyperlinks just come out as ordinary text, i.e. "unclickable". In most instances, they come out fine. To track down the error, I copied a problematic cross-reference and pasted copies of it at various points earlier in the document: before, after, and inside the preceding heading, and further up still. I then generated the Help file in D2H and found that the links worked early in the document but stopped working past a certain point. The shift always seems to happen around a heading.
Now, of course D2H could still be to blame. Headings play a crucial role in defining the start of Help topics, as well as the TOC level of topics. The reason I still suspect Word is that I have seen another error in Word related to headings, a problem for which I have yet to find a solution. It has to do with all the paragraph-level features for text flow, such as Keep with Next. Much of this functionality does not work in my document, or works unpredictably. Members of this forum have suggested that this may be caused by corruption of the document. Although I have not been able to prove this, it goes to show that some knowledgeable people (at least one MVP) believe that corruption can exist even if the document opens normally in Word.
I finally resolved my Word/D2H problem by cutting out a large section around the misbehaving cross-reference, pasting it into Notepad, then copying the text back into the right location in Word. I redid the formatting, recreated the cross-reference, and voilà: it works! Strange, isn't it? Casting blame becomes somewhat philosophical. But I do think that Microsoft should be more open about their file formats. The .doc format is not the only one causing headaches for developers of third-party apps. Outlook profiles... Ouch.
Thanks again!
John