Posted to microsoft.public.word.docmanagement
Tony Jollans
 
How can I validate a Word document to find out if it is corrupt?

I'll throw my paltry weight behind the experts' opinions but might just point
out that the *only* way to be sure one is processing a Word document
correctly is to use Word.

If Word can open the file and Doc2Help is working via a Word interface then
all should be well and if it isn't, document corruption is about the least
likely cause.

If, on the other hand, Doc2Help is trying to process the document in some
other way then it is likely to have problems. OLE structured storage is a
proprietary format and not fully documented outside Microsoft (and maybe not
inside it either, I don't know). Anyone trying to interpret it is relying
on, at best, good guesswork and could come unstuck at any moment.

--
Enjoy,
Tony


"Jay Freedman" wrote in message
...
John, I agree with Jezebel's assessment. I just wanted to answer your
question 'what on earth does "corrupt" mean?' so you can deal with Doc2Help
support on a reasonable basis.

The Word document file format is not a linear stream of text with embedded
formatting tags like that of WordPerfect. It's a very complex structure
known as "OLE structured storage". The text and graphics for the main text,
headers, footers, footnotes, etc. are stored in multiple separate
"containers", and the formatting is stored separately from the text.
Pointers (numeric locators) indicate where each piece of text and formatting
goes. The Word program interprets all these things to determine what goes
where on each page.
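
As a rough illustration of the container format described above, here is a minimal Python sketch that checks whether a file's first bytes look like an OLE structured storage (compound file) header. The function names are hypothetical, and this is only a surface sanity check of the signature, byte-order mark, and sector size fields. Passing it says nothing about whether the pointers deeper in the file are intact; only Word's own interpretation of the file can show that.

```python
import struct

# Signature found at the start of OLE structured storage (compound) files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_container(header: bytes) -> bool:
    """Return True if the header bytes pass basic compound-file sanity checks."""
    if len(header) < 32 or header[:8] != OLE_MAGIC:
        return False
    # Bytes 24..31 hold: minor version, major version, byte-order mark, sector shift.
    _minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", header, 24)
    if byte_order != 0xFFFE:
        return False
    # Version 3 files use 512-byte sectors (shift 9); version 4 uses 4096 (shift 12).
    return (major, sector_shift) in {(3, 9), (4, 12)}


def file_looks_like_ole_container(path: str) -> bool:
    """Hypothetical convenience wrapper: read the first 512 bytes and check them."""
    with open(path, "rb") as f:
        return looks_like_ole_container(f.read(512))
```

A file that fails this check is not a Word 97-2003 binary document at all; a file that passes may still be corrupt in ways only Word will reveal.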

Sometimes things go wrong -- a Save operation writes the wrong pointers, or
the file becomes damaged while it's attached to an email, or gremlins work
on it -- and one or more pointers don't point to the right place. When the
Word program tries to interpret the file, the errors become apparent. That's
corruption.
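
To make the idea of a bad pointer concrete, here is a toy Python model. It is not Word's real file structure: the function name and the (start, end) run layout are invented for illustration. It treats the text as one string and the formatting as a list of offset pairs, and reports any pair that does not point to a valid place in the text.

```python
from typing import List, Tuple


def find_bad_pointers(text_length: int, runs: List[Tuple[int, int]]) -> List[int]:
    """Return the indexes of runs whose (start, end) offsets do not fit the text."""
    bad = []
    for i, (start, end) in enumerate(runs):
        # A valid run starts at or after 0, ends at or before the end of the
        # text, and does not end before it starts.
        if not (0 <= start <= end <= text_length):
            bad.append(i)
    return bad


text = "Hello, world"
# The third run claims to cover characters 10..40, past the end of the text.
runs = [(0, 5), (7, 12), (10, 40)]
print(find_bad_pointers(len(text), runs))  # prints [2]
```

In this toy model the damage is easy to spot. In a real document the pointers live in several interlocking tables, which is why the practical test is still whether Word itself can interpret the file.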

Because corruption can cause almost any kind of incorrect operation,
depending on what is corrupted and what (if anything) the incorrect value
means to Word, it's easy to blame all sorts of misbehavior on corruption. As
Jezebel said, though, if Word can interpret the file correctly, then it
isn't corrupt.

The article at http://word.mvps.org/FAQs/AppErrors/CorruptDoc.htm discusses
this topic and offers some tips for fixing files that are corrupt.

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

Jezebel wrote:
The application that can best tell you if the document is
corrupt is, as you have already worked out, Word itself. If the document
opens correctly in Word, displays correctly, and prints correctly,
then it is, by definition, well-formed.

If you're getting this kind of response from the Doc2Help people, run
a mile from them. Things can only go downhill.

What they are actually saying, I suspect, is that your document is
structured in a way that their application can't handle. That's their
problem, not yours.




"John Liungman" wrote in
message ...
Hi!

I am working with an application called Doc2Help that takes a Word
file as input. I am having some problems, and Doc2Help support staff
says my Word file is "corrupt". I have no problems opening the file
in Word 2003, converting it to PDF, etc., so I don't quite buy their
explanation.

Is there some clever little application that will validate my
document to see if it is corrupt or not, or, even better, fix
whatever is wrong? By the way, what on earth does "corrupt" mean?

I have tried some recovery applications, but they all tend to just
take your document and find all the text and pictures. All formatting,
page layout, etc. is then lost, which sort of defeats the purpose. With
that approach, I might as well copy all the content to a text file,
then create a new Word document with that content. That's a lot of
work, though.

Thanks,

John
Sweden