#1
Posted to microsoft.public.word.docmanagement
John Liungman
How can I validate a Word document to find out if it is corrupt?

Hi!

I am working with an application called Doc2Help that takes a Word file as
input. I am having some problems, and Doc2Help support staff say my Word
file is "corrupt". I have no problems opening the file in Word 2003,
converting it to PDF, etc., so I don't quite buy their explanation.

Is there some clever little application that will validate my document to
see if it is corrupt or not, or, even better, fix whatever is wrong? By the
way, what on earth does "corrupt" mean?

I have tried some recovery applications, but they all tend to just take your
document and find all the text and pictures. All formatting, page layout, etc.
is then lost, which sort of defeats the purpose. With that approach, I might
as well copy all the content to a text file, then create a new Word document
with that content. That's a lot of work, though.

Thanks,

John
Sweden
#2
Jezebel

The application that can best tell you whether the document is corrupt is,
as you have already worked out, Word itself. If the document opens correctly in
Word, displays correctly, and prints correctly, then it is, by definition,
well-formed.

If you're getting this kind of response from the Doc2Help people, run a mile
from them. Things can only go downhill.

What they are actually saying, I suspect, is that your document is
structured in a way that their application can't handle. That's their
problem, not yours.

#3
Jay Freedman

John, I agree with Jezebel's assessment. I just wanted to answer your
question 'what on earth does "corrupt" mean?' so you can deal with Doc2Help
support on a reasonable basis.

The Word document file format is not a linear stream of text with embedded
formatting tags like that of WordPerfect. It's a very complex structure
known as "OLE structured storage". The text and graphics for the main text,
headers, footers, footnotes, etc. are stored in multiple separate
"containers", and the formatting is stored separately from the text.
Pointers (numeric locators) indicate where each piece of text and formatting
goes. The Word program interprets all these things to determine what goes
where on each page.

Sometimes things go wrong -- a Save operation writes the wrong pointers, or
the file becomes damaged while it's attached to an email, or gremlins work
on it -- and one or more pointers don't point to the right place. When the
Word program tries to interpret the file, the errors become apparent. That's
corruption.
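
To make the idea of "a structure with fields that can hold the wrong values" concrete, here is a minimal sketch that inspects only the first 512 bytes of a .doc file. It assumes the header layout given in Microsoft's published compound file specification; the function name `check_compound_header` is made up for this illustration. Passing this check says nothing about the pointers deeper in the file, so it cannot prove a document is sound; it only shows what one kind of structural check might look like.

```python
import struct

# Signature that opens every OLE structured storage (compound file) container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def check_compound_header(header: bytes) -> list[str]:
    """Return a list of problems found in the first 512 bytes of a file.

    Hypothetical helper for illustration; an empty list means only that
    the header fields examined here hold plausible values.
    """
    if len(header) < 512:
        return ["file is shorter than one 512-byte header sector"]
    if header[:8] != CFB_MAGIC:
        return ["missing compound-file signature"]

    problems = []
    # Five little-endian 16-bit fields starting at offset 24:
    # minor version, major version, byte-order mark,
    # sector shift, mini-sector shift.
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24
    )
    if byte_order != 0xFFFE:
        problems.append("unexpected byte-order mark")
    # Version 3 files use 512-byte sectors (shift 9),
    # version 4 files use 4096-byte sectors (shift 12).
    if (major, sector_shift) not in {(3, 9), (4, 12)}:
        problems.append("sector size does not match the format version")
    if mini_shift != 6:
        problems.append("unexpected mini-sector size")
    return problems
```

One way to try it would be `check_compound_header(open(path, "rb").read(512))`, where `path` is whatever file you want to inspect. A file that fails here is damaged at the outermost level; a file that passes may still have bad pointers further in.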

Because corruption can cause almost any kind of incorrect operation,
depending on what is corrupted and what (if anything) the incorrect value
means to Word, it's easy to blame all sorts of misbehavior on corruption. As
Jezebel said, though, if Word can interpret the file correctly, then it
isn't corrupt.

The article at http://word.mvps.org/FAQs/AppErrors/CorruptDoc.htm discusses
this topic and offers some tips for fixing files that are corrupt.

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

#4
Tony Jollans

I'll throw my paltry weight behind the experts' opinions but might just point
out that the *only* way to be sure one is processing a Word document
correctly is to use Word.

If Word can open the file and Doc2Help is working via a Word interface then
all should be well and if it isn't, document corruption is about the least
likely cause.

If, on the other hand, Doc2Help is trying to process the document in some
other way then it is likely to have problems. OLE structured storage is a
proprietary format and not fully documented outside Microsoft (and maybe not
inside it either, I don't know). Anyone trying to interpret it is relying
on, at best, good guesswork and could come unstuck at any moment.

--
Enjoy,
Tony

#5
John Liungman

Hello all, and thanks for your replies!

I feel I now have a stronger case in the never-ending battle with D2H
support. (I am actually being a bit tough on them - they ARE being very
friendly and doing their best.)

However, there might be some truth to what they say about the Word file
being corrupt.

For those of you unacquainted with Doc2Help (D2H), it is an application
which takes a Word file and generates a Help file on the fly - table of
contents, index, and all. Very handy. One nice feature is that it takes
cross-references in Word like 'for more info see "Further Information" on
page 33' and transforms each into a hyperlink in the help file looking like
this: 'for more info see _Further Information_'. This is the feature that
OCCASIONALLY does not work.

In a few cases, the hyperlinks just come out as ordinary text, i.e.
"unclickable". In most instances, they come out fine. To track the error, I
copied a problematic cross-reference in the Word file and pasted it in
various locations further up the document - before, after and inside the
preceding heading, and also much further up. I then generated the help
file in D2H, and noted that early in the document the links work, while
from a certain point down they cease to do so. The shift always seems to happen
around a heading.
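
The trial-and-error search described above can be shortened by halving the range each time instead of trying locations one by one. The sketch below is only an illustration of that idea: `link_works` is a hypothetical stand-in for the manual step of pasting the cross-reference at a given paragraph, regenerating the help file, and clicking the result, and it assumes the link works everywhere above some point and fails everywhere below it.

```python
from typing import Callable


def first_failing_position(link_works: Callable[[int], bool], count: int) -> int:
    """Return the lowest paragraph index where the link fails.

    Returns count if the link works at every position. Assumes the link
    works at all positions above the boundary and fails at all below it.
    """
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if link_works(mid):
            lo = mid + 1  # the boundary is further down the document
        else:
            hi = mid      # the boundary is here or further up
    return lo
```

With a few hundred paragraphs this needs fewer than ten regenerations to locate the paragraph where the behaviour changes, which is then the place to start cutting and re-pasting.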

Now, of course D2H could still be to blame. Headings play a crucial role in
defining the start of Help topics, as well as the TOC level of topics. The
reason I still suspect Word is that I have seen another error in Word related
to headings, a problem for which I have yet to find a solution. It has to do
with all the paragraph-level features for text flow, such as Keep with Next.
Much of this functionality does not work in my document, or it works
unpredictably. Members of this forum have suggested that this may be caused
by corruption of the document. Although I have not been able to prove this,
it goes to show that some knowledgeable people (at least one MVP) believe
that corruption can exist even if the document opens normally in Word.

I finally resolved my Word/D2H problem by cutting out a large section around
the misbehaving cross-reference, pasting it into Notepad, then copying the
text back into the right location in Word. I redid the formatting, recreated
the cross-reference, and voilà: it works!

Strange, isn't it? Casting blame becomes somewhat philosophical. But I do
think that Microsoft should be more open about their file formats. .doc is
not the only one causing headaches for developers of third-party apps.
Outlook profiles... Ouch.

Thanks again!

John

