#1
Posted to microsoft.public.word.docmanagement
Ghitorni
.docx files have XML components, but what's their use?

I have read that if a Word 2003 file becomes corrupted, there is only a slim
chance of recovering it, whereas a Word 2007 file can be recovered almost
fully because the file is actually in ZIP format and contains many XML files.
But the .docx "file" as such is still a single file (until it is unzipped and
extracted). So how does the format help when corruption occurs? As far as I
know, even with a ZIP file, if a small chunk is gone you can never open it.
Could anyone shed some light on this?
Thanks

#2
Posted to microsoft.public.word.docmanagement
Doug Robbins - Word MVP

There could well be (and certainly are) cases where the corruption does not
preclude the Zip file from being opened.
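
One such case is damage confined to the compressed data of a single member,
with the ZIP directory left intact. The sketch below is only an illustration
under that assumption, using Python's standard zipfile module on an in-memory
archive. The member contents and the helper names (build_archive,
corrupt_member, check_members) are made up for the example; it is not how
Word itself performs recovery.

```python
import io
import struct
import zipfile
import zlib


def build_archive() -> bytes:
    """Build a small in-memory ZIP with two members (stand-in for a .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<doc>" + "hello " * 500 + "</doc>")
        zf.writestr("word/styles.xml", "<styles>" + "style " * 500 + "</styles>")
    return buf.getvalue()


def corrupt_member(data: bytes, name: str) -> bytes:
    """Flip the bits of a few bytes in the middle of one member's compressed data."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info = zf.getinfo(name)
    # A local file header has 30 fixed bytes; the lengths of the file name
    # and extra field that follow are stored at offsets 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", data, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    end = start + info.compress_size
    middle = start + info.compress_size // 2
    damaged = bytearray(data)
    for i in range(middle, min(middle + 4, end)):
        damaged[i] ^= 0xFF
    return bytes(damaged)


def check_members(data: bytes) -> dict[str, bool]:
    """Report, per member, whether it can still be read in full."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            try:
                zf.read(name)  # decompresses and verifies the CRC-32
                results[name] = True
            except (zipfile.BadZipFile, zlib.error):
                results[name] = False
    return results


if __name__ == "__main__":
    damaged = corrupt_member(build_archive(), "word/styles.xml")
    for member, ok in check_members(damaged).items():
        print(member, "readable" if ok else "damaged")
```

Each member is compressed and checksummed on its own, so the damage to one
member does not stop the other from being read.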

--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP, originally posted via msnews.microsoft.com

"Ghitorni" wrote in message
...
I read that if any corruption occurs, slim chances of recovering for 2003
version files. In 2007 you can recover almost fully because the actual
file is in zip format and inside it contains many xml files. But the
"file" as such, .docx is a single file (until unzipped & extracted). Then
how can some corruption save the file, because even in a zip format file,
if a small chunk is gone, you can never open it. Could anyone shed some
light on this? Thanks


#4
Posted to microsoft.public.word.docmanagement
Peter Jamieson

.docx and .doc files (at least since about Word 6) have a more similar
structure than many people probably realise - even in .doc, which uses
OLE Compound Files, the content is divided into different "streams"
which can be opened separately.

That said, .docx does have considerable advantages, including:

a. The ZIP file structure itself is a de facto standard. I don't
personally have any ZIP utilities for recovering "unopenable" ZIP files,
but I expect there are many. I don't think you will find as many
utilities that know how to recover the content of a corrupted OLE
Compound File.

b. Each file within the ZIP is almost certainly going to be either an XML
text file such as "document.xml" or a single binary object such as a
.jpg. If the ZIP is damaged but you can still open it and get
document.xml, you have already achieved quite a lot. Even if the ZIP is
damaged to the extent that you cannot open it, a recovery utility has a
much better chance of identifying the component files when it knows that
they are either XML or - in some cases at least - well-known types of
binary object such as .jpg. In contrast, in a .doc, the equivalent of
document.xml is a complex binary structure. It isn't even a simple
stream of text with markup. You need a utility that knows precisely how
to look through that binary representation in order to extract anything
at all. Although MS has now published the .doc standard (it appears to
be a work in progress), I suspect not many people will want to spend
resource developing new recovery software for obsolescent formats.
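
To illustrate point b, here is a rough sketch of what a recovery utility
could do when the ZIP cannot be opened at all because its central directory
is missing. It scans the raw bytes for local file header signatures and
inflates each member it finds. The function names (salvage, build_truncated)
and the sample content are invented for the example, only the "stored" and
"deflate" methods are handled, and real recovery tools are certainly more
thorough than this.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"    # local file header signature
CENTRAL_SIG = b"PK\x01\x02"  # central directory header signature


def salvage(data: bytes) -> dict[str, bytes]:
    """Recover members by scanning for local headers, ignoring the directory."""
    recovered = {}
    pos = data.find(LOCAL_SIG)
    while pos != -1:
        try:
            # Fixed part of the local header after the 4-byte signature.
            (_ver, _flags, method, _time, _date, _crc,
             csize, _usize, name_len, extra_len) = struct.unpack_from(
                "<HHHHHIIIHH", data, pos + 4)
            name = data[pos + 30:pos + 30 + name_len].decode("utf-8", "replace")
            start = pos + 30 + name_len + extra_len
            if method == 8:
                # Raw deflate stream: the decompressor stops at the end of
                # the stream by itself, so the stored size is not needed.
                body = zlib.decompressobj(-15).decompress(data[start:])
                recovered[name] = body
            elif method == 0 and csize:
                recovered[name] = data[start:start + csize]
        except (struct.error, zlib.error):
            pass  # not a usable header or stream; keep scanning
        pos = data.find(LOCAL_SIG, pos + 4)
    return recovered


def build_truncated(text: str) -> bytes:
    """Create a small archive, then cut off its central directory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", text)
        zf.writestr("word/styles.xml", "<styles/>")
    data = buf.getvalue()
    return data[:data.index(CENTRAL_SIG)]


if __name__ == "__main__":
    broken = build_truncated("<w:document>" + "text " * 300 + "</w:document>")
    try:
        zipfile.ZipFile(io.BytesIO(broken))
    except zipfile.BadZipFile as exc:
        print("normal open fails:", exc)
    for member, body in salvage(broken).items():
        print(member, len(body), "bytes recovered")
```

Because the recovered document.xml is plain XML text, it can be inspected or
repaired in any text editor even if other parts of the archive are lost.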

Peter Jamieson

http://tips.pjmsn.me.uk

On 05/05/2010 09:10, Ghitorni wrote:
I read that if any corruption occurs, slim chances of recovering for
2003 version files. In 2007 you can recover almost fully because the
actual file is in zip format and inside it contains many xml files. But
the "file" as such, .docx is a single file (until unzipped & extracted).
Then how can some corruption save the file, because even in a zip format
file, if a small chunk is gone, you can never open it. Could anyone shed
some light on this? Thanks
