#1
Posted to microsoft.public.word.docmanagement,microsoft.public.word.programming
Phantom
Open Packaging Format creator/manager for OS X

I'm looking for an Open Packaging Format creator/manager for OS X.

Specifically, I'm trying to generate dynamic .docx files, which is going
pretty well, but I'm not able to repackage the edited files without
blowing up Word... any .rels, image, or XML file in the package that I
change (other than document.xml itself) is seen as corrupt by Word
when it tries to open the file. Sure enough, Word is clever enough to
recover the remaining elements, but anything I change gets nuked.

That prevents me from swapping in new images (charts in my case), or
modifying any hyperlinks (which live in the .rels files).

I found that I can open up a .docx file directly in Stuffit Archive
Manager (SAM) (without even having to change the extension), which
eliminates the need to re-zip the .docx files from scratch (which seems
to blow my entire document when I try). Using SAM, I can extract only
the file I choose, edit it, put it back, and the other elements remain
intact.
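
The extract/edit/put-back step described above can also be scripted. The following is only a minimal sketch in Python, assuming the package is handled as bytes in memory; the function name replace_part is hypothetical and not part of any OPC tool. Python's zipfile module cannot overwrite an entry in place, so the sketch copies every entry into a new archive and swaps in the new bytes for the one part being changed.

```python
import io
import zipfile


def replace_part(package_bytes: bytes, part_name: str, new_data: bytes) -> bytes:
    """Return a copy of the package with one part's bytes swapped out.

    `replace_part` is an illustrative helper, not an existing API.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.is_dir():
                    continue  # explicit directory entries are not needed
                data = new_data if info.filename == part_name else src.read(info)
                # Reuse the original entry name so the part stays where the
                # package's relationship files expect to find it.
                dst.writestr(info.filename, data)
    return out.getvalue()
```

For example, replace_part(docx_bytes, "word/media/image1.jpeg", new_jpeg_bytes) would return a new package with only that image changed; the entry name comes from the listing in this thread, the variable names are placeholders.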

The key problem appears to be the compression technique: the docx isn't
actually a Plain Old Zip (POZ) file, it's actually an Open Packaging
Convention (OPC) file:

http://en.wikipedia.org/wiki/Open_Packaging_Conventions

Now, if I could only find an OPC creator/manager for the Mac... a GUI
would be great, but a command line would do as well. I seem to have
found what at first appeared to be such a tool (Porticus), but it
doesn't seem to do anything except manage MacPorts:
http://www.versiontracker.com/dyn/moreinfo/macosx/32608

FYI, Porticus needs MacPorts installed as well:
http://www.macports.org/install.php

However, I can't seem to see how Porticus helps me with the OPC file
management... I'm probably on a wild turkey chase with that, but there
might be something there.

I'm guessing that there are Mac developers here more informed than me,
hoping someone can shed light on my OPC requirements.

thanks in advance, folks.

#2
Yves Dhondt


What happens if you open one of the XML files inside the container and then
save it without changing anything? Does that corrupt the document as well?
If so, extract the XML file from both the correct and corrupt version and do
a byte-by-byte comparison. It could be that the editor you use adds a byte
order mark (BOM) at the start of an XML file.
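
The byte-by-byte comparison and the BOM check suggested above can be sketched in a few lines of Python. This is an illustration only; the helper names starts_with_bom and first_difference are hypothetical, and the sketch assumes both versions of the XML part have already been read into bytes.

```python
import codecs


def starts_with_bom(data: bytes) -> bool:
    """True if the bytes begin with a UTF-8 or UTF-16 byte order mark."""
    boms = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    return data.startswith(boms)


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or -1 if the inputs are identical."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    # One input may simply be a prefix of the other.
    return -1 if len(a) == len(b) else min(len(a), len(b))
```

If first_difference returns 0 and starts_with_bom is true only for the re-saved copy, the editor most likely added a BOM when saving.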


From a compression point of view, there isn't any difference between the
two. POZ and OPC are the same format. The difference is that a POZ file has
no notion of its contents while an OPC file uses standardized entrypoints to
find the relation between the different files in the container.


Mono implements the System.IO.Packaging namespace for .NET on Mac OS X (I'm
not sure whether it is in the latest release yet). That namespace contains
an API for accessing and manipulating OPC files. I know the API works
because I have already written apps that use it (I'm not a Mac specialist; I
just needed to make some software cross-platform). Of course, this means you
would still have to program your application yourself.

Yves

#3
Phantom

No, re-saving an unchanged file doesn't corrupt the document; that only
happens when something is added to the file (without its proper meta
information, I'm guessing).



.NET isn't going to help me much on OS X... this is supposed to be an
open standard, so it should go without saying that I shouldn't have to
use an MS product to manage the document.

To Microsoft's credit, the document is pretty well formed, and they're
95% of the way there... I just can't create the dang OPC file.

#4
Yves Dhondt



Mono (http://www.mono-project.com/Main_Page) has nothing to do with MS. It's
a cross-platform, open-source implementation of the .NET framework. There is
also a Java implementation called OpenXML4J
(http://sourceforge.net/projects/openxml4j/) but I have no experience with
it.

You were writing about changing hyperlinks. Do your documents crash if the
only thing you do is change the value of the "Target" attribute of your
hyperlink relationship in your document.xml.rels file?
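
To make that experiment repeatable, the change to the Target value can be scripted. Below is a rough sketch in Python (3.8 or later) using only the standard library; the function name retarget_hyperlink is hypothetical, and the sketch assumes the .rels part uses the usual package relationships namespace and that the relationship Id is already known. The XML is written back as UTF-8 without a byte order mark.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def retarget_hyperlink(rels_xml: bytes, rel_id: str, new_target: str) -> bytes:
    """Change the Target attribute of one hyperlink relationship.

    Illustrative helper only; it is not part of any OPC library.
    """
    # Keep the relationships namespace as the default namespace on output
    # (note: this registration is global to ElementTree).
    ET.register_namespace("", RELS_NS)
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Id") == rel_id and rel.get("Type") == HYPERLINK_TYPE:
            rel.set("Target", new_target)
            break
    else:
        raise KeyError(f"no hyperlink relationship with Id {rel_id!r}")
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The returned bytes would then replace word/_rels/document.xml.rels in the package; everything else in the file, including TargetMode, is left as it was.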

Yves

#5
Phantom


you're definitely ahead of me on ideas, Yves, thanks for your input so far.

I tried out another idea regarding resource forks, but no go.

I expanded the .docx file to its juicy component files, then (without
changing anything) recompressed them with the command line zip tool,
which by all accounts, does not include resource forks:

zip -X -r test test

... then renamed test.zip as test.docx and attempted to open it with
Word 2008. No luck, Word declares the document bogus.

I did attempt a zip -df, but that's long deprecated, and doesn't work.
Given that the current command line zip tool doesn't stuff resource
forks in the first place, it shouldn't be an issue. Just to make sure,
I checked the zip file and didn't see any resource-looking files:

% zip -X -r test test
adding: test/ (stored 0%)
adding: test/[Content_Types].xml (deflated 84%)
adding: test/_rels/ (stored 0%)
adding: test/_rels/.rels (deflated 66%)
adding: test/docProps/ (stored 0%)
adding: test/docProps/app.xml (deflated 73%)
adding: test/docProps/core.xml (deflated 52%)
adding: test/docProps/custom.xml (deflated 60%)
adding: test/word/ (stored 0%)
adding: test/word/_rels/ (stored 0%)
adding: test/word/_rels/document.xml.rels (deflated 85%)
adding: test/word/_rels/header2.xml.rels (deflated 38%)
adding: test/word/_rels/header3.xml.rels (deflated 38%)
adding: test/word/_rels/header4.xml.rels (deflated 38%)
adding: test/word/document.xml (deflated 83%)
adding: test/word/endnotes.xml (deflated 65%)
adding: test/word/fontTable.xml (deflated 85%)
adding: test/word/footer1.xml (deflated 65%)
adding: test/word/footer2.xml (deflated 79%)
adding: test/word/footer3.xml (deflated 81%)
adding: test/word/footer4.xml (deflated 81%)
adding: test/word/footnotes.xml (deflated 65%)
adding: test/word/header1.xml (deflated 70%)
adding: test/word/header2.xml (deflated 64%)
adding: test/word/header3.xml (deflated 64%)
adding: test/word/header4.xml (deflated 64%)
adding: test/word/media/ (stored 0%)
adding: test/word/media/image1.jpeg (deflated 72%)
adding: test/word/media/image2.jpeg (deflated 61%)
adding: test/word/numbering.xml (deflated 96%)
adding: test/word/settings.xml (deflated 59%)
adding: test/word/styles.xml (deflated 89%)
adding: test/word/theme/ (stored 0%)
adding: test/word/theme/theme1.xml (deflated 79%)
adding: test/word/webSettings.xml (deflated 34%)

Darn, thought I had it with that resource fork idea.


I also checked your idea about files being changed in the round trip. I
extracted one file (A), put it right back in, then took it out again
(B). I then used diff to compare A and B, and found them to be
identical.


So, it definitely seems like the metadata is missing from the OPC.

You mentioned "OPC file uses standardized entrypoints to find the
relation between the different files in the container"... did you mean
that the OPC file has entrypoint data that it is tracking
internally somewhere within the file, or that the entrypoints are
simply standard file names and folder structures that differentiate it
from a POZ file?

So close, so close...



#6
Yves Dhondt


It looks for [Content_Types].xml in the root directory of your zip file.
That file declares the content types of the different parts in your
document. Without that file, Word (or any OPC-capable tool) can't do a thing.
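
A quick way to check this on an existing archive is to list its entries and look for the well-known names at the root. The following Python sketch is only an illustration; the function name missing_entry_points and the choice of the two names to check (taken from the zip listing in this thread) are assumptions, not a full OPC validation.

```python
import io
import zipfile

# Entries a Word package is normally expected to have at the archive root.
# This short list is an assumption made for the sketch.
EXPECTED_AT_ROOT = ("[Content_Types].xml", "_rels/.rels")


def missing_entry_points(package_bytes: bytes) -> list:
    """Return the expected root entries that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        names = set(zf.namelist())
    return [name for name in EXPECTED_AT_ROOT if name not in names]
```

An empty list means both entries sit at the root. An archive whose entries all start with test/ would report both names as missing, even though the files are physically inside it.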



When you extract your document, you extract it to a subfolder you create
somewhere. When you recompress your document, you should compress the
CONTENT of that subfolder, not the subfolder itself. The output of your
zip command shows that you compressed the folder "test", so every entry,
including [Content_Types].xml, ends up under test/ instead of at the root
of the archive. So you should move into the folder and run your
compression command from there.
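
The same idea, compressing the content of the folder rather than the folder, can be sketched in Python for anyone who prefers a script to the command line. The function name pack_folder_contents is hypothetical; the sketch assumes the extracted package lives in a folder such as "test" and that the output file is written outside that folder.

```python
import os
import zipfile


def pack_folder_contents(folder: str, package_path: str) -> None:
    """Zip what is inside `folder` so its entries sit at the archive root."""
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in filenames:
                if filename == ".DS_Store":
                    continue  # Finder metadata, not part of the package
                full_path = os.path.join(dirpath, filename)
                # Store the path relative to `folder`, with forward slashes,
                # so "test/word/document.xml" becomes "word/document.xml".
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                zf.write(full_path, arcname)
```

Calling pack_folder_contents("test", "test.docx") should give roughly the same result as running "cd test" followed by "zip -X -r ../test.docx ." in the shell.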

Yves
