David John

Quote:
Originally Posted by Yves Dhondt
"Phantom" wrote in message
news:2009101114123475249-phantom@mailinatorcom...
On 2009-10-11 12:26:01 -0700, "Yves Dhondt" said:

"Phantom" wrote in message
news:2009101110305975249-phantom@mailinatorcom...
I'm looking for an Open Packaging Format creator/manager for OS X.

Specifically, I'm trying to generate dynamic .docx files, which is going
pretty well, but I'm not able to repackage the edited files without
blowing up Word... any rel(s), image, or XML file in the package that I
change (other than the document.xml itself) is seen as corrupt by Word
when it tries to open the file. Sure enough, Word is clever enough to
recover the remaining elements, but anything I change gets nuked.


What happens if you open one of the XML files inside the container and
then save it without changing anything? Does that corrupt the document as
well? If so, extract the XML file from both the correct and corrupt
version and do a byte-by-byte comparison. It could be that the editor you
use adds a byte order mark (BOM) at the start of an XML file.
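The BOM check and the byte-by-byte comparison suggested above can be sketched in a few lines of Python. This is only an illustration, not a tool mentioned in the thread: the function names and the sample bytes are made up for the example, and it assumes the XML part is UTF-8 encoded.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if a and b are identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other; they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


# Stand-in data: the same XML part with and without a BOM (hypothetical content).
good = b'<?xml version="1.0" encoding="UTF-8"?><Relationships/>'
bad = codecs.BOM_UTF8 + good

print(starts_with_utf8_bom(good))   # False
print(starts_with_utf8_bom(bad))    # True
print(first_difference(good, bad))  # 0
```

To compare real files, read each extracted part in binary mode (for example with `open(path, "rb").read()`) and pass the two byte strings to the same functions.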

That prevents me from swapping in new images (charts in my case), or
modifying any hyperlinks (which exist in .rel files).

I found that I can open up a .docx file directly in Stuffit Archive
Manager (SAM) (without even having to change the extension), which
eliminates the need to re-zip the .docx files from scratch (which seems
to blow my entire document when I try). Using SAM, I can extract only
the file I choose, edit it, put it back, and the other elements remain
intact.

The key problem appears to be the compression technique: the docx isn't
a Plain Old Zip (POZ) file; it's an Open Packaging Conventions (OPC)
file:

http://en.wikipedia.org/wiki/Open_Packaging_Conventions


From a compression point of view, there isn't any difference between the
two. POZ and OPC are the same format. The difference is that a POZ file
has no notion of its contents while an OPC file uses standardized
entrypoints to find the relation between the different files in the
container.
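Because the container is an ordinary zip archive, any zip library should be able to read it. The following is a minimal sketch using Python's standard zipfile module. The package built here is an in-memory stand-in with placeholder XML, not a valid Word document, and the helper names are invented for the example.

```python
import io
import zipfile


def build_demo_package() -> bytes:
    """Build a tiny stand-in package in memory (placeholder XML, not a real .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def read_part(package: bytes, part_name: str) -> bytes:
    """Read one part out of the package exactly as one would from any zip file."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(part_name)


package = build_demo_package()
print(zipfile.is_zipfile(io.BytesIO(package)))  # True
print(read_part(package, "_rels/.rels"))        # b'<Relationships/>'
```

The same calls work on a real .docx opened from disk; what makes it an OPC package is the set of entry names inside, not the compression.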

Now, if I could only find an OPC creator/manager for the Mac... a GUI
would be great, but a command line would do as well. I seem to have
found what at first appeared to be such a tool, but it doesn't seem to do
anything except manage MacPorts:
http://www.versiontracker.com/dyn/moreinfo/macosx/32608

FYI, Porticus needs MacPorts installed as well:
http://www.macports.org/install.php

However, I can't seem to see how Porticus helps me with the OPC file
management... I'm probably on a wild turkey chase with that, but there
might be something there.

I'm guessing that there are Mac developers here more informed than me,
hoping someone can shed light on my OPC requirements.

thanks in advance, folks.


Mono implements the System.IO.Packaging namespace for .Net on MacOS (not
sure if it is in the latest release already). That namespace contains an
API for accessing and manipulating OPC files. I know the API works as I
have written apps already which use it (I'm not a Mac specialist, I just
needed to make some software cross-platform). Of course, this means you
would still have to program your application yourself.

Yves


you're definitely ahead of me on ideas, Yves, thanks for your input so
far.

I tried out another idea regarding resource forks, but no go.

I expanded the .docx file to its juicy component files, then (without
changing anything) recompressed them with the command line zip tool, which
by all accounts, does not include resource forks:

zip -X -r test test

... then renamed test.zip as test.docx and attempted to open it with Word
2008. No luck, Word declares the document bogus.

I did attempt a zip -df, but that's long deprecated, and doesn't work.
Given that the current command line zip tool doesn't stuff resource forks
in the first place, it shouldn't be an issue. Just to make sure, I checked
the zip file and didn't see any resource-looking files:

% zip -X -r test test
adding: test/ (stored 0%)
adding: test/[Content_Types].xml (deflated 84%)
adding: test/_rels/ (stored 0%)
adding: test/_rels/.rels (deflated 66%)
adding: test/docProps/ (stored 0%)
adding: test/docProps/app.xml (deflated 73%)
adding: test/docProps/core.xml (deflated 52%)
adding: test/docProps/custom.xml (deflated 60%)
adding: test/word/ (stored 0%)
adding: test/word/_rels/ (stored 0%)
adding: test/word/_rels/document.xml.rels (deflated 85%)
adding: test/word/_rels/header2.xml.rels (deflated 38%)
adding: test/word/_rels/header3.xml.rels (deflated 38%)
adding: test/word/_rels/header4.xml.rels (deflated 38%)
adding: test/word/document.xml (deflated 83%)
adding: test/word/endnotes.xml (deflated 65%)
adding: test/word/fontTable.xml (deflated 85%)
adding: test/word/footer1.xml (deflated 65%)
adding: test/word/footer2.xml (deflated 79%)
adding: test/word/footer3.xml (deflated 81%)
adding: test/word/footer4.xml (deflated 81%)
adding: test/word/footnotes.xml (deflated 65%)
adding: test/word/header1.xml (deflated 70%)
adding: test/word/header2.xml (deflated 64%)
adding: test/word/header3.xml (deflated 64%)
adding: test/word/header4.xml (deflated 64%)
adding: test/word/media/ (stored 0%)
adding: test/word/media/image1.jpeg (deflated 72%)
adding: test/word/media/image2.jpeg (deflated 61%)
adding: test/word/numbering.xml (deflated 96%)
adding: test/word/settings.xml (deflated 59%)
adding: test/word/styles.xml (deflated 89%)
adding: test/word/theme/ (stored 0%)
adding: test/word/theme/theme1.xml (deflated 79%)
adding: test/word/webSettings.xml (deflated 34%)

Darn, thought I had it with that resource fork idea.


I also checked your idea about files being changed in the round trip. I
extracted one file (A), put it right back in, then took it out again (B).
I then used diff to compare A and B, and found them to be identical.


So, it definitely seems like the metadata is missing from the OPC.

You mentioned "OPC file uses standardized entrypoints to find the relation
between the different files in the container"... did you mean that the
internal OPC file has entrypoint data that it is tracking internally
somewhere within the file, or that the entrypoints are simply standard
file names and folder structures that differentiate it from a POZ file?

So close, so close...


It looks for [Content_Types].xml in the root directory of your zip file.
That file declares the content type of each part in your document (the
links between the parts are in the .rels files). Without that file, Word
(or any OPC-capable tool) cannot do a thing.
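A quick way to test for this condition is to look at the entry names in the archive. The sketch below is only an illustration: the function names and the returned strings are invented for the example, and the sample names are taken from the zip output earlier in the thread.

```python
import zipfile

CONTENT_TYPES = "[Content_Types].xml"


def diagnose_package(names):
    """Classify a list of zip entry names by where [Content_Types].xml sits."""
    if CONTENT_TYPES in names:
        return "ok"
    # Look for the file one or more folders down, e.g. "test/[Content_Types].xml".
    wrapped = [n for n in names if n.endswith("/" + CONTENT_TYPES)]
    if wrapped:
        prefix = wrapped[0][: -len(CONTENT_TYPES)]
        return f"wrapped in folder '{prefix}'"
    return "missing"


def diagnose_file(path):
    """Run the same check against a zip or .docx file on disk."""
    with zipfile.ZipFile(path) as zf:
        return diagnose_package(zf.namelist())


# Entry names as they appear in the zip output quoted in this thread.
names = ["test/", "test/[Content_Types].xml", "test/_rels/", "test/_rels/.rels"]
print(diagnose_package(names))  # wrapped in folder 'test/'
```

Calling `diagnose_file` on the repackaged file would show at a glance whether the entry point is where an OPC reader expects it.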



When you extract your document, you extract it to a subfolder you create
somewhere. When you recompress your document, you should compress the
CONTENT of that subfolder, not the subfolder itself. The output of your
zip command shows that you compressed the folder "test", so every entry
is stored under "test/" and [Content_Types].xml is no longer in the root
of the archive. So you should move into the folder and run your
compression command from there.
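On the command line this would amount to changing into the folder and running something like `zip -X -r ../test.docx .` there. The same idea can be sketched in Python by storing each file under a name relative to the extracted folder. The function name is invented for the example, and skipping `.DS_Store` is an assumption about what Finder may have left in the folder, not something reported in the thread.

```python
import os
import zipfile


def zip_folder_contents(folder, out_path):
    """Zip the files under `folder` so that entry names are relative to `folder`.

    `out_path` should be outside `folder`, otherwise the archive would try to
    include itself.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for filename in sorted(filenames):
                if filename == ".DS_Store":
                    # Assumption: Finder metadata is not part of the package.
                    continue
                full_path = os.path.join(dirpath, filename)
                # "test/word/document.xml" is stored as "word/document.xml".
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                zf.write(full_path, arcname)
```

With the folder from this thread, `zip_folder_contents("test", "test.docx")` would put [Content_Types].xml at the root of the archive instead of under "test/".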

Yves
Packaging is very important for our product and for our business, because packaging plays a key role in advertising the product's brand.