Posted to microsoft.public.word.docmanagement,microsoft.public.word.programming
Phantom

Open Packaging Format creator/manager for OS X

On 2009-10-11 12:26:01 -0700, "Yves Dhondt" said:


"Phantom" wrote in message
news:2009101110305975249-phantom@mailinatorcom...
I'm looking for an Open Packaging Format creator/manager for OS X.

Specifically, I'm trying to generate dynamic .docx files, which is going
pretty well, but I'm not able to repackage the edited files without
blowing up Word: any .rels file, image, or XML file in the package that I
change (other than document.xml itself) is seen as corrupt by Word
when it tries to open the file. Sure enough, Word is clever enough to
recover the remaining elements, but anything I change gets nuked.


What happens if you open one of the XML files inside the container and
then save it without changing anything? Does that corrupt the document
as well? If so, extract the XML file from both the correct and the corrupt
versions and do a byte-by-byte comparison. It could be that the editor
you use adds a byte order mark (BOM) at the start of an XML file.
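A minimal sketch of that comparison uses Python's standard zipfile module. It is an illustration only, not a tool from this thread, and the function names and file names are hypothetical. It pulls the same part out of a working package and a corrupt one, reports whether either copy starts with a UTF-8 byte order mark, and gives the offset of the first differing byte.

```python
import zipfile

UTF8_BOM = b"\xef\xbb\xbf"  # byte order mark some editors prepend when saving UTF-8


def read_part(package, part_name):
    """Return the raw bytes of one part (ZIP entry); package is a path or file object."""
    with zipfile.ZipFile(package) as zf:
        return zf.read(part_name)


def has_bom(data):
    """True if the bytes start with a UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)


def first_difference(a, b):
    """Index of the first byte where a and b differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one is a prefix of the other
    return None


def compare_part(good_package, bad_package, part_name):
    """Compare the same part in a working and a corrupt package."""
    good = read_part(good_package, part_name)
    bad = read_part(bad_package, part_name)
    return {
        "good_has_bom": has_bom(good),
        "bad_has_bom": has_bom(bad),
        "first_difference": first_difference(good, bad),
    }
```

For example, compare_part("original.docx", "edited.docx", "word/_rels/document.xml.rels") returns a small dict. If first_difference is 0 and bad_has_bom is True, an editor-added BOM is the likely cause.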

That prevents me from swapping in new images (charts in my case) or
modifying any hyperlinks (which live in .rels files).
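One packaging rule matters when you swap in new images. Every part in an OPC package needs a content type in [Content_Types].xml, supplied either by a Default entry for its file extension or by an Override entry for its part name. For instance, a chart saved as PNG and dropped into a package that only declares jpeg has no content type. Whether that is what Word objected to here is an assumption. The sketch below uses a hypothetical helper name and only the Python standard library. It lists parts with no declared content type.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def parts_missing_content_type(package):
    """List entries that [Content_Types].xml gives no content type for."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        names = [n for n in zf.namelist() if not n.endswith("/")]  # skip folder entries
    # Extensions and part names are compared case-insensitively.
    defaults = {d.get("Extension").lower() for d in root.iter(CT_NS + "Default")}
    overrides = {o.get("PartName").lower() for o in root.iter(CT_NS + "Override")}
    missing = []
    for name in names:
        if name == "[Content_Types].xml":
            continue  # the content-types stream itself is not a part
        basename = name.rsplit("/", 1)[-1]
        ext = basename.rsplit(".", 1)[-1].lower() if "." in basename else ""
        # Part names in Override entries are absolute, e.g. "/word/document.xml".
        if "/" + name.lower() not in overrides and ext not in defaults:
            missing.append(name)
    return missing
```

If a new chart image shows up in the result, you can add a Default entry for its extension (for example, png with image/png) or an Override entry for its part name.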

I found that I can open up a .docx file directly in Stuffit Archive
Manager (SAM) (without even having to change the extension), which
eliminates the need to re-zip the .docx files from scratch (which seems
to blow up my entire document when I try). Using SAM, I can extract only
the file I choose, edit it, put it back, and the other elements remain
intact.
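SAM is replacing one entry while leaving the others alone, and the same approach can be sketched in Python. The standard zipfile module cannot overwrite an entry in place. This sketch (hypothetical function name) therefore writes a new archive instead. It copies every other entry with its original name, order, timestamp, and compression method.

```python
import io
import zipfile


def replace_part(docx_bytes, part_name, new_data):
    """Return a copy of the package with one part's bytes replaced.

    Every other entry is copied unchanged, in its original order and with its
    original ZipInfo (name, timestamp, compression type).
    """
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w") as dst:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        for info in src.infolist():
            # Read the old bytes before writestr() reuses (and updates) info.
            data = new_data if info.filename == part_name else src.read(info)
            dst.writestr(info, data)
    return out.getvalue()
```

Usage might look like this: read the .docx into bytes, call replace_part(data, "word/media/image1.jpeg", new_chart_bytes), and write the result to a new file. Because the part keeps the same name, any relationship that targets it still resolves.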

The key problem appears to be the compression technique: a .docx isn't
actually a Plain Old Zip (POZ) file but an Open Packaging
Conventions (OPC) file:

http://en.wikipedia.org/wiki/Open_Packaging_Conventions


From a compression point of view, there isn't any difference between
the two. POZ and OPC are the same format. The difference is that a POZ
file has no notion of its contents while an OPC file uses standardized
entrypoints to find the relation between the different files in the
container.
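Yves's point that the container itself is ordinary ZIP is easy to check: a generic ZIP library opens a .docx without knowing anything about OPC. Here is a small sketch (hypothetical helper name) that lists each entry with its compression method:

```python
import zipfile

METHODS = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}


def describe_package(package):
    """Return (entry name, compression method) pairs for a .docx opened as plain ZIP."""
    if not zipfile.is_zipfile(package):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(package) as zf:
        return [(info.filename, METHODS.get(info.compress_type, str(info.compress_type)))
                for info in zf.infolist()]
```

Anything OPC-specific lives in the names and contents of the entries, not in the compression.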

Now, if I could only find an OPC creator/manager for the Mac... a GUI
would be great, but a command-line tool would do as well. I seem to have
found what at first appeared to be such a tool, but it doesn't seem to do
anything except manage MacPorts:
http://www.versiontracker.com/dyn/moreinfo/macosx/32608

FYI, Porticus needs MacPorts installed as well:
http://www.macports.org/install.php

However, I can't seem to see how Porticus helps me with the OPC file
management... I'm probably on a wild turkey chase with that, but there
might be something there.

I'm guessing there are Mac developers here who are better informed than me;
I'm hoping someone can shed light on my OPC requirements.

thanks in advance, folks.


Mono implements the System.IO.Packaging namespace for .NET on Mac OS
(I'm not sure whether it is in the latest release yet). That namespace
contains an API for accessing and manipulating OPC files. I know the API
works because I have already written apps that use it (I'm not a Mac
specialist, I just needed to make some software cross-platform). Of
course, this means you would still have to program your application
yourself.

Yves


You're definitely ahead of me on ideas, Yves; thanks for your input so far.

I tried out another idea regarding resource forks, but no go.

I expanded the .docx file into its juicy component files, then (without
changing anything) recompressed them with the command-line zip tool,
which, by all accounts, does not include resource forks:

zip -X -r test test

... then renamed test.zip to test.docx and attempted to open it with
Word 2008. No luck, Word declares the document bogus.

I did attempt a zip -df, but that's long deprecated and doesn't work.
Given that the current command-line zip tool doesn't stuff resource
forks in the first place, it shouldn't be an issue. Just to make sure,
I checked the zip file and didn't see any resource-looking files:

% zip -X -r test test
adding: test/ (stored 0%)
adding: test/[Content_Types].xml (deflated 84%)
adding: test/_rels/ (stored 0%)
adding: test/_rels/.rels (deflated 66%)
adding: test/docProps/ (stored 0%)
adding: test/docProps/app.xml (deflated 73%)
adding: test/docProps/core.xml (deflated 52%)
adding: test/docProps/custom.xml (deflated 60%)
adding: test/word/ (stored 0%)
adding: test/word/_rels/ (stored 0%)
adding: test/word/_rels/document.xml.rels (deflated 85%)
adding: test/word/_rels/header2.xml.rels (deflated 38%)
adding: test/word/_rels/header3.xml.rels (deflated 38%)
adding: test/word/_rels/header4.xml.rels (deflated 38%)
adding: test/word/document.xml (deflated 83%)
adding: test/word/endnotes.xml (deflated 65%)
adding: test/word/fontTable.xml (deflated 85%)
adding: test/word/footer1.xml (deflated 65%)
adding: test/word/footer2.xml (deflated 79%)
adding: test/word/footer3.xml (deflated 81%)
adding: test/word/footer4.xml (deflated 81%)
adding: test/word/footnotes.xml (deflated 65%)
adding: test/word/header1.xml (deflated 70%)
adding: test/word/header2.xml (deflated 64%)
adding: test/word/header3.xml (deflated 64%)
adding: test/word/header4.xml (deflated 64%)
adding: test/word/media/ (stored 0%)
adding: test/word/media/image1.jpeg (deflated 72%)
adding: test/word/media/image2.jpeg (deflated 61%)
adding: test/word/numbering.xml (deflated 96%)
adding: test/word/settings.xml (deflated 59%)
adding: test/word/styles.xml (deflated 89%)
adding: test/word/theme/ (stored 0%)
adding: test/word/theme/theme1.xml (deflated 79%)
adding: test/word/webSettings.xml (deflated 34%)

Darn, thought I had it with that resource fork idea.
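The listing above shows one difference. Because zip was run on the test folder from its parent directory, every entry carries a test/ prefix. An OPC package expects [Content_Types].xml and _rels/.rels at the root of the archive, so Word would presumably not find them here. The following Python sketch repackages the parts at the root. The function name is hypothetical. Skipping .DS_Store and ._ AppleDouble files is a precaution for OS X, not something tested in this thread.

```python
import os
import zipfile

SKIP_NAMES = {".DS_Store"}  # OS X Finder metadata; not a package part


def zip_package_dir(src_dir, out_path):
    """Zip the *contents* of src_dir so the parts sit at the archive root.

    [Content_Types].xml is written first, matching how Word-saved files are
    usually laid out; treat that ordering as a precaution. No directory
    entries are written, only the files themselves.
    """
    files = []
    for dirpath, dirnames, filenames in os.walk(src_dir):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if name in SKIP_NAMES or name.startswith("._"):
                continue  # skip .DS_Store and AppleDouble resource-fork files
            full = os.path.join(dirpath, name)
            # Archive name relative to src_dir, always with forward slashes.
            arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
            files.append((arcname, full))
    # Stable sort: [Content_Types].xml first, everything else keeps its order.
    files.sort(key=lambda item: item[0] != "[Content_Types].xml")
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, full in files:
            zf.write(full, arcname)
    return [arcname for arcname, _ in files]
```

For the example in this thread, zip_package_dir("test", "test.docx") would produce entries such as [Content_Types].xml and word/document.xml rather than test/[Content_Types].xml.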


I also checked your idea about files being changed in the round trip. I
extracted one file (A), put it right back in, then took it out again
(B). I then used diff to compare A and B, and found them to be
identical.


So, it definitely seems like the metadata is missing from the OPC.
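The individual parts survive the round trip unchanged, so the next thing to compare is the archive listing itself. The sketch below (hypothetical helper name) reports entry names that appear in only one of two packages and ignores directory entries. With the listing above, the rebuilt package's test/... names would appear on one side and the original's root-level names on the other.

```python
import zipfile


def compare_listings(original, rebuilt):
    """Report entry names found in only one of two packages (folders ignored)."""
    def names(package):
        with zipfile.ZipFile(package) as zf:
            return {n for n in zf.namelist() if not n.endswith("/")}

    a, b = names(original), names(rebuilt)
    return {"only_in_original": sorted(a - b), "only_in_rebuilt": sorted(b - a)}
```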

You mentioned "OPC file uses standardized entrypoints to find the
relation between the different files in the container"... did you mean
that the OPC file tracks entrypoint data internally somewhere within the
package, or that the entrypoints are simply standard file names and folder
structures that differentiate it from a POZ file?
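For reference, OPC fixes two names at the package root. The first is [Content_Types].xml, which maps parts to content types. The second is _rels/.rels, the package-level relationships part, and a consumer starts there and follows relationships to reach the main document. The sketch below follows that relationship. The helper name is hypothetical, and the sketch handles only the Transitional officeDocument relationship type and ignores percent-encoded targets.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")


def main_document_part(package):
    """Follow the package-level officeDocument relationship to the main part."""
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
        for rel in rels.iter(REL_NS + "Relationship"):
            if rel.get("Type") == OFFICE_DOC:
                # Package relationships resolve against the root, so the
                # absolute form ("/word/document.xml") just loses its "/".
                target = rel.get("Target").lstrip("/")
                if target not in zf.namelist():
                    raise KeyError(target)
                return target
    return None
```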

So close, so close...