CSV merge data via HTTP: redundant downloads?

Peter Jamieson

I can't answer your question, but could not get to the point where Word
would open my csv from an http address so could not look at the traffic.

Maybe you could let me know precisely what changes you ended up making
in your XML. Here, I changed a local-drive address, file:///c:\a\csv.csv,
to http://www.somewhere.com/csv.csv in two places:
a. a relationship element with
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/mailMergeSource"
b. the SELECT statement in the w:query element
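
The two changes listed above could be scripted roughly as follows. This is only a sketch, not the code either poster used: it assumes the relationship sits in a package relationships part and that the query text is held in the w:val attribute of w:query. The function names are made up for illustration, and ElementTree may rewrite namespace prefixes when it re-serialises a complete settings part.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MERGE_SOURCE_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/mailMergeSource"
)

# Keep the usual prefixes when the XML is written back out.
ET.register_namespace("", RELS_NS)
ET.register_namespace("w", W_NS)


def retarget_merge_source(rels_xml: str, new_address: str) -> str:
    """Change (a): point every mailMergeSource relationship at new_address."""
    root = ET.fromstring(rels_xml)
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == MERGE_SOURCE_TYPE:
            rel.set("Target", new_address)
    return ET.tostring(root, encoding="unicode")


def retarget_query(settings_xml: str, old_address: str, new_address: str) -> str:
    """Change (b): swap the address inside the SELECT held by w:query."""
    root = ET.fromstring(settings_xml)
    val_key = f"{{{W_NS}}}val"
    for query in root.iter(f"{{{W_NS}}}query"):
        query.set(val_key, query.get(val_key, "").replace(old_address, new_address))
    return ET.tostring(root, encoding="unicode")
```

Both functions take and return XML text, so they can be tried on parts extracted from a copy of the document before anything is written back.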

FWIW I don't look at the protocol side much so cannot guarantee to
interpret what I see correctly, but I do remember having a conversation
some years ago about another aspect of the way that Word dealt with http
addresses etc., where Word also seemed to insist on a pretty laborious
process (I think it was to follow a HYPERLINK field).

In this case, I notice that when I open the document, Word first tries
to issue a WebDAV(?) PROPFIND, fails, then

issues a GET and receives a 304 Not Modified,
issues a HEAD and a GET and receives a 304 Not Modified,
issues a HEAD and a GET and receives a 304 Not Modified,
issues a HEAD
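
For context on the Not Modified replies in that sequence: in HTTP this is status 304, which a server sends in answer to a conditional GET when the client's copy is still current, so the body is not sent again. The sketch below shows the usual If-Modified-Since comparison. It is an illustration of the general rule only, not of what this particular server or Word does, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for_get(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200.

    last_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200  # unconditional GET: send the full body
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the full body
    if client_time.tzinfo is None:
        # A "-0000" zone parses as naive; HTTP dates are always UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200
```

If every GET after the first comes back as 304, the repeated requests cost round trips but not repeated transfers of the CSV itself.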

Peter Jamieson

http://tips.pjmsn.me.uk

On 21/02/2010 14:47, Mark McGinty wrote:
Greets,

I have written an app that programmatically connects a mail merge [WordXML]
document to a source of CSV data by altering the XML to specify a URL as the
data source, and adding the appropriate field definitions. (The UI won't
let you do this, even though it offers .ASP files as a possible data
source -- go figure.)

This all works nicely, but there's one little quirk: when the user opens the
document, Word downloads the data three or four times. I watched it
happen using Wireshark: it sends the same request to the server (and thus
downloads the same data) multiple times.

As a matter of course this sort of thing is set to expire immediately, but
in this case I set it to expire 5 minutes in the future, hoping to leverage
the browser cache -- but no love! Apparently Word's HTTP request bypasses
the cache. (I verified that the headers returned by the server permitted it
to be cached, again using Wireshark; the response is definitely cacheable.)
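
For reference, a five-minute expiry of the kind described above is normally expressed with headers like the ones built below. This is a minimal sketch under assumptions: the post does not say which headers the server actually sent, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(now: datetime, lifetime: timedelta = timedelta(minutes=5)) -> dict:
    """Build response headers that let a cache reuse the reply for `lifetime`.

    `now` must be an aware datetime in UTC, as HTTP dates are always GMT.
    """
    seconds = int(lifetime.total_seconds())
    return {
        "Date": format_datetime(now, usegmt=True),
        # Expires is the older mechanism; max-age takes precedence where both appear.
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"max-age={seconds}",
    }


if __name__ == "__main__":
    for name, value in cache_headers(datetime.now(timezone.utc)).items():
        print(f"{name}: {value}")
```

Headers like these only permit caching; whether a given client consults a cache before sending its request is up to the client.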

A rational reason for this behavior exceeds the limits of my imagination!
You'd think that if there were even a chance of redundant downloads, they
would at least respect the cache control headers sent by the server, but
nooooo. Cache would've made it barely noticeable...

Anyways, the problem isn't a show-stopper, but the needless waste of server
resources (oh yeah, and end-users' time) is always annoying... hoping maybe
someone here has some insight.

TIA,
Mark McGinty