Microsoft Office Word Forum - WordBanter - View Single Post - mail merge http-sourced data

Peter Jamieson

An aside, the open data source dialog includes "*.htm, *.html, *.asp" as a
supported type -- any idea what format it's expecting for that? I tried
plain text and HTML table, but got no love.

Word can indeed use a table in an HTML format file as a data source. By
default it uses the Jet/ACE OLE DB provider and the HTML IISAM, but it may
also try its internal text converter in certain cicumstances and you may be
able to use the same syntax as you were using with the .csv to do that too.

I can't say I've explored this in depth but the only experiments I have done
are with very plain handcrafted HTML, with nothing much except table, tr
and td tags (the html and body tags appear to be optional). I've never
tried with a .asp generating the HTML and cannot do so right now. Nor have I
tried .mht (which as far as I can tell is actually some sort of MIME-encoded
HTML)

some stuff from one of my previous messages...

I'm not sure you can use the text file provider + SCHEMA.INI for that -
MS's documentation suggests that the text file provider and the HTML
provider are the same thing, but I think you have to specify different
extended properties in yor connection string, i.e.

Extended Properties=""HTML Import;HDR=YES;"";"
or something like
Extended Properties="HTML Import;HDR=YES;"
in a .odc file.

(The engine type, 112, is different from the text file engine type, 96,
but in this case it seems that you need the extended properties rather
than the engine type).

If the file is also encoded usig a particular character set, I don't
know how you would specify that. I was never sure how international
character set stuff was handled in earlier HTML versions anyway.

FWIW
1. the header row can be defined as
trthfield1/ththfield2/ththetc./th/tr
or
trtdfield1/tdtdfield2/tdtdetc./td/tr

2. If you have more than one table, the provider uses the name in any
caption tag in the table if there is one, but for "uncaptioned" tables
it seems to name the tables

table
table1
table2

etc.

You may need to name the table in the .odc or in the SQLStatement
parameter of the OpenDataSource call

(e.g.
SELECT * FROM [table]
)

As far as I can see, this all works with simple examples but I've never
used it "for real". You appear to need a .odc when the data gets more
complex (perhaps if there is more than one table or you named a table
with a caption), but you may be able to specify the .htm file as the
Name parameter and put everything else in the Connection parameter (and
possibly the SQLStatement paramter) in simpler cases.

Addendum: [I probably should just delete this paragraph but I can't
resist...] Ya know, if Word opened mail merge data connections in a way
that was read-only, dangerous SQL would be a non-issue -- oh wait, I
forgot, current MS data clients are only capable of restricting
writability at the cursor level, "read-only" connections are still free to
execute DML that writes data... But still, the inherent and absolute
non-writability of CSV text delivered via HTTP (no freakin less) ought to
factor in... Ah screw, clearly they have a vested interest in "security
via beating the user over the head with incessant popup warnings," who am
I to question that? :-)

Yeah, I try to resist those sort of observations because unless you can
connect to the right person in MS at the right moment, nothing will ever
happen. That said, MS seems to respond to all reports of security flaws by
"fixing" them, but
a. in many cases, these appear to be "lawyers' fixes", which IMO allow
someone to claim that any problems that arise are someone else's fault, but
actually encourage an environment where users routinely dismiss warning
messages
b. I have never understood it is Word's responsibility to prevent access to
other data files. It seems to me that it is the responsibility of the
operating system and applications such as SQL Server to provide facilities
to allow data providers to define access to data, and for users and system
buiders to use them. And of course both Windows and SQL Server do provide
such facilities - it's just that it seems that Word cannot be left to rely
solely on them.
c. I can just about understand why things such as DATABASE and INCLUDETEXT
fields - which that could suck lots of data into a document without the user
even realising - were nobbled a few years ago to prevent the resulting data
from being distributed by botnets etc., but
- even in those cases, it's not completely obvious why it's /Word's/
responsibility to prevent the data being accessed.
- that would also seem to apply to Word 2007 documents which could
potentially contain loads of Custom XML data that the user never sees
either. No-one seems to have spent much time worrying about that.

I guess in the end, Microsoft had to assume that there are simply too many
systems out there with data that has not been properly secured and decided
to impose a "nanny state" solution. But I really don't know.

--
Peter Jamieson
http://tips.pjmsn.me.uk

"Mark McGinty" wrote in message
...
Peter,

Thank you very much for the informative response (you far-exceeded my
expectations!) :-) That registry option does indeed dispense with all
the prompts! Very cool! :-)

Oddly enough we do have the capacity to implement registry changes -- we
currently establish IE trust and adjust settings for the trusted zone on
client systems by asking them to run an executable that directly writes
the registry. (The client-specific domain name is incorporated into the
name of the .exe as a sort-of chicken parameter, to make a single version
of the code able to handle any client domain, by merely copying it to a
different file name.) Without that our current Office automation + FSO
scheme would be doomed from the start.

Ironically that little trust coersion mechanism was one of the things we
were hoping to forego with this change -- but this option is surely the
lesser of two evils. If we could do away with permitting Script ActiveX
not marked as safe [for trusted zone], and write-access to the client file
system as requirements I'd call it major progress. More, even without the
registry change, it does still function (as opposed to our current
scheme.) Dispensing with the annoying prompts would be icing on the cake,
rather than lack of registry changes being a show-stopper by default.
More still, the change is configuration-agnostic, a simple .reg file would
do it.

An aside, the open data source dialog includes "*.htm, *.html, *.asp" as a
supported type -- any idea what format it's expecting for that? I tried
plain text and HTML table, but got no love. I figured, sure an html file
could be local, but .asp, for all practical purposes, would be a useless
option if not via HTTP. I assumed this option almost insured workability
of HTTP as a transport... but needless to say, I was disappointed.

In any case, your reply was very helpful, thanks again!!

-Mark

Addendum: [I probably should just delete this paragraph but I can't
resist...] Ya know, if Word opened mail merge data connections in a way
that was read-only, dangerous SQL would be a non-issue -- oh wait, I
forgot, current MS data clients are only capable of restricting
writability at the cursor level, "read-only" connections are still free to
execute DML that writes data... But still, the inherent and absolute
non-writability of CSV text delivered via HTTP (no freakin less) ought to
factor in... Ah screw, clearly they have a vested interest in "security
via beating the user over the head with incessant popup warnings," who am
I to question that? :-)

"Peter Jamieson" wrote in message
...
You did well to connect to a data source that's an HTTP URL - FWIW, I had
never found a way to do it within the Word UI or using VBA
OpenDataSource, but specifying the URL in the XML works here too.

Unfortunately, that seems to be about as good as it gets. The following
may tell you something, but it's hard to see how you could actually
achieve your objective.

At least one of the prompts is probably the SQL-related prompt that
requires a registry fix on the user's machine:

http://support.microsoft.com/kb/825765

So that's not something you can modify on behalf of your site users
(uness they are all actually on your own corporate network, in which case
I suppose it is at least feasible).

You used to be able to avoid that prompt using RTF and removing the SQL,
but
a. there was a time when Word would discard any mailmerge info. in RTF
format files
b. Microsoft seemed to close off that possibility in an update a year or
so ago.
c. I don't know whether the equivalent approach ever worked in XML.

However, I don't know what other messages you are seeing. Here, the most
significant difference in behaviour is to do with whether the URL where
the .csv is located is trusted by Internet Explorer. If it isn't, Word
won't connect. If it is, Word will connect, but my experience with Word
2007/Vista suggests that it is always actually dealing with a downloaded
copy "behind the scenes" as you suggest. And again, unless you have
control over what URLs your users' system trust, there is no simple fix
for that.

Even the new Word 2007 approach of using content controls and putting all
the data in a custom XML data store isn't workable unless you can
guarantee that your users can download and execute a complete VSTO-type
Word document with the .NET code to point to each record in turn and
produce tear-off copies of the document.

Best I can do, sorry.

Peter Jamieson

http://tips.pjmsn.me.uk

Mark McGinty wrote:
env: Word 2003+

Greets,

I'm trying to do something that, on the surface, seemed pretty basic: I
need to serve-up a Word mail-merge document (as XML) over HTTP, that is
already linked to data that is also served over HTTP.

The closest I have gotten is to link .csv data by setting the attributes
(defined using XPath) w:wordDocument/w:docPr/w:mailMerge/w:query@w:val
and w:wordDocument/w:docPr/w:mailMerge/w:dataSource@w:val to a URL.
(w:query@w:val must be prefixed with "SELECT * FROM"; oddly, both must
be set.)

This actually does work, except that it prompts annoyingly, 2 to 3
times. The first time I could accept (the prompt references the URL) but
the "extra" prompts make the functionality look dumb (they apparently
reference some locally cached alias -- from a user standpoint, it looks
really lame.)

To summarize, we want to provide a Word mail merge all pre-prepared, no
user intervention required, as a feature of a web site. We don't want
to expose any SQL Servers to the public Internet (for obvious reasons)
and we absolutely do NOT want to use Jet (not explicitly, anyway.) Our
current solution (which had to support Word 2000) involves use of Office
automation objects and the scripting file system object -- difficult to
support to say the least.

At one point it looked to me like there was a way to embed the merge
data inside the [xml] document itself, though I can't find a way to make
that happen via Word's UI. (Otoh, I couldn't get Word to persist a
URL-based data source via UI either, and that nominally works, soooo...)

Any suggestions appreciated.

-Mark McGinty