#1
Posted to microsoft.public.word.mailmerge.fields
How do I tell Word 2003 my mail-merge text is NOT UTF-8?
I am using Word 2000 automation to invoke a mail merge from a number of C++ applications. For various operational reasons not relevant here, the applications create a tab-separated temporary file and call a library function, which uses the OpenDataSource method of the MailMerge object to connect the file to the template and then calls Execute to perform the merge. The version of Word actually in use is Word 2003.

It all works fine in the majority of cases, but one of the merge fields is 8-bit text used to print an i2of5 barcode with a special font - in this format each 8-bit character encodes 2 barcode digits. Every now and again the value in this field gets truncated. The truncation is triggered by certain values in the first data line of the file (i.e. the second line, as the first contains the field names), and thereafter applies to every instance of the field in the data source. Moving the offending data line further down the file "cures" the problem, but is not an option in practice.

By dropping the offending field into a simple text file and opening it interactively with Word, I have established that Word is wrongly guessing the file (or maybe only the field?) to be encoded in UTF-8 - it brings up an interactive dialogue which shows the guessed encoding and previews the text as truncated. Selecting other encodings displays the text in various other ways, but (critically) does not truncate it. Similarly, most values for the barcode text are guessed by Word to be in various encodings which do not truncate.

I have tried explicitly setting the Format parameter (type WdOpenFormat) of OpenDataSource to wdOpenFormatText, but it makes no difference - in fact, whatever value is in this parameter seems to be ignored.

Is there any other way I can get these characters passed through without being corrupted, other than by writing the data source file in 16-bit Unicode in the first place? I am reluctant to do this as it would mean either changing many programs or making the library routine transcribe the entire data source file, which can be quite large.

For the record, a value of the text string which causes problems is "(7åÓ)" (i.e. hex 28 37 e8 cc 4a 29), whereas values such as "(7èÌJ)" (i.e. hex 28 37 e5 3c d3 29) work OK. My PC is running Windows XP SP2 with UK English regional settings (locale ID 0x0809).

Any help anyone can offer will be much appreciated.

Regards,
Jolyon
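The two hex sequences quoted in the post can be checked outside Word. The following is a minimal sketch, not a reproduction of Word's detection logic: it tests whether each sequence is well-formed UTF-8 and how many characters it yields under a single-byte code page. The function names are hypothetical, and cp1252 is an assumption about the code page in use on a UK English Windows XP machine.

```python
# Byte sequences copied from the hex values quoted in the post.
SAMPLES = {
    "reported as truncated": bytes.fromhex("2837e8cc4a29"),
    "reported as OK": bytes.fromhex("2837e53cd329"),
}


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_single_byte(data: bytes, codepage: str = "cp1252") -> str:
    """Decode with a single-byte code page (assumed cp1252): one byte, one character."""
    return data.decode(codepage)


for label, raw in SAMPLES.items():
    text = decode_single_byte(raw)
    print(f"{label}: {raw.hex(' ')} | strict UTF-8: {is_strict_utf8(raw)} | chars: {len(text)}")
```

Neither sequence is well-formed UTF-8, and both keep all six characters under a single-byte code page. That is consistent with Word making a heuristic guess from the first data line rather than validating the bytes, which is why a fixed default code page is the kind of setting worth looking for.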
#3
Posted to microsoft.public.word.mailmerge.fields
BTW, to use the .odc sample I reference, you will probably need to replace the reference to the ACE OLE DB provider in the connection string with the Jet one, i.e. replace Microsoft.ACE.OLEDB.12.0 with Microsoft.Jet.OLEDB.4.0

Peter Jamieson

"Peter Jamieson" wrote in message ...
> > I am using Word 2000 automation to invoke a mail-merge from a number of C++ applications. For various operational reasons not relevant here, the applications create a tab-separated temporary file and call a library function which uses the OpenDataSource method of the MailMerge object to connect it to the template before using Execute to perform the merge. The version of Word actually in use is Word 2003.
>
> Is it Word 2000, as you first mention, or Word 2003? From your description it sounds like 2003, but there are significant differences.
>
> It is probably worth seeing if setting the DefaultCPG registry value described in http://support.microsoft.com/kb/290981/en-us helps. (It's also possible that opening the document in Word with an explicit encoding, saving it as a Word document, then using that as the data source for a merge, as described in that article, might do the trick.)
>
> If your data source has 255 columns or fewer, and you are using Word 2003, you can try the approach using .odc and SCHEMA.INI that I described in the conversation beginning at http://groups.google.com/group/micro...unicode&rnum=1
>
> If you are using Word 2000, that can't work because it doesn't support OLE DB and .odc files. Although you can use ODBC to connect using a similar SCHEMA.INI, the ODBC driver does not seem to recognise all the entries in the .INI file that the OLE DB provider does, and it doesn't have the same character encoding support anyway AFAIK.
>
> In this particular case, it's possible that the 8-bit encoding will screw up any encoding choices you make anyway, if Word does recognise the characters as being part of the character set implicitly or explicitly specified. However, you can but try.
>
> Peter Jamieson
#4
Posted to microsoft.public.word.mailmerge.fields
Thanks for the reply. In answer to your points:

1) We are using the Word 2000 automation interface because that is what is supported by the tool in which the programs are developed (Borland Developer Studio 2006). In this version, for instance, the OpenDataSource() method has only 14 parameters rather than the current 16. However, we actually have Office 2003 installed because this fixes some (though not all) other problems to do with mail-merge.

2) I tried adding the registry key mentioned in the KB article - for Word 11 as well as Word 10 - but it makes no difference.

3) I don't think your suggestion for using .odc is feasible - the items in the data source come from a variety of places. Many of them are retrieved from an Ingres database via ODBC, but some are created on the fly by application code. Anyway, I do not have the resources to rewrite and re-test all the affected applications, not to mention retraining all the developers.

All I am trying to do is prevent Word from making wild (and wrong) guesses about the content of a mail-merge data source - surely this is not an unreasonable thing to expect? I do note that even the latest implementation of OpenDataSource() does not have an encoding parameter as Documents.Open() has - this seems to be the real problem.

I will keep plugging away and let you know of any progress...

Jolyon
#5
Posted to microsoft.public.word.mailmerge.fields
> All I am trying to do is prevent Word from making wild (and wrong) guesses about the content of a mail-merge data source - surely this is not an unreasonable thing to expect? I do note that even the latest implementation of OpenDataSource() does not have an encoding parameter as Documents.Open() has - this seems to be the real problem.

Yes, I would also prefer it if Word let you specify the encoding and everything could be kept very simple, but unfortunately
a. I don't work for Microsoft - I'm just a volunteer - so I am also stuck with the way Word actually works.
b. the .odc approach is the only one I know that has a chance of solving the specific problem you described in a reasonably simple way.

(FWIW
c. several of the parameters in OpenDataSource have no effect and are probably only there because someone in the WordBasic era decided that OpenDataSource would probably need much the same parameters as Open.
d. Arguably the whole problem with OpenDataSource, ODBC and OLE DB is that between them they don't abstract the business of opening an arbitrary data source anything like well enough. For example, even if you had a character encoding parameter, Word would have to know that it had to provide it to its external text converter via one mechanism, and to OLE DB via another, and that would be data-source-dependent. For example, when opening a .txt file via OLE DB there is no parameter you can specify in the connection string that says "use this character encoding".)

> 3) I don't think your suggestion for using .odc is feasible - the items in the data source come from a variety of places. Many of them are retrieved from an Ingres database via ODBC, but some are created on the fly by application code. Anyway, I do not have the resources to rewrite and re-test all the affected applications, not to mention retraining all the developers.

OK, but in your original post you described a specific situation where you were creating a tab-separated temp file with barcode data and using that as your data source - in that case I would hope that you would be able to limit the use of .odc to the specific situation where you are creating the data source on-the-fly. If you're connecting on-the-fly using a library routine under your control then if absolutely necessary you could consider adding a .odc and an entry in a SCHEMA.INI on-the-fly as well. As far as the .odc is concerned, it is in effect just a text file with the path name of the text file's folder and the file name of the .txt file, i.e. no nasty binary stuff to create, and the SCHEMA.INI is a standard .INI with one section per file.

> 1) We are using the Word 2000 automation interface because that is what is supported by the tool in which the programs are developed (Borland Developer Studio 2006). In this version, for instance, the OpenDataSource() method has only 14 parameters rather than the current 16. However, we actually have Office 2003 installed because this fixes some (though not all) other problems to do with mail-merge.

OK, I'm not sure whether using the Word 2000 interface would make any difference as far as encoding issues are concerned, unless it prevented you from using the OLE DB connectivity available in Word 2003 (in which case you could not use the .odc approach), /or/ you needed to connect using DDE or ODBC and the inability to specify the Subtype parameter prevented you from doing that. I would have thought that with Borland you would be able to specify a .olb or .tlb with the correct parameter list if necessary, somehow or other (with the older versions of Delphi, for example, there was support for the dispatch method of Automation and I don't think the compiler did any type checking at all, but that approach may be significantly harder to use in more recent versions, and/or with C++ rather than Delphi).

Peter Jamieson
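Since the SCHEMA.INI is described above as a standard .INI with one section per file, a library routine could write the entry alongside the temporary data file. The sketch below is illustrative only: the function name and file name are hypothetical, and the entry names (Format, ColNameHeader, CharacterSet) are taken from the Jet text driver's schema file format rather than from this thread. Whether Word then reads the barcode bytes unchanged through the .odc route is untested here.

```python
import configparser
import tempfile
from pathlib import Path


def write_schema_entry(folder: Path, data_file: str, character_set: str = "ANSI") -> Path:
    """Add or replace the SCHEMA.INI section for one text file in `folder`.

    Assumes ASCII file names; `character_set` is passed through as given.
    """
    schema_path = folder / "SCHEMA.INI"
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the entry names' capitalisation
    if schema_path.exists():
        parser.read(schema_path, encoding="ascii")
    # One section per data file, named after the file itself.
    parser[data_file] = {
        "Format": "TabDelimited",
        "ColNameHeader": "True",
        "CharacterSet": character_set,
    }
    with schema_path.open("w", encoding="ascii", newline="\r\n") as handle:
        parser.write(handle, space_around_delimiters=False)
    return schema_path


# Usage: write an entry for a hypothetical temp file and show the result.
work_dir = Path(tempfile.mkdtemp())
schema = write_schema_entry(work_dir, "merge_data.txt")
print(schema.read_text(encoding="ascii"))
```

Reading any existing SCHEMA.INI first means entries for other files in the same folder are preserved, which matters if several applications share one temporary directory.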
#6
Posted to microsoft.public.word.mailmerge.fields
Peter,

My apologies - on re-checking, I noticed that I had inadvertently applied the default code page registry fix under HKLM instead of HKCU. Doing it under HKCU fixes the problem. Many thanks for the insight - I will re-rate your original reply. Just for the record, I have replied below to your latest points. Many thanks again.

Jolyon

"Peter Jamieson" wrote:
> Yes, I would also prefer it if Word let you specify the encoding and everything could be kept very simple, but unfortunately
> a. I don't work for Microsoft - I'm just a volunteer - so I am also stuck with the way Word actually works.
> b. the .odc approach is the only one I know that has a chance of solving the specific problem you described in a reasonably simple way.

Fair enough - sorry, didn't mean to take out my frustrations on you...

> As far as the .odc is concerned, it is in effect just a text file with the path name of the text file's folder and the file name of the .txt file, i.e. no nasty binary stuff to create, and the SCHEMA.INI is a standard .INI with one section per file.

OK - maybe I've misunderstood what is involved here - it's not an area I'm familiar with...

> I would have thought that with Borland you would be able to specify a .olb or .tlb with the correct parameter list if necessary, somehow or other

Yes, of course I could have faked up an interface definition for the latest version if it would have fixed it, though it would be a bit tedious and harder to maintain.
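For anyone wanting to script the fix that worked here, the sketch below builds the text of a .reg file that sets DefaultCPG under HKCU. The thread confirms only the value name, the HKCU hive, and that Word 10 and Word 11 were tried. The key path (Software\Microsoft\Office\<version>\Word\Options), the DWORD type, and the code page 1252 are assumptions that should be checked against the KB article before use, and the function name is hypothetical.

```python
def default_cpg_reg_text(code_page: int = 1252, office_versions=("11.0",)) -> str:
    """Build .reg file text that sets DefaultCPG per user for each Office version.

    The key path and DWORD type are assumptions; verify them against KB 290981.
    """
    lines = ["Windows Registry Editor Version 5.00", ""]
    for version in office_versions:
        # HKEY_CURRENT_USER, not HKEY_LOCAL_MACHINE: the thread found HKLM had no effect.
        lines.append(rf"[HKEY_CURRENT_USER\Software\Microsoft\Office\{version}\Word\Options]")
        lines.append(f'"DefaultCPG"=dword:{code_page:08x}')
        lines.append("")
    return "\r\n".join(lines)


# Usage: text for Word 2002 (10.0) and Word 2003 (11.0).
print(default_cpg_reg_text(office_versions=("10.0", "11.0")))
```

Generating the text rather than writing to the registry directly keeps the change reviewable, and the resulting file can be imported per user, which matters because the setting is read from the current user's hive.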