#1
Weird characters (double question marks) in text
My client has a handful of documents that contain either double question mark or y-umlaut characters. In most cases the characters can be deleted, but they return when you save the document. We are working in Word 2003, but the documents could have been created originally in any version of Word or in another word processor.

Usually we can get rid of the characters by trying various combinations of cutting, pasting, and reformatting, but we need a more automated solution. We haven't found any way to find and replace these characters, and so far my research hasn't turned up anything predictable enough to handle with an XSLT transform.

I've tried to look for patterns and possible solutions by saving the documents as XML and then saving them back in Word format. Sometimes (but not always) the XML document will not show the characters when opened in Word. But when the XML document is then saved in Word format, the characters return.

In all the cases I've looked at, the paragraph containing the characters also contains an em space. Occasionally (but not always) replacing the em space with two regular spaces or two en spaces fixes the problem. But as soon as the em space is inserted again, the problem returns.

In most cases where saving as XML has no effect, or where the characters return upon saving in Word format, I can see a w:r node that has only one child, w:rPr. That w:rPr node always contains a w:b-cs/ child, along with various combinations of w:b/, w:i-cs/, and w:i/. There are other places within these same documents where I can see these same nodes as well as em spaces, and there are no unexpected characters at all.

Has anyone else seen this? Is it a known Word bug? Any suggestions on how I should proceed?
#2
Hi Jan,
The biggest clue here is that you are copying and pasting into your document and you don't know which program generated the documents you're copying from. Whenever you're in doubt, copy the text, then choose Paste Special and select Unformatted Text.

As long as the documents aren't huge, I would simply select the entire document, copy it, and paste it into WordPad to strip out all formatting. Then copy it from there and paste it into a new Word document. You can then format the document properly in Word.

I hope this has been helpful to you.
#3
Cutting the text in question and then pasting it as unformatted text solves the problem in only 1 of the 4 test documents I'm looking at. In the others, the text looks great right after the paste, but when I save the document the characters either come back or I get a proliferation of additional double question mark characters within the range, in different places from where they were originally.

Also, I did a bit more research with the end users who have encountered this problem, and it seems that it doesn't appear when they paste in information from other sources. Rather, the text looks and prints fine within the document for some time, and then suddenly one day the ?? characters show up within text that hasn't been changed in any other way.

Just to head off the obvious question: I did try doing an Open and Repair on these documents. It had no effect.
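The pattern described in the first post (a paragraph that contains an em space plus a w:r node whose only child is a w:rPr holding w:b-cs/) could at least be located automatically. What follows is only a diagnostic sketch, not a fix and not a confirmed explanation of the bug. It assumes the document was saved from Word 2003 as XML, that the elements use the Word 2003 WordprocessingML namespace shown in the code, and that the em space is stored as the literal character U+2003 in w:t text. The function name `find_suspect_paragraphs` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: Word 2003 "Save as XML" output (WordprocessingML 2003 namespace).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"
# Assumption: the em space is stored as the literal character U+2003.
EM_SPACE = "\u2003"


def find_suspect_paragraphs(root):
    """Report paragraphs matching the pattern from the first post.

    Returns a list of (paragraph_index, paragraph_text, empty_run_count)
    for every w:p that contains an em space and at least one w:r whose
    only child is a w:rPr that includes w:b-cs.
    """
    hits = []
    for index, para in enumerate(root.iter(W + "p")):
        # Concatenate the text of all w:t nodes in this paragraph.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if EM_SPACE not in text:
            continue
        empty_runs = 0
        for run in para.iter(W + "r"):
            children = list(run)
            # A run with formatting properties but no text or other content.
            if len(children) == 1 and children[0].tag == W + "rPr":
                if any(prop.tag == W + "b-cs" for prop in children[0]):
                    empty_runs += 1
        if empty_runs:
            hits.append((index, text, empty_runs))
    return hits
```

To try it on a saved file, parse it first, for example `find_suspect_paragraphs(ET.parse("document.xml").getroot())`, where `document.xml` is a placeholder file name. Since the first post notes that the same nodes also appear in paragraphs with no stray characters, the output should be read as a list of candidates to inspect rather than a list of confirmed problems.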