Jan Fransen
 
Weird characters (double question marks) in text

My client has a handful of documents that contain either double question
marks (??) or y-umlaut (ÿ) characters. In most cases the characters can be
deleted, but they reappear when the document is saved.

We are working in Word 2003, but the documents could have been originally
created in any version of Word or another word processor.

Usually we can try various combinations of cutting, pasting, formatting, and
so forth to get rid of the characters, but we need a more automated solution.
We haven't found any way to find and replace these characters, and so far my
research hasn't turned up anything predictable enough to use an XSLT
transform.

I've tried to look for patterns and possible solutions by saving the
documents as XML and then saving back in Word format. Sometimes (but not
always) the XML document will not show the characters when opened in Word.
But when the XML document is then saved in Word format the characters will
return.

In all the cases I've looked at, the paragraph containing the characters
also contains an em space. Occasionally (but not always) replacing the
em space with two regular spaces or two en spaces will fix the problem. But
as soon as the em space is inserted again the problem returns.
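
As a starting point for automating the check, here is a minimal Python sketch
that flags paragraphs containing an em space, a y-umlaut, or a double question
mark. It assumes the paragraph text has already been extracted into plain
strings; the function name flag_paragraphs is hypothetical, and the sketch only
reports candidates. It does not fix anything.

```python
EM_SPACE = "\u2003"   # Unicode EM SPACE
Y_UMLAUT = "\u00ff"   # LATIN SMALL LETTER Y WITH DIAERESIS


def flag_paragraphs(paragraphs):
    """Return (index, has_em_space, has_suspect_chars) for each paragraph
    that contains an em space or one of the unexpected characters."""
    hits = []
    for i, text in enumerate(paragraphs):
        has_em = EM_SPACE in text
        has_suspect = Y_UMLAUT in text or "??" in text
        if has_em or has_suspect:
            hits.append((i, has_em, has_suspect))
    return hits
```

Paragraphs flagged with an em space but no suspect characters would be the
"clean" cases, which might be useful for comparison.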

In most cases I've looked at where saving as XML has no effect or where the
characters return upon saving in Word format, I can see a <w:r> node that has
only one child, <w:rPr>. That node always contains a <w:b-cs/> child, along
with various combinations of <w:b/>, <w:i-cs/>, and <w:i/>.
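
The pattern described above could be located automatically with something like
the following Python sketch. It assumes the document was saved as Word 2003 XML
(WordprocessingML, namespace
http://schemas.microsoft.com/office/word/2003/wordml) and uses only the standard
library. The function name find_property_only_runs is hypothetical. It only
lists matching runs; whether removing them cures the problem is untested.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{" + W_NS + "}"


def find_property_only_runs(root):
    """Return a list of property-name lists, one per w:r element whose
    only child is w:rPr and whose w:rPr contains w:b-cs."""
    hits = []
    for run in root.iter(W + "r"):
        children = list(run)
        # The run must hold run properties and nothing else (no w:t text).
        if len(children) != 1 or children[0].tag != W + "rPr":
            continue
        rpr = children[0]
        if rpr.find(W + "b-cs") is None:
            continue
        # Record the local names of the properties, e.g. ["b", "b-cs"].
        hits.append([prop.tag[len(W):] for prop in rpr])
    return hits
```

For a file on disk, the root would come from ET.parse(path).getroot(), where
path is the saved XML document.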

There are other cases within these same documents where I can see these same
nodes as well as emspaces and there are no unexpected characters at all.

Has anyone else seen this? Is it a known Word bug? Any suggestions on how I
should proceed?