#1
Ken Benson

Save As Encoded Text, Unicode characters save differently

I'm trying to save Word files containing Unicode characters to plain text so
I can clean them up and code them as Indesign Tagged Text. Save As Encoded
Text actually works nicely, except that certain characters sometimes save
successfully, and sometimes get converted to a single open parenthesis. I've
posted a tiny test file showing this at http://www.pegtype.com/test.doc.

The mystery is that the same character in two places in the same file
converts differently.

Thanks for any help.
Ken Benson


#2
Bob Buckland ?:-)

Hi Ken,

Interestingly, while the Reveal Formatting Task Pane
in Word did not show any formatting differences between
your paragraph #1 and #'s 2-5, using File=Web Page Preview
and View=Source did show differences, and those are
apparently enough to confuse the plain text converter.

If you turn on the [x] Show Paste Options button in
Tools=Options=Edit, then select all of paragraph 1,
cut it (Ctrl+X), repaste it (Ctrl+V), and from the icon
select 'keep text only', it corrects the problem.

Copying the similar two characters from items 2-5 and
pasting them over the problem ones in item 1 also corrected
the problem.

--
Let us know if this helped you,

Bob Buckland ?:-)
MS Office System Products MVP

*Courtesy is not expensive and can pay big dividends*

Office 2003 Editions explained
http://www.microsoft.com/uk/office/editions.mspx




#3
Ken Benson

Hi Bob

Thanks for looking into this.

Show Paste Options is probably an option for a newer version of Word (I've
got Word 2000), but I seem to be able to accomplish the same thing by using
Paste Special|Unformatted Unicode Text.

This is a workable solution. Do you have any idea how the original author
(I'm several steps removed from him) could have accomplished this?

Thanks again,
Ken Benson


#4
Ken Benson

Hi Bob

I came up with an even better solution than "cut/paste as text". I installed
OpenOffice, opened my problem file there, resaved it, and then opened it
again in Word. All the problem characters were converted nicely. This method
both fixes the problem and keeps the formatting.

Thanks for your help,
Ken Benson


#5
Klaus Linke

"Ken Benson" wrote:
Do you have any idea how the original author (I'm several steps
removed from him) could have accomplished this?



Hi Ken,

He probably used "Insert Symbol", and the font is a symbol
("decorative") font like Symbol or Wingdings.

Since you don't want the symbols to change if you change the font (say
by applying another one, or applying another style), Word inserts those
symbols as a kind of symbol field. But it will never show you the field
code, only the result.

The effect is as desired: the field can have any font applied, but the
character will still be inserted from the font specified in the field.
The drawback is that Word usually can't tell you the code or font.
AscW(Selection.Text) will return the code 40, i.e. "(" (opening
parenthesis), and the font dropdown will show you the font of the
surrounding text.
If you select the symbol and open "Insert Symbol" again, Word can
usually tell you the font and the code.
This doesn't always work, though. Word expects the codes to be in the
range from U+F000 to U+F0FF. But since symbol fields don't change if you
add or subtract multiples of 256 from that code, you often get files
with messed up symbols.
The WordPerfect import filters seem to be a special culprit in this
regard, but I suspect some other filters aren't working correctly
either.
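
The code arithmetic described above can be sketched in Python. This is only an illustration, assuming (as the paragraph above suggests) that just the low byte of the code decides which glyph is shown. `normalize_symbol_code` and `in_expected_range` are hypothetical helper names, not part of Word or of any library.

```python
# Sketch: fold a symbol code back into the U+F000..U+F0FF range Word expects.
# Assumption (taken from the post): adding or subtracting multiples of 256
# does not change the symbol, so only the low byte of the code matters.

PUA_BASE = 0xF000  # start of the range Word expects for symbol codes


def in_expected_range(code: int) -> bool:
    """True if the code already lies in U+F000..U+F0FF."""
    return 0xF000 <= code <= 0xF0FF


def normalize_symbol_code(code: int) -> int:
    """Return the equivalent code inside U+F000..U+F0FF (hypothetical helper)."""
    return PUA_BASE + (code % 256)


if __name__ == "__main__":
    # Three codes that differ only by multiples of 256.
    for code in (0xF0B7, 0x00B7, 0xF1B7):
        fixed = normalize_symbol_code(code)
        print(f"{code:#06x} in range: {in_expected_range(code)} -> {fixed:#06x}")
```

A file whose symbol codes fall outside the expected range would, under this assumption, show the same glyphs once every code is folded back this way.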

If you only need unformatted text, "Paste Special as text" is a good
solution.
If you need to keep the formatting, I have posted a macro to turn
"proper symbol fields" into regular characters... google for
"SymbolsUnprotect".
For messed up symbol "fields", I haven't found a good solution, since
you'd have to do a lot of processing for each character, which takes too
long even on moderately sized files.
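
Once symbols have become regular characters, a plain-text export may still contain codes in the U+F000 to U+F0FF range rather than real Unicode letters. The Python sketch below shows one way such text could be post-processed before tagging it for InDesign. It assumes the exported text keeps those codes and that the font was Symbol; the table is deliberately partial, and `replace_symbol_pua` is a hypothetical name.

```python
# Sketch: replace Symbol-font private-use codes with real Unicode characters.
# Assumptions: the text still contains U+F000..U+F0FF codes, and they came
# from the Symbol font. The table covers only a few Greek letters.

SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # a -> Greek small alpha
    0x62: "\u03b2",  # b -> Greek small beta
    0x67: "\u03b3",  # g -> Greek small gamma
    0x64: "\u03b4",  # d -> Greek small delta
    0x70: "\u03c0",  # p -> Greek small pi
}


def replace_symbol_pua(text: str) -> str:
    """Map known U+F0xx codes to Unicode; leave everything else untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xF000 <= cp <= 0xF0FF:
            # Unknown positions are kept as-is so they can be found later.
            out.append(SYMBOL_TO_UNICODE.get(cp - 0xF000, ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "angle \uf061 plus \uf062"
    print(replace_symbol_pua(sample))
```

Characters from other symbol fonts such as Wingdings would need their own tables, and leaving unmapped codes in place makes them easy to search for afterwards.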

Greetings,
Klaus


#6
Klaus Linke

Hi Ken,

Nice!!!

I just tried it on a file with "messed up" symbols.
That file had given me trouble last week, because the symbols were
messed up in InDesign, and I couldn't fix it in Word.
I had resorted to editing the RTF code "by hand" with Find/Replace.

OO Writer 1.9.97 fixed the symbols fine. I'd keep it installed if only
for that reason.

Thanks for the tip!
Klaus





#7
Ken Benson


"Klaus Linke" wrote in message
...
OO Writer 1.9.97 fixed the symbols fine. I'd keep it installed if only
for that reason.

Yes, I think I'll just be running all my bizarre author files through
OpenOffice from now on.

Ken Benson

