MLeditor_Dana
 
How to clean .rtf documents

My company uses .rtf documents as "raw" documents (revised annually) that are
subsequently converted (via a special in-house conversion program) to
specially coded HTML documents for use on our web site. Over the past 2
years or so, we have noticed more than ever that our conversion program is
tripping over the extraneous code that gets inserted automatically into .rtf
documents as they are opened and edited by various people. The code causes
our conversion program to skip important sections of text that should be
recognized and linked, and occasionally a file gets so cluttered with
this code that the conversion program rejects it altogether!

Currently, the only way I know to fix the problem is to save the entire
document as a .txt file, resave it as .rtf, and then reapply all the
formatting that was lost (bold, italics, etc.). Even then, I sometimes have to
re-open the file as .txt and manually hunt for the offending code! We have
over 1200 documents that I reconvert 4 times a year, so I'm wasting a ton of
time.

Is there a way to "clean" this code? Also, is there a way to turn it off
so that it doesn't get entered in the first place? (I believe
much of it is RSID code that records document changes and
versions. Much of our revision process involves pasting text from other
documents, so I'm sure some of this code is getting entered as a result of
the copy/paste.)

For example, this:
2-Methyl-3-hydroxybutyryl-}{\insrsid8069086
coA}{\insrsid8069086\charrsid1904895 dehydrogenase deficiency is a rare
X-linked organic aciduria with a highly unusual \'93neurodegenerative\'94
disease

should simply be this:
2-Methyl-3-hydroxybutyryl-coA dehydrogenase deficiency is a rare X-linked
organic aciduria with a highly unusual "neurodegenerative" disease
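
A minimal sketch of what such a cleanup pass might look like, in Python. This is only an illustration, not part of the in-house conversion program. It assumes that the unwanted control words are the ones whose names end in "rsid" followed by digits (\insrsid, \charrsid and the like), and that the files contain no embedded binary data. The names TOKEN and strip_rsid are hypothetical. It leaves everything else, including the \'93 and \'94 quote escapes and any now-empty groups, untouched.

```python
import re

# Rough RTF tokenizer (an assumption-laden sketch, not a full parser).
TOKEN = re.compile(
    r"\\(?P<word>[A-Za-z]+)(?P<num>-?\d+)?(?P<delim> ?)"  # control word, e.g. \insrsid8069086
    r"|\\[^A-Za-z]"   # control symbol, e.g. \' or \{
    r"|\\"            # stray backslash at end of input
    r"|[{}]"          # group delimiters
    r"|[^\\{}]+"      # run of plain text
)


def strip_rsid(rtf: str) -> str:
    """Return rtf with control words such as \\insrsidN and \\charrsidN removed."""
    out = []
    # True when the last kept token is a control word with no trailing space,
    # so that text following it still needs a delimiter.
    needs_delim = False
    for m in TOKEN.finditer(rtf):
        word = m.group("word")
        if word and word.endswith("rsid") and m.group("num"):
            # Drop the RSID word. If it carried the delimiter space that
            # separated an earlier control word from the text, keep that space.
            if m.group("delim") and needs_delim:
                out.append(" ")
                needs_delim = False
            continue
        out.append(m.group(0))
        needs_delim = word is not None and not m.group("delim")
    return "".join(out)


# Made-up sample input, loosely modelled on the excerpt above.
sample = r"{\b\insrsid8069086 coA }{\insrsid8069086\charrsid1904895 dehydrogenase}"
print(strip_rsid(sample))  # prints {\b coA }{dehydrogenase}
```

The tokenizer is there so that removing a control word never glues the previous control word onto the following text (for instance turning \b\insrsid1 coA into \bcoA). A plain search-and-replace on \insrsid... would be shorter but could make that mistake.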

ANY assistance is MUCH appreciated!!