lostinspace

----- Original Message -----
From: "BruceM"
Newsgroups: microsoft.public.word.docmanagement
Sent: Monday, January 17, 2005 11:43 AM
Subject: Find and Replace anomaly


I have used Find and Replace many times to replace extra paragraph marks and paragraph marks that occur at the end of every line (typical of things copied from web pages). Now I am in a situation where a government representative (we are in a regulated industry) wants to see where we keep a particular regulation for reference. The twist is that the regulation is not available from the government in printable form, yet the web site's copy is not considered adequate for our records. That leaves me to copy from the web site and attempt to make it into a document. I have done this before, but this one is different. I have replaced all styles with Normal (for now; I will apply custom styles later), used a macro to remove all hyperlinks, used Find and Replace to remove all graphics.

Here's the problem: I cannot use Find and Replace to replace a succession of paragraph marks with a single paragraph mark. I can do it in any other document, but not in this one. If I copy a succession of two paragraphs from the document to a new document I get the same result (it doesn't identify the successive paragraphs as being two paragraphs), but if I add paragraphs to the new document with the Enter key Find and Replace works as it should. Similarly, when I add empty paragraphs to the troublesome document I can find them as I would expect. I have tried a wildcard search (Find ^13{2,}, Replace With ^p), and without wildcards (Find ^p^p, Replace With ^p). No luck. If I search for a single paragraph I can find every one, including both in the pair. If I replace every paragraph mark with, say, a £, then attempt to replace every instance of ££ with £, same problem as with the paragraphs: it does not recognize it as a pair.
There is nothing such as a space between the paragraphs. I have removed all manual formatting, hyperlinks, graphics, etc. In short, everything in the document is part of the ASCII extended character set. I replaced ^13 with ^p, and ^p with ^p (with and without wildcards respectively). I copied the entire document to Notepad, then opened that with Word. In every case, same result.
Anybody have an idea as to what is going on here?
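For readers following along outside Word, the wildcard replacement Bruce describes (Find ^13{2,}, Replace With ^p) can be sketched in Python on the document's text. This is a minimal sketch, assuming the text is available as a plain string in which each paragraph mark is character 13 ("\r"); the name collapse_paragraph_marks is hypothetical, and the sketch only shows what the replacement is meant to do, not how Word behaves on this particular document.

```python
import re

# Word stores a paragraph mark as character 13 (carriage return, "\r").
# The wildcard expression ^13{2,} means "two or more of them in a row";
# the regular expression below expresses the same idea.
RUN_OF_PARAGRAPH_MARKS = re.compile(r"\r{2,}")


def collapse_paragraph_marks(text: str) -> str:
    """Replace each run of two or more paragraph marks with a single one."""
    return RUN_OF_PARAGRAPH_MARKS.sub("\r", text)


if __name__ == "__main__":
    sample = "First paragraph\r\r\rSecond paragraph\rThird paragraph"
    print(repr(collapse_paragraph_marks(sample)))
    # prints 'First paragraph\rSecond paragraph\rThird paragraph'
```

Note that the pattern matches only character 13; a gap made of any other character (for example a CR followed by LF) is left untouched.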


Bruce,
Likely the best service you could do in assisting yourself would
be to provide the URL of the page you're attempting to convert.

CSS and HTML have formatting options that display spacing and such beyond
Word's formatting options.

One alternative may be to print the web page to a PDF file, retaining
all formatting in the process.
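To see what actually separates the paragraphs in a plain-text copy, one option is to list the non-printing characters it contains. This is a hypothetical diagnostic, not something suggested in the thread: it assumes the text has been read without newline translation, and the name describe_separators is invented for illustration. A gap that shows up as 13, 13 should be matched by ^13{2,}; anything else (for example 13 followed by 10, character 11 for a manual line break, or 160 for a non-breaking space) might explain why the pair is not recognized.

```python
import unicodedata


def describe_separators(text: str) -> list[str]:
    """List every non-printing character in text as "code point + name".

    Control characters such as CR (13) have no Unicode name, so they are
    shown as U+XXXX instead.
    """
    labels = []
    for ch in text:
        # str.isprintable() is True for letters, digits, punctuation and the
        # ordinary space; CR, LF, tabs and NO-BREAK SPACE are all reported.
        if ch.isprintable():
            continue
        name = unicodedata.name(ch, f"U+{ord(ch):04X}")
        labels.append(f"{ord(ch)} {name}")
    return labels


if __name__ == "__main__":
    # Hypothetical sample: two paragraphs separated by CR, CR, LF and a
    # non-breaking space that would look blank on screen.
    sample = "Section 1\r\r\n\u00a0Section 2"
    for label in describe_separators(sample):
        print(label)
```

To inspect a saved copy, read it with something like open("regulation.txt", encoding="cp1252", newline="") (the filename and encoding are assumptions) so that carriage returns are not translated before they are listed.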