Posted to microsoft.public.word.docmanagement
Jezebel
Finding unique words

The find text is a wildcard pattern, Word's form of regular expression.
Inside the square brackets the exclamation mark means 'not', so [!a-zA-Z]
means 'match any character other than a-z, upper or lower case'. You can
add any other characters you also want left alone, e.g. [!A-Za-z,-] (here
the comma and the hyphen). You have to put the hyphen last; otherwise it's
interpreted as a range indicator.
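
Since you know Java, here is a rough sketch of the same character class in
java.util.regex terms. Java writes the negation as ^ inside the brackets
where Word's wildcards use !, and a hyphen placed last is likewise taken
literally. The class and method names are made up for illustration, and the
version with the apostrophe is my assumption about what you want to keep.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class NegatedClassDemo {
    // Java equivalent of Word's [!A-Za-z]: any character that is not a letter.
    static final Pattern NOT_LETTER = Pattern.compile("[^A-Za-z]");

    // Same class, but apostrophe and hyphen are also left alone.
    // The hyphen comes last so it is a literal, not a range indicator.
    static final Pattern NOT_LETTER_APOSTROPHE_HYPHEN =
            Pattern.compile("[^A-Za-z'-]");

    // Replace every matching character with a line break,
    // mirroring Find: [!a-zA-Z] / Replace: ^013.
    static String replaceMatches(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("\n");
    }
}
```

Run on "well-known, isn't", the first pattern breaks the text at the hyphen,
comma, space and apostrophe, while the second keeps "well-known" and "isn't"
whole.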

The caret (^) means that the following digits are a decimal character
code. ^013 = character 13, the paragraph mark.
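
For comparison, here is a minimal sketch of the same idea done outside Word,
along the lines of the tokenizer-plus-set approach from the original
question. It assumes the document text has already been exported as a plain
string (for instance by saving the file as plain text); the class and method
names are hypothetical, and lower-casing the words is my own choice, not
something the Find/Replace trick does.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; assumes the Word text is available as a String.
final class UniqueWords {
    // Split on runs of anything that is not a letter, apostrophe or hyphen,
    // then collect the pieces in a sorted set so duplicates disappear.
    static Set<String> uniqueWords(String text) {
        Set<String> words = new TreeSet<>();
        for (String token : text.split("[^A-Za-z'-]+")) {
            if (!token.isEmpty()) {
                // Fold case so "The" and "the" count as one word.
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}
```

A TreeSet keeps the result in alphabetical order, which is roughly what you
would get in Word by sorting the one-word-per-paragraph list and removing
the duplicates.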




"jezzica85" wrote in message
...
Thanks Jezebel, that works really well, but I notice it destroys hyphens
and apostrophes too. Is there a way to do this while keeping the hyphens
and apostrophes? And I'm just curious, so I know later: what does the ^013
mean?
Thanks!

"Jezebel" wrote:

With 'Use wildcards' checked --

Find: [!a-zA-Z]
Replace: ^013




"jezzica85" wrote in message
...
Hi all,
Does anyone know if it's possible to make a list of all the unique words
in a document without having to destroy all the punctuation and formatting
first? I know you can make a concordance index, but you have to know all
the words first for that. I'm an amateur Java programmer, so if you know
Java, you know that we can use StringTokenizers and HashSets to do this for
small strings. Is there a way to do that on a larger scale for a Word file
that's a few hundred pages long? (I know it's a different programming
language too; the Java was just an example.)
Thanks!