Posted to microsoft.public.word.docmanagement
jezzica85
 
Finding unique words

Thank you Jezebel and Jay, this was really helpful.
jezzica85

"Jay Freedman" wrote:

A good reference for wildcards in Find and Replace is at
http://www.gmayor.com/replace_using_wildcards.htm.

The ^013 is the code for a paragraph mark (technically, the ASCII
character with the numeric value 13, which is a carriage return in
plain text).

The code ^p also stands for a paragraph mark, but with 'Use wildcards'
checked it is accepted only in the Replace With box; in the Find What
box only ^013 works. In fact, ^p is the better choice for the Replace
With box: if you use ^013 there, the Table Sort command in Word won't
recognize the "paragraph marks" and will claim there are no valid
records (paragraphs) in the text to be sorted. They'll work OK when
you copy the text into Excel, though.

To make the Replace leave apostrophes and hyphens in place, use the
search expression

[!a-zA-Z'-]

This expression translates to "find all characters that are not in the
ranges a through z or A through Z, and are not an apostrophe or a
hyphen".
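
For readers who think in regular expressions, here is a rough Python
sketch of the same negated character class. It is only an analogy, not
what Word runs internally: Python writes the negation as ^ where Word
uses !, and the function name one_word_per_paragraph is made up for
this illustration.

```python
import re

# Python's spelling of the Word wildcard [!a-zA-Z'-]:
# "^" negates the class where Word uses "!"; the trailing "-" is literal.
NOT_WORD_CHAR = re.compile(r"[^a-zA-Z'-]")

def one_word_per_paragraph(text: str) -> str:
    """Replace every non-word character with a carriage return (ASCII 13)."""
    return NOT_WORD_CHAR.sub("\r", text)

sample = "Don't stop, well-known words!"
# Each single character is replaced on its own, so ", " leaves an empty entry.
print(one_word_per_paragraph(sample).split("\r"))
# prints ["Don't", 'stop', '', 'well-known', 'words', '']
```

As in Word, each non-matching character is replaced one at a time, so
runs of punctuation and spaces leave empty paragraphs behind.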

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the
newsgroup so all may benefit.

On Sat, 22 Apr 2006 17:33:01 -0700, jezzica85 wrote:

Thanks Jezebel, that works really well, but I notice it destroys hyphens and
apostrophes too. Is there a way to do this while keeping the hyphens and
apostrophes? And just so I know for later, what does the ^013 mean?
Thanks!

"Jezebel" wrote:

With 'Use wildcards' checked --

Find: [!a-zA-Z]
Replace: ^013




"jezzica85" wrote in message
...
Hi all,
Does anyone know if it's possible to make a list of all the unique words in
a document without having to destroy all the punctuation and formatting
first? I know you can make a concordance index, but you have to know all the
words first for that. I'm an amateur Java programmer, so if you know Java,
you know that we can use StringTokenizers and HashSets to do this for small
strings, but is there a way to do that on a larger scale for a Word file (I
know it's a different programming language too, the Java was just an
example) that's a few hundred pages long?
Thanks!
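
The tokenize-then-collect-in-a-set idea from the question can be
sketched in Python as follows. This is a minimal illustration that
assumes the document text has already been saved or pasted as plain
text; it does not read a Word file directly, and the names WORD and
unique_words are invented for the example.

```python
import re

# A word is a run of letters, optionally joined by single apostrophes or
# hyphens (so "cat's" and "well-known" each count as one word).
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def unique_words(text: str) -> list:
    """Return the distinct words in text, lowercased and sorted."""
    # The set comprehension plays the role of a HashSet: duplicates collapse.
    return sorted({match.group(0).lower() for match in WORD.finditer(text)})

sample = "The cat's well-known hat. The hat!"
print(unique_words(sample))
# prints ["cat's", 'hat', 'the', 'well-known']
```

Because it works on a copy of the text, the original document keeps its
punctuation and formatting. A few hundred pages of text should fit
comfortably in memory with this approach.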