Posted to microsoft.public.word.docmanagement
jezzica85
 
Finding unique words

Hi Jezebel,
There isn't anything wrong with making a copy and destroying all the
punctuation; I've done that before and it works well. I was just hoping
there was a faster way, because it takes quite a while to go through all the
possible punctuation marks one by one. Is there a way to quickly replace
anything that isn't text with a paragraph break?
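
For comparison, here is roughly what "replace anything that isn't text with a break" looks like in Java, in case it helps frame the question. This is only a sketch under assumptions: the document has already been saved as plain text, "text" means letters, digits, and apostrophes, and the class and method names are made up for illustration. Word's own wildcard search uses a different syntax from Java regular expressions.

```java
import java.util.regex.Pattern;

class NonTextToBreaks {
    // Assumption: "text" means Unicode letters, digits, and apostrophes.
    // Any run of other characters is treated as one separator.
    private static final Pattern NON_TEXT = Pattern.compile("[^\\p{L}\\p{N}']+");

    // Replaces every run of non-text characters with a single line break,
    // then trims leading and trailing breaks.
    static String oneWordPerLine(String input) {
        return NON_TEXT.matcher(input).replaceAll("\n").trim();
    }

    public static void main(String[] args) {
        System.out.println(oneWordPerLine("Hello, world! It's a test."));
    }
}
```

Collapsing a whole run of punctuation and spaces in one pass avoids going through the marks one at a time, which is the slow part of the manual approach.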

"Jezebel" wrote:

What's wrong with making a copy and then destroying all the punctuation?
The quickest method I know is to use Find and Replace to delete all non-text
and convert all white space to paragraph marks, then copy to Excel and do a
unique filter.

If you want to do it with VBA, iterate the Words collection, check whether
the 'word' is text, and if so, add it to a Collection using the word as both
the key and the item. Since keys must be unique (adding a duplicate key raises
an error, which you trap and ignore), you end up with a unique list.
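
Since the original question mentions Java, the same key-based idea can be sketched there. This is an analogy under assumptions, not VBA and not a tested Word macro: the tokens are assumed to have been extracted already, "is text" is taken to mean "all letters", keys are lower-cased so that "The" and "the" collide, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class KeyedUniqueWords {
    // Assumption: a token counts as text if it is non-empty and all letters.
    static boolean isText(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isLetter);
    }

    // Stores each word under a lower-cased key; a repeated key is skipped,
    // so only the first spelling of each word survives, in first-seen order.
    static List<String> uniqueWords(Iterable<String> tokens) {
        Map<String, String> seen = new LinkedHashMap<>();
        for (String raw : tokens) {
            String token = raw.trim(); // tokens may carry trailing spaces
            if (isText(token)) {
                seen.putIfAbsent(token.toLowerCase(Locale.ROOT), token);
            }
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The ", "cat", ",", "the", "dog", ".", "42");
        System.out.println(uniqueWords(tokens)); // prints [The, cat, dog]
    }
}
```

The map plays the role of the keyed collection: the key enforces uniqueness and the item keeps the word as it first appeared.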




"jezzica85" wrote in message
...
Hi all,
Does anyone know if it's possible to make a list of all the unique words in
a document without having to destroy all the punctuation and formatting
first? I know you can make a concordance index, but you have to know all the
words first for that. I'm an amateur Java programmer, so if you know Java,
you know that we can use StringTokenizers and HashSets to do this for small
strings, but is there a way to do that on a larger scale for a Word file (I
know it's a different programming language too, the Java was just an
example) that's a few hundred pages long?
Thanks!
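
To make the Java comparison concrete, the StringTokenizer and set approach can be scaled up by reading line by line instead of loading one big string. A rough sketch, assuming the document has first been saved as plain text (Java does not read the Word format directly here), that the delimiter list below is good enough, and that case should be ignored; the class name and delimiter set are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

class StreamingUniqueWords {
    // Assumption: whitespace plus common ASCII punctuation separate words.
    private static final String DELIMS = " \t\r\n.,;:!?\"()[]{}<>/\\-";

    // Reads one line at a time, so memory grows with the number of
    // distinct words rather than with the length of the document.
    static Set<String> uniqueWords(Reader source) {
        Set<String> words = new TreeSet<>(); // sorted; a HashSet also works
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line, DELIMS);
                while (tokens.hasMoreTokens()) {
                    words.add(tokens.nextToken().toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "The cat sat.\nThe dog (a big dog) sat, too!";
        System.out.println(uniqueWords(new StringReader(sample)));
        // prints [a, big, cat, dog, sat, the, too]
    }
}
```

For a real file, a `java.io.FileReader` (or a reader with an explicit charset) would replace the `StringReader` used in the sample.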