Posted to microsoft.public.word.docmanagement
Greg Maxey
How do I delete duplicate entries in a Word document?

SN,

For assistance running the macro, see:
http://www.gmayor.com/installing_macro.htm

AFAIK, there is no straightforward way to delete duplicate words in a
document. Part of the problem is how Word defines a word. In this simple
example there are four words:

one two three. They are "one " "two " "three" and "."

In this example there are seven words:

one two three one two three. They are "one " "two " "three " "one " "two "
"three" and "."

At first glance it would appear the words one, two and three are duplicated.
One and two are, but on closer observation you will see that "three " is in
fact not the same as "three".
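
To check how Word splits a given document, a short sketch like the one below should list each item in the Words collection wrapped in quotes, so trailing spaces become visible. The name ListWords is just a placeholder, and the output goes to the Immediate window of the VBA editor (Ctrl+G). Expect the closing paragraph mark to show up as an item of its own as well.

```vba
Sub ListWords()
    ' Hypothetical helper: prints every "word" as Word sees it,
    ' wrapped in quotes so trailing spaces are visible.
    Dim oWord As Range
    For Each oWord In ActiveDocument.Range.Words
        Debug.Print """" & oWord.Text & """"
    Next oWord
End Sub
```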

If I ran this procedure on the second example:

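Sub ScratchMacroII()
    Dim oWord As Range
    Dim myCol As New Collection
    For Each oWord In ActiveDocument.Range.Words
        ' Re-issuing On Error on each pass also clears the previous error
        On Error Resume Next
        ' The word text doubles as the key; a repeated key raises error 457
        myCol.Add oWord.Text, oWord.Text
        If Err.Number = 457 Then oWord.Delete
    Next
End Sub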

I would be left with "one two three three."

I can fix that by trimming the trailing space from the word range:

Sub ScratchMacroII()
    Dim oWord As Range
    Dim myCol As New Collection
    For Each oWord In ActiveDocument.Range.Words
        On Error Resume Next
        ' Trim the key so that "three " and "three" count as the same word
        myCol.Add Trim(oWord.Text), Trim(oWord.Text)
        If Err.Number = 457 Then oWord.Delete
    Next
End Sub

Which leaves "one two three."

But if I run code like that on a group of e-mail addresses that contain the
punctuation "." things get fouled up real quick.

When you type an e-mail address in a Word document, Word by default will
convert that text to a hyperlink. E.g., if I type an e-mail address, Word
automatically changes that to a hyperlink field: { HYPERLINK "mailto:..." }

You can see this by right-clicking an e-mail address and selecting Toggle
Field Codes.
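
Another way to look at the field codes, without toggling each one by hand, is to print them. The sketch below assumes the addresses were auto-converted to hyperlink fields as described above; the name ListHyperlinkFieldCodes is a placeholder, and the output goes to the Immediate window of the VBA editor (Ctrl+G).

```vba
Sub ListHyperlinkFieldCodes()
    ' Hypothetical helper: prints the code of every hyperlink field
    ' in the main text of the active document.
    Dim oFld As Field
    For Each oFld In ActiveDocument.Fields
        If oFld.Type = wdFieldHyperlink Then
            Debug.Print oFld.Code.Text
        End If
    Next oFld
End Sub
```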

The procedure that I sent to you searches a document for the first e-mail
address. It then checks the field code of every field in the main text part
of the document for that address. The first matching field is kept and any
duplicate fields are deleted. The procedure then looks for the next e-mail
address and repeats the process, and so on until all duplicates are
deleted.
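
For comparison, here is a more compact variant of the same idea. It is not the procedure I sent, only a sketch that borrows the Collection trick from the word examples above and applies it to field codes. It assumes every address is a hyperlink field in the main text and that two duplicates have identical field codes (Collection keys ignore case). The name DeleteDuplicateHyperlinkFields is a placeholder. Try it on a copy of the document first.

```vba
Sub DeleteDuplicateHyperlinkFields()
    Dim seen As New Collection
    Dim oFld As Field
    Dim sKey As String
    Dim lErr As Long
    Dim i As Long
    i = 1
    ' Index loop rather than For Each, because fields are deleted along the way
    Do While i <= ActiveDocument.Fields.Count
        Set oFld = ActiveDocument.Fields(i)
        If oFld.Type = wdFieldHyperlink Then
            sKey = Trim(oFld.Code.Text)
            On Error Resume Next
            seen.Add sKey, sKey          ' a repeated key raises error 457
            lErr = Err.Number            ' save it: On Error GoTo 0 clears Err
            On Error GoTo 0
            If lErr = 457 Then
                oFld.Delete              ' duplicate: the next field moves into slot i
            Else
                i = i + 1                ' first occurrence: keep it and move on
            End If
        Else
            i = i + 1                    ' not a hyperlink field: leave it alone
        End If
    Loop
End Sub
```

Unlike the wildcard search, this never looks at the document text, so addresses that were typed without being converted to hyperlinks are left untouched.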




--
Greg Maxey/Word MVP
See:
http://gregmaxey.mvps.org/word_tips.htm
For some helpful tips using Word.


SN wrote:
Dear Greg,
Thanks very much, but this appears too technical to me. I am just an
average computer user and cannot really understand what you say.
Isn't there any straightforward way to "DELETE DUPLICATE WORDS IN A
DOCUMENT"?
Rgds/SN

"Greg Maxey" wrote:

Provided the email entries are of the ????????@??????.??? format
(where ? is any character) then this might work:

Sub ScratchMacro()
    Dim pStr As String
    Dim oRng As Range
    Dim i As Long
    Dim j As Long
    Dim oFld As Field
    Dim bLoop As Boolean
    i = 1
    bLoop = True
    Do
        ' Search from position i to the end of the document
        Set oRng = ActiveDocument.Range
        oRng.Start = i
        With oRng.Find
            .Text = "?{1,}\@?{1,}.?{3}"
            .MatchWildcards = True
            Do
                .Execute
                If Not .Found Then
                    bLoop = False
                    Exit Do
                End If
                If oRng.Start = i Then
                    pStr = Trim(oRng.Text)
                    i = oRng.End
                    j = 0
                    ' Keep the first field containing this address, delete the rest
                    For Each oFld In ActiveDocument.Fields
                        If InStr(oFld.Code, pStr) > 0 Then
                            j = j + 1
                            If j > 1 Then
                                oFld.Delete
                            End If
                        End If
                    Next oFld
                    Exit Do
                Else
                    oRng.Collapse wdCollapseEnd
                End If
            Loop While .Found = True
        End With
    Loop While bLoop = True
End Sub




SN wrote:
Hello, I am facing exactly the same problem. Did you find the
answer? I would very much appreciate it if you could share it with me.
Thank you very much
SN

"ayesha" wrote:

I have a large Word document with lots of email addresses, many of
which are duplicate entries. How do I delete these repeat entries
without having to rely on the Ctrl+F function?