#1   Report Post  
Posted to microsoft.public.word.docmanagement
Kat3n Kat3n is offline
external usenet poster
 
Posts: 3
Default Top Ten Word count

I create policies using Word 2003. I would like to be able to count each word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n
  #2   Report Post  
Posted to microsoft.public.word.docmanagement
Doug Robbins - Word MVP Doug Robbins - Word MVP is offline
external usenet poster
 
Posts: 8,832
Default Top Ten Word count

Here's a macro that will count the frequency of the words in a document to
allow you to determine the top 10 most frequently used words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are excluded:
" & Excludes & ". Enter words that you wish to exclude, surrounding each
word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord "A" Or SingleWord "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And
Freq(l) Freq(k)) Then k = l
Next l
If k j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment =
wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Number of different words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum))
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words
", vbOKOnly, "Finished")
Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

"Kat3n" wrote in message
...
I create policies using Word 2003. I would like to be able to count each
word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n



  #3   Report Post  
Posted to microsoft.public.word.docmanagement
george 16-17 george 16-17 is offline
external usenet poster
 
Posts: 3
Default Top Ten Word count

Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency in a
document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something wrong.

Any assistance would be appreciated...thanks in advance,
george

"Doug Robbins - Word MVP" wrote:

Here's a macro that will count the frequency of the words in a document to
allow you to determine the top 10 most frequently used words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words allowed
Dim Words(maxwords) As String 'Array to hold unique words
Dim Freq(maxwords) As Integer 'Frequency counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the document
Dim Excludes As String 'Words to be excluded
Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are excluded:
" & Excludes & ". Enter words that you wish to exclude, surrounding each
word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ")
If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord "A" Or SingleWord "z" Then SingleWord = ""
'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = ""
'On exclude list?
If Len(SingleWord) 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum maxwords - 1 Then
j = MsgBox("The maximum array size has been exceeded.
Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum
Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And
Freq(l) Freq(k)) Then k = l
Next l
If k j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) &
vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word"
ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment =
wdAlignParagraphCenter
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Number of different words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum))
System.Cursor = wdCursorNormal
' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words
", vbOKOnly, "Finished")
Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

"Kat3n" wrote in message
...
I create policies using Word 2003. I would like to be able to count each
word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n




  #4   Report Post  
Posted to microsoft.public.word.docmanagement
Jay Freedman Jay Freedman is offline
external usenet poster
 
Posts: 9,854
Default Top Ten Word count

Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

george 16-17 wrote:
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

"Doug Robbins - Word MVP" wrote:

Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) Words(k)) Or (ByFreq
And Freq(l) Freq(k)) Then k = l
Next l
If k j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

"Kat3n" wrote in message
...
I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n



  #5   Report Post  
Posted to microsoft.public.word.docmanagement
george 16-17 george 16-17 is offline
external usenet poster
 
Posts: 3
Default Top Ten Word count

Jay,

I appreciate that piece of info. I am too much of a noob to figure that out
on my own.

I will fix the code and try again.

Thanks again,
george

"Jay Freedman" wrote:

Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

george 16-17 wrote:
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

"Doug Robbins - Word MVP" wrote:

Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) Words(k)) Or (ByFreq
And Freq(l) Freq(k)) Then k = l
Next l
If k j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

"Kat3n" wrote in message
...
I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n






  #6   Report Post  
Posted to microsoft.public.word.docmanagement
george 16-17 george 16-17 is offline
external usenet poster
 
Posts: 3
Default Top Ten Word count

Works perfectly! Exactly what I needed.

Thanks!

"george 16-17" wrote:

Jay,

I appreciate that piece of info. I am too much of a noob to figure that out
on my own.

I will fix the code and try again.

Thanks again,
george

"Jay Freedman" wrote:

Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA
sees the broken lines as syntax errors.

Wherever you see a line that starts at the left margin, and that line or the
next line is red, you need to delete the carriage return that causes the
line to break. For example, these two lines should be only one line:

If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

george 16-17 wrote:
Hi Doug,

I apologize in advance for hijacking this thread - sorry!

I am working on a similar project, as I need to count word frequency
in a document. That being said, I am relatively new to macros and VBA.

I copied and pasted in the code into a module and there where numerous
syntax errors (lines of code in red font). I must be doing something
wrong.

Any assistance would be appreciated...thanks in advance,
george

"Doug Robbins - Word MVP" wrote:

Here's a macro that will count the frequency of the words in a
document to allow you to determine the top 10 most frequently used
words:

Sub WordFrequency()



Dim SingleWord As String 'Raw word pulled from doc
Const maxwords = 9000 'Maximum unique words
allowed Dim Words(maxwords) As String 'Array to hold
unique words Dim Freq(maxwords) As Integer 'Frequency
counter for Unique
Words
Dim WordNum As Integer 'Number of unique words
Dim ByFreq As Boolean 'Flag for sorting order
Dim ttlwds As Long 'Total words in the
document Dim Excludes As String 'Words to be
excluded Dim Found As Boolean 'Temporary flag
Dim j, k, l, Temp As Integer 'Temporary variables
Dim tword As String '



' Set up excluded words
' Excludes =
"[the][a][of][is][to][for][this][that][by][be][and][are]"
Excludes = ""
Excludes = InputBox$("Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Excludes = Excludes & InputBox$("The following words are
excluded: " & Excludes & ". Enter words that you wish to exclude,
surrounding each word with [ ].", "Excluded Words", "")
' Find out how to sort
ByFreq = True
Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order",
"FREQ") If Ans = "" Then End
If UCase(Ans) = "WORD" Then
ByFreq = False
End If



Selection.HomeKey Unit:=wdStory
System.Cursor = wdCursorWait
WordNum = 0
ttlwds = ActiveDocument.Words.Count
Totalwords = ActiveDocument.Words.Count



' Control the repeat
For Each aword In ActiveDocument.Words
SingleWord = Trim(aword)
If SingleWord "A" Or SingleWord "z" Then SingleWord
= "" 'Out of range?
If InStr(Excludes, "[" & SingleWord & "]") Then
SingleWord = "" 'On exclude list?
If Len(SingleWord) 0 Then
Found = False
For j = 1 To WordNum
If Words(j) = SingleWord Then
Freq(j) = Freq(j) + 1
Found = True
Exit For
End If
Next j
If Not Found Then
WordNum = WordNum + 1
Words(WordNum) = SingleWord
Freq(WordNum) = 1
End If
If WordNum maxwords - 1 Then
j = MsgBox("The maximum array size has been
exceeded. Increase maxwords.", vbOKOnly)
Exit For
End If
End If
ttlwds = ttlwds - 1
StatusBar = "Remaining: " & ttlwds & " Unique: " &
WordNum Next aword



' Now sort it into word order
For j = 1 To WordNum - 1
k = j
For l = j + 1 To WordNum
If (Not ByFreq And Words(l) Words(k)) Or (ByFreq
And Freq(l) Freq(k)) Then k = l
Next l
If k j Then
tword = Words(j)
Words(j) = Words(k)
Words(k) = tword
Temp = Freq(j)
Freq(j) = Freq(k)
Freq(k) = Temp
End If
StatusBar = "Sorting: " & WordNum - j
Next j



' Now write out the results
tmpName = ActiveDocument.AttachedTemplate.FullName
Documents.Add Template:=tmpName, NewTemplate:=False
Selection.ParagraphFormat.TabStops.ClearAll
With Selection
For j = 1 To WordNum
.TypeText Text:=Words(j) & vbTab &
Trim(Str(Freq(j))) & vbCrLf
Next j
End With
ActiveDocument.Range.Select
Selection.ConvertToTable
Selection.Collapse wdCollapseStart
ActiveDocument.Tables(1).Rows.Add
BeforeRow:=Selection.Rows(1)
ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore
"Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore
"Occurrences"
ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment =
wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add

ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Total words in Document"
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Totalwords
ActiveDocument.Tables(1).Rows.Add
ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
1).Range.InsertBefore "Number of different words in
Document"


ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count,
2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor =
wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & "
different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory



End Sub


--
Hope this helps.

Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.

Doug Robbins - Word MVP

"Kat3n" wrote in message
...
I create policies using Word 2003. I would like to be able to count
each word
in the entire policy and use the top ten words (ignoring words such
as I, the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n




  #7   Report Post  
Posted to microsoft.public.word.docmanagement
StevenM[_2_] StevenM[_2_] is offline
external usenet poster
 
Posts: 169
Default Top Ten Word count

I've went at the problem a little differently, I also tried to remove common
contractions, possesives, etc.

'
' WordFrequencyCount
' Creates a list of all the words (and their frequency) in the active document
' and presents the words & frequencies in a new document.
' It attempts to remove common contractions, possesives, numbers and
punctuation.
' Is also has a small exclude list (which can be adjusted).
'
Sub WordFrequencyCount()
Dim WordList() As String
Dim WordCount() As Long
Dim nWords As Long
Dim Index As Long
Dim actDoc As Document
Dim newDoc As Document
Dim oTable As Table
Dim oRow As Row
Dim aWord As Object
Dim sWord As String
Dim sExcludeList As String
Dim bFrequencyThreshold As Byte
'
' The variable bFrequencyThreshold is the minimum frequency needed
' to have the word & frequency count reported. The value of 1 (or 0)
' reports every word counted. The value of 5 omits all words
' with a frequency less than 5. (A higher number allows the macro
' to run faster, since fewer items are added to the table.)
'
bFrequencyThreshold = 5
'
'
ReDim WordList(1)
ReDim WordCount(1)
WordList(1) = ""
WordCount(1) = 0
nWords = 0
Set actDoc = ActiveDocument
For Each aWord In actDoc.Words
sWord = Trim(aWord.Text)
'
' Any of the next six lines of code can be omitted.
' Omit a line of code by placing a comma before it.
' The line will then turn green (or the same color as this line).
' Chr(160) = non-breaking spaces
'
sWord = Replace(sWord, Chr(160), "")
sWord = RemoveContractions(sWord)
If IsExcluded(sWord) Then sWord = ""
If IsAllDigits(sWord) Then sWord = ""
If IsOnlyPunctuation(sWord) Then sWord = ""
If Len(sWord) = 1 Then sWord = ""
'
' Any of the above six lines of code can be omitted.
' Omit a line of code by placing a comma before it.
'
If Len(sWord) 0 Then
Index = 1
While (Index 0 And Index = nWords)
If StrComp(WordList(Index), sWord, vbTextCompare) = 0 Then
WordCount(Index) = WordCount(Index) + 1
Index = 0
Else
Index = Index + 1
End If
Wend
If Index 0 Then
If nWords = 0 Then
nWords = 1
Else
nWords = nWords + 1
Application.StatusBar = "Counting Tokens in Document: "
& nWords
ReDim Preserve WordList(nWords)
ReDim Preserve WordCount(nWords)
End If
WordList(nWords) = sWord
WordCount(nWords) = 1
End If
End If
Next aWord
Set newDoc = Documents.Add
Set oRange = newDoc.Range
Set oTable = newDoc.Tables.Add(oRange, NumRows:=1, NumColumns:=2)
With oTable.Range.Rows(1)
.Cells(1).Range.Text = "Words"
.Cells(2).Range.Text = "Count"
End With
For Index = 1 To nWords
Application.StatusBar = "Creating Table: " & nWords & ": " & Index
If WordCount(Index) = bFrequencyThreshold Then
Set oRow = oTable.Rows.Add
With oRow
.Cells(1).Range.Text = WordList(Index)
.Cells(2).Range.Text = WordCount(Index)
End With
End If
Next Index
If oTable.Rows.Count 2 Then
oTable.Sort ExcludeHeader:=True, _
FieldNumber:=2, _
SortFieldType:=wdSortFieldNumeric, _
SortOrder:=wdSortOrderDescending
End If
End Sub
'
' IsExcluded
'
' Note that comparisons are not case sensitive,
' so that "ACT" would be excluded, if "act" is excluded.
'
' Each word in the sExcludeList
' should be separated on each side by a space (" ").
' Any word can be added, and they don't need to be in any order.
'
Private Function IsExcluded(ByVal sWord As String) As Boolean
Const sExcludeList As String = " a about act after again all also an and
any are as ask at away back be been before between big but by call came can
cause close come could did do does down each end even every far few find
first follow for form from get give go good great had hard has have he help
her here high him his hot how I if in is it its just keep kind know large
last late left let like little live long look low made make man many may me
mean might more most move much must my name near need never new no not now of
off on one only or other our out over own part people place press put round
said same saw say see set she should show side small so some stand still such
take tell than that the their them then there these they thing think this
those through to too try turn two under up us use very want was way we well
went were what when where which while who why will with word work would you
your "

If InStr(1, sExcludeList, " " & sWord & " ", vbTextCompare) 0 Then
IsExcluded = True
End If
End Function

'
' IsOnlyPunctuation
' Returns true only if every character in a word string is punctuation
'
Private Function IsOnlyPunctuation(ByVal sWord As String) As Boolean
Dim sPunctuation As String
Dim sChar As String
Dim nIndex As Long
sPunctuation = " .,?';:![]{}()-_" & Chr(9) & Chr(10) & Chr(11) & Chr(12)
& Chr(13) & Chr(14) & Chr(34) & Chr(145) & Chr(146) & Chr(147) & Chr(148) &
Chr(150) & Chr(151) & Chr(160)
nIndex = 1
While (nIndex = Len(sWord))
sChar = Mid(sWord, nIndex, 1)
If InStr(1, sPunctuation, sChar, vbBinaryCompare) = 0 Then
IsOnlyPunctuation = False
Exit Function
End If
nIndex = nIndex + 1
Wend
IsOnlyPunctuation = True
End Function
'
' IsAllDigits returns true if every character is a digit (0-9)
'
Private Function IsAllDigits(ByVal sWord As String) As Boolean
Dim sChar As String
Dim nIndex As Long
nIndex = 1
While (nIndex = Len(sWord))
sChar = Mid(sWord, nIndex, 1)
If sChar "0" Or sChar "9" Then
IsAllDigits = False
Exit Function
End If
nIndex = nIndex + 1
Wend
IsAllDigits = True
End Function
'
' RemoveContractions
' Attempts to remove some obvious contractions and possesives
' Chr(146) = the closing single smart quotation mark/apostrophe
'
Private Function RemoveContractions(ByVal sWord As String) As String
If InStr(1, sWord, "'") Then
If sWord = "won't" Then
sWord = ""
ElseIf sWord = "can't" Then
sWord = "can"
ElseIf Right(sWord, 1) = "'" Then
sWord = Left(sWord, Len(sWord) - 1)
ElseIf Right(sWord, 2) = "'s" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 2) = "'d" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 3) = "'ll" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "'ve" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "'re" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "n't" Then
sWord = Left(sWord, Len(sWord) - 3)
End If
ElseIf InStr(1, sWord, Chr(146)) Then
If sWord = "won" & Chr(146) & "t" Then
sWord = ""
ElseIf sWord = "can" & Chr(146) & "t" Then
sWord = "can"
ElseIf Right(sWord, 1) = Chr(146) Then
sWord = Left(sWord, Len(sWord) - 1)
ElseIf Right(sWord, 2) = Chr(146) & "s" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 2) = Chr(146) & "d" Then
sWord = Left(sWord, Len(sWord) - 2)
ElseIf Right(sWord, 3) = Chr(146) & "ll" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = Chr(146) & "ve" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = Chr(146) & "re" Then
sWord = Left(sWord, Len(sWord) - 3)
ElseIf Right(sWord, 3) = "n" & Chr(146) & "t" Then
sWord = Left(sWord, Len(sWord) - 3)
End If
End If
RemoveContractions = sWord
End Function

Steven Craig Miller

"Kat3n" wrote:

I create policies using Word 2003. I would like to be able to count each word
in the entire policy and use the top ten words (ignoring words such as I,
the, in, etc.) as keywords. Is there anyway to do this?
--
Kat3n

Reply
Thread Tools
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Count of words in Word 2007 and count in 2003 are different Gabi Microsoft Word Help 2 May 27th 08 12:20 PM
How do I get word count to count my headers and footers? Working with word argh!!! Microsoft Word Help 2 October 16th 06 04:33 PM
Can I set word count to not count words with three letters less? DianaHolmes Microsoft Word Help 8 September 24th 06 05:53 AM
Word's word count feature should be able to count sentences also Connor Microsoft Word Help 31 November 15th 05 07:03 PM
A word count that appears next to the page count Wordy88 Microsoft Word Help 1 June 24th 05 12:30 AM


All times are GMT +1. The time now is 07:00 PM.

Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 Microsoft Office Word Forum - WordBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Word"