Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
I create policies using Word 2003. I would like to be able to count each word
in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#2
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
Here's a macro that will count the frequency of the words in a document to
allow you to determine the top 10 most frequently used words: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim tword As String ' ' Set up excluded words ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]" Excludes = "" Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Find out how to sort ByFreq = True Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count Totalwords = ActiveDocument.Words.Count ' Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(aword) If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword ' Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j ' Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment = wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Total words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Totalwords ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor = wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory End Sub -- Hope this helps. Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis. Doug Robbins - Word MVP "Kat3n" wrote in message ... I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#3
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
Hi Doug,
I apologize in advance for hijacking this thread - sorry! I am working on a similar project, as I need to count word frequency in a document. That being said, I am relatively new to macros and VBA. I copied and pasted in the code into a module and there where numerous syntax errors (lines of code in red font). I must be doing something wrong. Any assistance would be appreciated...thanks in advance, george "Doug Robbins - Word MVP" wrote: Here's a macro that will count the frequency of the words in a document to allow you to determine the top 10 most frequently used words: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim tword As String ' ' Set up excluded words ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]" Excludes = "" Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Find out how to sort ByFreq = True Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count Totalwords = ActiveDocument.Words.Count ' Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(aword) If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword ' Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j ' Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment = wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Total words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Totalwords ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor = wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory End Sub -- Hope this helps. Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis. Doug Robbins - Word MVP "Kat3n" wrote in message ... I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#4
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
Newsgroup posting forces a limited line length, and that causes long code
lines to be broken by a carriage return. When you copy/paste the code, VBA sees the broken lines as syntax errors. Wherever you see a line that starts at the left margin, and that line or the next line is red, you need to delete the carriage return that causes the line to break. For example, these two lines should be only one line: If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? -- Regards, Jay Freedman Microsoft Word MVP FAQ: http://word.mvps.org Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit. george 16-17 wrote: Hi Doug, I apologize in advance for hijacking this thread - sorry! I am working on a similar project, as I need to count word frequency in a document. That being said, I am relatively new to macros and VBA. I copied and pasted in the code into a module and there where numerous syntax errors (lines of code in red font). I must be doing something wrong. Any assistance would be appreciated...thanks in advance, george "Doug Robbins - Word MVP" wrote: Here's a macro that will count the frequency of the words in a document to allow you to determine the top 10 most frequently used words: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim tword As String ' ' Set up excluded words ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]" Excludes = "" Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Find out how to sort ByFreq = True Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count Totalwords = ActiveDocument.Words.Count ' Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(aword) If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword ' Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j ' Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment = wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Total words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Totalwords ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor = wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory End Sub -- Hope this helps. Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis. Doug Robbins - Word MVP "Kat3n" wrote in message ... I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#5
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
Jay,
I appreciate that piece of info. I am too much of a noob to figure that out on my own. I will fix the code and try again. Thanks again, george "Jay Freedman" wrote: Newsgroup posting forces a limited line length, and that causes long code lines to be broken by a carriage return. When you copy/paste the code, VBA sees the broken lines as syntax errors. Wherever you see a line that starts at the left margin, and that line or the next line is red, you need to delete the carriage return that causes the line to break. For example, these two lines should be only one line: If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? -- Regards, Jay Freedman Microsoft Word MVP FAQ: http://word.mvps.org Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit. george 16-17 wrote: Hi Doug, I apologize in advance for hijacking this thread - sorry! I am working on a similar project, as I need to count word frequency in a document. That being said, I am relatively new to macros and VBA. I copied and pasted in the code into a module and there where numerous syntax errors (lines of code in red font). I must be doing something wrong. Any assistance would be appreciated...thanks in advance, george "Doug Robbins - Word MVP" wrote: Here's a macro that will count the frequency of the words in a document to allow you to determine the top 10 most frequently used words: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim tword As String ' ' Set up excluded words ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]" Excludes = "" Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Find out how to sort ByFreq = True Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count Totalwords = ActiveDocument.Words.Count ' Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(aword) If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword ' Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j ' Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment = wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Total words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Totalwords ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor = wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory End Sub -- Hope this helps. Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis. Doug Robbins - Word MVP "Kat3n" wrote in message ... I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#6
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
Works perfectly! Exactly what I needed.
Thanks! "george 16-17" wrote: Jay, I appreciate that piece of info. I am too much of a noob to figure that out on my own. I will fix the code and try again. Thanks again, george "Jay Freedman" wrote: Newsgroup posting forces a limited line length, and that causes long code lines to be broken by a carriage return. When you copy/paste the code, VBA sees the broken lines as syntax errors. Wherever you see a line that starts at the left margin, and that line or the next line is red, you need to delete the carriage return that causes the line to break. For example, these two lines should be only one line: If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? -- Regards, Jay Freedman Microsoft Word MVP FAQ: http://word.mvps.org Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit. george 16-17 wrote: Hi Doug, I apologize in advance for hijacking this thread - sorry! I am working on a similar project, as I need to count word frequency in a document. That being said, I am relatively new to macros and VBA. I copied and pasted in the code into a module and there where numerous syntax errors (lines of code in red font). I must be doing something wrong. Any assistance would be appreciated...thanks in advance, george "Doug Robbins - Word MVP" wrote: Here's a macro that will count the frequency of the words in a document to allow you to determine the top 10 most frequently used words: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim tword As String ' ' Set up excluded words ' Excludes = "[the][a][of][is][to][for][this][that][by][be][and][are]" Excludes = "" Excludes = InputBox$("Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Excludes = Excludes & InputBox$("The following words are excluded: " & Excludes & ". Enter words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") ' Find out how to sort ByFreq = True Ans = InputBox$("Sort by WORD or by FREQ?", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count Totalwords = ActiveDocument.Words.Count ' Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(aword) If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword ' Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j ' Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum .TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Word" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Occurrences" ActiveDocument.Tables(1).Range.ParagraphFormat.Ali gnment = wdAlignParagraphCenter ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Total words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Totalwords ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of different words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) System.Cursor = wdCursorNormal ' j = MsgBox("There were " & Trim(Str(WordNum)) & " different words ", vbOKOnly, "Finished") Selection.HomeKey wdStory End Sub -- Hope this helps. Please reply to the newsgroup unless you wish to avail yourself of my services on a paid consulting basis. Doug Robbins - Word MVP "Kat3n" wrote in message ... I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
#7
Posted to microsoft.public.word.docmanagement
|
|||
|
|||
Top Ten Word count
I've went at the problem a little differently, I also tried to remove common
contractions, possesives, etc. ' ' WordFrequencyCount ' Creates a list of all the words (and their frequency) in the active document ' and presents the words & frequencies in a new document. ' It attempts to remove common contractions, possesives, numbers and punctuation. ' Is also has a small exclude list (which can be adjusted). ' Sub WordFrequencyCount() Dim WordList() As String Dim WordCount() As Long Dim nWords As Long Dim Index As Long Dim actDoc As Document Dim newDoc As Document Dim oTable As Table Dim oRow As Row Dim aWord As Object Dim sWord As String Dim sExcludeList As String Dim bFrequencyThreshold As Byte ' ' The variable bFrequencyThreshold is the minimum frequency needed ' to have the word & frequency count reported. The value of 1 (or 0) ' reports every word counted. The value of 5 omits all words ' with a frequency less than 5. (A higher number allows the macro ' to run faster, since fewer items are added to the table.) ' bFrequencyThreshold = 5 ' ' ReDim WordList(1) ReDim WordCount(1) WordList(1) = "" WordCount(1) = 0 nWords = 0 Set actDoc = ActiveDocument For Each aWord In actDoc.Words sWord = Trim(aWord.Text) ' ' Any of the next six lines of code can be omitted. ' Omit a line of code by placing a comma before it. ' The line will then turn green (or the same color as this line). ' Chr(160) = non-breaking spaces ' sWord = Replace(sWord, Chr(160), "") sWord = RemoveContractions(sWord) If IsExcluded(sWord) Then sWord = "" If IsAllDigits(sWord) Then sWord = "" If IsOnlyPunctuation(sWord) Then sWord = "" If Len(sWord) = 1 Then sWord = "" ' ' Any of the above six lines of code can be omitted. ' Omit a line of code by placing a comma before it. ' If Len(sWord) 0 Then Index = 1 While (Index 0 And Index = nWords) If StrComp(WordList(Index), sWord, vbTextCompare) = 0 Then WordCount(Index) = WordCount(Index) + 1 Index = 0 Else Index = Index + 1 End If Wend If Index 0 Then If nWords = 0 Then nWords = 1 Else nWords = nWords + 1 Application.StatusBar = "Counting Tokens in Document: " & nWords ReDim Preserve WordList(nWords) ReDim Preserve WordCount(nWords) End If WordList(nWords) = sWord WordCount(nWords) = 1 End If End If Next aWord Set newDoc = Documents.Add Set oRange = newDoc.Range Set oTable = newDoc.Tables.Add(oRange, NumRows:=1, NumColumns:=2) With oTable.Range.Rows(1) .Cells(1).Range.Text = "Words" .Cells(2).Range.Text = "Count" End With For Index = 1 To nWords Application.StatusBar = "Creating Table: " & nWords & ": " & Index If WordCount(Index) = bFrequencyThreshold Then Set oRow = oTable.Rows.Add With oRow .Cells(1).Range.Text = WordList(Index) .Cells(2).Range.Text = WordCount(Index) End With End If Next Index If oTable.Rows.Count 2 Then oTable.Sort ExcludeHeader:=True, _ FieldNumber:=2, _ SortFieldType:=wdSortFieldNumeric, _ SortOrder:=wdSortOrderDescending End If End Sub ' ' IsExcluded ' ' Note that comparisons are not case sensitive, ' so that "ACT" would be excluded, if "act" is excluded. ' ' Each word in the sExcludeList ' should be separated on each side by a space (" "). ' Any word can be added, and they don't need to be in any order. ' Private Function IsExcluded(ByVal sWord As String) As Boolean Const sExcludeList As String = " a about act after again all also an and any are as ask at away back be been before between big but by call came can cause close come could did do does down each end even every far few find first follow for form from get give go good great had hard has have he help her here high him his hot how I if in is it its just keep kind know large last late left let like little live long look low made make man many may me mean might more most move much must my name near need never new no not now of off on one only or other our out over own part people place press put round said same saw say see set she should show side small so some stand still such take tell than that the their them then there these they thing think this those through to too try turn two under up us use very want was way we well went were what when where which while who why will with word work would you your " If InStr(1, sExcludeList, " " & sWord & " ", vbTextCompare) 0 Then IsExcluded = True End If End Function ' ' IsOnlyPunctuation ' Returns true only if every character in a word string is punctuation ' Private Function IsOnlyPunctuation(ByVal sWord As String) As Boolean Dim sPunctuation As String Dim sChar As String Dim nIndex As Long sPunctuation = " .,?';:![]{}()-_" & Chr(9) & Chr(10) & Chr(11) & Chr(12) & Chr(13) & Chr(14) & Chr(34) & Chr(145) & Chr(146) & Chr(147) & Chr(148) & Chr(150) & Chr(151) & Chr(160) nIndex = 1 While (nIndex = Len(sWord)) sChar = Mid(sWord, nIndex, 1) If InStr(1, sPunctuation, sChar, vbBinaryCompare) = 0 Then IsOnlyPunctuation = False Exit Function End If nIndex = nIndex + 1 Wend IsOnlyPunctuation = True End Function ' ' IsAllDigits returns true if every character is a digit (0-9) ' Private Function IsAllDigits(ByVal sWord As String) As Boolean Dim sChar As String Dim nIndex As Long nIndex = 1 While (nIndex = Len(sWord)) sChar = Mid(sWord, nIndex, 1) If sChar "0" Or sChar "9" Then IsAllDigits = False Exit Function End If nIndex = nIndex + 1 Wend IsAllDigits = True End Function ' ' RemoveContractions ' Attempts to remove some obvious contractions and possesives ' Chr(146) = the closing single smart quotation mark/apostrophe ' Private Function RemoveContractions(ByVal sWord As String) As String If InStr(1, sWord, "'") Then If sWord = "won't" Then sWord = "" ElseIf sWord = "can't" Then sWord = "can" ElseIf Right(sWord, 1) = "'" Then sWord = Left(sWord, Len(sWord) - 1) ElseIf Right(sWord, 2) = "'s" Then sWord = Left(sWord, Len(sWord) - 2) ElseIf Right(sWord, 2) = "'d" Then sWord = Left(sWord, Len(sWord) - 2) ElseIf Right(sWord, 3) = "'ll" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = "'ve" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = "'re" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = "n't" Then sWord = Left(sWord, Len(sWord) - 3) End If ElseIf InStr(1, sWord, Chr(146)) Then If sWord = "won" & Chr(146) & "t" Then sWord = "" ElseIf sWord = "can" & Chr(146) & "t" Then sWord = "can" ElseIf Right(sWord, 1) = Chr(146) Then sWord = Left(sWord, Len(sWord) - 1) ElseIf Right(sWord, 2) = Chr(146) & "s" Then sWord = Left(sWord, Len(sWord) - 2) ElseIf Right(sWord, 2) = Chr(146) & "d" Then sWord = Left(sWord, Len(sWord) - 2) ElseIf Right(sWord, 3) = Chr(146) & "ll" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = Chr(146) & "ve" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = Chr(146) & "re" Then sWord = Left(sWord, Len(sWord) - 3) ElseIf Right(sWord, 3) = "n" & Chr(146) & "t" Then sWord = Left(sWord, Len(sWord) - 3) End If End If RemoveContractions = sWord End Function Steven Craig Miller "Kat3n" wrote: I create policies using Word 2003. I would like to be able to count each word in the entire policy and use the top ten words (ignoring words such as I, the, in, etc.) as keywords. Is there anyway to do this? -- Kat3n |
Reply |
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Count of words in Word 2007 and count in 2003 are different | Microsoft Word Help | |||
How do I get word count to count my headers and footers? | Microsoft Word Help | |||
Can I set word count to not count words with three letters less? | Microsoft Word Help | |||
Word's word count feature should be able to count sentences also | Microsoft Word Help | |||
A word count that appears next to the page count | Microsoft Word Help |