Home |
Search |
Today's Posts |
#1
|
|||
|
|||
How do I build a word list with number of occurrences of each wor.
I need to build a technical term list with definitions from a large Word
document. I figure if I could simply list all of the words in the document and the number of times the word is used, I could esily cut out the common words and concentrate on the tech terms. Does anyone know a way to do this in Word? Thanks, Bill Crowl |
#2
|
|||
|
|||
Bill,
This macro does a nice job of creating a word frequency list: Sub WordFrequency() Dim SingleWord As String 'Raw word pulled from doc Const maxwords = 9000 'Maximum unique words allowed Dim Words(maxwords) As String 'Array to hold unique words Dim Freq(maxwords) As Integer 'Frequency counter for Unique Words Dim WordNum As Integer 'Number of unique words Dim ByFreq As Boolean 'Flag for sorting order Dim ttlwds As Long 'Total words in the document Dim Excludes As String 'Words to be excluded Dim Found As Boolean 'Temporary flag Dim j, k, l, Temp As Integer 'Temporary variables Dim IngWordCount As Long 'Total non-excluded words in document Dim NonWordObjects As Long Dim AllWordOjects As Long Dim TotalWords As Long Dim tword As String ' 'Set up excluded words 'Excludes = "[pickleloaf][gruntbutter]" 'Excludes = Excludes & InputBox$("The following words are excluded by default: " & Excludes & ". Enter additional words that you wish to exclude, surrounding each word with [ ].", "Excluded Words", "") Excludes = InputBox$("Enter words that you wish to exclude. " _ & "Place each word within square brackets [ ]. " _ & "Example: [is][a].", "Excluded Words", "") 'Find out how to sort ByFreq = True Ans = InputBox$("Default sort order is word freqeuncy. To sort alphabetically by word, type Word in the field below.", "Sort order", "FREQ") If Ans = "" Then End If UCase(Ans) = "WORD" Then ByFreq = False End If Selection.HomeKey Unit:=wdStory System.Cursor = wdCursorWait WordNum = 0 ttlwds = ActiveDocument.Words.Count 'AllWordObjects = ActiveDocument.Words.Count 'TotalWords = NonWordObjects 'Control the repeat For Each aword In ActiveDocument.Words SingleWord = Trim(LCase(aword)) If SingleWord "a" Or SingleWord "z" Then SingleWord = "" 'Out of range? If SingleWord "a" Or SingleWord "z" Then NonWordObjects = NonWordObjects + 1 'SingleWord = Trim(aword) 'If SingleWord "A" Or SingleWord "z" Then SingleWord = "" 'Out of range? If InStr(Excludes, "[" & SingleWord & "]") Then SingleWord = "" 'On exclude list? If Len(SingleWord) 0 Then IngWordCount = IngWordCount + 1 Found = False For j = 1 To WordNum If Words(j) = SingleWord Then Freq(j) = Freq(j) + 1 Found = True Exit For End If Next j If Not Found Then WordNum = WordNum + 1 Words(WordNum) = SingleWord Freq(WordNum) = 1 End If If WordNum maxwords - 1 Then j = MsgBox("The maximum array size has been exceeded. Increase maxwords.", vbOKOnly) Exit For End If End If ttlwds = ttlwds - 1 StatusBar = "Remaining: " & ttlwds & " Unique: " & WordNum Next aword 'Now sort it into word order For j = 1 To WordNum - 1 k = j For l = j + 1 To WordNum If (Not ByFreq And Words(l) Words(k)) Or (ByFreq And Freq(l) Freq(k)) Then k = l Next l If k j Then tword = Words(j) Words(j) = Words(k) Words(k) = tword Temp = Freq(j) Freq(j) = Freq(k) Freq(k) = Temp End If StatusBar = "Sorting: " & WordNum - j Next j AllWordObjects = ActiveDocument.Words.Count NonWordObjects = NonWordObjects TotalWords = AllWordObjects - NonWordObjects 'Now write out the results tmpName = ActiveDocument.AttachedTemplate.FullName Documents.Add Template:=tmpName, NewTemplate:=False Selection.ParagraphFormat.TabStops.ClearAll With Selection For j = 1 To WordNum ..TypeText Text:=Words(j) & vbTab & Trim(Str(Freq(j))) & vbCrLf Next j End With ActiveDocument.Range.Select Selection.ConvertToTable Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows.Add BeforeRow:=Selection.Rows(1) ActiveDocument.Tables(1).Cell(1, 1).Range.InsertBefore "Unique Words" ActiveDocument.Tables(1).Cell(1, 2).Range.InsertBefore "Number of Occurrences" ActiveDocument.Tables(1).Columns(2).Select Selection.ParagraphFormat.Alignment = wdAlignParagraphRight Selection.Collapse wdCollapseStart ActiveDocument.Tables(1).Rows(1).Shading.Backgroun dPatternColor = wdColorGray20 ActiveDocument.Tables(1).Columns(1).PreferredWidth = InchesToPoints(4.75) ActiveDocument.Tables(1).Columns(2).PreferredWidth = InchesToPoints(1.9) ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Summary" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore "Total" ActiveDocument.Tables(1).Rows(ActiveDocument.Table s(1).Rows.Count).Shading.BackgroundPatternColor = wdColorGray20 ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of Unique Words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore Trim(Str(WordNum)) ActiveDocument.Tables(1).Rows(ActiveDocument.Table s(1).Rows.Count).Shading.BackgroundPatternColor = wdColorAutomatic ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of Non-Excluded Words in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore (IngWordCount) ActiveDocument.Tables(1).Rows.Add ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 1).Range.InsertBefore "Number of Words (Excluded and Non-Excluded) in Document" ActiveDocument.Tables(1).Cell(ActiveDocument.Table s(1).Rows.Count, 2).Range.InsertBefore (TotalWords) System.Cursor = wdCursorNormal MsgBox "This document contains " & Trim(Str(WordNum)) & " unique words. " MsgBox "This document contains " & IngWordCount & " non-excluded words. " MsgBox "This document contains a total of " & TotalWords & " (excluded and non-excluded) words. " MsgBox "For more statistics on this document, use ToolsWord Count in the original document. " Selection.HomeKey wdStory End Sub -- Greg Maxey/Word MVP See: http://gregmaxey.mvps.org/word_tips.htm For some helpful tips using Word. Bill Crowl wrote: I need to build a technical term list with definitions from a large Word document. I figure if I could simply list all of the words in the document and the number of times the word is used, I could esily cut out the common words and concentrate on the tech terms. Does anyone know a way to do this in Word? Thanks, Bill Crowl |
#3
|
|||
|
|||
"Bill Crowl" wrote: I need to build a technical term list with definitions from a large Word document. I figure if I could simply list all of the words in the document and the number of times the word is used, I could esily cut out the common words and concentrate on the tech terms. Does anyone know a way to do this in Word? Thanks, Bill Crowl Thank you. I'll try it. Bill |
Reply |
Thread Tools | |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Reveal Formating Code | Microsoft Word Help | |||
How can I keep the PDFMaker toolbar from appearing every time I o. | Microsoft Word Help | |||
Maximum number of pages in a word document | Formatting Long Documents | |||
How do I create & merge specific data base & master documents? | New Users | |||
How do I convert a cd in word perfect to microsoft word | Microsoft Word Help |