#1
Posted to microsoft.public.word.pagelayout
How to extract raw text from columns
I have a Word doc that the author created columns in, and I need to get the raw text. If I save it as txt, the formatting gets messed up.

When I look at the page (or print it) I see something like:

Date: xx/xx/xx Name: Fred Time: xx:xx Occupation: Tech Support

When I save as text, or select, copy and paste into Notepad, I see something like:

Date: xx/xx/xx Name: Time: xx:xx Occupation: Fred Tech Support

I have a lot more info on the page, which makes it impossible for me to parse it out. Is there a way to just remove the columns and preserve the same text on the same line?

Thanks!
-- Dave Miles
#2
Hi Dave,
Have you tried Table | Convert | Table to Text?
-- Cheers macropod [Microsoft MVP - Word]
#3
It's not a table, so that option is not available to me.
#4
Send me a copy of the document to look at.
-- Hope this helps Doug Robbins - Word MVP Please reply only to the newsgroups unless you wish to avail yourself of my services on a paid, professional basis.
#5
So what sort of column arrangement are you using? And how do you keep the items aligned?
-- Cheers macropod [Microsoft MVP - Word]
#6
Hi Paul,
Dave sent me one of the documents, and I believe that it may have been produced via OCR. I am sending him the following response:

You can clean up the document a lot with the following steps:

1. Use Edit > Replace to replace ^b with nothing, which removes all of the Section Breaks.
2. Replace ^n with nothing to remove the column breaks.
3. Use Ctrl+A to select everything, then use the Format > Paragraph dialog to set the paragraph indents to 0 and the Special Indent to None.
4. Use Edit > Replace again to replace ^t with ^p.

A macro could be written to perform all of the above. To process the documents further (assuming that you have many to do), you could create a list of the attributes for which you want to extract the values, and then use it in a macro that iterates through that list and inserts a tab after each attribute. If you then used Convert Text to Table, you would have most of the information in a two-column table, with the attributes in the first column and the values in the second column. There would be a few exceptions, such as the addresses, and a bit more attention would need to be paid to the Loan Details section.

With a bit of work, however, and depending upon how similar the documents are and what you want as the final result, it should be possible to create some code that would do a fairly complete job of parsing the data from the document.

-- Hope this helps Doug Robbins - Word MVP Please reply only to the newsgroups unless you wish to avail yourself of my services on a paid, professional basis.
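For anyone who would rather do the attribute pass outside Word, here is a rough Python sketch of the same idea. It is not the macro described above, and it assumes things the thread does not confirm: that the cleaned-up text has been exported as plain text, that each attribute label ends up at the start of its own line once tabs are turned into line breaks, and that the sample text, the attribute list, and the function names are made up for illustration.

```python
# Hypothetical sketch: names and sample data are assumptions, not from the thread.

def tabs_to_paragraphs(text):
    """Split at tabs and line breaks (like replacing ^t with ^p) and
    strip leading/trailing whitespace (like setting indents to 0)."""
    parts = text.replace("\t", "\n").split("\n")
    return [p.strip() for p in parts if p.strip()]


def split_attributes(lines, attributes):
    """Build two-column rows: attribute in the first column, value in the second.
    Lines that match no attribute (e.g. addresses) get an empty first column."""
    rows = []
    for line in lines:
        for attr in attributes:
            if line.startswith(attr):
                rows.append((attr, line[len(attr):].strip()))
                break
        else:
            rows.append(("", line))
    return rows


# Made-up sample standing in for one exported document.
sample = (
    "Date: 01/02/03\tName: Fred\n"
    "  Time: 10:30\tOccupation: Tech Support\n"
    "12 Example Street"
)
attributes = ["Date:", "Name:", "Time:", "Occupation:"]

rows = split_attributes(tabs_to_paragraphs(sample), attributes)
for attr, value in rows:
    print(f"{attr}\t{value}")
```

Lines that come back with an empty first column are the exceptions that would still need manual attention or extra rules.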
#7
Hey Doug & Paul,
I think the docs may be generated by Access. I understand that the source arrives in Excel and the reports are generated from that. Yes, the simple answer would be to work from the Excel sheets, but they contain more data than I license, so I have to take what I get. Sad but true.