#1
Posted to microsoft.public.word.pagelayout
Dave Miles
How to extract raw text from columns


I have a Word doc in which the author created columns, and I need to get
the raw text. If I save it as .txt, the formatting gets messed up.

When I look at the page (or print it) I see something like:


Date: xx/xx/xx Name: Fred
Time: xx:xx Occupation: Tech Support

When I save as text, or select, copy & paste into Notepad, I see
something like:

Date: xx/xx/xx Name:
Time: xx:xx Occupation:
Fred
Tech Support

I have a lot more info on the page which makes it impossible for me to
parse it out. Is there a way to just remove the columns and preserve
the same text on the same line?

Thanks!




--
Dave Miles
#2
macropod

Hi Dave,

Have you tried Table | Convert | Table to Text?

--
Cheers
macropod
[Microsoft MVP - Word]

#3
Dave Miles

It's not a table, so that option is not available to me.

#4
Doug Robbins - Word MVP

Send me a copy of the document to look at.

--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.



#5
macropod

So what sort of column arrangement are you using? And how do you keep the items aligned?

--
Cheers
macropod
[Microsoft MVP - Word]



#6
Doug Robbins - Word MVP

Hi Paul,

Dave sent me one of the documents and I believe that it may have been
produced via OCR.

I am sending him the following response:

You can clean up the document a lot by using Edit > Replace to first replace
^b with nothing to remove all of the section breaks, then ^n with nothing to
remove the column breaks. Then use Ctrl+A to select everything and use the
Format > Paragraph dialog to set the paragraph indents to 0 and the Special
indent to (none). Finally, use Edit > Replace again to replace ^t with ^p.
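
For anyone who would rather script the same clean-up on exported text
outside Word, here is a minimal Python sketch of the replacements described
above. It is an illustration, not the macro itself. It assumes the text was
extracted with Word's control characters intact, and that ^b and ^n
correspond to the form-feed and shift-out characters respectively. The
function name clean_columns is made up for this example.

```python
# Assumed mapping of Word's Find codes to characters in the extracted text.
SECTION_BREAK = "\x0c"  # ^b (assumption: also shared by manual page breaks)
COLUMN_BREAK = "\x0e"   # ^n (assumption)
TAB = "\t"              # ^t
PARAGRAPH = "\n"        # stand-in for ^p in plain text


def clean_columns(text: str) -> str:
    """Apply the replacements in the same order as the manual steps."""
    text = text.replace(SECTION_BREAK, "")  # remove section breaks
    text = text.replace(COLUMN_BREAK, "")   # remove column breaks
    # The paragraph-indent step is a formatting property in Word and has
    # no plain-text equivalent, so it is skipped here.
    text = text.replace(TAB, PARAGRAPH)     # each tab starts a new paragraph
    return text


if __name__ == "__main__":
    sample = "Date: xx/xx/xx\tName:\x0eFred\x0c"
    print(clean_columns(sample))
```

Because the form-feed character is also used for manual page breaks, this
sketch would remove those as well, which may or may not be wanted.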



A macro could be written to perform all of the above. To further process
the documents (assuming that you have many to do), you could create a list
of the attributes for which you want to extract the values, and then use
it in a macro that iterates through that list and inserts a tab after each
attribute. If you then used Convert Text to Table, you would have most of
the information in a two-column table, with the attributes in the first
column and the values in the second. There would be a few exceptions, such
as the addresses, and a bit more attention would need to be paid to the
Loan Details section.
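
The attribute-list idea can be sketched in Python as follows. This is only
a rough illustration under stated assumptions: the attribute labels are
taken from the sample in the first post (the real list would depend on the
documents), the function names are hypothetical, and matching is by plain
substring, so a label such as "Name:" would also match inside a longer
label ending in the same text.

```python
from typing import Iterable, List, Tuple

# Hypothetical attribute list, based on the sample in the original question.
ATTRIBUTES = ["Date:", "Time:", "Name:", "Occupation:"]


def split_attributes(
    lines: Iterable[str], attributes: List[str] = ATTRIBUTES
) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs in the order they appear."""
    rows = []
    for line in lines:
        # Locate every known attribute in this line, left to right.
        hits = sorted((line.find(a), a) for a in attributes if a in line)
        for i, (start, attr) in enumerate(hits):
            # The value runs up to the next attribute or the end of the line.
            end = hits[i + 1][0] if i + 1 < len(hits) else len(line)
            rows.append((attr, line[start + len(attr):end].strip()))
    return rows


def to_tab_delimited(rows: List[Tuple[str, str]]) -> str:
    """One 'attribute<TAB>value' line per row, ready for a two-column table."""
    return "\n".join(f"{attr}\t{value}" for attr, value in rows)


if __name__ == "__main__":
    page = ["Date: xx/xx/xx Name: Fred", "Time: xx:xx Occupation: Tech Support"]
    print(to_tab_delimited(split_attributes(page)))
```

The tab-delimited output mirrors the "insert a tab after each attribute"
step, so it could be pasted back into Word and converted to a table.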



With a bit of work, however, and depending on how similar the documents are
and what you want as the final result, it should be possible to create some
code that would do a fairly complete job of parsing the data from the
document.


--
Hope this helps

Doug Robbins - Word MVP
Please reply only to the newsgroups unless you wish to avail yourself of my
services on a paid, professional basis.



#7
Dave Miles

Hey Doug & Paul,

I think the docs may be generated by Access. I understand that the source
data arrives in Excel and the reports are generated from that. Yes, the
simple answer would be to work from the Excel sheets, but they contain more data
than I license, so I have to take what I get... sad but true.

Copyright ©2004-2024 Microsoft Office Word Forum - WordBanter.
The comments are property of their posters.