#1
Jack Sons
textbox to normal text

Hi all,

I scanned a document of many pages. The result (an RTF file) looks fine, but
in reality the text I see is not "text in a document" but text in textboxes.

I really need to convert this to text "directly in the document", like in
any "normal" document. I mean that it should be as if I had typed it directly
into the document.

Of course I could select (highlight) the text in the first textbox and then
paste it into a new document (a doc-file), do the same with the text of the
next textbox, paste it below the first text in the new document, etc. I tried,
and did it for a lot of textboxes, but it would be very tedious to do it for
the whole document because of the many hundreds - maybe thousands - of
textboxes, some of which contain only a single line of text.

There is also a strange effect: when I try to "control c - control v" the
highlighted text of a textbox into the other document, suddenly it is not the
text that is copied to the new document but the whole textbox, and so it
just moves the problem from one document to the other.

Can anyone show me a way out? Perhaps with VBA it is possible to
convert all textboxes to normal text at once.

I urgently need advice. Please help.

Jack Sons
The Netherlands


#2
Graham Mayor

OCR software that formats the document using text boxes is a nightmare to
edit. You might find it simpler to use the plain text output of the software
and apply your own editing.


--

Graham Mayor - Word MVP

My web site www.gmayor.com
Word MVP web site http://word.mvps.org








#3
Greg Maxey

Jack Sons,

Yes, Graham is probably right. I cobbled together the following, which first
converts the textboxes to frames and then removes the frames.

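Sub ScratchMacro()
    'Convert textbox text to plain text
    Dim oShp As Shape
    Dim i As Long
    'Pass 1: turn every textbox into a frame. Loop backwards by index so
    'that shapes leaving the Shapes collection as they are converted
    'cannot cause other shapes to be skipped.
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        Set oShp = ActiveDocument.Shapes(i)
        If oShp.Type = msoTextBox Then oShp.ConvertToFrame
    Next i
    'Pass 2: strip borders and shading, then delete each frame.
    'Deleting a frame removes only the frame; its text stays in the document.
    'Note: this removes every frame in the document, not only converted ones.
    For i = ActiveDocument.Frames.Count To 1 Step -1
        With ActiveDocument.Frames(i)
            .Borders.Enable = False
            With .Shading
                .Texture = wdTextureNone
                .ForegroundPatternColor = wdColorAutomatic
                .BackgroundPatternColor = wdColorAutomatic
            End With
            .Delete
        End With
    Next i
End Sub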

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support




#4
Jack Sons

Graham,

I am an absolute newbie to scanning. Some months ago I bought a then rather
expensive HP scanner (the one with the detached glass frame, HP 4670) and
have now used it for the first time. I just put the glass frame over the book
to scan some 50 pages to MS Word and yes, after the scanning process was
completed I found the resulting RTF document on my PC screen.

I know it sounds stupid, but I have no idea what you mean by "use the plain
text output of the software and apply your own editing".

My own editing, that's what I want. But how do I get "the plain text output
of the software"? Because I use XP there was no need to install any software
(if I remember well); I just plugged in the scanner and, once XP had
recognised it, it worked.

Please enlighten me on how to get the plain text output, which is apparently
exactly what I need.

A thousand thanks in advance.

Jack.





#5
Jack Sons

Greg,

Thank you for your macro, it worked.

How did it work? I think it converts each textbox to a frame without
(visible) borders. What is the essential difference between a textbox and a
frame?
And what is done with the frames? I can't find them in the result. To me it
looks like a normal document, without any objects, just characters, as it
should be.

Would using "the plain text output of the software", as Graham advised (if I
knew how to do that), give a different result?

Before and after the use of the macro the resulting document is an RTF file
(result.rtf). What does that extension imply? Can I rename it as a
doc-file (result.doc) without repercussions?

Last question (for now): why does the scanning process result in textbox
output instead of "normal text"?

Jack.







#6
Graham Mayor

I am not familiar with the workings of your particular scanner or software,
but there will certainly be an option to scan to text rather than Word. Try
the help file.

--

Graham Mayor - Word MVP

My web site www.gmayor.com
Word MVP web site http://word.mvps.org







#7
JulieD

Hi Jack

I just wandered off to look at your product's documentation on the HP site
(www.hp.com). The user manual isn't that helpful; however, you can
have a "real time chat" with a support technician on the site (as far as I
can tell it's free!), so that might be the thing to do. They'll (hopefully) be
able to take you step by step through the process of scanning and getting the
output you want.

Cheers
JulieD







#8
Greg Maxey

Jack,

I am afraid that my usefulness to you has about run its course :-)

The code looks at all shapes. If a shape is a textbox, it converts it
to a frame, removes any borders and fill effects from the frame, and then
deletes the frame, leaving the text. I found through experimentation that if
I just deleted the frames, any border and fill effects in the frame
would be transferred to the text paragraphs.

I will have to defer to others as to the technical difference between a
frame and a textbox.

RTF is, I think, "Raw Text Format." I have never monkeyed around very much
with different types of text, but why don't you just try saving your RTF
file as a Word .doc and see what happens :-)

I have a hard time figuring out the workings of a simple screw, so I can't
be of much help with the workings of your scanner. Sorry.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support




#9
Jay Freedman

Hi Jack,

I'll try to follow up Greg's musings and shed what light I can...

In the macro, the line ".Delete" removes the frame and leaves the
text. That should give you what you need -- plain text -- but there
may be a wrinkle, which I'll explain after a bit.

A frame and a textbox are similar in some ways, but the big difference
is that a textbox is in the "drawing layer" while the frame is in the
"text layer". That is, Word thinks of the textbox as a sort of
picture, while a frame is more like special formatting of text. You
can include a frame as part of a paragraph style, which you can't do
with a textbox. The ability to transform a textbox into a frame is
truly magical and involves some very fancy programming inside Word.

RTF is actually "Rich Text Format", and it's a way to use a file of
plain text to describe all sorts of formatting. If you open an RTF
file in Notepad, you'll see a ton of codes in braces that describe
fonts, page locations, and lots of other things. When you tell Word to
open an RTF file, a special converter program reads all those codes
and applies the formatting to the text part, resulting in what looks
like a regular Word document. You can then save that as a .doc file,
whose structure is completely different.
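
To make the difference between renaming and converting concrete: renaming result.rtf to result.doc only changes the label on the file, while Save As makes Word rewrite the contents in its own format. Below is a minimal sketch of the Save As route in VBA. It assumes the RTF file is the active document and has already been saved to disk; the macro name SaveRtfAsDoc is made up for this example.

```vba
Sub SaveRtfAsDoc()
    'Hypothetical helper: saves the active document next to the original
    'in Word's .doc format. Assumes the document already has a path on disk.
    Dim sName As String
    Dim lDot As Long

    If ActiveDocument.Path = "" Then
        MsgBox "Save the document to disk first."
        Exit Sub
    End If

    'Strip the old extension (e.g. ".rtf") from the file name, if any.
    sName = ActiveDocument.Name
    lDot = InStrRev(sName, ".")
    If lDot > 1 Then sName = Left$(sName, lDot - 1)

    'wdFormatDocument makes Word convert the contents, not just relabel them.
    ActiveDocument.SaveAs _
        FileName:=ActiveDocument.Path & Application.PathSeparator & sName & ".doc", _
        FileFormat:=wdFormatDocument
End Sub
```

Doing the same by hand with File > Save As and choosing "Word Document" as the file type should give the same result; the macro is only worth having if there are many files to convert.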

When you scan a document, the initial result is just a picture of the
page. Many scanners will let you save that as a graphics file (usually
.tif or .jpg). You feed that picture into an optical character
recognition (OCR) program, which may be part of the scanner software
or may be a separately installed program. The output of the OCR is
text.

In the early days, you were doing well to get just a plain-text
reading of the document, with headers and footers and pictures all
jammed in there. As OCR programmers got better, they started offering
output of a word processing file that looked exactly (well, more or
less) like the original, with the proper fonts, bold/italic, headers,
and so forth. In order to get the stuff positioned correctly on the
page, they resorted to textboxes -- but that's really hard to deal
with when you want to edit the document.

Now the wrinkle... Every graphic object in Word's drawing layer has an
"anchor", a spot in the regular text to which it's attached. (You can
see the anchor symbol in the left margin of Page Layout view if you go
to Tools > Options > View and check "Object anchors", then select a
textbox or floating picture.) When you convert the textbox to a frame
and then delete the frame, the text inside gets dumped into the
regular text at the anchor position.

Many OCR programs put a single paragraph mark on a page, and anchor
all the textboxes on the page to that paragraph. When you run the
macro, the various chunks of text appear in the order in which their
anchors occurred in the original paragraph, which will probably be
more-or-less random. You're then left to untangle the spaghetti. :-(

This is why Graham's suggestion to output the scan (from the OCR
program) as plain text is a good one. You may lose the "looks just
like the original" formatting, but you'll also never create the
textboxes. This should make your editing job a whole lot simpler. Look
through the OCR program and its help file to find out where you can
turn off formatted output.
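
If re-scanning is not an option, the scrambled order described above might be reduced by copying the text out of the textboxes in page order instead of anchor order. What follows is only a sketch under stated assumptions, not a tested fix. It assumes that every textbox's Top and Left are measured from the same reference point (which holds only if the OCR program positioned all boxes relative to the page or the margin), that those values are non-negative and below 10000 points, and that only top-level textboxes in the main text matter. It also drops all formatting. The macro name TextBoxesInReadingOrder is invented for this example.

```vba
Sub TextBoxesInReadingOrder()
    'Sketch: copy textbox text to a new document, sorted by anchor page,
    'then by distance from the top, then by distance from the left.
    Dim oSrc As Document, oNew As Document
    Dim oShp As Shape
    Dim n As Long, i As Long, j As Long
    Dim sText() As String, dKey() As Double
    Dim sTmp As String, dTmp As Double

    Set oSrc = ActiveDocument
    If oSrc.Shapes.Count = 0 Then Exit Sub
    ReDim sText(1 To oSrc.Shapes.Count)
    ReDim dKey(1 To oSrc.Shapes.Count)

    'Collect the text and one sortable number per textbox.
    For Each oShp In oSrc.Shapes
        If oShp.Type = msoTextBox Then
            If oShp.TextFrame.HasText Then
                n = n + 1
                sText(n) = oShp.TextFrame.TextRange.Text
                'Assumption: page counts most, then Top, then Left.
                dKey(n) = oShp.Anchor.Information(wdActiveEndPageNumber) * 100000000# _
                        + Int(oShp.Top) * 10000# + Int(oShp.Left)
            End If
        End If
    Next oShp
    If n = 0 Then Exit Sub

    'Plain insertion sort on the keys; the texts move along with them.
    For i = 2 To n
        sTmp = sText(i): dTmp = dKey(i)
        j = i - 1
        Do While j >= 1
            If dKey(j) <= dTmp Then Exit Do
            sText(j + 1) = sText(j): dKey(j + 1) = dKey(j)
            j = j - 1
        Loop
        sText(j + 1) = sTmp: dKey(j + 1) = dTmp
    Next i

    'Write the chunks to a new document, one paragraph break after each.
    Set oNew = Documents.Add
    For i = 1 To n
        sTmp = sText(i)
        If Right$(sTmp, 1) = vbCr Then sTmp = Left$(sTmp, Len(sTmp) - 1)
        oNew.Range.InsertAfter sTmp & vbCr
    Next i
End Sub
```

The original document is left untouched, so the result can be compared against it. On a document with thousands of textboxes the page-number lookup may be slow, and multi-column pages will still come out interleaved, because the sort goes strictly top to bottom.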

--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org

On Sat, 4 Dec 2004 13:20:18 -0500, "Greg Maxey" gro.spvm@yexamg
(thats my e-mail address backwards) wrote:




#10
Greg Maxey

Jay,

Well done!! My stab at RTF was literally pulled from the nether region.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Jay Freedman wrote:
Hi Jack,

I'll try to follow up Greg's musings and shed what light I can...

In the macro, the line ".Delete" removes the frame and leaves the
text. That should give you what you need -- plain text -- but there
may be a wrinkle, which I'll explain after a bit.

A frame and a textbox are similar in some ways, but the big difference
is that a textbox is in the "drawing layer" while the frame is in the
"text layer". That is, Word thinks of the textbox as a sort of
picture, while a frame is more like special formatting of text. You
can include a frame as part of a paragraph style, which you can't do
with a textbox. The ability to transform a textbox into a frame is
truly magical and involves some very fancy programming inside Word.

RTF is actually "Rich Text Format", and it's a way to use a file of
plain text to describe all sorts of formatting. If you open an RTF
file in NotePad, you'll see a ton of codes in braces that describe
fonts, page locations, and lots of other things. When you tell Word to
open an RTF file, a special converter program reads all those codes
and applies the formatting to the text part, resulting in what looks
like a regular Word document. You can then save that as a .doc file,
whose structure is completely different.

When you scan a document, the initial result is just a picture of the
page. Many scanners will let you save that as a graphics file (usually
.tif or .jpg). You feed that picture into an optical character
recognition (OCR) program, which may be part of the scanner software
or may be a separately installed program. The output of the OCR is
text.

In the early days, you were doing well to get just a plain-text
reading of the document, with headers and footers and pictures all
jammed in there. As OCR programmers got better, they started offering
output of a word processing file that looked exactly (well, more or
less) like the original, with the proper fonts, bold/italic, headers,
and so forth. In order to get the stuff positioned correctly on the
page, they resorted to textboxes -- but that's really hard to deal
with when you want to edit the document.

Now the wrinkle... Every graphic object in Word's drawing layer has an
"anchor", a spot in the regular text to which it's attached. (You can
see the anchor symbol in the left margin of Page Layout view if you go
to Tools Options View and check "Object anchors", then select a
textbox or floating picture.) When you convert the textbox to a frame
and then delete the frame, the text inside gets dumped into the
regular text at the anchor position.

Many OCR programs put a single paragraph mark on a page, and anchor
all the textboxes on the page to that paragraph. When you run the
macro, the various chunks of text appear in the order in which their
anchors occurred in the original paragraph, which will probably be
more-or-less random. You're then left to untangle the spaghetti. :-(

This is why Graham's suggestion to output the scan (from the OCR
program) as plain text is a good one. You may lose the "looks just
like the original" formatting, but you'll also never create the
textboxes. This should make your editing job a whole lot simpler. Look
through the OCR program and its help file to find out where you can
turn off formatted output.


Jack,

I am afraid that my usefulness to you has about run its course :-)

The code does look at all shapes, if the shape is a textbox it
converts it to a frame, removes any borders and fill effects from
the frame and then deletes the frame leaving the text. I found
through experimentation that if I just deleted the frames then any
border and fill effects in the frame would be transfered to the text
paragraphs.

I will have to defer to others as to the technical difference
between a frame and textbox.

RTF is, I think, "Raw Text Format." I have never monkeyed around
very much with differenct types of text, but why don't you just try
saving your RTF file as a Word.doc and see what happens :-)

I have a hard time figuring out the workings of a simple screw, so I
can't be of much help with the workings of your scanner. Sorry.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Jack Sons wrote:
Greg,

Thank you for your macro; it worked.

How did it work? I think it converts each textbox to a frame without
(visible) borders. What is the essential difference between a textbox
and a frame?
And what is done with the frames? I can't find them in the result. To
me it looks like a normal document, without any objects, just
characters, as it should be.

Would the result of using "the plain text output of the software",
as Graham advised, (if I would know how to do that) give a different
result?

Before and after the use of the macro the resulting document is an
RTF file (result.rtf). What does that extension imply? Can I
rename it as a .doc file (result.doc) without repercussions?

Last question (for now): why does the scanning process result in
textbox output instead of "normal text"?

Jack.


"Greg Maxey" gro.spvm@yexamg (thats my e-mail address backwards)
wrote in message ...
Jack Sons,

Yes, Graham is probably right. I cobbled together the following,
which first converts textboxes to frames and then removes the
frames.

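Sub ScratchMacro()
    'Convert textbox text to plain text
    Dim i As Long
    'Pass 1: convert each textbox to a frame. Loop backwards by index,
    'because ConvertToFrame removes the shape from the Shapes collection.
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        If ActiveDocument.Shapes(i).Type = msoTextBox Then
            ActiveDocument.Shapes(i).ConvertToFrame
        End If
    Next i
    'Pass 2: strip borders and fill, then delete each frame.
    'Deleting a frame leaves its text in the body of the document.
    For i = ActiveDocument.Frames.Count To 1 Step -1
        With ActiveDocument.Frames(i)
            .Borders.Enable = False
            With .Shading
                .Texture = wdTextureNone
                .ForegroundPatternColor = wdColorAutomatic
                .BackgroundPatternColor = wdColorAutomatic
            End With
            .Delete
        End With
    Next i
End Sub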

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support






  #11   Report Post  
Greg Maxey
 
Posts: n/a
Default

Well actually it was "figuratively" pulled.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Greg Maxey wrote:
Jay,

Well done!! My stab at RTF was literally pulled from the nether
region.

Jay Freedman wrote:
Hi Jack,

I'll try to follow up Greg's musings and shed what light I can...

In the macro, the line ".Delete" removes the frame and leaves the
text. That should give you what you need -- plain text -- but there
may be a wrinkle, which I'll explain after a bit.

A frame and a textbox are similar in some ways, but the big
difference is that a textbox is in the "drawing layer" while the
frame is in the "text layer". That is, Word thinks of the textbox as
a sort of picture, while a frame is more like special formatting of
text. You can include a frame as part of a paragraph style, which
you can't do with a textbox. The ability to transform a textbox into
a frame is truly magical and involves some very fancy programming
inside Word.

RTF is actually "Rich Text Format", and it's a way to use a file of
plain text to describe all sorts of formatting. If you open an RTF
file in NotePad, you'll see a ton of codes in braces that describe
fonts, page locations, and lots of other things. When you tell Word
to open an RTF file, a special converter program reads all those
codes and applies the formatting to the text part, resulting in what
looks like a regular Word document. You can then save that as a .doc
file, whose structure is completely different.
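
As a rough illustration of that last step (a sketch only, not part of Greg's macro): renaming result.rtf to result.doc would change the name but leave RTF codes inside the file, whereas saving through Word rewrites the content in the other format. The macro name SaveRtfAsDoc and the way the new file name is derived are made up for this example. Document.SaveAs and wdFormatDocument are standard members of the Word object model, but exactly which binary format gets written may depend on the Word version.

```vba
Sub SaveRtfAsDoc()
    'Hypothetical helper: save the open RTF file as a Word document
    'next to the original, instead of just renaming the file.
    Dim newName As String
    With ActiveDocument
        If LCase$(Right$(.FullName, 4)) = ".rtf" Then
            'Swap the extension; the original .rtf file is left untouched.
            newName = Left$(.FullName, Len(.FullName) - 4) & ".doc"
            .SaveAs FileName:=newName, FileFormat:=wdFormatDocument
        Else
            MsgBox "The active document does not have an .rtf extension."
        End If
    End With
End Sub
```

Run it with the RTF file open and active; afterwards the active document is the new .doc file.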

When you scan a document, the initial result is just a picture of the
page. Many scanners will let you save that as a graphics file
(usually .tif or .jpg). You feed that picture into an optical
character recognition (OCR) program, which may be part of the
scanner software or may be a separately installed program. The
output of the OCR is text.

In the early days, you were doing well to get just a plain-text
reading of the document, with headers and footers and pictures all
jammed in there. As OCR programmers got better, they started offering
output of a word processing file that looked exactly (well, more or
less) like the original, with the proper fonts, bold/italic, headers,
and so forth. In order to get the stuff positioned correctly on the
page, they resorted to textboxes -- but that's really hard to deal
with when you want to edit the document.

Now the wrinkle... Every graphic object in Word's drawing layer has
an "anchor", a spot in the regular text to which it's attached. (You
can see the anchor symbol in the left margin of Page Layout view if
you go to Tools > Options > View and check "Object anchors", then
select a textbox or floating picture.) When you convert the textbox
to a frame and then delete the frame, the text inside gets dumped
into the regular text at the anchor position.

Many OCR programs put a single paragraph mark on a page, and anchor
all the textboxes on the page to that paragraph. When you run the
macro, the various chunks of text appear in the order in which their
anchors occurred in the original paragraph, which will probably be
more-or-less random. You're then left to untangle the spaghetti. :-(
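
To get a feel for how tangled the result might be before converting anything, a small read-only macro along these lines could list every textbox with its anchor. This is only a sketch: the name ListTextBoxAnchors is invented here, the output goes to the Immediate window of the VBA editor (Ctrl+G), and the Top value is relative to whatever the shape is positioned against, so it may be a special alignment constant rather than a distance in points.

```vba
Sub ListTextBoxAnchors()
    'Hypothetical diagnostic: show where each textbox is anchored.
    'It changes nothing in the document.
    Dim shp As Shape
    For Each shp In ActiveDocument.Shapes
        If shp.Type = msoTextBox Then
            Debug.Print shp.Name, _
                "anchor page: " & shp.Anchor.Information(wdActiveEndPageNumber), _
                "anchor start: " & shp.Anchor.Start, _
                "top: " & shp.Top
        End If
    Next shp
End Sub
```

If many textboxes on a page report the same anchor start, they all hang off one paragraph, and the order of the text after conversion is unlikely to follow the visual order on the page.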

This is why Graham's suggestion to output the scan (from the OCR
program) as plain text is a good one. You may lose the "looks just
like the original" formatting, but you'll also never create the
textboxes. This should make your editing job a whole lot simpler.
Look through the OCR program and its help file to find out where you
can turn off formatted output.





  #12   Report Post  
Jay Freedman
 
Posts: n/a
Default

LOL!

On Sat, 4 Dec 2004 17:06:22 -0500, "Greg Maxey" gro.spvm@yexamg
(thats my e-mail address backwards) wrote:

Well actually it was "figuratively" pulled.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Greg Maxey wrote:
Jay,

Well done!! My stab at RTF was literally pulled from the nether
region.

Jay Freedman wrote:
Hi Jack,



--
Regards,
Jay Freedman
Microsoft Word MVP FAQ: http://word.mvps.org
  #13   Report Post  
Jack Sons
 
Posts: n/a
Default

Greg,

Your help was enormous!

What you wrote ("I found through experimentation that if
I just deleted the frames then any border and fill effects in the frame
would be transferred to the text paragraphs.") enlightened me; now I
understand. I had assumed that deleting the frame and its border
would also delete the text. Apparently that is not the case.

Thanks again.

Jack.





  #14   Report Post  
Jack Sons
 
Posts: n/a
Default

Jay,

Your answer is absolutely clear; now I understand what is going on. I am
very grateful to you.

I hope to find the plain text output of the software when I am not so busy.
Greg's macro worked fine; as far as I could see, all the text came "on paper" in
the correct sequence.

Jack.





  #15   Report Post  
Greg Maxey
 
Posts: n/a
Default

Jack,

To delete both the text and the frame, it would be:

ActiveDocument.Frames(i).Range.Delete 'deletes the text
ActiveDocument.Frames(i).Delete 'deletes the frame



--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support






  #16   Report Post  
Jack Sons
 
Posts: n/a
Default

Greg,

ActiveDocument.Frames(i).Delete is what your macro did: it deletes the frame
and leaves the text. I see it now in your code.

At first I thought that deleting the frame would also delete the text. But
now I understand that a "frame" is just a rectangle of borderlines within the
text layer of the document.

ActiveDocument.Frames(i).Range.Delete will delete the text and keep the
frame intact. So an empty frame (borders, if visible, without any text
inside) is the result?

Does that mean that you can't delete the frame and the text with one instruction?

Jack.





  #17   Report Post  
Greg Maxey
 
Posts: n/a
Default

Jack,

That is my understanding, but I don't like to use "can't" as I am just a
novice with VBA.

Actually I suppose that I would write the code:

With ActiveDocument.Frames(i)
.Range.Delete
.Delete
End With

If we are wrong, Jay or one of the other Senseis will be along to set us
straight ;-)

Here is a little piece of code that creates a frame with text. If you
leave out the part of the code that inserts the text, then you just have
an empty frame:

Public Sub FrameMaker()
    Dim MyFrame As Frame
    Dim MyRange As Range

    Set MyRange = Selection.Range
    Set MyFrame = ActiveDocument.Frames.Add(MyRange)

    With MyFrame
        'Add color
        .Shading.BackgroundPatternColor = wdColorLightYellow
        'Size it
        .WidthRule = wdFrameExact
        .Width = 460
        .HeightRule = wdFrameExact
        .Height = 20
        'Position it
        .RelativeHorizontalPosition = wdRelativeHorizontalPositionPage
        .RelativeVerticalPosition = wdRelativeVerticalPositionParagraph
        .HorizontalPosition = 75
        .VerticalPosition = 0
        'Add some text
        .Range.InsertAfter "This is a macro created frame!"
    End With

    Set MyFrame = Nothing
End Sub

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Jack Sons wrote:
Greg,

ActiveDocument.Frames(i).Delete is what your macro did, it deletes
the frame and leaves the text. I see it now in your code.

At first I thought that deleting the frame would also delete the
text. But now I understand that "frame" is just a rectangle of
borderlines within the text layer of the document.

ActiveDocument.Frames(i).Range.Delete will delete the text and keep
the frame intact. So an empty frame (borders - if visible - without
any text inside) is the result?

Does that mean that you can't delete frame and text with one
instruction?

Jack.


"Greg Maxey" gro.spvm@yexamg (that's my e-mail address backwards)
wrote in message ...
Jack,

To delete the text and the frame it would be

ActiveDocument.Frames(i).Range.Delete 'deletes the text
ActiveDocument.Frames(i).Delete 'deletes the frame



--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Jack Sons wrote:
Greg,

Your help was enormous!

What you wrote ("I found through experimentation that if
I just deleted the frames then any border and fill effects in the
frame would be transferred to the text paragraphs") enlightened me;
now I understand. Of course I thought that deleting the frame
and its border would also delete the text. Apparently that is not
the case.

Thanks again.

Jack.

"Greg Maxey" gro.spvm@yexamg (that's my e-mail address backwards)
wrote in message ...
Jack,

I am afraid that my usefulness to you has about run its course :-)

The code looks at all shapes. If a shape is a textbox, the code
converts it to a frame, removes any borders and fill effects from
the frame, and then deletes the frame, leaving the text. I found
through experimentation that if I just deleted the frames then any
border and fill effects in the frame would be transferred to the
text paragraphs.

I will have to defer to others as to the technical difference
between a frame and textbox.

RTF is, I think, "Rich Text Format." I have never monkeyed around
very much with different types of text, but why don't you just try
saving your RTF file as a Word .doc and see what happens :-)
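
Saving as a Word document, rather than renaming the file, can also be done from a macro. A minimal sketch, assuming the scanned file is the active document; the macro name and the target path are made-up placeholders:

```vba
Sub SaveActiveDocumentAsDoc()
    ' Sketch only: the path below is a hypothetical placeholder
    ActiveDocument.SaveAs FileName:="C:\Temp\result.doc", _
        FileFormat:=wdFormatDocument
End Sub
```

Renaming result.rtf to result.doc only changes the name and leaves the contents as RTF, whereas Save As rewrites the file in Word's own format.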

I have a hard time figuring out the workings of a simple screw, so
I can't be of much help with the workings of your scanner. Sorry.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support

Jack Sons wrote:
Greg,

Thank you for your macro, it worked.

How did it work? I think it converts each textbox to a frame
without (visible) borders. What is the essential difference
between a textbox and a frame?
And what is done with the frames? I can't find them in the result.
To me it looks like a normal document, without any objects, just
characters, as it should be.

Would using "the plain text output of the software",
as Graham advised (if I knew how to do that), give a
different result?

Before and after the use of the macro the resulting document is an
RTF file (result.rtf). What does that extension imply? Can I
rename it as a doc-file (result.doc) without repercussions?

Last question (for now): why does the scanning process result in
textbox output instead of "normal text"?

Jack.

