#1
Hi all,
I scanned a document of many pages. The result (an RTF file) looks fine, but the text I see is not "text in a document" but text in textboxes. I really need to convert it to text "directly in the document", as in any "normal" document, as if I had typed it straight into the document.

Of course I could select (highlight) the text in the first textbox and paste it into a new document (a .doc file), then do the same with the text of the next textbox, pasting it below the first text in the new document, and so on. I tried that for a lot of textboxes, but doing the whole document would be very tedious because there are many hundreds, maybe thousands, of textboxes, some of which contain only a single line of text.

There is also a strange effect: when I try to Ctrl+C / Ctrl+V the highlighted text of a textbox into the other document, suddenly it is not the text that is copied but the whole textbox, which just moves the problem from one document to the other.

Can anyone show me a way out? Perhaps with VBA it is possible to convert all textboxes to normal text at once. I am in very urgent need of advice. Please help.

Jack Sons
The Netherlands
#2
OCR software that formats the document using text boxes is a nightmare to edit. You might find it simpler to use the plain text output of the software and apply your own editing.

--
Graham Mayor - Word MVP
My web site www.gmayor.com
Word MVP web site http://word.mvps.org
#3
Graham,
I am an absolute newbie to scanning. Some months ago I bought an HP scanner that was rather expensive at the time (the one with the detached glass frame, HP 4670) and have now used it for the first time. I just put the glass frame over the book to scan some 50 pages to MS Word and yes, after the scanning process was completed I found the resulting RTF document on my PC screen.

I know it sounds stupid, but I have no idea what you mean by "use the plain text output of the software and apply your own editing". My own editing, that's what I want. But how do I get "the plain text output of the software"? Because I use XP there was no need to install any software (if I remember correctly); I just plugged in the scanner and, once XP recognised it, it worked.

Please enlighten me on how to get the plain text output, which is apparently exactly what I need.

A thousand thanks in advance.

Jack.
#4
I am not familiar with the workings of your particular scanner or software, but there will certainly be an option to scan to text rather than Word. Try the help file.

--
Graham Mayor - Word MVP
#5
Jack Sons,
Yes, Graham is probably right. I cobbled together the following, which first converts textboxes to frames and then removes the frames.

    Sub ScratchMacro()
        'Convert textbox text to plain text
        Dim oShp As Shape
        Dim i As Integer
        'Pass 1: convert every textbox in the document to a frame
        For Each oShp In ActiveDocument.Shapes
            If oShp.Type = msoTextBox Then oShp.ConvertToFrame
        Next oShp
        'Pass 2: work backwards through the frames, because each
        'deletion shrinks the Frames collection
        For i = ActiveDocument.Frames.Count To 1 Step -1
            With ActiveDocument.Frames(i)
                'Clear borders and shading so they are not left on the text
                .Borders.Enable = False
                With .Shading
                    .Texture = wdTextureNone
                    .ForegroundPatternColor = wdColorAutomatic
                    .BackgroundPatternColor = wdColorAutomatic
                End With
                'Remove the frame; its text stays in the document
                .Delete
            End With
        Next
    End Sub

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support
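A possible variant of the macro above, offered only as a sketch: the first pass counts down by index instead of using For Each. Counting down is a common precaution when a loop removes items from the collection it is walking, and ConvertToFrame takes each textbox out of the Shapes collection. I have not confirmed that the For Each version actually skips any textboxes, so treat this as a cautious alternative rather than a fix. The macro name ConvertTextBoxesCountingDown is made up for this example, and it is best run on a copy of the document.

```vba
Sub ConvertTextBoxesCountingDown()
    ' Sketch only: same two passes as ScratchMacro, but both loops
    ' run from the last item to the first.
    Dim i As Long

    With ActiveDocument
        ' Pass 1: turn each textbox into a frame.
        For i = .Shapes.Count To 1 Step -1
            If .Shapes(i).Type = msoTextBox Then
                .Shapes(i).ConvertToFrame
            End If
        Next i

        ' Pass 2: strip borders and shading, then delete each frame.
        ' Deleting a frame leaves its text in the document.
        For i = .Frames.Count To 1 Step -1
            With .Frames(i)
                .Borders.Enable = False
                With .Shading
                    .Texture = wdTextureNone
                    .ForegroundPatternColor = wdColorAutomatic
                    .BackgroundPatternColor = wdColorAutomatic
                End With
                .Delete
            End With
        Next i
    End With
End Sub
```

Note that the second pass deletes every frame in the document, including any that were already there before the conversion, exactly as the original macro does.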
#6
Greg,
Thank you for your macro, it worked. How did it work? I think it converts each textbox to a frame without (visible) borders. What is the essential difference between a textbox and a frame? And what is done with the frames? I can't find them in the result. To me it looks like a normal document, without any objects, just characters, as it should be.

Would using "the plain text output of the software", as Graham advised (if I knew how to do that), give a different result?

Before and after running the macro the document is an RTF file (result.rtf). What does that extension imply? Can I rename it to a .doc file (result.doc) without repercussions?

Last question (for now): why does the scanning process produce textboxes instead of "normal text"?

Jack.
#7
Hi Jack
I just wandered off to look at your product's documentation on the HP site (www.hp.com), and the user manual isn't that helpful. However, you can have a "real time chat" with a support technician on the site (as far as I can tell it's free!), so that might be the thing to do. They'll (hopefully) be able to take you step by step through the process of scanning and getting the output you want.

Cheers
JulieD
#8
Jack,
I am afraid that my usefulness to you has about run its course :-)

The code looks at all shapes. If a shape is a textbox, it converts it to a frame, removes any borders and fill effects from the frame, and then deletes the frame, leaving the text. I found through experimentation that if I just deleted the frames, any border and fill effects in the frame would be transferred to the text paragraphs.

I will have to defer to others as to the technical difference between a frame and a textbox.

RTF is, I think, "Raw Text Format." I have never monkeyed around very much with different types of text, but why don't you just try saving your RTF file as a Word .doc and see what happens :-)

I have a hard time figuring out the workings of a simple screw, so I can't be of much help with the workings of your scanner. Sorry.

--
Greg Maxey/Word MVP
A Peer in Peer to Peer Support
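For anyone who would rather try the "save it as a Word .doc" suggestion from a macro than through File > Save As, here is a minimal sketch. It assumes the RTF file is the active document and has already been saved once, so that it has a full path. The macro name SaveActiveRtfAsDoc is made up for this example. Saving through Word converts the content, whereas renaming result.rtf to result.doc in Explorer only changes the file name.

```vba
Sub SaveActiveRtfAsDoc()
    ' Sketch only: save the active document in Word's .doc format,
    ' in the same folder and under the same base name as the .rtf file.
    Dim sName As String

    sName = ActiveDocument.FullName
    If LCase$(Right$(sName, 4)) = ".rtf" Then
        ' Swap the extension and let Word write the file as a Word document.
        sName = Left$(sName, Len(sName) - 4) & ".doc"
        ActiveDocument.SaveAs FileName:=sName, FileFormat:=wdFormatDocument
    Else
        MsgBox "The active document is not an .rtf file."
    End If
End Sub
```

The original .rtf file stays on disk untouched, so it remains available as a fallback.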
#9
Hi Jack,
I'll try to follow up Greg's musings and shed what light I can...

In the macro, the line ".Delete" removes the frame and leaves the text. That should give you what you need (plain text), but there may be a wrinkle, which I'll explain after a bit.

A frame and a textbox are similar in some ways, but the big difference is that a textbox is in the "drawing layer" while the frame is in the "text layer". That is, Word thinks of the textbox as a sort of picture, while a frame is more like special formatting of text. You can include a frame as part of a paragraph style, which you can't do with a textbox. The ability to transform a textbox into a frame is truly magical and involves some very fancy programming inside Word.

RTF is actually "Rich Text Format", and it's a way to use a file of plain text to describe all sorts of formatting. If you open an RTF file in Notepad, you'll see a ton of codes in braces that describe fonts, page locations, and lots of other things. When you tell Word to open an RTF file, a special converter program reads all those codes and applies the formatting to the text part, resulting in what looks like a regular Word document. You can then save that as a .doc file, whose structure is completely different.

When you scan a document, the initial result is just a picture of the page. Many scanners will let you save that as a graphics file (usually .tif or .jpg). You feed that picture into an optical character recognition (OCR) program, which may be part of the scanner software or may be a separately installed program. The output of the OCR is text. In the early days, you were doing well to get just a plain-text reading of the document, with headers and footers and pictures all jammed in there. As OCR programmers got better, they started offering output of a word processing file that looked exactly (well, more or less) like the original, with the proper fonts, bold/italic, headers, and so forth. In order to get the stuff positioned correctly on the page, they resorted to textboxes, but that's really hard to deal with when you want to edit the document.

Now the wrinkle... Every graphic object in Word's drawing layer has an "anchor", a spot in the regular text to which it's attached. (You can see the anchor symbol in the left margin of Page Layout view if you go to Tools > Options > View and check "Object anchors", then select a textbox or floating picture.) When you convert the textbox to a frame and then delete the frame, the text inside gets dumped into the regular text at the anchor position. Many OCR programs put a single paragraph mark on a page, and anchor all the textboxes on the page to that paragraph. When you run the macro, the various chunks of text appear in the order in which their anchors occurred in the original paragraph, which will probably be more-or-less random. You're then left to untangle the spaghetti. :-(

This is why Graham's suggestion to output the scan (from the OCR program) as plain text is a good one. You may lose the "looks just like the original" formatting, but you'll also never create the textboxes. This should make your editing job a whole lot simpler. Look through the OCR program and its help file to find out where you can turn off formatted output.

--
Regards,
Jay Freedman
Microsoft Word MVP
FAQ: http://word.mvps.org

(In reply to Greg Maxey's message of Sat, 4 Dec 2004 13:20:18 -0500.)
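To check whether a scanned document has the anchor problem described above before converting anything, something along these lines might help. It is a sketch under assumptions, not a tested tool: the macro name ListTextBoxAnchors is made up, it only looks at textboxes in the main body of the document, and the Top and Left values are whatever Word reports for each box, which may not always be plain point measurements. Output goes to the Immediate window in the VBA editor (Ctrl+G).

```vba
Sub ListTextBoxAnchors()
    ' Sketch only: report where each textbox is anchored, so you can see
    ' whether many boxes share the same anchor position on a page.
    Dim oShp As Shape

    ' Same setting as Tools > Options > View > "Object anchors".
    ActiveWindow.View.ShowObjectAnchors = True

    For Each oShp In ActiveDocument.Shapes
        If oShp.Type = msoTextBox Then
            ' Anchor.Start is the character offset where the anchor range
            ' begins; the last item is the first 30 characters of the box.
            Debug.Print "Page " & oShp.Anchor.Information(wdActiveEndPageNumber) & _
                        ", anchor at " & oShp.Anchor.Start & _
                        ", top " & oShp.Top & ", left " & oShp.Left & _
                        ": " & Left$(oShp.TextFrame.TextRange.Text, 30)
        End If
    Next oShp
End Sub
```

If many lines on the same page show the same "anchor at" value, the text from those boxes is likely to come out in an unpredictable order after conversion, and the Top and Left columns give a rough idea of the order in which the pieces should be read.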
#10
Jay,
Well done!! My stab at RTF was literally pulled from the nether region. -- Greg Maxey/Word MVP A Peer in Peer to Peer Support Jay Freedman wrote: Hi Jack, I'll try to follow up Greg's musings and shed what light I can... In the macro, the line ".Delete" removes the frame and leaves the text. That should give you what you need -- plain text -- but there may be a wrinkle, which I'll explain after a bit. A frame and a textbox are similar in some ways, but the big difference is that a textbox is in the "drawing layer" while the frame is in the "text layer". That is, Word thinks of the textbox as a sort of picture, while a frame is more like special formatting of text. You can include a frame as part of a paragraph style, which you can't do with a textbox. The ability to transform a textbox into a frame is truly magical and involves some very fancy programming inside Word. RTF is actually "Rich Text Format", and it's a way to use a file of plain text to describe all sorts of formatting. If you open an RTF file in NotePad, you'll see a ton of codes in braces that describe fonts, page locations, and lots of other things. When you tell Word to open an RTF file, a special converter program reads all those codes and applies the formatting to the text part, resulting in what looks like a regular Word document. You can then save that as a .doc file, whose structure is completely different. When you scan a document, the initial result is just a picture of the page. Many scanners will let you save that as a graphics file (usually .tif or .jpg). You feed that picture into an optical character recognition (OCR) program, which may be part of the scanner software or may be a separately installed program. The output of the OCR is text. In the early days, you were doing well to get just a plain-text reading of the document, with headers and footers and pictures all jammed in there. 
As OCR programmers got better, they started offering output of a word processing file that looked exactly (well, more or less) like the original, with the proper fonts, bold/italic, headers, and so forth. In order to get the stuff positioned correctly on the page, they resorted to textboxes -- but that's really hard to deal with when you want to edit the document.

Now the wrinkle... Every graphic object in Word's drawing layer has an "anchor", a spot in the regular text to which it's attached. (You can see the anchor symbol in the left margin of Page Layout view if you go to Tools > Options > View and check "Object anchors", then select a textbox or floating picture.) When you convert the textbox to a frame and then delete the frame, the text inside gets dumped into the regular text at the anchor position. Many OCR programs put a single paragraph mark on a page, and anchor all the textboxes on the page to that paragraph. When you run the macro, the various chunks of text appear in the order in which their anchors occurred in the original paragraph, which will probably be more-or-less random. You're then left to untangle the spaghetti. :-(

This is why Graham's suggestion to output the scan (from the OCR program) as plain text is a good one. You may lose the "looks just like the original" formatting, but you'll also never create the textboxes. This should make your editing job a whole lot simpler. Look through the OCR program and its help file to find out where you can turn off formatted output.

On Sat, 4 Dec 2004 13:20:18 -0500, "Greg Maxey" wrote:

Jack,

I am afraid that my usefulness to you has about run its course :-)

The code looks at all shapes; if a shape is a textbox, it converts it to a frame, removes any borders and fill effects from the frame, and then deletes the frame, leaving the text. I found through experimentation that if I just deleted the frames, any border and fill effects in the frame would be transferred to the text paragraphs.

I will have to defer to others as to the technical difference between a frame and a textbox.

RTF is, I think, "Raw Text Format." I have never monkeyed around very much with different types of text, but why don't you just try saving your RTF file as a Word .doc and see what happens :-)

I have a hard time figuring out the workings of a simple screw, so I can't be of much help with the workings of your scanner. Sorry.

-- Greg Maxey/Word MVP A Peer in Peer to Peer Support

Jack Sons wrote:

Greg,

Thank you for your macro, it worked. How did it work? I think it converts each textbox to a frame without (visible) borders. What is the essential difference between a textbox and a frame? And what is done with the frames? I can't find them in the result. To me it looks like a normal document, without any objects, just characters, as it should be.

Would the result of using "the plain text output of the software", as Graham advised (if I knew how to do that), give a different result?

Before and after the use of the macro, the resulting document is an rtf-file (result.rtf). What does that extension imply? Can I rename it as a doc-file (result.doc) without repercussions?

Last question (for now): why does the scanning process result in textbox output instead of "normal text"?

Jack.

"Greg Maxey" gro.spvm@yexamg (that's my e-mail address backwards) wrote in message ...

Jack Sons,

Yes, Graham is probably right. I cobbled together the following, which first converts textboxes to frames and then removes the frames.

Sub ScratchMacro()
    'Convert textbox text to plain text
    Dim oShp As Shape
    Dim i As Integer
    'Pass 1: turn every textbox into a frame
    For Each oShp In ActiveDocument.Shapes
        If oShp.Type = msoTextBox Then oShp.ConvertToFrame
    Next oShp
    'Pass 2: work backwards through the frames, because each
    'deletion shortens the Frames collection
    For i = ActiveDocument.Frames.Count To 1 Step -1
        With ActiveDocument.Frames(i)
            'Clear borders and shading first, otherwise they are
            'transferred to the text paragraphs
            .Borders.Enable = False
            With .Shading
                .Texture = wdTextureNone
                .ForegroundPatternColor = wdColorAutomatic
                .BackgroundPatternColor = wdColorAutomatic
            End With
            .Delete 'removes the frame and leaves its text
        End With
    Next
End Sub

-- Greg Maxey/Word MVP A Peer in Peer to Peer Support
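Before running the conversion on a document with hundreds of textboxes, it may help to check whether the anchor problem Jay describes applies. The following is only a minimal sketch, not something posted in the thread: the macro name ListTextBoxAnchors is made up, and it assumes the textboxes sit in the main body of the active document. It prints each textbox's anchor page and the start position of its anchor paragraph to the Immediate window of the VBA editor.

```vba
Sub ListTextBoxAnchors()
    'Hypothetical diagnostic helper (not from the thread).
    'For each textbox, print its name, the page of its anchor and
    'the start position of the paragraph it is anchored to.
    Dim oShp As Shape
    For Each oShp In ActiveDocument.Shapes
        If oShp.Type = msoTextBox Then
            Debug.Print oShp.Name; " | page "; _
                oShp.Anchor.Information(wdActiveEndPageNumber); _
                " | anchor paragraph starts at "; _
                oShp.Anchor.Paragraphs(1).Range.Start
        End If
    Next oShp
End Sub
```

If many textboxes on a page report the same anchor paragraph start, the converted text is likely to need reordering by hand, and the plain text output of the OCR program is probably the easier route.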
#11
Jay,
Your answer is absolutely clear; now I understand what is going on. I am very grateful to you. I hope to find the plain text output of the software when I am not so busy. Greg's macro worked fine; as far as I could see, all text came "on paper" in the correct sequence.

Jack.

"Jay Freedman" wrote in message ...

Hi Jack,

I'll try to follow up Greg's musings and shed what light I can...

In the macro, the line ".Delete" removes the frame and leaves the text. That should give you what you need -- plain text -- but there may be a wrinkle, which I'll explain after a bit.

A frame and a textbox are similar in some ways, but the big difference is that a textbox is in the "drawing layer" while the frame is in the "text layer". That is, Word thinks of the textbox as a sort of picture, while a frame is more like special formatting of text. You can include a frame as part of a paragraph style, which you can't do with a textbox. The ability to transform a textbox into a frame is truly magical and involves some very fancy programming inside Word.

RTF is actually "Rich Text Format", and it's a way to use a file of plain text to describe all sorts of formatting. If you open an RTF file in NotePad, you'll see a ton of codes in braces that describe fonts, page locations, and lots of other things. When you tell Word to open an RTF file, a special converter program reads all those codes and applies the formatting to the text part, resulting in what looks like a regular Word document. You can then save that as a .doc file, whose structure is completely different.

When you scan a document, the initial result is just a picture of the page. Many scanners will let you save that as a graphics file (usually .tif or .jpg). You feed that picture into an optical character recognition (OCR) program, which may be part of the scanner software or may be a separately installed program. The output of the OCR is text.

In the early days, you were doing well to get just a plain-text reading of the document, with headers and footers and pictures all jammed in there.
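On Jack's question about renaming result.rtf to result.doc: renaming changes only the extension, not the contents of the file, whereas Jay's suggestion is to open the RTF in Word and save it in .doc format. A minimal sketch of that step follows, assuming the RTF file is the active document and has already been saved to disk. The macro name SaveRtfAsDoc is made up for illustration.

```vba
Sub SaveRtfAsDoc()
    'Hypothetical helper (not from the thread): saves the open
    'RTF document next to the original as a Word .doc file.
    Dim sNewName As String
    sNewName = ActiveDocument.FullName
    If LCase(Right(sNewName, 4)) = ".rtf" Then
        'Swap the extension, then let Word write real .doc content
        sNewName = Left(sNewName, Len(sNewName) - 4) & ".doc"
        ActiveDocument.SaveAs FileName:=sNewName, _
            FileFormat:=wdFormatDocument
    End If
End Sub
```

The same result can be had by hand with File > Save As and choosing "Word Document" as the file type.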
#12
Greg,
Your help was enormous! What you wrote ("I found through experimentation that if I just deleted the frames then any border and fill effects in the frame would be transferred to the text paragraphs.") enlightened me; now I understand. Of course I thought that by deleting the frame and its border one would also lose the text. Apparently that is not the case.

Thanks again.

Jack.
#13
Jack,
To delete the text and the frame it would be

    ActiveDocument.Frames(i).Range.Delete 'deletes the text
    ActiveDocument.Frames(i).Delete 'deletes the frame

-- Greg Maxey/Word MVP A Peer in Peer to Peer Support
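Greg's two statements use the loop counter i from his earlier macro. Placed inside the same backwards loop, they might look like the sketch below. The macro name DeleteFramesAndText is made up, and the sketch assumes, as Greg's snippet implies, that a frame still exists after its text has been deleted. It is worth trying on a copy of the document first.

```vba
Sub DeleteFramesAndText()
    'Hypothetical wrapper around Greg's two statements.
    'Walk backwards, because each deletion shortens the collection.
    Dim i As Long
    For i = ActiveDocument.Frames.Count To 1 Step -1
        ActiveDocument.Frames(i).Range.Delete 'deletes the text
        ActiveDocument.Frames(i).Delete       'deletes the frame
    Next i
End Sub
```

This is the opposite of what Jack needed: it discards the text, where ScratchMacro keeps it.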