#1
Posted to microsoft.public.word.formatting.longdocs
Hi folks,
I often deal with Word files I didn't create: either another person created them (perhaps on a Mac), or they came out of a Save As/Export filter, for example saving as .doc/.rtf from Adobe Acrobat, or OCR scanning software that saves as .doc/.rtf.

Because of this I often run into fake returns, which I can manipulate to some degree in Find & Replace with ^p and ^13. However, if I select a group of these paragraphs and apply a style to them, their fakeness is revealed: the whole block is treated as if it were one paragraph.

I used to see and solve a problem like this, in which a para mark had a ^10 before or after it, but I haven't seen those ^10s since a couple of Word or operating-system versions ago.

Any ideas?

WilliamW
#2
Posted to microsoft.public.word.formatting.longdocs
Hi William,
Yes, as you say, that's a problem that has been around a while. The issues with the non-working ¶ para marks should go away once you save the file in a native Word format (doc, rtf, docx...). Or, if that's not a good option, replace ^13 with ^p, and maybe ^10 with ^p to be on the safe side. I do that routinely at the beginning of macros that process text files or other non-Word files.

Regards,
Klaus
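For reference, a minimal VBA sketch of the clean-up described above (the macro name is illustrative, not from the thread): it replaces every ^13 and every ^10 with a real paragraph mark, so that styles apply paragraph by paragraph.

Sub NormalizeParagraphMarks()
    ' Replace raw carriage returns (^13) and stray line feeds (^10)
    ' with proper paragraph marks (^p).
    Dim code As Variant
    For Each code In Array("^13", "^10")
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = code
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindStop
            .MatchWildcards = False
            .Execute Replace:=wdReplaceAll
        End With
    Next code
End Sub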
#3
Posted to microsoft.public.word.formatting.longdocs
![]() "Klaus Linke" wrote in message ... Hi William, Yes, as you say that's a problem that has been around a while. The issues with the non-working ¶ para marks should go away once you save the file in a native Word format (doc, rtf, docx...). Or, if that's not a good option, replace ^13 with ^p, and maybe ^10 with ^p to be on the safe side. I do that routinely at the beginning of macros that process text files or other non-word files. Regards, Klaus Hi Klaus, Thanks for responding. Yes, I remembered after posting that I asked this before (!), and that the response you gave me then about changing ^13 to ^p, does solve the problem. (^13 to ^p does the same thing, either with *wildcards* checked or without.) The main thing that throws me, is that I use a wonderful macro I got from a Microsoft-provided template called Macros8 that was supplied with Word several versions back. That template has a number of useful macros, but the one I use most is called ANSIValue, which displays the ANSI values of a swiped group of characters. Therefore, when I swipe these four characters surrounding a para return: e.¶ G I get 101 46 13 71 Regardless of whether it's fake paras or real paras I get that 13 -- and only that 13. However, in earlier days of Word I would see 10 13 often enough (or 13 10, I don't remember the order), but I haven't seen a true 10 13 this way in several years. If you look at the Word file in a text editor there's no sign of a difference, so I figure the difference must be in the header of the Word file that specifies that there is one type of para-break encoding, when in fact the file contains mixed para-break encodings. Along these same lines of getting under the hood of what's happening in the Word file, I'd love to have a better understanding of how to determine when files contain Unicode versus when they don't, whether files sometimes *think* they contain Unicode but in fact they don't and vice versa, etc. --WilliamW "WilliamWMeyer" wrote: Hi folks, I often am dealing with Word files I didn't create. That is, a human being other than myself created them (perhaps on Macs), or Save As/Export filters created them. For example, Saving As .doc/.rtf out of Adobe Acrobat, or OCR scanning software that saves as .doc/rtf. Because of this I often run into fake returns, which I can manipulate to some degree, in Find & Replace with ^p and ^13. However, if I select a group of these paragraphs and apply a Style to them, their fakeness is revealed and the block of paragraphs is treated as if it were one paragraph. I used to see and solve a problem like this, in which a para mark had a ^10 before or after it, but I haven't seen those ^10s since a couple Word or operating system versions ago. Any ideas? WilliamW |
#4
Posted to microsoft.public.word.formatting.longdocs
> [...] If you look at the Word file in a text editor there's no sign of
> a difference, so I figure the difference must be in the header of the
> Word file that specifies that there is one type of para-break
> encoding, when in fact the file contains mixed para-break encodings.

Yes, something like that is my guess too. If you could look into the binary *.doc format (or its equivalent in memory once Word has loaded a doc), functioning paragraph marks would likely have a pointer associated with them that points to a data structure with the style and all the paragraph formatting. In the problematic cases, that pointer wasn't created. Just speculation, though.

> Along these same lines of getting under the hood of what's happening
> in the Word file, I'd love to have a better understanding of how to
> determine when files contain Unicode versus when they don't, whether
> files sometimes *think* they contain Unicode but in fact they don't
> and vice versa, etc.

Interesting questions... One quick way to tell if a file has "Unicode characters" (precisely, characters that aren't in the old Windows code page 1252) is to try to save it as Plain Text (*.txt), choosing the Windows (Standard) encoding. If the file contains such characters, the dialog shows a yellow exclamation mark, and the characters that can't be saved are marked red in the preview window.

Greetings,
Klaus
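The check above is a manual one in the Save As dialog. A rough programmatic stand-in (a sketch, not an exact code page 1252 test: it simply flags anything above U+00FF) could look like this:

Sub ReportNonAnsiChars()
    ' Report the first character whose code is above 255, i.e. outside
    ' the 8-bit range; good enough for a quick "is there Unicode here?" pass.
    Dim txt As String, i As Long, code As Long
    txt = ActiveDocument.Content.Text
    For i = 1 To Len(txt)
        code = AscW(Mid$(txt, i, 1)) And &HFFFF&
        If code > 255 Then
            MsgBox "Found U+" & Hex$(code) & " at position " & i
            Exit Sub
        End If
    Next i
    MsgBox "No characters above U+00FF found."
End Sub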
#5
Posted to microsoft.public.word.formatting.longdocs
![]() "Klaus Linke" wrote in message ... Along these same lines of getting under the hood of what's happening in the Word file, I'd love to have a better understanding of how to determine when files contain Unicode versus when they don't, whether files sometimes *think* they contain Unicode but in fact they don't and vice versa, etc. Interesting questions... One quick way to tell if a file has "Unicode characters" (precisely, characters that aren't in the old Windows code page 1252) is to try to save as Plain Text (*.txt), choosing the Windows (Standard) encoding. I know about this, and do it, but I'd like to able to control these things without a human being having to look at the file. I've been able to use VBA to cycle through a file character by character. Typically for files that use Unicode chars, the chars used are within a 100-200 char unicode range. Once I've identified what that range is, then I change the values my macro searches for to the values in that range. But going char by char *and* going unicode value by unicode value through the whole 6000-char range of unicode values would take a verrry long time. Already, going char by char through the file takes pretty long. |
#6
Posted to microsoft.public.word.formatting.longdocs
> I know about this, and do it, but I'd like to be able to control these
> things without a human being having to look at the file.

Yes, that was the "low tech" approach <g>

> I've been able to use VBA to cycle through a file character by
> character. [...] Already, going char by char through the file takes
> pretty long.

Then maybe I have something a little more high-tech for ya (see code below)... If you'd rather put the results in an array and process it, instead of printing it out at the end of the document, I'm sure you can adapt the code.

Klaus

Sub CodesFast()
    Dim myString, myStringNew, myChar, myCode
    Dim strOutput, HexString, myCharCount

    myString = ActiveDocument.Content.Text
    strOutput = ""
    Do
        ' Take the first remaining character, then remove every
        ' occurrence of it; the length difference is its frequency.
        myChar = Left$(myString, 1)
        myStringNew = Replace(myString, myChar, "", 1, Compare:=vbBinaryCompare)
        myCharCount = Len(myString) - Len(myStringNew)
        ' AscW can return a negative number; And &HFFFF& turns it into
        ' an unsigned code point (0-65535).
        myCode = AscW(myChar) And &HFFFF&
        strOutput = strOutput & (myCode) & vbTab
        StatusBar = myCode
        ' Pad the hex form to four digits, e.g. U+00E9.
        HexString = Hex$(myCode)
        While Len(HexString) < 4
            HexString = "0" & HexString
        Wend
        strOutput = strOutput & "U+" & HexString & vbTab
        ' Only print the character itself if it isn't a control code.
        If myCode > 31 Then
            strOutput = strOutput & myChar
        End If
        strOutput = strOutput & vbTab & LTrim(Str$(myCharCount))
        strOutput = strOutput & vbCr
        myString = myStringNew
    Loop Until Len(myString) = 0

    ' Dump the list at the end of the document, convert it to a table,
    ' sort it by character code, and convert it back to text.
    ActiveDocument.Content.Select
    Selection.Collapse Direction:=wdCollapseEnd
    Selection.Range.InsertParagraphBefore
    Selection.TypeText Text:=" "
    Selection.Expand Unit:=wdParagraph
    With ActiveDocument.Bookmarks
        .Add Range:=Selection.Range, Name:="Codes"
        .DefaultSorting = wdSortByName
        .ShowHidden = False
    End With
    Selection.Collapse Direction:=wdCollapseStart
    Selection.TypeText strOutput
    Selection.GoTo What:=wdGoToBookmark, Name:="Codes"
    Selection.ConvertToTable Separator:=wdSeparateByTabs
    Selection.Sort ExcludeHeader:=False, FieldNumber:=1, _
        SortFieldType:=wdSortFieldNumeric, _
        SortOrder:=wdSortOrderAscending
    Selection.Rows.ConvertToText Separator:=wdSeparateByTabs
    ActiveDocument.Bookmarks("Codes").Delete
End Sub
#7
Posted to microsoft.public.word.formatting.longdocs
![]() "Klaus Linke" wrote in message ... I know about this, and do it, but I'd like to able to control these things without a human being having to look at the file. Yes, that was the "low tech" approach g I've been able to use VBA to cycle through a file character by character. Typically for files that use Unicode chars, the chars used are within a 100-200 char unicode range. Once I've identified what that range is, then I change the values my macro searches for to the values in that range. But going char by char *and* going unicode value by unicode value through the whole 6000-char range of unicode values would take a verrry long time. Already, going char by char through the file takes pretty long. Then maybe I have something a little more high-tech for ya (see code below)... If you'd rather put the results in an array and process it, instead of printing it out at the end of the document, I'm sure you can adapt the code. Klaus Wow. Thanks, Klaus. I tried it, and saw the results. The Hex and binary stuff in the code are beyond my depth right now, but I think I can use this as a jumping off point for further exploration. -WilliamW |