#1
Raghav Das
Searching for character in a Devanagari Unicode glyph

I hope that the Devanagari characters in the following message are displayed properly.

For example, how do I search (using the Find and Replace options of Microsoft Word) for the character म within the glyph मि?

To be precise: suppose there is a Devanagari glyph मि (displayed as a single character), which is composed of the following two Unicode characters:

In hexadecimal: 92Eh (म), 93Fh (ि)
In decimal: 2350 (म), 2367 (ि)

If I search for the character 92Eh (म) by typing it in the "Find" box of Microsoft Word, Word doesn't find it, as if it were not there, although the character is present in the Word file.

However, when 92Eh (म) is NOT followed by a vowel sign [i.e. when it is not followed by 93Fh (ि)], Word finds it easily.
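
The two-code-point structure described above can be checked outside Word. The following is only a minimal sketch in Python (not a Word macro, and it says nothing about how Word's Find works internally); the helper name contains_code_point is hypothetical. It lists the code points stored for मि and shows that a plain code-point search sees the consonant even when a vowel sign follows it.

```python
import unicodedata

# The syllable from this post, written with escapes so the stored
# code points are explicit: U+092E (ma) followed by U+093F (vowel sign i).
syllable = "\u092e\u093f"

# One line per stored code point: hex value, decimal value, Unicode name.
for ch in syllable:
    print(f"U+{ord(ch):04X}  {ord(ch)}  {unicodedata.name(ch)}")


def contains_code_point(text, code_point):
    """Return True if the given code point occurs anywhere in text.

    Hypothetical helper: it compares stored code points only and is
    unaffected by how a renderer combines them into one glyph.
    """
    return chr(code_point) in text
```

Here both contains_code_point(syllable, 0x92E) and contains_code_point(syllable, 0x93F) are true, because the text stores two separate characters even though they are displayed as one glyph.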

QUESTION:

How can I find a character such as 92Eh (म) even when it is followed by a vowel sign such as 93Fh (ि), using the Find (Search) features of Microsoft Word?

The answer is very important to me, because I want to write a Word macro that converts Devanagari Unicode text to Devanagari text in the ISCII encoding [using my own method].

Thanking you, in advance.
#2
Posted to microsoft.public.word.docmanagement
Peter T. Daniels
Searching for character in a Devanagari Unicode glyph

Normally I have no problem seeing non-roman characters in my email,
but here I see what are probably Unicode code point numbers.

Before you start this quixotic enterprise, are you certain that every
Unicode character has an equivalent in ISCII (which I know nothing
about)? Does ISCII have a separate character for every possible
conjunct akshara with every possible matra, the way Unicode Korean has
a separate character for every possible syllable block?

That seems unlikely ...

Does ISCII automatically form conjuncts, or do you have to input the
reduced form alongside the full form of the base character?

In Word, you should be able to search any sequence of consonants and
vowels (whether or not they are combined), and you might even make
some shortcuts using wildcards, but it isn't entirely clear what
you're trying to do.

#3
Raghav Das

Thank you for replying, and sorry for the late answer.


'--Before you start this quixotic enterprise, are you certain that every
'--Unicode character has an equivalent in ISCII (which I know nothing
'--about)? Does ISCII have a separate character for every possible
'--conjunct akshara with every possible matra, the way Unicode Korean has
'--a separate character for every possible syllable block?

'--That seems unlikely ...

Yes, every Unicode (Devanagari) character has an equivalent in ISCII, the "Indian Standard Code for Information Interchange" [Ref: Indian Standard 13194, Bureau of Indian Standards, 1991], because the Devanagari part of Unicode is based on ISCII.

ISCII can also represent every possible conjunct akshara with every possible matra but, like Unicode, it does so as a sequence of basic characters, not as one separate character per combination. Everything in Unicode (Devanagari part) is there in ISCII.

'--Does ISCII automatically form conjuncts, or do you have to input the
'--reduced form alongside the full form of the base character?

No. ISCII doesn't automatically form conjuncts.

ISCII text isn't readable Devanagari text; ISCII is simply an encoding format. It contains only the basic "consonants", "vowels", "matras of vowels" and "other marks", just as Unicode (Devanagari) does. Showing the visible conjuncts on the screen is the job of the programmer.

In this respect ISCII is just like the Devanagari part of Unicode. Unicode Devanagari text stored in a Microsoft Word file contains only the "consonants", "vowels", "matras of vowels" etc.; the conjuncts displayed on screen are rendered by a component of the operating system (Windows) called "Uniscribe". ISCII likewise contains only "consonants", "vowels", "matras of vowels" etc., and NOT the conjunct forms.

There is a practically one-to-one relationship between the characters in ISCII and the characters in Unicode (Devanagari), with some exceptions, and these exceptions can be handled easily.
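
One way to picture this near one-to-one relationship is as a lookup table applied code point by code point. The following is only a sketch in Python, not a Word macro: it contains just the two pairs quoted in this thread (म: Unicode 2350 and ISCII 204; ि: Unicode 2367 and ISCII 219), the function names are hypothetical, and it assumes that codes below 128 are plain ASCII in both encodings. A real converter would need the full table plus the exceptions mentioned above.

```python
# Only the two pairs given in this thread; the full table is assumed, not shown.
UNICODE_TO_ISCII = {
    0x092E: 204,  # Devanagari consonant 'ma'
    0x093F: 219,  # Devanagari matra of vowel 'hrasva i'
}
ISCII_TO_UNICODE = {iscii: uni for uni, iscii in UNICODE_TO_ISCII.items()}


def unicode_to_iscii(text):
    """Map each code point to one ISCII byte (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in UNICODE_TO_ISCII:
            out.append(UNICODE_TO_ISCII[cp])
        elif cp < 0x80:
            out.append(cp)  # assumption: ASCII range is passed through unchanged
        else:
            raise ValueError(f"no ISCII mapping in this sketch for U+{cp:04X}")
    return bytes(out)


def iscii_to_unicode(data):
    """Map each ISCII byte back to one code point (hypothetical helper)."""
    chars = []
    for b in data:
        if b in ISCII_TO_UNICODE:
            chars.append(chr(ISCII_TO_UNICODE[b]))
        elif b < 0x80:
            chars.append(chr(b))  # assumption: ASCII range is passed through unchanged
        else:
            raise ValueError(f"no Unicode mapping in this sketch for ISCII byte {b}")
    return "".join(chars)
```

Because the lookup walks the stored code points one at a time, a consonant is converted whether or not a vowel matra follows it.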

India still depends very much on old non-Unicode text formats for publishing books in Devanagari (desktop publishing), because publishing software such as QuarkXPress and Adobe InDesign either does not support Unicode Devanagari text or has only recently introduced support for it. So people in India haven't got rid of the OLD non-Unicode formats yet. However, since Unicode Devanagari text is becoming popular on the Internet, on web sites (such as Google) and in emails, people frequently need to convert NON-UNICODE text to Unicode and vice versa.

There are many third-party software packages in India (such as ISM, Shree-Lipi and Indica) that provide converters between ISCII and their own (non-Unicode) formats, and I have purchased several of them. Hence I can readily convert ISCII text to any of the popular non-Unicode formats of India and vice versa. Because these packages are available, for me, doing Unicode-to-non-Unicode conversion (and vice versa) is equivalent to doing ISCII-to-Unicode conversion (and vice versa).

Of course, these packages also provide ISCII-to-Unicode conversion and vice versa (using non-VBA programming), but I want to do it with my own Microsoft Word macros: first, because their conversions aren't perfect, and secondly, because I have already successfully replaced many of their conversion tools with my own VBA Word macros, and I want to do the same here.

I have already successfully created an ISCII-to-Unicode (Devanagari) macro in Microsoft Word. Only the reverse macro, Unicode-to-ISCII, remains to be created, and that is giving me problems, as described in my previous message.

I wrote the forward macro (ISCII-to-Unicode) using "Find and Replace" commands. The macro simply issues the following block of statements repeatedly, with different values each time.

For example,


'ISCII-to-Unicode macro [forward macro]
'WORKS SUCCESSFULLY
'
'e.g.
'
'(1) Consonant 'ma'
Selection.Find.Text = "^0204"              'ISCII code of Devanagari consonant 'ma'
Selection.Find.Replacement.Text = "^u2350" 'Unicode Devanagari consonant 'ma' [92E hex or 2350 decimal]
Selection.Find.Execute Replace:=wdReplaceAll

'(2) Matra of vowel "hrasva i"
Selection.Find.Text = "^0219"              'ISCII code of Devanagari matra of vowel "hrasva i"
Selection.Find.Replacement.Text = "^u2367" 'Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal]
Selection.Find.Execute Replace:=wdReplaceAll

'(3) etc.



As mentioned above, this forward macro works perfectly and successfully.


But the reverse macro gives problems.


'Unicode-to-ISCII macro [reverse macro]
'GIVES PROBLEMS
'
'The character Unicode Devanagari 'ma' [92E hex or 2350 decimal]
'isn't found by Microsoft Word when it is immediately followed (in the file)
'by a vowel matra, such as Unicode Devanagari 'matra of hrasva i'
'[93F hex or 2367 decimal].
'
'It is not found by the following command when it is followed by any vowel matra.
'
'e.g.
'
'(1) Consonant 'ma'
Selection.Find.Text = "^u2350"            'Unicode Devanagari consonant 'ma' [92E hex or 2350 decimal]
Selection.Find.Replacement.Text = "^0204" 'ISCII code of Devanagari consonant 'ma'
Selection.Find.Execute Replace:=wdReplaceAll

'(2) Matra of vowel "hrasva i"
Selection.Find.Text = "^u2367"            'Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal]
Selection.Find.Replacement.Text = "^0219" 'ISCII code of Devanagari matra of vowel "hrasva i"
Selection.Find.Execute Replace:=wdReplaceAll

'(3) etc.



As mentioned above (as a comment in the macro), Unicode Devanagari 'ma' [92E hex or 2350 decimal] isn't found by Microsoft Word when it is immediately followed (in the file) by a vowel matra such as Unicode Devanagari 'matra of hrasva i' [93F hex or 2367 decimal].

But when the character (Unicode Devanagari 'ma') stands alone in the file [i.e. when it is NOT followed by any vowel matra], the macro replaces it.

So the command to replace a Unicode consonant with an ISCII consonant sometimes succeeds and sometimes doesn't: it succeeds when the consonant stands alone in the file, and fails when the consonant is followed by a vowel matra.

This is faulty behavior of Microsoft Word. The command should always replace the character, whether or not it is followed by a vowel matra.

That is what I am talking about.

[As a matter of fact, the vowel matra ALSO isn't found when it is combined with a consonant. It is found only when it appears on its own. (A vowel matra on its own has no meaning, although it is possible to type one.)]



'--In Word, you should be able to search any sequence of consonants and
'--vowels (whether or not they are combined), and you might even make
'--some shortcuts using wildcards, but it isn't entirely clear what
'--you're trying to do.

You cannot.

When a Devanagari Unicode consonant stands alone in the file, you can search for it with the Find command (either through the UI or programmatically). But when it is followed by a vowel sign, Microsoft Word does NOT find it (neither through the UI nor programmatically), as explained above and in my previous message. [Both messages use the same examples.]

Is there any option in Microsoft Word that would enable it to find consonants embedded in consonant clusters (or a consonant embedded in consonant + vowel matra)?

Or should I ask in another way: has anyone made a Unicode Devanagari font that doesn't implement ANY conjuncts? Text in such a font would look as it does under Windows 98, where "Uniscribe" is not present and the operating system doesn't display any conjuncts, just plain consonants, vowels and matras of vowels (which, of course, isn't readable).

[If anyone has made such a font, I could solve my problem for now. I would apply that font to my text and then run my macro on it; the macro should then replace all the characters, and my Unicode-to-ISCII conversion, USING A MACRO, would succeed.]

Last edited by Raghav Das : January 23rd 13 at 03:32 PM
#4
Posted to microsoft.public.word.docmanagement
Peter T. Daniels
Searching for character in a Devanagari Unicode glyph

I think what you are asking is for the Microsoft engineers to provide
you with a way to neutralize the component that combines the character
codes into the codes that yield the combined glyphs.

I don't think you can get them to do that.

Can you modify a font so that it will not behave like OpenType, so
that the renderer can't "see" that items need to be combined?

Also: I just got back proofs of an article that has examples in
Chinese, Arabic, and Sanskrit script (among others), and in every case
what is printed is only the citation forms of the letters -- no
connections in Arabic, no conjuncts in Sanskrit, no fanqie in Chinese
-- and this, Oxford University Press tells me, was typeset in India!
(The MSWord file supplied to the typesetter by the editor had
everything exactly correct.)

Copyright ©2004-2024 Microsoft Office Word Forum - WordBanter.
The comments are property of their posters.
 
