Russian transcription help

Started by dreamingsky, 13. June 2010, 01:20:11

dreamingsky

Can someone help me with Russian transcription?  I want to write a readme that helps users write Russian transcriptions in DfM, but I don't know Russian myself.

DfM has 4 normation classes for Cyrillic transcription:
1. NormationRus2.java
2. NormationUkr.java
3. NormationRusC.java
4. NormationUkrC.java

A description of the normation classes is here:
http://dictionarymid.sourceforge.net/newdictNormationLang.html

NormationRus2.java:
Allows you to search words both in Cyrillic and in Latin transcription (according to GOST 1971, except that the yers ъ and ь are written as 'x' and no apostrophes are used).
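The idea behind searching in both scripts can be illustrated by reducing the query and the dictionary word to the same Latin key before comparing them. This is a minimal standalone sketch, not the actual NormationRus2 code: the class and method names are hypothetical, it covers only the 33 Russian letters using the NormationRus2 column of the table in this post, and it assumes the source is saved as UTF-8 (without a BOM) and compiled as UTF-8.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: reduce Cyrillic or Latin input to one Latin search key. */
public class SearchKeySketch {

    // Russian letters and their NormationRus2 transcriptions, taken from the table in this post.
    private static final String CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] RUS2 = {
        "a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y",
        "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f",
        "kh", "c", "ch", "sh", "shh", "x", "y", "x", "eh", "yu", "ya"
    };

    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        for (int i = 0; i < CYRILLIC.length(); i++) {
            TABLE.put(CYRILLIC.charAt(i), RUS2[i]);
        }
    }

    /** Lowercases the input and transcribes Cyrillic letters; Latin letters pass through unchanged. */
    public static String searchKey(String input) {
        StringBuilder key = new StringBuilder();
        for (char c : input.toLowerCase(Locale.ROOT).toCharArray()) {
            String latin = TABLE.get(c);
            if (latin != null) {
                key.append(latin);
            } else {
                key.append(c);
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // A Cyrillic query and its Latin transcription end up as the same key.
        System.out.println(searchKey("Щука"));
        System.out.println(searchKey("shhuka"));
    }
}
```

Because both inputs go through the same mapping, "Щука" and "shhuka" produce the same key. The sketch only lowercases the input and does not try to resolve collisions such as й and ы both becoming 'y'.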

I found GOST 16876-71 here:
http://en.wikipedia.org/wiki/GOST_16876-71

But, NormationRus2 is a little different from GOST 16876-71.

Cyrillic  GOST 16876-71  Rus2  Ukr  RusC  UkrC
а         a              a     a    a     a
б         b              b     b    b     b
в         v              v     v    v     v
г         g              g     h    g     h
д         d              d     d    d     d
е         e              e     e    e     e
ё         jo             yo    yo   jo    jo
ж         zh             zh    zh   z     z
з         z              z     z    z     z
и         i              i     i    i     i
ї         -              -     yi   -     ji
й         jj             y     y    j     j
к         k              k     k    k     k
л         l              l     l    l     l
м         m              m     m    m     m
н         n              n     n    n     n
о         o              o     o    o     o
п         p              p     p    p     p
р         r              r     r    r     r
с         s              s     s    s     s
т         t              t     t    t     t
у         u              u     u    u     u
ф         f              f     f    f     f
х         kh             kh    kh   ch    ch
ц         c              c     c    c     c
ч         ch             ch    ch   c     c
ш         sh             sh    sh   s     s
щ         shh            shh   shh  sc    sc
ъ         -              x     x    x     x
ы         y              y     y    y     y
ь         '              x     x    x     x
э         eh             eh    eh   e     e
ю         ju             yu    yu   ju    ju
я         ja             ya    ya   ja    ja
ґ         -              -     g    -     g

Here are the 4 changes:
Cyrillic   GOST   NormationRus2
ё   jo   yo
й   jj   y
ю   ju   yu
я   ja   ya
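To see these four differences in practice, the sketch below builds the NormationRus2 table and a copy in which only those four letters are switched to their GOST 16876-71 values. The class name is hypothetical, and the hard and soft signs keep Rus2's 'x' in both maps, so the second map is not a complete GOST 16876-71 implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: NormationRus2 versus GOST 16876-71 for ё, й, ю and я. */
public class GostVsRus2 {

    private static final String CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] RUS2 = {
        "a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y",
        "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f",
        "kh", "c", "ch", "sh", "shh", "x", "y", "x", "eh", "yu", "ya"
    };

    static final Map<Character, String> RUS2_MAP = new HashMap<>();
    static final Map<Character, String> GOST_MAP;

    static {
        for (int i = 0; i < CYRILLIC.length(); i++) {
            RUS2_MAP.put(CYRILLIC.charAt(i), RUS2[i]);
        }
        // Same table, except for the four letters where NormationRus2 departs from GOST 16876-71.
        GOST_MAP = new HashMap<>(RUS2_MAP);
        GOST_MAP.put('ё', "jo");
        GOST_MAP.put('й', "jj");
        GOST_MAP.put('ю', "ju");
        GOST_MAP.put('я', "ja");
    }

    /** Transcribes a lowercase word letter by letter; unmapped characters are kept as they are. */
    static String transcribe(String word, Map<Character, String> table) {
        StringBuilder out = new StringBuilder();
        for (char c : word.toCharArray()) {
            out.append(table.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"ёлка", "мой", "юг", "язык"}) {
            System.out.println(word + ": GOST " + transcribe(word, GOST_MAP)
                    + ", Rus2 " + transcribe(word, RUS2_MAP));
        }
    }
}
```

For example, "ёлка" comes out as "jolka" with the GOST letters and "yolka" with Rus2, and "мой" as "mojj" versus "moy".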

Were these 4 changes intentional?  Or are they mistakes?


Also, the descriptions of NormationRusC and NormationUkrC say "according to the Czech ISO norm".  Does anyone know the ISO number?

Gert

Just a side remark: these normation classes were set up by Michael Kopecky, so maybe you could try to contact him?

Gert

dreamingsky

Gert,

Can you please add these 3 normation classes to the next version of DfM?  They are for Cyrillic.
- NormationCyr1.java   (Russian, Ukrainian, Macedonian)
- NormationCyr2.java   (Russian, Ukrainian, Macedonian)
- NormationBul.java     (Bulgarian)

NormationCyr1.java replaces NormationRus2.java and NormationUkr.java.
NormationCyr2.java replaces NormationRusC.java and NormationUkrC.java.

I fixed a few errors in the transcriptions and added some transcriptions that were missing from the old normation classes.  We can keep NormationRus2.java, NormationUkr.java, NormationRusC.java, and NormationUkrC.java in DfM for the Czech dictionaries, but we can remove their descriptions from newdictNormationLang.html.

Once the new normation classes are added to DfM, I'll edit newdictNormationLang.html to describe them.

Jeff

dreamingsky

Just for reference, here are the transcription tables of the new normation classes.  This might save someone the work of figuring them out later.

Cyrillic  NormationCyr1  NormationCyr2  NormationBul
а         a              a              a
б         b              b              b
в         v              v              v
г         g              g              g
д         d              d              d
е         e              e              e
ё         jo             yo             -
ж         zh             zh             zh
з         z              z              z
и         i              i              i
й         j              j              y
к         k              k              k
л         l              l              l
м         m              m              m
н         n              n              n
о         o              o              o
п         p              p              p
р         r              r              r
с         s              s              s
т         t              t              t
у         u              u              u
ф         f              f              f
х         h              h              h
ц         c              c              ts
ч         ch             ch             ch
ш         sh             sh             sh
щ         shh            shh            sht
ъ         "              "              aj
ы         y              y              -
ь         x              x              x
э         eh             eh             -
ю         ju             yu             yu
я         ja             ya             ya
і         ij             ij             -
ѳ         -              fh             -
ѣ         -              je             -
ѵ         -              yh             -
ґ         gj             gj             -
ѓ         gj             gj             -
є         ye             ye             -
ї         yi             yi             -
ѕ         dz             dz             -
ј         jj             jj             -
љ         lj             lj             -
њ         nj             nj             -
ќ         kj             kj             -
џ         dj             dj             -
ў         uj             uj             -
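A similar standalone comparison for the Bulgarian alphabet shows how NormationCyr1 and NormationBul differ for й, ц, щ, ю and я. This sketch is based only on the table above, the class name is hypothetical, and the hard sign ъ is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: NormationCyr1 versus NormationBul on Bulgarian words. */
public class CyrVsBul {

    // Bulgarian alphabet without ъ, with the Cyr1 and Bul columns from the table above.
    private static final String LETTERS = "абвгдежзийклмнопрстуфхцчшщьюя";
    private static final String[] CYR1 = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "j", "k", "l", "m", "n", "o",
        "p", "r", "s", "t", "u", "f", "h", "c", "ch", "sh", "shh", "x", "ju", "ja"
    };
    private static final String[] BUL = {
        "a", "b", "v", "g", "d", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o",
        "p", "r", "s", "t", "u", "f", "h", "ts", "ch", "sh", "sht", "x", "yu", "ya"
    };

    static final Map<Character, String> CYR1_MAP = new HashMap<>();
    static final Map<Character, String> BUL_MAP = new HashMap<>();

    static {
        for (int i = 0; i < LETTERS.length(); i++) {
            CYR1_MAP.put(LETTERS.charAt(i), CYR1[i]);
            BUL_MAP.put(LETTERS.charAt(i), BUL[i]);
        }
    }

    /** Transcribes a lowercase word letter by letter; unmapped characters are kept as they are. */
    static String transcribe(String word, Map<Character, String> table) {
        StringBuilder out = new StringBuilder();
        for (char c : word.toCharArray()) {
            out.append(table.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"щастие", "цвят", "край"}) {
            System.out.println(word + ": Cyr1 " + transcribe(word, CYR1_MAP)
                    + ", Bul " + transcribe(word, BUL_MAP));
        }
    }
}
```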

Gert

Jeff,

Ok, great - I will add these classes, probably in about two weeks.

Best regards,
Gert

Gert

Jeff,

I just tried to build a new version (so you wouldn't have to wait 2 weeks ...).

But I encountered a problem:

[wtkbuild] Compiling 4 source files to C:\Projects\DictionaryForMIDs\Build\DictionaryForMIDs\classes
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationBul.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationCyr1.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationCyr2.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] 3 errors


Hmmm, maybe I can still find the problem.

Regards,
Gert

dreamingsky

I did some research.  I noticed one difference between the new normation classes and the old NormationRus2.java:
- NormationRus2.java does not have a BOM.
- I had saved the new normation classes with a BOM.

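The UCN (universal character name) for the BOM is \uFEFF, and the HTML character reference is &#65279;.  65279 is simply 0xFEFF in decimal, which is why the compiler reported "illegal character: \65279".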
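A small standalone check makes the connection to the build log concrete: the BOM character has code point 65279, and a UTF-8 editor stores it as the three bytes EF BB BF at the start of the file. The class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical check of the byte order mark's code point and UTF-8 encoding. */
public class BomInfo {

    // U+FEFF written as a Unicode escape, so this source file itself contains no BOM.
    static final char BOM = '\uFEFF';

    /** The bytes a UTF-8 editor writes at the start of a file saved "with BOM". */
    static byte[] bomAsUtf8() {
        return String.valueOf(BOM).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println((int) BOM); // 65279, the number in javac's "illegal character" message
        StringBuilder hex = new StringBuilder();
        for (byte b : bomAsUtf8()) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // EF BB BF
    }
}
```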

I have re-saved the normation classes without a BOM.  They should compile OK now.

Maybe some time in the future it would be nice for DfM to support normation classes saved with a BOM.  But it's a low priority.
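Until then, one workaround is to strip a leading UTF-8 BOM from the source files before they reach the compiler. The helper below is a hypothetical standalone tool, not part of the DfM build; it rewrites only the files that actually start with the bytes EF BB BF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical pre-compile helper: removes a leading UTF-8 BOM from the given files. */
public class StripBom {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the content without a leading UTF-8 BOM; otherwise returns the same array. */
    static byte[] stripBom(byte[] content) {
        if (content.length >= 3
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(content, 3, content.length);
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path path = Paths.get(name);
            byte[] original = Files.readAllBytes(path);
            byte[] cleaned = stripBom(original);
            if (cleaned != original) { // only rewrite files that actually had a BOM
                Files.write(path, cleaned);
                System.out.println("Removed BOM from " + path);
            }
        }
    }
}
```

For example, running it as java StripBom NormationBul.java NormationCyr1.java NormationCyr2.java would clean the three files from the build log, while files without a BOM are left untouched.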

Jeff


Gert

Jeff,

thanks for the update - I will try with your new files.

Quote: Maybe sometime in the future, it would be nice to support a BOM in DfM normation classes.
Do you mean that the normation classes should filter out the BOM character?

Best greetings,
Gert

dreamingsky

Quote: You mean that the normation classes should filter out the BOM-character ?

Yes, I think that would be good.  Other people may write normation classes with a BOM in the future.  Personally, I save all UTF-8 files with a BOM, so that programs which look for it recognize the file as UTF-8 and open it correctly.

Gert

Jeff,

Quote: Yes, I think that would be good.  Other people in the future may write normation classes with a BOM.  Personally, I save all UTF-8 files with a BOM.  Then the file is guaranteed to open correctly in programs.

Hmmmm ...
1) The Java compiler does not seem to handle the BOM character in .java files. I am actually surprised by that; anyway, we cannot change the Java compiler. Maybe searching for a compiler configuration that handles the BOM would be an option.
2) The implementation of the normation class could filter out the BOM character. Didn't I already add that to NormationLib.defaultNormation? Hmmmm, I intended to, but I guess I have not done it yet ... it's too late at night to think and remember ...
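For option 2, the filtering itself is a simple string operation. This is a minimal sketch with a hypothetical removeBom helper; it does not assume anything about the signature of NormationLib.defaultNormation or how the helper would be wired into it.

```java
/** Hypothetical helper that removes the BOM character before a word is normalized. */
public class BomFilter {

    private static final String BOM = "\uFEFF";

    /** Removes every U+FEFF from the text; all other characters are kept. */
    static String removeBom(String text) {
        return text.replace(BOM, "");
    }

    public static void main(String[] args) {
        String firstWord = BOM + "привет"; // e.g. the first word of a file saved with a BOM
        System.out.println(firstWord.length());                    // 7
        System.out.println(removeBom(firstWord).length());         // 6
        System.out.println(removeBom(firstWord).equals("привет")); // true
    }
}
```

The sketch removes every occurrence rather than only a leading one. Inside a word U+FEFF acts as an invisible zero-width no-break space, so dropping it from a search key should be harmless; removing only a leading BOM would be the more conservative choice.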

Ok, I will compile your updated files soon.
Best regards,
Gert

dreamingsky

Thanks.  Filtering out the BOM is not a high priority; if you have the chance, though, it would be nice to add the feature.

Jeff

Gert

Quote: Thanks.  Filtering out the BOM is not a high priority.  Just, if you have the chance, it'd be nice to add the feature.

Wasn't that already implemented ... ?

Gert

dreamingsky

You recently added the feature that removes the BOM from dictionary_input.csv files, because previously the first entry of every dictionary was not indexed (this was fixed while we were updating the Chinese normation class).  This new problem is different: it is about removing the BOM from the normation .java source files themselves (NormationEng2.java, NormationRus2.java, etc.).
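On the reading side, Java's UTF-8 decoder does not drop a leading BOM, so the first line of such a file starts with an invisible U+FEFF character. A first key like that would not match a normally typed search word, which fits the symptom of the first entry not being indexed. A minimal sketch for a desktop JVM, with a hypothetical class name and made-up sample lines; it is not the code that was added to DfM.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: read UTF-8 lines and drop a BOM at the very start of the input. */
public class CsvBomReader {

    private static final String BOM = "\uFEFF";

    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The UTF-8 decoder keeps the BOM, so strip it from the first line only.
                if (lines.isEmpty() && line.startsWith(BOM)) {
                    line = line.substring(BOM.length());
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content: a file saved "with BOM" followed by two entries.
        byte[] file = (BOM + "щука\tpike\nрыба\tfish\n").getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(file));
        System.out.println(lines.get(0).startsWith("щука")); // true: the first key is intact
        System.out.println(lines.size());                     // 2
    }
}
```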

Gert

Jeff,

I ran your 3 normation classes through the compiler and uploaded version 3.5.6 of DictionaryGeneration_empty and JarCreator to the File Release System (the JarCreator update is required because of a 'bad' dependency on the normation classes that I still need to remove at some point).

The compilation worked fine; however, I have not tested the new version yet. Maybe you could have a look at it?

Thank you for these 3 classes !!

And sorry that it took me so long to compile your files.

Best regards,
Gert