DictionaryForMids Forum

Topic started by: dreamingsky on 13. June 2010, 01:20:11

Title: Russian transcription help
Post by: dreamingsky on 13. June 2010, 01:20:11
Can someone help me with Russian transcription?  I want to write a readme that helps users type Russian transcription in DfM, but I don't know Russian myself.

DfM has 4 Normation classes for Cyrillic transcription:
1. NormationRus2.java
2. NormationUkr.java
3. NormationRusC.java
4. NormationUkrC.java

A description of the normation classes is here:
http://dictionarymid.sourceforge.net/newdictNormationLang.html

NormationRus2.java:
Allows you to search words both in Cyrillic and in Latin transcription (according to GOST 1971, except that the yers ъ and ь are written as 'x' and no apostrophes are used).

I found GOST 16876-71 here:
http://en.wikipedia.org/wiki/GOST_16876-71

However, NormationRus2 differs slightly from GOST 16876-71.

Cyrillic   GOST 16876-71   Rus2   Ukr   RusC   UkrC
а   a   a   a   a   a
б   b   b   b   b   b
в   v   v   v   v   v
г   g   g   h   g   h
д   d   d   d   d   d
е   e   e   e   e   e
ё   jo   yo   yo   jo   jo
ж   zh   zh   zh   z   z
з   z   z   z   z   z
и   i   i   i   i   i
ї         yi      ji
й   jj   y   y   j   j
к   k   k   k   k   k
л   l   l   l   l   l
м   m   m   m   m   m
н   n   n   n   n   n
о   o   o   o   o   o
п   p   p   p   p   p
р   r   r   r   r   r
с   s   s   s   s   s
т   t   t   t   t   t
у   u   u   u   u   u
ф   f   f   f   f   f
х   kh   kh   kh   ch   ch
ц   c   c   c   c   c
ч   ch   ch   ch   c   c
ш   sh   sh   sh   s   s
щ   shh   shh   shh   sc   sc
ъ         x   x   x   x
ы   y   y   y   y   y
ь   '   x   x   x   x
э   eh   eh   eh   e   e
ю   ju   yu   yu   ju   ju
я   ja   ya   ya   ja   ja
ґ         g      g

Here are the 4 changes:
Cyrillic   GOST   NormationRus2
ё   jo   yo
й   jj   y
ю   ju   yu
я   ja   ya

Were these 4 changes intentional?  Or, are they a mistake?


Also, NormationRusC and NormationUkrC state "according to the Czech ISO norm".  Does anyone know the ISO number?
Title: Re: Russian transcription help
Post by: Gert on 13. June 2010, 08:41:05
Just a side remark: these Normation classes were set up by Michael Kopecky. Maybe you could try to contact him?

Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 13. June 2010, 12:19:02
Sounds good.
Title: Re: Russian transcription help
Post by: dreamingsky on 15. July 2010, 23:58:55
Gert

Can you please add these 3 normation classes to the next version of DfM?  They are for Cyrillic.
- NormationCyr1.java   (Russian, Ukrainian, Macedonian)
- NormationCyr2.java   (Russian, Ukrainian, Macedonian)
- NormationBul.java     (Bulgarian)

NormationCyr1.java replaces NormationRus2.java and NormationUkr.java.
NormationCyr2.java replaces NormationRusC.java and NormationUkrC.java.

I fixed a few errors in the transcriptions and added some missing transcriptions from the old normation classes.  We can keep NormationRus2.java, NormationUkr.java, NormationRusC.java, and NormationUkrC.java in DfM for the Czech dictionaries.  But, we can remove the information from newdictNormationLang.html.

Once the new normation classes are added to DfM, I'll edit newdictNormationLang.html to show the new normation classes.

Jeff
Title: Re: Russian transcription help
Post by: dreamingsky on 16. July 2010, 00:00:30
Just for reference, here are the normation classes.  This might save someone the work of learning it later.

Cyrillic     NormationCyr1     NormationCyr2     NormationBul
а     a     a     a
б     b     b     b
в     v     v     v
г     g     g     g
д     d     d     d
е     e     e     e
ё     jo     yo
ж     zh     zh     zh
з     z     z     z
и     i     i     i
й     j     j     y
к     k     k     k
л     l     l     l
м     m     m     m
н     n     n     n
о     o     o     o
п     p     p     p
р     r     r     r
с     s     s     s
т     t     t     t
у     u     u     u
ф     f     f     f
х     h     h     h
ц     c     c     ts
ч     ch     ch     ch
ш     sh     sh     sh
щ     shh     shh     sht
ъ     "     "     aj     aj
ы     y     y
ь     x     x     x
э     eh     eh
ю     ju     yu     yu
я     ja     ya     ya
і     ij     ij
ѳ          fh
ѣ          je
ѵ          yh
ґ     gj     gj
ѓ     gj     gj
є     ye     ye
ї     yi     yi
ѕ     dz     dz
ј     jj     jj
љ     lj     lj
њ     nj     nj
ќ     kj     kj
џ     dj     dj
ў     uj     uj
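
For anyone who wants to see how a table like this maps onto code, here is a minimal Java sketch of a table-driven transcription. It is not the actual NormationCyr1.java source: the class name TranslitSketch, the method toLatin, and the decision to pass unmapped characters through unchanged are assumptions made for illustration, and only a handful of rows from the NormationCyr1 column are included. The letters are written as Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the real NormationCyr1 class.
class TranslitSketch {
    // A few rows copied from the NormationCyr1 column of the table.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0430', "a");    // Cyrillic a
        TABLE.put('\u0434', "d");    // Cyrillic de
        TABLE.put('\u0451', "jo");   // Cyrillic yo
        TABLE.put('\u0436', "zh");   // Cyrillic zhe
        TABLE.put('\u0438', "i");    // Cyrillic i
        TABLE.put('\u043A', "k");    // Cyrillic ka
        TABLE.put('\u0449', "shh");  // Cyrillic shcha
        TABLE.put('\u044E', "ju");   // Cyrillic yu
        TABLE.put('\u044F', "ja");   // Cyrillic ya
    }

    // Lowercases the input, then replaces each mapped Cyrillic letter.
    // Characters without a table entry are passed through unchanged (assumption).
    static String toLatin(String text) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toLowerCase(Locale.ROOT).toCharArray()) {
            String latin = TABLE.get(ch);
            out.append(latin != null ? latin : String.valueOf(ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatin("\u042F\u0434"));             // prints "jad"
        System.out.println(toLatin("\u0451\u0436\u0438\u043A")); // prints "jozhik"
    }
}
```

With a lookup like this, a word typed in Latin transcription and the same word in Cyrillic end up as the same search key, which is the point of the normation step.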
Title: Re: Russian transcription help
Post by: Gert on 16. July 2010, 01:42:37
Jeff,

Ok, great - I will add these classes, probably in about two weeks.

Best regards,
Gert
Title: Re: Russian transcription help
Post by: Gert on 16. July 2010, 20:01:26
Jeff,

I just tried to build a new version (so you wouldn't have to wait 2 weeks ...).

But I encountered a problem:

[wtkbuild] Compiling 4 source files to C:\Projects\DictionaryForMIDs\Build\DictionaryForMIDs\classes
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationBul.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationCyr1.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] C:\Projects\DictionaryForMIDs\DictionaryForMIDs\src\de\kugihan\dictionaryformids\translation\normation\NormationCyr2.java:1: illegal character: \65279
[wtkbuild] /*
[wtkbuild] ^
[wtkbuild] 3 errors


Hmmm, maybe I still can find the problem.

Regards,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 22. July 2010, 03:48:28
I did some research and noticed one difference between the new normation classes and the old NormationRus2.java: NormationRus2.java does not have a BOM, but I had saved the new normation classes with a BOM.

The BOM is the character U+FEFF: \uFEFF as a Unicode escape, &#65279; as an HTML numeric character reference.  65279 is FEFF in decimal, which matches the "illegal character: \65279" in the compiler output.

I have re-saved the normation classes without a BOM.  They should work now.
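
In case someone else hits the same compiler error, here is a rough sketch of how a BOM could be stripped from source files with a small Java program instead of re-saving each file by hand. It is not part of DfM; the class name BomStripSketch and its method names are made up for this example. It only assumes the files are UTF-8, where the BOM is stored as the three bytes EF BB BF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of DfM.
class BomStripSketch {
    // Returns the bytes without a leading UTF-8 BOM (EF BB BF), if one is present.
    static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Rewrites the file in place, but only if it starts with a BOM.
    static boolean stripBomInFile(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripBom(original);
        if (stripped.length == original.length) {
            return false; // no BOM, leave the file untouched
        }
        Files.write(file, stripped);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomStripSketch NormationBul.java NormationCyr1.java ...
        for (String name : args) {
            boolean changed = stripBomInFile(Paths.get(name));
            System.out.println(name + (changed ? ": BOM removed" : ": no BOM found"));
        }
    }
}
```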

Maybe sometime in the future, it would be nice to support a BOM in DfM normation classes.  But, it's a low priority.

Jeff

Title: Re: Russian transcription help
Post by: Gert on 22. July 2010, 19:29:16
Jeff,

thanks for the update - I will try with your new files.

Quote: Maybe sometime in the future, it would be nice to support a BOM in DfM normation classes.

You mean that the normation classes should filter out the BOM character?

Best greetings,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 22. July 2010, 21:53:58
Quote: You mean that the normation classes should filter out the BOM character?

Yes, I think that would be good.  Other people in the future may write normation classes with a BOM.  Personally, I save all UTF-8 files with a BOM.  Then the file is guaranteed to open correctly in programs.
Title: Re: Russian transcription help
Post by: Gert on 23. July 2010, 04:11:18
Jeff,

Quote:
Quote: You mean that the normation classes should filter out the BOM character?

Yes, I think that would be good.  Other people in the future may write normation classes with a BOM.  Personally, I save all UTF-8 files with a BOM.  Then the file is guaranteed to open correctly in programs.

Hmmmm ...
1) The Java compiler does not seem to handle the BOM character in .java files. I am actually surprised about that; anyway, we cannot change the Java compiler. Searching for a compiler configuration that handles the BOM may be an option, though.
2) The implementation of the normation class could filter out the BOM character. Didn't I already add that to NormationLib.defaultNormation? Hmmmm, I intended to, but I guess I have not done it yet ... too deep in the night now to think and remember ...
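
As an illustration of point 2, a runtime filter of that kind might look roughly like the sketch below. This is not the code from NormationLib.defaultNormation; the class BomFilterSketch and the method removeBom are hypothetical names. Note that it only covers a BOM that shows up in the text being normated; it does nothing about a BOM at the start of a .java source file, which is point 1.

```java
// Hypothetical sketch; not taken from NormationLib.
class BomFilterSketch {
    // The byte order mark, U+FEFF.
    private static final char BOM = '\uFEFF';

    // Removes every U+FEFF character from the text before further normation.
    static String removeBom(String text) {
        if (text.indexOf(BOM) < 0) {
            return text; // common case: nothing to filter
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch != BOM) {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A first dictionary entry as it might look when the file starts with a BOM.
        String firstEntry = BOM + "word";
        System.out.println(removeBom(firstEntry).equals("word")); // prints "true"
    }
}
```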

Ok, I will compile your updated files soon.
Best regards,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 23. July 2010, 05:46:28
Thanks.  Filtering out the BOM is not a high priority.  Just, if you have the chance, it'd be nice to add the feature.

Jeff
Title: Re: Russian transcription help
Post by: Gert on 23. July 2010, 06:52:40
Quote: Thanks.  Filtering out the BOM is not a high priority.  Just, if you have the chance, it'd be nice to add the feature.

Wasn't that already implemented ... ?

Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 23. July 2010, 10:55:11
You just added the feature that removes the BOM from dictionary_input.csv files, because the first entry of every dictionary was not being indexed before (this was fixed while we were updating the Chinese normation class).  The new problem is different: it is about a BOM in the normation Java source files themselves (NormationEng2.java, NormationRus2.java, etc.).
Title: Re: Russian transcription help
Post by: Gert on 26. July 2010, 19:29:05
Jeff,

I ran your 3 Normation classes through the compiler and uploaded version 3.5.6 of DictionaryGeneration_empty and JarCreator to the File Release System (the JarCreator update is required because of a 'bad' dependency on the Normation classes that I still need to remove some time in the future).

The compile worked fine; however, I have not yet tested the new version. Maybe you could have a look at that?

Thank you for these 3 classes !!

And sorry for the long time that I needed to compile your files.

Best regards,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 27. July 2010, 04:46:35
Hmm, it did not work.  I got this error:
Thrown de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul / Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul

I looked inside version 3.5.6 of DictionaryForMIDs.jar:
DictionaryForMIDs.jar\de\kugihan\dictionaryformids\translation\normation\

But, I did not see NormationBul.class, NormationCyr1.class, or NormationCyr2.class in the directory.

Should the new normation files be there?  Or did I make a mistake?  Or do I need a new version of DictionaryGeneration too?

Title: Re: Russian transcription help
Post by: Gert on 27. July 2010, 06:47:09
Ahhhh, I think I forgot to add these files to the build file. The names of the Normation classes need to be excluded from obfuscation in the build file.
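
A likely reason why obfuscation matters here, assuming the Normation classes are looked up by name at run time (the "Class could not be loaded" message points that way): a lookup by name fails as soon as the obfuscator has renamed or dropped the class. The sketch below is not DfM code; ClassByNameSketch, Normation, and DemoNormation are hypothetical stand-ins.

```java
import java.util.Locale;

// Hypothetical sketch; not DfM code.
class ClassByNameSketch {
    // Stand-in for a normation interface (assumed shape).
    interface Normation {
        String normate(String word);
    }

    // Stand-in for a class such as NormationBul.
    static class DemoNormation implements Normation {
        public String normate(String word) {
            return word.toLowerCase(Locale.ROOT);
        }
    }

    // Loads a Normation implementation from its fully qualified class name.
    static Normation load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (Normation) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // Works: the class still exists under the name we ask for.
        Normation ok = load(ClassByNameSketch.class.getName() + "$DemoNormation");
        System.out.println(ok.normate("ABC")); // prints "abc"

        // Fails the way a lookup for a renamed (obfuscated) class would.
        try {
            load(ClassByNameSketch.class.getName() + "$RenamedByObfuscator");
        } catch (ClassNotFoundException e) {
            System.out.println("Class could not be loaded: " + e.getMessage());
        }
    }
}
```

Because the class name comes from a string rather than from a compile-time reference, the obfuscator cannot update it, so such classes have to keep their original names.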

Will correct that.

Gert
Title: Re: Russian transcription help
Post by: Gert on 28. July 2010, 19:45:18
Jeff,

I am sorry for the inconvenience !

Now I uploaded 3.5.7 (also of JarCreator) - hope that version is ok now.

Best greetings,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 29. July 2010, 08:45:28
Hmm, it still doesn't work.  I got this error again:

Thrown de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul / Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul

I looked inside DictionaryForMIDs.jar and saw the 3 new files.

Do I need a new version of DictionaryGeneration with the new normation classes?  At the end of the error message it refers to DictionaryGeneration.java:

at de.kugihan.dictionaryformids.dictgen.DictionaryGeneration.main(DictionaryGeneration.java:95)


Here is the full error message:
Thrown de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul / Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul

de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul
        at de.kugihan.dictionaryformids.dataaccess.DictionaryDataFile.getObjectForClass(DictionaryDataFile.java:306)
        at de.kugihan.dictionaryformids.dataaccess.DictionaryDataFile.initValues(DictionaryDataFile.java:257)
        at de.kugihan.dictionaryformids.general.UtilWin.readProperties(UtilWin.java:36)
        at de.kugihan.dictionaryformids.dictgen.DictionaryGeneration.main(DictionaryGeneration.java:95)
Thrown de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul / Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul

de.kugihan.dictionaryformids.general.DictionaryClassNotLoadedException: Class could not be loaded: de.kugihan.dictionaryformids.translation.normation.NormationBul
        at de.kugihan.dictionaryformids.dataaccess.DictionaryDataFile.getObjectForClass(DictionaryDataFile.java:306)
        at de.kugihan.dictionaryformids.dataaccess.DictionaryDataFile.initValues(DictionaryDataFile.java:257)
        at de.kugihan.dictionaryformids.general.UtilWin.readProperties(UtilWin.java:36)
        at de.kugihan.dictionaryformids.dictgen.DictionaryGeneration.main(DictionaryGeneration.java:95)

Title: Re: Russian transcription help
Post by: Gert on 29. July 2010, 19:20:48
Jeff,

I am sorry - in the future I really need to test things before I throw them out ...

Yes, the dependency also exists for DictionaryGeneration. I will provide an update version there also.

Gert
Title: Re: Russian transcription help
Post by: Gert on 29. July 2010, 20:12:47
... I just uploaded DictionaryGeneration 3.5.7, just in case you are keen to test it (to be honest, I put it there straight out of the compiler without testing; I will have time to test in a few days).

Best greetings,
Gert
Title: Re: Russian transcription help
Post by: dreamingsky on 02. August 2010, 09:33:08
I tested it.  Everything works great.  Thank you very much.

Jeff