New Esperanto dictionaries are ready for your suggestions

Started by jn0101, 28. May 2010, 13:20:31

jn0101

Finally, I've more or less finished converting the very large list of dictionaries generously donated by http://lernu.net.

You can see the full list (40 language pairs) at http://javabog.dk/filer/paroj/AA_LEGU_MIN.html .
(foliumi = browse, provi = try).


The ZIP files are at http://javabog.dk/filer/paroj/, ready for upload to SourceForge once the last problems and questions are resolved:


1) The Esperanto flag (and the Hebrew flag) are missing. Gert, could you add the missing flags to SVN?

Right now I don't use JarCreator. I just use ZIP and it works fine, but I will switch to JarCreator to get rid of unneeded flags.


2) I'd like Esperanto translations of all the language names. This means the list of languages for the UI:
LanguageEnglish
LanguageVietnamesee
LanguageChinese
LanguageJapanese
LanguageThai
LanguageHindi
LanguageIndonesian
LanguageFrench
LanguageSpanish
LanguageGerman
LanguageItalian
LanguageLatin
LanguageRussian
LanguageArabic
LanguageCzech
LanguageSlovak

would have to be extended with:
bg,Bulgara
ca,Kataluna
fa,Persa
fi,Finna
he,Hebrea
hr,Kroata
hu,Hungara
lt,Litova
nl,Nederlanda
no,Norvega
pl,Pola
pt,Portugala
sv,Sveda

Gert, should I just go ahead and add these language names to JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/DictionaryForMIDs.languages?


3) Achim, after upload, should I generate something for your DataLoader class a la
      $data->addDictionary(new Dictionary("Chemical Elements", 174125, "http://sourceforge.net/projects/dictionarymid/files/dictionary%20Elements/3.4.0/DictionaryForMIDs_3.4.0_Elements.zip/download", "DfM_3.4.0_Elements", NULL, NULL, 0, "2009-11-11 12:00:00"));
?
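
Such a line could be generated by a small script once the uploads are done. This is only a sketch: the meaning of the positional arguments (display name, size, download URL, short id, two NULLs, a 0 and a timestamp) is guessed from the single example above, and `format_entry` and `php_quote` are hypothetical helpers, not part of the DataLoader class.

```python
from datetime import datetime


def php_quote(text: str) -> str:
    # Quote a string as a PHP double-quoted literal: escape backslash,
    # double quote and dollar sign so PHP does not interpolate anything.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return f'"{escaped}"'


def format_entry(name: str, size: int, url: str, short_id: str, released: datetime) -> str:
    """Build one addDictionary line in the shape of the example above.

    Assumption: the argument order is name, size, URL, id, NULL, NULL, 0, date.
    """
    stamp = released.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "$data->addDictionary(new Dictionary("
        f"{php_quote(name)}, {size}, {php_quote(url)}, {php_quote(short_id)}, "
        f"NULL, NULL, 0, {php_quote(stamp)}));"
    )


# Reproduce the example entry quoted above.
line = format_entry(
    "Chemical Elements",
    174125,
    "http://sourceforge.net/projects/dictionarymid/files/dictionary%20Elements/3.4.0/"
    "DictionaryForMIDs_3.4.0_Elements.zip/download",
    "DfM_3.4.0_Elements",
    datetime(2009, 11, 11, 12, 0, 0),
)
print(line)
```

Looping this over the uploaded ZIP files would give one line per dictionary; whether that is more convenient than the web interface depends on how the entries are imported.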


4) In most cases I have a dictionary for each direction. I know this more or less doubles the data, but the source database from http://lernu.net keeps the directions separate: try it out at http://eo.lernu.net/cgi-bin/vortaro.pl .
Please comment if you really really think I should try hard to somehow unify the directions (I have no idea how to do that in a good way).


5) The non-Latin languages are not tested very well. I've done my best, but I don't know how to enter e.g. Chinese or Japanese characters, and I am not sure if everything is OK here (assistance needed! :-)
I set the normation class, e.g. in eo-jp I have:
language2NormationClassName=de.kugihan.dictionaryformids.translation.normation.NormationJpn
Is that enough or is there more to do?


6) Is there anything else you suggest I correct before release?
Please check if the ZIP files (in http://javabog.dk/filer/paroj/) look OK.


7) My conversion script (bash) can be found at http://javabog.dk/filer/paroj/konverti.sh
Perhaps you will spot something I've missed.

Thanks!

Jacob

Gert

Super !!!


Quote
1) The Esperanto flag (and the Hebrew flag) are missing. Gert, could you add the missing flags to SVN?

I will send Zdenek an email - he provided the flags.


Quote
Gert, should I just go ahead and add these language names to JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/DictionaryForMIDs.languages?

Ah, if you do all the work, that sounds comfortable to me ;)
I am not sure whether I understood what you intend to do: do you intend to extend that list with languageBulgarian, languageCatalan etc.? That would be OK for me.


Quote
4) In most cases I have a dictionary for each direction. I know this more or less doubles the data, but the source database from http://lernu.net keeps the directions separate: try it out at http://eo.lernu.net/cgi-bin/vortaro.pl .
Please comment if you really really think I should try hard to somehow unify the directions (I have no idea how to do that in a good way).

I see, you have one dictionary for EpoDeu, for example, and another for DeuEpo.
You could merge them into one bidirectional dictionary that includes both directions. It is a bit tricky though; well, I guess you have already done most of the work. Just read http://dictionarymid.sourceforge.net/newdictMultiple.html


Quote
5) The non-Latin languages are not tested very well. I've done my best, but I don't know how to enter e.g. Chinese or Japanese characters, and I am not sure if everything is OK here (assistance needed! :-)
I set the normation class, e.g. in eo-jp I have:
language2NormationClassName=de.kugihan.dictionaryformids.translation.normation.NormationJpn
Is that enough or is there more to do?

Normally yes; however, I believe for Chinese and Japanese the normation is for the Latin replacements (e.g. Pinyin for Chinese). Well, Jeff knows that better than me.
In any case, it should do no harm to set these normation classes.


Best greetings,
Gert

jn0101

Thanks for some great replies. I will proceed, then...

Quote
I see, you have one dictionary for EpoDeu, for example, and another for DeuEpo.
You could merge them into one bidirectional dictionary that includes both directions. It is a bit tricky though; well, I guess you have already done most of the work. Just read http://dictionarymid.sourceforge.net/newdictMultiple.html

Before I put potentially another day of work into this, please explain what happens.
Wouldn't the dictionary/JAR size double, since you put two word lists in?

As you can see, the JAR size is already often several megabytes. Doubling that wouldn't be good. In that case it'd be better to let people just choose the direction that is most important to them.

Jacob

axin

Quote from: jn0101 on 28. May 2010, 13:20:31
2) I'd like Esperanto translation of all the language names. This means the list of languages for the UI:
would have to be extended with:
Gert, should I just go on adding these language names to JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/DictionaryForMIDs.languages?

As Android has its own UI files, the languages also need to be added to the array language_localization in
Android/res/values/app-strings.xml
Feel free to either add the languages there as well or to ping me after you have committed the JavaME file and I'll add the languages myself.


Quote from: jn0101 on 28. May 2010, 13:20:31
3) Achim, after upload, should I generate something for your DataLoader class a la
      $data->addDictionary(new Dictionary("Chemical Elements", 174125, "http://sourceforge.net/projects/dictionarymid/files/dictionary%20Elements/3.4.0/DictionaryForMIDs_3.4.0_Elements.zip/download", "DfM_3.4.0_Elements", NULL, NULL, 0, "2009-11-11 12:00:00"));
?

Those entries will be (automatically) moved to the database. If you can create new entries automatically then just do that, but if you prefer a web interface for adding the entries, you can log into https://mysql-d.sourceforge.net/ using name/password from the DataLoader class.


Quote from: jn0101 on 28. May 2010, 13:20:31
4) In most cases I have a dictionary for each direction. I know this more or less doubles the data, but the source database from http://lernu.net keeps the directions separate: try it out at http://eo.lernu.net/cgi-bin/vortaro.pl .
Please comment if you really really think I should try hard to somehow unify the directions (I have no idea how to do that in a good way).

From the Android point of view, one dictionary including both directions would be a little easier to use... But having two is still fine, especially if unifying them would be a hassle.


Amazing list of dictionaries and great scripting, btw :)

-Achim

Gert

@jn0101

Quote
Quote
I see, you have one dictionary for EpoDeu, for example, and another for DeuEpo.
You could merge them into one bidirectional dictionary that includes both directions. It is a bit tricky though; well, I guess you have already done most of the work. Just read http://dictionarymid.sourceforge.net/newdictMultiple.html

Before I put potentially another day of work into this, please explain what happens.
Wouldn't the dictionary/JAR size double, since you put two word lists in?

As you can see, the JAR size is already often several megabytes. Doubling that wouldn't be good. In that case it'd be better to let people just choose the direction that is most important to them.

I just looked at EpoDeu and DeuEpo ... actually both of them are bidirectional (I did not realize that before)! In both dictionaries you can search for Esperanto and German words.
As an aside: I noticed that for the German 'schlafen' there is more than one translation in EpoDeu, while there is only one in DeuEpo ('dormi').

So, because your dictionaries are already bidirectional, I think there is no reason to merge them.

Best regards,
Gert


jn0101

OK, here's the progress:

ad 1) I've added the Esperanto flag.
(it was really easy -
wget http://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_Esperanto.svg/32px-Flag_of_Esperanto.svg.png
wget http://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/Flag_of_Esperanto.svg/12px-Flag_of_Esperanto.svg.png
etc :-)


ad 2) The English, Esperanto, Danish and Norwegian translations of the new language names are done.


ad 4) Yes, if the user chooses to download Esperanto-German (with 23948 entries), he/she can easily reverse it to get a German-Esperanto dictionary (with 23948 reversed entries).
Still, the German-Esperanto dictionary (with 17041 entries) might be more useful if the user is mostly translating from German.
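
To make "reversing" concrete: if a word list is a set of (Esperanto, German) pairs, swapping each pair gives a German-Esperanto list with exactly the same number of entries, which is why it will not match the separately compiled German-Esperanto list. A toy sketch follows; the three sample entries are only illustrative, and the tab-separated layout is an assumption, not necessarily the format of the real lernu.net data.

```python
def reverse_entries(lines):
    """Swap the two columns of tab-separated 'source<TAB>target' lines."""
    reversed_lines = []
    for line in lines:
        source, target = line.rstrip("\n").split("\t", 1)
        reversed_lines.append(f"{target}\t{source}")
    # Sort so the reversed list is ordered by its new headword.
    return sorted(reversed_lines, key=str.casefold)


# Illustrative sample entries only (Esperanto<TAB>German).
epo_deu = ["dormi\tschlafen", "hundo\tHund", "domo\tHaus"]
deu_epo = reverse_entries(epo_deu)
print(deu_epo)
print(len(epo_deu) == len(deu_epo))  # the entry count is unchanged
```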

So I'll keep both directions if both might be useful. There might also be some who prefer the smaller version. For example:

Itala-Esperanto (9609 vortoj) foliumi provi JAD JAR ZIP (527K)
Esperanto-Itala (29940 vortoj) foliumi provi JAD JAR ZIP (1,2M)

If a user's phone cannot handle the 1.2M, then the 500K might be worth trying. Also, a beginner in Esperanto might actually prefer a smaller word list (there is actually a 'small' subset of the 3000 most used words that might be worth considering, see http://eo.lernu.net/cgi-bin/statistiko.pl, but I have enough work just getting the Lernu people to form opinions, so I'm not going to propose that).



I'll proceed now with publishing on Sourceforge and writing a manual.

In the Android version of the manual I will refer to the Android Market and describe how to download through the app.
Achim: do you have a 2D barcode for your app?


As I am using the SVN version I'll name the ZIP files with version 3.5.0, like:
DictionaryForMIDs_3.5.0_DeuEpo_Lernu.zip
Gert, is that OK, and could you release a 3.5.0 version within the next few weeks, just to keep things in sync?


Jacob

Gert

@jn0101
Quote
Gert, is that OK, and could you release a 3.5.0 version within the next few weeks, just to keep things in sync?

Could you do me a favour and create one of your dictionaries with the 3.5.0 from http://dictionarymid.sourceforge.net/forum/index.php?topic=233.0 and post it as a test version under that "DictionaryForMIDs 3.5.0 testversion available / testers wanted" topic? It should be a version generated with JarCreator.

Best regards,
Gert

jn0101

Uh, I saw your post too late and most files have already been uploaded to
http://sourceforge.net/projects/dictionarymid/files/?sort=filename&sortdir=asc

I'll stop the uploading now and read your post

jn0101

OK, I've read it now.

Gert, I've been testing and working on the newest version all the time and following the SVN changes. To me it seems quite improbable that there are any issues, and I'd like to release before the 11th (actually, today :-). Perhaps we could make these newly released dictionaries part of the ones people can test with, as I'd really like them to get tested more (especially those with non-Latin character sets).

So, I propose I continue the uploading as I intended. This is also required to make progress on the OTA stuff with Achim, and for the dictionaries to become visible in the Android app (which I very much look forward to). The latter would also get a larger group of people to test the 3.5 version (and the Eo dictionaries).

So, to not delay everything, including testing of the Esperanto dictionaries (there are no previous versions people can use in the meantime, you know :-), I suggest I proceed.


Jacob

Gert

@jn0101
Quote
Gert, I've been testing and working on the newest version all the time and following the SVN changes. To me it seems quite improbable that there are any issues, and I'd like to release before the 11th (actually, today :-). Perhaps we could make these newly released dictionaries part of the ones people can test with, as I'd really like them to get tested more (especially those with non-Latin character sets).

So, I propose I continue the uploading as I intended. This is also required to make progress on the OTA stuff with Achim, and for the dictionaries to become visible in the Android app (which I very much look forward to). The latter would also get a larger group of people to test the 3.5 version (and the Eo dictionaries).

So, to not delay everything, including testing of the Esperanto dictionaries (there are no previous versions people can use in the meantime, you know :-), I suggest I proceed.

OK, I am fine with that! You are using JarCreator, right ...?

Under that posting for the 3.5.0 test, could you please link to one or more of your 3.5.0 versions for people to test?

Best regards,
Gert

jn0101

No, I wasn't.

I'll rebuild using JarCreator.

I need to use the compiled DictionaryForMIDs_3.5.0_empty.zip you provided, right?

Gert

Well, I do not want to cause you additional effort. The reason I was asking for JarCreator is that it handles some cases that are not obvious, such as ensuring that certain manifest properties do not exceed 32 characters (required for some Motorola devices).

Let me think ... apart from that 32-character problem there should be no other crucial problems. So if your manifest properties for the application name and some others do not exceed 32 characters, then your current version should be OK.
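
A quick way to check this without JarCreator is to scan the manifest for overlong values. This is a sketch only: exactly which properties are affected is not spelled out here, so the key list below (MIDlet-Name, MIDlet-Vendor) is an assumption, and the sample manifest text is made up for illustration. It also ignores manifest continuation lines.

```python
MAX_LEN = 32  # limit mentioned above for some Motorola devices
CHECKED_KEYS = ("MIDlet-Name", "MIDlet-Vendor")  # assumption: adjust as needed


def overlong_properties(manifest_text, keys=CHECKED_KEYS, max_len=MAX_LEN):
    """Return {key: value} for checked properties whose value exceeds max_len."""
    too_long = {}
    for line in manifest_text.splitlines():
        if ":" not in line:
            continue  # not a "Key: value" line
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key in keys and len(value) > max_len:
            too_long[key] = value
    return too_long


# Made-up sample manifest, for illustration only.
sample = """Manifest-Version: 1.0
MIDlet-Name: DfM Esperanto-German (Lernu)
MIDlet-Vendor: DictionaryForMIDs
"""
print(overlong_properties(sample))
```

An empty result means none of the checked properties is longer than 32 characters.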

If you still plan to re-create your files, please use the 3.5.0 that I provide.

Gert

dreamingsky

Quote
5) The non-Latin languages are not tested very well. I've done my best, but I don't know how to enter e.g. Chinese or Japanese characters, and I am not sure if everything is OK here (assistance needed! :-)
I set the normation class, e.g. in eo-jp I have:
language2NormationClassName=de.kugihan.dictionaryformids.translation.normation.NormationJpn
Is that enough or is there more to do?

Hi Jacob, for the Japanese dictionary, also use a DictionaryUpdate.  You can use the same one as the EDICT Japanese dictionary:

language2DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateEDICTJpn
language2NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationJpn

For the Chinese dictionary, use a DictionaryUpdate too.  You don't need a Normation class, though:
language2DictionaryUpdateClassName=de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateCEDICTChi

For all the non-roman languages, you won't be able to search the non-roman language.  You'll only be able to search from Esperanto.  For example, you can only search Esperanto -> Russian or Esperanto -> Hebrew.  You can't search from Russian -> Esperanto or Hebrew -> Esperanto.

We're still waiting for someone to write the code to add a custom text input box for DictionaryForMIDs. This would allow inputting languages which use non-roman scripts. Here is a link to the discussion:
http://dictionarymid.sourceforge.net/forum/index.php?topic=52.0

Jeff

jn0101

I never exceed the 32 byte limit. I also remove unneeded icons, as in JarCreator.java.

I did not obfuscate and I do not delete de/kugihan/dictionaryformids/dataaccess/zip/*.class as suggested in  JarCreator.java.

String[] excludeEntries = {
    "de/kugihan/dictionaryformids/dataaccess/zip"  // zip library for decompression of dictionaries in the file system
};

However, I think that didn't work anyway: the package name has been obfuscated in your DictionaryForMIDs_3.5.0_empty.zip, right?
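
For reference, the effect of such an exclude list can be sketched as copying an archive while skipping every entry under a given path prefix. This is a Python analogue for illustration, not the JarCreator code; `copy_zip_excluding` is a hypothetical helper and the two class file names in the demo are placeholders.

```python
import io
import zipfile


def copy_zip_excluding(src_bytes, exclude_prefixes):
    """Return a new ZIP (as bytes) without entries under the given prefixes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if any(info.filename.startswith(p) for p in exclude_prefixes):
                continue  # skip entries in an excluded package
            dst.writestr(info, src.read(info.filename))
    return out.getvalue()


# Build a tiny in-memory archive with placeholder entries.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("de/kugihan/dictionaryformids/dataaccess/zip/Example.class", b"x")
    z.writestr("de/kugihan/dictionaryformids/general/Other.class", b"y")

slim = copy_zip_excluding(
    demo.getvalue(), ["de/kugihan/dictionaryformids/dataaccess/zip"]
)
names = zipfile.ZipFile(io.BytesIO(slim)).namelist()
print(names)
```

If the package names are obfuscated, a prefix match like this finds nothing to remove, which would explain why the exclusion has no effect.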


Jacob