If I contribute dictionaries, will they appear in DfM's palette?

Started by jn0101, 23. April 2010, 16:34:15

jn0101

Hi, I'm a developer of Apertium, an open-source platform for developing rule-based machine translation systems.

I've also just got an Android phone and saw your palette of dictionaries that can be downloaded directly through the Android version. Quite impressive, but a lot of languages are missing.

We have ~20 bilingual dictionaries (see http://wiki.apertium.org/wiki/Main_Page) available under the GPL.
Would you be interested in making these dictionaries available?

There has also been some work on http://lernu.net/ to make its big collection of Esperanto dictionaries available (see http://en.lernu.net/helpo/vortaroj.php and expand the VORTARO widget on the right).
If that succeeds, could these also appear inside the Android application?


Yours,
Jacob

Gert

Jacob,

well, of course we are interested in getting your dictionaries onto DictionaryForMIDs :D (even though I do not yet fully understand the mechanisms of Apertium  :-\ ).

Because your dictionaries are available under the GPL, we can put them on the DictionaryForMIDs web site for public download.

Concerning the list of dictionaries for Android: both the Android version and the list of dictionaries for Android are maintained by Achim. Achim will have to comment on this.

Best regards,
Gert

axin

Hi Jacob, hi Gert,

Quote from: Gert on 23. April 2010, 19:25:57
Because your dictionaries are available under the GPL, we can put them on the DictionaryForMIDs web site for public download.
Concerning the list of dictionaries for Android: both the Android version and the list of dictionaries for Android are maintained by Achim. Achim will have to comment on this.
As soon as the dictionaries are available on the DictionaryForMIDs website, I'll add the new links to the OTA installation system. They'll then immediately be available for automatic installation on Android devices. (You may need to drop me a quick note, or it may take some time for me to realize that there are new files available...)

It's that easy  8)

Cheers,
Achim

jn0101

This is great news. I'll proceed to generate some dictionary files from Apertium data, then.

WRT "understand the mechanisms of Apertium": that would take some time. It's a full-fledged MT system, and you need to understand some linguistics to dive in. But ask, and I will happily explain :-)


Jacob

jn0101

OK, here is a first try:

Here is what I did:

lt-expand /home/j/esperanto/apertium/apertium-eo-en/apertium-eo-en.eo-en.dix > input0.txt

cat input0.txt | grep -v ':>:' | grep -v ':<:' | sed -re 's/<[^:]+:/:/g' | grep -v '><' | grep -v REGEX | grep -v '#' | sed -re "s/:/\t/g" | sed -e 's/</[/g' | sed -e 's/>/]/g' > input1.txt

rm -rf outputdictionary
mkdir outputdictionary

java -jar ../DictionaryGeneration.jar input1.txt outputdictionary .
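To make the steps of that grep/sed filter explicit, here is a rough Java equivalent that processes one lt-expand line at a time. It is only a sketch: the class and method names are hypothetical, the example entries are made up, and the line format (left<tags>:right<tags>, with :>: and :<: marking one-way entries) is inferred from the filters in the pipeline rather than taken from Apertium documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the grep/sed pipeline above, applied to one lt-expand line at a time.
// The ":>:" / ":<:" direction markers and the tag syntax are assumed from the pipeline's filters.
final class ApertiumLineConverter {

    /** Returns the DictionaryGeneration input line, or null if the pipeline would drop the line. */
    static String convert(String line) {
        // grep -v ':>:' | grep -v ':<:'  -> drop one-directional entries
        if (line.contains(":>:") || line.contains(":<:")) {
            return null;
        }
        // sed -re 's/<[^:]+:/:/g'  -> strip the tags of the left-hand word
        String s = line.replaceAll("<[^:]+:", ":");
        // grep -v '><' | grep -v REGEX | grep -v '#'  -> drop multi-tag, REGEX and multiword lines
        if (s.contains("><") || s.contains("REGEX") || s.contains("#")) {
            return null;
        }
        // colon becomes a tab, then <tag> becomes [tag]
        return s.replace(':', '\t').replace('<', '[').replace('>', ']');
    }

    /** Converts a whole lt-expand dump, skipping the lines the pipeline would drop. */
    static List<String> convertAll(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String converted = convert(line);
            if (converted != null) {
                result.add(converted);
            }
        }
        return result;
    }
}
```

For example, convert("abelo<n>:bee<n>") yields "abelo", a tab and "bee[n]", while a line carrying a :>: marker or a tag pair such as <vblex><itr> on the right-hand side is dropped, as in the shell version.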


The whole stuff (input and result) is here: http://javabog.dk/filer/en-eo.zip


However, I can't run it: your PC version (http://dictionarymid.sourceforge.net/pc.html) doesn't seem to work for me (on Linux).
I tried DictionaryForMIDs_2.4.0_EngPor_IDP_dev.zip and DictionaryForMIDs_3.4.0_IDP(Eng-Fra).zip
I get "The dictionary uses own programcode and may not work as expected."

$ java -jar DictionaryForMIDs/DictionaryForMIDs.jar
Exception in thread "main" java.lang.NullPointerException
   at de.kugihan.dictionaryformids.hmi_j2se.DictionaryForSE.fillTableColums(DictionaryForSE.java:352)
   at de.kugihan.dictionaryformids.hmi_j2se.DictionaryForSE.<init>(DictionaryForSE.java:120)
   at de.kugihan.dictionaryformids.hmi_j2se.DictionaryForSE.main(DictionaryForSE.java:98)


Will I actually have to install it on my Android phone to see if it works?

Gert

You need to run JarCreator in order to build the files DictionaryForMIDs_xxx.jar and DictionaryForMIDs.jad (actually, the JAD file is only needed for Java ME).

The PC version also needs the file DictionaryForMIDs_xxx.jar; there you can simply load this JAR file.

Best regards,
Gert

axin

Quote from: jn0101 on 25. April 2010, 00:44:19
OK, here is a first try:

Here is what I did:

lt-expand /home/j/esperanto/apertium/apertium-eo-en/apertium-eo-en.eo-en.dix > input0.txt

cat input0.txt | grep -v ':>:' | grep -v ':<:' | sed -re 's/<[^:]+:/:/g' | grep -v '><' | grep -v REGEX | grep -v '#' | sed -re "s/:/\t/g" | sed -e 's/</[/g' | sed -e 's/>/]/g' > input1.txt


Btw, you got the languages mixed up: language1 seems to be Esperanto...

Good work though, I tried on Android and it looks fine!

Achim

jn0101

Great to hear you liked it. I'm really happy about the positive feedback.


Here is a 2nd try: http://javabog.dk/filer/2nd_try.zip

In this file I have Esperanto-English and Esperanto-Nepali.
The dictionary files are in eo-en/jar/dictionary and in eo-ne/jar/dictionary/.

To be able to test, I've also packaged them using an existing JAR/JAD package (DictionaryForMIDs_3.4.0_IDP(Eng-Fra).zip). I think your website is great, but I feel you lack a wiki where users can document for themselves how to get things up and running. For example, I was unsure how I could most easily test my work (I couldn't get the PC version to work, but I've managed to get a MIDlet running in an emulator, so no worries).

Here are some questions:

1) Esperanto has some special characters (ĉ, ĝ, ĵ ...). For example, "horse" is 'ĉevalo'. Would it be possible for the user to enter e.g. 'cevalo' (without the ^), 'cxevalo' or '^cevalo' to search for such words?
Does this have something to do with language1NormationClassName? Will I have to write my own class to support this?


2) Nepali is written in the Devanagari script. How should searching be handled for it?
(BTW I did some transliteration work in Java - the project is dead but the code remains: http://code.google.com/p/nepaliconverter)


3) General customization.
It seems MIDlet-Vendor must be 'Gert Nuber and Contributors'. I'd like to have something like 'Data from the Apertium Project'.
Where should I put the GPL license, copyright notice and source?


4) Semi-synonyms and directions. In Apertium, entries can be unidirectional (i.e. you have A==B and a unidirectional A2->B, where A2 is an uncommon synonym for A). Should such things be handled in DfM?


5) Naming conventions for word classes. I see you usually use e.g. [Noun]. I am unsure to what extent I should stick to your naming conventions. In eo-ne/konverti.sh I've done this:

| sed 's/<n>/ [Nomo]/g' \
| sed 's/<np>/ [Propra nomo]/g' \
| sed 's/<adj>/ [Adjektivo]/g' \
| sed 's/<adv>/ [Adverbo]/g' \
| sed 's/<cnjcoo>/ [Konjunkcio]/g' \
| sed 's/<prn>/ [Pronomo]/g' \
| sed 's/<num>/ [Numero]/g' \
| sed 's/<vblex><itr>/ [Verbo netransitiva]/g' \
| sed 's/<vblex><tr>/ [Verbo transitiva]/g' \
| sed 's/<vblex><ditr>/ [Verbo duoble transitiva]/g' \
| sed 's/<ij>/ [Interjekcio, krivorto]/g' \

(assuming you know the Unix command 'sed'... or are you all Windows guys? ;-)


6) It would be great to have some collaborative setup where you could edit the files to fix things. Is there somewhere we could share files? (Apertium has SVN; if I push them, I can get you SVN write access, but perhaps you have an easier option... I don't know how strictly you police your SVN access.)


7) I have an Esperanto translation of the UI lying around. I'll send it when I've dug it out. Does the Android version use the same localization strings?



Thank you,
and Achim, if you could send me a sample Android app / data, I'd be thankful (or do I just ZIP the stuff, put it on the SD card and open it?)

Wow, a lot of questions, hope it's OK :-)

Yours,
Jacob

Gert

Quote: Wow, a lot of questions, hope it's OK :-)

Sure, that is why we have a forum!!


I am short of time right now (and I will not be able to read email next week), so I will just quickly try to address some of your questions. Fortunately, DfM has active supporters who can provide all the support you need  ::)


Quote: 1) Esperanto has some special characters (ĉ, ĝ, ĵ ...). For example, "horse" is 'ĉevalo'. Would it be possible for the user to enter e.g. 'cevalo' (without the ^), 'cxevalo' or '^cevalo' to search for such words?
Does this have something to do with language1NormationClassName? Will I have to write my own class to support this?

If the characters are based on Latin characters, NormationLat should do. NormationLat lets you search for, e.g., "ĉ" by typing a plain "c".
Please have a look at NormationLat.java to check whether all characters are covered.
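NormationLat.java itself is not quoted in this thread. Purely as an illustration of the idea (letting a plain "c" match "ĉ"), a common way to fold Latin diacritics in Java is to decompose the text to Unicode NFD with java.text.Normalizer and drop the combining marks. This is not NormationLat's actual code, and LatinFoldingSketch and fold are hypothetical names.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical illustration of diacritic folding; NOT the code of DfM's NormationLat.
final class LatinFoldingSketch {

    /** Lower-cases the word and strips combining accents, e.g. c-circumflex becomes c. */
    static String fold(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        // NFD splits a precomposed letter into its base letter plus combining mark(s)
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // \p{M} matches any combining mark, so only the base letters remain
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

Applied to both the dictionary entries and the search term, fold maps "ĉevalo" and "cevalo" to the same key. Letters without a canonical decomposition (æ, ø, ß, ...) pass through unchanged and would need an explicit mapping table, so checking NormationLat.java for the characters you need is still worthwhile.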


Quote: 3) General customization.
It seems MIDlet-Vendor must be 'Gert Nuber and Contributors'. I'd like to have something like 'Data from the Apertium Project'.
Where should I put the GPL license, copyright notice and source?

There are two copyrights:
(1) for the dictionary application (which is for Java ME: 'Gert Nuber and Contributors')
and
(2) for the dictionary data: put that in the infoText property (see http://dictionarymid.sourceforge.net/newdict.html); additionally, you may include a detailed readme/license in the ZIP file.


Quote: 4) Semi-synonyms and directions. In Apertium, entries can be unidirectional (i.e. you have A==B and a unidirectional A2->B, where A2 is an uncommon synonym for A). Should such things be handled in DfM?

You can do it either way in DfM. It just depends on the input that you provide to DictionaryGeneration, or, if you are really advanced, you could use a DictionaryUpdate class for special processing. In general, DictionaryForMIDs provides a lot of flexibility for these things. We should look at your item 4) closely to figure out how best to do this.
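As one possible way to handle item 4) in the input to DictionaryGeneration, the one-way entries could be classified instead of discarded (the first pipeline drops them with grep -v). A minimal Java sketch, assuming, as those grep filters suggest, that lt-expand marks left-to-right-only entries with :>: and right-to-left-only entries with :<:. The keep-policy shown (keep both-way and left-to-right entries, drop right-to-left ones) is just one hypothetical choice, and all names are made up.

```java
// Hypothetical sketch: classify lt-expand lines by translation direction.
final class DirectionSketch {

    enum Direction { BOTH, LEFT_TO_RIGHT, RIGHT_TO_LEFT }

    static Direction directionOf(String line) {
        if (line.contains(":>:")) {
            return Direction.LEFT_TO_RIGHT;   // e.g. A2 -> B, with A2 an uncommon synonym
        }
        if (line.contains(":<:")) {
            return Direction.RIGHT_TO_LEFT;
        }
        return Direction.BOTH;                // A == B
    }

    /** Replaces the direction marker with a plain ':' separator. */
    static String withoutMarker(String line) {
        return line.replace(":>:", ":").replace(":<:", ":");
    }

    /**
     * One possible policy: keep entries usable from the left-hand language
     * (both-way and left-to-right) and drop right-to-left-only ones.
     * Returns null for dropped lines.
     */
    static String keepLeftToRight(String line) {
        if (directionOf(line) == Direction.RIGHT_TO_LEFT) {
            return null;
        }
        return withoutMarker(line);
    }
}
```

The cleaned line could then go through the same tag and tab conversion as the bidirectional entries.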

Quote: 5) Naming conventions for word classes. I see you usually use e.g. [Noun]. I am unsure to what extent I should stick to your naming conventions. In eo-ne/konverti.sh I've done this:

If the predefined contents do not match your needs, just use your own ones; no problem with that.
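If you keep your own labels, holding the mapping in a single table rather than a chain of sed calls makes the choice easy to revisit later. A Java sketch of the same replacements as eo-ne/konverti.sh (labels copied from the post; the class name is hypothetical). Switching to English labels such as [Noun] would then only mean editing the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the sed chain from eo-ne/konverti.sh as one lookup table.
final class TagLabels {

    // LinkedHashMap keeps insertion order, so replacements run in the same order as the sed calls.
    static final Map<String, String> LABELS = new LinkedHashMap<>();

    static {
        LABELS.put("<n>", " [Nomo]");
        LABELS.put("<np>", " [Propra nomo]");
        LABELS.put("<adj>", " [Adjektivo]");
        LABELS.put("<adv>", " [Adverbo]");
        LABELS.put("<cnjcoo>", " [Konjunkcio]");
        LABELS.put("<prn>", " [Pronomo]");
        LABELS.put("<num>", " [Numero]");
        LABELS.put("<vblex><itr>", " [Verbo netransitiva]");
        LABELS.put("<vblex><tr>", " [Verbo transitiva]");
        LABELS.put("<vblex><ditr>", " [Verbo duoble transitiva]");
        LABELS.put("<ij>", " [Interjekcio, krivorto]");
    }

    /** Replaces every known Apertium tag (or tag pair) with its label. */
    static String label(String entry) {
        String result = entry;
        for (Map.Entry<String, String> mapping : LABELS.entrySet()) {
            result = result.replace(mapping.getKey(), mapping.getValue());
        }
        return result;
    }
}
```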


Quote: 7) I have an Esperanto translation of the UI lying around. I'll send it when I've dug it out. Does the Android version use the same localization strings?

The Java ME version of DfM and the Android version each have their own user interface and thus their own UI translation files. I will gladly incorporate the Esperanto translation for Java ME (I guess Achim will do the same for Android).

@Achim: oh, what is the latest on the Java ME translations from the International Mother Language Day? Honestly speaking, I have not yet worked those into the Java ME version.

Regards,
Gert

jn0101

WRT 7): Great, Gert. I've sent you the translation in a private mail.

Jacob

jn0101

@Achim: I've corrected the Danish translation (the æ character had become an œ) and added a draft of an Esperanto translation.

Here is the output of svn diff. My SourceForge username is 'nordfalk', if you prefer that I add it myself. The Esperanto translation will certainly need updating, as it's just a quick draft that I've also sent to a translator.

Jacob

jn0101

Achim:

Here is the final version of Android/res/values-eo/strings.xml

Yours,
Jacob

axin

Hi Jacob,

thanks for your great update! Can you check the file Android/res/values-eo/strings.xml in your previous post again? It seems to be empty... Just upload it once more or send me an email; I'll commit it to SVN together with your patch.

To test a dictionary on Android, put the dictionary directory (the one with a single DictionaryForMIDs.properties file and many *.csv files) somewhere on the SD card. You could also just put the JAR file there, but it will be significantly slower...
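Before copying a directory to the SD card, it may help to check that it has the layout described here. A small Java sketch of such a check (a hypothetical helper, not part of DfM): it only verifies that DictionaryForMIDs.properties exists and that at least one *.csv file is present, and says nothing about their contents.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rough layout check for an unpacked DfM dictionary directory.
final class DictionaryDirCheck {

    static boolean looksLikeDictionaryDir(Path dir) throws IOException {
        // exactly this file name is expected in the dictionary directory
        if (!Files.isRegularFile(dir.resolve("DictionaryForMIDs.properties"))) {
            return false;
        }
        // at least one *.csv data file must be next to it
        try (DirectoryStream<Path> csvFiles = Files.newDirectoryStream(dir, "*.csv")) {
            return csvFiles.iterator().hasNext();
        }
    }
}
```

For the Esperanto-English files from the second try, that would be something like looksLikeDictionaryDir(Paths.get("eo-en/jar/dictionary")).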

-Achim

jn0101

Sure, here it is.

BTW, I've made changes in a few places in the source tree.

$ svn status
M       Android/res/values/app-strings.xml
M       Android/res/values/strings.xml
A       Android/res/values-eo
A       Android/res/values-eo/strings.xml
M       Android/res/values-da/strings.xml
M       Android/res/values-zh-rTW/strings.xml
M       Android/res/values-hu/strings.xml
M       Android/res/values-zh-rCN/strings.xml
M       JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/lcdui_extension/DfMForm.java
M      DictionaryForMIDs
?       DictionaryForMIDs/src/de/kugihan/dictionaryformids/translation/normation/NormationEpo.java


The change in JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/lcdui_extension/DfMForm.java is a workaround to make it run in microemu. See
http://code.google.com/p/microemu/issues/detail?id=44&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Reporter%20Stars


DictionaryForMIDs/src/de/kugihan/dictionaryformids/translation/normation/NormationEpo.java implements Esperanto normation (normalization for searching).
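The contents of NormationEpo.java are not shown in the thread. As an independent sketch of what an Esperanto search normation could do for question 1) (not the content of that file and not DfM's Normation API; the class and method names are hypothetical), the three input styles mentioned earlier can all be reduced to one key: the x-system (cx), a caret typed in front of the letter (^c), and the real letter (ĉ).

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of Esperanto search normalization; NOT the code of NormationEpo.java.
final class EsperantoNormationSketch {

    static String normalize(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        // x-system: cx, gx, hx, jx, sx, ux -> plain base letter
        s = s.replaceAll("([cghjsu])x", "$1");
        // caret typed in front of the letter: ^c -> c
        s = s.replaceAll("\\^([cghjsu])", "$1");
        // real letters: decompose to NFD and drop the circumflex/breve marks
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return s;
    }
}
```

With this, 'ĉevalo', 'cxevalo', '^cevalo' and 'cevalo' all normalize to 'cevalo'. The trade-off is that a genuine x after c, g, h, j, s or u in a foreign word is removed too; because entries and queries go through the same function, lookups still match.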

Jacob