Messages - jn0101

#46
Gert, thanks for the explanations, your intention is much clearer now, and even in my conservative mind, it seems a good idea.

About replaceEscapeCharacters, I cannot help.
(XSLT is my no 1 language I just love to hate ;)

Jacob
#47
Finally, I've more or less finished the conversion of the very big list of dictionaries generously donated by http://lernu.net.

You can see the full list (40 language pairs) at http://javabog.dk/filer/paroj/AA_LEGU_MIN.html .
(fuliumi = browse,   provi = try).


At http://javabog.dk/filer/paroj/ are the ZIP files, ready for upload to SourceForge once the last problems and questions are solved:


1) The Esperanto flag (and the Hebrew flag) is missing. Gert, could you add missing flags to SVN ?

Right now I don't use the JarCreator. I just use ZIP and it works fine, but I will change to JarCreator to get rid of unneeded flags.


2) I'd like Esperanto translation of all the language names. This means the list of languages for the UI:
LanguageEnglish
LanguageVietnamesee
LanguageChinese
LanguageJapanese
LanguageThai
LanguageHindi
LanguageIndonesian
LanguageFrench
LanguageSpanish
LanguageGerman
LanguageItalian
LanguageLatin
LanguageRussian
LanguageArabic
LanguageCzech
LanguageSlovak

would have to be extended with:
bg,Bulgara
ca,Kataluna
fa,Persa
fi,Finna
he,Hebrea
hr,Kroata
hu,Hungara
lt,Litova
nl,Nederlanda
no,Norvega
pl,Pola
pt,Portugala
sv,Sveda

Gert, should I just go on adding these language names to JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/DictionaryForMIDs.languages?


3) Achim, after upload, should I generate something for your DataLoader class a la
      $data->addDictionary(new Dictionary("Chemical Elements", 174125, "http://sourceforge.net/projects/dictionarymid/files/dictionary%20Elements/3.4.0/DictionaryForMIDs_3.4.0_Elements.zip/download", "DfM_3.4.0_Elements", NULL, NULL, 0, "2009-11-11 12:00:00"));
?


4) In most cases I have a dictionary for each direction. I know this roughly doubles the data, but the source database from http://lernu.net keeps the directions separate: try it out at http://eo.lernu.net/cgi-bin/vortaro.pl .
Please comment if you really really think I should try hard to somehow unify the directions (I have no idea how to do that in a good way).


5) The non-Latin languages are not tested very well. I've done my best, but I don't know how to enter e.g. Chinese or Japanese characters, and I am not sure if everything is OK here (assistance needed! :-)
I set the normation class, e.g. in eo-jp I have:
language2NormationClassName=de.kugihan.dictionaryformids.translation.normation.NormationJpn
is that enough or is there more to do?


6) Is there anything else you suggest I correct before release?
Please check if the ZIP files (in http://javabog.dk/filer/paroj/) look OK.


7) My conversion script (bash) can be found at http://javabog.dk/filer/paroj/konverti.sh
Perhaps you spot something I've missed.

Thanks!

Jacob
#48

First, before starting out to define a new XML format, I trust you have checked and are extremely sure that no existing format could possibly be used.

There are already way too many ways of representing a dictionary in XML out there.

Here is, just as an example, Apertium's (http://apertium.svn.sourceforge.net/viewvc/apertium/incubator/apertium-en-fr/apertium-en-fr.en-fr.dix?revision=14163&view=markup):

  <e><p><l>bird<s n="n"/></l><r>oiseau<s n="n"/><s n="m"/></r></p></e>
  <e r="RL"><p><l>cat<s n="n"/></l><r>chat<s n="n"/></r></p></e>
(e=entry, p=pair, l=left, r=right, s=symbol)

By using an existing format you get a bunch of tools for free. For example, the Apertium format can be transformed and used in a lot of ways, among others by http://wiki.apertium.org/wiki/Apertium-dixtools .

I am not suggesting adopting Apertium's format. But I strongly encourage you to make sure an existing format (and its toolset and existing data) cannot be adopted.



Second, your format seems extremely verbose. Uncompressed files will be extremely large, roughly 30 times the real information content in them (the cat=chat entry takes 11 lines of approx. 30 chars each).
Consider a flatter XML structure and some shorter names for the inner elements.

The tags also don't seem intuitive:

What is a <translationOfDictionary>? (You don't translate dictionaries, you translate words!)
Rename it to <entry> or at least <translationOfWord>.

Rename <translationForLanguage> to <translation>, and rename includeInIndex="true" to index. Make the default value be "true" so the attribute only has to be put on those entries which should be marked "false" (to be excluded from index, I suppose).

Rename <translationForLanguagePart> to <part> and consider if you really need an extra level here.

What is <partNonContent>??



Third, you have no way of naming the languages. I'd add an attribute lang="fr" to <translationForLanguage> / <translation>.



Fourth, I am not sure XML is a win. These days many are moving away from XML and back to plain files, as XML really hasn't proved to be the universal, easy-to-read format, easily transformable and usable by everyone, that we were all promised 10 years ago. Comma-separated files are easier for most people (they can be edited in spreadsheets, for example), and for dictionaries I really can't see where the extreme flexibility (at the cost of easy parsing) is a win.


Yeah, I know I'm negative (and I might be wrong :-), but just consider my thoughts.

Jacob
#50
Quote from: Gert on 01. May 2010, 01:07:14
Quote
1) Updates to include Esperanto in the J2ME GUI:
M       JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/UIDisplayTextContents.java
M       JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/UIDisplayTextItems.java
M       JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/uidisplaytext/DictionaryForMIDs.languages

Great, would you please just commit these updates to SVN (now you have the access rights).

Here you are
http://dictionarymid.svn.sourceforge.net/viewvc/dictionarymid?view=rev&revision=279


Quote from: Gert on 01. May 2010, 01:07:14
Quote
2) An update that will make DfM work with microemu.  See
http://code.google.com/p/microemu/issues/detail?id=44&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Reporter%20Stars
The update should be deleted in 6 months (when a new version of microemu is released)

M       JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/lcdui_extension/DfMForm.java

That is DfMForm.java only, right ? Or also DictionaryForMIDs.java ?
Would you do me a favour: could you just add a remark to the line that you entered/modified (sort of "// introduced cause of MicroEmu; to be removed after xxx") ?

Done: http://dictionarymid.svn.sourceforge.net/viewvc/dictionarymid?view=rev&revision=278

Quote from: Gert on 01. May 2010, 01:07:14
Quote
3) This is most important:

A         DictionaryForMIDs/src/de/kugihan/dictionaryformids/translation/normation/NormationEpo.java

Just a minor point: could you include your name (or 'identification') in the file header ?

And: could you update that web page:  http://dictionarymid.sourceforge.net/newdictNormationLang.html ? Well, just a question  ::)

Done :-)
#51
OK, committed:

http://dictionarymid.svn.sourceforge.net/viewvc/dictionarymid?view=rev&revision=276

(I see I got the indentation wrong; that's fixed in the next commit)
#52
Guys,

I am in contact with a Korean Esperanto speaker who made an iPhone dictionary solution which he is considering open-sourcing.
He got really enthusiastic about the 26 language pairs from http://lernu.net that are on their way, and I suggested he could adapt his solution to use the DfM engine (and thus have access to all the pairs that DfM publishes).

I've used http://xmlvm.org before to test-convert code from Java to Objective-C, and I think it wouldn't be too hard to convert the DfM engine to be usable from iPhone. Eventually the whole DfM Android user interface could be converted, but that would probably take a lot more work.

Do you have some sample code (or test code) lying around that shows how to initialize and use the DfM engine from an external app?

Jacob
#53
Sorry, I don't have enough experience in J2ME coding.

I could do a lot, but to restructure and refactor anything, I need unit tests, to see if I break something.

With no unit tests it's most often best to leave the code as is - you just risk breaking a lot of things without noticing, and you would spend time during the following years finding out about broken stuff through user feedback and cleaning up - and give a bad user experience at the same time.

Jacob
#54
Quote from: Gert on 02. May 2010, 08:12:54
Or, would you like to have a dictionary download page similar as http://dictionarymid.sourceforge.net/dict.html, but better adapted to cell phones ? That means making something like a dictmobile.html where people can directly install the Jad/Jar-files (instead of first downloading a zip file).

+1 for making JARs and JADs available directly, as this would allow people to try out the dictionaries and the application from their desktop web browsers before they install on their phone.

Note that you have to set up a MIME-type for JAD on the web server before it works.

Here are some demos, using the Java Phone Emulator from http://microemu.org.

Try this:
Open emulator at http://microemu.org/microemu-webstart/index.html
Open a web page with links to a DfM build, e.g. http://javabog.dk/filer/paroj/eo-en/ , and drag the JAD file onto the emulator.

Or this direct link - which will web start Microemu and instruct it to download DfM with a dictionary, ready to use:
http://microemu.org/webstart/javabog.dk/filer/paroj/eo-en/DictionaryForMIDs_eo-en_lernu.jnlp

It's also available as applets:
http://javabog.dk/filer/paroj/apleto/provi_en_apleto.html


Quote from: Gert on 02. May 2010, 08:12:54
Or should all the DictionaryForMIDs dictionaries be downloadable from one application hoster, such as phoload.com or getjar.com.

This would also be cool.

#55
Thanks, finally it works!

I've got the same error again and again in Eclipse, but this time it seems like I succeeded!


<code>
java.lang.StackOverflowError
   at sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:212)
   at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:358)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
   at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:136)
   at java.lang.StringCoding.decode(StringCoding.java:169)
   at java.lang.String.<init>(String.java:444)
   at java.lang.String.<init>(String.java:516)
   at java.io.UnixFileSystem.list(Native Method)
   at java.io.File.list(File.java:970)
   at java.io.File.listFiles(File.java:1048)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.findSamplesManifests(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.findSamplesManifests(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.findSamplesManifests(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.findSamplesManifests(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.findSamplesManifests(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.loadSamplesForTarget(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.onSdkTargetModified(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.access$21(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage$7.widgetSelected(Unknown Source)
   at com.android.sdkuilib.internal.widgets.SdkTargetSelector.setSelection(SdkTargetSelector.java:197)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.extractNamesFromAndroidManifest(Unknown Source)
   at com.android.ide.eclipse.adt.internal.wizards.newproject.NewProjectCreationPage.onSampleSelected(Unknown Source)

</code>

BTW just creating new projects with existing sources also worked.

Jacob
#56
OK, I've done the encryption:

public class DictionaryGeneration {
...
+
+        if ("weakCrypt".equals(DictionaryDataFile.fileEncodingFormat)) {
+          directoryOutput = weakEncrypt(directoryOutput);
+        }
+


+  /**
+   * Very weak encryption/decryption mechanism
+   */
+  private static String weakEncrypt(String directoryOutput) {
+    StringBuilder res = new StringBuilder(directoryOutput.length());
+    for (char ch : directoryOutput.toCharArray()) {
+        char ch0 = ch;
+        if (ch>=60 && ch<124) ch = (char) (((ch-60)^'+') + 60);
+        res.append(ch);
+        //if (ch>=60 && ch<124) ch = (char) (((ch-60)^'+') + 60);
+        //if (ch0 != ch) throw new InternalError(directoryOutput + ch0 + ch);
+    }
+    return res.toString();
+  }
+
}


and in DictionaryForMIDs/src/de/kugihan/dictionaryformids/translation/Translation.java

@@ -517,7 +517,11 @@
                 indexLanguage < DictionaryDataFile.numberOfAvailableLanguages;
                 ++indexLanguage) {
                        StringBuffer word = dictionaryFile.getWord();
-                       
+
+        if ("weakCrypt".equals(DictionaryDataFile.fileEncodingFormat)) {
+          weakDecrypt(word);
+        }
+
...


+
+  /**
+   * Very weak encryption/decryption mechanism
+   */
+  private static void weakDecrypt(StringBuffer word) {
+    int n = word.length();
+    while (--n>=0) {
+        char ch = word.charAt(n);
+        if (ch>=60 && ch<124) word.setCharAt(n, (char) (((ch-60)^'+') + 60));
+    }
+  }
+
+

This gives dictionary files like
URIR<<>Y        @TWTUBJUT
URIR<<>YB       UTMJ @TUXOYNJ?T/@TUXOYNJĵT
URIR<<>O        ĵNX@TUXOYNBOJ
URIR<UBU<       UTMJ @TUXOYNJĵT/@TUXOYNJ?T
UR?JUU>WX>      UTMJĵT, ĵNX@Y>BOJĵT, UTMJ =TYVBĝT
UR?JUU>O        ĵNX @Y>BOJ, UTM> =TYVBĝBUOJ

which is perfect for me.


So, if the field fileEncodingFormat is set to "weakCrypt" the encryption kicks in.
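
For readers who want to try the transform outside the DfM sources, here is a minimal standalone sketch. The class name WeakCryptDemo and the method name weakCrypt are made up for illustration and are not part of DfM; the arithmetic is copied from the patch above. It shows why one routine serves for both directions: characters with codes 60..123 are shifted to 0..63, XOR'ed with '+' (43, which fits in six bits) and shifted back, so applying the routine twice restores the input.

```java
// Standalone sketch of the transform used in the patch above.
// WeakCryptDemo and weakCrypt are illustrative names, not DfM classes.
final class WeakCryptDemo {

    /** Applies the transform; calling it twice restores the input. */
    static String weakCrypt(String text) {
        StringBuilder res = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            // Only codes 60..123 are touched. (ch - 60) lies in 0..63 and
            // XOR with '+' (43) keeps it in 0..63, so the result is again
            // in 60..123 and a second pass undoes the first.
            if (ch >= 60 && ch < 124) {
                ch = (char) (((ch - 60) ^ '+') + 60);
            }
            res.append(ch);
        }
        return res.toString();
    }

    public static void main(String[] args) {
        String plain = "hundo\tdog";
        String hidden = weakCrypt(plain);
        System.out.println(hidden);
        System.out.println(weakCrypt(hidden).equals(plain)); // prints "true"
    }
}
```

Tabs, digits, spaces, commas and slashes lie outside the 60..123 range and pass through unchanged, which is why the separators stay visible in the sample output above. The same goes for letters such as ĵ and ĝ.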

Any comments/suggestions/objections ?

Jacob
#57
General discussions / Very big dictionaries
03. May 2010, 17:00:07
Hi, I'm compiling a reasonably big wordlist (1.6 MB - 34000 entries).

The dictionary/ subdir gets to 20MB uncompressed and the compressed JAR file is 7MB.

Have there been any attempts to make a more compressed format than a ZIP of csv-files?
I'm thinking of  http://en.wikipedia.org/wiki/Trie and such things.
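
To make the trie idea concrete, here is a small in-memory sketch. It is not an existing DfM format, and the class TrieSketch and its word list are invented for illustration. It only demonstrates the property that makes tries attractive for word lists: words sharing a prefix share nodes, so the number of stored characters can be much smaller than the total length of the words.

```java
import java.util.TreeMap;

// Illustrative sketch only; not part of DfM.
final class TrieSketch {

    static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();
    private int nodeCount = 0; // nodes created, excluding the root

    void insert(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            Node next = node.children.get(ch);
            if (next == null) {
                // A new node is only needed where the word leaves a known prefix.
                next = new Node();
                node.children.put(ch, next);
                nodeCount++;
            }
            node = next;
        }
        node.endOfWord = true;
    }

    boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) return false;
        }
        return node.endOfWord;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        String[] words = {"hundo", "hundoj", "hundido", "kato", "katoj", "katido"};
        TrieSketch trie = new TrieSketch();
        int totalChars = 0;
        for (String w : words) {
            trie.insert(w);
            totalChars += w.length();
        }
        System.out.println(totalChars);       // prints "33"
        System.out.println(trie.nodeCount()); // prints "17"
    }
}
```

Note that the node count says nothing about the size on disk. A compact serialization would still have to be designed, and the translations themselves would need to be stored separately.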

Take a look at the sizes at http://www.tinylex.com/download.php  (a sister project, but lacks good GUI and a reasonable amount of supported platforms)


Jacob
#58
Hello,

Could you give some hints, or a recipe, on how you compile the sources?

I've tried with NetBeans (with the nbandroid plugin) and with Eclipse.
I can generate and run simple apps in both IDEs, but the special setup for DfM keeps getting in the way.

Perhaps include or provide your IDE config files in SVN.

Yours,
Jacob
#59
Quote from: axin on 30. April 2010, 10:02:47
Hi Jacob

Quote from: jn0101 on 29. April 2010, 16:11:00
3) This is most important:

A         DictionaryForMIDs/src/de/kugihan/dictionaryformids/translation/normation/NormationEpo.java

If you could do a new Android build released to Market with 3) included I would be very grateful, as I could instruct Esperanto speakers to download it and the ZIP files with the dictionaries.
If not, could you do a build and instruct me on how to install it on other users' phones?

I just committed the normation and pushed the updated version 0.12 (including Esperanto localization) to the Market.

I've downloaded it, thanks.



Quote from: axin on 30. April 2010, 10:02:47
Currently I don't have the JavaME source set up, but I guess Gert will have look as soon as he has a minute (@Gert...).

I have the sources up and running, so no need to publish a new J2ME version now, as people should be able to install straight from my test site (http://javabog.dk/filer/paroj/).
The problem was that Android apps are not easily installable from places other than the Android Market.




Quote from: axin on 30. April 2010, 10:02:47
Could you post some steps on how to support microemu and how to create the jnlp files? Or refer me to the relevant documentation?

See http://www.microemu.org/microemu-webstart/index.html

For JAD URL: http://wintermute.de/wap/extended/5ud0ku.jad
just point to:  http://microemu.org/webstart/wintermute.de/wap/extended/5ud0ku.jnlp
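Replace the part after http://microemu.org/webstart/ (bold in the original post) with the host and path of any JAD file, without the .jad extension.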

For example for http://javabog.dk/filer/paroj/eo-en/DictionaryForMIDs_eo-en_lernu.jad
I link to http://microemu.org/webstart/javabog.dk/filer/paroj/eo-en/DictionaryForMIDs_eo-en_lernu.jnlp
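
The mapping from a JAD URL to a web start link can be sketched as a small helper. This assumes only the URL pattern shown in the two examples above; the class MicroemuWebstartLink and the method toJnlpUrl are hypothetical names and not part of microemu or DfM.

```java
// Hypothetical helper; it only encodes the URL pattern from the examples above.
final class MicroemuWebstartLink {

    static final String WEBSTART_PREFIX = "http://microemu.org/webstart/";

    /** Maps an http:// JAD URL to the corresponding microemu web start URL. */
    static String toJnlpUrl(String jadUrl) {
        if (!jadUrl.startsWith("http://") || !jadUrl.endsWith(".jad")) {
            throw new IllegalArgumentException("Expected an http:// URL ending in .jad: " + jadUrl);
        }
        // Keep host and path, drop the scheme and the .jad extension.
        String hostAndPath = jadUrl.substring("http://".length(),
                                              jadUrl.length() - ".jad".length());
        return WEBSTART_PREFIX + hostAndPath + ".jnlp";
    }

    public static void main(String[] args) {
        System.out.println(toJnlpUrl(
            "http://javabog.dk/filer/paroj/eo-en/DictionaryForMIDs_eo-en_lernu.jad"));
        // prints "http://microemu.org/webstart/javabog.dk/filer/paroj/eo-en/DictionaryForMIDs_eo-en_lernu.jnlp"
    }
}
```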

If you prefer running as applet instead of web start (see http://www.microemu.org/applet.html) there is a bug:
http://code.google.com/p/microemu/issues/detail?id=45&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20Reporter%20Stars
which requires you to change this code before it will work:

Index: JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/DictionaryForMIDs.java
===================================================================
--- JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/DictionaryForMIDs.java  (revision 271)
+++ JavaME/src/de/kugihan/dictionaryformids/hmi_java_me/DictionaryForMIDs.java  (working copy)
@@ -57,7 +57,7 @@
                       
                        // check for supported MIDP version
                        String supportedMidpProfile = System.getProperty("microedition.profiles");
-                       if (supportedMidpProfile.indexOf("MIDP-2.") == -1) {
+                       if (supportedMidpProfile != null && supportedMidpProfile.indexOf("MIDP-2.") == -1) {
                                // if MIDP 2.0 is not supported, then the application will not run
                                utilObj.log("MIDP 2.0 not supported by the device.\n" +
                             applicationName + " will not run correctly !");
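
The effect of the added null check can be tried in isolation. The sketch below is not DfM code; MidpProfileCheck and shouldWarn are illustrative names, and it assumes (as the patch implies) that System.getProperty("microedition.profiles") may return null in the applet environment. Without the check, calling indexOf on that null value throws a NullPointerException.

```java
// Illustrative sketch of the patched condition; not part of DfM.
final class MidpProfileCheck {

    /** True only when a profile string is present and does not mention MIDP-2.x. */
    static boolean shouldWarn(String supportedMidpProfile) {
        return supportedMidpProfile != null
                && supportedMidpProfile.indexOf("MIDP-2.") == -1;
    }

    public static void main(String[] args) {
        // System.getProperty returns null for a property that is not set,
        // which is the usual case for this key on a desktop JVM.
        String profile = System.getProperty("microedition.profiles");
        System.out.println(shouldWarn(profile));
        System.out.println(shouldWarn("MIDP-1.0")); // prints "true"
        System.out.println(shouldWarn("MIDP-2.0")); // prints "false"
    }
}
```

With this condition a missing property is treated like a supported profile, so no warning is logged in that case.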

#60
I just spotted some code which can't work:


package de.kugihan.dictionaryformids.dataaccess;


public class DictionaryDataFile  {
...

   public static void initValues(boolean initDictionaryGenerationValues)
   {
...
      dictionaryAbbreviation = utilObj.getDictionaryPropertyString("dictionaryAbbreviation", true);
      checkForEmptyProperty(dictionaryAbbreviation);
        }


   // if property is provided but empty, then this is handled as if no property was provided:
   static void checkForEmptyProperty(String propertyName) {
      if (propertyName != null)
         if (propertyName.length() == 0)
            propertyName = null;
   }



If you'd want it to work you'd have to return a value, say like this:


   public static void initValues(boolean initDictionaryGenerationValues)
   {
...
      dictionaryAbbreviation = utilObj.getDictionaryPropertyString("dictionaryAbbreviation", true);
      dictionaryAbbreviation = checkForEmptyProperty(dictionaryAbbreviation);
        }


   // if property is provided but empty, then this is handled as if no property was provided:
   static String checkForEmptyProperty(String propertyName) {
      if (propertyName != null)
         if (propertyName.length() == 0)
            propertyName = null;
      return propertyName;
   }
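
The underlying reason is that Java passes object references by value: assigning to a parameter inside a method changes only the method's local copy, never the caller's variable. A minimal standalone demonstration follows, with made-up names (ParameterReassignmentDemo, clearIfEmptyBroken, clearIfEmpty) that mirror the two versions above:

```java
// Standalone demonstration; the names are illustrative, not DfM code.
final class ParameterReassignmentDemo {

    // Mirrors the original void version: the assignment changes only the local copy.
    static void clearIfEmptyBroken(String value) {
        if (value != null && value.length() == 0) {
            value = null; // has no effect on the caller's variable
        }
    }

    // Mirrors the corrected version: the caller must use the returned value.
    static String clearIfEmpty(String value) {
        if (value != null && value.length() == 0) {
            return null;
        }
        return value;
    }

    public static void main(String[] args) {
        String abbreviation = "";
        clearIfEmptyBroken(abbreviation);
        System.out.println(abbreviation == null); // prints "false"
        abbreviation = clearIfEmpty(abbreviation);
        System.out.println(abbreviation == null); // prints "true"
    }
}
```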

Jacob