Messages - dreamingsky

#76
Problems / Re: Bitmap Font Setup
17. January 2008, 01:58:51
bahathir

The latest version of the Japanese EDICT dictionary is 3.0.3, and it includes bitmap fonts.  However, a newer version of the bitmap font generator was released with version 3.1.0, so I'll try to release version 3.1.2 of the EDICT dictionary soon.  Oxxide fixed several romaji problems; those fixes will go into the following version.

Quote
I would like to see an implementation like TTF/vector fonts, and might be even more efficient or smaller.

It would be great to be able to use TTF files on a phone.  I think it'd be a huge thing to introduce though.

Right now the EDICT dictionary uses romaji to enter Japanese words.  Eventually we'd like to have a true IME (input method editor) to enter the Japanese words in hiragana.  Would you be willing to write some Java code to make an IME?  Peter Kmet wrote a "custom TextField" which can be adapted to use as an IME for Japanese.
http://dictionarymid.gottfried-signs.ch/index.php?topic=52.0

dreamingsky

#77
Oxxide

I looked over your conversion tables.  Everything looks good.

I found a couple double listings in the hiragana and katakana files.  This is a minor error and wouldn't affect the program at all.

Hiragana doubles
ji   じ
jji   っじ
zu   ず
zzu   っず

Katakana doubles
jji   ッジ
zu   ズ
zzu   ッズ

I found a couple romaji listings that were missing in the conversion tables.  They are rare Nihon-shiki transcriptions, but they should probably be added to the tables:

ちゃ   チャ   tya
ちゅ   チュ   tyu
ちょ   チョ   tyo
っづ   ッヅ   ddu
っちゃ   ッチャ   ttya
っちゅ   ッチュ   ttyu
っちょ   ッチョ   ttyo
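
As a rough illustration of how the extra Nihon-shiki rows could sit next to the Hepburn ones, here is a minimal Java sketch of a longest-match table lookup.  The class name RomajiSketch, the method toHiragana, and the handful of sample rows are all made up for this example.  It is not the actual DfM conversion code, and it uses plain Java SE collections rather than anything phone-specific.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: longest-match romaji-to-hiragana lookup.
// The table holds only a few sample rows, not the real conversion tables.
class RomajiSketch {
    private static final Map<String, String> TABLE = new HashMap<>();
    private static final int MAX_KEY_LENGTH = 4;

    static {
        // Hepburn spellings
        TABLE.put("cha", "ちゃ");
        TABLE.put("chu", "ちゅ");
        TABLE.put("cho", "ちょ");
        // Nihon-shiki spellings from the list above
        TABLE.put("tya", "ちゃ");
        TABLE.put("tyu", "ちゅ");
        TABLE.put("tyo", "ちょ");
        TABLE.put("ddu", "っづ");
        TABLE.put("ttya", "っちゃ");
        TABLE.put("ttyu", "っちゅ");
        TABLE.put("ttyo", "っちょ");
        // Two plain syllables so the example can convert a whole word
        TABLE.put("o", "お");
        TABLE.put("ka", "か");
    }

    // Converts romaji to hiragana, always trying the longest key first.
    static String toHiragana(String romaji) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < romaji.length()) {
            boolean matched = false;
            int longest = Math.min(MAX_KEY_LENGTH, romaji.length() - i);
            for (int len = longest; len >= 1; len--) {
                String kana = TABLE.get(romaji.substring(i, i + len));
                if (kana != null) {
                    out.append(kana);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(romaji.charAt(i)); // pass unknown input through unchanged
                i++;
            }
        }
        return out.toString();
    }
}
```

With a table like this, "ocha" and "otya" both come out as the same hiragana, so adding the rare spellings costs only a few extra rows.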

Also, there is one more uncommon romaji listing maybe we should add: "oh".  It is used for "おう" ("oo" is still used for "おお" I think).
http://en.wikipedia.org/wiki/Hepburn_romanization
Ministry of Foreign Affairs standard (外務省旅券規定) [3], in which the rendering of syllabic n as m before b, m, p is used and the spelling oh for word-final long o is allowed (e.g. Satoh for 佐藤). This is used to romanize Japanese names in passports and is thus also known as "Passport Hepburn".

I used to live near a city called "Akou".  On all the signs the romaji was listed as "Akoh".  I always thought the "oh" was from Nihon-shiki.  But, I guess it is a variation of Hepburn.  I have no idea why the passport agency uses "oh".  I think "ou" is a better idea.

Were all the changes in the TXT files?  Did you need to change any Java code?  If there are any Java code changes then they'll have to wait until version 3.2.0 is released.  If the only changes are in the TXT files then I might be able to sneak them into version 3.1.2 and release it.

Jeff
#78
General discussions / Re: Stardict
09. January 2008, 20:50:21
Yes, Stardict has a lot of nice dictionaries, especially Chinese dictionaries.  Unfortunately many of them are commercial and should not be posted on the internet.  The Stardict webpage says the dictionaries are GPL, but many of them are not.

For example, this is a really nice dictionary:
http://stardict.sourceforge.net/Dictionaries_zh_CN.php
oxford-gb dictionary(en - zh_CN) 牛津现代英汉双解词典
http://stardict.sourceforge.net/Dictionaries_zh_TW.php
oxford-big5 dictionary(en - zh_TW) 牛津現代英漢雙解詞典

This dictionary is actually:
"Oxford Advanced Learner's English-Chinese Dictionary" 4th edition (1995)
Simplified (GB): 牛津高阶英汉双解词典
Traditional (Big5): 牛津高階英漢雙解詞典

It is a commercial dictionary published by Oxford University Press.

StarDict also has these dictionaries:
http://stardict.sourceforge.net/Dictionaries_dictd-www.dict.org.php
Longman Dictionary of Contemporary English
Oxford Advanced Learner's Dictionary
Collins Cobuild English Dictionary

They are great English to English Learner's dictionaries.  But they are all commercial dictionaries.

StarDict does have some nice GPL dictionaries, however.
http://stardict.sourceforge.net/Dictionaries_zh_CN.php
Soothill-Hodous Dictionary of Chinese Buddhist Terms(zh_CN - en)
http://stardict.sourceforge.net/Dictionaries_zh_TW.php
Soothill-Hodous Dictionary of Chinese Buddhist Terms(zh_TW - en)

This dictionary is freely available here:
http://www.hm.tyg.jp/~acmuller/soothill/soothill-hodous.html
It is an old paper dictionary whose copyright ran out:
http://www.amazon.com/Dictionary-Chinese-Buddhist-Terms-Sanskrit-Pali/dp/0700714553/ref=ed_oe_p

So, it would be nice to convert some of the StarDict dictionaries to DictionaryforMIDs.  But, first we must check that the dictionary is not commercial.
#79
I wouldn't spend too much time fixing a romaji input for the Japanese EDICT dictionary.  I think moving to a hiragana input is a much better idea.

Instead of making support for numerous romaji systems, it's better to just input the Japanese as Japanese (i.e. hiragana).  Then you don't have to worry about Nihon-shiki and Kunrei-shiki.

Hiragana input would also solve the problem with the long vowels.  On a standard Japanese electronic dictionary you enter long vowels as a combination of hiragana and the Japanese "ー" symbol.

For example, to search for "cart" (カート), you type this in the electronic dictionary:
かーと [hiragana-katakana-hiragana]

If you wanted to search for "クリアアウト", you would enter "くりああうと" (you don't enter the long vowel sign).
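
To show how hiragana input could be matched against katakana headwords, here is a small Java sketch that folds katakana to hiragana.  It relies on the fact that the katakana letters U+30A1 to U+30F6 sit exactly 0x60 above their hiragana counterparts in Unicode.  The class and method names are invented for this example, and whether DfM would do this at index time or at search time is left open.

```java
// Hypothetical sketch: fold katakana to hiragana so that a hiragana
// search string can be compared with a katakana dictionary entry.
class KanaFold {
    static String katakanaToHiragana(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\u30A1' && c <= '\u30F6') {
                // Katakana letter: the hiragana twin is 0x60 lower.
                out.append((char) (c - 0x60));
            } else {
                // Everything else, including the long vowel sign U+30FC,
                // is copied unchanged.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The long vowel sign is left alone on purpose, which matches the way the electronic dictionary example above is typed.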

Romaji is nice for beginning students of Japanese.  But, Japanese students should abandon romaji pretty early and only use hiragana, katakana, and kanji.
#80
The BOM is causing the problem?  Interesting.  The BOM (byte order mark) http://en.wikipedia.org/wiki/Byte_order_mark isn't an illegal character.  At the start of a UTF-8 file it basically acts as a code to tell programs the file is encoded as UTF-8.  UTF-16 uses the same character, but it is stored as a different byte sequence.

It is a normal character: U+FEFF "zero-width no-break space".  If you don't save a UTF-8 file with a BOM, then the next time a program opens the file it must guess what encoding the file has.  If you save with a BOM, then a program that opens the file knows it is UTF-8.

I think it's a good idea to remove the BOM with the font generator, not with DictionaryGeneration.  I think keeping the BOM in the .csv files would be a good idea, because then you can open the files more easily with other programs.  I wouldn't recommend asking the users to remove it manually, since it is a good idea to save UTF-8 files with a BOM.
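
For what it's worth, dropping the BOM in code is only a few lines.  The sketch below is an assumption about how a tool such as the font generator could do it, not the real implementation; the class and method names are invented.  Java's UTF-8 decoder keeps the BOM as the character U+FEFF at the start of the decoded text, so the code checks for that character after decoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: detect and drop a UTF-8 byte order mark.
class BomSketch {
    static final char BOM = '\uFEFF';

    // True if the raw bytes start with the UTF-8 form of the BOM (EF BB BF).
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Removes a leading U+FEFF from already decoded text.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Decodes UTF-8 bytes and removes the BOM if one is present.
    static String decodeUtf8(byte[] data) {
        return stripBom(new String(data, StandardCharsets.UTF_8));
    }
}
```

This way the .csv files can keep their BOM on disk and the character never reaches the glyph list.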

I can manually remove the BOM and do some more testing for the time being.  I'll do some more testing tomorrow.
Jeff
#81
I uploaded the "Thai NIU" in version 3.1.2beta1 to Sourceforge.  It is in the "dictionary ThaEng (NIU), 3.1.2" directory.  The file name is "DictionaryForMIDs_3.1.2beta_ThaEng_NIU_Thai.zip".

I found a few more problems with the bitmap fonts.

I.
I ran the bitmap font generator with font size 12 (only 1 size).  Then I started the program.  I went to "Settings" and turned on the bitmap font setting.  Then I went to the "font size" and selected "12" (it was already selected).  Then I got this error (while on the setting screen):

Thrown de.kugihan.dictionaryformids.general.g:
Incorrect bitmap font size setting: 14
Incorrect font size setting: 14

I think the problem is from an earlier setting I had.  Before I had the bitmap fonts set to size 14.  Then I turned off the bitmap fonts.  Then I made size 12 bitmap fonts and re-ran the program.  This is when I got the above error.

I also got that error in a 2nd way.  I ran the program with the bitmap fonts turned on with size 14.  Then I ran the bitmap font generator with only size 16.  When I started the program again (the bitmap fonts option was still turned on from the previous time), I got the same error about size 14 (I didn't have size 14, only size 16).  I couldn't even get to the search page until I re-ran the font generator with size 14.

This 2nd problem shouldn't actually happen in the real world, because we can only use 1 dictionary at a time now (I found the problem with the Wireless Toolkit).  But later, when we get the dictionary loader working, the problem will arise.

So, maybe we need code that runs when the program is started to check if the "font size" setting currently saved actually exists in the dictionary that is loaded.
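
Something along these lines is what I have in mind.  It is only a sketch: the class name, the method name, and the idea of falling back to the closest available size are my assumptions, and I haven't looked at how DfM actually stores the setting.

```java
// Hypothetical sketch: validate the saved bitmap font size at startup.
class FontSizeCheck {
    // Returns savedSize if the loaded dictionary ships it,
    // otherwise the closest size that is actually available.
    static int resolveFontSize(int savedSize, int[] availableSizes) {
        if (availableSizes.length == 0) {
            throw new IllegalArgumentException("dictionary has no bitmap font sizes");
        }
        int best = availableSizes[0];
        for (int size : availableSizes) {
            if (size == savedSize) {
                return size; // the saved setting is still valid
            }
            if (Math.abs(size - savedSize) < Math.abs(best - savedSize)) {
                best = size;
            }
        }
        return best;
    }
}
```

With a saved size of 14 and a dictionary that only has size 16, this returns 16 instead of throwing, so the user can still reach the search page.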

II.
The bitmap fonts don't display correctly with the "Arial Unicode MS" font.  Every entry is shifted right half-way in the screen.

Also, most of the text seems to have disappeared.  The entire 1st search result for "table" disappeared when using the bitmap fonts.  It had a long example sentence in it.  With the bitmap fonts, the search result started at the next search result.

III.
I can't scroll down the screen with the bitmap fonts.  A search may have 10 hits.  A page will show 5.  But I can't scroll down to see the other 5.  It just scrolls down into small empty white boxes.

IV.
I tried to verify error II with another font.  So I ran the bitmap font generator with the "Cordia New" font (this is the default font for Thai.  It is installed by Windows).  This time the font was displayed on the left side of the screen correctly.  So only the bitmaps from "Arial Unicode MS" were making errors.  However, I can't use the "Cordia New" font because of the "complex scripts" limitation I mentioned before.  I must use "Arial Unicode MS".

Error III was also fixed by using the "Cordia New" font.  Scrolling worked fine.

V.
The "Cordia New" fonts looked OK.  But, I couldn't actually change the font size.  I'd start with size 12.  Then I'd go in the settings and select size 16.  However, the screen still showed size 12.

VI.
Then I ran the font generator for "Arial Unicode MS" for the Hindi dictionary.  I searched for "temple".  None of the Hindi, the example sentences, or the grammar tags (in coloured text) showed up.  Only the English in black showed up.

Then I searched for "tree".  Everything looked OK, except the example sentence was on the same line as the search result.  It should be on the next line (there is an "\n" in the source dictionary file).

Then I searched for "house".  Then I got Error II.  The search results were shifted to the right.  And none of the Hindi or example sentences showed up.  Only the English search results were shown.


I uploaded the Hindi source files for the developers.  It is in the "dictionary EngHin (IIIT), 3.1.2" directory.  The file is titled "DfM_Hindi_source_312beta.zip".

You will need Hindi set up on your computer to use it.  For WinXP:
Control Panel -> Regional and Language Options -> Languages -> select "Install files for complex script and right-to-left languages"

Then open the zip file and extract it to "C:\".  The directory structure is already set as:
C:\Temp\Dict\Hindi\

Then run C:\Temp\Dict\Hindi\setup.bat
Then run the bitmap font generator
Then run C:\Temp\Dict\Hindi\jar.bat

The directories are hard-coded in the BAT files.  Feel free to change the environment how you'd like.

The fonts to use for the font generator are "Arial Unicode MS" (an optional install with Microsoft Office 2000 and higher) and "Mangal".  "Mangal" is the default font for Hindi on Windows, but "Arial Unicode MS" looks better.

Jeff
#82
Quote from: Tomcollins on 12. May 2007, 13:39:06
2) I'm not so familiar with the languages you mentioned, but I know that there are also many character-combinations added as one character in unicode! We have the same problem with the chinese pinyin, since there are tones on the vowels. Like xue2sheng1 should be xuéshēng. We do the convertion with a languageUpdateClass so it is just one character in the dictionary and in the bitmapfontImage.


Yes, pinyin with tone marks works fine in DfM with bitmap fonts.  "e2" is the same as "é".  "é" is stored as only one Unicode code point.  So it will display fine with a bitmap font.  "Complex scripts" are not like this.  The consonant and vowel are 2 separate Unicode code points.  There are no merged "consonant + vowel" code points.  Otherwise you would need literally thousands of code points to handle all the possible combinations of consonants and vowels.  It is simpler in fonts to just make the 50 basic parts and squish them together.

Quote from: Tomcollins on 12. May 2007, 13:39:06
1) Well. For a lot of languages there's some kind of transciption to basic latin. e.g. in chinese that would be pinyin. And if this transcription is added in the search index it's no problem to search for e.g. chinese with devices, which do not support chinese characters.

Yes, pinyin is a very useful transcription for indexing the Chinese dictionary.  Luckily pinyin is included in the CEDICT (Chinese-English) and HanDeDict (Chinese-German) dictionaries.  So we can just add an index to the pinyin.  Unfortunately the Thai, Hindi, Khmer, Japanese... dictionaries do not have this roman transliteration in the dictionary (Japanese uses the hiragana script for the transcription of Chinese characters).

We could possibly add this transliteration (sorry, "transliteration" means to type a language in the roman script).  Hindi has a standard transliteration called IAST http://en.wikipedia.org/wiki/IAST.  But, adding this transliteration to the dictionary would considerably increase the file size of the DfM dictionary.

Another option would be to input the Hindi in IAST transliteration.  The user would input IAST and IAST would show up in the search box in DfM.  Then the IAST could be converted "on the fly" to the real Hindi script (called Devanagari) in a languageUpdateClass.

There are a couple problems with this.  First, there is no standard transliteration system for Thai.  We'd have to pick one and then explain it in documentation.  Then the users would have to learn the transliteration method.

Another problem is that IAST for Hindi uses several roman letters that are not common: "ṅñś", for example.  Our phones have no way of typing these letters.  We could change the letters to ".n", or "~n" or something and then write documentation telling the users how to use it.  But, this is inconvenient for the users too.

The best solution would just be to write an IME.  Then you would type directly in Hindi or Thai.


So far I think we need IMEs for the following languages:
Thai
Hindi (India)
Khmer (Cambodia)
Japanese
Russian
Arabic

Does anyone know of any others?

So far we cannot search in these languages in DfM.  We can only search English->language2 or German->language2, etc.  We cannot search language2->English.

If you had a Thai phone, then it would be no problem.  You could use the default IME from the phone to type in Thai.  But what if you had an English phone?  You'd have no way of typing in Thai.

Only now could we start to worry about IMEs.  We had to have bitmap font support before we could do IMEs.

We could also build an IME for Chinese pinyin too (not Chinese characters).  Personally I think "xuéshēng" is easier to read than "xue2sheng1".  Also, typing "xue2sheng1" must be difficult.  You probably type "e" then press a button to change to "number mode".  Then you press "2".  Then you press the button again to go to "letter mode".  Or using the old style, you press "33" to get an "e" then press "2222" to get the "2".

Wouldn't it be nice to have a separate key for tone marks, say "#" or "1", for example.  So then you press "33" to get an "e".  Then you press "#" and the screen will change from "e#" to "ē" (1st tone).  Press "#" again to get "é" (2nd tone).
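
Here is a rough Java sketch of that tone key.  The names are invented and it only handles the vowel in front of the cursor; it assumes each toned vowel is stored as a single precomposed character, written below as Unicode escapes.

```java
// Hypothetical sketch: a tone key that cycles the last vowel through
// plain -> 1st -> 2nd -> 3rd -> 4th tone -> plain.
class ToneKey {
    // Each row: plain vowel followed by its four toned forms.
    private static final String[] CYCLES = {
        "a\u0101\u00E1\u01CE\u00E0",      // a and its tones 1-4
        "e\u0113\u00E9\u011B\u00E8",      // e and its tones 1-4
        "i\u012B\u00ED\u01D0\u00EC",      // i and its tones 1-4
        "o\u014D\u00F3\u01D2\u00F2",      // o and its tones 1-4
        "u\u016B\u00FA\u01D4\u00F9",      // u and its tones 1-4
        "\u00FC\u01D6\u01D8\u01DA\u01DC"  // u umlaut and its tones 1-4
    };

    // Returns the next character in the cycle, or c itself if c is not a vowel.
    static char next(char c) {
        for (String cycle : CYCLES) {
            int pos = cycle.indexOf(c);
            if (pos >= 0) {
                return cycle.charAt((pos + 1) % cycle.length());
            }
        }
        return c;
    }

    // Simulates one press of the tone key on the text typed so far.
    static String pressToneKey(String text) {
        if (text.isEmpty()) {
            return text;
        }
        int last = text.length() - 1;
        return text.substring(0, last) + next(text.charAt(last));
    }
}
```

Pressing the key once after "xue" gives the 1st tone, pressing it again gives the 2nd tone, and a fifth press returns to the plain letter.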

If you'd like a Chinese pinyin IME for your PC, then you can download Keyman.  It is the default program for building your own IMEs on a PC.  First go here http://www.tavultesoft.com/70/download.php and download "Keyman Desktop Professional 7.0".  Then go here http://www.tavultesoft.com/keyman/downloads/keyboards/details.php?KeyboardID=346&FromKeyman=0 and download the pinyin IME.

Quote from: Tomcollins on 12. May 2007, 13:39:06
Although a IME would be very nice. (I once talked to one doing an chinese ime by himself but he said that a big problem is to get good complete char lists.)

You're right.  Building an IME for Chinese characters would be difficult.  Finding the character lists wouldn't be too difficult.  You could just use the "Unihan" database from the Unicode website http://www.unicode.org/charts/unihan.html.  It has a database of all the characters in Unicode.  But, then you'd have to find one of the "frequency" lists that sorts the characters based on how common they are.  Big5: http://technology.chtsai.org/charfreq/, GB: http://lingua.mtsu.edu/chinese-computing/statistics/.

For this you'd need to have a database of at least 5,000 characters (50,000 if you want to be more complete).  That database would probably be at least 300kB for DfM.  That's too much.

A Chinese pinyin IME would be much simpler than an IME for Chinese characters.  We would just rebuild an IME for English and add the button for the tone marks.

Personally I think the best way to start with the IMEs is to build a "European" IME.  This is an IME that has all the letters for European languages: English, German, French, etc.  We'd start with the basic Latin (roman) alphabet.  Then we'd add "äöüß" for German, "éè", etc for French.  We could do it like really old phones like:
3   d
33   e
333   f
3333   3
33333   é
333333   è

I think it'd be better to do it differently and add a separate "diacritic" button ("#", for example).  Then you'd press "33" to get an "e".  Then you'd press "##" to get the "é", for example.

Of course we don't actually need this IME for DfM.  Normation classes can change all "ö" to "oe", for example.  Also, nobody will want to switch from T9 input http://en.wikipedia.org/wiki/Predictive_text (just press "843" for "the") back to the old "multi-tap" method (press "84433" for "the").  If you have a German phone, then using the default T9 German IME would be much simpler.

But, it'd be much easier to work on the Java code using the roman alphabet.  It'd be difficult to start with the Thai IME for example and try to explain what each of the letters mean.

Once the European IME is finished then it would be very easy to adapt it to other languages.  For example, for Hindi we would just paste these letters into the European code:
3   ka   *
33   ki
333   ku
3333   ke
33333   ko
333333   kau

(* imagine these letters are actually written in Hindi.  If I actually wrote them in Hindi then they'd probably just show up as white boxes since you probably don't have a Hindi font on your computer)

First we should just build a simple "multi-tap" http://en.wikipedia.org/wiki/Multi-tap IME (press "84433" for "the").  We already have the keystroke layouts for Thai and Japanese ready for this.
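
To make the multi-tap idea concrete, here is a minimal Java sketch of the decoding step.  The class name, the use of a space to stand in for the pause between two letters on the same key, and the layout map are all assumptions for this example.  A real IME would react to key events and a timer instead of decoding a finished string.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decode multi-tap key presses with a pluggable layout.
class MultiTapSketch {
    private final Map<Character, String> layout;

    MultiTapSketch(Map<Character, String> layout) {
        this.layout = layout;
    }

    // Standard English keypad: the digit itself is the last choice on each key.
    static Map<Character, String> englishLayout() {
        Map<Character, String> keys = new HashMap<>();
        keys.put('2', "abc2");
        keys.put('3', "def3");
        keys.put('4', "ghi4");
        keys.put('5', "jkl5");
        keys.put('6', "mno6");
        keys.put('7', "pqrs7");
        keys.put('8', "tuv8");
        keys.put('9', "wxyz9");
        return keys;
    }

    // Each run of the same key selects one letter; a space (or any key
    // missing from the layout) ends the current run and produces nothing.
    String decode(String presses) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < presses.length()) {
            char key = presses.charAt(i);
            int run = 1;
            while (i + run < presses.length() && presses.charAt(i + run) == key) {
                run++;
            }
            String letters = layout.get(key);
            if (letters != null) {
                // Extra presses wrap around to the first letter again.
                out.append(letters.charAt((run - 1) % letters.length()));
            }
            i += run;
        }
        return out.toString();
    }
}
```

Because the layout is just a map, adapting it to another language means swapping the strings, for example putting "def3éè" on the "3" key for the European version.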

Then once that is finished we can start on working on a "true IME" (I'm not sure what to call it) based on the multi-tap IME.  We'll worry about this later.  If anyone wants to learn how to build a "true IME", then take a look at the Keyman program.  Go here http://www.tavultesoft.com/70/download.php and download "Keyman Developer 7.0".  Then look at the documentation.  The PDF file is easier to read than the program help file.  Download it here http://www.tavultesoft.com/keymandev/downloads/.  Then look in the manual for the "Quick French keyboard".  This will give some help in building the "European IME" for DfM.

I think we should base our IME code on Keyman.  It is the default scripting language for building your own IME (I'm actually using Keyman now to build an IME for another project).  There is an open source program for Linux called "Keyboard Mapping for Linux" http://kmfl.sourceforge.net/ that uses the same scripting language as Keyman.

Wow, that turned out long.  I'm done for now.
Jeff
#83
Quote from: Gert on 12. May 2007, 04:11:33
when we use "Arial Unicode MS" for the bitmap fond of the Thai and the Hindi dictionary, then these characters are not getting displayed correctly

Yes, using Arial Unicode MS (or any other font) won't work correctly for Hindi, Thai, Khmer...  In order to correctly display "complex scripts" we would need:
1. "outline" fonts (TTF, etc)
2. something like Uniscribe.dll

It would be possible to reproduce Uniscribe.dll.  Linux uses something like Uniscribe.dll to manage complex scripts.  I'm sure we could find the source for it somewhere.  But, I think it'd be way beyond the scope of DfM to do this.  We'd have to write the code for each language (Thai would need different code from Hindi, etc).

More importantly, we'd need to use regular fonts instead of bitmap fonts.  It would be impossible (as far as I know) to display "complex scripts" using bitmap fonts.  Regular outline fonts have code for each vowel to tell it to move the vowel back over the last letter.  Bitmap fonts don't have this feature.

So, we'll just have to live with what is possible.  It won't be a huge problem if you can get used to reading it with bitmap fonts.  An example would be:
Imagine the letter "ã".  The "~" is above the "a".  Imagine the "~" is the vowel and the "a" is the consonant.  With the bitmap fonts it will look like "a~" instead of "ã".

I think any effort towards these languages should be spent building an IME (input method editor) for Thai, Hindi, and Khmer.  An IME is the way to type in a language.  So far we can't type in Thai, Hindi, Khmer, or Japanese in DfM.  We can only search English -> language2.  We cannot search language2 -> English.  I made a posting about this in the "Feature planning for version 3.2" thread.

Jeff
#84
Yes, many fonts can support a large number of languages.  "Arial Unicode MS" can probably support 30 languages or more (not Khmer though, so far I've only found 2 Unicode Khmer font makers on the internet).  Currently the Thai dictionaries and the Hindi dictionary are using Arial Unicode MS for the bitmap fonts.

The "complex scripts" work differently than the Roman language examples.  There are actually 2 Unicode code points that get entered when typing.  For the roman example - if you type a German "u umlaut", then press the backspace button once, then the whole letter is deleted.

For the "complex scripts", if you press the backspace button, then only the vowel is deleted.  The consonant will still be there.  For example - in Hindi, if you type "k" then the "k" will show up.  Then you press the "e" and the "e" will show up over the previous "k".  Now if you press the backspace button then only the "e" is deleted.  The "k" and "e" are 2 different Unicode code points.
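
The difference can be checked with a few lines of Java SE (java.text.Normalizer is not available on the phones, so this is only for testing on a PC).  The class and method names are made up.  The sketch composes the text where possible and counts what is left: "u" plus a combining umlaut collapses to one code point, while the Hindi "k" (U+0915) plus vowel sign "e" (U+0947) stays at two because Unicode has no merged form for it.

```java
import java.text.Normalizer;

// Hypothetical sketch: count code points after composition, and
// imitate a backspace that removes one code point.
class CodePointDemo {
    // Number of code points left after NFC composition.
    static int composedLength(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    // Removes the last code point, like one press of backspace.
    static String dropLastCodePoint(String text) {
        if (text.isEmpty()) {
            return text;
        }
        int end = text.offsetByCodePoints(text.length(), -1);
        return text.substring(0, end);
    }
}
```

Calling dropLastCodePoint on the Hindi pair leaves the consonant behind, which is the backspace behaviour described above.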
#85
Here is my testing with version 3.1.1.

I. JARCreator bug

I tested the 3.1.1 Bitmap Font Generator and the 3.1.0 DictionaryGeneration file.  I saw there is a new JARCreator file too.  That's really great.  It saves me a lot of time.

But, I ran into an error.  I can't get JARCreator to build the JAR with the font files.  I get this error from JARCreator:

Exception in thread "main" java.io.FileNotFoundException: C:\Temp\Dict\Thai_NIU\dictionary\fonts (Access is denied)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(Unknown Source)
        at de.kugihan.jarCreator.JarCreator.writeJAR(JarCreator.java:178)
        at de.kugihan.jarCreator.JarCreator.main(JarCreator.java:86)

I got the same problem in the Thai, Khmer (Cambodia), and Hindi (India) dictionaries.

If I don't run the Bitmap Font Generator, then the JARCreator will finish with no problems.  I am using version 3.1.1 of DictionaryForMIDs.jar and DictionaryForMIDs.jad.  I am using version 3.1.0 of JARCreator.  Do I need version 3.1.1 of JARCreator?  I didn't see it on the website.
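
One possible reading of the stack trace, and it is only a guess since I haven't seen the JARCreator source: the "fonts" entry is a directory, and on Windows opening a directory with FileInputStream fails with FileNotFoundException and the message "Access is denied".  If that is the cause, the fix would be to walk into subdirectories instead of opening them as files.  The sketch below shows that walk in plain Java; the class name and the returned list of entry names are invented for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: collect JAR entry names recursively so that
// directories such as "fonts" are descended into, never opened as files.
class JarEntryCollector {
    // Returns entry names relative to root, using "/" as the separator.
    static List<String> collect(File root) {
        List<String> entries = new ArrayList<>();
        walk(root, "", entries);
        return entries;
    }

    private static void walk(File dir, String prefix, List<String> entries) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        Arrays.sort(children); // stable order for repeatable builds
        for (File child : children) {
            if (child.isDirectory()) {
                walk(child, prefix + child.getName() + "/", entries);
            } else {
                entries.add(prefix + child.getName());
            }
        }
    }
}
```

Each name in the list could then be opened with FileInputStream and written to the JAR, since every one of them is a regular file.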

I really like that you can choose multiple font sizes with the Bitmap Font Generator.  That's a big plus.  I thought the fix would only allow you to choose 1 size.  But, you can choose multiple sizes and put them in the JAR file.  That is very nice.  And the .png files are a big plus too.


II. Khmer font bug
The Bitmap Font Generator works well.  I ran into a problem with some Khmer fonts, though (KhmerOS and KhmerOT).  The "KhmerOS System" font worked OK.  The other 2 fonts got cut off at the bottom.  Only the top 15% of KhmerOS showed up.  Nothing showed up for KhmerOT (OT = OpenType).  It's just a white line.

I'm not sure if someone wants to bug fix it.  It's probably not worth it since the "KhmerOS System" font works fine.  The Khmer fonts can be downloaded from here:
http://www.khmeros.info/drupal/


III. complex scripts
Also, there is another bug (technically a feature).  But, there is no way to fix it.  Hindi, Thai, and Khmer are "complex scripts".  Fonts for these languages are a little tricky to make.

These languages (actually scripts for most languages in South Asia and South-East Asia) have separate consonant and vowel marks.  So to type "ke" in Hindi, you type a "k", then the "k consonant" shows up.  Then you type "e" and the "e" shows up.  But, the "e" is not a separate letter.  It is put on top of the "k".

So inside the font are directions for how to move the vowels over the consonants.  But, when you convert a font to a bitmap font, then you lose the instructions for the movement.

So when DfM displays the bitmap fonts, the consonants and vowels are divided up into 2 parts.  There is no way to avoid this.  Thai phones have TTF fonts or something that can handle the vowel marks correctly.

Also, Windows uses the Uniscribe DLL http://en.wikipedia.org/wiki/Uniscribe to make some even more complicated font adjustments.  For example, in Hindi, an "ii" (long "i") is put to the right of the consonant.  But an "i" (short "i") is put to the left of the consonant.  The Uniscribe DLL moves the "i" to the left.
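
Just to illustrate what that one adjustment amounts to, here is a very simplified Java sketch.  It is not how Uniscribe works internally and it ignores consonant clusters and every other shaping rule; it only swaps the short "i" vowel sign (U+093F) with the single character in front of it.  The class and method names are invented.

```java
// Hypothetical, heavily simplified sketch: move the Devanagari short "i"
// vowel sign in front of the preceding character for display.
class DevanagariReorder {
    private static final char VOWEL_SIGN_I = '\u093F';

    // Text is stored as consonant + vowel sign; the short "i" is drawn
    // to the left, so it is swapped with the character before it.
    static String toVisualOrder(String logical) {
        StringBuilder out = new StringBuilder(logical);
        for (int i = 1; i < out.length(); i++) {
            if (out.charAt(i) == VOWEL_SIGN_I) {
                char previous = out.charAt(i - 1);
                out.setCharAt(i - 1, VOWEL_SIGN_I);
                out.setCharAt(i, previous);
            }
        }
        return out.toString();
    }
}
```

The long "ii" (U+0940) is left where it is, since it is already drawn to the right of the consonant.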

Anyway, that is a lot of techno talk.  Basically I'm saying that the Hindi, Thai, and Khmer dictionaries will look funny with the bitmap fonts.  I doubt anyone would fault us for it, though.  If someone does complain, then we can only tell them that we can't fix it.  The problem is due to limits in the architecture.

Jeff
#86
One big thing that would be great for the Thai, Hindi (India), Khmer (Cambodia), and Japanese dictionaries would be IME support (Input Method Editor).

At the moment, we can't type in those 4 languages in DfM.  So we can only do English -> language2 searches.  We can't do language2 -> English searches.

An IME is how you type in a language.  For example, on an old phone without T9 input:
to type a "c" you press "222".

Here are some other examples for English:
2   a
22   b
222   c
3   d
33   e
333   f
...   
#   .
##   @
###   /
...   


Imagine trying to type an "á" with an old English phone.  There is no way to do it.  You would need to write a new IME to support "á" (press "2222", for example).

Chinese works OK in DfM because the input is done with just regular roman letters and digits (pinyin with tone numbers instead of tone marks): ta3, gen1, etc.

Sean Kernohan started work on the IME for the Japanese dictionary (the Japanese IME is more involved than the IMEs for other languages).  Damien Hanssens submitted the IME layout for Thai.

If I see a phone from India or Cambodia then I can write down the IME layout for it.  I won't be in Cambodia again for about a year, and I don't have any plans to go to India anytime soon.  I put a little note in the Hindi and Khmer dictionaries asking for someone to submit the IME layouts.

Peter Kmet's "custom TextField" Java code could be adapted to work as an IME.  Can anyone help with the Java portion of building the IME?

Jeff