Messages - oxxide

#1
Jeff,
Thanks for fixing the duplicate entries and the missed transliterations. I managed to leave out the "tya" series even though I used it as an example in all my posts...
I wouldn't worry about passport Hepburn either; it doesn't seem to be used much for Japanese input.
All my changes are in these two files, but I haven't looked at the code that interprets them, so I can't say for sure that they will work. My assumptions were:
- these files are actually used,
- the mappings are scanned in the order they appear in the file,
- the first mapping whose key appears at the beginning of the unconverted string is used.
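Here is a minimal Python sketch of a converter that follows those three assumptions. The names `load_mappings` and `convert` are made up for illustration, and I haven't compared this against the real DictionaryForMIDs code, which could work differently (for example by preferring the longest matching key instead of the first one in file order).

```python
def load_mappings(text):
    """Parse 'key=value' lines into an ordered list of (romaji, kana) pairs.

    File order is preserved because convert() assumes the first matching key
    wins. Lines without '=' and lines with an empty key are skipped.
    """
    mappings = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key:  # an empty key would match everywhere and never advance
            mappings.append((key, value))
    return mappings


def convert(romaji, mappings):
    """Convert romaji to kana under the assumed rules: scan the mappings in
    file order and use the first key that is a prefix of the unconverted
    remainder. Characters with no matching key are copied through unchanged."""
    out = []
    i = 0
    while i < len(romaji):
        for key, value in mappings:
            if romaji.startswith(key, i):
                out.append(value)
                i += len(key)
                break
        else:
            out.append(romaji[i])
            i += 1
    return "".join(out)


fixed = load_mappings("ju=じゅ\nku=く")
broken = load_mappings("ju=ふ\nku=く")
print(convert("juku", fixed))   # じゅく
print(convert("juku", broken))  # ふく
```

If these assumptions hold, the order of lines matters: a shorter key listed above a longer key that starts with it shadows the longer one. For example, with `n=ん` above `na=な`, the input "na" would come out as "んa".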

I'm looking forward to the updated version!
#2
Attached are the hiragana and katakana mapping files with my fixes. Here's what was changed:

In the hiragana file:

  • Fixed mistaken mappings such as ju (previously ju=ふ)
  • Added Kunrei-shiki mappings (tya, tyu, tyo, etc.)
  • Added small character mappings (la=ぁ, lya=ゃ, etc.)
    The small vowels are required to input some words such as ディスク (disc). The other small character mappings are just commonly known shortcuts.
  • Added the vowel lengthening mark ー.

To obtain the katakana file, I copied over the hiragana file and converted all hiragana glyphs to their katakana counterparts.
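The glyph conversion can be done mechanically, because in Unicode each hiragana from U+3041 (ぁ) to U+3096 (ゖ) has its katakana counterpart exactly 0x60 code points later (ぁ U+3041 becomes ァ U+30A1). Here's a small Python sketch, assuming the same `key=value` line format; `to_katakana` and `convert_mapping_file` are hypothetical names. It only touches the kana side of each line, so the romaji keys and the lengthening mark ー (U+30FC, already katakana) pass through unchanged.

```python
HIRAGANA_FIRST = 0x3041  # ぁ
HIRAGANA_LAST = 0x3096   # ゖ
KATAKANA_OFFSET = 0x60   # ぁ U+3041 -> ァ U+30A1


def to_katakana(text):
    """Shift every hiragana character to its katakana counterpart."""
    return "".join(
        chr(ord(ch) + KATAKANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )


def convert_mapping_file(text):
    """Rewrite a hiragana mapping file as katakana, converting only the
    right-hand (kana) side of each 'key=value' line."""
    result = []
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            line = key + "=" + to_katakana(value)
        result.append(line)
    return "\n".join(result)


print(convert_mapping_file("ju=じゅ\nla=ぁ\n-=ー"))
# ju=ジュ
# la=ァ
# -=ー
```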

It would be awesome if someone could look them over for completeness and correctness before they go into a release.

Also note that I never tested this on my device; I just followed the existing mappings when adding or correcting entries. Please run a sanity check on a device or simulator before releasing a version with these files.

dreamingsky: I completely agree that romaji alone is an insufficient input method for a dictionary, and that we should move to a kana-to-kanji converting IME like the ones on Japanese phones. However, I still think it's worth maintaining an accurate roman alphabet -> kana mapping because:
- Romaji search is all that's working right now (at least on my phone) so it needs to be supported. Right now I can't look up any words containing "ju" (and they seem to be annoyingly common :D)
- The three romanization systems are mostly similar, so supporting all of them causes little confusion, and where their mappings differ they can coexist as equivalents (cha = tya, etc.), so there's no conflict.
- It's really not that much work, plus it's done :)
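On the "no conflict" point: alternative spellings only cause trouble if the same romaji key is mapped to two different kana. Different keys for the same kana (cha and tya) are harmless. A quick Python check along those lines, which also catches plain duplicate lines, might look like this; `check_mappings` is a made-up helper and the sample lines are illustrative, not taken from the real file.

```python
from collections import defaultdict


def check_mappings(text):
    """Return (duplicates, conflicts) for a 'key=value' mapping file.

    duplicates: keys listed more than once, always with the same kana
    conflicts:  keys listed with two or more different kana
    Different keys that share a kana (cha=ちゃ, tya=ちゃ) are not reported.
    """
    values_by_key = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values_by_key[key].append(value)

    duplicates = sorted(
        key for key, values in values_by_key.items()
        if len(values) > 1 and len(set(values)) == 1
    )
    conflicts = {
        key: values for key, values in values_by_key.items()
        if len(set(values)) > 1
    }
    return duplicates, conflicts


sample = "cha=ちゃ\ntya=ちゃ\nka=か\nka=か\nju=じゅ\nju=ふ"
print(check_mappings(sample))
# (['ka'], {'ju': ['じゅ', 'ふ']})
```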
#3
I'm working on an updated version of this file.

Meanwhile, I thought I'd apply the same fixes to the katakana file as well. I found that in this file:
- there are about 10 errors that need to be fixed
- the number of lines is doubled because every doubled vowel is mapped to the vowel followed by the lengthening dash 「ー」. For example:

aa=アー
ii=イー
uu=ウー
ee=エー
oo=オー

This sounds like a nice convenience at first, but I'm not sure I agree with it. My main concern is that it makes it impossible to input アア, イイ, カア, etc., which may actually appear in some words (for example, クリアアウト, "clear out").

I would recommend using the same mappings as the hiragana file (i.e., dropping the long-vowel mappings) and mapping the hyphen to the vowel lengthening dash:

-=ー

The only downside I can think of is that users need to be aware of this and use the dash when they search; searching for "paatona" will not return "partner", but "pa-tona" will work. I think this is fair game as long as it's documented somewhere.
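Assuming the converter picks the first mapping in file order whose key is a prefix of the remaining input, the difference between the two approaches can be sketched in Python with tiny made-up tables (not the real file contents):

```python
def convert(romaji, mappings):
    """First mapping (in list order) whose key prefixes the remainder wins."""
    out, i = [], 0
    while i < len(romaji):
        for key, kana in mappings:
            if romaji.startswith(key, i):
                out.append(kana)
                i += len(key)
                break
        else:
            out.append(romaji[i])  # no mapping: keep the character as-is
            i += 1
    return "".join(out)


BASIC = [("ku", "ク"), ("ri", "リ"), ("to", "ト"), ("pa", "パ"),
         ("na", "ナ"), ("a", "ア"), ("u", "ウ")]

# Current katakana file style: a doubled vowel produces the lengthening mark.
LONG_VOWELS = [("aa", "アー")] + BASIC
# Proposed style: no long-vowel entries, "-" maps to the lengthening mark.
DASH = BASIC + [("-", "ー")]

print(convert("kuriaauto", LONG_VOWELS))  # クリアーウト (アア cannot be typed)
print(convert("kuriaauto", DASH))         # クリアアウト
print(convert("pa-tona", DASH))           # パートナ
```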

Thoughts on this?
#4
Hi,

I noticed that one of the entries in DictionaryForMIDs.jar/char_lists/romaji_hiragana_UTF8.txt is incorrect:

sho=しょ
ja=じゃ
ju=ふ
jo=じょ
cha=ちゃ

The ju entry should read:
ju=じゅ

I've confirmed that this prevents a user from searching for any word that contains the syllable "ju" when doing a romaji-based Japanese > English lookup. For example, searching for the romaji string "juku" yields results that match "ふく" (fuku).

Also, I noticed that this file only addresses the Hepburn standard for Japanese romanization. I would recommend this file be extended to support the Nihon-shiki and Kunrei-shiki romanization standards, which are used widely in Japan. Here is a reference on these standards: http://en.wikipedia.org/wiki/Romanization_of_Japanese
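For reference, these are the main Kunrei-shiki and Nihon-shiki spellings that differ from Hepburn, written in the same key=value format as the file. It's a partial list that I haven't checked against the current file, so some entries may already be there:

```
si=し
zi=じ
ti=ち
tu=つ
hu=ふ
sya=しゃ
syu=しゅ
syo=しょ
zya=じゃ
zyu=じゅ
zyo=じょ
tya=ちゃ
tyu=ちゅ
tyo=ちょ
di=ぢ
du=づ
dya=ぢゃ
dyu=ぢゅ
dyo=ぢょ
```

The last five (di, du, dya, dyu, dyo) are Nihon-shiki only; Kunrei-shiki writes ぢ and づ as zi and zu.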

Cheers,
oxxide