New Chinese-English Dictionary available

Started by mats, 19. September 2011, 23:07:12

mats

A new Chinese-English dictionary is now available at http://dictionarymid.sourceforge.net/dict.html#Chinese. It is based on CC-CEDICT, published by MDBG. It is intended for phones without built-in support for Chinese, so it contains a stroke order index (based on the modern stroke orders used in mainland China) and a bitmap font based on unifont. To save space, some information has been omitted from the indexes. Even so, the dictionary may be too large for many phones. I'll be happy to answer questions and to receive problem reports.

Noting that many downloads of the Chinese-English dictionary originate from China, I have also prepared a more traditional version, suitable for Chinese phones, omitting the stroke order index and bitmap font, but keeping the Chinese index. This version might be uploaded later.

/Mats

axin

Hi Mats,

it's really great that you created an update for the Chinese dictionary! Actually, just one day before your update, a user of the Android version sent me an email asking about an update :)

I just quickly tested the dictionary on Android, and in the course of doing so found some smaller issues:

  • If I choose Chinese->English and search for simple characters like 你 or 好, no results are found. A wildcard search with * does return some results, though. Is there no index for this direction? If so, why can I still choose that direction in the application? Is that a problem with the dictionary, or with the Android implementation?
  • For some entries that don't have traditional characters, ° is added after the characters (see screenshot). Is that on purpose?
  • Can you explain a little more on how to interpret the stroke order numbers in the entry? (see screenshot)

Anyways, thanks for contributing! And I would also be very interested in the "more traditional version" :)

Cheers,
Achim

mats

Hi Axin,

To keep the size of the dictionaries down, I decided to make two versions, but only one has been uploaded so far. The uploaded dictionary is intended for phones without support for Chinese, and it includes a stroke order index and a bitmap font, but no Chinese index. The second has a Chinese index, but no bitmap font and no stroke order index. Still, the dictionaries are quite large. Both dictionaries were submitted a couple of weeks ago, so I hope the second one will be uploaded soon. Also, some notes ought to be added to let people know the difference between the dictionaries. I have sent some suggestions to that effect to the webmaster.

The not yet uploaded dictionary contains traditional versions for all entries. Originally, I did not intend the currently uploaded dictionary to have traditional characters at all, but I realized that that was not a good idea, so I added them for short words, i.e. when I could display them without line breaks using the bitmap font. For longer words, I just add a degree sign as a warning if the traditional version of the word is different from the simplified version. So it is intentional.

I don't know if I understand your question about strokes. The basics are explained in README-Dictionary.txt, or you can check e.g. http://en.wikipedia.org/wiki/Wubihua_method. Taking xiáng as an example, here is the corresponding entry from the modern stroke order standard.

axin

Hi Mats,

thanks for the explanation about the strokes, I didn't know that encoding, very interesting!
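For anyone else reading along, here is a minimal Python sketch of the idea behind the Wubihua encoding linked above: each stroke is reduced to one of five classes, and a character becomes the digit string of its strokes in writing order. The digit assignment below is the common Wubihua convention. Whether the dictionary's stroke order index uses exactly these digits is an assumption on my part (README-Dictionary.txt is the authority), and the names STROKE_CLASSES, EXAMPLES and wubihua_code are made up for illustration.

```python
# Common Wubihua convention: five stroke classes, one digit each.
# Assumption: the dictionary's stroke index follows the same numbering.
STROKE_CLASSES = {
    "横": "1",  # horizontal
    "竖": "2",  # vertical
    "撇": "3",  # left-falling
    "点": "4",  # dot
    "捺": "4",  # right-falling (shares a class with the dot)
    "折": "5",  # turning / bent stroke
}

# Hand-entered stroke orders for a few simple characters (illustrative only).
EXAMPLES = {
    "十": ["横", "竖"],
    "人": ["撇", "捺"],
    "大": ["横", "撇", "捺"],
    "口": ["竖", "折", "横"],
}


def wubihua_code(strokes):
    """Return the digit string for a list of stroke-class names."""
    return "".join(STROKE_CLASSES[s] for s in strokes)


if __name__ == "__main__":
    for char, strokes in EXAMPLES.items():
        print(char, wubihua_code(strokes))
```

A stroke order search then amounts to matching a typed digit prefix against codes like these.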

So the uploaded dictionary is only searchable English -> Chinese? Then there is a small problem with the Dictionary.properties file. You should set the property language1IsSearchable=false;
then it is clear to all dictionary applications that there should not be an option for translating Chinese -> English.
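To illustrate what I mean, here is a rough Python sketch of how an application might read that flag and decide which search directions to offer. Only language1IsSearchable is taken from the discussion above. The language2IsSearchable key, the sample file contents, the "missing flag means searchable" default and the helper names are assumptions for illustration, not the actual DictionaryForMIDs code.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def searchable_languages(props, count=2):
    """Return the 1-based language numbers whose search flag is true.

    Assumption: a missing flag is treated as searchable.
    """
    return [
        i
        for i in range(1, count + 1)
        if props.get(f"language{i}IsSearchable", "true").lower() == "true"
    ]


# Hypothetical excerpt, not the real Dictionary.properties of this dictionary.
SAMPLE = """\
# search flags
language1IsSearchable=false
language2IsSearchable=true
"""

if __name__ == "__main__":
    print(searchable_languages(parse_properties(SAMPLE)))
```

With the flag set to false, an application following this logic would simply not show the first language as a search direction.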

I have never set up a dictionary myself, and I very much appreciate the work of everyone who has, but I still think you should change that property. I'm not sure whether anything else should be changed... And please correct me if I'm wrong!

For me, the perfect dictionary configuration would be two "standard" dictionaries (for the standard user) and one "enhanced" dictionary:

  • Simplified Chinese <-> English
  • Traditional Chinese <-> English
  • Enhanced similar to the one you described, including stroke orders and other information that may be suitable...

How big is the other dictionary that you created? The one that includes all the entries?
I guess the size of the dictionary won't be a problem for Android, but will be a problem for the other platforms.

Cheers,
Achim

mats

Hi Axin,

thanks for your interest in the dictionaries. Pinyin search is enabled in the uploaded dictionary, so you can search for Chinese words, although not by using Chinese characters, and the stroke order search is of course also in the direction Chinese -> English.

It is possible that size is no problem for modern phones. My own phone is slow for large dictionaries, so I tried to trim down the size as much as possible. Most of the trimming is actually done in the English index. I have assumed that Chinese phones generally have some stroke based input method built-in, so that the stroke order index is not needed for those, and they don't need any bitmap font either. On the other hand, phones without support for Chinese have no use for a Chinese (Hanzi) index, but they need a bitmap font and a structural index. That is the basis for the two versions I have made.

The unpublished dictionary is about the same size as the current one. An all-inclusive dictionary would be about 17 MB, but the Chinese index per se only adds about one MB. I've temporarily made the unpublished version (plus a large version) available at http://mats_ogren.ownit.name/DictionaryForMIDs.html if you would like to try it. I think that if everything is included, it would be nice to have the option to select which fields to display. Currently, it seems that such a feature is not implemented, although there are some traces of it in the Java code. I think that at least single-character words should have both traditional and simplified versions included, if they differ. I'm thankful for your suggestions, though.

If I release updated dictionaries in the future, they will be in a single package, even if the package might include more than one version. Two packages seem to be one too many to manage on the web site.

/Mats