DictionaryForMids Forum

DictionaryForMIDs for Mobiles (Java ME; most devices from Nokia, Samsung, RIM (Blackberry), LG, SonyEricsson, Motorola and plenty more) => Problems => Topic started by: arsen_a on 29. July 2007, 10:32:31

Title: Is it possible to decrease the dictionary size?
Post by: arsen_a on 29. July 2007, 10:32:31
Hello everybody!

Recently I came across this project and I really appreciate it. I have already made a Spanish-Russian dictionary with your tools, but unfortunately the file size is 465kb and it cannot run on my Nokia 6230. Is it possible to make the dictionary smaller than 300kb? If I make it unidirectional, will that decrease the file size? How can I make it unidirectional? Thanks.
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 29. July 2007, 12:32:08
You could first look at the index files that were generated by DictionaryGeneration and check whether they contain unnecessary information. Often dictionaries include content that should not be included in the indexes.
Maybe you could just post some lines from the index files here in the forum (ideally also your DictionaryForMIDs.properties file).

Making the dictionary unidirectional will also reduce the index file size (but not the size of the directory files).

Another point: did you use JarCreator? JarCreator includes only those application icons that are really needed for the dictionary. This makes the resulting JAR file smaller compared to a manually assembled JAR file.

Gert

Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 29. July 2007, 16:49:05
Hi, Gert!

Of course I used JarCreator and saved almost 150kb: the output directory size was 600kb and now the .jar file size is 450kb. Here is my DictionaryForMIDs.properties file:

infoText: Spanish-Russian dictionary
dictionaryAbbreviation: IDP
numberOfAvailableLanguages: 2
language1DisplayText: Spanish
language2DisplayText: Russian
language1FilePostfix: Esp
language2FilePostfix: Rus
dictionaryGenerationSeparatorCharacter: ':'
indexFileSeparationCharacter: ':'
searchListFileSeparationCharacter: ':'
dictionaryFileSeparationCharacter: ':'
dictionaryGenerationInputCharEncoding: UTF-8
indexCharEncoding: UTF-8
searchListCharEncoding: UTF-8
dictionaryCharEncoding: UTF-8
language1DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateIDP
language2DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdateIDPSpa
language1NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationEng
language2NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationLat

I have modified it a bit, because the characters in the .txt file are in Unicode format and the separator is ':'.
Regarding the other values, for example for the index files that you mentioned, what can I change there to make the size smaller? Do you mean these variables?
•   searchListFileMaxSize/indexFileMaxSize/dictionaryFileMaxSize

I had a look at the index files but did not find anything, because they also contain the same words as the directory files! Do I need these files? I have checked: all the index files together weigh 500kb, and the directory files weigh 100kb.
Thank you for your help :)
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 29. July 2007, 18:31:53
500 kB of index files vs. 100 kB of directory files is highly suspicious! Let's look at this a little closer.

First a few comments about the file DictionaryForMIDs.properties:
Concerning the entries language1DictionaryUpdateClassName and language2DictionaryUpdateClassName: I guess your dictionary does not have the rather complicated syntax of the IDP dictionaries, right? Then you should remove these two lines.

NormationEng is probably not the right class for Spanish; NormationLat is the best choice for Spanish. As normation class for language2 you could use the new NormationRus.

Could you update these lines in your DictionaryForMIDs.properties and then regenerate the files? If the index files are still big, could you then double check that only the desired words from the inputdictionaryfile are indexed? Maybe there are unnecessary index entries.

About searchListFileMaxSize/indexFileMaxSize/dictionaryFileMaxSize: with these properties you can reduce the size of the individual files, but if the files are smaller, there will be more of them, so the total will not be less.

There are still a few more possibilities to reduce the file sizes, such as including only a unidirectional index as you suggested. For a unidirectional index, just set languageXGenerateIndex and languageXIsSearchable to false for the language for which you don't want an index.

Keep us updated about your progress!
Gert

Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 29. July 2007, 20:59:38
Thanks for your reply!

I have modified the .properties file as you suggested: I removed language1DictionaryUpdateClassName and language2DictionaryUpdateClassName, then changed NormationEng to NormationLat and NormationLat to NormationRus. Regarding the complicated syntax: as the source for the dictionary I used a .txt file containing special characters, for example ñ, ó etc. Can this be the cause of my problem?
I have now recompiled and got much the same result: the file size is 485kb, bigger than before. The sizes of the index and directory files have also changed: directories 298kb, index 567kb. You advised me to "double check that only the desired words from the inputdictionaryfile are indexed ? Maybe there are unnecessary index entries"! Does the program add some unnecessary words to the original dictionary? I mean not in the .txt file but in the final file? Where does it take those words from? I think all the words in the Spanish dictionary are necessary :) What can I do next?
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 29. July 2007, 21:52:15
I have just made a unidirectional dictionary as you advised, but now the file size is 377kb :(( I think the problem is in the index files.
I would like to provide this information for comparison: my dictionary source .txt file is 300kb, because it is in Unicode format. The original file was in ISO-8859-1 format and its size was 200kb, but after compilation I ran the program on a PC mobile phone emulator and the Russian letters were unreadable, so I decided to change the encoding to Unicode and the file grew by 100kb. I have an English-Russian dictionary installed on my phone that is 230kb; the source dictionary for that program weighs 560kb and it contains almost 25000 words. So I am wondering why this Spanish dictionary, which contains only 10000 words, comes to 485kb?
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 30. July 2007, 07:14:56
Please have a look at the description of the index files and directory files in the chapter "Files generated by the DictionaryGeneration tool" of the section "Setting up a new dictionary". I still believe that the index files contain entries that are not needed. This may be the case if your dictionary contains information such as the grammatical category. Such information can blow up the index size, because without additional hints DictionaryGeneration will interpret it as phrases and generate index entries for it.

The directory files contain the translations from the inputdictionaryfile, so their size is roughly the same as that of the inputdictionaryfile. Note however that these files are ZIP compressed in the JAR file. So if your inputdictionaryfile is 300 kb, the resulting compressed files could be less than 150 kb (depending on the content).

Unicode makes the files a bit bigger. I assume you use UTF-8 encoding, right? ISO-8859-1 will not work for Russian characters. With Unicode you will not have a problem with characters such as ñ, ó etc.
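
To illustrate the size difference: in UTF-8, Cyrillic letters and accented Latin letters such as ñ take two bytes each, while ISO-8859-1 uses one byte per character but has no Cyrillic letters at all. A small Python sketch (the sample words are arbitrary and are not taken from the dictionary discussed here):

```python
def encoded_size(text, encoding):
    """Return the number of bytes `text` needs in `encoding`,
    or None if the encoding cannot represent the text at all."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# Plain ASCII word, Spanish word with ñ, Russian word.
for word in ["gato", "niño", "кошка"]:
    print(word, encoded_size(word, "utf-8"), encoded_size(word, "iso-8859-1"))
```

The Russian word cannot be encoded in ISO-8859-1, which fits the unreadable letters seen in the emulator, and it needs twice as many bytes as it has letters in UTF-8.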

Can you check the compressed file sizes in the JAR file? How much space is used by the 'dictionary' directory of the JAR file and how much by the application?
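
One way to get these numbers without a graphical ZIP tool is to read the JAR with Python's standard zipfile module, since a JAR file is a ZIP archive. This is only a sketch; the function name and the grouping by top-level folder are choices made for this example and are not part of the DictionaryForMIDs tools:

```python
import zipfile
from collections import defaultdict


def jar_size_report(jar_path):
    """Sum the sizes of all entries per top-level folder of a JAR.

    Returns a dict: folder name -> (uncompressed bytes, compressed bytes).
    Entries that are not inside any folder are reported under "(root)".
    """
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            name = info.filename
            top = name.split("/", 1)[0] if "/" in name else "(root)"
            totals[top][0] += info.file_size       # size before compression
            totals[top][1] += info.compress_size   # size inside the JAR
    return {folder: tuple(sizes) for folder, sizes in totals.items()}
```

Calling jar_size_report on the generated JAR and comparing the entry for the 'dictionary' folder with the other entries shows how much of the JAR is dictionary data and how much is application code.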

Another tip: I think you can remove the icons-folder from the JAR-file. I believe the application will still run (well, obviously you won't have icons then).

Note that the English-Russian dictionary uses a much older version of DfM, and the application size was much smaller then.

Gert


Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 30. July 2007, 08:02:46
I just realized that DictionaryUpdatePartialIndex is not documented on the web pages (I thought that I had done it; probably I just wanted to do it but then forgot :( ).

To prevent indexing of unwanted parts (such as grammatical categories), use the DictionaryUpdate class "DictionaryUpdatePartialIndex" for that language. Then put the parts that shall not be indexed in double braces: {{ }}

for example
original line in the inputdictionaryfile:
cat (n)   Katze (f;n)

needs to be changed to
cat {{(n)}}   Katze {{(f;n)}}
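
For a large inputdictionaryfile, this change can be scripted instead of being made by hand. The following Python sketch assumes that every part to be excluded is a parenthesised group such as (n) or (f;n); the function names are made up for this example, and a real dictionary may need a different pattern:

```python
import re

# Assumption: grammatical markers are parenthesised groups like "(n)" or "(f;n)".
# The lookbehind/lookahead skip groups that are already wrapped in {{ }}.
MARKER = re.compile(r"(?<!\{\{)\([^()]*\)(?!\}\})")


def mark_unindexed(line):
    """Wrap every parenthesised group in the line in double braces."""
    return MARKER.sub(lambda m: "{{" + m.group(0) + "}}", line)


def mark_file(src_path, dst_path, encoding="utf-8"):
    """Write a copy of the input dictionary file with all groups marked."""
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(mark_unindexed(line))


print(mark_unindexed("cat (n)\tKatze (f;n)"))
```

Running the function twice on the same line does not add a second pair of braces, so the script can safely be applied to a file that is already partly marked.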

Then run DictionaryGeneration again.
Hope that helps.

I really need to add this to the web pages! Without that documentation you cannot understand it, sorry for this.

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 30. July 2007, 13:51:17
Hi, Gert!

Sorry for the delayed answer. I just tried to recompile with the settings that you advised: I added DictionaryUpdatePartialIndex: {{ }} to the properties file, and now I got a dictionary with a size of 385kb, so we have gained another 100kb :) Regarding the icons, I forgot to tell you that I am using the light version of the empty dictionary, which is 112kb and does not seem to contain icons. Do you have any other ideas for decreasing the file size? Thank you very much!
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 30. July 2007, 21:06:15
Hi Gert,

I just had a look into the index files again and found something interesting. As I already said, I added the line DictionaryUpdatePartialIndex: {{ }} to the properties file. I discovered that there are two big (~40kb) index files in the output directory; I looked into them and found that the letters 'm' and 'f' have a lot of index entries, or pointers, into the directory files. As you know, these letters are used to describe the gender of a noun, so I think the program did not identify the information in the {{ }} correctly. Dear Gert, can you tell me whether I wrote that line correctly? Is the syntax right?
DictionaryUpdatePartialIndex: {{ }}
Another question is, here is a line from the index file:
millonario m :5-1215-B
and here is the same word in the directory5 file:
millonario_{{(m)}}:(here are the characters in Unicode format)
Is this correct? BTW, after compilation I ran the program on a PC emulator, and when I enter a word in Spanish, the translation shows the word in this format: some word {{(m)}}. Is that normal? Should I see these braces or not?
Thanks a lot!
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 30. July 2007, 21:30:24
Sorry for my hurried description above (I know I need to update the web pages!!).

You need to put the following line in DictionaryForMIDs.properties:
language1DictionaryUpdateClassName=de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdatePartialIndex

(or language2... whichever is the right language)

Then it will work.

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 31. July 2007, 11:22:52
Hello Gert,
Thank you for your help, I have added the line (language1DictionaryUpdateClassName=de.kugihan. etc) and now my properties file looks like this:
infoText: Spanish-Russian dictionary
dictionaryAbbreviation: IDP
numberOfAvailableLanguages: 2
language1DisplayText: Spanish
language2DisplayText: Russian
language1FilePostfix: Esp
language2FilePostfix: Rus
language2IsSearchable: false
language2GenerateIndex: false
language1DictionaryUpdateClassName=de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdatePartialIndex
dictionaryGenerationSeparatorCharacter: ':'
indexFileSeparationCharacter: ':'
searchListFileSeparationCharacter: ':'
dictionaryFileSeparationCharacter: ':'
dictionaryGenerationInputCharEncoding: UTF-8
indexCharEncoding: UTF-8
searchListCharEncoding: UTF-8
dictionaryCharEncoding: UTF-8
language1NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationLat
language2NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationRus

Now the final .jar file size is 356kb; we have gained another 30kb, but it still cannot run on my Nokia :(
BTW, here is some information that may be helpful:

Creating: ./output/dictionary/DictionaryForMIDs.properties
Property searchListFileMaxSize set to 235
Property indexFileMaxSize set to 11999
Property dictionaryFileMaxSize set to 6783
Property language1IndexNumberOfSourceEntries set to 9779
Done: property file
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 31. July 2007, 17:11:20
Hmmm ...

Do you know what precisely the JAR size limit for your Nokia is?

Did you have a look at the index files again to check that there are no more superfluous entries? Can you give me the file sizes of the index files again? Are the index files well compressed in the JAR-file (you could use any ZIP tool to get a rough compression ratio for the files)?

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 31. July 2007, 19:05:55
Hi again,

Regarding the precise file size limit for JAR applications on my Nokia, I think it may be 300kb, because I once installed a EuroMap program whose size was 296kb, and to my astonishment it ran normally. After that I thought that my Nokia could run even more, so I installed another dictionary (I can't remember which one, but I remember the size was ~365kb) and my Nokia could not identify that file. The message was: incorrect file.
As to the index file size, as I mentioned before, each file weighs 10kb.
What do you mean by asking "Are the index files well compressed in the JAR-file"? To make the dictionary I use the automatic method that is described in the how-to. I use this command:
java -jar JarCreator.jar dictionarydirectory emptyjar outputdirectory
(of course adjusted for my directory names). Do I have to do this step manually? I just tried compressing the final JAR file with zip, and the file size decreased by only 5kb.
I forgot to tell you that I have also checked the content of the index files and did not find any other unnecessary information there :(
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 31. July 2007, 19:40:46
Yes, using JarCreator as you do is the right way; there is no need for manual steps. JarCreator does compress the files in the JAR-file, I was just wondering whether everything worked well there. So a 10kb index file is 5kb after compression, right?

Could you give me the current total size of the index files (uncompressed)?

The directory files were about 300 kb uncompressed, if I remember well. That may be roughly 150 kb compressed, maybe less. Add to that the 112 kb of application code and the size of the index files.
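
As a rough back-of-the-envelope check with the figures mentioned in this thread (a 356 kb JAR, 112 kb of application code, and about 150 kb for the compressed directory files, which is only an estimate), whatever is left over is approximately what the compressed index files and any other small files occupy:

```python
def remaining_for_index(jar_kb, application_kb, directory_kb):
    """Rough size (in kb) left for the compressed index files in the JAR."""
    return jar_kb - application_kb - directory_kb


# Figures quoted in this thread; the 150 kb value is only an estimate.
print(remaining_for_index(356, 112, 150))
```

If the directory files compress better than assumed, the share taken by the index files is correspondingly larger.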

Well, it would be useful to know the file size limit of your Nokia. You could try to determine it by just removing some of the files in the /dictionary folder of the JAR-file (use any ZIP tool for this) and then trying to install the application.
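
Such trial JAR files can also be produced with a short script instead of a ZIP tool. The sketch below uses Python's standard zipfile module to write a copy of a JAR that leaves out the named entries; the function name is made up for this example, and the entry names to remove have to be taken from the listing of the actual JAR:

```python
import zipfile


def copy_jar_without(src_jar, dst_jar, unwanted):
    """Copy src_jar to dst_jar, skipping entries whose names are in `unwanted`.

    Returns the list of entry names that were actually skipped.
    """
    skipped = []
    with zipfile.ZipFile(src_jar) as src, \
         zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename in unwanted:
                skipped.append(info.filename)
                continue
            # Re-use the original entry metadata and copy the data unchanged.
            dst.writestr(info, src.read(info.filename))
    return skipped
```

Removing a few more files for each trial and installing the result narrows down the size at which the phone stops accepting the JAR. A JAR with missing dictionary files is of course only good for this test, not for real use.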

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 01. August 2007, 20:20:09
Hi Gert,

Sorry for my delayed reply, but I have some good news for you. Today I managed to install the dictionary with a size of 346kb and it ran normally, but I had deleted 3 index files from the output directory, so it is not the right solution. I also tried to install a file with a size of 356kb and it didn't run. I suppose the file size limit for my Nokia is 350kb. Now the question is: can we make the dictionary smaller by 10kb?
I had a look at the dictionary functions and found some "unnecessary" things. For example, if I switch on "Use bitmap fonts", the Russian words become unreadable :( Can't we cut this part from the empty dictionary file? There are also some "unnecessary" languages, for example Vietnamese and Japanese; I think I don't need them.
The next important thing: as far as I can see, in the settings, where we choose the language for the program interface, Russian is not complete yet! You know, Gert, I can update that section; I have spoken Russian very well for 25 years already! :) Please just tell me in which format (I guess Unicode) you would like to have the missing and incorrect words. That seems to be all. Thank you very much for your help, I will wait for your answer!
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 02. August 2007, 05:44:16
Yes, remove the normation files for the languages that you do not need; this may give you the 10 kB.

Your problem with the bitmap fonts only exists in the development version that you are using. The released version does not allow bitmap fonts to be selected unless these fonts are actually configured for the dictionary.

If you could provide us with an update for the Russian user interface, that would be great! Just read "We need your support / Translations for DictionaryForMIDs user interface". I have attached the language file directly; this is the file that we need updated. Besides, Spanish needs an update too; can you help there as well?

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 02. August 2007, 11:15:50
Dear Gert, I am working on updating the Russian section of the Translations for DictionaryForMIDs and have several questions for you. Could you please explain the meaning of some words in this program? For example "back word" and "forward word": do they mean one word back and one word forward?! I also had problems with the word 'hits': does it mean the number of matches found during the search?
At the end of the translation, in the Free Memory section, I couldn't find the separation between words, for example:  character\n\nTo control .
Regarding the Spanish translation, as I said, I have just started to learn Spanish :) but I will ask my teacher; maybe she can help us with this?!
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 02. August 2007, 18:33:00
"Back word" means that you want to go back to the word that you translated previously. And the other way round for "forward word".

"Hits" means the number of translations that were found.

\n is the newline character, means that the following words will appear on a new line.

Besides, I believe the list of entries on the DfM web page is not complete any more (maybe you could send an email to Quynh?). If you look at the "#UIDisplayTextItems" section, you will see all the required entries.

Hey, maybe your Spanish teacher will like to use DfM also :)

Gert

Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 02. August 2007, 19:57:26
Hi Gert, thanks for the explanation, but one thing is unclear: "character\n\nTo control" or "character  \n\n  To control", where is the separator?
Regarding the email to Quynh, I will write it tomorrow; I have already found the list of words that need to be translated ;)
Regarding my Spanish teacher: why do you think I am torturing myself over this Spanish-Russian dictionary when I could use a Spanish-English dictionary? It is all for her :) I think she will be glad to translate those words, but I think she is not so good at Spanish computer terms.
I forgot to ask you one important thing: how can I remove "unwanted" files from the empty dictionary? Is it possible to use JarCreator.jar for decompressing/compressing .jar files? I used WinAce to change files inside the JAR archive; the program saved the file normally, but when I tried to run it on the phone I got the error message "Unable to start". The PC emulator cannot run it either; it closes immediately :( My guess is that if I remove some files from the archive, I have to make changes in the class files. Am I right or not?
Thanks for everything ;)
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 02. August 2007, 20:24:14
Not sure whether I understand your question about the separator. "character\n\nTo control" is the better version, because "character  \n\n  To control" contains disturbing blanks.

Oh, so your teacher owes you something ;)

Removing files: you can remove the files from the output of JarCreator. Then you should also delete the JAD-file (the JAR-file alone will run). You can just remove class files with any ZIP tool ('7-zip', 'filzip', the 'jar' tool, etc.); the resulting file should run. There is no need to modify other class files.

If you remove class files from the 'empty' application (i.e. the input to JarCreator), then you need to keep the JAD file. I think you can just remove the class files from the empty JAR-file, but I have never tried this.

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 03. August 2007, 09:04:50
Hi Gert,
It seems I didn't explain my question regarding "character\n\nTo control" clearly. I would like to know which words must be translated: 'To control' or 'control'? :)
Regarding the program, I think I will use 'filzip'; I just downloaded it. I will wait for your answer regarding the words for translation.
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 03. August 2007, 09:54:11
"to control", like "In order to control ..."

Do you have any further questions about the translations? Then please just ask (but note that starting next week I will be 'out of town' for two weeks).

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 03. August 2007, 10:51:06
Hi Gert,

Thank you very much for your support. At last I have a working dictionary with a size of 346kb and have no other questions for you :) Regarding the translation, now I know which words need to be translated. I think I will write an email to Quynh and put your email in CC. BTW, if in the future you have questions regarding Russian or need some translation, don't hesitate to contact me ;) Thanks a lot again and have a good rest! :) Bye bye.
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 03. August 2007, 10:57:23
Great - and thank you for your support !!

Besides, maybe you have some time to do publicity for DictionaryForMIDs ? We really need people who work on publicity/advertisement !!

Best greetings,
Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 03. August 2007, 11:09:56
Gert, what can I do for publicity/advertisement for DictionaryForMIDs? If I find someone who is interested in mobile dictionaries, I will give them the links to your dictionary page ;) Is that enough?
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 03. August 2007, 11:19:59
That's a good idea, but, well, if you have some more time (and interest) you could for example upload dictionaries to several web sites, or do some other things. Just have a look at the section "We need your support / Publicity" on our homepage.

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 03. August 2007, 12:51:19
Ok, Gert, I understand, I will do my best ;)
Title: Re: Is it possible to decrease the dictionary size?
Post by: Gert on 03. August 2007, 18:49:28
Super - the whole project will rely on you  ;D

Gert
Title: Re: Is it possible to decrease the dictionary size?
Post by: arsen_a on 03. August 2007, 19:14:07
Looool :)