Setting up a new dictionary for DictionaryForMIDs


For change notes from past releases see here.

Setting up a dictionary is just configuration; you do not need programming knowledge or a development environment. If you run into any problem while setting up a dictionary for DictionaryForMIDs, just contact us and we will assist you.

Setting up a dictionary for DictionaryForMIDs involves the following steps:

  1. Set up dictionary file
  2. Set up work environment
  3. Configure DictionaryForMIDs.properties
  4. Build dictionary
  5. Publish dictionary (optional)

 

1. Set up the dictionary file

The easiest way to store a dictionary is to use a spreadsheet program such as Microsoft Excel. Create two columns: the headword in the first language, and the definition in the second language. Microsoft Access can also be used to store the dictionary, but it is more difficult to edit and sort the data. Microsoft Excel 2007 or later is the best choice for long definitions, because each cell can hold up to 32,767 characters. With Microsoft Excel 2003, text beyond 255 characters in a cell may be lost.

Adding line breaks (optional)

Some definitions can be very long. The definition would be easier to read if there were line breaks.  Here is an example:

Read [tab] 1. To read, advise, counsel. 2. To interpret; to explain; as, to read a riddle. 3. To tell; to declare; to recite.

Note: [tab] is for the tab-separator character (would be '\t' in DictionaryForMIDs.properties)

The definition would display as a long string of words. To add a line break, add a "\n" to the definition.  For example:

Read [tab] 1. To read, advise, counsel.\n2. To interpret; to explain; as, to read a riddle.\n3. To tell; to declare; to recite.

Now the definition will be displayed like this:

Read
1. To read, advise, counsel.
2. To interpret; to explain; as, to read a riddle.
3. To tell; to declare; to recite.
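
If the definitions already contain numbered senses on a single line, the "\n" markers can be inserted by a small script instead of by hand. The following is a minimal Python sketch, assuming that senses are numbered "1. ", "2. ", and so on and are separated by whitespace. The function name add_line_breaks is hypothetical and not part of DictionaryForMIDs.

```python
import re

# Whitespace that is directly followed by a sense number of 2 or higher,
# for example the space before "2. " or "10. " (but not before "1. ").
SENSE_BOUNDARY = re.compile(r"\s+(?=(?:[2-9]|[1-9]\d+)\.\s)")


def add_line_breaks(definition: str) -> str:
    """Replace the whitespace before each later sense with a literal backslash-n."""
    # "\\n" is the two characters '\' and 'n', which is what the
    # dictionary file expects; it is not a real newline character.
    return SENSE_BOUNDARY.sub(lambda match: "\\n", definition)


if __name__ == "__main__":
    print(add_line_breaks("1. To read, advise, counsel. 2. To interpret. 3. To tell."))
```

A number followed by a period inside ordinary text (for example "see chapter 2. ") would also be treated as a sense boundary, so the result should be checked before building the dictionary.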

Excluding text from generated index files (optional)

Some people who ran DictionaryGeneration found that the generated files were very large. For example, a 2 MB inputdictionaryfile produced generated files of 10 MB or more. In such cases the index files most likely contained unnecessary information.

To illustrate the problem, here is an example with a line from the inputdictionaryfile:

sleep  The state of reduced consciousness of a human or animal [tab] Schlaf  Zustand der Ruhe eines Tieres oder Menschen

Note: [tab] is for the tab-separator character (would be '\t' in DictionaryForMIDs.properties)

Here, without additional information, DictionaryGeneration would index all expressions that are included in the explanatory texts (e.g. "The state of reduced consciousness of a human or animal"). This is undesirable.

The solution is to put the text that does not need to be included in the index between {{ and }}.

In the example:
sleep  {{The state of reduced consciousness of a human or animal}}[tab]Schlaf  {{Zustand der Ruhe eines Tieres oder Menschen}}

The generated files then become much smaller. For an inputdictionaryfile with lines as in the above example, the compressed result will likely be below 2 MB.
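
To make the effect of the {{ }} markers concrete, here is a minimal Python sketch. It assumes that everything between {{ and }} is simply left out when the index is built; the actual logic inside DictionaryGeneration may differ in detail. The function names mark_explanation and indexable_text are hypothetical.

```python
import re

# Non-greedy match of one {{ ... }} span.
EXCLUDED_SPAN = re.compile(r"\{\{.*?\}\}")


def mark_explanation(headword: str, explanation: str) -> str:
    """Build one column: headword, two spaces, explanation wrapped in {{ }}."""
    return headword + "  {{" + explanation + "}}"


def indexable_text(column: str) -> str:
    """Return the text of one column that remains once {{ }} spans are dropped."""
    without_spans = EXCLUDED_SPAN.sub(" ", column)
    # Collapse the whitespace left behind by the removed spans.
    return " ".join(without_spans.split())


if __name__ == "__main__":
    column = mark_explanation(
        "sleep", "The state of reduced consciousness of a human or animal"
    )
    print(column)
    print(indexable_text(column))
```

In this sketch only the headword is left for indexing, which is why the index files shrink.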

Define content declarations (optional)

Content declarations allow you to apply different styles to different parts of a definition. For example, all grammar tags could be displayed in blue and all sample sentences in italics. For more information see here.

Using two source dictionaries (optional)

Typically only one source dictionary is used to create a built dictionary. An input dictionary file has a headword followed by a definition in the second language. DictionaryForMIDs creates a bi-directional dictionary by building an index from language1 -> language2 as well as from language2 -> language1.

However, you may have two separate input dictionary files: one from language1 -> language2 and a second one from language2 -> language1. For information on setting up this kind of dictionary, please see here.

 

2. Set up the work environment

Please download the DictionaryForMIDs work environment here. It is a self-extracting ZIP file that extracts the files to C:\Dict\. You may choose a different directory, but you will then need to change the paths in the batch files manually. New users are therefore advised to leave the directory as C:\Dict\.

The work environment contains the following tools:

DictionaryForMIDs.jar, DictionaryForMIDs.jad: These are the empty files that you will add your dictionary into.
DictionaryGeneration.jar: This tool will build the index files for you.
JarCreator.jar: This tool will package your index files into one file.
FontGenerator.jar: This tool will create bitmap fonts for users who do not have the necessary fonts on their phone.

Next, download Java (J2SE Runtime) and install it. DictionaryForMIDs depends on Java to create the dictionary files. You can download Java from here (this file is > 10 MB).

Next, check that your Java environment is set up correctly. You can do this by building the sample dictionary that is included with DictionaryForMIDs. Go to Start Menu -> Accessories -> Command Prompt. Then type "cd c:\dict\" (without the quotation marks).

Then type "setup" and press Enter.  When the command prompt returns, type "jar" and press Enter.  If all goes well the command prompt will return without giving an error.  The sample dictionary will show up in the C:\Dict\JAR\ directory.

If you got an error message about missing Java, then you may need to add Java to your Windows path:

Start Menu -> Control Panel -> System -> Advanced System Settings -> Environment Variables -> System Variables -> Path -> Edit

Add this to the end of your path (you need the ";"):
;C:\Program Files\Java\jre6\bin\
Then reboot and you should be good to go.

Note: If you are using a 64-bit version of Windows, then you may need to use "C:\Program Files (x86)" instead of "C:\Program Files", depending on which version of Java you downloaded.
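
Whether Java can be found through the path can also be checked with a short script. This is a minimal Python sketch that assumes Python is installed on the same machine, which the work environment itself does not require. The function name find_java is hypothetical.

```python
import shutil


def find_java():
    """Return the full path of the java executable found via PATH, or None."""
    return shutil.which("java")


if __name__ == "__main__":
    location = find_java()
    if location is None:
        print("Java was not found on the path; add its bin directory as described above.")
    else:
        print("Java found at", location)
```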

Once your environment is ready, you can create your input dictionary file.  Open Dictionary_input.txt in Microsoft Word.  Delete all the data in the file.  Then open Microsoft Excel and copy the two columns you created and paste them into Microsoft Word.  Save the file again (using UTF-8 encoding) and the file is ready.
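
The input file is just a UTF-8 text file with one entry per line and a tab between the two columns, so it can also be written by a script. The following is a minimal Python sketch under that assumption. The function name write_input_file and the sample rows are hypothetical, and how the rows are exported from the spreadsheet is left open.

```python
def write_input_file(rows, path):
    """Write (headword, definition) pairs as tab-separated UTF-8 lines."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for headword, definition in rows:
            cells = []
            for cell in (headword, definition):
                # A tab inside a cell would be mistaken for the column separator.
                cell = cell.replace("\t", " ")
                # Real line breaks become the literal two characters '\' and 'n'.
                cell = cell.replace("\r\n", "\n").replace("\n", "\\n")
                cells.append(cell.strip())
            out.write("\t".join(cells) + "\n")


if __name__ == "__main__":
    sample_rows = [("sleep", "Schlaf"), ("read", "lesen")]
    write_input_file(sample_rows, "Dictionary_input.txt")
```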

 

3. Configuring the properties of the file DictionaryForMIDs.properties

This is where you customize DictionaryForMIDs for your dictionary. Here is a sample DictionaryForMIDs.properties file:

	infoText: IDP (English - German), version 1.1 19Feb99: http://www.ilovelanguages.com/IDP/IDPfiles.html
	dictionaryAbbreviation: IDP(Eng-Ger)
	numberOfAvailableLanguages: 2
	
	language1DisplayText: English
	language2DisplayText: German
	language1FilePostfix: Eng
	language2FilePostfix: Ger
	
	language1IsSearchable: true
	language2IsSearchable: true
	language1GenerateIndex: true
	language2GenerateIndex: true
	
	language1HasSeparateDictionaryFile: false
	language2HasSeparateDictionaryFile: false
	
	dictionaryGenerationSeparatorCharacter: '\t'
	indexFileSeparationCharacter: '\t'
	searchListFileSeparationCharacter: '\t'
	dictionaryFileSeparationCharacter: '\t'
	
	dictionaryGenerationOmitParFromIndex: true
	dictionaryGenerationInputCharEncoding: UTF-8
	indexCharEncoding: UTF-8
	searchListCharEncoding: UTF-8
	dictionaryCharEncoding: UTF-8
	
	language1DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdate
	language2DictionaryUpdateClassName: de.kugihan.dictionaryformids.dictgen.dictionaryupdate.DictionaryUpdate
	language1NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationEng
	language2NormationClassName: de.kugihan.dictionaryformids.translation.normation.NormationGer
	

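You will usually only have to change the entries discussed below: infoText, dictionaryAbbreviation, the language display texts and file postfixes, and the Normation class names. Everything else will probably remain the same. You can edit DictionaryForMIDs.properties with Notepad or any other plain-text editor.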

The infoText field has three parts: the title of the dictionary, the version of the dictionary, and the website of the original data. The title will be similar to the dictionaryAbbreviation. If the dictionary already has a title, then use that title. If there is no original title, then use the title of the website that created the original dictionary or the last name of the person who created it (the last name of the person on the website, not your name). Next, add the version of the dictionary listed on the webpage. If a version is not given, then use the date written on the webpage. If that is not available either, then use today's date. Then add the link to the website where you got the original dictionary file.

Next, fill in the languages of the dictionary. The language in the left column of the dictionary file will be language1. The language in the right column will be language2. Then fill in the 3-letter codes for the languages (language1FilePostfix and language2FilePostfix, for example "Eng" and "Ger"). The codes are available here.

Normation classes are used to search multiple languages using simple input words.  For example, for German 'Umlauts' (ä, ö, ü), the NormationGer normation class allows the user to only type "a", "o", or "u" and still find the correct words.  Normation classes for several languages have already been created.  You can find a list here.
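
The idea behind a normation can be sketched in a few lines. The following Python example is only an illustration of the principle for the three umlauts mentioned above; it is not the implementation of NormationGer, and the names normalize and lookup are hypothetical.

```python
# Map each umlaut to its base letter, as described in the text above.
UMLAUT_MAP = str.maketrans(
    {"ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U"}
)


def normalize(word: str) -> str:
    """Return the word with umlauts replaced by their base letters."""
    return word.translate(UMLAUT_MAP)


def lookup(query: str, headwords):
    """Find headwords whose normalized form equals the normalized query."""
    wanted = normalize(query)
    return [word for word in headwords if normalize(word) == wanted]


if __name__ == "__main__":
    print(lookup("Tur", ["Tür", "Tor", "Turm"]))
```

Because both the query and the headwords are normalized in the same way, a user who types "Tur" still finds "Tür".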

If a Normation class isn't available for your language, then you can find information to create a new one here.  Or, you can just use the default Normation class.  In this case use this:

language1NormationClassName: de.kugihan.dictionaryformids.translation.normation.Normation
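
Before building, it can help to check that the properties file contains the entries you meant to set. The following is a minimal Python sketch that assumes the simple "key: value" layout of the sample file above. The list of expected keys is chosen for illustration only and is not an official list of mandatory properties; the function names are hypothetical.

```python
# Keys chosen for illustration; not the tool's official list of required properties.
EXPECTED_KEYS = [
    "infoText",
    "dictionaryAbbreviation",
    "numberOfAvailableLanguages",
    "language1DisplayText",
    "language2DisplayText",
    "language1FilePostfix",
    "language2FilePostfix",
]


def parse_properties(text: str) -> dict:
    """Parse 'key: value' lines into a dict, skipping blank lines."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split at the first colon only, so URLs in the value stay intact.
        key, value = line.split(":", 1)
        properties[key.strip()] = value.strip()
    return properties


def missing_keys(properties: dict) -> list:
    """Return the expected keys that are absent from the parsed properties."""
    return [key for key in EXPECTED_KEYS if key not in properties]


if __name__ == "__main__":
    with open("DictionaryForMIDs.properties", encoding="utf-8") as handle:
        print(missing_keys(parse_properties(handle.read())))
```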

Customization of DictionaryGeneration with a DictionaryUpdate class (optional)

The DictionaryGeneration tool can be customized by DictionaryUpdate classes. Read here for a description of DictionaryUpdate classes.

 

4. Build the dictionary

Now you are ready to build your dictionary.  Go to Start Menu -> Accessories -> Command Prompt.  Then type "cd c:\dict\" (without the quotation marks).  Then type "setup" and press Enter.  When the command prompt returns, type "jar" and press Enter.  Your finished dictionary will show up in the C:\Dict\JAR\ directory.

The setup.bat and jar.bat files are just simple batch files used to build your dictionary.  setup.bat runs the DictionaryGeneration tool to build the index files.  jar.bat runs the JarCreator tool to put your index files into a single usable file.

The batch files can be opened in Notepad or any other plain-text editor to change the directories. For more information on using the DictionaryGeneration tool, see here. For information on using JarCreator, please see here.

Creating bitmap fonts (optional)

The users of your dictionary may not have the necessary fonts to use the dictionary.  For example, if you created an English -> Russian dictionary, then your users may not have the Russian fonts on their phone to view the Russian words.  Therefore it may be useful to add bitmap fonts to the dictionary.  This will enable the user to view any language contained in the dictionary.  The FontGenerator tool is found in the C:\Dict\Tools\ directory.  For help in creating bitmap fonts, please see here.

 

5. Publishing your dictionary and submitting it to the 'dictionary archive' (optional)

If your dictionary is open source, then you can publish it on the download page. For this, please send an email to Peter Kmet (send cc to Gert Nuber), see contact. Also, please post a message about your dictionary in the DictionaryForMIDs forum. Once your dictionary is published, please also send the inputdictionaryfile and the other files that are needed to the 'dictionary archive'. For more information see here.

Packaging into a ZIP file

For packaging the files, put the 4 files (1) DictionaryForMIDs_xxx.jar (2) DictionaryForMIDs_xxx.jad (3) README.txt and (4) COPYING.txt into a ZIP file. You should use this file naming convention:

DictionaryForMIDs_VVVVV_XXXYYY_ZZZ.zip
VVVVV: version of DictionaryForMIDs, for example "3.4.0"
XXX: language1FilePostfix, for example "Eng"
YYY: language2FilePostfix, for example "Por"
ZZZ: info on the origin of the dictionary (can be longer than 3 characters), for example "IDP" or "freedict"; should be the same as defined in the property dictionaryAbbreviation.
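
The naming convention and the packaging step can be sketched as follows. This is a minimal Python example using the standard zipfile module; the function names zip_name and package are hypothetical, and the file paths passed in are whatever your build produced.

```python
import os
import zipfile


def zip_name(dfm_version, lang1_postfix, lang2_postfix, origin):
    """Build the file name DictionaryForMIDs_VVVVV_XXXYYY_ZZZ.zip."""
    return f"DictionaryForMIDs_{dfm_version}_{lang1_postfix}{lang2_postfix}_{origin}.zip"


def package(files, target_dir, dfm_version, lang1_postfix, lang2_postfix, origin):
    """Put the given files (JAR, JAD, README.txt, COPYING.txt) into one ZIP file."""
    path = os.path.join(
        target_dir, zip_name(dfm_version, lang1_postfix, lang2_postfix, origin)
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in files:
            # Store each file under its plain name, without directories.
            archive.write(file_path, arcname=os.path.basename(file_path))
    return path


if __name__ == "__main__":
    print(zip_name("3.4.0", "Eng", "Por", "IDP"))
```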

 

If you have any problem with setting up a new dictionary, just contact us and we will try to help you!