Making dictionaries available with the current version

Started by Inkus, 09. March 2007, 08:03:51

Inkus

Hi!

Gert inspired me to get involved in the DictionaryForMIDs project, so here I am. I'll try to work on some scripts to help generate dictionaries that work with the latest version of DictionaryForMIDs. If anyone has thoughts or suggestions about this, please post here.

Gert

Welcome!! You are the person we have been looking for these past 2 years :) :) :)

Some people have already given some thought to scripts that bring the dictionaries to the latest version of DictionaryForMIDs. I will try to summarize this here:

1.
One idea was to write ant scripts for this. Ant is a tool that was originally developed primarily to build Java applications (the DictionaryForMIDs project also has an ant build script that generates all the tools and applications). Actually, ant is not specific to Java; it is a useful tool for handling file generation and the dependencies among files. Plus, ant scripts are portable across platforms (Linux, MS Windows, etc.).
Peter Kmet has already written some first, prototype-like ant scripts. I attached these to this posting.

Sean Kernohan also had some thoughts on scripts.
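The main thing ant brings here is dependency handling: a generated file is only rebuilt when one of its inputs has changed. Whatever tool ends up being used, the core check looks roughly like the following minimal Python sketch, which compares file modification times. The function name is hypothetical, and real ant scripts would express this in their own build-file syntax.

```python
import os

def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its source files.

    This is the basic idea behind ant-style dependency handling: an output is
    only regenerated when one of its inputs has changed since the last build.
    A missing source file raises FileNotFoundError.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any source modified after the target means the target is stale.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

For example, `needs_rebuild("DictionaryForMIDs.jar", ["dictionaryinputfile.txt"])` would tell a script whether the jar for that dictionary has to be regenerated (file names illustrative).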

2.
Zdenek Broz (he is the maintainer of dicts.info; on the DfM download page you find his dictionaries under the DTS section) has simple scripts that fully automatically produce the resulting DictionaryForMIDs_xxx.zip files from the dictionaries he maintains on dicts.info. His scripts produce everything: the file DictionaryForMIDs.properties, the generated files from DictionaryGeneration, the jar/jad files from JarCreator, and the resulting ZIP file with the packaged README files etc.
So this is the example to follow!!
However, for the moment the dictionaries he produces don't make use of any advanced features such as bi-directional dictionaries, content definitions, etc.

I will write more later ...
Gert

Gert

And here is what I think needs to be done for the dictionaries that can be downloaded from our web site (apart from the DTS dictionaries, for which Zdenek has already written the scripts):

1. Collection of the dictionaryinputfiles
As of now, we do not have an archive of the dictionaryinputfiles for the dictionaries that can be downloaded from our web site! For example, I have the dictionaryinputfiles for some dictionaries on my PC, Jeff has several more, and others have a few.
So we need to collect all the dictionaryinputfiles along with all related files, such as copyright statements or possibly source files. In addition, the file DictionaryForMIDs.properties needs to be stored for each dictionary. All of these files should be put into an archive.
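Once the files start coming in, a small check helps to see which dictionaries in the archive are still incomplete. A minimal sketch, assuming a hypothetical layout with one directory per dictionary; the directory names and the choice of required files (for example adding a COPYING file) are assumptions, not an agreed convention:

```python
from pathlib import Path

# Hypothetical archive layout (an assumption, not an agreed convention):
#   archive/<dictionary-name>/DictionaryForMIDs.properties
#   archive/<dictionary-name>/<dictionaryinputfile(s)>
#   archive/<dictionary-name>/<copyright statement, e.g. COPYING>
REQUIRED_FILES = ("DictionaryForMIDs.properties",)

def missing_files(archive_root, required=REQUIRED_FILES):
    """Map each dictionary directory to the list of required files it lacks."""
    report = {}
    for dict_dir in sorted(Path(archive_root).iterdir()):
        if not dict_dir.is_dir():
            continue  # ignore stray files at the top level of the archive
        missing = [name for name in required if not (dict_dir / name).is_file()]
        if missing:
            report[dict_dir.name] = missing
    return report
```

Calling `missing_files("archive", ("DictionaryForMIDs.properties", "COPYING"))` would then list every dictionary that still lacks its properties file or copyright statement.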

2. Generating the dictionaries with DictionaryGeneration
Each of the dictionaryinputfiles needs to be run through DictionaryGeneration. This should be automated by a script.

For bi-lingual dictionaries (many are bi-lingual), this is a little more complicated. See the "For documentation, see here" link in the description of the property languageXHasSeparateDictionaryFile.
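For bi-lingual dictionaries, a script first has to find out which languages get their own dictionary file. A minimal sketch that reads DictionaryForMIDs.properties and lists the languages whose languageXHasSeparateDictionaryFile is true. It assumes that X is the language number (1, 2, ...) and that the file uses simple `key: value` or `key=value` lines; what the script then does for each such language is left open here.

```python
import re

# Assumes X in languageXHasSeparateDictionaryFile is the language number (1, 2, ...).
SEPARATE_FILE_KEY = re.compile(r"^language(\d+)HasSeparateDictionaryFile$")

def parse_properties(text):
    """Parse a small subset of the Java .properties format ('key: value' or 'key=value').

    Blank lines and comments starting with '#' or '!' are skipped; line
    continuations, escapes and whitespace-only separators are NOT handled.
    """
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue
        # Split only at the first separator so values may contain ':' or '='.
        parts = re.split(r"\s*[:=]\s*", line, maxsplit=1)
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props

def languages_with_separate_file(props):
    """Return the numbers N for which languageNHasSeparateDictionaryFile is 'true'."""
    numbers = []
    for key, value in props.items():
        match = SEPARATE_FILE_KEY.match(key)
        if match and value.strip().lower() == "true":
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```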

3. Creating the files DictionaryForMIDs.jar/jad with JarCreator
The generated dictionaries need to be put in a Jar-file with the latest version of DictionaryForMIDs. This is done with JarCreator. Again this should be automated by a script.
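Steps 2 and 3 both come down to calling a Java tool from a script and stopping as soon as one call fails, so that a broken dictionary does not silently end up in a package. A minimal sketch of such a step runner; the exact command-line arguments of DictionaryGeneration and JarCreator are not given here, so the commands in the usage comment are placeholders only:

```python
import subprocess

def run_steps(steps):
    """Run (name, command) pairs in order and stop at the first failing step.

    Returns the name of the step that failed, or None if every step succeeded.
    """
    for name, command in steps:
        print(f"[{name}] {' '.join(command)}")
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None

# Hypothetical usage: the jar names and arguments are placeholders and must be
# replaced with the real DictionaryGeneration/JarCreator command lines.
# failed = run_steps([
#     ("DictionaryGeneration", ["java", "-jar", "DictionaryGeneration.jar", "<args>"]),
#     ("JarCreator", ["java", "-jar", "JarCreator.jar", "<args>"]),
# ])
```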

4. Putting the files DictionaryForMIDs.jar/jad in a ZIP package
A script needs to put the files DictionaryForMIDs.jar/jad into a ZIP package, together with the README & COPYING files, plus additional copyright statements or other files where appropriate. The Java Jar tool, 7-zip, or any other preferred ZIP tool can be used for this.
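If the scripts are written in Python, the standard library can do the packaging without any external ZIP tool. A minimal sketch, assuming all files are placed at the top level of the ZIP; the function name is hypothetical:

```python
import zipfile
from pathlib import Path

def build_package(zip_path, files):
    """Write the given files (jar/jad, README, COPYING, extras) into one ZIP.

    Each file is stored under its base name at the top level of the archive.
    All files are checked first so that no half-finished package is written.
    """
    paths = [Path(f) for f in files]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=path.name)
    return zip_path
```

For example: `build_package("DictionaryForMIDs_xxx.zip", ["DictionaryForMIDs.jar", "DictionaryForMIDs.jad", "README", "COPYING"])`.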

5. Publishing the ZIP packages on the web site
Finally, the ZIP packages need to be published for download on the DfM web site. Getting the files there involves uploading them to the SourceForge File Release System.
This step 5 probably requires some co-ordination with Peter (Peter is responsible for publishing dictionaries).


In general, I'd propose that all data (the archive of inputdictionaryfiles, the scripts, etc.) be stored on SourceForge once established. That way this will be a long-term contribution to the DfM project.

Actually, I believe the main focus of this task is not just writing scripts; the scripts can probably be done rather quickly. Beyond the scripts, there are other things that need to be done: co-ordinating with other people, getting clarifications and so on.

This task of bringing the dictionaries to the latest version of DictionaryForMIDs is one of the most important tasks that we have in the DfM project!!

Best greetings,
Gert

Gert

Oh, I did forget step "2.5":

2.5 Generation of the bitmap fonts
Starting with version 3.1 of DictionaryForMIDs, bitmap font support will be complete. From then on, all dictionaries that contain any characters beyond ISO-8859-1 should have a bitmap font. This bitmap font is generated with the BitmapFontGenerator tool.
Right now there is 'only' a graphical user interface for BitmapFontGenerator; a command line interface may have to be added so that it can be used more easily from scripts.
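Deciding which dictionaries need a bitmap font can itself be scripted: ISO-8859-1 (Python's latin-1 codec) covers exactly the code points U+0000 to U+00FF, so any character above U+00FF is a sign that a font is needed. A minimal sketch, assuming the dictionaryinputfile is UTF-8 encoded:

```python
def chars_beyond_latin1(path, encoding="utf-8"):
    """Return the characters in the file that ISO-8859-1 cannot represent.

    ISO-8859-1 maps to code points U+0000..U+00FF, so every character above
    U+00FF is collected. The result is sorted by code point; an empty list
    means no bitmap font should be needed for this file.
    """
    with open(path, encoding=encoding) as f:
        text = f.read()
    return sorted({ch for ch in text if ord(ch) > 0xFF})
```

The returned list is also a handy starting point for checking which characters the generated bitmap font has to cover.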

Gert

Inkus

If I understand correctly, the dictionaries themselves are out of scope for the DictionaryForMIDs project; I mean, none of the DictionaryForMIDs project members manage what is inside the dictionaries. All the dictionaries you use come from outside (like the Internet Dictionary Project, http://www.freedict.org/, etc.); all that matters is how to get the translations of words out of them. As each outside dictionary is in a different format (right?), you have to convert them to a format that the DictionaryForMIDs application can understand. I guess the converted dictionaries are called dictionaryinputfiles. Right?

Gert

Absolutely right !

We do use dictionaries 'from outside' and set them up for DictionaryForMIDs. As part of the DictionaryForMIDs project we do not work on the content of these dictionaries (well, of course you can also work on the dictionary content, it's just that this is not in the scope of the DictionaryForMIDs project).

So, yes, as part of the DictionaryForMIDs project we retrieve 'dictionary source files' from various dictionaries; if necessary we convert them to tab/comma separated files, and where possible we add enhancements such as 'content definitions'. The result is the inputdictionaryfile, which is run through DictionaryGeneration.
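What such a conversion looks like depends entirely on the source format. As an illustration only, here is a minimal sketch for a hypothetical source format with one `headword :: translation` entry per line, producing tab-separated lines; the ` :: ` separator is an assumption, and every real dictionary source needs its own parsing:

```python
def convert_to_tab_separated(source_lines, separator=" :: "):
    """Convert 'headword :: translation' lines into 'headword<TAB>translation'.

    The ' :: ' source format is a hypothetical example. Lines without the
    separator are returned separately so they can be inspected by hand.
    """
    converted, rejected = [], []
    for line in source_lines:
        line = line.rstrip("\n")
        if separator not in line:
            rejected.append(line)
            continue
        headword, translation = (part.strip() for part in line.split(separator, 1))
        # A tab inside a field would shift the columns, so replace it with a space.
        converted.append(headword.replace("\t", " ") + "\t" + translation.replace("\t", " "))
    return converted, rejected
```

Keeping the rejected lines instead of silently dropping them makes it easier to see how much manual clean-up a source file still needs.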

And right, each 'dictionary source file' typically has its own format (within a dictionary project such as freedict, the format is normally the same).

Depending on the 'dictionary source files', creating the inputdictionaryfile for DictionaryGeneration can take anything from zero effort to 'quite some effort'. 'Quite some effort' can be needed when the 'dictionary source file' is in bad shape (duplicate entries, etc.) and some restructuring has to be done first. High-quality dictionaries can usually be set up rather quickly. It also depends partly on how hard the person works towards a really good result.
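Exact duplicate entries are one of the easier problems to find automatically. A minimal sketch that reports lines occurring more than once in a tab-separated inputdictionaryfile; it only catches identical lines, while entries with the same headword but different translations still need a human decision:

```python
from collections import Counter

def find_duplicates(lines):
    """Return entries that occur more than once, mapped to their counts.

    Lines are compared after stripping trailing whitespace, so 'a<TAB>b' and
    'a<TAB>b  ' count as the same entry. Blank lines are ignored.
    """
    counts = Counter(line.rstrip() for line in lines if line.strip())
    return {entry: n for entry, n in counts.items() if n > 1}
```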

Fortunately, for the existing dictionaries that you can download from the DictionaryForMIDs web site, this task has already been done. That means that for every dictionary on the DictionaryForMIDs web site there is already an inputdictionaryfile. Collecting all of these is what I described under 'Task 1'.

Just ask if you have any further questions!

Greetings,
Gert