Setting up the 'VocTrainVH-dictionary'

Started by Gert, 06. February 2010, 20:50:49

Gert

Colleagues,

we received the "VocTrainVH-dictionary" from Sabine from Vox Humanitatis and from Zdenek. This dictionary contains almost 30 languages.

For testing purposes, I just tried to do a "quick-and-dirty" set-up of this dictionary, but DictionaryGeneration reported an error:

Checking: ..\..\Related Files\Dictionaries\vox\VocTrainVH-eng.txt

Number of separator-characters is not correct in line 359

Number of found separator-characters: 30 / expected: 29


Here are the files that I used: http://www.kugihan.de/dict/download/test_versions/3.4vox/vox dictionary.zip


  • File VocTrainVH-eng.xls:  the file that I received
  • File VocTrainVH-eng.txt:  the xls-file saved as a UTF-8 encoded, tab-separated txt file. Note that I moved the first 2 columns to the end. I also removed the top line.
  • DictionaryForMIDs.properties: the corresponding properties file.


I have not yet spent time checking what is wrong at line 359; maybe there is a tab within a word, so that in total there are 30 separator characters instead of 29?

Sabine, could you maybe have a look at that?
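A quick way to test that guess outside of DictionaryGeneration could be a small script that counts the separators on every line. This is only a sketch: the function names `find_bad_lines` and `check_file` are made up for illustration, and it assumes the inputdictionaryfile is UTF-8 encoded with tab as separator, as described above.

```python
def find_bad_lines(lines, expected_separators, separator="\t"):
    """Return (line_number, found_count) for each line with a wrong separator count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Ignore the line ending, then count the separator characters.
        found = line.rstrip("\r\n").count(separator)
        if found != expected_separators:
            bad.append((number, found))
    return bad


def check_file(path, expected_separators, separator="\t", encoding="utf-8"):
    """Run find_bad_lines over a text file (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return find_bad_lines(handle, expected_separators, separator)
```

Calling `check_file("VocTrainVH-eng.txt", 29)` would then list every line that does not have exactly 29 tabs, which should make it easier to see whether line 359 is the only one affected.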

Best regards,
Gert

Sabine Emmy Eller

Yes, the issue should probably be this one:
http://www.voxhumanitatis.org/content/leading-spaces-finnish-terminology

I found leading spaces/tabs in the Finnish terminology - this was updated last night. On my side it does not cause problems, since I work with semicolon as separator.

Attached is a file with the corrected cells. If you can copy/paste the Finnish language column into the big table, please do - if you don't have time, I can probably do it tomorrow.

Sabine

Gert

Ok thanks - I will look at your update.

DfM can handle any separator character, including semicolons; however, I do not know how to save an Excel file with semicolons as separator. Also, I believe it is possible to replace each tab inside a cell with "\t" (backslash+t): then those tabs are preserved for DfM and the tab can still be used as the separator.

Keep you updated.

Gert

Gert

I am sorry, I just failed to include your update ... well I told you that I am not very fluent with that file handling ;)

I checked the original (non-updated) excel sheet: there are 46 tabs in that file, which probably cause the problem.

Till next time!
Gert

Sabine Emmy Eller

Ok, so the best thing is: do a search/replace within the table. Whenever I find something wrong I create an issue on our website, because often I simply don't have the time to do things immediately.

But now you also see what I mean when I say about translators "they are not used to how software actually works" - well, some of them are, but most of them are not, therefore we must make it as easy as possible to get the UI-translations. I don't know how often I have repeated: please don't copy/paste from one application to another, because you don't know which "rubbish" you pull in ... I prefer three correct entries to twenty that then need to be corrected and that screw up all the other stuff.

I better go to bed now ... it's late. Good night to everyone who reads :-)

Gert

Seems that I am really not the right person to set up your dictionary ... I do not know how to create the inputdictionaryfile from your Excel file. Here is the situation:

(1) I have an Excel file that has tabs in it. So saving as tab-separated file will not work. But I do not know how to remove the tabs in Excel before saving.

(2) DfM supports any separator character, for example also comma or semicolon. However, I do not know how to save the Excel file with such a separator while keeping a sensible character encoding such as UTF-8 (DfM supports plenty of encodings; UTF-8 is the usual choice).

(3) The update on Finnish that you sent before, I do not know how to read that with Excel (all text is put in the first column).


Sabine, I believe there is nothing wrong with your dictionary (no issue to create); the problem is just that I do not know how to convert the Excel file.
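One possible way around points (1) and (2), sketched in Python rather than in Excel: this assumes the sheet can first be exported as a CSV file in which cells containing special characters are quoted, so that Python's `csv` module can read it cell by cell. The names `clean_cell`, `rows_to_lines` and `convert_csv` are made up for this sketch, and I have not tried it on the real file.

```python
import csv


def clean_cell(cell):
    """Strip leading/trailing whitespace and collapse inner whitespace runs
    (including tabs and line breaks) into single spaces."""
    return " ".join(cell.split())


def rows_to_lines(rows, separator="\t"):
    """Turn rows of cells into separator-joined lines with cleaned cells."""
    lines = []
    for row in rows:
        cells = [clean_cell(cell) for cell in row]
        for cell in cells:
            # A separator left inside a cell would shift all following columns.
            if separator in cell:
                raise ValueError(f"separator still present in cell: {cell!r}")
        lines.append(separator.join(cells))
    return lines


def convert_csv(csv_path, txt_path, separator="\t", in_encoding="utf-8"):
    """Read a CSV export and write a UTF-8 file using the chosen separator."""
    with open(csv_path, encoding=in_encoding, newline="") as source:
        lines = rows_to_lines(csv.reader(source), separator)
    with open(txt_path, "w", encoding="utf-8", newline="\n") as target:
        for line in lines:
            target.write(line + "\n")
```

With the default tab separator, stray tabs and leading spaces inside cells are replaced before writing, so every line should end up with the same number of separators. With `separator=";"` the sketch stops with an error if a cell itself contains a semicolon, instead of silently producing a shifted line.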


In case anyone likes to set up the dictionary, just run DictionaryGeneration as described here: http://dictionarymid.sourceforge.net/newdict.html#SetupGeneration

E.g.
java -jar DictionaryGeneration.jar  inputdir/VocTrainVH-eng.txt  outputdir/dictionary  inputdir

In inputdir there must be the file VocTrainVH-eng.txt and the file DictionaryForMIDs.properties. The outputdir must contain an empty subdirectory with the name dictionary.
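The directory requirements above can be checked before starting Java. The following is only a sketch in Python: the function name `prepare_generation` is made up, and it assumes that DictionaryGeneration.jar lies in the current directory. The argument order is copied from the command line shown above.

```python
from pathlib import Path


def prepare_generation(input_dir, output_dir, dictionary_file="VocTrainVH-eng.txt"):
    """Check the input files, create the empty 'dictionary' subdirectory and
    return the command line as a list of arguments (hypothetical helper)."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)

    # Both files must be present in the input directory.
    required = [input_dir / dictionary_file,
                input_dir / "DictionaryForMIDs.properties"]
    missing = [str(path) for path in required if not path.is_file()]
    if missing:
        raise FileNotFoundError("missing input files: " + ", ".join(missing))

    # The output directory needs an empty subdirectory named 'dictionary'.
    dictionary_dir = output_dir / "dictionary"
    dictionary_dir.mkdir(parents=True, exist_ok=True)
    if any(dictionary_dir.iterdir()):
        raise RuntimeError(f"{dictionary_dir} is not empty")

    return ["java", "-jar", "DictionaryGeneration.jar",
            str(input_dir / dictionary_file), str(dictionary_dir), str(input_dir)]
```

The returned list can be passed to `subprocess.run(..., check=True)` to start the generation, provided Java is installed and on the path.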


Regards,
Gert

Gert

Ok, after some more 'playing around' with Excel I could generate a 'quick-and-dirty' set-up for DfM. I removed the lines that caused trouble and I copied another language to FIN.

Here is the result: http://www.kugihan.de/dict/download/test_versions/3.4vox/vox-quick-and-dirty.zip

The inputdictionaryfile is UTF-16 encoded. I entered the correct DisplayText only for the first two languages (English and Russian), so that Zdenek's flags are shown and the UI translation works for these languages. I also added the NormationClass for English and Russian.

This is only a very first step to demonstrate that this dictionary works well with DictionaryForMIDs.

Regards,
Gert