Is this file good for conversion?

Started by Itman, 17. July 2017, 13:51:29

Itman

Hi, here is a file that comes out of a StarDict dictionary. I don't know whether it is suitable for conversion. Could you please have a look at the attached image?
Thanks


Gert

I just had a look at the screenshot.

I guess it is not easy to convert this dictionary for DictionaryForMIDs. Which words/expressions does it look up? I mean, is it for looking up expressions such as "anders denkende"? How is such an expression identified? Does it follow a newline/carriage return?

Well, some preprocessing will certainly be required: tags such as <blockquote> or <c> would need to be converted into something that is understood by DictionaryForMIDs (a "content" declaration in this case).

Overall I think that, yes, it will be possible to convert this dictionary for DictionaryForMIDs, but it will take some effort.

By the way, which dictionary is it?

Regards,
Gert

Itman

Thank you for your answer. It's a normal Duden German-German dictionary.
In .dsl format it looks like this:



Since I create .dsl dictionaries myself, it is not a problem to convert it to a different format manually (with regular expressions). How is the DfM format built? What tags are you using?

Gert

The DfM format is documented on our web pages. You could look at:
http://dictionarymid.sourceforge.net/DfM-Creator/index.html (see "Complete Documentation" there)
http://dictionarymid.sourceforge.net/DfM-Creator/newdict.html
And for tags: http://dictionarymid.sourceforge.net/DfM-Creator/newdictContent.html

Duden, I assume, would be copyrighted material and could not be made available for public download (the licensing conditions would tell more).

Regards,
Gert

Itman

I downloaded the Merriam-Webster dictionary. Is there a possibility to convert it back to txt?

And how do I mark italics, bold text and colors in the txt file?

Gert

1. Merriam-Webster
What do you mean by "convert it back to txt"? Do you mean converting it into a format that is readable by DfM-Creator ("Input CSV file")?


2. Italics etc.
You can define italics etc. via the Contents. From http://dictionarymid.sourceforge.net/DfM-Creator/newdictContent.html:

languageXContentNNFontStyle
Defines the font style for the content. Allowed values are provided in the ComboBoxes as follows:
bold
italic
underlined
plain

Examples from the screen-shot:

language1Content01FontStyle:plain
language1Content02FontStyle:italic

language2Content01FontStyle:bold
language2Content02FontStyle:italic

language3Content01FontStyle:underlined
language3Content02FontStyle:bold

This property is optional; the default value is plain.

Itman

Sorry, I don't get it. Look at my first screenshot. I mark bold with <b> </b>. How should I tell languageXContentNNDisplayText to mark it bold?

Yes, I mean back to the input file.

Gert

Example from your last screenshot:

Source:
[b][c blue]Abartung[/c][/b]

Convert for DfM to:
[01Abartung]

To make "Abartung" bold and blue (assuming it is the column for language2):
language2Content01FontStyle:bold
language2Content01FontColour:0,0,255
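
Since you already work with regular expressions, the conversion above could be sketched in Python roughly like this. It is only a minimal sketch: the function name dsl_headword_to_dfm and the pattern HEADWORD are made up for illustration, and it assumes that every bold blue span in the .dsl source is meant to become content 01.

```python
import re

# Assumption: a bold + blue span in the .dsl source maps to content 01.
# The pattern matches exactly the tag order shown in the example above.
HEADWORD = re.compile(r"\[b\]\[c blue\](.*?)\[/c\]\[/b\]")

def dsl_headword_to_dfm(text: str) -> str:
    """Replace [b][c blue]...[/c][/b] with the DfM content syntax [01...]."""
    return HEADWORD.sub(lambda m: "[01" + m.group(1) + "]", text)

print(dsl_headword_to_dfm("[b][c blue]Abartung[/c][/b]"))
```

The styling itself (bold, blue) is then no longer in the text; it comes from the two language2Content01 properties.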



Here is the complete example from http://dictionarymid.sourceforge.net/DfM-Creator/newdictContent.html:

Content tags for the dictionaries

In the dictionaries the content parts are marked with the following syntax:

Each content has a start delimiter at the beginning and an end delimiter at the end.

Start delimiter:
[NN
where NN is the content number. This needs to be a two-digit number!

End delimiter:
]

To use a [ or ] character in the text (without content syntax) a \ (backslash) must be prepended:  \[ and \]
A newline-character is \n and a tab-character is \t

Here is an example for a language2 column:

dictionary [01dikshionari] [02noun] [03\nA book that contains translations for words.]
(Content numbers are boldfaced only for didactical purposes)
In that example the following properties are declared:

language2Content01DisplayText:contentPronunciation
language2Content02DisplayText:contentGrammaticalCategory
language2Content03DisplayText:contentNotes


Contents can also be nested. Example:

dictionary [01dikshionari] [03\nA book that contains translations for words. [02noun]\nAlso exists in electronic form]
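
To check a converted line before loading it into DfM-Creator, the syntax described above can be read with a small Python sketch like the following. This is not the parser used by DictionaryForMIDs; parse_contents is a hypothetical helper. It assumes that text inside a nested content belongs to the innermost content, and it leaves any backslash sequence other than the four documented ones unchanged.

```python
def parse_contents(text):
    """Split a DfM column into (content_number, text) runs.

    content_number is None for text outside any content.
    """
    runs = []        # collected (number, text) pairs, in reading order
    stack = [None]   # open contents, innermost last; None = no content
    buf = []         # characters of the run currently being read
    escapes = {"n": "\n", "t": "\t", "[": "[", "]": "]"}

    def flush():
        if buf:
            runs.append((stack[-1], "".join(buf)))
            buf.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # Documented escapes are decoded; anything else is kept as-is
            # (assumption, the documentation does not cover this case).
            buf.append(escapes.get(nxt, ch + nxt))
            i += 2
        elif ch == "[":
            number = text[i + 1:i + 3]
            if len(number) != 2 or not (number.isascii() and number.isdigit()):
                raise ValueError(f"expected two-digit content number at {i}")
            flush()
            stack.append(number)
            i += 3
        elif ch == "]":
            if len(stack) == 1:
                raise ValueError(f"unmatched ] at {i}")
            flush()
            stack.pop()
            i += 1
        else:
            buf.append(ch)
            i += 1
    if len(stack) != 1:
        raise ValueError("missing ] at end of text")
    flush()
    return runs


example = (r"dictionary [01dikshionari] [03\nA book that contains "
           r"translations for words. [02noun]\nAlso exists in electronic form]")
for number, run in parse_contents(example):
    print(number, repr(run))
```

A line that raises ValueError here has an unescaped [ or ] somewhere, which is the most likely thing to go wrong after a regular-expression conversion.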

Itman

What am I doing wrong?




Gert

language1 (= column 1) does not have content declarations -> Language-1 number of content declarations = 0
language2 (= column 2) does have content declarations -> Language-2 number of content declarations = 2 or 3 (or other value)

Strings such as "<c>", "<blockquote>" and "&lt" need to be replaced in column 2.

Regards,
Gert

Itman

Quote from: Gert on 17. July 2017, 19:11:53
language1 (= column 1) does not have content declarations -> Language-1 number of content declarations = 0


That's not possible. Each time I save it, it jumps back to Language-1 (3) and Language-2 (0).

Itman

The creator doesn't let me have 0 declarations in Language 1.

Gert

Hmmm, that is probably something that should be looked at by Karim (the developer of DfM-Creator).

You could simply set it to 1 for language1; this should not cause any problems.

Regards,
Gert

Itman

It did not change anything. But maybe I am doing something else wrong?




Gert

From a quick look, the configuration in DfM-Creator seems OK to me.

Still, in the dictionary, strings such as "<c>", "<blockquote>" and "&lt" need to be replaced in column 2. DictionaryForMIDs does not know any of these strings, so when you use the dictionary they will be displayed as plain text.

When you convert the dictionary from the source, you need to convert/filter strings like "<c>", "<blockquote>", ...
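
As a rough illustration of such a filter, here is a minimal Python sketch. The names clean_column and TAG are hypothetical. It simply drops the tags instead of mapping them to contents, decodes entities such as "&lt;", and escapes literal [ and ] so they are not read as content delimiters. It should therefore run before any [NN ...] content tags are added to the column.

```python
import html
import re

# Matches opening and closing tags such as <c>, </c> or <blockquote>.
TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def clean_column(text: str) -> str:
    """Remove markup that DictionaryForMIDs would show as plain text."""
    text = TAG.sub("", text)      # drop the tags, keep the text between them
    text = html.unescape(text)    # "&lt;" -> "<", "&amp;" -> "&", ...
    # Escape literal brackets so they are not taken for content delimiters.
    return text.replace("[", r"\[").replace("]", r"\]")

print(clean_column("<blockquote><c>Abart</c> &lt;die&gt; [selten]</blockquote>"))
```

Stripping the tags before decoding the entities keeps a literal "<" from the source text from being mistaken for the start of a tag.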

Regards,
Gert