LEX2ALL
An article from the Loria Wiki.
LEX2ALL is a lexicon converter. It converts lexicons written in a specific format (.lex files) to the DyALog, LLP2 or Geni format. It is available locally through the darcs version control system. It is written in Haskell and provides the lexConverter executable.
See also Depots_de_lexiques.
Note that this converter works with both syntactic and morphological lexicons. To see the available options, run:
lexConverter -h
Lexicon format
NB: The first version of this format was designed by Benoit Crabbé.
Lemmas (i.e. syntactic lexicon)
The lex format currently in use allows for many pieces of information (including semantics). Its fields are:
*ENTRY: used to store the lemma,
*CAT: syntactic category,
*SEM: semantic information (at the time of writing, a macro call; these macros are extracted automatically from the metagrammar by XMG),
*LAM: semantic information represented as lambda terms (optional field),
*ACC: verb acception, i.e. word sense (unused),
- for words having several meanings, e.g. parler: jean parle anglais ; jean parle à marie
*FAM: the family (i.e. subcategorization frame),
*EX: list of exceptions (in tagml, a list of features having the value "-") (unused),
*EQUATIONS: anchoring equations, of the form:
- node -> [top|bot.]feat = val
*COANCHORS: coanchor equations, of the form:
- node -> lemma / category
Coanchor equations specify a lexical item that has to be added to the tree.
N.B.
- all these fields are ordered.
Example:
*ENTRY: ce
*CAT: cl
*SEM: basicProperty[rel=ce]
*ACC: 1
*FAM: CliticT
*FILTERS: []
*EX: {}
*EQUATIONS:
anc -> func = suj
anc -> refl = -
*COANCHORS:
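As a rough illustration of how the layout above can be read, here is a minimal Python sketch. It is not LEX2ALL's own parser (which is written in Haskell); the names parse_entry and parse_equation are hypothetical. It assumes that every field starts with "*NAME:", that the lines following a field belong to it, and that "%" starts a comment (an assumption taken from the Emacs mode on this page).

```python
import re

# Field names taken from the field list and the example above.
FIELDS = ["ENTRY", "CAT", "SEM", "LAM", "ACC", "FAM", "FILTERS", "EX",
          "EQUATIONS", "COANCHORS"]

FIELD_RE = re.compile(r"^\*([A-Z]+)\s*:\s*(.*)$")
# Anchoring equation: node -> [top|bot.]feat = val
EQUATION_RE = re.compile(r"^(\w+)\s*->\s*(?:(top|bot)\.)?(\w+)\s*=\s*(\S+)$")

SAMPLE = """\
*ENTRY: ce
*CAT: cl
*SEM: basicProperty[rel=ce]
*ACC: 1
*FAM: CliticT
*FILTERS: []
*EX: {}
*EQUATIONS:
anc -> func = suj
anc -> refl = -
*COANCHORS:
"""


def parse_entry(text):
    """Parse one entry into a dict mapping field name -> list of value lines."""
    entry = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or comment (assumed)
            continue
        match = FIELD_RE.match(line)
        if match:
            current, value = match.group(1), match.group(2)
            if current not in FIELDS:
                raise ValueError("unknown field: " + current)
            entry[current] = [value] if value else []
        elif current is not None:
            entry[current].append(line)  # continuation line of the last field
        else:
            raise ValueError("text before the first field: " + line)
    return entry


def parse_equation(line):
    """Split 'node -> [top|bot.]feat = val' into its four parts."""
    match = EQUATION_RE.match(line.strip())
    if match is None:
        raise ValueError("not an anchoring equation: " + line)
    node, layer, feat, val = match.groups()
    return {"node": node, "layer": layer, "feat": feat, "val": val}
```

Calling parse_entry(SAMPLE) gives one list of lines per field, and each line stored under EQUATIONS can then be passed to parse_equation. The sketch does not check the order of the fields.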
Emacs mode
Thanks to Jerôme Perrin.
(require 'generic-x)
(define-generic-mode 'lex-mode
  '("%")                                ;; comments
  '("ENTRY" "CAT" "SEM" "LAM" "ACC" "FAM" "FILTERS" "EX" "EQUATIONS" "COANCHORS") ;; keywords
  '(;; entry names; " *" lets the pattern match both "ENTRY: ce" and "ENTRY : ce"
    ("ENTRY *: +\\(\\sw[a-zA-Z0-9_.-]*\\)" 1 'font-lock-type-face)
    ("include [a-zA-Z0-9.]+" . font-lock-constant-face))
  '("\\.lex\\'")                        ;; file extension (dot escaped)
  nil
  "Major mode for lex editing")
This mode is associated with files having a .lex extension.
Morphological items
Morphological entries are of the form
lexical item <tabulation> associated lemma <tabulation> [ feat1 = val1 ; feat2 = val2 ; etc (optional ";")]
Here are some examples:
aime aimer [pos = v; mode = subj; pers = 3; num = sg;]
ait avoir [pos = v; mode = subj; pers = 3; num = sg;]
ami ami [pos = adj; gen = m; num = sg;]
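The entry layout described above can be read with a few lines of Python. This is only a sketch under the assumptions stated on this page (three tab-separated columns, features between square brackets, ";" as separator with an optional trailing ";"); the function name parse_morph_line is hypothetical and is not part of LEX2ALL.

```python
def parse_morph_line(line):
    """Split 'form<TAB>lemma<TAB>[feat = val; ...]' into (form, lemma, features)."""
    # Exactly three tab-separated columns are expected; anything else
    # raises ValueError when unpacking.
    form, lemma, feats = line.rstrip("\n").split("\t")
    feats = feats.strip()
    if not (feats.startswith("[") and feats.endswith("]")):
        raise ValueError("features must be enclosed in [ ]: " + feats)
    features = {}
    for pair in feats[1:-1].split(";"):
        pair = pair.strip()
        if not pair:  # tolerate the optional trailing ";"
            continue
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected 'feat = val', got: " + pair)
        features[name.strip()] = value.strip()
    return form, lemma, features
```

For instance, parse_morph_line("ami\tami\t[pos = adj; gen = m; num = sg;]") yields the form, the lemma, and a dictionary with the keys pos, gen and num. All values are kept as strings.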
Output formats
- DyALog: XML format
- LLP2: XML format (aka tagml)
- Geni: text format (for lemmas only)
Installation and usage
Installation:
cd LEX2ALL/
make
make install
Usage: lexConverter [OPTION...] files...
  -v         --verbose          verbose output on stderr
  -H, -h     --help             show help
  -L         --lemmas           Converting mode -> lemmas
  -M         --morph            Converting mode -> morphological items
  -r         --recode           To convert from Latin1 to UTF8
  -d         --dyalog           Output format: dyalog
  -l         --llp2             Output format: llp2
  -g         --geni             Output format: geni
  -x         --xml              Output format: geni XML
  -t         --tulipa           Output format: TuLiPA
  -o [FILE]  --output[=FILE]    output FILE (default: stdout)
  -i [FILE]  --input[=FILE]     input FILE (default: stdin)
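To show how these options combine, the following Python sketch assembles a lexConverter command line from the long option names listed above. The helper build_command and the file names are hypothetical, and the sketch has not been checked against the real executable; it only assumes that one converting mode and one output format are given per call, and that files are passed in the documented --input=FILE / --output=FILE form.

```python
# Long option names copied from the usage message above.
MODE_FLAGS = {"lemmas": "--lemmas", "morph": "--morph"}
FORMAT_FLAGS = {
    "dyalog": "--dyalog",
    "llp2": "--llp2",
    "geni": "--geni",
    "xml": "--xml",
    "tulipa": "--tulipa",
}


def build_command(mode, fmt, input_file, output_file=None, recode=False):
    """Return the argument list for one lexConverter call (hypothetical helper)."""
    cmd = ["lexConverter", MODE_FLAGS[mode], FORMAT_FLAGS[fmt]]
    if recode:  # Latin1 -> UTF8 conversion
        cmd.append("--recode")
    cmd.append("--input=" + input_file)
    if output_file is not None:  # otherwise the result goes to stdout
        cmd.append("--output=" + output_file)
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True) once lexConverter is installed and on the PATH.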
Page categories: GenI | Haskell | TSNLP | TALARIS
