LEX2ALL

From Loria Wiki.

LEX2ALL is a lexicon converter. It converts lexicons written in a specific format (.lex files) into DyALog, LLP2 or Geni format (Geni XML and TuLiPA output are also supported; see the options below). It is available locally through the darcs version control system. It is written in Haskell and provides the lexConverter executable.

See also Depots_de_lexiques.

Note that this converter handles both syntactic and morphological lexicons. To list the available options, run:

lexConverter -h


Lexicon format

NB: The first version of this format was designed by Benoit Crabbé.

Lemmas (i.e. syntactic lexicon)

The .lex format currently in use can store many kinds of information (including semantics). Its fields are:
*ENTRY: the lemma,
*CAT: the syntactic category,
*SEM: semantic information (at the time of writing, a macro call; these macros are extracted automatically from the metagrammar by XMG),
*LAM: semantic information represented as lambda terms (optional field),
*ACC: verb acceptance, for words having several meanings such as parler (e.g. "jean parle anglais", Jean speaks English, vs. "jean parle à marie", Jean speaks to Marie); currently unused,
*FAM: the family (i.e. subcategorization frame),
*EX: list of exceptions (in tagml, a list of features whose value is "-"); currently unused,
*EQUATIONS: anchoring equations, of the form:

node -> [top|bot.]feat = val

*COANCHORS: coanchor equations, of the form:

node -> lemma / category

These coanchor equations specify a lexical item that has to be added to the tree.
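Both equation forms are regular enough to be recognized with regular expressions. The following Python sketch is only an illustration, not LEX2ALL's (Haskell) implementation: it assumes one equation per line, node names, feature names and values without whitespace, and coanchor lemmas that may contain spaces but no "/". The names AnchorEquation, CoanchorEquation, parse_anchor_equation and parse_coanchor_equation are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class AnchorEquation(NamedTuple):
    node: str
    level: Optional[str]  # "top", "bot", or None when no prefix is given
    feature: str
    value: str


class CoanchorEquation(NamedTuple):
    node: str
    lemma: str
    category: str


# node -> [top|bot.]feat = val
ANCHOR_RE = re.compile(r"^\s*(\S+)\s*->\s*(?:(top|bot)\.)?([^\s=]+)\s*=\s*(\S+)\s*$")
# node -> lemma / category
COANCHOR_RE = re.compile(r"^\s*(\S+)\s*->\s*([^/]+?)\s*/\s*(\S+)\s*$")


def parse_anchor_equation(line: str) -> AnchorEquation:
    """Parse one anchoring equation (hypothetical helper)."""
    m = ANCHOR_RE.match(line)
    if m is None:
        raise ValueError(f"not an anchoring equation: {line!r}")
    node, level, feat, val = m.groups()
    return AnchorEquation(node, level, feat, val)


def parse_coanchor_equation(line: str) -> CoanchorEquation:
    """Parse one coanchor equation (hypothetical helper)."""
    m = COANCHOR_RE.match(line)
    if m is None:
        raise ValueError(f"not a coanchor equation: {line!r}")
    node, lemma, cat = m.groups()
    return CoanchorEquation(node, lemma, cat)
```

With the example entry below, parse_anchor_equation("anc -> func = suj") yields node "anc", no top/bot level, feature "func" and value "suj".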

N.B.

  • All these fields are ordered: they must appear in a fixed order, as in the example below.

Example:

*ENTRY: ce
*CAT: cl
*SEM: basicProperty[rel=ce]
*ACC: 1
*FAM: CliticT
*FILTERS: []
*EX: {}
*EQUATIONS:
anc -> func = suj
anc -> refl = -
*COANCHORS:
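The example also shows the physical layout of an entry: each field starts with *NAME:, and multi-line fields such as *EQUATIONS continue on the following lines. Below is a minimal Python sketch of a reader for this layout. It assumes that *ENTRY opens each new entry, that "%" starts a comment line (as declared in the Emacs mode on this page) and that the canonical field order is the keyword order of that Emacs mode. parse_lex and FIELD_ORDER are hypothetical names; this is not lexConverter's own parser.

```python
import re
from typing import Dict, List, Optional

# Assumed canonical order: the keyword list of the Emacs mode on this page.
FIELD_ORDER = ["ENTRY", "CAT", "SEM", "LAM", "ACC", "FAM",
               "FILTERS", "EX", "EQUATIONS", "COANCHORS"]
FIELD_RE = re.compile(r"^\*([A-Z]+):\s*(.*)$")

Entry = Dict[str, List[str]]


def parse_lex(text: str) -> List[Entry]:
    """Split .lex text into entries mapping each field name to its lines."""
    entries: List[Entry] = []
    current: Optional[Entry] = None
    field: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank and comment lines
            continue
        m = FIELD_RE.match(line)
        if m is None:
            if current is None or field is None:
                raise ValueError(f"text outside any field: {line!r}")
            current[field].append(line)  # continuation line, e.g. an equation
            continue
        field, value = m.group(1), m.group(2).strip()
        if field not in FIELD_ORDER:
            raise ValueError(f"unknown field *{field}")
        if field == "ENTRY":  # *ENTRY opens a new entry
            current = {}
            entries.append(current)
        elif current is None:
            raise ValueError(f"*{field} appears before any *ENTRY")
        else:
            # Enforce the fixed field order (this also rejects duplicates).
            last = next(reversed(current))
            if FIELD_ORDER.index(field) <= FIELD_ORDER.index(last):
                raise ValueError(f"*{field} is out of order after *{last}")
        current[field] = [value] if value else []
    return entries
```

Applied to the entry above, the sketch returns one dictionary whose "EQUATIONS" key holds the two equation lines and whose "COANCHORS" key holds an empty list.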

Emacs mode

Thanks to Jerôme Perrin.

(require 'generic-x)
(define-generic-mode lex-mode
  '("%")                                        ; comment character
  '("ENTRY" "CAT" "SEM" "LAM" "ACC" "FAM"
    "FILTERS" "EX" "EQUATIONS" "COANCHORS")     ; field names
  '(("ENTRY: *\\(\\S-+\\)" 1 'font-lock-type-face) ; entry (lemma) names
    ("include [a-zA-Z0-9.]+" . font-lock-constant-face))
  '("\\.lex\\'")                                ; file extension
  nil                                           ; no extra setup functions
  "Major mode for editing .lex lexicon files.")

This mode is enabled automatically for files with a .lex extension.

Morphological items

Morphological entries have the form:

lexical item <tabulation> associated lemma <tabulation> [feat1 = val1; feat2 = val2; ...]

where the ";" after the last feature is optional.

Here are some examples:

aime    aimer   [pos = v; mode = subj; pers = 3; num = sg;]
ait     avoir   [pos = v; mode = subj; pers = 3; num = sg;]
ami     ami     [pos = adj; gen = m; num = sg;]
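A morphological line can be split on its two tabulations, and the bracketed feature list on its ";" separators. The Python sketch below is a hypothetical illustration (MorphEntry and parse_morph_line are not part of LEX2ALL); it assumes that runs of tabs separate the three columns, that feature values contain no ";", and it keeps every value as a string.

```python
import re
from typing import Dict, NamedTuple


class MorphEntry(NamedTuple):
    form: str
    lemma: str
    features: Dict[str, str]


def parse_morph_line(line: str) -> MorphEntry:
    """Parse 'form<TAB>lemma<TAB>[f1 = v1; f2 = v2;]' (trailing ';' optional)."""
    parts = re.split(r"\t+", line.strip())
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-separated columns: {line!r}")
    form, lemma, feats = (p.strip() for p in parts)
    if not (feats.startswith("[") and feats.endswith("]")):
        raise ValueError(f"feature list must be bracketed: {feats!r}")
    features: Dict[str, str] = {}
    for pair in feats[1:-1].split(";"):
        if not pair.strip():
            continue  # empty item left by the optional trailing ";"
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {pair!r}")
        features[name.strip()] = value.strip()
    return MorphEntry(form, lemma, features)
```

For the first example line, the sketch returns form "aime", lemma "aimer" and the four features pos, mode, pers and num.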

Output formats

  • DyALog: XML format
  • LLP2: XML format (aka tagml)
  • Geni: text format (for lemmas only)
  • Geni XML: XML variant of the Geni output (option -x)
  • TuLiPA (option -t)

Installation and usage

Installation:
       cd LEX2ALL/
       make
       make install
Usage: lexConverter [OPTION...] files...
       -v        --verbose        verbose output on stderr
       -H, -h    --help           show help
       -L        --lemmas         Converting mode -> lemmas
       -M        --morph          Converting mode -> morphological items
       -r        --recode         To convert from Latin1 to UTF8
       -d        --dyalog         Output format: dyalog
       -l        --llp2           Output format: llp2
       -g        --geni           Output format: geni
       -x        --xml            Output format: geni XML
       -t        --tulipa         Output format: TuLiPA
       -o [FILE] --output[=FILE]  output FILE (default: stdout)
       -i [FILE] --input[=FILE]   input FILE (default: stdin)
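When scripting conversions, the options above combine as one mode flag (-L or -M), one output-format flag and the input files. The Python sketch below only assembles and runs such a command line; build_command and convert are hypothetical helpers, and lexConverter is assumed to be on the PATH. It passes the output file as --output=FILE because the [FILE] notation in the usage message marks the argument as optional, and in GetOpt-style parsers a value separated by a space from an option with an optional argument is typically read as a positional file instead. It also rejects Geni text output for morphological items, since that format is documented for lemmas only.

```python
import subprocess
from typing import List, Optional

# Short flags copied from the lexConverter usage message.
MODE_FLAGS = {"lemmas": "-L", "morph": "-M"}
FORMAT_FLAGS = {"dyalog": "-d", "llp2": "-l", "geni": "-g",
                "geni-xml": "-x", "tulipa": "-t"}


def build_command(mode: str, fmt: str, inputs: List[str],
                  output: Optional[str] = None,
                  recode: bool = False) -> List[str]:
    """Assemble a lexConverter argument list (hypothetical helper)."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}")
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown output format {fmt!r}")
    if mode == "morph" and fmt == "geni":
        # The Geni text format is documented for lemmas only.
        raise ValueError("Geni text output is only available for lemmas")
    cmd = ["lexConverter", MODE_FLAGS[mode], FORMAT_FLAGS[fmt]]
    if recode:
        cmd.append("-r")  # convert from Latin1 to UTF8
    if output is not None:
        # Attach the optional argument with "=" so it is not taken as an input file.
        cmd.append(f"--output={output}")
    cmd.extend(inputs)  # positional input files ("files..." in the usage line)
    return cmd


def convert(mode: str, fmt: str, inputs: List[str],
            output: Optional[str] = None, recode: bool = False) -> None:
    """Run lexConverter; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(mode, fmt, inputs, output, recode), check=True)
```

For example, with a placeholder file name, convert("lemmas", "geni", ["verbs.lex"], output="verbs.geni") would run lexConverter -L -g --output=verbs.geni verbs.lex.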


Retrieved from "http://wiki.loria.fr/wiki/LEX2ALL"