SemConst/Documentation

An article from Loria Wiki.

Technically speaking, SemConst is an XMG / DyALog wrapper, which performs semantic construction for Tree Adjoining Grammars.


Underlying ideas

Background

(We assume that the reader already has some knowledge of Tree Adjoining Grammars; in any case, references are given.)

What we want to do with SEMCONST is to build the basic compositional semantics of sentences. To do this, we make several choices:

  • the syntactic formalism we use to describe natural language is Tree Adjoining Grammars (TAG, see [Joshi et al, 75]; [Joshi and Shabes, 97]).
  • the semantic representation we handle is flat semantics, a kind of predicate logic with no recursive predicates (for instance see [Bos, 95]).
  • the semantic compositional operation is union modulo unification (the predicates we handle contain as arguments either constants or unification variables).

In this context, the syntax / semantics interface is expressed within a TAG by associating each elementary tree of a TAG with elementary flat semantic formulas. These formulas share unification variables with specific node features.

Thus, semantic construction proceeds as follows. During tree rewriting, node features are unified (see the definition of the adjunction and substitution operations in TAG). As a consequence, semantic arguments are updated (more precisely, unified). At the end of a derivation, by taking the union of all elementary semantic formulas, we obtain the semantic representation corresponding to the sentence described by the lexical items at the frontier of the final tree. This representation is a list of predicates.
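As an illustration of union modulo unification, here is a minimal sketch; the helper names and the jean / dormir formulas are ours (not SemConst code). Unifying the variables shared between two elementary formulas and then taking their union yields the final flat representation.

```python
# Minimal sketch of union modulo unification over flat semantics.
# Conventions (ours, for illustration): variables are uppercase
# strings ("X"), constants are lowercase strings ("jean").

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings to the current value of a term.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    # Return an extended substitution, or None on a clash.
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    return None

# Elementary formulas (label, predicate, arguments): the tree anchored
# by "jean" contributes jean(J), the one anchored by "dort" dormir(E, Arg).
jean = [("l1", "jean", ["J"])]
dort = [("l2", "dormir", ["E", "Arg"])]

# Substituting the "jean" tree into the subject slot unifies the
# variables shared through the node features: J with Arg.
subst = unify("J", "Arg", {})

# The final representation is the union of the elementary formulas,
# with the substitution applied to every argument.
semantics = [(label, pred, [walk(a, subst) for a in args])
             for label, pred, args in jean + dort]
```

After unification, both predicates share the same argument, which is exactly the effect feature unification has on the semantic arguments during derivation.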

This semantic construction process is detailed in [Gardent and Kallmeyer, 03].

NB: several proposals have been made concerning semantic construction with TAG (see e.g. [Kallmeyer and Joshi, 99]; [Kallmeyer, 02]; [Seddah, 04]). Up to now, no consensus has been reached on whether the derived tree or the derivation tree contains the most appropriate information, and thus is the best place to perform semantic construction.

Additional comments

Although several attempts have been made to perform semantic construction with TAG, none has been shown to work with wide-coverage grammars. One target of our approach is to fill this gap (note that such a task is possible with the LKB system for Head-driven Phrase Structure Grammars).

How do we do this? We use techniques from grammar representation. The idea is to design not a grammar, which would be a painful task, but a reduced description of the grammar, a metagrammar. A specific device then expands this description to obtain the whole grammar. The metagrammatical framework we use is eXtensible MetaGrammar (see [Duchier et al, 04]).

When dealing with a huge number of trees, parsing may become demanding in terms of memory and time. That is why SemConst uses the DyALog system, which relies on dynamic programming and tabulation techniques to compile a very efficient TAG / TIG parser from a given grammar.

DyALog-made TAG parsers produce a derivation forest as output. SemConst uses that forest to extract the derivations that lead to complete parses. A specific Prolog module then performs the semantic construction itself from (a) these parses and (b) the semantic information contained in the trees involved in these parses.

SemConst's input

What about the resources ?

What resources are needed to perform semantic construction?

  • First, we need a grammar (TAG) whose rules include semantic information. More precisely, the grammar we consider is made of tree schemata associated with flat semantic formulas. With SemConst, such a grammar is compiled from a metagrammar using the eXtensible MetaGrammar formalism (see [Gardent, 06]).
  • Secondly, we need lexica. More precisely, we need both a lexicon containing lemmas and a lexicon containing morphological information. This split of the lexicon was introduced for parsing with TAG by [XTAG group, 01]. The idea is to deal with tree schemata rather than lexicalised trees, thus reducing the combinatorics that has to be handled. During parsing, the tokenizer reads a sentence, lemmatizes it, and selects the associated lemmas, which are related to tree schemata. The tokenizer can thus give the parser the information needed to select a sub-grammar containing the relevant lexicalized trees.
  • Thirdly, we use a corpus, i.e. a set of sentences which are supposed to be covered by the grammar. These are the sentences semantic construction is performed on. Note that knowing the corpus whose semantics we are going to construct before compiling the metagrammar is useful insofar as it allows us to improve the performance of the system by restricting the grammar according to the corpus' content (we compile a parser corresponding to the relevant sub-grammar).

What about the formats ?

  • Concerning the metagrammar, as argued above, we use the eXtensible MetaGrammar (XMG) formalism to produce a real-size TAG. The description language the XMG system implements is described in XMG's manual.
  • The lemmas are described using a home-made language (.lex files, see below). The reasons why we use such a language (proposed by Benoît Crabbé) are the following:
  1. this language is rich insofar as it can contain several kinds of information (path equations, flat semantics, etc).
  2. this type of lexicon can be translated by the lexConverter into several output formats (DyALog's XML, LLP2's tagml, or GenI text format).

The language we use to describe both kinds of lexica (lemmas and morphological items) is the following:

(see also lexConverter's manual)

lemmas (i.e. syntactic lexicon)

The lex format in use at the moment allows for many pieces of information (including semantics). Its fields are:
*ENTRY: used to store the lemma,
*CAT: syntactic category,
*SEM: semantic information (at the time of writing, a macro call; note that these macros are extracted automatically from the metagrammar by XMG),
*LAM: semantic information represented as lambda terms (optional field),
*ACC: verb acceptance

(for words having several meanings, such as parler, e.g.:
jean parle anglais ; jean parle à marie) unused

*FAM: the family (i.e. subcategorization frame)
*EX: list of exceptions (in tagml this is a feature list having the value "-") unused
*EQUATIONS: anchoring equations, of the form:

node -> feat = val
(To be discussed: how to extend this representation to have path equations with top and bot?)

*COANCHORS: coanchor equations, of the form:

node -> lemma / category

These equations are used to specify a lexical item that has to be added in the tree.

N.B.

  • all these fields are ordered.

Example:

*ENTRY: ce
*CAT: cl
*SEM: basicProperty[rel=ce]
*ACC: 1
*FAM: CliticT
*FILTERS: []
*EX: {}
*EQUATIONS:
anc -> func = suj
anc -> refl = -
*COANCHORS:
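A reader for entries in this format can be sketched as follows. This is a simplified illustration with hypothetical function names; the actual conversion is done by the lexConverter.

```python
# Sketch of a reader for *FIELD-style .lex entries (simplified:
# every field value is kept as a raw string, except EQUATIONS and
# COANCHORS, which are collected as (node, right-hand-side) pairs).

def parse_lex_entry(text):
    entry, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            field, _, value = line[1:].partition(":")
            current = field.strip()
            entry[current] = value.strip() or []
        elif current in ("EQUATIONS", "COANCHORS"):
            # e.g. "anc -> func = suj" or "node -> lemma / category"
            node, _, rhs = line.partition("->")
            entry[current].append((node.strip(), rhs.strip()))
    return entry

entry = parse_lex_entry("""*ENTRY: ce
*CAT: cl
*SEM: basicProperty[rel=ce]
*ACC: 1
*FAM: CliticT
*FILTERS: []
*EX: {}
*EQUATIONS:
anc -> func = suj
anc -> refl = -
*COANCHORS:""")
```

Note how the ordering of the fields (see the N.B. above) lets a single pass suffice: equation lines are attached to whichever multi-line field was opened last.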

Emacs mode

Thanks to Jérôme Perrin.

(require 'generic-x)
(define-generic-mode 'lex-mode
'("%");;comments
'("ENTRY" "CAT" "SEM" "LAM" "ACC" "FAM" "FILTERS" "EX" "EQUATIONS" "COANCHORS");;keywords
'(
  ("ENTRY : +\\(\\sw[a-zA-Z0-9_.-]*\\)" 1 'font-lock-type-face);; class names
  ("include [a-zA-Z0-9\.]+" . font-lock-constant-face)
)
'(".lex\\'") ;;file extension
nil
"Major mode for lex editing")

This mode is associated with files having a .lex extension.

Morphological items

Morphological entries are of the form

lexical item <tabulation> associated lemma <tabulation> [ feat1 = val1 ; feat2 = val2 ; etc (optional ";")]

Here are some examples:

aime    aimer   [pos = v; mode = subj; pers = 3; num = sg;]
ait     avoir   [pos = v; mode = subj; pers = 3; num = sg;]
ami     ami     [pos = adj; gen = m; num = sg;]
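A minimal reader for such tab-separated entries could look like this (an illustrative sketch, not part of SemConst; note that the three fields must really be separated by tabulations):

```python
# Sketch of a reader for morphological entries of the form:
#   item <TAB> lemma <TAB> [feat1 = val1; feat2 = val2;]
# A ValueError signals the frequent tabs-replaced-by-spaces mistake.

def parse_morph_line(line):
    fields = [f for f in line.rstrip("\n").split("\t") if f.strip()]
    if len(fields) != 3:
        raise ValueError("the three fields must be separated by tabs")
    item, lemma, raw = fields
    feats = {}
    for fv in raw.strip().strip("[]").split(";"):
        if fv.strip():
            f, _, v = fv.partition("=")
            feats[f.strip()] = v.strip()
    return item, lemma, feats

item, lemma, feats = parse_morph_line(
    "aime\taimer\t[pos = v; mode = subj; pers = 3; num = sg;]")
```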

SemConst's output

SemConst's output contains several pieces of information. More precisely, for a given sentence, SemConst's report consists of:

  1. (in interactive mode) the derivation forest computed by the DyALog-made TAG parser (this forest is represented as a Context Free Grammar, see [Villemonte De La Clergerie, 05]).
  2. the derivation duration (in seconds).
  3. the representation of the sentence's semantics (corresponding to a flat semantic formula).

For instance, here is an entry of SemConst's result log (in corpus mode):

[ Parse available for the sentence: "il fait beau" ]
Semantic representation:
[A:faire(B), C:il(B), D:beau(B)]

** parsing duration: 0.47
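The semantic representation printed in the log is a flat formula, i.e. a list of labelled predicates. As an illustration, it can be read back into (label, predicate, arguments) triples with a sketch like the following (our own helper, not part of SemConst):

```python
import re

# Illustrative reader turning a flat formula such as
# "[A:faire(B), C:il(B), D:beau(B)]" into (label, predicate, args) triples.

def read_flat_semantics(text):
    triples = []
    for m in re.finditer(r"(\w+):(\w+)\(([^)]*)\)", text):
        label, pred, args = m.groups()
        triples.append((label, pred,
                        [a.strip() for a in args.split(",") if a.strip()]))
    return triples

sem = read_flat_semantics("[A:faire(B), C:il(B), D:beau(B)]")
```

Here all three predicates share the argument B, reflecting the unifications performed during the derivation of "il fait beau".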

And here is an example of a derivation forest (interactive mode):

Sentence: c est difficile de comprendre
Answer:
    L = []
    N = 5
    A = 0

----------------------------------------------------------------------
Shared Forest

*ANSWER*{answer=> [L = [],N = 5,A = 0]}
	0 <-- [0]1
s{mode=> ind, wh=> -}(0,5)
	1 <-- ( [0]2 [v_anc]3 4 | [0]5 [v_anc]3 4 )
s{mode=> ind, wh=> -}(0,0) * s{control_gen=> f, control_num=> sg, control_pers=> 3, inv=> -, mode=> inf, princ=> -, wh=> -}(0,4)
	2 <-- ( [1]6 [21]7 [adj_anc]8 9 10 | [1]6 [21]7 [adj_anc]8 11 10 | [1]6 [21]7 [adj_anc]8 9 10 | [1]6 [21]7 [adj_anc]8 11 10 | 
              [1]6 [21]7 [adj_anc]12 9 10 | [1]6 [21]7 [adj_anc]12 11 10 | [1]6 [21]7 [adj_anc]12 9 10 | [1]6 [21]7 [adj_anc]12 11 10 )
verbose!anchor(comprendre, 4, 5, Tn0V-116, v{aux=> avoir, aux_refl=> -, inv=> -, mode=> inf}, [comprendre,comprendre])
	3 <-- 
verbose!struct(Tn0V-116)
	4 <-- 
s{mode=> ind, wh=> -}(0,0) * s{control_gen=> m, control_num=> sg, control_pers=> 3, inv=> -, mode=> inf, princ=> -, wh=> -}(0,4)
	5 <-- ( [1]13 [21]7 [adj_anc]8 9 10 | [1]13 [21]7 [adj_anc]8 11 10 | [1]13 [21]7 [adj_anc]8 9 10 | 
              [1]13 [21]7 [adj_anc]8 11 10 | [1]13 [21]7 [adj_anc]12 9 10 | [1]13 [21]7 [adj_anc]12 11 10 | [1]13 [21]7 [adj_anc]12 9 10 | 
              [1]13 [21]7 [adj_anc]12 11 10 )
cl{func=> suj, gen=> f, num=> sg, refl=> -}(0,1)
	6 <-- [cl_anc]14 15
v{cop=> +, inv=> -, mode=> ind, num=> sg, pers=> 3}(1,2)
	7 <-- [v_anc]16 17
verbose!anchor(difficile, 2, 3, Tn0vAdes1-5, adj{gen=> f, num=> sg}, [difficile,difficile])
	8 <-- 
verbose!lexical([de], 3, 4, c, [de,de])
	9 <-- 
verbose!struct(Tn0vAdes1-5)
	10 <-- 
verbose!lexical([de], 3, 4, p, [de,de])
	11 <-- 
verbose!anchor(difficile, 2, 3, Tn0vAdes1-5, adj{gen=> m, num=> sg}, [difficile,difficile])
	12 <-- 
cl{func=> suj, gen=> m, num=> sg, refl=> -}(0,1)
	13 <-- [cl_anc]18 15
verbose!anchor(c, 0, 1, TCliticT-198, cl{func=> suj, gen=> f, num=> sg, refl=> -}, [ce,c])
	14 <-- 
verbose!struct(TCliticT-198)
	15 <-- 
verbose!anchor(est, 1, 2, TCopule-201, v{cop=> +, inv=> -, mode=> ind, num=> sg, pers=> 3}, [être,est])
	16 <-- 
verbose!struct(TCopule-201)
	17 <-- 
verbose!anchor(c, 0, 1, TCliticT-198, cl{func=> suj, gen=> m, num=> sg, refl=> -}, [ce,c])
	18 <-- 

SemConst's semantic construction process

The whole algorithm

The process used to construct semantic representations is the following:

  1. we pre-process the corpus we want to cover semantically; the goal is twofold: (a) normalising it (removing capitalization and punctuation), and (b) retrieving the tree families that are associated with the lexical items used in the corpus,
  2. we use the result of the pre-processing to build a new valuation file (a valuation file, for a metagrammar, is the file containing the names of the tree families to compute, so the size of the produced grammar is directly related to the number of tree families to compute),
  3. we check the valuation file we've built in order to detect potential unknown tree families and print warning messages (such a situation may explain a lack of coverage),
  4. we compile the metagrammar according to the computed valuation file, and convert the lexica into DyALog's format (note that this is done using a Makefile),
  5. we compile the parser for the grammar previously produced,
  6. eventually, we can parse and construct the semantic representation (see below) for a given sentence,
  7. (in corpus and batch modes) we can produce statistics and compile a PDF summary (using LaTeX).
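Steps 1 to 3 above can be sketched as follows. This is an illustration only: the dictionaries and family names are hypothetical stand-ins for the morphological lexicon, the lemma lexicon (*FAM field) and the metagrammar's families; the real pre-processing is done by SemConst's scripts.

```python
import string

# Sketch of steps 1-3: corpus normalisation and valuation-file content.

def normalise(sentence):
    # (a) remove capitalisation and punctuation
    return sentence.lower().translate(
        str.maketrans("", "", string.punctuation))

def valuation(corpus, morph, lemmas, known_families):
    # morph maps inflected forms to lemmas; lemmas maps a lemma to the
    # tree families it anchors (cf. the *FAM field of .lex files).
    families, unknown = set(), set()
    for sentence in corpus:
        for token in normalise(sentence).split():
            lemma = morph.get(token, token)
            for fam in lemmas.get(lemma, []):
                (families if fam in known_families else unknown).add(fam)
    for fam in sorted(unknown):
        print("Warning: unknown tree family:", fam)  # step 3
    return sorted(families)  # content of the valuation file

morph = {"parle": "parler"}
lemmas = {"parler": ["Tn0V"], "jean": ["propername"]}
fams = valuation(["Jean parle."], morph, lemmas, {"Tn0V", "propername"})
```

The returned family list is what bounds the size of the compiled grammar: only those families are computed from the metagrammar (step 4).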

The semantic construction itself

The semantic construction process implemented in SemConst is the following :

  1. we pre-process the grammar to extract from the trees their semantic structure. By semantic structure we mean that (a) we take the semantic formula associated with the tree and (b) we build a semantic tree which corresponds to the syntactic tree with only semantic features on its nodes. As a result, we obtain two grammars in bijection with each other: (i) a purely syntactic TAG, and (ii) a semantic grammar. The link between these two grammars is maintained through tree identifiers.
  2. we perform TAG parsing with the syntactic grammar. The result of the parsing is a derivation forest.
  3. we process that derivation forest to obtain its different readings (i.e. the derivations).
  4. eventually, we recompute these derivations on the semantic trees (i.e. we build the semantic derived trees). The unifications involved in substitution and adjunction update the semantic arguments. Thus we obtain the semantic representation we were looking for.

Getting SemConst

SemConst is available only as sources (because several tools are used and different compilers are needed).

You can get the sources either through a svn repository, using the following command:

svn checkout svn://scm.gforge.inria.fr/svn/paule/trunk/SemConst

or as a tarball, here.

Installing SemConst

SemConst's installation itself is quite easy; the hardest part is to gather and install all the required tools / compilers.

Requirements

First, you need to install the following software:

  • The Oz/Mozart development environment (includes an Oz compiler, current version: 1.3.2)

All the information concerning Oz/Mozart's installation is given in the installation manual.

  • A Haskell compiler (we recommend GHC version 6.4.1 or higher)

There exist different ways of getting GHC depending on your system (e.g. Debian package). All the relevant information is given on GHC's website.

NB: you also need the Parsec library (as of version 6.6, it is no longer part of the standard library).

  • A Perl interpreter (part of most linux platforms)

Most unix-like systems already include a Perl interpreter.

  • An XSLT processor (such as xsltproc)

If there is no packaged version for your OS, you can download libxml's sources from xmlsoft's website and then invoke (for a standard installation):

./configure
make
sudo make install

  • GNU-prolog

Available as a Debian package or tarball. The install procedure (if no package is available) is the following:

  cd src                               
  ./configure [OPTIONS]                
  make                                 
  sudo make install 

  • The GNU build tools (autoconf, automake)

Available as Debian packages; otherwise the sources can be compiled the usual way:

./configure
make
sudo make install
  • The XMG system

Once you've downloaded the sources (info on the web page), just install the select package via:

ozmake --install --package=http://www.mozart-oz.org/mogul/populate/duchier/select/duchier-select__1.3.0__source__1.8.pkg

and then invoke:

cd MGCOMPILER
ozmake --upgrade

Don't forget to update your $PATH to include

${HOME}/.oz/1.3.2/bin
  • the VIEWER (TAG explorer, part of the XMG-TOOLS)

Once you've downloaded the XMG sources via anonymous svn access (see the web page), all you have to do is:

cd trunk/XMG-TOOLS/VIEWER
ozmake --upgrade
  • The Mathweb Xmlparser (XML Parser written in Oz, available as a MOGUL package)

Once you've downloaded the package at http://www.mozart-oz.org/mogul/info/mathweb/xmlparser.html, untar it via:

tar xzvf mathweb-xmlparser.tgz

then go to the source directory and invoke:

cd mathweb-xmlparser
./configure --prefix=${HOME}/.oz/1.3.2
make install

This will install the needed libraries in ~/.oz/1.3.2/cache/x-ozlib/mathweb/

  • The Inputsource Oz library (a class for reading data from an arbitrary sequence of heterogeneous sources such as files, URLs, strings and virtual strings; used by the XML parser)

Once you've downloaded the ozmake package at http://www.mozart-oz.org/mogul/populate/duchier/inputsource/duchier-inputsource__1.3.0__source__0.pkg, just type:

ozmake --install --package=duchier-inputsource__1.3.0__source__0.pkg
  • The lexConverter (a lexicon conversion program written in Haskell)

The lexConverter is available via svn access using the following command:

svn checkout svn://scm.gforge.inria.fr/svn/paule/trunk/SemConst

Alternatively, a tarball of the sources is available here.

Once you've got the sources, you can compile and install them via:

cd LEX2ALL
make
sudo make install
  • the DyALog system (Version 1.11.0 or higher)

(new) Alpage tools (DyALog, tag_utils, forest_utils) can be installed via the alpi installation software (recommended).

Otherwise, you need to install Hans Boehm's garbage collector for C prior to installing DyALog. Then the installation process is the usual one:

./configure
make
sudo make install

Don't forget to include the following path in your $LD_LIBRARY_PATH:

/usr/local/lib/DyALog

NB: if you installed DyALog using alpi, the latter will tell you the environment variables you have to define.

  • the tag_utils (Version 1.10 or higher, Perl utilities for DyALog-like TAGs)

Once you've downloaded the sources, just invoke:

tar xzvf tag_utils-1.10.tar.gz
perl Makefile.PL
make
sudo make install

Note that you need to install the following perl modules:

[ Getopt::Mixed ]
[ XML::Parser ]
[ XML::Generator ]

for these you can use the cpan command or the CPAN search.

  • the forest_utils (Version 0.07 or higher, perl utilities for DyALog-like parse forests)

Once you've downloaded the sources, just invoke:

tar xzvf forest_utils-0.07.tar.gz
cd forest_utils-0.07
perl Makefile.PL
make
sudo make install

NB: note that you can also install the forest_utils using alpi.

Note that you need to install the following perl modules:

[ Getopt::Mixed  ]
[ XML::Parser ]
[ XML::Generator ]
[ Data::Dumper ]
[ Parse::RecDescent ]

for these you can use the cpan command or the CPAN search.

Installation process

Once all the prerequisites are met, simply change to SemConst's source directory:

cd SemConst/

make the install.sh file executable:

chmod a+x install.sh

and finally invoke the install shell script:

./install.sh

NB: prior to invoking ./install.sh, you need to define the ALPAGE_PREFIX environment variable, pointing to the directory where DyALog and the other Alpage tools have been installed (e.g. /usr/local/ or /home/...).

Remark: why all these requirements?

First, when designing our semantic construction system, we wanted to reuse as much as possible existing tools whose efficiency has been proven (especially the DyALog system). Furthermore, each of these tools uses its own formats. As a consequence, we need (a) compilers for the main components (written in Haskell, Oz and DyALog) and (b) additional tools to convert the resources (xsltproc, Perl).

Here is the list of functionalities provided by SemConst's components:

  • GHC (Haskell). This is used by the lexConverter (conversion of the .lex lexica to .tag.xml, .tagml or .geni).
  • Oz/Mozart. SemConst's engine, along with the GUI, is developed in Oz.
  • xsltproc. The XMG XML format is converted to DyALog's and LLP2's formats using xsltproc.
  • GNU-prolog. The module performing semantic construction itself is written in Prolog.
  • GNU-build tools. These tools are used by the DyALog system to compile the TAG / TIG parser.
  • XMG. The XMG system is used to compile a TAG with semantics from a metagrammar.
  • the VIEWER. This XMG-TOOL is used in interactive mode only to help debugging the metagrammar.
  • the Mathweb Xmlparser. This parser generator (taking a DTD as input) is used to provide an XML parser to process the parse forest (the DyALog parser's output).
  • the Inputsource library is used by the Xmlparser to process the parse forest given by DyALog.
  • the DyALog system. It is used to compile a TAG / TIG parser from a TAG grammar.
  • the tag_utils. These are used to convert a TAG grammar (and its lexica) from DyALog's XML format into DyALog native format (clauses).
  • the forest_utils. These are used to convert the parse forest from DyALog's text format into XML.
  • Perl / Latex. Perl is used in several scripts for calling the TAG / TIG parser in batch mode, and for computing statistics. The Latex language is used to provide the user with a pdf report.

Using SemConst

SemConst's main options

The SemConst program has 4 modes:

  • interactive (used to construct the semantic representation of a given sentence)
  • corp (used to build the semantic representation of the sentences belonging to a corpus, default mode)
  • batch (used to process a directory containing corpora, a different sub-grammar is computed for each corpus)
  • all (used to process a directory containing corpora, the whole grammar is computed only once and is used for all corpora)

So to sum up, we have:

./SemConst.exe --interactive

Opens SEMCONST's GUI for semantic construction in interactive mode.

./SemConst.exe --corp

Opens SEMCONST's GUI for semantic construction in corpus mode, i.e. to perform semantic construction on a whole corpus (default).

./SemConst.exe --batch

Performs semantic construction on a directory of corpora (batch processing, no GUI, a different parser for each corpus). Note that all the relevant command line options (metagrammar, lexica, etc) must be provided.

./SemConst.exe --all

Performs semantic construction on a directory of corpora (batch processing, no GUI, a single parser covering the whole grammar). Note that all the relevant command line options (metagrammar, lexica, etc) must be provided.

SemConst's command line options

-g METAGRAMMAR_FILE

to set the metagrammar to use (if the metagrammar is split into several files, give the valuation file)

-l LEMMAS

to set the lemmas

-m MORPH

to set the morphological lexicon

-c CORPUS

to set the corpus (corp or batch mode, in the latter it must be a directory)

-o OUTPUT

to set the output (corp or batch mode, in the latter it must be a directory)

-h

to print help

-v

verbose mode

-w

to activate semantic construction (w stands for "with semantics", default is only syntactic parsing)

Important remark

The SemConst program copies the linguistic resources (metagrammar and lexica) and converts them locally. Furthermore, it compiles a parser locally, too. This is done to prevent alteration of the resources, to avoid read/write permission troubles, and to ease conversions.

One consequence of this is that you must run the SemConst.exe program from its directory:

./SemConst.exe [OPTIONS]

SemConst's architecture

The source directory is organized as follows. At its root, it contains the CeCILL license (in both English and French), which is GPL-like, the installation shell script, the GUI/engine Oz sources, and the main Makefile with its associated Makevars (the latter is completed automatically by the MakeWriter.oz program).

There are 6 sub-directories:

  • convert_tools: directory containing the xslt and perl scripts used for converting data (lexica, grammar, etc).
  • resources: local directory where all the linguistic resources are copied.
  • scripts: directory containing shell and perl scripts (used for tokenizing, parsing, semantic construction, statistics computation, corpus pre-processing, etc).
  • parser: directory where the DyALog-made parser is compiled.
  • semantics: directory containing the sources of the semantic construction module (semantic grammar, derivations).
  • tmp: directory used for statistics computation and LaTeX compilation.

FAQ (common pitfalls)

See also http://wiki.loria.fr/wiki/SemConst/Common_pitfalls

  • MetaTAG not found ?

make sure you have installed the XMG system and updated your $PATH. The MetaTAG program should then be available this way (with bash):

export PATH=${PATH}:${HOME}/.oz/1.3.2/bin
  • libbuiltins.so.1 not found ?

once you have installed DyALog, make sure you did not forget to update your $LD_LIBRARY_PATH like this (with bash):

export LD_LIBRARY_PATH=/usr/local/lib/DyALog:${LD_LIBRARY_PATH}
  • error while installing SemConst via ./install.sh ?
Can't locate Automake/Struct.pm in @INC (@INC contains: 
/usr/share/automake-1.9 /etc/perl ... .)  
at /usr/local/bin/automake-dyalog line 49.
BEGIN failed--compilation aborted at /usr/local/bin/automake-dyalog line 49.
configure: error: cannot find install-sh or install.sh in ./ "."/./

This means some part of automake (needed by the DyALog system) is not found. Check where Struct.pm is on your machine (e.g. /usr/local/share/automake) and update your PERL5LIB environment variable, or use the alpi tool.

  • SemConst crashes when it tries to compile the parser with the following message ?
/usr/bin/install -c ./presmall_header.tag small_header.tag
/usr/local/bin/dyacc -I . -analyze tag2tig trees.tag -o tig_header.tag
/usr/local/bin/dyacc -I . -autoload -parse -verbose_tag -res small_header.tag -res tig_header.tag -autoload -verbose_tag -parse -c -o tig_mg-main.o 
`test -f 'main.tag' || echo './'`main.tag
Line 24 of /home/parmenti/SEMCONST/parser/main.tag:
        Syntax Error : 's!$ft' not a feature functor
make[1]: *** [tig_mg-main.o] Fehler 1
make[1]: Verlasse Verzeichnis '/home/parmenti/SEMCONST/parser'
make: *** [parser/tig_mg] Fehler 2

In this case, it means that the s nodes have no features in the grammar, and thus DyALog does not recognize s as a valid node category.

You can fix it by adapting the preheader.tag file (i.e. declaring a feature by hand, prior to the automatic detection).

  • The parser is compiled but you can't use it ?
./scripts/semparse.sh: 31: Syntax error: Bad fd number
./scripts/semparse.sh: 42: Syntax error: Bad fd number

This seems to come from a convention used by Ubuntu Edgy, in which /bin/sh is dash rather than bash (see http://diveintomark.org/archives/2006/09/19/bad-fd-number).

  • syntactic parse available but no semantic parse ?
*** next 80 characters ***

*** end ***

%************************** scanner error ***********************
%**
%** start tag expected
%**
%** in line 1, column 1
%**
%**
%** Call Stack:
%** procedure 'Scanner,reportErr/fast' in file "./XmlScanner.oz", line 428, column 6, PC = 136898600
...

This Oz error message occurs when the forest.xml file (the derivation forest output by the parser after text-to-XML conversion) is empty. This may be the result of a missing Perl library used by DyALog's forest converter (e.g. Parse::RecDescent).

  • No semantic representation and no error message ?

Check the format of the morphological lexicon (it requires tabulations, not spaces; frequent errors come from copy-pasting).

  • No semantics available when using latin1 encoding on machines having LANG set to UTF-8 ?

Specify the encoding in the semantic construction Prolog module (semantics/dyalog.pl):

:- load_files(['lexique.pl','forest.pl'],[encoding(iso_latin_1)]).

References


In this section we do not give an exhaustive bibliography; rather, we give references to the papers that are most closely linked to this work.

[Bos, 95] J. Bos. Predicate Logic Unplugged. In proceedings of the Tenth Amsterdam Colloquium, 1995.
[Crabbé et al., 04a] Benoît Crabbé, Bertrand Gaiffe and Azim Roussanaly. Représentation et gestion du lexique d'une grammaire d'arbres adjoints in Traitement Automatique des Langues, 43,3, 2004.
[Crabbé and Duchier, 04b] Benoît Crabbé and Denys Duchier. Metagrammar Redux in Proceedings of CSLP'04, Roskilde, Denmark, 2004.
[Crabbé, 05a] Benoit Crabbé. Grammatical development with XMG, in proceedings of the Fifth International Conference on Logical Aspects of Computational Linguistics (LACL05), Bordeaux, 2005.
[Crabbé, 05b] Benoit Crabbé. Représentation informatique de grammaires fortement lexicalisées - Application à la grammaire d'arbres adjoints, Université Nancy 2, 2005.
[Duchier et al., 04] Denys Duchier, Joseph Le Roux and Yannick Parmentier. The Metagrammar Compiler: An NLP Application with a Multi-paradigm Architecture, in proceedings of the Second International Mozart/Oz Conference (MOZ'2004), Charleroi, 2004.
[Gardent, 06] C. Gardent. Intégration d'une dimension sémantique dans les grammaires d'arbres adjoints. In Actes de la conférence TALN'2006, Leuven, 2006.
[Gardent and Kallmeyer, 03] Claire Gardent and Laura Kallmeyer. Semantic construction in FTAG, in proceedings of the 10th meeting of the European Chapter of the Association for Computational Linguistics, Budapest, 2003.
[Gardent and Parmentier, 05] Claire Gardent and Yannick Parmentier, Large scale semantic construction for Tree Adjoining Grammars, In proceedings of the Fifth International Conference on Logical Aspects of Computational Linguistics (LACL05), Bordeaux, 2005.
[Joshi et al., 75] A. Joshi, L. Levy and M. Takahashi, Tree Adjunct Grammars, In Journal of Computer and System Sciences, volume 10, pages 136-163, 1975.
[Joshi and Shabes, 97] Aravind Joshi and Yves Schabes. Tree-Adjoining Grammars, In Handbook of Formal Languages, G. Rozenberg and A. Salomaa editors, Springer, Berlin, New York, volume 3, pages 69 - 124, 1997.
[Kallmeyer, 02] L. Kallmeyer. Enriching the TAG Derivation Tree for Semantics, In Stephan Busemann (ed.) KONVENS 2002. 6. Konferenz zur Verarbeitung natürlicher Sprache. Proceedings, Saarbrücken, September 2002, 67-74.
[Kallmeyer and Joshi, 99] L. Kallmeyer and A. Joshi. Factoring Predicate Argument and Scope Semantics: Underspecified Semantics with LTAG. In Paul Dekker (ed.) 12th Amsterdam Colloquium. Proceedings, December 1999, 169-174.
[Seddah, 04] Djamé Seddah. Synchronisation des connaissances syntaxiques et sémantiques pour l'analyse d'énoncés en langage naturel à partir du formalisme des Grammaires d'Arbres Adjoints, Thèse de Doctorat, Université Henri Poincaré - Nancy 1.
[Villemonte De La Clergerie, 05] Eric Villemonte de la Clergerie, DyALog: a Tabular Logic Programming based environment for NLP. In Proceedings of 2nd International Workshop on Constraint Solving and Language Processing (CSLP'05). Barcelona, Spain, 2005.
[XTAG group, 01] XTAG-Research-Group. A Lexicalized Tree Adjoining Grammar for English, IRCS, University of Pennsylvania, IRCS-01-03, 2001.

Acknowledgements and Contacts

The pdf version of this wiki page has been created using the html2ps and ps2pdf utilities.

Claire . Gardent _AT_ loria . fr

Yannick . Parmentier _AT_ loria . fr

Yparmenti, 2 May 2006 at 15:58 (CEST)
