Common grammar manifesto/Lexical macros

An article from Loria Wiki.


Introduction

We aim to represent the semantics of lexical entries in a way which:

  1. is convenient for the linguist to write
  2. allows GenI to know that the entry should be selected (on the basis of its input semantics)
  3. provides an interface to the tree semantics (and thus allows SelectTAG to perform tree anchoring)

Proposal

We propose a macro language that provides expansions in two dimensions: an input semantics dimension and a tree interface dimension.

Macro definition

Let's start with an example macro:

define(`binaryVerb', `L:$1(E) L:$2(E A) L:$3(E B)
* INTERFACE: [ rel=$1
             , theta1=$2
             , theta2=$3
             , arg1=A
             , arg2=B ]')

How would this macro be used? Let's consider the semantics of the verb eat, expressed in macro form:

binaryVerb(eat, agent, patient)
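
To make the two dimensions concrete, here is a minimal Python sketch of what expanding such a macro call could produce. It is not the m4 mechanism itself: the function name `expand_binary_verb` and the dictionary layout are hypothetical, chosen only for illustration.

```python
def expand_binary_verb(rel, theta1, theta2):
    """Return the two expansions of a binaryVerb-style macro call.

    Hypothetical helper: it only mimics the substitution of $1, $2 and $3
    in the macro body shown above.
    """
    # Input semantics dimension: labelled literals. L, E, A and B stay variables.
    semantics = f"L:{rel}(E) L:{theta1}(E A) L:{theta2}(E B)"
    # Tree interface dimension: feature/value pairs. A and B are the same
    # variables as in the semantics, which is what links the two dimensions.
    interface = {
        "rel": rel,
        "theta1": theta1,
        "theta2": theta2,
        "arg1": "A",
        "arg2": "B",
    }
    return semantics, interface


semantics, interface = expand_binary_verb("eat", "agent", "patient")
print(semantics)
print(interface)
```

The point of returning both values from a single call is that the variables A and B are written once and shared, so the two expansions cannot drift apart.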

Automatic macro definition?

These macros can either (a) be defined manually or (b) be extracted automatically from the metagrammatical description (the semantic classes).

  • Concerning (a), there is not much to say, except that it gives maximum control over the macro definitions and that the macros can be developed relatively quickly (there is no scaling issue, since there should not be a large number of macros).
  • The second approach (b) gives a way to guarantee the correspondence between the macros and the semantic information specified in the metagrammar (i.e. inside the trees).

Status of the macro body

What is the status of the representations and equations that the macro definition contains? The macro definitions must contain both (a) the semantic representation that the GenI generator expects as input for lexical selection and (b) the (selection and filtering) equations that the SELECTOR needs to perform tree anchoring.

  • Concerning (a): the semantic representations consist of a list of labelled predicates with arguments:
L:$1(E) L:$2(E A) L:$3(E B)

Note that the names of the predicates are arguments of the macro. These predicates correspond to the predicates defined within the semantic classes of the metagrammar. What about the values of the predicate arguments? Do we need to associate identifiers with them or not?

The answer is yes, so that we can ensure the correspondence between the trees' semantics and the input semantics after anchoring. This is discussed below, with the use of interface equations (features arg1 and arg2).

  • Concerning (b): the anchoring equations associate interface features with specific values that come from the lexicon.
[ rel=$1
     , theta1=$2
     , theta2=$3
     , arg1=A
     , arg2=B ]

Here again, the feature names rel, theta1, theta2, arg1 and arg2 are those of the interface associated with the trees in the metagrammar. The values of the first three features are given by the parameters of the macro. What about the other two, arg1 and arg2? Their role is to provide a convenient way to ensure consistency between the trees' semantics and the input semantics during anchoring (and thus between the trees' semantics and the semantic features on nodes), i.e. to put the semantic values in the right place in the formulas associated with the trees. In our example, arg1 and arg2 hold the arguments A and B of the $2 and $3 predicates of the input semantics, respectively.

We should note three things here:

  1. the names arg1 and arg2 are arbitrary (they are defined within the metagrammar).
  2. (?) we could avoid using these features, but this would make anchoring more complicated: on top of the anchoring equations, it would have to perform unification on the input semantics to ensure that the semantics is correctly updated (on the nodes).
  3. the link between the semantic representation and the interface equations (identifiers A and B in our example) is guaranteed by the metagrammar (it is explicitly defined within the semantic classes).

Macro expansions

GenI expansion

This is the more obvious of the two expansions. Given an input semantics, we want to select any lexical item whose semantics subsumes the input.

Given an input like:

h1:eat(e1) h1:agent(e1 x) h1:patient(e1 y)

We would select the lexical item eat, because its semantics subsumes it (identifiers beginning with uppercase letters represent variables):

L:eat(E) L:agent(E A) L:patient(E B)
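
The subsumption test can be sketched in Python as follows. This is only an illustration under stated assumptions, not GenI's actual algorithm: literals are encoded as flat tuples (label, predicate, arguments...), the function names are hypothetical, and the three input literals are assumed to share one label, h1, so that the single label variable L can be bound consistently.

```python
def is_variable(term):
    # Convention from the text: identifiers beginning with an uppercase
    # letter are variables.
    return term[:1].isupper()


def match_literal(lex_lit, input_lit, bindings):
    """Match one lexical literal against one input literal.

    Returns the extended bindings, or None if the two do not match.
    """
    if len(lex_lit) != len(input_lit):
        return None
    new = dict(bindings)
    for lex_term, in_term in zip(lex_lit, input_lit):
        if is_variable(lex_term):
            # A variable must keep the value it was first bound to.
            if new.setdefault(lex_term, in_term) != in_term:
                return None
        elif lex_term != in_term:
            return None
    return new


def subsumes(lex_sem, input_sem, bindings=None):
    """Return bindings if every lexical literal matches some input literal
    under one consistent substitution, else None (simple backtracking)."""
    bindings = {} if bindings is None else bindings
    if not lex_sem:
        return bindings
    first, rest = lex_sem[0], lex_sem[1:]
    for candidate in input_sem:
        extended = match_literal(first, candidate, bindings)
        if extended is not None:
            result = subsumes(rest, input_sem, extended)
            if result is not None:
                return result
    return None


lexical = [("L", "eat", "E"), ("L", "agent", "E", "A"), ("L", "patient", "E", "B")]
given = [("h1", "eat", "e1"), ("h1", "agent", "e1", "x"), ("h1", "patient", "e1", "y")]
print(subsumes(lexical, given))
```

For the item eat, the call binds L, E, A and B to h1, e1, x and y; a lexical item with a different relation name would yield None and would not be selected.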

Tree interface expansion

What may be less obvious is why we would need a separate expansion for the tree semantics. Why not just reuse the above representation?

The problem is one of specifying the order of the literals, which matters because we need to keep the syntax/semantics interface straight. Consider the tree for a simple transitive verb:

syntax:    S(N/X, V(anchor), N/Y)
semantics: L:Rel(I) L:Theta1(E X) L:Theta2(E Y)

While we could unify this with the semantic representation above, we cannot guarantee what the result of unification would be. We could get either of:

Theta1 <- agent,   X <- A, Theta2 <- patient, Y <- B # john eats cake
Theta1 <- patient, X <- B, Theta2 <- agent  , Y <- A # cake eats john
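
The ambiguity can be reproduced with a small Python sketch. It is deliberately simplified and rests on assumptions: literals are paired by arity only, the event variable is ignored, and the encoding and the function name `pairings` are hypothetical.

```python
from itertools import permutations

# (predicate, arguments); uppercase names are variables.
tree_sem = [("Rel", ("I",)), ("Theta1", ("E", "X")), ("Theta2", ("E", "Y"))]
lex_sem = [("eat", ("E",)), ("agent", ("E", "A")), ("patient", ("E", "B"))]


def pairings(tree_sem, lex_sem):
    """List every way to pair tree literals with lexical literals of the
    same arity, recording the bindings each pairing would produce."""
    results = []
    for perm in permutations(lex_sem):
        same_arity = all(
            len(t_args) == len(l_args)
            for (_, t_args), (_, l_args) in zip(tree_sem, perm)
        )
        if not same_arity:
            continue
        binding = {}
        for (t_pred, t_args), (l_pred, l_args) in zip(tree_sem, perm):
            binding[t_pred] = l_pred
            if len(t_args) == 2:
                # Bind the tree's participant variable (X or Y).
                binding[t_args[1]] = l_args[1]
        results.append(binding)
    return results


for binding in pairings(tree_sem, lex_sem):
    print(binding)
```

Two pairings survive, one for each reading above, and nothing in the literals themselves tells us which one is intended.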

The best bet is to make the order explicit. We chose to use the tree interface: from the tree interface perspective, the binaryVerb macro for eat expands into:

[ rel=eat
, theta1=agent
, theta2=patient
, arg1=A
, arg2=B ]

Note that the coindexation between the GenI expansion and the tree interface expansion (the shared variables A and B) is essential: it is what ties the input semantics to the right positions in the tree.
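
By contrast with free unification of the literals, unification over named interface features is deterministic. The Python sketch below illustrates this under assumptions: the tree interface shown (features rel, theta1, theta2, arg1, arg2 pointing at the tree variables Rel, Theta1, Theta2, X, Y) is a guess at what the metagrammar would declare for the transitive tree, the feature name rel follows the binaryVerb macro definition, and `anchor` is a hypothetical helper, not the SELECTOR's code.

```python
def anchor(tree_interface, lex_interface):
    """Bind each tree variable to the lexical value found under the same
    feature name. Raises ValueError on conflicting values."""
    bindings = {}
    for feature, tree_var in tree_interface.items():
        if feature not in lex_interface:
            continue  # feature left unconstrained by the lexicon
        value = lex_interface[feature]
        if bindings.setdefault(tree_var, value) != value:
            raise ValueError(f"conflicting values for {tree_var}")
    return bindings


# Assumed interface of the transitive tree S(N/X, V(anchor), N/Y).
tree_interface = {
    "rel": "Rel", "theta1": "Theta1", "theta2": "Theta2",
    "arg1": "X", "arg2": "Y",
}
# Tree interface expansion of binaryVerb(eat, agent, patient).
lex_interface = {
    "rel": "eat", "theta1": "agent", "theta2": "patient",
    "arg1": "A", "arg2": "B",
}
print(anchor(tree_interface, lex_interface))
```

Because each feature name occurs once on each side, only the "john eats cake" binding is produced; the reversed reading cannot arise.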

More examples

Beaucoup

Macro call: quantifier(beaucoup)
Tree semantics in MG: _:Q(E,L1,L2)
Tree interface in MG: quantifierRel = Q

The macro: quantifier(Q)
GenI expansion: _:Q(_,L1,L2)
Tree interface expansion: quantifierRel = Q

Implementation in m4:

define(`quantifier', `_:$1(_,L1,L2)
* INTERFACE: [ quantifierRel=$1
             , arg1=L1
             , arg2=L2 ]')

Adorer

Macro call: binary_verb_troponym(aimer,exp,cause,beaucoup)
Tree semantics in MG: _:Rel(E) _:Theta1(_,_) _:Theta2(_,_) _:Mod(E)
Tree interface in MG: rel = Rel, theta1 = Theta1, theta2 = Theta2, mod = Mod

The macro: binary_verb_troponym(Rel, Theta1, Theta2, Mod)
GenI expansion: _:Rel(E) _:Theta1(_,_) _:Theta2(_,_) _:Mod(E)
Tree interface expansion: rel = Rel, theta1 = Theta1, theta2 = Theta2, mod = Mod

Implementation in m4:

define(`binary_verb_troponym', `L:$1(E) L:$2(E X) L:$3(E Y) L:$4(E)
* INTERFACE: [ rel=$1
             , theta1=$2
             , theta2=$3
             , mod=$4]')

Implementation details

There are two points to cover here:

  • writing macros
  • instantiating macros

Writing macros using the XMG system

To create macro definitions, we extended the XMG system so that it compiles the semantic classes of a metagrammar and writes the information they contain in a specific format. The advantage is that the macros can be extracted straight from the metagrammar (i.e. we avoid the trouble of writing and updating macros by hand). All the user has to do is specify in the metagrammar which classes are relevant for macro extraction, by stating:

semantics ClassId1 ClassId2 ... ClassIdN

Suppose we declared the semantic classes in the metagrammar test.mg. We can then compile this metagrammar and extract the semantic macros automatically at the same time by invoking:

MetaTAG test.mg --chk -c test.rec -s test.macros
 

Here the macros are written to a text file called test.macros. This file will be included by the lex lexicon file before conversion by the lexConverter; see LEX2ALL.

Using macros

Semantic macros associate a Macro_name [ named parameters ] with predicate-argument formulas, as in:

binaryRel[theta2=?B,theta1=?F,rel=?I]
        semantics:[!A:?B(?C,?D) !E:?F(?C,?G) !H:?I(?C) ]
        interface:[arg1=?G,arg2=?D,index=?C,label0=!H,label1=!E,label2=!A,rel=?I,theta1=?F,theta2=?B]

The interface information represents the syntax/semantics interface: it allows the syntactic trees to be updated in accordance with the semantic formulas.

In the syntactic lexicon (lemmas), one can instantiate such a macro by invoking its Macro_name and specifying the values of its parameters:

*ENTRY: aimer
*CAT: n
*SEM: binaryRel[rel=aimer,theta1=agent, theta2=patient]
*ACC: 1
*FAM: n0Vn1
*FILTERS: []
*EX: {}
*EQUATIONS:
*COANCHORS:
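
The effect of instantiating the macro with named parameters can be sketched in Python. This is an assumption about what the substitution amounts to, not the lexConverter's actual behaviour: the dictionary layout and the function `instantiate` are hypothetical, and the macro text is copied from the binaryRel example above.

```python
import re

MACRO = {
    "name": "binaryRel",
    # Named parameter -> variable it stands for in the macro body.
    "params": {"theta2": "?B", "theta1": "?F", "rel": "?I"},
    "semantics": "!A:?B(?C,?D) !E:?F(?C,?G) !H:?I(?C)",
    "interface": ("arg1=?G,arg2=?D,index=?C,label0=!H,label1=!E,"
                  "label2=!A,rel=?I,theta1=?F,theta2=?B"),
}

VARIABLE = re.compile(r"\?[A-Za-z0-9_]+")


def instantiate(macro, **values):
    """Replace each parameter variable by the value given in the lexicon.

    Variables that are not parameters (?C, ?D, ?G) are left untouched.
    """
    unknown = set(values) - set(macro["params"])
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    substitution = {macro["params"][name]: val for name, val in values.items()}

    def replace(match):
        token = match.group(0)
        return substitution.get(token, token)

    return (VARIABLE.sub(replace, macro["semantics"]),
            VARIABLE.sub(replace, macro["interface"]))


# Mirrors: *SEM: binaryRel[rel=aimer,theta1=agent, theta2=patient]
sem, iface = instantiate(MACRO, rel="aimer", theta1="agent", theta2="patient")
print(sem)
print(iface)
```

After substitution, the remaining variables ?C, ?D and ?G still occur in both the semantics and the interface, which preserves the coindexation between the two.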

Related links

  • lexicon converter LEX2ALL
  • XMG Metagrammar compiler