Design and Implementation of a flexible object oriented matcher in CLOS (Common Lisp Object System)

Context

At the AI-Lab we use a unification based grammar formalism called Fluid Construction Grammar (FCG). It is fully implemented in lisp but lacks easy extensibility (it is not OO) and could be more flexible in its processing. See What is unification for more information.

Projectvoorstel

Matching and merging (or unification) is a very common technique in Artificial Intelligence. It is used extensively in language processing formalisms. At the AI-Lab we have developed our own formalism called Fluid Construction Grammar (FCG) [3]. It was primarily based on the code found in [1]. The project would involve writing either from scratch or re-using (i.e. refactoring) the existing code. This choice will depend on the student's analysis of implementation possibilities. A normal unification engine is however not very flexible in the sense that it returns either a set of bindings (of the unification is successful) or nil (false) if the unification fails. The main difference in this project proposal is that the merging engine should always return a maximally informative result. In case of failure we would like to know exactly why it fails and preferably (but this would be an extra) even the possibility to continue with a partial merge. This sort of flexible processing is very necessary in the language research done at the AI-Lab since many of the structures that need to be merged are incomplete and thus require such partial solutions.

Implementatieplan

First the student and supervisor need to agree on a minimal set of prerequisites. The student will be supplied with a full specification of the sorts of structures the merger needs to cope with. These are quite basic tree-like feature structures (also a common notion in AI). Before implementation can start, a careful design of the required classes and other abstractions will be agreed upon with the supervisor. The feature structures themselves should be implemented as classes as well. Something that is lacking in the current implementation (they are plain lists).

Concerning the requirements we will distinguish between basic and complex feature structures. The first phases of implementation will always focus on basic feature structures, which are tree-like structures containing symbols and variables. The complex feature structures can contain special operators that change the behaviour of the matcher and merger. A basic feature structure in lisp bracket notation would look something like this:

((?top-unit
   (form ((meets ?article-unit ?noun-unit)))
   (syn-subunits (?noun-unit ?article-unit))) 
 (?noun-unit
   (syn-cat noun)) 
 (?article-unit
   (syn-cat article)))

We also distinguish between a rigid and a flexible matcher. A rigid matcher is simply one that returns either nil (in case the structures do not match) or a set of bindings in case the structures do match. The point is however to implement a flexible or what is also called partial matcher, that is capable of continuing even when the match is not perfect. This might seem like a small adjustment but is however a very complicated problem, requiring many AI techniques to be used.

Basisvereisten

At this point we envision the following milestones during implementation: We have developed a unit-testing framework and the student will need to write a full test-suite for all components that are written. These unit tests (or if the student motivates another manner of testing) are of vital importance for the project. Of course documentation and readability of the code are also very important. A last point of importance is representation of the processing. In the Lab we have developed a web interface to output results more elegantly (instead of writing to the REPL). There should be some minimal representation of the feature structures and the results of the processing in this web interface. This would require only very basic HTML knowledge.

Extras

Achtergrond & Literatuur

For more information on FCG check out the FCG website

Contact

Pieter Wellens
pieter@arti.vub.ac.be