Matching and merging (or unification) is a very common technique in Artificial Intelligence. It is used extensively in language processing formalisms. At the AI-Lab we have developed our own formalism called Fluid Construction Grammar (FCG) [3]. It was primarily based on the code found in [1]. The project would involve writing either from scratch or re-using (i.e. refactoring) the existing code. This choice will depend on the student's analysis of implementation possibilities. A normal unification engine is however not very flexible in the sense that it returns either a set of bindings (of the unification is successful) or nil (false) if the unification fails. The main difference in this project proposal is that the merging engine should always return a maximally informative result. In case of failure we would like to know exactly why it fails and preferably (but this would be an extra) even the possibility to continue with a partial merge. This sort of flexible processing is very necessary in the language research done at the AI-Lab since many of the structures that need to be merged are incomplete and thus require such partial solutions.
First the student and supervisor need to agree on a minimal set of prerequisites. The student will be supplied with a full specification of the sorts of structures the merger needs to cope with. These are quite basic tree-like feature structures (also a common notion in AI). Before implementation can start, a careful design of the required classes and other abstractions will be agreed upon with the supervisor. The feature structures themselves should be implemented as classes as well. Something that is lacking in the current implementation (they are plain lists).
Concerning the requirements we will distinguish between basic and complex feature structures. The first phases of implementation will always focus on basic feature structures, which are tree-like structures containing symbols and variables. The complex feature structures can contain special operators that change the behaviour of the matcher and merger. A basic feature structure in lisp bracket notation would look something like this:
((?top-unit (form ((meets ?article-unit ?noun-unit))) (syn-subunits (?noun-unit ?article-unit))) (?noun-unit (syn-cat noun)) (?article-unit (syn-cat article)))
We also distinguish between a rigid and a flexible matcher. A rigid matcher is simply one that returns either nil (in case the structures do not match) or a set of bindings in case the structures do match. The point is however to implement a flexible or what is also called partial matcher, that is capable of continuing even when the match is not perfect. This might seem like a small adjustment but is however a very complicated problem, requiring many AI techniques to be used.