Parsing techniques in Prolog
Klaus von Bremen
This site presents a book. The author is a lingvist - working in the tradition of what, with a very general term, may be called 'generative grammar' - getting interested, some time ago, in the problems of computerized parsing. The result of this interest is a book, Prolog Parsers, about parsing in Standard, Edinburgh, Prolog.
The concern of the book is with the evaluation of parsing techniques, not with the implementation of particular grammars, such as e. g. ALE. - Though the argumentation is in terms of context-free phrase structure grammar, the problems discussed will arise with any type of grammars, be they tree-adjoining grammars, principle-based grammars, unification grammars etc.
I exemplify, using a basic shift-reduce parser, the problems encountered by this type of parser - backtracking and excessive looking for non-existing parsings. I then analyze two parsers proposed in the literature, that try to solve these problems by combining shift-reduce parsing with top-down prediction - the concept that makes left-corner parsing so efficient - and show, that they do not work, due to inefficient reference to the parse stack.
I then improve the basic model, incrementing it with respectively look-ahead and oracles. I compare two different versions of this parser, a depth-first search and a breadth-first search one. No great efficiency differences emerge between these two searching types.
I propose two models of a DCG parser, that combine DCG parsing with, respectively, shift-reduce parsing of a reverse string and left-corner parsing. This approach is much less efficient, but has certain advantages.
I discuss a chart parser by Gazdar and Mellish, for which I propose an improved version, and propose a string-related chart parser.
Finally, I make efficiency comparisons between, and discuss the pros and cons of, these different types of parsing techniques. Pure left-corner parsing - as exemplified by the BUP-parser - turns out, for sheer speed efficiency, vastly superior to all other parser types, but may not be, after all, the best parsing method, due to certain inherent limitations discussed in the book
Prolog versus Lisp parsers. The speed efficiency of Prolog parsers depends heavily on on vocabulary size, that of Lisp parsers does not. That is a weighty argument in favour of Lisp.
Lisp has no built-in backtracking. The latter must be programmed. This fact clearly favours Prolog, except for such parsers that do without backtracking (chart parsers, tree forest parsers).
The readability, comprehensibility and 'linguisticality' of the code favours Prolog. The grammarlike structure of its syntax - it was invented by Colmerauer for linguistic analysis - would seem, other things being equal, to make it a linguist's first choice.
Lisp, naturally, produces analyses; it does not build structure. (Though it can be made to do so: see the Lisp programs in the book). Prolog builds, naturally, structures. So, the choice between them, under this aspect, depends on what one wants to get. For purely linguistic purposes, the resulting tree parses under Prolog favour the latter.
When all comes to all, Prolog is the first choice for linguists working in the field of the computational analysis of language. For other purposes the choice may just be a matter of taste and convenience.
The Prolog-programs were written for, and efficiency-tested - in interpreted form - under ARITY- Prolog 5.1, on an AT 386 - a slow machine by to-day's standards. For the purpose of the book this is an advantage, though: it makes the speed differences between the parsing techniques discussed in the book stand out more clearly than on a fast machine.
The programs also run under ARITY-Prolog 6.1 (under Windows-DOS), and should run under ARITY/PROLOG32. There is, however, also a modified version for SWI-Prolog - which runs under Windows - which should run, with minor terminological adaptations, under Prologs such as QUINTUS-Prolog and SICSTUS-Prolog.
The main difference between the ARITY interpreter and those for these Prologs is, that the former does not require declarations for dynamic predicates, whereas the latter do.
The appendix contains all Prolog programs discussed in the book.
There are a number of runnable Prolog- and Lisp-files on this site.
Goto The Book
Contact me Klaus von Bremen
LINKS RELATED TO THIS SITE