Subject Ph.D. Defense: Measuring and Extending LR(1) Parser Generation From David Pager Date Monday, March 16, 2009 9:45 am To ics-faculty, lis-faculty, uhm-ics-grads, uhm-ics-gradlike-l, cis-program, eefaculty, eegrad The Ph.D. defense by Xin Chen will be conducted on April 3 4-6 p.m. in Post 302 MEASURING AND EXTENDING LR(1) PARSER GENERATION Commonly used parser generation algorithms such as LALR, LL and SLR all have their restrictions. The canonical LR(1) algorithm proposed by Knuth in 1965 is regarded as the most powerful parser generation algorithm for context-free languages, but is very expensive in time and space and has long been considered as impractical by the community. There have been LR(1) algorithms that can improve the time and space efficiency of the canonical LR(1) algorithm, but there had been no systematic study of them. Good LR(1) parser generator implementation is also rare. LR(k) parser generation is even more expensive and complicated than LR(1), but it can be an alternative to GLR in natural language processing and other applications. To solve these problems, this work explored ways of improving relevant data structure and algorithms, and implemented an efficient, practical and Yacc-compatible LR(0)/LALR(1)/LR(1)/LR(k) parser generator Hyacc, which has been released to the open source community. A empirical study was conducted comparing different LR(1) parser generation algorithms and with LALR(1) algorithms. The result shows that LR(1) parser generation based upon improved algorithms and carefully selected data structures can, with modern computing facilities, be sufficiently efficient to be of practical use. An extension was made to the unit production elimination algorithm to remove redundant states. A LALR(1) implementation based on the first phase of lane-tracing was done, which is another alternative of LALR(1) algorithm. The second phase of lane-tracing algorithm, which has not been discussed in detail before, was analyzed and implemented. A new LR(k) algorithm called the edge-pushing algorithm, which is based on recursively applying the lane-tracing process, was designed and implemented. Finally, a latex2gDPS compiler was created using Hyacc to demonstrate its usage.