Predictive parsers Automatic Construction of efficient Parsers: LR parsers, the canonical Collection of LR(0) items
Predictive Parsers
- Definition: A type of top-down parsing technique that analyzes a string (input) by predicting the next production rule to apply based on the current state and input symbol.
- Key Points:
- No backtracking: Once a rule is chosen, there's no need to look back and try other rules. This makes them efficient for unambiguous grammars.
- LL(k) grammars: Predictive parsers require Leftmost Derivation (LL(k)) grammars, where k is the number of lookahead symbols needed to make accurate predictions.
- Limited Applicability: Due to the strict LL(k) requirement, predictive parsers are not as widely applicable as other parsing techniques, especially for complex grammars with ambiguous constructs.
LR Parsers
- Definition: Bottom-up parsing technique that uses a stack and an input buffer to analyze the string.
- Types:
- LR(0): Simplest form, but limited applicability due to possible ambiguity.
- SLR(1): Addresses some LR(0) ambiguities using a single lookahead symbol.
- LALR(1): More powerful than SLR(1), handling more complex grammars.
- Key Points:
- Handles Ambiguity: LR parsers can handle ambiguous grammars by constructing a state machine known as the "canonical collection of LR(0) items" to resolve ambiguities.
- More General: Can handle a wider range of grammars compared to predictive parsers, making them more widely used in compiler construction.
Canonical Collection of LR(0) Items
- Purpose: Helps LR parsers construct a parsing table to handle ambiguous grammars in a deterministic way.
- Components:
- Items: Represent potential parsing states, consisting of a grammar rule, a dot representing the current position in the rule, and a lookahead symbol set.
- GOTO: Function that moves the dot to the next symbol in a production rule.
- SHIFT: Action to shift the current input symbol onto the stack and advance to the next input symbol.
- REDUCE: Action to apply a production rule (represented by the item) by popping symbols from the stack and replacing them with the non-terminal on the left-hand side of the rule.
- ACCEPT: Indicates successful parsing when the end-of-file symbol is encountered at the top of the stack and the lookahead symbol is the end-of-file symbol.
Example (LR(0) Parsing Table Construction)
Consider the grammar:
E -> T + E | T
T -> F * T | F
F -> (E) | id
-
Generate initial item set:
{[S -> •E, $]}
, whereS
is the start symbol and$
is the end-of-file marker.
-
Iteratively expand the item set using GOTO and SHIFT:
- Repeat until no new items are generated:
- For each item
[A -> α • B β, a]
, apply GOTO onB
with lookaheada
:- Add
[B -> • γ, a]
to the item set ifB
is a non-terminal.
- Add
- For each item
[A -> α B • β, a]
, whereB
is a terminal:- Add
[A -> α B • β, b]
to the item set for each lookaheadb
in the FOLLOW set ofA
.
- Add
- For each item
- Repeat until no new items are generated:
-
Construct parsing table:
- Rows represent states (item sets).
- Columns represent terminals and the end-of-file symbol (
$
). - Entries indicate actions (SHIFT, REDUCE, or GOTO) based on the item in the state and the current symbol.
- If no action is defined, it's an error.
Importance of LR(0) Items and Parsing Tables
The canonical collection of LR(0) items and the resulting parsing table play a crucial role in enabling LR parsers to handle ambiguous grammars effectively. By analyzing the item sets and lookaheads, the parser can make deterministic decisions about which production rule to apply, even when faced with ambiguity in the input.
While predictive parsers offer efficiency for unambiguous grammars, their limited applicability makes them less suitable for real-world scenarios with complex language structures. LR parsers provide a more robust solution, handling ambiguous grammars and being widely used in compiler construction for various programming languages.