Skip to content

Grammars

Grammars are the fundamental element to describe a language. They are written in a special language called antlr-ng grammar syntax. This section gives an overview of the different parts of a grammar and how they are used in the parser generation process.

The following topics are covered in this section:

  • Grammar Syntax: a graphical approach to the antlr-ng grammar structure.
  • Options: How to customize the generation process.
  • Parser Rules: A closer look at parser rules (non-terminals).
  • Lexer Rules: How are tokens (terminals) defined?
  • Wildcard: About matching any input. Greedy or not greedy is the question.
  • Unicode: How to define tokens with any Unicode character?

There are 3 types of grammars in antlr-ng:

  • Parser grammar: Only contains parser rule defintions. Options and actions apply only to the generated parser. Parser rules can reference other parser rules and lexer rules.
  • Lexer grammar: Only contains lexer rule definitions. Just like for parser grammars any given option applies only to the generated lexer. It is not allowed to reference parser rules in lexer rules.
  • Combined: A combination of parser and lexer rules in a single file. Options that can have a different meaning/value in parser and lexer apply only to the parser (e.g. the super class), while others (e.g. the language) are used for both types. Named actions (e.g. @header) must be scoped to identify where to use them (e.g. @parser::header. Otherwise they are used for the parser.

Given the special conditions and mentioned restrictions, it is recommended to use non-combined grammars whenever possible. This makes the grammar easier to understand and maintain. Combined grammars are more like beginner grammars to ease learning antlr-ng.