#header-inner {background-position: right !important; width: 100% !important;}

11/20/14

Simple Parser Tutorial.

What this post is about?

it's a simple example of how context-free grammar parsers can be designed & implemented.

it's fairly advanced topic, requires knowledge of programming, programming languages, regular expressions & related.

perhaps it's too few words, please contact me if You wish something elaborated.


Language.

our parser will accept texts written in language with:

1. integer numbers & boolean values.
2. arithmetic operations on integer numbers:
- addition,
- substraction,
- multiplication,
- division.
3. arithmetic operations on boolean values:
- not,
- and,
- or.
4. expressions & grouping operations (parentheses).
5. variables & assignment instruction.
6. conditional instruction if.
7. input & print instructions.

this should enable for basic arithmetic/logic operations whose results can be printed on screen.

they are mandatory for any serious programming language in my opinion, input & print instructions as well - even if they are only standard library functions.


Context-Free Grammar.

first we'll design language's grammar, which consists of productions, terminal symbols, nonterminal symbols & separators written in a serie of code lines.

* token -> instruction | expression
* instruction -> conditional_instruction | assignment_instruction | variable_declaration | input_instruction | print_instruction
* conditional_instruction -> if ( boolean_expression) instruction;
* assignment_instruction -> identifier = expression;
* variable_declaration -> variable_type identifier;
* variable_type -> int | boolean
* print_instruction -> print identifier;
* expression -> number_expression | boolean_expression | ( expression ) | input_instruction
* number_expression -> number | ( number_expression ) | -number_expression | number_expression + number_expression | number_expression - number_expression | number_expression * number_expression | number_expression / number_expression
* boolean_expression -> boolean_value | ( boolean_expression ) | not boolean_expression | boolean_expression or boolean_expression | boolean_expression and boolean_expression


in this language we do not allow for conversion between types, for simplicity.

what is an identifier, number, boolean_value etc... we'll define a little later, during tokenization process. terminal symbols of grammar will be defined there, as well as other things.


Tokenization.

(to be continued, if/when i can).

See also: Context-Free Grammars, Simple compiler design.

No comments:

Post a Comment