Lexical Analysis


It is the process of grouping individual characters into basic entities, known as tokens or lexemes.

Examples: integer, variable, addition operator.

The lexical analyser (aka scanner) takes a source file and produces a stream of tokens, error messages and diagnostics.

Here you can see a parse tree with lexing: parse-tree-lexing

Grammars for lexical analysis

Related to Formal Languages.

Type 3 grammar for binary numbers:

  • integer 0 | 1 | integer0 | integer 1

Derived from the trivially equivalent but clearer type 2 grammar:

  • integer digit | integer digit digit 0 | 1

Typical tokens:

TypeDescriptionExample
integerstring of decimal digits1729, 561, 0
variablestring starting with a letterloopvar, t1, x
operatorone of +, -, * or /+, /
VARthe letters V, A, RVAR

Symbol Tables

They are used throughout the compiler to build information about symbols:

struct symbtab
{
  char *name;
  int type;
  int blockno;  /* Block where declared */
  int addr;
}