Lexical analysis, also known as scanning, is the first phase of a compiler. In this phase, the compiler reads the source code character by character from left to right and groups them into meaningful units called tokens. These tokens are then passed to the next phase of compilation, known as syntax analysis.
A token is a sequence of characters that represents a basic unit of meaning in a programming language. Tokens are defined by the grammar of the language, and the parser uses them to understand the structure of the program.
Keywords are reserved words that have predefined meanings in a programming language.
Examples: if, else, for, while, int, void
Identifiers are names given to variables, functions, arrays, or other user-defined elements.
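Rules: an identifier must begin with a letter or an underscore (_), may contain only letters, digits, and underscores, and cannot be a keyword.
Examples: count, _sum, totalValue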
Constants (literals) are fixed values that do not change during program execution.
Examples: 10, 3.14, 'a', "Hello"
Operators perform operations on operands.
Examples:
+, -, *, / (arithmetic)
<, >, == (relational)
&&, || (logical)
Special symbols (punctuators) are used to structure programs.
Examples:
; → statement terminator
, → separator
{ } → code blocks
[ ] → arrays
A lexeme is the actual sequence of characters in the source code that matches the pattern for a token.
Example: for the statement float abs_zero_Kelvin = 273; the lexemes and their tokens are:
float → KEYWORD
abs_zero_Kelvin → IDENTIFIER
= → OPERATOR
273 → INTEGER
; → SEMICOLON
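For the statement while (a >= b) a = a - 2; the lexemes and their tokens are listed below (the last two columns continue the sequence):

| Lexeme | Token | Lexeme (cont.) | Token (cont.) |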
|---|---|---|---|
| while | WHILE | a | IDENTIFIER |
| ( | LPAREN | = | ASSIGNMENT |
| a | IDENTIFIER | a | IDENTIFIER |
| >= | COMPARISON | - | ARITHMETIC |
| b | IDENTIFIER | 2 | INTEGER |
| ) | RPAREN | ; | SEMICOLON |
Tokens in a programming language are defined using regular expressions. The lexical analyzer (scanner) uses a Deterministic Finite Automaton (DFA) to recognize these tokens because DFAs can identify regular languages efficiently.
Each final state of the DFA represents a specific token type. The process of converting regular expressions into a DFA can be automated, making token recognition fast and systematic.
The lexical analyzer can also detect lexical errors, such as characters that do not match the pattern of any token. It reports each error along with its line number and column number.
Input: a = b + c;
Token sequence: id = id + id ;
Each id refers to an entry in the symbol table containing details about the variable.
Program:
```c
int main()
{
    int a, b;
    a = 10;
    return 0;
}
```
Valid tokens:
'int' 'main' '(' ')' '{'
'int' 'a' ',' 'b' ';'
'a' '=' '10' ';'
'return' '0' ';'
'}'
Comments and extra spaces are ignored by the lexical analyzer. Only meaningful tokens are identified and passed to the next phase of compilation.
Example: count the number of tokens in the following program.
```c
int main()
{
    int a = 10, b = 20;
    printf("sum is:%d", a+b);
    return 0;
}
```
Answer: The total number of tokens is 27. The string literal "sum is:%d" counts as a single token.
Example: count the number of tokens in the following declaration.
```c
int max(int i);
```
- The lexical analyzer first reads int, finds that it is a valid keyword, and accepts it as a token.
- Next it reads max. Reading ( ends that lexeme, so max is accepted as a valid identifier (here, a function name), and ( becomes a separate token.
- int is also a token, then i is another token, followed by ) and finally ;.
Answer: The total number of tokens is 7: int, max, (, int, i, ), ;