The Chomsky Hierarchy
A formal language consists of a finite set of terminal symbols, a finite set of non-terminal symbols, a finite set of production rules, and a start symbol. They can be organised into a hierarchical structure like so:
- Type 0 - Unrestricted grammars
Unrestricted grammars generate all languages that can be recognised by a Turing machine, which can also be called recursively enumerable langauges. - Type 1 - Context-sensitive grammars
Context-sensitive grammars generate context-sensitive languages, and have rules of the form αA&beta &rarr αγβ with A non-terminal, α, β, γ strings of terminals and non-terminals. These languages can be recognised by a linear bounded automaton. - Type 2 - Context-free grammars
These have rules of the form A &rarr &gamma, with A non-terminal, γ a string of terminals and non-terminals. Languages generated by these grammars can be recognised by a non-deterministic push-down automaton. These grammars are the theoretical basis for the syntax of most programming languages. - Type 3 - Regular grammars
Regular languages can be decided by a finite-state automaton. They have rules of the form A &rarr &epsilon, A &rarr αB
Labels: formal grammars, NLP