Regular expressions
Regular expressions can be seen as a parallel to arithmetic expressions.
- \( (5 + 3) \times 4 \) is an expression whose symbols represent numbers (e.g. natural numbers) and as a whole represents another number, 32.
- \( (0 \cup 1)0^\star \) is an expression involving [...] and as a whole represents [...].
A regular expression \( R \) is a sequence of symbols that is one of:
- a symbol \( a \) for some \( a \) in an alphabet \( \Sigma \)
- \( \varepsilon \)
- \( \emptyset \)
- \( (R_1 \cup R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1 \circ R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1^*) \), where \( R_1 \) is a regular expression
A regular expression is said to represent a language according to the rules below. We write \( L(R) \) to mean the language of the regular expression \( R \).
- From item 1, the regular expression \( a \) represents the language \( \{a\} \).
- From item 2, the regular expression \( \varepsilon \) represents the language \( \{ \varepsilon \} \).
- From item 3, the regular expression \( \emptyset \) represents [...].
- From item 4 & 5, the regular expression \( ( R_1 \cup R_2) \) and \( (R_1 \circ R_2) \) represent the languages obtained by taking the union and concatenation of the languages represented by \( R_1 \) and \( R_2 \).
- From item 6, the regular expression \( (R_1^\star) \) represents the star of the language represented by \( R_1 \).
We say that two regular expressions \( R_1 \) and \( R_2 \) are equal if they represent the same language, and we write \( R_1 = R_2 \).
This definition is lacking in how it describes 'represents'. Maybe it is a function. I think it's coverage of equality is just sufficient, as it delegates the well defined formulation of set equality.
\( \varepsilon \) vs \( \emptyset \)
The regular expression \( \varepsilon \) represents the language containing a single string, the empty string, \( \{\varepsilon\} \). The regular expression \( \emptyset \) represents the empty language, \( \{\} \).
Precedence order
Star has highest precedence, followed by concatenation, with union last.
\( R^+ \)Shorthand
\( R^+ \) is used as convenient shorthand for [...].
Two identities
If \( R \) is any regular expression then: