# Regular Expressions

Regular expressions can be seen as a parallel to arithmetic expressions.

- \( (5 + 3) \times 4 \) is an expression whose symbols represent numbers (e.g. natural numbers) and as a whole represents another number, 32.
- \( (0 \cup 1)0^\star \) is an expression involving [...] and as a whole represents [...].

A regular expression \( R \) is a sequence of symbols that is one of:

- a symbol \( a \) for some \( a \) in an alphabet \( \Sigma \)
- \( \varepsilon \)
- \( \emptyset \)
- \( (R_1 \cup R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1 \circ R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1^*) \), where \( R_1 \) is a regular expression

A regular expression is said to represent a language according to the rules below. We write \( L(R) \) to mean the language of the regular expression \( R \).

- From item 1, the regular expression \( a \) represents the language \( \{a\} \).
- From item 2, the regular expression \( \varepsilon \) represents the language \( \{ \varepsilon \} \).
- From item 3, the regular expression \( \emptyset \) represents [...].
- From item 4 & 5, the regular expression \( ( R_1 \cup R_2) \) and \( (R_1 \circ R_2) \) represent the languages obtained by taking the union and concatenation of the languages represented by \( R_1 \) and \( R_2 \).
- From item 6, the regular expression \( (R_1^\star) \) represents the star of the language represented by \( R_1 \).

We say that two regular expressions \( R_1 \) and \( R_2 \) are equal if they represent the same language, and we write \( R_1 = R_2 \).

This definition is lacking in how it describes 'represents'. Maybe it is a function. I think it's coverage of equality is just sufficient, as it delegates the well defined formulation of set equality.

#### \( \varepsilon \) vs \( \emptyset \)

The regular expression \( \varepsilon \) represents the language containing a single string, the empty string, \( \{\varepsilon\} \). The regular expression \( \emptyset \) represents the empty language, \( \{\} \).

#### Precedence order

Star has highest precedence, followed by concatenation, with union last.

#### \( R^+ \)Shorthand

\( R^+ \) is used as convenient shorthand for [...].

#### Two identities

If \( R \) is any regular expression then: