# Regular expressions

Regular expressions can be seen as a parallel to arithmetic expressions.

- \( (5 + 3) \times 4 \) is an expression whose symbols represent numbers (e.g. natural numbers) and as a whole represents another number, 32.
- \( (0 \cup 1)0^\star \) is an expression involving languages (sets of strings) and as a whole represents another language, the set of all strings which start with a 0 or 1 and is followed by any number of 0s.

A regular expression \( R \) is a sequence of symbols that is one of:

- a symbol \( a \) for some \( a \) in an alphabet \( \Sigma \)
- \( \varepsilon \)
- \( \emptyset \)
- \( (R_1 \cup R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1 \circ R_2) \), where \( R_1 \) and \( R_2 \) are regular expressions
- \( (R_1^*) \), where \( R_1 \) is a regular expression

A regular expression is said to represent a language according to the rules below. We write \( L(R) \) to mean the language of the regular expression \( R \).

- From item 1, the regular expression \( a \) represents the language \( \{a\} \).
- From item 2, the regular expression \( \varepsilon \) represents the language \( \{ \varepsilon \} \).
- From item 3, the regular expression \( \emptyset \) represents the empty language, \( \{\} \).
- From item 4 & 5, the regular expression \( ( R_1 \cup R_2) \) and \( (R_1 \circ R_2) \) represent the languages obtained by taking the union and concatenation of the languages represented by \( R_1 \) and \( R_2 \).
- From item 6, the regular expression \( (R_1^\star) \) represents the star of the language represented by \( R_1 \).

We say that two regular expressions \( R_1 \) and \( R_2 \) are equal if they represent the same language, and we write \( R_1 = R_2 \).

This definition is lacking in how it describes 'represents'. Maybe it is a function. I think it's coverage of equality is just sufficient, as it delegates the well defined formulation of set equality.

#### \( \varepsilon \) vs \( \emptyset \)

The regular expression \( \varepsilon \) represents the language containing a single string, the empty string, \( \{\varepsilon\} \). The regular expression \( \emptyset \) represents the empty language, \( \{\} \).

#### Precedence order

Star has highest precedence, followed by concatenation, with union last.

#### \( R^+ \)Shorthand

\( R^+ \) is used as convenient shorthand for \( RR^* \); thus, \( R^* = R^+ \cup \epsilon \).

#### Two identities

If \( R \) is any regular expression then:

[Note: Shouldn't there be an extra rule for creating a regular expression from a set? How \( \Sigma \) allowed in the regular expressions given as an example on this card?]

### Example

The regular expressions here represent the languages below.

- \( 0^*10^* \)
- \( \Sigma^*1\Sigma^* \)
- \( \Sigma^*001\Sigma^* \)
- \( 1^*(01^+)^*\)
- \( (\Sigma \Sigma)^* \)
- \( (\Sigma \Sigma \Sigma)^* \)
- \( 01 \cup 10 = \{01, 10\} \)
- \( 0\Sigma^*0 \cup 1\Sigma^*1 \cup 0 \cup 1 \)
- \( (0 \cup \varepsilon)1^* = 01^* \cup 1^* \)
- \( (0 \cup \varepsilon)(1 \cup \varepsilon) \)
- \( 1^*\emptyset \)
- \( \emptyset^* \)

- \( \{w | w \text{ contains a single 1} \} \)
- \( \{w | w \text{ has at least one 1} \} \)
- \( \{w | w \text{ contains the string 001 as a subsequence} \} \)
- \( \{w | \text{ every 0 in } w \text{ is followed by at least one 1} \} \)
- \( \{w | w \text{ is a string of even length} \} \)
- \( \{w | \text{ the length of } w \text{ is a multiple of 3} \} \)
- \( \{01, 10 \} \)
- \( \{w | w \text{ starts and ends with the same symbol} \} \)
- \( 01^* \cup 1^* \)
- \( \{\varepsilon, 0, 1, 01 \} \)
- \( \emptyset \)
- \( \{\varepsilon\} \)