 Show Question
Math and science::Theory of Computation

# Regular expressions

Regular expressions can be seen as a parallel to arithmetic expressions.

• $$(5 + 3) \times 4$$ is an expression whose symbols represent numbers (e.g. natural numbers) and as a whole represents another number, 32.
• $$(0 \cup 1)0^\star$$ is an expression involving languages (sets of strings) and as a whole represents another language, the set of all strings which start with a 0 or 1 and is followed by any number of 0s.

A regular expression $$R$$ is a sequence of symbols that is one of:

1. a symbol $$a$$ for some $$a$$ in an alphabet $$\Sigma$$
2. $$\varepsilon$$
3. $$\emptyset$$
4. $$(R_1 \cup R_2)$$, where $$R_1$$ and $$R_2$$ are regular expressions
5. $$(R_1 \circ R_2)$$, where $$R_1$$ and $$R_2$$ are regular expressions
6. $$(R_1^*)$$, where $$R_1$$ is a regular expression

A regular expression is said to represent a language according to the rules below. We write $$L(R)$$ to mean the language of the regular expression $$R$$.

1. From item 1, the regular expression $$a$$ represents the language $$\{a\}$$.
2. From item 2, the regular expression $$\varepsilon$$ represents the language $$\{ \varepsilon \}$$.
3. From item 3, the regular expression $$\emptyset$$ represents the empty language, $$\{\}$$.
4. From item 4 & 5, the regular expression $$( R_1 \cup R_2)$$ and $$(R_1 \circ R_2)$$ represent the languages obtained by taking the union and concatenation of the languages represented by $$R_1$$ and $$R_2$$.
5. From item 6, the regular expression $$(R_1^\star)$$ represents the star of the language represented by $$R_1$$.

We say that two regular expressions $$R_1$$ and $$R_2$$ are equal if they represent the same language, and we write $$R_1 = R_2$$.

This definition is lacking in how it describes 'represents'. Maybe it is a function. I think it's coverage of equality is just sufficient, as it delegates the well defined formulation of set equality.

#### $$\varepsilon$$ vs $$\emptyset$$

The regular expression $$\varepsilon$$ represents the language containing a single string, the empty string, $$\{\varepsilon\}$$. The regular expression $$\emptyset$$ represents the empty language, $$\{\}$$.

#### Precedence order

Star has highest precedence, followed by concatenation, with union last.

#### $$R^+$$Shorthand

$$R^+$$ is used as convenient shorthand for $$RR^*$$; thus, $$R^* = R^+ \cup \epsilon$$.

#### Two identities

If $$R$$ is any regular expression then:

$R \cup \emptyset = R$
$R \circ \varepsilon = R$

[Note: Shouldn't there be an extra rule for creating a regular expression from a set? How $$\Sigma$$ allowed in the regular expressions given as an example on this card?]

### Example

The regular expressions here represent the languages below.

1. $$0^*10^*$$
2. $$\Sigma^*1\Sigma^*$$
3. $$\Sigma^*001\Sigma^*$$
4. $$1^*(01^+)^*$$
5. $$(\Sigma \Sigma)^*$$
6. $$(\Sigma \Sigma \Sigma)^*$$
7. $$01 \cup 10 = \{01, 10\}$$
8. $$0\Sigma^*0 \cup 1\Sigma^*1 \cup 0 \cup 1$$
9. $$(0 \cup \varepsilon)1^* = 01^* \cup 1^*$$
10. $$(0 \cup \varepsilon)(1 \cup \varepsilon)$$
11. $$1^*\emptyset$$
12. $$\emptyset^*$$
1. $$\{w | w \text{ contains a single 1} \}$$
2. $$\{w | w \text{ has at least one 1} \}$$
3. $$\{w | w \text{ contains the string 001 as a subsequence} \}$$
4. $$\{w | \text{ every 0 in } w \text{ is followed by at least one 1} \}$$
5. $$\{w | w \text{ is a string of even length} \}$$
6. $$\{w | \text{ the length of } w \text{ is a multiple of 3} \}$$
7. $$\{01, 10 \}$$
8. $$\{w | w \text{ starts and ends with the same symbol} \}$$
9. $$01^* \cup 1^*$$
10. $$\{\varepsilon, 0, 1, 01 \}$$
11. $$\emptyset$$
12. $$\{\varepsilon\}$$

#### Source

Introduction to the Theory of Computation, Sipser
p64-65