# Symbol codes

A binary symbol code $$C$$ for an ensemble $$X$$, that is, a probability space $$(\Omega, \mathcal{A}_x, \mathcal{P}_x)$$, is a mapping from the alphabet of possible outcomes $$\mathcal{A}_x = \{a_1, \ldots, a_I\}$$ to [...]. For $$x \in \mathcal{A}_x$$, $$c(x)$$ denotes the codeword corresponding to $$x$$ and $$l(x)$$ denotes its length, with shorthand $$l_i = l(a_i)$$.

The extended code, denoted $$C^+$$, is a mapping from [...] to [...] obtained by concatenation:

$c^+(x_1x_2...x_N) = c(x_1)c(x_2)...c(x_N)$
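A minimal Python sketch of the extended code, assuming codewords are stored as strings of `0`/`1` characters in a dictionary keyed by symbol. The name `encode` and the three-symbol code `C_EXAMPLE` are hypothetical, chosen only for illustration:

```python
def encode(code: dict[str, str], symbols: str) -> str:
    """Extended code c+: concatenate the codeword of each symbol, in order."""
    return "".join(code[s] for s in symbols)


# Hypothetical example code over the alphabet {a, b, c}.
C_EXAMPLE = {"a": "0", "b": "10", "c": "11"}

# l(x) is simply the length of the codeword string.
LENGTHS = {x: len(w) for x, w in C_EXAMPLE.items()}

print(encode(C_EXAMPLE, "abca"))  # "0" + "10" + "11" + "0" = "010110"
```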

A code $$C(X)$$ is said to be [...] if, under the extended code $$C^+$$, no two distinct strings of symbols have the same encoding. That is:

$\forall x,y \in \mathcal{A}_x^+, x \neq y \implies c^+(x) \neq c^+(y)$

or equivalently:

$\forall x,y \in \mathcal{A}_x^+, c^+(x) = c^+(y) \implies x = y$
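Because the condition ranges over all of $$\mathcal{A}_x^+$$, it cannot be verified by enumerating strings. One standard decision procedure, not mentioned in the text and swapped in here, is the Sardinas–Patterson test: it repeatedly collects the "dangling suffixes" left over when a codeword (or an earlier dangling suffix) is a proper prefix of another, and the code fails to be uniquely decodable exactly when some dangling suffix is itself a codeword. A minimal Python sketch, assuming codewords are given as strings over `{0, 1}`; the function names are hypothetical:

```python
def dangling(a: set[str], b: set[str]) -> set[str]:
    """Nonempty suffixes w such that u + w == v for some u in a, v in b."""
    return {v[len(u):] for u in a for v in b if len(v) > len(u) and v.startswith(u)}


def is_uniquely_decodable(codewords: list[str]) -> bool:
    """Sardinas-Patterson test for unique decodability of a finite code."""
    code = set(codewords)
    # Two symbols sharing a codeword, or an empty codeword, already break injectivity.
    if len(code) < len(codewords) or "" in code:
        return False
    current = dangling(code, code)  # S_1: suffixes left when one codeword prefixes another
    seen: set[frozenset[str]] = set()
    while current:
        if current & code:  # a dangling suffix is a codeword: some string decodes two ways
            return False
        key = frozenset(current)
        if key in seen:  # the finite sequence of sets has started repeating
            return True
        seen.add(key)
        # S_{i+1}: suffixes from codeword-prefixes-suffix and suffix-prefixes-codeword
        current = dangling(code, current) | dangling(current, code)
    return True


print(is_uniquely_decodable(["0", "10", "11"]))  # True
print(is_uniquely_decodable(["0", "01", "10"]))  # False
```

For instance, `{0, 01, 10}` fails because `010` splits both as `0|10` and as `01|0`.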

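A symbol code is a prefix code if [...]. A prefix code is also known as a self-punctuating or instantaneous code: an encoded string can be decoded from left to right, because each codeword is complete as soon as its last bit has been read, so there is no need to look ahead to subsequent codewords. Every prefix code is uniquely decodable, but the converse does not hold: for example, $$\{1, 10\}$$ is uniquely decodable but not a prefix code.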
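A minimal Python sketch of the prefix condition and of instantaneous decoding, assuming codewords are nonempty strings of `0`/`1` characters and symbols are single characters; `is_prefix_code` and `decode_prefix` are hypothetical names. The decoder emits a symbol the moment its buffer equals a codeword, which is safe only because no codeword extends another:

```python
def is_prefix_code(codewords: list[str]) -> bool:
    """True if no codeword is a prefix of a different codeword (assumes nonempty codewords)."""
    for i, u in enumerate(codewords):
        for j, v in enumerate(codewords):
            if i != j and v.startswith(u):
                return False
    return True


def decode_prefix(code: dict[str, str], bits: str) -> str:
    """Decode left to right without look-ahead; code must be a prefix code."""
    inverse = {w: s for s, w in code.items()}
    out: list[str] = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix property: this match can never be extended
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"trailing bits {buffer!r} do not form a codeword")
    return "".join(out)


print(is_prefix_code(["0", "10", "11"]))  # True
print(decode_prefix({"a": "0", "b": "10", "c": "11"}, "010110"))  # abca
```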

The expected length $$L(C, X)$$ of a symbol code $$C$$ for ensemble $$X$$ is:

$L(C,X) = \sum_{x \in \mathcal{A}_x} P(x)l(x)$

Equivalently, writing $$p_i = P(a_i)$$:

$L(C, X) = \sum_{i=1}^{I} p_il_i$

where $$I = |\mathcal{A}_x|$$ is the number of symbols in the alphabet.
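A short Python sketch of the expected length, computing $$\sum_x P(x)\,l(x)$$ with $$l(x)$$ taken as the codeword's string length. The function name and the distribution and code in the usage lines are hypothetical values chosen for illustration:

```python
def expected_length(probs: dict[str, float], code: dict[str, str]) -> float:
    """L(C, X) = sum over x of P(x) * l(x), where l(x) = len(c(x))."""
    return sum(p * len(code[x]) for x, p in probs.items())


# Hypothetical ensemble and code.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}
print(expected_length(probs, code))  # 1.5
```

In index form the same example has $$p = (0.5, 0.25, 0.25)$$ and $$l = (1, 2, 2)$$, so $$L(C, X) = 0.5 \cdot 1 + 0.25 \cdot 2 + 0.25 \cdot 2 = 1.5$$.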