Symbol codes
A binary symbol code \( C \) for an ensemble a probability space \( (\Omega, \mathcal{A}_x, \mathcal{P}_x) \) is a mapping from the range of events \( \mathcal{A}_x = \{a_1, ..., a_n\} \), to \( \{0, 1\}^+ \). Let \( x \in \mathcal{A}_x \). \( c(x) \) denotes the codeword corresponding to \( x \) and \( l(x) \) will denote its length, with shorthand \( l_i = l(a_i) \).
The extended code, denoted \( C^+ \), is a mapping from \( \mathcal{A}_x^+ \) to \( \{0,1\}^+ \) obtained by concatenation:
A code \( C(X) \) is said to be uniquely decodable if, under the extended code \( C^+ \), no two distinct strings have the same encoding. So:
or equivalently:
A symbol code is a prefix code if no codeword is a prefix of any other codeword. A prefix code is also known as a self-punctuating or instantaneous code, as an encoded string can be decoded from left to right without having to look ahead to subsequent codewords. A prefix code is uniquely decodable.
The expected length \( L(C, X) \) of a symbol code \( C \) for ensemble \( X \) is:
Alternatively written as:
Where \( I = |\mathcal{A}_x| \)