What are the properties of entropy in information theory?

**ENTROPY (i.e., AVERAGE INFORMATION)**

**(i) Definition**

In a practical communication system, we usually transmit long sequences of symbols from an information source. Thus, we are more interested in the average information that a source produces than in the information content of a single symbol.

In order to get the average information content of a symbol, we take notice of the fact that the flow of information in a system can fluctuate widely because of the randomness involved in the selection of the symbols. Thus, we need to talk about the average information content of the symbols in a long message.

**(ii) Assumptions**

Thus, for a quantitative representation of the average information per symbol, we make the following assumptions:

(i) The source is stationary so that the probabilities may remain constant with time.

(ii) The successive symbols are statistically independent and come from the source at an average rate of *r* symbols per second.

**(iii) Mathematical Expression**

The mean value of *I*(x_{i}) over the alphabet of source X with *m* different symbols is represented by **H(X)** and given by

H(X) = E[I(x_{i})] = Σ_{i=1}^{m} P(x_{i}) I(x_{i})

Since I(x_{i}) = log_{2} [1/P(x_{i})], we have

H(X) = Σ_{i=1}^{m} P(x_{i}) log_{2} [1/P(x_{i})]

or H(X) = – Σ_{i=1}^{m} P(x_{i}) log_{2} P(x_{i}) b/symbol …(9.7)

The quantity **H(X)** is known as the entropy of source **X**. It is a measure of the average information content per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty within source *X* that is resolved by use of the alphabet.
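The definition above translates directly into a few lines of code. A minimal sketch in Python (the function name `entropy` and the example probabilities are illustrative, not from the text):

```python
import math

def entropy(probs):
    """Source entropy H(X) = -sum p_i * log2(p_i), in bits/symbol.

    Terms with p = 0 are skipped, since p*log2(p) -> 0 as p -> 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin (two equiprobable symbols) carries 1 bit per symbol.
print(entropy([0.5, 0.5]))  # 1.0
```

A degenerate source (one symbol with probability 1) gives `entropy([1.0]) == 0.0`, matching the "no uncertainty" case discussed below.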

**(iv) Entropy for Binary Source**

It may be noted that for a binary source *X* which generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is

H(X) = – (1/2) log_{2} (1/2) – (1/2) log_{2} (1/2) = 1 b/symbol …(9.8)

**(v) Lower and Upper Bounds on Entropy H(X) **

The source entropy H(X) satisfies the following relation:

0 ≤ H(X) ≤ log_{2} *m* …(9.9)

where *m* is the size (number of symbols) of the alphabet of source X. The lower bound corresponds to no uncertainty, which occurs when one symbol has probability P(x_{i}) = 1 while P(x_{j}) = 0 for j ≠ i, so X emits the same symbol x_{i} all the time. The upper bound corresponds to the maximum uncertainty, which occurs when P(x_{i}) = 1/m for all *i*, that is, when all symbols are equally likely to be emitted by *X*.

DO YOU KNOW? |

A simple interpretation of the source entropy is the following: on the average, we can expect to get H bits of information per symbol in long messages from the information source even though we cannot say in advance what symbol sequences will occur. |

**9.7 INFORMATION RATE**

**Definition and Mathematical Expression**

If the time rate at which source *X* emits symbols is *r* (symbols/s), the information rate R of the source is given by

R = rH(X) b/s …(9.10)

Here, R is the information rate, H(X) is the entropy or average information, and *r* is the rate at which symbols are generated.

The information rate R represents the average number of bits of information per second. It is calculated as under:

R = r (symbols/second) × H(X) (bits/symbol) …(9.11)

or R = information bits/second …(9.12)
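Equation (9.10) is a one-line computation once the entropy is known. A short sketch (names and the example source are illustrative):

```python
import math

def entropy(probs):
    """H(X) in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_rate(r, probs):
    """R = r * H(X) in bits/second, where r is the symbol rate."""
    return r * entropy(probs)

# A source emitting 100 equiprobable 4-ary symbols per second:
print(information_rate(100, [0.25, 0.25, 0.25, 0.25]))  # 200.0
```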

**EXAMPLE 9.8. Verify the following expression:**

***I*(x_{i}x_{j}) = *I*(x_{i}) + *I*(x_{j}) if x_{i} and x_{j} are independent, using the relation I(x_{i}) = log_{2} [1/P(x_{i})].**

**Solution:** If x_{i} and x_{j} are independent, then we know that

P(x_{i}x_{j}) = P(x_{i}) P(x_{j})

Also,

*I*(x_{i}x_{j}) = log_{2} [1/P(x_{i}x_{j})] = log_{2} [1/{P(x_{i}) P(x_{j})}]

Since log [AB] = log A + log B, we get

*I*(x_{i}x_{j}) = log_{2} [1/P(x_{i})] + log_{2} [1/P(x_{j})]

or *I*(x_{i}x_{j}) = *I*(x_{i}) + *I*(x_{j})

**Hence Proved**.
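The additivity shown above is easy to check numerically (the two probabilities below are arbitrary, chosen only for illustration):

```python
import math

def self_information(p):
    """I(x) = log2(1/p) bits, for a symbol of probability p."""
    return math.log2(1.0 / p)

p_i, p_j = 0.4, 0.25       # arbitrary example probabilities
p_joint = p_i * p_j        # independence: P(xi xj) = P(xi) P(xj)

# I(xi xj) should equal I(xi) + I(xj); both prints show the same value.
print(self_information(p_joint))
print(self_information(p_i) + self_information(p_j))
```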

**EXAMPLE 9.9. A Discrete Memoryless Source (DMS) X has four symbols x_{1}, x_{2}, x_{3}, x_{4} with probabilities *P*(x_{1}) = 0.4, *P*(x_{2}) = 0.3, P(x_{3}) = 0.2, P(x_{4}) = 0.1.**

**(i) Calculate H(X).**

**(ii) Find the amount of information contained in the messages x_{1}x_{2}x_{1}x_{3} and x_{4}x_{3}x_{3}x_{2}, and compare with the H(X) obtained in part (i).**

*(U.P. Tech-Semester Exam. 2002-2003)*

**Solution:** (i) We know that entropy is given by

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i})

Substituting values and simplifying, we get

H(X) = – 0.4 log_{2} 0.4 – 0.3 log_{2} 0.3 – 0.2 log_{2} 0.2 – 0.1 log_{2} 0.1

Solving, we get

H(X) = 1.85 b/symbol **Ans.**

**(ii)** Now, we have

P(x_{1}x_{2}x_{1}x_{3}) = (0.4)(0.3)(0.4)(0.2) = 0.0096

Therefore,

*I*(x_{1}x_{2}x_{1}x_{3}) = – log_{2} 0.0096 = 6.70 b **Ans.**

Thus, *I*(x_{1}x_{2}x_{1}x_{3}) < 7.4 [= 4H(X)] b

Also, P(x_{4}x_{3}x_{3}x_{2}) = (0.1)(0.2)^{2}(0.3) = 0.0012

Therefore,

*I*(x_{4}x_{3}x_{3}x_{2}) = – log_{2} 0.0012 = 9.70 b

We conclude that

*I*(x_{4}x_{3}x_{3}x_{2}) > 7.4 [= 4H(X)] b **Ans.**
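The arithmetic of Example 9.9 can be re-checked in a few lines (names are illustrative):

```python
import math

probs = {"x1": 0.4, "x2": 0.3, "x3": 0.2, "x4": 0.1}

# Part (i): source entropy.
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 2))   # 1.85

def message_information(symbols):
    """Information of an independent-symbol message: -log2 of its joint probability."""
    p = math.prod(probs[s] for s in symbols)
    return -math.log2(p)

# Part (ii): the two messages, bracketing 4*H(X) = 7.4 bits.
I1 = message_information(["x1", "x2", "x1", "x3"])
I2 = message_information(["x4", "x3", "x3", "x2"])
print(round(I1, 2), round(I2, 2))   # 6.7 9.7
```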

**EXAMPLE 9.10. Consider a binary memoryless source *X* with two symbols x_{1} and x_{2}. Prove that H(X) is maximum when both x_{1} and x_{2} are equiprobable.**

**Solution:** Here, let us assume that

P(x_{1}) = α so that P(x_{2}) = 1 – α

We know that entropy is given by

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i})

Thus, H(X) = – α log_{2} α – (1 – α) log_{2} (1 – α)

Differentiating the above equation with respect to α, we get

dH(X)/dα = d/dα [– α log_{2} α – (1 – α) log_{2} (1 – α)] …(i)

Using the relation

d/dy [log_{2} y] = (1/y) log_{2} e

we get

dH(X)/dα = – log_{2} α + log_{2} (1 – α) = log_{2} [(1 – α)/α]

Note that the maximum value of H(X) requires that

dH(X)/dα = 0

This means that (1 – α)/α = 1, or α = 1/2

Note that H(X) = 0 when α = 0 or 1.

When P(x_{1}) = P(x_{2}) = 1/2, H(X) is maximum and is given by

H(X) = (1/2) log_{2} 2 + (1/2) log_{2} 2 = 1 b/symbol …(ii)
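A quick numerical scan confirms the calculus: H(α) peaks at α = 1/2 with value 1 bit (a sketch; names are illustrative):

```python
import math

def H_binary(a):
    """Binary entropy H(a) = -a log2 a - (1-a) log2 (1-a)."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

# Scan alpha over a fine grid and find where H is largest.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=H_binary)
print(best, H_binary(best))   # 0.5 1.0
```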

**EXAMPLE 9.11. Verify the following expression:**

**0 ≤ H(X) ≤ log_{2} *m***

**where *m* is the size of the alphabet of X.**

**Solution: Proof of the lower bound:**

Since 0 ≤ P(x_{i}) ≤ 1, we have

log_{2} [1/P(x_{i})] ≥ 0

Then, it follows that

P(x_{i}) log_{2} [1/P(x_{i})] ≥ 0

Thus,

H(X) = Σ_{i=1}^{m} P(x_{i}) log_{2} [1/P(x_{i})] ≥ 0 …(i)

Next, we note that

P(x_{i}) log_{2} [1/P(x_{i})] = 0

if and only if P(x_{i}) = 0 or 1. Since

Σ_{i=1}^{m} P(x_{i}) = 1

when *P*(x_{i}) = 1, then P(x_{j}) = 0 for j ≠ i. Thus, only in this case, H(X) = 0.

**Proof of the upper bound:**

Let us consider two probability distributions [P(x_{i}) = P_{i}] and [Q(x_{i}) = Q_{i}] on the alphabet {x_{i}}, *i* = 1, 2, …, m, such that

Σ_{i=1}^{m} P_{i} = 1 and Σ_{i=1}^{m} Q_{i} = 1 …(ii)

We know that

Σ_{i=1}^{m} P_{i} log_{2} (Q_{i}/P_{i}) = (1/ln 2) Σ_{i=1}^{m} P_{i} ln (Q_{i}/P_{i})

Next, using the inequality

ln α ≤ α – 1, α ≥ 0 …(iii)

and noting that the equality holds only if α = 1, we obtain

Σ_{i=1}^{m} P_{i} ln (Q_{i}/P_{i}) ≤ Σ_{i=1}^{m} P_{i} [(Q_{i}/P_{i}) – 1] = Σ_{i=1}^{m} Q_{i} – Σ_{i=1}^{m} P_{i} = 0

by using equation (ii). Hence, we have

Σ_{i=1}^{m} P_{i} log_{2} (Q_{i}/P_{i}) ≤ 0

where the equality holds only if Q_{i} = P_{i} for all *i*.

Setting

Q_{i} = 1/m, *i* = 1, 2, …, m …(iv)

we get

Σ_{i=1}^{m} P_{i} log_{2} (Q_{i}/P_{i}) = Σ_{i=1}^{m} P_{i} log_{2} [1/P_{i}] – log_{2} m = H(X) – log_{2} m ≤ 0

Therefore, we get H(X) ≤ log_{2} *m*

and the equality holds only if the symbols in *X* are equiprobable.
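Both bounds can be exercised numerically on an arbitrary distribution (a sketch; the random draw and names are illustrative):

```python
import math
import random

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

m = 8
random.seed(1)

# A random distribution on m symbols must satisfy 0 <= H(X) <= log2 m.
w = [random.random() for _ in range(m)]
p = [x / sum(w) for x in w]
H = entropy(p)
print(0.0 <= H <= math.log2(m))   # True

# Extremes: a degenerate source gives 0; a uniform source gives log2 m.
print(entropy([1.0] + [0.0] * (m - 1)))   # 0.0
print(entropy([1 / m] * m))               # 3.0
```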

**EXAMPLE 9.12. A high-resolution black-and-white TV picture consists of about 2 × 10^{6} picture elements and 16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture elements are assumed to be independent, and all levels have equal likelihood of occurrence. Calculate the average rate of information conveyed by this TV picture source.**

*(U.P. Tech. (C.O.), 2003)*

**Solution:** We know that entropy is given by

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i}) bits/symbol …(i)

Here, it is given that m = 16 and P(x_{i}) = 1/16 for all i.

Substituting all the values in equation (i), we get

H(X) = – 16 × (1/16) log_{2} (1/16) = log_{2} 16 = 4 bits/element

The rate, *r*, at which symbols are generated is given by

*r* = 2(10^{6})(32) = 64(10^{6}) elements/second

Hence, the average rate of information conveyed is given by

R = rH(X) = 64(10^{6})(4) = 256(10^{6}) b/s = 256 Mb/s **Ans.**
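Example 9.12 in code form (variable names are illustrative):

```python
import math

pixels_per_picture = 2e6
pictures_per_sec = 32
levels = 16

# Equiprobable levels: H = log2(16) = 4 bits per picture element.
H = math.log2(levels)
r = pixels_per_picture * pictures_per_sec   # elements/second
R = r * H                                   # bits/second
print(H, r, R)   # 4.0 64000000.0 256000000.0
```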

**EXAMPLE 9.13. Given a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash duration is 3 times the dot duration. The probability of the dot's occurring is twice that of the dash, and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source.**

**Solution:** Given that

*P*(dot) = 2*P*(dash)

*P*(dot) + *P*(dash) = 3P(dash) = 1

From the above two equations, we have

P(dash) = 1/3 and P(dot) = 2/3

Further, we know that the entropy is given by

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i}) bits/symbol

Expanding accordingly, we get

H(X) = – P(dot) log_{2} *P*(dot) – P(dash) log_{2} P(dash)

Substituting all the values, we get

H(X) = 0.667(0.585) + 0.333(1.585) = 0.92 b/symbol

Also, given that

t_{dot} = 0.2 s, t_{dash} = 0.6 s, t_{space} = 0.2 s

Thus, the average time per symbol will be

T_{s} = P(dot)t_{dot} + P(dash)t_{dash} + t_{space} = 0.5333 s/symbol

Further, the average symbol rate will be given by

r = 1/T_{s} = 1.875 symbols/s

Therefore, the average information rate of the telegraph source will be

R = *r*H(X) = 1.875(0.92) = 1.725 b/s **Ans.**
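Example 9.13 step by step (a sketch; names are illustrative — the small difference from 1.725 b/s comes from carrying H unrounded instead of using 0.92):

```python
import math

p_dot, p_dash = 2 / 3, 1 / 3
t_dot, t_dash, t_space = 0.2, 0.6, 0.2

# Entropy of the dot/dash alphabet.
H = -(p_dot * math.log2(p_dot) + p_dash * math.log2(p_dash))

# Average time per symbol includes the inter-symbol space.
T_s = p_dot * t_dot + p_dash * t_dash + t_space
r = 1 / T_s
R = r * H
print(round(H, 2), round(T_s, 4), round(r, 3), round(R, 3))
```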

**EXAMPLE 9.14. An analog signal is bandlimited to f_{m} Hz and sampled at the Nyquist rate. The samples are quantized into 4 levels. Each level represents one symbol; thus there are 4 symbols. The probabilities of occurrence of these 4 levels (symbols) are P(x_{1}) = P(x_{4}) = 1/8 and P(x_{2}) = P(x_{3}) = 3/8.**

**Obtain the information rate of the source.**

*(Delhi University, 2000)*

**Solution:** We are given four symbols with probabilities P(x_{1}) = P(x_{4}) = 1/8 and P(x_{2}) = P(x_{3}) = 3/8.

Average information H(X) (or entropy) is expressed as

*H(X)* = P(x_{1}) log_{2} [1/P(x_{1})] + P(x_{2}) log_{2} [1/P(x_{2})] + P(x_{3}) log_{2} [1/P(x_{3})] + P(x_{4}) log_{2} [1/P(x_{4})]

Substituting all the given values, we get

H(X) = (1/8) log_{2} 8 + (3/8) log_{2} (8/3) + (3/8) log_{2} (8/3) + (1/8) log_{2} 8

or H(X) = 1.8 bits/symbol **Ans.**

It is given that the signal is sampled at the Nyquist rate. The Nyquist rate for an f_{m} Hz bandlimited signal is

Nyquist rate = 2f_{m} samples/sec.

Since every sample generates one source symbol, the symbol rate is

*r* = 2f_{m} symbols/sec.

The information rate is given by

R = *r*H(X)

Putting the values of *r* and H(X) in this equation, we get

R = 2f_{m} symbols/sec × 1.8 bits/symbol = 3.6f_{m} bits/sec **Ans.**

In this example, there are four levels. These four levels may be coded using binary PCM as shown in Table 9.1.

**Table 9.1**

S.No. | Symbol or level | Probability | Binary digits
1. | Q_{1} | 1/8 | 0 0
2. | Q_{2} | 3/8 | 0 1
3. | Q_{3} | 3/8 | 1 0
4. | Q_{4} | 1/8 | 1 1

Hence, two binary digits (binits) are required to send each symbol. We know that symbols are sent at the rate of 2f_{m} symbols/sec. Therefore, the transmission rate of binary digits will be

Binary digit (binit) rate = 2 binits/symbol × 2f_{m} symbols/sec = 4f_{m} binits/sec.

Because one binit is capable of conveying 1 bit of information, the above coding scheme is capable of conveying 4f_{m} bits of information per second. But in this example, we have obtained that we are transmitting only 3.6f_{m} bits of information per second. This means that the information-carrying ability of binary PCM is not completely utilized by this transmission scheme.

**EXAMPLE 9.15. In the transmission scheme of the last example, obtain the information rate if all symbols are equally likely.**

**Also, comment on the result.**

**Solution:** We are given that there are four symbols. Since they are equally likely, their probabilities will all be equal to 1/4, i.e.,

P(x_{1}) = P(x_{2}) = P(x_{3}) = P(x_{4}) = 1/4

We know that the average information per symbol (entropy) is given as

*H(X)* = P(x_{1}) log_{2} [1/P(x_{1})] + P(x_{2}) log_{2} [1/P(x_{2})] + P(x_{3}) log_{2} [1/P(x_{3})] + P(x_{4}) log_{2} [1/P(x_{4})]

Since P(x_{1}) = P(x_{2}) = P(x_{3}) = P(x_{4}) = P(x) = 1/4,

H(X) = 4 P(x) log_{2} [1/P(x)]

Thus, H(X) = log_{2} (4) = 2 bits/symbol

We know that the information rate is expressed as

R = *r*H(X)

Here, *r* = 2f_{m} symbols/sec, as obtained in the last example. Substituting these values in the above equation, we get

R = 2f_{m} symbols/sec × 2 bits/symbol = 4f_{m} bits/sec. **Ans.**

**Comments:** Just before this example, we have seen that binary-coded PCM with 2 binits per message is capable of conveying 4f_{m} bits of information per second. The transmission scheme discussed in this example transmits 4f_{m} bits of information per second. This has been made possible because all the messages are equally likely. Thus, with binary PCM coding, the maximum information rate is achieved if all messages are equally likely.
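The contrast drawn in Examples 9.14 and 9.15 — skewed probabilities leave part of the 2-binit channel unused, uniform probabilities use all of it — can be sketched as follows (names are illustrative):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

binit_rate_per_symbol = 2            # 2 binary digits per 4-level symbol

skewed = [1/8, 3/8, 3/8, 1/8]        # Example 9.14
uniform = [1/4] * 4                  # Example 9.15

# Per symbol: information carried vs. binits spent.
print(round(entropy(skewed), 1), "of", binit_rate_per_symbol)   # 1.8 of 2
print(round(entropy(uniform), 1), "of", binit_rate_per_symbol)  # 2.0 of 2
```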

**EXAMPLE 9.16. A discrete source emits one of five symbols once every millisecond with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Determine the source entropy and information rate. **

**Solution:** We know that the source entropy is given as

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i})

or H(X) = (1/2) log_{2} 2 + (1/4) log_{2} 4 + (1/8) log_{2} 8 + (1/16) log_{2} 16 + (1/16) log_{2} 16

or H(X) = 1/2 + 1/2 + 3/8 + 1/4 + 1/4 = 1.875 bits/symbol **Ans.**

The symbol rate is *r* = 1/(1 ms) = 1000 symbols/sec.

Therefore, the information rate is expressed as

R = rH(X) = 1000 × 1.875 = 1875 bits/sec. **Ans.**
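With dyadic probabilities (powers of 1/2), every term p·log_{2}(1/p) is exact, so Example 9.16 can be verified precisely in code (names are illustrative):

```python
import math

probs = [1/2, 1/4, 1/8, 1/16, 1/16]

# 0.5 + 0.5 + 0.375 + 0.25 + 0.25 = 1.875 bits/symbol, exactly.
H = sum(p * math.log2(1 / p) for p in probs)
r = 1 / 1e-3                  # one symbol per millisecond
R = r * H
print(H, R)   # 1.875 1875.0
```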

**EXAMPLE 9.17. The probabilities of the five possible outcomes of an experiment are given as**

**P(x_{1}) = 1/2, P(x_{2}) = 1/4, P(x_{3}) = 1/8, P(x_{4}) = P(x_{5}) = 1/16**

**Determine the entropy and information rate if there are 16 outcomes per second.** *(Delhi University, 1999)*

**Solution:** The entropy of the system is given as

H(X) = – Σ_{i} P(x_{i}) log_{2} P(x_{i})

or *H(X)* = (1/2) log_{2} 2 + (1/4) log_{2} 4 + (1/8) log_{2} 8 + (1/16) log_{2} 16 + (1/16) log_{2} 16

or *H(X)* = 1/2 + 1/2 + 3/8 + 1/4 + 1/4

or *H(X)* = 15/8 = 1.875 bits/outcome **Ans.**

Now, the rate of outcomes is *r* = 16 outcomes/sec.

Therefore, the rate of information *R* will be

R = rH(X) = 16 × 15/8 = 30 bits/sec **Ans.**

**EXAMPLE 9.18. An analog signal bandlimited to 10 kHz is quantized in 8 levels of a PCM system with probabilities of 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and the rate of information.**

**Solution:** We know that according to the sampling theorem, the signal must be sampled at a frequency given as

f_{s} = 10 × 2 kHz = 20 kHz

Considering each of the eight quantized levels as a message, the entropy of the source may be written as

*H(X)* = (1/4) log_{2} 4 + (1/5) log_{2} 5 + (1/5) log_{2} 5 + (1/10) log_{2} 10 + (1/10) log_{2} 10 + (1/20) log_{2} 20 + (1/20) log_{2} 20 + (1/20) log_{2} 20

or H(X) = (1/4) log_{2} 4 + (2/5) log_{2} 5 + (1/5) log_{2} 10 + (3/20) log_{2} 20 = 2.74 bits/message

DO YOU KNOW? |

The reader should be aware of the fact that the data rate and information rate are two distinctly different quantities. |

As the sampling frequency is 20 kHz, the rate at which messages are produced will be given as

*r* = 20000 messages/sec.

Hence, the information rate is

R = *r*H(X) = 20000 × 2.74 ≈ 54800 bits/sec. **Ans.**
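The arithmetic in Example 9.18 is worth re-checking in code (a sketch; names are illustrative, and the final print shows the unrounded rate):

```python
import math

probs = [1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20, 1/20]
assert abs(sum(probs) - 1.0) < 1e-12   # sanity: a valid distribution

H = -sum(p * math.log2(p) for p in probs)
f_s = 2 * 10_000                       # Nyquist rate for a 10 kHz signal
R = f_s * H
print(round(H, 2), round(R))   # 2.74 54829
```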