ENTROPY | Properties of Entropy in Information Theory (AVERAGE INFORMATION)

What are the properties of entropy in information theory?
ENTROPY (i.e., AVERAGE INFORMATION)
(i) Definition
            In a practical communication system, we usually transmit long sequences of symbols from an information source. Thus, we are more interested in the average information that a source produces than in the information content of a single symbol.
To arrive at this average, we take note of the fact that the flow of information in a system can fluctuate widely because of the randomness involved in the selection of the symbols. Thus, we need to talk about the average information content of the symbols in a long message.
(ii)        Assumptions
            Thus, for a quantitative representation of the average information per symbol, we make the following assumptions:
(i)         The source is stationary so that the probabilities may remain constant with time.
(ii)        The successive symbols are statistically independent and come from the source at an average rate of r symbols per second.
(iii)       Mathematical Expression
            The mean value of I(xi) over the alphabet of source X with m different symbols is represented by H(X) and given by
                        H(X) = E[I(xi)] = Σ(i = 1 to m) P(xi) I(xi) = Σ(i = 1 to m) P(xi) log2 [1/P(xi)]
or                     H(X) = – Σ(i = 1 to m) P(xi) log2 P(xi) b/symbol
The quantity H(X) is known as the entropy of source X. It is a measure of the average information content per source symbol. The source entropy H(X) can be considered as the average amount of uncertainty within source X that is resolved by use of the alphabet.
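As a quick illustration (not part of the original text), the defining sum can be evaluated with a few lines of Python; the helper name entropy and the example probabilities below are only illustrative.

import math

def entropy(probs):
    # H(X) = -sum P(xi) log2 P(xi), in bits per symbol.
    # Terms with probability 0 are skipped, since p*log2(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example: a four-symbol source
print(entropy([0.4, 0.3, 0.2, 0.1]))   # approximately 1.846 b/symbol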
(iv)       Entropy for Binary Source
            It may be noted that for a binary source X which generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is
H(X) = – (1/2) log2 (1/2) – (1/2) log2 (1/2) = 1 b/symbol              …(9.8)
(v)        Lower and Upper Bounds on Entropy H(X)
The source entropy H(X) satisfies the following relation:
0 ≤ H(X) ≤ log2 m                    …(9.9)
where m is the size (number of symbols) of the alphabet of source X. The lower bound corresponds to no uncertainty, which occurs when one symbol has probability P(xi) = 1 while P(xj) = 0 for all j ≠ i, so X emits the same symbol xi all the time. The upper bound corresponds to the maximum uncertainty, which occurs when P(xi) = 1/m for all i, that is, when all symbols are equally likely to be emitted by X.

DO YOU KNOW?
A simple interpretation of the source entropy is the following: on the average, we can expect to get H bits of information per symbol in long messages from the information source even though we cannot say in advance what symbol sequences will occur.
 
9.7       INFORMATION RATE
Definition and Mathematical Expression
If the time rate at which source X emits symbols is r (symbols/s), the information rate R of the source is given by
R = rH(X) b/s                                      …(9.10)
Here, R is the information rate, H(X) is the entropy or average information, and r is the rate at which symbols are generated.
The information rate R is expressed as an average number of bits of information per second. It is calculated as under:
R = r (symbols/second) x H(X) (information bits/symbol)             …(9.11)
or                             R = information bits/second                                      …(9.12)
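A minimal sketch of equation (9.10), assuming the source is described only by its symbol probabilities and its symbol rate r; the helper names and the sample numbers are illustrative.

import math

def entropy(probs):
    # H(X) in bits per symbol
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_rate(r, probs):
    # R = r * H(X), in bits per second (equation 9.10)
    return r * entropy(probs)

# Illustrative source: two equiprobable symbols emitted at 1000 symbols/s
print(information_rate(1000, [0.5, 0.5]))   # 1000.0 b/s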
EXAMPLE 9.8. Verify the following expression:
I(xixj) = I(xi) + I(xj) if xi and xj are independent
*          Using the relation I(xi) = log2 [1/P(xi)].
Solution: If xi and xj are independent, then we know that
P(xixj) = P(xi) P(xj)
Also                             I(xixj) = log2 [1/P(xixj)] = log2 [1/(P(xi) P(xj))]
or                                 I(xixj) = log2 [1/P(xi)] + log2 [1/P(xj)]                 (since log [AB] = log A + log B)
or                                 I(xixj) = I(xi) + I(xj)     Hence Proved.
EXAMPLE 9.9. A Discrete Memoryless Source (DMS) X has four symbols x1, x2, x3, x4 with probabilities P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2, P(x4) = 0.1.
            (i)         Calculate H(X).
            (ii)        Find the amount of information contained in the messages x1x2x1x3 and x4x3x3x2, and compare with the H(X) obtained in part (i).
(U.P. Tech-Semester Exam. 2002-2003)
Solution: (i) We know that entropy is given by
H(X) = – Σ(i = 1 to 4) P(xi) log2 P(xi)
Substituting values and simplifying, we get
H(X) = – 0.4 log2 0.4 – 0.3 log2 0.3 – 0.2 log2 0.2 – 0.1 log2 0.1
Solving, we get
H(X) = 1.85 b/symbol                         Ans.
            (ii) Now, we have
P(x1x2x1x3) = (0.4)(0.3)(0.4)(0.2) = 0.0096
Therefore,
I(x1x2x1x3) = – log2 0.0096 = 6.70 b              Ans.
Thus,               I(x1x2x1x3) < 7.4 b [= 4H(X)]
Also,                P(x4x3x3x2) = (0.1)(0.2)²(0.3) = 0.0012
Therefore,
I(x4x3x3x2) = – log2 0.0012 = 9.70 b
We conclude that
I(x4x3x3x2) > 7.4 b [= 4H(X)]             Ans.
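The numbers in Example 9.9 can be reproduced with a short Python check (the message_info helper below is illustrative, not part of the original solution):

import math

p = {'x1': 0.4, 'x2': 0.3, 'x3': 0.2, 'x4': 0.1}

# Entropy of the source and the 4H(X) reference value
H = -sum(pi * math.log2(pi) for pi in p.values())
print(round(H, 2), round(4 * H, 1))                       # 1.85  7.4

def message_info(symbols):
    # Self-information of a message of independent symbols: I = -log2(product of probabilities)
    return -math.log2(math.prod(p[s] for s in symbols))

print(round(message_info(['x1', 'x2', 'x1', 'x3']), 2))   # 6.7  (< 4H)
print(round(message_info(['x4', 'x3', 'x3', 'x2']), 2))   # 9.7  (> 4H)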
EXAMPLE 9.10. Consider a binary memoryless source X with two symbols x1 and x2. Prove that H(X) is maximum when both x1 and x2 are equiprobable.
Solution: Here, let us assume that
P(x1) = α so that P(x2) = 1 – α
We know that entropy is given by
H(X) = – P(x1) log2 P(x1) – P(x2) log2 P(x2)
Thus,               H(X) = – α log2 α – (1- α) log2 (1- α)
Differentiating the above equation with respect to α, we get
                        dH(X)/dα = d/dα [– α log2 α – (1 – α) log2 (1 – α)]                  …(i)
Using the relation
                        d/dy (logb y) = (1/y) logb e
We get
                        dH(X)/dα = – log2 α + log2 (1 – α) = log2 [(1 – α)/α]
Note that the maximum value of H(X) requires that
                        dH(X)/dα = 0
This means that
                        (1 – α)/α = 1     or     α = 1/2
Note that H(X) = 0 when α = 0 or 1.
When               P(x1) = P(x2) = 1/2, H(X) is maximum and is given by
H(X) = (1/2) log2 2 + (1/2) log2 2 = 1 b/symbol                                …(ii)
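The same conclusion can be checked numerically: evaluating H(X) on a fine grid of α values shows that the maximum of 1 b/symbol occurs at α = 1/2. A small sketch (not part of the original solution):

import math

def binary_entropy(a):
    # H(X) = -a log2 a - (1 - a) log2 (1 - a), with 0*log2(0) taken as 0
    return -sum(t * math.log2(t) for t in (a, 1 - a) if t > 0)

alphas = [i / 1000 for i in range(1001)]
best = max(alphas, key=binary_entropy)
print(best, round(binary_entropy(best), 4))   # 0.5 1.0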
EXAMPLE 9.11. Verify the following expression:
0 ≤ H(X) ≤ log2 m
            where m is the size of the alphabet of X.
Solution: Proof of the lower bound:
Since                                       0 ≤ P(xi) ≤ 1,
and                                          log2 [1/P(xi)] ≥ 0
Then, it follows that
                                                P(xi) log2 [1/P(xi)] ≥ 0
Thus,                                       H(X) = Σ(i = 1 to m) P(xi) log2 [1/P(xi)] ≥ 0                 …(i)
Next, we note that
                                                P(xi) log2 [1/P(xi)] = 0
if and only if P(xi) = 0 or 1. Since
                                                Σ(i = 1 to m) P(xi) = 1,
when P(xi) = 1 then P(xj) = 0 for j ≠ i. Thus, only in this case, H(X) = 0.
Proof of the upper bound
            Let us consider two probability distributions {P(xi) = Pi} and {Q(xi) = Qi} on the alphabet {xi}, i = 1, 2, …, m, such that
                                    Σ(i = 1 to m) Pi = 1   and   Σ(i = 1 to m) Qi = 1                       …(ii)
We know that
                                    Σ(i = 1 to m) Pi log2 (Qi/Pi) = (1/ln 2) Σ(i = 1 to m) Pi ln (Qi/Pi)
Next, using the inequality
                                    ln α ≤ α – 1,   α ≥ 0                                             …(iii)
and noting that the equality holds only if α = 1, we obtain
                                    Σ(i = 1 to m) Pi ln (Qi/Pi) ≤ Σ(i = 1 to m) Pi [(Qi/Pi) – 1]
                                                                              = Σ(i = 1 to m) Qi – Σ(i = 1 to m) Pi = 0, by using equation (ii).
Hence, we have
                                    Σ(i = 1 to m) Pi log2 (Qi/Pi) ≤ 0
where the equality holds only if Qi = Pi for all i.
Setting                                     Qi = 1/m,   i = 1, 2, …, m                 …(iv)
We get                         Σ(i = 1 to m) Pi log2 (Qi/Pi) = Σ(i = 1 to m) Pi log2 (1/m) – Σ(i = 1 to m) Pi log2 Pi = – log2 m + H(X) ≤ 0
Therefore, we get                    H(X) ≤ log2 m
and the equality holds only if the symbols in X are equiprobable.
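The bounds proved above are easy to spot-check numerically for any particular distribution; the sketch below (illustrative only) verifies 0 ≤ H(X) ≤ log2 m for a few distributions on m = 4 symbols.

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

m = 4
for P in ([0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]):
    H = entropy(P)
    # 0 <= H(X) <= log2 m, with equality on the right only for the equiprobable case
    print(round(H, 3), 0 <= H <= math.log2(m) + 1e-12)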
EXAMPLE 9.12. A high-resolution black-and-white TV picture consists of about 2 x 10^6 picture elements and 16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture elements are assumed to be independent, and all levels have equal likelihood of occurrence. Calculate the average rate of information conveyed by this TV picture source.
 (U.P. Tech. (C.O.), 2003)
Solution: We know that entropy is given by
H(X) = – Σ(i = 1 to m) P(xi) log2 P(xi) bits/symbol                             …(i)
Here, given that                      m = 16, P(xi) = 1/16 for all i
Substituting all the values in equation (i), we get
H(X) = – Σ(i = 1 to 16) (1/16) log2 (1/16) = log2 16 = 4 bits/element
The rate, r, at which symbols are generated is given by
r = 2 x 10^6 x 32 = 64 x 10^6 elements/second
Hence, the average rate of information conveyed is given by
R = rH(X) = 64 x 10^6 x 4 = 256 x 10^6 b/s = 256 Mb/s Ans.
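The arithmetic of this example in a few lines of Python (values taken from the problem statement):

import math

m = 16                      # equally likely brightness levels
H = math.log2(m)            # 4 bits per picture element for a uniform alphabet
r = 2e6 * 32                # picture elements per second
R = r * H
print(H, R / 1e6, "Mb/s")   # 4.0 256.0 Mb/s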
EXAMPLE 9.13. Given a telegraph source having two symbols, dot and dash. The dot duration is 0.2 s. The dash duration is 3 times the dot duration. The probability of the dot's occurring is twice that of the dash, and the time between symbols is 0.2 s. Calculate the information rate of the telegraph source.
Solution: Given that
P(dot) = 2P(dash)
P(dot) + P(dash) = 3P(dash) = 1
From above two equations, we have
P(dash) = 1/3 and P(dot) = 2/3
Further, we know that the entropy is given by
H(X) = – Σi P(xi) log2 P(xi) bits/symbol
Expanding accordingly, we get
H(X) = -P(dot) log2 P(dot) – P(dash) log2 P(dash)
Substituting all the values, we get
H(X) = 0.667(0.585) + 0.333(1.585) = 0.92 b/symbol
Also, given that
tdot = 0.2s, tdash = 0.6s, tspace = 0.2 s
Thus, the average time per symbol will be
Ts = P(dot)tdot + P(dash)tdash + tspace = 0.5333 s/symbol
Further, the average symbol rate will be given by
r = 1/Ts = 1/0.5333 = 1.875 symbols/s
Therefore, the average information rate of the telegraph source will be
R = rH(X) = 1.875(0.92) = 1.725 b/s Ans.
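A quick numerical check of this example (dot/dash probabilities and durations as given above):

import math

p_dot, p_dash = 2 / 3, 1 / 3
H = -(p_dot * math.log2(p_dot) + p_dash * math.log2(p_dash))   # ~0.92 b/symbol

t_dot, t_dash, t_space = 0.2, 0.6, 0.2
Ts = p_dot * t_dot + p_dash * t_dash + t_space                 # ~0.533 s/symbol
r = 1 / Ts                                                     # 1.875 symbols/s
print(round(H, 2), round(r, 3), round(r * H, 2), "b/s")        # 0.92 1.875 1.72 b/s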
EXAMPLE 9.14. An analog signal is bandlimited to fm Hz and sampled at the Nyquist rate. The samples are quantized into 4 levels. Each level represents one symbol. Thus, there are 4 symbols. The probabilities of occurrence of these 4 levels (symbols) are P(x1) = P(x4) = 1/8 and P(x2) = P(x3) = 3/8. Obtain the information rate of the source.                                               (Delhi University, 2000)
Solution: We are given four symbols with probabilities P(x1) = P(x4) = 1/8 and P(x2) = P(x3) = 3/8.
Average information H(X) (or entropy) is expressed as,
H(X) = P(x1) log2 [1/P(x1)] + P(x2) log2 [1/P(x2)] + P(x3) log2 [1/P(x3)] + P(x4) log2 [1/P(x4)]
Substituting all the given values, we get
or         H(X) = (1/8) log2 8 + (3/8) log2 (8/3) + (3/8) log2 (8/3) + (1/8) log2 8
or         H(X) = 1.8 bits/symbol Ans.
It is given that the signal is sampled at Nyquist rate. Nyquist rate for fm Hz bandlimited signal is,
Nyquist rate = 2fm samples/sec.
Since every sample generates one source symbol,
Therefore, symbols per second, r = 2fm symbols/sec.
Information rate is given by
R = rH(X)
Putting values of r and H(X) in this equation, we get
R = 2fm symbols/sec x 1.8 bits/symbol = 3.6fm bits/sec                     Ans.
In this example, there are four levels. These four levels may be coded using binary PCM as shown in Table 9.1.
Table 9.1

S.No.    Symbol or level    Probability    Binary digits
1.          Q1                        1/8                 0 0
2.          Q2                        3/8                 0 1
3.          Q3                        3/8                 1 0
4.          Q4                        1/8                 1 1

 
Hence, two binary digits (binits) are required to send each symbol. We know that symbols are sent at the rate of 2fm symbols/sec. Therefore, the transmission rate of binary digits will be,
Binary digit (binit) rate = 2 binits/symbol x 2fm symbols/sec
=  4fm binits/sec.
Because one binit is capable of conveying 1 bit of information, the above coding scheme is capable of conveying 4fm bits of information per second. But in this example, we have obtained that we are transmitting only 3.6fm bits of information per second. This means that the information-carrying ability of binary PCM is not completely utilized by this transmission scheme.
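The comparison above between the source information rate (3.6fm b/s) and the binit rate (4fm binits/s) can be reproduced with the sketch below; since fm is symbolic, the figures are computed per hertz of fm.

import math

probs = [1 / 8, 3 / 8, 3 / 8, 1 / 8]                  # P(x1)..P(x4)
H = -sum(p * math.log2(p) for p in probs)             # ~1.81 bits/symbol
r_per_fm = 2                                          # r = 2*fm symbols/s, i.e. 2 symbols/s per Hz of fm
R_per_fm = r_per_fm * H                               # ~3.6 information bits/s per Hz of fm
binit_rate_per_fm = r_per_fm * 2                      # 2 binits/symbol -> 4 binits/s per Hz of fm
print(round(H, 2), round(R_per_fm, 2), binit_rate_per_fm)   # 1.81 3.62 4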
EXAMPLE 9.15. In the transmission scheme of the last example, obtain the information rate if all symbols are equally likely.
Also, comment on the result.
            Solution: We are given that there are four symbols. Since they are equally likely, their probabilities will each be equal to 1/4, i.e.,
P(x1) = P(x2) = P(x3) = P(x4) = 1/4
We know that the average information per symbol (entropy) is given as,
H(X) = P(x1) log2 [1/P(x1)] + P(x2) log2 [1/P(x2)] + P(x3) log2 [1/P(x3)] + P(x4) log2 [1/P(x4)]
since                            P(x1) = P(x2) = P(x3) = P(x4)
Therefore,                    H(X) = 4 P(x) log2 [1/P(x)]
Again, since    P(x1) = P(x2) = P(x3) = P(x4) = P(x) = 1/4
Thus,                           H(X) = log2 (4) = 2 bits/symbol
We know that the information rate is expressed as,
R = rH(X)
Here, r = 2fm symbols/sec, as obtained in the last example. Substituting these values in the above equation, we get,
R = 2fm symbols/sec. x 2 bits/symbol = 4fm bits/sec. Ans.
Comments: Just before this example, we have seen that binary-coded PCM with 2 binits per message is capable of conveying 4fm bits of information per second. The transmission scheme discussed in the above example transmits 4fm bits of information per second. This has been made possible since all the messages are equally likely. Thus, with binary PCM coding, the maximum information rate is achieved if all messages are equally likely.
EXAMPLE 9.16. A discrete source emits one of five symbols once every millisecond with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16 respectively. Determine the source entropy and information rate.
Solution: We know that the source entropy is given as
H(X) = Σ(i = 1 to 5) P(xi) log2 [1/P(xi)]
or         H(X) = (1/2) log2 (2) + (1/4) log2 (4) + (1/8) log2 (8) + (1/16) log2 (16) + (1/16) log2 (16)
or         H(X) = 1/2 + 1/2 + 3/8 + 1/4 + 1/4 = 15/8 = 1.875 bits/symbol   Ans.
The symbol rate r = 1/(1 ms) = 1000 symbols/sec.
Therefore, the information rate is expressed as
R = rH(X) = 1000 x 1.875 = 1875 bits/sec.                Ans.
EXAMPLE 9.17. The probabilities of the five possible outcomes of an experiment are given as
P(x1) = 1/2, P(x2) = 1/4, P(x3) = 1/8, P(x4) = P(x5) = 1/16
            Determine the entropy and information rate if there are 16 outcomes per second.                                                                                            (Delhi University, 1999)
Solution: The entropy of the system is given as
H(X) = Σ(i = 1 to 5) P(xi) log2 [1/P(xi)]
or         H(X) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16
or         H(X) = 1/2 + 1/2 + 3/8 + 1/4 + 1/4
or         H(X) = 15/8 = 1.875 bits/outcome    Ans.
Now, rate of outcomes r = 16 outcomes/sec.
Therefore, the rate of information R will be
R = rH(X) = 16 x 15/8 = 30 bits/sec      Ans.
EXAMPLE 9.18. An analog signal bandlimited to 10 kHz is quantized into 8 levels of a PCM system with probabilities of 1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20 and 1/20 respectively. Find the entropy and the rate of information.
Solution: We know that according to the sampling theorem, the signal must be sampled at a frequency given as
fs = 10 x 2 kHz = 20 kHz
Considering each of the eight quantized levels as a message, the entropy of the source may be written as
H(X) = (1/4) log2 4 + (1/5) log2 5 + (1/5) log2 5 + (1/10) log2 10 + (1/10) log2 10 + (1/20) log2 20 + (1/20) log2 20 + (1/20) log2 20
or         H(X) = (1/4) log2 4 + (2/5) log2 5 + (1/5) log2 10 + (3/20) log2 20 = 2.74 bits/message

DO YOU KNOW?
The reader should be aware of the fact that the data rate and information rate are two distinctly different quantities.

As the sampling frequency is 20 kHz, the rate at which messages are produced will be given as
r = 20,000 messages/sec.
Hence the information rate is
R = rH(X) = 20000 x 2.74 = 54,800 bits/sec.   Ans.
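The entropy and information rate of this example can be verified numerically from the probabilities given in the problem statement (a short check, not part of the original solution):

import math

probs = [1/4, 1/5, 1/5, 1/10, 1/10, 1/20, 1/20, 1/20]
assert abs(sum(probs) - 1) < 1e-12          # the eight levels form a valid distribution

H = -sum(p * math.log2(p) for p in probs)   # ~2.74 bits/message
r = 2 * 10_000                              # Nyquist sampling of a 10 kHz signal: 20,000 messages/s
print(round(H, 2), round(r * H))            # 2.74 54829 (about 54,800 bits/sec)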
