摘要:We prove a lower estimate on the increase in entropy when two copies of a conditional random variable X | Y, with X supported on Z_q={0,1,...,q-1} for prime q, are summed modulo q. Specifically, given two i.i.d. copies (X_1,Y_1) and (X_2,Y_2) of a pair of random variables (X,Y), with X taking values in Z_q, we show H(X_1 + X_2 \mid Y_1, Y_2) - H(X|Y) >=e alpha(q) * H(X|Y) (1-H(X|Y)) for some alpha(q) > 0, where H(.) is the normalized (by factor log_2(q)) entropy. In particular, if X | Y is not close to being fully random or fully deterministic and H(X| Y) \in (gamma,1-gamma), then the entropy of the sum increases by Omega_q(gamma). Our motivation is an effective analysis of the finite-length behavior of polar codes, for which the linear dependence on gamma is quantitatively important. The assumption of q being prime is necessary: for X supported uniformly on a proper subgroup of Z_q we have H(X+X)=H(X). For X supported on infinite groups without a finite subgroup (the torsion-free case) and no conditioning, a sumset inequality for the absolute increase in (unnormalized) entropy was shown by Tao in [Tao, CP&R 2010]. We use our sumset inequality to analyze Ari kan's construction of polar codes and prove that for any q-ary source X, where q is any fixed prime, and anyepsilon > 0, polar codes allow efficient data compression of N i.i.d. copies of X into (H(X)+epsilon)N q-ary symbols, as soon as N is polynomially large in 1/epsilon. We can get capacity-achieving source codes with similar guarantees for composite alphabets, by factoring q into primes and combining different polar codes for each prime in factorization. A consequence of our result for noisy channel coding is that for all discrete memoryless channels, there are explicit codes enabling reliable communication within epsilon > 0 of the symmetric Shannon capacity for a block length and decoding complexity bounded by a polynomial in 1/epsilon. The result was previously shown for the special case of binary-input channels [Guruswami/Xial, FOCS'13; Hassani/Alishahi/Urbanke, CoRR 2013], and this work extends the result to channels over any alphabet.
关键词:Polar codes; polynomial gap to capacity; entropy sumset inequality; arbitrary alphabets