We prove a lower estimate on the increase in entropy when two copies of a conditional random variable $X \mid Y$, with $X$ supported on $\mathbb{Z}_q = \{0, 1, \dots, q-1\}$ for prime $q$, are summed modulo $q$. Specifically, given two i.i.d. copies $(X_1, Y_1)$ and $(X_2, Y_2)$ of a pair of random variables $(X, Y)$, with $X$ taking values in $\mathbb{Z}_q$, we show $H(X_1 + X_2 \mid Y_1, Y_2) - H(X \mid Y) \ge \alpha(q) \cdot H(X \mid Y)(1 - H(X \mid Y))$ for some $\alpha(q) > 0$, where $H(\cdot)$ is the normalized (by factor $\log_2 q$) entropy. In particular, if $X \mid Y$ is not close to being fully random or fully deterministic and $H(X \mid Y) \in (\gamma, 1 - \gamma)$, then the entropy of the sum increases by $\Omega_q(\gamma)$. Our motivation is an effective analysis of the finite-length behavior of polar codes, for which the linear dependence on $\gamma$ is quantitatively important. The assumption of $q$ being prime is necessary: for composite $q$ and $X$ supported uniformly on a nontrivial proper subgroup of $\mathbb{Z}_q$, we have $H(X_1 + X_2) = H(X)$. For $X$ supported on infinite groups without a finite subgroup (the torsion-free case) and no conditioning, a sumset inequality for the absolute increase in (unnormalized) entropy was shown by Tao (Tao '10).
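The behavior described above can be checked numerically in the simplest setting. The following is a minimal sketch, assuming no conditioning (no $Y$) and two hand-picked distributions; the function names and the example distributions are illustrative and not taken from the paper. It computes the normalized entropy of $X_1 + X_2 \bmod q$ for independent copies and compares it with that of $X$, once for a prime alphabet and once for a composite alphabet where $X$ is uniform on a proper subgroup.

```python
import math


def normalized_entropy(p):
    """Entropy of a distribution p on Z_q, divided by log2(q), so it lies in [0, 1]."""
    q = len(p)
    return -sum(x * math.log2(x) for x in p if x > 0) / math.log2(q)


def sum_distribution(p):
    """Distribution of X1 + X2 mod q for independent X1, X2 drawn from p."""
    q = len(p)
    out = [0.0] * q
    for a in range(q):
        for b in range(q):
            out[(a + b) % q] += p[a] * p[b]
    return out


def entropy_gain(p):
    """Normalized entropy of the sum minus normalized entropy of one copy."""
    return normalized_entropy(sum_distribution(p)) - normalized_entropy(p)


# Prime alphabet (q = 5): an arbitrary skewed distribution, chosen for illustration.
p_prime = [0.6, 0.2, 0.1, 0.05, 0.05]
gain_prime = entropy_gain(p_prime)

# Composite alphabet (q = 6): X uniform on the subgroup {0, 2, 4}.
p_subgroup = [1 / 3, 0.0, 1 / 3, 0.0, 1 / 3, 0.0]
gain_subgroup = entropy_gain(p_subgroup)

print(f"q = 5, skewed X:         gain = {gain_prime:.4f}")
print(f"q = 6, X on {{0, 2, 4}}: gain = {gain_subgroup:.4f}")
```

In the prime case the gain is strictly positive, while in the subgroup case the sum is again uniform on the same subgroup, so the gain vanishes up to floating-point error. This is only a spot check of two distributions, not evidence for the general bound.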
We use our sumset inequality to analyze Ar{\i}kan's construction of polar codes and prove that for any $q$-ary source $X$, where $q$ is any fixed prime, and any $\epsilon > 0$, polar codes allow {\em efficient} data compression of $N$ i.i.d. copies of $X$ into $(H(X) + \epsilon)N$ $q$-ary symbols, {\em as soon as $N$ is polynomially large in $1/\epsilon$}. We can get capacity-achieving source codes with similar guarantees for composite alphabets, by factoring $q$ into primes and combining different polar codes for each prime in the factorization.
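One way to see why an entropy increase for the sum bears on polarization is sketched below. It assumes the standard two-by-two polar step $(X_1, X_2) \mapsto (X_1 + X_2, X_2)$ over $\mathbb{Z}_q$; the abstract does not spell this step out, so treat it as an assumption about the intended argument. Since this map is a bijection and the two copies are independent, the chain rule gives a conservation law for conditional entropy:

```latex
\begin{align*}
H(X_1 + X_2 \mid Y_1, Y_2) + H(X_2 \mid X_1 + X_2, Y_1, Y_2)
  &= H(X_1, X_2 \mid Y_1, Y_2) \\
  &= 2\, H(X \mid Y).
\end{align*}
```

Any lower bound on $H(X_1 + X_2 \mid Y_1, Y_2) - H(X \mid Y)$ is therefore matched by an equal drop of $H(X_2 \mid X_1 + X_2, Y_1, Y_2)$ below $H(X \mid Y)$, so the two synthesized variables move apart in entropy by a quantifiable amount at each step.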
A consequence of our result for noisy channel coding is that for {\em all} discrete memoryless channels, there are explicit codes enabling reliable communication within $\epsilon > 0$ of the symmetric Shannon capacity for a block length and decoding complexity bounded by a polynomial in $1/\epsilon$. The result was previously shown for the special case of binary-input channels (Guruswami-Xia '13, Hassani-Alishahi-Urbanke '13), and this work extends the result to channels over any alphabet.