No single algorithm can compress all possible files (worthamention.wordpress.com)
12 points by abdulhaq on Jan 9, 2010 | 14 comments


You can make that more concise too. An ideal compressor would be able to map every string of length n to a string of length m<n and back. Since a bijection requires that its source and target sets are of the same cardinality, and there are more strings of length m than length n, this compressor cannot be a bijection. Or even more concisely: you can't fit 10 pigeons into 9 pigeonholes unless one pigeon shares.
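
To make the pigeonhole argument concrete, here is a tiny brute-force check (an illustrative sketch, not part of the thread; it assumes a binary alphabet and toy sizes n=3, m=2): no mapping from the eight 3-bit strings into the four 2-bit strings can be injective, so none can be inverted.

    # Brute-force check of the pigeonhole argument: no "compressor" from
    # 3-bit strings to 2-bit strings can be injective (binary alphabet assumed).
    from itertools import product

    n, m = 3, 2
    longer = ["".join(bits) for bits in product("01", repeat=n)]   # 2^3 = 8 strings
    shorter = ["".join(bits) for bits in product("01", repeat=m)]  # 2^2 = 4 strings

    # Every conceivable compressor assigns one m-bit output to each n-bit input.
    # Check all 4^8 assignments: none maps the 8 inputs to 8 distinct outputs.
    injective_found = any(
        len(set(outputs)) == len(longer)
        for outputs in product(shorter, repeat=len(longer))
    )
    print(injective_found)  # False: some two inputs always share an output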


Isn't it the opposite? There are more strings of length n than strings of length m < n.

There are b^n strings of length n, but only (b^n - 1)/(b - 1) strings of length m < n, where b is the number of possible characters.
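
A quick numeric check of that count (illustrative only, for a few alphabet sizes b and lengths n):

    # Verify: sum of b**m for m in 0..n-1 equals (b**n - 1) // (b - 1),
    # and is always smaller than b**n, the number of strings of length exactly n.
    for b in (2, 16, 256):
        for n in (1, 4, 10):
            shorter = sum(b**m for m in range(n))   # strings of length m < n
            assert shorter == (b**n - 1) // (b - 1)
            assert shorter < b**n
    print("counts check out")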


Oops, indeed, it was meant the other way around. Past the edit window now though. Thanks for pointing it out.


Um, yeah. Pigeonhole principle. Not exactly news...

http://en.wikipedia.org/wiki/Pigeonhole_principle

Any programmer who does not know this near-instinctively (and also related things like the impossibility of having distinct hash values for all strings) should be ashamed of himself and brush up on basic math.


Additionally, a compression algorithm need only add a single extra bit in the worst case, to flag a file it couldn't compress.
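
One way to picture that scheme (a minimal sketch, not from the thread: bits are modelled as '0'/'1' character strings, and try_compress is a made-up placeholder for any real compressor):

    from typing import Optional

    def try_compress(bits: str) -> Optional[str]:
        """Placeholder compressor: only knows how to shorten all-zero inputs."""
        if bits and set(bits) == {"0"}:
            return format(len(bits), "b")    # e.g. "0000000000" -> "1010"
        return None                          # give up on everything else

    def store(bits: str) -> str:
        compressed = try_compress(bits)
        if compressed is not None and len(compressed) < len(bits):
            return "1" + compressed          # flag bit 1: payload is compressed
        return "0" + bits                    # flag bit 0: payload stored as-is

    print(store("0000000000"))  # '1' + '1010' -> 5 bits instead of 10
    print(store("1011001110"))  # '0' + original -> 11 bits, one bit of overhead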


This post reminded me about the day when I invented a perfect compression algorithm ( http://www.diovo.com/2010/01/the-perfect-compression-algorit... )


'Near-instinctive' knowledge of this, before being exposed to the basic idea, would make a good marker for high intelligence, and one which I failed at (and I'm no slouch). Ask a few colleagues about it, and prepare to be disappointed!


I did mean "near-instinctive" to apply after having learned about the concepts of surjective, injective and bijective functions. These are so basic and important that I believe you can't be a professional programmer without having been exposed to them, and having them stick at an instinctive level.


Technically, this proof is slightly flawed. Why?

Take the set of all files of 10 bits. There are 1024 possibilities.

You can compress these files to 9 bits. But you could also compress them to 8, 7, 6, 5, 4, 3, 2, 1, or even 0 bits. In total, that gives you: 1024 possibilities!

Of course, the issue is the initial assumption that the files all contain exactly 10 bits. If we had started with the problem of compressing files of between 0 and 10 bits to files of at most 9 bits, the proof would be trivial.


That only gives 1023 possibilities
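
A quick arithmetic check of that (illustrative): the files of 0 through 9 bits number 2^0 + 2^1 + ... + 2^9 = 1023, one fewer than the 1024 distinct 10-bit files, so the pigeonhole argument still goes through.

    shorter_files = sum(2**m for m in range(10))  # lengths 0..9
    print(shorter_files, 2**10)                   # 1023 1024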


Obviously, since there are files that no algorithm can compress.

http://en.wikipedia.org/wiki/Kolmogorov_complexity

http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ns.html


I did not understand how this proves that no single algorithm can compress all files. Why does he assume that the compression generates fewer files?


I was going to post a response, but brazzy beat me to it...

http://news.ycombinator.com/item?id=1041416


I knew the principle, but I read the article as if the 1024-element set and the 512-element set were specific instances of the sets, hence I was unable to derive the conclusion.



