No single algorithm can compress all possible files (worthamention.wordpress.com)
12 points by abdulhaq on Jan 9, 2010 | 14 comments


You can make that more concise too. An ideal compressor would be able to map every string of length n to a string of length m<n and back. Since a bijection requires that its source and target sets are of the same cardinality, and there are more strings of length m than length n, this compressor cannot be a bijection. Or even more concisely: you can't fit 10 pigeons into 9 pigeonholes unless one pigeon shares.
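
To make the pigeonhole argument concrete, here is a tiny brute-force check (an illustrative sketch, not part of the thread; it assumes a binary alphabet and toy sizes n=3, m=2): no mapping from the eight 3-bit strings into the four 2-bit strings can be injective, so none can be inverted.

    # Brute-force check of the pigeonhole argument: no "compressor" from
    # 3-bit strings to 2-bit strings can be injective (binary alphabet assumed).
    from itertools import product

    n, m = 3, 2
    longer = ["".join(bits) for bits in product("01", repeat=n)]   # 2^3 = 8 strings
    shorter = ["".join(bits) for bits in product("01", repeat=m)]  # 2^2 = 4 strings

    # Every conceivable compressor assigns one m-bit output to each n-bit input.
    # Check all 4^8 assignments: none maps the 8 inputs to 8 distinct outputs.
    injective_found = any(
        len(set(outputs)) == len(longer)
        for outputs in product(shorter, repeat=len(longer))
    )
    print(injective_found)  # False: some two inputs always share an output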


Isn't it the opposite? There are more strings of length n than strings of length m < n.

There are b^n strings of length n, but only (b^n - 1)/(b - 1) strings of length m < n, where b is the number of possible characters.
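
A quick numeric check of that count (illustrative only, for a few alphabet sizes b and lengths n):

    # Verify: sum of b**m for m in 0..n-1 equals (b**n - 1) // (b - 1),
    # and is always smaller than b**n, the number of strings of length exactly n.
    for b in (2, 16, 256):
        for n in (1, 4, 10):
            shorter = sum(b**m for m in range(n))   # strings of length m < n
            assert shorter == (b**n - 1) // (b - 1)
            assert shorter < b**n
    print("counts check out")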


Oops, indeed, it was meant the other way around. Past the edit window now though. Thanks for pointing it out.


Um, yeah. Pigeonhole principle. Not exactly news...

http://en.wikipedia.org/wiki/Pigeonhole_principle

Any programmer who does not know this near-instinctively (and also related things like the impossibility of having distinct hash values for all strings) should be ashamed of himself and brush up on basic math.


Additionally, a compression algorithm need only add a single extra bit in the worst case, to flag a file it couldn't compress.
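
One way to picture that scheme (a minimal sketch, not from the thread: bits are modelled as '0'/'1' character strings, and try_compress is a made-up placeholder for any real compressor):

    from typing import Optional

    def try_compress(bits: str) -> Optional[str]:
        """Placeholder compressor: only knows how to shorten all-zero inputs."""
        if bits and set(bits) == {"0"}:
            return format(len(bits), "b")    # e.g. "0000000000" -> "1010"
        return None                          # give up on everything else

    def store(bits: str) -> str:
        compressed = try_compress(bits)
        if compressed is not None and len(compressed) < len(bits):
            return "1" + compressed          # flag bit 1: payload is compressed
        return "0" + bits                    # flag bit 0: payload stored as-is

    print(store("0000000000"))  # '1' + '1010' -> 5 bits instead of 10
    print(store("1011001110"))  # '0' + original -> 11 bits, one bit of overhead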


This post reminded me about the day when I invented a perfect compression algorithm ( http://www.diovo.com/2010/01/the-perfect-compression-algorit... )


'Near-instinctive' knowledge of this, before being exposed to the basic idea, would make a good marker for high intelligence, and one which I failed at (and I'm no slouch). Ask a few colleagues about it, and prepare to be disappointed!


I did mean "near-instinctive" to apply after having learned about the concepts of surjective, injective and bijective functions. These are so basic and important that I believe you can't be a professional programmer without having been exposed to them, and having them stick at an instinctive level.


Technically, this proof is slightly flawed. Why?

Take the set of all files of 10 bits. There are 1024 possibilities.

You can compress these files to 9 bits. But you could also compress them to 8, 7, 6, 5, 4, 3, 2, 1, or even 0 bits. In total, that gives you: 1024 possibilities!

Of course, the issue is the initial assumption that the files all contain exactly 10 bits. If we had started with the problem of compressing files of between 0 and 10 bits to files of at most 9 bits, the proof would be trivial.


That only gives 1023 possibilities
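
A quick arithmetic check of that (illustrative): the files of 0 through 9 bits number 2^0 + 2^1 + ... + 2^9 = 1023, one fewer than the 1024 distinct 10-bit files, so the pigeonhole argument still goes through.

    shorter_files = sum(2**m for m in range(10))  # lengths 0..9
    print(shorter_files, 2**10)                   # 1023 1024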


Obviously, since there are files that no algorithm can compress.

http://en.wikipedia.org/wiki/Kolmogorov_complexity

http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ns.html


I did not understand how this proves that no single algorithm can compress all files. Why does he assume that the compression generates fewer files?


I was going to post a response, but brazzy beat me to it...

http://news.ycombinator.com/item?id=1041416


I knew the principle, but I read the article as if the 1024-element set and the 512-element set were specific instances of the sets, hence I was unable to derive the conclusion.



