Brian-M wrote: Unfortunately, this process has two flaws...
Firstly, you'll end up with lots of false positives, so not only will you end up with the information you started with, you'll also end up with lots of information that never existed before, and you may not be able to tell which information is the information that was originally encoded.
Yes. Among the "lots" of information that you can extract from your algorithm (that never existed before) is a nearly-identical copy of your original information which has a subtle flaw or bug that gives exactly the wrong answer.
There are an infinite number of programs whose source code has the same MD5 (and SHA-1 and whatever additional hash you want to use) as the GNU Hello World example, but that print "Goodbye World" instead. But on the bright side, if you include the original file size in your compressed data, almost all of the files that match the hashes won't compile. But one might.
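For anyone who wants to poke at the counting argument, here is a rough sketch, not anything from the original scheme. It uses Python's standard hashlib, and truncates MD5 to 16 bits purely so that a collision turns up in a fraction of a second; the function names, the "hello world" candidate strings, and the 16-bit truncation are all my own assumptions for illustration.

```python
import hashlib
from fractions import Fraction
from itertools import count


def truncated_md5(data, bits=16):
    """Top `bits` bits of the MD5 digest as an integer (toy stand-in for the full hash)."""
    digest = int.from_bytes(hashlib.md5(data).digest(), "big")
    return digest >> (128 - bits)


def find_collision(bits=16):
    """Return two different inputs whose truncated hashes match.

    The pigeonhole principle guarantees this stops after at most
    2**bits + 1 candidates; in practice it stops much sooner.
    """
    seen = {}
    for i in count():
        candidate = f"hello world {i}".encode()  # hypothetical candidate "files"
        h = truncated_md5(candidate, bits)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate


def average_files_per_digest(size_bytes, digest_bits=128):
    """Number of distinct files of exactly this size, divided by the number of possible digests."""
    return Fraction(2 ** (8 * size_bytes), 2 ** digest_bits)


if __name__ == "__main__":
    a, b = find_collision()
    print(a, b, truncated_md5(a) == truncated_md5(b))
    print(average_files_per_digest(17))
```

The last function is the file-size point in numbers: once the file is longer than the digest (16 bytes for MD5), the average number of same-size files per digest is 2^(8n - 128), which is already 256 for a 17-byte file and grows by a factor of 256 for every extra byte.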
Or for this comic example, an infinite number of proofs that can be extracted from the margin have a subtle flaw that makes them invalid. I have a fascinating proof for this, but this post is too small to... oh, wait. Here, I've compressed it for you: -=> . <=-