## Create a perfect lossless compression algorithm

A forum for good logic/math puzzles.

Moderators: jestingrabbit, Moderators General, Prelates

HonoreDB
Posts: 161
Joined: Tue Feb 06, 2007 4:32 pm UTC
Contact:

### Create a perfect lossless compression algorithm

Alice and Bob have been contracted to transmit information over an expensive channel. Alice will be given a series of files, each consisting of exactly 1048576 random bits. Upon receiving each one, she can send whatever information she wants to Bob, at a cost of \$1 per bit. Bob must then recreate the file. If he gets it right, Alice and Bob receive \$1048576, and Alice will get another random file.

What algorithm can Alice and Bob use to maximize their expected net profit per file, with zero chance of getting a file wrong?

Yat
Posts: 136
Joined: Tue Feb 03, 2009 2:05 pm UTC

### Re: Create a perfect lossless compression algorithm

Does Alice have a way of telling Bob the transfer is over, or do we need to establish a protocl which allows Bob to know when he has read the last bit, and stop expecting new information from Alice ?

If Bob can know when the transfer is over wothout having to figuring it out with the data, then
Spoiler:
we can map all the 1048576 bits files to files with length inferior or equal to 1048576 bits. We have 2^n files of length n, (2^n)-1 files of length below n, so in our case only one specific file will require 1048576 bits. I don't really know how to generalize for any file length, but trying manually with a spreadsheet on manageable sizes seems to indicate that the number of bits gained with this method quickly converges to 2.

HonoreDB
Posts: 161
Joined: Tue Feb 06, 2007 4:32 pm UTC
Contact:

### Re: Create a perfect lossless compression algorithm

Yes, Bob knows when the transfer is over (otherwise I think this is impossible). And your spoilered analysis describes the sort of thing I'm looking for, and constitutes a proof that it's optimal...the rest of the challenge is coming up with an algorithm that actually does that. Obviously there are many, but I've come up with one very simple one.

Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

### Re: Create a perfect lossless compression algorithm

Well, there’s the trivial encoding…
Spoiler:
Start with the first non-zero bit of the file, and transfer all the bits from after it to the end of the file. This is like sending the value of the file as a number, but the first digit would always be a 1 so that gets ignored. The files “…0001” and “…0000” have to be sent as full-length strings (well, if Alice can send the empty message and have Bob receive it, then only “…0000” needs to become full-length.)

Also, if the connection is something like a direct fibreoptic line…
Spoiler:
Alice and Bob obtain identical clocks, and agree to a basic unit of time, like the millisecond. Alice sends exactly two bits to Bob, separated in time by a number of milliseconds equal to the numerical value of the file plus one.

Cost: \$2
Max time per file: 17.5 minutes
Profit: Over \$3,599,993 per hour
wee free kings

jaap
Posts: 2094
Joined: Fri Jul 06, 2007 7:06 am UTC
Contact:

### Re: Create a perfect lossless compression algorithm

Qaanol wrote:Also, if the connection is something like a direct fibreoptic line…
Spoiler:
Alice and Bob obtain identical clocks, and agree to a basic unit of time, like the millisecond. Alice sends exactly two bits to Bob, separated in time by a number of milliseconds equal to the numerical value of the file plus one.

Cost: \$2
Max time per file: 17.5 minutes
Profit: Over \$3,599,993 per hour

I think you miscalculated there.
Spoiler:
The numerical value of the file is some number with a value of up to 2^1048576, not merely 1048576, so the transmission may take up to 2^1048576 milliseconds. That is somewhat too long to wait for.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Create a perfect lossless compression algorithm

HonoreDB wrote:Yes, Bob knows when the transfer is over (otherwise I think this is impossible). And your spoilered analysis describes the sort of thing I'm looking for, and constitutes a proof that it's optimal...the rest of the challenge is coming up with an algorithm that actually does that. Obviously there are many, but I've come up with one very simple one.

Spoiler:
If your sequence ends in a 1, the rest of the file consists of 0s.
If your file ends in a 0, the rest of the file consists of 1s.

0 is 011111111...
1 is 100000000...
00 is 0011111111...
01 is 010000000...
10 is 101111111...
11 is 110000000...
as you can easily see, these do not overlap. So I'm assigning each of the 2n-1+2n-2+...+21 shorter-than-n length nonempty strings a unique string of length 2n.

Each string of bits shorter than 2^n corresponds to a distinct string of length 2n. There are 2n-2 such strings (yes, -2) that are non-empty.

Thus there are 2 strings left, and I assign the empty message to one, and encode the other in full. Can you see what the two remaining strings are? Try it with n=2.

It took me a while to convince myself that there really, really where only 2 remaining strings. The real trick is that 2n-1 catches a whole pile of strings (half of them!) quite obviously. Of the remaining strings, 2n-2 catches half of them as well, etc. all the way down. When you try to construct a counter example to the two string case, you'll note how the compression algorithm forces you to generate one of two strings...
Last edited by Yakk on Mon Dec 17, 2012 8:29 pm UTC, edited 1 time in total.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

Is this a disguised joke about imperial measurement (how many bits to the Bob)?

Also, if they are prepared to lose money on some files, they might be able to increase their expected value by taking a few gambles. And they could then lower that risk to 0 by just agreeing to transfer the whole file if it would be too long. I guess they'd need to know something about what the files are likely to look like, or whether they're completely random, etc... But I think they could use some sort of statistical approach to choose an algorithm that was great most of the time, and so better overall.

Another thing to note is that the information they are sending includes the length of the bitstring. So they could use this to encode some extra information.

Another thing they could do is to have a few different compression algorithms which work better with different types of file, and then use the first few bits to describe which algorithm they're using. They lose a few dollars, but the potential gain in being able to optimise the algorithm for the particular file seems like it would be worth it.
Last edited by dudiobugtron on Mon Dec 17, 2012 8:24 pm UTC, edited 1 time in total.

Qaanol
The Cheshirest Catamount
Posts: 3069
Joined: Sat May 09, 2009 11:55 pm UTC

### Re: Create a perfect lossless compression algorithm

jaap wrote:
Qaanol wrote:Also, if the connection is something like a direct fibreoptic line…
Spoiler:
Alice and Bob obtain identical clocks, and agree to a basic unit of time, like the millisecond. Alice sends exactly two bits to Bob, separated in time by a number of milliseconds equal to the numerical value of the file plus one.

Cost: \$2
Max time per file: 17.5 minutes
Profit: Over \$3,599,993 per hour

I think you miscalculated there.
Spoiler:
The numerical value of the file is some number with a value of up to 2^1048576, not merely 1048576, so the transmission may take up to 2^1048576 milliseconds. That is somewhat too long to wait for.

Right you are. However, with a minor twerk it still allows significant improvement:

Spoiler:
If Alice sends each bit within 0.002s of the previous, she can encode 2 bits per dollar (the bit transferred, plus a timing bit), and that’s just at millisecond accuracy. This cuts her total number of transmissions in half, for a minimum profit of \$524,288 per message, and an upper bound of 17.5 minutes per message.
wee free kings

HonoreDB
Posts: 161
Joined: Tue Feb 06, 2007 4:32 pm UTC
Contact:

### Re: Create a perfect lossless compression algorithm

Yakk's solution is the one I had in mind, and Qaanol's (first) one is just as good. The other one looks like a promising avenue for cheats, too...

jestingrabbit
Factoids are just Datas that haven't grown up yet
Posts: 5967
Joined: Tue Nov 28, 2006 9:50 pm UTC
Location: Sydney

### Re: Create a perfect lossless compression algorithm

dudiobugtron wrote:Is this a disguised joke about imperial measurement (how many bits to the Bob)?

Also, if they are prepared to lose money on some files, they might be able to increase their expected value by taking a few gambles. And they could then lower that risk to 0 by just agreeing to transfer the whole file if it would be too long. I guess they'd need to know something about what the files are likely to look like, or whether they're completely random, etc... But I think they could use some sort of statistical approach to choose an algorithm that was great most of the time, and so better overall.

Another thing to note is that the information they are sending includes the length of the bitstring. So they could use this to encode some extra information.

Another thing they could do is to have a few different compression algorithms which work better with different types of file, and then use the first few bits to describe which algorithm they're using. They lose a few dollars, but the potential gain in being able to optimise the algorithm for the particular file seems like it would be worth it.

But, no matter what they do, they have to have a pairing between all 2^(2^20) files and their possible messages. To maximise profit they need to use all the binary strings that are shorter than 2^20 in length, and only one or two of the length 2^20 strings (depending on whether the empty string counts as sendable). Whatever cunning you might want to deploy, there is nothing better than the procedures that are laid out by Yakk and Qaanol, though there are certainly a whole lot of strategies that are just as good (though I doubt that there are any simpler to explain).
ameretrifle wrote:Magic space feudalism is therefore a viable idea.

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

jestingrabbit wrote:But, no matter what they do, they have to have a pairing between all 2^(2^20) files and their possible messages. To maximise profit they need to use all the binary strings that are shorter than 2^20 in length, and only one or two of the length 2^20 strings (depending on whether the empty string counts as sendable). Whatever cunning you might want to deploy, there is nothing better than the procedures that are laid out by Yakk and Qaanol, though there are certainly a whole lot of strategies that are just as good (though I doubt that there are any simpler to explain).

This makes sense - and yes you are right, assuming any bitstring is equally likely (which we can assume given the 'random' in the question), then any compression algorithm that does this is as good as any other.

However, there is one extra piece of information they can transmit, and that is what was sent in the previous file. Is there any way they can use this to their advantage?

Lopsidation
Posts: 183
Joined: Tue Oct 27, 2009 11:29 pm UTC

### Re: Create a perfect lossless compression algorithm

dudiobugtron wrote:However, there is one extra piece of information they can transmit, and that is what was sent in the previous file. Is there any way they can use this to their advantage?

No, because
jestingrabbit wrote:no matter what they do, they have to have a pairing between all 2^(2^20) files and their possible messages

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

For ease of illustration, let's take the situation where the messages are only 2 bits long (instead of 1048576). There are four message they might want to send:
00, 01, 10 or 11
What you're saying is that they could encode two of these as 0 and 1, but that still leaves two which need a full 2 bits to describe them. Let's say 0 = 00, and 1 means 11, which means we have four potential ways to describe 01 and 10. So, we could say that 01 and 00 represented 01, and 10 and 11 represented 10. This means that when we send the message, if it's two bits long, we can encode and extra bit of information in our choice of which one we used to describe the message.

Now, I can't see how we could use this to our advantage, since we don't know anything about what the next message will be yet. But I thought it was worth asking anyway, in case someone else could think of a way of using it. I'm happy to accept that it's not possible to use that information if that turns out to be the case however, which I think is likely.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Create a perfect lossless compression algorithm

dudiobugtron wrote:For ease of illustration, let's take the situation where the messages are only 2 bits long (instead of 1048576). There are four message they might want to send:
00, 01, 10 or 11
What you're saying is that they could encode two of these as 0 and 1, but that still leaves two which need a full 2 bits to describe them. Let's say 0 = 00, and 1 means 11, which means we have four potential ways to describe 01 and 10. So, we could say that 01 and 00 represented 01, and 10 and 11 represented 10. This means that when we send the message, if it's two bits long, we can encode and extra bit of information in our choice of which one we used to describe the message.
Yes, by having 2 messages with 2 stops in them, you can send more data than by sending 1 message with 1 stop. The entire joke here is that we are using stops to encode extra data.

If, however, you don't know the message you want to send next time, you are out of luck -- you could send gigabytes of data prior to knowing a message, and if the message is uniform and random the gigabytes of data won't let you do any better than the above tricks.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

mward
Posts: 123
Joined: Wed Jun 22, 2011 12:48 pm UTC

### Re: Create a perfect lossless compression algorithm

dudiobugtron wrote:For ease of illustration, let's take the situation where the messages are only 2 bits long (instead of 1048576). There are four message they might want to send:
00, 01, 10 or 11
What you're saying is that they could encode two of these as 0 and 1, but that still leaves two which need a full 2 bits to describe them.

They can also send an empty file: which gives three possibilities. The encoding scheme deletes all the bits at the last 01 or 10 transition: the decoder then pads the file with the opposite of the last bit sent. Let's say that they assume that the "current bit" before sending any bits is a 1. Then an empty file tells the decoder that the file should be padded with zeros. We have:

empty file = 00
0 = 01
1 = 10
11 = 11

Note that if the person creating the sequence of "random" files knows, or can deduce, Alice and Bob's scheme then he/she can prevent them making any profit just by getting them to send files of all 1's. Since there are no transitions, no bits can be deleted from the end of the file.

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Create a perfect lossless compression algorithm

Spoiler:
So, assuming we can "end a message" for free, and can send empty messages, we actually have a trinary system, but one without an end-of-message indicator.

binary bits
3^k = 2^2^20
k = log_3(2^2^20)
k = 2^20 ln(2) / ln(3)
k =~ 661578 (after rounding up)
So we send 661578 trinary bits, for a profit of at least 386998\$ per message, and an average profit of 607524\$ because the stop messages aren't bits, and hence are free.

We can do better, reaching a limit of 2^20-1 dollars of profit per message, by simply sending a unary encoded string of empty messages, followed by a non-empty 1 bit message to indicate that the message has ended. But that takes much longer.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

For the purposes of the question, is it possible they will get the same message twice? If not, they could gradually increase their efficiency.

mfb
Posts: 950
Joined: Thu Jan 08, 2009 7:48 pm UTC

### Re: Create a perfect lossless compression algorithm

Based on Yakk's last post:

Spoiler:
If we can send "end of message" for free, it is possible to improve the bit/cost-ratio with more "end of message" inside:
Use 3 bits to decode a number n between 0 and 7, send n times "end of message", send the 4th bit afterwards (without an "end of message" here). Repeat until all 2^20 bits are sent.
Money gained: \$ 3/4*2^20
Sent signals: On average 9/8*2^20+1
Money/sent signal: \$ 2/3

Alternative: Encode 2 bits as "end of message"
Money gained: \$ 2/3 *2^20
Sent signals: On average 5/6 * 2^20
Money/sent signal: \$ 4/5

Hmm, even better. With more bits encoded as string of "end of message", you can get more money per message. But can we get more money per sent data?

With clocks, and within their lifetime (:D):
Use milliseconds without data transfer similar to "end of message" in the method above (but use 1 to 2^n instead of 0 to 2^n-1, of course). If the amount of messages they can get is somewhat limited, they might choose to encode more bits as empty milliseconds. With 9+1 bits per group, the groups will need an average of ~250ms, for a total message length of ~25000s or 7 hours, and give ~\$750000. Nice.

@dudiobugtron: They have to send MANY messages for that .

Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

### Re: Create a perfect lossless compression algorithm

Spoiler:
We can do better.

Suppose the fastest we can send a bit or an end of message is X Hz.

What we do is we decide at each 1/X seconds to either send an "end of message" or send nothing.

This lets us send a binary stream without actually paying for any data, generating X\$/second of cash flow.

Can we do better if we slip some 0s and 1s into it (that cost 1\$ each) in order to increase the data flow? Maybe! I mean, each 0 or 1 transmits a bit of data (is it a 0 or a 1?) for 1\$, so is revenue neutral in that sense. It does, however, take up a transmission slot (which would have profited us 1\$). But the fact we are sending a 0 or a 1 is some information.

Suppose we did a quad value transmission scheme. (0, 1, end of message, no data). This presumes we are allowed to send no data even in the middle of a message. Then each 1/X seconds we get to send 2 bits, and this costs us 0.5\$ on average. So we spend (X/2)\$ per second on this transmission, and earn 2X\$ per second, generating 1.5X\$/second in profit -- more income than the "don't send any data" model!

Really, I need to pull out calculus and model this to see where the optimal point is.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

I'm not sure a time-dependent solution would work, since they have to be 100% sure that they get the file right.

mfb
Posts: 950
Joined: Thu Jan 08, 2009 7:48 pm UTC

### Re: Create a perfect lossless compression algorithm

Yakk wrote:Really, I need to pull out calculus and model this to see where the optimal point is.

Did that.
Spoiler:
An ideal encoding can be described by the probabilities of e (end of message), n (no bit), 0 and 1. The number of bits (data) per bit (sent) is simply the entropy of the message. pe=pn and p0=p1 are necessary to get the optimal data transmission, and the relation between those can be chosen to maximize whatever we want.
Bits of data per bits sent is given by H=-2*p1 log2(p1) - 2*(1/2-p1) log2(1/2-p1). Money cost per bit sent is 2p1.

Money per time is H-2p1 multiplied with the data rate.
Plot at WolframAlpha

Setting its derivative to 0 gives log(1/2-x) = log(2x) or x=1/6.
In other words, 1/3 of the bits are 0 or 1 and 2/3 are end of message or no bit.
At this point, money per data rate is log(27)/log(8) or about 1.585.

If we cannot refuse to send bits, or cannot use timing information, the optimal ratio is at x=1/4 (so 50% are end of message) and \$1 per bit.

As we transmit 2^20 bits of data, effects from the finite size of the message are negligible.

By the way, what happened to the math tags? They do not get parsed (at least at my computer).

dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

### Re: Create a perfect lossless compression algorithm

mfb wrote:By the way, what happened to the math tags? They do not get parsed (at least at my computer).

They're no longer with us:
viewtopic.php?f=10&t=99137