What if there was a forum for discussing these?

Messysaurus wrote:Why wouldn't the number of string possibilities be 140^27 instead of 27^140 (first sentence)? My thought process: if you were to look at numbers 1-1000, there are 10 characters and with a length of 3 characters; 10^3 = 1000 possibilities, not 3^10 = ~57k. I apologize in advance if I'm missing something obvious.

You're right that with a length of 3 characters and 10 possible digits for each of those characters, there are 10^3 = 1000 possibilities. You're just getting the application wrong when it comes to the Twitter situation. For the 3-character, 10-digit case, we have length = 3 and possible characters = 10, and the number of possible strings is calculated by

possible strings = (possible characters)^(length) = 10^3 = 1000

For the Twitter case, we have length = 140 and possible characters = 27, so indeed

possible strings = (possible characters)^(length) = 27^140

and not 140^27.
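To make the difference concrete, here is a small sketch in JavaScript (chosen only because the bookmarklet later in the thread uses it; the helper names are made up for illustration). It computes both orders exactly with BigInt and counts the digits: 27^140 has 201 digits, while 140^27 has only 58, so swapping base and exponent is not a small mistake.

```javascript
// Number of possible strings = (possible characters) ^ (length).
// BigInt keeps these huge values exact instead of overflowing to Infinity.
function possibleStrings(alphabetSize, length) {
  return BigInt(alphabetSize) ** BigInt(length);
}

// How many decimal digits a non-negative BigInt has.
function digitCount(n) {
  return n.toString().length;
}

console.log(possibleStrings(10, 3).toString());    // 10^3: three digits, 10 choices each
console.log(possibleStrings(3, 10).toString());    // 3^10: the swapped order
console.log(digitCount(possibleStrings(27, 140))); // 27^140: the Twitter case
console.log(digitCount(possibleStrings(140, 27))); // 140^27: the swapped order
```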

?ecnetnes siht ni retcarahc tsal eht sseug ot drah ti sI

(I tried a few compressors on Huckleberry Finn from Project Gutenberg, but they all seemed to fail miserably; 25% was the best result. None of them had the option "treat this file as plain English", though. I think they should come up with the idea themselves.)
This what-if reminded me of this:

It's an analysis of how many possible songs there can be.

MythSearcher wrote:This reminds me of 2 things:
1) The Freefall comic (can't find the right one, but it is about tweeting all possible combinations of letters and dictionary words and storage capacity limits)
2) There are leap years. =_,= (super evil grin)

That conversation between Florence and Dvorak starts here:
http://freefall.purrsia.com/ff2100/fc02056.htm

DR6 wrote:Even then, assuming that only the first 100 languages (or 10, for that matter) are relevant already makes a difference. (You end up with 10^53 and 10^52 respectively.)

Actually, what's really relevant in your consideration is the increased entropy, i.e. you assume 1.2 bits per character instead of 1.1. A ~10% change in entropy makes a ~10% change in the binary exponent (or "effective" text length) rather than in the result, so you easily change the result by orders of magnitude.
Now even in the what-if text, the uncertainty range for normal English entropy is admitted to be on the order of 20% (1.0 to 1.2 bits per character, hence the 1.1 "mean" value), and I assume this number might change over time or with the group of people (or even the method) chosen to determine the entropy in the first place.
The total uncertainty of this calculation is therefore already many orders of magnitude, so the effect of considering other languages most likely lies within that uncertainty...
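The sensitivity described above can be checked directly. This is a minimal sketch that assumes, as the what-if does, that the effective number of 140-character messages is 2^(bits per character × 140). `log10EffectiveMessages` is a hypothetical helper name. It works in log10 so the numbers stay readable.

```javascript
// Effective number of messages of length L at h bits per character: 2^(h * L).
// Work in log10 so the numbers stay printable.
const TWEET_LENGTH = 140;

function log10EffectiveMessages(bitsPerChar, length = TWEET_LENGTH) {
  // log10(2^(h * L)) = h * L * log10(2)
  return bitsPerChar * length * Math.log10(2);
}

for (const h of [1.0, 1.1, 1.2]) {
  const exp = log10EffectiveMessages(h).toFixed(1);
  console.log(`${h.toFixed(1)} bits/char -> about 10^${exp} messages`);
}

// A 0.1 bit/char change shifts the binary exponent by 0.1 * 140 = 14 bits.
const factor = 2 ** (0.1 * TWEET_LENGTH);
console.log(`factor per 0.1 bit/char: ${factor.toFixed(0)} (10^${Math.log10(factor).toFixed(2)})`);
```

Each 0.1 bit/char moves the exponent by 14 bits, a factor of 16384. The 1.0 to 1.2 range alone therefore spans about 8.4 orders of magnitude.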

Hi guys! This is my first post to the forums. I was inspired by the new what-if to create a bookmarklet that changes everything on Twitter to either "there's a horse in aisle five" or "my house is full of traps"

`javascript:(function(){var elements=document.getElementsByTagName("p");for(var i=0; elements[i]; i++){var e=elements[i];if(e.className == "js-tweet-text"){e.innerHTML=((Math.random()<0.5) ? "My house is full of traps." : "There's a horse in aisle five.");}}}());`

orthogon wrote:I have to admit to not getting Twitter. I mean, I get the idea, but I hate the way that the character limit makes writers whom I admire for their lucid, erudite, clear and insightful prose in other media produce awkward, garbled and ugly tweets.

The thing that annoys me about Twitter is that once you start tweeting, everybody expects you to keep tweeting pretty much 24/7, so if you go offline for a few hours people start asking where you went and act annoyed that you didn't keep posting updates. It's like they assume that they have a God-given right to know your status at all times--sort of like those people who call your cell phone and act annoyed that you can't suddenly drop everything any time of the day or night to talk to them, just because it's a cell phone and not a landline.

Very similar to the Van Loon quote on this What If is an excellent song called Randy Described Eternity by Built To Spill, which starts with these lyrics:

Every thousand years
This metal sphere
Ten times the size of Jupiter
Floats just a few yards past the earth
And take a swipe at it
With a single feather
Hit it once every thousand years
'til you've worn it down
To the size of a pea
Yeah I'd say that's a long time
But it's only half a blink
In the place you're gonna be

Tried linking to the song and the lyrics in full, but got denied as "spam". Hmm. Please go give the song a spin on YouTube.

YellowYeti wrote:What proportion have to spell 'lose' as 'loose' before it becomes the correct spelling?
I don't know, but I'm tipping "weary" to become the correct spelling for "wary" first.

cantab314 wrote:
YellowYeti wrote:What proportion have to spell 'lose' as 'loose' before it becomes the correct spelling?
I don't know, but I'm tipping "weary" to become the correct spelling for "wary" first.

If you'll notice, "Defiantly" has already overtaken "Definitely"
For comparison, that means that if the cabbage guy from Avatar: The Last Airbender filled up his cart with lettuce instead, it would be about a quarter of a lethal dose.

There's a horse in aisle five. Good luck reassembling it from all the frozen beefburgers and lasagne though.
If you had 7 billion people speaking all possible tweets, it would take only 6.523 eternal minutes: (10^47 seconds / 7 billion / 10^32 years) eternal days, times 1440 minutes per day.
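The arithmetic in that post can be reproduced in a few lines. This is a minimal sketch that assumes 365-day years and treats 10^32 years as one "day of eternity", which is how the post's formula uses it. The constant and function names are illustrative.

```javascript
// Figures taken from the post: about 10^47 seconds to say every possible tweet,
// 7 billion people working in parallel, and 10^32 years per "day of eternity".
const SECONDS_FOR_ALL_TWEETS = 1e47;
const PEOPLE = 7e9;
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // assumption: 365-day years
const ETERNAL_DAY_SECONDS = 1e32 * SECONDS_PER_YEAR;
const MINUTES_PER_DAY = 24 * 60;

function eternalMinutes(people = PEOPLE) {
  const seconds = SECONDS_FOR_ALL_TWEETS / people;   // time with everyone talking at once
  const eternalDays = seconds / ETERNAL_DAY_SECONDS; // that time as a fraction of one eternal day
  return eternalDays * MINUTES_PER_DAY;              // the same fraction expressed in eternal minutes
}

console.log(eternalMinutes().toFixed(3));
```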

silverkitty wrote:"To a normal English speaker, “Hi, I’m Mxyztplk” is basically indistinguishable from “Hi, I’m Mxzkqklt” "
...how many English speakers have to recognize something before it becomes "normal"?

In the 1970s Americans learned to recognize and say Zbigniew Brzezinski. (Hah. Spell check wants to change Zbigniew to ignitible.)

I wz srprzd no1 pntd out...

Bah, I can't do it anymore.

I was surprised that no one has pointed out...
that Twitter posts are often pre-compressed, meaning their information density is higher than 1-1.2 bits per letter.
But I suppose he was talking about 'proper English' sentences.

Randall wrote:Hi, I’m Mxyztplk

Dammit! Now I'll need to change my password everywhere.

We'll never run out of things to say, but that doesn't stop us from repeating/retweeting ourselves continually.

There's a horse in aisle five. Hypothetically speaking, Summer Glau would be more likely to tweet that her house is full of traps than inform the world that there's a horse in aisle five. I guess. In any case, that possibility seems more awesome.
FarAlSamShaidar wrote:I wrote this all out expecting the difference to be larger than that, though now that I think about it I can see why it's not a HUGE difference. Still, off by more than 2x, it's bigger than rounding errors.

It's off by a factor of exactly* 1.87, but you've overlooked the biggest rounding error. The uncertainty in the information density of English text is +/-0.1 bits per character, which the exponentiation over the length of a Twitter post magnifies to +/- four orders of magnitude**.

*Well, 1.8745 - 1/(2^153.1), which is close enough.
**A factor of 16384, to be precise.

The thing that gets me is wrestling with the sense of scale. A bird wearing away a mountain one speck at a time is clearly going to take a long time. But this is no ordinary mountain.

For example, the top 37.5% of that mountain is in outer space. That's 11 Mt. Everests in the atmosphere and 7 Mt. Everests in space on top of that. And that's measuring Everest from sea level to the summit; the count almost doubles if you use Everest's base-to-summit height instead.
Granted, it is a fairly spiky mountain (its slope is a bit steeper than 60 degrees), so that should cut down on its volume, but still...

And then there's the fact that it only gets worn away a speck every 1,000 years. Even over 10,000 years (roughly twice as long as writing has existed), the bird would have worn away only 10 mm^3 of mountain: 1/100 of a mL. That's a fingernail clipping.
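For anyone who wants to check these figures, here is a rough sketch. It relies on several assumptions that the post does not state. The rock from the Van Loon quote is taken to be 100 miles high, space is taken to begin at the 100 km Kármán line, Everest is put at 8.849 km above sea level, and each visit is assumed to remove a 1 mm^3 speck. `mountainScale` is a hypothetical helper.

```javascript
// Assumptions (not from the post): 100-mile-high rock, space starts at the
// 100 km Karman line, Everest is 8.849 km above sea level, 1 mm^3 per visit.
const KM_PER_MILE = 1.609344;
const ROCK_KM = 100 * KM_PER_MILE;
const KARMAN_KM = 100;
const EVEREST_KM = 8.849;

function mountainScale(years, speckMm3 = 1, yearsPerVisit = 1000) {
  const spaceKm = ROCK_KM - KARMAN_KM;                // part of the rock above the Karman line
  const wornMm3 = (years / yearsPerVisit) * speckMm3; // one speck per visit
  return {
    everestsInAtmosphere: KARMAN_KM / EVEREST_KM,
    everestsInSpace: spaceKm / EVEREST_KM,
    fractionInSpace: spaceKm / ROCK_KM,
    wornMl: wornMm3 / 1000, // 1 mL = 1 cm^3 = 1000 mm^3
  };
}

console.log(mountainScale(10000));
```

Under those assumptions, about 11.3 Everests fit below the Kármán line and about 6.9 above it. Roughly 38% of the height is in space, and 0.01 mL is worn away in 10,000 years.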

eculc wrote:
cantab314 wrote:
YellowYeti wrote:What proportion have to spell 'lose' as 'loose' before it becomes the correct spelling?
I don't know, but I'm tipping "weary" to become the correct spelling for "wary" first.

If you'll notice, "Defiantly" has already overtaken "Definitely"

Oh, Dear God. The poor spellers have won.
Oh. I am so sorry. They didn't intend to.

I know this; Because, I am one of 'them'.
Spelling is hard. Good spellers deserve our respect.
What they want is for us to spell well.

I do what I can. Spelling well is not something I can do.
Zen:
If a word in the dictionary were misspelled; How would we know?

It is not funny to some people. I knew a woman that could spell.
When she found out I could not spell, it stressed our relationship.

No color is as off putting as ignorance.
She saw it as willful ignorance.
It's not.

Poor Spellers Untie was a great slogan when we had no chance of winning.
It's not so funny, now.
JesterBLUE wrote:The thing that gets me is wrestling with the sense of scale. A bird wearing away a mountain one speck at a time is clearly going to take a long time. But this is no ordinary mountain.

For example, the top 37.5% of that mountain is in outer space. That's 11 Mt. Everests in the atmosphere and 7 Mt. Everests in space on top of that. And that's measuring Everest from sea level to the summit; the count almost doubles if you use Everest's base-to-summit height instead.
Granted, it is a fairly spiky mountain (its slope is a bit steeper than 60 degrees), so that should cut down on its volume, but still...

And then there's the fact that it only gets worn away a speck every 1,000 years. Even over 10,000 years (roughly twice as long as writing has existed), the bird would have worn away only 10 mm^3 of mountain: 1/100 of a mL. That's a fingernail clipping.

That's exactly the point. To boggle the mind.

The quote that came to my mind when I read this was that one from the Quran:
Were every tree on earth a pen
And were the ocean filled with ink
With seven oceans more
Even so the words of God
Would not be exhausted

(Luqman 31:27)

(which, if you interpret it to mean "words about God," is a rather humorous description of the rate at which theology is published).

Regarding the question of using multiple languages, the amount of overlap would vary significantly depending on how we define the difference between a language and a dialect. For example, many linguists consider English and Scots to be two different languages because of their history, despite the fact that they're largely mutually intelligible and a number of sentences could be written identically in both. Both descend from Old English, so their overlap comes from a shared ancestor. And many other languages are in similar situations.
Another thing to consider is that the Chinese characters used to write a sentence in, say, Mandarin can often be read as a meaningful sentence in other Chinese languages like Cantonese (albeit pronounced differently), and may even be meaningful in Japanese (though probably not as a complete sentence). Thus, per language, the number of distinct possible sentences written in Chinese script is significantly lower than for languages written in Roman script.

tibfulv wrote:Hm. I never knew that the land where that mountain was supposed to be was Svithjod. If I remember correctly, that's the Norse name for Sweden, or, as Svithjod hin mikla (Great Svithjod), for Russia. Based on the Karakorum?

huangho's bookmarklet is working beautifully, too.

I don't know which particular Nordic language spells it that way, but in Icelandic, Sweden is "Svíþjóð". Over time most Nordic languages dropped þ and ð in favor of th and d (the latter being an especially bad shift, since ð and d do not sound similar). Þjóð means "nation", hence the "Sví Nation".

computronium wrote:Very similar to the Van Loon quote on this What If is an excellent song called Randy Described Eternity by Built To Spill, which starts with these lyrics:

Every thousand years
This metal sphere
Ten times the size of Jupiter
Floats just a few yards past the earth
And take a swipe at it
With a single feather
Hit it once every thousand years
'til you've worn it down
To the size of a pea
Yeah I'd say that's a long time
But it's only half a blink
In the place you're gonna be

Tried linking to the song and the lyrics in full, but got denied as "spam". Hmm. Please go give the song a spin on YouTube.

Apparently both of us registered to talk about eternity, not Twitter. What the Van Loon quote reminded me of was an even older reference to eternity, from a hellfire-and-brimstone sermon in A Portrait of the Artist as a Young Man. According to the Google Books link, the Van Loon book was published in 1921, and Portrait of the Artist was published in 1916 (and serialized before that). I had to read it in AP Lang, you see, and the eternity sermon is basically the one thing that stayed with me after the rest of the book disappeared.

I'm paraphrasing, but eternity is described like this: imagine a mountain of sand a million miles high. Now imagine a bird flies to the mountain every million years and removes one grain of sand. Then, when the mountain is gone, the bird once again flies to where the mountain was and rebuilds it, one grain at a time, every million years. When the mountain has disappeared and reappeared once, that is not even an instant in the span of eternity.

I would imagine that the idea of such a mountain to illustrate eternity predates Joyce as well. But the Van Loon quote works better for illustrating the Twitter question.

eculc wrote:
cantab314 wrote:
YellowYeti wrote:What proportion have to spell 'lose' as 'loose' before it becomes the correct spelling?
I don't know, but I'm tipping "weary" to become the correct spelling for "wary" first.

If you'll notice, "Defiantly" has already overtaken "Definitely"

Oh, Dear God. The poor spellers have won.
Oh. I am so sorry. They didn't intend to.

I know this; Because, I am one of 'them'.
Spelling is hard. Good spellers deserve our respect.
What they want is for us to spell well.

I do what I can. Spelling well is not something I can do.
Zen:
If a word in the dictionary were misspelled; How would we know?

It is not funny to some people. I knew a woman that could spell.
When she found out I could not spell, it stressed our relationship.

No color is as off putting as ignorance.
She saw it as willful ignorance.
It's not.

Poor Spellers Untie was a great slogan when we had no chance of winning.
It's not so funny, now.
This may be one of the best things I've ever read on the internet ever.

DR6 wrote:Ah, but we are talking about possible tweets, without accounting how probable they are.

Interestingly, because he bases his calculations on entropy, his answer really counts probable English sentences! As soon as sentences do not all have the same probability, we get a value that is too low. To take an extreme example, if we had a language with 1048578 valid 140-character sentences, but where people used just two of them most of the time (like 1048575 times in 1048576), we would get about 2.0000535 sentences, which we'd round to two, instead of over a million. Now I wonder to what extent this affects the value for English sentences...
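The "effective" count in that example is 2 raised to the Shannon entropy in bits, also known as the perplexity. The exact decimal depends on how the leftover probability is spread. This sketch assumes the two common sentences split probability 1 - 2^-20 evenly, while the remaining 2^20 sentences share the other 2^-20 evenly. `entropyBits` is a hypothetical helper. Under those assumptions the result is about 2.00005, which still rounds to two.

```javascript
// Shannon entropy in bits for groups of equally likely outcomes.
// groups: [{ count, prob }] where each of `count` outcomes has probability `prob`.
function entropyBits(groups) {
  let h = 0;
  for (const { count, prob } of groups) {
    if (prob > 0) h -= count * prob * Math.log2(prob);
  }
  return h;
}

// Assumed model of the example: 2 common sentences + 2^20 rare ones.
const RARE = 2 ** 20;
const common = 1 - 1 / RARE; // total probability of the two common sentences
const groups = [
  { count: 2, prob: common / 2 },
  { count: RARE, prob: (1 - common) / RARE },
];

const h = entropyBits(groups);
// Total sentences, entropy in bits, and the "effective" count 2^H.
console.log(RARE + 2, h, 2 ** h);
```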
If a word in the dictionary were misspelled; How would we know?

Fortunately, there is no "THE dictionary", there are only "dictionaries". Thus we can apply the Byzantine Generals Algorithm to the various dictionaries to determine which is the "correct" spelling. And if it's misspelled in all the dictionaries? Well, then it's not really misspelled, as they are the definitive source of correct spelling. There is no nebulous "correct spelling" in the ether, only that which we define for ourselves through common acceptance.

FarAlSamShaidar wrote:There's a rather large problem with Randall's math. Thought I'd NEVER say that. But this is not like binary, where leading (or trailing, depending on endianness) 0s don't affect the result. In other words, messages of 139 characters in length are wholly different from those of 140 characters. Even more so for those of 100 characters in length. Etc. If we assume that Mr. H. wants any string of characters that is English, rather than complete and logical sentences (which is, more or less, the assumption Randall makes), then even one-letter messages such as "I" and "a" are valid. Using 1.1 bits per letter, then, the proper answer is 2^(140*1.1) + 2^(139*1.1) + 2^(138*1.1) + ... + 2^(2*1.1) + 2^(1*1.1); or in other words (sorry, I don't really know LaTeX or if it can be used in forums) SUM(2^(n*1.1), n = 1..140). That gives an answer of approximately 4.28*10^46.

I wrote this all out expecting the difference to be larger than that, though now that I think about it I can see why it's not a HUGE difference. Still, off by more than 2x, it's bigger than rounding errors.
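The sum in that post is a geometric series with ratio 2^1.1, so it can be checked numerically against the single-length figure 2^(140*1.1). This is a minimal sketch, and its helper names are hypothetical. By this arithmetic the sum comes to about 4.28*10^46, roughly 1.87 times 2^154 ≈ 2.28*10^46. That ratio is close to r/(r - 1) for r = 2^1.1, as expected for a geometric series dominated by its largest term.

```javascript
// Compare the single-length estimate 2^(140 * 1.1) with the sum over all
// lengths 1..140, i.e. SUM(2^(n * 1.1), n = 1..140).
const BITS_PER_CHAR = 1.1;
const MAX_LEN = 140;

function messagesOfLength(n) {
  return 2 ** (n * BITS_PER_CHAR);
}

function messagesUpToLength(maxLen) {
  let total = 0;
  for (let n = 1; n <= maxLen; n++) total += messagesOfLength(n);
  return total;
}

const single = messagesOfLength(MAX_LEN);
const total = messagesUpToLength(MAX_LEN);
// Geometric series with ratio r = 2^1.1: total / single approaches r / (r - 1).
const r = 2 ** BITS_PER_CHAR;

console.log(single.toExponential(2), total.toExponential(2));
console.log((total / single).toFixed(3), (r / (r - 1)).toFixed(3));
```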

Thank you, I wanted to say exactly that.
Oktalist wrote:"If a million monkeys were given a million typewriters, eventually one of them might produce the complete works of Shakespeare, but to reach it would it be worth wading through four hundred copies of 'Money' by Martin Amis?"
- Simon Munnery

I'm just a dumb American, and I don't know who Simon Munnery is, but I do know that Martin Amis is a freakin' genius. So I'd have to answer "yes."

Does anyone know where I can find the small community that repeated the same six posts over and over in the same order for ten years?
snowyowl wrote:Does anyone know where I can find the small community that repeated the same six posts over and over in the same order for ten years?

Why would a group of people do such a stupid thing?
snowyowl wrote:Does anyone know where I can find the small community that repeated the same six posts over and over in the same order for ten years?

Why would a group of people do such a stupid thing?

Because they can. That's the only reason required.

### What-if 34: Twitter, ebook compression

So I've tried compressing some .txt ebooks, and I have yet to reach the claimed compression to 1/8th of the original size. I've found bzip2 seems to compress text the best by a fair margin (vs. zip, gz, 7z), and it still only achieves about 1/4th. Am I doing it wrong? Or is it just not an ideal enough case to reach 1/8th?

I used "The History of Pottery Part 1 by H. B. Walters" on Project Gutenberg since it's fairly large (over 1MB) and fairly recent (2013) so I figured there wouldn't be any weird change in character frequency from being written a long time ago.

1. I converted it from UTF-8 to ASCII using `iconv`
2. I compressed it using `bzip2 --best --keep`
3. I divided the file size of the compressed file by that of the uncompressed one and got roughly .26, not the .13ish I was hoping for

eculc wrote:
cantab314 wrote:
YellowYeti wrote:What proportion have to spell 'lose' as 'loose' before it becomes the correct spelling?
I don't know, but I'm tipping "weary" to become the correct spelling for "wary" first.

If you'll notice, "Defiantly" has already overtaken "Definitely"

Now I'm wondering if there are people counting uses of each spelling variant. I'm also wondering if spell checking software is causing language shifts to move towards increasing numbers of homonyms.


mgold wrote:So I've tried compressing some .txt ebooks, and I have yet to reach the claimed compression to 1/8th of the original size. I've found bzip2 seems to compress text the best by a fair margin (vs. zip, gz, 7z), and it still only achieves about 1/4th. Am I doing it wrong? Or is it just not an ideal enough case to reach 1/8th?

The mentioned compressors work without any prior knowledge of English (they only learn from the file they are compressing), which is part of the point of them but makes their predictions much weaker than a fluent reader's.
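The surrounding context is worth a lot, and a quick calculation shows how much. A coder that treats each character as an independent draw from the text's letter frequencies cannot do better than the zero-order entropy of those frequencies. This is a minimal sketch of that calculation, and `zeroOrderEntropy` is a hypothetical name. For ordinary English prose the result typically lands somewhere around 4 to 5 bits per character, far above the 1.0 to 1.2 bits the what-if quotes.

```javascript
// Zero-order (context-free) entropy: each character is treated as independent,
// using only how often it occurs. H0 = -sum(p * log2(p)) bits per character.
function zeroOrderEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total++;
  }
  let h = 0;
  for (const c of counts.values()) {
    const p = c / total;
    h -= p * Math.log2(p);
  }
  return h;
}

const sample = "it is hard to guess the last character in this sentence";
console.log(zeroOrderEntropy(sample).toFixed(2), "bits/char");
```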
I suspect there's a small error in this What-If. The article states,
This means that a good compression algorithm should be able to compress ASCII English text—which is eight bits per letter—to about 1/8th of its original size.

However, ASCII is and has always been a 7-bit character set. (Various microcomputer manufacturers and operating system developers have devised their own 8-bit character sets whose lower 7 bits are fully or mostly identical to ASCII, but the sets as a whole were never properly referred to as ASCII, even by their creators.) The compression ratio given in the article is probably a bit off then.
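The size of that correction is easy to quantify. This is a minimal sketch that assumes the what-if's 1.1 bits per character. The best-case ratio is 1.1 divided by the number of bits each character originally occupies, which gives about 1/7.3 for 8-bit storage and about 1/6.4 for 7-bit characters. `targetRatio` is a hypothetical helper.

```javascript
// Assumed information content of English text (the what-if's 1.1 bits per character).
const ENGLISH_BITS_PER_CHAR = 1.1;

// Best-case compressed size / original size when each character originally
// occupies `storedBitsPerChar` bits.
function targetRatio(storedBitsPerChar) {
  return ENGLISH_BITS_PER_CHAR / storedBitsPerChar;
}

for (const bits of [8, 7]) {
  const ratio = targetRatio(bits);
  console.log(`${bits}-bit characters: ${ratio.toFixed(4)} (about 1/${(1 / ratio).toFixed(1)})`);
}
```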

