1286: "Encryptic"


gmalivuk
GNU Terry Pratchett
Posts: 26836
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There

Re: 1286: "Encryptic"

Postby gmalivuk » Tue Nov 05, 2013 12:17 am UTC

MadH wrote:
PinkShinyRose wrote:
Title Text wrote:It was bound to happen eventually. This data theft will enable almost limitless [xkcd.com/792]-style password reuse attacks in the coming weeks. There's only one group that comes out of this looking smart: Everyone who pirated Photoshop.

Wouldn't that be most amateur photoshop users?
I can't figure out if you're saying most of the people who pirate Photoshop are amateurs (true) or that only amateurs pirate Photoshop (false).
I think both of those are ways to interpret "mostly amateur photoshop users", but "most amateur photoshop users" means >50% of amateur photoshop users. It doesn't say anything about the fraction of Photoshop piraters.
Unless stated otherwise, I do not care whether a statement, by itself, constitutes a persuasive political argument. I care whether it's true.
---
If this post has math that doesn't work for you, use TeX the World for Firefox or Chrome

(he/him/his)

edo
Posts: 436
Joined: Thu Mar 28, 2013 7:05 pm UTC
Location: ~TrApPeD iN mY PhOnE~

Re: 1286: "Encryptic"

Postby edo » Tue Nov 05, 2013 1:04 am UTC

When I look at "Perloined" I think of passwords I've created that are based on translations of English words into other languages...
Co-proprietor of a Mome and Pope Shope

ps.02
Posts: 378
Joined: Fri Apr 05, 2013 8:02 pm UTC

Re: 1286: "Encryptic"

Postby ps.02 » Tue Nov 05, 2013 5:12 am UTC

PinkShinyRose wrote:None of the Japanese names exceed 6 characters in length, they are in katakana which are also on Unicode plane 0 and should therefore be of the same length as Roman characters.

Hmmm, unlikely. Block ciphers work in bytes, not in characters. (For DES and 3DES, each block is 8 bytes. For many newer ciphers, 16 bytes.) Now, note that the comic depicts some passwords as being a single 8-byte block. This implies one of 3 things:

(1) Adobe used an encoding such as UTF-8 where ASCII characters are a single byte, but other Plane 0 characters are 2 or 3 bytes; or

(2) Adobe used an encoding such as UTF-16 where any Plane 0 character is the same length (2 bytes) but also that they allowed users to pick passwords as short as 4 characters, or

(3) There's a separate character set marker not shown in the comic, so different language groups can use different 1-byte characters, using legacy encodings.

...Or of course,

(1a) Adobe wasn't hip enough to allow non-ASCII passwords at all. (But that would be awfully lame, even for Adobe.)
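To put rough numbers on options (1) and (2), here is a minimal Python sketch. The sample strings, the helper name `blocks_needed`, and the choice to ignore any padding scheme are all assumptions made for illustration; nothing here comes from the leaked data or from Adobe's actual code.

```python
# Illustrative only: sample strings and helper name are hypothetical.
BLOCK_SIZE = 8  # DES / 3DES block size in bytes, as mentioned above

def blocks_needed(text: str, encoding: str) -> int:
    """Number of 8-byte blocks needed to hold `text` in `encoding` (padding ignored)."""
    n_bytes = len(text.encode(encoding))
    # Ceiling division; an empty string still occupies one block.
    return max(1, -(-n_bytes // BLOCK_SIZE))

samples = ["abcdef", "カタカナ"]  # 6 ASCII letters, 4 katakana
for s in samples:
    for enc in ("utf-8", "utf-16-le"):
        size = len(s.encode(enc))
        print(f"{s!r:10} {enc:10} {size:2d} bytes  {blocks_needed(s, enc)} block(s)")
```

Under these assumptions, a short katakana password fits in a single block only in UTF-16 and only if it is at most 4 characters long, which is the point of option (2).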

YellowYeti
Posts: 59
Joined: Fri Jul 27, 2012 6:05 am UTC

Re: 1286: "Encryptic"

Postby YellowYeti » Tue Nov 05, 2013 12:19 pm UTC

I'm submitting an entry for 'Sexy Earlobes', 'Best TOS episode' and 'Sugarland':

my understanding is that Earlobes and TOS contain the same first 8 characters, and that TOS and Sugarland have a final string in common:

Sexy Earlobes - Charlie Sheen ( Apparently there was some sitcom which had him do a routine about sexy earlobes )
Best TOS Episode - Charlie X
Sugarland - Malcolm X ( It seems some kind of seminar on Malcolm X was held at Sugarland, Tx )

Are these from the real leaked password files?

speising
Posts: 2367
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1286: "Encryptic"

Postby speising » Tue Nov 05, 2013 1:06 pm UTC

YellowYeti wrote:I'm submitting an entry for 'Sexy Earlobes', 'Best TOS episode' and 'Sugarland':

my understanding is that Earlobes and TOS contain the same first 8 characters, and that TOS and Sugarland have a final string in common:

Sexy Earlobes - Charlie Sheen ( Apparently there was some sitcom which had him do a routine about sexy earlobes )
Best TOS Episode - Charlie X
Sugarland - Malcolm X ( It seems some kind of seminar on Malcolm X was held at Sugarland, Tx )

Are these from the real leaked password files?


that can't be true.
'Charlie X' is not the best TOS episode. *startflamewar*

sotanaht
Posts: 246
Joined: Sat Nov 27, 2010 2:14 am UTC

Re: 1286: "Encryptic"

Postby sotanaht » Tue Nov 05, 2013 1:39 pm UTC

gmalivuk wrote:
cellocgw wrote:
gmalivuk wrote:
Draco18s wrote:Just as a commentary on the alt-text:
It's possible to have pirated Photoshop and have an Adobe.com password.

My problem in protecting myself from "[xkcd.com/792]-style password reuse attacks" is I have no idea what my Adobe password was before the attack, nor which sites I use that may have used the same one (I've got a half-dozen or so different passwords that conform to different "strength" checks).
If you only have half a dozen passwords that you reuse, none of them are strong.


No, they can be ultra-mega-strong regardless of how often they're used. It's just that the result of exposing one is much greater than if every site had its own password. Don't confuse probability with outcome.
Well yes, technically what is weak in that case is the whole security process you're using, rather than any one password+website combination.

Which means, on the other hand, also don't confuse individually strong passwords with strong security protocols.


6 passwords (at any given time, changing passwords regularly is still a good idea) is more than enough for a sufficiently strong protocol. Do what I do:

  • One weak password for everything that does not matter. Forum sites like XKCD get this password, as do game accounts with nothing of value to lose. Worst case scenario, someone can impersonate me on forums... I'm so scared (/sarcasm).
  • One strong password for everything containing (only) my credit/debit card information. Regardless of whether this is hacked on one site or a dozen the result is the same: canceling my cards and opening an identity theft case.
  • Individual strong passwords for Email (due to password recovery) and any individual account that is a considerable risk. For me, this is my bank, Steam, and WoW accounts. If someone hacks Steam or WoW I could end up losing large amounts of invested time and/or money and might not be able to get it back. Bank could probably get my debit password, but I'd rather not risk it.

Now if only forums and other non-threats would stop asking for "strong" passwords.

pnj
Posts: 2
Joined: Tue Nov 05, 2013 2:30 pm UTC

Re: 1286: "Encryptic"

Postby pnj » Tue Nov 05, 2013 2:41 pm UTC

in the crossword section, on the right-hand side, there are 8 individual blanks. the 3rd password has a 2nd block, which is 1-8 characters, so it has an extra long blank box tacked on the end, for a variable length word.


This was a great explanation, but there's still something wrong with the number of blank spaces at the right. The first two passwords are identical and short. The third has the same prefix, and a suffix. So for these entries, the crossword should show one eight character block for all three prefixes, and then one block for the 3rd suffix. Instead there are two prefix blocks of eight letters (which would presumably be filled identically).

For password 8 (with your own hand...), there's no place to write the prefix solution.

Unless Randall was just being sloppy, which I think unlikely, then I don't understand what's going on in the solutions column.

gnutrino
Posts: 100
Joined: Sat Sep 06, 2008 9:02 am UTC
Location: Over the edge...

Re: 1286: "Encryptic"

Postby gnutrino » Tue Nov 05, 2013 4:47 pm UTC

AussieJono wrote:Just wanted to get in there and say that alpha, obvious and Michael Jackson are almost certainly abc or ABC. I don't know anything about encryption, so let me have this one and feel smart for five minutes.

YellowYeti wrote:I'm submitting an entry for 'Sexy Earlobes', 'Best TOS episode' and 'Sugarland':

my understanding is that Earlobes and TOS contain the same first 8 characters, and that TOS and Sugarland have a final string in common:

Sexy Earlobes - Charlie Sheen ( Apparently there was some sitcom which had him do a routine about sexy earlobes )
Best TOS Episode - Charlie X
Sugarland - Malcolm X ( It seems some kind of seminar on Malcolm X was held at Sugarland, Tx )


Good Job.

I'm gonna go ahead and guess that "Duh" is password and "name1" is matthias1 so "57" is probably password57 and the one without a hint between them is password1.
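Why guesses like these would share a first block can be sketched in Python. Everything below is an assumption for illustration: the key, the NUL padding, and especially `toy_ecb`, which is a keyed hash standing in for a block cipher in ECB mode. It is not 3DES and cannot be decrypted; it only reproduces the property that equal plaintext blocks give equal output blocks.

```python
import hashlib
import hmac

BLOCK_SIZE = 8  # bytes per block, as for DES/3DES

def split_blocks(password: str) -> list[bytes]:
    """Encode as UTF-8, pad with NUL bytes (an assumed scheme), split into 8-byte blocks."""
    data = password.encode("utf-8")
    padded_len = max(BLOCK_SIZE, -(-len(data) // BLOCK_SIZE) * BLOCK_SIZE)
    data = data.ljust(padded_len, b"\x00")
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def toy_ecb(password: str, key: bytes = b"hypothetical-key") -> list[str]:
    """Stand-in for ECB mode: every block is transformed independently with the same key."""
    return [hmac.new(key, block, hashlib.sha256).hexdigest()[:16]
            for block in split_blocks(password)]

for guess in ("password", "password1", "password57"):
    print(f"{guess:<11}", " ".join(toy_ecb(guess)))
```

All three guesses produce the same first output block, and only the second block distinguishes password1 from password57, which is the pattern the comic's table relies on.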

I still can't get the monster mash/purloined/pokemon ones though, anybody got any more ideas?

YellowYeti wrote:Are these from the real leaked password files?

Almost certainly not

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Tue Nov 05, 2013 4:51 pm UTC

ps.02 wrote:
PinkShinyRose wrote:None of the Japanese names exceed 6 characters in length, they are in katakana which are also on Unicode plane 0 and should therefore be of the same length as Roman characters.

Hmmm, unlikely. Block ciphers work in bytes, not in characters. (For DES and 3DES, each block is 8 bytes. For many newer ciphers, 16 bytes.) Now, note that the comic depicts some passwords as being a single 8-byte block. This implies one of 3 things:

(1) Adobe used an encoding such as UTF-8 where ASCII characters are a single byte, but other Plane 0 characters are 2 or 3 bytes; or

(2) Adobe used an encoding such as UTF-16 where any Plane 0 character is the same length (2 bytes) but also that they allowed users to pick passwords as short as 4 characters, or

(3) There's a separate character set marker not shown in the comic, so different language groups can use different 1-byte characters, using legacy encodings.

...Or of course,

(1a) Adobe wasn't hip enough to allow non-ASCII passwords at all. (But that would be awfully lame, even for Adobe.)

You're right, I didn't really know how UTF-8 worked but should have known from the assumption that a character is one byte long that the entire Unicode plane 0 would not be of equal length (as having all code points would take two bytes). Silly me, but what idiot came up with UTF-8 anyway, I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous. </rant>

To be fair: after all this, should it really surprise us if adobe allowed 4 character passwords? Actually some websites count bytes of password length, in which case adobe most probably allows 4 character passwords.

On the other hand: as the passwords are hashed anyway, would it really matter if one encoding standard is used consistently for all scripts, they could use shift-JIS if the password is in a Japanese script while using ASCII for passwords that use only unaccented Roman characters (and a lot of other standards for other scripts (possibly UTF-16 for mixed scripts)). If the same encoding is consistently used for a given input, would there be any need to mark the used encoding with the output? However, considering the lack of sophistication of their password system this seems unlikely.

sotanaht wrote:Now if only forums and other non-threats would stop asking for "strong" passwords.

I really think some forums shouldn't demand passwords at all. Why would you need a password when asking a question on a helpdesk forum? I think forum passwords are like NickServ passwords, they're often unnecessary and it wouldn't generally matter if it's removed after 30 days of not logging in...

TLDR: ps.02 was mostly right, I just started disliking UTF-8. I wouldn't put it past adobe to accept four character passwords and don't think the encoding method needs to be listed with the output if consistently used. Oh, and forum passwords are like NickServ passwords...

orthogon
Posts: 3104
Joined: Thu May 17, 2012 7:52 am UTC
Location: The Airy 1830 ellipsoid

Re: 1286: "Encryptic"

Postby orthogon » Tue Nov 05, 2013 5:11 pm UTC

PinkShinyRose wrote:To be fair: after all this, should it really surprise us if adobe allowed 4 character passwords? Actually some websites count bytes of password length, in which case adobe most probably allows 4 character passwords.

I see what you're thinking: is there a language in which "correct horse battery staple" can be translated into four characters? Unfortunately, only "horse" appears to have a single-character Chinese translation according to Google.
xtifr wrote:... and orthogon merely sounds undecided.

davidstarlingm
Posts: 1255
Joined: Mon Jun 01, 2009 4:33 am UTC

Re: 1286: "Encryptic"

Postby davidstarlingm » Tue Nov 05, 2013 5:14 pm UTC

sotanaht wrote:6 passwords (at any given time, changing passwords regularly is still a good idea) is more than enough for a sufficiently strong protocol. Do what I do:

  • One weak password for everything that does not matter. Forum sites like XKCD get this password, as do game accounts with nothing of value to lose. Worst case scenario, someone can impersonate me on forums... I'm so scared (/sarcasm).
  • One strong password for everything containing (only) my credit/debit card information. Regardless of whether this is hacked on one site or a dozen the result is the same: canceling my cards and opening an identity theft case.
  • Individual strong passwords for Email (due to password recovery) and any individual account that is a considerable risk. For me, this is my bank, Steam, and WoW accounts. If someone hacks Steam or WoW I could end up losing large amounts of invested time and/or money and might not be able to get it back. Bank could probably get my debit password, but I'd rather not risk it.

Now if only forums and other non-threats would stop asking for "strong" passwords.

Why do you say that changing passwords regularly is a good idea? Unless you're afraid that a password has been compromised (at which point you should change that one and that one only), changing passwords does nothing to help security. Or do you have a generalized fear that passwords can be compromised without you knowing and so you just change them randomly? If a password gets compromised, you'll probably know it.

And why use weak passwords at all? Making a strong password is just a matter of adding more characters. Unless you're using "strong password" to mean this kind of password.

My problem thus far has been that there's simply no single password pattern (length, types of characters, etc) that will work for even a fraction of the sites I use. Some have a minimum of 16 characters while others have a maximum of 12; some require you to use either _, /, ?, !, ~, or % (Seriously, what the hell? If you require one of a handful of characters, EVERYONE KNOWS THOSE ARE THE REQUIRED CHARACTERS), while others won't allow any symbols at all. Plus, the whole "change your password every three months" business so I can't keep track of which password patterns I'm using.

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Tue Nov 05, 2013 5:24 pm UTC

orthogon wrote:
PinkShinyRose wrote:To be fair: after all this, should it really surprise us if adobe allowed 4 character passwords? Actually some websites count bytes of password length, in which case adobe most probably allows 4 character passwords.

I see what you're thinking: is there a language in which "correct horse battery staple" can be translated into four characters? Unfortunately, only "horse" appears to have a single-character Chinese translation according to Google.

Did you use official Chinese? What about Classical Chinese (although I suppose battery and staple may not exist in that language)? Maybe written Cantonese would work, I think it has more separate characters... I like your idea though, what about Ancient Egyptian? Maybe some other (partially) ideographic script?

On the other hand: I don't think any of these would fit into the 8 bytes of UTF-16 text, they probably need some characters from higher Unicode planes.

Why am I suddenly thinking about the waste of perfectly good sound combinations in most languages (consisting of pronounceable series of sounds existing in that language) causing unnecessarily long words? Battery could be named 'ando' perfectly fine, that word isn't used in English yet is it?

gmalivuk
GNU Terry Pratchett
Posts: 26836
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There

Re: 1286: "Encryptic"

Postby gmalivuk » Tue Nov 05, 2013 6:23 pm UTC

Inefficiency means redundancy, which means comprehensibility even in the noisy environments where most speech happens.

Dracomax
Posts: 998
Joined: Wed Mar 27, 2013 1:11 pm UTC

Re: 1286: "Encryptic"

Postby Dracomax » Tue Nov 05, 2013 7:06 pm UTC

orthogon wrote:
PinkShinyRose wrote:To be fair: after all this, should it really surprise us if adobe allowed 4 character passwords? Actually some websites count bytes of password length, in which case adobe most probably allows 4 character passwords.

I see what you're thinking: is there a language in which "correct horse battery staple" can be translated into four characters? Unfortunately, only "horse" appears to have a single-character Chinese translation according to Google.

...I originally started to read this as "I know what you are thinking. 'Did he use UTF-16, or UTF-8?' in all the excitement, I don't really know myself. so, you've got to ask yourself one question. 'Do I feel Lucky?' Well, do ya, Punk?"

Carry on making the world safe for cryptography.
“have i gone mad?
im afraid so, but let me tell you something, the best people usualy are.”
― Lewis Carroll, Alice in Wonderland


Eternal Density
Posts: 5590
Joined: Thu Oct 02, 2008 12:37 am UTC

Re: 1286: "Encryptic"

Postby Eternal Density » Wed Nov 06, 2013 3:46 am UTC

rhomboidal wrote:hehe, I remember getting an email from Adobe about the hack. If 125,615,814-across's hint is "YOU KNOW" then I've got that one nailed.
Two Words:
Spoiler:
Image
Play the game of Time! castle.chirpingmustard.com Hotdog Vending Supplier But what is this?
In the Marvel vs. DC film-making war, we're all winners.

ucim
Posts: 6896
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1286: "Encryptic"

Postby ucim » Wed Nov 06, 2013 5:50 am UTC

davidstarlingm wrote:Why do you say that changing passwords regularly is a good idea?
You won't know when a database that includes your password is stolen for decoding. If people changed their passwords regularly, that compromised database would become useless quickly.

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Heartfelt thanks from addams and from me - you really made a difference.

mishka
Posts: 49
Joined: Mon Mar 28, 2011 3:47 am UTC

Re: 1286: "Encryptic"

Postby mishka » Wed Nov 06, 2013 7:42 am UTC

Okay, since it's out now, my password for adobe is crayola.
No, I didn't use it elsewhere.
No, I didn't even use the same username as on xkcd.

leifbk
Posts: 33
Joined: Wed Nov 26, 2008 7:24 am UTC
Location: Bærum, Norway

Re: 1286: "Encryptic"

Postby leifbk » Wed Nov 06, 2013 10:22 am UTC

PinkShinyRose wrote:Silly me, but what idiot came up with UTF-8 anyway

Ken Thompson.

PinkShinyRose wrote:I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous. </rant>

I think that it's pretty smart, because it means that the massive amount of text that is written in 7-bit ASCII also is valid UTF-8. Like it or not, but the overwhelming majority of text on the Interweb is written in the 26-letter English alphabet. And as a Norwegian who works with the national characters æøå, I vastly prefer UTF-8 over any of the more or less exotic and mutually incompatible "code pages" that we had to struggle with in the Dark Ages of MS-DOS. UTF-8 has become the lingua franca that is understood and properly treated everywhere.
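The compatibility argument can be checked with a few lines of Python. This is only a sketch: the sample strings are made up, and Latin-1 and code page 865 are picked here as two arbitrary examples of legacy single-byte encodings.

```python
ascii_text = "Plain 7-bit ASCII text"
norwegian = "æøå"

# Pure ASCII text yields the same bytes whether encoded as ASCII or as UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII bytes identical in UTF-8:", same_bytes)

# In UTF-8 each of these letters takes two bytes.
utf8_bytes = norwegian.encode("utf-8")
print(len(norwegian), "characters ->", len(utf8_bytes), "UTF-8 bytes")

# Two legacy single-byte encodings disagree about the very same letters.
print("latin-1:", norwegian.encode("latin-1").hex())
print("cp865:  ", norwegian.encode("cp865").hex())
```

The last two lines print different byte values for the same three letters, which is the "mutually incompatible code pages" problem in miniature.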

YellowYeti
Posts: 59
Joined: Fri Jul 27, 2012 6:05 am UTC

Re: 1286: "Encryptic"

Postby YellowYeti » Wed Nov 06, 2013 10:54 am UTC

gnutrino wrote:
AussieJono wrote:Just wanted to get in there and say that alpha, obvious and Michael Jackson are almost certainly abc or ABC. I don't know anything about encryption, so let me have this one and feel smart for five minutes.

YellowYeti wrote:I'm submitting an entry for 'Sexy Earlobes', 'Best TOS episode' and 'Sugarland':

my understanding is that Earlobes and TOS contain the same first 8 characters, and that TOS and Sugarland have a final string in common:

Sexy Earlobes - Charlie Sheen ( Apparently there was some sitcom which had him do a routine about sexy earlobes )
Best TOS Episode - Charlie X
Sugarland - Malcolm X ( It seems some kind of seminar on Malcolm X was held at Sugarland, Tx )


Good Job.

I'm gonna go ahead and guess that "Duh" is password and "name1" is matthias1 so "57" is probably password57 and the one without a hint between them is password1.

I still can't get the monster mash/purloined/pokemon ones though, anybody got any more ideas?


Best I can come up with is that Monster Mash was recorded by Bobby (Boris) Pickett ( He did the mash? ) and Boris Blacher wrote an opera based on Poe's 'The Purloined Letter'

My Pokemon-Fu has failed me for the connection there though, so I may be completely off-track

xkcdgeek
Posts: 1
Joined: Wed Nov 06, 2013 11:02 am UTC

Re: 1286: "Encryptic"

Postby xkcdgeek » Wed Nov 06, 2013 11:15 am UTC

Could be a true story:

I used an xkcd 936 style password ("correct horse battery staple"). And it was compromised by the Adobe leak. My password was "questionamericanpersonalreligion".

Flumble
Yes Man
Posts: 2266
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1286: "Encryptic"

Postby Flumble » Wed Nov 06, 2013 11:16 am UTC

I couldn't withhold myself from checking out the database - the excerpt on xkcd is indeed made up. :|

mishka wrote:Okay, since it's out now, my password for adobe is crayola.
No, I didn't use it elsewhere.
No, I didn't even use the same username as on xkcd.

Thanks, now we have to reverse-crack your username...

Kit.
Posts: 1117
Joined: Thu Jun 16, 2011 5:14 pm UTC

Re: 1286: "Encryptic"

Postby Kit. » Wed Nov 06, 2013 11:36 am UTC

PinkShinyRose wrote:Silly me, but what idiot came up with UTF-8 anyway, I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous.

You seem to overestimate the importance of accented Roman characters on the Internet.

While English is still the main language of the Internet, the second most used language by the number of visitors uses Han script. The second most used by the number of websites uses Cyrillic.

speising
Posts: 2367
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1286: "Encryptic"

Postby speising » Wed Nov 06, 2013 12:53 pm UTC

YellowYeti wrote:Best I can come up with is that Monster Mash was recorded by Bobby (Boris) Pickett ( He did the mash? ) and Boris Blacher wrote an opera based on Poe's 'The Purloined Letter'

My Pokemon-Fu has failed me for the connection there though, so I may be completely off-track


those two don't share 8 letters, though.
Last edited by speising on Wed Nov 06, 2013 5:05 pm UTC, edited 1 time in total.

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Wed Nov 06, 2013 1:48 pm UTC

leifbk wrote:
PinkShinyRose wrote:Silly me, but what idiot came up with UTF-8 anyway

Ken Thompson.

Okay, maybe he wasn't an idiot, I consider it a lapse of judgement.
leifbk wrote:
PinkShinyRose wrote:I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous. </rant>

I think that it's pretty smart, because it means that the massive amount of text that is written in 7-bit ASCII also is valid UTF-8. Like it or not, but the overwhelming majority of text on the Interweb is written in the 26-letter English alphabet. And as a Norwegian who works with the national characters æøå, I vastly prefer UTF-8 over any of the more or less exotic and mutually incompatible "code pages" that we had to struggle with in the Dark Ages of MS-DOS. UTF-8 has become the lingua franca that is understood and properly treated everywhere.

But is it expected to remain that way? I can agree with whitespace characters being truncated at the cost of other characters, but don't really understand the added value of adding highly script specific characters in what's supposed to be an international standard.
Kit. wrote:
PinkShinyRose wrote:Silly me, but what idiot came up with UTF-8 anyway, I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous.

You seem to overestimate the importance of accented Roman characters on the Internet.

While English is still the main language of the Internet, the second most used language by the number of visitors uses Han script. The second most used by the number of websites uses Cyrillic.

The part about accented characters referred to ASCII, I think that was designed in the 1960's, a time with significant international communications (and the US obviously already had a large Spanish and French speaking population) but long before the internet. The do-over part was about UTF-8 still lacking representation of several scripts of limited size within a reasonable number of bytes (2 bytes would have been unreasonable in the 1960's but not in the 1990's). I also meant that ASCII induced derivative standards of similar size by discriminating unnecessarily against other variations of Latin scripts (and more extensive standards for other scripts, but those seem inevitable within the 1 byte restriction). I feel similarly that UTF-8 screams for the introduction of differently nationalised variants with Hebrew, Arabic, Greek, Cyrillic or Devanagari characters truncated as opposed to Latin characters, especially Devanagari as it is ridiculously long in UTF-8.

Zirion
Posts: 1
Joined: Tue Aug 31, 2010 7:41 pm UTC

Re: 1286: "Encryptic"

Postby Zirion » Wed Nov 06, 2013 2:46 pm UTC

How about nickwiger for "he did the mash" (writer of "the Original Monster Mash"), leaving clauncher or clawitzer (impossible to tell which though) for the pokemon clue. I can't work out how to get nickwige to match up with purloined though. Feels like nick and purloin are close enough that it might be right, but I can't see how to deal with the "wige" part.

One other thought - how's the padding done on this? Is there a chance for a coincidental match on the remaining characters, or do they always use (say) null characters to fill out the last few bytes. In theory you might say "take password; pad out with 12345678; encrypt", but that doesn't seem as good as just saying "pad with nulls".
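The padding question can be made concrete with a small Python sketch comparing two common schemes. Which scheme (if any) Adobe actually used is not known from this thread, so both functions below are illustrative assumptions, and the guessed passwords are just the ones from this post.

```python
BLOCK_SIZE = 8  # bytes per block

def pad_nulls(data: bytes) -> bytes:
    """Pad with NUL bytes up to the next multiple of BLOCK_SIZE."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0 and data:
        return data  # already block-aligned, nothing added
    return data + b"\x00" * (BLOCK_SIZE - remainder)

def pad_pkcs7(data: bytes) -> bytes:
    """PKCS#7-style padding: append n bytes of value n, always at least one byte."""
    n = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([n]) * n

for guess in ("nickwiger", "clauncher", "clawitzer"):
    raw = guess.encode("ascii")
    # Show only the second 8-byte block under each scheme.
    print(guess, pad_nulls(raw)[BLOCK_SIZE:], pad_pkcs7(raw)[BLOCK_SIZE:])
```

With either deterministic scheme, two second blocks match exactly when the leftover characters match. All three guesses are nine letters ending in "r", so their second blocks are identical, and no coincidental match is needed.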

Plus, I don't think it's Randall's style to leave the Pokemon indeterminable. Unless it was a cat based Pokemon.

speising
Posts: 2367
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1286: "Encryptic"

Postby speising » Wed Nov 06, 2013 3:33 pm UTC

oh yeah, UTF-8 codepages FTW!

Kit.
Posts: 1117
Joined: Thu Jun 16, 2011 5:14 pm UTC

Re: 1286: "Encryptic"

Postby Kit. » Wed Nov 06, 2013 3:52 pm UTC

PinkShinyRose wrote:
leifbk wrote:
PinkShinyRose wrote:I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers), let alone having someone do a do-over of the whole fiasco in the internet age, at a time Unicode was invented because the separate standards for each script were getting ridiculous. </rant>

I think that it's pretty smart, because it means that the massive amount of text that is written in 7-bit ASCII also is valid UTF-8. Like it or not, but the overwhelming majority of text on the Interweb is written in the 26-letter English alphabet. And as a Norwegian who works with the national characters æøå, I vastly prefer UTF-8 over any of the more or less exotic and mutually incompatible "code pages" that we had to struggle with in the Dark Ages of MS-DOS. UTF-8 has become the lingua franca that is understood and properly treated everywhere.

But is it expected to remain that way?

Why not?

PinkShinyRose wrote:I can agree with whitespace characters being truncated at the cost of other characters, but don't really understand the added value of adding highly script specific characters in what's supposed to be an international standard.

Are you suggesting that there should be no international standards for encoding any particular script?

PinkShinyRose wrote:
Kit. wrote:You seem to overestimate the importance of accented Roman characters on the Internet.

While English is still the main language of the Internet, the second most used language by the number of visitors uses Han script. The second most used by the number of websites uses Cyrillic.

The part about accented characters referred to ASCII, I think that was designed in the 1960's, a time with significant international communications (and the US obviously already had a large Spanish and French speaking population)

ITA2 (which it was supposed to replace) was using 6 bits (5 bits + register shift codes), was an international standard (introduced by CCITT) and obviously had no accented Roman characters.

PinkShinyRose wrote:The do-over part was about UTF-8 still lacking representation of several scripts of limited size within a reasonable number of bytes (2 bytes would have been unreasonable in the 1960's but not in the 1990's).

I don't see how 2 bytes "are reasonable", but 3 bytes "aren't".

What was reasonable was to keep compatibility with ASCII (to keep portability of the source code) and to reduce the maximum byte length of a single encoded character.

Kit. wrote:I also meant that ASCII induced derivative standards of similar size by discriminating unnecessarily against other variations of Latin scripts (and more extensive standards for other scripts, but those seem inevitable within the 1 byte restriction).

Well, considering that ASCII was not "1-byte", but "7-bit", most of what you said in this sentence is untrue.

Kit. wrote:I feel similarly that UTF-8 screams for the introduction of differently nationalised variants with Hebrew, Arabic, Greek, Cyrillic or Devanagari characters truncated as opposed to Latin characters, especially Devanagari as it is ridiculously long in UTF-8.

I see no such need. UTF-8 is a good international standard, not an intentionally national standard like ASCII (as initially designed). And if you remember to account for markup, you may find HTML documents with Devanagari text to be shorter in UTF-8 than in UTF-16.

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Wed Nov 06, 2013 4:25 pm UTC

Kit. wrote:
I also meant that ASCII induced derivative standards of similar size by discriminating unnecessarily against other variations of Latin scripts (and more extensive standards for other scripts, but those seem inevitable within the 1 byte restriction).

Well, considering that ASCII was not "1-byte", but "7-bit", most of what you said in this sentence is untrue.

I meant practical restriction. I think in a lot of applications an extra bit was added to ASCII characters for ease of use. If 12 bits were used for every character, some applications might pad it out to 16 bits. On the other hand, I do agree that using the eighth wasted bit would probably suffice to add a new script of similar size.
I feel similarly that UTF-8 screams for the introduction of differently nationalised variants with Hebrew, Arabic, Greek, Cyrillic or Devanagari characters truncated as opposed to Latin characters, especially Devanagari as it is ridiculously long in UTF-8.

I see no such need. UTF-8 is a good international standard, not an intentionally national standard like ASCII (as initially designed). And if you remember to account for markup, you may find HTML documents with Devanagari text to be shorter in UTF-8 than in UTF-16.
Only in certain documents. In a plaintext file it would be much longer. Besides, I'm not arguing that UTF-8 promotes the use of UTF-16 in India (which would reduce the size of a Devanagari character to 2 bytes and increase the size of an ASCII character to 2 bytes, compared to the 3 and 1 bytes in UTF-8). I'm arguing that a different Unicode encoding implementation would be more efficient than UTF-16: using Devanagari characters of 1 byte and ASCII characters of 2 bytes (or possibly the other way around in the case of mark-up or programming languages). Maybe both sets being one byte could also work, but that would probably mean that all other scripts would be at least 3 bytes a character. I suppose the last two options are somewhat similar to what the People's Republic of China did (although with their huge script their own characters are still 2 bytes each; on the other hand, this is compensated by each character carrying more information).
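For anyone who wants to check the byte counts being argued about here, a minimal Python 3 sketch follows. The sample characters, the word "नमस्ते" (written with escapes so the code points are unambiguous) and the HTML fragment are illustrative choices, not taken from the thread, and `encoded_sizes` is a made-up helper name.

```python
# Compare how many bytes one character takes in UTF-8 and in UTF-16.
# "utf-16-le" is used so that no byte order mark is counted.
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for the given text."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

samples = {
    "ASCII": "a",
    "Norwegian": "\u00e6",    # æ
    "Cyrillic": "\u0446",     # ц
    "Devanagari": "\u0915",   # क
}

for name, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{name:11} UTF-8: {utf8_len}  UTF-16: {utf16_len}")

# Illustrative word and markup (assumptions, not from the thread).
word = "\u0928\u092e\u0938\u094d\u0924\u0947"   # नमस्ते, 6 code points
html = '<p class="note">' + word + "</p>"

print("plain text:", encoded_sizes(word))
print("with markup:", encoded_sizes(html))
```

With these particular inputs the bare word is larger in UTF-8, while the marked-up fragment is smaller in UTF-8 because every ASCII markup character doubles in size under UTF-16. How the balance works out for a real document depends entirely on its ratio of markup to text.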

On the other hand, maybe most peoples and governments don't really mind: there is still only a US English version of HTML and CSS. Otherwise browsers would have to parse 'color=' identically to 'Farbe=', 'kleur=', 'couleur=' (okay, my French is bad, so it might be wrong) and possibly 'colour=', which would be somewhat inefficient, although they didn't go with 'clr=' either... There are better examples, but I have no idea what 'body' is in languages other than English and Dutch.

chernobyl
Posts: 23
Joined: Wed Jun 27, 2007 6:24 am UTC
Location: Sofia, Bulgaria
Contact:

Re: 1286: "Encryptic"

Postby chernobyl » Wed Nov 06, 2013 9:35 pm UTC

Every option has its pros and cons, and given the state and history of technology, losing backward compatibility with ASCII would be worse than anything else. UTF-8 was probably the best solution. I'm saying that even though I'm a bit ticked about having to pay extra for sending SMS in my language (which uses the Cyrillic script).

ps.02
Posts: 378
Joined: Fri Apr 05, 2013 8:02 pm UTC

Re: 1286: "Encryptic"

Postby ps.02 » Thu Nov 07, 2013 3:02 am UTC

PinkShinyRose wrote:Silly me, but what idiot came up with UTF-8 anyway, I mean: only ASCII characters are shorter than all other characters, it's ridiculous that someone came up with a standard to represent only unaccented Roman characters in the first place (especially coming from a country with a significant number of native French and Spanish speakers)

As mentioned, Ken Thompson, while designing Plan 9. Despite the name, Plan 9 was the opposite of done by idiots. (:

You don't have to use UTF-8 if you don't like the fact that characters can range from 1 to 4 bytes (1 to 3 in plane 0). Microsoft didn't use it, so you'd be in good company. Well, you'd be in company anyway. But don't knock it until you understand the actual design choices. UTF-8 has some excellent properties that most other encodings, including other Unicode encodings, don't have:

1) Legacy compatibility: All ASCII text is valid UTF-8 with no change in meaning. OK, so you think privileging ASCII over other encodings is English language imperialist or whatever, but seriously, if you can't be backward-compatible with everything it's nice to at least be backward-compatible with something. In 1992 or so when UTF-8 was born, there was an awful lot of ASCII in the world. And there still is.

1a) Everything that looks like ASCII is ASCII. That is, no ASCII bytes appear in any non-ASCII characters. This means, among other things, that any text-processing tools that use ASCII characters as delimiters (say, CSV files, or XML, or hundreds of other file types) will correctly parse UTF-8 input even if they don't know how to decode UTF-8. Don't underestimate how many bugs in all levels of software stacks, and including security issues, can be avoided by ensuring that anything that looks like a delimiter really is one.

1b) No zero bytes except actual NUL characters. This seems esoteric unless you come from a world where character strings are commonly delimited by a NUL byte. The designer of UTF-8 came from that world (actually he helped invent that world), as do I.

2) All character boundaries are unambiguous. In most other multi-byte encodings, if you point randomly into the middle of your text, you cannot necessarily tell where the nearest character boundary is. The best you can do is make educated guesses based on common or uncommon patterns. In UTF-8, every character boundary is unambiguous. So if you happen to jump into the middle of a character, just read forward or backward a couple of bytes and when you get to a character boundary you'll know, because the two high bits of the byte there won't be 1 0.

3) Recognizable. If your string of bytes includes a significant number of non-ASCII bytes, and it all parses as valid UTF-8, it is overwhelmingly likely to actually be UTF-8. That is, non-UTF-8 text almost never looks like UTF-8 unless it is almost entirely ASCII. (Speaking of which, let's ignore UTF-7, shall we?) Whereas, with most other single- and multi-byte encodings, it goes more like: "Assuming this is ISO-8859-15, all the alphabetic characters happen to spell valid French words. Therefore this is probably ISO-8859-15." (Which will still fail, because ISO-8859-1 and ISO-8859-15 are so nearly identical that such a system would find your text equally likely to be either one.) Doing that kind of guessing basically requires a corpus of text for each language and character set combination you are likely to encounter. "Is this probably UTF-8?" is so much easier to answer it's not even funny. It either validates or it doesn't.

4) Stateless. Well of course you have to look at the 1 to 4 bytes that make up an individual character, but beyond that, there's no context to decoding the characters. (Granted, most other character encodings have this property as well, but not all. See Shift-JIS.)

Well, point 4 needs an asterisk because UTF-8 is a Unicode encoding and Unicode itself is not stateless. In interpreting and rendering Unicode, you do have to maintain quite a bit of state above the level of individual characters. (Combining characters, RTL vs LTR, ligatures in scripts like Arabic or Devanagari, e.g.)
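Several of the properties listed above can be checked in a few lines. This is a rough Python 3 sketch, not anything from the UTF-8 specification itself: the helper names (`is_continuation`, `char_start`, `is_valid_utf8`) and the sample string are made up for illustration, and validation simply relies on Python's built-in strict decoder.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte's two high bits are 1 0, i.e. it is not a character start."""
    return (byte & 0xC0) == 0x80

def char_start(data: bytes, i: int) -> int:
    """Point 2: step backward from index i to the start of the character containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i

def is_valid_utf8(data: bytes) -> bool:
    """Point 3: it either validates or it doesn't."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "bl\u00e5b\u00e6r,\u0446\u0432\u0435\u0442,\u0930\u0902\u0917"  # blåbær,цвет,रंग
data = text.encode("utf-8")

# Point 1a: splitting on an ASCII delimiter at the byte level never cuts a character.
fields = [field.decode("utf-8") for field in data.split(b",")]
print(fields == text.split(","))

# Point 1b: no zero bytes unless the text contains an actual NUL.
print(0 not in data)

# Point 2: index 3 lands inside the two-byte "å"; its start is index 2.
print(char_start(data, 3))

# Point 3: the same word encoded as ISO-8859-1 does not validate as UTF-8.
print(is_valid_utf8(data), is_valid_utf8("bl\u00e5b\u00e6r".encode("latin-1")))
```

The backward scan in `char_start` never needs more than three steps on valid input, since a UTF-8 character is at most 4 bytes long.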

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Thu Nov 07, 2013 4:06 pm UTC

I get UTF-8 from a purely engineering point of view; most programming languages are in English after all. Backwards compatibility is nice, but it's kind of a big burden to put on a system that is probably supposed to last for a couple of decades, about halfway through which the old standards are probably mostly deprecated in practice and at the end of which very few recently used texts are in the old encoding (while most systems would keep ASCII support separately for a while anyway). The social aspect is what I mostly object to; it does seem US imperialist even if it results from earlier decisions. It also assumes that English will remain the most important language for international communications for however long UTF-8 is intended to last, which seems dubious. I'm not sure why, but creating a nearly Earth-wide character encoding system and then strongly biasing it towards one language (at the cost of several others) seems odd and disrespectful to the disadvantaged languages.

Kit.
Posts: 1117
Joined: Thu Jun 16, 2011 5:14 pm UTC

Re: 1286: "Encryptic"

Postby Kit. » Thu Nov 07, 2013 5:12 pm UTC

PinkShinyRose wrote:
Kit. wrote:
I also meant that ASCII induced derivative standards of similar size by discriminating unnecessarily against other variations of Latin scripts (and more extensive standards for other scripts, but those seem inevitable within the 1 byte restriction).

Well, considering that ASCII was not "1-byte", but "7-bit", most of what you said in this sentence is untrue.

I meant practical restriction. I think in a lot of applications an extra bit was added to ASCII characters for ease of use. If 12 bits were used for every character, some applications might pad it out to 16 bits. On the other hand, I do agree that using the eighth wasted bit would probably suffice to add a new script of similar size.

When ASCII was developed, 8-bit bytes (or actually half-words and/or quarter-words) were already quite common, but not yet industry standard. In addition to 16-bit and 32-bit machines, there also were 12-bit, 18-bit, 24-bit, 36-bit machines, including quite popular ones. There was no expectation in the industry for 8-bit (or 16-bit) sequences to be treated somehow completely differently from other kinds. Even when 8-bit bytes were used, one bit might be treated differently from other ones when fed to the terminal device: as a parity bit or as a "special effect" bit (bold or inverse) - or just completely stripped by the transmission equipment. It could also be used to work with two code pages (such as for pseudographics and/or national languages) independently preselected from different sections of the terminal ROM (or even preloaded into the terminal RAM).

Besides, ASCII did not "induce" 7-bit "derivatives". ASCII was developed as one of the local alternatives. It was not intended to contain accented characters; that's what the alternative code sets were supposed to do.

PinkShinyRose wrote:
Kit. wrote:
I feel similarly that UTF-8 screams for the introduction of differently nationalised variants with Hebrew, Arabic, Greek, Cyrillic or Devanagari characters truncated as opposed to Latin characters, especially Devanagari as it is ridiculously long in UTF-8.

I see no such need. UTF-8 is a good international standard, not an intentionally national standard like ASCII (as initially designed). And if you remember to account for markup, you may find HTML documents with Devanagari text to be shorter in UTF-8 than in UTF-16.

Only in certain documents. In a plaintext file it would be much longer. Besides, I'm not arguing that UTF-8 promotes the use of UTF-16 in India (which would reduce the size of a Devanagari character to 2 bytes and increase the size of an ASCII character to 2 bytes, compared to the 3 and 1 bytes in UTF-8). I'm arguing that a different Unicode encoding implementation would be more efficient than UTF-16: using Devanagari characters of 1 byte and ASCII characters of 2 bytes (or possibly the other way around in the case of mark-up or programming languages). Maybe both sets being one byte could also work, but that would probably mean that all other scripts would be at least 3 bytes a character.

What for? If they want a compact representation of the documents written in their own script, they already have ISCII. If they want some internationally useful encoding, there is no point for them to not stick to the international standard.

PinkShinyRose wrote:On the other hand, maybe most peoples and governments don't really mind: there is still only a US English version of HTML and CSS. Otherwise browsers would have to parse 'color=' identically to 'Farbe=', 'kleur=', 'couleur=' (okay, my French is bad, so it might be wrong) and possibly 'colour=', which would be somewhat inefficient, although they didn't go with 'clr=' either... There are better examples, but I have no idea what 'body' is in languages other than English and Dutch.

As a software engineer working in an international team, I don't want to be forced to learn that ಬಣ್ಣ I could see in the source code written by me and modified by someone from Bangalore means "color". They won't be happy to see färg or цвет or ფერი there either. So, one language, please. And I see no reason for it not to be English.

lgw
Posts: 437
Joined: Mon Apr 12, 2010 10:52 pm UTC

Re: 1286: "Encryptic"

Postby lgw » Thu Nov 07, 2013 6:56 pm UTC

PinkShinyRose wrote:I get UTF-8 from a purely engineering point of view; most programming languages are in English after all. Backwards compatibility is nice, but it's kind of a big burden to put on a system that is probably supposed to last for a couple of decades, about halfway through which the old standards are probably mostly deprecated in practice and at the end of which very few recently used texts are in the old encoding (while most systems would keep ASCII support separately for a while anyway). The social aspect is what I mostly object to; it does seem US imperialist even if it results from earlier decisions. It also assumes that English will remain the most important language for international communications for however long UTF-8 is intended to last, which seems dubious. I'm not sure why, but creating a nearly Earth-wide character encoding system and then strongly biasing it towards one language (at the cost of several others) seems odd and disrespectful to the disadvantaged languages.


The country with the best army sets the international standards, and pretty much always has. Does this seem like a good idea to most people? Well, if you have the biggest army, you don't much need to care what seems like a good idea to most people. As time goes on the power will shift and the bias will change, and eventually everyone gets a chance.

You can protest if that's your thing. But be warned, possibly the most clever poem ever was written as a protest against a bad encoding/alphabet scheme. Your efforts will likely be embarrassing by comparison.

But really, who cares what's short vs long in UTF-8? There are very few computerized environments today in which the size of your encoded text could possibly matter. In another decade or two, there won't be any. And we still won't agree on line endings. And character code 7 will still mean "bell", regardless.
"In no set of physics laws do you get two cats." - doogly

speising
Posts: 2367
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1286: "Encryptic"

Postby speising » Thu Nov 07, 2013 7:20 pm UTC

in defense of ascii, and by extension 1-byte utf-8 characters: the 26 characters of the english alphabet are also the least common denominator of a lot of other languages, namely all that use roman script. it makes a lot more sense to appease all of them than devanagari (whatever that may be).
Last edited by speising on Thu Nov 07, 2013 9:18 pm UTC, edited 1 time in total.

gmalivuk
GNU Terry Pratchett
Posts: 26836
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: 1286: "Encryptic"

Postby gmalivuk » Thu Nov 07, 2013 8:46 pm UTC

PinkShinyRose wrote:It also assumes that English will remain the most important language for international communications for however long UTF-8 is intended to last, which seems dubious.
Do you think it's intended to last more than the few decades you refer to earlier in your post? Because I don't think it's remotely dubious to think English will remain an international lingua franca for at least that long.

And in any case, it doesn't have to be English, it just has to be some combination of languages that use the same alphabet, which is quite a lot of them.

adaviel
Posts: 41
Joined: Wed Jan 14, 2009 5:30 pm UTC
Location: Vancouver Canada
Contact:

Re: 1286: "Encryptic"

Postby adaviel » Fri Nov 08, 2013 5:22 pm UTC

Interestingly, it looks like Randall's email is one of the ones leaked. With encrypted password "cILuB1/+/esYdlwScL4nsQ==" and hint "q".
So was mine. So were about 80 at my workplace, 1200 at my nearest university, and 2 million in the .edu TLD.
Just looking at usernames can be fun - actually, there is probably more entropy in many of the usernames than the actual passwords, e.g. things like "cupcake_of_doom.334@hotmail.com"
As a source of email addresses for spamming, it needs serious washing. About 25% of my workplace ones were stale when I did a mailout.

A few other relevant links:
http://anonnews.org/forum/post/64784
http://arstechnica.com/security/2013/11 ... -crackers/
http://www.hydraze.org/2013/10/some-inf ... sers-leak/
http://www.leemangold.com/2013/11/02/ad ... reach-faq/
http://stricture-group.com/files/adobe-top100.txt
http://tobtu.com/adobe.php
https://www.adobe.com/ca/account/sign-i ... otcom.html - to change your password

Top 10 passwords:
# Count Ciphertext Plaintext
1. 1911938 EQ7fIpT7i/Q= 123456
2. 446162 j9p+HwtWWT86aMjgZFLzYg== 123456789
3. 345834 L8qbAD3jl3jioxG6CatHBw== password
4. 211659 BB4e6X+b2xLioxG6CatHBw== adobe123
5. 201580 j9p+HwtWWT/ioxG6CatHBw== 12345678
6. 130832 5djv7ZCI2ws= qwerty
7. 124253 dQi0asWPYvQ= 1234567
8. 113884 7LqYzKVeq8I= 111111
9. 83411 PMDTbP0LZxu03SwrFUvYGA== photoshop

So, with all those efforts (including Randall's) to educate users about passwords, the most common password has "improved" in the last 15 years from 1234 to 12345 to 123456.
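The ciphertexts in that list can be compared block by block. Here is a small Python 3 sketch, assuming (as discussed earlier in the thread) a block cipher with 8-byte blocks where each block is encrypted independently; the `blocks` helper and the `leaked` dictionary name are just for illustration, and the values are copied from the list above.

```python
import base64
from collections import defaultdict

# Ciphertext (base64) keyed by the plaintext guessed for it, from the list above.
leaked = {
    "123456": "EQ7fIpT7i/Q=",
    "123456789": "j9p+HwtWWT86aMjgZFLzYg==",
    "password": "L8qbAD3jl3jioxG6CatHBw==",
    "adobe123": "BB4e6X+b2xLioxG6CatHBw==",
    "12345678": "j9p+HwtWWT/ioxG6CatHBw==",
    "qwerty": "5djv7ZCI2ws=",
    "1234567": "dQi0asWPYvQ=",
    "111111": "7LqYzKVeq8I=",
    "photoshop": "PMDTbP0LZxu03SwrFUvYGA==",
}

def blocks(b64_text: str, size: int = 8) -> list[bytes]:
    """Decode base64 and cut the result into fixed-size cipher blocks."""
    raw = base64.b64decode(b64_text)
    return [raw[i:i + size] for i in range(0, len(raw), size)]

# Map each distinct block to the (plaintext, block index) pairs where it occurs.
seen = defaultdict(list)
for plaintext, ciphertext in leaked.items():
    for index, block in enumerate(blocks(ciphertext)):
        seen[block].append((plaintext, index))

# Print only the blocks that turn up in more than one password.
for block, uses in seen.items():
    if len(uses) > 1:
        print(block.hex(), uses)
```

The first block of "12345678" matches the first block of "123456789", and the three 8-character passwords all share the same second block, which would fit a block that holds nothing but padding. The base64 strings hide the first match slightly, because an 8-byte boundary does not fall on a base64 character boundary.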

speising
Posts: 2367
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1286: "Encryptic"

Postby speising » Sat Nov 09, 2013 3:29 pm UTC

yay, i'm in the top 100!

i find it nice of adobe, to force you to reset the pw only after you entered the correct identification.
that way, it's easy to verify the deduced passwords from the file.

btw, why does the previous top 10 list only contain 9 entries?

PinkShinyRose
Posts: 835
Joined: Mon Nov 05, 2012 6:54 pm UTC
Location: the Netherlands

Re: 1286: "Encryptic"

Postby PinkShinyRose » Sat Nov 09, 2013 4:32 pm UTC

adaviel wrote:Top 10 passwords:
# Count Ciphertext Plaintext
1. 1911938 EQ7fIpT7i/Q= 123456
2. 446162 j9p+HwtWWT86aMjgZFLzYg== 123456789
3. 345834 L8qbAD3jl3jioxG6CatHBw== password
4. 211659 BB4e6X+b2xLioxG6CatHBw== adobe123
5. 201580 j9p+HwtWWT/ioxG6CatHBw== 12345678
6. 130832 5djv7ZCI2ws= qwerty
7. 124253 dQi0asWPYvQ= 1234567
8. 113884 7LqYzKVeq8I= 111111
9. 83411 PMDTbP0LZxu03SwrFUvYGA== photoshop

So, with all those efforts (including Randall's) to educate users about passwords, the most common password has "improved" in the last 15 years from 1234 to 12345 to 123456.

Is this due to education efforts, or due to more sites actually requiring a 6-character password? (So Adobe probably didn't allow 4-character passwords, or at least no 4-character ASCII passwords, I suppose.) I don't think any of these are serious attempts to keep others from using their account. They're probably just there because Adobe required a password for some ridiculous purpose.

gmalivuk
GNU Terry Pratchett
Posts: 26836
Joined: Wed Feb 28, 2007 6:02 pm UTC
Location: Here and There
Contact:

Re: 1286: "Encryptic"

Postby gmalivuk » Sat Nov 09, 2013 4:55 pm UTC

The sad thing is that with nearly 2 million instances of "123456", at least a few of them almost certainly *are* actual attempts to prevent access.

