1209:"Encoding"

Quicksilver
Posts: 437
Joined: Wed Apr 29, 2009 6:21 am UTC

1209:"Encoding"

Postby Quicksilver » Wed May 08, 2013 4:38 am UTC

Image
http://xkcd.com/1209
Alt Text:"I don't see how; the C0 block is right there at the beginning."
Your image is still no match for my 2560x1600 30" monitor, Randall!
Last edited by Quicksilver on Wed May 08, 2013 6:03 am UTC, edited 1 time in total.

rhomboidal
Posts: 761
Joined: Wed Jun 15, 2011 5:25 pm UTC

Re: 1209:"Encoding"

Postby rhomboidal » Wed May 08, 2013 4:45 am UTC

You'd think aircraft controls would have an Alt key.

Xantix
Posts: 24
Joined: Tue Oct 30, 2012 5:16 pm UTC

Re: 1209:"Encoding"

Postby Xantix » Wed May 08, 2013 5:09 am UTC

Well, on the bright side, the groundwriter did a good job.

Image

RebeccaRGB
Posts: 336
Joined: Sat Mar 06, 2010 7:36 am UTC
Location: Lesbians Love Bluetooth

Re: 1209:"Encoding"

Postby RebeccaRGB » Wed May 08, 2013 7:31 am UTC

Ño, thẽ c̃om̃bining̃ diacr̃itics̃ go 𝑜𝑣𝑒𝑟 th̃e ĩnt̃er̃rõb̃añg̃‽̃
Stephen Hawking: Great. The entire universe was destroyed.
Fry: Destroyed? Then where are we now?
Al Gore: I don't know. But I can darn well tell you where we're not—the universe!

orthogon
Posts: 2696
Joined: Thu May 17, 2012 7:52 am UTC
Location: The Airy 1830 ellipsoid

Re: 1209:"Encoding"

Postby orthogon » Wed May 08, 2013 7:47 am UTC

WRONG!

It later emerged that the skywriter had subcontracted the job to BHG's drone network. Regular ongoing payments to BHG's empire are required to prevent one's intimate secrets being written in mile-high letters in the sky. Who said there was no business model?

Worryingly, I guess that's entirely do-able with today's technology.

This has been my favourite one for a long time.

EDIT: The drone fleet was finally defeated when Jeff Goldblum successfully injected an RTL character into an input string, resulting in a reversal of the controls.
xtifr wrote:... and orthogon merely sounds undecided.

VanI
Posts: 61
Joined: Mon Mar 19, 2012 2:54 am UTC

Re: 1209:"Encoding"

Postby VanI » Wed May 08, 2013 8:29 am UTC

Some combining diacritics go b̥e̦ņęa̩t̪h̫ the modified character. Some of them even o̴ver strike!⃠
I swear, a fireball lied to me just the other day...

squonk
Posts: 127
Joined: Fri May 21, 2010 12:25 pm UTC

Re: 1209:"Encoding"

Postby squonk » Wed May 08, 2013 9:52 am UTC

The last dozen and a half comics have been pretty lazy. Randall is now slacking off all of the time!

Red Hal
Magically Delicious
Posts: 1445
Joined: Wed Nov 28, 2007 2:42 pm UTC

Re: 1209:"Encoding"

Postby Red Hal » Wed May 08, 2013 9:59 am UTC

...and by slacking off you mean "producing at least one drawing every hour for the time thread, new comics three times a week, researching for a weekly what-if, and holding down a day job"?
Lost Greatest Silent Baby X Y Z. "There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain..."

asdfzxc
Posts: 60
Joined: Mon Jun 08, 2009 7:04 pm UTC

Re: 1209:"Encoding"

Postby asdfzxc » Wed May 08, 2013 10:06 am UTC

Red Hal wrote:...and by slacking off you mean "producing at least one drawing every hour for the time thread, new comics three times a week, researching for a weekly what-if, and holding down a day job"?

xkcd is his day job.

Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

Re: 1209:"Encoding"

Postby Klear » Wed May 08, 2013 10:24 am UTC

If they use the interrobang (and the pilot doesn't object), they deserve whatever bad stuff happens to them.

VanI
Posts: 61
Joined: Mon Mar 19, 2012 2:54 am UTC

Re: 1209:"Encoding"

Postby VanI » Wed May 08, 2013 11:18 am UTC

Anyone else notice that "C0" is not the name of a Unicode block? It's actually the "C0 Controls and Basic Latin" block. Wonder how long till Randall fixes it...
I swear, a fireball lied to me just the other day...

Red Hal
Magically Delicious
Posts: 1445
Joined: Wed Nov 28, 2007 2:42 pm UTC

Re: 1209:"Encoding"

Postby Red Hal » Wed May 08, 2013 11:30 am UTC

asdfzxc wrote:
Red Hal wrote:...and by slacking off you mean "producing at least one drawing every hour for the time thread, new comics three times a week, researching for a weekly what-if, and holding down a day job"?

xkcd is his day job.
Ah yes, I forgot.
Lost Greatest Silent Baby X Y Z. "There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain..."

peewee_RotA
Posts: 500
Joined: Mon Dec 12, 2011 1:19 pm UTC

Re: 1209:"Encoding"

Postby peewee_RotA » Wed May 08, 2013 11:39 am UTC

Sounds like a movie found in the back of the video store

Wild Babes of the Force 3: Interrobang
"Vowels have trouble getting married in Canada. They can’t pronounce their O’s."

http://timelesstherpg.wordpress.com/about/

The Cat

Re: 1209:"Encoding"

Postby The Cat » Wed May 08, 2013 11:41 am UTC

Spoiler:
Yep.jpg


el monstro, slinky... no shortage there. happy to be done.

endolith
Posts: 227
Joined: Tue Jan 01, 2008 2:14 am UTC
Location: New York, NY

Re: 1209:"Encoding"

Postby endolith » Wed May 08, 2013 1:33 pm UTC

☠☠☠☠☠☠☠☠


I̴̻̮̜̠̼̺̘̙̙̲̖̫̝̲̯͔̒̅̊ͬͬ̊̚̕͢͢ͅ ̢̛ͫ̋ͭͧ̑ͦ̈́̇ͬ̃́̔ͥ̒̀̓̋͆͘҉̜͍̦̣ľ̡͛̂ͣ̈́͂͌̽̅̈́ͦͫ͑ͪ̾ͪͬ́ͮ͢͏͖̞̫͙̬̜̤̺̦̝̦͇̰͍͕͉͍͙́͜ͅǐ̶͖̫̥̖̪̹̾̐̏̒̃͊͐̏͂̍͒̐ͨͣ͡ͅkͫ͌̓̅͂́҉͕̝̤̪͚͍̳̥̰̠͢e̡̡̱͉͉̮ͧͭͤ̓͢͟͠ ͭ̾ͨͣ̏́͛͢҉̬̬̱̬̦͙̻̖͚̭Ų̶̛̯̥͙̥̰̞̰̩̫̣̥̰͍͇͒̌̀̐̔̈̅͞͞ͅņ̷̤̪̩̱̟̲̳̀́̅͑́̌į̟̖̗̗̜͔̤̖̙̩̝ͫ̈́̾̉ͬͤ͊ͫͯ͆͋̄͌̈́́́̕c̵̮̙͇̦̹͇̮̣͍͖͇̟̲̞̞͎̱͉̳̾̎̊̔ͨ̐̓͌ͣ̓͐̚͢o̧͖̱̯͕̺̻͍͂ͭ̌̅̃̒̇͡d̴̴̜͍̬̥͍̺͎͐̿̽̽͛͆̌̃͌͐̕ȩ̰͈͕͕͇̼̦̱͚̰̫̺̜̞̟̈̈̑ͬ̓̿̓̀̀̆̆̈́̄͂͊̒ͤ͢͟͠͠.̛̮͈̙̘̜͖̼̜͎̟̮ͭͭͧͭ̂͊̏́ͪ̓̒̀͢͟


☃☃☃☃☃☃☃☃☃

higgs-boson
Posts: 519
Joined: Tue Mar 26, 2013 12:00 pm UTC
Location: Europe (UTC + 4 newpix)

Re: 1209:"Encoding"

Postby higgs-boson » Wed May 08, 2013 2:03 pm UTC

endolith wrote:☠☠☠☠☠☠☠☠


I̴̻̮̜̠̼̺̘̙̙̲̖̫̝̲̯͔̒̅̊ͬͬ̊̚̕͢͢ͅ ̢̛ͫ̋ͭͧ̑ͦ̈́̇ͬ̃́̔ͥ̒̀̓̋͆͘҉̜͍̦̣ľ̡͛̂ͣ̈́͂͌̽̅̈́ͦͫ͑ͪ̾ͪͬ́ͮ͢͏͖̞̫͙̬̜̤̺̦̝̦͇̰͍͕͉͍͙́͜ͅǐ̶͖̫̥̖̪̹̾̐̏̒̃͊͐̏͂̍͒̐ͨͣ͡ͅkͫ͌̓̅͂́҉͕̝̤̪͚͍̳̥̰̠͢e̡̡̱͉͉̮ͧͭͤ̓͢͟͠ ͭ̾ͨͣ̏́͛͢҉̬̬̱̬̦͙̻̖͚̭Ų̶̛̯̥͙̥̰̞̰̩̫̣̥̰͍͇͒̌̀̐̔̈̅͞͞ͅņ̷̤̪̩̱̟̲̳̀́̅͑́̌į̟̖̗̗̜͔̤̖̙̩̝ͫ̈́̾̉ͬͤ͊ͫͯ͆͋̄͌̈́́́̕c̵̮̙͇̦̹͇̮̣͍͖͇̟̲̞̞͎̱͉̳̾̎̊̔ͨ̐̓͌ͣ̓͐̚͢o̧͖̱̯͕̺̻͍͂ͭ̌̅̃̒̇͡d̴̴̜͍̬̥͍̺͎͐̿̽̽͛͆̌̃͌͐̕ȩ̰͈͕͕͇̼̦̱͚̰̫̺̜̞̟̈̈̑ͬ̓̿̓̀̀̆̆̈́̄͂͊̒ͤ͢͟͠͠.̛̮͈̙̘̜͖̼̜͎̟̮ͭͭͧͭ̂͊̏́ͪ̓̒̀͢͟


☃☃☃☃☃☃☃☃☃


You could hide the Holy Bible in a LIKE-Button, couldn't you.

I'm still wondering why on earth one needs diacritics on an interrobang.
Or did anyone come across a comma-cedille lately?
Apostolic Visitator, Holiest of Holy Fun-Havers
You have questions about XKCD: "Time"? There's a whole Wiki dedicated to it!

The Cat

Re: 1209:"Encoding"

Postby The Cat » Wed May 08, 2013 2:15 pm UTC

Ain't got time for that.

Adam H
Posts: 1267
Joined: Thu Jun 16, 2011 6:36 pm UTC

Re: 1209:"Encoding"

Postby Adam H » Wed May 08, 2013 2:23 pm UTC

Step 1) scratch head
Step 2) wikipedia "skywriter"
Step 3) wikipedia "diacritic"
Step 4) ctrl-f "combining"
Step 6) wikipedia "interrobang"
Step 7) shrug

TOTAL COMPREHENSION ACHIEVED.
-Adam

peewee_RotA
Posts: 500
Joined: Mon Dec 12, 2011 1:19 pm UTC

Re: 1209:"Encoding"

Postby peewee_RotA » Wed May 08, 2013 2:53 pm UTC

The Cat wrote:Ain't got time for that.


Smoke in the sky? Oh lord Jesus, it's a fire!
"Vowels have trouble getting married in Canada. They can’t pronounce their O’s."

http://timelesstherpg.wordpress.com/about/

alvinhochun
Posts: 54
Joined: Wed Nov 14, 2012 3:07 pm UTC

Re: 1209:"Encoding"

Postby alvinhochun » Wed May 08, 2013 3:35 pm UTC

The only thing I know is that Unicode is fun.

The Cat

Re: 1209:"Encoding"

Postby The Cat » Wed May 08, 2013 3:39 pm UTC

Ain't nobody got time for that. fixed

rmsgrey
Posts: 3081
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1209:"Encoding"

Postby rmsgrey » Wed May 08, 2013 3:44 pm UTC

Xantix wrote:Well, on the bright side, the groundwriter did a good job.

Image


But what's the Unicode for it?

The Cat

Re: 1209:"Encoding"

Postby The Cat » Wed May 08, 2013 4:21 pm UTC


iabervon
Posts: 54
Joined: Fri Nov 03, 2006 5:25 am UTC

Re: 1209:"Encoding"

Postby iabervon » Wed May 08, 2013 4:35 pm UTC

Just remember, when you see U+202E, do not attempt to fly the plane backwards.
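
For anyone who wants to look that one up: a minimal Python sketch, using only the standard `unicodedata` module, that asks the Unicode character database what U+202E is. The variable names are just for illustration.

```python
import unicodedata

RLO = "\u202e"  # the code point mentioned in the post above

# Look up the character's properties in the Unicode character database.
name = unicodedata.name(RLO)
bidi_class = unicodedata.bidirectional(RLO)
category = unicodedata.category(RLO)

print(f"U+{ord(RLO):04X}: {name} (bidi class {bidi_class}, category {category})")
```

It is an invisible format character, so it affects how the following text is laid out rather than drawing anything itself.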

Mirkwood
Posts: 70
Joined: Mon Dec 06, 2010 9:10 am UTC

Re: 1209:"Encoding"

Postby Mirkwood » Wed May 08, 2013 6:49 pm UTC

I'm of the imperialist school of encoding; we shouldn't need to support any diacritics not used in English!

mmmiller
Posts: 1
Joined: Thu Sep 20, 2012 3:27 pm UTC

Re: 1209:"Encoding"

Postby mmmiller » Wed May 08, 2013 6:53 pm UTC

I was wondering if there's a way to see how much one reference on XKCD moves the needle on a Wikipedia page.

Turns out, there's a site for that: stats.grok.se/en/201305/Interrobang

philipquarles
Posts: 27
Joined: Fri Feb 12, 2010 8:44 pm UTC

Re: 1209:"Encoding"

Postby philipquarles » Wed May 08, 2013 10:51 pm UTC

Is a combining diacritic a diacritic that is combined with the character it modifies, or is it a diacritic that indicates that the modified character should be read as if it were combined with another character?

ucim
Posts: 5570
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1209:"Encoding"

Postby ucim » Thu May 09, 2013 2:20 am UTC

philipquarles wrote:Is a combining diacritic a diacritic that is combined with the character it modifies, or is it a diacritic that indicates that the modified character should be read as if it were combined with another character?
I don't know, but it seems to me that Unicode got it wrong. E with an accent is not a distinct letter from E without an accent, and therefore should not be a distinct character. Searching for one should pull up the other. Characters should have been encoded as base characters with modifications, and searches could then easily work off of base characters, disregarding the modifications.

Of course, they never asked me, mumble mumble, GET OFF THE LAWN you mumble mumble...

Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

endolith
Posts: 227
Joined: Tue Jan 01, 2008 2:14 am UTC
Location: New York, NY

Re: 1209:"Encoding"

Postby endolith » Thu May 09, 2013 2:30 am UTC

ucim wrote:I don't know, but it seems to me that Unicode got it wrong. E with an accent is not a distinct letter from E without an accent, and therefore should not be a distinct character. Searching for one should pull up the other. Characters should have been encoded as base characters with modifications, and searches could then easily work off of base characters, disregarding the modifications.


The point of unicode is to unite all the encodings that already exist, so if an encoding already exists with an e with an accent, that had to be included. That's why there's both Ω and Ω, for instance.
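
A small Python sketch of the duplicate-character point, assuming the two omegas meant here are U+2126 OHM SIGN and U+03A9 GREEK CAPITAL LETTER OMEGA (the post does not say which code points it used). It uses the standard `unicodedata` module to show that normalization treats the pair as equivalent, and does the same for the precomposed and decomposed spellings of an accented e.

```python
import unicodedata

OHM_SIGN = "\u2126"      # assumed: the legacy-compatibility character
GREEK_OMEGA = "\u03a9"   # assumed: the ordinary Greek letter

# As raw code points the two are different strings...
same_before = OHM_SIGN == GREEK_OMEGA
# ...but canonical normalization maps the ohm sign onto the Greek letter.
same_after = unicodedata.normalize("NFC", OHM_SIGN) == GREEK_OMEGA

# The accented e exists both precomposed and as base letter + combining mark.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
equal_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_before, same_after, equal_nfc, equal_nfd)
```

So both encodings of the accented letter are kept, and normalizing before comparing is one way to treat them as the same text.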

ucim
Posts: 5570
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1209:"Encoding"

Postby ucim » Thu May 09, 2013 2:38 am UTC

endolith wrote:The point of unicode is to unite all the encodings that already exist, so if an encoding already exists with an e with an accent, that had to be included. That's why there's both Ω and Ω, for instance.

Image
Jose
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

TortoiseWrath
Posts: 22
Joined: Sun Feb 24, 2013 12:28 am UTC

Re: 1209:"Encoding"

Postby TortoiseWrath » Thu May 09, 2013 2:57 am UTC

⸘̈̈́͒𝐈𝐬 𝑎𝑛𝑦𝑜𝑛𝑒 𝓮𝓵𝓼𝓮 𝔰𝔩𝔦𝔤𝔥𝔱𝔩𝔶 𝕔𝕠𝕟𝕔𝕖𝕣𝕟𝕖𝕕 𝚊𝚋𝚘𝚞𝚝 𝓦𝓗𝓐𝓣, 𝑒𝑥𝑎𝑐𝑡𝑙𝑦, 𝒉𝒆 𝑚𝑎𝑦 𝔥𝔞𝔳𝔢 𝔅𝔈𝔈𝔑 𝖙𝖗𝖞𝖎𝖓𝖌 𝗍𝗈 𝗚𝗘𝗧 𝕿𝕳𝕰 𝕾𝕶𝖄𝖂𝕽𝕴𝕿𝕰𝕽 𝕥𝕠 𝔀𝓻𝓲𝓽𝓮‽͖͌

VanI
Posts: 61
Joined: Mon Mar 19, 2012 2:54 am UTC

Re: 1209:"Encoding"

Postby VanI » Thu May 09, 2013 9:28 am UTC

ucim wrote:I don't know, but it seems to me that Unicode got it wrong. E with an accent is not a distinct letter from E without an accent, and therefore should not be a distinct character. Searching for one should pull up the other. Characters should have been encoded as base characters with modifications, and searches could then easily work off of base characters, disregarding the modifications.
Jose


Except that those sorts of search criteria are extremely language dependent. That's why Unicode has the Common Locale Data Repository - to codify that English searches should be conducted differently, and match the base character of a combined letter + diacritic. On the other hand, that would be horrible if you were searching Vietnamese text, with nasalization and tone marks all over the vowels. If an application isn't matching "naïve" to "naive", it's because they aren't doing what Unicode recommends they do.
I swear, a fireball lied to me just the other day...
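
A rough Python sketch of the "match the base character" idea for English-style searches. It is a deliberate simplification and not the locale-aware matching described above: it decomposes the text and throws away every combining mark, which is exactly what would be wrong for Vietnamese. The helper names `strip_marks` and `loose_equal` are made up for this example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text and drop all combining marks (simplified, locale-blind)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def loose_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring diacritics and letter case."""
    return strip_marks(a).casefold() == strip_marks(b).casefold()


print(loose_equal("na\u00efve", "naive"))  # "naïve" against "naive"
```

A real search would pick its comparison rules per language instead of stripping everything.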

Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

Re: 1209:"Encoding"

Postby Klear » Thu May 09, 2013 9:58 am UTC

VanI wrote:
ucim wrote:I don't know, but it seems to me that Unicode got it wrong. E with an accent is not a distinct letter from E without an accent, and therefore should not be a distinct character. Searching for one should pull up the other. Characters should have been encoded as base characters with modifications, and searches could then easily work off of base characters, disregarding the modifications.
Jose


Except that those sorts of search criteria are extremely language dependent. That's why Unicode has the Common Locale Data Repository - to codify that English searches should be conducted differently, and match the base character of a combined letter + diacritic. On the other hand, that would be horrible if you were searching Vietnamese text, with nasalization and tone marks all over the vowels. If an application isn't matching "naïve" to "naive", it's because they aren't doing what Unicode recommends they do.


Yeah. In Czech, á isn't usually considered a separate letter from a, but č is a different letter from c. Mostly. Probably.

The Cat

Re: 1209:"Encoding"

Postby The Cat » Thu May 09, 2013 11:59 am UTC

Clearly that's without studying the spanish c. Very acute! Interpretation, motivation, and rationalization all play a role. C0 block seems to be the standard diacritic used for E. I've exhausted too much time here. Ciao! The mountains be a calling. Nothing like bird watching in the mountains. Trust you'll have a nice summer.

Lenoxus
Posts: 120
Joined: Thu Jan 06, 2011 11:14 pm UTC

Re: 1209:"Encoding"

Postby Lenoxus » Thu May 09, 2013 6:17 pm UTC

You have to be impressed by a skywriter making perfect rectangles, though.

In seriousness, it always bugs me to see the rectangles, which has happened in every web browser that I've used. Reading about it just now, I've learned that there is no actual "all-in-one Unicode font", which rather surprises me. I know that 2^16 is a big number, but if the Unicode consortium has the ability to assign a symbol to each number, why haven't they made a font yet?

silverpie
Posts: 6
Joined: Sun Apr 01, 2012 3:38 pm UTC

Re: 1209:"Encoding"

Postby silverpie » Thu May 09, 2013 6:20 pm UTC

philipquarles wrote:Is a combining diacritic a diacritic that is combined with the character it modifies, or is it a diacritic that indicates that the modified character should be read as if it were combined with another character?


The former. For instance, combining version of the diacritic ^, plus letter c, gives ĉ. (The latter case would actually be handled by a diacritic that combined with both of the characters involved--here comes an attempted example... ae͡)
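
To make the first case concrete, here is a minimal Python sketch with the standard `unicodedata` module. The combining mark follows the base letter in the encoded text, and NFC normalization folds the pair into the single precomposed character. For the second case, the sketch assumes the mark intended is U+0361, a double diacritic that is encoded between the two letters it spans.

```python
import unicodedata

base = "c"
circumflex = "\u0302"          # COMBINING CIRCUMFLEX ACCENT
sequence = base + circumflex   # base letter first, then the combining mark

# NFC composes the two code points into one precomposed character.
composed = unicodedata.normalize("NFC", sequence)
composed_name = unicodedata.name(composed)

# A double diacritic sits between the two base letters it covers.
tie = "\u0361"                 # assumed mark for the second example
tied_pair = "a" + tie + "e"
tie_name = unicodedata.name(tie)

print(len(sequence), len(composed), composed_name)
print(tied_pair, tie_name)
```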


Lenoxus wrote:Reading about it just now, I've learned that there is no actual "all-in-one Unicode font", which rather surprises me. I know that 2^16 is a big number, but if the Unicode consortium has the ability to assign a symbol to each number, why haven't they made a font yet?


Currently impossible, because 2^16 is the most characters a font can contain within current standards, and Unicode has 17 times that many positions.
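
The arithmetic, as a quick Python check. The 2^16 per-font limit is taken from the post; the 17 is the number of Unicode planes, and `sys.maxunicode` is the highest code point Python itself will accept.

```python
import sys

GLYPHS_PER_FONT = 2 ** 16   # per-font limit, as stated in the post
PLANES = 17                 # Unicode planes 0 through 16

code_space = PLANES * GLYPHS_PER_FONT
highest_code_point = code_space - 1

print(f"{code_space:,} positions, highest is U+{highest_code_point:X}")
print(highest_code_point == sys.maxunicode)
```

Most of those positions are unassigned, so the practical gap is smaller than 17 to 1, but the assigned characters alone still exceed one font's limit.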

RebeccaRGB
Posts: 336
Joined: Sat Mar 06, 2010 7:36 am UTC
Location: Lesbians Love Bluetooth

Re: 1209:"Encoding"

Postby RebeccaRGB » Thu May 09, 2013 7:31 pm UTC

higgs-boson wrote:Or did anyone come across a comma-cedille lately?

One comma-cedilla,̧ coming up!̂̂̂̂̂
silverpie wrote:
Lenoxus wrote:Reading about it just now, I've learned that there is no actual "all-in-one Unicode font", which rather surprises me. I know that 2^16 is a big number, but if the Unicode consortium has the ability to assign a symbol to each number, why haven't they made a font yet?

Currently impossible, because 2^16 is the most characters a font can contain within current standards, and Unicode has 17 times that many positions.

Also, there are many issues with making a pan-Unicode font: many of the characters will appear different in different locales, so no one font can properly support every locale; there are very few people who know every single script in Unicode well enough to create a good-looking font for it; and you kind of have to be insane to put together a 60K+-glyph font. As a result most fonts that attempt to cover all of Unicode end up looking pretty ugly. That said, the closest you'll come to an "all-in-one Unicode font" is GNU Unifont (monospaced, bitmapped, ugly) and Code2000 (defunct, not free, ugly).
Stephen Hawking: Great. The entire universe was destroyed.
Fry: Destroyed? Then where are we now?
Al Gore: I don't know. But I can darn well tell you where we're not—the universe!

ucim
Posts: 5570
Joined: Fri Sep 28, 2012 3:23 pm UTC
Location: The One True Thread

Re: 1209:"Encoding"

Postby ucim » Thu May 09, 2013 8:54 pm UTC

If
{ e-with-an-accent doesn't match in search with e-without-an-accent means somebody is doing something wrong }
then
{ somebody will always be doing something wrong }
because
{ you can't control the user's choice of software }
and
{ this applies especially on the net }

It seems to me that unicode will lead to more and more fonts having missing pieces. In fact, most of the fonts are mostly missing pieces, so even though you might write in unicode, you still have to restrict your character set if you want a good chance that the message will display properly. And with unicode, sanitizing becomes much more problematic, since there are a thousand worlds of characters whose effects on your display are unknown or undefined.

Program and data are already intertwined enough[1] - unicode makes it more difficult to erect a barrier between them.

Jose
[1] As we all know, program is just data that is executed, data is just program in frozen form. Treating (user) data like program is necessary and dangerous at the same time.
Order of the Sillies, Honoris Causam - bestowed by charlie_grumbles on NP 859 * OTTscar winner: Wordsmith - bestowed by yappobiscuts and the OTT on NP 1832 * Ecclesiastical Calendar of the Order of the Holy Contradiction * Please help addams if you can. She needs all of us.

speising
Posts: 2070
Joined: Mon Sep 03, 2012 4:54 pm UTC
Location: wien

Re: 1209:"Encoding"

Postby speising » Thu May 09, 2013 9:28 pm UTC

ucim wrote:If
{ e-with-an-accent doesn't match in search with e-without-an-accent means somebody is doing something wrong }
then
{ somebody will always be doing something wrong }
because
{ you can't control the user's choice of software }
and
{ this applies especially on the net }

It seems to me that unicode will lead to more and more fonts having missing pieces. In fact, most of the fonts are mostly missing pieces, so even though you might write in unicode, you still have to restrict your character set if you want a good chance that the message will display properly. And with unicode, sanitizing becomes much more problematic, since there are a thousand worlds of characters whose effects on your display are unknown or undefined.

Program and data are already intertwined enough[1] - unicode makes it more difficult to erect a barrier between them.

Jose
[1] As we all know, program is just data that is executed, data is just program in frozen form. Treating (user) data like program is necessary and dangerous at the same time.


except for funny experiments, your choice of characters is normally restricted anyway by the language you are writing in. the vast majority of the unicode space is simply irrelevant to any specific text.
if you write in an exotic language, your target audience will have an appropriate font available.

Pfhorrest
Posts: 3913
Joined: Fri Oct 30, 2009 6:11 am UTC

Re: 1209:"Encoding"

Postby Pfhorrest » Thu May 09, 2013 9:43 pm UTC

I can't speak for the Windows or Linux worlds, but Apple's type engine handles font incompleteness in a nifty way by falling through holes in one font to another font that does have a glyph for that character set, down eventually to an all-inclusive font at the bottom which just draws a rect with some numbers to let you know something about what you should be seeing there if nothing else. I'm not sure how it picks which font to fall through to on the way there though. But even though e.g. Times New Roman doesn't have any Hebrew or Cyrillic glyphs, when I read something written in one of those scripts I still see the appropriate characters written in a font which supports them.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)
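
The fall-through idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Apple's type engine actually works: the font names and coverage ranges are invented, and the fixed priority order is a guess, since how the real engine chooses the next font is exactly the part left open above.

```python
# Hypothetical fonts, each described only by the code points it can draw.
FALLBACK_CHAIN = [
    ("PrimaryLatinFont", set(range(0x0020, 0x0250))),     # invented Latin coverage
    ("SomeHebrewFont", set(range(0x0590, 0x0600))),       # invented Hebrew coverage
    ("SomeCyrillicFont", set(range(0x0400, 0x0500))),     # invented Cyrillic coverage
]
LAST_RESORT = "LastResortBoxes"  # stands in for the catch-all font at the bottom


def pick_font(ch: str) -> str:
    """Return the first font in the chain that covers ch, else the catch-all."""
    code_point = ord(ch)
    for font_name, coverage in FALLBACK_CHAIN:
        if code_point in coverage:
            return font_name
    return LAST_RESORT


for sample in ["A", "\u05d0", "\u0416", "\u2603"]:
    print(f"U+{ord(sample):04X} -> {pick_font(sample)}")
```

Each character is resolved on its own, which is why a single line of text can end up drawn in several fonts.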

