Unicode characters


sehku17
Posts: 27
Joined: Mon Oct 20, 2008 9:35 pm UTC
Location: Missouri, USA

Unicode characters

Postby sehku17 » Tue Oct 21, 2008 12:48 am UTC

I was recently doing some web design and needed the numeric values (code points) of various Unicode characters, so I wrote a simple JavaScript script that lists the first ten thousand or so. If you want to try it, open a new tab and type "javascript:" in the address bar, followed by this code:


document.write("<table cols='2' cellpadding='5'><tr><th>Character</th><th>Number</th></tr>");for(var n=0;n<=10100;n++){document.write("<tr><td><font size='7'>&#"+n+";</font></td><td>"+n+"</td></tr>");}document.write("</table>");
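For anyone who would rather not paste code into the address bar, here is a minimal sketch of the same idea that builds the table as a string instead of calling document.write, so it also runs outside a browser. The function name unicodeTable and its start/end parameters are my own, not part of the original script.

```javascript
// Hypothetical helper (not from the original post): build the same
// Character/Number table as an HTML string for code points start..end.
function unicodeTable(start, end) {
  const rows = [];
  for (let n = start; n <= end; n++) {
    // "&#" + n + ";" is a decimal numeric character reference,
    // which the browser renders as the character with that number.
    rows.push("<tr><td>&#" + n + ";</td><td>" + n + "</td></tr>");
  }
  return "<table cellpadding='5'><tr><th>Character</th><th>Number</th></tr>" +
    rows.join("") + "</table>";
}

// Example: just the two rows for the umbrella and the hot beverage.
console.log(unicodeTable(9748, 9749));
```

In a browser, the returned string could be assigned to an element's innerHTML; limiting the range keeps the page much smaller than a ten-thousand-row table.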


While looking through this chart I found a few notable characters, such as ☬, ☭, ☕, ☘, and ☔. I know that Unicode supports many languages, but pictographs like these seemed very common and seemed to serve no real purpose. Can anyone tell me the logic behind including them, and when they would ever be used in the real world? They seem to add a lot of overhead with no real function. Did the Unicode designers just add them to mess with us? Or is there some language composed entirely of pictures of scissors and umbrellas?

If anyone knows the logic, could they please explain it?
The box said "Requires Windows 2000 or better," so I installed Linux.

phlip
Restorer of Worlds
Posts: 7572
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Unicode characters

Postby phlip » Tue Oct 21, 2008 1:02 am UTC

Well, is there any particular reason not to include them? I mean, there's enough room in Unicode to have all of that, and room to spare.

That said, I'm not sure why these particular symbols are in the BMP, I thought that range was usually restricted to the more useful stuff (since it's all you can use with UCS-2)... I had a look at the official standard for the segment, but there's no rationale there. Or on Wikipedia, either. (Incidentally, both of those sources are probably more convenient than your JavaScript code... a good character map program (or failing that, the one that comes with Windows) is useful, too.) But still, that segment takes up what, 1/256 of the BMP? 1/4352 of the entirety of the Unicode range? I'm not worried about that... there's more space unused, or marked as "private use", than all the pointless just-for-fun segments added together.
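Those fractions check out if the segment in question is the Miscellaneous Symbols block, U+2600 to U+26FF. That is an assumption on my part, since the post doesn't name the block, but all five characters from the first post fall in that range. A quick sketch of the arithmetic, with constant names of my own choosing, plus a look at what "inside the BMP" means for a UTF-16 string:

```javascript
// Assumed block: Miscellaneous Symbols, U+2600..U+26FF.
const BLOCK_SIZE = 0x26FF - 0x2600 + 1; // 256 code points
const BMP_SIZE = 0x10000;               // U+0000..U+FFFF
const UNICODE_SIZE = 0x110000;          // U+0000..U+10FFFF

console.log(BMP_SIZE / BLOCK_SIZE);     // 256  -> 1/256 of the BMP
console.log(UNICODE_SIZE / BLOCK_SIZE); // 4352 -> 1/4352 of the full range

// JavaScript strings are sequences of 16-bit units. A BMP character
// fits in one unit; anything above U+FFFF needs a surrogate pair.
console.log("\u2615".length);    // 1 (U+2615, hot beverage, in the BMP)
console.log("\u{1D11E}".length); // 2 (U+1D11E, musical G clef, outside it)
```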


enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

sehku17
Posts: 27
Joined: Mon Oct 20, 2008 9:35 pm UTC
Location: Missouri, USA

Re: Unicode characters

Postby sehku17 » Tue Oct 21, 2008 1:20 am UTC

Thanks for the research.

Private-use and blank characters also add overhead, which is why I often just use ASCII.

However, my question was about the rationale for including them, regardless of whether that is a good or a bad thing. I have no problem with including them, but they see almost no use, and I was wondering why they are there if no one uses them.

As for my code, its purpose is to give me the numeric values of the characters so I can include them when writing HTML, more than to display the characters. As far as I could tell, writing that script was just as easy as getting my other programs to display them in the format I wanted.
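If the goal is to get numeric values for pasting into HTML, the lookup can also go the other way: start from the character and produce the reference. A minimal sketch, assuming decimal references are wanted and that plain ASCII should be left alone; the helper name toNumericEntities is hypothetical.

```javascript
// Hypothetical helper: replace every non-ASCII character in `text`
// with a decimal numeric character reference such as "&#9749;".
function toNumericEntities(text) {
  let out = "";
  // for...of walks the string by code point, so characters outside
  // the BMP are handled as one unit rather than two surrogates.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint > 127 ? "&#" + codePoint + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("tea \u2615")); // tea &#9749;
```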

Anyway the question is still out there: why were these types of characters included in Unicode?

Asleep or Wrong
Posts: 78
Joined: Tue Nov 13, 2007 10:34 am UTC
Location: sirmio

Re: Unicode characters

Postby Asleep or Wrong » Tue Oct 21, 2008 1:44 am UTC

can't much answer your question but I would like to draw attention to the only good use of pictorial unicode characters:
http://☃.net/

The Hyphenator
Posts: 791
Joined: Mon Nov 19, 2007 2:16 am UTC
Location: The Shades, Ankh-Morpork

Re: Unicode characters

Postby The Hyphenator » Tue Oct 21, 2008 2:01 am UTC

Asleep or Wrong wrote:can't much answer your question but I would like to draw attention to the only good use of pictorial unicode characters:
http://☃.net/
Sites like these are why the internet exists.
The image link changes whenever I find a new cool website.

phlip
Restorer of Worlds
Posts: 7572
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Unicode characters

Postby phlip » Tue Oct 21, 2008 2:57 am UTC

They're also useful for variable names...
Generalizing Overloading for C++2000 wrote:☎->✆(); // take my phone (☎) off hook (✆)

(Slightly edited, since the original actually uses some random dingbats font, rather than actual Unicode pointlessness)

Also, here is a bigger list of all the symbolic Unicode subranges.
Last edited by phlip on Fri Jun 21, 2013 4:26 am UTC, edited 1 time in total.


'; DROP DATABASE;--
Posts: 3284
Joined: Thu Nov 22, 2007 9:38 am UTC
Location: Midwest Alberta, where it's STILL snowy

Re: Unicode characters

Postby '; DROP DATABASE;-- » Tue Oct 21, 2008 5:24 am UTC

I think these characters would see more use if ① more people knew they existed and ② it were less likely they'd get mangled in transmission.
phlip wrote:
Generalizing Overloading for C++2000 wrote:☎->✆(); // take my phone (☎) off hook (✆)
I don't know whether to be amazed or sick.
poxic wrote:You suck. And simultaneously rock. I think you've invented a new state of being.

phlip
Restorer of Worlds
Posts: 7572
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Unicode characters

Postby phlip » Tue Oct 21, 2008 5:32 am UTC

'; DROP DATABASE;-- wrote:I don't know whether to be amazed or sick.
I think "a bit of both" is the intent of pretty much the whole document... of which that line is only one of the many "Dear God, no" moments.


'; DROP DATABASE;--
Posts: 3284
Joined: Thu Nov 22, 2007 9:38 am UTC
Location: Midwest Alberta, where it's STILL snowy

Re: Unicode characters

Postby '; DROP DATABASE;-- » Tue Oct 21, 2008 5:42 am UTC

Please tell me overloading whitespace characters is not actually possible.

phlip
Restorer of Worlds
Posts: 7572
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: Unicode characters

Postby phlip » Tue Oct 21, 2008 5:44 am UTC

It's not. Note that the publication date of that paper is 1 April, 1998.


Hit3k
Posts: 1156
Joined: Sun Apr 22, 2007 9:12 am UTC
Location: Melbourne, Australia

Re: Unicode characters

Postby Hit3k » Tue Oct 21, 2008 8:06 am UTC

I'd hate to be the new guy, opening main.cpp and seeing that. I'd probably shoot myself, or just walk out.
Sungura wrote:My mom made me watch a star wars. Two of them , actually. The Death Star one and the one where the dude ends up in the swamp with the weird guy who talks funny.

sehku17
Posts: 27
Joined: Mon Oct 20, 2008 9:35 pm UTC
Location: Missouri, USA

Re: Unicode characters

Postby sehku17 » Tue Oct 21, 2008 6:08 pm UTC

Yeah, just because you can name functions like that and use those characters in code doesn't mean you should. It would end up getting really confusing if you used that notation with any frequency.
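How much of this a language actually allows varies. As an illustration in JavaScript only (the rules for C++ and other languages differ, and I'm not claiming anything about them here), letters from any script are accepted in names, while pictographs like the telephone are not. A small sketch that asks the engine directly; the function name isValidIdentifier is my own:

```javascript
// Returns true if `name` parses as a variable name in this engine.
// new Function only compiles the text here; the body is never run.
function isValidIdentifier(name) {
  try {
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidIdentifier("ಠ_ಠ")); // true: ಠ is a Kannada letter
console.log(isValidIdentifier("☎"));   // false: a symbol, not a letter
```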

Ciber
Posts: 126
Joined: Fri Mar 15, 2013 1:33 pm UTC

Re: Unicode characters

Postby Ciber » Fri Jun 21, 2013 4:10 am UTC

Until you use it frequently enough that it becomes standard practice. :twisted:

