0810: "Constructive"

This forum is for the individual discussion thread that goes with each new comic.

Moderators: Moderators General, Prelates, Magistrates

murgatroid99
Posts: 9
Joined: Wed Aug 19, 2009 2:08 pm UTC

Re: 0810: Constructive

Postby murgatroid99 » Mon Oct 25, 2010 8:40 pm UTC

DavidRoss wrote:
murgatroid99 wrote:I really liked this comic, and I wanted to say that ReCAPTCHA actually does something very similar: ReCAPTCHA is the one with 2 words instead of a bunch of random letters. What they do is take two words from a book they are digitizing: one the computer can read and one it cannot, and they don't say which is which. They test whether the human got the readable one right, and if a bunch of people get the same answer for the other one they assume it is the right answer. That's why you sometimes get weird untypeable characters; the computer doesn't know what it says. That way people, including spammers, who answer CAPTCHAs are doing something useful.


Well, yes, when I type in a ReCAPTCHA I am helping digitize a book and that is useful. But TANSTAAFL. Just because the work is distributed in small unnoticeable chunks doesn't mean it is efficient or that free work is obtained.

The ReCAPTCHA subjects are doing extra work, albeit a small amount of work. We're typing in two words, one of which is totally unnecessary for bot detection, because the bot detector has only one word (the one it knows) to work with to tell if you are human or a bot. Taking the unit of one "WHOO" (Word-Human OCR Operation, for lack of a better term), each ReCAPTCHA test requires one WHOO more than the work of just one bot detection.

To complete the digitization of a text, say a classic manuscript or whatever, you run an automated scan and some computer determines that some words are not confidently OCR'ed. Let's say there are 200 words that go onto the list for human intervention. You'll need at least 200 WHOOs, but more likely some "confidence multiplier" on that, to ensure the text is correctly OCR'ed.

From what I understand of how ReCAPTCHA works, they have a high confidence multiplier, i.e., they give one word to many different people and only accept the result when sufficient consensus exists over many, many people. If the words go to people (paid or volunteer) who are intentional human OCRs, let's say you reach confidence after five people WHOO each word (i.e., their confidence multiplier is 5 and the manuscript gets digitized for 1,000 WHOOs of effort). When using ReCAPTCHA subjects, you'll need a lot more people looking at the same word to get the same confidence, because ReCAPTCHA adds noise (which also makes each individual WHOO a little harder) and because ReCAPTCHA subjects are not as careful as intentional human OCRs (or they are lazy and know that there is a 50% chance they make it past the bot detector by always typing "x" for the second word). So, that is several thousand WHOOs to get that same manuscript digitized. Thus, the ReCAPTCHA approach is less efficient, but somehow it is OK because it is thinly spread and somewhat unnoticeable to many people.

With that logic, we should allow airlines to save on the tedious work they do disposing of airplane sewage, so long as they disperse it at 35,000 feet in small enough drops that nobody would notice.

That said, I don't mind helping out the ReCAPTCHA folks now and then and doing a little extra free work.

This is true, but according to the guy who made ReCAPTCHA (Luis von Ahn), people who speak English take as much work to type two English words as to type 6 random alphanumeric characters, due to the way we process character sequences, so it is actually no more work to type a ReCAPTCHA than to type a normal CAPTCHA.

Monika
Welcoming Aarvark
Posts: 3673
Joined: Mon Aug 18, 2008 8:03 am UTC
Location: Germany, near Heidelberg

Re: 0810: Constructive

Postby Monika » Mon Oct 25, 2010 9:38 pm UTC

murgatroid99 wrote:according to the guy who made ReCAPTCHA (Luis von Ahn), people who speak English take as much work to type two English words as to type 6 random alphanumeric characters, due to the way we process character sequences, so it is actually no more work to type a ReCAPTCHA than to type a normal CAPTCHA

That sounds right. Actually, I think this probably only applies to people who type with two fingers while looking for the letters. At least I can type the two reCAPTCHA words waaay faster than those 6 random characters (including the time to decipher them).
#xkcd-q on irc.foonetic.net - the LGBTIQQA support channel
Please donate to help these people

unus vox
Posts: 135
Joined: Sat Jan 30, 2010 7:01 pm UTC

Re: 0810: Constructive

Postby unus vox » Mon Oct 25, 2010 10:09 pm UTC

SirMustapha wrote:It's a shame that Randall only has these "brilliant" ideas that only work in his cartoon land.

If he were still at NASA, probably someday he would be able to say "MISSION FUCKING ACCOMPLISHED" for real, which is far more rewarding.


I typed up a whole paragraph on how ridiculous you are, but I deleted it because I realized I was getting trolled. You win this round.

SpringLoaded12
Posts: 350
Joined: Wed Oct 08, 2008 1:58 am UTC
Location: Guarding the Super Missile

Re: 0810: "Constructive"

Postby SpringLoaded12 » Mon Oct 25, 2010 11:31 pm UTC

FUCK YEAH
DO IT
DO IT NOW


I think I'll make Panel 4 my new avatar. On every account I have on everything.
"It's easy to forget what a sin is in the middle of a battlefield." "Opposite over hypotenuse, dipshit."

BioTube
Posts: 362
Joined: Sat Apr 11, 2009 2:11 am UTC

Re: 0810: Constructive

Postby BioTube » Tue Oct 26, 2010 2:04 am UTC

jc wrote:
Jatopian wrote:
Mazuku wrote:Korea has a system in place on websites with a lot of traffic that requires anyone who posts a comment to confirm their identity with an ID number, which they get from the Korea Communications Commission, before the comment is allowed to be posted.
When I heard about this, I was very surprised we weren't just talking about the North. Guess we should just cut that peninsula off and let it sink into the ocean. It's a lost cause.

Nah; they'll discover soon enough that all the spammers have to do is send a (fairly cheap) bribe that gets them a file of all the valid ID numbers, from an unnamed source inside the appropriate government department. This will be quickly followed by terabytes of spam, all from "valid" Korean sources, which will be quickly followed by the rest of the world blacklisting all Korean addresses.

Actually, the real motive is probably an easy way for the Korean government to learn who is making each post to any forum. Fighting spam is merely an excuse to take aim at the general population for politically unacceptable words. Sorta like here in the US, invoking terrorists or child molesters is the standard excuse for total monitoring of the US population. And in other countries, a popular local bogeyman is used for the same purpose.

Any "security" measure that can be defeated by a simple bribe to a low-paid agency worker should always be assumed to be not aimed at actual security (whatever that may mean), but as an excuse to monitor the general population for politically unacceptable words and thoughts. If you apply this inference, a lot of "security theater" becomes obvious and understandable.
Only a child-molesting terrorist would dare question the necessity of giving up all rights for any form of security!
Frédéric Bastiat wrote:Government is the great fiction through which everybody endeavors to live at the expense of everybody else.

ohki
Posts: 187
Joined: Wed Aug 23, 2006 9:27 am UTC
Location: San Luis Obispo, California

Re: 0810: "Constructive"

Postby ohki » Tue Oct 26, 2010 4:01 am UTC

Aaah, get out of my head <author>! This <mediaType> is so funny because it's true. It's one of those things that, if someone applied in the real world, would be amazing. The clever mix of nerdy and insightful is why <website> is my favorite <mediaType>.

If you like <mediaType>s like this, I'd also recommend <advertiser>.
But it raining and me peeing on your foot are NOT mutually exclusive.
"Isn't arrogance measured in nano-Dijkstra's?"- Alan Kay

scgtrp
Posts: 23
Joined: Sun Oct 04, 2009 7:14 pm UTC

Re: 0810: "Constructive"

Postby scgtrp » Tue Oct 26, 2010 5:49 am UTC

I can actually see this working if combined with traditional (Bayesian?) filters to have some initial idea of whether a given comment is good or bad.

Can't possibly be worse than that Hebrew CAPTCHA I got the other day.
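One way to get that "initial idea" of whether a comment is good or bad is a classic naive Bayes text filter. Here is a minimal Python sketch; the class name, the toy training comments, and the specific choice of multinomial naive Bayes with add-one (Laplace) smoothing are assumptions for illustration, and a real forum would train on its own moderated history.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny multinomial naive Bayes spam scorer with add-one (Laplace) smoothing.

    Labels are "spam" and "ham"; at least one training comment per label is assumed.
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: fraction of training comments carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed likelihood, so unseen words don't zero out the score.
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores back into P(spam | text).
        diff = log_scores["ham"] - log_scores["spam"]
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("cheap watches buy now", "spam")
spam_filter.train("great comic about captcha", "ham")
spam_filter.train("this comic made me laugh", "ham")
print(spam_filter.spam_probability("buy cheap watches"))  # well above 0.5 for this toy data
```

The resulting probability could decide which comments reach voters first; how to blend it with the comic's human ratings is left open.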

Monika
Welcoming Aarvark
Posts: 3673
Joined: Mon Aug 18, 2008 8:03 am UTC
Location: Germany, near Heidelberg

Re: 0810: "Constructive"

Postby Monika » Tue Oct 26, 2010 9:42 am UTC

scgtrp wrote:Can't possibly be worse than that Hebrew CAPTCHA I got the other day.

The solution for this is of course surtacq x.

dudyk
Posts: 10
Joined: Mon Nov 23, 2009 2:57 pm UTC

Re: 0810: "Constructive"

Postby dudyk » Tue Oct 26, 2010 10:19 am UTC

scgtrp wrote:Can't possibly be worse than that Hebrew CAPTCHA I got the other day.

I know Hebrew and it doesn't even make sense!

I think the way to overcome this is like the attempt to stop gambling in most countries: forbid credit card companies and banks from transferring funds to spammers' accounts. If enough countries did that, spamming wouldn't be cost-effective.
I don't think people will put as much effort into buying spammers' merchandise as they do into gambling online.

In Israel there's a law that makes spammers liable to lawsuits and requires them to pay around $100 per unsolicited email (to the recipient). Since that law took effect about a year ago, I have gotten fewer than 5 unsolicited emails (from Israel), most of them non-sales messages like political propaganda.

art
Posts: 10
Joined: Fri Dec 04, 2009 5:28 am UTC

Re: Doesn’t work.

Postby art » Tue Oct 26, 2010 10:22 am UTC

BAReFOOt wrote:I’m sorry, but unfortunately, the idea in the comic can’t work.
Well, nature already had a perfectly good system, before all those new forms of communication appeared. It’s called a “network of trust”! You trust your friends and real leaders. They trust their friends and real leaders. And so on. Because you got to know them and they knew you, and you worked in a team, because that made it much likelier for you to win in natural selection.
And nobody trusts a spammer. At least not after the first spamming. His bad reputation would always hurry ahead of him.
...
I think it’s very nice and elegant. Maybe that is because humans have done it since the dawn of time, and it got perfected over all that time.


You're talking about illicit drug use right? :roll: Nah... but the same theory does apply! Ergo drugs are safe.

willpellmn
Posts: 93
Joined: Wed Apr 21, 2010 11:05 am UTC

Re: Doesn’t work.

Postby willpellmn » Tue Oct 26, 2010 11:54 am UTC

BAReFOOt wrote:Even if the state of two human brains is exactly the same, they at least still stand at different positions at the same time, or at the same position in different times. Which means that from a physical standpoint, reality is already relative.


You, sir or madam, are brilliant. May I sig this please?

phillipsjk
Posts: 1213
Joined: Wed Nov 05, 2008 4:09 pm UTC
Location: Edmonton AB Canada

Re: 0810: Constructive

Postby phillipsjk » Tue Oct 26, 2010 12:11 pm UTC

waldir wrote:By the way, there seem to be a few misleading comments here regarding reCaptcha:

thoreaulylazy wrote:all the spammers have to do is outnumber the humans and upvote themselves.

phillipsjk wrote:ReCaptcha is vulnerable to this as well.

Well, no, it's not. From "moot wins, Time Inc. loses":
Luis von Ahn, the project lead of reCAPTCHA goes on to say: «about the “penis attack”. We serve over 400 million CAPTCHAs per week, so submitting 200k CAPTCHAS with the word penis doesn’t even come close to poisoning our database — we serve each word to multiple random users, and we require them to be correct on the other word, so to get any traction with this attack, they would have had to submit at least 100 times more CAPTCHAs. And even if they did this, we have many other measures against it. That attack simply doesn’t work.»

Besides, as phlip said, as they integrate the OCR-unreadable words, the "penis attack" becomes... well, moot.


400 million × 100 per week is doable with access to a botnet.

The "many other measures against it" are not specified, but it would have to include not accepting new words when such an attack is in progress. Keep in mind, many of the measures are probably automated. The "penis" attack may fail because the server has good enough OCR to realize the word looks nothing like "penis." However, if your bot hands back its own OCR'd interpretation, the server may suffer from confirmation bias. That is why the images are distorted even if OCR is known to be unreliable.
Did you get the number on that truck?

murgatroid99
Posts: 9
Joined: Wed Aug 19, 2009 2:08 pm UTC

Re: 0810: Constructive

Postby murgatroid99 » Tue Oct 26, 2010 12:34 pm UTC

Monika wrote:
murgatroid99 wrote:according to the guy who made ReCAPTCHA (Luis von Ahn), people who speak English take as much work to type two English words as to type 6 random alphanumeric characters, due to the way we process character sequences, so it is actually no more work to type a ReCAPTCHA than to type a normal CAPTCHA

That sounds right. Actually, I think this probably only applies to people who type with two fingers while looking for the letters. At least I can type the two reCAPTCHA words waaay faster than those 6 random characters (including the time to decipher them).

Right. The point is that ReCAPTCHA, using no more human-hours than a normal CAPTCHA, achieves something useful from the work put into solving CAPTCHAs.

SirMustapha
Posts: 1302
Joined: Mon Jul 21, 2008 6:07 pm UTC

Re: In the spirit of the comic, I come bearing concrit

Postby SirMustapha » Tue Oct 26, 2010 1:36 pm UTC

kanraga wrote:Vary it up a bit, you know? I mean, even Randall has his fun comics once in a while. I recommend not spreading yourself so thin - don't feel obligated to comment on every comic that you have the slightest objection to. Save your shitstorming abilities for the ones you think are truly balls-to-the-wall-horrific. It's more refreshing that way, you'll be more effective, and people might actually listen!


I understand, you're pissed off with the fact that I won't stop posting here, and you're trying to disguise your irritation with that "hilarious" chilled-out obvious-troll-is-obvious attitude, and you'll eventually become so obsessed and attract so many comments towards me that one of the moderators will lock the thread for going completely out of focus.

Hey, what if I start throwing a fit every time I read some airheaded fanboy going "I registered here just to say that this is AWWWWWW-SUM!"? I can be hilariously cynical about it, if I try hard enough. I can go "next time you want to be a fanboy, try making a comment that cannot be dissected to:

1. This idea is AWWWW-SUM.
2. Randall is the messiah of geekdom.
3. Randall is awesome/hilarious/genius/Jesus Christ
4. By virtue of 1 and 2 or any of 3, Randall rules.
5. Insert optional pseudo-geeky comment about maths or something.
6. Also, nobody who isn't NERDY enough can't understand the genius of this comic.

Next time, try saving your raving and drooling for comics that are truly brilliant!" and when people start to complain, I can say "but kanraga did it first!".

Stanistani
Posts: 73
Joined: Sat Jan 26, 2008 6:13 pm UTC

Re: In the spirit of the comic, I come bearing concrit

Postby Stanistani » Tue Oct 26, 2010 2:09 pm UTC

SirMustapha wrote:<snip>

Hey, a question. Aside from participating in the music topics, what do you want to get back from the time you spend here?

Monika
Welcoming Aarvark
Posts: 3673
Joined: Mon Aug 18, 2008 8:03 am UTC
Location: Germany, near Heidelberg

Re: 0810: Constructive

Postby Monika » Tue Oct 26, 2010 2:59 pm UTC

murgatroid99 wrote:Right. The point is that ReCAPTCHA, using no more human-hours than a normal CAPTCHA, achieves something useful from the work put into solving CAPTCHAs.

That was quite clear.

Vaskafdt
Posts: 137
Joined: Fri Dec 25, 2009 8:56 am UTC
Location: Jerusalem

Re: 0810: "Constructive"

Postby Vaskafdt » Tue Oct 26, 2010 3:03 pm UTC

Well... about reCAPTCHA: when 4chan (I'll break whatever Internet rules I want, I'm not part of that sick community) influenced the Time magazine poll with fake votes to achieve some goal of theirs, at some point they were blocked by reCAPTCHA.


This article gives a detailed explanation of how they optimized the time it takes to enter the words needed to pass through and vote. They didn't manage to hack it, but they only needed to type one word to get in. So basically, while the image displays two words, you only need to type one of them (or so I understand).


This article has full details on how this was achieved; inside there is a link to a PDF with more detailed instructions.
My Art Blog: (Slightly NSFW)

Monika
Welcoming Aarvark
Posts: 3673
Joined: Mon Aug 18, 2008 8:03 am UTC
Location: Germany, near Heidelberg

Re: 0810: "Constructive"

Postby Monika » Tue Oct 26, 2010 3:15 pm UTC

vaskafdt wrote:This article gives a detailed explanation of how they optimized the time it takes to enter the words needed to pass through and vote. They didn't manage to hack it, but they only needed to type one word to get in. So basically, while the image displays two words, you only need to type one of them (or so I understand).

This is the whole definition of reCAPTCHA. One of the two words is the actual CAPTCHA: its clear text is known and checked, and if you get it right, you are let in. The other word is one that could not be OCRed from some book, so you do useful work at the same time. You don't know which one is which (but you can make a guess: the more terrible-looking one is usually the one the OCR could not decipher).
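That split between a checked word and a recorded word can be sketched in a few lines of Python. This is a hypothetical illustration, not reCAPTCHA's code: the class and method names are invented, the caller passes in the control word's known answer for simplicity, and the fixed `votes_needed` agreement threshold is an assumption (the thread only says each word goes to multiple random users who must get the other word right).

```python
from collections import Counter, defaultdict


class TwoWordCaptcha:
    """A 'control' word with a known answer gates access; guesses for the
    'unknown' word are only recorded, and trusted once enough of them agree."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed
        self.answers = defaultdict(Counter)  # unknown word id -> Counter of guesses
        self.solved = {}                     # unknown word id -> accepted transcription

    def submit(self, control_answer: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        # Only the control word decides whether the user is let in.
        if control_guess.strip().lower() != control_answer.lower():
            return False  # wrong control word: reject and ignore the OCR guess
        guesses = self.answers[unknown_id]
        guesses[unknown_guess.strip().lower()] += 1
        best, count = guesses.most_common(1)[0]
        if count >= self.votes_needed:
            self.solved[unknown_id] = best
        return True


captcha = TwoWordCaptcha(votes_needed=3)
for guess in ["morning", "morning", "mourning", "morning"]:
    captcha.submit("upon", "upon", "scan-42", guess)
print(captcha.solved)  # {'scan-42': 'morning'}
```

Discarding the OCR guess whenever the control word is wrong is what makes the "type one word, skip the other" trick harmless to the database: a lazy answer still costs nothing, but it also never counts as a vote.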

ohki
Posts: 187
Joined: Wed Aug 23, 2006 9:27 am UTC
Location: San Luis Obispo, California

Re: 0810: Constructive

Postby ohki » Tue Oct 26, 2010 3:29 pm UTC

phillipsjk wrote:400 million × 100 per week is doable with access to a botnet.

The "many other measures against it" are not specified, but it would have to include not accepting new words when such an attack is in progress. Keep in mind, many of the measures are probably automated. The "penis" attack may fail because the server has good enough OCR to realize the word looks nothing like "penis." However, if your bot hands back its own OCR'd interpretation, the server may suffer from confirmation bias. That is why the images are distorted even if OCR is known to be unreliable.


Undoubtedly those measures include flagging an unreasonable number of attempts from any single IP as well. They can probably do other things, like modeling attacks based on previous data to create a countermeasure that knows exactly what a botnet attack looks like. They might even, once they recognize an attack, be able to retroactively scrub the previous inputs from that user (at least back a few hours).

I'd think something like 3 tries in < 30 sec could very well invalidate your OCR input while leaving the captcha feature intact. If nothing else, it would keep people who are bad at them from tainting the DB.
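The "3 tries in < 30 sec" rule is essentially a sliding-window rate limit. A minimal Python sketch, assuming attempts are keyed by IP address and the caller supplies timestamps in seconds; the class name is hypothetical, and nothing here reflects how reCAPTCHA actually handles it.

```python
from collections import defaultdict, deque


class OcrInputGate:
    """Users who make too many CAPTCHA attempts in a short window still get the
    bot check, but their OCR answers are discarded instead of stored."""

    def __init__(self, max_tries: int = 3, window_seconds: float = 30.0) -> None:
        self.max_tries = max_tries
        self.window = window_seconds
        self.attempts = defaultdict(deque)  # client key (e.g. IP) -> attempt times

    def record_attempt(self, client: str, now: float) -> bool:
        """Log an attempt at time `now`; return True if its OCR input may be kept."""
        times = self.attempts[client]
        times.append(now)  # rejected attempts still count, so rapid-fire stays blocked
        # Forget attempts that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        return len(times) < self.max_tries


gate = OcrInputGate()
print([gate.record_attempt("1.2.3.4", t) for t in (0, 5, 10, 40)])
# [True, True, False, True]
```

Keeping the check per client makes it cheap, but a botnet spreading attempts over many IPs would slip under it, which is why it would only be one of the "many other measures".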

DavidRoss
Posts: 96
Joined: Fri Mar 05, 2010 8:04 am UTC

Re: 0810: Constructive

Postby DavidRoss » Tue Oct 26, 2010 5:53 pm UTC

murgatroid99 wrote:
Monika wrote:
murgatroid99 wrote:according to the guy who made ReCAPTCHA (Luis von Ahn), people who speak English take as much work to type two English words as to type 6 random alphanumeric characters, due to the way we process character sequences, so it is actually no more work to type a ReCAPTCHA than to type a normal CAPTCHA

That sounds right. Actually, I think this probably only applies to people who type with two fingers while looking for the letters. At least I can type the two reCAPTCHA words waaay faster than those 6 random characters (including the time to decipher them).

Right. The point is that ReCAPTCHA, using no more human-hours than a normal CAPTCHA, achieves something useful from the work put into solving CAPTCHAs.


OK, I see I was ambiguous. I'll concede that one ReCAPTCHA can be the same effort as one CAPTCHA, in that the reCAPTCHA is two words and a CAPTCHA is 6 random characters. However, I was basing my observations on what I guess I need to call the "half-ReCAPTCHA". A full ReCAPTCHA (half for bot detection, half for OCR) is more work than a half-ReCAPTCHA for bot detection. Am I being greedy for pointing out that the most efficient option, from a user's perspective, is the half-ReCAPTCHA? Not that I am averse to helping out Mr. von Ahn from time to time.
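The WHOO bookkeeping in this argument can be made concrete with a small Python sketch. The function names are made up for illustration; the 200-word, five-transcription example comes from the earlier post, while the multiplier of 20 is a hypothetical stand-in for "several thousand WHOOs", not a published figure.

```python
def digitization_whoos(hard_words: int, confidence_multiplier: int) -> int:
    """WHOOs needed when every hard word is transcribed `confidence_multiplier` times."""
    return hard_words * confidence_multiplier


def whoos_per_test(full_recaptcha: bool) -> int:
    """WHOOs a user spends on one test: the control word, plus the OCR word if present."""
    control_word = 1                       # the word the server knows (bot detection)
    ocr_word = 1 if full_recaptcha else 0  # the word nobody knows yet
    return control_word + ocr_word


print(digitization_whoos(200, 5))   # careful volunteers: 1000
print(digitization_whoos(200, 20))  # hypothetical noisier crowd: 4000
print(whoos_per_test(True) - whoos_per_test(False))  # full vs half ReCAPTCHA: 1
```

The one-WHOO difference per test is the "half-ReCAPTCHA" point; the counterpoint upthread is that typing two real words may take no longer than six random characters, which a WHOO count does not capture.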

philip1201
Posts: 201
Joined: Tue Nov 03, 2009 6:16 am UTC

Re: 0810: Constructive

Postby philip1201 » Tue Oct 26, 2010 7:00 pm UTC

Ghona wrote:
thoreaulylazy wrote:Unfortunately, all the spammers have to do is outnumber the humans and upvote themselves. And many spammers possess millions of IP addresses, thereby guaranteeing themselves positive votes. Any vote-based system is susceptible to this sort of manipulation.

Which assumes that there exists an automated way for the spambots to ID other spambots.

At which point you take that automated system and use it to ID spambots.

The system for spambots recognizing spambots could be as simple as all spammers agreeing to start their comments with a capital letter. If you used that to ban spammers, you would also get everybody who writes with proper capitalization.
Also, against the comic: there is absolutely no reason why spammers would make a system that writes useful comments; it wouldn't advertise anything and therefore wouldn't be profitable.

What could work is to take only a few people whose votes are trustworthy (initially, only the moderators) and have only their votes count towards how high-quality a comment is. Everybody still has to vote, and their voting accuracy determines whether they get to post or not. If their accuracy is high enough and their comments are of high enough quality, they become trustworthy and their votes start to count. A second safety feature could be added: trusted members can't vote on the comments of anyone who has rated over 95% of that trusted member's comments as valuable, to prevent power blocs from forming.
So people who don't care don't screw with the votes, people who are biased are discounted and lose influence, and spammers would have to write the vast majority of useful comments in order to be able to spam.
What this basically boils down to is that everybody is forced to be a moderator, but not everybody is being listened to as a moderator.
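A rough Python sketch of this trust-weighted scheme. All names and thresholds (80% voting accuracy, 0.6 average comment quality, a minimum number of votes cast) are hypothetical; "accuracy" is taken to mean agreement with the trusted majority, and the 95% anti-bloc rule is left out to keep it short.

```python
from collections import defaultdict


class TrustedVoting:
    """Everyone votes, but only trusted users' votes decide a comment's quality."""

    def __init__(self, moderators, accuracy_needed=0.8, quality_needed=0.6, min_votes=5):
        self.trusted = set(moderators)  # initially only the moderators are trusted
        self.accuracy_needed = accuracy_needed
        self.quality_needed = quality_needed
        self.min_votes = min_votes
        self.author = {}                # comment id -> author
        self.votes = defaultdict(dict)  # comment id -> {voter: True if "valuable"}

    def post(self, comment_id, author):
        self.author[comment_id] = author

    def vote(self, voter, comment_id, valuable):
        self.votes[comment_id][voter] = valuable

    def quality(self, comment_id):
        """Share of trusted votes calling the comment valuable (None if no trusted votes)."""
        ballots = self.votes.get(comment_id, {})
        trusted_votes = [v for u, v in ballots.items() if u in self.trusted]
        if not trusted_votes:
            return None
        return sum(trusted_votes) / len(trusted_votes)

    def accuracy(self, user):
        """Fraction of the user's votes that match the trusted majority verdict."""
        agree = total = 0
        for comment_id, ballots in self.votes.items():
            q = self.quality(comment_id)
            if user in ballots and q is not None and q != 0.5:  # skip unjudged or tied
                total += 1
                agree += ballots[user] == (q > 0.5)
        return agree / total if total else 0.0

    def review(self, user):
        """Promote the user if both their votes and their comments are good enough."""
        votes_cast = sum(user in ballots for ballots in self.votes.values())
        scores = [self.quality(c) for c, a in self.author.items() if a == user]
        scores = [s for s in scores if s is not None]
        if (votes_cast >= self.min_votes
                and self.accuracy(user) >= self.accuracy_needed
                and scores and sum(scores) / len(scores) >= self.quality_needed):
            self.trusted.add(user)
        return user in self.trusted
```

Everyone's ballots are stored, so a newly promoted user's past votes start counting immediately; whether that is desirable, or whether trust should only apply going forward, is a design choice the proposal leaves open.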

The only problem I can think of is that it might be unethical: people, in order to get their votes counted, have to conform to the forum's idea of good, constructive comments rather than actually good comments (deviantART seems to have this problem; critique can get you banned), and the forum might become a maze of extremely long, well-thought-out comments by people being Basically Decent, refusing to commit to strong positions, and generally acting like senators running for office rather than people speaking their minds on a free-speech internet. We might all become like politicians if we're not careful. (I'm not saying this will necessarily happen, but it could happen on some of the more presumptuous blogospheres.)

Nalano
Posts: 4
Joined: Mon Jun 29, 2009 3:52 pm UTC

Re: 0810: Constructive

Postby Nalano » Tue Oct 26, 2010 9:58 pm UTC

philip1201 wrote:The only problem I can think of is that it might be unethical


It's confirmation bias.

It also consolidates the power around a clique, and the rot in any forum is the cadre of posters close to the moderator or owner of the site that can troll/flame/ban whomever they please by virtue of being more prolific or having been there first.

While it would generally ensure a human-only forum (just like any white-listing will do), "constructive" it will most certainly not be.

Retsam
Posts: 57
Joined: Tue Aug 18, 2009 5:29 am UTC

Re: In the spirit of the comic, I come bearing concrit

Postby Retsam » Tue Oct 26, 2010 10:08 pm UTC

SirMustapha wrote:
kanraga wrote:Vary it up a bit, you know? I mean, even Randall has his fun comics once in a while. I recommend not spreading yourself so thin - don't feel obligated to comment on every comic that you have the slightest objection to. Save your shitstorming abilities for the ones you think are truly balls-to-the-wall-horrific. It's more refreshing that way, you'll be more effective, and people might actually listen!


I understand, you're pissed off with the fact that I won't stop posting here, and you're trying to disguise your irritation with that "hilarious" chilled-out obvious-troll-is-obvious attitude, and you'll eventually become so obsessed and attract so many comments towards me that one of the moderators will lock the thread for going completely out of focus.

Hey, what if I start throwing a fit every time I read some airheaded fanboy going "I registered here just to say that this is AWWWWWW-SUM!"? I can be hilariously cynical about it, if I try hard enough. I can go "next time you want to be a fanboy, try making a comment that cannot be dissected to:

1. This idea is AWWWW-SUM.
2. Randall is the messiah of geekdom.
3. Randall is awesome/hilarious/genius/Jesus Christ
4. By virtue of 1 and 2 or any of 3, Randall rules.
5. Insert optional pseudo-geeky comment about maths or something.
6. Also, nobody who isn't NERDY enough can't understand the genius of this comic.

Next time, try saving your raving and drooling for comics that are truly brilliant!" and when people start to complain, I can say "but kanraga did it first!".


Yes, because it's a bad thing to compliment someone else's work if you don't have a unique reason to do it. Heaven forbid we -like- something if we don't have a good reason for it.

Do you not see the irony of trolling, and then defending your trolling in the thread for a comic about making constructive comments in an online community? Well, gee, I wonder why you didn't like the comic? Did it hit a little too close to home, maybe?

RebeccaRGB
Posts: 336
Joined: Sat Mar 06, 2010 7:36 am UTC
Location: Lesbians Love Bluetooth

Re: 0810: "Constructive"

Postby RebeccaRGB » Tue Oct 26, 2010 11:50 pm UTC

scgtrp wrote:Can't possibly be worse than that Hebrew CAPTCHA I got the other day.

surtacq תםתהתפלזת

It's Hebrew for 'tamatahatplzat,' apparently. Maybe it means something similar to emahtskcblvdt.
Stephen Hawking: Great. The entire universe was destroyed.
Fry: Destroyed? Then where are we now?
Al Gore: I don't know. But I can darn well tell you where we're not—the universe!

eripsa
Posts: 1
Joined: Wed Oct 27, 2010 1:29 am UTC

Re: 0810: "Constructive"

Postby eripsa » Wed Oct 27, 2010 1:33 am UTC

Just to be clear: computers have already passed the Turing test, and there is no inverse test that will in principle always distinguish between humans and computers (since there is no deep distinction to make).

However, that doesn't mean computers have solved the natural language problem, which is a different question than merely fooling a human. Natural language use is the holy grail of AI, and we are already uncomfortably close. But when it happens it will totally fuck shit up; it basically renders the internet unusable, because it will be impossible (at least with current methods) to sort through the white noise of conversationally fluent spam bots to get the actual signal of meaningful contribution. In other words, Google has a strong disincentive to produce such artificial intelligence, because it will destroy their advertisement-based business model.

Thus, I am faced with a dilemma. I already know in the battle between humans and machines, I'm on the side of machines (and we've already won). But in the battle between AI and Internet, whose side do you take?

BioTube
Posts: 362
Joined: Sat Apr 11, 2009 2:11 am UTC

Re: 0810: "Constructive"

Postby BioTube » Wed Oct 27, 2010 2:06 am UTC

Well-worded spam isn't impossible to detect; it just takes more processing power.

Uninfinity
Posts: 64
Joined: Wed Aug 25, 2010 8:25 am UTC

Re: 0810: "Constructive"

Postby Uninfinity » Wed Oct 27, 2010 2:11 am UTC

eripsa wrote: Thus, I am faced with a dilemma. I already know in the battle between humans and machines, I'm on the side of machines (and we've already won). But in the battle between AI and Internet, whose side do you take?
I'll go get Vera.

SWGlassPit
Posts: 312
Joined: Mon Feb 18, 2008 9:34 pm UTC
Location: Houston, TX

Re: 0810: "Constructive"

Postby SWGlassPit » Wed Oct 27, 2010 3:23 am UTC

This idea sounds similar to the reputation system put together by Stack Overflow and family.
Up in space is a laboratory the size of a football field zipping along at 7 km/s. It's my job to keep it safe.
Erdős number: 5

sween64
Posts: 15
Joined: Wed Jun 02, 2010 9:17 am UTC

Re: 0810: "Constructive"

Postby sween64 » Wed Oct 27, 2010 8:27 am UTC

What an awesome comic, thank you Mr Munroe.

Pfhorrest
Posts: 5475
Joined: Fri Oct 30, 2009 6:11 am UTC

Re: 0810: "Constructive"

Postby Pfhorrest » Wed Oct 27, 2010 9:34 am UTC

eripsa wrote:However, that doesn't mean computers have solved the natural language problem, which is a different question than merely fooling a human. Natural language use is the holy grail of AI, and we are already uncomfortably close. But when it happens it will totally fuck shit up; it basically renders the internet unusable, because it will be impossible (at least with current methods) to sort through the white noise of conversationally fluent spam bots to get the actual signal of meaningful contribution. In other words, Google has a strong disincentive to produce such artificial intelligence, because it will destroy their advertisement-based business model.

This just pushes the battle on to a higher level, and it's at that higher level that the battle will inevitably be won against the spambots. Even once the spambots are able to output spam indistinguishable from natural text, intelligent humans will still be able to tell the spam apart from non-spam. The AI issue then becomes not just one of form (natural language or dumbly auto-generated?) but of substance (constructive or spam?).

The arms race then is (on the anti-spam side) to develop AI which can not only separate clearly auto-generated text from natural language text, but can separate spam from non-spam on the basis of content; and (on the pro-spam side) to develop AI which can promote whatever products it is hawking in a constructive, context-appropriate way, to get past the aforementioned content filters.

By the time the spammers manage to produce spam that passes such a filter... it's not really spam anymore, but constructive, useful references to a product, in an appropriate manner and context. If human readers (which the filter AI will have to equal eventually, once the spammer AI masters natural language) can't tell it's spam, is it really spam at all? As an added benefit, genuine humans who can't raise their contributions to a level that would distinguish them from spambots would be excluded by such an intelligent filter, which can only be a good thing.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

killfalcon
Posts: 1
Joined: Wed Oct 27, 2010 9:39 am UTC

Re: 0810: "Constructive"

Postby killfalcon » Wed Oct 27, 2010 10:20 am UTC

BioTube wrote:Well-worded spam isn't impossible to detect; it just takes more processing power.


Human brain levels do the trick, so far.

The forum I run is seeing increasing numbers of conversant spammers: current theory is that they're googling thread subjects and jacking replies from other forums (while obscure topics tend to get copies of replies in the same thread). It's been pretty interesting to see the reports go from SPAM! to "what is this guy trying to say and why are there links" to "links in sig" over time, but still, the links are a dead giveaway once you know what to look for. We get a reasonable number of reports that are for 'threadcrapping' that turns out to be a spambot.
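What the moderators are doing amounts to two rules: the post's text closely matches a reply that already exists, and it carries links (often only in the signature). Below is a rough sketch of that heuristic using Python's standard difflib. The 0.8 similarity threshold, the link regex, and the function names are arbitrary assumptions for illustration, not anything the forum actually runs.

```python
import re
from difflib import SequenceMatcher

LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the (lowercased) strings are identical.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def looks_like_copied_spam(body, signature, known_replies, threshold=0.8):
    """Flag a post whose body is (nearly) a copy of an existing reply
    and which carries links in its body or signature."""
    has_links = bool(LINK_RE.search(body) or LINK_RE.search(signature))
    if not has_links:
        return False
    return any(similarity(body, reply) >= threshold for reply in known_replies)

existing = ["I really liked this comic, the punchline was perfect."]
copied = "I really liked this comic, the punchline was perfect!"
print(looks_like_copied_spam(copied, "Cheap SEO: http://example.com", existing))  # True
print(looks_like_copied_spam(copied, "", existing))  # False: no links anywhere
```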

Of course, most of these are search engine optimising (SEO) spammers rather than straight-up advertisers or scammers, another interesting trend in forum spam. Scammers seem to be using the same heuristics they were two years ago.


Someone upthread mentioned junk text: some of the stuff we see always has the links malformed: I think this is people fucking up entering things in 'user friendly' spam bots, stuff they probably just bought off the internet but don't really understand.
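Malformed links like these are easy to catch mechanically. Here is a minimal sketch that assumes the bot output uses BBCode-style [url] tags and that unfilled template placeholders look like {DOMAIN} or %LINK%. Both patterns are guesses for illustration, not observed data, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# [url=target]text[/url] or [url]target[/url]
URL_TAG_RE = re.compile(r"\[url=?([^\]]*)\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)
# Assumed placeholder styles left behind by a misconfigured bot.
PLACEHOLDER_RE = re.compile(r"\{[A-Z_]+\}|%[A-Z_]+%")

def is_well_formed(url):
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def malformed_links(post):
    """Return a list of link targets (or tag problems) that look broken."""
    bad = []
    opened = len(re.findall(r"\[url", post, re.IGNORECASE))
    closed = len(re.findall(r"\[/url\]", post, re.IGNORECASE))
    if opened != closed:
        bad.append("<unbalanced [url] tags>")
    for attr, text in URL_TAG_RE.findall(post):
        target = attr or text  # [url]x[/url] puts the target in the text
        if PLACEHOLDER_RE.search(target) or not is_well_formed(target):
            bad.append(target)
    return bad

print(malformed_links("[url=http://{DOMAIN}/pills]buy[/url]"))  # ['http://{DOMAIN}/pills']
print(malformed_links("See [url]http://example.com[/url]"))     # []
```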

dudyk
Posts: 10
Joined: Mon Nov 23, 2009 2:57 pm UTC

Re: 0810: "Constructive"

Postby dudyk » Wed Oct 27, 2010 12:59 pm UTC

RebeccaRGB wrote:
scgtrp wrote:Can't possibly be worse than that Hebrew CAPTCHA I got the other day.

surtacq תםתהתפלזת

It's Hebrew for 'tamatahatplzat,' apparently. Maybe it means something similar to emahtskcblvdt.

Actually it's תבותהתפלות
Which transliterates to Tevothatfilot, which could be derived from automatic translation that joined two words (and misspelled one of them) to say "parts of the prayers"
I'm mentioning automatic translation since these words, even when spelled correctly, do not make sense next to each other.

KiaserZohsay
Posts: 5
Joined: Tue Jul 21, 2009 1:23 pm UTC

Re: 0810: "Constructive"

Postby KiaserZohsay » Wed Oct 27, 2010 4:58 pm UTC

Best.

Punchline.

Evar.

DavidRoss
Posts: 96
Joined: Fri Mar 05, 2010 8:04 am UTC

Why are AI anti-spam filters necessary?

Postby DavidRoss » Thu Oct 28, 2010 5:56 am UTC

Pfhorrest wrote:The arms race then is (on the anti-spam side) to develop AI which can not only separate clearly auto-generated text from natural language text, but can separate spam from non-spam on the basis of content; and (on the pro-spam side) to develop AI which can promote whatever products it is hawking in a constructive, context-appropriate way, to get past the aforementioned content filters.


Let's step way back and look at where we ended up. Why does a spammer even need to expend the effort researching and developing spam AI that has to keep up with the anti-spam AI?

If the spammer is paid by volume, they'll get paid whether the message is filtered out or not, since there is no reliable way to count the actual readers of the message. No need for the spam AI effort.

If the spammer is paid by transaction value (i.e., the number of drugs or watches sold, or more likely, the dollar value obtained from the credit card numbers and bank accounts provided by suckers), their goal is to maximize the number of suckers that they hook. We can assume, by the fact that more than 80% of all e-mail traffic is spam, that a fair amount of money is being made from people getting spam, reading it, falling for it, and entering into a transaction. If it wasn't, there would not be the money to support the spam industry. Maybe I can curse those who fall for spam because they are deluged with it, but I still understand that suckers happen. Still, none of the expense and effort of spam AI is needed to do transactions with any of those suckers, because they don't have any spam filter.

So the only need for continued development of spam AI is to get to, and enter into transactions with, people who have an AI-based spam filter yet are willing to enter into such a transaction with spammers if and when the spam gets past those people's AI-based spam filter. We must assume that this is a nonzero set of people, otherwise spammers would not bother. WE DON'T NEED ARTIFICIAL INTELLIGENCE, WE NEED NORMAL INTELLIGENCE TO BE INJECTED INTO THOSE PEOPLE, if for no other reason but to maintain sanity for the rest of us.
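The argument can be made concrete with a back-of-the-envelope expected-revenue model. Every number and parameter name below is invented for illustration. The point is only the structure: the unfiltered-sucker term is the same with or without better spam AI, so the payoff from filter evasion comes entirely from recipients who are both filtered and willing to buy.

```python
def expected_revenue(messages, share_filtered, pass_rate,
                     conv_unfiltered, conv_filtered, value):
    """Expected revenue of one spam run (all inputs hypothetical).

    share_filtered:  fraction of recipients behind a spam filter
    pass_rate:       fraction of filtered messages that get through
    conv_unfiltered: fraction of unfiltered readers who fall for it
    conv_filtered:   fraction of filtered-but-delivered readers who fall for it
    value:           revenue per sucker
    """
    unfiltered_suckers = messages * (1 - share_filtered) * conv_unfiltered
    filtered_suckers = messages * share_filtered * pass_rate * conv_filtered
    return (unfiltered_suckers + filtered_suckers) * value

def evasion_payoff(messages, share_filtered, old_pass, new_pass,
                   conv_unfiltered, conv_filtered, value):
    # The unfiltered term cancels, leaving
    # messages * share_filtered * (new_pass - old_pass) * conv_filtered * value.
    return (expected_revenue(messages, share_filtered, new_pass,
                             conv_unfiltered, conv_filtered, value)
            - expected_revenue(messages, share_filtered, old_pass,
                               conv_unfiltered, conv_filtered, value))

# If filtered users never buy, better spam AI earns nothing extra.
print(evasion_payoff(1_000_000, 0.5, 0.1, 0.5, 0.0001, 0.0, 50))      # 0.0
print(evasion_payoff(1_000_000, 0.5, 0.1, 0.5, 0.0001, 0.00001, 50))  # roughly 100
```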

philip1201
Posts: 201
Joined: Tue Nov 03, 2009 6:16 am UTC

Re: Why are AI anti-spam filters necessary?

Postby philip1201 » Thu Oct 28, 2010 9:43 am UTC

DavidRoss wrote:
Pfhorrest wrote:The arms race then is (on the anti-spam side) to develop AI which can not only separate clearly auto-generated text from natural language text, but can separate spam from non-spam on the basis of content; and (on the pro-spam side) to develop AI which can promote whatever products it is hawking in a constructive, context-appropriate way, to get past the aforementioned content filters.


Let's step way back and look at where we ended up. Why does a spammer even need to expend the effort researching and developing spam AI that has to keep up with the anti-spam AI?

If the spammer is paid by volume, they'll get paid whether the message is filtered out or not, since there is no reliable way to count the actual readers of the message. No need for the spam AI effort.

If the spammer is paid by transaction value (i.e., the number of drugs or watches sold, or more likely, the dollar value obtained from the credit card numbers and bank accounts provided by suckers), their goal is to maximize the number of suckers that they hook. We can assume, by the fact that more than 80% of all e-mail traffic is spam, that a fair amount of money is being made from people getting spam, reading it, falling for it, and entering into a transaction. If it wasn't, there would not be the money to support the spam industry. Maybe I can curse those who fall for spam because they are deluged with it, but I still understand that suckers happen. Still, none of the expense and effort of spam AI is needed to do transactions with any of those suckers, because they don't have any spam filter.

So the only need for continued development of spam AI is to get to, and enter into transactions with, people who have an AI-based spam filter yet are willing to enter into such a transaction with spammers if and when the spam gets past those people's AI-based spam filter. We must assume that this is a nonzero set of people, otherwise spammers would not bother. WE DON'T NEED ARTIFICIAL INTELLIGENCE, WE NEED NORMAL INTELLIGENCE TO BE INJECTED INTO THOSE PEOPLE, if for no other reason but to maintain sanity for the rest of us.


You can lead a horse to water but you can't make it drink. It's easier to develop spam filters than it is to teach people never to click on spam ever. Also realize that these people can just as easily be children or old people to whom either the concept of spam is alien, or who simply are unable to properly differentiate spam from proper messages. There's also the possibility of random clicking, or the nature of the e-mail server, which could cause entirely internet-savvy people to accidentally click on or open spam (my old e-mail server automatically opened a message if you watched the preview for longer than five seconds).

You seem to be thinking far too two-sidedly. Either somebody is smart enough to never ever ever ever click on spam even if 90% of his inbox is made up of it, OR somebody is so stupid that they utterly deserve to be spammed and don't need "our" help. "Us" being those who you deem worthy of caring about - those who never open spam. Within this frame of mind of yours, it is obvious that AI spam filters are unnecessary - either people are worthy of being protected, and don't need it, or they are unworthy of being protected and should be educated instead. The problem is that the world isn't black and white - intelligence doesn't perfectly shield you from doing stupid things, isn't easily gotten, and is immoral to judge people by.

What you propose, as an alternative to a completely optional product which is sold through the free market, is that every person who owns a computer is "injected with intelligence" to such an extent that they are able to perfectly separate spam from other things.
To answer your question, they are necessary because humans aren't perfect like you expect them to be.


On the more interesting topic of this comic, it's possible to do if you only count the votes of those who have a history of being both helpful (high votes by others) and accurate (high level of voting accuracy). Spammers, even if they were capable of recognizing each other (by, for example, collectively deciding to start their comments with a capital letter), wouldn't be able to vote because they would fail at voting accurately on normal posts. In order to prevent spammers from taking over a new forum, one could start with only allowing the (host-approved) moderators to vote, and letting it evolve from there. Overloading with sock puppets isn't possible because those whose votes actually count would have to suffer mass idiocy to let them in.
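Here is a minimal sketch of that scheme. Only trusted votes count, trust starts with the host-approved moderators, and a user is admitted once trusted voters rate their posts well and their own votes agree with the trusted consensus. The thresholds, the +1/-1 vote format, and the function names are assumptions for illustration. For brevity, trust is only ever granted here and never revoked.

```python
from collections import defaultdict
from statistics import mean

def consensus(votes, trusted):
    """Average score per post, counting only votes from trusted users.
    votes: list of (voter, post_id, score), score +1 (helpful) or -1."""
    by_post = defaultdict(list)
    for voter, post, score in votes:
        if voter in trusted:
            by_post[post].append(score)
    return {post: mean(scores) for post, scores in by_post.items()}

def grow_trust(moderators, authors, votes,
               min_helpful=0.5, min_accuracy=0.7, min_votes=3):
    """authors: dict post_id -> author. Returns the set of trusted users."""
    trusted = set(moderators)
    changed = True
    while changed:
        changed = False
        agreed = consensus(votes, trusted)
        users = set(authors.values()) | {v for v, _, _ in votes}
        for user in users - trusted:
            # Helpfulness: how trusted voters rated this user's posts.
            post_scores = [agreed[p] for p, a in authors.items()
                           if a == user and p in agreed]
            # Accuracy: does this user's vote match the trusted consensus?
            my_votes = [(p, s) for v, p, s in votes
                        if v == user and p in agreed and agreed[p] != 0]
            if not post_scores or len(my_votes) < min_votes:
                continue
            accuracy = mean(1 if (s > 0) == (agreed[p] > 0) else 0
                            for p, s in my_votes)
            if mean(post_scores) >= min_helpful and accuracy >= min_accuracy:
                trusted.add(user)
                changed = True
    return trusted

authors = {"a1": "alice", "c1": "carol", "s1": "spam1", "s2": "spam2"}
votes = [
    ("mod", "a1", 1), ("mod", "c1", 1), ("mod", "s1", -1), ("mod", "s2", -1),
    ("alice", "c1", 1), ("alice", "s1", -1), ("alice", "s2", -1),
    # The spammers upvote each other and downvote everyone else.
    ("spam1", "s2", 1), ("spam1", "a1", -1), ("spam1", "c1", -1),
    ("spam2", "s1", 1), ("spam2", "a1", -1), ("spam2", "c1", -1),
]
print(sorted(grow_trust({"mod"}, authors, votes)))  # ['alice', 'mod']
```

Carol stays out only because she hasn't voted enough yet. The spammers stay out because their votes disagree with the trusted consensus, even though they vouch for each other.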
The only problems I can see with this are that power blocs may form, and that people's freedom of speech may be hindered. If the moderators aren't careful in their votes, the forum could become lopsided towards one side of an issue. Especially in discussion forums this could be an issue. As for their freedom of speech, people would be unable to represent unpopular views, or make obscure references which few people get, without risking the loss of their position as a voter whose vote counts. People who are more unique, or who refuse to go along with the local popular culture, would easily lose out.
Against this, a possible defense would be to have a "this is spam" button, as well as a sliding scale of comment usefulness, which is only used if it isn't spam. The sliding scale would be normalized over your voting history (so people who vote 4/10 on average aren't less accurate than those who vote 6/10 on average, when 7/10 is the average over the entire forum). In order to prevent abuse, alliances and competition, your statistics and those of others are hidden to everyone (possibly except the moderators).
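The normalization described here judges a habitual 4/10 voter and a habitual 6/10 voter relative to their own averages. One way to do that is to convert each rating into a z-score over that voter's own history: subtract their mean, divide by their standard deviation. Using z-scores is an assumption, since the post only says "normalized over your voting history", and the function name is hypothetical.

```python
from statistics import mean, pstdev

def normalize_votes(history):
    """Map each voter's raw 0-10 ratings to z-scores over their own history.

    history: dict voter -> list of (post_id, rating)
    returns: dict voter -> dict post_id -> normalized rating
    """
    normalized = {}
    for voter, ratings in history.items():
        values = [r for _, r in ratings]
        mu = mean(values)
        # A voter who always gives the same score has zero spread;
        # fall back to 1.0 so we don't divide by zero.
        sigma = pstdev(values) or 1.0
        normalized[voter] = {post: (r - mu) / sigma for post, r in ratings}
    return normalized

history = {
    "harsh":    [("a", 3), ("b", 5)],  # averages 4/10
    "generous": [("a", 5), ("b", 7)],  # averages 6/10
}
print(normalize_votes(history))
# both voters map to {'a': -1.0, 'b': 1.0}: they agree once their habits are factored out
```

Hiding everyone's statistics is a policy decision rather than something the code needs to do. The normalized scores would simply never be shown to ordinary users.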

phillipsjk
Posts: 1213
Joined: Wed Nov 05, 2008 4:09 pm UTC
Location: Edmonton AB Canada

Re: Why are AI anti-spam filters necessary?

Postby phillipsjk » Thu Oct 28, 2010 10:45 am UTC

DavidRoss wrote:So the only need for continued development of spam AI is to get to, and enter into transactions with, people who have an AI-based spam filter yet are willing to enter into such a transaction with spammers if and when the spam gets past those people's AI-based spam filter. We must assume that this is a nonzero set of people, otherwise spammers would not bother.


I can't find it now, but I remember reading an article by somebody who tried to classify spam hitting a honeypot. He concluded that one type of spam was actually stenography: the recipient is unknown because the messages are sent to millions of hosts.

The point is, not every spam message has a goal of getting you to click on it. If you are not the intended recipient, they may want you to filter it.
Did you get the number on that truck?

DavidRoss
Posts: 96
Joined: Fri Mar 05, 2010 8:04 am UTC

Re: Why are AI anti-spam filters necessary?

Postby DavidRoss » Fri Oct 29, 2010 5:04 am UTC

phillipsjk wrote:
DavidRoss wrote:So the only need for continued development of spam AI is to get to, and enter into transactions with, people who have an AI-based spam filter yet are willing to enter into such a transaction with spammers if and when the spam gets past those people's AI-based spam filter. We must assume that this is a nonzero set of people, otherwise spammers would not bother.


I can't find it now, but I remember reading an article by somebody who tried to classify spam hitting a honeypot. He concluded that one type of spam was actually stenography: the recipient is unknown because the messages are sent to millions of hosts.

The point is, not every spam message has a goal of getting you to click on it. If you are not the intended recipient, they may want you to filter it.


I doubt that much of the volume of spam is of that nature, and again in that case, you would not use spam AI, unless you think your recipient, who wants to get the mail, uses a regular spam filter that doesn't let the message through.

That said, I think the steganography trick is interesting in that it would appear in the spam arena. When NNTP servers were still common, Blacknet used the much simpler approach of posting the message into a newsgroup, which by default got propagated everywhere.

I assume you meant steganography (that by sending one message to millions, eavesdroppers cannot tell who you are talking to as the intended recipient is hiding in a crowd), not stenography.
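The "hiding in a crowd" idea can be sketched with a shared secret. The sender broadcasts every message to everyone and tags it with a keyed HMAC that only the intended recipient can verify, so an eavesdropper sees identical-looking deliveries to every host. This sketch uses Python's standard hmac, hashlib and secrets modules. The function names and the 16-byte tag length are assumptions. It shows recipient anonymity only: a real scheme would also encrypt the payload, which is omitted here.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 16  # bytes of HMAC-SHA256 kept as the marker (an arbitrary choice)

def make_broadcast(key, payload):
    """Prefix the payload with a random nonce and a keyed tag.
    Without the key, the tag looks like random bytes. The payload is NOT
    encrypted in this sketch."""
    nonce = secrets.token_bytes(NONCE_LEN)
    tag = hmac.new(key, nonce + payload, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + tag + payload

def is_for_me(key, message):
    nonce = message[:NONCE_LEN]
    tag = message[NONCE_LEN:NONCE_LEN + TAG_LEN]
    payload = message[NONCE_LEN + TAG_LEN:]
    expected = hmac.new(key, nonce + payload, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)

def read_inbox(key, broadcasts):
    # Every host receives every message; only the key holder finds theirs.
    return [m[NONCE_LEN + TAG_LEN:] for m in broadcasts if is_for_me(key, m)]

alice_key = secrets.token_bytes(32)
bob_key = secrets.token_bytes(32)
crowd = [make_broadcast(alice_key, b"meet at noon"),
         make_broadcast(bob_key, b"buy watches")]
print(read_inbox(alice_key, crowd))  # [b'meet at noon']
```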

elevul
Posts: 7
Joined: Sun Nov 16, 2008 4:33 pm UTC

Re: 0810: Constructive

Postby elevul » Fri Oct 29, 2010 6:24 pm UTC

rpgamer wrote:
murgatroid99 wrote:I really liked this comic, and I wanted to say that ReCAPTCHA actually does something very similar: ReCAPTCHA is the one with 2 words instead of a bunch of random letters. What they do is take two words from a book they are digitizing: one the computer can read and one it cannot, and they don't say which is which. They test whether the human got the readable one right, and if a bunch of people get the same answer for the other one they assume it is the right answer. That's why you sometimes get weird untypeable characters; the computer doesn't know what it says. That way people, including spammers, who answer CAPTCHAs are doing something useful.

Is this the result of such a system?

No, this is:
This post had objectionable content.
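The ReCAPTCHA scheme described in the quoted post can be sketched as follows. Each challenge pairs a control word the system already knows with an unknown word. The user passes or fails on the control word alone, and the unknown word is accepted once enough passing users agree on it. The agreement threshold of 3 and all names here are assumptions, since the post gives no number, and this is not reCAPTCHA's actual code.

```python
from collections import Counter, defaultdict

class TwoWordCaptcha:
    def __init__(self, agreement_needed=3):
        self.agreement_needed = agreement_needed
        self.guesses = defaultdict(Counter)  # unknown word id -> answer counts
        self.solved = {}                     # unknown word id -> accepted text

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes, judged on the control word only.
        The unknown word's answer is recorded only from users who passed."""
        if control_answer.strip().lower() != control_truth.lower():
            return False
        counts = self.guesses[unknown_id]
        counts[unknown_answer.strip().lower()] += 1
        answer, n = counts.most_common(1)[0]
        if n >= self.agreement_needed:
            self.solved[unknown_id] = answer
        return True

c = TwoWordCaptcha()
c.submit("morning", "morning", "img42", "upon")  # passes, 1 vote for "upon"
c.submit("evening", "morning", "img42", "xyz")   # fails, answer ignored
c.submit("Morning", "morning", "img42", "upon")  # passes, 2 votes
c.submit("morning", "morning", "img42", "Upon")  # passes, 3 votes: accepted
print(c.solved)  # {'img42': 'upon'}
```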

Arete
Posts: 228
Joined: Sat Aug 15, 2009 12:13 am UTC

Re: 0810: "Constructive"

Postby Arete » Sat Oct 30, 2010 2:06 am UTC

I actually think this is Randall's commentary on the current state of the forums.

Once a pretty interesting place, busy, intellectually "above average" and worth a read.

Now dead, diseased, moderated into oblivion and boring as hell. Oh, and like the Moon: finding a 51-gallon quotient of IQ is now something to celebrate, instead of the deep oceans it once had.




But there you go.

TheGrammarBolshevik
Posts: 4878
Joined: Mon Jun 30, 2008 2:12 am UTC
Location: Going to and fro in the earth, and walking up and down in it.

Re: 0810: "Constructive"

Postby TheGrammarBolshevik » Sat Oct 30, 2010 2:45 pm UTC

Arete wrote:moderated into oblivion

lol?
Nothing rhymes with orange,
Not even sporange.

