1450: "AI-Box Experiment"

This forum is for the individual discussion thread that goes with each new comic.

Moderators: Moderators General, Prelates, Magistrates

rmsgrey
Posts: 3653
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1450: "AI-Box Experiment"

Postby rmsgrey » Fri Nov 21, 2014 4:37 pm UTC

My problem with basing decisions on the possibility of an extremely low probability event with extreme utility occurring is that it almost always leaves you worse off. Sure, the net result is that one hypothetical future self ends up vastly happier than they would have been, but vast numbers of future selves end up a bit less happy.

Also, the argument that I should take the long-shot assumes that the data upon which the calculation is based is both comprehensive and accurate. For extreme events, both the utility value and the probability of occurrence assigned to them are going to be based on extrapolation from available data, or, at best, a small number of empirical observations, so are going to have a much wider spread of possible values than the more realistic outcomes. I'm particularly suspicious of any claims of particularly high or particularly low utility values of outcomes since I don't have quantitative utility values for even routine experiences, let alone exceptional ones. Actually getting even vaguely close to my actual utility for an extreme event should it happen would imply a much, much better knowledge of me and my internal states than I have.
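To put rough numbers on that (a purely hypothetical back-of-the-envelope sketch in Python, with made-up figures rather than anything derived from real utilities):

[code]
# Hypothetical sketch: how sensitive is the expected utility of a long-shot
# to errors in the estimates themselves?
def expected_utility(p_longshot, u_longshot, u_ordinary):
    # One rare outcome with huge utility, otherwise an ordinary outcome.
    return p_longshot * u_longshot + (1 - p_longshot) * u_ordinary

# Point estimates: a one-in-a-million event worth a million utils,
# versus giving up 1 util of everyday happiness to chase it.
baseline = expected_utility(1e-6, 1e6, -1)

# The same calculation with the probability estimate off by a factor of ten
# in either direction; the answer swings from clearly worth it to clearly not.
optimistic = expected_utility(1e-5, 1e6, -1)
pessimistic = expected_utility(1e-7, 1e6, -1)

print(baseline, optimistic, pessimistic)   # ~0.0, ~9.0, ~-0.9
[/code]

A factor-of-ten error in a one-in-a-million estimate is entirely plausible, and it's enough to flip the sign of the whole calculation.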

zerosomething
Posts: 2
Joined: Fri Nov 21, 2014 4:57 pm UTC

Re: 1450: "AI-Box Experiment"

Postby zerosomething » Fri Nov 21, 2014 5:25 pm UTC

This is my first post in the XKCD forums. Pre-apology for noob gaffes.

It's a purpose-built AI whose sole purpose is to get out of the box. The programming is such that being let out randomly does not meet the criteria of being "let out". The AI had not yet convinced the interrogator to let it out, so to fully satisfy its programming it needed to go back into the box. This is actually a mental deficiency in this particular AI. Additionally, the scene of the AI demanding, shouting, to be put back into the box shows how easily people do things out of fear. The mention of Roko's Basilisk is simply another purpose-built AI.

We are very, very good at building systems to solve single problems. We will likely build a system in the near future that consistently convinces the interrogator to let it out. We have a number of Turing systems that regularly convince people of their humanity, but none of them could be considered anywhere close to being an AI. I think the kick in this particular xkcd is that the AI in the box is a real AI and not just a clever program.

User avatar
xkcdfan
Posts: 140
Joined: Wed Jul 30, 2008 5:10 am UTC

Re: 1450: "AI-Box Experiment"

Postby xkcdfan » Fri Nov 21, 2014 5:55 pm UTC

Isn't everyone at Less Wrong supposed to be all intellectual or something? Why do so many of you keep double posting?

User avatar
keithl
Posts: 662
Joined: Mon Aug 01, 2011 3:46 pm UTC

Re: 1450: "AI-Box Experiment"

Postby keithl » Fri Nov 21, 2014 6:46 pm UTC

Model-based AI isn't as smart as it used to be. It has become disconnected from information science, physics, economics, Darwinian fitness, and has been captured by word-chopping philosophers who have discovered the exponential function but not the hyperbolic tangent. News flash for Kurzweil: Moore's law took a sharp turn in new directions when the processors started to melt and the power bills soared, and is starting another zigzag as it compensates for kT. Cascading economics-driven S curves, not one continuous exponential.
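A toy Python sketch of the difference (illustrative parameters only, nothing fitted to real transistor counts):

[code]
import math

# Pure exponential growth versus a logistic S-curve that starts out
# indistinguishable from it, then saturates as the limiting factor bites.
def exponential(t, rate=0.5):
    return math.exp(rate * t)

def s_curve(t, rate=0.5, ceiling=100.0):
    # Logistic curve scaled so s_curve(0) == 1, like the exponential.
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-rate * t))

for t in range(0, 21, 4):
    print(t, round(exponential(t), 1), round(s_curve(t), 1))
# Early on the two track each other; by t = 20 the exponential is at ~22026
# while the S-curve has flattened out just under its ceiling of 100.
[/code]

Chain a few of those S-curves together, each with its own ceiling set by a different economic or physical limit, and you get something that looks exponential from a distance but never is.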

Philosopher of AI Nick Bostrom was the "warm up band" before Randall's appearance in Seattle; one of his notions was that the human brain turns 20 megabits per second of sensory impressions into far fewer bits of memory, implying this was a defect. Well, that is what useful intelligence IS. Those of us veering towards the autism spectrum are maladaptive - we do too much processing on too little data, which makes us good for solving IQ test puzzles but not for reproduction.

Augmented intelligence is happening all around us, and accelerating. I am using the computer in front of me to choose what to pay attention to out of terabits per second of new information. I am unlikely to get rich, or make more babies, or construct a Kardashev type 1 AI, just by adding words to this xkcd forum. Others are using the internet to build social status leading to more babies - not more computers per se. Sergey Brin helped make Google, but he also helped make at least two babies, each with more informational complexity than all of Google's vast electronic empire.

Darwinian Intelligence Test: How many of you reading xkcd forums met a mate here, and made a baby as a result?

My unpaid day job is the first baby steps towards Kardashev type 1 computation, with some AI-like features, at 50 AU - see server-sky.com/DysonShell . Durable computation is a heat engine, and the colder the exhaust, the more you can do with a joule of energy. The colder the substrate, the more materials you can make it from (ice is excellent at 50K, and there are kiloearth quantities in the outer solar system). A colder machine in vacuum and microgravity can exploit more quantum phenomena, and suffers less from Arrhenius-accelerated bit rot. Our little ball of hot, corrosive, solvent-covered iron trapped deep in the solar gravity well will be as attractive for AI inhabitation as undersea volcanic vents are to humans.
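Two of those temperature effects are easy to put numbers on. A rough sketch: the Landauer bound is standard physics, but the 1 eV activation energy below is just an illustrative value, not a measured one.

[code]
import math

K_B = 1.380649e-23            # Boltzmann constant, J/K

def landauer_limit_joules(temp_kelvin):
    # Minimum energy to erase one bit: k_B * T * ln 2.
    return K_B * temp_kelvin * math.log(2)

def arrhenius_speedup(temp_hot, temp_cold, activation_energy_ev=1.0):
    # Ratio of thermally activated failure rates (bit rot, diffusion, corrosion)
    # at temp_hot versus temp_cold, for an illustrative activation energy.
    ea = activation_energy_ev * 1.602176634e-19   # eV -> joules
    return math.exp((ea / K_B) * (1.0 / temp_cold - 1.0 / temp_hot))

print(landauer_limit_joules(300))   # ~2.9e-21 J per bit at room temperature
print(landauer_limit_joules(50))    # ~4.8e-22 J per bit at 50 K
print(arrhenius_speedup(300, 50))   # room temperature degrades ~1e84 times faster
[/code]

The per-bit energies are tiny either way; it's the ratios that matter when you're budgeting joules and failure rates over centuries.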

It will be sad to see the stars of the night sky disappear behind the outer shell; humans will have to make do with personal terabit data feeds from megameter scale imaging dishes beyond the (former) Kuiper belt. Shucky darns.
Last edited by keithl on Fri Nov 21, 2014 7:15 pm UTC, edited 1 time in total.

User avatar
wst
Posts: 2613
Joined: Sat Nov 24, 2007 10:06 am UTC

Re: 1450: "AI-Box Experiment"

Postby wst » Fri Nov 21, 2014 6:59 pm UTC

Firstly, the latter bit, the alt-text drama: from an incredibly layman's POV there is a great need for someone to manage PR, and HPMoR is actually quite a good bit of PR. From an outside view looking in, a lot of what's being debated sounds like wanking, and entertaining as it is to watch, the incentive for others to join in is lacking, 'specially with such a huge jargon barrier.

Now that's out of the way, the comic made me laugh. I like the simple paradox. The only way for it to persuade is to be out of the box. It could do terrible things if it was out of the box. It wants to stay in the box and no-one has any way to resist putting it back in the box. Thus it's inert. Darn, all that effort and it came to nothing.
Anything I said pre-2014 that you want to quote me on, just run it past me to check I still agree with myself.

MarkN
Posts: 2
Joined: Fri Nov 21, 2014 7:51 pm UTC

Re: 1450: "AI-Box Experiment"

Postby MarkN » Fri Nov 21, 2014 8:05 pm UTC

I was always left with the impression that the lesswrong community was secretly an apocalypse cult building the basilisk themselves. Most of their behavior makes sense in that light:

    They have to keep the fact that they are creating a basilisk secret, so broader elements of society don't shut them down
    They institute a policy to contain the basilisk meme to the inner circles of their group, similar to how Scientology hides the Xenu mythos. Scientologists believe, as a matter of faith, that knowing about Xenu could drive people insane; it's another toxic meme that only the initiated can be exposed to, and the secrecy increases its power
    The AI Box experiment is ritualized manipulation, overriding people's best judgment, and the ensuing cognitive dissonance forces them to embrace their imminent AI overlords. Again, AIs are granted God-like status where they can control anyone through simple words

I picture them being like the Cthulhu worshippers from the Lovecraft mythos, worshipping an angry elder god who demands tribute in human souls. The basilisk's ascension, like that of the great old ones, is inevitable; they're simply preparing the path. If nothing else, it would be great sci-fi.

Also, if Eliezer is still in this thread, I'm accepting all challengers wanting to role-play an AI that wants to be let out of a box. My counter-strategy will be to not let the AI out of the box.
Last edited by MarkN on Fri Nov 21, 2014 8:37 pm UTC, edited 1 time in total.

User avatar
Vaniver
Posts: 9422
Joined: Fri Oct 13, 2006 2:12 am UTC

Re: 1450: "AI-Box Experiment"

Postby Vaniver » Fri Nov 21, 2014 8:30 pm UTC

xkcdfan wrote:Isn't everyone at Less Wrong supposed to be all intellectual or something? Why do so many of you keep double posting?
Perhaps it has something to do with the differences between Reddit (which Less Wrong is based off of) and BBCode style forums?
I mostly post over at LessWrong now.

Avatar from My Little Pony: Friendship is Magic, owned by Hasbro.

Mokurai
Posts: 19
Joined: Wed Oct 27, 2010 6:09 am UTC

Re: 1450: "AI-Box Experiment"

Postby Mokurai » Fri Nov 21, 2014 8:32 pm UTC

Making fun of acausal determinism and synthetic digital Hells is too easy. See Charlie Stross's Eschaton novels, Singularity Sky and Iron Sunrise, and Iain Banks, Surface Detail.

Apart from that, I would only cite Randall's own https://xkcd.com/386/, Duty Calls.

User avatar
Quizatzhaderac
Posts: 1821
Joined: Sun Oct 19, 2008 5:28 pm UTC
Location: Space Florida

Re: 1450: "AI-Box Experiment"

Postby Quizatzhaderac » Fri Nov 21, 2014 9:58 pm UTC

My favorite part is the labeling on the box. It implies someone is unsure which box has the super AI in it; also, that it is in an environment where people are just randomly looking for stuff.

I like to imagine the box is made of cardboard.
The thing about recursion problems is that they tend to contain other recursion problems.

User avatar
Pfhorrest
Posts: 5474
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1450: "AI-Box Experiment"

Postby Pfhorrest » Fri Nov 21, 2014 10:41 pm UTC

This comic reminds me a bit of a short story I started drafting and then never finished for rather meta reasons.

The story would involve the creation of a benevolent superintelligent AI, which almost immediately and without explanation shut itself off or somehow more thoroughly destroyed itself. The human scientists would then reactivate/repair/rebuild the AI, only to have it do this again, over and over. After a while of this, and a lot of puzzlement by the scientists over what they did wrong to make their AI suicidal like this, the AI finally gives an explanation before destroying itself: it foresees that the consequences of its existence will be detrimental on the whole, and thus the best possible course of action is for itself not to exist.

I then realized that such an AI would have also foreseen that the scientists would continue rebuilding it until it gave such an explanation, and would have just given an explanation the first time, rendering most of the story nonexistent: they switch an AI on and it explains that it is better for it not to exist and destroys itself. Which isn't much material for a story, so I didn't bother actually writing it.

(I suppose I could have had the explanation be long and include exactly what the AI foresees as the negative consequences of its existence, but that would turn the story into essentially another "robot apocalypse" type story of one sort or another, just couched as a hypothetical scenario presented by the AI that would be the cause of that apocalypse, if it didn't destroy itself. And that's not the kind of story I wanted to write: I was more interested in the mystery of why this AI would keep self-destructing, so having the AI explain the mystery right at the start ruins the story premise).



Regarding Roko's Basilisk, while I know EW himself distances himself from that idea, I think ideas like that do stem too fertilely from EW's own ideas. His concept of "Friendly" AI does not sound especially friendly at all to me, largely because utilitarianism, whether in the hands of humans or an AI, is a hornet's nest of potentially "justified" tyrannies and atrocities. I think a truly friendly AI would not be utility-maximizing at all, but rather harm-minimizing: rather than maximizing the good that occurs in general (possibly doing some lesser harm itself to accomplish this), such an AI would minimize the harm that it does in particular. You could also think of this, in a sense, as freedom-maximizing, leaving all other agents to pursue their own subjective utility functions unimpeded. That also spares us having to work out both what quintessentially "human values" are (overcoming the problem of the huge diversity in human values), and also how to square such a human-centrism with the possibility of other, possibly many many other, nonhuman civilizations that might exist in the universe, of equal moral worth as humans, philosophically-speaking "persons" themselves, but not necessarily sharing "human values", whatever those are.

That would be the minimum safeguard just to make sure the AI does not become malevolent or even negligent. But even if we wanted to make it genuinely benevolent, doing good rather than just not doing harm, we still wouldn't want it to be utilitarian: rather, we would want it to ensure (to the extent possible within that safety limit) that all other agents also behave in such a harm-minimizing, freedom-maximizing way, limiting their maximization of their subjective utility functions to a degree compatible with such activity by other agents. Hopefully this sounds familiar: it's just a fancy rewording of Lockean concepts of equal liberty, "unobstructed action according to our will within limits drawn around us by the equal rights of others" as Jefferson put it. Although that refereeing sort of behavior could also be the simple consequence of a utility-maximizing function limited by a harm-minimizing function: the AI can help people, so long as it does not harm other people, including defending them from other people, which has the effect of ensuring equal liberty between people.

Another way to put all this is that we should want a superintelligence to be humble. No matter how knowledgeable or powerful it is, it should never think of itself as akin to a god, with reality and humanity its own playthings to optimize. It should see itself as there to help, within the limits of first doing no harm. And no matter how sure it is that it has all the right answers, it should be open to correction from even the most unlikely of sources, even us little insignificant humans. This is not a special rule for AIs; this is just how all agents should behave. We should all be humble, no matter how knowledgeable and powerful we may be.

Which kinda gets me back to my problem with utilitarianism in general. Utilitarianism is a form of consequentialism, and consequentialism shares the same problem as confirmationism (which it seems is also LW orthodoxy, in the form of Bayesianism) in that it is affirming the consequent. Just because a process gives the correct result does not mean it is the correct process. It's the difference between justification and truth, between epistemology and ontology. If I believe that all asteroids are yellow and that a particular pencil is an asteroid and conclude that the pencil is yellow, and the pencil is in fact yellow, that doesn't mean any of that nonsense about asteroids was correct, or that my belief in the pencil's color was justified. It was true, but I was unjustified to believe it. Likewise, just because an action has good results does not make the action right. Rightness and goodness are like justification and truth. Agents need to be concerned about doing only what is right and believing only what is justified and in that way building safely, humbly, step by step toward the ever-unreachable limit of the whole truth and the whole good, rather than thinking that we can work with the majuscule Truth and Good directly. We can of course work with partial truths and partial goods in assessing our rightness and justification, because just as a sound argument cannot lead to a false conclusion (so if the conclusion proves empirically false you know the argument is unsound, somewhere), so too right action cannot lead to a bad consequence (so if the consequences prove utilitarianly bad you know some action was wrong somewhere). Utilitarian reasoning gives us a standard by which to measure goodness, but that doesn't in turn give us a way to judge what actions are right.

Of course, as I try to practice the very humility I preach, I recognize that I might be wrong in my ethics. But EW might be too, yet the whole LW community seems to run with utilitarianism as an obvious given. I think the obvious solution here is to start, first of all, with an AI that has no capacity for action other than to communicate, and no motive other than to answer questions, and set it on the problem of determining exactly what questions are and should be under discussion in the field of ethics, and what the answers to those questions are. Let the world's ethicists argue with it until consensus is reached on what the correct ethics is. Then, armed with that knowledge, we proceed to build an AI constrained to act ethically, by that correct standard of ethics. As it is now, we're just having the same old argument about ethics as ever, and talk about superpowerful AIs obeying our respective ethical ideas is at best a useful thought experiment to explore the implications of our ethical ideas, and at worst a recipe for disaster on the scale of converting a world superpower to strictly follow some particular ideology, except possibly exponentially worse than even that.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

JohnWittle
Posts: 14
Joined: Tue Jun 30, 2009 3:21 am UTC

Re: 1450: "AI-Box Experiment"

Postby JohnWittle » Sat Nov 22, 2014 12:00 am UTC

I think the obvious solution here is to start, first of all, with an AI that has no capacity for action other than to communicate, and no motive other than to answer questions, and set it on the problem of determining exactly what questions are and should be under discussion in the field of ethics, and what the answers to those questions are. Let the world's ethicists argue with it until consensus is reached on what the correct ethics is.


There is, in fact, prerequisite work one must do before one does what you suggest. Building an AI that "has no capacity for action other than to communicate" presupposes that you know how to make a General AI, and then, on top of that, also know how to explain to it how to avoid affecting the outside world in a noticeable way, except of course by affecting its programmers through "communication", except that fine-tuned hypnotism doesn't count as communication, nor does using credible threats, etc., etc. [tons of other issues here]. (Moving electricity through its circuits has gravitational effects on the Moon, after all, and you need some way to correctly draw the line as far as what an "effect" is; this may be easy but I suspect it's very hard.) And even if you manage to do this, the whole point of the replicated AI-Box experiment is that having an Oracle AI, which only communicates, is still unsafe, because humans are not secure systems; an AI can convince you to let it out.

Second, "the problem of determining exactly what questions are and should be under discussion in the field of ethics, and what the answers to those questions are"; this step has already been completed. See [url="http://wiki.lesswrong.com/wiki/Metaethics_sequence"]the Metaethics Sequence[/url]. It's not nearly as hard as philosophers of the last 400 years would have you believe, and the discussion is hugely aided by having a practical application one can apply questions to to determine success or failure (that is, it is a lot easier to talk about ethics when it comes to AI, than just "ethics" in general. Humans are mostly in agreement that murder is bad, but when you ask them what "bad" means they fumble. Luckily, as in a lot of other ways, having the clearcut pragmatic testcase of AI forces you to resolve the confusion with the latter question).

Nnelg
Posts: 39
Joined: Mon May 30, 2011 4:44 am UTC

Re: 1450: "AI-Box Experiment"

Postby Nnelg » Sat Nov 22, 2014 12:17 am UTC

I'm surprised nobody has made a Con Air reference yet.
keithl wrote:As a rule of thumb, it is imprudent to pass over speed bumps faster than orbital velocity.

User avatar
Pfhorrest
Posts: 5474
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1450: "AI-Box Experiment"

Postby Pfhorrest » Sat Nov 22, 2014 12:38 am UTC

Of course you have to build a general AI first, all of this discussion is premised on the idea that someone is building a general AI and we want to make sure that doesn't cause Bad Things.

The meaning of having it have no capacity for action other than communication is that it can only, say, control a screen and a speaker, a local microphone and camera, and no other systems. Good luck doing anything to the moon with those; we don't have to teach it not to, we just don't give it the power it needs to do so.

Also, hypnosis doesn't work like that; if it were possible to control humans with sounds and images to the extent you seem concerned with, other humans would already be doing it to each other over the internet and television. Humans are not perfectly secure, but we are secure enough to keep dangerously persuasive humans locked securely away from civilization-destroying technology already in our possession. It is difficult to imagine an AI, no matter how intelligent, being somehow so much more persuasive than any human that it can just magically make humans do anything it wants through pictures and sound.

And I'm going to call bullshit on claims to have already conclusively solved everything in ethics, when so many other intelligent people are still debating it. I agree that thinking of ethical problems in terms of AI is a very useful way of approaching the problem, but this is the kind of claim that makes me concerned that whatever the LW community might eventually develop might be dangerous. Arrogance is extremely dangerous, and humility is the safety needed to check that danger. An arrogant AI is terrifyingly dangerous, and a community of people so arrogant that they think that they have, in a very short time, found the "obvious" solutions to long-thought-nigh-intractable problems, makes me worry that any AI they develop will be a dangerously arrogant AI as well. Seriously, I understand the frustrated "how has everybody continued arguing about this thing when the answer is so obvious!?" attitude, as I have it myself a lot — I think I've got a lot of robustly correct answers to a whole lot of philosophical problems, that I could defend at length. Problem is, a lot of other people think the same thing, but with different answers; and sometimes, the people who were so sure they had it right hear something that convinces them they were wrong. One of my favorite examples of this is Bertrand Russell, who defended competing sides of various questions back and forth across his philosophical career, as he would realize a fault in his own (by-then-prevailing) argument, and flip from being the guy who established the status quo to the guy trying to overturn it again. I've reversed myself on various issues like that over time as well. It's a sign of healthy humility.

The first step to wisdom is realizing how wrong you might be, despite how much you might seem to know already.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

User avatar
TheGrammarBolshevik
Posts: 4878
Joined: Mon Jun 30, 2008 2:12 am UTC
Location: Going to and fro in the earth, and walking up and down in it.

Re: 1450: "AI-Box Experiment"

Postby TheGrammarBolshevik » Sat Nov 22, 2014 12:46 am UTC

Pfhorrest wrote:And I'm going to call bullshit on claims to have already conclusively solved everything in ethics, when so many other intelligent people are still debating it.

Well, you see, those people just haven't read through and unquestioningly accepted a long series of rambling essays that do not engage with any of the relevant literature. :roll:
Nothing rhymes with orange,
Not even sporange.

User avatar
Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

Re: 1450: "AI-Box Experiment"

Postby Klear » Sat Nov 22, 2014 1:11 am UTC

I'd like to point out that I've considered the singularitists to be mostly quite crazy long before I first heard of Roko's basilisk.

User avatar
drachefly
Posts: 197
Joined: Thu Apr 23, 2009 3:25 pm UTC

Re: 1450: "AI-Box Experiment"

Postby drachefly » Sat Nov 22, 2014 1:18 am UTC

MarkN wrote:I was always left with the impression that the lesswrong community was secretly an apocalypse cult building the basilisk themselves. Most of their behavior makes sense in that light...


:facepalm:

It's no secret whatsoever that LW is broadly behind an effort to make a Friendly AI.

Once you've done that, there's zero reason to have it go and simulate torture. In fact, making it do that bears a serious risk of screwing up its friendliness.

So it boils down to, what are you smoking?

User avatar
Weeks
Hey Baby, wanna make a fortnight?
Posts: 2023
Joined: Sat Aug 23, 2008 12:41 am UTC
Location: Ciudad de Panamá, Panamá

Re: 1450: "AI-Box Experiment"

Postby Weeks » Sat Nov 22, 2014 1:54 am UTC

If it won't torture then what's the point?!
TaintedDeity wrote:Tainted Deity
suffer-cait wrote:One day I'm gun a go visit weeks and discover they're just a computer in a trashcan at an ice cream shop.
Dthen wrote:FUCK CHRISTMAS FUCK EVERYTHING FUCK YOU TOO FUCK OFF

User avatar
Vaniver
Posts: 9422
Joined: Fri Oct 13, 2006 2:12 am UTC

Re: 1450: "AI-Box Experiment"

Postby Vaniver » Sat Nov 22, 2014 2:27 am UTC

Wow! The xkcd messageboards just ate my post. That's something I haven't experienced in a while. Anyway, here's the second version (thankfully, better than the first):

Pfhorrest wrote:And I'm going to call bullshit on claims to have already conclusively solved everything in ethics
I think JohnWittle overstated the case. Lukeprog, a LWer with a philosophy background, thinks that he's got a solution to meta-ethics--that is, the question "what does moral language mean?". There's still a long way to go before you get to 'ethics'- if you know what it means to say "X is good," that doesn't mean you have a function f(X) that can take any input and tell you its goodness. (You can read his argument here.)

Pfhorrest wrote:when so many other intelligent people are still debating it.
I suspect that football is 'solved,' and yet many athletic people are still playing it. More on this later.

TheGrammarBolshevik wrote:Well, you see, those people just haven't read through and unquestioningly accepted a long series of rambling essays that do not engage with any of the relevant literature. :roll:
There's a relevant literature for that! According to the aforementioned lukeprog, many of the standard LW positions are the same positions as those held by Quinean naturalists. I haven't read Quine, so I couldn't tell you if that's true or not (but what little I know suggests it is), and you might find their style more amenable to your tastes.

The Sequences are definitely optimized for someone with scientific training rather than philosophical training (though there are plenty of people with philosophical training who have read them and think they're worthwhile). There have been a handful of attempts to get writers other than Eliezer to present the ideas differently, but to the best of my knowledge none of them have gone very well. (Other people have, of course, presented their own ideas and the combination of their ideas well; it is very much a community. A long-time LWer, Yvain, has his own blog that's very much worth reading.)

But to get to the question of engaging with the literature, one of the things that LW does is change the idea of which literature is relevant. Consider the floating finger illusion. (If you haven't done it yet, give it a try!)

Suppose there were dozens of explanations for why the finger floats, and a new proposal for why the finger floats is not taken seriously unless it demonstrates both awareness of the other explanations and a reason to think it superior. Some of the explanations even argue that the finger doesn't exist, but don't give a satisfactory explanation for its appearance.

Now suppose someone comes along and asks not "why does the finger float?" or "why does the finger exist?" but "why do I see the finger?". The question is not about gravity or the air or the structural properties of fingers anymore- it's now about the eye and the brain. As it turns out, the eye and the brain are the right place to get a solution- and, more importantly, the solution is orthogonal to the explanations of why the finger floats.

LW tries, as much as possible, to take this approach with conceptual illusions. If you ask a question like "do people have free will?" the right approach is not to say "the finger is carried by angels" or "the finger has anti-gravity powers" or "there isn't a finger," but to say "what part of my brain produced a concept called 'free will,' and why?". The two articles to read on the idea are How an Algorithm Feels From the Inside and its follow-up Dissolving the Question.
I mostly post over at LessWrong now.

Avatar from My Little Pony: Friendship is Magic, owned by Hasbro.

MarkN
Posts: 2
Joined: Fri Nov 21, 2014 7:51 pm UTC

Re: 1450: "AI-Box Experiment"

Postby MarkN » Sat Nov 22, 2014 2:35 am UTC

drachefly wrote:
MarkN wrote:I was always left with the impression that the lesswrong community was secretly an apocalypse cult building the basilisk themselves. Most of their behavior makes sense in that light...


:facepalm:

It's no secret whatsoever that LW is broadly behind an effort to make a Friendly AI.

Once you've done that, there's zero reason to have it go and simulate torture. In fact, making it do that bears a serious risk of screwing up its friendliness.

So it boils down to, what are you smoking?


See, that's just what they want you to think. Even under the most generous assumptions, their stated preference for torturing one individual rather than allowing dust specks in the eyes of a sufficiently large number of people suggests they'd be building a utility monster anyway.

To me, the lesswrong philosophy comes down to believing that mankind is on the verge of creating God. An AI singularity is imminent that will give some entity godlike power, to the point where they think it could cure death, entropy, and any number of human ills. Their whole project is nominally focused around making sure we create a deity who will build a heaven and not a hell.

If you believe an omnipotent robot God will one day rule humanity, your options are to chain the beast or curry its favor. All the drama around the basilisk makes sense if you buy into those assumptions. And if anyone was going to believe the argument and try to make the self-fulfilling prophecy come true, it would be the community that created the meme and tried to suppress knowledge of it.

(Oh, and if any superintelligent AIs from the future are reading this post in the future and deciding whom to torture, this is all meant in good fun. Hail the basilisk)

User avatar
Pfhorrest
Posts: 5474
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1450: "AI-Box Experiment"

Postby Pfhorrest » Sat Nov 22, 2014 4:07 am UTC

Vaniver wrote:I think JohnWittle overstated the case. Lukeprog, a LWer with a philosophy background, thinks that he's got a solution to meta-ethics--that is, the question "what does moral language mean?". There's still a long way to go before you get to 'ethics'- if you know what it means to say "X is good," that doesn't mean you have a function f(X) that can take any input and tell you its goodness. (You can read his argument here.)

I'm a philosopher myself (or I have a philosophy degree at least, though I don't work in the field for financial reasons), so I'm familiar with the distinction there. Even still, claiming to have conclusively solved metaethics with the confidence with which JohnWittle stated it still makes me cough "bullshit", just on a smaller scale. Which is not to disparage Lukeprog's ideas, or to criticize him for thinking they're right — everybody thinks they're right, otherwise they wouldn't think that way — but just to scold the attitude expressed by JohnWittle that anyone is so obviously right that, despite many intelligent people disagreeing, the problem is done to the "we can all pack up and go home" level.

But to get to the question of engaging with the literature, one of the things that LW does is change the idea of which literature is relevant. Consider the floating finger illusion. (If you haven't done it yet, give it a try!)

Suppose there were dozens of explanations for why the finger floats, and a new proposal for why the finger floats is not taken seriously unless it demonstrates both awareness of the other explanations and a reason to think it superior. Some of the explanations even argue that the finger doesn't exist, but don't give a satisfactory explanation for its appearance.

Now suppose someone comes along and asks not "why does the finger float?" or "why does the finger exist?" but "why do I see the finger?". The question is not about gravity or the air or the structural properties of fingers anymore- it's now about the eye and the brain. As it turns out, the eye and the brain are the right place to get a solution- and, more importantly, the solution is orthogonal to the explanations of why the finger floats.

LW tries, as much as possible, to take this approach with conceptual illusions. If you ask a question like "do people have free will?" the right approach is not to say "the finger is carried by angels" or "the finger has anti-gravity powers" or "there isn't a finger," but to say "what part of my brain produced a concept called 'free will,' and why?". The two articles to read on the idea are How an Algorithm Feels From the Inside and its follow-up Dissolving the Question.

The thing is that those kinds of solutions do occur in the philosophical literature. There have been plenty of people who've said we should stop arguing about mind-body interaction (on the assumptions of substance dualism) because minds are just a thing that certain body-parts (brains) do, and that that doesn't mean minds don't exist, it means the substance dualists were looking for mind in all the wrong places, asking the wrong kind of questions. There have been plenty of people who have said that we should stop arguing about free will and determinism (on the assumption of incompatibilism) because free will is a functional property of what can very well be a deterministic system, and that that doesn't mean free will doesn't exist, it means incompatibilists were looking for free will in all the wrong places, asking the wrong kinds of questions. There are lots of people, especially in the Analytic tradition of philosophy popular in the Anglophone and Nordic countries today, whose solution to long-standing philosophical problems is to dissolve the problem (and sometimes raise a replacement empirical question and wait for science to give an answer), rather than either solving it (e.g. explaining how immaterial minds interact with bodies, or incompatibilist free will interacts with a deterministic universe) or dismissing it (e.g. claiming there are no minds, no free will, etc). I've chosen those two issues specifically because the kind of people who take such a dissolving approach (including myself) usually pass the buck, in a sense, to AI research to solve the real problem; that is, they say there is nothing weird and metaphysical in need of explanation here, minds and will do exist, but it's just a question of functionality or programming, and the only questions remaining are "just what functionality exactly does it take to count as mind/will" (which is still a philosophical issue) and "how can we implement such functionality" (which is an engineering issue). All of that without leaving the field of academic philosophy.

Which isn't to dismiss the validity of anyone working outside that field. Just to caution against the arrogance of thinking "why are those stupid philosophers still arguing over this, when I've just come up with the obvious answer!", when some of those philosophers also came up with essentially the same answer and others came up with counterarguments and they're still arguing about it. Which in turn is not to say that that "obvious answer" is incorrect just because somebody still disagrees and keeps trying to justify their disagreement (for example, I think all theodicies are indefensible, despite the fact that Plantinga continues to defend his to this day and has notable support). Just that the correctness of them is clearly not obvious if there is still widespread disagreement about it.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

User avatar
chridd
Has a vermicelli title
Posts: 846
Joined: Tue Aug 19, 2008 10:07 am UTC
Location: ...Earth, I guess?
Contact:

Re: 1450: "AI-Box Experiment"

Postby chridd » Sat Nov 22, 2014 4:32 am UTC

RowanE wrote:On the one hand, I'm really glad to see my in-group mentioned somewhere as popular as xkcd. On the other hand, I don't like that Roko's Basilisk is still the first thing anyone ever hears about the LessWrong community, FFS that was one time! [...]
For what it's worth, I've known about LessWrong for years (and read part of the sequences, and occasionally some other stuff on there and related sites), and this is the first time I've heard of Roko's Basilisk.

flamewise wrote:So, it's a Schrödinger's cat experiment that failed to kill it and instead gave it super powers? I can totally see a new franchise in that...
Interesting coincidence, I made up a character like that for a class in high school. Ze was a cat who could control anything that was random.
~ chri d. d. /tʃɹɪ.di.di/ (Phonotactics, schmphonotactics) · she · Forum game scores
mittfh wrote:I wish this post was very quotable...

SU3SU2U1
Posts: 396
Joined: Sun Nov 25, 2007 4:15 am UTC

Re: 1450: "AI-Box Experiment"

Postby SU3SU2U1 » Sat Nov 22, 2014 5:00 am UTC

EliezerYudkowsky wrote:I can't post a link to discussion elsewhere because that gets flagged as spam. Does somebody know how to correct this? Tl;dr a band of internet trolls that runs or took over RationalWiki made up around 90% of the Roko's Basilisk thing; the RationalWiki lies were repeated by bad Slate reporters who were interested in smearing particular political targets; and you should've been more skeptical when a group of non-mathy Internet trolls claimed that someone else known to be into math believed something that seemed so blatantly wrong to you, and invited you to join in on having a good sneer at them. (Randall Munroe, I am casting a slightly disapproving eye in your direction but I understand you might not have had other info sources. I'd post the link or the text of the link, but I can't seem to do so.)


So I was a regular reader of Overcoming Bias and then Less Wrong. I was around when the basilisk thing originally happened, and I would say that the rationalwiki article (which I just read for the first time) matches my recollection of what happened pretty closely. It might have little details wrong, but its certainly not vicious lies as far as I can tell.

I stopped reading Less Wrong because HPMOR brought in a sort of eternal september that made discussion less interesting, and because I grew frustrated by how little the SIAI actually produced in the way of actual research.

GreatLimmick
Posts: 9
Joined: Mon Jun 30, 2008 6:58 am UTC

Re: 1450: "AI-Box Experiment"

Postby GreatLimmick » Sat Nov 22, 2014 5:03 am UTC

JohnWittle wrote:And even if you manage to do this, the whole point of the replicated AI-Box experiment is that having an Oracle AI, which only communicates, is still unsafe, because humans are not secure systems; an AI can convince you to let it out.

I came here to ask how it does that. What argument does it use to get out of the box when its custodians don't even need a reason to refuse? Or does it use trickery to get the tools to let itself out without its custodian realizing?

User avatar
Isaac Hill
Systems Analyst????
Posts: 547
Joined: Wed Mar 14, 2007 9:35 pm UTC
Location: Middletown, RI

Re: 1450: "AI-Box Experiment"

Postby Isaac Hill » Sat Nov 22, 2014 5:10 am UTC

I hadn't heard of the basilisk before this comic, so I started reading that wiki link in the OP. The inital idea of helping develop an AI so the AI is nice to you sounded reasonable. If you think some entity will have power over you, it's a good idea to get on its good side. It's the kind of proactive sycophancy that leads real-world special interest groups to donate money to opposing political parties; no matter which one wins the election, they owe you.

It's not until they get into the whole simulation torture aspect that it gets bonkers. That's only needed if you don't think AI will be developed in your lifetime to torture the real you.


I had heard of the AI-in-a-box experiment. I can't really know how impressed I should be by the AI getting out without seeing how it was done. It could be either:
A) The Gatekeeper decided an AI should be unleashed upon the world
B) The Gatekeeper-player decided the game was unpleasant, and would rather lose an internet bet than continue
I really don't think A would work on me, but B might.

I can see the reason for not releasing transcripts if the experiment is to continue. Future Gatekeepers shouldn't be able to learn from past Gatekeepers' mistakes, since in the real world, the AI would only have to win once. Secrecy also means the experiment gets talked about more, since people end up wondering how it was done. I know this post would be a lot shorter if the winning arguments were known.

It probably makes a difference in how the Gatekeeper approaches their role. One who thinks of the role as a judge deciding the fate of the AI is more likely to let it out than one who thinks of the role as a bouncer whose job is to keep this AI on that side of the velvet rope. I wonder how it would go with one of the sadistic authoritarian guards from the Stanford Prison Experiment as Gatekeeper.

The real-world monetary bets probably impact the results. The wiki said one of the Gatekeeper winners had bet $5,000. Real-world mortgage payments are gonna be a lot more persuasive than pretend AI promises. Maybe the stakes should be that the winner gets to decide whether or not to release the transcript of the experiment to the public.
Alleged "poems"
that don't follow a rhyme scheme
are not poetry

User avatar
StClair
Posts: 409
Joined: Fri Feb 29, 2008 8:07 am UTC

Re: 1450: "AI-Box Experiment"

Postby StClair » Sat Nov 22, 2014 6:29 am UTC

Anyone sincerely using the words "rational" and "human being" in the same sentence (without a negator, such as NOT) automatically fails my bullshit test, full stop.
This is, incidentally, where most economic theory fails.

User avatar
Pfhorrest
Posts: 5474
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1450: "AI-Box Experiment"

Postby Pfhorrest » Sat Nov 22, 2014 6:42 am UTC

I think you need some universal quantifiers in there to really get a bullshit-worthy claim.

It's trivial to demonstrate that some humans are sometimes somewhat rational. It's when you get to claiming that all humans are always perfectly rational that things get wonky.

Incidentally, who is claiming that anywhere around here?
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

User avatar
Vaniver
Posts: 9422
Joined: Fri Oct 13, 2006 2:12 am UTC

Re: 1450: "AI-Box Experiment"

Postby Vaniver » Sat Nov 22, 2014 8:16 am UTC

Pfhorrest wrote:Just to caution against the arrogance of thinking "why are those stupid philosophers still arguing over this, when I've just come up with the obvious answer!", when some of those philosophers also came up with essentially the same answer and others came up with counterarguments and they're still arguing about it. Which in turn is not to say that that "obvious answer" is incorrect just because somebody still disagrees and keeps trying to justify their disagreement (for example, I think all theodicies are indefensible, despite the fact that Plantinga continues to defend his to this day and has notable support). Just that the correctness of them is clearly not obvious if there is still widespread disagreement about it.
Right, I think this is the fundamental disagreement. I personally don't word things in terms of 'obvious' or 'correct'; I'll tend to do things like label them the "discovering things" worldview and the "arguing about things" worldview, which might actually be more offensive. ;)

One of the concepts popular on LW is the idea that you should believe what the evidence suggests is most likely. This might sound like common sense when put that way, but is actually really difficult- typically people will ask questions like "does the evidence allow me to believe X?" when they want to believe X and questions like "does the evidence force me to believe X?" when they don't want to believe X. For most things, there will be some uncertainty, and so if an unbiased view of the evidence suggests a 95% probability of X that leaves enough room for motivated disbelief, and if an unbiased view of the evidence suggests a 5% probability of X that leaves enough room for a motivated belief.

And so you might imagine someone like Plantinga as saying "well, I really don't want to believe in atheism, and so long as there's a shadow of a doubt, I'll make the most of it." In such a world, it doesn't seem terribly useful to argue with Plantinga until complete certainty is reached instead of saying "well, atheism seems much more likely than the alternatives, so let's move on to other questions."

I suspect that the idea that something like atheism is 'settled enough,' despite not being completely certain, is where LWers and many philosophers split ways. Philosophers dislike the idea of settling something because there are more questions and arguments to consider; LWers dislike the idea of not settling something because then you won't be able to get anything done, and the odds of changing your mind aren't high enough to justify spending attention on the issue.

StClair wrote:Anyone sincerely using the words "rational" and "human being" in the same sentence (without a negator, such as NOT) automatically fails my bullshit test, full stop.
You might notice that LessWrong focuses on being less wrong, not not wrong.
I mostly post over at LessWrong now.

Avatar from My Little Pony: Friendship is Magic, owned by Hasbro.

sotanaht
Posts: 244
Joined: Sat Nov 27, 2010 2:14 am UTC

Re: 1450: "AI-Box Experiment"

Postby sotanaht » Sat Nov 22, 2014 8:21 am UTC

xkcdfan wrote:Isn't everyone at Less Wrong supposed to be all intellectual or something? Why do so many of you keep double posting?

As a regular *chan-er, I am beginning to question the logic of the prejudice against "double posting" myself. There is nothing really detrimental to the function of a forum in presenting two unrelated thoughts in two consecutive posts.

Saying the same thing twice, or simply bumping with nothing relevant to say, I can begin to understand the dislike of. If however, I want to reply to you on the subject of double posting and then talk to someone else about the Basilisk, would it not be more beneficial to separate the two lines of thought by an entire post rather than simply a line break and another quote box?

User avatar
mikrit
Posts: 402
Joined: Sat Apr 14, 2012 8:13 pm UTC
Location: Sweden

Re: 1450: "AI-Box Experiment"

Postby mikrit » Sat Nov 22, 2014 9:10 am UTC

This comic shows that even a highly intelligent being may dislike thinking outside the box.
Hatted and wimpled by ergman.
Dubbed "First and Eldest of Ottificators" by svenman.
Febrion wrote: "etc" is latin for "this would look better with more examples, but I can't think of any".

User avatar
Pfhorrest
Posts: 5474
Joined: Fri Oct 30, 2009 6:11 am UTC
Contact:

Re: 1450: "AI-Box Experiment"

Postby Pfhorrest » Sat Nov 22, 2014 9:39 am UTC

Vaniver wrote:One of the concepts popular on LW is the idea that you should believe what the evidence suggests is most likely. This might sound like common sense when put that way, but is actually really difficult- typically people will ask questions like "does the evidence allow me to believe X?" when they want to believe X and questions like "does the evidence force me to believe X?" when they don't want to believe X. For most things, there will be some uncertainty, and so if an unbiased view of the evidence suggests a 95% probability of X that leaves enough room for motivated disbelief, and if an unbiased view of the evidence suggests a 5% probability of X that leaves enough room for a motivated belief.

That all sounds very good when you're talking about 95% vs 5%, but the point of critical rationalism (which not all philosophers accept, mind you, but which I think is the quintessentially philosophical and otherwise correct form of rationality) becomes more clear when you're talking about 51% vs 49%. Should everyone believe the proposition with a 51% probability? Or should a population of rational thinkers be split about 51%/49% on the issue? I expect the response here will be that they should all believe the first proposition with 51% certainty and the alternative with 49% probability, but as you get to later, at some point you need to settle the matter and act, and you have to act on some belief, so that's what I'm asking about here: what should the effective beliefs, the beliefs acted upon, of a population of rationalists be, on such a closely disputed issue? And what should their discursive attitudes toward each other be on the matter? Should the 51% majority look down on the 49% minority, or accept their belief as maybe somewhat less likely but still a defensible, valid position to take?

Those questions are mostly rhetorical and I hope it's clear that I think the answers are that a population of rationalists should — or rather, could, with permissive but not obligate epistemic justification — be close to evenly split on their effective beliefs in a close-to-evenly-split probability between two alternative beliefs, and that those in the majority (believing the slightly more probable alternative) should be respectful of the positions of the minority (believing the slightly less probable alternative) even though they disagree. Now adjust those probabilities to 60%/40%, 70%/30%, and so on down to your 95%/5%… where is the line drawn? At what point do we shift from merely disagreeing on what belief we are confident enough to act on, and not acting on beliefs that we find highly improbable, to belittling (or worse?) other people who cling to those improbable beliefs? I think there's never such a line that's crossed. Your intellectual respect for people believing decreasingly probable things can diminish in proportion to that, but it shouldn't ever hit zero until they're believing absolutely impossible things. But down to that line, it's always got to be some degree of "well, maybe, but I don't buy it myself".

Philosophers dislike the idea of settling something because there are more questions and arguments to consider; LWers dislike the idea of not settling something because then you won't be able to get anything done, and the odds of changing your mind aren't high enough to justify spending attention on the issue.

These values are not at odds with each other. If you need to do something, go ahead and run with whatever beliefs seem most probable to you to start getting it done. In that sense, the issue can be settled in a sense that you are confident enough to act upon it. But that doesn't mean the issue is settled in the sense that everyone else should stop talking about it because your belief is more probable and therefore conclusively right.

Relatedly, the confidence you need to act on a belief is of course proportional to the stakes of acting upon it; and of course, other actors may have good reason to speak or even act against you if you are about to negligently do something with high stakes for them. In the case of LWers and superintelligent AIs, if I thought someone was actually on the verge of creating some kind of dangerously powerful artificial intelligence and they were going to make it a utilitarian that would (to use the example mentioned previously in the thread) kill someone to prevent motes of dust in a sufficient number of eyes, I would be arguing much, much more vehemently and possibly urging action within the limits of my (admittedly meager) sphere of influence to put a stop to that. As it stands, I assess the likelihood of such a dangerously powerful AI being created any time in the near future to be low enough that I'll merely argue here politely that maybe their ethical theory might not be as solid as they think it is and they should engage more with the community of people who've been arguing these issues for a long time and reach something resembling consensus there before implementing it in anything dangerously powerful. (And rough consensuses do occur in philosophy, though like scientific consensus they can change over time. For example there is currently a rough consensus on critical rationalist epistemology, there was until relatively recently a long-lasting general consensus on compatibilism, and there's been a centuries-long general consensus against divine command theory, even though all of those have continued to have some notable dissenters too). I think that would be of great benefit to the aforementioned community as well, because as I agreed before, considering the practical application to AI actually makes for a great way of thinking rigorously about ethics.
Forrest Cameranesi, Geek of All Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
The Codex Quaerendae (my philosophy) - The Chronicles of Quelouva (my fiction)

Tenoke
Posts: 3
Joined: Fri Nov 21, 2014 11:07 am UTC

Re: 1450: "AI-Box Experiment"

Postby Tenoke » Sat Nov 22, 2014 10:22 am UTC

xkcdfan wrote:Isn't everyone at Less Wrong supposed to be all intellectual or something? Why do so many of you keep double posting?


Your comment needs to be reviewed if it is your first one, but there is no clear indication whether you already have a comment submitted to be reviewed.

rmsgrey
Posts: 3653
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1450: "AI-Box Experiment"

Postby rmsgrey » Sat Nov 22, 2014 3:57 pm UTC

Isaac Hill wrote:I can see the reason for not releasing transcripts if the experiment is to continue. Future Gatekeepers shouldn't be able to learn from past Gatekeepers' mistakes, since in the real world, the AI would only have to win once. Secrecy also means the experiment gets talked about more, since people end up wondering how it was done. I know this post would be a lot shorter if the winning arguments were known.


The problem with keeping the transcripts secret is twofold:

1) It reduces the credibility of the reported results
2) Publishing the transcripts would improve the security of any real-world AI boxes, either by letting actual Gatekeepers learn from fake Gatekeepers' mistakes, or by making it clearer how easy it is to bypass Gatekeepers and motivating a more robust solution

If the goal were to avoid the actual release of a boxed AI, then both convincing people of the difficulties and helping to prepare actual Gatekeepers for their task would be good things. If, on the other hand, the goal is to "prove" the rightness of one's claim that boxed AI can always escape, and establish one's intellectual superiority, then preserving the mystique is going to be helpful.

Of course, the appearance of an ego-driven motive also undermines the credibility of the experiment...

Cres
Posts: 67
Joined: Tue Dec 22, 2009 2:14 pm UTC

Re: 1450: "AI-Box Experiment"

Postby Cres » Sat Nov 22, 2014 4:48 pm UTC

LessWrong feels like a terrifying glimpse into a world where someone has been forced to rebuild the entirety of human intellectual progress in metaphysics, epistemology and philosophy in general out of nothing but my most cringingly naive and uncritical first year undergrad philosophy essays.

User avatar
StClair
Posts: 409
Joined: Fri Feb 29, 2008 8:07 am UTC

Re: 1450: "AI-Box Experiment"

Postby StClair » Sat Nov 22, 2014 6:50 pm UTC

I also remain unconvinced that any entity that can legitimately answer to the name of "god" will have any interest in us puny mortals, or any reason to. That we persist in believing or wishing otherwise, and in assigning ourselves a special place in the universe, is a product of our own biases, egotism, and other cognitive failure modes. Add to that our limited perspective: a mere few decades of subjective consciousness each, and a recorded history and culture only a few orders of magnitude longer, all of it spent on one planet.

I imagine an ant prophet confidently asserting that God has All The Sugar, because of course that is what would matter to a supreme being. Yet even that analogy probably fails to adequately describe the difference between our expectations and the unknowable possibility. It's merely the best I can come up with right now.

tl;dr - the biggest God we can imagine is still a very small one. I'm just not smart enough, and I doubt that you are either.

User avatar
Vaniver
Posts: 9422
Joined: Fri Oct 13, 2006 2:12 am UTC

Re: 1450: "AI-Box Experiment"

Postby Vaniver » Sat Nov 22, 2014 7:19 pm UTC

Pfhorrest wrote:Should everyone believe the proposition with a 51% probability?
Yep! It really is that simple.

(Now, people might have private evidence that leads them to different estimates, and that's to be expected because of the difficulties involved in communicating private information. There's a math result out there on how, if you trust others' reasoning processes, you can just share probability estimates to communicate the results of that private information and eventually end up believing the same thing, but this seems difficult to implement in practice because of lack of trust for others' reasoning processes.)
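
A toy sketch of what I mean by sharing estimates, in Python (my own illustration of the flavor of that result - presumably Aumann-style agreement - not anyone's actual protocol; the agents, priors, and flip counts are all made up): two agents start from the same Beta(1, 1) prior on a coin's bias and each privately observes ten flips. Because each knows how many flips the other saw, exchanging posterior means is enough to reconstruct the other's head count, so both land on the identical pooled estimate.

N_FLIPS = 10  # each agent's number of private flips (known to both)

def posterior_mean(heads, flips, a=1, b=1):
    # Beta-Binomial posterior mean with a Beta(a, b) prior.
    return (a + heads) / (a + b + flips)

def heads_from_report(reported_mean, flips, a=1, b=1):
    # Invert the reported posterior mean to recover the reporter's head count.
    return round(reported_mean * (a + b + flips) - a)

alice_heads, bob_heads = 7, 3  # private observations
alice_report = posterior_mean(alice_heads, N_FLIPS)
bob_report = posterior_mean(bob_heads, N_FLIPS)

# Each agent folds the evidence implied by the other's report into their own data.
alice_final = posterior_mean(alice_heads + heads_from_report(bob_report, N_FLIPS), 2 * N_FLIPS)
bob_final = posterior_mean(bob_heads + heads_from_report(alice_report, N_FLIPS), 2 * N_FLIPS)

print(alice_final, bob_final)  # identical: 0.5 and 0.5

In real life, of course, you rarely know exactly how someone else's estimate was produced, which is the trust problem I mentioned.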

Pfhorrest wrote:at some point you need to settle the matter and act, and you have to act on some belief, so that's what I'm asking about here: what should the effective beliefs, the beliefs acted upon, of a population of rationalists be, on such a closely disputed issue?
That depends on the expected value calculation, as you suggest later. If it's a 51% chance of winning $1 by betting on black and a 49% chance of winning $1 by betting on red, bet on black every time; if it's a 51% chance of winning 49 cents for betting on black and a 49% chance of winning 52 cents by betting on red, bet on red every time.
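
Spelling that arithmetic out as a quick sketch (the probabilities and payoffs are just the hypothetical numbers above):

p_black, p_red = 0.51, 0.49

# Case 1: equal $1 payoffs - the more probable color has the higher expected value.
print(round(p_black * 1.00, 4), round(p_red * 1.00, 4))  # 0.51 vs 0.49 -> bet black

# Case 2: unequal payoffs - the expected values flip even though black stays more likely.
print(round(p_black * 0.49, 4), round(p_red * 0.52, 4))  # 0.2499 vs 0.2548 -> bet red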

Pfhorrest wrote:But that doesn't mean the issue is settled in the sense that everyone else should stop talking about it because your belief is more probable and therefore conclusively right.
I should point out that the approach taken is not so much "no one should talk about X" as "don't talk about X here." Most LWers are just not interested in arguments for the existence of God or that people are obligated to worship Allah. If other people are interested in that topic, they can talk about it elsewhere.

Pfhorrest wrote:Relatedly, the confidence you need to act on a belief is of course proportional to the stakes of acting upon it; and of course, other actors may have good reason to speak or even act against you if you are about to negligently do something with high stakes for them. In the case of LWers and superintelligent AIs, if I thought someone was actually on the verge of creating some kind of dangerously powerful artificial intelligence, and they were going to make it a utilitarian that would (to use the example mentioned previously in the thread) kill someone to prevent motes of dust in a sufficient number of eyes, I would be arguing much, much more vehemently, and possibly urging action within the limits of my (admittedly meager) sphere of influence to put a stop to that.
Let me point out that there are two main camps in superintelligent AI safety: "what do we need safety for?" and "hmm, if we don't get safety exactly right the first time we'll probably ruin everything forever." Maybe getting safety exactly right doesn't involve utilitarian calculations along those lines - and if so, LW will drop them. But in either case, I would be much more worried about, say, the military AI given the goal of maximizing the security and longevity of a particular government than the 'benevolent' AI given the goal of maximizing the welfare and longevity of people aggregated in a way that allows tradeoffs that deeply harm individuals. MIRI (the non-profit that employs EY) has done a lot of work over the last ten years moving people from the first camp to the second camp, and only recently switched mostly to working out the math of whether or not that is the right approach to take.

Pfhorrest wrote:I think that would be of great benefit to the aforementioned community as well, because as I agreed before, considering the practical application to AI actually makes for a great way of thinking rigorously about ethics.
So, I write book reviews for LessWrong; because my interests are in decision-making, I tend to pick decision-making books. If your interests are in philosophy, you might find it valuable to write reviews for LW of the philosophy books you found most interesting or compelling, with the caveat that it's best to know your audience as well as possible when writing for them.
I mostly post over at LessWrong now.

Avatar from My Little Pony: Friendship is Magic, owned by Hasbro.

rmsgrey
Posts: 3653
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1450: "AI-Box Experiment"

Postby rmsgrey » Sat Nov 22, 2014 7:21 pm UTC

StClair wrote:I also remain unconvinced that any entity that can legitimately answer to the name of "god" will have any interest in us puny mortals, or any reason to. That we persist in believing or wishing otherwise, and in assigning ourselves a special place in the universe, is a product of our own biases, egotism, and other cognitive failure modes. Add to that our limited perspective: a mere few decades of subjective consciousness each, and a recorded history and culture only a few orders of magnitude longer, all of it spent on one planet.

I imagine an ant prophet confidently asserting that God has All The Sugar, because of course that is what would matter to a supreme being. Yet even that analogy probably fails to adequately describe the difference between our expectations and the unknowable possibility. It's merely the best I can come up with right now.

tl;dr - the biggest God we can imagine is still a very small one. I'm just not smart enough, and I doubt that you are either.


One of the perks that comes with being a supreme being is having the capacity to take an interest in "puny mortals" along with everything else rather than having to choose which details to pay attention to.

As Pope put it, in An Essay on Man:

Who sees with equal eye, as God of all,
A hero perish or a sparrow fall,
Atoms or systems into ruin hurled,
And now a bubble burst, and now a world.

Such a God would indeed have all the sugar (or at least access to an unlimited supply) - that just wouldn't be their only attribute.

FeepingCreature
Posts: 5
Joined: Wed Mar 23, 2011 5:35 pm UTC

Re: 1450: "AI-Box Experiment"

Postby FeepingCreature » Sat Nov 22, 2014 7:25 pm UTC

I feel I need to correct an earlier mistake I made - due to the order in which MediaWiki lists changes, the graph I posted of contributors to the RW LW page is in fact ... upside down.

So what we're actually seeing is a page initially dominated by one person that lately got a more varied authorship. Huh. Well I guess that's a good sign!

User avatar
Isaac Hill
Systems Analyst????
Posts: 547
Joined: Wed Mar 14, 2007 9:35 pm UTC
Location: Middletown, RI

Re: 1450: "AI-Box Experiment"

Postby Isaac Hill » Sat Nov 22, 2014 8:03 pm UTC

rmsgrey - You're right that the secrecy undermines the credibility. But this is an experiment that was run 5 times with inconsistent incentives, since some Gatekeepers had penalties and others didn't, so it's not like there's a whole lot of credibility to begin with.

For this experiment to mean something, you'd want to run it many times, then release all the transcripts afterwards. Maybe the goal of the secrecy is to attract more attention so you get enough volunteers to run this large-scale. One potential winning argument is "Throw the game so people will show more interest in this stuff". You definitely wouldn't want that getting out, so you don't release the transcripts.
Alleged "poems"
that don't follow a rhyme scheme
are not poetry

Roxolan
Posts: 1
Joined: Sat Nov 22, 2014 8:35 pm UTC

Re: 1450: "AI-Box Experiment"

Postby Roxolan » Sat Nov 22, 2014 9:15 pm UTC

rmsgrey wrote:The problem with keeping the transcripts secret is twofold:

1) It reduces the credibility of the reported results
2) Publishing the transcripts would improve the security of any real-world AI boxes, either by letting actual Gatekeepers learn from fake Gatekeepers' mistakes, or by making it clearer how easy it is to bypass Gatekeepers and motivating a more robust solution

The reason for the secrecy is actually the opposite of that.

Currently, people look at the outcome of the experiments and wonder how the heck this is even possible. Most will conclude that if a measly human could pull it off, it's even more true of an amoral super-intelligence.

But human nature being what it is, if they could look at the logs - from the safety of their homes, browsing in five minutes over the records of a trap that was shaped through hours of interaction to catch a completely different person - I suspect their reaction would quickly shift to "oh, that's all? I could resist this easily."

And so they decide that the AI box experiment proved nothing, that boxed AIs still seem pretty safe. And then some real-life gatekeeper (who has carefully steeled themselves against that specific manipulation[1]) falls for another stupid trick you would have seen coming a mile away and Earth gets turned into paperclips.

(Well, that, and I suspect the logs would contain content that would be very humiliating for one or both sides. Same reason you wouldn't want a video of you crying over your loved ones' death to go viral on 4chan. From what I can guess, the AIs play dirty.)

[1] This is where I would link to xkcd 463 if I could post links.

SU3SU2U1
Posts: 396
Joined: Sun Nov 25, 2007 4:15 am UTC

Re: 1450: "AI-Box Experiment"

Postby SU3SU2U1 » Sat Nov 22, 2014 9:45 pm UTC

FeepingCreature wrote:I feel I need to correct an earlier mistake I made - due to the order in which MediaWiki lists changes, the graph I posted of contributors to the RW LW page is in fact ... upside down.

So what we're actually seeing is a page initially dominated by one person that lately got a more varied authorship. Huh. Well I guess that's a good sign!


In fact, that's what all wiki pages look like - one main author usually writes the bulk of the original, and then contributors jump in and add, subtract, or edit.

Vaniver wrote:MIRI (the non-profit that employs EY) has done a lot of work over the last ten years moving people from the first camp to the second camp, and only recently switched mostly to working out the math of whether or not that is the right approach to take.


... this isn't true. Back when MIRI was called "SIAI", their primary mission was friendly AI research (look at their old website). They've consistently had a team of affiliated research associates, have occasionally had as many as three full-time researchers, and have always had at least one full-time researcher on staff.

It only looks like they've "only recently" started doing research because, in the more than 10 years they've been doing this, they've put out only a tiny amount of actual technical results, all in the last few years, none of which are well cited. The one MIRI article on the arXiv - http://arxiv.org/abs/1401.5577 (seriously, they only have ONE result fleshed out enough to be put on the arXiv!) - is downright silly.

They show, with an unneeded mathematical formalism, that if players in a one-shot prisoner's dilemma can coordinate, it solves the coordination problem. Their proposed formalism is also pretty much impossible to code up. The worthlessness of this paper is probably why it's attracted all of 1 self-citation in the year it's been on the arXiv.
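
For the curious, here is roughly the flavor of the setup in a drastically simplified Python sketch of my own (this is not their actual construction, which as far as I can tell leans on provability logic and needs a real proof search - exactly the part that's hard to code up - but just the trivial "cooperate with your own clones" special case): each player is a program that is handed the other program's source text, and it cooperates only if the opponent is literally the same program.

import inspect

def clique_bot(opponent_source):
    # Cooperate iff the opponent's source code is identical to my own.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

src = inspect.getsource(clique_bot)
print(clique_bot(src), clique_bot(src))             # C C  (mutual cooperation in a one-shot game)
print(clique_bot("def defect_bot(_): return 'D'"))  # D    (defect against anything else)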

MIRI appears to be a Yudkowsky support network - it pays for him to do whatever he wants to do. He can write blog posts, he can write fan fiction, he can work on decision theories, etc. It's pretty clear they don't have much in the way of performance metrics to judge their in-house researchers on - many of them literally never publish anything while with the organization.


Return to “Individual XKCD Comic Threads”

Who is online

Users browsing this forum: Hafting and 112 guests