1132: "Frequentists vs. Bayesians"

This forum is for the individual discussion thread that goes with each new comic.

Moderators: Moderators General, Prelates, Magistrates

User avatar
San Fran Sam
Posts: 228
Joined: Tue Nov 15, 2011 5:54 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby San Fran Sam » Mon Nov 12, 2012 6:26 pm UTC

rmsgrey wrote:
San Fran Sam wrote:
J Thomas wrote:
Coyne wrote:Then, of course, there's the fact that, per our best understanding, our Sun can't go nova. It's not massive enough.


Thank you for that qualifier. Every time I see somebody post "Our Sun can never go nova. Science has proven that once and for all" it makes my teeth itch.


Sure it can.

Spoiler:
Didn't you see the last episode of Babylon 5?


I think you meant the last episode of the penultimate season of [name of show] - the last episode would be either the final episode of the final season or the penultimate episode of the final season (since the final episode was produced as the last episode of the production run for the penultimate season).

Anyway, causing (or threatening to cause) our sun to go nova is a well-established way for sufficiently-advanced aliens to eliminate us...


You are correct, sir: it was the last episode of the SECOND to last season. I was going by memory, as I haven't seen the show since it ended, and that event seemed like an appropriate way to end the series.

As to aliens destroying the Earth, there is always this classic: The Forge of God by Greg Bear.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Mon Nov 12, 2012 8:47 pm UTC

Or what if some stars have small black holes inside of them, LHC-destroying-the-world style? You may be really sure the Sun cannot go nova, but how sure are you that the Sun does not contain a microscopic black hole? Such a black hole would have a negligible effect, growing quite slowly until some day it rapidly snowballs.

The certainty that the Sun cannot go nova is a red herring, and quite irrelevant to the bet as long as the probability on the other side of the bet is not itself microscopic.

Even if the Sun (or a similar star) could go nova at any time, for the Sun to go nova specifically today is very unlikely simply by selection (there are a lot of days on which it could go nova). See: low probability, by frequentist reasoning. Then you get exactly the same Bayes' theorem after you count all the days on which the Sun did not go nova yet the detector gave a false positive, and compare that to all the days on which the Sun did go nova and the detector returned a true positive. Try it. This is how Bayes' theorem is usually explained.
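
Dmytry's day-counting is the base-rate calculation behind the comic. A minimal sketch in Python: the 1/36 lie rate is the comic's two-sixes rule, while the one-in-a-billion daily nova prior is an assumed number, purely for illustration.

```python
# Base-rate calculation for the comic's neutrino detector.
# p_lie = 1/36 is the comic's rule (the detector lies when both dice
# come up six); the daily nova prior of 1e-9 is an assumed number.

def posterior_nova(prior_nova, p_lie=1/36):
    """P(nova | detector says 'yes'), by Bayes' theorem."""
    p_yes_given_nova = 1 - p_lie     # true positive rate
    p_yes_given_quiet = p_lie        # false positive rate
    p_yes = (p_yes_given_nova * prior_nova
             + p_yes_given_quiet * (1 - prior_nova))
    return p_yes_given_nova * prior_nova / p_yes

print(posterior_nova(1e-9))  # still astronomically small
```

Counting days gives the same arithmetic: among all the days the detector says "yes", false-positive days outnumber true-positive days overwhelmingly, so the 35-to-1 evidence cannot overcome the base rate.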

The only way Bayesianism - the self-proclaimed kind - differs is that you may call a number a probability without any justification. But even there, the desire is to give lower probabilities to more complicated hypotheses (Occam's razor), and there's a scheme, called Solomonoff induction, where a prior of 2^-length is used for a 'hypothesis' (a self-delimiting prefix Turing machine tape) of that specific length. The math with this prior on hypothesis probabilities is not in any way whatsoever different from the calculations of a hypothetical insane frequentist who believes the world is a self-delimiting prefix Turing machine that iterates over every input tape.
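
The 2^-length prior only sums to at most 1 because the valid tapes are self-delimiting, i.e. prefix-free (the Kraft inequality). A toy sketch, with a made-up prefix-free code standing in for real machine tapes:

```python
# 2**(-length) priors over a prefix-free set of 'program tapes'.
# The four codewords below are a made-up prefix-free code, not real
# Turing machine tapes; no codeword is a prefix of another.

codes = ["0", "10", "110", "111"]

def is_prefix_free(cs):
    return not any(a != b and b.startswith(a) for a in cs for b in cs)

prior = {c: 2.0 ** -len(c) for c in codes}

print(is_prefix_free(codes))   # True
print(sum(prior.values()))     # 1/2 + 1/4 + 1/8 + 1/8 = 1.0
```

Drop the prefix-free requirement (e.g. allow both "0" and "01") and the weights can sum past 1, so the "prior" stops being a probability at all.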

Reasonably, you can only put a bound on the prior probability of a hypothesis by counting how much of the hypothesis you have made up without good justification, akin to estimating the probability of hitting a target when shooting blind, based on the size of the target in steradians. The issue is that you don't know all the random prejudices that seem totally natural, and you don't know the dimensionality of the space into which you are guessing; you only obtain a bound, say to be no more certain in your hypothesis than one in 1000, because you know that your hypothesis has at least 3 instances of arbitrarily picking one thing out of at least 10. It is easy to screw this sort of estimate up, either by taking some shaky assumption as obviously true (e.g. that space is Euclidean, which a lot of philosophers simply assumed as a given until Einstein), or by under-estimating the possible solution space.
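
The one-in-1000 figure above is just the product of the arbitrary choices, assuming they are independent; a trivial sketch (the three one-of-ten picks are hypothetical inputs):

```python
# Upper bound on a prior from counting arbitrary choices in a
# hypothesis: three independent picks of one option out of ten each.
choice_sizes = [10, 10, 10]   # hypothetical sizes of each arbitrary pick

bound = 1.0
for n in choice_sizes:
    bound *= 1.0 / n

print(bound)   # roughly 0.001: no more certain than one in 1000
```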

User avatar
Coyne
Posts: 1061
Joined: Fri Dec 18, 2009 12:07 am UTC
Location: Orlando, Florida
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Coyne » Tue Nov 13, 2012 4:03 am UTC

Dmytry wrote:Or what if some stars have small black holes inside of them, LHC-destroying-the-world style? You may be really sure the Sun cannot go nova, but how sure are you that the Sun does not contain a microscopic black hole? Such a black hole would have a negligible effect, growing quite slowly until some day it rapidly snowballs.


I don't think that counts; it's not really a matter of the "Sun going nova". Just like if our Sun got hit by a neutron star: There would be a mighty big bang that definitely would be as big as a nova, but it wouldn't be our Sun "going nova", it would be "two stars colliding".
In all fairness...

garaden
Posts: 18
Joined: Thu Aug 11, 2011 3:40 am UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby garaden » Tue Nov 13, 2012 4:13 am UTC

Dmytry wrote:Even if the Sun (or a similar star) could go nova at any time, for the Sun to go nova specifically today is very unlikely simply by selection (there are a lot of days on which it could go nova). See: low probability, by frequentist reasoning. Then you get exactly the same Bayes' theorem after you count all the days on which the Sun did not go nova yet the detector gave a false positive, and compare that to all the days on which the Sun did go nova and the detector returned a true positive. Try it. This is how Bayes' theorem is usually explained.


Not quite. By the same reasoning, you could be part of a sun-worshiping tribe and notice that the sun is exceedingly unlikely to wink out, since it hasn't done so in the uncountable years since you've all been worshiping it. Sure, you could stop worshiping it and see what happens, but it'd take many years to get to the same confidence (maybe it's just getting angrier and angrier...). But if you know physics and astronomy, you can apply the sheer complexity of "a sentient sun which for some reason cares about the opinions of hairless apes 93 million miles away" as a really low prior probability in Bayes's Theorem.

The point isn't "the sun's never exploded before, why should it explode today?". More like "the sun exploding would be a gross violation of well-established physical laws". The key is using experimental evidence to infer general laws, which can then predict the results of further experiments without us even having to run them. That's something frequentism alone can't handle, though as someone pointed out above (heck, it might have been Dmytry), experimental design and the scientific community are typically expected to take care of that kind of thing, arguably by using a form of Bayesian reasoning ("...by your powers combined..."). And of course putting Bayesian posterior probabilities in a paper as proof of significance would be worse than useless ("but I really, really think my theory is correct, that's why my prior is so high!").

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Tue Nov 13, 2012 5:51 am UTC

garaden wrote:
Dmytry wrote:Even if the Sun (or a similar star) could go nova at any time, for the Sun to go nova specifically today is very unlikely simply by selection (there are a lot of days on which it could go nova). See: low probability, by frequentist reasoning. Then you get exactly the same Bayes' theorem after you count all the days on which the Sun did not go nova yet the detector gave a false positive, and compare that to all the days on which the Sun did go nova and the detector returned a true positive. Try it. This is how Bayes' theorem is usually explained.


Not quite. By the same reasoning, you could be part of a sun-worshiping tribe and notice that the sun is exceedingly unlikely to wink out, since it hasn't done so in the uncountable years since you've all been worshiping it. Sure, you could stop worshiping it and see what happens, but it'd take many years to get to the same confidence (maybe it's just getting angrier and angrier...). But if you know physics and astronomy, you can apply the sheer complexity of "a sentient sun which for some reason cares about the opinions of hairless apes 93 million miles away" as a really low prior probability in Bayes's Theorem.

You can only do so because something complex can be many things other than a sentient being that cares about the opinions of hairless apes. Once again, low probability arises from rarity among indistinguishable alternatives. High complexity itself doesn't imply low probability; the decaying, not-quite-elliptical orbit of a planet is much more complex than a circle. A specific guess about a high-complexity object, however, does have low probability.

Half-handwaving a low probability for something you have already been culturally taught is wrong - that's self-described Bayesianism all right. Hint: it doesn't work for anything new. You'd have handwaved a high prior probability for the sun being a sentient being, and so on, if you were a sun worshipper (the conceptual complexity of a conscious being is in any case no greater than that of the grand unified theory of everything, as the latter gives rise to the former by mechanical application of the laws of physics). A high prior there would be terrible, because it would be essentially non-falsifiable. Indeed, that's precisely what self-described Bayesianists do: handwaving a high prior for future killer robots (and it doesn't have to be very high for it to be a big deal, because a lot of people will die and will never have lived).

I mentioned Solomonoff induction: the idea of measuring the complexity of a hypothesis by the length of the Turing machine tape that implements it. That seems sane, but as far as we know it gives way too low a probability to any sort of symmetry (e.g. CPT symmetry, even a very approximate one), and high priors to some sort of Game-of-Life universe with irreversible laws. And even this advanced "Bayesian" scheme is totally homologous to a frequentist with a ridiculous dogmatic belief.

edit: also, how do we dispose of the idea that the length of a Turing machine tape is a good measure of the complexity of a hypothesis? Well, there's the frequentist approach: we live in a world that has CPT symmetry, at least approximately, which would be incredibly unlikely if tape length were a good measure of complexity. A decision procedure that rejects any method assigning very low probability to the observed data is rarely wrong, so we adopt that decision procedure.
The point isn't "the sun's never exploded before, why should it explode today?". More like "the sun exploding would be a gross violation of well-established physical laws". The key is using experimental evidence to infer general laws, which can then predict the results of further experiments without us even having to run them. That's something frequentism alone can't handle,

How so? If we are speaking of self-described Bayesianism: these guys can indeed assign low probability to anything (e.g. a non-MWI world) via handwave, which has no frequentist justification, and the handwave can do anything, including things you don't know how to do in a non-handwavy manner. Application of the laws, however, does have such justification. It is just not described as "frequentist" per se, because nobody competent is interested in labelling things as either "frequentist" or "Bayesian" where it is not appropriate to do so.
though as someone pointed out above (heck, it might have been Dmytry), experimental design and the scientific community are typically expected to take care of that kind of thing, arguably by using a form of Bayesian reasoning ("...by your powers combined..."). And of course putting Bayesian posterior probabilities in a paper as proof of significance would be worse than useless ("but I really, really think my theory is correct, that's why my prior is so high!").

Absolutely. Well, the self-described Bayesians really, really want that world, where they would get a lot of slack for their ability to verbally handwave and assert things about relative 'complexities'.

edit: some really wise words on the arguments over interpretations of probability:
It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis.

Leonard J. Savage, The Foundations of Statistics, 1954.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Tue Nov 13, 2012 12:35 pm UTC

Coyne wrote:
Dmytry wrote:Or what if some stars have small black holes inside of them, LHC-destroying-the-world style? You may be really sure the Sun cannot go nova, but how sure are you that the Sun does not contain a microscopic black hole? Such a black hole would have a negligible effect, growing quite slowly until some day it rapidly snowballs.


I don't think that counts; it's not really a matter of the "Sun going nova". Just like if our Sun got hit by a neutron star: There would be a mighty big bang that definitely would be as big as a nova, but it wouldn't be our Sun "going nova", it would be "two stars colliding".

Well, the final stage of the collapse would probably still result in a lot of fusion and a neutrino pulse, so the detector would still work.

rmsgrey
Posts: 3457
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby rmsgrey » Tue Nov 13, 2012 12:41 pm UTC

Dmytry wrote:Reasonably, you can only put a bound on the prior probability of a hypothesis by counting how much of the hypothesis you have made up without good justification, akin to estimating the probability of hitting a target when shooting blind, based on the size of the target in steradians. The issue is that you don't know all the random prejudices that seem totally natural, and you don't know the dimensionality of the space into which you are guessing; you only obtain a bound, say to be no more certain in your hypothesis than one in 1000, because you know that your hypothesis has at least 3 instances of arbitrarily picking one thing out of at least 10. It is easy to screw this sort of estimate up, either by taking some shaky assumption as obviously true (e.g. that space is Euclidean, which a lot of philosophers simply assumed as a given until Einstein), or by under-estimating the possible solution space.


Or by treating things as independent which aren't.

AtG
Posts: 51
Joined: Wed Mar 12, 2008 6:27 am UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby AtG » Tue Nov 13, 2012 12:54 pm UTC

Thanks Dmytry, yours is a standpoint I can adopt.

Personally I like mathematical theorems more than polemics about fine points of epistemology. David Williams - Weighing the Odds is proba... almost surely a good read for both sides, though his texts are a bit heavy at times.

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Tue Nov 13, 2012 8:35 pm UTC

Since wumpus refers to HPMOR, I think he's talking about Yudkowsky & co.


Yes, I meant Yudkowsky and crew. I am not exactly sure who else self-identifies with "Bayesian" enough to call themselves that.


This mostly comes from one somewhat infamous non-mathematician, non-physicist "rationalist" "bayesian" self-publishing an incredible amount of nonsense such as this, and his groupthink club.


Ah, I see. I was not familiar with him. He does indeed appear to be quite muddled in his thinking, both on a scientific level and on a logical/philosophical one.

But there are other people (like me) who would self-identify as Bayesian (well, maybe we wouldn't shout it from the rooftops like this Yudkowsky character, but we'd accept it as a suitable and applicable label). I consider myself a Bayesian in the sense that I think that when interpreting the results of an experiment, it's necessary to take prior probabilities into consideration. You'd think this would be an uncontroversial stance, but there really are some physicists who think the thing to do is to calculate the (frequentist) probability that we would get the result we did, assuming the null hypothesis, and then, if that probability is, say, 5%, simply declare that we are 95% confident that the null hypothesis is not true.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Tue Nov 13, 2012 10:56 pm UTC

Aiwendil wrote:
Since wumpus refers to HPMOR, I think he's talking about Yudkowsky & co.


Yes, I meant Yudkowsky and crew. I am not exactly sure who else self-identifies with "Bayesian" enough to call themselves that.


This mostly comes from one somewhat infamous non-mathematician, non-physicist "rationalist" "bayesian" self-publishing an incredible amount of nonsense such as this, and his groupthink club.


Ah, I see. I was not familiar with him. He does indeed appear to be quite muddled in his thinking, both on a scientific level and on a logical/philosophical one.

But there are other people (like me) who would self-identify as Bayesian (well, maybe we wouldn't shout it from the rooftops like this Yudkowsky character, but we'd accept it as a suitable and applicable label). I consider myself a Bayesian in the sense that I think that when interpreting the results of an experiment, it's necessary to take prior probabilities into consideration. You'd think this would be an uncontroversial stance, but there really are some physicists who think the thing to do is to calculate the (frequentist) probability that we would get the result we did, assuming the null hypothesis, and then, if that probability is, say, 5%, simply declare that we are 95% confident that the null hypothesis is not true.

That sounds like a bit of a caricature... I do hold the stance that in many circumstances we really cannot do any better than construct a decision procedure with a low, bounded risk of being wrong, and follow it. Priors are a very messy business. They are quite ridiculous, even, in a way. You are to assign to a theory a non-zero, not even very small, probability that it is totally, completely correct - that it may literally be the great truth of how the universe works. Physics is not like that; it is about making approximations that work. There is no guarantee that mathematics even in principle allows one to capture the ultimate structure of the universe perfectly, and no justification for assuming it would. Physics works by making it unlikely that we would adopt a bad method. It's very elegant, in fact, and un-obvious, and counter-intuitive - it goes against our predominantly Bayesian (in the sense of priors updated by Bayes' rule) intuition. (How do you propose we set priors in physics, by the way?)

It works particularly well in physics, where (a) no sensible priors for hypotheses exist, and (b) confidences can be made utterly enormous with several replications of the experiment. Even a single experiment may often have an enormous number of samples, as well.

Back when people believed in all those silly things, against the evidence, that was Bayesian reasoning. Dogmatic crazy religious beliefs in spite of evidence, that's Bayesian. Set a probability of 1 on something and you'll never be rid of it!

But there was this solution for getting rid of wrong ideas: even if you believe in something really hard, you can still commit to a risk of wrongly abandoning it of one chance in, say, a billion or a billion billion (which is easily achievable in physics; physics doesn't deal in 5% confidence levels, psychology does), should it be true. The big, highly counter-intuitive leap was doing this retrospectively as well.

Then there are the practical matters, which are actually very important too. How good is it, actually, to have a lot of hypotheses being assigned probabilities in parallel? To predict anything you have to evaluate them all, and the combinations blow up so fast that it is not possible to do anything even if you were a Dyson sphere brain. Consider the case of measuring constants, if you wish, to have a less discrete example in mind. Bayesian? Nice: each constant is now a probability distribution, every sum is a convolution, and so on. Anything nonlinear, and you get an integral over all the variables. (One time online I encountered a particularly obnoxious non-mathematician Bayesianist who proclaimed he was doing this in his mind somehow, as if that were a good thing. The dude must literally have had some problem with the headmeat, or been trolling; it's impossible to tell nowadays.)
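
The bookkeeping Dmytry describes can be sketched with the smallest possible example: once a quantity is a distribution rather than a number, even adding two of them means convolving their distributions. Two fair dice stand in for two uncertain constants here; the numbers are purely illustrative.

```python
# Distribution of X + Y for independent X, Y: a discrete convolution.
# Dicts map value -> probability.

def convolve_pmf(p, q):
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # one fair die
total = convolve_pmf(die, die)          # sum of two dice

print(total[7])             # 6/36, the most likely sum
print(sum(total.values()))  # 1.0, up to rounding
```

Every extra uncertain quantity multiplies the work, and with continuous distributions each such step becomes a numerical integral; that is the blow-up being complained about.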

That is not to detract in any way from the usefulness of Bayes' theorem. It must be understood, though, that Bayes' theorem is something just like, say, Archimedes' principle: the calculations for a ship will work just fine if you sum the pressure forces on the bottom of the hull (which is also homologous to calculating the ship's displaced volume). If you count the cases in a frequentist manner, with the 'prior' coming from a frequency of some kind (and you can always express any reasonable justification of a Bayesian prior that way), you do the same mathematics. The shout-from-the-rooftops Bayesians remind me of a rather annoying guy who simply did not get the derived results. You have a barrel falling into water? You must use Archimedes' principle or you are denying Archimedes' principle - that's how the guy saw it. (The barrel being dynamic and the whole problem being far harder than any hydrostatics, Archimedes' principle was not particularly useful, even though it did describe the result once the barrel stopped bobbing up and down.)

edit: a note. Results of previous experiments, or applications of the laws, are not a 'prior' for Bayes' theorem; they do not give you the probability that a hypothesis is true (or useful, or anything). Furthermore, they can't be straightforwardly combined with the current experiment, because you don't know whether they are statistically independent; they can only be combined with the current experiment under the presumption of a theory or hypothesis. The way I see Bayes' theorem is as a proof that you can't ever know the probability of a theory given the data, given that you don't know the probability of the theory without the data.

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Wed Nov 14, 2012 6:04 pm UTC

Dmytry wrote:You are to assign to a theory a non-zero, not even very small, probability that it is totally, completely correct - that it may literally be the great truth of how the universe works.


To any theory that is consistent and not (yet) directly contradicted by empirical data, yes. It seems perfectly reasonable to me to assign a non-zero probability to any such theory; and on the contrary it seems quite unreasonable to assign any such theory a probability of exactly zero. I'm not sure why you think the probability will be "not even very small"; it could be very small indeed.

Physics is not like that; it is about making approximations that work. There is no guarantee that mathematics even in principle allows one to capture the ultimate structure of the universe perfectly, and no justification for assuming it would.


Here we are getting into philosophical territory, and I suspect we may have very different ideas about what the nature of science fundamentally is. For what it's worth, I hold that the aim of science is to generate an accurate description of phenomena. And in my view, an exactly correct description of the phenomena is necessarily possible in principle, though there's no a priori guarantee that the correct description will be simple in the way that real scientific theories have, in fact, proved to be. But even if I were to grant the possibility that the universe is not exactly describable, even in principle, it still would obviously not follow that we can be sure that it is not describable. Thus, we still could not assign a zero probability to any particular theory, as long as that theory is consistent and non-contradicted.

Dogmatic crazy religious beliefs in spite of evidence, that's Bayesian. Set a probability of 1 on something and you'll never be rid of it!


Sure, if you pick crazy priors, you will be crazy.

But there was this solution for getting rid of wrong ideas: even if you believe in something really hard, you can still commit to a risk of wrongly abandoning it of one chance in, say, a billion or a billion billion


Exactly. But what you've just described is perfectly Bayesian. All you're saying is 'don't set any prior to exactly 1 (or exactly 0)'.

Then there are the practical matters, which are actually very important too. How good is it, actually, to have a lot of hypotheses being assigned probabilities in parallel? To predict anything you have to evaluate them all, and the combinations blow up so fast that it is not possible to do anything even if you were a Dyson sphere brain.


But being a Bayesian doesn't mean that you have to actually compute an update to your probabilities for every possible hypothesis every time you get a new piece of data. Nor does it mean that, in practice, to find P(x), you have to do a sum/integral over P(x|y) where y ranges over all possible hypotheses. That would be like saying that believing in the Standard Model requires you to do an enormous QFT calculation including every particle in the known universe if you simply want to know, say, how much stress a piece of rope can withstand before it breaks. It's perfectly fine to ignore terms in the probability sum if you expect them to make a negligible contribution. It's also fine to restrict yourself to a certain domain D of hypotheses and calculate P(x|D) instead of P(x), if you want. But it's well to recognize in that case that what you've calculated is P(x|D) rather than P(x).

How do you propose we set priors in physics, by the way?


Oh, don't mistake me - I think that the problem of setting priors is a very serious and a very deep one. In fact, this is the old problem of induction identified by Hume - or what's left of it once you have Bayes's theorem. But one does not solve or avoid the problem by adopting a non-Bayesian approach; one merely sweeps it under the rug.

Consider an example where we've done an experiment to try to determine some physical constant and we're now trying to set a 90% confidence limit on the value of that constant. I've encountered physicists who seem to think that by just doing a p-value test and not explicitly including priors, they are somehow on surer epistemic footing than those who take a Bayesian approach. But doing that p-value test is exactly equivalent to picking a certain set of values for the priors - namely, a uniform distribution over values of the constant we're measuring - and then using Bayes's theorem. All the frequentist has done is made the choice of priors implicit rather than explicit, but they sometimes act as though this has given them the moral high ground.

Now, I don't think this actually matters terribly much in practice, at least in the long run. Setting the priors to be a uniform distribution in some parameter of your theory is usually a pretty reasonable thing to do, so the frequentist approach is usually not crazy. And in practice, if you do the right experiments and get good statistics, then you will converge on the same result as long as your priors are not pathological.

I guess what bothers me about (some) physicists who adopt something of an anti-Bayesian stance is two things, one logical or perhaps terminological, and the other practical.

1. Some physicists tend to talk as if P(data|hypothesis) and P(hypothesis|data) were the same thing. On a certain assumption about the priors, those two things will have the same value, but they are different entities. And if you want to calculate P(hypothesis|data), then you have to make some assumption about priors - even if that assumption is just a uniform distribution.

2. In a few cases - namely, when we do actually have good reason to assume some particular priors - the non-Bayesian simply gets the wrong answer. For example, there have been several experiments designed to put constraints on the mass of the neutrino where the best fit value has ended up being a negative mass-squared (i.e. an imaginary value for the mass). Some of these have, using frequentist statistics, published confidence intervals that lie partially or completely in the negative mass-squared region. But surely it's absurd in this case to assume a uniform prior distribution over positive and negative values of mass-squared, since the negative values are unphysical (i.e. if you put an imaginary value in for the mass of a particle, the theory becomes incoherent). The thing to do, surely, is to set the prior probability of an imaginary mass to zero. I mean, by all means, acknowledge that your best fit value is negative. But if you want to talk about a confidence limit on the true value for the mass - that is, if you want to ask 'in light of this data, what can I say about what mass the neutrino is likely to have?' - then don't give unphysical values a non-zero probability.
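
A sketch of the neutrino point with made-up numbers: the best-fit m² is negative, but setting the prior to zero on the unphysical region m² < 0 keeps the entire posterior, and hence the 90% upper limit, at physical values:

```python
# Posterior for m^2 with a Gaussian likelihood whose best fit is
# negative, and a prior that is zero for unphysical m^2 < 0.
# The best-fit value and error below are hypothetical.
import math

m2_fit, sigma = -1.0, 2.0                      # hypothetical fit and error
grid = [i * 0.001 for i in range(0, 20001)]    # m^2 from 0 to 20 only

post = [math.exp(-0.5 * ((m2_fit - m2) / sigma) ** 2) for m2 in grid]
norm = sum(post)
post = [p / norm for p in post]

# 90% Bayesian upper limit: smallest m^2 with posterior CDF >= 0.9.
cdf, upper = 0.0, None
for m2, p in zip(grid, post):
    cdf += p
    if cdf >= 0.9:
        upper = m2
        break

print(upper)   # positive, while a naive frequentist interval with
               # these numbers would dip well into negative m^2
```

Restricting the grid to non-negative m² and renormalising is exactly the "prior zero on unphysical values" step; the likelihood itself is untouched, so the negative best fit can still be reported.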

But look, if what you're saying is that there are times when it's appropriate to use Bayes's theorem and times when it's not, then I completely agree with you - even if, perhaps, we'd disagree about whether it's the appropriate thing to use in certain specific instances.

J Thomas
Everyone's a jerk. You. Me. This Jerk.^
Posts: 1190
Joined: Fri Sep 23, 2011 3:18 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby J Thomas » Wed Nov 14, 2012 6:29 pm UTC

Aiwendil wrote:Here we are getting into philosophical territory, and I suspect we may have very different ideas about what the nature of science fundamentally is. For what it's worth, I hold that the aim of science is to generate an accurate description of phenomena. And in my view, an exactly correct description of the phenomena is necessarily possible in principle, though there's no a priori guarantee that the correct description will be simple in the way that real scientific theories have, in fact, proved to be. But even if I were to grant the possibility that the universe is not exactly describable, even in principle, it still would obviously not follow that we can be sure that it is not describable.


I want to thank you and Dmytry for this exchange. You both take reasonable, useful positions, and you both object to unreasonable positions taken by third parties. Clearly, it's possible to do frequentist reasoning wrong, and it's possible to do bayesian reasoning wrong.

I wanted to go off on a tangent from this single paragraph I quoted. You say that science should generate accurate descriptions, and it should be possible in principle to get things exactly correct.

But isn't it true that it has been proven that nothing in the real world can be measured precisely? Once you get below Planck's Constant then there is no way anything can ever be measured, and every precise measurement will involve Planck's constant one way or another. So we know that there can never ever be a completely precise measurement. Physics has proven this just as physics has proven that physics will never ever discover anything to go faster than lightspeed.

There can be new discoveries in physics, but the basics are known now and can never be changed. There are hypotheses in physics that are uncertain, but the uncertainty principle and relativity are now known to be true.

Am I right, or am I right?
The Law of Fives is true. I see it everywhere I look for it.

SHISHKABOB
Posts: 27
Joined: Fri Dec 11, 2009 3:14 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby SHISHKABOB » Wed Nov 14, 2012 6:39 pm UTC

the bayesian guy probably took a basic astronomy class at some point

is that the joke?

Fire Brns
Posts: 1114
Joined: Thu Oct 20, 2011 2:25 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby Fire Brns » Wed Nov 14, 2012 6:51 pm UTC

Dmytry wrote:Or what if some stars have small black holes inside of them, LHC destroying the world style. You may be really sure Sun can not go nova, but how sure are you that Sun does not contain a microscopic black hole? Such black hole would have negligible effect, growing quite slowly until some day it rapidly snowballs.
Black holes are not vacuum cleaners IN SPACE. If our sun collapsed into a black hole it would still exert the same effective gravity on Earth but we would still have to deal with the whole "No Light Or Heat" from it and whatever radiation manages to escape it.
Pfhorrest wrote:As someone who is not easily offended, I don't really mind anything in this conversation.
Mighty Jalapeno wrote:It was the Renaissance. Everyone was Italian.

User avatar
neremanth
Posts: 157
Joined: Wed Jul 25, 2012 4:24 pm UTC
Location: UK

Re: 1132: "Frequentists vs. Bayesians"

Postby neremanth » Wed Nov 14, 2012 7:16 pm UTC

SHISHKABOB wrote:the bayesian guy probably took a basic astronomy class at some point

is that the joke?


Oh, how I wish it was.

AtG
Posts: 51
Joined: Wed Mar 12, 2008 6:27 am UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby AtG » Wed Nov 14, 2012 7:53 pm UTC

J Thomas wrote:But isn't it true that it has been proven that nothing in the real world can be measured precisely? Once you get below Planck's Constant then there is no way anything can ever be measured, and every precise measurement will involve Planck's constant one way or another. So we know that there can never ever be a completely precise measurement. Physics has proven this just as physics has proven that physics will never ever discover anything to go faster than lightspeed.

There can be new discoveries in physics, but the basics are known now and can never be changed. There are hypotheses in physics that are uncertain, but the uncertainty principle and relativity are now known to be true.

Am I right, or am I right?


What is truth?

Relativity theory is not proven through formal logic from some set of self-evident axioms; it just fits observations really, really well. Rejecting it correlates strongly with being a crackpot, although most crackpots like Einstein.

The uncertainty principle does follow from the mathematical model of quantum mechanics. (At least the dp dx thingy in nonrelativistic quantum mechanics corresponds to a theorem in Fourier analysis; I leave the rest to the physicists here.)
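
For what it's worth, that Fourier fact is easy to check numerically: a Gaussian wave packet saturates the uncertainty bound σ_x · σ_k = 1/2 (in units where ħ = 1). A minimal sketch, with invented grid parameters:

```python
import numpy as np

# Gaussian wave packet on a grid (sigma_x = 1; grid size and extent invented).
N = 4096
x = np.linspace(-40, 40, N)
dx = x[1] - x[0]
psi = np.exp(-x**2 / 4.0)                      # Gaussian with sigma_x = 1
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)    # normalize

# Position spread from |psi|^2.
px = np.abs(psi)**2 * dx
mean_x = np.sum(x * px)
std_x = np.sqrt(np.sum((x - mean_x)**2 * px))

# Momentum-space density via FFT (overall phases don't affect |psi_k|^2).
k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(N, d=dx))
dk = k[1] - k[0]
pk = np.abs(np.fft.fftshift(np.fft.fft(psi)))**2
pk /= np.sum(pk) * dk                          # normalize as a density in k
mean_k = np.sum(k * pk) * dk
std_k = np.sqrt(np.sum((k - mean_k)**2 * pk) * dk)

product = std_x * std_k                        # 1/2 for a Gaussian: the minimum
```

Any non-Gaussian packet gives a strictly larger product; that is the theorem in question.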

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Wed Nov 14, 2012 9:07 pm UTC

J Thomas wrote:But isn't it true that it has been proven that nothing in the real world can be measured precisely? Once you get below Planck's Constant then there is no way anything can ever be measured, and every precise measurement will involve Planck's constant one way or another. So we know that there can never ever be a completely precise measurement. Physics has proven this just as physics has proven that physics will never ever discover anything to go faster than lightspeed.

There can be new discoveries in physics, but the basics are known now and can never be changed. There are hypotheses in physics that are uncertain, but the uncertainty principle and relativity are now known to be true.


The uncertainty principle and relativity are certainly not known to be true. They're very well confirmed, but it is possible in principle for future observations to contradict them.

But that actually is not relevant to my claim that an exactly correct description of the universe is possible in principle. Note that I am not claiming that it is possible for anyone to actually discover what that correct description is. In fact, it's not necessarily the case, a priori at least, that that correct description be expressible in the form of a few succinct, very general statements - i.e. in the form of scientific laws.

SHISHKABOB
Posts: 27
Joined: Fri Dec 11, 2009 3:14 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby SHISHKABOB » Thu Nov 15, 2012 4:42 am UTC

Aiwendil wrote:
J Thomas wrote:But isn't it true that it has been proven that nothing in the real world can be measured precisely? Once you get below Planck's Constant then there is no way anything can ever be measured, and every precise measurement will involve Planck's constant one way or another. So we know that there can never ever be a completely precise measurement. Physics has proven this just as physics has proven that physics will never ever discover anything to go faster than lightspeed.

There can be new discoveries in physics, but the basics are known now and can never be changed. There are hypotheses in physics that are uncertain, but the uncertainty principle and relativity are now known to be true.


The uncertainty principle and relativity are certainly not known to be true. They're very well confirmed, but it is possible in principle for future observations to contradict them.



I'd like to point out for general purposes that if a scientific theory is not capable of being disproved then it's not a scientific theory whatsoever.

User avatar
dudiobugtron
Posts: 1098
Joined: Mon Jul 30, 2012 9:14 am UTC
Location: The Outlier

Re: 1132: "Frequentists vs. Bayesians"

Postby dudiobugtron » Thu Nov 15, 2012 5:44 am UTC

Yes, however that might be because it is a scientific theorem. ;)

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Thu Nov 15, 2012 8:45 am UTC

Aiwendil wrote:
Dmytry wrote:You are to assign a non-zero, not even very small probability that it is totally, completely correct, to a theory, that it may literally be the great truth of how universe works.


To any theory that is consistent and not (yet) directly contradicted by empirical data, yes. It seems perfectly reasonable to me to assign a non-zero probability to any such theory; and on the contrary it seems quite unreasonable to assign any such theory a probability of exactly zero. I'm not sure why you think the probability will be "not even very small"; it could be very small indeed.

Physics is not like that, it is about making approximations that work, there's no guarantee the mathematics even in principle allows to capture the ultimate structure of the universe perfectly and there's no justification that it would.


Here we are getting into philosophical territory, and I suspect we may have very different ideas about what the nature of science fundamentally is. For what it's worth, I hold that the aim of science is to generate an accurate description of phenomena. And in my view, an exactly correct description of the phenomena is necessarily possible in principle, though there's no a priori guarantee that the correct description will be simple in the way that real scientific theories do, in fact, prove to be. But even if I were to grant the possibility that the universe is not exactly describable, even in principle, it still would obviously not follow that we can be sure that it is not describable. Thus, we still could not assign a zero probability to any particular theory, as long as that theory is consistent and non-contradicted.

The issue is: what probability, specifically, do you want to assign? It is entirely unquantifiable. edit: another issue with assigning probabilities is that they must sum to 1, which is hard to ensure when you can't enumerate the hypotheses. We have no idea how much of a wild guess "the world is exactly representable with mathematics" is... mathematics is not exactly representable with other mathematics either; you can't do true vectors or true reals or the like on a Turing machine...

I do not believe free hanging probabilities are even well defined. You need a theory to calculate such probabilities. The probability of data given a theory, in contrast, is well defined by the theory itself.

Dogmatic crazy religious beliefs in spite of evidence, that's Bayesian. Set a probability of 1 on something and you'll never be rid of it!


Sure, if you pick crazy priors, you will be crazy.

And how do you pick priors? Using frequentist-ish reasoning?

But there was this solution for getting rid of wrong ideas: if you believe in something really hard, you can still commit to risk of stopping believing in it, at one chance in, say, a billion or a billion billion


Exactly. But what you've just described is perfectly Bayesian. All you're saying is 'don't set any prior to exactly 1 (or exactly 0)'.

If I give even a one-in-a-billion probability to the existence of a Christian-style God, I can be Pascal-wagered, as I would also rely on the probabilities for decision-making. See the insane "Bayesians" and their killer robots.


Then, the practical matters, which are actually very important too. How good is it, actually, to have a lot of hypotheses in parallel being assigned probabilities? You want to predict anything, you have to evaluate them all, and the combinations blow up so fast it is not possible to do anything even if you were a Dyson sphere brain.


But being a Bayesian doesn't mean that you have to actually compute an update to your probabilities for every possible hypothesis every time you get a new piece of data. Nor does it mean that, in practice, to find P(x), you have to do a sum/integral over P(x|y) where y ranges over all possible hypotheses. That would be like saying that believing in the Standard Model requires you to do an enormous QFT calculation including every particle in the known universe if you simply want to know, say, how much stress a piece of rope can withstand before it breaks. It's perfectly fine to ignore terms in the probability sum if you expect them to make a negligible contribution. It's also fine to restrict yourself to a certain domain D of hypotheses and calculate P(x|D) instead of P(x), if you want. But it's well to recognize in that case that what you've calculated is P(x|D) rather than P(x).
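
To make the point about ignoring negligible terms concrete, here is a toy sketch (all numbers invented): a handful of coin-bias hypotheses, where dropping the terms that contribute negligibly to the probability sum barely changes P(data).

```python
from math import comb

# Hypotheses: coin bias in {0.05, 0.15, ..., 0.95}, uniform prior (invented setup).
hypotheses = [0.05 + 0.1 * i for i in range(10)]
prior = 1 / len(hypotheses)

# Data: 60 heads in 100 flips; binomial likelihood P(data|h).
n, k = 100, 60
def likelihood(h):
    return comb(n, k) * h**k * (1 - h)**(n - k)

# Full sum over ALL hypotheses.
p_data_full = sum(likelihood(h) * prior for h in hypotheses)

# Restricted domain D: keep only hypotheses whose term is non-negligible.
D = [h for h in hypotheses if likelihood(h) * prior > 1e-6 * p_data_full]
p_data_D = sum(likelihood(h) * prior for h in D)

rel_err = abs(p_data_full - p_data_D) / p_data_full   # tiny by construction
```

Several hypotheses get dropped, and the restricted sum agrees with the full one to many decimal places.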

How do you propose we set priors in physics, by the way?


Oh, don't mistake me - I think that the problem of setting priors is a very serious and a very deep one. In fact, this is the old problem of induction identified by Hume - or what's left of it once you have Bayes's theorem. But one does not solve or avoid the problem by adopting a non-Bayesian approach; one merely sweeps it under the rug.

Consider an example where we've done an experiment to try to determine some physical constant and we're now trying to set a 90% confidence limit on the value of that constant. I've encountered physicists who seem to think that by just doing a p-value test and not explicitly including priors, they are somehow on surer epistemic footing than those who take a Bayesian approach. But doing that p-value test is exactly equivalent to picking a certain set of values for the priors - namely, a uniform distribution over values of the constant we're measuring - and then using Bayes's theorem. All the frequentist has done is made the choice of priors implicit rather than explicit, but they sometimes act as though this has given them the moral high ground.
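
That equivalence is easy to see numerically. A minimal sketch with an invented measurement: a Gaussian likelihood with known error, a flat prior over the constant, and Bayes's theorem reproduce the textbook frequentist 90% interval.

```python
from math import exp

# Invented measurement of a constant: best estimate 3.2, known error 1.0.
x_hat, sigma = 3.2, 1.0

# Bayesian route: flat prior, so posterior is proportional to the likelihood.
grid = [x_hat - 6 + 12 * i / 20000 for i in range(20001)]
dmu = grid[1] - grid[0]
post = [exp(-(x_hat - mu) ** 2 / (2 * sigma ** 2)) for mu in grid]
norm = sum(post) * dmu
post = [p / norm for p in post]

# Central 90% credible interval from the posterior CDF.
cdf, lo, hi = 0.0, None, None
for mu, p in zip(grid, post):
    cdf += p * dmu
    if lo is None and cdf >= 0.05:
        lo = mu
    if hi is None and cdf >= 0.95:
        hi = mu

# Frequentist route: the standard two-sided 90% confidence interval.
z90 = 1.6449
ci_lo, ci_hi = x_hat - z90 * sigma, x_hat + z90 * sigma
```

The two intervals agree; the only thing the Bayesian version changed is making the flat prior explicit.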

It is simpler and doesn't let as much information in through your prejudices. Choosing among priors gives you that much more room to screw up the result.

Now, I don't think this actually matters terribly much in practice, at least in the long run. Setting the priors to be a uniform distribution in some parameter of your theory is usually a pretty reasonable thing to do, so the frequentist approach is usually not crazy. And in practice, if you do the right experiments and get good statistics, then you will converge on the same result as long as your priors are not pathological.

I guess what bothers me about (some) physicists who adopt something of an anti-Bayesian stance is two things, one logical or perhaps terminological, and the other practical.

1. Some physicists tend to talk as if P(data|hypothesis) and P(hypothesis|data) were the same thing. On a certain assumption about the priors, those two things will have the same value, but they are different entities. And if you want to calculate P(hypothesis|data), then you have to make some assumption about priors - even if that assumption is just a uniform distribution.

Competent ones don't do that. P(hypothesis|data) is undefined, that is my position. One can bullshit it, because anything can be bullshitted, but you can't calculate it. There is no such thing to be calculated. Bayes' theorem proves you'll never have it.

P(hypothesis|data, theory) may be well defined for some theories though, and that's where Bayesian methods come into use, if the theory is any good. It's what good Bayesians tend to mean by P(hypothesis|data)

With regard to the use of P(data|hypothesis), it can be used directly in decision strategies; you can calculate a bound on how much you lose.

2. In a few cases - namely, when we do actually have good reason to assume some particular priors - the non-Bayesian simply gets the wrong answer. For example, there have been several experiments designed to put constraints on the mass of the neutrino where the best fit value has ended up being a negative mass-squared (i.e. an imaginary value for the mass). Some of these have, using frequentist statistics, published confidence intervals that lie partially or completely in the negative mass-squared region. But surely it's absurd in this case to assume a uniform prior distribution over positive and negative values of mass-squared, since the negative values are unphysical (i.e. if you put an imaginary value in for the mass of a particle, the theory becomes incoherent). The thing to do, surely, is to set the prior probability of an imaginary mass to zero. I mean, by all means, acknowledge that your best fit value is negative. But if you want to talk about a confidence limit on the true value for the mass - that is, if you want to ask 'in light of this data, what can I say about what mass the neutrino is likely to have?' - then don't give unphysical values a non-zero probability.
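
The difference the prior makes here can be sketched with invented numbers: a 'measurement' whose best-fit mass-squared is negative, with the 90% upper limit computed once under the implicit flat prior over all of mass-squared, and once under a prior that is zero in the unphysical region.

```python
from math import exp

# Invented result: best-fit m^2 = -1.0 with Gaussian error 2.0 (arbitrary units).
m2_hat, sigma = -1.0, 2.0

grid = [-10 + 25 * i / 50000 for i in range(50001)]   # m^2 grid
dm = grid[1] - grid[0]
like = [exp(-(m2_hat - m2) ** 2 / (2 * sigma ** 2)) for m2 in grid]

def upper_limit_90(prior):
    """90% Bayesian upper limit under the given prior (grid approximation)."""
    w = [l * p for l, p in zip(like, prior)]
    norm = sum(w) * dm
    cdf = 0.0
    for m2, wi in zip(grid, w):
        cdf += wi * dm / norm
        if cdf >= 0.90:
            return m2
    return grid[-1]

flat = [1.0] * len(grid)                             # the implicit frequentist choice
physical = [0.0 if m2 < 0 else 1.0 for m2 in grid]   # zero prior where m^2 is unphysical

ul_flat = upper_limit_90(flat)
ul_phys = upper_limit_90(physical)
```

With the physical prior the limit is guaranteed to lie in the allowed region; in this example it also comes out looser, since all the posterior mass gets pushed to the positive side.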

Seems like post-hoc tampering with the results, to be honest. It's up to the reader to interpret that, not up to the experimenter to fix up his results so that they come out correct.

But look, if what you're saying is that there are times when it's appropriate to use Bayes's theorem and times when it's not, then I completely agree with you - even if, perhaps, we'd disagree about whether it's the appropriate thing to use in certain specific instances.

Well, I do not believe it to be even usable in certain circumstances... What exists is P(theory2|data and theory1), where theory1 provides the priors and the complete computational framework for refining theory1 into theory2. The loudest "Bayesians" on the internet are guys who want to obtain priors by bullshitting, for a very nefarious purpose (separating naive folks from their money so that a guy who never invented any algorithm I know of, nor done anything really notable, can fight the killer robots with the friendly robots). They're most interested in obscuring the necessity of theory1.

Then, there's this promotion of 'passive' probabilistic decision making rather than decision procedures/strategies. Example of passive probabilistic decision making: you don't have low enough probability for that guy fighting killer robots successfully, so you calculate ridiculous expected utilities from donation, donate, and have some scammer rolling in cash and admiration. Example of strategic decision making: you adopt a procedure of "if someone has some clearly non bullshit accomplishments that are technically notable (e.g. solved some long-ish-standing mathematical problems etc etc), and asks for money to save the world, and looks sane, and is not gaining from this transaction under assumption of no saving the world, consider donating". The former is how scammers earn their money, the latter is how world can get saved in the off chance that such becomes necessary.

Me, I see the processes that a scientist or anyone else employs as strategies... and this squares fine with using "probability of data given hypothesis" to put a bound on the probability of dangerous failures, and for calculating how many resources have to be spent on experimentation given the cost of the consequences of type 1 and type 2 errors in particular circumstances. Yes, that can be done by setting priors, but that is problematic because you then have various prior-based probabilities in the decision-making, which easily lead to bad outcomes (Pascal's wager, etc.). Some reliance on prejudices is entirely unavoidable, but the less you rely on the correctness of those prejudices, the better. It may be that the ultimate truth of the universe is describable in mathematics, but I do not want to rely on this as an assumption, and because the probability of that is not quantifiable, I'd rather avoid relying on a made-up number for it. Seeking the approximations, you'll find the truth if you can, but seeking the ultimate truth you'll miss the approximations...
Last edited by Dmytry on Thu Nov 15, 2012 2:15 pm UTC, edited 5 times in total.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Thu Nov 15, 2012 8:58 am UTC

Fire Brns wrote:
Dmytry wrote:Or what if some stars have small black holes inside of them, LHC destroying the world style. You may be really sure Sun can not go nova, but how sure are you that Sun does not contain a microscopic black hole? Such black hole would have negligible effect, growing quite slowly until some day it rapidly snowballs.
Black holes are not vacuum cleaners IN SPACE. If our sun collapsed into a black hole it would still exert the same effective gravity on Earth but we would still have to deal with the whole "No Light Or Heat" from it and whatever radiation manages to escape it.

We would also have to deal with an enormous kaboom problem as the matter falls in, undergoes fusion, et cetera.

rmsgrey
Posts: 3457
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby rmsgrey » Thu Nov 15, 2012 11:20 am UTC

As far as neutrino mass goes, if the best evidence points toward an imaginary rest mass (despite that being incompatible with our best prior theories) then rejecting that as impossible would be unscientific. If the evidence supports both real and imaginary rest mass, then, while the reporting of the data should make that clear, it's not unreasonable for your conclusion to assert that the rest mass is within a stated real range and exclude the imaginary possibility (at least until more data comes in)

The "FTL" neutrinos a while back were an example of scientists using bayesian reasoning - something like: "given our prior probability of neutrinos traveling faster than light, and that of experimental error, we're pretty sure that the result showing neutrinos traveling at FTL speeds is evidence of experimental error." Which is why the initial report essentially said: "can anyone suggest where we went wrong please?"

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Thu Nov 15, 2012 12:24 pm UTC

rmsgrey wrote:As far as neutrino mass goes, if the best evidence points toward an imaginary rest mass (despite that being incompatible with our best prior theories) then rejecting that as impossible would be unscientific. If the evidence supports both real and imaginary rest mass, then, while the reporting of the data should make that clear, it's not unreasonable for your conclusion to assert that the rest mass is within a stated real range and exclude the imaginary possibility (at least until more data comes in)

The "FTL" neutrinos a while back were an example of scientists using bayesian reasoning - something like: "given our prior probability of neutrinos traveling faster than light, and that of experimental error, we're pretty sure that the result showing neutrinos traveling at FTL speeds is evidence of experimental error." Which is why the initial report essentially said: "can anyone suggest where we went wrong please?"

Well, I think that's a bit of an oversimplification, and potentially a dangerous one too, insofar as it is used to lend credibility to picking priors out of the air based on a handwave, or to the notion of "probability of theory given data". Ultimately we were pretty sure that neutrinos don't go faster than light because a: the theory where they don't works very well, and b: we had evidence that neutrinos do not go faster than light, as well as evidence via not noticing this earlier. We don't have a formalism for quantifying this, but it appears that such a formalism would be possible under sufficiently detailed analysis - we would be sure that we are wrong somewhere, and the bulk of the wrongitude would be placed on the faster-than-light measurement rather than on any other measurements.

By the way, the probability of ultimate correctness of the major theories of physics is currently zero for all intents and purposes (there's no grand unified theory of everything, and combining QM and GR is tricky). We just don't expect to see this particular sort of disagreement between theory and reality now. Yes, those can be used as 'prior probabilities' in an incredibly handwavy and shaky Bayes' theorem calculation (statistical independence is not assured, and so on and so forth).

And the problem with priors is that too low or too high priors give you beliefs that can't ever be disposed of (any experiments you can do will have a common failure mode, i.e. they are not statistically independent, so they will never boost confidence past some point) - stubbornness, if you wish - whereas avoiding too low or too high priors leads to insane decision-making, as you can construct hypotheses of very high importance (the numbers that a string of code can produce grow faster than any computable function of its length).

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Thu Nov 15, 2012 10:45 pm UTC

Dmytry wrote:The issue is: what probability, specifically, do you want to assign?


And how do you pick priors? Using frequentist-ish reasoning?


As I said, I don't have an answer. This is a very deep problem. Certainly, in some restricted cases it makes sense to use frequentist-ish reasoning - e.g. in the classic case of taking the frequency of a disease in the population into account when interpreting the results of a test for that disease. But the general problem of selecting priors is a serious one. The thing is, as I said, you don't solve the problem by rejecting Bayesianism; you merely ignore it.
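
That disease-test case, spelled out with invented numbers, shows what the frequency-derived prior buys you:

```python
# Invented test characteristics and prevalence.
prevalence = 0.001        # P(disease): the prior, taken from population frequency
sensitivity = 0.99        # P(positive | disease)
false_positive = 0.05     # P(positive | no disease)

# Bayes's theorem: P(disease | positive test).
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
```

Even with a 99%-sensitive test, a positive result here leaves under a 2% chance of disease, because the prior (the prevalence) is so low.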

If I give even a one-in-a-billion probability to the existence of a Christian-style God, I can be Pascal-wagered


But assigning the existence of a Christian-style God a probability of exactly zero is just as arbitrary as setting it to any other value. And it also seems epistemically questionable; surely it is not logically impossible that a Christian-style God could exist. Therefore assigning it a probability of zero, rather than an extremely small value, would seem to be unreasonable. Now you might say, 'well, I'm not assigning it any probability'. But clearly, that's just ignoring the problem, not solving it. That's getting around the difficulties of answering certain questions by simply not asking those questions.

It is simpler and doesn't let as much information in through your prejudices. Choosing among priors gives you that much more room to screw up the result.


It's simpler only in that you've chosen a certain definite procedure for assigning priors (which you've implicitly done, even if you don't call them priors). That's fine, but you should recognize that: 1. This (i.e. assigning priors) is what you're doing, and 2. This choice suffers from the problem of the justification of those priors just as any other choice does.

Competent ones don't do that.


They sometimes do (i.e. talk as if P(data|hypothesis) and P(hypothesis|data) are the same thing). Now, I'm sure they do in fact understand that those two things are different; my complaint is that they sometimes fail in practice to be clear about it - which is potentially confusing to people who may not be so familiar with statistics, like students.

P(hypothesis|data, theory) may be well defined for some theories though, and that's where Bayesian methods come into use, if the theory is any good. It's what good Bayesians tend to mean by P(hypothesis|data)


Yes, that's a good point, and I should have been more careful about distinguishing P(hypothesis|data, theory) from P(hypothesis|data). As I said, it's perfectly fine within the Bayesian approach to restrict yourself to a certain domain D of hypotheses and calculate P(x|y, D) - and calculating P(hypothesis|data, theory) is a case of doing just that. Note, though, that even in calculating P(hypothesis|data, theory), you still have the problem of priors. That is, you still need to assume some value for P(hypothesis|theory).

Seems like post-hoc tampering with the results, to be honest. It's up to the reader to interpret that, not up to the experimenter to fix up his results so that they come out correct.


No one's proposing 'fixing up' the results. By all means, publish the data and publish the best fit value. But if you really think the experimenter should not interpret the results, then they shouldn't be publishing confidence intervals at all. Publishing a confidence interval that was calculated by (tacitly) assuming a uniform distribution of prior probabilities over positive and negative values of mass-squared is no more honest than publishing one that was calculated by assuming a uniform distribution over only physical - i.e., positive - values. Both are interpretations of the data. Both purport to answer the question 'Given this data, and some set of theoretical and auxiliary assumptions, what range of values can we be 95% sure that the neutrino mass falls within?' Nor is the former approach to answering that question somehow less biased than the latter. Neither one is agnostic about the prior probabilities; each has made a choice about what distribution of priors to use. Both choices are in a sense arbitrary. The only thing I can see that can justify either choice over the other is that the former choice assigns a non-zero prior probability to hypotheses that (again, given theoretical assumptions) cannot possibly be true.

Now there's certainly a debate one can have about to what extent experimentalists should interpret their data in their own publications. I think there are good arguments for both sides, and that the best answer is probably somewhere in between the extremes. If your argument is that experimenters shouldn't publish confidence limits on the values they're measuring, fine. But I maintain that if you want to obtain a confidence limit for a quantity, assigning non-zero prior probabilities to nonsensical values of that quantity is the wrong way to do it.

The loudest "Bayesians" on the internet are guys who want to obtain priors by bullshitting, for a very nefarious purpose (separating naive folks from their money so that a guy who never invented any algorithm I know of, nor done anything really notable, can fight the killer robots with the friendly robots).


Just in case I haven't made it sufficiently clear, I am in no wise in sympathy with these people.

rmsgrey wrote:As far as neutrino mass goes, if the best evidence points toward an imaginary rest mass (despite that being incompatible with our best prior theories) then rejecting that as impossible would be unscientific.


The thing is, it's not just that an imaginary mass would disagree with our current theory. It's that mass as defined within the Standard Model simply isn't the sort of thing that can have an imaginary value. It's as if you were trying to determine the price of some item that's for sale, and you came up with an imaginary value. That's not just unlikely; it's incoherent. A price, by definition, can't be imaginary.

Now, it's certainly true that one could imagine alternatives to the Standard Model in which the SM's real-valued, positive mass is replaced by a complex quantity. But if what you're interested in is P(mass=x|data, Standard Model), then it makes no sense not to exclude imaginary values for x.

Dmytry wrote:Ultimately we were pretty sure that neutrinos don't go faster than light because a: theory where they don't works very well, and b: we had evidence that neutrinos do not go faster than light, as well as evidence via not noticing this earlier.


I agree completely. But I think that what you are saying is equivalent to 'The prior probability that neutrinos are faster than light was very small, for these reasons: . . .' And in this case, even though we can't rigorously set a prior probability from first principles, it is very reasonable to suppose that the prior probability of FTL neutrinos is very low, for exactly the reasons you mentioned.
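
That reasoning can be caricatured in a few lines of arithmetic, with all numbers invented: even taking the anomalous result at face value, a tiny prior on FTL neutrinos against a modest prior on a subtle systematic leaves the posterior overwhelmingly on 'experimental error'.

```python
# Invented priors and likelihoods for the two competing explanations.
p_ftl = 1e-9                # prior: neutrinos really are superluminal
p_error = 1e-3              # prior: this kind of experiment hides a systematic
p_result_given_ftl = 1.0    # the anomalous result is certain if FTL is real
p_result_given_error = 0.5  # a systematic could plausibly produce it too

# Bayes's theorem, with all other explanations ignored for the sketch.
posterior_ftl = (p_result_given_ftl * p_ftl) / (
    p_result_given_ftl * p_ftl + p_result_given_error * p_error
)
```

The posterior probability of FTL stays minuscule, which is exactly the 'where did we go wrong?' stance OPERA took.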

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Fri Nov 16, 2012 8:57 am UTC

I don't have much time right now but i'll address this:

Now you might say, 'well, I'm not assigning it any probability'. But clearly, that's just ignoring the problem, not solving it. That's getting around the difficulties of answering certain questions by simply not asking those questions.

Assigning it a probability is not solving the problem; it's pretending that you solved it. What about the mass, in kilograms, of the probability of God? Would you fault me for not assigning it a mass in kilograms? What if I need this mass for shipping my theology manuscript without having scales? :)

The probability, for practical purposes (e.g. writing non-trivial software that employs probabilistic reasoning correctly), is not even representable with a real number. You need to represent all the potential correlations with other random variables, as well.

edit: also, you wouldn't fault physics for not seeking to find the origin point of the universe, would you? A vector space is perfectly well defined without one. Likewise with 'probabilities': the amounts of evidence are perfectly well defined without the prior, just as a vector space is defined without an origin.

Aiwendil wrote:Yes, that's a good point, and I should have been more careful about distinguishing P(hypothesis|data, theory) from P(hypothesis|data). As I said, it's perfectly fine within the Bayesian approach to restrict yourself to a certain domain D of hypotheses and calculate P(x|y, D) - and calculating P(hypothesis|data, theory) is a case of doing just that. Note, though, that even in calculating P(hypothesis|data, theory), you still have the problem of priors. That is, you still need to assume some value for P(hypothesis|theory).

P(hypothesis|theory) is given by the theory. E.g. you can have a theory that we live inside a Turing machine that iterates over all input tapes; then you have a probability assigned to any computable hypothesis, including some intelligence(s) running a simulator of our universe.

J Thomas
Everyone's a jerk. You. Me. This Jerk.^
Posts: 1190
Joined: Fri Sep 23, 2011 3:18 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby J Thomas » Fri Nov 16, 2012 2:23 pm UTC

Dmytry wrote:I don't have much time right now but i'll address this:

Now you might say, 'well, I'm not assigning it any probability'. But clearly, that's just ignoring the problem, not solving it. That's getting around the difficulties of answering certain questions by simply not asking those questions.

Assigning it a probability is not solving it; it's pretending that you solved it. What about the mass, in kilograms, of the probability of God? Would you fault me for not assigning it a mass in kilograms? What if I need this mass for shipping my theology manuscript without having scales? :)

The probability, for practical purposes (e.g. writing non-trivial software that employs probabilistic reasoning correctly), is not even representable with a real number. You need to represent all the potential correlations with other random variables, as well.

edit: also, you wouldn't fault physics for not seeking to find the origin point of the universe, would you? A vector space is perfectly well defined without one. Likewise with 'probabilities': the amounts of evidence are perfectly well defined without the prior, just as a vector space is defined without an origin.


Well, but he has a point. Like, if you calculate the probability of some hand from a deck of cards, you start out assuming that each card is equally likely to be in each position. That's the way to bet unless you know better. But if you know that the deck has been shuffled once and not cut, and you know the previous order, that's a whole lot of prior information. Cards that were near the top of the deck will still be near the top, and likewise for the bottom. Cards that were near the middle might now be near the top or the bottom, you don't know as much about them. (Or what you know about them may be harder to use.) It makes sense to use that information if you have it, and if you can handle the complexity.

Similarly, if you're estimating a value that you suppose is within a finite range, it makes a kind of sense to start out assuming it's equally likely to be anywhere within that range. Or you might know something that makes some other starting point more reasonable. You can't help but start with some assumption about the distribution, and then your data gives you reason to change it.
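The shuffle example above can be roughly simulated (the riffle model and all numbers here are my assumptions, not from the thread): one riffle with no cut leaves a lot of the pre-shuffle order intact, which is exactly the prior information being described.

```python
# Rough sketch (model and numbers are my assumptions): one riffle shuffle
# with no cut, and how much the pre-shuffle order still tells you.
import random

def riffle_once(deck, rng):
    """Crude riffle: cut near the middle, then interleave at random,
    dropping from each pile with probability proportional to its size."""
    cut = len(deck) // 2 + rng.randint(-5, 5)
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        if right and rng.random() >= len(left) / (len(left) + len(right)):
            out.append(right.pop(0))
        else:
            out.append(left.pop(0))
    return out

rng = random.Random(0)
trials = 2000
top_quarter = 0
for _ in range(trials):
    shuffled = riffle_once(list(range(52)), rng)  # card 0 starts on top
    if shuffled.index(0) < 13:                    # still in the top quarter?
        top_quarter += 1

# Under a uniform 'perfect shuffle' prior this would be ~0.25; knowing it
# was one riffle with no cut pushes it near 1.
print(top_quarter / trials)
```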

Is there a way to do it so that you give zero weight to your priors? That would be ideal if you think they deserve zero weight. But you still have to make that choice.

That does seem like the purest approach. Give zero weight to your priors, see just what the data says independent of everything else, and take it from there. But if you know about biases in your sampling etc. then you ought to account for them, and once you start doing that sort of thing you have to choose where to stop.
The Law of Fives is true. I see it everywhere I look for it.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Fri Nov 16, 2012 4:17 pm UTC

J Thomas wrote:
Dmytry wrote:I don't have much time right now but i'll address this:

Now you might say, 'well, I'm not assigning it any probability'. But clearly, that's just ignoring the problem, not solving it. That's getting around the difficulties of answering certain questions by simply not asking those questions.

Assigning it a probability is not solving it; it's pretending that you solved it. What about the mass, in kilograms, of the probability of God? Would you fault me for not assigning it a mass in kilograms? What if I need this mass for shipping my theology manuscript without having scales? :)

The probability, for practical purposes (e.g. writing non-trivial software that employs probabilistic reasoning correctly), is not even representable with a real number. You need to represent all the potential correlations with other random variables, as well.

edit: also, you wouldn't fault physics for not seeking to find the origin point of the universe, would you? A vector space is perfectly well defined without one. Likewise with 'probabilities': the amounts of evidence are perfectly well defined without the prior, just as a vector space is defined without an origin.


Well, but he has a point. Like, if you calculate the probability of some hand from a deck of cards, you start out assuming that each card is equally likely to be in each position. That's the way to bet unless you know better. But if you know that the deck has been shuffled once and not cut, and you know the previous order, that's a whole lot of prior information. Cards that were near the top of the deck will still be near the top, and likewise for the bottom. Cards that were near the middle might now be near the top or the bottom, you don't know as much about them. (Or what you know about them may be harder to use.) It makes sense to use that information if you have it, and if you can handle the complexity.

But that's not really the 'prior' in that sense; that's just a model of an imperfect shuffle. It's like knowing that a number is the sum of two dice, and thus is less likely to be 12 or 2 than 7.

It's really quite difficult to describe the subtle disagreement between competent people here (and the terminological confusion). Let me try:

Bayesian as in "Bayesian spam filtering": could just as well have been named 'frequentist spam filtering'. You determine the prevalence of spam, you determine the prevalence of spammy words, and you use the Bayes formula to calculate the frequency of spam among messages that meet the observed criteria. It has nothing to do with probability interpretation.
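The spam-filtering description above fits in a few lines (the corpus frequencies below are invented for illustration):

```python
# Minimal sketch of the spam filtering described above - counting
# frequencies and applying the Bayes formula. All numbers are invented.
p_spam = 0.4                # prevalence of spam in the (hypothetical) corpus
p_word_given_spam = 0.25    # a spammy word appears in 25% of spam
p_word_given_ham = 0.001    # ...and in 0.1% of legitimate mail

# Bayes formula: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(p_spam_given_word)    # close to 1: the word is strong evidence
```

Everything here is an observed (or assumed) frequency; no probability is assigned ex nihilo, which is the point being made.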

Bayesian as in probability interpretation: proclaiming that any belief has a free-standing probability of being correct, which, by Bayes' theorem, requires assignment of probabilities ex nihilo somewhere. A Bayesian would start with some ex nihilo numeric belief in something and then modify it as evidence comes in. The initial beliefs will always play a role.

Frequentist as in frequentist methods: the methods that do not rely on assignment of probabilities ex nihilo. Example: if I make a thousand theories over a lifetime, then test them, and one works with a risk of only 1 in a billion of it working via mere coincidence, and I then believe that theory - as a process, this has up to about a 1 in a million chance of letting coincidence result in an invalid belief in a theory (that's if all the theories being tested were certain to be flat out wrong).
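The worst-case arithmetic in that example can be checked directly (a sketch; the independence assumption in the exact calculation is mine):

```python
# The arithmetic behind the example above: 1000 theories, each tested at
# a 1-in-a-billion fluke level, give at most ~1-in-a-million total risk.
n_theories = 1000
p_fluke = 1e-9          # chance a single wrong theory passes by coincidence

# Union bound: holds with no prior and no independence assumption.
upper_bound = n_theories * p_fluke

# Exact value if the tests happen to be independent:
p_any_fluke = 1 - (1 - p_fluke) ** n_theories

print(upper_bound, p_any_fluke)
```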

Frequentist as in frequentist probability interpretation: probability is seen as the frequency of something among something else, under circumstances where we cannot initially distinguish between them. E.g. the frequency of decks with a joker on top among imperfectly shuffled decks of a specific initial order.

Issues with Bayesianism: it strikes me as pure obfuscation of the fact that the process which generates the prior probabilities is itself effectively a theory of the world. Furthermore, as there is no essential distinction between "Bayesianism" and "frequentism" of the kind that would warrant a self-label of Bayesian along with opposition to anything, the self-labelled Bayesians seem to be simply very confused and preaching their confusion.

edit: a good example is a die that is slightly non-symmetrical. What is the probability of it landing with a particular face upwards after it is thrown onto a specific surface from a specific height? How do you find out? What is the ideal way to find out? The ideal way would be to analytically determine the partitioning of the initial phase space of the die into regions corresponding to final states (which side is up) - imagine the initial orientation space being broken into incredibly tiny cells that correspond to the faces it will land on afterwards - and then find what fraction of the initial phase space corresponds to that face being up, at least semi-analytically (e.g. by finding the sizes of some of those tiny cells, which will be much more precise than just simulating the same number of throws). That squares fine with frequentism, and has nothing to do with any subjective beliefs in anything.
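A one-dimensional toy of that phase-space partition idea (the wheel model and the arc sizes are my own stand-ins, not a real die): the 'die' is a wheel whose final rest angle determines the face, and faces cover unequal arcs because of the asymmetry. We chop the initial-orientation space into tiny cells and see which face each cell leads to.

```python
# Toy phase-space partition (model is mine): fraction of initial
# orientations leading to each face equals that face's probability.
import math

TWO_PI = 2 * math.pi
arcs = [1.7, 1.5, 1.6, TWO_PI - 4.8]   # hypothetical unequal face arcs

def final_face(theta0, spin=123.456):
    """Deterministic dynamics: a fixed free spin, then read the face."""
    theta = (theta0 + spin) % TWO_PI
    edge = 0.0
    for face, arc in enumerate(arcs):
        edge += arc
        if theta < edge:
            return face
    return len(arcs) - 1               # guard against float round-off

cells = 100_000                        # tiny cells of initial phase space
counts = [0] * len(arcs)
for i in range(cells):
    counts[final_face(TWO_PI * i / cells)] += 1

# Each face's share of the initial phase space is its probability -
# a frequentist quantity, with no subjective belief involved.
print([round(c / cells, 3) for c in counts])
```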

J Thomas
Everyone's a jerk. You. Me. This Jerk.^
Posts: 1190
Joined: Fri Sep 23, 2011 3:18 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby J Thomas » Sat Nov 17, 2012 4:00 pm UTC

Dmytry wrote:
J Thomas wrote:
Dmytry wrote:I don't have much time right now but i'll address this:

Like, if you calculate the probability of some hand from a deck of cards, you start out assuming that each card is equally likely to be in each position. That's the way to bet unless you know better. But if you know that the deck has been shuffled once and not cut, and you know the previous order, that's a whole lot of prior information. Cards that were near the top of the deck will still be near the top, and likewise for the bottom. Cards that were near the middle might now be near the top or the bottom, you don't know as much about them. (Or what you know about them may be harder to use.) It makes sense to use that information if you have it, and if you can handle the complexity.

But that's not really the 'prior' in that sense; that's just a model of an imperfect shuffle. It's like knowing that a number is the sum of two dice, and thus is less likely to be 12 or 2 than 7.


It's knowledge you have about the process. If you don't know about the shuffle, you can assume the cards are completely random and calculate them all as equally likely. If you know more, you can use what you know.

It's really quite difficult to describe the subtle disagreement between competent people here (and the terminological confusion). Let me try:

Bayesian as in "Bayesian spam filtering": could just as well have been named 'frequentist spam filtering'. You determine the prevalence of spam, you determine the prevalence of spammy words, and you use the Bayes formula to calculate the frequency of spam among messages that meet the observed criteria. It has nothing to do with probability interpretation.


Agreed. People do a lot of stupid stuff and call it Bayesian. Or even stuff that isn't so stupid, but the buzzword often does not fit.

Bayesian as in probability interpretation: proclaiming that any belief has a free-standing probability of being correct, which, by Bayes' theorem, requires assignment of probabilities ex nihilo somewhere. A Bayesian would start with some ex nihilo numeric belief in something and then modify it as evidence comes in. The initial beliefs will always play a role.


Yes, and if the beliefs are not based on anything useful then it would be good to find ways to minimize that role.

Frequentist as in frequentist methods: the methods that do not rely on assignment of probabilities ex nihilo. Example: if I make a thousand theories over a lifetime, then test them, and one works with a risk of only 1 in a billion of it working via mere coincidence, and I then believe that theory - as a process, this has up to about a 1 in a million chance of letting coincidence result in an invalid belief in a theory (that's if all the theories being tested were certain to be flat out wrong).


You start here with the belief that all the theories are wrong, and they have equal probabilities of appearing to be right by coincidence. In practice, their chance of wrongly appearing to be right might vary considerably.

Frequentist as in frequentist probability interpretation: probability is seen as the frequency of something among something else, under circumstances where we cannot initially distinguish between them. E.g. the frequency of decks with a joker on top among imperfectly shuffled decks of a specific initial order.


Yes, that works too. Your knowledge of the imperfection of the shuffle will affect the likelihood. If you know exactly how the deck was shuffled then you know whether or not there is a joker on top. Whether you consider that knowledge as a prior or as a part of your experimental conditions is up to you. You get the same answer either way. If you didn't know about the imperfect shuffles but you assumed they were perfect shuffles for lack of evidence, and then when you collected enough data you could disprove that hypothesis, that would be a different prior.

Issues with Bayesianism: it strikes me as pure obfuscation of the fact that the process which generates the prior probabilities is itself effectively a theory of the world. Furthermore, as there is no essential distinction between "Bayesianism" and "frequentism" of the kind that would warrant a self-label of Bayesian along with opposition to anything, the self-labelled Bayesians seem to be simply very confused and preaching their confusion.


For the moment I want to agree that you can find ways to get the same result either way, and that the difference is only in your concepts about what you are doing. Then the value in thinking in bayesian terms comes from what they do for you. Does it encourage people to think out their statistical problems better? Is it easier to teach? Etc.

In that context the people who say that the Bayesian approach has revolutionized statistics, everybody is wrong but them, and they get to assume anything they want and be right, are not relevant. They prove that bayesian methods are easy to misunderstand when people very much want to misunderstand them. "It is difficult to get a man to understand something, when his livelihood depends upon his not understanding it." It doesn't help that for a very long time statistics was taught on the theory that a few geniuses could figure it out, and then train a lot of normal people like chimpanzees to use it by rote. It's hard to be sure what's easy to teach, given that there's so much misunderstanding floating around generally.

edit: a good example is a die that is slightly non-symmetrical. What is the probability of it landing with a particular face upwards after it is thrown onto a specific surface from a specific height? How do you find out? What is the ideal way to find out? The ideal way would be to analytically determine the partitioning of the initial phase space of the die into regions corresponding to final states (which side is up) - imagine the initial orientation space being broken into incredibly tiny cells that correspond to the faces it will land on afterwards - and then find what fraction of the initial phase space corresponds to that face being up, at least semi-analytically (e.g. by finding the sizes of some of those tiny cells, which will be much more precise than just simulating the same number of throws). That squares fine with frequentism, and has nothing to do with any subjective beliefs in anything.


I'm sorry, I didn't follow that. Are you saying, run deterministic simulations of the die starting at lots of different initial conditions -- variations in horizontal velocity, spin, etc -- and see how they come out, and then figure that the actual results on average are likely to match up to the simulations?

How do you analytically determine the partitioning of the initial phase space?
The Law of Fives is true. I see it everywhere I look for it.

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Sat Nov 17, 2012 7:44 pm UTC

Dmytry wrote:Assigning it a probability is not solving it; it's pretending that you solved it. What about the mass, in kilograms, of the probability of God? Would you fault me for not assigning it a mass in kilograms? What if I need this mass for shipping my theology manuscript without having scales?


What, you don't have an applied theology hardware shop anywhere near you?

Okay, to be serious - the thing is that the "mass of the probability of God" is a meaningless locution. It's a syntax error. A probability, by definition, is not something that can have mass. Nor is a hypothesis something that can have mass. But it seems intuitively clear that, with an ordinary, common-sense understanding of the word 'probability', a hypothesis is something that can have a probability. We ascribe probabilities to hypotheses all the time, even if we do so in an approximate fashion. Why does one bring an umbrella when it looks dark outside, or when the weather report says that a storm is coming? Because one thinks that the proposition 'it is going to rain today' has a significant probability of being true. That likelihood estimate is no less susceptible to the problem of priors than any other, but few people would argue that we must refrain from making such judgements.

'Is it going to rain today?' and 'Is there a Christian-style God?' are perfectly sensible questions, in a way that 'What is the mass of the probability of God?' is not. (Note: this is all assuming that 'Christian-style God' is defined in a meaningful way, which it sometimes is and sometimes isn't). That's why refusing any assignment of probability to them is not solving the problem of priors, but ignoring it.

P(hypothesis|theory) is given by theory.


No, it's not. Take again neutrino mass and the Standard Model. The masses of the elementary particles are parameters in the Standard Model; the theory itself doesn't tell you what they are, or provide any kind of probability distribution for their values. (Although it does tell you that those masses are real and positive.) So if you perform an experiment and you want to extract a value for P(x<mass<y|data,Standard Model), you have to choose values for your priors. If that's not a question you want to ask, fine. But it's a question a lot of people do want to ask (and it's the question that is supposedly being answered when a confidence limit is shown).

Frequentist as in frequentist methods: the methods that do not rely on assignment of probabilities ex nihilo. Example: if I make a thousand theories over a lifetime, then test them, and one works with a risk of only 1 in a billion of it working via mere coincidence, and I then believe that theory - as a process, this has up to about a 1 in a million chance of letting coincidence result in an invalid belief in a theory (that's if all the theories being tested were certain to be flat out wrong).


What you are doing here, whatever you might call it, is estimating a Bayesian probability. Note that when you say there is a 1 in a million chance that your belief in the theory you've chosen is invalid, this is equivalent to saying that P(theory|data) = 0.999999. And to get that 1 in a million figure, you implicitly assigned equal prior probabilities to each of your thousand theories. Now, maybe assigning them equal prior probabilities is the best choice. But it is a choice. You do not escape the problem of priors this way.
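The prior-dependence being pointed out here can be made concrete (the numbers and prior choices below are mine): the posterior for a theory that passed the 1-in-a-billion test swings wildly with the prior you assign.

```python
# Sketch (numbers and priors are my own choices): the posterior for a
# theory that passed a 1-in-a-billion test depends on the prior you
# assigned to the theory being true in the first place.
def posterior(prior_true, p_pass_if_false=1e-9, p_pass_if_true=1.0):
    """Bayes' theorem for 'theory is true' given 'theory passed the test'."""
    num = p_pass_if_true * prior_true
    return num / (num + p_pass_if_false * (1.0 - prior_true))

# An indifferent prior makes passing the test almost conclusive...
print(posterior(prior_true=0.5))
# ...while a hostile enough prior leaves the fluke explanation more likely.
print(posterior(prior_true=1e-12))
```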

Furthermore, as there is no essential distinction between "Bayesianism" and "frequentism" of the kind that would warrant a self-label of Bayesian along with opposition to anything, the self-labelled Bayesians seem to be simply very confused and preaching their confusion.


I would label myself a Bayesian for the reasons I've described. Am I included with those who are very confused and preaching their confusion? If so, fine, we can continue to debate the issue (and maybe I'll convince you that I'm not very confused or maybe you'll convince me that I am). But if not, then I don't see how it's relevant to keep bringing up the views of people like Yudkowsky, which - as far as I can tell - no one here is supporting.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Sun Nov 18, 2012 12:31 am UTC

Aiwendil wrote:
Dmytry wrote:Assigning it a probability is not solving it; it's pretending that you solved it. What about the mass, in kilograms, of the probability of God? Would you fault me for not assigning it a mass in kilograms? What if I need this mass for shipping my theology manuscript without having scales?


What, you don't have an applied theology hardware shop anywhere near you?

Okay, to be serious - the thing is that the "mass of the probability of God" is a meaningless locution. It's a syntax error. A probability, by definition, is not something that can have mass. Nor is a hypothesis something that can have mass. But it seems intuitively clear that, with an ordinary, common-sense understanding of the word 'probability', a hypothesis is something that can have a probability. We ascribe probabilities to hypotheses all the time, even if we do so in an approximate fashion. Why does one bring an umbrella when it looks dark outside, or when the weather report says that a storm is coming? Because one thinks that the proposition 'it is going to rain today' has a significant probability of being true. That likelihood estimate is no less susceptible to the problem of priors than any other, but few people would argue that we must refrain from making such judgements.

What you have is the probability of rain given a storm warning or given darkness outside, and given the theory of how rain works. This is well defined by the model, akin to how the probability of a fair die rolling 6 is well defined by our model of a symmetrical die that does a sufficient number of bounces. On the other hand, a free-standing probability, such as the probability of the existence of god, or the probability of the ultimate correctness of string theory, is not so defined. If we had a meta-theory of how theories of everything work, and of what universes could be there and what universes could contain sentient observers, then within this meta-theory the probability of God, or the probability of the ultimate correctness of string theory, might have been defined. The probability is a type of value with specific properties that pops up in a broad class of calculations within some theories.
P(hypothesis|theory) is given by theory.


No, it's not. Take again neutrino mass and the Standard Model. The masses of the elementary particles are parameters in the Standard Model; the theory itself doesn't tell you what they are, or provide any kind of probability distribution for their values. (Although it does tell you that those masses are real and positive.) So if you perform an experiment and you want to extract a value for P(x<mass<y|data,Standard Model), you have to choose values for your priors. If that's not a question you want to ask, fine. But it's a question a lot of people do want to ask (and it's the question that is supposedly being answered when a confidence limit is shown).

When a theory does not provide any kind of probability distribution, there is no probability, there's a number made up. Given that the numbers do not appear out of nowhere, but are products of mental processes or the like, there is always some sort of theory behind the number, if not a good one then a very bad one - e.g. if the Standard Model does not provide you with a prior on the mass of the electron, then some mixture of the day of the week, what the theoretician had for breakfast, and various random feelings and prejudices will be the ultimate theory of everything that is employed to make conclusions about the world. A very bad theory of everything. It is generally advisable to decrease the influence of such extraneous factors.

Frequentist as in frequentist methods: the methods that do not rely on assignment of probabilities ex nihilo. Example: if I make a thousand theories over a lifetime, then test them, and one works with a risk of only 1 in a billion of it working via mere coincidence, and I then believe that theory - as a process, this has up to about a 1 in a million chance of letting coincidence result in an invalid belief in a theory (that's if all the theories being tested were certain to be flat out wrong).


What you are doing here, whatever you might call it, is estimating a Bayesian probability. Note that when you say there is a 1 in a million chance that your belief in the theory you've chosen is invalid, this is equivalent to saying that P(theory|data) = 0.999999. And to get that 1 in a million figure, you implicitly assigned equal prior probabilities to each of your thousand theories. Now, maybe assigning them equal prior probabilities is the best choice. But it is a choice. You do not escape the problem of priors this way.

There's some miscommunication here. What I am saying is that I can, before running those experiments on those theories, conclude that in the worst-case scenario (every theory is wrong, i.e. a prior of precisely 0), there will only be about a one in a million chance that one of the theories will be non-falsified. Thus, even if I am fully certain that every theory is wrong, I can still agree, for a quite minor reward, to commit to believe in a theory if it is not falsified. It is, really, very elegant. When you don't know the prior, you can assume a prior of 0, calculate the losses from the commitment to believe given the highly improbable fluke evidence, and make those losses tolerably small. You can even make this decision post hoc, if you are sure that there was no cherry-picking, or have a bound on the cherry-picking.

edit: i.e. what I am saying is P(me believing in a false theory|all theories are false) = 0.000001 upon completion of the strategy I outlined, not P(theory|data) = 0.999999. This sort of stuff often pops up when you write software that has to have a specific worst-case performance. edit: and by false, I mean non-predictive; i.e. Newtonian mechanics is not considered 'false' upon completion of a crude experiment, merely inaccurate, but a necessary step nonetheless.
Furthermore, as there is no essential distinction between "Bayesianism" and "frequentism" of the kind that would warrant a self-label of Bayesian along with opposition to anything, the self-labelled Bayesians seem to be simply very confused and preaching their confusion.


I would label myself a Bayesian for the reasons I've described. Am I included with those who are very confused and preaching their confusion? If so, fine, we can continue to debate the issue (and maybe I'll convince you that I'm not very confused or maybe you'll convince me that I am). But if not, then I don't see how it's relevant to keep bringing up the views of people like Yudkowsky, which - as far as I can tell - no one here is supporting.

Well, did you ever label yourself a Bayesian in any conversation before this one?

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Sun Nov 18, 2012 3:32 am UTC

Dmytry wrote:What you have is the probability of rain given a storm warning or given darkness outside, and given the theory of how rain works. This is well defined by the model, akin to how the probability of a fair die rolling 6 is well defined by our model of a symmetrical die that does a sufficient number of bounces. On the other hand, a free-standing probability, such as the probability of the existence of god, or the probability of the ultimate correctness of string theory, is not so defined.


There are two points to be made here. First, from the fact that we have no way of obtaining a prior probability (or, equivalently, a free-standing probability) for the existence of a god, it does not follow that 'the probability that god exists' is meaningless in the way that 'the mass of the probability' is. Here we are getting into fundamental issues of the way various terms are used. But it seems to me that from the facts that a. I cannot rationally be certain that god exists, and b. I cannot rationally be certain that god does not exist, it follows that c. the probability of god's existence is greater than zero and less than one. Now perhaps it does not seem intuitive to you that c follows from a and b, but I would say that on a common-sense understanding of the term 'probability', the argument is valid. But I readily grant that my common-sense notion may not be universal.

But we need not get into such semantics. The second point is that, while you're right that what we have is the probability of rain given a storm warning or given darkness outside, I will maintain that what we want, what we care about, what we actually use, is P(rain) itself, unconditionally. Suppose we have two theories. Among the implications of theory 1 is that it is likely to rain when there are dark clouds in the sky; theory 2 on the other hand implies that it is likely to rain when it is sunny. Let's say it's bright and sunny. I want to know whether to bring an umbrella when I go out (and if I won't need one, I don't want to bring one). I can estimate P(rain|theory1), which turns out to be very low, and P(rain|theory2), which turns out to be very high. So far my reasoning is not infected with priors. But now I must actually choose what to do, and obviously I will choose the answer provided by theory 1 - that is, I will not bring an umbrella, because it's sunny (and I will take the liberty of assuming that you would make the same choice). How can I, or you, justify this? In the strictest sense, we can't. But the reason I made that choice is because I believe that the probability P(theory1) is much greater than P(theory2). I believe this because many past experiences have rendered theory 1 probable - which is to say that P(theory1|experience) > P(theory2|experience). But of course, whatever my past experiences were, I could only establish that inequality by making some assumption about the truly fundamental prior probabilities of the two theories. Note that that assumption could be as innocuous as 'the probability of the two theories is about equal', and experience would quickly render theory 1 far more likely. But some such assumption is needed. Or else you can't decide when to bring an umbrella.
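The umbrella argument runs like this in code (all likelihood numbers are made up for illustration): start the two theories at the 'innocuous' equal priors and update on repeated experience of dark clouds followed by rain.

```python
# Sketch of the umbrella argument (all likelihoods are made-up numbers):
# equal priors, then repeated Bayesian updating on experience.
p1, p2 = 0.5, 0.5       # priors: P(theory1), P(theory2)
lik1, lik2 = 0.8, 0.05  # P(day's observation | theory), hypothetical

for day in range(10):   # ten days of dark clouds followed by rain
    p1, p2 = p1 * lik1, p2 * lik2
    total = p1 + p2     # renormalise the posteriors
    p1, p2 = p1 / total, p2 / total

# Theory 1 ('rain follows dark clouds') now dominates overwhelmingly,
# even though the starting assumption was maximally non-committal.
print(p1)
```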

When a theory does not provide any kind of probability distribution, there is no probability, there's a number made up.


I agree. That's the problem of priors, and ultimately the problem of induction.

if the Standard Model does not provide you with a prior on the mass of the electron, then some mixture of the day of the week, what the theoretician had for breakfast, and various random feelings and prejudices will be the ultimate theory of everything that is employed to make conclusions about the world. A very bad theory of everything. It is generally advisable to decrease the influence of such extraneous factors.


I'm not sure whether this is intended as a criticism of the SM. If it is, then I'd love to know how you propose to develop a physical theory with no free parameters. If it isn't, then you must acknowledge that the theory provides no prediction for its parameters, and thus any attempt to put confidence limits on them (i.e. measure them) must make assumptions about the distribution of prior probabilities.

There's some miscommunication here. What I am saying is that I can, before running those experiments on those theories, conclude that in the worst-case scenario (every theory is wrong, i.e. a prior of precisely 0), there will only be about a one in a million chance that one of the theories will be non-falsified. Thus, even if I am fully certain that every theory is wrong, I can still agree, for a quite minor reward, to commit to believe in a theory if it is not falsified. It is, really, very elegant. When you don't know the prior, you can assume a prior of 0, calculate the losses from the commitment to believe given the highly improbable fluke evidence, and make those losses tolerably small. You can even make this decision post hoc, if you are sure that there was no cherry-picking, or have a bound on the cherry-picking.

edit: i.e. what I am saying is P(me believing in a false theory|all theories are false) = 0.000001 upon completion of the strategy I outlined, not P(theory|data) = 0.999999. This sort of stuff often pops up when you write software that has to have a specific worst-case performance. edit: and by false, I mean non-predictive; i.e. Newtonian mechanics is not considered 'false' upon completion of a crude experiment, merely inaccurate, but a necessary step nonetheless.


Ah, I'm sorry I misunderstood you. (I think the reason I did so was your use of the word 'believe', which can perhaps be a bit slippery). That kind of frequentist reasoning, of course, is fine, as far as it goes. But it bears pointing out that, if you then test the theories as you describe and find that one of them works in a way that would be a 1 in a billion fluctuation given the null hypothesis, you still have no idea whether that theory or the null hypothesis is more likely to be true. You still cannot claim that this theory is more likely to be true than it is to be false, or that it is more likely to be true than is some other theory, or that it is rational to accept this theory over any other. (And I guess in that sense I think it's not quite accurate - or at least that it's misleading - to say that you 'believe' in the theory.) I'm sure I'm not telling you anything you don't know, but (as I said before) failure to be clear on this point is something that has bothered me in some physicists from time to time.

Well, did you ever label yourself a Bayesian in any conversation before this one?


I think I have, but I admit that those cases were conversations along similar lines to this one. It's not quite as though when I introduce myself I say, "Hi, I'm a Bayesian."

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Sun Nov 18, 2012 8:04 am UTC

How can I, or you, justify this? In the strictest sense, we can't.

Well, we don't really have to justify that sort of thing. We can assume that the universe is predictable by our mental processes; if it isn't, there isn't anything else we can do. Then we can try to make the best use of the data we got.

Along those lines, hypotheses have this quantity, sort of "to what extent was the hypothesis made up?", on which we can also have a bound. E.g. I can have a hypothesis that the mass of god is between 100 and 200 kg, with no other qualifiers. Then I can make a hypothesis that the mass of god is 150 to 151 kg, with the exact same reasoning but making up a much narrower range. I then know that the latter speculation is (at least) 100x more speculative than the former, even though I don't know how absolutely speculative either speculation is (I have a suspicion that any sort of analysis that could produce a 'probability' number for those would produce 0 for either). Even though I cannot in any way justify any sort of prior probability for either speculation, I know how those speculations relate, and I know how to bet between them.

Of course that does not work in all circumstances, because we can't reflect on our mental processes well enough. But I think the existence of this relation between hypotheses may be what drives the very strong intuition that probabilities of hypotheses really exist, or that you need them: intuitively we know that some hypotheses are less probable than others, and we jump from this relation, to two real numbers assigned to the hypotheses, to some sort of grand philosophical probability of a hypothesis being really true. But all of that cancels out, as you have noticed - only relations between hypotheses matter for the final decision-making.
You still cannot claim that this theory is more likely to be true than it is to be false

But can anyone else make this sort of claim? I don't see how. If nobody can, then those that do, are wrong.

or that it is more likely to be true than is some other theory, or that it is rational to accept this theory over any other.

Those claims can sometimes be made, as per above. We can't really quantify the relative speculativeness of theories, not because it is ill-defined (speculation is part of the apparatus that makes theories, and it is in principle quantifiable), but because we don't have much insight into how we generate theories.

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Sun Nov 18, 2012 7:05 pm UTC

Dmytry wrote:Well, we don't really have to justify that sort of thing. We can assume that the universe is predictable by our mental processes; if it isn't, there isn't anything else we can do. Then we can try to make the best use of the data we got.


Ah, but you have to assume more than just that the universe is predictable. You have to make assumptions about which hypotheses are to be used in making predictions. For instance, in the rain example, consider the following two hypotheses: 1. Whenever it is sunny outside it is unlikely to rain; 2. Up until 18 November 2012 (today), whenever it is sunny outside it is unlikely to rain; from 19 November 2012 onward, whenever it is sunny outside it is likely to rain. Both hypotheses describe a predictable universe, and either one could be used as the basis for making predictions. Both are equally confirmed by past experience - in fact, they are guaranteed to be equally confirmed by past experience since they make the same predictions for all past events. There's nothing precluding us from using hypothesis 2; doing so would be perfectly coherent and it would allow us to make predictions just as easily as with hypothesis 1. But obviously, it is hypothesis 1 that I will use in deciding whether to bring an umbrella to work tomorrow, rather than hypothesis 2. And though I'm sure you'd do the same, I think we agree that this is, in a strict sense, a rationally unjustified assumption. All I am saying is that that assumption amounts to assigning a higher prior probability to hypothesis 1 than to hypothesis 2.

E.g. I can have a hypothesis that the mass of god is between 100 and 200 kg, with no other qualifiers. Then I can make a hypothesis that the mass of god is 150 to 151kg, with exact same reasoning but making up a single value instead of a range. I then know that the latter speculation is (at least) 100x more speculative than the former,


This is wandering a bit from the point, but I think you have some hidden assumptions in there when you conclude that your second hypothesis is 100 times more speculative than your first. To wit, you have to choose to talk about the range you've selected in mass, as opposed to mass-squared, or mass-cubed, or any other function of mass. The first hypothesis is equivalent to 'the square of the mass of god is between 10,000 and 40,000 kg^2' and the second to 'the square of the mass of god is between 22,500 kg^2 and 22,801 kg^2'. From these numbers, you would calculate that the second hypothesis is slightly less than 100 times as speculative as the first. If you used the square root of mass, you'd calculate that it's slightly more than 100 times as speculative. This is actually isomorphic to the problem of assigning uniform Bayesian priors for the value of a quantity; you have to choose what to make them uniform in. (By the way, in all this I'm assuming that 'the mass of god' is somehow a meaningful term, which it could be for a certain type of god, but probably is not in the standard Judeo-Christian conception.)
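The reparametrization point can be checked directly. This little sketch just recomputes the width ratios for the made-up god-mass ranges from the example; `speculativeness_ratio` is a name invented here, with speculativeness taken as inversely proportional to range width:

```python
# How "speculativeness" (broad range width / narrow range width) shifts
# with the choice of parametrization. Ranges are the example's values in kg.
import math

def speculativeness_ratio(f, broad=(100.0, 200.0), narrow=(150.0, 151.0)):
    # Ratio of transformed range widths: how much more "made up" the
    # narrow hypothesis looks when measured in the quantity f(mass).
    return (f(broad[1]) - f(broad[0])) / (f(narrow[1]) - f(narrow[0]))

print(speculativeness_ratio(lambda m: m))      # exactly 100 in mass
print(speculativeness_ratio(lambda m: m * m))  # 30000/301, just under 100
print(speculativeness_ratio(math.sqrt))        # just over 100 in sqrt(mass)
```

The same two hypotheses, the same evidence, and three different "factors of speculativeness" - which is exactly the uniform-prior ambiguity described above.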

Even though I can not in any way justify any sort of prior probability on either speculation, I know how those speculations relate and I know how to bet between them.


Actually, you don't, for the reasons I discussed above. To get your speculativeness value, you have to choose uniformity with respect to some particular quantity (mass, mass-squared, etc.). Okay, for the numbers you picked, you'd have to go to pretty large (or small) exponents before you get a very significant difference, but the difference is there. And incidentally, once you start making bets, you're interpreting speculativeness as the inverse of probability, and this becomes exactly the problem of assigning priors in some distribution over the quantity. Now, in this case, you can still say that the 'speculativeness' of the second hypothesis is guaranteed to be greater than or equal to that of the first (or, that the first is guaranteed to be more probable than the second). But that's only because the second hypothesis implies the first but not vice versa (i.e. the second describes a subset of the first). So even that weak statement can never be made if you're comparing hypotheses where neither includes the other.

or that it is more likely to be true than is some other theory, or that it is rational to accept this theory over any other.


Those claims can sometimes be made, as per above. We can't really quantify relative speculativeness of the theories not because it is ill defined (speculation is part of the apparatus that makes theories, and it is in principle quantifiable), but because we don't have much insight into how we generate theories.


But, as per above, those claims can only be made in cases where one hypothesis includes the other.

I'm afraid I've also got to quibble with your notion of 'speculativeness'. First of all, you haven't defined it; you say that it is in principle quantifiable, but I'd like to know how. You also seem to say that it has to do with cognition and with the way theories are generated, which leaves you the task of explaining how exactly the means by which a theory was generated are supposed to affect the evaluation of that theory (we're often admonished that the context of discovery and the context of justification are not entirely separate from one another, but we're rarely shown how they are supposed to be related). In particular, since you then interpret speculativeness in terms of probability (you say that claims like 'this theory is more likely to be true than that theory' can sometimes be made), you need to show how exactly speculativeness is related to probability.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Sun Nov 18, 2012 8:00 pm UTC

I only mentioned speculativeness in an attempt to explain why one can have such a strong intuition that some hypotheses are more probable than others. I am assuming that both hypotheses about the mass of god would use identical reasoning about, specifically, the mass (rather than mass squared or the like). This is a property of how a hypothesis was produced and how it works; it's not a property of the universe at any rate. The universe simply has a mass of god, or doesn't have one - unless you believe you live in a multiverse, with some universes containing gods of different masses, in which case the subjective you is still in one universe which has one specific value (or the question is inapplicable).

From where I am standing, probabilities are how we represent properties of certain processes, such as the symmetrical relation between the sides of a die combined with the highly nonlinear, mixing process of the bouncing. You would have to at least agree that the probability you assign to a hypothesis is a different sort of number, arising from a different process entirely (e.g. some neuronal interactions that produce your general feeling of truthiness, later given out by you as a number) - a number which does not have the same usefulness as one arising from, e.g., symmetry considerations in the die. You seem to see probability as representing something real - forgive me if I'm wrong. I see probability as a mathematical tool for representing how the unknowns can cancel out after integration (it is real in only that sense - it is real that a symmetrical die, after bounces, mixes up the initial phase space so that 1/6 of it corresponds to each final face state, and the cells in phase space that map to the final faces are very tiny). I routinely use it to model the outcome of totally deterministic processes (driven by a pseudo-random number generator); a pseudo-random number generator has certain symmetries which make the concept of probability useful. I do concede that sometimes we can represent properties of the process of generating hypotheses in such a fashion - e.g. if you hypothesise that the die has landed on six, given the general understanding of how dice work.

edit: With regards to the rain example - the question of how probable the hypothesis is that it will rain when sunny starting from today - there may be a symmetry between "it rains when dark before today, then the rain mechanism changes to another one where it still rains when dark after today" and "it rains when dark before today, and then the rain mechanism changes to another one where it rains when it is bright after today", but no such symmetry for "the mechanism stays intact and it keeps raining when it is dark". Evidence breaks symmetries between opposing hypotheses. A belief is an asymmetry, in a way.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Mon Nov 19, 2012 1:51 am UTC

I came up with a good example of what I am trying to say with regards to 'probabilities of hypothesis':

Suppose I have a hypothesis that the pseudorandom number generator Mersenne Twister, the MT19937 variation, given a seed of 0, will produce 1503540437 as its first 32-bit result. The probability of this is 2^-32 if I made it up while knowing only certain properties of MT (that it works fine as a uniform random number generator), near 1 if I ran the code and obtained the result, and somewhere in between if I laboriously calculated it by hand. (And it is pretty ridiculous to treat such a calculation as an update on a 2^-32 prior, not to mention awfully hard to get right.) Partial yet rigorous calculations may also be made, which can produce results with different 'probabilities'. The probability is a number which we use to represent some relations within a system, no more, no less; the system can be the world, or it may be whatever generates the hypothesis, or both. We also have the subjective degree of belief, which in my opinion needs a word different from 'probability'. Plausibility, maybe - that would reflect its handwavy nature.
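To be concrete about the "ran the code" case, here is a hedged sketch in Python. CPython's `random` module does use MT19937, but it seeds via its own init-by-array scheme rather than the reference `init_genrand(0)`, so the specific value quoted above is not asserted here; the point is only that the output is fully determined by the seed:

```python
# Same seed in, same "random" number out - the uncertainty is in our
# knowledge of the generator, not in the machine.
import random

a = random.Random(0)          # MT19937 under the hood in CPython
b = random.Random(0)          # an independent generator, same seed

first_a = a.getrandbits(32)   # first 32-bit output for seed 0
first_b = b.getrandbits(32)

assert first_a == first_b     # deterministic: once seen, probability ~1
print(first_a)
```

Before running this, 2^-32 is the best one can say about a guessed value; after running it, the hypothesis is simply right or wrong.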

J Thomas
Everyone's a jerk. You. Me. This Jerk.^
Posts: 1190
Joined: Fri Sep 23, 2011 3:18 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby J Thomas » Mon Nov 19, 2012 3:23 am UTC

Dmytry wrote:I came up with a good example of what I am trying to say with regards to 'probabilities of hypothesis':

Suppose I have a hypothesis that the pseudorandom number generator Mersenne Twister, the MT19937 variation, given the seed of 0, will produce 1503540437 as first 32-bit result. The probability of such is 2^-32 if I made this up while knowing of certain properties of MT (that it works fine as uniform random number generator), near 1 if i ran the code and obtained the result, and somewhere in-between if I laboriously calculated it by hand.


Very nice! So at this point you aren't talking at all about a system that varies with unknown confounding variables and experimental error and such. Computer systems will always give the same result for the same calculation, unless one of them is incorrect. You are talking entirely about your own lack of knowledge. If all you know is that it is a uniform random number generator, the probability is 1 in 2^32. But once you have seen the result once, then you know. Either it's right or it's wrong. If you have done calculations that might themselves contain errors, then the question becomes how likely those errors are. But once you have seen the result once, then you know. Either it's right or it's wrong.

There is a single truth out there, and all your statistics are about is your own uncertainty what that truth is. In this case the traditional "frequentist" approach is useless. Once you have seen the true answer once, you know.

(And it is pretty ridiculous to treat such calculation as update on 2^-32 prior, not to mention awfully hard to get right). Partial yet rigorous calculations may also be made, which can produce results with different 'probabilities'.


Yes. If you can calculate the last digit precisely, then either you know the hypothesis is wrong because the last digit is wrong, or you have increased the probability to 1 in 2^31.

The probability is a number which we use to represent some relations within a system, no more no less; the system can be the world, or it may be what generates the hypothesis, or both. We also have the subjective degree of belief, in my opinion it needs a word different from the word 'probability'. Plausibility, maybe, that would reflect it's handwavy nature.


It would be interesting to make a clear distinction between things which are unknown because of possible error that introduces noise into the outputs, versus things which are unknown simply because you do not know them. But can we always tell the difference?
The Law of Fives is true. I see it everywhere I look for it.

User avatar
Dmytry
Posts: 68
Joined: Mon Jun 01, 2009 12:39 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Dmytry » Mon Nov 19, 2012 3:29 pm UTC

J Thomas wrote:
Dmytry wrote:I came up with a good example of what I am trying to say with regards to 'probabilities of hypothesis':

Suppose I have a hypothesis that the pseudorandom number generator Mersenne Twister, the MT19937 variation, given the seed of 0, will produce 1503540437 as first 32-bit result. The probability of such is 2^-32 if I made this up while knowing of certain properties of MT (that it works fine as uniform random number generator), near 1 if i ran the code and obtained the result, and somewhere in-between if I laboriously calculated it by hand.


Very nice! So at this point you aren't talking at all about a system that varies with unknown confounding variables and experimental error and such. Computer systems will always give the same result for the same calculation, unless one of them is incorrect. You are talking entirely about your own lack of knowledge. If all you know is that it is a uniform random number generator, the probability is 1 in 2^32. But once you have seen the result once, then you know. Either it's right or it's wrong. If you have done calculations that might themselves contain errors, then the question becomes how likely those errors are. But once you have seen the result once, then you know. Either it's right or it's wrong.

Yes, basically that.

There is a single truth out there, and all your statistics are about is your own uncertainty what that truth is. In this case the traditional "frequentist" approach is useless. Once you have seen the true answer once, you know.

I'm not sure what the traditional "frequentist" approach is, to be honest. I do mathematics, not some sort of taxonomy of methods. The frequentist interpretation of probability squares fine with that - if I were to employ the 'make up a number' strategy, I'd be correct 2^-32 of the time, but if I were to employ the 'calculate reliably' strategy, I'd be correct virtually all the time...

(And it is pretty ridiculous to treat such calculation as update on 2^-32 prior, not to mention awfully hard to get right). Partial yet rigorous calculations may also be made, which can produce results with different 'probabilities'.


Yes. If you can calculate the last digit precisely, then either you know the hypothesis is wrong because the last digit is wrong, or you have increased the probability to 1 in 2^31.

Yup. Or there may be a slight flaw in the PRNG which would make e.g. numbers with odd count of bits set less common than numbers with even count of bits set.

The probability is a number which we use to represent some relations within a system, no more no less; the system can be the world, or it may be what generates the hypothesis, or both. We also have the subjective degree of belief, in my opinion it needs a word different from the word 'probability'. Plausibility, maybe, that would reflect it's handwavy nature.


It would be interesting to make a clear distinction between things which are unknown because of possible error that introduces noise into the outputs, versus things which are unknown simply because you do not know them. But can we always tell the difference?

Well, I consider mathematical probability to be, primarily, a technique which we can sometimes use to calculate a big sum or a big integral even though we are unable to calculate any single value. Not a measure of ignorance, but a collection of methods for applying knowledge of the relations between outcomes and the like. E.g. the knowledge that the initial phase space of a symmetrical 6-sided die that is to do several bounces is very nearly evenly split into very tiny cells corresponding to the sides the die will roll is represented with a probability of 1/6 for each side.

The symmetry is not essential in principle; for a slightly non-symmetrical die one could sample the initial phase space using a jittered grid, obtaining better convergence than one would by simply simulating the same number of throws. One could also resolve the cell boundaries (I used something similar one time to model photons / global illumination; you get much better convergence - it's like 1/sqrt(n) noise vs 1/n^2 errors, where n is the number of samples).

With regards to 'prior' probabilities of hypotheses, those are entirely dependent on how the hypotheses were constructed, as per the PRNG example, and can often only be given an upper bound (as it is often not possible to quantify all the implicit assumptions).
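The jittered-grid idea can be illustrated with a toy integral. This is a generic stratified-sampling sketch, not the photon code mentioned above; the function and sample counts are arbitrary choices for demonstration:

```python
# Estimate the integral of f(x) = x^2 over [0, 1] (exactly 1/3) two ways:
# plain Monte Carlo vs a jittered grid (one random sample per cell).
import random

def f(x):
    return x * x

def plain_mc(n, rng):
    # n independent uniform samples over [0, 1]
    return sum(f(rng.random()) for _ in range(n)) / n

def jittered(n, rng):
    # one sample at a random offset inside each of n equal cells
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

rng = random.Random(1)
n = 1000
err_plain = abs(plain_mc(n, rng) - 1 / 3)
err_jit = abs(jittered(n, rng) - 1 / 3)
print(err_plain)   # noisy; shrinks like 1/sqrt(n)
print(err_jit)     # provably at most 2/n for this f, usually far smaller
```

The jittered estimate is never off by more than the function's variation within one cell, averaged over cells, which is what buys the faster convergence.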

Aiwendil
Posts: 311
Joined: Thu Apr 07, 2011 8:53 pm UTC
Contact:

Re: 1132: "Frequentists vs. Bayesians"

Postby Aiwendil » Mon Nov 19, 2012 6:29 pm UTC

Dmytry wrote:I only mentioned speculativeness in attempt as to explain why one can have such a strong intuition that some hypotheses are more probable than others.


Okay, in that case, this is exactly the same as setting Bayesian priors via a 'strong intuition'. Which, I agree, is what we do. But (as indeed you have argued) it does not enable you to rationally justify the statement 'this hypothesis is more probable than that one'.

You seem to see the probability as representing something real


It depends on what you mean by 'real'. I see the probability of a hypothesis as a tool that can be used in maximizing the accuracy of one's predictions, or in deciding between alternatives based on expected utility. Of course, that is different from the (equally valid) frequentist definition of the probability of some condition as the fraction of instances in which that condition is satisfied. The two are related in that if you have a set of hypotheses and assign each an equal prior probability (in the former sense), then the probability (again in the former sense) of the hypothesis that a certain condition holds is equal to the fraction of hypotheses in which that condition holds.

Your definition of probability for a die in terms of phase space is one of the latter, frequentist kind; you're talking about the fraction of phase space in which the die gets a particular result. That's fine. But if you ask me, it's also kind of useless. It doesn't tell me, for instance, how to bet - unless, of course, I assume that all points in phase space are equally likely; that is, unless I assume a uniform distribution of priors over phase space.

As to which concept has the better right to the name 'probability', I'd be inclined to say that when an ordinary person uses that term, they're thinking about predictions, expectations, rational betting behaviour, that sort of thing. But I'm perfectly happy to call both things probability, as long as they are not conflated.

With regards to the rain example, the question of how probable is the hypothesis that it will rain when sunny starting from today, there may be a symmetry between "it rains when dark before today, then the rain mechanism changes to another one where it still rains when dark, after today", and "it rains when dark before today, and then the rain mechanism changes to another one where it rains when it is bright, after today", but no such symmetry for "the mechanism stays intact and it keeps raining when it is dark". Evidence breaks symmetries between opposing hypotheses. A belief is an asymmetry, in a way.


First of all, in my book, 'it rains when dark before today and then after today it still rains when dark' is the same hypothesis as just 'it rains when dark', notwithstanding any talk about 'mechanisms'. So in order to get this 'symmetry' you have to count hypotheses in a very particular, and I would say non-intuitive, way.

But even granting that you could somehow count hypotheses like that, to get them to cancel out you still need to assume that they have equal prior probabilities. You simply cannot get the result 'I don't need to bring an umbrella today, because it's sunny' without making some unwarranted assumptions about prior probabilities.

J Thomas
Everyone's a jerk. You. Me. This Jerk.^
Posts: 1190
Joined: Fri Sep 23, 2011 3:18 pm UTC

Re: 1132: "Frequentists vs. Bayesians"

Postby J Thomas » Mon Nov 19, 2012 7:14 pm UTC

Dmytry wrote:
J Thomas wrote:
Spoiler:
Dmytry wrote:I came up with a good example of what I am trying to say with regards to 'probabilities of hypothesis':

Suppose I have a hypothesis that the pseudorandom number generator Mersenne Twister, the MT19937 variation, given the seed of 0, will produce 1503540437 as first 32-bit result. The probability of such is 2^-32 if I made this up while knowing of certain properties of MT (that it works fine as uniform random number generator), near 1 if i ran the code and obtained the result, and somewhere in-between if I laboriously calculated it by hand.


Very nice! So at this point you aren't talking at all about a system that varies with unknown confounding variables and experimental error and such. Computer systems will always give the same result for the same calculation, unless one of them is incorrect. You are talking entirely about your own lack of knowledge. If all you know is that it is a uniform random number generator, the probability is 1 in 2^32. But once you have seen the result once, then you know. Either it's right or it's wrong. If you have done calculations that might themselves contain errors, then the question becomes how likely those errors are. But once you have seen the result once, then you know. Either it's right or it's wrong.

Yes, basically that.


There is a single truth out there, and all your statistics are about is your own uncertainty what that truth is. In this case the traditional "frequentist" approach is useless. Once you have seen the true answer once, you know.

I'm not sure what is the traditional "frequentist" approach, to be honest. I do mathematics, not some sort of taxonomy of methods. The frequentist interpretation of probability squares fine with that - if I were to employ the 'make up a number' strategy, i'd be correct 2^-32 of the time, but if I were to employ "calculate reliably" strategy, i'd be correct virtually all the time...

Yes, because you set it up that way. If your computer was getting power fluctuations big enough to cause occasional glitches (and somehow the OS kept working) you might not always get the same answer. In that case you could look at a large collection of answers and get some idea how often the power changed etc. Or you could actually measure the power on a nanosecond by nanosecond basis, and maybe with enough data you could learn to predict precisely which wrong answer to expect when.

Or if your main processing chip was overheating, you might not be able to tell how preceding calculations overheated precisely which parts of the chip, and as the chip degraded you couldn't expect reproducible results. But you could be sure that in a deterministic universe it *would* be possible to discover all those details if it was worth the effort. Or in a world of quantum states it might be possible that unknowable matters could have an effect. But we can ignore all of that because your processor and voltage regulator and heat sink etc etc were all designed carefully to keep every known variable from varying enough to matter.

The probability is a number which we use to represent some relations within a system, no more no less; the system can be the world, or it may be what generates the hypothesis, or both. We also have the subjective degree of belief, in my opinion it needs a word different from the word 'probability'. Plausibility, maybe, that would reflect it's handwavy nature.


It would be interesting to make a clear distinction between things which are unknown because of possible error that introduces noise into the outputs, versus things which are unknown simply because you do not know them. But can we always tell the difference?

Well, I consider the mathematical probability to be, primarily, a technique which we can sometimes use to calculate a big sum or a big integral even though we are unable to calculate a single value. Not measure of ignorance, but a collection of methods for application of knowledge of the relations between outcomes and the like. E.g. the knowledge that the initial phase space of the symmetrical 6-sided die that is to do several bounces is very near evenly split into very tiny cells corresponding to the sides that the die will roll, is represented with probability of rolling each side 1/6. The symmetry is not essential in principle; for slightly non symmetrical die one could sample the initial phase space using a jittered grid, obtaining better convergence in results than one would by simply simulating same number of throws. One could also resolve the cell boundaries (I used something similar one time to model photons / global illumination, you get much better convergence, it's like 1/sqrt(n) noise vs 1/n^2 errors where n is number of samples). With regards to 'prior' probabilities of hypotheses, those are entirely dependent on how the hypotheses were constructed as per PRNG example, and can often only be given an upper bound (as it is often not possible to quantify all the implicit assumptions).


Once again I'm not sure I caught that. Are you saying that you run a simulation, and rather than attempting to introduce random inputs (say from a uniform distribution) you instead use nonrandom inputs that cover the space of inputs more evenly? That sounds good if you can arrange it. Actual random inputs would tend to oversample some areas because they are random, and that will create bias which takes large samples to reduce. If you can sample more evenly you can get your integral quicker.

But for that you need an accurate model, right? You have a model of a die being rolled from a set height, and you let the rotation speed and angle, and the horizontal direction relative to the spin, and the horizontal velocity -- all the variables you think should vary -- vary, and you calculate results, and that gives you an integral. But any error in the model may lead to an error in the result. And what if there's something about the way people actually roll dice that biases the rotation angle relative to the horizontal direction? Unless you bias your model correctly that can lead to a wrong result too. Probably better to measure the way people actually roll dice, and when you do that you might collect the results and use those instead of your simulation model -- except that you can't get the actual people to vary their results in ways that will spread the inputs across the phase space correctly. Their horizontal velocities might fit some nonrandom distribution, etc. But then, in that case what you want to know about is the way people actually do it rather than a prediction that they will be evenly distributed across the domain....

It looks like once you put a lot of effort into getting the priors right, you can do great simulations.
The Law of Fives is true. I see it everywhere I look for it.