Statistics question.


blademan9999
Posts: 44
Joined: Sun May 01, 2011 5:18 am UTC

Statistics question.

How do you work this out?
Say you have weighed a number of rocks.
For example, you weigh 5 rocks, and their weights are 6, 7, 7, 8, 9.
How would I work out what the 95% confidence interval is for the weight I will get when I weigh the next rock?
I get a sample mean of 7.4, a sample variance of 1.04 and an unbiased estimate for the population variance of 1.3.
Now if I knew for certain what the population mean was I could just use a T-distribution, but I can't since I don't know what the population mean is.
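
The three figures quoted above can be checked with a short Python sketch. The variable names are just illustrative; the data is the five weights from the post.

```python
from statistics import fmean, pvariance, variance

weights = [6, 7, 7, 8, 9]

sample_mean = fmean(weights)      # sum of the weights divided by n
sample_var = pvariance(weights)   # squared deviations divided by n
unbiased_var = variance(weights)  # squared deviations divided by n - 1

# Prints values of about 7.4, 1.04 and 1.3
print(sample_mean, sample_var, unbiased_var)
```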
http://officeofstrategicinfluence.com/spam/

lorb
Posts: 405
Joined: Wed Nov 10, 2010 10:34 am UTC
Location: Austria

Re: Statistics question.

That's not even possible unless you make some assumption about (or know) the distribution of the rocks' weights. What you can do is calculate a 95% confidence interval for the population mean, i.e. an interval built so that it covers the true mean for 95% of the samples you could have drawn. Be aware though that a sample size of just 5 is too small to make accurate statements.
Please be gracious in judging my english. (I am not a native speaker/writer.)
http://decodedarfur.org/

z4lis
Posts: 767
Joined: Mon Mar 03, 2008 10:59 pm UTC

Re: Statistics question.

You can still use the t-distribution, as long as the rocks are normally distributed. Check out this wikipedia junk. The t-value's distribution is independent of the sample distribution's true mean and standard deviation as long as you assume the samples are normal and independent. You don't need to know anything about the true mean to use it, and that's the magic. You want to come up with a number a so that Pr(xbar - a <= mu <= xbar + a) >= 0.95.

Now start manipulating the inequalities inside the Pr:

-a <= mu - xbar <= a

-a/(s/sqrt(n)) <= (mu - xbar)/(s/sqrt(n)) <= a/(s/sqrt(n))

-a/(s/sqrt(n)) <= -t <= a/(s/sqrt(n))

So we want to go look in our table for the t-distribution with n-1 degrees of freedom and find the number C so that

Pr(-C <= t <= C) >= 0.95

We then solve C = a/(s/sqrt(n)) for a, and that's the radius of our confidence interval. Note that the s in the formula for t is the sample standard deviation (the square root of the unbiased variance estimate), so you know what it is.
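
A sketch of these steps in Python, applied to the five rock weights from the first post. It takes s to be the sample standard deviation and replaces the table lookup with a numerical integration of the t density plus a bisection search. The function names are made up for illustration, and the result is an interval for the mean, under the normality assumption stated above.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def central_prob(c, df, steps=1000):
    """Pr(-c <= t <= c) via Simpson's rule on [0, c]; steps must be even."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled because the density is symmetric about 0


def t_critical(level, df):
    """Find C with Pr(-C <= t <= C) = level by bisection."""
    lo, hi = 0.0, 1.0
    while central_prob(hi, df) < level:  # widen the bracket until it contains C
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_prob(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return hi


weights = [6, 7, 7, 8, 9]
n = len(weights)
xbar = fmean(weights)
s = stdev(weights)            # sample standard deviation, sqrt(1.3)
C = t_critical(0.95, n - 1)   # about 2.776 for 4 degrees of freedom
a = C * s / math.sqrt(n)      # radius of the interval for the mean

print(f"C = {C:.3f}, mean in [{xbar - a:.2f}, {xbar + a:.2f}]")
```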
What they (mathematicians) define as interesting depends on their particular field of study; mathematical analysts find pain and extreme confusion interesting, whereas geometers are interested in beauty.

lorb
Posts: 405
Joined: Wed Nov 10, 2010 10:34 am UTC
Location: Austria

Re: Statistics question.

z4lis wrote:You can still use the t-distribution, as long as the rocks are normally distributed.

Be aware that that is a big assumption to make. Always consider where your data comes from. Rocks for example are usually not normally distributed. (There are a lot more tiny pebble stones than big boulders.) At least you should do a normality test.
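
The post does not name a particular normality test. As one possible sketch, here is the Jarque-Bera statistic in plain Python, chosen only because it needs nothing beyond the standard library. It relies on a large-sample approximation, so with 5 data points the p-value is close to meaningless; the function name is illustrative.

```python
import math
from statistics import fmean


def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    mean = fmean(data)
    # Central moments, all divided by n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality jb is roughly chi-square with 2 degrees of freedom,
    # whose upper tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


jb, p_value = jarque_bera([6, 7, 7, 8, 9])
print(f"JB = {jb:.3f}, p = {p_value:.3f}")
```

A large p-value here only means the test found no evidence against normality, which is nearly guaranteed with so few points.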

blademan9999
Posts: 44
Joined: Sun May 01, 2011 5:18 am UTC

Re: Statistics question.

z4lis wrote:You can still use the t-distribution, as long as the rocks are normally distributed. Check out this wikipedia junk. The t-value's distribution is independent of the sample distribution's true mean and standard deviation as long as you assume the samples are normal and independent. You don't need to know anything about the true mean to use it, and that's the magic. You want to come up with a number a so that Pr(xbar - a <= mu <= xbar + a) >= 0.95.

Now start manipulating the inequalities inside the Pr:

-a <= mu - xbar <= a

-a/(s/sqrt(n)) <= (mu - xbar)/(s/sqrt(n)) <= a/(s/sqrt(n))

-a/(s/sqrt(n)) <= -t <= a/(s/sqrt(n))

So we want to go look in our table for the t-distribution with n-1 degrees of freedom and find the number C so that

Pr(-C <= t <= C) >= 0.95

We then solve C = a/(s/sqrt(n)) for a, and that's the radius of our confidence interval. Note that the s in the formula for t is the sample standard deviation (the square root of the unbiased variance estimate), so you know what it is.

No, that tells me the 95% confidence interval for the mean.

lorb
Posts: 405
Joined: Wed Nov 10, 2010 10:34 am UTC
Location: Austria

Re: Statistics question.

Oh, did not check the formula provided. It is indeed possible to calculate a prediction interval with the t distribution from a sample assuming the population is normal.

Your prediction interval is the sample mean plus/minus t*s*sqrt(1+1/n), where t is the relevant percentile of the t distribution with n-1 degrees of freedom and s is the sample standard deviation. Wikipedia explains how this works.

Your sample still is very small and rocks are still rarely normally distributed.
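
Plugging the five weights from the first post into that formula gives the following sketch. The critical value 2.776 is the usual table entry for a two-sided 95% interval with 4 degrees of freedom; it is typed in by hand here rather than computed, and normality is assumed throughout.

```python
import math
from statistics import fmean, stdev

weights = [6, 7, 7, 8, 9]
n = len(weights)
xbar = fmean(weights)   # 7.4
s = stdev(weights)      # sample standard deviation, sqrt(1.3)

t_crit = 2.776          # assumed table value: 97.5th percentile, 4 degrees of freedom

half_width = t_crit * s * math.sqrt(1 + 1 / n)
lower, upper = xbar - half_width, xbar + half_width

# Roughly 3.93 to 10.87
print(f"95% prediction interval for the next rock: {lower:.2f} to {upper:.2f}")
```

The interval is much wider than the one for the mean, because it has to cover the scatter of a single new rock as well as the uncertainty in the mean.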

Mokele
Posts: 775
Joined: Fri Aug 21, 2009 8:18 pm UTC
Location: Atlanta, GA

Re: Statistics question.

blademan9999 wrote:No, that tells me the 95% confidence interval for the mean.

I'm assuming this is a low-level stats course, so you can probably assume normality simply because that's what all these classes do at that level. If you know the true mean and standard deviation of the population, the 95% interval is just about two standard deviations (more precisely, about 1.96) in each direction from the mean. This means that about 95% of the rocks in your population will be in this interval.

As others have said, rocks are likely not distributed normally, and the sample size is small, but I'll wager that this is an intro stats problem and thus we're meant to just blindly plow ahead with normal distributions and small, easy-to-calculate samples (a bit like how intro physics always neglects air resistance).
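
For the known-parameters case, the "two standard deviations" rule can be checked with Python's standard library. The values of mu and sigma below are hypothetical stand-ins for a known population mean and standard deviation; they are not something the sample can actually tell you.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = std_normal.inv_cdf(0.975)                          # about 1.96
coverage_2sd = std_normal.cdf(2) - std_normal.cdf(-2)  # about 0.9545

# Hypothetical known population parameters, for illustration only
mu, sigma = 7.4, 1.14
interval = (mu - z * sigma, mu + z * sigma)

print(f"z = {z:.3f}, coverage of +/- 2 sd = {coverage_2sd:.4f}")
print(f"95% of rocks between {interval[0]:.2f} and {interval[1]:.2f}")
```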
"With malleus aforethought, mammals got an earful of their ancestor's jaw" - J. Burns, Biograffiti

z4lis
Posts: 767
Joined: Mon Mar 03, 2008 10:59 pm UTC

Re: Statistics question.

blademan9999 wrote:No, that tells me the 95% confidence interval for the mean.

Whoops, my formula is off by a factor of sqrt(n+1) then!

PM 2Ring
Posts: 3700
Joined: Mon Jan 26, 2009 3:19 pm UTC
Location: Sydney, Australia

Re: Statistics question.

I would expect rock sizes / masses to be better modeled by a power law than a normal distribution. A power law works fairly well for asteroids and other space rocks, but I guess it'd be a bit more complicated for terrestrial rocks, although geologists use power laws to model rock fracture sizes, too.

From Asteroid size distribution
Wikipedia wrote:The number of asteroids decreases markedly with size. Although this generally follows a power law, there are 'bumps' at 5 km and 100 km, where more asteroids than expected from a logarithmic distribution are found.

FWIW, this power law also applies to meteoroids and hence to craters on airless bodies. According to the documentation of pamcrater, a crater simulation program originally written a couple of decades ago by John Walker:
John Walker wrote:The number of craters of a given size varies as the reciprocal of the area as described on pages 31 and 32 of Peitgen and Saupe, "The Science Of Fractal Images"; cratered bodies in the Solar System are observed to obey this relationship. The formula used to obtain crater radii governed by this law from a uniformly distributed pseudorandom sequence was developed by Rudy Rucker.

From Extent of power-law scaling for natural fractures in rock
Geology wrote:Abstract

New data sets from natural faults and extension fractures exhibit simple power-law scaling across 3.4–4.9 orders of magnitude, regardless of rock type or movement mode. The data show no evidence of natural gaps or scaling changes. Each data set consists of independent measurements made at different observational scales; a power-law regression to the subset of smaller fractures in each case provides an extrapolation that accurately predicts associated larger fractures. Consequently, data representing a limited range of fracture sizes may be used to characterize a much broader spectrum of fracture sizes.

Also see
On the Size Distributions of Asteroid Taxonomic Classes: The Collisional Interpretation
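
To get a feel for how a power law differs from a normal distribution, here is a sketch that draws power-law sizes from uniform random numbers by inverse-transform sampling. This is the textbook method, not necessarily the formula used in pamcrater, and the exponent alpha and minimum size x_min are made-up values.

```python
import random
from statistics import fmean, median


def sample_power_law(alpha, x_min, rng):
    """Draw x >= x_min with Pr(X > x) = (x_min / x) ** alpha."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return x_min * (1.0 - u) ** (-1.0 / alpha)


rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [sample_power_law(2.5, 1.0, rng) for _ in range(10_000)]

# Many small rocks and a few huge ones drag the mean above the median,
# which a symmetric normal distribution would not do.
print(f"median = {median(sizes):.2f}, mean = {fmean(sizes):.2f}, max = {max(sizes):.1f}")
```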

blademan9999
Posts: 44
Joined: Sun May 01, 2011 5:18 am UTC

Re: Statistics question.

lorb wrote:Oh, did not check the formula provided. It is indeed possible to calculate a prediction interval with the t distribution from a sample assuming the population is normal.

Your prediction interval is the sample mean plus/minus t*s*sqrt(1+1/n), where t is the relevant percentile of the t distribution with n-1 degrees of freedom and s is the sample standard deviation. Wikipedia explains how this works.

Your sample still is very small and rocks are still rarely normally distributed.

I figured that would probably be how it worked.
Well, now I know for sure how to do these types of questions. Also that example with the rocks was just that, an example. I really just wanted to know how to do those types of questions. Thanks.