Excel needs to sod off and die

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

Atre
Posts: 25
Joined: Wed Aug 06, 2008 8:36 pm UTC

Excel needs to sod off and die

Postby Atre » Thu Oct 15, 2009 11:20 pm UTC

I needed to extrapolate intermediate values for some collected data, so I graphed it in Excel (scatter graph) and fitted a 6th order Polynomial to the data (damn good fit) I planned to nick the equation for the polynomial to predict values between my data points... Problem is that it doesn't work, the stated equation for the polynomial doesn't fit by several orders of magnitude!!

Does anyone know how to fix the problem?

PS.Data
Spoiler:
80 0.00402
100 0.004415
120 0.004585
140 0.004585
160 0.004545
180 0.0045
200 0.00444
220 0.004405
240 0.004375
260 0.004355
273.15 0.004334601
280 0.004335766
300 0.00434
350 0.004368
400 0.00437

achan1058
Posts: 1783
Joined: Sun Nov 30, 2008 9:50 pm UTC

Re: Excel needs to sod off and die

Postby achan1058 » Thu Oct 15, 2009 11:30 pm UTC

Can you give me the polynomial, and why are you even trying to fit a deg 6 polynomial to start with?

User avatar
Kizyr
Posts: 2070
Joined: Wed Nov 15, 2006 4:16 am UTC
Location: Virginia
Contact:

Re: Excel needs to sod off and die

Postby Kizyr » Fri Oct 16, 2009 1:02 am UTC

Even without looking at your data, you sound like you're describing the basic problem with overfitting.

A 6th-order polynomial will naturally be a "damn good fit", because the more orders, the more you're precisely fitting the function to your data. The problem there is that when you try to generalize to new data (such as the spaces between your data points), it won't be a very good fit. KF
~Kizyr
Image

achan1058
Posts: 1783
Joined: Sun Nov 30, 2008 9:50 pm UTC

Re: Excel needs to sod off and die

Postby achan1058 » Fri Oct 16, 2009 2:13 am UTC

Interesting looking data. I don't think polynomial fit would work on this thing......

masher
Posts: 821
Joined: Tue Oct 23, 2007 11:07 pm UTC
Location: Melbourne, Australia

Re: Excel needs to sod off and die

Postby masher » Fri Oct 16, 2009 2:45 am UTC

Fix it by doing it properly?

It works for me...
Spoiler:
excel.png
excel.png (17.63 KiB) Viewed 3929 times




@Kizyr: I'd be comfortable using this formula between ~140 and 300. I'd be worried about the edges of the dataset. Assuming we can call the temperature dependence "smooth"

Atre
Posts: 25
Joined: Wed Aug 06, 2008 8:36 pm UTC

Re: Excel needs to sod off and die

Postby Atre » Fri Oct 16, 2009 7:33 am UTC

Interesting, the polynomial you've got is subtly different to mine.... given the magnitude of the power terms that might make a massive difference (seriously, for x =100, looks good on the graph; but the displayed eqt gave the answer of -2000ish when pasted into the spreadsheet. Was not pleased) Cheers for the help, I'll plug and chug with your (presumably) good equation

Atre
Posts: 25
Joined: Wed Aug 06, 2008 8:36 pm UTC

Re: Excel needs to sod off and die

Postby Atre » Sat Oct 17, 2009 11:08 am UTC

Solved Problem: The default display of the equation (as seen in Masher's post) leads to horrible rounding errors. Need to right click equation and change the display to many sig fig and then the equation works.... eejit machine.


In answer to why I'm doing this - as said in first post I need to convert between the quantities that are x & y on that plot. For a huge dataset (several hundred) that is a real pain in the arse to do by hand. An automated equation to do the job would be lovely, a 6th order polynomial doesn't have anything to do with the underlying physics but it matches most of the data well and that is all I require. For where it doesn't fit I have another solution :D ...

Ended
Posts: 1459
Joined: Fri Apr 20, 2007 3:27 pm UTC
Location: The Tower of Flints. (Also known as: England.)

Re: Excel needs to sod off and die

Postby Ended » Sat Oct 17, 2009 11:46 am UTC

If you're getting frustrated with Excel, there are many other tools which can do a good job at this sort of thing. E.g. in gnuplot (assuming the data is in a file called 'data'):

Code: Select all

f(x) = a + b*x + c*x**2 + d*x**3 + e*x**4 + f*x**5 + g*x**6
fit f(x) 'data' via a,b,c,d,e,f,g
plot f(x), 'data'


Output:
Spoiler:

Code: Select all

After 20 iterations the fit converged.
final sum of squares of residuals : 4.30903e-10
rel. change during last iteration : -8.26587e-08

degrees of freedom    (FIT_NDF)                        : 8
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 7.33914e-06
variance of residuals (reduced chisquare) = WSSR/ndf   : 5.38629e-11

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = -0.00492909      +/- 0.0005009    (10.16%)
b               = 0.000248624      +/- 1.673e-05    (6.729%)
c               = -2.58813e-06     +/- 2.181e-07    (8.427%)
d               = 1.387e-08        +/- 1.428e-09    (10.3%)
e               = -4.0936e-11      +/- 4.982e-12    (12.17%)
f               = 6.34018e-14      +/- 8.827e-15    (13.92%)
g               = -4.02833e-17     +/- 6.237e-18    (15.48%)

fit.jpg
fit.jpg (27.04 KiB) Viewed 3835 times
Generally I try to make myself do things I instinctively avoid, in case they are awesome.
-dubsola

CortoPasta
Posts: 38
Joined: Fri Aug 15, 2008 5:51 pm UTC

Re: Excel needs to sod off and die

Postby CortoPasta » Tue Oct 20, 2009 2:50 pm UTC

I needed to extrapolate intermediate values for some collected data, so I graphed it in Excel (scatter graph) and fitted a 6th order Polynomial to the data (damn good fit) I planned to nick the equation for the polynomial to predict values between my data points... Problem is that it doesn't work, the stated equation for the polynomial doesn't fit by several orders of magnitude!!

Does anyone know how to fix the problem?

PS.Data
Spoiler:
Spoiler:
80 0.00402
100 0.004415
120 0.004585
140 0.004585
160 0.004545
180 0.0045
200 0.00444
220 0.004405
240 0.004375
260 0.004355
273.15 0.004334601
280 0.004335766
300 0.00434
350 0.004368
400 0.00437


Usually when my stated equation for a polynomial doesn't fit, it's because I used the dryer on a high heat setting. Next time check the tag and use tumble dry low...

CrazyIvan
Posts: 30
Joined: Thu May 28, 2009 4:17 pm UTC

Re: Excel needs to sod off and die

Postby CrazyIvan » Tue Oct 20, 2009 11:23 pm UTC

A good rule of thumb is you should only at max one explanatory variable for every 10 observations you have. Which means really, your data can manage a linear fit, and a second order fit is stretching. I've also never, ever seen someone give a plausible reason for a 6-order term besides "well, it fit well". Which is *never* a reason to include it in the model.

Bad statistics are bad. And shortcuts are just massively bad science.

mouseposture
Posts: 42
Joined: Tue Sep 15, 2009 2:42 am UTC

Re: Excel needs to sod off and die

Postby mouseposture » Sun Oct 25, 2009 11:19 pm UTC

CrazyIvan wrote:A good rule of thumb is you should only at max one explanatory variable for every 10 observations you have. Which means really, your data can manage a linear fit, and a second order fit is stretching.


Agreed, but

CrazyIvan wrote: I've also never, ever seen someone give a plausible reason for a 6-order term besides "well, it fit well". Which is *never* a reason to include it in the model.


"It fit well" is a fine reason if, like the OP, all you want to do is extrapolate* the data, not understand it.
(But only ceteris paribus.)

*Wrote extrapolate, but apparently means interpolate.

CrazyIvan
Posts: 30
Joined: Thu May 28, 2009 4:17 pm UTC

Re: Excel needs to sod off and die

Postby CrazyIvan » Wed Oct 28, 2009 1:46 am UTC

mouseposture wrote:"It fit well" is a fine reason if, like the OP, all you want to do is extrapolate* the data, not understand it.
(But only ceteris paribus.)

*Wrote extrapolate, but apparently means interpolate.


Parsimony and model building is admittedly less of a big deal with a purely predictive model, but it still leaves a bad taste in my mouth. Doing anything with your data without a reason why gives me an uncomfortableness.

You get things like the OP's rounding problem as estimates get teeny tiny, a false idea of the precision of your prediction, and honestly, where do you stop? The OP fit a 6th order term. Why not a 7th? An 8th? Has he added negative or fractional power terms? If so, why not?

Those are questions you *should* be able to answer, that I very much doubt the OP can.

User avatar
headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Excel needs to sod off and die

Postby headprogrammingczar » Wed Oct 28, 2009 5:06 pm UTC

That data looks a lot like a decaying cosine function. Polynomials are never good approximations for trig functions.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

mouseposture
Posts: 42
Joined: Tue Sep 15, 2009 2:42 am UTC

Re: Excel needs to sod off and die

Postby mouseposture » Thu Oct 29, 2009 2:51 am UTC

CrazyIvan wrote:
Parsimony and model building is admittedly less of a big deal with a purely predictive model, but it still leaves a bad taste in my mouth. Doing anything with your data without a reason why gives me an uncomfortableness.


I think you're assuming the OP is doing science. He may be doing process engineering, in which case flagrant empiricism* is perfectly appropriate. In which case, OP's mistake is not failure to base a model on an understanding of the underlying physical process, but failure to choose a modelling technique (e.g. lowess) that's appropriate when it's not possible, or cost-effective, to gain such understanding.

*Not my cup of tea, but heck, somebody's got to keep those oil-refineries and doughnut-bakeries running.


Return to “Computer Science”

Who is online

Users browsing this forum: No registered users and 7 guests