How to find confidence interval for vector-valued function?

For the discussion of math. Duh.

Moderators: gmalivuk, Moderators General, Prelates

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3060
Joined: Sat May 09, 2009 11:55 pm UTC

How to find confidence interval for vector-valued function?

Postby Qaanol » Wed Oct 31, 2018 11:51 pm UTC

I have a function f: ℝ→ℝn that takes one input and produces n outputs. It represents the result of a physical process, and is equal to a theoretical model g(x) plus a measurement error. For a given sample, I can measure the values of f(x), and I want to find out the value of x.

In particular, there is a certain ideal input x0, and I want to determine a confidence level at which I can say whether or not x = x0. That is, the value of p for which x0 falls at the boundary of the (1-p) confidence interval around x or vice-versa.

The theoretical model g(x) is known, and if it makes things easier can be assumed to be linear. The error of measurement is modeled as a multivariate normal distribution, with mean zero and a known (but different) standard deviation in each dimension. The error can be assumed either independent of x, or that its standard deviations are proportional to x.

How can I find the p-value associated with x = x0?
wee free kings

User avatar
Dason
Posts: 1310
Joined: Wed Dec 02, 2009 7:06 am UTC
Location: ~/

Re: How to find confidence interval for vector-valued function?

Postby Dason » Thu Nov 01, 2018 1:36 pm UTC

Can you give a concrete example of some f and g that you might actually consider? Or even an extremely simplified example. Simplified might be better to make it easier to talk about but I know sometimes it's hard to capture all of the necessary details while simplifying.
double epsilon = -.0000001;

User avatar
Qaanol
The Cheshirest Catamount
Posts: 3060
Joined: Sat May 09, 2009 11:55 pm UTC

Re: How to find confidence interval for vector-valued function?

Postby Qaanol » Thu Nov 01, 2018 5:03 pm UTC

Adding a certain amount (x) of a reagent to a chemical solution, then measuring the concentrations of several different reaction products (should be g(x), actual measured values have some error). I want to know if the correct amount (x0) of the reagent was added.
wee free kings

SuicideJunkie
Posts: 340
Joined: Sun Feb 22, 2015 2:40 pm UTC

Re: How to find confidence interval for vector-valued function?

Postby SuicideJunkie » Thu Nov 01, 2018 7:29 pm UTC

It would basically be a confidence area/volume/hypervolume rather than an interval, right?

Like drawing a contour map over your variables' range, with p instead of altitude, and more dimensions.

User avatar
Eebster the Great
Posts: 3107
Joined: Mon Nov 10, 2008 12:58 am UTC
Location: Cleveland, Ohio

Re: How to find confidence interval for vector-valued function?

Postby Eebster the Great » Thu Nov 01, 2018 8:07 pm UTC

No, x is still a scalar, so he is looking for an interval of real numbers (multiplied by some physical unit). Or at least a distribution over R if it isn't unimodal and so doesn't lend itself to a CI. It seems like you should just be able to get separate distributions for each observation and then multiply them together in some special way.

User avatar
Zamfir
I built a novelty castle, the irony was lost on some.
Posts: 7508
Joined: Wed Aug 27, 2008 2:43 pm UTC
Location: Nederland

Re: How to find confidence interval for vector-valued function?

Postby Zamfir » Fri Nov 02, 2018 10:35 am UTC

I think I have worked out the math here, but please check...

You have some process that draws a sample x from a random variable X with unknown distribution (more on this later)
This is your reagent.

Your reaction products are drawn from a k-valued random variable Y. For a given reagent sample x, Y is a multivariate normal distribution:

Y(X=x) ~ Nk(g(x), SIGMA(x))

g(x) is a known k-valued function of x, and the k-by-k covariant matrix SIGMA is a know function of x
---------------------------------------
Now, you have drawn a sample y from Y

You are looking for an interval [xlow, xhi], such that the conditional probabilities
p(X<xlow | Y=y) = ptarget
p(X>xhi | Y=y ) = ptarget
ptarget might (for example) be 0.025, for a 95% confidence interval
In general, you want the cumulative dsitribution function

FX(x) = p(X<x | Y =y)

from there you can choose intervals
----------------------------
It is easier to work with probability density functions (which we can later integrate to get the cdf FX)
So we are looking for the pdf

fX(x | Y=y)

We can use Bayes

fX(x | Y=y) = fY(y | X=x) * fX(x) / fY(y)

Do we have everything in that formula?
We know fY(y | X=x) = phi_k( g(x), SIGMA(x))
where phi_k(mu, SIGMA) is a k-valued gaussian function with a mean vector mu and covariate matrix SIGMA

fX(x) is the prior, the unknown distribution of X. This depends on your reagent mechanism
Perhaps that is already accurate and always close to x0. Then you should take this into account, telling you that x is likely close to x0 even if y suggests something else. But I think you are interested in the case where you don't know much about the prior.
For that case, we can assume a constant prior, fX(x) = c
Of course, it cannot be constant from -inf to inf. But it might be zero only far out, where phi_k is already close to zero.

Last part:
fY(y) = int(-inf to inf; [fY(y | X=x) * fX(x) ])dx
So this is not a function of x, just a constant to make the pdf integrate to 1. the constant c will appear here again, cancelling out.
---------------------
In the general case, this is the end. You can use numerical integration to find fX(x | Y=y)

For simplificatieon, we might take a constant matrix SIGMA and assume that g(x) is an affine function g(x) = a*x + b, where a and b are k-valued vectors
Then we can solve directly.

The nice thing is, Y(X=x) is a gaussian randomvariable, and affine transforms of a gaussian random variable are still gaussian.

Which means we can define a new, single-valued random variable
Z = vT*Y - vT*b
where we pick v such that
vT*a = 1
And the magic of gaussians gives us:

fZ(z | X=x) = phi_k (vT*g(x) -vTb, vt*SIGMA*v) = phi_k(vT*[a*x +b] -vT*b, vt*SIGMA*v) = phi_k(x, vT*SIGMA*v)

In other words, Z is an estimator for X with a (single-valued) variance s2 = vT*SIGMA*v

Going back to the bayes formula, we can write it in terms of Z:

fX(x | Z=z) = fZ(z | X=x) * fX(x) / fZ(z)

Since fZ(z | X=x) already has an integral of 1 by construction, and since we assume that fX(x) is constant over the range of interest, then fZ(z)
only compensates for the unknown value of the constant in fX(x). They cancel out, leaving our desired result:

fX(x | Z=z) = phi_k(x, vT*SIGMA*v) = phi_k(x, s2)

------------------------
Effectively, vT*Y - vT*b is a weighted average of sour reaction product samples, constructed to estimate X
For example, we can take vT = [0, 1/an, 0,0 ...], so we use one of the products as sole estimator and we end with variance
s2 = vT*SIGMA*v = SIGMAnn, the error variance of that single product in isolation

We can take other combinations, to make the variance as small as possible.
The best value for v is a quadratic optimization problem,
v_opt = argmin( vT*SIGMA*v ) constrained by aT*v = 1

You'll have to check this, but I think you need to solve the following system for the optimal v:

[SIGMA a ][v] = [0]
[aT 0 ][lambda] [1]
-----------------------

Mmm, that got out of hand. Again, please check the steps involved


Return to “Mathematics”

Who is online

Users browsing this forum: No registered users and 8 guests