stats::reg -- regression (general least squares fit)

Consider a ``model function'' f with n parameters p.1, .., p.n relating a dependent variable y and m independent variables x.1, .., x.m:
y = f(x.1, .., x.m; p.1, .., p.n).
Given k different measurements x.1.j, .., x.k.j for the independent variables x.j and corresponding measurements y.1, .., y.k for the dependent variable y, one fits the parameters p.1, .., p.n by minimizing the ``weighted quadratic deviation''
Delta^2(p.1,..,p.n) = sum(w.i*abs(y.i - f(x.i.1,..,x.i.m; p.1,..,p.n))^2, i = 1..k).
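For illustration only (this is not part of stats::reg), the following sketch spells out this sum for made-up data, one independent variable x, and the hypothetical model p1 + p2*x, evaluated at a fixed parameter guess:

>> X := [0.0, 1.0, 2.0, 3.0]:    // measurements of the independent variable
>> Y := [1.1, 2.9, 5.2, 6.8]:    // measurements of the dependent variable
>> W := [1, 1, 1, 1]:            // weights
>> f := p1 + p2*x:               // model function with the fit parameters p1, p2
>> Delta2 := (a, b) -> _plus(W[i]*(Y[i] - subs(f, [x = X[i], p1 = a, p2 = b]))^2 $ i = 1..4):
>> float(Delta2(1, 2))           // weighted quadratic deviation for the guess p1 = 1, p2 = 2
>> delete X, Y, W, f, Delta2: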
stats::reg(..data.., f, ..) computes numerical approximations of the fit parameters.
stats::reg([x.1.1,..,x.k.1], .., [x.1.m,..,x.k.m], [y.1,..,y.k] <, [w.1,..,w.k]>,
           f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg([[x.1.1,..,x.1.m, y.1 <, w.1>], .., [x.k.1,..,x.k.m, y.k <, w.k>]],
           f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg(s, c.1, .., c.m, cy <, cw>,
           f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg(s, [c.1, .., c.m, cy <, cw>],
           f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)
x.1.1, .., x.k.m : numerical sample data for the independent variables. The entry x.i.j represents the i-th measurement of the independent variable x.j.
y.1, .., y.k : numerical sample data for the dependent variable. The entry y.i represents the i-th measurement of the dependent variable.
w.1, .., w.k : numerical weight factors. The entry w.i is used as a weight for the data (x.i.1, .., x.i.m, y.i) of the i-th measurement. If no weights are provided, w.i = 1 is used.
f : the model function: an arithmetical expression representing a function of the independent variables x.1, .., x.m and the fit parameters p.1, .., p.n. The expression must not contain any symbolic objects apart from x.1, .., x.m and p.1, .., p.n.
x.1, .., x.m : the independent variables: identifiers or indexed identifiers.
p.1, .., p.n : the fit parameters: identifiers or indexed identifiers.
p.1(0), .., p.n(0) : numerical starting values for the fit parameters p.i. The user can assist the internal numerical search by providing starting values that are reasonably close to the optimal fit values. If no values are provided by the user, the starting values p.i(0) = 1.0 are used.
s : a sample of domain type stats::sample containing the data x.i.j for the independent variables, the data y.i for the dependent variable and, optionally, the weights w.i.
c.1, .., c.m : positive integers representing column indices of the sample s. Column c.j provides the measurements x.i.j for the independent variable x.j.
cy : a positive integer representing a column index of the sample s. This column provides the measurements y.i for the dependent variable.
cw : a positive integer representing a column index of the sample s. This column provides the weight factors w.i.
a list [[p.1,..,p.n], Delta^2] containing the optimized fit parameters p.i that minimize the quadratic deviation, and the minimized value Delta^2 of this deviation, which indicates the quality of the fit. All returned data are floating-point values. FAIL is returned if a least squares fit of the data is not possible with the given model function or if the internal numerical search fails.
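The two entries of the returned list can be accessed by indexing. A small sketch, using the call from the first example below:

>> result := stats::reg([0, 1, 2, 3], [1, 3, 5, 7], p1 + p2*x1, [x1], [p1, p2]):
>> fitparams := result[1]:    // the list of optimized fit parameters
>> deviation := result[2]:    // the minimized quadratic deviation Delta^2
>> delete result, fitparams, deviation: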
The function is sensitive to the environment variable DIGITS, which determines the numerical working precision.
Apart from the independent variables and the fit parameters, the model function f must not contain any symbolic objects that cannot be converted to floating-point values. For example, the model function p1 + p2*x1^2 + exp(p3 + p4*x2) with the independent variables x1, x2 and the fit parameters p1, .., p4 is accepted.

There are rare cases where the implemented algorithm converges to a local minimum rather than to a global minimum. In particular, this problem may arise when the model involves periodic functions. In such cases it is recommended to provide suitable starting values for the fit parameters. Cf. example 4.
StartingValues = [p.1(0),..,p.n(0)] : starting values for the internal numerical search; see the description of p.1(0), .., p.n(0) above.
Example 1: We fit a linear function y = p1 + p2*x1 to four data pairs (x.i.1, y.i) given by two lists:
>> stats::reg([0, 1, 2, 3], [1, 3, 5, 7], p1 + p2*x1, [x1], [p1, p2])
[[1.0, 2.0], 0.0]
The parameter values p1=1.0, p2=2.0 provide a perfect fit: the quadratic deviation vanishes.
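Indeed, substituting the fitted parameters into the model should reproduce the y values 1, 3, 5, 7 (a quick check, not produced by stats::reg):

>> float(subs(p1 + p2*x1, [p1 = 1.0, p2 = 2.0], x1 = k)) $ k = 0..3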
Example 2: We fit an exponential function y = a*exp(b*x) to five data pairs (x.i, y.i). Weights are used to decrease the influence of the ``exceptional pair'' (x, y) = (5.0, 6.5*10^6) on the fit:
>> stats::reg([[1.1, 54, 1], [1.2, 73, 1], [1.3, 98, 1], [1.4, 133, 1], [5.0, 6.5*10^6, 10^(-4)]], a*exp(b*x), [x], [a, b])
[[1.992321622, 2.999602426], 0.2001899629]
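The same fit can also be specified in the first calling syntax, with separate lists for the x data, the y data, and the weights; this call should produce the same result as above:

>> stats::reg([1.1, 1.2, 1.3, 1.4, 5.0], [54, 73, 98, 133, 6.5*10^6],
              [1, 1, 1, 1, 10^(-4)], a*exp(b*x), [x], [a, b])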
Example 3: We create a sample with four columns. The first column is a counter labeling the measurements. The second and third columns provide measured data for two variables x1 and x2, respectively. The last column provides corresponding measurements of a dependent variable.
>> s := stats::sample([[1, 0, 0, 1.1], [2, 0, 1, 5.4], [3, 1, 1, 8.5], [4, 1, 2, 18.5], [5, 2, 1, 15.0], [6, 2, 2, 24.8]])
      1   0   0   1.1
      2   0   1   5.4
      3   1   1   8.5
      4   1   2  18.5
      5   2   1  15.0
      6   2   2  24.8
First, we try to model the data provided by columns 2, 3, and 4 by a function that is linear in the variables x1 and x2. We specify the data columns by a list of column indices:
>> stats::reg(s, [2, 3, 4], p1 + p2*x1 + p3*x2, [x1, x2], [p1, p2, p3])
[[-0.9568181818, 4.688636364, 7.272727273], 15.23613636]
The quadratic deviation is rather large, indicating that a linear function is not appropriate for fitting the data. Next, we extend the model and consider a polynomial fit function of degree 2. This is still a linear regression problem, because the fit parameters enter the model function linearly (a direct computation via the normal equations is sketched at the end of this example). We specify the data columns by a sequence of column indices:
>> stats::reg(s, 2, 3, 4, p1 + p2*x1 + p3*x2 + p4*x1^2 + p5*x2^2, [x1, x2], [p1, p2, p3, p4, p5])
[[1.1, 1.525, 1.5, 1.625, 2.8], 0.01]
Finally, we include a further term p6*x1*x2
in the model, obtaining a perfect fit:
>> stats::reg(s, 2, 3, 4, p1 + p2*x1 + p3*x2 + p4*x1^2 + p5*x2^2 + p6*x1*x2, [x1, x2], [p1, p2, p3, p4, p5, p6])
[[1.1, 1.6, 1.35, 1.7, 2.95, -0.2], 4.267632241e-34]
>> delete s:
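Since the polynomial models above are linear in the fit parameters, the degree 2 fit can also be computed directly from the normal equations of a linear least squares problem. The following sketch (plain matrix arithmetic with the sample data entered as a list; not using stats::reg) should reproduce the parameters p1, .., p5 found above:

>> data := [[0, 0, 1.1], [0, 1, 5.4], [1, 1, 8.5],
            [1, 2, 18.5], [2, 1, 15.0], [2, 2, 24.8]]:
>> X := matrix([[1, data[i][1], data[i][2], data[i][1]^2, data[i][2]^2] $ i = 1..6]):
>> Y := matrix([[data[i][3]] $ i = 1..6]):
>> Xt := linalg::transpose(X):
>> float((Xt*X)^(-1)*Xt*Y)    // solution of the normal equations (unit weights)
>> delete data, X, Y, Xt: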
Example 4: We create a sample with two columns:
>> s := stats::sample([[1, -1.44], [2, -0.82], [3, 0.97], [4, 1.37]])
      1  -1.44
      2  -0.82
      3   0.97
      4   1.37
The data are to be modeled by a function of the form y = a*sin(b*x), where the first column contains measurements of x and the second column contains corresponding data for y. Note that in this example there is no need to specify column indices, because the sample contains only two columns:
>> stats::reg(s, a*sin(b*x), [x], [a, b])
[[-1.499812823, 1.281963381], 0.00001255632629]
Fitting a periodic function may be problematic. Providing starting values for the fit parameters, we obtain a rather different set of parameters that approximates the data with the same quality (see the check after this example):
>> stats::reg(s, a*sin(b*x), [x], [a, b], StartingValues = [2, 5])
[[1.499812823, 5.001221926], 0.00001255632629]
>> delete s:
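The two parameter sets are closely related: 5.001221926 is nearly 2*PI - 1.281963381, and the x values in the sample are the integers 1, .., 4, so both fitted functions take (almost) the same values at the data points. A quick check with the numbers from the outputs above:

>> g1 := -1.499812823*sin(1.281963381*x):   // first fit
>> g2 := 1.499812823*sin(5.001221926*x):    // fit obtained with the starting values
>> [float(subs(g1, x = k)), float(subs(g2, x = k))] $ k = 1..4
>> delete g1, g2: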
stats::reg uses a Marquardt-Levenberg gradient expansion algorithm. When searching for the minimum of Delta^2(p.1,..,p.n), the algorithm does not simply follow the negative gradient: the diagonal terms of the curvature matrix are increased by a factor that is optimized in each step of the search.
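The following sketch is not the actual implementation; it merely illustrates one step of such a damped search in plain MuPAD, using the sine model and the data from the last example, unit weights, hand-coded partial derivatives, and an arbitrarily chosen damping factor lambda:

>> X := [1, 2, 3, 4]:  Y := [-1.44, -0.82, 0.97, 1.37]:
>> a0 := 2.0:  b0 := 5.0:                 // current estimate for the parameters of a*sin(b*x)
>> r := matrix([[Y[i] - a0*sin(b0*X[i])] $ i = 1..4]):               // residuals
>> J := matrix([[sin(b0*X[i]), a0*X[i]*cos(b0*X[i])] $ i = 1..4]):   // Jacobian w.r.t. a and b
>> A := linalg::transpose(J)*J:           // curvature matrix (for unit weights)
>> lambda := 0.001:                       // damping factor, adapted by the algorithm in each step
>> dA := matrix([[A[1,1], 0], [0, A[2,2]]]):                         // diagonal part of A
>> delta := (A + lambda*dA)^(-1)*linalg::transpose(J)*r:             // damped step
>> [a0 + delta[1,1], b0 + delta[2,1]]     // updated parameter estimate
>> delete X, Y, a0, b0, r, J, A, lambda, dA, delta: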