density.pr {GeneTS}    R Documentation

Density Estimation Via Poisson Regression

Description

The function density.pr estimates a one-dimensional probability density function by fitting a Poisson regression to histogram counts.

Usage

density.pr(x, ncells=100, trim.lo=1/1000, trim.up=1/1000, df=7, plot=FALSE, ...)

Arguments

x the raw input data (missing and infinite values will be automatically removed)
ncells number of cells of the underlying histogram
trim.lo fraction of data points to be removed from the lower tail of the distribution (this prevents convergence problems)
trim.up fraction of data points to be removed from the upper tail of the distribution (this prevents convergence problems)
df degrees of freedom of the natural spline used as regression function
plot if TRUE, plot the estimated density together with the underlying histogram
... arguments passed to the plot function if plot=TRUE

Details

Density estimation using Poisson regression is advocated in Efron and Tibshirani (1996), and these authors trace the origins of this method back to Lindsey (1974a, b).

The algorithm in density.pr proceeds in two steps. First, a histogram is constructed as a preliminary density estimate. Second, a Poisson regression is fitted to the histogram counts, with a natural spline as the regression function.
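
The two steps can be illustrated with a minimal sketch in base R. This is not the GeneTS source code: the helper name poisson_density_sketch is hypothetical, the tail trimming controlled by trim.lo and trim.up is omitted, and the use of equal-width bins over the data range is an assumption made for the illustration.

```r
library(splines)  # provides ns(), the natural cubic spline basis

# Hypothetical helper for illustration only; not the density.pr implementation.
poisson_density_sketch <- function(x, ncells = 100, df = 7) {
  x <- x[is.finite(x)]  # drop missing and infinite values

  # Step 1: histogram with ncells equal-width bins (assumed binning scheme)
  breaks <- seq(min(x), max(x), length.out = ncells + 1)
  h <- hist(x, breaks = breaks, plot = FALSE)

  # Step 2: Poisson regression of the counts on a natural spline of the midpoints
  dat <- data.frame(counts = h$counts, mids = h$mids)
  fit <- glm(counts ~ ns(mids, df = df), family = poisson, data = dat)

  # Fitted values are expected counts per bin; rescale them to a density
  bw <- diff(breaks)[1]
  list(x = h$mids, y = fitted(fit) / (length(x) * bw), bw = bw, fit = fit)
}

est <- poisson_density_sketch(faithful$eruptions)
total_area <- sum(est$y * est$bw)  # area under the estimated density
```

Because the model contains an intercept and uses the canonical log link, the fitted counts sum to the number of observations, so total_area should be very close to 1.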

Value

A list of class density with the following components:

x the coordinates of the points where the density is estimated.
y the estimated density values.
bw the bin width used in the underlying histogram.
n the original sample size (before elimination of missing and infinite values).
dispersion estimated dispersion (a warning is issued if dispersion > 1.5; this can usually be fixed by increasing the df parameter).
call the call which produced the result.
data.name the deparsed name of the input data.
has.na TRUE if input data contains missing data or infinite values.
histogram the underlying histogram.
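
The dispersion component can be interpreted with the help of a small sketch. This page does not state how density.pr computes the dispersion, so the Pearson estimator below (sum of squared Pearson residuals divided by the residual degrees of freedom) is an assumption, as are the helper names pearson_dispersion and fit_counts and the choice of roughly 40 bins.

```r
library(splines)  # provides ns()

# Hypothetical helper: Pearson dispersion estimate of a fitted GLM.
# Whether density.pr uses this exact estimator is an assumption.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Histogram counts for the eruption durations (about 40 bins, chosen by hist)
h <- hist(faithful$eruptions, breaks = 40, plot = FALSE)
dat <- data.frame(counts = h$counts, mids = h$mids)

# Hypothetical helper: Poisson regression on a natural spline with k df
fit_counts <- function(k) {
  glm(counts ~ ns(mids, df = k), family = poisson, data = dat)
}

# Compare the dispersion of a stiff and a more flexible spline
disp_low  <- pearson_dispersion(fit_counts(3))
disp_high <- pearson_dispersion(fit_counts(7))
```

Values well above 1 suggest that the spline is too stiff to follow the histogram counts, which is consistent with the advice to increase df when the warning appears.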

Author(s)

Korbinian Strimmer (http://www.statistik.lmu.de/~strimmer/).

Part of the code of the density.pr function was adopted from the locfdr package.

References

Efron, B., and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Annals of Statistics, 24, 2431–2461.

Lindsey, J. (1974a). Comparison of probability distributions. Journal of the Royal Statistical Society, Series B, 36, 38–47.

Lindsey, J. (1974b). Construction and comparison of statistical models. Journal of the Royal Statistical Society, Series B, 36, 418–425.

See Also

density.

Examples

# load GeneTS library
library("GeneTS")

# load data 
data(faithful)
z <- faithful[,1]
rz <- range(z)

# estimate density 
d1a <- density.pr(z, plot=TRUE)
d1a

# discretization is not critical in this algorithm
d1b <- density.pr(z, plot=TRUE, ncells=80)
d1c <- density.pr(z, plot=TRUE, ncells=40)

# overlay the three fits; the curves should nearly coincide
plot(d1a, col=2)
lines(d1b, col=2)
lines(d1c, col=2)

# comparison with kernel density estimate
d2a <- density(z, from=rz[1], to=rz[2])
d2b <- density(z, from=rz[1], to=rz[2], bw="bcv")

plot(d1a, col=2, xlim=rz)
lines(d2a)
lines(d2b)

[Package GeneTS version 2.10.1 Index]