Choosing between ASH, LBFP and GLBFP
Source: vignettes/GLBFP_estimator_choice.Rmd

This vignette compares the three estimator families exposed by the package. It is a practical guide rather than a universal ranking of methods.
- ASH() and ASH_estimate() implement averaged shifted histogram estimates.
- LBFP() and LBFP_estimate() implement linear blend frequency polygon estimates.
- GLBFP() and GLBFP_estimate() implement the general linear blend frequency polygon estimate.
All estimators share the same basic inputs: the data, the bandwidth
vector b, optional grid bounds, and, for ASH and GLBFP, the shift
vector m.
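To make the role of the shift parameter concrete, here is a minimal one-dimensional sketch of the averaged shifted histogram idea in base R. It is an illustration only and not the package's implementation: the function name ash_1d and its arguments h, m and origin are hypothetical, and the package's own parameterisation of b and m may differ.

```r
# One-dimensional averaged shifted histogram evaluated at a single point x0.
# h is the bin width, m the number of shifted histograms (illustrative names).
ash_1d <- function(x0, data, h, m, origin = 0) {
  # Origins of the m histograms, each shifted by h / m relative to the previous one.
  origins <- origin + (seq_len(m) - 1) * h / m
  dens <- vapply(origins, function(t0) {
    lower <- t0 + floor((x0 - t0) / h) * h   # left edge of the bin containing x0
    count <- sum(data >= lower & data < lower + h)
    count / (length(data) * h)               # ordinary histogram density
  }, numeric(1))
  mean(dens)                                 # average over the shifted histograms
}

toy <- c(0.1, 0.2, 0.6)
ash_1d(0.5, toy, h = 1, m = 1)  # single histogram: all three points share the bin [0, 1)
ash_1d(0.5, toy, h = 1, m = 2)  # average over the bins [0, 1) and [0.5, 1.5)
```

With m = 1 the sketch reduces to an ordinary histogram; larger m averages over more bin origins, which reduces the dependence of the estimate on where the bins happen to start.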
library(GLBFP)
x <- cbind(rnorm(200), rnorm(200, sd = 1.25))
b <- c(0.75, 0.9)
m <- c(2, 2)
point <- c(0, 0)
# Point estimates at `point`, using the function names listed above.
fits <- list(
  ASH = ASH(point, x, b = b, m = m),
  LBFP = LBFP(point, x, b = b),
  GLBFP = GLBFP(point, x, b = b, m = m)
)
vapply(fits, function(z) z$estimation, numeric(1))
#>       ASH      LBFP     GLBFP
#> 0.1240741 0.1395065 0.1349684

Grid estimates can be compared through the common
*_estimate() interface.
grid_ash <- ASH_estimate(x, b = b, m = m, grid_size = 15)
grid_lbfp <- LBFP_estimate(x, b = b, grid_size = 15)
grid_glbfp <- GLBFP_estimate(x, b = b, m = m, grid_size = 15)
comparison <- data.frame(
  method = c("ASH", "LBFP", "GLBFP"),
  mean_density = c(
    mean(grid_ash$densities),
    mean(grid_lbfp$densities),
    mean(grid_glbfp$densities)
  ),
  max_density = c(
    max(grid_ash$densities),
    max(grid_lbfp$densities),
    max(grid_glbfp$densities)
  )
)
comparison
#>   method mean_density max_density
#> 1    ASH   0.02629630   0.1574074
#> 2   LBFP   0.02434245   0.1441655
#> 3  GLBFP   0.02439168   0.1476182

Practical starting rules
As a first pass:
- use LBFP when a simple linear blend frequency polygon is sufficient;
- use GLBFP when a tunable shifted linear blend estimator is desired;
- use ASH when an averaged shifted histogram representation is desired.
The bandwidth vector b usually matters more than small
changes in m. Use compute_bi_optim() as a
reproducible starting point, then inspect sensitivity around that value.
This helper implements a plug-in bandwidth choice motivated by the
optimal cell-width calculation for multivariate frequency polygons in
Carbon and Duchesne (2024).
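One way to inspect sensitivity is to rescale the starting bandwidth vector by a few factors and watch how a point estimate moves. The sketch below is a generic base R helper: the name sweep_bandwidth and the default factors are hypothetical, and a naive box estimator stands in for the package call so that the example runs on its own. With the package loaded, the estimator argument could instead wrap a call such as GLBFP(point, x, b = b, m = m)$estimation, assuming the interface shown earlier in this vignette.

```r
# Evaluate an estimator at one point for several rescaled bandwidth vectors.
# `estimator` is any function mapping a bandwidth vector to one density value.
sweep_bandwidth <- function(estimator, b, factors = c(0.75, 1, 1.25)) {
  values <- vapply(factors, function(f) estimator(f * b), numeric(1))
  data.frame(factor = factors, estimate = values)
}

set.seed(1)  # arbitrary seed, for illustration only
x <- cbind(rnorm(200), rnorm(200, sd = 1.25))

# Stand-in estimator (not from the package): share of points falling in a box
# of side lengths b centred at the origin, divided by the box volume.
box_at_origin <- function(b) {
  inside <- abs(x[, 1]) <= b[1] / 2 & abs(x[, 2]) <= b[2] / 2
  mean(inside) / prod(b)
}

sweep_bandwidth(box_at_origin, b = c(0.75, 0.9))
```

If the estimate changes sharply between neighbouring factors, the bandwidth deserves more attention than the choice of m.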
For manuscript figures or numerical comparisons, report the selected
b, the selected m, the grid definition, and
the estimator family. This makes the result reproducible and avoids
treating the default display as a statistical conclusion by itself.
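The reporting advice above can be followed by keeping the settings in a small table next to the results. This is a minimal base R sketch; the helper name describe_fit is hypothetical, and the values are the ones used earlier in this vignette.

```r
# Collect the settings that define a density estimate so they can be
# reported alongside it. Unused settings (such as m for LBFP) become NA.
describe_fit <- function(family, b, m = NULL, grid_size = NULL) {
  fmt <- function(v) if (is.null(v)) NA_character_ else paste(v, collapse = ", ")
  data.frame(
    family = family,
    b = fmt(b),
    m = fmt(m),
    grid_size = fmt(grid_size),
    stringsAsFactors = FALSE
  )
}

settings <- rbind(
  describe_fit("ASH", b = c(0.75, 0.9), m = c(2, 2), grid_size = 15),
  describe_fit("LBFP", b = c(0.75, 0.9), grid_size = 15),
  describe_fit("GLBFP", b = c(0.75, 0.9), m = c(2, 2), grid_size = 15)
)
print(settings)
```

A table like this can be written out with write.csv() and stored with the figure or comparison it describes.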