agfh
: Agnostic Fay-Herriot
ModelThe agfh
package implements the Agnostic Fay-Herriot
small area model (AGFH). It also contains tools for using the
traditional Fay-Herriot model with similar syntax for convenience. This
document gives a detailed treatment to a typical use case, analyzing toy
data with both the agnostic and traditional models.
The traditional Fay-Herriot approach models areal units m = 1, …, M in a hierarchical fashion. A top level sampling distribution corresponds to the observed response Ym, and a linking distribution governs the latent variable θm. The top-level precisions Dm are typically assumed to be known. Its success is evident it its widespread use in small area estimation, where the goal is typically to estimate θm. Both frequentist and hierarchical Bayesian estimation is common.
The AGFH model replaces the Normality assumption in the sampling model with a more flexible distribution learned from the data.
In the AGFH model, the response instead is given by where Um are independent random variables following a distribution governed by the following equations The function g(⋅) is modeled using a Gaussian process and learned from the data.
The example below demonstrates common functionality in
agfh
.
The methods null_gen
, beta_err_gen
, and
gamma_err_gen
generate data with sampling errors that are
distributed according to the Normal, Beta, and Gamma distributions
respectively. Note that in the Beta and Gamma cases the sampling errors
are transformed so their mean and variance match the the first two
moments of the traditional model.
set.seed(2023)
M <- 50
p <- 3
D <- rep(0.1, M)
lamb <- 1/2
dat <- beta_err_gen(M, p, D, lamb, 1/12, 1/6)
## AGFH Sampling The AGFH sampler can be configured as in typical
Bayesian analysis. How many samples the user requests and how many
iterations it thins determine how long the sampler will run; below it
will run for
n.mcmc*n.thin
iterations. The latent variance
λ has an Inverse-Gamma prior
on it, and its hyperparameters can be specified. As in the hierarchical
Bayes model, β has a flat
(improper) prior.
n.mcmc <- 1e2
n.thin <- 20
S.init <- seq(-10,10,length.out=M)
thet.var.prior.a <- 1
thet.var.prior.b <- 1
mh.scale.thetas <- 1
The GP governing g(⋅) employs a radial basis kernel with given hyperparameters. Once complete, the sampler returns the initial values, hyperparameters, and posterior samples.
mhg_sampler <- make_agfh_sampler(X=dat$X, Y=dat$Y, D=dat$D,
var_gamma_a=thet.var.prior.a, var_gamma_b=thet.var.prior.b,
S=S.init,
kern.a0=0.1,
kern.a1=0.1,
kern.fuzz=1e-2)
init <- list(beta=rep(0,p),
theta= predict(lm(dat$Y ~ dat$X), data.frame(dat$X)),
theta.var=1 ,
gamma = rep(0, length(S.init)))
mhg.out <- mhg_sampler(init, n.mcmc, n.thin, mh.scale.thetas, trace=FALSE)
The final state of g(⋅)
informs our understanding of the sampling errors. As seen below, the
model captures the distribution sampling errors.
agfh
also contains tools to estimate the traditional
Fay-Herriot model using hierarchical Bayes. In this case, the parameters
β, λ are given the
prior p(β, λ) = 1 ⋅ IG (a, b).
params.init <- list(beta=rep(0,p),
theta=predict(lm(dat$Y ~ dat$X), data.frame(dat$X)),#rep(0,M),
theta.var=1)
sampler <- make_gibbs_sampler(dat$X, dat$Y, dat$D, thet.var.prior.a, thet.var.prior.b)
gibbs.out <- sampler(params.init, n.mcmc, n.thin)
The Fay-Herriot model also admits straightforward estimation via
frequentist/empirical Bayes. The EBLUP estimator of θm may be found
using RM_theta_eblup(...)
. By default, this will employ a
methods-of-moments estimator of λ. agfh
also employs
several other means to estimate λ, including an adjusted REML
estimate.
theta.ests.freqeb <- RM_theta_eblup(dat$X, dat$Y, dat$D)
adj_reml_thvar <- adj_resid_likelihood_theta_var_maker(dat$X, dat$Y, dat$D)
theta.var.est.adj.reml <- theta_var_est_grid(adj_reml_thvar)
theta.ests.adj.reml <- RM_theta_eblup(dat$X, dat$Y, dat$D, theta.var.est.adj.reml)
theta.ests.lm <- predict(lm(dat$Y ~ dat$X), data.frame(dat$X))
Below we summarize the output from the various estimation methods.
As evidenced below, a more accurate treatment of the sampling errors improves estimates of θm.
AGFH | HB | LM | FreqEB |
---|---|---|---|
0.66 | 1.02 | 1.24 | 0.86 |