---
title: "Prior Calibration Approaches for Parametric Components of Stochastic Tree Ensembles"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Prior-Calibration}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
bibliography: vignettes.bib
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Background

The "classic" BART model of @chipman2010bart

\begin{equation*}
\begin{aligned}
y &= f(X) + \epsilon\\
f(X) &\sim \text{BART}\left(\alpha, \beta\right)\\
\epsilon &\sim \mathcal{N}\left(0,\sigma^2\right)\\
\sigma^2 &\sim \text{IG}\left(a,b\right)
\end{aligned}
\end{equation*}

is semiparametric, with a nonparametric tree ensemble $f(X)$ and a homoskedastic error variance parameter $\sigma^2$. Note that in @chipman2010bart, $a$ and $b$ are parameterized as $a = \frac{\nu}{2}$ and $b = \frac{\nu\lambda}{2}$.

# Setting Priors on Variance Parameters in `stochtree`

By default, `stochtree` employs a Jeffreys prior for $\sigma^2$,

\begin{equation*}
\begin{aligned}
p\left(\sigma^2\right) &\propto \frac{1}{\sigma^2},
\end{aligned}
\end{equation*}

which corresponds to an improper inverse gamma prior with $a = 0$ and $b = 0$.

We provide convenience functions for users wishing to set the $\sigma^2$ prior as in @chipman2010bart. In this case, $\nu$ is set by default to 3 and $\lambda$ is calibrated as follows:

1.  An "overestimate," $\hat{\sigma}^2$, of $\sigma^2$ is obtained via a simple linear regression of $y$ on $X$.
2.  $\lambda$ is chosen to ensure that $p(\sigma^2 < \hat{\sigma}^2) = q$ for some value $q$, typically set to a default value of 0.9.

This is done in `stochtree` via the `calibrateInverseGammaErrorVariance` function; a by-hand sketch of the same calibration appears in the appendix at the end of this vignette.

```{r}
# Load library
library(stochtree)

# Generate data: a step function of the first covariate plus Gaussian noise
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_X <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) + 
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) + 
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) + 
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_X + rnorm(n, 0, noise_sd)

# Test/train split
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]

# Calibrate the scale parameter of the error variance prior as in Chipman et al (2010)
nu <- 3
lambda <- calibrateInverseGammaErrorVariance(y_train, X_train, nu = nu)
```

Now we run a BART model with this variance parameterization.

```{r}
general_params <- list(sigma2_global_shape = nu/2, 
                       sigma2_global_scale = (nu*lambda)/2)
bart_model <- bart(X_train = X_train, y_train = y_train, X_test = X_test, 
                   num_gfr = 0, num_burnin = 1000, num_mcmc = 100, 
                   general_params = general_params)
```

Inspect the out-of-sample predictions of the model.

```{r}
plot(rowMeans(bart_model$y_hat_test), y_test, 
     xlab = "predicted", ylab = "actual")
abline(0, 1, col = "red", lty = 3, lwd = 3)
```

Inspect the posterior samples of $\sigma^2$.

```{r}
plot(bart_model$sigma2_global_samples, 
     ylab = "sigma^2", xlab = "iteration")
abline(h = noise_sd^2, col = "red", lty = 3, lwd = 3)
```
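# Appendix: Calibrating $\lambda$ by Hand

To make the two calibration steps concrete, the sketch below reproduces them in base R. If $\sigma^2 \sim \text{IG}\left(\frac{\nu}{2}, \frac{\nu\lambda}{2}\right)$, then $\frac{1}{\sigma^2} \sim \text{Gamma}\left(\frac{\nu}{2}, \text{rate} = \frac{\nu\lambda}{2}\right)$, so the condition $p(\sigma^2 < \hat{\sigma}^2) = q$ can be solved for $\lambda$ in closed form with the gamma quantile function. This is a minimal illustration of the logic, not the exact internals of `calibrateInverseGammaErrorVariance` (which may differ in details such as standardization of $y$); `sigma2_hat` and `lambda_manual` are names introduced here for illustration.

```{r}
# Step 1: "overestimate" sigma^2 via a simple linear regression of y on X
sigma2_hat <- summary(lm(y_train ~ X_train))$sigma^2

# Step 2: solve p(sigma^2 < sigma2_hat) = q for lambda. Since
# p(sigma^2 < s) = 1 - pgamma(1/s, shape = nu/2, rate = nu*lambda/2),
# rearranging gives lambda = 2 * s * qgamma(1 - q, shape = nu/2) / nu
q <- 0.9
lambda_manual <- 2 * sigma2_hat * qgamma(1 - q, shape = nu/2) / nu

# Check: the prior probability below the overestimate should equal q
1 - pgamma(1/sigma2_hat, shape = nu/2, rate = (nu*lambda_manual)/2)
```

The final line should print 0.9 (up to floating-point error), confirming that the calibrated prior places probability $q$ on $\sigma^2$ falling below the overestimate $\hat{\sigma}^2$.

# References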