Title: Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference
Description: Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) (Chipman, George, and McCulloch, 2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) (Hahn, Murray, and Carvalho, 2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Authors: Drew Herren [aut, cre]
Maintainer: Drew Herren <[email protected]>
License: MIT + file LICENSE
Version: 0.1.1
Built: 2025-02-13 03:01:50 UTC
Source: https://github.com/stochastictree/stochtree
Authors:
Drew Herren [aut, cre]
Richard Hahn
Jared Murray
Carlos Carvalho
Jingyu He
Other contributors:
Pedro Lima [contributor]
stochtree contributors [copyright holder]
Eigen contributors (C++ source uses the Eigen library for matrix operations, see inst/COPYRIGHTS) [copyright holder]
xgboost contributors (C++ tree code and related operations include or are inspired by code from the xgboost library, see inst/COPYRIGHTS) [copyright holder]
treelite contributors (C++ tree code and related operations include or are inspired by code from the treelite library, see inst/COPYRIGHTS) [copyright holder]
Microsoft Corporation (C++ I/O and various project structure code include or are inspired by code from the LightGBM library, which is a copyright of Microsoft, see inst/COPYRIGHTS) [copyright holder]
Niels Lohmann (C++ source uses the JSON for Modern C++ library for JSON operations, see inst/COPYRIGHTS) [copyright holder]
Daniel Lemire (C++ source uses the fast_double_parser library internally, see inst/COPYRIGHTS) [copyright holder]
Victor Zverovich (C++ source uses the fmt library internally, see inst/COPYRIGHTS) [copyright holder]
Useful links:
Report bugs at https://github.com/StochasticTree/stochtree/issues
Run the BART algorithm for supervised learning.
bart(
    X_train, y_train, leaf_basis_train = NULL, rfx_group_ids_train = NULL,
    rfx_basis_train = NULL, X_test = NULL, leaf_basis_test = NULL,
    rfx_group_ids_test = NULL, rfx_basis_test = NULL, num_gfr = 5,
    num_burnin = 0, num_mcmc = 100, previous_model_json = NULL,
    previous_model_warmstart_sample_num = NULL, general_params = list(),
    mean_forest_params = list(), variance_forest_params = list()
)
X_train
Covariates used to split trees in the ensemble. May be provided either as a dataframe or a matrix. Matrix covariates will be assumed to be all numeric. Covariates passed as a dataframe will be preprocessed based on the variable types (e.g. categorical columns stored as unordered factors will be one-hot encoded, while categorical columns stored as ordered factors will be passed as integers to the core algorithm, along with the metadata that the column is ordered categorical).
y_train
Outcome to be modeled by the ensemble.
leaf_basis_train
(Optional) Bases used to define a regression model in the leaves of each tree in the ensemble.
rfx_group_ids_train
(Optional) Group labels used for an additive random effects model.
rfx_basis_train
(Optional) Basis for "random-slope" regression in an additive random effects model. If rfx_group_ids_train is provided but this argument is omitted, a random-intercept model (a basis of all ones) is assumed.
X_test
(Optional) Test set of covariates used to define "out of sample" evaluation data. May be provided either as a dataframe or a matrix, but the format of X_test must match that of X_train.
leaf_basis_test
(Optional) Test set of bases used to define "out of sample" evaluation data. While a test set is optional, the structure of any provided test set must match that of the training set (i.e. if both X_train and leaf_basis_train are provided, a test set must include both X_test and leaf_basis_test with matching dimensions).
rfx_group_ids_test
(Optional) Test set group labels used for an additive random effects model. We do not currently support (but plan to in the near future) test set evaluation for group labels that were not in the training set.
rfx_basis_test
(Optional) Test set basis for "random-slope" regression in an additive random effects model.
num_gfr
Number of "warm-start" iterations run using the grow-from-root algorithm (He and Hahn, 2021). Default: 5.
num_burnin
Number of "burn-in" iterations of the MCMC sampler. Default: 0.
num_mcmc
Number of "retained" iterations of the MCMC sampler. Default: 100.
previous_model_json
(Optional) JSON string containing a previous BART model. This can be used to "continue" a sampler interactively after inspecting the samples or to run parallel chains "warm-started" from existing forest samples. Default: NULL.
previous_model_warmstart_sample_num
(Optional) Sample number from previous_model_json that will be used to warm-start this BART sampler. Default: NULL.
general_params
(Optional) A list of general (non-forest-specific) model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
mean_forest_params
(Optional) A list of mean forest model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
variance_forest_params
(Optional) A list of variance forest model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
List of sampling outputs and a wrapper around the sampled forests (which can be used for in-memory prediction on new data, or serialized to JSON on disk).
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train, X_test = X_test,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
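The previous_model_json mechanism can be exercised with the example above. A minimal sketch of continuing the sampler, assuming saveBARTModelToJsonString() as documented later on this page (the warm-start sample number here is illustrative):

bart_json <- saveBARTModelToJsonString(bart_model)
bart_model_continued <- bart(X_train = X_train, y_train = y_train,
                             num_gfr = 0, num_burnin = 0, num_mcmc = 20,
                             previous_model_json = bart_json,
                             previous_model_warmstart_sample_num = 1)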
Run the Bayesian Causal Forest (BCF) algorithm for regularized causal effect estimation.
bcf(
    X_train, Z_train, y_train, propensity_train = NULL,
    rfx_group_ids_train = NULL, rfx_basis_train = NULL, X_test = NULL,
    Z_test = NULL, propensity_test = NULL, rfx_group_ids_test = NULL,
    rfx_basis_test = NULL, num_gfr = 5, num_burnin = 0, num_mcmc = 100,
    previous_model_json = NULL, previous_model_warmstart_sample_num = NULL,
    general_params = list(), prognostic_forest_params = list(),
    treatment_effect_forest_params = list(), variance_forest_params = list()
)
X_train
Covariates used to split trees in the ensemble. May be provided either as a dataframe or a matrix. Matrix covariates will be assumed to be all numeric. Covariates passed as a dataframe will be preprocessed based on the variable types (e.g. categorical columns stored as unordered factors will be one-hot encoded, while categorical columns stored as ordered factors will be passed as integers to the core algorithm, along with the metadata that the column is ordered categorical).
Z_train
Vector of (continuous or binary) treatment assignments.
y_train
Outcome to be modeled by the ensemble.
propensity_train
(Optional) Vector of propensity scores. If not provided, this will be estimated from the data.
rfx_group_ids_train
(Optional) Group labels used for an additive random effects model.
rfx_basis_train
(Optional) Basis for "random-slope" regression in an additive random effects model. If rfx_group_ids_train is provided but this argument is omitted, a random-intercept model (a basis of all ones) is assumed.
X_test
(Optional) Test set of covariates used to define "out of sample" evaluation data. May be provided either as a dataframe or a matrix, but the format of X_test must match that of X_train.
Z_test
(Optional) Test set of (continuous or binary) treatment assignments.
propensity_test
(Optional) Vector of propensity scores for the test set. If not provided, this will be estimated from the data.
rfx_group_ids_test
(Optional) Test set group labels used for an additive random effects model. We do not currently support (but plan to in the near future) test set evaluation for group labels that were not in the training set.
rfx_basis_test
(Optional) Test set basis for "random-slope" regression in an additive random effects model.
num_gfr
Number of "warm-start" iterations run using the grow-from-root algorithm (He and Hahn, 2021). Default: 5.
num_burnin
Number of "burn-in" iterations of the MCMC sampler. Default: 0.
num_mcmc
Number of "retained" iterations of the MCMC sampler. Default: 100.
previous_model_json
(Optional) JSON string containing a previous BCF model. This can be used to "continue" a sampler interactively after inspecting the samples or to run parallel chains "warm-started" from existing forest samples. Default: NULL.
previous_model_warmstart_sample_num
(Optional) Sample number from previous_model_json that will be used to warm-start this BCF sampler. Default: NULL.
general_params
(Optional) A list of general (non-forest-specific) model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
prognostic_forest_params
(Optional) A list of prognostic forest model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
treatment_effect_forest_params
(Optional) A list of treatment effect forest model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
variance_forest_params
(Optional) A list of variance forest model parameters. Each parameter has a default value that is processed internally, so this argument may be omitted.
List of sampling outputs and a wrapper around the sampled forests (which can be used for in-memory prediction on new data, or serialized to JSON on disk).
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
noise_sd <- 1
y <- mu_x + tau_x*Z + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10)
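As a usage sketch (an addition; tau_hat_test as a field of the returned list is an assumption based on the package's vignettes, not stated on this page), posterior mean treatment effects on the test set can be compared with the simulated truth:

tau_hat_test_mean <- rowMeans(bcf_model$tau_hat_test)  # assumed field: posterior CATE draws
plot(tau_test, tau_hat_test_mean, xlab = "True tau(x)", ylab = "Posterior mean tau(x)")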
Chipman, H., George, E., Hahn, R., McCulloch, R., Pratola, M. and Sparapani, R. (2022). Bayesian Additive Regression Trees, Computational Approaches. In Wiley StatsRef: Statistics Reference Online (eds N. Balakrishnan, T. Colton, B. Everitt, W. Piegorsch, F. Ruggeri and J.L. Teugels). https://doi.org/10.1002/9781118445112.stat08288
calibrateInverseGammaErrorVariance(y, X, W = NULL, nu = 3, quant = 0.9, standardize = TRUE)
y
Outcome to be modeled using BART, BCF or another nonparametric ensemble method.
X
Covariates to be used to partition trees in an ensemble or series of ensembles.
W
(Optional) Basis used to define a "leaf regression" model for each decision tree. The "classic" BART model assumes a constant leaf parameter, which is equivalent to a "leaf regression" on a basis of all ones, though it is not necessary to pass a vector of ones, here or to the BART function. Default: NULL.
nu
The shape parameter for the global error variance's IG prior. The scale parameter in the Sparapani et al. (2021) parameterization is defined as nu*lambda. Default: 3.
quant
(Optional) Quantile of the inverse gamma prior distribution represented by a linear-regression-based overestimate of sigma^2. Default: 0.9.
standardize
(Optional) Whether or not the outcome should be standardized (TRUE or FALSE) before calibration. Default: TRUE.
Value of lambda, which determines the scale parameter of the global error variance prior (sigma^2 ~ IG(nu, nu*lambda)).
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
y <- 10*X[,1] - 20*X[,2] + rnorm(n)
nu <- 3
lambda <- calibrateInverseGammaErrorVariance(y, X, nu = nu)
sigma2hat <- mean(resid(lm(y~X))^2)
mean(var(y)/rgamma(100000, nu, rate = nu*lambda) < sigma2hat)
Compute and return a vector representation of a forest's leaf predictions for every observation in a dataset. The vector has a "row-major" format that can be easily re-represented as a CSR sparse matrix: elements are organized so that the first n elements correspond to leaf predictions for all n observations in a dataset for the first tree in an ensemble, the next n elements correspond to predictions for the second tree, and so on. The "data" for each element is a uniquely mapped column index corresponding to a single leaf of a single tree (i.e. if tree 1 has 3 leaves, its column indices range from 0 to 2, tree 2's leaf indices begin at 3, etc.).
computeForestLeafIndices(model_object, covariates, forest_type = NULL, forest_inds = NULL)
model_object
Object of type bartmodel, bcfmodel, or ForestSamples.
covariates
Covariates to use for prediction. Must have the same dimensions / column types as the data used to train a forest.
forest_type
Which forest to use from model_object:
1. BART (e.g. "mean", as in the examples below)
2. BCF
3. ForestSamples
forest_inds
(Optional) Indices of the forest sample(s) for which to compute leaf indices. If not provided, this function will return leaf indices for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to an index of 0.
List of vectors. Each vector is of size num_obs * num_trees, where num_obs = nrow(covariates) and num_trees is the number of trees in the relevant forest of model_object.
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr = 0, num_mcmc = 10)
computeForestLeafIndices(bart_model, X, "mean")
computeForestLeafIndices(bart_model, X, "mean", 0)
computeForestLeafIndices(bart_model, X, "mean", c(1,3,9))
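A minimal sketch of the CSR representation described above, assuming the Matrix package (not a stated dependency of stochtree) and reusing bart_model and X from the example:

library(Matrix)
leaf_inds <- computeForestLeafIndices(bart_model, X, "mean", 0)[[1]]
num_obs <- nrow(X)
num_trees <- length(leaf_inds) / num_obs
max_ind <- computeForestMaxLeafIndex(bart_model, X, "mean", 0)
# Row-major layout: observations 1..n for tree 1, then 1..n for tree 2, etc.
leaf_basis <- sparseMatrix(
    i = rep(1:num_obs, times = num_trees),
    j = leaf_inds + 1,  # leaf/column indices are 0-indexed
    x = 1,
    dims = c(num_obs, max_ind + 1)
)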
Return each forest's leaf node scale parameters. If the leaf scale was not sampled for the forest in question, this function throws an error indicating that the leaf model does not have a stochastic scale parameter.
computeForestLeafVariances(model_object, forest_type, forest_inds = NULL)
model_object
Object of type bartmodel or bcfmodel.
forest_type
Which forest to use from model_object:
1. BART (e.g. "mean", as in the examples below)
2. BCF
forest_inds
(Optional) Indices of the forest sample(s) for which to compute leaf variances. If not provided, this function will return leaf variances for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to an index of 0.
Vector of size length(forest_inds) with the leaf scale parameter for each requested forest.
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr = 0, num_mcmc = 10)
computeForestLeafVariances(bart_model, "mean")
computeForestLeafVariances(bart_model, "mean", 0)
computeForestLeafVariances(bart_model, "mean", c(1,3,5))
Compute and return the largest possible leaf index computable by computeForestLeafIndices for the forests in a designated forest sample container.
computeForestMaxLeafIndex(model_object, covariates, forest_type = NULL, forest_inds = NULL)
model_object
Object of type bartmodel, bcfmodel, or ForestSamples.
covariates
Covariates to use for prediction. Must have the same dimensions / column types as the data used to train a forest.
forest_type
Which forest to use from model_object:
1. BART (e.g. "mean", as in the examples below)
2. BCF
3. ForestSamples
forest_inds
(Optional) Indices of the forest sample(s) for which to compute max leaf indices. If not provided, this function will return max leaf indices for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to an index of 0.
Vector containing the largest possible leaf index computable by computeForestLeafIndices for the forests in a designated forest sample container.
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr = 0, num_mcmc = 10)
computeForestMaxLeafIndex(bart_model, X, "mean")
computeForestMaxLeafIndex(bart_model, X, "mean", 0)
computeForestMaxLeafIndex(bart_model, X, "mean", c(1,3,9))
Convert the persistent aspects of a covariate preprocessor to an (in-memory) C++ JSON object.
convertPreprocessorToJson(object)
object
List containing information on variables, including train set categories for categorical variables.
Wrapper around an in-memory C++ JSON object.
cov_mat <- matrix(1:12, ncol = 3)
preprocess_list <- preprocessTrainData(cov_mat)
preprocessor_json <- convertPreprocessorToJson(preprocess_list$metadata)
Wrapper around an in-memory C++ JSON object (nlohmann::json) used to serialize and deserialize stochtree models.
json_ptr
External pointer to a C++ nlohmann::json object
num_forests
Number of forests in the nlohmann::json object
forest_labels
Names of forest objects in the overall nlohmann::json object
num_rfx
Number of random effects terms in the nlohmann::json object
rfx_container_labels
Names of rfx container objects in the overall nlohmann::json object
rfx_mapper_labels
Names of rfx label mapper objects in the overall nlohmann::json object
rfx_groupid_labels
Names of rfx group id objects in the overall nlohmann::json object
new()
Create a new CppJson object.
CppJson$new()
A new CppJson object.
add_forest()
Convert a forest container to json and add it to the current CppJson object
CppJson$add_forest(forest_samples)
forest_samples
ForestSamples R class object
None
add_random_effects()
Convert a random effects container to json and add it to the current CppJson object
CppJson$add_random_effects(rfx_samples)
rfx_samples
RandomEffectSamples R class object
None
add_scalar()
Add a scalar to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_scalar(field_name, field_value, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_value
Numeric value of the field to be added to json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_integer()
Add an integer to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_integer(field_name, field_value, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_value
Integer value of the field to be added to json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_boolean()
Add a boolean value to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_boolean(field_name, field_value, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_value
Boolean value of the field to be added to json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_string()
Add a string value to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_string(field_name, field_value, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_value
String value of the field to be added to json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_vector()
Add a vector to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_vector(field_name, field_vector, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_vector
Vector to be stored in json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_integer_vector()
Add an integer vector to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_integer_vector(field_name, field_vector, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_vector
Vector to be stored in json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_string_vector()
Add a string vector (as a json array) to the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$add_string_vector(field_name, field_vector, subfolder_name = NULL)
field_name
The name of the field to be added to json
field_vector
Character vector to be stored in json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which to place the value
None
add_list()
Add a list of vectors (as an object map of arrays) to the json object under the name "field_name"
CppJson$add_list(field_name, field_list)
field_name
The name of the field to be added to json
field_list
List to be stored in json
None
add_string_list()
Add a list of string vectors (as an object map of arrays) to the json object under the name "field_name"
CppJson$add_string_list(field_name, field_list)
field_name
The name of the field to be added to json
field_list
List to be stored in json
None
get_scalar()
Retrieve a scalar value from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_scalar(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested scalar value
get_integer()
Retrieve an integer value from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_integer(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested integer value
get_boolean()
Retrieve a boolean value from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_boolean(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested boolean value
get_string()
Retrieve a string value from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_string(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested string value
get_vector()
Retrieve a vector from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_vector(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested (numeric) vector
get_integer_vector()
Retrieve an integer vector from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_integer_vector(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested integer vector
get_string_vector()
Retrieve a character vector from the json object under the name "field_name" (with optional subfolder "subfolder_name")
CppJson$get_string_vector(field_name, subfolder_name = NULL)
field_name
The name of the field to be accessed from json
subfolder_name
(Optional) Name of the subfolder / hierarchy under which the field is stored
The requested character vector
get_numeric_list()
Reconstruct a list of numeric vectors from the json object stored under "field_name"
CppJson$get_numeric_list(field_name, key_names)
field_name
The name of the field to be accessed from json
key_names
Vector of names of list elements (each of which is a vector)
A list of numeric vectors
get_string_list()
Reconstruct a list of string vectors from the json object stored under "field_name"
CppJson$get_string_list(field_name, key_names)
field_name
The name of the field to be accessed from json
key_names
Vector of names of list elements (each of which is a vector)
A list of string vectors
return_json_string()
Convert a JSON object to in-memory string
CppJson$return_json_string()
JSON string
save_file()
Save a json object to file
CppJson$save_file(filename)
filename
String of filepath, must end in ".json"
None
load_from_file()
Load a json object from file
CppJson$load_from_file(filename)
filename
String of filepath, must end in ".json"
None
load_from_string()
Load a json object from string
CppJson$load_from_string(json_string)
json_string
JSON string dump
None
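A minimal round-trip sketch using only the methods documented above (the field names are illustrative):

json_obj <- CppJson$new()
json_obj$add_scalar("outcome_mean", 4.5)
json_obj$add_vector("outcome_sd", c(0.5, 1.2), subfolder_name = "scale")
json_obj$add_string("model_type", "bart")
json_str <- json_obj$return_json_string()
json_obj_reloaded <- CppJson$new()
json_obj_reloaded$load_from_string(json_str)
json_obj_reloaded$get_scalar("outcome_mean")
json_obj_reloaded$get_vector("outcome_sd", subfolder_name = "scale")
json_obj_reloaded$get_string("model_type")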
Persists a C++ random number generator throughout an R session to ensure reproducibility from a given random seed. If no seed is provided, the C++ random number generator is initialized using std::random_device.
rng_ptr
External pointer to a C++ std::mt19937 class
new()
Create a new CppRNG object.
CppRNG$new(random_seed = -1)
random_seed
(Optional) random seed for sampling
A new CppRNG object.
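A minimal usage sketch (the seed value is arbitrary):

rng <- CppRNG$new(random_seed = 1234)  # reproducible sampling
rng_unseeded <- CppRNG$new()           # seeded via std::random_device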
Convert a list of (in-memory) JSON representations of a BART model to a single combined BART model object which can be used for prediction, etc...
createBARTModelFromCombinedJson(json_object_list)
json_object_list
List of objects of type CppJson containing JSON representations of BART models.
Object of type bartmodel
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json <- list(saveBARTModelToJson(bart_model))
bart_model_roundtrip <- createBARTModelFromCombinedJson(bart_json)
Convert a list of (in-memory) JSON strings that represent BART models to a single combined BART model object which can be used for prediction, etc...
createBARTModelFromCombinedJsonString(json_string_list)
json_string_list
List of JSON strings which can be parsed to objects of type CppJson.
Object of type bartmodel
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json_string_list <- list(saveBARTModelToJsonString(bart_model))
bart_model_roundtrip <- createBARTModelFromCombinedJsonString(bart_json_string_list)
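A sketch of the multi-chain workflow this function supports (an addition; the two chains here are run sequentially, but could equally be run in parallel and serialized to strings before combining):

bart_model_a <- bart(X_train = X_train, y_train = y_train,
                     num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_model_b <- bart(X_train = X_train, y_train = y_train,
                     num_gfr = 10, num_burnin = 0, num_mcmc = 10)
combined_model <- createBARTModelFromCombinedJsonString(list(
    saveBARTModelToJsonString(bart_model_a),
    saveBARTModelToJsonString(bart_model_b)
))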
Convert an (in-memory) JSON representation of a BART model to a BART model object which can be used for prediction, etc...
createBARTModelFromJson(json_object)
json_object
Object of type CppJson containing the JSON representation of a BART model.
Object of type bartmodel
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json <- saveBARTModelToJson(bart_model)
bart_model_roundtrip <- createBARTModelFromJson(bart_json)
Convert a JSON file containing sample information on a trained BART model to a BART model object which can be used for prediction, etc...
createBARTModelFromJsonFile(json_filename)
json_filename
String of filepath, must end in ".json"
Object of type bartmodel
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
tmpjson <- tempfile(fileext = ".json")
saveBARTModelToJsonFile(bart_model, file.path(tmpjson))
bart_model_roundtrip <- createBARTModelFromJsonFile(file.path(tmpjson))
unlink(tmpjson)
Convert a JSON string containing sample information on a trained BART model to a BART model object which can be used for prediction, etc...
createBARTModelFromJsonString(json_string)
json_string
JSON string dump
Object of type bartmodel
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json <- saveBARTModelToJsonString(bart_model)
bart_model_roundtrip <- createBARTModelFromJsonString(bart_json)
y_hat_mean_roundtrip <- rowMeans(predict(bart_model_roundtrip, X_train)$y_hat)
Convert a list of (in-memory) JSON representations of BCF models to a single combined BCF model object which can be used for prediction, etc...
createBCFModelFromCombinedJson(json_object_list)
json_object_list
List of objects of type CppJson containing JSON representations of BCF models.
Object of type bcfmodel
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bcf_json_list <- list(saveBCFModelToJson(bcf_model))
bcf_model_roundtrip <- createBCFModelFromCombinedJson(bcf_json_list)
Convert a list of (in-memory) JSON strings that represent BCF models to a single combined BCF model object which can be used for prediction, etc...
createBCFModelFromCombinedJsonString(json_string_list)
json_string_list
List of JSON strings which can be parsed to objects of type CppJson.
Object of type bcfmodel
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bcf_json_string_list <- list(saveBCFModelToJsonString(bcf_model))
bcf_model_roundtrip <- createBCFModelFromCombinedJsonString(bcf_json_string_list)
Convert an (in-memory) JSON representation of a BCF model to a BCF model object which can be used for prediction, etc...
createBCFModelFromJson(json_object)
json_object
Object of type CppJson containing the JSON representation of a BCF model.
Object of type bcfmodel
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
bcf_json <- saveBCFModelToJson(bcf_model)
bcf_model_roundtrip <- createBCFModelFromJson(bcf_json)
Convert a JSON file containing sample information on a trained BCF model to a BCF model object which can be used for prediction, etc...
createBCFModelFromJsonFile(json_filename)
json_filename
String of filepath, must end in ".json"
Object of type bcfmodel
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
tmpjson <- tempfile(fileext = ".json")
saveBCFModelToJsonFile(bcf_model, file.path(tmpjson))
bcf_model_roundtrip <- createBCFModelFromJsonFile(file.path(tmpjson))
unlink(tmpjson)
Convert a JSON string containing sample information on a trained BCF model to a BCF model object which can be used for prediction, etc...
createBCFModelFromJsonString(json_string)
json_string |
JSON string dump |
Object of type bcfmodel
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow=2, byrow=TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bcf_json <- saveBCFModelToJsonString(bcf_model)
bcf_model_roundtrip <- createBCFModelFromJsonString(bcf_json)
Create a new (empty) C++ Json object
createCppJson()
CppJson
object
example_vec <- runif(10)
example_json <- createCppJson()
example_json$add_vector("myvec", example_vec)
Create a C++ Json object from a Json file
createCppJsonFile(json_filename)
json_filename |
Name of file to read. Must end in ".json" |
CppJson
object
example_vec <- runif(10)
example_json <- createCppJson()
example_json$add_vector("myvec", example_vec)
tmpjson <- tempfile(fileext = ".json")
example_json$save_file(file.path(tmpjson))
example_json_roundtrip <- createCppJsonFile(file.path(tmpjson))
unlink(tmpjson)
Create a C++ Json object from a Json string
createCppJsonString(json_string)
json_string |
JSON string dump |
CppJson
object
example_vec <- runif(10)
example_json <- createCppJson()
example_json$add_vector("myvec", example_vec)
example_json_string <- example_json$return_json_string()
example_json_roundtrip <- createCppJsonString(example_json_string)
Create an R class that wraps a C++ random number generator
createCppRNG(random_seed = -1)
random_seed |
(Optional) random seed for sampling |
CppRng
object
rng <- createCppRNG(1234)
rng <- createCppRNG()
Create a forest
createForest(
  num_trees,
  leaf_dimension = 1,
  is_leaf_constant = FALSE,
  is_exponentiated = FALSE
)
num_trees |
Number of trees in the forest |
leaf_dimension |
Dimensionality of the outcome model |
is_leaf_constant |
Whether leaf is constant |
is_exponentiated |
Whether forest predictions should be exponentiated before being returned |
Forest
object
num_trees <- 100
leaf_dimension <- 2
is_leaf_constant <- FALSE
is_exponentiated <- FALSE
forest <- createForest(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
Create a forest dataset object
createForestDataset(covariates, basis = NULL, variance_weights = NULL)
covariates |
Matrix of covariates |
basis |
(Optional) Matrix of bases used to define a leaf regression |
variance_weights |
(Optional) Vector of observation-specific variance weights |
ForestDataset
object
covariate_matrix <- matrix(runif(10*100), ncol = 10)
basis_matrix <- matrix(rnorm(3*100), ncol = 3)
weight_vector <- rnorm(100)
forest_dataset <- createForestDataset(covariate_matrix)
forest_dataset <- createForestDataset(covariate_matrix, basis_matrix)
forest_dataset <- createForestDataset(covariate_matrix, basis_matrix, weight_vector)
Create a forest model object
createForestModel(forest_dataset, forest_model_config, global_model_config)
forest_dataset |
ForestDataset object, used to initialize forest sampling data structures |
forest_model_config |
ForestModelConfig object containing forest model parameters and settings |
global_model_config |
GlobalModelConfig object containing global model parameters and settings |
ForestModel
object
num_trees <- 100
n <- 100
p <- 10
alpha <- 0.95
beta <- 2.0
min_samples_leaf <- 2
max_depth <- 10
feature_types <- as.integer(rep(0, p))
X <- matrix(runif(n*p), ncol = p)
forest_dataset <- createForestDataset(X)
forest_model_config <- createForestModelConfig(feature_types=feature_types,
                                               num_trees=num_trees, num_features=p,
                                               num_observations=n, alpha=alpha, beta=beta,
                                               min_samples_leaf=min_samples_leaf,
                                               max_depth=max_depth, leaf_model_type=1)
global_model_config <- createGlobalModelConfig(global_error_variance=1.0)
forest_model <- createForestModel(forest_dataset, forest_model_config, global_model_config)
Create a forest model config object
createForestModelConfig(
  feature_types = NULL,
  num_trees = NULL,
  num_features = NULL,
  num_observations = NULL,
  variable_weights = NULL,
  leaf_dimension = 1,
  alpha = 0.95,
  beta = 2,
  min_samples_leaf = 5,
  max_depth = -1,
  leaf_model_type = 1,
  leaf_model_scale = NULL,
  variance_forest_shape = 1,
  variance_forest_scale = 1,
  cutpoint_grid_size = 100
)
feature_types |
Vector of integer-coded feature types (integers where 0 = numeric, 1 = ordered categorical, 2 = unordered categorical) |
num_trees |
Number of trees in the forest being sampled |
num_features |
Number of features in training dataset |
num_observations |
Number of observations in training dataset |
variable_weights |
Vector specifying sampling probability for all p covariates in ForestDataset |
leaf_dimension |
Dimension of the leaf model (default: 1) |
alpha |
Root node split probability in tree prior (default: 0.95) |
beta |
Depth prior penalty in tree prior (default: 2.0) |
min_samples_leaf |
Minimum number of samples in a tree leaf (default: 5) |
max_depth |
Maximum depth of any tree in the ensemble in the model. Setting to -1 does not enforce any depth limits on trees. Default: -1. |
leaf_model_type |
Integer specifying the leaf model type (0 = constant leaf, 1 = univariate leaf regression, 2 = multivariate leaf regression). Default: 1. |
leaf_model_scale |
Scale parameter used in Gaussian leaf models (can either be a scalar or a q x q matrix, where q is the dimensionality of the basis and is only >1 when leaf_model_int = 2). Calibrated internally as 1/num_trees, propagated along diagonal if needed for multivariate leaf models. |
variance_forest_shape |
Shape parameter for IG leaf models (applicable when leaf_model_type = 3). Default: 1. |
variance_forest_scale |
Scale parameter for IG leaf models (applicable when leaf_model_type = 3). Default: 1. |
cutpoint_grid_size |
Number of unique cutpoints to consider (default: 100) |
ForestModelConfig object
config <- createForestModelConfig(num_trees = 10, num_features = 5, num_observations = 100)
Create a container of forest samples
createForestSamples(
  num_trees,
  leaf_dimension = 1,
  is_leaf_constant = FALSE,
  is_exponentiated = FALSE
)
num_trees |
Number of trees |
leaf_dimension |
Dimensionality of the outcome model |
is_leaf_constant |
Whether leaf is constant |
is_exponentiated |
Whether forest predictions should be exponentiated before being returned |
ForestSamples
object
num_trees <- 100
leaf_dimension <- 2
is_leaf_constant <- FALSE
is_exponentiated <- FALSE
forest_samples <- createForestSamples(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
Create a global model config object
createGlobalModelConfig(global_error_variance = 1)
global_error_variance |
Global error variance parameter (default: 1) |
GlobalModelConfig object
config <- createGlobalModelConfig(global_error_variance = 100)
Create an outcome object
createOutcome(outcome)
outcome |
Vector of outcome values |
Outcome
object
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
outcome <- createOutcome(y)
Reload a covariate preprocessor object from a JSON object containing a serialized preprocessor
createPreprocessorFromJson(json_object)
json_object |
in-memory wrapper around JSON C++ object containing covariate preprocessor metadata |
Preprocessor object that can be used with the preprocessPredictionData
function
cov_mat <- matrix(1:12, ncol = 3)
preprocess_list <- preprocessTrainData(cov_mat)
preprocessor_json <- convertPreprocessorToJson(preprocess_list$metadata)
preprocessor_roundtrip <- createPreprocessorFromJson(preprocessor_json)
Reload a covariate preprocessor object from a JSON string containing a serialized preprocessor
createPreprocessorFromJsonString(json_string)
json_string |
in-memory JSON string containing covariate preprocessor metadata |
Preprocessor object that can be used with the preprocessPredictionData
function
cov_mat <- matrix(1:12, ncol = 3)
preprocess_list <- preprocessTrainData(cov_mat)
preprocessor_json_string <- savePreprocessorToJsonString(preprocess_list$metadata)
preprocessor_roundtrip <- createPreprocessorFromJsonString(preprocessor_json_string)
Create a RandomEffectSamples object
createRandomEffectSamples(num_components, num_groups, random_effects_tracker)
num_components |
Number of "components" or bases defining the random effects regression |
num_groups |
Number of random effects groups |
random_effects_tracker |
Object of type RandomEffectsTracker |
RandomEffectSamples
object
n <- 100
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
rfx_samples <- createRandomEffectSamples(num_components, num_groups, rfx_tracker)
Create a random effects dataset object
createRandomEffectsDataset(group_labels, basis, variance_weights = NULL)
group_labels |
Vector of group labels |
basis |
Matrix of bases used to define the random effects regression (for an intercept-only model, pass an array of ones) |
variance_weights |
(Optional) Vector of observation-specific variance weights |
RandomEffectsDataset
object
rfx_group_ids <- sample(1:2, size = 100, replace = TRUE)
rfx_basis <- matrix(rnorm(3*100), ncol = 3)
weight_vector <- rnorm(100)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis, weight_vector)
Create a RandomEffectsModel object
createRandomEffectsModel(num_components, num_groups)
num_components |
Number of "components" or bases defining the random effects regression |
num_groups |
Number of random effects groups |
RandomEffectsModel
object
n <- 100
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_model <- createRandomEffectsModel(num_components, num_groups)
Create a RandomEffectsTracker object
createRandomEffectsTracker(rfx_group_indices)
rfx_group_indices |
Integer indices indicating groups used to define random effects |
RandomEffectsTracker
object
n <- 100
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
Wrapper around a C++ tree ensemble
forest_ptr
External pointer to a C++ TreeEnsemble class
internal_forest_is_empty
Whether the forest has not yet been "initialized" such that its predict
function can be called.
new()
Create a new Forest object.
Forest$new( num_trees, leaf_dimension = 1, is_leaf_constant = FALSE, is_exponentiated = FALSE )
num_trees
Number of trees in the forest
leaf_dimension
Dimensionality of the outcome model
is_leaf_constant
Whether leaf is constant
is_exponentiated
Whether forest predictions should be exponentiated before being returned
A new Forest
object.
predict()
Predict forest on every sample in forest_dataset
Forest$predict(forest_dataset)
forest_dataset
ForestDataset
R class
vector of predictions with as many rows as in forest_dataset
predict_raw()
Predict "raw" leaf values (without being multiplied by basis) for every sample in forest_dataset
Forest$predict_raw(forest_dataset)
forest_dataset
ForestDataset
R class
Array of predictions for each observation in forest_dataset, with each prediction having the dimensionality of the forest's leaf model. In the case of a constant leaf model or univariate leaf regression, this array is a vector (length is the number of observations). In the case of a multivariate leaf regression, this array is a matrix (number of observations by leaf model dimension).
set_root_leaves()
Set a constant predicted value for every tree in the ensemble. Stops program if any tree is more than a root node.
Forest$set_root_leaves(leaf_value)
leaf_value
Constant leaf value(s) to be fixed for each tree in the ensemble. Can be either a single number or a vector, depending on the forest's leaf dimension.
prepare_for_sampler()
Set a constant predicted value for every tree in the ensemble and propagate this initialization through to the forest sampler's tracking structures. Stops program if any tree is more than a root node.
Forest$prepare_for_sampler( dataset, outcome, forest_model, leaf_model_int, leaf_value )
dataset
ForestDataset
Dataset class (covariates, basis, etc...)
outcome
Outcome
Outcome class (residual / partial residual)
forest_model
ForestModel
object storing tracking structures used in training / sampling
leaf_model_int
Integer value encoding the leaf model type (0 = constant gaussian, 1 = univariate gaussian, 2 = multivariate gaussian, 3 = log linear variance).
leaf_value
Constant leaf value(s) to be fixed for each tree in the ensemble. Can be either a single number or a vector, depending on the forest's leaf dimension.
adjust_residual()
Adjusts residual based on the predictions of a forest
This is typically run just once at the beginning of a forest sampling algorithm. After trees are initialized with constant root node predictions, their root predictions are subtracted out of the residual.
Forest$adjust_residual(dataset, outcome, forest_model, requires_basis, add)
dataset
ForestDataset
object storing the covariates and bases for a given forest
outcome
Outcome
object storing the residuals to be updated based on forest predictions
forest_model
ForestModel
object storing tracking structures used in training / sampling
requires_basis
Whether or not a forest requires a basis for prediction
add
Whether forest predictions should be added to or subtracted from residuals
num_trees()
Return number of trees in a Forest
object
Forest$num_trees()
Tree count
leaf_dimension()
Return output dimension of trees in a Forest
object
Forest$leaf_dimension()
Leaf node parameter size
is_constant_leaf()
Return constant leaf status of trees in a Forest
object
Forest$is_constant_leaf()
TRUE
if leaves are constant, FALSE
otherwise
is_exponentiated()
Return exponentiation status of trees in a Forest
object
Forest$is_exponentiated()
TRUE
if leaf predictions must be exponentiated, FALSE
otherwise
add_numeric_split_tree()
Add a numeric (i.e. X[,i] <= c
) split to a given tree in the ensemble
Forest$add_numeric_split_tree( tree_num, leaf_num, feature_num, split_threshold, left_leaf_value, right_leaf_value )
tree_num
Index of the tree to be split
leaf_num
Leaf to be split
feature_num
Feature that defines the new split
split_threshold
Value that defines the cutoff of the new split
left_leaf_value
Value (or vector of values) to assign to the newly created left node
right_leaf_value
Value (or vector of values) to assign to the newly created right node
get_tree_leaves()
Retrieve a vector of indices of leaf nodes for a given tree in a given forest
Forest$get_tree_leaves(tree_num)
tree_num
Index of the tree for which leaf indices will be retrieved
get_tree_split_counts()
Retrieve a vector of split counts for every training set variable in a given tree in the forest
Forest$get_tree_split_counts(tree_num, num_features)
tree_num
Index of the tree for which split counts will be retrieved
num_features
Total number of features in the training set
get_forest_split_counts()
Retrieve a vector of split counts for every training set variable in the forest
Forest$get_forest_split_counts(num_features)
num_features
Total number of features in the training set
tree_max_depth()
Maximum depth of a specific tree in the forest
Forest$tree_max_depth(tree_num)
tree_num
Tree index within forest
Maximum leaf depth
average_max_depth()
Average the maximum depth of each tree in the forest
Forest$average_max_depth()
Average maximum depth
is_empty()
When a forest object is created, it is "empty" in the sense that none
of its component trees have leaves with values. There are two ways to
"initialize" a Forest object. First, the set_root_leaves()
method
simply initializes every tree in the forest to a single node carrying
the same (user-specified) leaf value. Second, the prepare_for_sampler()
method initializes every tree in the forest to a single node with the
same value and also propagates this information through to a ForestModel
object, which must be synchronized with a Forest during a forest
sampler loop.
Forest$is_empty()
TRUE
if a Forest has not yet been initialized with a constant
root value, FALSE
otherwise if the forest has already been
initialized / grown.
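A minimal sketch of this lifecycle, assembled from the methods documented above (the tree count and leaf value are arbitrary illustrative choices):

X <- matrix(runif(5*100), ncol = 5)
forest_dataset <- createForestDataset(X)
forest <- createForest(num_trees = 10, leaf_dimension = 1, is_leaf_constant = TRUE)
forest$is_empty()  # TRUE: no leaf values have been set yet

# Initialize every tree to a single root node predicting 0.1
forest$set_root_leaves(0.1)
forest$is_empty()  # FALSE

# Ensemble predictions sum the trees' leaf values (10 trees x 0.1 = 1.0 here)
preds <- forest$predict(forest_dataset)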
A dataset consists of three matrices / vectors: covariates, bases, and variance weights. Both the basis matrix and variance weights are optional.
data_ptr
External pointer to a C++ ForestDataset class
new()
Create a new ForestDataset object.
ForestDataset$new(covariates, basis = NULL, variance_weights = NULL)
covariates
Matrix of covariates
basis
(Optional) Matrix of bases used to define a leaf regression
variance_weights
(Optional) Vector of observation-specific variance weights
A new ForestDataset
object.
update_basis()
Update basis matrix in a dataset
ForestDataset$update_basis(basis)
basis
Updated matrix of bases used to define a leaf regression
num_observations()
Return number of observations in a ForestDataset
object
ForestDataset$num_observations()
Observation count
num_covariates()
Return number of covariates in a ForestDataset
object
ForestDataset$num_covariates()
Covariate count
num_basis()
Return number of bases in a ForestDataset
object
ForestDataset$num_basis()
Basis count
has_basis()
Whether or not a dataset has a basis matrix
ForestDataset$has_basis()
True if basis matrix is loaded, false otherwise
has_variance_weights()
Whether or not a dataset has variance weights
ForestDataset$has_variance_weights()
True if variance weights are loaded, false otherwise
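For illustration, a short sketch of the bookkeeping methods documented above (the matrix sizes are arbitrary):

X <- matrix(runif(10*100), ncol = 10)
W <- matrix(rnorm(2*100), ncol = 2)
dataset <- createForestDataset(X, W)
dataset$num_observations()      # 100
dataset$num_covariates()        # 10
dataset$num_basis()             # 2
dataset$has_basis()             # TRUE
dataset$has_variance_weights()  # FALSE

# Swap in a new basis matrix of the same dimensions
dataset$update_basis(matrix(rnorm(2*100), ncol = 2))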
Hosts the C++ data structures needed to sample an ensemble of decision trees, and exposes functionality to run a forest sampler (using either MCMC or the grow-from-root algorithm).
tracker_ptr
External pointer to a C++ ForestTracker class
tree_prior_ptr
External pointer to a C++ TreePrior class
new()
Create a new ForestModel object.
ForestModel$new( forest_dataset, feature_types, num_trees, n, alpha, beta, min_samples_leaf, max_depth = -1 )
forest_dataset
ForestDataset
object, used to initialize forest sampling data structures
feature_types
Feature types (integers where 0 = numeric, 1 = ordered categorical, 2 = unordered categorical)
num_trees
Number of trees in the forest being sampled
n
Number of observations in forest_dataset
alpha
Root node split probability in tree prior
beta
Depth prior penalty in tree prior
min_samples_leaf
Minimum number of samples in a tree leaf
max_depth
Maximum depth that any tree can reach
A new ForestModel
object.
sample_one_iteration()
Run a single iteration of the forest sampling algorithm (MCMC or GFR)
ForestModel$sample_one_iteration( forest_dataset, residual, forest_samples, active_forest, rng, forest_model_config, global_model_config, keep_forest = TRUE, gfr = TRUE )
forest_dataset
Dataset used to sample the forest
residual
Outcome used to sample the forest
forest_samples
Container of forest samples
active_forest
"Active" forest updated by the sampler in each iteration
rng
Wrapper around C++ random number generator
forest_model_config
ForestModelConfig object containing forest model parameters and settings
global_model_config
GlobalModelConfig object containing global model parameters and settings
keep_forest
(Optional) Whether the updated forest sample should be saved to forest_samples
. Default: TRUE
.
gfr
(Optional) Whether or not the forest should be sampled using the "grow-from-root" (GFR) algorithm. Default: TRUE
.
propagate_basis_update()
Propagates basis update through to the (full/partial) residual by iteratively (a) adding back in the previous prediction of each tree, (b) recomputing predictions for each tree (caching on the C++ side), (c) subtracting the new predictions from the residual.
This is useful in cases where a basis (for e.g. leaf regression) is updated outside of a tree sampler (as with e.g. adaptive coding for binary treatment BCF). Once a basis has been updated, the overall "function" represented by a tree model has changed and this should be reflected through to the residual before the next sampling loop is run.
ForestModel$propagate_basis_update(dataset, outcome, active_forest)
dataset
ForestDataset
object storing the covariates and bases for a given forest
outcome
Outcome
object storing the residuals to be updated based on forest predictions
active_forest
"Active" forest updated by the sampler in each iteration
propagate_residual_update()
Update the current state of the outcome (i.e. partial residual) data by subtracting the current predictions of each tree.
This function is run after the Outcome
class's update_data
method, which overwrites the partial residual with an entirely new stream of outcome data.
ForestModel$propagate_residual_update(residual)
residual
Outcome used to sample the forest
None
update_alpha()
Update alpha in the tree prior
ForestModel$update_alpha(alpha)
alpha
New value of alpha to be used
None
update_beta()
Update beta in the tree prior
ForestModel$update_beta(beta)
beta
New value of beta to be used
None
update_min_samples_leaf()
Update min_samples_leaf in the tree prior
ForestModel$update_min_samples_leaf(min_samples_leaf)
min_samples_leaf
New value of min_samples_leaf to be used
None
update_max_depth()
Update max_depth in the tree prior
ForestModel$update_max_depth(max_depth)
max_depth
New value of max_depth to be used
None
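The methods above compose into a custom sampling loop. The following is a minimal sketch using only calls documented in this manual, assuming a constant-leaf Gaussian model (leaf_model_int = 0); the prior settings, leaf initialization, and iteration count are illustrative choices, not recommended defaults:

n <- 100
p <- 5
num_trees <- 50
X <- matrix(runif(n*p), ncol = p)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(n)
forest_dataset <- createForestDataset(X)
outcome <- createOutcome(y)
rng <- createCppRNG(1234)
forest_model_config <- createForestModelConfig(feature_types = as.integer(rep(0, p)),
                                               num_trees = num_trees, num_features = p,
                                               num_observations = n, leaf_model_type = 0)
global_model_config <- createGlobalModelConfig(global_error_variance = 1.0)
forest_model <- createForestModel(forest_dataset, forest_model_config, global_model_config)
forest_samples <- createForestSamples(num_trees, leaf_dimension = 1, is_leaf_constant = TRUE)
active_forest <- createForest(num_trees, leaf_dimension = 1, is_leaf_constant = TRUE)

# Initialize the active forest, sync the sampler's tracking structures, and
# subtract the initial (constant) forest predictions out of the residual
active_forest$prepare_for_sampler(forest_dataset, outcome, forest_model,
                                  leaf_model_int = 0, leaf_value = mean(y)/num_trees)
active_forest$adjust_residual(forest_dataset, outcome, forest_model,
                              requires_basis = FALSE, add = FALSE)

# MCMC loop: each iteration updates active_forest in place and, because
# keep_forest = TRUE, appends a copy of it to forest_samples
for (i in 1:20) {
    forest_model$sample_one_iteration(forest_dataset, outcome, forest_samples,
                                      active_forest, rng, forest_model_config,
                                      global_model_config, keep_forest = TRUE, gfr = FALSE)
}
preds <- forest_samples$predict(forest_dataset)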
The "low-level" stochtree interface enables a high degreee of sampler customization, in which users employ R wrappers around C++ objects like ForestDataset, Outcome, CppRng, and ForestModel to run the Gibbs sampler of a BART model with custom modifications. ForestModelConfig allows users to specify / query the parameters of a forest model they wish to run.
feature_types
Vector of integer-coded feature types (integers where 0 = numeric, 1 = ordered categorical, 2 = unordered categorical)
num_trees
Number of trees in the forest being sampled
num_features
Number of features in training dataset
num_observations
Number of observations in training dataset
leaf_dimension
Dimension of the leaf model
alpha
Root node split probability in tree prior
beta
Depth prior penalty in tree prior
min_samples_leaf
Minimum number of samples in a tree leaf
max_depth
Maximum depth of any tree in the ensemble in the model. Setting to -1
does not enforce any depth limits on trees.
leaf_model_type
Integer specifying the leaf model type (0 = constant leaf, 1 = univariate leaf regression, 2 = multivariate leaf regression)
leaf_model_scale
Scale parameter used in Gaussian leaf models
variable_weights
Vector specifying sampling probability for all p covariates in ForestDataset
variance_forest_shape
Shape parameter for IG leaf models (applicable when leaf_model_type = 3
)
variance_forest_scale
Scale parameter for IG leaf models (applicable when leaf_model_type = 3
)
cutpoint_grid_size
Number of unique cutpoints to consider
new()
Create a new ForestModelConfig object.
ForestModelConfig$new( feature_types = NULL, num_trees = NULL, num_features = NULL, num_observations = NULL, variable_weights = NULL, leaf_dimension = 1, alpha = 0.95, beta = 2, min_samples_leaf = 5, max_depth = -1, leaf_model_type = 1, leaf_model_scale = NULL, variance_forest_shape = 1, variance_forest_scale = 1, cutpoint_grid_size = 100 )
feature_types
Vector of integer-coded feature types (integers where 0 = numeric, 1 = ordered categorical, 2 = unordered categorical)
num_trees
Number of trees in the forest being sampled
num_features
Number of features in training dataset
num_observations
Number of observations in training dataset
variable_weights
Vector specifying sampling probability for all p covariates in ForestDataset
leaf_dimension
Dimension of the leaf model (default: 1)
alpha
Root node split probability in tree prior (default: 0.95)
beta
Depth prior penalty in tree prior (default: 2.0)
min_samples_leaf
Minimum number of samples in a tree leaf (default: 5)
max_depth
Maximum depth of any tree in the ensemble in the model. Setting to -1 does not enforce any depth limits on trees. Default: -1.
leaf_model_type
Integer specifying the leaf model type (0 = constant leaf, 1 = univariate leaf regression, 2 = multivariate leaf regression). Default: 1.
leaf_model_scale
Scale parameter used in Gaussian leaf models (can either be a scalar or a q x q matrix, where q is the dimensionality of the basis and is only >1 when leaf_model_int = 2). Calibrated internally as 1/num_trees, propagated along diagonal if needed for multivariate leaf models.
variance_forest_shape
Shape parameter for IG leaf models (applicable when leaf_model_type = 3). Default: 1.
variance_forest_scale
Scale parameter for IG leaf models (applicable when leaf_model_type = 3). Default: 1.
cutpoint_grid_size
Number of unique cutpoints to consider (default: 100)
A new ForestModelConfig object.
update_feature_types()
Update feature types
ForestModelConfig$update_feature_types(feature_types)
feature_types
Vector of integer-coded feature types (integers where 0 = numeric, 1 = ordered categorical, 2 = unordered categorical)
update_variable_weights()
Update variable weights
ForestModelConfig$update_variable_weights(variable_weights)
variable_weights
Vector specifying sampling probability for all p covariates in ForestDataset
update_alpha()
Update root node split probability in tree prior
ForestModelConfig$update_alpha(alpha)
alpha
Root node split probability in tree prior
update_beta()
Update depth prior penalty in tree prior
ForestModelConfig$update_beta(beta)
beta
Depth prior penalty in tree prior
update_min_samples_leaf()
Update minimum number of samples in a tree leaf in tree prior
ForestModelConfig$update_min_samples_leaf(min_samples_leaf)
min_samples_leaf
Minimum number of samples in a tree leaf
update_max_depth()
Update maximum tree depth in tree prior
ForestModelConfig$update_max_depth(max_depth)
max_depth
Maximum depth of any tree in the ensemble in the model
update_leaf_model_scale()
Update scale parameter used in Gaussian leaf models
ForestModelConfig$update_leaf_model_scale(leaf_model_scale)
leaf_model_scale
Scale parameter used in Gaussian leaf models
update_variance_forest_shape()
Update shape parameter for IG leaf models
ForestModelConfig$update_variance_forest_shape(variance_forest_shape)
variance_forest_shape
Shape parameter for IG leaf models
update_variance_forest_scale()
Update scale parameter for IG leaf models
ForestModelConfig$update_variance_forest_scale(variance_forest_scale)
variance_forest_scale
Scale parameter for IG leaf models
update_cutpoint_grid_size()
Update number of unique cutpoints to consider
ForestModelConfig$update_cutpoint_grid_size(cutpoint_grid_size)
cutpoint_grid_size
Number of unique cutpoints to consider
get_feature_types()
Query feature types for this ForestModelConfig object
ForestModelConfig$get_feature_types()
get_variable_weights()
Query variable weights for this ForestModelConfig object
ForestModelConfig$get_variable_weights()
get_alpha()
Query root node split probability in tree prior for this ForestModelConfig object
ForestModelConfig$get_alpha()
get_beta()
Query depth prior penalty in tree prior for this ForestModelConfig object
ForestModelConfig$get_beta()
get_min_samples_leaf()
Query minimum number of samples in a tree leaf in tree prior for this ForestModelConfig object
ForestModelConfig$get_min_samples_leaf()
get_max_depth()
Query maximum tree depth in tree prior for this ForestModelConfig object
ForestModelConfig$get_max_depth()
get_leaf_model_scale()
Query scale parameter used in Gaussian leaf models for this ForestModelConfig object
ForestModelConfig$get_leaf_model_scale()
get_variance_forest_shape()
Query shape parameter for IG leaf models for this ForestModelConfig object
ForestModelConfig$get_variance_forest_shape()
get_variance_forest_scale()
Query scale parameter for IG leaf models for this ForestModelConfig object
ForestModelConfig$get_variance_forest_scale()
get_cutpoint_grid_size()
Query number of unique cutpoints to consider for this ForestModelConfig object
ForestModelConfig$get_cutpoint_grid_size()
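As a brief illustration, the update and query methods round-trip as follows (the values are arbitrary):

config <- createForestModelConfig(num_trees = 10, num_features = 5, num_observations = 100)
config$update_alpha(0.9)
config$update_max_depth(8)
config$get_alpha()      # 0.9
config$get_max_depth()  # 8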
Wrapper around a C++ container of tree ensembles
forest_container_ptr
External pointer to a C++ ForestContainer class
new()
Create a new ForestContainer object.
ForestSamples$new( num_trees, leaf_dimension = 1, is_leaf_constant = FALSE, is_exponentiated = FALSE )
num_trees
Number of trees
leaf_dimension
Dimensionality of the outcome model
is_leaf_constant
Whether leaf is constant
is_exponentiated
Whether forest predictions should be exponentiated before being returned
A new ForestContainer
object.
load_from_json()
Create a new ForestContainer
object from a json object
ForestSamples$load_from_json(json_object, json_forest_label)
json_object
Object of class CppJson
json_forest_label
Label referring to a particular forest (i.e. "forest_0") in the overall json hierarchy
A new ForestContainer
object.
append_from_json()
Append to a ForestContainer
object from a json object
ForestSamples$append_from_json(json_object, json_forest_label)
json_object
Object of class CppJson
json_forest_label
Label referring to a particular forest (i.e. "forest_0") in the overall json hierarchy
None
load_from_json_string()
Create a new ForestContainer
object from a JSON string
ForestSamples$load_from_json_string(json_string, json_forest_label)
json_string
JSON string which parses into object of class CppJson
json_forest_label
Label referring to a particular forest (i.e. "forest_0") in the overall json hierarchy
A new ForestContainer
object.
append_from_json_string()
Append to a ForestContainer
object from a JSON string
ForestSamples$append_from_json_string(json_string, json_forest_label)
json_string
JSON string which parses into object of class CppJson
json_forest_label
Label referring to a particular forest (i.e. "forest_0") in the overall json hierarchy
None
predict()
Predict every tree ensemble on every sample in forest_dataset
ForestSamples$predict(forest_dataset)
forest_dataset
ForestDataset
R class
matrix of predictions with as many rows as in forest_dataset
and as many columns as samples in the ForestContainer
predict_raw()
Predict "raw" leaf values (without being multiplied by basis) for every tree ensemble on every sample in forest_dataset
ForestSamples$predict_raw(forest_dataset)
forest_dataset
ForestDataset
R class
Array of predictions for each observation in forest_dataset
and
each sample in the ForestSamples
class with each prediction having the
dimensionality of the forests' leaf model. In the case of a constant leaf model
or univariate leaf regression, this array is two-dimensional (number of observations,
number of forest samples). In the case of a multivariate leaf regression,
this array is three-dimensional (number of observations, leaf model dimension,
number of samples).
predict_raw_single_forest()
Predict "raw" leaf values (without being multiplied by basis) for a specific forest on every sample in forest_dataset
ForestSamples$predict_raw_single_forest(forest_dataset, forest_num)
forest_dataset
ForestDataset
R class
forest_num
Index of the forest sample within the container
matrix of predictions with as many rows as in forest_dataset
and as many columns as dimensions in the leaves of trees in ForestContainer
predict_raw_single_tree()
Predict "raw" leaf values (without being multiplied by basis) for a specific tree in a specific forest on every observation in forest_dataset
ForestSamples$predict_raw_single_tree(forest_dataset, forest_num, tree_num)
forest_dataset
ForestDataset
R class
forest_num
Index of the forest sample within the container
tree_num
Index of the tree to be queried
matrix of predictions with as many rows as in forest_dataset
and as many columns as dimensions in the leaves of trees in ForestContainer
set_root_leaves()
Set a constant predicted value for every tree in the ensemble. Stops program if any tree is more than a root node.
ForestSamples$set_root_leaves(forest_num, leaf_value)
forest_num
Index of the forest sample within the container.
leaf_value
Constant leaf value(s) to be fixed for each tree in the ensemble indexed by forest_num
. Can be either a single number or a vector, depending on the forest's leaf dimension.
prepare_for_sampler()
Set a constant predicted value for every tree in the ensemble. Stops program if any tree is more than a root node.
ForestSamples$prepare_for_sampler( dataset, outcome, forest_model, leaf_model_int, leaf_value )
dataset
ForestDataset
Dataset class (covariates, basis, etc...)
outcome
Outcome
Outcome class (residual / partial residual)
forest_model
ForestModel
object storing tracking structures used in training / sampling
leaf_model_int
Integer value encoding the leaf model type (0 = constant gaussian, 1 = univariate gaussian, 2 = multivariate gaussian, 3 = log linear variance).
leaf_value
Constant leaf value(s) to be fixed for each tree in the ensemble indexed by forest_num
. Can be either a single number or a vector, depending on the forest's leaf dimension.
adjust_residual()
Adjusts residual based on the predictions of a forest
This is typically run just once at the beginning of a forest sampling algorithm. After trees are initialized with constant root node predictions, their root predictions are subtracted out of the residual.
ForestSamples$adjust_residual( dataset, outcome, forest_model, requires_basis, forest_num, add )
dataset
ForestDataset
object storing the covariates and bases for a given forest
outcome
Outcome
object storing the residuals to be updated based on forest predictions
forest_model
ForestModel
object storing tracking structures used in training / sampling
requires_basis
Whether or not a forest requires a basis for prediction
forest_num
Index of forest used to update residuals
add
Whether forest predictions should be added to or subtracted from residuals
save_json()
Store the trees and metadata of a ForestSamples object in a json file
ForestSamples$save_json(json_filename)
json_filename
Name of output json file (must end in ".json")
load_json()
Load trees and metadata for an ensemble from a json file. Note that any trees and metadata already present in the ForestSamples object will be overwritten.
ForestSamples$load_json(json_filename)
json_filename
Name of model input json file (must end in ".json")
num_samples()
Return number of samples in a ForestContainer
object
ForestSamples$num_samples()
Sample count
num_trees()
Return number of trees in each ensemble of a ForestContainer
object
ForestSamples$num_trees()
Tree count
leaf_dimension()
Return output dimension of trees in a ForestContainer
object
ForestSamples$leaf_dimension()
Leaf node parameter size
is_constant_leaf()
Return constant leaf status of trees in a ForestContainer
object
ForestSamples$is_constant_leaf()
TRUE
if leaves are constant, FALSE
otherwise
is_exponentiated()
Return exponentiation status of trees in a ForestContainer
object
ForestSamples$is_exponentiated()
TRUE
if leaf predictions must be exponentiated, FALSE
otherwise
add_forest_with_constant_leaves()
Add a new all-root ensemble to the container, with all of the leaves set to the value / vector provided
ForestSamples$add_forest_with_constant_leaves(leaf_value)
leaf_value
Value (or vector of values) to initialize root nodes in tree
add_numeric_split_tree()
Add a numeric (i.e. X[,i] <= c
) split to a given tree in the ensemble
ForestSamples$add_numeric_split_tree( forest_num, tree_num, leaf_num, feature_num, split_threshold, left_leaf_value, right_leaf_value )
forest_num
Index of the forest which contains the tree to be split
tree_num
Index of the tree to be split
leaf_num
Leaf to be split
feature_num
Feature that defines the new split
split_threshold
Value that defines the cutoff of the new split
left_leaf_value
Value (or vector of values) to assign to the newly created left node
right_leaf_value
Value (or vector of values) to assign to the newly created right node
get_tree_leaves()
Retrieve a vector of indices of leaf nodes for a given tree in a given forest
ForestSamples$get_tree_leaves(forest_num, tree_num)
forest_num
Index of the forest which contains tree tree_num
tree_num
Index of the tree for which leaf indices will be retrieved
get_tree_split_counts()
Retrieve a vector of split counts for every training set variable in a given tree in a given forest
ForestSamples$get_tree_split_counts(forest_num, tree_num, num_features)
forest_num
Index of the forest which contains tree tree_num
tree_num
Index of the tree for which split counts will be retrieved
num_features
Total number of features in the training set
get_forest_split_counts()
Retrieve a vector of split counts for every training set variable in a given forest
ForestSamples$get_forest_split_counts(forest_num, num_features)
forest_num
Index of the forest for which split counts will be retrieved
num_features
Total number of features in the training set
get_aggregate_split_counts()
Retrieve a vector of split counts for every training set variable in a given forest, aggregated across ensembles and trees
ForestSamples$get_aggregate_split_counts(num_features)
num_features
Total number of features in the training set
get_granular_split_counts()
Retrieve a vector of split counts for every training set variable in a given forest, reported separately for each ensemble and tree
ForestSamples$get_granular_split_counts(num_features)
num_features
Total number of features in the training set
ensemble_tree_max_depth()
Maximum depth of a specific tree in a specific ensemble in a ForestSamples
object
ForestSamples$ensemble_tree_max_depth(ensemble_num, tree_num)
ensemble_num
Ensemble number
tree_num
Tree index within ensemble ensemble_num
Maximum leaf depth
average_ensemble_max_depth()
Average the maximum depth of each tree in a given ensemble in a ForestSamples
object
ForestSamples$average_ensemble_max_depth(ensemble_num)
ensemble_num
Ensemble number
Average maximum depth
average_max_depth()
Average the maximum depth of each tree in each ensemble in a ForestContainer
object
ForestSamples$average_max_depth()
Average maximum depth
num_forest_leaves()
Number of leaves in a given ensemble in a ForestSamples
object
ForestSamples$num_forest_leaves(forest_num)
forest_num
Index of the ensemble to be queried
Count of leaves in the ensemble stored at forest_num
sum_leaves_squared()
Sum of squared (raw) leaf values in a given ensemble in a ForestSamples
object
ForestSamples$sum_leaves_squared(forest_num)
forest_num
Index of the ensemble to be queried
Sum of squared leaf values in the ensemble stored at forest_num
is_leaf_node()
Whether or not a given node of a given tree in a given forest in the ForestSamples
is a leaf
ForestSamples$is_leaf_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
TRUE
if node is a leaf, FALSE
otherwise
is_numeric_split_node()
Whether or not a given node of a given tree in a given forest in the ForestSamples
is a numeric split node
ForestSamples$is_numeric_split_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
TRUE
if node is a numeric split node, FALSE
otherwise
is_categorical_split_node()
Whether or not a given node of a given tree in a given forest in the ForestSamples
is a categorical split node
ForestSamples$is_categorical_split_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
TRUE
if node is a categorical split node, FALSE
otherwise
parent_node()
Parent node of given node of a given tree in a given forest in a ForestSamples
object
ForestSamples$parent_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Integer ID of the parent node
left_child_node()
Left child node of given node of a given tree in a given forest in a ForestSamples
object
ForestSamples$left_child_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Integer ID of the left child node
right_child_node()
Right child node of given node of a given tree in a given forest in a ForestSamples
object
ForestSamples$right_child_node(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Integer ID of the right child node
node_depth()
Depth of given node of a given tree in a given forest in a ForestSamples
object, with 0 depth for the root node.
ForestSamples$node_depth(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Integer valued depth of the node
node_split_index()
Split index of given node of a given tree in a given forest in a ForestSamples
object. Returns -1
if the node is a leaf.
ForestSamples$node_split_index(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Integer valued split index (feature) of the node
node_split_threshold()
Threshold that defines a numeric split for a given node of a given tree in a given forest in a ForestSamples
object.
Returns Inf
if the node is a leaf or a categorical split node.
ForestSamples$node_split_threshold(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Threshold defining a split for the node
node_split_categories()
Array of category indices that define a categorical split for a given node of a given tree in a given forest in a ForestSamples
object.
Returns c(Inf)
if the node is a leaf or a numeric split node.
ForestSamples$node_split_categories(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Categories defining a split for the node
node_leaf_values()
Leaf node value(s) for a given node of a given tree in a given forest in a ForestSamples
object.
Values are stale if the node is a split node.
ForestSamples$node_leaf_values(forest_num, tree_num, node_id)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
node_id
Index of the node to be queried
Vector (often univariate) of leaf values
num_nodes()
Number of nodes in a given tree in a given forest in a ForestSamples
object.
ForestSamples$num_nodes(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Count of total tree nodes
num_leaves()
Number of leaves in a given tree in a given forest in a ForestSamples
object.
ForestSamples$num_leaves(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Count of total tree leaves
num_leaf_parents()
Number of leaf parents (split nodes with two leaves as children) in a given tree in a given forest in a ForestSamples
object.
ForestSamples$num_leaf_parents(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Count of total tree leaf parents
num_split_nodes()
Number of split nodes in a given tree in a given forest in a ForestSamples
object.
ForestSamples$num_split_nodes(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Count of total tree split nodes
nodes()
Array of node indices in a given tree in a given forest in a ForestSamples
object.
ForestSamples$nodes(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Indices of tree nodes
leaves()
Array of leaf indices in a given tree in a given forest in a ForestSamples
object.
ForestSamples$leaves(forest_num, tree_num)
forest_num
Index of the forest to be queried
tree_num
Index of the tree to be queried
Indices of leaf nodes
delete_sample()
Modify the ForestSamples
object by removing the forest sample indexed by forest_num
ForestSamples$delete_sample(forest_num)
forest_num
Index of the forest to be removed
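To make the container methods concrete, here is a small sketch that builds a one-tree forest by hand and inspects it; the 0-based node, tree, and feature indexing is an assumption of this sketch, and all values are illustrative:

X <- matrix(runif(2*100), ncol = 2)
forest_dataset <- createForestDataset(X)
forest_samples <- createForestSamples(num_trees = 1, leaf_dimension = 1,
                                      is_leaf_constant = TRUE)

# Sample 0: a single root-only tree predicting 0
forest_samples$add_forest_with_constant_leaves(0.0)

# Split the root (node 0) of tree 0 on X[,1] <= 0.5
forest_samples$add_numeric_split_tree(forest_num = 0, tree_num = 0, leaf_num = 0,
                                      feature_num = 0, split_threshold = 0.5,
                                      left_leaf_value = -5.0, right_leaf_value = 5.0)
forest_samples$num_samples()          # 1
forest_samples$num_leaves(0, 0)       # 2
forest_samples$get_tree_leaves(0, 0)  # indices of the two leaf nodes
preds <- forest_samples$predict(forest_dataset)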
Generic function for extracting random effect samples from a model object (BCF, BART, etc...)
getRandomEffectSamples(object, ...)
object |
Fitted model object from which to extract random effects |
... |
Other parameters to be used in random effects extraction |
List of random effect samples
n <- 100
p <- 10
X <- matrix(runif(n*p), ncol = p)
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- rep(1.0, n)
y <- (-5 + 10*(X[,1] > 0.5)) + (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
bart_model <- bart(X_train=X, y_train=y, rfx_group_ids_train=rfx_group_ids,
                   rfx_basis_train = rfx_basis, num_gfr=0, num_mcmc=10)
rfx_samples <- getRandomEffectSamples(bart_model)
Extract raw sample values for each of the random effect parameter terms.
## S3 method for class 'bartmodel'
getRandomEffectSamples(object, ...)
object |
Object of type bartmodel |
... |
Other parameters to be used in random effects extraction |
List of arrays. The alpha array has dimension (num_components, num_samples) and is simply a vector if num_components = 1. The xi and beta arrays have dimension (num_components, num_groups, num_samples) and are simply matrices if num_components = 1. The sigma array has dimension (num_components, num_samples) and is simply a vector if num_components = 1.
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
snr <- 3
group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow=2, byrow=TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[group_ids,] * rfx_basis)
E_y <- f_XW + rfx_term
y <- E_y + rnorm(n, 0, 1)*(sd(E_y)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
rfx_group_ids_test <- group_ids[test_inds]
rfx_group_ids_train <- group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train, X_test = X_test,
                   rfx_group_ids_train = rfx_group_ids_train,
                   rfx_group_ids_test = rfx_group_ids_test,
                   rfx_basis_train = rfx_basis_train,
                   rfx_basis_test = rfx_basis_test,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
rfx_samples <- getRandomEffectSamples(bart_model)
Extract raw sample values for each of the random effect parameter terms.
## S3 method for class 'bcfmodel'
getRandomEffectSamples(object, ...)
object |
Object of type bcfmodel |
... |
Other parameters to be used in random effects extraction |
List of arrays. The alpha array has dimension (num_components, num_samples) and is simply a vector if num_components = 1. The xi and beta arrays have dimension (num_components, num_groups, num_samples) and are simply matrices if num_components = 1. The sigma array has dimension (num_components, num_samples) and is simply a vector if num_components = 1.
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow=2, byrow=TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
rfx_samples <- getRandomEffectSamples(bcf_model)
The "low-level" stochtree interface enables a high degreee of sampler customization, in which users employ R wrappers around C++ objects like ForestDataset, Outcome, CppRng, and ForestModel to run the Gibbs sampler of a BART model with custom modifications. GlobalModelConfig allows users to specify / query the global parameters of a model they wish to run.
global_error_variance
Global error variance parameter
new()
Create a new GlobalModelConfig object.
GlobalModelConfig$new(global_error_variance = 1)
global_error_variance
Global error variance parameter (default: 1.0)
A new GlobalModelConfig object.
update_global_error_variance()
Update global error variance parameter
GlobalModelConfig$update_global_error_variance(global_error_variance)
global_error_variance
Global error variance parameter
get_global_error_variance()
Query global error variance parameter for this GlobalModelConfig object
GlobalModelConfig$get_global_error_variance()
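As a brief illustration, the sketch below builds a config with the createGlobalModelConfig() helper (used in the sampler examples later in this document) and then queries and updates its error variance; the specific variance values are arbitrary.

# Build a config with an initial global error variance of 1.0
global_model_config <- createGlobalModelConfig(global_error_variance = 1.0)

# Query the current value
sigma2 <- global_model_config$get_global_error_variance()

# Overwrite the value, e.g. after a fresh draw from the global variance model
global_model_config$update_global_error_variance(0.5)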
Combine multiple JSON model objects containing forests (with the same hierarchy / schema) into a single forest_container
loadForestContainerCombinedJson(json_object_list, json_forest_label)
json_object_list |
List of objects of class CppJson |
json_forest_label |
Label referring to a particular forest (e.g. "forest_0") in the overall json hierarchy (must exist in every json object in the list) |
ForestSamples object
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr=0, num_mcmc=10)
bart_json <- list(saveBARTModelToJson(bart_model))
mean_forest <- loadForestContainerCombinedJson(bart_json, "forest_0")
Combine multiple JSON strings representing model objects containing forests (with the same hierarchy / schema) into a single forest_container
loadForestContainerCombinedJsonString(json_string_list, json_forest_label)
json_string_list |
List of strings that parse into objects of type CppJson |
json_forest_label |
Label referring to a particular forest (e.g. "forest_0") in the overall json hierarchy (must exist in every json object in the list) |
ForestSamples object
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr=0, num_mcmc=10)
bart_json_string <- list(saveBARTModelToJsonString(bart_model))
mean_forest <- loadForestContainerCombinedJsonString(bart_json_string, "forest_0")
Load a container of forest samples from json
loadForestContainerJson(json_object, json_forest_label)
json_object |
Object of class CppJson |
json_forest_label |
Label referring to a particular forest (e.g. "forest_0") in the overall json hierarchy |
ForestSamples object
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr=0, num_mcmc=10)
bart_json <- saveBARTModelToJson(bart_model)
mean_forest <- loadForestContainerJson(bart_json, "forest_0")
Combine multiple JSON model objects containing random effects (with the same hierarchy / schema) into a single container
loadRandomEffectSamplesCombinedJson(json_object_list, json_rfx_num)
json_object_list |
List of objects of class CppJson |
json_rfx_num |
Integer index indicating the position of the random effects term to be unpacked |
RandomEffectSamples object
n <- 100
p <- 10
X <- matrix(runif(n*p), ncol = p)
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- rep(1.0, n)
y <- (-5 + 10*(X[,1] > 0.5)) + (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
bart_model <- bart(X_train=X, y_train=y, rfx_group_ids_train=rfx_group_ids,
                   rfx_basis_train = rfx_basis, num_gfr=0, num_mcmc=10)
bart_json <- list(saveBARTModelToJson(bart_model))
rfx_samples <- loadRandomEffectSamplesCombinedJson(bart_json, 0)
Combine multiple JSON strings representing model objects containing random effects (with the same hierarchy / schema) into a single container
loadRandomEffectSamplesCombinedJsonString(json_string_list, json_rfx_num)
json_string_list |
List of strings that parse into objects of class CppJson |
json_rfx_num |
Integer index indicating the position of the random effects term to be unpacked |
RandomEffectSamples object
n <- 100
p <- 10
X <- matrix(runif(n*p), ncol = p)
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- rep(1.0, n)
y <- (-5 + 10*(X[,1] > 0.5)) + (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
bart_model <- bart(X_train=X, y_train=y, rfx_group_ids_train=rfx_group_ids,
                   rfx_basis_train = rfx_basis, num_gfr=0, num_mcmc=10)
bart_json_string <- list(saveBARTModelToJsonString(bart_model))
rfx_samples <- loadRandomEffectSamplesCombinedJsonString(bart_json_string, 0)
Load a container of random effect samples from json
loadRandomEffectSamplesJson(json_object, json_rfx_num)
json_object |
Object of class CppJson |
json_rfx_num |
Integer index indicating the position of the random effects term to be unpacked |
RandomEffectSamples object
n <- 100
p <- 10
X <- matrix(runif(n*p), ncol = p)
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- rep(1.0, n)
y <- (-5 + 10*(X[,1] > 0.5)) + (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
bart_model <- bart(X_train=X, y_train=y, rfx_group_ids_train=rfx_group_ids,
                   rfx_basis_train = rfx_basis, num_gfr=0, num_mcmc=10)
bart_json <- saveBARTModelToJson(bart_model)
rfx_samples <- loadRandomEffectSamplesJson(bart_json, 0)
Load a scalar from json
loadScalarJson(json_object, json_scalar_label, subfolder_name = NULL)
json_object |
Object of class CppJson |
json_scalar_label |
Label referring to a particular scalar / string value (e.g. "num_samples") in the overall json hierarchy |
subfolder_name |
(Optional) Name of the subfolder / hierarchy under which the scalar sits |
R scalar
example_scalar <- 5.4
example_json <- createCppJson()
example_json$add_scalar("myscalar", example_scalar)
roundtrip_scalar <- loadScalarJson(example_json, "myscalar")
Load a vector from json
loadVectorJson(json_object, json_vector_label, subfolder_name = NULL)
json_object |
Object of class CppJson |
json_vector_label |
Label referring to a particular vector (e.g. "sigma2_samples") in the overall json hierarchy |
subfolder_name |
(Optional) Name of the subfolder / hierarchy under which the vector sits |
R vector
example_vec <- runif(10)
example_json <- createCppJson()
example_json$add_vector("myvec", example_vec)
roundtrip_vec <- loadVectorJson(example_json, "myvec")
The Outcome class is a wrapper around a vector of (mutable) outcomes for ML tasks (supervised learning, causal inference). When an additive tree ensemble is sampled, the outcome used to sample a specific model term is the "partial residual," consisting of the outcome minus the predictions of every other model term (trees, group random effects, etc.).
data_ptr
External pointer to a C++ Outcome class
new()
Create a new Outcome object.
Outcome$new(outcome)
outcome
Vector of outcome values
A new Outcome object.
get_data()
Extract raw data in R from the underlying C++ object
Outcome$get_data()
R vector containing a copy of the values in the Outcome object
add_vector()
Update the current state of the outcome (i.e. partial residual) data by adding the values of update_vector
Outcome$add_vector(update_vector)
update_vector
Vector to be added to outcome
None
subtract_vector()
Update the current state of the outcome (i.e. partial residual) data by subtracting the values of update_vector
Outcome$subtract_vector(update_vector)
update_vector
Vector to be subtracted from outcome
None
update_data()
Update the current state of the outcome (i.e. partial residual) data by replacing each element with the elements of new_vector
Outcome$update_data(new_vector)
new_vector
Vector from which to overwrite the current data
None
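To make the partial-residual bookkeeping concrete, here is a minimal sketch using the createOutcome() helper from the sampler examples below; the "predictions" are placeholder values rather than output of a real forest.

y <- rnorm(100)
outcome <- createOutcome(y)

# Placeholder predictions standing in for one model term
term_preds <- rep(0.5, 100)

# Subtract the term's predictions, leaving the partial residual
# that the next term would be sampled against
outcome$subtract_vector(term_preds)

# Add them back once the term has been re-sampled
outcome$add_vector(term_preds)

# Copy the current residual state back into an R vector
current_resid <- outcome$get_data()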
Predict from a sampled BART model on new data
## S3 method for class 'bartmodel'
predict(
  object,
  X,
  leaf_basis = NULL,
  rfx_group_ids = NULL,
  rfx_basis = NULL,
  ...
)
object |
Object of type bartmodel |
X |
Covariates used to determine tree leaf predictions for each observation. Must be passed as a matrix or dataframe. |
leaf_basis |
(Optional) Bases used for prediction (by e.g. dot product with leaf values). Default: NULL. |
rfx_group_ids |
(Optional) Test set group labels used for an additive random effects model. We do not currently support test set evaluation for group labels that were not in the training set (though we plan to in the near future). |
rfx_basis |
(Optional) Test set basis for "random-slope" regression in additive random effects model. |
... |
(Optional) Other prediction parameters. |
List of prediction matrices. If the model does not have random effects, the list has one element: the predictions from the forest. If the model does have random effects, the list has three elements: forest predictions, random effects predictions, and their sum (y_hat).
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
y_hat_test <- predict(bart_model, X_test)$y_hat
Predict from a sampled BCF model on new data
## S3 method for class 'bcfmodel'
predict(
  object,
  X,
  Z,
  propensity = NULL,
  rfx_group_ids = NULL,
  rfx_basis = NULL,
  ...
)
object |
Object of type bcfmodel |
X |
Covariates used to determine tree leaf predictions for each observation. Must be passed as a matrix or dataframe. |
Z |
Treatments used for prediction. |
propensity |
(Optional) Propensities used for prediction. |
rfx_group_ids |
(Optional) Test set group labels used for an additive random effects model. We do not currently support test set evaluation for group labels that were not in the training set (though we plan to in the near future). |
rfx_basis |
(Optional) Test set basis for "random-slope" regression in additive random effects model. |
... |
(Optional) Other prediction parameters. |
List of 3-5 nrow(X) by object$num_samples matrices: prognostic function estimates, treatment effect estimates, (optionally) random effects predictions, (optionally) variance forest predictions, and outcome predictions.
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
noise_sd <- 1
y <- mu_x + tau_x*Z + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10)
preds <- predict(bcf_model, X_test, Z_test, pi_test)
Preprocess covariates. DataFrames will be preprocessed based on their column types. Matrices will be passed through assuming all columns are numeric.
preprocessPredictionData(input_data, metadata)
input_data |
Covariates, provided as either a dataframe or a matrix |
metadata |
List containing information on variables, including train set categories for categorical variables |
Preprocessed data with categorical variables appropriately handled
cov_df <- data.frame(x1 = 1:5, x2 = 5:1, x3 = 6:10)
metadata <- list(num_ordered_cat_vars = 0, num_unordered_cat_vars = 0,
                 num_numeric_vars = 3, numeric_vars = c("x1", "x2", "x3"))
X_preprocessed <- preprocessPredictionData(cov_df, metadata)
Preprocess covariates. DataFrames will be preprocessed based on their column types. Matrices will be passed through assuming all columns are numeric.
preprocessTrainData(input_data)
input_data |
Covariates, provided as either a dataframe or a matrix |
List with preprocessed (unmodified) data and details on the number of each type of variable, unique categories associated with categorical variables, and the vector of feature types needed for calls to BART and BCF.
cov_mat <- matrix(1:12, ncol = 3)
preprocess_list <- preprocessTrainData(cov_mat)
X <- preprocess_list$X
Coordinates various C++ random effects classes and persists those needed for prediction / serialization
rfx_container_ptr
External pointer to a C++ StochTree::RandomEffectsContainer class
label_mapper_ptr
External pointer to a C++ StochTree::LabelMapper class
training_group_ids
Unique vector of group IDs that were in the training dataset
new()
Create a new RandomEffectSamples object.
RandomEffectSamples$new()
A new RandomEffectSamples object.
load_in_session()
Construct RandomEffectSamples object from other "in-session" R objects
RandomEffectSamples$load_in_session( num_components, num_groups, random_effects_tracker )
num_components
Number of "components" or bases defining the random effects regression
num_groups
Number of random effects groups
random_effects_tracker
Object of type RandomEffectsTracker
None
load_from_json()
Construct RandomEffectSamples object from a json object
RandomEffectSamples$load_from_json( json_object, json_rfx_container_label, json_rfx_mapper_label, json_rfx_groupids_label )
json_object
Object of class CppJson
json_rfx_container_label
Label referring to a particular rfx sample container (e.g. "random_effect_container_0") in the overall json hierarchy
json_rfx_mapper_label
Label referring to a particular rfx label mapper (e.g. "random_effect_label_mapper_0") in the overall json hierarchy
json_rfx_groupids_label
Label referring to a particular set of rfx group IDs (e.g. "random_effect_groupids_0") in the overall json hierarchy
A new RandomEffectSamples object.
append_from_json()
Append random effect draws to a RandomEffectSamples object from a json object
RandomEffectSamples$append_from_json( json_object, json_rfx_container_label, json_rfx_mapper_label, json_rfx_groupids_label )
json_object
Object of class CppJson
json_rfx_container_label
Label referring to a particular rfx sample container (e.g. "random_effect_container_0") in the overall json hierarchy
json_rfx_mapper_label
Label referring to a particular rfx label mapper (e.g. "random_effect_label_mapper_0") in the overall json hierarchy
json_rfx_groupids_label
Label referring to a particular set of rfx group IDs (e.g. "random_effect_groupids_0") in the overall json hierarchy
None
load_from_json_string()
Construct RandomEffectSamples object from a json string
RandomEffectSamples$load_from_json_string( json_string, json_rfx_container_label, json_rfx_mapper_label, json_rfx_groupids_label )
json_string
JSON string which parses into object of class CppJson
json_rfx_container_label
Label referring to a particular rfx sample container (e.g. "random_effect_container_0") in the overall json hierarchy
json_rfx_mapper_label
Label referring to a particular rfx label mapper (e.g. "random_effect_label_mapper_0") in the overall json hierarchy
json_rfx_groupids_label
Label referring to a particular set of rfx group IDs (e.g. "random_effect_groupids_0") in the overall json hierarchy
A new RandomEffectSamples object.
append_from_json_string()
Append random effect draws to a RandomEffectSamples object from a json string
RandomEffectSamples$append_from_json_string( json_string, json_rfx_container_label, json_rfx_mapper_label, json_rfx_groupids_label )
json_string
JSON string which parses into object of class CppJson
json_rfx_container_label
Label referring to a particular rfx sample container (e.g. "random_effect_container_0") in the overall json hierarchy
json_rfx_mapper_label
Label referring to a particular rfx label mapper (e.g. "random_effect_label_mapper_0") in the overall json hierarchy
json_rfx_groupids_label
Label referring to a particular set of rfx group IDs (e.g. "random_effect_groupids_0") in the overall json hierarchy
None
predict()
Predict random effects for each observation implied by rfx_group_ids and rfx_basis. If a random effects model is "intercept-only", the rfx_basis will be a vector of ones of size length(rfx_group_ids).
RandomEffectSamples$predict(rfx_group_ids, rfx_basis = NULL)
rfx_group_ids
Indices of random effects groups in a prediction set
rfx_basis
(Optional) Basis used for random effects prediction
Matrix with as many rows as observations provided and as many columns as samples drawn from the model.
extract_parameter_samples()
Extract the random effects parameters sampled. With the "redundant parameterization" of Gelman et al (2008), this includes four parameters: alpha (the "working parameter" shared across every group), xi (the "group parameter" sampled separately for each group), beta (the product of alpha and xi, which corresponds to the overall group-level random effects), and sigma (group-independent prior variance for each component of xi).
RandomEffectSamples$extract_parameter_samples()
List of arrays. The alpha array has dimension (num_components, num_samples) and is simply a vector if num_components = 1. The xi and beta arrays have dimension (num_components, num_groups, num_samples) and are simply matrices if num_components = 1. The sigma array has dimension (num_components, num_samples) and is simply a vector if num_components = 1.
delete_sample()
Modify the RandomEffectSamples object by removing the parameter samples indexed by sample_num.
RandomEffectSamples$delete_sample(sample_num)
sample_num
Index of the RFX sample to be removed
extract_label_mapping()
Convert the mapping of group IDs to random effect component indices from C++ to R native format
RandomEffectSamples$extract_label_mapping()
List mapping group ID to random effect components.
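The sketch below extracts the parameter draws from a fitted BART model with an intercept-only random effects term, so num_components = 1 and alpha and sigma are vectors while xi and beta are matrices. The element names of the returned list are an assumption here; inspect names(rfx_samples) to confirm them in your installed version.

n <- 100
X <- matrix(runif(n*10), ncol = 10)
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- rep(1.0, n)
y <- 10*(X[,1] > 0.5) + 2*(rfx_group_ids == 2) + rnorm(n)
bart_model <- bart(X_train = X, y_train = y, rfx_group_ids_train = rfx_group_ids,
                   rfx_basis_train = rfx_basis, num_gfr = 0, num_mcmc = 10)
rfx_samples <- getRandomEffectSamples(bart_model)
names(rfx_samples)  # inspect which parameter arrays are returned

# Under the redundant parameterization, each group-level effect is the
# product of the working and group parameters: beta[g, s] = alpha[s] * xi[g, s]
# (assuming the list elements are named alpha, xi, and beta)
all.equal(rfx_samples$beta, sweep(rfx_samples$xi, 2, rfx_samples$alpha, "*"))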
A dataset consists of three matrices / vectors: group labels, bases, and variance weights. Variance weights are optional.
data_ptr
External pointer to a C++ RandomEffectsDataset class
new()
Create a new RandomEffectsDataset object.
RandomEffectsDataset$new(group_labels, basis, variance_weights = NULL)
group_labels
Vector of group labels
basis
Matrix of bases used to define the random effects regression (for an intercept-only model, pass an array of ones)
variance_weights
(Optional) Vector of observation-specific variance weights
A new RandomEffectsDataset object.
num_observations()
Return the number of observations in a RandomEffectsDataset object
RandomEffectsDataset$num_observations()
Observation count
has_group_labels()
Whether or not a dataset has group label indices
RandomEffectsDataset$has_group_labels()
True if group label vector is loaded, false otherwise
has_basis()
Whether or not a dataset has a basis matrix
RandomEffectsDataset$has_basis()
True if basis matrix is loaded, false otherwise
has_variance_weights()
Whether or not a dataset has variance weights
RandomEffectsDataset$has_variance_weights()
True if variance weights are loaded, false otherwise
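A minimal construction sketch using the createRandomEffectsDataset() helper seen in the sampler examples below; the group labels and intercept-only basis are arbitrary.

n <- 100
group_labels <- sample(1:3, size = n, replace = TRUE)
basis <- matrix(rep(1.0, n), ncol = 1)  # intercept-only basis (array of ones)
rfx_dataset <- createRandomEffectsDataset(group_labels, basis)

rfx_dataset$num_observations()      # number of observations loaded
rfx_dataset$has_group_labels()      # TRUE
rfx_dataset$has_basis()             # TRUE
rfx_dataset$has_variance_weights()  # FALSE, since none were supplied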
Stores current model state, prior parameters, and procedures for sampling from the conditional posterior of each parameter.
rfx_model_ptr
External pointer to a C++ StochTree::RandomEffectsModel class
num_groups
Number of groups in the random effects model
num_components
Number of components (i.e. dimension of basis) in the random effects model
new()
Create a new RandomEffectsModel object.
RandomEffectsModel$new(num_components, num_groups)
num_components
Number of "components" or bases defining the random effects regression
num_groups
Number of random effects groups
A new RandomEffectsModel object.
sample_random_effect()
Sample from random effects model.
RandomEffectsModel$sample_random_effect( rfx_dataset, residual, rfx_tracker, rfx_samples, keep_sample, global_variance, rng )
rfx_dataset
Object of type RandomEffectsDataset
residual
Object of type Outcome
rfx_tracker
Object of type RandomEffectsTracker
rfx_samples
Object of type RandomEffectSamples
keep_sample
Whether the sample should be retained in rfx_samples. If FALSE, the state of rfx_tracker will be updated, but the parameter values will not be added to the sample container. Samples are commonly discarded due to burn-in or thinning.
global_variance
Scalar global variance parameter
rng
Object of type CppRNG
None
predict()
Predict from (a single sample of a) random effects model.
RandomEffectsModel$predict(rfx_dataset, rfx_tracker)
rfx_dataset
Object of type RandomEffectsDataset
rfx_tracker
Object of type RandomEffectsTracker
Vector of predictions with size matching number of observations in rfx_dataset
set_working_parameter()
Set value for the "working parameter." This is typically used for initialization, but could also be used to interrupt or override the sampler.
RandomEffectsModel$set_working_parameter(value)
value
Parameter input
None
set_group_parameters()
Set value for the "group parameters." This is typically used for initialization, but could also be used to interrupt or override the sampler.
RandomEffectsModel$set_group_parameters(value)
value
Parameter input
None
set_working_parameter_cov()
Set value for the working parameter covariance. This is typically used for initialization, but could also be used to interrupt or override the sampler.
RandomEffectsModel$set_working_parameter_cov(value)
value
Parameter input
None
set_group_parameter_cov()
Set value for the group parameter covariance. This is typically used for initialization, but could also be used to interrupt or override the sampler.
RandomEffectsModel$set_group_parameter_cov(value)
value
Parameter input
None
set_variance_prior_shape()
Set shape parameter for the group parameter variance prior.
RandomEffectsModel$set_variance_prior_shape(value)
value
Parameter input
None
set_variance_prior_scale()
Set scale parameter for the group parameter variance prior.
RandomEffectsModel$set_variance_prior_scale(value)
value
Parameter input
None
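Since these setters are typically called together before sampling, here is a compact initialization sketch (mirroring the fuller examples under the reset functions below); the intercept-only dimensions are illustrative.

num_components <- 1  # intercept-only random effects regression
num_groups <- 2
rfx_model <- createRandomEffectsModel(num_components, num_groups)

# Initialize working and group parameters and their covariance scales
rfx_model$set_working_parameter(rep(1, num_components))
rfx_model$set_group_parameters(matrix(1, num_components, num_groups))
rfx_model$set_working_parameter_cov(diag(1, num_components))
rfx_model$set_group_parameter_cov(diag(1, num_components))

# Inverse gamma prior on the group parameter variances
rfx_model$set_variance_prior_shape(1)
rfx_model$set_variance_prior_scale(1)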
Stores a mapping from every observation to its group index, a mapping from group indices to the training sample observations available in that group, and predictions for each observation.
rfx_tracker_ptr
External pointer to a C++ StochTree::RandomEffectsTracker class
new()
Create a new RandomEffectsTracker object.
RandomEffectsTracker$new(rfx_group_indices)
rfx_group_indices
Integer indices indicating groups used to define random effects
A new RandomEffectsTracker object.
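For illustration, a short sketch creating a tracker with the createRandomEffectsTracker() helper used in the sampler examples below, and passing it to a sample container:

n <- 100
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)

# The tracker maps each observation to its group and back
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)

# A tracker is required to construct an (initially empty) sample container
rfx_samples <- createRandomEffectSamples(1, 2, rfx_tracker)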
Reset an active forest, either from a specific forest in a ForestContainer or to an ensemble of single-node (i.e. root) trees
resetActiveForest(active_forest, forest_samples = NULL, forest_num = NULL)
active_forest |
Current active forest |
forest_samples |
(Optional) Container of forest samples from which to re-initialize active forest. If not provided, active forest will be reset to an ensemble of single-node (i.e. root) trees. |
forest_num |
(Optional) Index of forest samples from which to initialize active forest. If not provided, active forest will be reset to an ensemble of single-node (i.e. root) trees. |
None
num_trees <- 100
leaf_dimension <- 1
is_leaf_constant <- TRUE
is_exponentiated <- FALSE
active_forest <- createForest(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
forest_samples <- createForestSamples(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
forest_samples$add_forest_with_constant_leaves(0.0)
forest_samples$add_numeric_split_tree(0, 0, 0, 0, 0.5, -1.0, 1.0)
forest_samples$add_numeric_split_tree(0, 1, 0, 1, 0.75, 3.4, 0.75)
active_forest$set_root_leaves(0.1)
resetActiveForest(active_forest, forest_samples, 0)
resetActiveForest(active_forest)
Re-initialize a forest model (tracking data structures) from a specific forest in a ForestContainer
resetForestModel(forest_model, forest, dataset, residual, is_mean_model)
forest_model |
Forest model with tracking data structures |
forest |
Forest from which to re-initialize forest model |
dataset |
Training dataset object |
residual |
Residual which will also be updated |
is_mean_model |
Whether the model being updated is a conditional mean model |
None
n <- 100
p <- 10
num_trees <- 100
leaf_dimension <- 1
is_leaf_constant <- TRUE
is_exponentiated <- FALSE
alpha <- 0.95
beta <- 2.0
min_samples_leaf <- 2
max_depth <- 10
feature_types <- as.integer(rep(0, p))
leaf_model <- 0
sigma2 <- 1.0
leaf_scale <- as.matrix(1.0)
variable_weights <- rep(1/p, p)
a_forest <- 1
b_forest <- 1
cutpoint_grid_size <- 100
X <- matrix(runif(n*p), ncol = p)
forest_dataset <- createForestDataset(X)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(n)
outcome <- createOutcome(y)
rng <- createCppRNG(1234)
global_model_config <- createGlobalModelConfig(global_error_variance=sigma2)
forest_model_config <- createForestModelConfig(feature_types=feature_types,
    num_trees=num_trees, num_observations=n, num_features=p, alpha=alpha,
    beta=beta, min_samples_leaf=min_samples_leaf, max_depth=max_depth,
    variable_weights=variable_weights, cutpoint_grid_size=cutpoint_grid_size,
    leaf_model_type=leaf_model, leaf_model_scale=leaf_scale)
forest_model <- createForestModel(forest_dataset, forest_model_config, global_model_config)
active_forest <- createForest(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
forest_samples <- createForestSamples(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
active_forest$prepare_for_sampler(forest_dataset, outcome, forest_model, 0, 0.)
forest_model$sample_one_iteration(
    forest_dataset, outcome, forest_samples, active_forest, rng,
    forest_model_config, global_model_config, keep_forest = TRUE, gfr = FALSE
)
resetActiveForest(active_forest, forest_samples, 0)
resetForestModel(forest_model, active_forest, forest_dataset, outcome, TRUE)
Reset a RandomEffectsModel object based on the parameters indexed by sample_num in a RandomEffectSamples object
resetRandomEffectsModel(rfx_model, rfx_samples, sample_num, sigma_alpha_init)
rfx_model |
Object of type RandomEffectsModel |
rfx_samples |
Object of type RandomEffectSamples |
sample_num |
Index of the sample stored in rfx_samples from which to reset the model |
sigma_alpha_init |
Initial value of the "working parameter" scale parameter. |
None
n <- 100
p <- 10
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis)
y <- (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
y_std <- (y-mean(y))/sd(y)
outcome <- createOutcome(y_std)
rng <- createCppRNG(1234)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_model <- createRandomEffectsModel(num_components, num_groups)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
rfx_samples <- createRandomEffectSamples(num_components, num_groups, rfx_tracker)
alpha_init <- rep(1,num_components)
xi_init <- matrix(rep(alpha_init, num_groups),num_components,num_groups)
sigma_alpha_init <- diag(1,num_components,num_components)
sigma_xi_init <- diag(1,num_components,num_components)
sigma_xi_shape <- 1
sigma_xi_scale <- 1
rfx_model$set_working_parameter(alpha_init)
rfx_model$set_group_parameters(xi_init)
rfx_model$set_working_parameter_cov(sigma_alpha_init)
rfx_model$set_group_parameter_cov(sigma_xi_init)
rfx_model$set_variance_prior_shape(sigma_xi_shape)
rfx_model$set_variance_prior_scale(sigma_xi_scale)
for (i in 1:3) {
    rfx_model$sample_random_effect(rfx_dataset=rfx_dataset, residual=outcome,
                                   rfx_tracker=rfx_tracker, rfx_samples=rfx_samples,
                                   keep_sample=TRUE, global_variance=1.0, rng=rng)
}
resetRandomEffectsModel(rfx_model, rfx_samples, 0, 1.0)
Reset a RandomEffectsTracker object based on the parameters indexed by sample_num in a RandomEffectSamples object
resetRandomEffectsTracker(rfx_tracker, rfx_model, rfx_dataset, residual, rfx_samples)
rfx_tracker |
Object of type RandomEffectsTracker |
rfx_model |
Object of type RandomEffectsModel |
rfx_dataset |
Object of type RandomEffectsDataset |
residual |
Object of type Outcome |
rfx_samples |
Object of type RandomEffectSamples |
None
n <- 100
p <- 10
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis)
y <- (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
y_std <- (y-mean(y))/sd(y)
outcome <- createOutcome(y_std)
rng <- createCppRNG(1234)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_model <- createRandomEffectsModel(num_components, num_groups)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
rfx_samples <- createRandomEffectSamples(num_components, num_groups, rfx_tracker)
alpha_init <- rep(1,num_components)
xi_init <- matrix(rep(alpha_init, num_groups),num_components,num_groups)
sigma_alpha_init <- diag(1,num_components,num_components)
sigma_xi_init <- diag(1,num_components,num_components)
sigma_xi_shape <- 1
sigma_xi_scale <- 1
rfx_model$set_working_parameter(alpha_init)
rfx_model$set_group_parameters(xi_init)
rfx_model$set_working_parameter_cov(sigma_alpha_init)
rfx_model$set_group_parameter_cov(sigma_xi_init)
rfx_model$set_variance_prior_shape(sigma_xi_shape)
rfx_model$set_variance_prior_scale(sigma_xi_scale)
for (i in 1:3) {
    rfx_model$sample_random_effect(rfx_dataset=rfx_dataset, residual=outcome,
                                   rfx_tracker=rfx_tracker, rfx_samples=rfx_samples,
                                   keep_sample=TRUE, global_variance=1.0, rng=rng)
}
resetRandomEffectsModel(rfx_model, rfx_samples, 0, 1.0)
resetRandomEffectsTracker(rfx_tracker, rfx_model, rfx_dataset, outcome, rfx_samples)
Reset a RandomEffectsModel object to its "default" state
rootResetRandomEffectsModel(rfx_model, alpha_init, xi_init, sigma_alpha_init, sigma_xi_init, sigma_xi_shape, sigma_xi_scale)
rfx_model |
Object of type RandomEffectsModel |
alpha_init |
Initial value of the "working parameter". |
xi_init |
Initial value of the "group parameters". |
sigma_alpha_init |
Initial value of the "working parameter" scale parameter. |
sigma_xi_init |
Initial value of the "group parameters" scale parameter. |
sigma_xi_shape |
Shape parameter for the inverse gamma variance model on the group parameters. |
sigma_xi_scale |
Scale parameter for the inverse gamma variance model on the group parameters. |
None
n <- 100
p <- 10
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis)
y <- (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
y_std <- (y-mean(y))/sd(y)
outcome <- createOutcome(y_std)
rng <- createCppRNG(1234)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_model <- createRandomEffectsModel(num_components, num_groups)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
rfx_samples <- createRandomEffectSamples(num_components, num_groups, rfx_tracker)
alpha_init <- rep(1,num_components)
xi_init <- matrix(rep(alpha_init, num_groups),num_components,num_groups)
sigma_alpha_init <- diag(1,num_components,num_components)
sigma_xi_init <- diag(1,num_components,num_components)
sigma_xi_shape <- 1
sigma_xi_scale <- 1
rfx_model$set_working_parameter(alpha_init)
rfx_model$set_group_parameters(xi_init)
rfx_model$set_working_parameter_cov(sigma_alpha_init)
rfx_model$set_group_parameter_cov(sigma_xi_init)
rfx_model$set_variance_prior_shape(sigma_xi_shape)
rfx_model$set_variance_prior_scale(sigma_xi_scale)
for (i in 1:3) {
    rfx_model$sample_random_effect(rfx_dataset=rfx_dataset, residual=outcome,
                                   rfx_tracker=rfx_tracker, rfx_samples=rfx_samples,
                                   keep_sample=TRUE, global_variance=1.0, rng=rng)
}
rootResetRandomEffectsModel(rfx_model, alpha_init, xi_init, sigma_alpha_init,
                            sigma_xi_init, sigma_xi_shape, sigma_xi_scale)
Reset a RandomEffectsTracker object to its "default" state
rootResetRandomEffectsTracker(rfx_tracker, rfx_model, rfx_dataset, residual)
rfx_tracker |
Object of type RandomEffectsTracker |
rfx_model |
Object of type RandomEffectsModel |
rfx_dataset |
Object of type RandomEffectsDataset |
residual |
Object of type Outcome |
None
n <- 100
p <- 10
rfx_group_ids <- sample(1:2, size = n, replace = TRUE)
rfx_basis <- matrix(rep(1.0, n), ncol=1)
rfx_dataset <- createRandomEffectsDataset(rfx_group_ids, rfx_basis)
y <- (-2*(rfx_group_ids==1)+2*(rfx_group_ids==2)) + rnorm(n)
y_std <- (y-mean(y))/sd(y)
outcome <- createOutcome(y_std)
rng <- createCppRNG(1234)
num_groups <- length(unique(rfx_group_ids))
num_components <- ncol(rfx_basis)
rfx_model <- createRandomEffectsModel(num_components, num_groups)
rfx_tracker <- createRandomEffectsTracker(rfx_group_ids)
rfx_samples <- createRandomEffectSamples(num_components, num_groups, rfx_tracker)
alpha_init <- rep(1,num_components)
xi_init <- matrix(rep(alpha_init, num_groups),num_components,num_groups)
sigma_alpha_init <- diag(1,num_components,num_components)
sigma_xi_init <- diag(1,num_components,num_components)
sigma_xi_shape <- 1
sigma_xi_scale <- 1
rfx_model$set_working_parameter(alpha_init)
rfx_model$set_group_parameters(xi_init)
rfx_model$set_working_parameter_cov(sigma_alpha_init)
rfx_model$set_group_parameter_cov(sigma_xi_init)
rfx_model$set_variance_prior_shape(sigma_xi_shape)
rfx_model$set_variance_prior_scale(sigma_xi_scale)
for (i in 1:3) {
    rfx_model$sample_random_effect(rfx_dataset=rfx_dataset, residual=outcome,
                                   rfx_tracker=rfx_tracker, rfx_samples=rfx_samples,
                                   keep_sample=TRUE, global_variance=1.0, rng=rng)
}
rootResetRandomEffectsModel(rfx_model, alpha_init, xi_init, sigma_alpha_init,
                            sigma_xi_init, sigma_xi_shape, sigma_xi_scale)
rootResetRandomEffectsTracker(rfx_tracker, rfx_model, rfx_dataset, outcome)
Sample one iteration of the (inverse gamma) global variance model
sampleGlobalErrorVarianceOneIteration(residual, dataset, rng, a, b)
residual |
Outcome class |
dataset |
ForestDataset class |
rng |
C++ random number generator |
a |
Global variance shape parameter |
b |
Global variance scale parameter |
Sampled value of the global error variance parameter
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
y_std <- (y-mean(y))/sd(y)
forest_dataset <- createForestDataset(X)
outcome <- createOutcome(y_std)
rng <- createCppRNG(1234)
a <- 1.0
b <- 1.0
sigma2 <- sampleGlobalErrorVarianceOneIteration(outcome, forest_dataset, rng, a, b)
Sample one iteration of the leaf parameter variance model (only for univariate basis and constant leaf!)
sampleLeafVarianceOneIteration(forest, rng, a, b)
forest |
C++ forest |
rng |
C++ random number generator |
a |
Leaf variance shape parameter |
b |
Leaf variance scale parameter |
Sampled value of the leaf variance parameter
num_trees <- 100
leaf_dimension <- 1
is_leaf_constant <- TRUE
is_exponentiated <- FALSE
active_forest <- createForest(num_trees, leaf_dimension, is_leaf_constant, is_exponentiated)
rng <- createCppRNG(1234)
a <- 1.0
b <- 1.0
tau <- sampleLeafVarianceOneIteration(active_forest, rng, a, b)
Convert the persistent aspects of a BART model to (in-memory) JSON
saveBARTModelToJson(object)
object |
Object of type bartmodel |
Object of type CppJson
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json <- saveBARTModelToJson(bart_model)
Convert the persistent aspects of a BART model to JSON and save it to a file
saveBARTModelToJsonFile(object, filename)
object | Object of type bartmodel, as returned by bart()
filename | String giving the output file path; must end in ".json"
None
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
tmpjson <- tempfile(fileext = ".json")
saveBARTModelToJsonFile(bart_model, tmpjson)
unlink(tmpjson)
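To confirm the file round-trips, the model can be reloaded from disk before the temporary file is removed. A sketch assuming createBARTModelFromJsonFile (documented elsewhere in this manual):

tmpjson <- tempfile(fileext = ".json")
saveBARTModelToJsonFile(bart_model, tmpjson)
# Reload the model from disk, then clean up the temporary file
bart_model_reloaded <- createBARTModelFromJsonFile(tmpjson)
unlink(tmpjson)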
Convert the persistent aspects of a BART model to an (in-memory) JSON string
saveBARTModelToJsonString(object)
object | Object of type bartmodel, as returned by bart()
in-memory JSON string
n <- 100
p <- 5
X <- matrix(runif(n*p), ncol = p)
f_XW <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
noise_sd <- 1
y <- f_XW + rnorm(n, 0, noise_sd)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
y_test <- y[test_inds]
y_train <- y[train_inds]
bart_model <- bart(X_train = X_train, y_train = y_train,
                   num_gfr = 10, num_burnin = 0, num_mcmc = 10)
bart_json_string <- saveBARTModelToJsonString(bart_model)
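The string form is convenient for passing models between R sessions or processes without touching disk. A round-trip sketch assuming createBARTModelFromJsonString (documented elsewhere in this manual):

# Rebuild the model directly from the JSON string
bart_model_roundtrip <- createBARTModelFromJsonString(bart_json_string)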
Convert the persistent aspects of a BCF model to (in-memory) JSON
saveBCFModelToJson(object)
object | Object of type bcfmodel, as returned by bcf()
Object of type CppJson
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
bcf_json <- saveBCFModelToJson(bcf_model)
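As with the BART version, the returned CppJson object pairs with a loader. A round-trip sketch assuming createBCFModelFromJson (documented elsewhere in this manual):

# Rebuild a bcfmodel from the in-memory JSON representation
bcf_model_roundtrip <- createBCFModelFromJson(bcf_json)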
Convert the persistent aspects of a BCF model to JSON and save it to a file
saveBCFModelToJsonFile(object, filename)
object | Object of type bcfmodel, as returned by bcf()
filename | String giving the output file path; must end in ".json"
None
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
tmpjson <- tempfile(fileext = ".json")
saveBCFModelToJsonFile(bcf_model, tmpjson)
unlink(tmpjson)
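A sketch of reloading the saved file before deleting it, assuming createBCFModelFromJsonFile (documented elsewhere in this manual):

tmpjson <- tempfile(fileext = ".json")
saveBCFModelToJsonFile(bcf_model, tmpjson)
# Reload the model from disk, then clean up the temporary file
bcf_model_reloaded <- createBCFModelFromJsonFile(tmpjson)
unlink(tmpjson)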
Convert the persistent aspects of a BCF model to an (in-memory) JSON string
saveBCFModelToJsonString(object)
object | Object of type bcfmodel, as returned by bcf()
JSON string
n <- 500
p <- 5
X <- matrix(runif(n*p), ncol = p)
mu_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (-7.5) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (-2.5) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (2.5) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (7.5)
)
pi_x <- (
    ((0 <= X[,1]) & (0.25 > X[,1])) * (0.2) +
    ((0.25 <= X[,1]) & (0.5 > X[,1])) * (0.4) +
    ((0.5 <= X[,1]) & (0.75 > X[,1])) * (0.6) +
    ((0.75 <= X[,1]) & (1 > X[,1])) * (0.8)
)
tau_x <- (
    ((0 <= X[,2]) & (0.25 > X[,2])) * (0.5) +
    ((0.25 <= X[,2]) & (0.5 > X[,2])) * (1.0) +
    ((0.5 <= X[,2]) & (0.75 > X[,2])) * (1.5) +
    ((0.75 <= X[,2]) & (1 > X[,2])) * (2.0)
)
Z <- rbinom(n, 1, pi_x)
E_XZ <- mu_x + Z*tau_x
snr <- 3
rfx_group_ids <- rep(c(1,2), n %/% 2)
rfx_coefs <- matrix(c(-1, -1, 1, 1), nrow = 2, byrow = TRUE)
rfx_basis <- cbind(1, runif(n, -1, 1))
rfx_term <- rowSums(rfx_coefs[rfx_group_ids,] * rfx_basis)
y <- E_XZ + rfx_term + rnorm(n, 0, 1)*(sd(E_XZ)/snr)
test_set_pct <- 0.2
n_test <- round(test_set_pct*n)
n_train <- n - n_test
test_inds <- sort(sample(1:n, n_test, replace = FALSE))
train_inds <- (1:n)[!((1:n) %in% test_inds)]
X_test <- X[test_inds,]
X_train <- X[train_inds,]
pi_test <- pi_x[test_inds]
pi_train <- pi_x[train_inds]
Z_test <- Z[test_inds]
Z_train <- Z[train_inds]
y_test <- y[test_inds]
y_train <- y[train_inds]
mu_test <- mu_x[test_inds]
mu_train <- mu_x[train_inds]
tau_test <- tau_x[test_inds]
tau_train <- tau_x[train_inds]
rfx_group_ids_test <- rfx_group_ids[test_inds]
rfx_group_ids_train <- rfx_group_ids[train_inds]
rfx_basis_test <- rfx_basis[test_inds,]
rfx_basis_train <- rfx_basis[train_inds,]
rfx_term_test <- rfx_term[test_inds]
rfx_term_train <- rfx_term[train_inds]
mu_params <- list(sample_sigma_leaf = TRUE)
tau_params <- list(sample_sigma_leaf = FALSE)
bcf_model <- bcf(X_train = X_train, Z_train = Z_train, y_train = y_train,
                 propensity_train = pi_train,
                 rfx_group_ids_train = rfx_group_ids_train,
                 rfx_basis_train = rfx_basis_train,
                 X_test = X_test, Z_test = Z_test, propensity_test = pi_test,
                 rfx_group_ids_test = rfx_group_ids_test,
                 rfx_basis_test = rfx_basis_test,
                 num_gfr = 10, num_burnin = 0, num_mcmc = 10,
                 prognostic_forest_params = mu_params,
                 treatment_effect_forest_params = tau_params)
saveBCFModelToJsonString(bcf_model)
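A round-trip sketch assuming createBCFModelFromJsonString (documented elsewhere in this manual):

bcf_json_string <- saveBCFModelToJsonString(bcf_model)
# Rebuild the model directly from the JSON string
bcf_model_roundtrip <- createBCFModelFromJsonString(bcf_json_string)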
Convert the persistent aspects of a covariate preprocessor to an (in-memory) JSON string
savePreprocessorToJsonString(object)
object | List containing information on variables, including train set categories for categorical variables
in-memory JSON string
cov_mat <- matrix(1:12, ncol = 3)
preprocess_list <- preprocessTrainData(cov_mat)
preprocessor_json_string <- savePreprocessorToJsonString(preprocess_list$metadata)
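A typical use is persisting the training-set preprocessing metadata so that new covariate data can be preprocessed consistently at prediction time. A sketch assuming createPreprocessorFromJsonString and preprocessPredictionData (both documented elsewhere in this manual):

# Rebuild the preprocessor metadata from the JSON string
metadata_roundtrip <- createPreprocessorFromJsonString(preprocessor_json_string)
# Apply the same preprocessing to new data (hypothetical new covariate matrix)
new_cov_mat <- matrix(13:24, ncol = 3)
X_new <- preprocessPredictionData(new_cov_mat, metadata_roundtrip)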