| Title: | Assessing Complex Heterogeneity in Surrogacy |
|---|---|
| Description: | Provides functions to assess complex heterogeneity in the strength of a surrogate marker with respect to multiple baseline covariates, in either a randomized treatment setting or observational setting. For a randomized treatment setting, the functions assess and test for heterogeneity using both a parametric model and a semiparametric two-step model. More details for the randomized setting are available in: Knowlton, R., Tian, L., & Parast, L. (2025). "A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker," Statistics in Medicine, 44(5), e70001 <doi:10.1002/sim.70001>. For an observational setting, functions in this package assess complex heterogeneity in the strength of a surrogate marker using meta-learners, with options for different base learners. More details for the observational setting will be available in the future in: Knowlton, R., Parast, L. (2025) "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." A tutorial for this package can be found at <https://www.laylaparast.com/cohetsurr>. |
| Authors: | Rebecca Knowlton [aut], Layla Parast [aut, cre] |
| Maintainer: | Layla Parast <[email protected]> |
| License: | GPL |
| Version: | 2.0 |
| Built: | 2026-05-10 06:27:48 UTC |
| Source: | https://github.com/cran/cohetsurr |
Assesses complex heterogeneity in the utility of a surrogate marker by estimating the proportion of treatment effect explained by the surrogate marker as a function of multiple baseline covariates in a randomized treatment setting. Optionally, tests for evidence of heterogeneity overall and flags regions where the proportion of treatment effect explained is above a given threshold.
complex.heterogeneity(y, s, a, W.mat, type = "model", variance = FALSE, test = FALSE, W.grid = NULL, grid.size = 4, threshold = NULL)
| Argument | Description |
|---|---|
| y | the outcome |
| s | the surrogate marker |
| a | the treatment assignment, with 1 indicating the treatment group and 0 indicating the control group; assumed to be randomized |
| W.mat | matrix of baseline covariate observations, where the first column is W1, the second column is W2, etc. |
| type | options are "model", "two step", or "both"; specifies the estimation method used for the proportion of treatment effect explained |
| variance | TRUE or FALSE; whether variance/standard error estimates are wanted |
| test | TRUE or FALSE; whether a test for heterogeneity is wanted |
| W.grid | grid of baseline covariate values W at which estimates are provided |
| grid.size | number of values of each baseline covariate to include in the estimation grid, used when the user does not supply a grid directly |
| threshold | threshold used to flag regions where the estimated proportion of the treatment effect explained is at least this value |
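The W.grid argument takes one column per baseline covariate, as in the W.mat layout. The following is a minimal sketch of how a user might build such a grid by hand in base R. The helper `make_w_grid` and the choice of evenly spaced quantiles between 0.1 and 0.9 are assumptions made for illustration; they are not part of the package and may differ from the grid the function builds internally from grid.size.

```r
# Hypothetical helper (not part of cohetsurr): build an estimation grid from
# evenly spaced quantiles of each baseline covariate.
make_w_grid <- function(W.mat, grid.size = 4) {
  # Assumed quantile range; chosen only to avoid the extreme tails
  probs <- seq(0.1, 0.9, length.out = grid.size)
  # apply() returns a grid.size x ncol(W.mat) matrix, one column per covariate
  apply(W.mat, 2, function(w) quantile(w, probs = probs, names = FALSE))
}

# Simulated covariates, for illustration only
set.seed(1)
W.mat <- cbind(rnorm(500), runif(500))
W.grid <- make_w_grid(W.mat, grid.size = 4)
dim(W.grid)
```

A matrix built this way has the same column order as W.mat, so the first column holds grid values for W1 and the second for W2.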
A list is returned:
| Component | Description |
|---|---|
| return.grid | grid of estimates for the overall treatment effect, the residual treatment effect, and the proportion of treatment effect explained as a function of the baseline covariates, W. Includes variance estimates and regions flagged above the threshold, if specified by the user. |
| pval | p-value(s) from the F test and the two step omnibus test for heterogeneity, depending on the type argument. |
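To make the relationship between the three estimated quantities concrete, the sketch below uses the definition common in the surrogate marker literature, where the proportion of treatment effect explained is one minus the ratio of the residual to the overall treatment effect. The numbers are made up, and the column names (`delta`, `delta.s`, `R.s`, `flag`) are assumptions for illustration rather than the documented layout of return.grid.

```r
# Toy values (made up for illustration), one row per grid point
grid <- data.frame(
  w1 = c(0, 0, 1, 1),
  w2 = c(0, 1, 0, 1),
  delta = c(10, 8, 12, 10),  # overall treatment effect
  delta.s = c(5, 2, 9, 2)    # residual treatment effect
)

# Proportion of treatment effect explained: 1 - residual / overall
grid$R.s <- 1 - grid$delta.s / grid$delta

# Flag grid points whose estimated proportion is at least the threshold
threshold <- 0.7
grid$flag <- grid$R.s >= threshold
grid
```

In this toy grid only the second and fourth rows are flagged, since their estimated proportions (0.75 and 0.8) meet the 0.7 threshold.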
Rebecca Knowlton
Knowlton, R., Tian, L., & Parast, L. (2025). A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker. Statistics in Medicine, 44(5), e70001.
data(exampledata)
names(exampledata)
complex.heterogeneity(y = exampledata$y, s = exampledata$s, a = exampledata$a,
  W.mat = matrix(cbind(exampledata$w1, exampledata$w2), ncol = 2),
  type = "model",
  W.grid = matrix(cbind(exampledata$w1.grid, exampledata$w2.grid), ncol = 2))
Example data
data("exampledata")
A list with 7 elements representing 1000 observations from a treatment group and 1000 observations from a control group, and a grid of baseline covariate values at which to calculate estimates:
| Element | Description |
|---|---|
| y | the outcome |
| s | the surrogate marker |
| a | the randomized treatment assignment, where 1 indicates treatment and 0 indicates control |
| w1 | the first baseline covariate of interest |
| w2 | the second baseline covariate of interest |
| w1.grid | the grid of first baseline covariate values to provide estimates for |
| w2.grid | the grid of second baseline covariate values to provide estimates for |
data(exampledata)
names(exampledata)
Example testing data for observational setting
data("obs_exampledata_test")
A data frame with 200 observations on the following 9 variables.
| Variable | Description |
|---|---|
| X1 | a numeric baseline covariate of interest |
| X2 | a numeric baseline covariate of interest |
| X3 | a numeric baseline covariate of interest |
| X4 | a numeric baseline covariate of interest |
| X5 | a numeric baseline covariate of interest |
| X6 | a numeric baseline covariate of interest |
| G | the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control |
| S | the surrogate marker |
| Y | the primary outcome |
data(obs_exampledata_test)
names(obs_exampledata_test)
Example training data for observational setting
data("obs_exampledata_train")
A data frame with 1800 observations on the following 9 variables.
| Variable | Description |
|---|---|
| X1 | a numeric baseline covariate of interest |
| X2 | a numeric baseline covariate of interest |
| X3 | a numeric baseline covariate of interest |
| X4 | a numeric baseline covariate of interest |
| X5 | a numeric baseline covariate of interest |
| X6 | a numeric baseline covariate of interest |
| G | the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control |
| S | the surrogate marker |
| Y | the primary outcome |
data(obs_exampledata_train)
names(obs_exampledata_train)
Assesses surrogate heterogeneity in real world data by estimating the proportion of the treatment effect explained as a function of baseline covariates. Optionally tests individuals for strong surrogacy based on a threshold.
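For readers unfamiliar with meta-learners, the sketch below shows one well-known variant, the T-learner, with linear base learners in base R. It estimates only a covariate-specific overall treatment effect on simulated data; the simulation, the variable names, and the choice of a T-learner are all assumptions for illustration. The source does not state which meta-learners the package uses, and the residual treatment effect and proportion explained require further steps involving the surrogate that are not sketched here.

```r
# Simulated observational data (illustration only)
set.seed(42)
n <- 2000
X1 <- rnorm(n)
G <- rbinom(n, 1, plogis(0.5 * X1))    # treatment depends on X1 (non-randomized)
Y <- 1 + X1 + G * (2 + X1) + rnorm(n)  # true treatment effect is 2 + X1
train <- data.frame(X1, G, Y)
test <- data.frame(X1 = c(-1, 0, 1))

# T-learner: fit a separate outcome model within each treatment arm
fit1 <- lm(Y ~ X1, data = train[train$G == 1, ])
fit0 <- lm(Y ~ X1, data = train[train$G == 0, ])

# Estimated treatment effect at each test point: difference in predictions
delta.hat <- predict(fit1, newdata = test) - predict(fit0, newdata = test)
delta.hat
```

Swapping `lm` for a generalized additive model or a tree-based learner changes the base learner while leaving the meta-learner structure the same.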
obs.het.surr(df.train, df.test, type, var.want = FALSE, threshold = NULL, use.actual.control.S = FALSE)
| Argument | Description |
|---|---|
| df.train | dataframe containing training data; must have columns G (treatment assignment), S (surrogate marker), and Y (primary outcome), in addition to the baseline covariates of interest |
| df.test | dataframe containing testing data; must contain the same baseline covariate columns as the training data |
| type | options are "linear", "gam", "trees", or "all"; type of base learners to use |
| var.want | TRUE or FALSE; whether variance estimates are wanted |
| threshold | optional threshold; each individual is tested against the null hypothesis that the proportion of treatment effect explained (PTE) is greater than the threshold; var.want = TRUE is required for p-values to be returned |
| use.actual.control.S | TRUE or FALSE; whether to use the actual observed surrogate values in the control group instead of values predicted by the base learners |
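The data frame requirements above can be checked before calling the function. The following is a minimal sketch on simulated data: the data-generating model and the 90/10 split (chosen to mirror the 1800 and 200 rows of the bundled example data) are assumptions for illustration, not a recommendation from the package authors.

```r
# Simulated data with two baseline covariates (illustration only)
set.seed(7)
n <- 2000
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
df$G <- rbinom(n, 1, plogis(0.3 * df$X1))           # treatment assignment
df$S <- 1 + df$X1 + 2 * df$G + rnorm(n)             # surrogate marker
df$Y <- 0.5 * df$X2 + 1.5 * df$S + df$G + rnorm(n)  # primary outcome

# The training data must contain G, S, and Y
required <- c("G", "S", "Y")
stopifnot(all(required %in% names(df)))

# Hold out 10% of rows as testing data
test.idx <- sample(n, size = 0.1 * n)
df.train <- df[-test.idx, ]
df.test <- df[test.idx, ]

# The testing data must carry the same baseline covariate columns
covariates <- setdiff(names(df.train), required)
stopifnot(all(covariates %in% names(df.test)))
```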
A dataframe is returned: the df.test argument with new columns appended for the estimates, and the corresponding variances if requested, of delta, delta.s, and R.s. If a threshold is specified, a p-value for the null hypothesis that PTE > threshold is also returned.
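One way to picture how an estimate and its variance could yield a p-value for this null hypothesis is a one-sided Wald-type test under a normal approximation. The sketch below is an assumption for illustration only: the source does not describe how the package computes its p-values, and the estimates, variances, and variable names here are made up.

```r
# Hypothetical PTE estimates and variances for three test individuals
R.s <- c(0.90, 0.60, 0.30)
R.s.var <- c(0.01, 0.01, 0.01)
threshold <- 0.5

# Wald-type statistic: distance from the threshold in standard-error units
z <- (R.s - threshold) / sqrt(R.s.var)

# Lower-tail p-value: small values are evidence against H0: PTE > threshold
p.val <- pnorm(z)
data.frame(R.s, z, p.val)
```

Under this construction, an individual whose estimate sits well below the threshold gets a small p-value, while estimates above the threshold give p-values greater than 0.5.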
Rebecca Knowlton
Knowlton, R. and Parast, L. (2025). "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." Under review.
data(obs_exampledata_train)
data(obs_exampledata_test)
obs.het.surr(df.train = obs_exampledata_train, df.test = obs_exampledata_test,
  type = "linear", var.want = FALSE)