Package 'cohetsurr'

Title: Assessing Complex Heterogeneity in Surrogacy
Description: Provides functions to assess complex heterogeneity in the strength of a surrogate marker with respect to multiple baseline covariates, in either a randomized treatment setting or an observational setting. For a randomized treatment setting, the functions assess and test for heterogeneity using both a parametric model and a semiparametric two-step model. More details for the randomized setting are available in: Knowlton, R., Tian, L., & Parast, L. (2025). "A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker," Statistics in Medicine, 44(5), e70001 <doi:10.1002/sim.70001>. For an observational setting, the functions assess complex heterogeneity in the strength of a surrogate marker using meta-learners, with options for different base learners. Details for the observational setting will appear in: Knowlton, R., & Parast, L. (2025). "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." A tutorial for this package can be found at <https://www.laylaparast.com/cohetsurr>.
Authors: Rebecca Knowlton [aut], Layla Parast [aut, cre]
Maintainer: Layla Parast <[email protected]>
License: GPL
Version: 2.0
Built: 2026-05-10 06:27:48 UTC
Source: https://github.com/cran/cohetsurr

Estimates the proportion of treatment effect explained by the surrogate marker as a function of multiple baseline covariates in a randomized treatment setting.

Description

Assesses complex heterogeneity in the utility of a surrogate marker by estimating the proportion of treatment effect explained by the surrogate marker as a function of multiple baseline covariates in a randomized treatment setting. Optionally, tests for evidence of heterogeneity overall and flags regions where the proportion of treatment effect explained is above a given threshold.

Usage

complex.heterogeneity(y, s, a, W.mat, type = "model", variance = FALSE, 
test = FALSE, W.grid = NULL, grid.size = 4, threshold = NULL)

Arguments

y

the outcome

s

the surrogate marker

a

the treatment assignment, with 1 indicating the treatment group and 0 indicating the control group; assumed to be randomized

W.mat

matrix of baseline covariate observations, where the first column is W1, the second column is W2, etc.

type

options are "model", "two step", or "both"; specifies the estimation method that should be used for the proportion of treatment effect explained

variance

TRUE or FALSE; whether variance/standard error estimates should be returned

test

TRUE or FALSE; whether the test for heterogeneity should be performed

W.grid

grid for the baseline covariates W where estimation will be provided

grid.size

number of values of each baseline covariate to include in the estimation grid, used when W.grid is not provided by the user

threshold

threshold to flag regions where the estimated proportion of the treatment effect explained is at least that high

Value

A list is returned:

return.grid

grid of estimates for the overall treatment effect, the residual treatment effect, and the proportion of treatment effect explained as a function of the baseline covariates, W. Includes variance estimates and regions flagged above the threshold, if specified by the user.

pval

p-value(s) from the F test and/or the two-step omnibus test for heterogeneity, depending on the type argument.
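
The three quantities in return.grid are related in the usual way for this framework: the proportion of treatment effect explained is one minus the ratio of the residual treatment effect to the overall treatment effect. A minimal base-R sketch with invented numbers illustrates the arithmetic and how a threshold flag could be derived. The column names delta, delta.s, R.s, and flag are hypothetical here and are not necessarily the names used in return.grid.

```r
# Hypothetical grid of estimates; all values are invented for illustration.
grid <- data.frame(
  w1      = c(0, 0, 1, 1),
  w2      = c(0, 1, 0, 1),
  delta   = c(2.0, 2.5, 4.0, 5.0),  # overall treatment effect at each grid point
  delta.s = c(1.0, 0.5, 1.0, 4.0)   # residual treatment effect at each grid point
)

# Proportion of treatment effect explained: 1 - residual / overall.
grid$R.s <- 1 - grid$delta.s / grid$delta

# Flag grid points whose estimated proportion is at least the threshold.
threshold <- 0.7
grid$flag <- grid$R.s >= threshold

print(grid)
```

In this toy grid only the second and third points are flagged, since their proportions (0.8 and 0.75) are at least 0.7.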

Author(s)

Rebecca Knowlton

References

Knowlton, R., Tian, L., & Parast, L. (2025). A General Framework to Assess Complex Heterogeneity in the Strength of a Surrogate Marker. Statistics in Medicine, 44(5), e70001.

Examples

data(exampledata)
names(exampledata)
complex.heterogeneity(y = exampledata$y,
                      s = exampledata$s,
                      a = exampledata$a,
                      W.mat = matrix(cbind(exampledata$w1, exampledata$w2), ncol = 2),
                      type = "model",
                      W.grid = matrix(cbind(exampledata$w1.grid, exampledata$w2.grid), ncol = 2))
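
When no grid is shipped with the data, one way to build a W.grid matrix is to cross a few quantiles of each covariate. The sketch below uses simulated covariates and base R only. The choice of interior quantiles is an assumption made for illustration and is not necessarily how the package constructs its default grid from grid.size.

```r
set.seed(1)
# Simulated baseline covariates standing in for real data.
w1 <- rnorm(200)
w2 <- runif(200)

grid.size <- 4
# Interior quantiles of each covariate (avoids the extreme tails).
probs <- seq(0.2, 0.8, length.out = grid.size)
w1.points <- quantile(w1, probs, names = FALSE)
w2.points <- quantile(w2, probs, names = FALSE)

# Every combination of the two sets of points:
# grid.size^2 rows, one column per covariate.
W.grid <- as.matrix(expand.grid(w1 = w1.points, w2 = w2.points))
dim(W.grid)
```

A matrix built this way has one row per grid point and one column per covariate, matching the column order of W.mat.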

Example data

Description

Example data

Usage

data("exampledata")

Format

A list with 7 elements representing 1000 observations from a treatment group and 1000 observations from a control group, and a grid of baseline covariate values at which to calculate estimates:

y

the outcome

s

the surrogate marker

a

the randomized treatment assignment, where 1 indicates treatment and 0 indicates control

w1

the first baseline covariate of interest

w2

the second baseline covariate of interest

w1.grid

the grid of first baseline covariate values to provide estimates for

w2.grid

the grid of second baseline covariate values to provide estimates for

Examples

data(exampledata)
names(exampledata)

Example testing data for observational setting

Description

Example testing data for observational setting

Usage

data("obs_exampledata_test")

Format

A data frame with 200 observations on the following 9 variables.

X1

a numeric baseline covariate of interest

X2

a numeric baseline covariate of interest

X3

a numeric baseline covariate of interest

X4

a numeric baseline covariate of interest

X5

a numeric baseline covariate of interest

X6

a numeric baseline covariate of interest

G

the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control

S

the surrogate marker

Y

the primary outcome

Examples

data(obs_exampledata_test)
names(obs_exampledata_test)

Example training data for observational setting

Description

Example training data for observational setting

Usage

data("obs_exampledata_train")

Format

A data frame with 1800 observations on the following 9 variables.

X1

a numeric baseline covariate of interest

X2

a numeric baseline covariate of interest

X3

a numeric baseline covariate of interest

X4

a numeric baseline covariate of interest

X5

a numeric baseline covariate of interest

X6

a numeric baseline covariate of interest

G

the non-randomized treatment assignment, where 1 indicates treated and 0 indicates control

S

the surrogate marker

Y

the primary outcome

Examples

data(obs_exampledata_train)
names(obs_exampledata_train)

Estimate the proportion of the treatment effect explained by the surrogate marker as a function of multiple baseline covariates in an observational setting.

Description

Assesses surrogate heterogeneity in real world data by estimating the proportion of the treatment effect explained as a function of baseline covariates. Optionally tests individuals for strong surrogacy based on a threshold.

Usage

obs.het.surr(df.train, df.test, type, var.want = FALSE, threshold = NULL, 
  use.actual.control.S = FALSE)

Arguments

df.train

dataframe containing training data; must have columns G (treatment assignment), S (surrogate marker), and Y (primary outcome), in addition to the baseline covariates of interest

df.test

dataframe containing testing data; must contain the same baseline covariate columns as the training data

type

options are "linear", "gam", "trees", or "all"; type of base learners to use

var.want

TRUE or FALSE; whether variance estimates should be returned

threshold

optional threshold used to test, for each individual, the null hypothesis that the proportion of the treatment effect explained (PTE) is greater than the threshold; var.want must be TRUE for p-values to be returned

use.actual.control.S

TRUE or FALSE, if user prefers to use the actual observed values for the surrogate in the control group instead of predicting values from the base learners

Value

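A dataframe is returned, which is the df.test argument with new columns appended for the estimates and corresponding variances of delta (the overall treatment effect), delta.s (the residual treatment effect), and R.s (the proportion of the treatment effect explained). If a threshold is specified and var.want = TRUE, a p-value for the null hypothesis that PTE > threshold is also returned.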

Author(s)

Rebecca Knowlton

References

Knowlton, R. and Parast, L. (2025) "Assessing Surrogate Heterogeneity in Real World Data Using Meta-Learners." Under Review.

Examples

data(obs_exampledata_train)
data(obs_exampledata_test)
obs.het.surr(df.train = obs_exampledata_train, df.test = obs_exampledata_test,
             type = "linear", var.want = FALSE)
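
The example datasets are already split into training (1800 rows) and testing (200 rows) sets. For a single data frame, a random split along the same lines might look like the sketch below. The simulated data frame df, its data-generating model, and the 90/10 proportion are all assumptions made for illustration. Only the column names G, S, and Y follow the requirements stated above.

```r
set.seed(2025)
n <- 500
# Simulated observational data with the required columns G, S, and Y
# plus two baseline covariates; the data-generating model is invented.
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
df$G <- rbinom(n, 1, plogis(0.5 * df$X1))           # non-randomized treatment
df$S <- 1 + df$G * (1 + df$X1) + rnorm(n)           # surrogate marker
df$Y <- 2 + df$S + df$G * (0.5 + df$X2) + rnorm(n)  # primary outcome

# Random 90/10 split into training and testing rows.
test.idx <- sample(seq_len(n), size = round(0.1 * n))
df.train <- df[-test.idx, ]
df.test  <- df[test.idx, ]

c(train = nrow(df.train), test = nrow(df.test))
```

The resulting df.train and df.test could then be passed to obs.het.surr in place of the example datasets.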