| Title: | Longitudinal Surrogate Marker Analysis |
|---|---|
| Description: | Assess the proportion of treatment effect explained by a longitudinal surrogate marker as described in Agniel D and Parast L (2021) <doi:10.1111/biom.13310>; and estimate the treatment effect on a longitudinal surrogate marker as described in Wang et al. (2025) <doi:10.1093/biomtc/ujaf104>. |
| Authors: | Layla Parast [aut, cre], Denis Agniel [aut], Xuan Wang [aut] |
| Maintainer: | Layla Parast <[email protected]> |
| License: | GPL |
| Version: | 1.1 |
| Built: | 2026-05-15 09:28:37 UTC |
| Source: | https://github.com/laylaparast/longsurr |
Simulated example data for semiparametric joint estimation functions
data("data_sjm")
A list with 200 observations on the following:
| Component | Description |
|---|---|
| delta | numeric vector containing the event indicator for each observation |
| obsT | numeric matrix containing the time that the surrogate marker was measured for each observation; number of rows is equal to the number of observations (200) and number of columns is equal to the maximum number of surrogate markers measured (15) |
| Y | numeric matrix containing the surrogate marker measurements over time for each observation; same dimension as obsT |
| Time | numeric vector containing the observed event or censoring time for each observation |
| Treatment | numeric vector containing the treatment indicator for each observation with 1 for treated and 0 for control |
data(data_sjm)
names(data_sjm)
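The padded-matrix layout of obsT and Y described above can be illustrated with a minimal base R sketch. The `visits` data frame, its column names, and its values are hypothetical and are not part of the package; the sketch only shows one way to arrange long-format measurements into two matrices of equal dimension, leaving unused slots as NA.

```r
# Hypothetical long-format visit records (not from the package):
# one row per surrogate measurement.
visits <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  time  = c(0.1, 0.5, 1.0, 0.2, 0.7, 0.3),
  value = c(2.1, 2.4, 2.9, 1.8, 1.7, 2.2)
)

ids <- sort(unique(visits$id))
max_visits <- max(table(visits$id))

# One row per subject, one column per visit slot; unused slots stay NA.
obsT <- matrix(NA_real_, nrow = length(ids), ncol = max_visits)
Y    <- matrix(NA_real_, nrow = length(ids), ncol = max_visits)

for (i in seq_along(ids)) {
  rows <- visits[visits$id == ids[i], ]
  rows <- rows[order(rows$time), ]   # keep visits in time order
  k <- nrow(rows)
  obsT[i, seq_len(k)] <- rows$time
  Y[i, seq_len(k)]    <- rows$value
}

dim(obsT)
dim(Y)
```

Subjects with fewer visits than the maximum simply carry trailing NA entries in both matrices.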
Estimate the surrogate value of a longitudinal marker
estimate_surrogate_value(y_t, y_c, X_t, X_c, method = c("gam", "linear", "kernel"), k = 3, var = FALSE, bootstrap_samples = 50, alpha = 0.05)
| Argument | Description |
|---|---|
| y_t | vector of n1 outcome measurements for treatment group |
| y_c | vector of n0 outcome measurements for control or reference group |
| X_t | n1 x T matrix of longitudinal surrogate measurements for treatment group, where T is the number of time points |
| X_c | n0 x T matrix of longitudinal surrogate measurements for control or reference group, where T is the number of time points |
| method | method for dimension-reduction of longitudinal surrogate, either 'gam', 'linear', or 'kernel' |
| k | number of eigenfunctions to use in semimetric |
| var | logical, if TRUE then standard error estimates and confidence intervals are provided |
| bootstrap_samples | number of bootstrap samples to use for standard error estimation, used if var = TRUE, default is 50 |
| alpha | alpha level, default is 0.05 |
a tibble containing estimates of the treatment effect (Deltahat), the residual treatment effect (Deltahat_S), and the proportion of treatment effect explained (R); if var = TRUE, then standard errors of Deltahat_S and R are also provided (Deltahat_S_se and R_se), and quantile-based 95% confidence intervals for Deltahat_S and R are provided (Deltahat_S_ci_l [lower], Deltahat_S_ci_h [upper], R_ci_l [lower], R_ci_u [upper])
Agniel D and Parast L (2021). Evaluation of Longitudinal Surrogate Markers. Biometrics, 77(2): 477-489.
library(dplyr)
data(full_data)
# Reshape to wide format: one row per subject, one column per time point
wide_ds <- full_data %>%
  dplyr::select(id, a, tt, x, y) %>%
  tidyr::spread(tt, x)
# Split by treatment group (a == 1 treated, a == 0 control)
wide_ds_0 <- wide_ds %>% filter(a == 0)
wide_ds_1 <- wide_ds %>% filter(a == 1)
X_t <- wide_ds_1 %>% dplyr::select(`-1`:`1`) %>% as.matrix
y_t <- wide_ds_1 %>% pull(y)
X_c <- wide_ds_0 %>% dplyr::select(`-1`:`1`) %>% as.matrix
y_c <- wide_ds_0 %>% pull(y)
# Point estimates only
estimate_surrogate_value(y_t = y_t, y_c = y_c, X_t = X_t, X_c = X_c, method = 'gam', var = FALSE)
# Point estimates with bootstrap standard errors and confidence intervals
estimate_surrogate_value(y_t = y_t, y_c = y_c, X_t = X_t, X_c = X_c, method = 'linear', var = TRUE, bootstrap_samples = 50)
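To make the relationship between the returned quantities concrete, here is a small arithmetic sketch in base R. It assumes the usual definition in which the proportion explained equals one minus the ratio of the residual to the overall treatment effect. The outcome vectors and the value of `Deltahat_S` are hypothetical, and the bootstrap step applies a generic quantile rule to the overall effect only; it is not the package's internal code.

```r
# Hypothetical outcomes for a small treated and control group.
y_t <- c(5.1, 6.3, 4.8, 5.9, 6.0)
y_c <- c(3.9, 4.2, 4.0, 3.6, 4.3)

# Overall treatment effect as a difference in mean outcomes.
Deltahat <- mean(y_t) - mean(y_c)

# Hypothetical residual treatment effect; in practice this would come
# from estimate_surrogate_value().
Deltahat_S <- 0.4

# Proportion of treatment effect explained (assumed definition).
R <- 1 - Deltahat_S / Deltahat

# Generic quantile-based interval for Deltahat from bootstrap replicates.
set.seed(1)
bootstrap_samples <- 50
alpha <- 0.05
boot_Delta <- replicate(bootstrap_samples, {
  mean(sample(y_t, replace = TRUE)) - mean(sample(y_c, replace = TRUE))
})
ci <- quantile(boot_Delta, probs = c(alpha / 2, 1 - alpha / 2))

c(Deltahat = Deltahat, Deltahat_S = Deltahat_S, R = R)
ci
```

A value of R near 1 indicates that little treatment effect remains once the surrogate trajectory is accounted for; a value near 0 indicates the opposite.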
Simulated nonsmooth data to illustrate functions
data("full_data")
A data frame with 10100 observations on the following 5 variables.
| Variable | Description |
|---|---|
| id | a unique person ID |
| a | treatment group, 0 or 1 |
| tt | time |
| x | surrogate marker value |
| y | primary outcome |
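Because the data are stored in long format (one row per person and time point), they generally need to be reshaped into one surrogate matrix per treatment group before analysis. The following base R sketch shows one way to do this. The `toy` data frame is hypothetical; it only mimics the five columns listed above.

```r
# Hypothetical toy data with the same five columns as full_data.
toy <- data.frame(
  id = rep(1:4, each = 3),
  a  = rep(c(0, 1, 0, 1), each = 3),
  tt = rep(c(-1, 0, 1), times = 4),
  x  = c(0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 0.1, 0.3, 0.8, 1.0, 1.6, 2.1),
  y  = rep(c(3.0, 5.2, 2.7, 5.9), each = 3)
)

# Rows index subjects, columns index time points.
X <- tapply(toy$x, list(toy$id, toy$tt), FUN = mean)

# One treatment label and one outcome per subject, in the row order of X.
per_id <- toy[!duplicated(toy$id), c("id", "a", "y")]
per_id <- per_id[match(rownames(X), per_id$id), ]

# Split into treated (a == 1) and control (a == 0) pieces.
X_t <- X[per_id$a == 1, , drop = FALSE]
X_c <- X[per_id$a == 0, , drop = FALSE]
y_t <- per_id$y[per_id$a == 1]
y_c <- per_id$y[per_id$a == 0]

dim(X_t)
dim(X_c)
```

Each row of `X_t` and `X_c` lines up with the corresponding entry of `y_t` and `y_c`.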
Pre-smooth sparse longitudinal data
presmooth_data(obs_data, ...)
| Argument | Description |
|---|---|
| obs_data | data.frame or tibble containing the observed data, with columns |
| ... | additional arguments passed on to |
list containing matrices X_t and X_c, which are the smoothed surrogate values for the treated and control groups, respectively, for use in downstream analyses
library(dplyr)
data(full_data)
# Keep 5 randomly chosen observations per person to mimic sparse sampling
obs_ds <- group_by(full_data, id)
obs_data <- sample_n(obs_ds, 5)
obs_data <- ungroup(obs_data)
head(obs_data)
presmooth_X <- presmooth_data(obs_data)
Semiparametric joint modeling of the treatment effect on a longitudinal surrogate using both a Cox proportional hazards model and a linear model
sjm_linear_estimate(X, Time, Delta, obsT, Y, n.resample = 100, var = FALSE)
| Argument | Description |
|---|---|
| X | numeric vector containing the treatment indicator for each observation with 1 for treated and 0 for control |
| Time | numeric vector containing the observed event or censoring time for each observation |
| Delta | numeric vector containing the event indicator for each observation |
| obsT | numeric matrix containing the time that the surrogate marker was measured for each observation; number of rows should be equal to the number of observations and number of columns should be equal to the maximum number of surrogate markers measured. If the surrogate marker was not measured, the corresponding entry should be 0 or NA. |
| Y | numeric matrix containing the surrogate marker measurements over time for each observation; number of rows should be equal to the number of observations and number of columns should be equal to the maximum number of surrogate markers measured. If the surrogate marker was not measured, as determined by the obsT entry, the Y at that time will be ignored. |
| n.resample | number of resampled estimates used for variance estimation; default is 100. |
| var | logical indicating whether the user would like variance estimates and confidence intervals; default is FALSE. |
A list of estimates is returned:
| Component | Description |
|---|---|
| est | vector of point estimates where the first entry is the hazard ratio from the Cox model, the second entry is the estimated treatment effect on the surrogate marker at baseline, and the third entry is the estimated treatment effect on the slope of the surrogate marker, i.e., the surrogate marker trajectory |
| SE | if var is TRUE, a vector of standard error estimates corresponding to the returned point estimates |
| CI_lower | if var is TRUE, a vector of estimates for the lower bound of the 95% confidence interval for the quantities corresponding to the returned point estimates |
| CI_upper | if var is TRUE, a vector of estimates for the upper bound of the 95% confidence interval for the quantities corresponding to the returned point estimates |
Xuan Wang
Wang X, Zhou J, Parast L, Greene T (2025). Semiparametric Joint Modeling to Estimate the Treatment Effect on a Longitudinal Surrogate with Application to Chronic Kidney Disease Trials. Biometrics, 81(3): ujaf104.
data(data_sjm)
# Point estimates only
sjm_linear_estimate(X = data_sjm$Treatment, Time = data_sjm$Time, Delta = data_sjm$delta, obsT = data_sjm$obsT, Y = data_sjm$Y)
# Point estimates with resampling-based standard errors and confidence intervals
sjm_linear_estimate(X = data_sjm$Treatment, Time = data_sjm$Time, Delta = data_sjm$delta, obsT = data_sjm$obsT, Y = data_sjm$Y, n.resample = 5, var = TRUE)
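Since the function expects vectors and matrices with matching dimensions, it can help to check the inputs before fitting. The sketch below is a hypothetical helper, `check_sjm_inputs`, that is not part of the package; it encodes only the shape requirements stated in the argument table and counts, per subject, the entries of obsT that are neither 0 nor NA.

```r
# Hypothetical helper, not part of the package: checks that inputs have the
# shapes described in the argument table before fitting.
check_sjm_inputs <- function(X, Time, Delta, obsT, Y) {
  n <- length(X)
  stopifnot(
    length(Time) == n,
    length(Delta) == n,
    is.matrix(obsT), is.matrix(Y),
    nrow(obsT) == n,
    identical(dim(obsT), dim(Y)),
    all(X %in% c(0, 1))
  )
  # Entries of obsT that are 0 or NA mark unmeasured visits.
  measured <- !is.na(obsT) & obsT != 0
  rowSums(measured)
}

# Hypothetical inputs for three subjects with up to three measurements each.
X     <- c(1, 0, 1)
Time  <- c(2.0, 1.5, 3.0)
Delta <- c(1, 0, 1)
obsT  <- rbind(c(0.5, 1.0, 1.5), c(0.5, 1.0, NA), c(0.5, 0, 0))
Y     <- rbind(c(2.1, 2.3, 2.6), c(1.9, 2.0, NA), c(2.2, NA, NA))

n_measured <- check_sjm_inputs(X, Time, Delta, obsT, Y)
n_measured
```

The helper stops with an error when any shape requirement fails, and otherwise returns the number of surrogate measurements available for each subject.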
Semiparametric joint modeling of the treatment effect on a longitudinal surrogate using both a Cox proportional hazards model and a splines-based model
sjm_nl_estimate(X, Time, Delta, obsT, Y, gap_time = 0.1, n.resample = 100, var = FALSE)
| Argument | Description |
|---|---|
| X | numeric vector containing the treatment indicator for each observation with 1 for treated and 0 for control |
| Time | numeric vector containing the observed event or censoring time for each observation |
| Delta | numeric vector containing the event indicator for each observation |
| obsT | numeric matrix containing the time that the surrogate marker was measured for each observation; number of rows should be equal to the number of observations and number of columns should be equal to the maximum number of surrogate markers measured. If the surrogate marker was not measured, the corresponding entry should be 0 or NA. |
| Y | numeric matrix containing the surrogate marker measurements over time for each observation; number of rows should be equal to the number of observations and number of columns should be equal to the maximum number of surrogate markers measured. If the surrogate marker was not measured, as determined by the obsT entry, the Y at that time will be ignored. |
| gap_time | number indicating gap time for slope estimation; default is 0.1. |
| n.resample | number of resampled estimates used for variance estimation; default is 100. |
| var | logical indicating whether the user would like variance estimates and confidence intervals; default is FALSE. |
A list of estimates is returned:
| Component | Description |
|---|---|
| est | estimated hazard ratio from the Cox model |
| est_t | vector of estimated treatment effect on the slope of the surrogate marker, i.e., the surrogate marker trajectory, on a grid constructed from the given gap time |
| t_grid | vector of grid times corresponding to the returned estimates |
| SE_est | if var is TRUE, standard error estimate of the hazard ratio |
| SE_est_t | if var is TRUE, standard error estimate of the estimated treatment effect on the slope of the surrogate marker |
| CI_lower_est | if var is TRUE, lower bound of the 95% confidence interval for the hazard ratio |
| CI_upper_est | if var is TRUE, upper bound of the 95% confidence interval for the hazard ratio |
| CI_lower_est_t | if var is TRUE, lower bound of the 95% confidence interval for the treatment effect on the slope of the surrogate marker |
| CI_upper_est_t | if var is TRUE, upper bound of the 95% confidence interval for the treatment effect on the slope of the surrogate marker |
Xuan Wang
Wang X, Zhou J, Parast L, Greene T (2025). Semiparametric Joint Modeling to Estimate the Treatment Effect on a Longitudinal Surrogate with Application to Chronic Kidney Disease Trials. Biometrics, 81(3): ujaf104.
data(data_sjm)
# Point estimates only, on a time grid with gap 0.2
sjm_nl_estimate(X = data_sjm$Treatment, Time = data_sjm$Time, Delta = data_sjm$delta, obsT = data_sjm$obsT, Y = data_sjm$Y, gap_time = 0.2)
# Point estimates with resampling-based standard errors and confidence intervals
sjm_nl_estimate(X = data_sjm$Treatment, Time = data_sjm$Time, Delta = data_sjm$delta, obsT = data_sjm$obsT, Y = data_sjm$Y, gap_time = 0.2, n.resample = 5, var = TRUE)