Title: | Landmark Prediction for Mixture Data |
---|---|
Description: | Non-parametric prediction of survival outcomes for mixture data that incorporates covariates and a landmark time. Details are described in Garcia (2021) <doi:10.1093/biostatistics/kxz052>. |
Authors: | Tanya Garcia [aut], Layla Parast [cre] |
Maintainer: | Layla Parast <[email protected]> |
License: | GPL |
Version: | 1.0 |
Built: | 2025-02-15 03:08:30 UTC |
Source: | https://github.com/cran/landmix |
Generates simulated mixture data: survival outcomes from different populations, together with each person's probability of belonging to each population. Also generates one discrete covariate and one continuous covariate.
GenerateData(n, p, m, qvs, censoring.rate, simu.setting, covariate.dependent)
n |
sample size, must be at least 1. |
p |
number of populations, must be at least 2. |
m |
number of different mixture proportions, must be at least 2. |
qvs |
a numeric matrix of size p by m containing the finite set of mixture proportions (for example, the output of qvs.values). |
censoring.rate |
a scalar indicating the censoring proportion. Options are 0 or 50. |
simu.setting |
character string indicating the simulation setting. Options are "1A", "1B", "2A", "2B". Settings "1A" and "1B" refer to Simulation setting 1 in the referenced paper: in "1A" the survival outcomes do NOT depend on the covariates, and in "1B" they do. Settings "2A" and "2B" refer to Simulation setting 2 in the referenced paper: in "2A" the survival outcomes do NOT depend on the covariates, and in "2B" they do. |
covariate.dependent |
logical indicator. If TRUE, then the survival times depend on covariates. |
Returns a list containing
x: a numeric vector of length n containing the observed event times for each person in the sample.
delta: a numeric vector of length n that denotes censoring (1 denotes event is observed, 0 denotes event is censored).
q: a numeric matrix of size p by n containing the mixture proportions for each person in the sample.
ww: a numeric vector of length n containing the values of the continuous covariate for each person in the sample.
zz: a numeric vector of length n containing the values of the discrete covariate for each person in the sample.
true.groups: a numeric vector of length n denoting the population identifier for each person in the sample.
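The relationship between x and delta can be illustrated with a generic right-censoring construction. This is a minimal sketch in base R, not the internals of GenerateData: the names event.time and censor.time and the distributions used are assumptions made only for illustration.

```r
# Generic right-censoring sketch; NOT the internals of GenerateData.
set.seed(1)
n <- 10

# Hypothetical latent event times and censoring times (assumed distributions).
event.time  <- rexp(n, rate = 1 / 50)
censor.time <- runif(n, min = 0, max = 100)

# Observed time is whichever comes first.
x <- pmin(event.time, censor.time)

# 1 = event observed, 0 = event censored (same coding as delta above).
delta <- as.numeric(event.time <= censor.time)

# Empirical censoring proportion in this toy sample.
mean(delta == 0)
```

With this coding, x[i] is an actual event time only when delta[i] is 1; otherwise it is a lower bound on the event time.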
Estimates the distribution function for mixture data where the population identifiers are unknown, but the probability of belonging to a population is known. The distribution functions are evaluated at time points tval and adjust for dynamic landmark prediction, one discrete covariate (zz), and one continuous covariate (ww).
landmix.estimator(n, m, p, qvs, q, x, delta, ww, zz, run.NPNA, run.NPNA_avg, tval, tval0, z.use, w.use)
n |
sample size, must be at least 1. |
m |
number of different mixture proportions, must be at least 2. |
p |
number of populations, must be at least 2. |
qvs |
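a numeric matrix of size p by m containing the finite set of mixture proportions (for example, the output of qvs.values). |
q |
a numeric matrix of size p by n containing the mixture proportions for each person in the sample. |
x |
a numeric vector of length n containing the observed event times for each person in the sample. |
delta |
a numeric vector of length n that denotes censoring (1 denotes event is observed, 0 denotes event is censored). |
ww |
a numeric vector of length n containing the values of the continuous covariate for each person in the sample. |
zz |
a numeric vector of length n containing the values of the discrete covariate for each person in the sample. |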
run.NPNA |
a logical indicator. If TRUE, then the output includes the estimated distribution function for mixture data that accounts for covariates and dynamic landmarking. This estimator is called "NPNA" in the referenced paper. |
run.NPNA_avg |
a logical indicator. If TRUE, then the output includes the estimated distribution function for mixture data that averages out over the observed covariates. This is referred to as NPNA_marg in the referenced paper. |
tval |
numeric vector of time points at which the distribution function is evaluated, all values must be non-negative. |
tval0 |
numeric vector of time points representing the landmark times. All values must be non-negative and smaller than the maximum of tval. |
z.use |
numeric vector of values of the discrete covariate (zz) at which the distribution function is evaluated. |
w.use |
numeric vector of values of the continuous covariate (ww) at which the distribution function is evaluated. |
landmix.estimator returns a list containing
Ft.estimate: a numeric array containing the estimated distribution functions for all methods and all p populations. The distribution function is evaluated at each combination of tval, tval0, z.use, and w.use, for each of the p populations. The dimension of the array is the number of methods by length(tval) by length(tval0) by length(z.use) by length(w.use) by p. The distribution function is only valid for t >= t0 (t in tval, t0 in tval0), so Ft.estimate shows NA for any combination for which t < t0.
St.estimate: a numeric array containing the estimated distribution functions for all methods and all m mixture proportion subgroups. The distribution function is evaluated at each combination of tval, tval0, z.use, and w.use, for each of the m mixture proportion subgroups. The dimension of the array is the number of methods by length(tval) by length(tval0) by length(z.use) by length(w.use) by m. The distribution function is only valid for t >= t0 (t in tval, t0 in tval0), so St.estimate shows NA for any combination for which t < t0.
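Because the returned arrays have six dimensions, it can help to look up positions explicitly before indexing. The sketch below builds a placeholder array with the documented dimension order rather than calling landmix.estimator. The name Ft.mock, the single method label "NPNA", and the reading that NA entries are those with a time point below the landmark time are assumptions for illustration.

```r
# Mock array with the documented dimension order; all values are placeholders.
tval    <- seq(0, 80, by = 5)
tval0   <- c(0, 20, 30, 40, 50)
z.use   <- c(0, 1)
w.use   <- seq(35, 55, by = 1)
p       <- 2
methods <- "NPNA"  # assumed label; one method is run in this sketch

Ft.mock <- array(
  NA_real_,
  dim = c(length(methods), length(tval), length(tval0),
          length(z.use), length(w.use), p)
)

# Find the position of one combination of evaluation points.
i.t  <- match(40, tval)   # time point t = 40
i.t0 <- match(20, tval0)  # landmark time t0 = 20
i.z  <- match(1, z.use)   # discrete covariate value 1
i.w  <- match(45, w.use)  # continuous covariate value 45

# Entry for method 1, population 1 (NA here because the array is a placeholder).
Ft.mock[1, i.t, i.t0, i.z, i.w, 1]

# Assumed NA pattern: (t, t0) pairs where the time point precedes the landmark.
invalid <- outer(tval, tval0, "<")
```

The same indexing pattern would apply to St.estimate, with the last dimension running over the m mixture proportion subgroups instead of the p populations.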
We estimate the distribution function for mixture data where the population identifiers are unknown, but the probability of belonging to a population is known. The distribution functions are evaluated at time points tval and adjust for dynamic landmark prediction, one discrete covariate (zz), and one continuous covariate (ww).
Dynamic landmark prediction means that the distribution function is computed knowing that the survival time, T, satisfies T > t0, where t0 are the time points in tval0.
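The conditioning idea behind landmark prediction can be illustrated with a known distribution. This sketch is not the package's estimator: it uses an exponential distribution with an assumed rate, and the helper landmark.cdf is hypothetical. It computes the probability that the event occurs by time t given survival past a landmark time t0, that is (F(t) - F(t0)) / (1 - F(t0)), and returns NA for time points before the landmark.

```r
# Illustration with a known exponential distribution; NOT the package's estimator.
# landmark.cdf is a hypothetical helper introduced only for this sketch.
landmark.cdf <- function(t, t0, rate) {
  F.t  <- pexp(t, rate)    # unconditional distribution function at t
  F.t0 <- pexp(t0, rate)   # unconditional distribution function at the landmark
  out  <- (F.t - F.t0) / (1 - F.t0)
  out[t < t0] <- NA        # not defined for time points before the landmark
  out
}

tval <- seq(0, 80, by = 5)

# Conditional distribution function given survival past t0 = 20.
cond.F <- landmark.cdf(tval, t0 = 20, rate = 1 / 40)
cond.F
```

For the exponential distribution the result depends only on t - t0, which makes the output easy to check by hand; for other distributions the conditional and unconditional curves differ in shape as well as location.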
# Setup parameters to generate the data
set.seed(1)
censoring.rate <- 40
p <- 2
n <- 2000
m <- 4
tval <- seq(0,80,by=5)
tval0 <- c(0,20,30,40,50)
z.use <- c(0,1)
w.use <- seq(35,55,by=1)
simu.setting <- "2A"
covariate.dependent <- TRUE
run.NPMLEs <- TRUE
run.NPNA <- TRUE
run.OLS <- FALSE
run.WLS <- FALSE
run.EFF <- FALSE
run.NPNA_avg <- FALSE

## compute the finite set of mixture proportions
qvs <- qvs.values(p,m)

## generate the data
data.gen <- GenerateData(n,p,m,qvs,censoring.rate,simu.setting,covariate.dependent)
x <- data.gen$x
delta <- data.gen$delta
q <- data.gen$q
ww <- data.gen$ww
zz <- data.gen$zz

## true group membership (needed to compute the AUC/BS for simulated data)
true.groups <- data.gen$true.groups

## Perform the estimation
estimators.out <- landmix.estimator(n,m,p,qvs,q,
                                    x,delta,ww,zz,
                                    run.NPNA,
                                    run.NPNA_avg,
                                    tval,tval0,
                                    z.use,w.use)
Produces the finite set of mixture proportions for simulated data.
qvs.values(p, m)
p |
number of populations, must be at least 2. |
m |
number of different mixture proportions, must be at least 2. |
Returns a p by m matrix of mixture proportions.
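To make the shapes concrete, the following sketch builds a hypothetical p by m matrix by hand and expands it to a p by n matrix of per-person mixture proportions. The numeric values, the names qvs.mock, q.mock, and subgroup, and the assumption that each column sums to 1 are illustrative only; the actual output of qvs.values(p, m) may differ.

```r
# Hypothetical values; the actual output of qvs.values(p, m) may differ.
p <- 2
m <- 4

# Assumed probabilities of belonging to population 1 in each of the m subgroups.
first.row <- c(0.2, 0.4, 0.6, 0.8)

# p by m matrix: each column is one mixture proportion vector summing to 1.
qvs.mock <- rbind(first.row, 1 - first.row)
dimnames(qvs.mock) <- NULL

dim(qvs.mock)
colSums(qvs.mock)

# Assign each of n people to one of the m subgroups, then look up the columns.
set.seed(1)
n <- 6
subgroup <- sample(seq_len(m), n, replace = TRUE)
q.mock   <- qvs.mock[, subgroup, drop = FALSE]  # p by n, one column per person
```

Under these assumptions, every column of q.mock is a copy of one of the m columns of qvs.mock, which mirrors the relationship between the q and qvs arguments described for landmix.estimator.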