| Title: | Sequential Trial Emulation |
|---|---|
| Description: | Implementation of sequential trial emulation for the analysis of observational databases. The 'SEQTaRget' software accommodates time-varying treatments and confounders, as well as binary and failure time outcomes. 'SEQTaRget' allows comparison of both static and dynamic strategies, can be used to estimate observational analogs of intention-to-treat and per-protocol effects, and can adjust for potential selection bias induced by losses-to-follow-up. (Paper to come). |
| Authors: | Ryan O'Dea [aut, cre] (ORCID: <https://orcid.org/0009-0000-0103-9546>), Alejandro Szmulewicz [aut] (ORCID: <https://orcid.org/0000-0002-2664-802X>), Tom Palmer [aut] (ORCID: <https://orcid.org/0000-0003-4655-4511>, ROR: <https://ror.org/0524sp257>), Paul Madley-Dowd [aut] (ORCID: <https://orcid.org/0000-0003-2932-9486>), Miguel Hernán [aut] (ORCID: <https://orcid.org/0000-0003-1619-8456>), The President and Fellows of Harvard College [cph] (ROR: <https://ror.org/03vek6s52>) |
| Maintainer: | Ryan O'Dea <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.4.2.9002 |
| Built: | 2026-06-08 14:54:35 UTC |
| Source: | https://github.com/CausalInference/SEQTaRget |
Function to return competing event models from a SEQuential object
compevent(object)
| object | SEQoutput object |
|---|---|
A fastglm object, or a named list of fastglm objects when subgroups are specified
Retrieves Outcome, Numerator, and Denominator Covariates
covariates(object)
| object | object of class SEQoutput |
|---|---|
List of SEQuential covariates
Retrieves Denominator Models from SEQuential object
denominator(object)
| object | object of class SEQoutput |
|---|---|
List of both denominator models
Function to return diagnostic tables from a SEQuential object
diagnostics(object)
| object | SEQoutput object |
|---|---|
List of diagnostic tables
Function to return hazard ratios from a SEQuential object
hazard_ratio(object)
| object | SEQoutput object |
|---|---|
A named vector of hazard ratios, or a named list of vectors when subgroups are specified
Function to print Kaplan-Meier curves
km_curve(object, plot.type = "survival", plot.title, plot.subtitle, plot.labels, plot.colors)
| object | SEQoutput object to plot |
|---|---|
| plot.type | character: type of plot to print; one of: |
| plot.title | character: defines the title of the plot |
| plot.subtitle | character: plot subtitle |
| plot.labels | length 2 character: plot labels |
| plot.colors | length 2 character: plot colors |
ggplot object of plot plot.type
Function to return survival data from a SEQuential object
km_data(object)
| object | SEQoutput object |
|---|---|
A data frame of survival values, or a named list of data frames when subgroups are specified
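To make the shape of such survival data concrete, the following base R sketch turns per-interval event probabilities into survival and risk curves. The hazard values are made up, and treating survival as a running product of discrete-time hazards is a common approach assumed here for illustration, not a description of the package's internals.

```r
# Hypothetical per-interval event probabilities (discrete-time hazards)
hazard <- c(0.02, 0.03, 0.05)

# Survival is the running product of the probabilities of not failing
survival <- cumprod(1 - hazard)

# Cumulative risk is the complement of survival
risk <- 1 - survival

data.frame(followup = seq_along(hazard) - 1L, survival = survival, risk = risk)
```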
Retrieves Numerator Models from SEQuential object
numerator(object)
| object | object of class SEQoutput |
|---|---|
List of both numerator models
Retrieves Outcome Models from SEQuential object
outcome(object)
| object | object of class SEQoutput |
|---|---|
List of all outcome models
Function to return risk information from a SEQuential object
risk_comparison(object)
| object | SEQoutput object |
|---|---|
A data frame of risk information at end of followup (risk ratios, risk differences and confidence intervals, if bootstrapped)
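As a rough illustration of the quantities named above, this base R sketch computes a risk difference, a risk ratio, and two styles of bootstrap confidence interval. All numbers are simulated, and the variable names (`risk_treated`, `risk_control`, `boot_rd`) are hypothetical; how the package itself forms its intervals is not shown here.

```r
set.seed(42)

# Hypothetical cumulative risks at the end of follow-up (illustrative values)
risk_control <- 0.20
risk_treated <- 0.15

risk_difference <- risk_treated - risk_control
risk_ratio <- risk_treated / risk_control

# Hypothetical bootstrap replicates of the risk difference
boot_rd <- rnorm(100, mean = risk_difference, sd = 0.02)

level <- 0.95
alpha <- 1 - level

# Normal-approximation interval built from the bootstrap standard error
ci_se <- risk_difference + c(-1, 1) * qnorm(1 - alpha / 2) * sd(boot_rd)

# Percentile interval read directly off the replicates
ci_pct <- unname(quantile(boot_rd, c(alpha / 2, 1 - alpha / 2)))

rbind(se = ci_se, percentile = ci_pct)
```

The standard-error interval is symmetric around the point estimate by construction, while the percentile interval follows the shape of the replicate distribution.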
Function to return risk information from a SEQuential object
risk_data(object)
| object | SEQoutput object |
|---|---|
A data table of risk information at the end of followup
Function to return the internal data from a SEQuential object
SEQ_data(object)
| object | SEQoutput object |
|---|---|
data.table
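The accessor functions above all take a fitted SEQoutput object. A minimal sketch of how they might be chained is given below; it reuses the column names from the package's own ITT example and assumes that `km.curves = TRUE` is what makes survival data available to `km_data()` and `km_curve()`. The guard on `requireNamespace()` is only there so the snippet is harmless when the package is not installed.

```r
surv <- NULL

if (requireNamespace("SEQTaRget", quietly = TRUE)) {
  library(SEQTaRget)

  # ITT analysis on the bundled example data, requesting survival curves
  fit <- SEQuential(SEQdata,
                    id.col = "ID", time.col = "time",
                    eligible.col = "eligible", treatment.col = "tx_init",
                    outcome.col = "outcome",
                    time_varying.cols = c("N", "L", "P"), fixed.cols = "sex",
                    method = "ITT",
                    options = SEQopts(km.curves = TRUE))

  covariates(fit)        # outcome, numerator, and denominator covariates
  surv <- km_data(fit)   # survival data behind the curves
  km_curve(fit)          # ggplot object of the survival curves
}
```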
Simulated observational example data for SEQuential()
SEQdata
A data frame with 12,180 rows and 11 columns:
Integer: Unique ID emulating individual patients
Integer: Time of observation, always begins at 0, max time of 59. Should be continuous
Binary: eligibility criteria for timepoints
Binary: If an outcome is observed at this time point
Binary: If treatment is observed at this time point
Binary: Sex of the emulated patient
Numeric: Normal random variable from N(10,5)
Numeric: 4% continuously increase from U(0, 1)
Numeric: 2% continuously decrease from U(9, 10)
Binary: Once one, always one variable emulating an excuse for treatment switch
Binary: Once one, always one variable emulating an excuse for treatment switch
SEQuential()
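The column descriptions above imply a long format with one row per individual per time point. The sketch below checks two of the stated properties on a small made-up data frame: that time starts at 0 and advances without gaps within each individual (one reading of "should be continuous"), and that a "once one, always one" flag never returns to 0. The data and the column name `excused` are hypothetical.

```r
# Toy long-format data: one row per individual per time point (hypothetical)
toy <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  time    = c(0, 1, 2, 0, 1),
  excused = c(0, 1, 1, 0, 0)
)

# Time should start at 0 and increase by 1 within each individual
time_ok <- tapply(toy$time, toy$ID, function(t) all(t == seq_along(t) - 1))

# A "once one, always one" flag never drops back to 0 within an individual
flag_ok <- tapply(toy$excused, toy$ID, function(x) all(diff(x) >= 0))

all(time_ok) && all(flag_ok)
```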
Simulated lost-to-followup example data for SEQuential()
SEQdata.LTFU
A data frame with 54,687 rows and 13 columns:
Integer: Unique ID emulating individual patients
Integer: Time of observation, always begins at 0, max time of 59; however, if lost-to-followup, time is truncated at a random point
Binary: eligibility criteria for timepoints
Binary: If an outcome is observed at this time point
Binary: If treatment is observed at this time point
Binary: Sex of the emulated patient
Numeric: Normal random variable from N(10,5)
Numeric: 4% continuously increase from U(0, 1)
Numeric: 2% continuously decrease from U(9, 10)
Binary: Once one, always one variable emulating an excuse for treatment switch
Binary: Once one, always one variable emulating an excuse for treatment switch
Binary: Flag for losing a simulated ID to followup, if 1 there are no more records of the ID afterwards
Binary: emulates columns which are eligible to entering into censoring models (e.g. if you want to limit columns for the LTFU model)
Simulated multitreatment example data for SEQuential() multinomial models
SEQdata.multitreatment
A data frame with 5,976 rows and 11 columns:
Integer: Unique ID emulating individual patients
Integer: Time of observation, always begins at 0, max time of 59; however, if lost-to-followup, time is truncated at a random point
Binary: eligibility criteria for timepoints
Binary: If an outcome is observed at this time point
Integer: Which treatment is observed at this time point
Binary: Sex of the emulated patient
Numeric: Normal random variable from N(10,5)
Numeric: 4% continuously increase from U(0, 1)
Numeric: 2% continuously decrease from U(9, 10)
Binary: Once one, always one variable emulating an excuse for treatment switch
Binary: Once one, always one variable emulating an excuse for treatment switch
Estimate the (very rough) time to run SEQuential analysis on current machine
SEQestimate(data, id.col, time.col, eligible.col, treatment.col, outcome.col, time_varying.cols = list(), fixed.cols = list(), method, options, verbose = TRUE)
| data | data.frame or data.table, if not already expanded with |
|---|---|
| id.col | String: column name of the id column |
| time.col | String: column name of the time column |
| eligible.col | String: column name of the eligibility column |
| treatment.col | String: column name of the treatment column |
| outcome.col | String: column name of the outcome column |
| time_varying.cols | List: column names for time varying columns |
| fixed.cols | List: column names for fixed columns |
| method | String: method of analysis to perform |
| options | List: optional list of parameters from |
| verbose | Logical: if |
A list of (very rough) estimates for the time required for SEQuential containing:
modelTime: estimated time used when running models
expansionTime: estimated time used when expanding data
totalTime: sum of model and expansion time
Parameter Builder for SEQuential Model and Estimates
SEQopts(bootstrap = FALSE, bootstrap.nboot = 100, bootstrap.sample = 0.8, bootstrap.CI = 0.95, bootstrap.CI_method = "se", cense = NA, cense.denominator = NA, cense.eligible = NA, cense.numerator = NA, compevent = NA, covariates = NA, data.return = FALSE, denominator = NA, deviation = FALSE, deviation.col = NA, deviation.conditions = c(NA, NA), deviation.excused = FALSE, deviation.excused_cols = c(NA, NA), excused = FALSE, excused.cols = c(NA, NA), expand.only = FALSE, fastglm.method = 2L, followup.class = FALSE, followup.include = TRUE, followup.max = Inf, followup.min = 0, followup.spline = FALSE, followup.spline.df = 4L, glm.package = "fastglm", hazard = FALSE, indicator.baseline = "_bas", indicator.squared = "_sq", km.curves = FALSE, multinomial = FALSE, ncores = availableCores(omit = 1L), nthreads = getDTthreads(), numerator = NA, parallel = FALSE, parglm.control = NULL, plot.colors = c("#F8766D", "#00BFC4", "#555555"), plot.labels = NA, plot.subtitle = NA, plot.title = NA, plot.type = "survival", risk.times = NA, seed = NULL, selection.first_trial = FALSE, selection.prob = 0.8, selection.random = FALSE, subgroup = NA, survival.max = Inf, treat.level = c(0, 1), trial.include = TRUE, visit = NA, visit.denominator = NA, visit.numerator = NA, weight.eligible_cols = c(), weight.lower = 0, weight.lag_condition = TRUE, weight.p99 = FALSE, weight.preexpansion = TRUE, weight.upper = Inf, weighted = FALSE)
| bootstrap | Logical: defines if |
|---|---|
| bootstrap.nboot | Integer: number of bootstraps, default is |
| bootstrap.sample | Numeric: proportion of data to use when bootstrapping, should be in [0, 1], default is |
| bootstrap.CI | Numeric: defines the confidence interval after bootstrapping, default is |
| bootstrap.CI_method | Character: selects which way to calculate bootstrap confidence intervals ( |
| cense | String: column name for additional censoring variable, e.g. loss-to-follow-up |
| cense.denominator | String: censoring denominator covariates to the right hand side of a formula object |
| cense.eligible | String: column name for indicator column defining which rows to use for censoring model |
| cense.numerator | String: censoring numerator covariates to the right hand side of a formula object |
| compevent | String: column name for competing event indicator |
| covariates | String: covariates to the right hand side of a formula object |
| data.return | Logical: whether to return the expanded dataframe with weighting information, default is |
| denominator | String: denominator covariates to the right hand side of a formula object |
| deviation | Logical: create switch based on deviation from column |
| deviation.col | Character: column name for deviation |
| deviation.conditions | Character list: RHS evaluations of the same length as |
| deviation.excused | Logical: whether deviations should be excused by |
| deviation.excused_cols | Character list: excused columns for deviation switches |
| excused | Logical: in the case of censoring, whether there is an excused condition, default is |
| excused.cols | List: list of column names for treatment switch excuses - should be the same length, and ordered the same as |
| expand.only | Logical: if |
| fastglm.method | Integer: decomposition method for fastglm ( |
| followup.class | Logical: treat followup as a class, e.g. expands every time to its own indicator column, default is |
| followup.include | Logical: whether or not to include 'followup' and 'followup_squared' in the outcome model, default is |
| followup.max | Numeric: maximum time to expand about, default is |
| followup.min | Numeric: minimum follow-up time since trial enrollment to include, must be non-negative, default is |
| followup.spline | Logical: treat followup as a natural cubic spline ( |
| followup.spline.df | Integer: degrees of freedom passed to |
| glm.package | Character: package to use for fitting GLMs, either |
| hazard | Logical: hazard error calculation instead of survival estimation, default is |
| indicator.baseline | String: identifier for baseline variables in |
| indicator.squared | String: identifier for squared variables in |
| km.curves | Logical: Kaplan-Meier survival curve creation and data return, default is |
| multinomial | Logical: whether to expect multilevel treatment values, default is |
| ncores | Integer: number of cores to use in parallel processing, default is one less than system max, see |
| nthreads | Integer: number of threads to use for data.table processing, default is |
| numerator | String: numerator covariates to the right hand side of a formula object |
| parallel | Logical: define if the SEQuential process is run in parallel, default is |
| parglm.control | A control object from |
| plot.colors | Character: Colors for output plot if |
| plot.labels | Character: Color labels for output plot if |
| plot.subtitle | Character: Subtitle for output plot if |
| plot.title | Character: Title for output plot if |
| plot.type | Character: Type of plot to create if |
| risk.times | Numeric vector: follow-up times (in the data's follow-up units) at which to report risk difference and risk ratio when |
| seed | Integer: starting seed |
| selection.first_trial | Logical: selects only the first eligible trial in the expanded dataset, default |
| selection.prob | Numeric: proportion of total IDs to select for |
| selection.random | Logical: randomly selects IDs with replacement to run analysis, default |
| subgroup | Character: Column name to stratify outcome models on |
| survival.max | Numeric: maximum time for survival curves, default is |
| treat.level | List: treatment levels to compare, default is |
| trial.include | Logical: whether or not to include 'trial' and 'trial_squared' in the outcome model, default is |
| visit | String: column name for visit indicator variable, e.g. |
| visit.denominator | String: visit denominator covariates to the right hand side of a formula object |
| visit.numerator | String: visit numerator covariates to the right hand side of a formula object |
| weight.eligible_cols | List: list of column names for indicator columns defining which weights are eligible for weight models - in order of |
| weight.lower | Numeric: IPCW weights truncated at this lower bound, must be non-negative, default is |
| weight.lag_condition | Logical: whether weights should be conditioned on treatment lag value, default |
| weight.p99 | Logical: forces weight truncation at 1st and 99th percentile weights, will override provided |
| weight.preexpansion | Logical: whether weighting should be done on pre-expanded data, default |
| weight.upper | Numeric: weights truncated at upper end at this weight, default is |
| weighted | Logical: whether or not to perform weighted analysis, default is |
An object of class 'SEQopts'
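The weight truncation options (`weight.lower`, `weight.upper`, `weight.p99`) can be pictured with a small base R sketch. The weights below are simulated, `truncate_weights()` is a hypothetical helper, and clamping values to the bounds is an assumption about what "truncated" means here rather than a statement of the package's implementation.

```r
set.seed(1)

# Hypothetical, right-skewed weights (illustrative only)
w <- rlnorm(1000)

# Clamp weights to [lower, upper]; hypothetical helper, not a package function
truncate_weights <- function(w, lower = 0, upper = Inf) {
  pmin(pmax(w, lower), upper)
}

# Fixed bounds, in the spirit of weight.lower / weight.upper
w_fixed <- truncate_weights(w, lower = 0.1, upper = 10)

# Percentile bounds, in the spirit of weight.p99
bounds <- quantile(w, probs = c(0.01, 0.99))
w_p99 <- truncate_weights(w, lower = bounds[[1]], upper = bounds[[2]])

range(w_fixed)
range(w_p99)
```

Percentile bounds adapt to the observed weight distribution, whereas fixed bounds need to be chosen with some knowledge of what range of weights is plausible.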
An S4 class used to hold the outputs for the SEQuential process
params: SEQparams object
outcome: outcome covariates
numerator: numerator covariates
denominator: denominator covariates
outcome.model: list of length bootstrap.nboot containing outcome coefficients
hazard: hazard ratio
survival.curve: ggplot object for the survival curves
survival.data: data.table of survival data
risk.difference: risk difference calculated from survival data
risk.ratio: risk ratio calculated from survival data
time: time used for the SEQuential process
weight.statistics: information from the weighting process, containing weight coefficients and weight statistics
info: list of outcome and switch information (if applicable)
ce.model: list of competing event models if compevent is specified, NA otherwise
SEQuential is an all-in-one API to SEQuential analysis, returning a SEQoutput object of results. More specific examples can be found on pages at https://causalinference.github.io/SEQTaRget/
SEQuential(data, id.col, time.col, eligible.col, treatment.col, outcome.col, time_varying.cols = list(), fixed.cols = list(), method, options, verbose = TRUE)
| data | data.frame or data.table, will perform expansion according to arguments passed through the |
|---|---|
| id.col | String: column name of the id column |
| time.col | String: column name of the time column |
| eligible.col | String: column name of the eligibility column |
| treatment.col | String: column name of the treatment column |
| outcome.col | String: column name of the outcome column |
| time_varying.cols | List: column names for time varying columns |
| fixed.cols | List: column names for fixed columns |
| method | String: method of analysis to perform; should be one of |
| options | List: optional list of parameters from |
| verbose | Logical: if TRUE, prints progress to the console, default is |
Implementation of sequential trial emulation for the analysis of observational databases.
The SEQuential software accommodates time-varying treatments and confounders, as well as binary
and failure time outcomes. SEQuential allows comparison of both static and dynamic strategies,
can be used to estimate observational analogs of intention-to-treat
and per-protocol effects, and can adjust for potential selection bias induced by losses-to-follow-up.
An S4 object of class SEQoutput. If options = SEQopts(expand.only = TRUE), returns the expanded data.table directly, with analysis steps skipped.
    data <- SEQdata
    # Intention-to-treat (ITT) effect: subjects are assigned to the treatment
    # arm defined by their baseline treatment and followed regardless of any later
    # treatment changes, so no weighting is required.
    SEQuential(data, id.col = "ID", time.col = "time", eligible.col = "eligible",
               treatment.col = "tx_init", outcome.col = "outcome",
               time_varying.cols = c("N", "L", "P"), fixed.cols = "sex",
               method = "ITT", options = SEQopts())
    # Per-protocol effect via artificial censoring: subjects are censored when they
    # deviate from their assigned strategy, and inverse-probability-of-censoring
    # weights adjust for the resulting selection bias. The denominator models the
    # probability of remaining uncensored given the time-varying confounders, while
    # the numerator uses only the baseline covariates to stabilize the weights (so
    # the two formulas must differ - identical formulas give weights of 1).
    SEQuential(data, id.col = "ID", time.col = "time", eligible.col = "eligible",
               treatment.col = "tx_init", outcome.col = "outcome",
               time_varying.cols = c("N", "L", "P"), fixed.cols = "sex",
               method = "censoring",
               options = SEQopts(weighted = TRUE, numerator = "sex",
                                 denominator = "N + L + P + sex"))
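The remark that identical numerator and denominator formulas give weights of 1 can be checked with a single-time-point base R sketch. The data are simulated, `stabilized_weights()` is a hypothetical helper built on `glm()`, and the calculation is a simplified stand-in for stabilized inverse probability weights, not the package's weighting code.

```r
set.seed(123)
n <- 2000

# Simulated baseline covariate, time-varying confounder, and adherence indicator
sex <- rbinom(n, 1, 0.5)
L <- rnorm(n)
adhere <- rbinom(n, 1, plogis(1 + 0.5 * sex - 0.8 * L))
d <- data.frame(adhere, sex, L)

# Hypothetical helper: ratio of the probabilities of the observed adherence
# status under the numerator model and under the denominator model
stabilized_weights <- function(numerator, denominator, data) {
  p_num <- fitted(glm(reformulate(numerator, "adhere"),
                      family = binomial(), data = data))
  p_den <- fitted(glm(reformulate(denominator, "adhere"),
                      family = binomial(), data = data))
  obs_num <- ifelse(data$adhere == 1, p_num, 1 - p_num)
  obs_den <- ifelse(data$adhere == 1, p_den, 1 - p_den)
  obs_num / obs_den
}

w_diff <- stabilized_weights("sex", "sex + L", d)
w_same <- stabilized_weights("sex + L", "sex + L", d)

summary(w_diff)                    # varies around 1
all.equal(w_same, rep(1, n))       # identical formulas cancel out
```

Because the numerator and denominator probabilities cancel exactly when the two models coincide, such weights carry no adjustment at all, which is why the two formulas need to differ.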
Show method for S4 object - SEQoutput.
    ## S4 method for signature 'SEQoutput'
    show(object)
| object | A SEQoutput object - usually generated from |
|---|---|
No return value, sends information about SEQoutput to the console