| Metric | Value |
|---|---|
| Original analytic cohort | 1,629 |
| Observed quit rate | 26.3% |
| Strategy-compatible clone share | 50.0% |
| Uncensored clone share after compatibility plus follow-up | 48.1% |
| Mean clone-censor weight among uncensored clones | 1.00 |
Clone-Censor-Weight Analysis for Sustained Smoking-Cessation Strategies
An applied methods paper using NHEFS with explicit handling of artificial censoring and follow-up loss
Background
Clone-censor-weight methods are most useful when the estimand is naturally defined in terms of sustained treatment strategies rather than as a contrast between the exposure patterns that happened to be observed. Among baseline smokers, what would 11-year weight change have looked like under a quit-smoking strategy versus a continued-smoking strategy?
In a conventional observational analysis, each participant contributes only to the strategy actually observed in the data. In a clone-censor analysis, each eligible participant is copied into each candidate strategy at baseline. Once observed behavior reveals that one clone is incompatible with its assigned strategy, that clone is artificially censored. A weighted analysis is then used to recover the intended strategy contrast under explicit assumptions.
This logic becomes especially important when artificial censoring is followed by additional loss of outcome observation. The missing-data and attrition problem does not disappear simply because incompatibility censoring has been handled. Instead, the retained clone set can still become selected by follow-up loss, which is why the same inverse probability weighting logic used for ordinary attrition analyses remains relevant here.
The present analysis therefore treats clone-censor-weight not only as a strategy-emulation tool, but also as a selection problem. Artificial censoring defines which clones remain eligible to represent each strategy, and observed follow-up determines which of those eligible clones still contribute the outcome.
Data Source and Study Question
Data came from causaldata::nhefs, a smoking cessation dataset commonly used in causal inference methods work. The estimand was framed as a contrast between two sustained baseline strategies:
- quit smoking
- continue smoking
The outcome was 11-year weight change, measured by wt82_71.
Methods
Cohort Definition and Baseline Characteristics
The eligible cohort consisted of baseline smokers in NHEFS with the required baseline variables observed. Competing sustained strategies were defined as smoking cessation and continued smoking. Baseline eligibility and covariate requirements were specified before clone construction so that each participant could, in principle, contribute to either strategy at time zero.
Table 1 summarizes the observed-treatment groups before cloning. These summaries are descriptive rather than causal, but they are useful for showing how the observed quitter and continuer groups differ prior to any strategy re-expression.
| Observed strategy | N | Mean age (years) | Mean cigarettes/day | Mean baseline weight (kg) | Observed outcome % | Mean 11-year weight change (kg) |
|---|---|---|---|---|---|---|
| Continue smoking | 1,201 | 42.9 | 21.2 | 70.5 | 96.8% | 2.0 |
| Quit smoking | 428 | 46.7 | 18.8 | 72.6 | 94.2% | 4.5 |
Clone Construction and Artificial Censoring
Each participant was duplicated into a quit-smoking clone and a continued-smoking clone. Strategy incompatibility was then defined from the observed qsmk indicator. Once incompatibility was identified, the corresponding clone was artificially censored. This step was required because a clone that no longer matched its assigned strategy could not legitimately remain in the risk set for that strategy.
| Stage | N |
|---|---|
| Original analytic cohort | 1,629 |
| Cloned person-strategy rows | 3,258 |
| Strategy-compatible clone rows | 1,629 |
| Strategy-compatible clones with observed outcome | 1,566 |
| Artificially censored clone rows | 1,629 |
Because only two strategies were considered, one clone per participant remained strategy-compatible and one was artificially censored. Among the compatible clones, some still lacked observed outcome follow-up and therefore required additional weighting.
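The clone-and-censor bookkeeping above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the analysis code: participants are plain dicts, and the names `make_clones` and `STRATEGIES` are invented for the example. The only input assumed from the data is the observed quitting indicator `qsmk` (1 = quit, 0 = continued).

```python
# Hypothetical sketch of the clone-and-censor step. Each participant is
# duplicated into one clone per strategy; the clone whose assigned strategy
# contradicts observed behavior is flagged as artificially censored.

STRATEGIES = ("quit", "continue")

def make_clones(participants):
    """Duplicate each participant into one clone per strategy and flag
    strategy-incompatible clones as artificially censored."""
    clones = []
    for p in participants:
        observed = "quit" if p["qsmk"] == 1 else "continue"
        for strategy in STRATEGIES:
            clone = dict(p)
            clone["strategy"] = strategy
            # Artificial censoring: censor the clone the moment its assigned
            # strategy is incompatible with observed behavior.
            clone["artificially_censored"] = (strategy != observed)
            clones.append(clone)
    return clones

# Tiny illustrative cohort: one observed quitter, one observed continuer.
cohort = [{"id": 1, "qsmk": 1}, {"id": 2, "qsmk": 0}]
clones = make_clones(cohort)
compatible = [c for c in clones if not c["artificially_censored"]]
print(len(clones), len(compatible))  # 4 2
```

With two strategies, every participant contributes exactly two clone rows, and exactly one of the two survives incompatibility censoring, which reproduces the 50.0% compatible share in the counts above.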
Follow-Up Loss After Artificial Censoring
The key methodological motivation for the weighting step was that artificial censoring did not fully solve the selection problem. Even after incompatibility was handled, only a subset of the compatible clones still contributed an observed outcome. This is directly analogous to the attrition problem in ordinary longitudinal analyses: the analysis population can drift away from the intended baseline cohort if follow-up loss is informative. In this setting, the relevant question was whether the clone set remaining after incompatibility censoring and follow-up loss still represented the baseline population to which the sustained-strategy estimand referred.
| Strategy | Clone rows | Compatible clones | Artificially censored clones | Observed-outcome clones | Compatibility % | Observed after compatibility % | Mean weight | Max weight | ESS |
|---|---|---|---|---|---|---|---|---|---|
| Continue smoking | 1,629 | 1,201 | 428 | 1,163 | 73.7% | 71.4% | 1.00 | 1.08 | 1,162.4 |
| Quit smoking | 1,629 | 428 | 1,201 | 403 | 26.3% | 24.7% | 1.00 | 1.20 | 401.4 |
Censoring-Weight Model
The weighting model estimated the probability that a clone remained uncensored after both incompatibility censoring and observed follow-up, conditional on baseline covariates and assigned strategy. Stabilized censoring weights were then constructed from that model. Baseline age, sex, race, education, smoking intensity and years smoked, baseline weight and body mass index, exercise, and physical activity level were included because these variables were plausible common causes of sustained strategy compatibility, observed follow-up, and later weight change.
The causal interpretation of the weighted strategy estimate therefore depended on standard exchangeability, positivity, consistency, and censoring-model assumptions. In manuscript terms, the weighting step should be read as an attempt to reconstruct the baseline strategy-defined cohort from the subset of clones that remained both compatible and outcome-observed.
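Once a censoring model has produced each clone's fitted probability of remaining uncensored, the stabilization step itself is mechanical. The sketch below assumes those probabilities are already available (in practice they would come from a logistic model on baseline covariates plus assigned strategy); the probabilities shown are made up for illustration.

```python
# Hypothetical sketch: turn fitted probabilities of remaining uncensored into
# stabilized censoring weights. Censored clones contribute weight 0; the
# numerator is the marginal probability of remaining uncensored.

def stabilized_weights(p_uncensored, uncensored):
    """Stabilized weight = marginal P(uncensored) / conditional P(uncensored),
    assigned only to clones that actually remain uncensored."""
    p_marginal = sum(uncensored) / len(uncensored)  # stabilizing numerator
    return [
        p_marginal / p if stays else 0.0
        for p, stays in zip(p_uncensored, uncensored)
    ]

p_hat = [0.95, 0.90, 0.80, 0.70]          # illustrative fitted probabilities
stays = [True, True, True, False]         # the last clone is censored
w = stabilized_weights(p_hat, stays)
```

Because the numerator and denominator are both probabilities of the same event, stabilized weights center near 1 when the model finds little covariate-driven selection, which is what the diagnostics below show for this analysis.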
| Assumption | Plain-language meaning | What to check |
|---|---|---|
| Exchangeability | After conditioning on measured baseline variables, the retained clones are comparable enough for the strategy contrast to be meaningful. | Whether important prognostic baseline variables are missing from the design. |
| Positivity | Both strategies remain plausible within the covariate patterns represented in the eligible cohort. | Whether weights are highly unstable or concentrated in a small set of clones. |
| Consistency | The observed quitting indicator is a usable proxy for the strategy definition we say it represents. | Whether the operational strategy labels are obviously coarser than the scientific intervention of interest. |
| Censoring-model adequacy | The variables in the censoring model are rich enough that remaining uncensored is not strongly driven by omitted predictors. | Whether predictors of remaining uncensored suggest the censoring story is too strong for the available data. |
| Check | Why it matters | Warning sign |
|---|---|---|
| Clone counts | Confirms the clone construction and censoring logic behaved as expected. | Unexpected clone counts or censoring totals. |
| Observed-outcome counts after compatibility | Shows whether follow-up loss is changing the retained clone set in a meaningful way. | Large asymmetry in retained outcome-contributing clones with no clear design explanation. |
| Weight range and concentration | Flags potential positivity or model-instability problems. | Very large weights or a heavy right tail. |
| Effective sample size | Shows how much information remains after weighting. | A sharp collapse in effective sample size. |
| Predictors of remaining uncensored | Helps judge whether censoring is plausibly ignorable conditional on the modeled variables. | Strong signals tied to variables that look under-modeled or conceptually downstream. |
| Movement versus naive analysis | Shows whether the strategy framing materially changes the estimate or mainly changes the justification. | Large estimate shifts that cannot be explained by a plausible design story. |
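The effective sample size check in the table above is simple to compute from the weights alone. The sketch below uses Kish's ESS formula, (Σw)² / Σw², on illustrative weights; the real analysis would apply it per strategy to the stabilized weights of the uncensored clones.

```python
# Sketch of the ESS diagnostic. Kish's effective sample size equals n when
# all weights are equal and shrinks as weight becomes concentrated.

def effective_sample_size(weights):
    """Kish ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

weights = [1.0, 1.0, 1.02, 0.98, 1.2]   # illustrative, nearly uniform weights
ess = effective_sample_size(weights)
max_w = max(weights)
# A sharp gap between len(weights) and ess, or a very large max weight,
# would flag positivity or model-instability problems.
```

In the strategy table above, ESS (1,162.4 and 401.4) sits close to the corresponding observed-outcome clone counts (1,163 and 403), which is consistent with the near-uniform weight distribution reported in the Results.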
Comparative Analyses
Three analyses were compared:
- a naive observed-treatment contrast
- a covariate-adjusted observed-treatment model
- a clone-censor weighted strategy analysis
The purpose of the comparison was not only to compare coefficients, but also to evaluate whether the strategy-based design materially altered the inferred contrast or mainly changed the justification and target population. This mirrors the logic used in missing-data analyses, where different analytic strategies are informative not only because they can shift the point estimate, but also because they clarify which population and which assumptions the estimate actually represents.
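The naive and weighted contrasts differ only in whether clone weights enter the comparison. As a minimal sketch, the weighted strategy estimate can be thought of as a weighted difference in mean outcomes; the actual analysis would use (weighted) regression with baseline covariates, and all data below are illustrative.

```python
# Hypothetical sketch: naive vs weighted difference in mean 11-year weight
# change between quit-strategy and continue-strategy clones.

def mean_diff(outcomes, is_quit, weights=None):
    """(Weighted) mean outcome difference, quit minus continue."""
    if weights is None:
        weights = [1.0] * len(outcomes)
    def wmean(flag):
        num = sum(w * y for y, q, w in zip(outcomes, is_quit, weights) if q == flag)
        den = sum(w for q, w in zip(is_quit, weights) if q == flag)
        return num / den
    return wmean(1) - wmean(0)

y = [4.0, 5.0, 2.0, 1.0]        # illustrative weight changes (kg)
quit_clone = [1, 1, 0, 0]       # strategy assignment of each retained clone
w = [1.1, 0.9, 1.0, 1.0]        # illustrative stabilized weights

naive = mean_diff(y, quit_clone)          # unweighted contrast
weighted = mean_diff(y, quit_clone, w)    # clone-censor weighted contrast
```

With near-uniform weights, as in this analysis, the weighted contrast moves only slightly from the naive one, which matches the comparative estimates reported below.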
Results
Weight Diagnostics
| Weight summary statistic | Value |
|---|---|
| Minimum | 0.88 |
| 25th percentile | 0.98 |
| Median | 1.00 |
| Mean | 1.00 |
| 75th percentile | 1.02 |
| Maximum | 1.21 |
The stabilized weights were not extreme, suggesting that the weighted contrast was not dominated by a small number of highly upweighted clones.
Predictors of Remaining Uncensored
The table below summarizes which baseline features were associated with remaining uncensored after strategy compatibility and follow-up.
| Predictor | Odds ratio | 95% CI |
|---|---|---|
| Quit smoking strategy clone | 0.13 | 0.11 to 0.15 |
| Inactive vs very active | 0.90 | 0.68 to 1.20 |
| Some exercise vs much exercise | 1.09 | 0.87 to 1.36 |
| Little or no exercise vs much exercise | 1.06 | 0.84 to 1.34 |
| Baseline BMI | 0.97 | 0.53 to 1.78 |
| Moderately active vs very active | 1.01 | 0.85 to 1.20 |
| Male vs female | 0.99 | 0.79 to 1.25 |
| Years of education | 1.00 | 0.98 to 1.03 |
| Smoking years | 1.00 | 0.98 to 1.01 |
| Baseline weight | 1.00 | 0.98 to 1.01 |
These results should be interpreted as diagnostics rather than as substantive predictors of treatment success. Their main role is to show whether the retained clone set appears selected on measured baseline characteristics.
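Reporting these diagnostics involves only exponentiating the fitted log-odds coefficients and their Wald intervals. The sketch below shows that transformation; the coefficient and standard error are illustrative values chosen to land near the quit-strategy row above, not outputs of the actual model.

```python
# Sketch: convert a logistic-regression coefficient and standard error into
# an odds ratio with a 95% Wald confidence interval.
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with interval exp(beta +/- z * se)."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Illustrative: a log-odds of about -2.04 corresponds to an OR near 0.13,
# the magnitude shown for the quit-smoking strategy clones above.
or_hat, lo, hi = or_with_ci(-2.04, 0.08)
```

The dominant quit-strategy odds ratio is expected by construction: only 26.3% of participants were observed quitters, so quit-strategy clones were far more likely to be artificially censored.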
Comparative Effect Estimates
| Analysis | Population | N | Estimate | 95% CI |
|---|---|---|---|---|
| Naive observed-strategy contrast | Observed-outcome subset classified by recorded quitting status | 1,566 | 2.54 kg | 1.66 to 3.42 kg |
| Covariate-adjusted observed-strategy model | Observed-outcome subset after baseline covariate adjustment | 1,566 | 3.32 kg | 2.46 to 4.18 kg |
| Clone-censor weighted strategy comparison | Cloned baseline cohort reweighted toward adherence-compatible strategy clones | 1,566 | 2.44 kg | 1.55 to 3.32 kg |
The clone-censor estimate did not diverge sharply from the simpler observed-treatment analyses. The main methodological gain was not a dramatic numerical shift, but rather a closer alignment between the analysis and the sustained-strategy question.
Discussion
Principal Findings
Three findings are most important.
- The smoking-cessation question can be expressed naturally as a sustained-strategy contrast rather than only as a comparison between observed quitters and continuing smokers.
- Artificial censoring alone does not eliminate selection concerns, because compatible clones can still be lost through missing follow-up outcome data.
- A clone-censor-weight analysis therefore inherits some of the same logic as ordinary attrition analyses: the retained analytic set must be reconnected to the intended baseline population through explicit modeling assumptions.
Relation to Attrition and Selection
The missing-data and attrition framing is useful here because it clarifies what the weighting step is doing. After artificial censoring, the analysis no longer concerns the full cloned cohort, but only the subset that remains both strategy-compatible and outcome-observed. Without adjustment, that retained subset may represent a selected population rather than the intended baseline cohort. In that sense, the clone-censor problem and the attrition problem are structurally related: both require asking whether the observed analytic sample still represents the target population.
This framing also helps explain why the weighted strategy estimate should not be interpreted as automatically superior to the simpler observed-treatment analyses. Its strength comes from better alignment to the estimand, but that strength depends on whether the censoring model is rich enough and whether positivity remains plausible after the strategy structure is imposed.
Limitations
| Pitfall | Why it hurts | Better practice |
|---|---|---|
| Treating clone-censor as a default upgrade | It can add complexity without improving alignment to the scientific question. | Use it only when the strategy question clearly motivates it. |
| Using vague strategy definitions | Ambiguous strategies make compatibility and censoring impossible to defend. | Write the strategies in plain language before implementing them in code. |
| Ignoring follow-up missingness after artificial censoring | The retained clone set may still be selected even after incompatibility censoring is handled. | Track outcome observation explicitly among compatible clones. |
| Reporting the weighted estimate without diagnostics | Readers cannot tell whether the design is stable enough to support interpretation. | Treat counts, weight diagnostics, and ESS as core results. |
| Overclaiming causal precision | The method does not remove the need to explain residual bias and data limitations. | Frame the estimate as assumption-dependent and design-dependent. |
Notes:
- This analysis is intentionally simplified: adherence is summarized with a single observed quitting indicator rather than with repeated treatment updates.
- Artificial censoring occurs immediately once strategy incompatibility is identified, and the final weights also reflect who still has an observed outcome after remaining strategy-compatible.
- The weighted estimate is best read as an applied demonstration of clone-censor logic with follow-up weighting, not as a fully developed longitudinal causal analysis.
The weighting model was also built from baseline covariates rather than from a richer time-updated structure. Accordingly, the analysis should be read as an applied methods demonstration rather than as a full longitudinal target trial emulation.
Future Methodological Extensions
- Add an explicit grace-period parameter and compare immediate versus delayed incompatibility rules.
- Introduce a richer censoring model with nonlinear terms or interactions.
- Apply the same framework to data with repeated treatment updates and more detailed follow-up structure.
References
- Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC, 2020. Provides the main conceptual framework for strategy-based causal questions, artificial censoring, exchangeability, and positivity.
- Hernán MA, Robins JM. “Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available.” American Journal of Epidemiology. 2016. The main target-trial emulation reference behind the strategy framing.
- Seaman SR, White IR. “Review of inverse probability weighting for dealing with missing data.” Statistical Methods in Medical Research. 2013. Provides the conceptual basis for treating loss of observed follow-up after artificial censoring as a selection problem.
- van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Chapman & Hall/CRC, 2018. Useful background for why follow-up loss should be treated as a modeling problem rather than ignored as a simple data omission.