| Metric | Value |
|---|---|
| Original analytic cohort | 1,629 |
| Observed quit rate | 26.3% |
| Strategy-compatible clone share | 50.0% |
| Uncensored clone share after compatibility plus follow-up | 48.1% |
| Mean clone-censor weight among uncensored clones | 1.00 |
Clone-Censor-Weight Analysis for Sustained Smoking-Cessation Strategies
An applied methods paper using NHEFS with explicit handling of artificial censoring and follow-up loss
Background
Clone-censor-weight methods are most useful when the estimand is naturally defined in terms of sustained treatment strategies rather than as a contrast between the exposure patterns that happened to be observed. Among baseline smokers, what would 11-year weight change have looked like under a quit-smoking strategy versus a continued-smoking strategy?
In a conventional observational analysis, each participant contributes only to the strategy actually observed in the data. In a clone-censor analysis, each eligible participant is copied into each candidate strategy at baseline. Once observed behavior reveals that one clone is incompatible with its assigned strategy, that clone is artificially censored. A weighted analysis is then used to recover the intended strategy contrast under explicit assumptions.
This logic becomes especially important when artificial censoring is followed by additional loss of outcome observation. The missing-data and attrition problem does not disappear simply because incompatibility censoring has been handled. Instead, the retained clone set can still become selected by follow-up loss, which is why the same inverse probability weighting logic used for ordinary attrition analyses remains relevant here.
The present analysis therefore treats clone-censor-weight not only as a strategy-emulation tool, but also as a selection problem. Artificial censoring defines which clones remain eligible to represent each strategy, and observed follow-up determines which of those eligible clones still contribute the outcome.
Data Source and Study Question
Data came from causaldata::nhefs, a smoking cessation dataset commonly used in causal inference methods work. The estimand was framed as a contrast between two sustained baseline strategies:
- quit smoking
- continue smoking
The outcome was 11-year weight change, measured by wt82_71.
Methods
Cohort Definition and Baseline Characteristics
The eligible cohort consisted of baseline smokers in NHEFS with the required baseline variables observed. Competing sustained strategies were defined as smoking cessation and continued smoking. Baseline eligibility and covariate requirements were specified before clone construction so that each participant could, in principle, contribute to either strategy at time zero.
Table 1 summarizes the observed-treatment groups before cloning. These summaries are descriptive rather than causal, but they are useful for showing how the observed quitter and continuer groups differ prior to any strategy re-expression.
| Observed strategy | N | Mean age (years) | Mean cigarettes/day | Mean baseline weight (kg) | Observed outcome % | Mean 11-year weight change (kg) |
|---|---|---|---|---|---|---|
| Continue smoking | 1,201 | 42.9 | 21.2 | 70.5 | 96.8% | 2.0 |
| Quit smoking | 428 | 46.7 | 18.8 | 72.6 | 94.2% | 4.5 |
Clone Construction and Artificial Censoring
Each participant was duplicated into a quit-smoking clone and a continued-smoking clone. Strategy incompatibility was then defined from the observed qsmk indicator. Once incompatibility was identified, the corresponding clone was artificially censored. This step was required because a clone that no longer matched its assigned strategy could not legitimately remain in the risk set for that strategy.
| Stage | N |
|---|---|
| Original analytic cohort | 1,629 |
| Cloned person-strategy rows | 3,258 |
| Strategy-compatible clone rows | 1,629 |
| Strategy-compatible clones with observed outcome | 1,566 |
| Artificially censored clone rows | 1,629 |
Because only two strategies were considered, one clone per participant remained strategy-compatible and one was artificially censored. Among the compatible clones, some still lacked observed outcome follow-up and therefore required additional weighting.
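The clone-and-censor bookkeeping above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the analysis code: participants are plain dicts, and the names `make_clones` and `STRATEGIES` are invented for the example. The only input assumed from the data is the observed quitting indicator `qsmk` (1 = quit, 0 = continued).

```python
# Hypothetical sketch of the clone-and-censor step. Each participant is
# duplicated into one clone per strategy; the clone whose assigned strategy
# contradicts observed behavior is flagged as artificially censored.

STRATEGIES = ("quit", "continue")

def make_clones(participants):
    """Duplicate each participant into one clone per strategy and flag
    strategy-incompatible clones as artificially censored."""
    clones = []
    for p in participants:
        observed = "quit" if p["qsmk"] == 1 else "continue"
        for strategy in STRATEGIES:
            clone = dict(p)
            clone["strategy"] = strategy
            # Artificial censoring: censor the clone the moment its assigned
            # strategy is incompatible with observed behavior.
            clone["artificially_censored"] = (strategy != observed)
            clones.append(clone)
    return clones

# Tiny illustrative cohort: one observed quitter, one observed continuer.
cohort = [{"id": 1, "qsmk": 1}, {"id": 2, "qsmk": 0}]
clones = make_clones(cohort)
compatible = [c for c in clones if not c["artificially_censored"]]
print(len(clones), len(compatible))  # 4 2
```

With two strategies, every participant contributes exactly two clone rows, and exactly one of the two survives incompatibility censoring, which reproduces the 50.0% compatible share in the counts above.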
Follow-Up Loss After Artificial Censoring
The key methodological motivation for the weighting step was that artificial censoring did not fully solve the selection problem. Even after incompatibility was handled, only a subset of the compatible clones still contributed an observed outcome. This is directly analogous to the attrition problem in ordinary longitudinal analyses: the analysis population can drift away from the intended baseline cohort if follow-up loss is informative. In this setting, the relevant question was whether the clone set remaining after incompatibility censoring and follow-up loss still represented the baseline population to which the sustained-strategy estimand referred.
| Strategy | Clone rows | Compatible clones | Artificially censored clones | Observed-outcome clones | Compatibility % | Observed after compatibility % | Mean weight | Max weight | ESS |
|---|---|---|---|---|---|---|---|---|---|
| Continue smoking | 1,629 | 1,201 | 428 | 1,163 | 73.7% | 71.4% | 1.00 | 1.08 | 1,162.4 |
| Quit smoking | 1,629 | 428 | 1,201 | 403 | 26.3% | 24.7% | 1.00 | 1.20 | 401.4 |
Censoring-Weight Model
The weighting model estimated the probability that a clone remained uncensored after both incompatibility censoring and observed follow-up, conditional on baseline covariates and assigned strategy. Stabilized censoring weights were then constructed from that model. Baseline age, sex, race, education, smoking intensity and years smoked, baseline weight and body mass index, exercise, and physical activity level were included because these variables were plausible common causes of sustained strategy compatibility, observed follow-up, and later weight change.
The causal interpretation of the weighted strategy estimate therefore depended on standard exchangeability, positivity, consistency, and censoring-model assumptions. In manuscript terms, the weighting step should be read as an attempt to reconstruct the baseline strategy-defined cohort from the subset of clones that remained both compatible and outcome-observed.
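Once a censoring model has produced each clone's fitted probability of remaining uncensored, the stabilization step itself is mechanical. The sketch below assumes those probabilities are already available (in practice they would come from a logistic model on baseline covariates plus assigned strategy); the probabilities shown are made up for illustration.

```python
# Hypothetical sketch: turn fitted probabilities of remaining uncensored into
# stabilized censoring weights. Censored clones contribute weight 0; the
# numerator is the marginal probability of remaining uncensored.

def stabilized_weights(p_uncensored, uncensored):
    """Stabilized weight = marginal P(uncensored) / conditional P(uncensored),
    assigned only to clones that actually remain uncensored."""
    p_marginal = sum(uncensored) / len(uncensored)  # stabilizing numerator
    return [
        p_marginal / p if stays else 0.0
        for p, stays in zip(p_uncensored, uncensored)
    ]

p_hat = [0.95, 0.90, 0.80, 0.70]          # illustrative fitted probabilities
stays = [True, True, True, False]         # the last clone is censored
w = stabilized_weights(p_hat, stays)
```

Because the numerator and denominator are both probabilities of the same event, stabilized weights center near 1 when the model finds little covariate-driven selection, which is what the diagnostics below show for this analysis.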
| Assumption | Plain-language meaning | What to check |
|---|---|---|
| Exchangeability | After conditioning on measured baseline variables, the retained clones are comparable enough for the strategy contrast to be meaningful. | Whether important prognostic baseline variables are missing from the design. |
| Positivity | Both strategies remain plausible within the covariate patterns represented in the eligible cohort. | Whether weights are highly unstable or concentrated in a small set of clones. |
| Consistency | The observed quitting indicator is a usable proxy for the strategy definition we say it represents. | Whether the operational strategy labels are obviously coarser than the scientific intervention of interest. |
| Censoring-model adequacy | The variables in the censoring model are rich enough that remaining uncensored is not strongly driven by omitted predictors. | Whether predictors of remaining uncensored suggest the censoring story is too strong for the available data. |
| Check | Why it matters | Warning sign |
|---|---|---|
| Clone counts | Confirms the clone construction and censoring logic behaved as expected. | Unexpected clone counts or censoring totals. |
| Observed-outcome counts after compatibility | Shows whether follow-up loss is changing the retained clone set in a meaningful way. | Large asymmetry in retained outcome-contributing clones with no clear design explanation. |
| Weight range and concentration | Flags potential positivity or model-instability problems. | Very large weights or a heavy right tail. |
| Effective sample size | Shows how much information remains after weighting. | A sharp collapse in effective sample size. |
| Predictors of remaining uncensored | Helps judge whether censoring is plausibly ignorable conditional on the modeled variables. | Strong signals tied to variables that look under-modeled or conceptually downstream. |
| Movement versus naive analysis | Shows whether the strategy framing materially changes the estimate or mainly changes the justification. | Large estimate shifts that cannot be explained by a plausible design story. |
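The effective sample size check in the table above is simple to compute from the weights alone. The sketch below uses Kish's ESS formula, (Σw)² / Σw², on illustrative weights; the real analysis would apply it per strategy to the stabilized weights of the uncensored clones.

```python
# Sketch of the ESS diagnostic. Kish's effective sample size equals n when
# all weights are equal and shrinks as weight becomes concentrated.

def effective_sample_size(weights):
    """Kish ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

weights = [1.0, 1.0, 1.02, 0.98, 1.2]   # illustrative, nearly uniform weights
ess = effective_sample_size(weights)
max_w = max(weights)
# A sharp gap between len(weights) and ess, or a very large max weight,
# would flag positivity or model-instability problems.
```

In the strategy table above, ESS (1,162.4 and 401.4) sits close to the corresponding observed-outcome clone counts (1,163 and 403), which is consistent with the near-uniform weight distribution reported in the Results.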
Comparative Analyses
Three analyses were compared:
- a naive observed-treatment contrast
- a covariate-adjusted observed-treatment model
- a clone-censor weighted strategy analysis
The purpose of the comparison was not only to compare coefficients, but also to evaluate whether the strategy-based design materially altered the inferred contrast or mainly changed the justification and target population. This mirrors the logic used in missing-data analyses, where different analytic strategies are informative not only because they can shift the point estimate, but also because they clarify which population and which assumptions the estimate actually represents.
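The naive and weighted contrasts differ only in whether clone weights enter the comparison. As a minimal sketch, the weighted strategy estimate can be thought of as a weighted difference in mean outcomes; the actual analysis would use (weighted) regression with baseline covariates, and all data below are illustrative.

```python
# Hypothetical sketch: naive vs weighted difference in mean 11-year weight
# change between quit-strategy and continue-strategy clones.

def mean_diff(outcomes, is_quit, weights=None):
    """(Weighted) mean outcome difference, quit minus continue."""
    if weights is None:
        weights = [1.0] * len(outcomes)
    def wmean(flag):
        num = sum(w * y for y, q, w in zip(outcomes, is_quit, weights) if q == flag)
        den = sum(w for q, w in zip(is_quit, weights) if q == flag)
        return num / den
    return wmean(1) - wmean(0)

y = [4.0, 5.0, 2.0, 1.0]        # illustrative weight changes (kg)
quit_clone = [1, 1, 0, 0]       # strategy assignment of each retained clone
w = [1.1, 0.9, 1.0, 1.0]        # illustrative stabilized weights

naive = mean_diff(y, quit_clone)          # unweighted contrast
weighted = mean_diff(y, quit_clone, w)    # clone-censor weighted contrast
```

With near-uniform weights, as in this analysis, the weighted contrast moves only slightly from the naive one, which matches the comparative estimates reported below.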
Results
Weight Diagnostics
| Weight summary statistic | Value |
|---|---|
| Minimum | 0.88 |
| 25th percentile | 0.98 |
| Median | 1.00 |
| Mean | 1.00 |
| 75th percentile | 1.02 |
| Maximum | 1.21 |
The stabilized weights were not extreme, suggesting that the weighted contrast was not dominated by a small number of highly upweighted clones.
Predictors of Remaining Uncensored
The table below summarizes which baseline features were associated with remaining uncensored after strategy compatibility and follow-up.
| Predictor | Odds ratio | 95% CI |
|---|---|---|
| Quit smoking strategy clone | 0.13 | 0.11 to 0.15 |
| Inactive vs very active | 0.90 | 0.68 to 1.20 |
| Some exercise vs much exercise | 1.09 | 0.87 to 1.36 |
| Little or no exercise vs much exercise | 1.06 | 0.84 to 1.34 |
| Baseline BMI | 0.97 | 0.53 to 1.78 |
| Moderately active vs very active | 1.01 | 0.85 to 1.20 |
| Male vs female | 0.99 | 0.79 to 1.25 |
| Years of education | 1.00 | 0.98 to 1.03 |
| Smoking years | 1.00 | 0.98 to 1.01 |
| Baseline weight | 1.00 | 0.98 to 1.01 |
These results should be interpreted as diagnostics rather than as substantive predictors of treatment success. Their main role is to show whether the retained clone set appears selected on measured baseline characteristics.
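Reporting these diagnostics involves only exponentiating the fitted log-odds coefficients and their Wald intervals. The sketch below shows that transformation; the coefficient and standard error are illustrative values chosen to land near the quit-strategy row above, not outputs of the actual model.

```python
# Sketch: convert a logistic-regression coefficient and standard error into
# an odds ratio with a 95% Wald confidence interval.
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with interval exp(beta +/- z * se)."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Illustrative: a log-odds of about -2.04 corresponds to an OR near 0.13,
# the magnitude shown for the quit-smoking strategy clones above.
or_hat, lo, hi = or_with_ci(-2.04, 0.08)
```

The dominant quit-strategy odds ratio is expected by construction: only 26.3% of participants were observed quitters, so quit-strategy clones were far more likely to be artificially censored.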
Comparative Effect Estimates
| Analysis | Population | N | Estimate | 95% CI |
|---|---|---|---|---|
| Naive observed-strategy contrast | Observed-outcome subset classified by recorded quitting status | 1,566 | 2.54 kg | 1.66 to 3.42 kg |
| Covariate-adjusted observed-strategy model | Observed-outcome subset after baseline covariate adjustment | 1,566 | 3.32 kg | 2.46 to 4.18 kg |
| Clone-censor weighted strategy comparison | Cloned baseline cohort reweighted toward adherence-compatible strategy clones | 1,566 | 2.44 kg | 1.55 to 3.32 kg |
The clone-censor estimate did not diverge sharply from the simpler observed-treatment analyses. The main methodological gain was not a dramatic numerical shift, but rather a closer alignment between the analysis and the sustained-strategy question.
Discussion
Principal Findings
Three findings are most important.
- The smoking-cessation question can be expressed naturally as a sustained-strategy contrast rather than only as a comparison between observed quitters and continuing smokers.
- Artificial censoring alone does not eliminate selection concerns, because compatible clones can still be lost through missing follow-up outcome data.
- A clone-censor-weight analysis therefore inherits some of the same logic as ordinary attrition analyses: the retained analytic set must be reconnected to the intended baseline population through explicit modeling assumptions.
Relation to Attrition and Selection
The missing-data and attrition framing is useful here because it clarifies what the weighting step is doing. After artificial censoring, the analysis no longer concerns the full cloned cohort, but only the subset that remains both strategy-compatible and outcome-observed. Without adjustment, that retained subset may represent a selected population rather than the intended baseline cohort. In that sense, the clone-censor problem and the attrition problem are structurally related: both require asking whether the observed analytic sample still represents the target population.
This framing also helps explain why the weighted strategy estimate should not be interpreted as automatically superior to the simpler observed-treatment analyses. Its strength comes from better alignment to the estimand, but that strength depends on whether the censoring model is rich enough and whether positivity remains plausible after the strategy structure is imposed.
Limitations
| Pitfall | Why it hurts | Better practice |
|---|---|---|
| Treating clone-censor as a default upgrade | It can add complexity without improving alignment to the scientific question. | Use it only when the strategy question clearly motivates it. |
| Using vague strategy definitions | Ambiguous strategies make compatibility and censoring impossible to defend. | Write the strategies in plain language before implementing them in code. |
| Ignoring follow-up missingness after artificial censoring | The retained clone set may still be selected even after incompatibility censoring is handled. | Track outcome observation explicitly among compatible clones. |
| Reporting the weighted estimate without diagnostics | Readers cannot tell whether the design is stable enough to support interpretation. | Treat counts, weight diagnostics, and ESS as core results. |
| Overclaiming causal precision | The method does not remove the need to explain residual bias and data limitations. | Frame the estimate as assumption-dependent and design-dependent. |
Notes:
- This analysis is intentionally simplified: adherence is summarized with a single observed quitting indicator rather than with repeated treatment updates.
- Artificial censoring occurs immediately once strategy incompatibility is identified, and the final weights also reflect who still has an observed outcome after remaining strategy-compatible.
- The weighted estimate is best read as an applied demonstration of clone-censor logic with follow-up weighting, not as a fully developed longitudinal causal analysis.
The weighting model was also built from baseline covariates rather than from a richer time-updated structure. Accordingly, the analysis should be read as an applied methods demonstration rather than as a full longitudinal target trial emulation.
Future Methodological Extensions
- Add an explicit grace-period parameter and compare immediate versus delayed incompatibility rules.
- Introduce a richer censoring model with nonlinear terms or interactions.
- Apply the same framework to data with repeated treatment updates and more detailed follow-up structure.
References
- Hernán MA, Robins JM. Causal Inference: What If. Chapman & Hall/CRC, 2020. Provides the main conceptual framework for strategy-based causal questions, artificial censoring, exchangeability, and positivity.
- Hernán MA, Robins JM. “Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available.” American Journal of Epidemiology. 2016. The main target-trial emulation reference behind the strategy framing.
- Seaman SR, White IR. “Review of inverse probability weighting for dealing with missing data.” Statistical Methods in Medical Research. 2013. Provides the conceptual basis for treating loss of observed follow-up after artificial censoring as a selection problem.
- van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Chapman & Hall/CRC, 2018. Useful background for why follow-up loss should be treated as a modeling problem rather than ignored as a simple data omission.