Socrates Aedo, MD, MSc,1 Gabriel Cavada, MSc,2 Juan E. Blümel, MD, PhD,3 Peter Chedraui, MD, MSc, PhD,4 Juan Fica, MD,5 Patricio Barriga, MD,6 Sergio Brantes, MD,1 Cristina Irribarra, MD,1 María Vallejo, MD,7 and Ítalo Campodónico, MD1

**Abstract**

Objective: This study aims to determine time differences (differences in restricted mean survival times [RMSTs]) in the onset of invasive breast cancer, coronary heart disease, stroke, pulmonary embolism, colorectal cancer, and hip fracture between the placebo group and the conjugated equine estrogens 0.625 mg plus medroxyprogesterone acetate 2.5 mg group of the Women's Health Initiative (WHI) trial based on survival curves of the original report and to provide adequate interpretation of the clinical effects of a given intervention.

Methods: Distribution of survival function was obtained from cumulative hazard plots of the WHI report; Monte Carlo simulation was performed to obtain censored observations for each outcome, in which assumptions of the Cox model were evaluated once corresponding hazard ratios had been estimated. Using estimation methods such as numerical integration, pseudovalues, and flexible parametric modeling, we determined differences in RMSTs for each outcome.

Results: The cumulative hazard plots, hazard ratios, and outcome rates obtained from the simulated model did not differ from those of the original WHI report. The differences in RMST between placebo and conjugated equine estrogens 0.625 mg plus medroxyprogesterone acetate 2.5 mg (in flexible parametric modeling) were 1.17 days (95% CI, −2.25 to 4.59) for invasive breast cancer, 7.50 days (95% CI, 2.90 to 12.11) for coronary heart disease, 2.75 days (95% CI, −0.84 to 6.34) for stroke, 4.23 days (95% CI, 1.82 to 6.64) for pulmonary embolism, −2.73 days (95% CI, −5.32 to −0.13) for colorectal cancer, and −2.77 days (95% CI, −5.44 to −0.1) for hip fracture.

Conclusions: The differences in RMST for the outcomes of the WHI study are too small to establish clinical risks related to hormone therapy use.

**Key Words: Women's Health Initiative – Menopausal hormone therapy – Menopause – Difference in restricted mean survival time.**

In 2002, the main results of the Women's Health Initiative (WHI) randomized controlled trial were published.1 This trial was carried out to assess the major risks and benefits of using a combined hormone regimen of conjugated equine estrogens 0.625 mg plus medroxyprogesterone acetate 2.5 mg (CEE/MPA) in healthy postmenopausal women. After a mean follow-up of 5.2 years, outcomes were evaluated using hazard ratios (HRs), which were obtained by Cox proportional hazards model analysis.2 Compared with placebo, the use of CEE/MPA combination showed favorable effects on hip fracture and colorectal cancer and a higher risk for invasive breast cancer, coronary heart disease, stroke, and pulmonary embolism. These results caused an overall negative impact on menopausal hormone therapy prescription.3-5 Cox proportional hazards analysis is widely used for survival analysis in clinical studies. It uses HRs to estimate treatment effects, assuming that the ratio of hazards between treatment groups does not change over time (the proportional hazards assumption).6

Nevertheless, several authors agree that HRs can be difficult to interpret clinically.6-9 Indeed, results that are statistically significant in a large sample may not translate into a clinically significant difference.10,11

Received November 27, 2014; revised and accepted March 5, 2015.

From the 1Peñalolén, Campus Oriente, Department of Obstetrics and Gynecology, School of Medicine, University of Chile, Santiago de Chile; 2Department of Public Health and Epidemiology, University of the Andes, Santiago de Chile, Chile; 3Campus Sur, Department of Medicine, School of Medicine, University of Chile, Santiago de Chile, Chile; 4Institute of Biomedicine, Research Area for Women's Health, Faculty of Medicine, Catholic University of Santiago de Guayaquil, Guayaquil, Ecuador; 5Avansalud, Santiago de Chile, Chile; 6School of Medicine, University Finis Terrae, Santiago de Chile, Chile; and 7Quilín Clinic, Clinical University Hospital Network Chile, School of Medicine, University of Chile, Santiago de Chile, Chile.

*Funding/support: None.*

*Financial disclosure/conflicts of interest: None reported. Address correspondence to: Sócrates Aedo, MD, MSc, Abel González 0336, La Cisterna, Santiago de Chile, Chile. E-mail: socratesaedo@gmail.com*

Restricted mean survival time (RMST) is a measure of mean survival from time 0 to a specified time point. In contrast to HRs obtained in Cox proportional hazards analysis, RMST does not require proportional hazards assumptions.7-9 Royston and Parmar7,8 proposed estimating and reporting RMST and expressing treatment effects as the difference in RMST between studied randomized arms at a suitable follow-up time point. Time difference in the onset of events should not be evaluated by HRs.6 Rather, difference in RMST is used as a measure to determine interpretable differences in treatment effects (complementing the information provided by HRs) to ensure a correct interpretation of the intervention.7-9
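As a notational sketch of the definition above (the symbols $S(t)$, $T$, and $\tau$ are introduced here for illustration and are not taken from the original report), the RMST up to a time horizon $\tau$ and the between-arm difference can be written as:

```latex
\[
\mathrm{RMST}(\tau) = E\left[\min(T, \tau)\right] = \int_0^{\tau} S(t)\,dt,
\qquad
\Delta\mathrm{RMST}(\tau) = \int_0^{\tau} \left[ S_{\text{placebo}}(t) - S_{\text{CEE/MPA}}(t) \right] dt
\]
```

Here $T$ is the time to the event and $S(t)$ is the survival function of the corresponding arm, so the difference is the area between the two survival curves up to $\tau$.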

To assess the clinical effects of CEE/MPA combination on postmenopausal women in the WHI study,1 we proposed to determine differences in RMST7,8 for the outcomes of the WHI study (invasive breast cancer, coronary heart disease, stroke, pulmonary embolism, colorectal cancer, and hip fracture) between placebo and CEE/MPA at 5.2-year follow-up.

**METHODS**

In the present study, we obtained log-hazard and time values for the outcomes of the WHI study per treatment group (placebo and CEE/MPA) by means of DigitizeIt version 2.03 (Ingo Bormann, 2013), using the original published cumulative hazard plots.1 With these results, the Stata program (Stata/IC version 13.1 for Windows; StataCorp LP, 2013) was used to perform log-hazard linear regression on log survival time. The linear regression provided rough estimates of the scale and shape parameters of the two survival distributions (placebo and CEE/MPA)12,13; Monte Carlo simulation was then performed12 to obtain randomized censored observations for each outcome of the WHI study.
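The regression-then-simulation step can be illustrated with a minimal Python sketch. It is not the authors' Stata code: it assumes a Weibull form for the cumulative hazard, H(t) = lam * t^gamma (so that log H is linear in log t), applies only administrative censoring at a fixed maximum follow-up, and uses made-up digitized points; all function names are hypothetical.

```python
import math
import random

def fit_weibull_from_cumhaz(times, cum_hazards):
    """Least-squares fit of log H(t) = log(lam) + gamma * log(t)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(h) for h in cum_hazards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gamma = sxy / sxx                 # shape parameter (slope)
    lam = math.exp(my - gamma * mx)   # scale parameter (exp of intercept)
    return lam, gamma

def simulate_censored(lam, gamma, n, max_follow_up, rng):
    """Draw Weibull event times by inverse transform; censor at max_follow_up."""
    data = []
    for _ in range(n):
        u = 1.0 - rng.random()        # uniform on (0, 1], avoids log(0)
        t = (-math.log(u) / lam) ** (1.0 / gamma)
        if t <= max_follow_up:
            data.append((t, 1))       # event observed
        else:
            data.append((max_follow_up, 0))  # censored at end of follow-up
    return data

# Hypothetical digitized points (years, cumulative hazard); not WHI values.
points_t = [1.0, 2.0, 3.0, 4.0, 5.0]
points_h = [0.004 * t ** 1.2 for t in points_t]
lam, gamma = fit_weibull_from_cumhaz(points_t, points_h)
sample = simulate_censored(lam, gamma, n=1000, max_follow_up=7.0,
                           rng=random.Random(0))
```

Each simulated record is a pair (time, event indicator), which is the input format expected by standard survival estimators.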

Upon simulation, the Nelson-Aalen estimator was used to obtain cumulative hazard plots for 7 years per outcome.14 In addition, corresponding HRs through the Cox model2 were determined per year of follow-up, and the respective log-rank tests were performed.14 Proportional hazards assumptions per outcome were verified graphically and using the Grambsch-Therneau test.15 Likewise, incidence rates and corresponding CIs (Poisson exact distribution)16 were determined per outcome and treatment group (placebo and CEE/MPA) at the 5.2-year time limit.
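For readers unfamiliar with the Nelson-Aalen estimator, the following Python sketch shows the calculation on a toy data set. It is an illustration only (the analysis in this study was run in Stata); the function name and the example values are hypothetical.

```python
def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time.

    times  : follow-up time per subject
    events : 1 if the event was observed, 0 if censored
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    cum_haz = 0.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        d = 0  # events observed at time t
        m = 0  # subjects leaving the risk set at time t
        while i < len(order) and order[i][0] == t:
            d += order[i][1]
            m += 1
            i += 1
        if d > 0:
            cum_haz += d / n_at_risk  # increment d_i / n_i
            curve.append((t, cum_haz))
        n_at_risk -= m
    return curve

# Toy example: five subjects, two of them censored.
toy_curve = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

The estimate is a step function that increases by the ratio of events to subjects at risk at every event time.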

RMST is the mean time for the event to occur; its clinical interpretation depends on the studied event and follow-up time.7,8 It is determined by calculating the area under the survival curve.7-9 This area can be estimated as a nonparametric curve (Kaplan-Meier survival curve) using numerical integration7,14 and/or pseudovalues.7,17 The area can also be calculated as a parametric curve using the so-called "flexible parametric" (FP) modeling,7,8 which, unlike other parametric models, is easy to implement and is more flexible, allowing for a better fit, which is why it was selected for our analysis.
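The nonparametric route (area under the Kaplan-Meier curve) can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (right-censored data, exact integration of the step function), not the Stata routines used in the study; the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier step function as [(t, S(t))] at event times."""
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    surv = 1.0
    steps = []
    i = 0
    while i < len(order):
        t = order[i][0]
        d = 0  # events at time t
        m = 0  # subjects leaving the risk set at time t
        while i < len(order) and order[i][0] == t:
            d += order[i][1]
            m += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            steps.append((t, surv))
        n_at_risk -= m
    return steps

def rmst(steps, tau):
    """Area under the step survival curve from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)    # last rectangle up to tau
    return area

# Toy check: with no censoring, RMST equals the mean of min(T, tau).
toy_rmst = rmst(kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1]), tau=4)
```

In the toy case the result equals the plain average of the four event times, which is the expected behavior when no observation is censored before the horizon.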

FP modeling should be understood as a regression methodology in which the dependent variable is the survival measure for the studied outcome. This methodology uses transformation of independent variables (restricted cubic splines).18,19 In our case, we used intervention (treatment group) and the interaction between treatment and time. Transformation of independent variables generates different FP models. Final selection is based on likelihood (<0.05), Bayesian information criteria (BIC), and Akaike information criteria (AIC), which determine the best fit. Degrees of freedom (df) and degrees of freedom for each time-dependent effect (DFTVC) indicate the transformation (number of knots) of independent variables.18,19 RMST (in days) was determined per treatment group (placebo and CEE/MPA) and outcome at 5.2 years, using estimation methods such as numerical integration,7,14 pseudovalues,7,17 and FP modeling.7,8 Differences in RMST (in days) at 5.2 years between placebo and CEE/MPA and the corresponding CIs (normal distribution) were calculated for each outcome of the WHI study.7,8
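The normal-approximation interval for a difference in RMST can be illustrated as follows. The sketch assumes a symmetric Wald-type interval (difference ± z × SE) and, as an example, backs out the standard error implied by the coronary heart disease interval reported in the abstract (7.50 days; 95% CI, 2.90 to 12.11). The function names are hypothetical, and the implied SE and P value are derived here rather than reported by the authors.

```python
from statistics import NormalDist

def diff_ci(diff, se, level=0.95):
    """Wald-type confidence interval for a difference in RMST."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff - z * se, diff + z * se

def implied_se(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal-based interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)

def two_sided_p(diff, se):
    """Two-sided P value for H0: difference = 0 under normality."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(diff / se)))

# Coronary heart disease, FP model (values from the abstract, in days).
se_chd = implied_se(2.90, 12.11)
low, high = diff_ci(7.50, se_chd)
p_chd = two_sided_p(7.50, se_chd)
```

Because the published limits are rounded, the reconstructed interval matches them only to within rounding error.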

**RESULTS**

Before simulation, Kaplan-Meier survival curves for different outcomes were obtained from cumulative hazard plots of the original WHI report.1 These Kaplan-Meier survival curves showed a small separation area between placebo and treatment curves that, at first glance, seemed to overlap.

In our simulation of the WHI trial report, HRs obtained from Cox proportional hazards analysis at 5.2 years showed values with significant CIs for coronary heart disease, stroke, pulmonary embolism, colorectal cancer, and hip fracture (Fig.).

Table 1 shows a wide variation in HRs for the various outcomes. The log-rank test showed a trend similar to that observed for HRs (Table 1).

The Grambsch-Therneau test was used to assess the assumptions of the Cox model. This showed significant values (P < 0.05) for coronary heart disease and invasive breast cancer outcomes at 7 years. Significant values were not found for the other outcomes. Graphic examination of cumulative hazards of simulated data for invasive breast cancer showed evident crossing of the curves. A similar crossing was observed for stroke and colorectal cancer; however, it was mild and confined to the initial part of the survival curves (Fig.). In our simulation model, rates per outcome showed significant differences in pulmonary embolism between the placebo group and the CEE/MPA group (Table 2).

Adjustment measures and criteria used for selecting the FP model for each outcome were as follows: invasive breast cancer (df2; DFTVC2; AIC, 3,899.97; BIC, 3,946.28), coronary heart disease (df1; DFTVC1; AIC, 4,385.74; BIC, 4,416.61), stroke (df1; AIC, 3,242.98; BIC, 3,266.14), pulmonary embolism (df1; DFTVC1; AIC, 1,624.07; BIC, 1,654.95), colorectal cancer (df1; DFTVC1; AIC, 1,849.18; BIC, 1,880.05), and hip fracture (df2; AIC, 1,888.8; BIC, 1,919.67).

RMST showed differences of less than 1 day for each studied outcome when methodologies (numerical integration, pseudovalues, and FP modeling) were compared at 5.2 years (Table 3).

Using estimation methods (numerical integration, pseudovalues, and FP modeling), we found significant differences in coronary heart disease, pulmonary embolism, and colorectal cancer between RMST for placebo and RMST for CEE/MPA (P < 0.05). Using the FP estimation method, we found differences in RMST for hip fracture to be equally significant (P < 0.05; Table 3). The power for hypothesis testing of the difference of two means (two-tailed; P < 0.05) for colorectal cancer and hip fracture was below 54%. Contrary to this, for coronary heart disease and pulmonary embolism, power was higher than 88% (Table 3).

Differences in RMST at 5.2 years for each outcome are presented in Table 3. Using estimation methods (pseudovalues and numerical integration), we found the largest observed absolute difference for coronary heart disease (mean, 8.28 d) and the smallest observed absolute difference for invasive breast cancer (mean, 1.07 d; Table 3).

**DISCUSSION**

The original observations of the WHI study1 were presented as cumulative hazard plots for each clinical outcome. The simulation technique performed in our study used an inverse mechanism that reproduced observations from original plots.12 Upon simulation, we obtained cumulative hazard plots (Fig.), HRs per year, and event rates at 5.2 years (Tables 1 and 2), which were similar to those of the original WHI study,1 therefore indicating the plausibility of inferring our results from those found in the WHI study.1

The WHI report1 and our simulation estimated HRs using Cox proportional hazards analysis.2 Both studies showed crossing of the curves for invasive breast cancer, indicating that HR was not constant across time. This observation was corroborated by other statistical tests (linear tendency test and Grambsch-Therneau test). When hazards are not proportional across time, it would be misleading to report treatment effects through HRs obtained by Cox proportional hazards analysis.6,7,20,21 In these cases, using other methodologies is advised.7,21

As shown in Table 1, the observed changes in HRs for each outcome on follow-up may reflect the natural variation of HRs and/or the fact that, for each outcome, there are subpopulations that possibly do not share the same risk profile. This will hinder the interpretation of HRs obtained from Cox proportional hazards analysis (under proportional hazards assumptions).

RMSTs at 5.2 years and their corresponding 95% CIs for the various outcomes of the WHI simulation display similar values (differences in the thousandths) with the two different estimation methods used (numerical integration and pseudovalues). This is attributable to the fact that both methods quantify the area under the nonparametric curve.7,14,17

The Royston-Parmar methodology (restricted cubic splines)18,19 generates FP regression models for independent variables (for this case, intervention and the interaction of intervention with time).

Likelihood, AIC, and BIC are measures assessing the prediction capability of FP regression for real data, allowing the selection of the best-fitting model. For example, the models selected for hip fracture and stroke used intervention (treatment) as independent variable. Invasive breast cancer, coronary heart disease, pulmonary embolism, and colorectal cancer used intervention and the interaction between intervention and time as independent variables. Therefore, FP models allow for better adjustment because they treat time as a continuous variable and allow the effect of the intervention to vary with time. RMST values in the FP model are slightly different (differences of <1 d) from those estimated in the nonparametric curve. This agreement between the different methods used for estimating RMST indicates the robustness of the methodology employed.

In our analysis (FP model) at 5.2-year follow-up, the RMST or mean time of onset of coronary heart disease was 1,878.1 and 1,885.6 days for CEE/MPA and placebo, respectively, with a difference of 7.5 days between groups (P < 0.05; Table 3).

This means that, at 5.2 years of follow-up, a coronary heart event occurs, on average, 7.5 days earlier in women using hormone therapy compared with placebo. Our simulation, like the WHI trial,1 shows that, for a mean follow-up of 5.2 years, the HR for coronary heart disease is 1.29 (95% CI, 1.02 to 1.63; Table 1).
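For scale, the reported coronary heart disease figures can be set against the length of the follow-up window. The conversion of 365.25 days per year is an assumption made here for illustration, not a value taken from the report:

```latex
\[
\Delta\mathrm{RMST} = 1885.6 - 1878.1 = 7.5 \text{ days},
\qquad
\tau = 5.2 \times 365.25 \approx 1899.3 \text{ days},
\qquad
\frac{\Delta\mathrm{RMST}}{\tau} \approx \frac{7.5}{1899.3} \approx 0.4\%
\]
```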

There was statistical significance for RMST and HR. However, one must consider that this fact does not imply causality22; in studies with large sample sizes, statistical significance can be achieved with minimal differences.10,11 In this sense, RMST offers a measure of time when an event occurs, whereas HRs can be interpreted as the instantaneous relative risk of an event per unit time for an individual with risk factors present compared with an individual with risk factors absent, given that both individuals have survived across time and are similar in all other covariates.16 For this reason, interpreting HRs to establish clinical differences becomes difficult. Thus, when RMST for placebo is compared with RMST for CEE/MPA at 5.2 years for outcomes such as coronary heart disease, pulmonary embolism, colorectal cancer, and hip fracture, P values lower than 0.05 do not translate into a clinically significant difference (Table 3). Likewise, between-group (placebo and CEE/MPA) differences in RMST at 5.2 years for the time of onset of outcomes in the WHI study showed a small difference that does not allow establishment of a reliable clinical prediction of risk.

Difference in RMST is the difference in Kaplan-Meier survival curve areas between intervention groups (placebo and CEE/MPA). In the original report (published cumulative hazard plots were transformed into Kaplan-Meier survival curves), the observation for each outcome of small areas of separation between intervention groups confirms the aforementioned findings. In addition to the difficulty of interpreting HRs, one must add the measurement and selection biases23-25 reported for the WHI trial, which may possibly explain the observed results.

**CONCLUSIONS**

Upon simulation of WHI results, differences in RMST at 5.2 years for the onset of invasive breast cancer, coronary heart disease, stroke, and pulmonary embolism are too small to establish risks related to hormone therapy use.

**REFERENCES**

1. Rossouw JE, Anderson GL, Prentice RL, et al. Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA 2002;288:321-333.

2. Cox D. Regression models and life-tables. J R Stat Soc 1972;34:187-220.

3. Rossouw JE, Manson JE, Kaunitz AM, Anderson GL. Lessons learned from the Women's Health Initiative trials of menopausal hormone therapy. Obstet Gynecol 2013;121:172-176.

4. Aedo S, Schiattino I, Cavada G, Porcile A. Quality of life in climacteric Chilean women treated with low-dose estrogen. Maturitas 2008;61:248-251.

5. Sturmberg JP, Pond DC. Impacts on clinical decision making: changing hormone therapy management after the WHI. Aust Fam Physician 2009;38:249-251, 253-255.

6. Spruance SL, Reid JE, Grace M, Samore M. Hazard ratio in clinical trials. Antimicrob Agents Chemother 2004;48:2787-2792.

7. Royston P, Parmar MK. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med 2011;30:2409-2421.

8. Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol 2013;13:152.

9. Tian L, Zhao L, Wei LJ. Predicting the restricted mean event time with the subject's baseline covariates in survival analysis. Biostatistics 2014;15:222-233.

10. Ferrill MJ, Brown DA, Kyle JA. Clinical versus statistical significance: interpreting P values and confidence intervals related to measures of association to guide decision making. J Pharm Pract 2010;23:344-351.

11. Sarmukaddam SB. Interpreting "statistical hypothesis testing" results in clinical research. J Ayurveda Integr Med 2012;3:65-69.

12. Royston P. Tools to simulate realistic censored survival-time distributions. Stata J 2012;12:639-654.

13. Crowther MJ, Lambert PC. Simulating biologically plausible complex survival data. Stat Med 2013;32:4118-4134.

14. Cleves M, Gould W, Gutierrez RG, Marchenko YV. An Introduction to Survival Analysis Using Stata. 3rd ed. College Station, TX: Stata Press; 2010:412.

15. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515-526.

16. Rosner B. Fundamentals of Biostatistics. 7th ed. Boston, MA: Cengage Learning; 2010:859.

17. Andersen PK, Perme MP. Pseudo-observations in survival analysis. Stat Methods Med Res 2010;19:71-99.

18. Lambert PC, Royston P. Further development of flexible parametric models for survival analysis. Stata J 2009;9:265-290.

19. Royston P, Lambert PC. Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model. College Station, TX: Stata Press; 2011:347.

20. Bellera CA, MacGrogan G, Debled M, de Lara CT, Brouste V, Mathoulin-Pélissier S. Variables with time-varying effects and the Cox model: some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Med Res Methodol 2010;10:20.

21. Uno H, Claggett B, Tian L, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol 2014;32:2380-2385.

22. Rothman KJ, Greenland S. Causation and causal inference in epidemiology. Am J Public Health 2005;95(suppl 1):S144-S150.

23. Shapiro S, Farmer RD, Mueck AO, Seaman H, Stevenson JC. Does hormone replacement therapy cause breast cancer? An application of causal principles to three studies: Part 2. The Women's Health Initiative: estrogen plus progestogen. J Fam Plann Reprod Health Care 2011;37:165-172.

24. Tan O, Harman SM, Naftolin F. What can we learn from design faults in the Women's Health Initiative randomized clinical trial? Bull NYU Hosp Jt Dis 2009;67:226-229.

25. Rossouw JE, Prentice RL, Manson JE, et al. Postmenopausal hormone therapy and risk of cardiovascular disease by age and years since menopause. JAMA 2007;297:1465-1477.