Upsala J Med Sci 87: 189-199, 1982

Standard Computer Programs in Statistical Analysis of Survival in Childhood Lymphoblastic Leukemia

Bertil W. Anderson¹ and Goran Gustafsson²

From the ¹Department of Statistics, University of Uppsala and the ²Department of Pediatrics, University Hospital, Uppsala, Sweden

ABSTRACT

Data on all children in Sweden in whom acute lymphoblastic leukemia was diagnosed in the years 1973-80 were analysed statistically. The total number of children was 505. Studies were made of 38 different variables, using frequency tables, cross tables, life table studies (1) and linear regression analysis according to Cox's method (2, 4). Chi-square tests and log rank tests were included in the methods. The combination of life table studies and linear regression analysis proved to be of value in assessing the significance of different parameters and treatment programs with regard to prognosis.

The aim of this paper is to present a method for analysing a patient material with the use of standard computer programs. The results of the total analysis will be published elsewhere (3).

INTRODUCTION

Acute lymphoblastic leukemia (ALL) is a malignant disease which can occur in children of all ages. On the basis of age, white blood cell count (WBC) at diagnosis and the presence or absence of central nervous system (CNS) involvement and/or of a mediastinal tumor at diagnosis, the children were classified as suffering from "high-risk leukemia" or "standard-risk leukemia" (3).

The children first received induction treatment for six weeks, and if this was successful they were classified as being in complete remission. When remission was not achieved, the children died as a result of the disease and/or the treatment.

After remission, prophylactic radiation of the CNS was given, followed by maintenance therapy. Therapy was discontinued after three years in complete continuous remission (CCR).
Relapses of the disease may occur during therapy or after discontinuation of therapy, in the bone marrow, CNS, testes or other organs, or in a combination of these locations. Following relapse, a second remission may be induced and the child may survive, or new relapses may terminate life. Death may also occur during a remission period from causes other than the disease, e.g. infection.

All analysed possible outcomes of the disease are presented schematically in Figure 1.

[Figure 1: schematic diagram leading from diagnosis to the close date. Outcome labels: Dead before remission; Dead in CCR; Dead after relapse(s) (DREL-ONTHER); Dead after relapse(s) (DREL-OFFTHER); Alive after relapse(s) (AREL-ONTHER); Alive in CCR (ACCR, ACCR-OFFTHER); Alive after relapse(s) (AREL-OFFTHER).]

Fig. 1. Possible outcomes of leukemia in children. For abbreviations, see text. Note: The notched line represents the whole group in ACCR, i.e. also those who have been treated for a shorter time than three years.

MATERIAL AND ANALYTICAL PROCEDURES

In the years 1973-80, acute lymphoblastic leukemia was diagnosed in 505 children in Sweden. For these children, 38 clinical variables, for which information was taken from the medical records, were analysed. These 38 variables were divided into four groups:

1. Identification variables at diagnosis
Name, month and year of birth, age, sex, hospital, home county, municipality and parish, date of diagnosis, presence or absence of CNS leukemia or mediastinal tumor, WBC, immunological classification, risk group, dominating symptom at diagnosis.

2. Therapy
Type of induction therapy, consolidation therapy, CNS prophylaxis therapy, and maintenance therapy, and their side effects. Treatment program.

3. Treatment results (time variables in months)

a) Duration of first remission
TCCR = Time in CCR, i.e. length of time from achieved remission to death during remission, to first relapse or to close date.
Every child with achieved remission had a value of one month or more for this variable. If the child died during induction the value was 00.
ACCR = Alive in CCR, i.e. length of time from achieved remission to close date. Only children who were in CCR at the close date had a value for this variable.
TCCR-OFFTHER = Time in CCR OFF THERAPY, i.e. length of time from discontinuation of therapy to death during remission, to first relapse or to close date. Every child with discontinuation of therapy after 3 years in CCR had a value for this variable.
ACCR-OFFTHER = Alive in CCR OFF THERAPY, i.e. length of time from discontinuation of therapy to close date. Only children who were in CCR at the close date had a value for this variable.

b) Patients alive at close date but after relapse
AREL-ONTHER-REM = Alive after RELapsing ON THERAPY, i.e. length of time from achieved remission to close date for children relapsing during therapy.
AREL-ONTHER-RELAPSE = Alive after RELapsing ON THERAPY, i.e. length of time from first relapse to close date for children relapsing during therapy.
AREL-OFFTHER-REM = Alive after RELapsing OFF THERAPY, i.e. length of time from achieved remission to close date for children relapsing after discontinuation of therapy.
AREL-OFFTHER-RELAPSE = Alive after RELapsing OFF THERAPY, i.e. length of time from first relapse to close date for children relapsing after discontinuation of therapy.

c) Dead patients
DCCR = Died during CCR, i.e. length of time from achieved remission to death during CCR.
DREL-ONTHER = Died after RELapsing ON THERAPY, i.e. length of time from achieved remission to death for children relapsing during therapy.
DREL-OFFTHER = Died after RELapsing OFF THERAPY, i.e. length of time from achieved remission to death for children relapsing after discontinuation of therapy.

4. Other variables
REL1 = Location of first relapse during therapy.
REL2 = Location of second relapse during therapy.
REL-OFFTHER = Location of first relapse after discontinuation of therapy.
CDCCR = Cause of death during CCR (e.g. infection).
TREL1-REL2 = Length of time in months between first and second relapse.

Measurements on the 38 variables for the 505 children constituted the data set.

The data set

In order to minimize coding errors, the data set was examined thoroughly in the following three steps:
- the data set was printed and compared with the medical records,
- frequency tables were used for checking missing values and outliers,
- cross tabulation was done to check that categorical responses were correctly classified.

Life tables and survival functions

In the commonly used method, with for example 5-year survival, information about patients participating in the study for a shorter time than five years would not be utilized. The proportion of patients surviving 5 years would in this case be:

\[
p_5 = \frac{\text{Number of patients alive after five years in the study}}{\text{Number of patients participating in the study for at least five years}}
\]

The life table technique, on the other hand, utilizes more information by computing this proportion as a cumulative proportion of surviving children. In principle this can be written as follows:

\[
P_5 = p_1 \cdot p_2 \cdot p_3 \cdot p_4 \cdot p_5
\]

where p_1 is the proportion surviving one year, p_2 the proportion surviving two years provided that the patients survived the first year, and so on. This technique also provides a good idea of the course of the disease.

The problem with different starting and follow-up times is solved by rescaling the time variables so that all the patients start at time 0. The end point can be one of the following:
1) Dead (response), i.e. died during CCR or relapsed.
2) Withdrawn, i.e. alive in CCR at the end of the study (close date).
3) Lost, i.e. patients lost at follow-up.

The hazard function and the density function are two ways of getting an idea of which parametric models might describe the survival time.
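The life-table quantities discussed in this section can be sketched in a few lines of Python. This is a minimal illustration, not the BMDP program itself: the names `Interval` and `life_table` are hypothetical, and the half-count adjustment used for the exposed number is an assumption inferred from Table 1 (for example 216 - 24/2 = 204.0), since the text does not state it. The input counts are the first three intervals of Table 1, and the interval width is taken as 94/15 months because the table divides 0 to 94 months into 15 equal intervals.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Interval:
    width: float    # h_i, width of the interval in months
    entered: int    # patients entering the interval
    withdrawn: int  # alive in CCR at close date within the interval
    lost: int       # lost at follow-up within the interval
    dead: int       # responses (death or relapse) in the interval


def life_table(intervals: List[Interval]) -> List[Dict[str, float]]:
    """Actuarial life table: one result row per interval."""
    rows = []
    cum_surv = 1.0  # cumulative proportion surviving to the START of the interval
    for iv in intervals:
        # Assumption: withdrawn and lost patients count as exposed for half the interval.
        exposed = iv.entered - (iv.withdrawn + iv.lost) / 2.0
        q = iv.dead / exposed if exposed > 0 else 0.0  # proportion dead
        p = 1.0 - q                                    # proportion surviving
        hazard = 2.0 * q / (iv.width * (1.0 + p))      # failure rate per month
        density = cum_surv * q / iv.width              # probability of response per month
        rows.append({
            "exposed": exposed,
            "prop_dead": q,
            "prop_surviving": p,
            "cum_surv": cum_surv,
            "hazard": hazard,
            "density": density,
        })
        cum_surv *= p  # carry the running product into the next interval
    return rows


WIDTH = 94.0 / 15.0  # printed as 6.27 in Table 1

first_three = [
    Interval(WIDTH, entered=216, withdrawn=24, lost=0, dead=11),
    Interval(WIDTH, entered=181, withdrawn=10, lost=0, dead=13),
    Interval(WIDTH, entered=158, withdrawn=10, lost=0, dead=21),
]

rows = life_table(first_three)
for r in rows:
    print({k: round(v, 4) for k, v in r.items()})
```

Rounded to four decimals, the rows agree with the corresponding printed values in Table 1, for example an exposed number of 204.0 and a proportion dead of 0.0539 in the first interval, and a cumulative survival of 0.8762 at the start of the third.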
The hazard function (failure rate), \lambda_i, is defined as:

\[
\lambda_i = \frac{2 q_i}{h_i (1 + p_i)}
\]

where
h_i = the width of the i'th interval,
q_i = probability of dying in interval i,
p_i = 1 - q_i.

The density function (probability of death or relapse per unit time), f_i, is defined as:

\[
f_i = \frac{P_i \, q_i}{h_i}
\]

where P_i = the estimate of the cumulative proportion surviving to the beginning of the i'th interval.

The density is sometimes called the curve of death and is in fact an absolute instantaneous rate of death or relapse.

The standard errors computed for the survival, hazard and density functions are used for computing confidence intervals and performing tests.

Tables 1 and 2 and Figure 2 show the computer print out of the life table and survival analysis from the program BMDP P1L, 1977 (1).

RESULTS

Table 1. Example of survival analysis for female patients with achieved remission (computer print out).

LIFE TABLE AND SURVIVAL ANALYSIS.
TIME VARIABLE IS TIDICCR.
GROUPING VARIABLE IS KON. LEVEL IS F:

                                                       PROPORTION  PROPORTION
INTERVAL      ENTERED  WITHDRAWN  LOST  DEAD  EXPOSED  DEAD        SURVIVING
 0.0 - 6.27     216       24        0    11    204.0    0.0539      0.9461
 6.27-12.53     181       10        0    13    176.0    0.0739      0.9261
12.53-18.80     158       10        0    21    153.0    0.1373      0.8627
18.80-25.07     127       11        0    20    121.5    0.1646      0.8354
25.07-31.33      96        8        0     8     92.0    0.0870      0.9130
31.33-37.60      80        9        0     4     75.5    0.0530      0.9470
37.60-43.87      67        9        0     5     62.5    0.0800      0.9200
43.87-50.13      53        9        0     4     48.5    0.0825      0.9175
50.13-56.40      40        8        0     1     36.0    0.0278      0.9722
56.40-62.67      31        8        0     1     27.0    0.0370      0.9630
62.67-68.93      22        4        0     0     20.0    0.0         1.0000
68.93-75.20      18        5        0     0     15.5    0.0         1.0000
75.20-81.47      13        4        0     0     11.0    0.0         1.0000
81.47-87.73       9        2        0     0      8.0    0.0         1.0000
87.73-94.00       7        7        0     0      3.5    0.0         1.0000

              CUMULATIVE
INTERVAL      SURVIVAL   (S.E.)    HAZARD   (S.E.)    DENSITY  (S.E.)
 0.0 - 6.27    1.0000    0.0       0.0088   0.0027    0.0086   0.0000
 6.27-12.53    0.9461    0.0158    0.0122   0.0034    0.0112   0.0000
12.53-18.80    0.8762    0.0237    0.0235   0.0051    0.0192   0.0000
18.80-25.07    0.7559    0.0318    0.0286   0.0064    0.0199   0.0000
25.07-31.33    0.6315    0.0368    0.0145   0.0051    0.0088   0.0000
31.33-37.60    0.5766    0.0384    0.0087   0.0043    0.0049   0.0000
37.60-43.87    0.5460    0.0393    0.0133   0.0059    0.0070   0.0000
43.87-50.13    0.5024    0.0407    0.0137   0.0069    0.0066   0.0000
50.13-56.40    0.4609    0.0423    0.0045   0.0045    0.0020   0.0000
56.40-62.67    0.4481    0.0430    0.0060   0.0060    0.0026   0.0000
62.67-68.93    0.4315    0.0445    0.0      0.0       0.0      0.0
68.93-75.20    0.4315    0.0445    0.0      0.0       0.0      0.0
75.20-81.47    0.4315    0.0445    0.0      0.0       0.0      0.0
81.47-87.73    0.4315    0.0445    0.0      0.0       0.0      0.0
87.73-94.00    0.4315    0.0445    0.0      0.0       0.0      0.0

QUANTILE         ESTIMATE   STANDARD ERROR
75TH              19.10        2.28
MEDIAN (50TH)     44.22       10.86

Explanations of the table head:
TIME VARIABLE IS TIDICCR = Time from onset to response or close date.
KON = Sex; LEVEL IS F = Sex is female.
INTERVAL = Time in months in CCR.
ENTERED = Number of patients with a time in CCR corresponding to the interval in question.
DEAD = Number of patients responding in the interval in question, i.e. patients dying or relapsing in the interval.

The most important function in Table 1 is the CUMULATIVE SURVIVAL, which forms the basis of the survival curves in Figure 2. The table also gives the median estimate in the material, i.e. the time in months at which half the patients have responded.

Table 2 gives a summary of the analyses presented in Table 1 for female and male patients separately. The test statistics in Table 2 represent the results of two non-parametric rank tests for comparison of the cumulative survival functions. The low p values indicate a difference between the two survival functions.

Table 2. Table summarizing the survival analyses. Test statistics for comparing the proportions of females and males in CCR.

SUMMARY TABLE
                                      PERCENT
           TOTAL    DEAD   CENSORED   CENSORED
F           216      88      128      0.5926
M           264     146      118      0.4470
TOTALS      480     234      246

TEST STATISTICS                    STATISTIC   D.F.   P-VALUE
GENERALIZED WILCOXON (BRESLOW)      18.065      1     0.0000
GENERALIZED SAVAGE (MANTEL-COX)     14.878      1     0.0001

DEAD = Number of patients who have responded, i.e. died in CCR or relapsed.
CENSORED = Number of patients withdrawn, i.e. the number of patients in CCR at close date.

Fig. 2 is a graphical illustration of the cumulative proportions of females and males surviving in CCR as shown in Table 1.

By using grouping variables, in this case sex, and comparing the times to response for different values of the grouping variables, good information on prognostic factors such as sex, age and WBC is obtained. A further possibility is to carry out the same analysis for two or more grouping variables, e.g. duration of remission for different risk groups of female and male patients.

[Figure 2: line-printer plot. Vertical axis: cumulative proportion surviving. Horizontal axis: time in CCR in months. F = female, M = male.]

Fig. 2. Plot of the cumulative proportions of females (F) and males (M) surviving in CCR versus time in CCR in months.

The PHGLM Procedure (2, 4)

The Cox proportional hazards linear model for one dependent variable can be used to determine the "best" variable to add to a model explaining time in CCR (TCCR), i.e. the variation in TCCR is explained by a set of explanatory variables. As these variables sometimes explain the same variation (are correlated with each other), the strength obtained for each variable explaining TCCR is its strength given that the other variables are in the model.
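For orientation, the model fitted by a proportional hazards procedure can be written in its standard textbook form. The notation below is an assumption of this sketch and is not taken from the print out: h_0(t) denotes an unspecified baseline hazard, x_1, ..., x_k the explanatory variables, and \beta_1, ..., \beta_k the regression coefficients, which correspond to the column BETA in the output of the procedure.

```latex
\[
h(t \mid x_1, \dots, x_k) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k\right)
\]
```

Under this form, \exp(\beta_j) is the factor by which the hazard of response (relapse or death) is multiplied when x_j increases by one unit while the other variables are held fixed, so a positive coefficient corresponds to a shorter expected time in CCR.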
Table 3 is the computer print out taken from the last step in the PHGLM Procedure, SAS Supplemental Library User's Guide, 1980 (4). In the print out, BETA is comparable with the parameters in a multiple linear regression model. CHI-SQUARE is a measure of the strength of the variable and the P value is the level of significance for the variable in the model. The D value gives a measure of the contribution of each variable to explaining the variation in TCCR.

The solution shows which of the variables affecting the duration of CCR are the most important, and also gives a measure of the strength of these variables.

Table 3. Summary of the PHGLM Procedure (computer print out).

SURVIVAL IN ALL
STEPWISE PROPORTIONAL HAZARDS GENERAL LINEAR MODEL PROCEDURE
16:10 TUESDAY, DECEMBER 1, 1981

DEPENDENT VARIABLE: TIDICCR   SURVIVAL TIME
EVENT INDICATOR: LIC

VARIABLE   BETA         STD. ERROR   CHI-SQUARE   P        D
LPK        0.00266249   0.00041516     41.13      0.0000   0.124
KON        0.31054665   0.13139369      5.59      0.0181   0.019
ALDER      0.00365191   0.00169567      4.64      0.0313   0.016
MEDT       0.17321388   0.08517534      4.14      0.0420   0.014

CHI-SQUARE Q STATISTICS ADJUSTED ONLY FOR VARIABLES IN THE MODEL

VARIABLE   CHI-SQUARE   P        D
CNS           0.03      0.8714   0.000

NO ADDITIONAL VARIABLES MET THE 0.1000 SIGNIFICANCE LEVEL FOR ENTRY.

Explanations to the Table:
LPK = WBC, KON = Sex, ALDER = Age, MEDT = Mediastinal tumor.
The variable CNS is not included in the model because it does not contribute enough to the explanation.
The higher the D value of a variable, the stronger the influence of this variable on the duration of CCR.

COMMENTS

The aim of this communication is to demonstrate in a practical way how we have used standard computer programs in the evaluation of the influence of different clinical parameters on the outcome of a malignant disease.
The most important factor in this kind of analysis is the quality of the selected material. This must be as complete as possible and selection should be avoided. If there is selection, its consequences must be analysed separately. Selection always implies a risk of irrelevant correlations, which can lead to wrong conclusions concerning the material. In our case there is no known selection, as the material includes all known cases of ALL in children in Sweden during the period in question. No child was lost at follow-up, which gives important strength to the material.

Frequency tables and cross tables analyse the material with regard to the distribution of different variables, e.g. age, sex, risk group, location of relapse, etc. The variables can be tabulated against each other in any desired way. For instance, the relation between duration of CCR and age or sex can easily be determined, but the tables are difficult to read and the results are not easy to evaluate.

Life table analyses (1) offer better possibilities than frequency tables and cross tables of studying how the duration of CCR varies with clinical parameters and different treatment programs. The life table method gives a graphical illustration of time in CCR against parameters such as age, WBC, treatment programs and so on. It also permits mutual comparisons of subgroups in the material, e.g. "standard risk patients" against "high risk patients" with regard to sex or age. These analyses will yield variables explicitly describing the duration of CCR. The problem is that in one individual patient, different parameters often interact with regard to the outcome of the disease. It may thus be difficult to estimate the effect of a single parameter. We have used a linear regression analysis as described by Cox (2) to solve this problem.
This method yields a ranking of the variables with regard to their influence on the outcome of the disease (Table 3).

Thus we have evaluated the strength of various "high risk criteria" in childhood lymphoblastic leukemia.

ACKNOWLEDGEMENTS

We wish to express our most sincere thanks to A. Christofferson and A. Kreuger for valuable help and discussions. Financial support was obtained from the Swedish Cancer Society and Selander's Foundation. Patient material was obtained from the Members of the Swedish Child Leukemia Group and from the 45 Departments of Pediatrics in Sweden, who are gratefully acknowledged.

REFERENCES

1. Benedetti, J. & Yuen, K.: Life tables and survival functions. In: Bio-Medical Computer Programs, P-series (ed. M.B. Brown), pp. 743-770. University of California Press, Los Angeles, 1977.
2. Cox, D.R.: Regression models and life tables. J Roy Statist Soc B 34:187-220, 1972.
3. Gustafsson, G., Kreuger, A. & Dohlwitz, A.: Acute lymphoblastic leukemia in Swedish children 1973-78. Acta Paediatr Scand 70:609-614, 1981.
4. Harrell, F.: The PHGLM Procedure. In: SAS Supplemental Library User's Guide, pp. 119-131. SAS Institute Inc., Cary, North Carolina, USA, 1980.

Accepted December 15, 1981.

Address for reprints:
Goran Gustafsson, M.D.
Department of Pediatrics
University Hospital
S-750 14 Uppsala
Sweden