Ibn Al-Haitham Jour. for Pure & Appl. Sci. Vol. 27 (3) 2014 | Mathematics

Estimate Complete the Survival Function for Real Data of Lung Cancer Patients

Abbas N. Salman
Ibtehal H. Farhan
Dept. of Mathematics / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received in: 8 June 2014, Accepted in: 1 September 2014

Abstract
In this paper, we estimate the survival function for lung cancer patients using several nonparametric estimation methods. The estimates are based on a sample of complete real data describing the survival duration of patients with lung cancer, measured from diagnosis of the disease or from admission to hospital, over a period of two years (from the start of 2012 to the end of 2013). The estimation methods are compared using the mean squares error as a statistical indicator, and we conclude that the shrinkage method gives the best estimate of the survival function for lung cancer.

Keywords: Nonparametric Estimation, Lung Cancer Disease, Complete Real Data, Empirical Estimator, Borkowf-type Estimator, Thompson-type Estimator, Nelson-Aalen Estimator, Mean Squares Error

1. Introduction
1.1: Preface and History
Survival analysis is one of the most widely used techniques in medical statistics; it is also important in diverse fields such as medicine, engineering, epidemiology, biology, economics, physics, public health, and event history analysis in sociology. Survival analysis involves the modeling of time-to-event data. In this context, death or failure is considered an "event" in the survival analysis literature; traditionally only a single event occurs for each subject, after which the organism or mechanism is dead or broken [6].
Cancer is a class of diseases in which a cell or group of cells displays uncontrolled growth, invasion, and sometimes spread to other locations in the body via lymph or blood (metastasis) [2]. Lung cancer is among the most common cancers in the world, and cigarette smoking causes most types of lung cancer: the more cigarettes smoked per day and the younger the age at which smoking began, the greater the risk of lung cancer. High levels of air pollution and exposure to radiation and asbestos may also increase the risk of lung cancer.

Nelson, W. [11] presents the theory and applications of a simple graphical method, called hazard plotting, for the analysis of multiply censored life data consisting of failure times of failed units intermixed with running times on unfailed units. Applications of the method are given to multiply censored data on the service life of equipment, to strength data on an item with different failure modes, and to biological data multiply censored on both sides from paired comparisons. The theory for the hazard plotting method, which is based on the hazard function of a distribution, is developed from the properties of order statistics from Type II multiply censored samples.

Peterson, A. V. [12] proved that the Kaplan-Meier estimator is consistent and proposed an estimator for the cumulative hazard function.

Haifa, K. [5] estimated the reliability function for the tools of the 14 Ramadan textile factory using the nonparametric Kaplan-Meier method. She compared the Kaplan-Meier estimate with the reliability function obtained when the failure data follow an exponential distribution and concluded that there were no significant differences between the two estimates.
Al-Qurashi, I. K. [1] suggested two formulas for estimating the reliability function for data of any size, especially small samples, without recourse to theoretical probability distributions, and compared the proposed formulas with other parametric and nonparametric estimation methods.

Borkowf, C. B. [3] proposed a survival function within the framework of the Kaplan-Meier survival function, called the shrunken Kaplan-Meier survival function. The shrunken Kaplan-Meier survival function depends on the number of cases n in the study, and Borkowf showed that the resulting estimators performed better than Greenwood's and Peto's estimators. His study analyzed only the variance estimators.

Mei-C. W. [9] presented some nonparametric estimation methods in survival analysis and gave summary notes for survival analysis in biostatistics.

Basher, F. M. [4] presented some nonparametric methods for estimating the reliability function, with a practical application, using eight estimation methods: Empirical (EM), Product Limit (PLEM), Empirical Kaplan-Meier (EKMEM), Weighted Empirical Kaplan-Meier (WEKM), Modified Kaplan-Meier (MKMM), a weighted method for the reliability function (WMR), Modified One (MMO), and Modified Two (MMT). The study concluded that the best nonparametric method was the Empirical (EM) method, according to the integrated mean square error (IMSE).

Knowledge of statistics is important in pivotal trials, both for the method of data analysis and for the evaluation of results [10].

In this paper, we rely on real data for patients with lung cancer, which may be the type of cancer that kills the most people. The sample consists of 118 patients (68 males and 50 females) observed during the years 2012 and 2013.
The aim of this paper is to estimate the survival function for the complete real data described above. The estimation methods are compared using the mean squares error as a statistical indicator, leading to the conclusion that the shrinkage method gives the best estimate of the survival function for lung cancer.

Kaplan, E. L. & Meier, P. [6] suggested estimating the conditional probability of failure at time t by the observed proportion of failures at time t, and combined these estimates in the usual manner to obtain an estimate of the underlying survival distribution S(t). They studied the properties of the proposed estimates and concluded that the maximum likelihood estimate was strongly consistent and asymptotically normal.

1.2 Basic Concepts
1.2.1 Survival Function
The object of primary interest is the survival function, conventionally denoted by S, which is defined as [7]:

$$S(t) = \Pr(T > t) \tag{1}$$

where T is a random variable representing the time of death and t is a specified time. The survival function $S(t)$ is the probability that the patient will survive beyond time t. Survival probability is usually assumed to approach zero as age increases; that is,
1. $0 \le S(t) \le 1$.
2. $\lim_{t \to \infty} S(t) = 0$.
3. $S(t)$ is non-increasing and continuous from the right.
Another characteristic of survival data is that the survival time cannot be negative [13]. See Figure (1).

2. Nonparametric Estimation Method
Nonparametric methods are often simpler and easier to understand than parametric methods. Furthermore, nonparametric analyses are more widely used in situations where there is doubt about the exact form of the distribution [13]. In this research we use four nonparametric methods, namely the Empirical Survival Function (EM), Borkowf-type (BE), Nelson-type (NE), and Thompson-type (TE) estimators, to estimate the survival function for lung cancer patients based on complete data.
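Any of these estimators should produce a curve with the properties listed in Section 1.2.1. The following Python sketch is an illustrative check, not part of the original study: the function name `looks_like_survival_curve` and the tolerance `tol` are hypothetical, and the check covers only the range and monotonicity properties, since the limit property cannot be verified from a finite sample.

```python
from typing import Sequence


def looks_like_survival_curve(values: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if estimates taken at increasing times lie in [0, 1]
    and never increase from one time point to the next."""
    in_range = all(-tol <= v <= 1.0 + tol for v in values)
    non_increasing = all(b <= a + tol for a, b in zip(values, values[1:]))
    return in_range and non_increasing


# First three empirical (EM) estimates reported in Table (1).
print(looks_like_survival_curve([0.9915, 0.9831, 0.9746]))
# A sequence that increases over time is rejected.
print(looks_like_survival_curve([0.50, 0.60]))
```

A check of this kind can be applied to each estimated column before the estimators are compared.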
2.1 Empirical Survival Function Estimation Method (EM)
Let F(t) denote the life distribution for a certain type of item. We want to estimate the distribution function F(t) and the survivor function S(t) = 1 - F(t) from a complete data set of n independent lifetimes. Let $t_{(1)} \le t_{(2)} \le \dots \le t_{(n)}$ be the data set arranged in ascending order. The empirical distribution function is defined as [6],[14]

$$\hat{F}(t) = \frac{\text{Number of lifetimes} \le t}{n} \tag{2}$$

If we assume that there are no ties in the data set, the empirical distribution function may be written

$$\hat{F}(t) = \begin{cases} 0 & \text{for } t < t_{(1)} \\ \dfrac{i}{n} & \text{for } t_{(i)} \le t < t_{(i+1)},\quad i = 1, \dots, n-1 \\ 1 & \text{for } t \ge t_{(n)} \end{cases} \tag{3}$$

The corresponding empirical survivor function is

$$\hat{S}_{EM}(t) = 1 - \hat{F}(t) = \frac{\text{Number of lifetimes} > t}{n} \tag{4}$$

If there are no ties in the data set, the empirical survivor function may also be written

$$\hat{S}_{EM}(t) = \begin{cases} 1 & \text{for } t < t_{(1)} \\ 1 - \dfrac{i}{n} & \text{for } t_{(i)} \le t < t_{(i+1)},\quad i = 1, \dots, n-1 \\ 0 & \text{for } t \ge t_{(n)} \end{cases} \tag{5}$$

The variance of the empirical survivor function is [14]

$$\operatorname{Var}[\hat{S}_{EM}(t)] = \frac{\hat{S}_{EM}(t)\,[1 - \hat{S}_{EM}(t)]}{n} \tag{6}$$

If all observations are distinct, $\hat{S}_{EM}(t)$ is a step function that decreases by 1/n at each observed failure time [6]. A simple adjustment accommodates any ties present in the data. At the ordered failure times we therefore have

$$\hat{S}_{EM}(t_{(i)}) = 1 - \frac{i}{n}, \quad i = 1, \dots, n \tag{7}$$

2.2 Borkowf-Type Estimator Method (BE)
Borkowf proposed a survival function within the framework of the empirical survival function [3]. For a study with n cases, the Borkowf empirical survival function is defined by the expression

$$\hat{S}_{BE}(t) = \hat{S}_{EM}(t)\left(1 - \frac{1}{n}\right) + \frac{1}{2n} \tag{8}$$

Borkowf proved that the Greenwood variance of the proposed estimator is less than the Greenwood variance of the empirical survival function $\hat{S}_{EM}(t)$. The standard error of the estimated survival function is usually found from Greenwood's formula.
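Equations (7) and (8) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code: the function names are hypothetical, ties are ignored so that the i-th ordered failure time simply receives rank i, and the shrunken form $\hat{S}_{EM}(t)(1 - 1/n) + 1/(2n)$ is assumed for equation (8).

```python
from typing import List


def empirical_survival(n: int) -> List[float]:
    """Equation (7): S_EM at the i-th ordered failure time is 1 - i/n."""
    return [1.0 - i / n for i in range(1, n + 1)]


def borkowf_survival(s_em: List[float], n: int) -> List[float]:
    """Equation (8), assumed form: S_EM * (1 - 1/n) + 1/(2n)."""
    return [s * (1.0 - 1.0 / n) + 1.0 / (2.0 * n) for s in s_em]


n = 118  # sample size reported in the paper
s_em = empirical_survival(n)
s_be = borkowf_survival(s_em, n)

# Estimates at the first and last ordered failure times.
print(round(s_em[0], 4), round(s_be[0], 4))
print(round(s_em[-1], 4), round(s_be[-1], 4))
```

With n = 118 the first pair rounds to 0.9915 and 0.9874 and the last pair to 0.0 and 0.0042, which agrees with the first and last rows of Table (1). Unlike the empirical estimate, the shrunken estimate never reaches exactly zero at the last failure time.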
The variance of Borkowf's proposed survivor function is [3],[14]

$$\operatorname{Var}[\hat{S}_{BE}(t)] = \left(\frac{n-1}{n}\right)^{2} \frac{\hat{S}_{EM}(t)\,[1 - \hat{S}_{EM}(t)]}{n} \tag{9}$$

2.3: Thompson-Type Estimator Method (TE)
The shrinkage estimation method is a Bayesian approach that depends on prior information about the value of a specific parameter, drawn from past experience or previous studies. In this section we estimate S(t) when prior information about S(t) is available as an initial value $S_0(t)$. The Thompson-type shrinkage estimator then has the following form [15]:

$$\hat{S}_{TE}(t) = \xi\,\hat{S}_{EM}(t) + (1 - \xi)\,S_0(t), \quad 0 \le \xi \le 1 \tag{10}$$

where $\xi$ is a shrinkage factor, $0 \le \xi \le 1$. Here, $\xi$ is selected based on the Wald test statistic for $H_0: S(t) = S_0(t)$ against $H_A: S(t) \ne S_0(t)$ at a significance level of 0.05, where the test statistic is

$$Z = \frac{\hat{S}_{EM}(t) - S_0(t)}{[\operatorname{var}(\hat{S}_{EM}(t))]^{1/2}}$$

In this paper, we put forward the shrinkage weight function $\xi = \exp(-10/n)$.

2.4: Nelson-Aalen Estimator Method (NE)
The Nelson-Aalen estimator [11] is an alternative estimate of the survival function. It is based on the individual event times and on an estimate of the cumulative hazard rate H(t) at time t:

$$\hat{H}(t) = \sum_{t_{(i)} \le t} \frac{d_i}{n_i} \quad \text{for } t \ge 0 \tag{11}$$

Suppose that there are n individuals with observed survival times $t_1, t_2, \dots, t_n$, and let $t_{(i)}$, i = 1, 2, ..., n, be the ordered death times. Here $d_i$ is the number of individuals who die at time $t_{(i)}$ and $n_i$ is the number of individuals at risk just before $t_{(i)}$. The cumulative hazard satisfies

$$H(t) = \int_0^t h(x)\,dx = -\ln S(t)$$

where h(t) refers to the hazard rate at time t. Thus, we can write the Nelson survival estimate as

$$\hat{S}_{NE}(t) = \exp(-\hat{H}(t)) \tag{12}$$

3. Estimation of Survival Function Methods
The estimates of the survival function obtained with the four methods above for the complete data, computed using the MATLAB (2012a) program [8],[14], are shown in Table (1).

4. Numerical Results and Conclusions
1.
As expected, the values of the survival function for all of the estimation methods used in this paper decrease gradually as the failure times of the lung cancer patients increase; that is, there is an inverse relationship between failure time and the survival function. This shows that the value of the survival function was high while the patients were alive in the hospital and became low otherwise [14].
2. The mean squares errors [14] of the estimation methods for the survival function are given in Table (2), where

$$MSE = \frac{\sum_{i=1}^{n} [S(t_i) - \hat{S}(t_i)]^2}{n} \tag{13}$$

Here $S(t_i)$ is the median rank survival function, $\hat{S}(t_i)$ is the specific estimated survival function, and n refers to the sample size of the patients.
3. As a consequence, the computed statistical indicators shown in Table (2) lead to the result that the mean squares error (MSE) of the Thompson estimator (TE) method is less than those of the EM, BE, and NE methods, so the shrinkage method is the best estimation method.
4. From Figure (2), one can note how closely the estimation methods used in this paper agree with one another and, consequently, the accuracy of these methods, especially relative to the real median rank survival function S(t).

References
1. Al-Qurashi, A. K. (2001). Estimate survival function for nonparametric methods. Ph.D. thesis in statistics, Faculty of Management and Economics, University of Mustansiriya.
2. American Cancer Society (December 2007). "Report sees 7.6 million global 2007 cancer deaths". Reuters. Retrieved 2008-08-07.
3. Borkowf, C. B. (2005). A simple hybrid variance estimator for the Kaplan-Meier survival function. Statistics in Medicine, 24, 827-851.
4. Basher, F. M. (2010). Some of the Parametric Methods and Nonparametric to Estimate the Reliability Function with the Practical Application. Master thesis, Baghdad University, College of Administration and Economics.
5. Haifa, K.
(1987). Use Non-Parametric Method to Find Reliability Function for the Tools of 14 Ramadan Factory - Department of Textile. Master thesis, Baghdad University, College of Administration and Economics.
6. Kaplan, E. L. and Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53, 457-481.
7. Rausand, M. and Høyland, A. (2004). System Reliability Theory: Models, Statistical Methods and Applications. Second Edition.
8. Mathews, J. H. and Fink, K. D. (2003). Numerical Methods Using MATLAB, Third Edition, Prentice Hall, USA.
9. Mei-C, W. (2006). Summary Notes for Survival Analysis. Department of Biostatistics, Johns Hopkins University.
10. National Collaborating Centre for Cancer (2011). "The diagnosis and treatment of lung cancer (update)". http://www.nice.org.uk/nicemedia/live/13465/54199/54199.pdf
11. Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics, 14, 945-966.
12. Peterson, A. V. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. JASA, 72, 854-858.
13. Qamruz, Z. and Karl, P. (2011). Survival Analysis Medical Research. http://interstat.statjournals.net/YEAR/2011/abstracts/1105005.php
14. Taha, A. T. (2013). Estimate the Parameters and Related Probability Functions for Data of the Patients of Lymph Glands Cancer via Birnbaum-Saunders Model. M.Sc. thesis, Baghdad University, Education College for Pure Sciences (Ibn Al-Haitham).
15. Thompson, J. R. (1968). Some Shrinkage Techniques for Estimating the Mean. J. Amer. Statist. Assoc., 63, 113-122.

Table No. (1): Estimated Values for the Survival Function

No.   Time/d   EM       BE       TE       NE
1     3        0.9915   0.9874   0.9917   0.9916
2     37       0.9831   0.9790   0.9833   0.9832
3     72       0.9746   0.9706   0.9748   0.9749
4     75       0.9661   0.9622   0.9663   0.9667
5     91       0.9576   0.9537   0.9578   0.9585
6     100      0.9492   0.9453   0.9494   0.9504
7     103      0.9407   0.9369   0.9409   0.9424
8     121      0.9322   0.9285   0.9324   0.9345
9     127      0.9237   0.9201   0.9240   0.9266
10    140      0.9153   0.9117   0.9155   0.9187
11    154      0.9068   0.9033   0.9070   0.9110
12    156      0.8983   0.8949   0.8985   0.9033
13    164      0.8898   0.8865   0.8901   0.8957
14    186      0.8814   0.8781   0.8816   0.8881
15    211      0.8729   0.8697   0.8731   0.8806
16    212      0.8644   0.8613   0.8646   0.8732
17    213      0.8559   0.8529   0.8562   0.8658
18    217      0.8475   0.8445   0.8477   0.8585
19    218      0.8390   0.8361   0.8392   0.8513
20    221      0.8305   0.8277   0.8308   0.8441
21    221      0.8220   0.8193   0.8223   0.8370
22    233      0.8136   0.8109   0.8138   0.8299
23    240      0.8051   0.8025   0.8053   0.8229
24    241      0.7966   0.7941   0.7969   0.8160
25    243      0.7881   0.7857   0.7884   0.8091
26    249      0.7797   0.7773   0.7799   0.8022
27    254      0.7712   0.7689   0.7715   0.7955
28    266      0.7627   0.7605   0.7630   0.7888
29    273      0.7542   0.7521   0.7545   0.7821
30    276      0.7458   0.7437   0.7460   0.7755
31    277      0.7373   0.7353   0.7376   0.7690
32    278      0.7288   0.7269   0.7291   0.7625
33    281      0.7203   0.7185   0.7206   0.7560
34    290      0.7119   0.7101   0.7121   0.7497
35    301      0.7034   0.7017   0.7037   0.7433
36    301      0.6949   0.6933   0.6952   0.7371
37    301      0.6864   0.6849   0.6867   0.7308
38    302      0.6780   0.6765   0.6783   0.7247
39    304      0.6695   0.6681   0.6698   0.7186
40    304      0.6610   0.6597   0.6613   0.7125
41    306      0.6525   0.6512   0.6528   0.7065
42    307      0.6441   0.6428   0.6444   0.7005
43    307      0.6356   0.6344   0.6359   0.6946
44    308      0.6271   0.6260   0.6274   0.6887
45    313      0.6186   0.6176   0.6190   0.6829
46    313      0.6102   0.6092   0.6105   0.6772
47    314      0.6017   0.6008   0.6020   0.6715
48    318      0.5932   0.5924   0.5935   0.6658
49    330      0.5847   0.5840   0.5851   0.6602
50    331      0.5763   0.5756   0.5766   0.6546
51    332      0.5678   0.5672   0.5681   0.6491
52    332      0.5593   0.5588   0.5596   0.6436
53    334      0.5508   0.5504   0.5512   0.6382
54    334      0.5424   0.5420   0.5427   0.6328
55    335      0.5339   0.5336   0.5342   0.6274
56    335      0.5254   0.5252   0.5258   0.6221
57    335      0.5169   0.5168   0.5173   0.6169
58    335      0.5085   0.5084   0.5088   0.6117
59    338      0.5000   0.5000   0.5003   0.6065
60    341      0.4915   0.4916   0.4919   0.6014
61    342      0.4831   0.4832   0.4834   0.5963
62    345      0.4746   0.4748   0.4749   0.5913
63    349      0.4661   0.4664   0.4665   0.5863
64    354      0.4576   0.4580   0.4580   0.5814
65    357      0.4492   0.4496   0.4495   0.5765
66    363      0.4407   0.4412   0.4410   0.5716
67    364      0.4322   0.4328   0.4326   0.5668
68    364      0.4237   0.4244   0.4241   0.5620
69    366      0.4153   0.4160   0.4156   0.5572
70    367      0.4068   0.4076   0.4071   0.5525
71    368      0.3983   0.3992   0.3987   0.5479
72    371      0.3898   0.3908   0.3902   0.5433
73    373      0.3814   0.3824   0.3817   0.5387
74    374      0.3729   0.3740   0.3733   0.5341
75    380      0.3644   0.3656   0.3648   0.5296
76    387      0.3559   0.3572   0.3563   0.5252
77    392      0.3475   0.3488   0.3478   0.5207
78    393      0.3390   0.3403   0.3394   0.5163
79    397      0.3305   0.3319   0.3309   0.5120
80    399      0.3220   0.3235   0.3224   0.5076
81    400      0.3136   0.3151   0.3140   0.5034
82    400      0.3051   0.3067   0.3055   0.4991
83    401      0.2966   0.2983   0.2970   0.4949
84    402      0.2881   0.2899   0.2885   0.4907
85    407      0.2797   0.2815   0.2801   0.4866
86    409      0.2712   0.2731   0.2716   0.4825
87    419      0.2627   0.2647   0.2631   0.4784
88    421      0.2542   0.2563   0.2546   0.4744
89    421      0.2458   0.2479   0.2462   0.4704
90    422      0.2373   0.2395   0.2377   0.4664
91    422      0.2288   0.2311   0.2292   0.4625
92    423      0.2203   0.2227   0.2208   0.4586
93    427      0.2119   0.2143   0.2123   0.4547
94    428      0.2034   0.2059   0.2038   0.4509
95    430      0.1949   0.1975   0.1953   0.4471
96    446      0.1864   0.1891   0.1869   0.4433
97    450      0.1780   0.1807   0.1784   0.4395
98    454      0.1695   0.1723   0.1699   0.4358
99    461      0.1610   0.1639   0.1615   0.4321
100   463      0.1525   0.1555   0.1530   0.4285
101   470      0.1441   0.1471   0.1445   0.4249
102   477      0.1356   0.1387   0.1360   0.4213
103   481      0.1271   0.1303   0.1276   0.4177
104   481      0.1186   0.1219   0.1191   0.4142
105   483      0.1102   0.1135   0.1106   0.4107
106   483      0.1017   0.1051   0.1021   0.4073
107   497      0.0932   0.0967   0.0937   0.4038
108   511      0.0847   0.0883   0.0852   0.4004
109   512      0.0763   0.0799   0.0767   0.3970
110   512      0.0678   0.0715   0.0683   0.3937
111   516      0.0593   0.0631   0.0598   0.3904
112   517      0.0508   0.0547   0.0513   0.3871
113   519      0.0424   0.0463   0.0428   0.3838
114   533      0.0339   0.0378   0.0344   0.3806
115   534      0.0254   0.0294   0.0259   0.3774
116   535      0.0169   0.0210   0.0174   0.3742
117   540      0.0085   0.0126   0.0090   0.3710
118   550      0.0000   0.0042   0.0005   0.3679

Table No. (2): Comparison of the Four Nonparametric Methods

Method   MSE
EM       0.000018
NE       0.0292
BE       0.000019
TE       0.000015

Figure No. (1): The curve of the survival function S(t).

Figure No. (2): The curves of the four estimation methods used for the survival function (horizontal axis: Time, days; vertical axis: S(t); legend: S-RL, S-EM, S-NE, S-BE, S-TH).

Estimating the Survival Function for Complete Real Data of Lung Cancer Patients

Abbas Najm Salman
Ibtehal Hussein Farhan
Department of Mathematics / College of Education for Pure Science (Ibn Al-Haitham) / University of Baghdad

Received: 8 June 2014, Accepted: 1 September 2014

Abstract
In this research, the survival function for lung cancer patients was estimated using different nonparametric estimation methods, based on complete real data describing the survival duration of patients suffering from lung cancer, measured from diagnosis of the disease or from admission to hospital, over a period of two years (from the beginning of 2012 to the end of 2013). The proposed estimation methods were compared using the mean squares error as a statistical indicator, and the study concluded that estimating the survival function of lung cancer patients using the shrinkage method is the best.

Keywords: nonparametric estimators, lung cancer disease, complete real data, empirical estimator, Borkowf estimator, Nelson estimator, shrinkage estimator, mean squares error.