J. Hort. Sci. Vol. 1 (1): 68-70, 2006

Statistical modelling for pre-harvest forecast: an illustration with rose

K. S. Shamasundaran and R. Venugopalan

Section of Economics and Statistics
Indian Institute of Horticultural Research
Hessaraghatta Lake Post, Bangalore-560 089, India
E-mail: sham@iihr.emet.in

ABSTRACT

Crop yield forecasting plays a vital role in arriving at a pre-harvest yield estimate of a standing crop and in identifying the stage at which a reliable forecast can be made before final harvest. In this paper, an attempt has been made to apply the regression technique to the prediction of yield in rose. Rose is an important flower crop for both the domestic and the export market and, since the flowers shrivel, estimating the yield of a standing crop before its actual harvest is essential. Based on the results, a model was developed which showed that information from the first two pickings of a standing crop could be used to forecast rose yield to an extent of 77% two months before final harvest. A minimum sample size of 20% is also suggested for developing such a forecast model.

Key words: Goodness of fit statistics, statistical modelling, yield forecast

INTRODUCTION

Commercial cultivation of roses has gained importance in India in recent years owing to a growing demand for these flowers in both domestic and export markets. India, blessed with diverse agro-climatic conditions, has immense potential to increase the productivity of this crop and, in turn, to maximize returns from the overseas market. This can be achieved by developing a suitable model to predict the actual yield of a standing crop and subsequently identifying the stage at which forecasting can be made to the desired extent. To this end, it is imperative to develop a model through which growers and policy makers can frame suitable management strategies for maximizing crop productivity and net return.
In this regard, statistical modelling plays a vital role in developing appropriate forecast models, on a strong scientific footing, for crop yield prediction. Shamasundaran and Singh (2003) made a beginning in this direction. In the present study, an attempt has been made to develop multiple regression models for obtaining a pre-harvest estimate of rose yield based on information from several pickings. Goodness of fit of the models developed was assessed by statistically testing the computed regression coefficients and working out measures of model adequacy.

MATERIAL AND METHODS

An investigation was carried out at the Indian Institute of Horticultural Research, Bangalore, during 1994-95 for yield prediction in rose cv. Happiness. Two hundred and fifty-six samples were used in this study. All the recommended cultural practices, with a spacing of 75 cm x 75 cm, were followed uniformly for the entire plot. Data on yield, in terms of number of flowers/plant, from several pickings were recorded and consolidated. The first picking was made eleven months after planting. Subsequent pickings were made at intervals of 45 days. Linear correlation coefficients between the harvests from individual pickings and total yield were computed and statistically tested. Further, multiple regression models were developed by regressing the cumulative yield on the harvests from different pickings, using the principle of least squares (Lewis-Beck, 1993).

The following goodness of fit statistics were used to judge the adequacy of the models developed (D'Agostino and Stephens, 1986):

Mean squared error (MSE):

$$\mathrm{MSE} = \frac{\sum_{t} (Y_t - \hat{Y}_t)^2}{n}$$

Coefficient of determination (R²):

$$R^2 = 1 - \frac{\sum_{t} (Y_t - \hat{Y}_t)^2}{\sum_{t} (Y_t - \bar{Y})^2}$$

where $Y_t$ represents the observed harvest/yield at time t, $\hat{Y}_t$ the corresponding value predicted by the model, $\bar{Y}$ the mean of the observed values and n the number of observations.
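The least-squares fit and the two adequacy measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used in the study: the function name `fit_least_squares` is hypothetical, the data are synthetic Poisson counts rather than the recorded rose yields, the choice of eight simulated pickings is arbitrary, and the t-statistic is taken in its usual form (coefficient divided by its standard error).

```python
import numpy as np


def fit_least_squares(X, y):
    """Fit y = a + b1*x1 + ... + bk*xk by ordinary least squares.

    Returns the coefficients (intercept first), the MSE with divisor n,
    R^2, and the t-statistic of every coefficient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # column of ones for the intercept a

    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef

    sse = float(residuals @ residuals)        # sum of (Y_t - Yhat_t)^2
    sst = float(((y - y.mean()) ** 2).sum())  # sum of (Y_t - Ybar)^2
    mse = sse / n                             # divisor n, as in the MSE formula above
    r2 = 1.0 - sse / sst

    # Assumption: standard errors use the unbiased residual variance SSE/(n-k-1).
    sigma2 = sse / (n - k - 1)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, mse, r2, t_stats


# Synthetic illustration only: these counts are NOT the data from the study.
rng = np.random.default_rng(0)
pickings = rng.poisson(lam=3.0, size=(256, 8)).astype(float)  # flowers/plant per picking
total_yield = pickings.sum(axis=1)

r2_by_model = []
for n_used in range(1, 6):  # models built on the first 1..5 pickings
    coef, mse, r2, t_stats = fit_least_squares(pickings[:, :n_used], total_yield)
    r2_by_model.append(r2)
    print(f"first {n_used} picking(s): R^2 = {100 * r2:5.1f}%  MSE = {mse:6.2f}  "
          f"t = {np.round(t_stats[1:], 2)}")
```

With nested models of this kind, R² cannot decrease as further pickings are added, so the t-statistics of the added coefficients carry the information about whether an extra picking is worth including.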
However, when fitting regression models to the data, it may be noted that the addition of even one more independent variable to the model results in an increase in the R² value (Kvalseth, 1985). Hence, to test the significance of each added variable, the regression coefficients were subjected to a t-test (Lewis-Beck, 1993).

RESULTS AND DISCUSSION

Linear [simple (r) and multiple (R)] correlations between total yield and the yields of individual pickings were computed and are presented in Tables 1 and 2. The results revealed a highly significant relationship (1% level) between total yield and almost all the pickings, either individually or in combination. Among individual pickings, the first picking gave the highest R² (70%) and the fifth picking the lowest. Multiple correlation and regression analysis revealed that the pickings, individually or in combination, had a significantly high association with total yield, with correlations ranging from 0.3889 to 0.9060. An R² of more than 80% was obtained with all the pickings together, followed by combinations involving the first three and the first four pickings. The first two pickings alone, the first two together with the fourth picking, and all the pickings except the second gave an R² of more than 77%.

Table 1. Results of correlation (r) among individual pickings and total yield

DV  IV  r             a  b1      b2      b3      b4      b5      R² (%)
6   1   0.84**   2.5487  0.3682  -       -       -       -       69.83
6   2   0.51**  -0.0172  -       0.0759  -       -       -       25.96
6   3   0.53**  -0.4903  -       -       0.2385  -       -       27.80
6   4   0.39**   0.0992  -       -       -       0.1139  -       15.12
6   5   0.15     0.4205  -       -       -       -       0.0387   2.20

** Significant at 1%
DV - Dependent Variable; IV - Independent Variable
1 - First picking (X1); 2 - Second picking (X2); 3 - Third picking (X3); 4 - Fourth picking (X4); 5 - Fifth picking (X5); 6 - Total yield (X6)

Table 2. Results of multiple correlation (R) among pickings and total yield

DV  IV         R             a  b1      b2      b3      b4      b5      R² (%)
6   1,2        0.88**  -1.6575  1.7039  1.9354  -       -       -       77.43
6   1,3        0.72**   3.4425  1.2992  -       0.8668  -       -       60.09
6   1,4        0.81**   3.2340  1.2362  -       -       1.0299  -       65.85
6   1,5        0.79**   1.6152  1.4667  -       -       -       1.2010  62.56
6   2,3        0.72**   6.2107  -       3.3277  1.1362  -       -       52.37
6   2,4        0.75**   7.0901  -       1.3168  -       1.5209  -       58.90
6   2,5        0.57**   7.9559  -       1.2598  -       -       1.0325  32.11
6   3,4        0.74**   5.8523  -       -       1.4238  1.8123  -       54.60
6   3,5        0.51**   8.5244  -       -       1.3072  -       1.0150  25.68
6   4,5        0.39**   9.7716  -       -       -       1.2908  0.1404  15.25
6   1,2,3      0.89**  -1.1389  1.4853  2.0963  0.3648  -       -       79.30
6   1,2,4      0.88**   2.7167  1.0018  0.9228  -       1.1600  -       77.92
6   1,2,5      0.84**   1.2488  1.2886  0.7775  -       -       1.2007  71.28
6   2,3,4      0.81**   4.5190  -       2.4385  1.3419  1.3873  -       66.33
6   2,3,5      0.73**   5.9351  -       1.3418  1.4194  -       1.0604  53.97
6   3,4,5      0.74**   5.8802  -       -       -       -       -       54.64
6   1,2,3,4    0.91**  -0.8506  1.2582  1.8455  0.5842  0.6845  -       82.08
6   1,2,3,5    0.90**   0.6815  1.1200  0.8992  1.0148  -       1.1987  81.78
6   1,3,4,5    0.89**   1.1059  1.1847  -       0.8509  0.8678  0.9536  79.58
6   2,3,4,5    0.85**   4.3604  -       3.4531  1.3611  1.5174  1.0970  71.72
6   1,2,3,4,5  0.91**  -0.8571  1.2599  1.8408  0.5831  0.6835  0.0043  82.08

** Significant at 1%
DV - Dependent Variable; IV - Independent Variable
1 - First picking (X1); 2 - Second picking (X2); 3 - Third picking (X3); 4 - Fourth picking (X4); 5 - Fifth picking (X5); 6 - Total yield (X6)

Further, as discussed earlier, the R² value tends to increase as information on the harvest from each additional picking is included. For this reason, the regression coefficients obtained on including an additional variable were tested for statistical significance. Results presented in Table 3 indicate that inclusion of the X3 variable in the model yielded a non-significant regression coefficient, as indicated by a t-statistic value of 1.04, which is below the critical value of 1.96. Similarly, it may be further observed that inclusion of any additional variable in the model results in non-significant regression estimates.

Table 3. Results of goodness of fit statistics along with the selected models

IV         R²    MSE   Model and (t-statistic)                                Significant IV
1,2        0.88  6.05  Y = -1.66 + 1.7X1 + 1.93X2                              X1, X2
                       (5.44) (2.09)
1,2,3      0.89  6.01  Y = -1.14 + 1.48X1 + 2.09X2 + 0.36X3                    X1, X2
                       (3.95) (2.24)
1,2,3,4    0.82  5.69  Y = -0.85 + 1.26X1 + 1.84X2 + 0.58X3 + 0.68X4           X1, X2
                       (3.1) (1.98) (1.54) (1.3)
1,2,3,4,5  0.82  6.52  Y = -0.85 + 1.26X1 + 1.8X2 + 0.58X3 + 0.68X4 + 0.004X5  X1
                       (1.1) (0.005)

Figures in parentheses indicate t-statistic values
DV - Dependent Variable; IV - Independent Variable
1 - First picking (X1); 2 - Second picking (X2); 3 - Third picking (X3); 4 - Fourth picking (X4); 5 - Fifth picking (X5); 6 - Total yield (X6)

Thus, the results indicate that information from two pickings could predict the yield to an extent of 77%. Further, the corresponding regression coefficients were significant, as indicated by t-statistic values that exceed the critical value of 1.96. Moreover, the reduction in mean squared error also strengthens our conclusion in favour of a model based on the first two pickings. Hence, the model developed showed that information from the first two pickings of a standing crop could be used to forecast rose yield to a considerable extent two months before final harvest. It may also be stressed here that, as reported by Shamasundaran et al. (2003), a minimum sample size of 20% of the population is required to get a good estimate for developing such a forecast model.

ACKNOWLEDGEMENT

The authors are grateful to the Director, Indian Institute of Horticultural Research, Bangalore, for providing all facilities to conduct this investigation.

REFERENCES

D'Agostino, R. B. and Stephens, M. A. 1986. Goodness of Fit Techniques. Marcel Dekker, New York. 576p.
Lewis-Beck, S. M. 1993. Regression Analysis. Sage Publ., New York. 433p.
Kvalseth, T. O. 1985. Cautionary note about R². The Amer. Stat., 39:279-85.
Shamasundaran, K. S. and Singh, K. R. 2003. Yield forecasting in tuberose (Polianthes tuberosa Linn.) as effected by association of various characters. J. Orn. Hort., 6:372-75.
Shamasundaran, K. S., Venugopalan, R. and Singh, K. R. 2003. Optimum sample size for yield estimation in certain commercial crops. J. Orn. Hort., 6:2AA-A1.

(MS Received 16 February, 2006 Revised 6 June, 2006)