International Journal of Applied Sciences and Smart Technologies
Volume 4, Issue 2, pages 123–130
p-ISSN 2655-8564, e-ISSN 2685-9432

Sugarcane Production Modeling Using Machine Learning in Western Maharashtra

Chhaya Narvekar1,*, Madhuri Rao2
1Department of Information Technology, Xavier Institute of Engineering, Mumbai, India
2Thadomal Shahani Engineering College, Mumbai, India
*Corresponding Author: chhaya.n@xavier.ac.in
(Received 01-05-2022; Revised 20-07-2022; Accepted 23-07-2022)

Abstract
Agriculture is the most important sector in the Indian economy. India is the world's second-largest producer of sugarcane. This study was undertaken in Shirol tehsil, Kolhapur district, Maharashtra state, India, with the aim of modeling sugarcane production forecasting using supervised machine learning algorithms. Sugarcane is the most widely cultivated crop in this area. We applied supervised machine learning to forecast village-wise sugarcane productivity based on ten years of sugarcane production data from 2010 to 2020. Sugarcane yield prediction accuracy is around 65%, based only on data provided by the sugar factory.

Keywords: sugarcane, productivity, machine learning, forecasting.

1 Introduction
The Indian economy relies heavily on sugarcane cultivation. The sugar industry, as well as enterprises manufacturing alcohol, paper, chemicals, and animal feed, relies on it for raw materials. In India, sugarcane is processed through a network of sugar mills, as well as various other businesses with backward and forward linkages.
This work is licensed under a Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/

The demand for higher sugarcane production in India is increasing due to the multi-purpose use of sugarcane and its byproducts in numerous sectors [1]. Despite rising urbanization around the world, agriculture remains the primary source of income for a large share of the population. Although technological developments have resulted in more accurate weather predictions and increased yields, much work remains to be done to provide farmers with tools that take local data into account so they can forecast yields. In the Maharashtra (India) region, the Sugarcane Cultivation Life Cycle (SCLC) spans around 12 months, with planting starting in three separate seasons. Our method relies on past production data to train a supervised machine learning system and make sugarcane crop predictions. Climate, production environments, and agronomic aspects of agricultural management, such as variety selection, cane field age, fertilization, pest and disease control, and weed control, all influence sugarcane yield [2].

Description of study area: Shirol Taluka of Kolhapur district has natural irrigation potential on account of five major rivers, namely Krishna, Panchaganga, Warana, Dudhganga, and Vedganga [3]. The soil type here is alluvial. Normal rainfall during June-October is 1019.5 mm. The top three crops cultivated are sugarcane 113.9 ('000 ha), rainfed paddy 113.8 ('000 ha), and groundnut 57.4 ('000 ha) [4]. India is the world's second-largest producer of sugarcane after Brazil. Sugarcane is grown in all of India's states and at various times of the year. In this study, we propose a supervised machine learning based crop yield forecasting model for sugarcane, the principal crop in the study area.
Crop analysis and agricultural production forecasting have traditionally relied on statistical models. In this study, models are applied to ten years of sugarcane production data from the study area, provided by Shree Dutta Sugar Factory, Shirol: three algorithms for sugarcane productivity prediction and five algorithms for sugarcane yield prediction.

2 Research Methodology
Materials and Methods: Sugarcane is India's most important cash crop. It entails less risk, and farmers can be fairly certain of a return even in difficult conditions. Sugarcane is the leading crop of Kolhapur district [4]. The sugarcane yield data, in tons of cane per hectare [5], were originally available at the farmer and village gat-number level. The sugar mill provided ten years of data, including farmer name, gat number, village, date of sowing, season, area of sugarcane cultivated, and production. Knowing the size of the sugarcane harvest can help industry members make better decisions [2].

Table 1. Ten-year sugarcane cultivation trend in the study area

Season-Year    Cultivated Area    Total Sugarcane Production (tons)
2010-2011      14556.33           1344688.952
2011-2012      13032.36           1229240.511
2012-2013      10824.94           1196219.045
2013-2014      11139.9            1191862.504
2014-2015      337.8816667        67610.66034
2015-2016      11425.21           1294479.054
2016-2017      15058.3365         1224696.921
2017-2018      10524.99           1187021.203
2018-2019      12118.64           1212491.125
2019-2020      11637.48           1047024.887
2020-2021      11272.86           1192268.53

Figure 1. Village-wise sugarcane production

From these data we added a productivity column, created a village-wise dataset, and applied machine learning algorithms to predict the productivity of a particular village. Table 1 summarizes the season-wise cultivated area and sugarcane production in the study area. The village productivity used by the model is calculated on a yearly basis. Regression analysis is a basic technique for modeling the relationship between one or more independent (predictor) variables and a dependent (response) variable that we want to forecast, and it is one of the tools available in the statistical analysis literature [5].

Dataset description
Figure 2 shows a sample of the dataset recorded by the sugar factory. Year-wise production is visualized in Figure 3. Before applying machine learning algorithms for yield, some columns that are less correlated are removed. The features shown in Figure 4 are used for training the ML regression models. After pre-processing, such as converting categorical variables into numerical ones, the dataset has 57495 rows x 11 columns. The dataset is then divided into 80% for training and 20% for testing. Model performance is discussed in the Results section.

Figure 2. Sample Recorded Data
Figure 3. Yearly Production
Figure 4. Features Used for Modeling

From the data provided, a second dataset was created containing the village-wise yearly cultivated area and village-wise sugarcane production. The productivity of each village was calculated as production per unit area, and machine learning models were also trained and tested on this dataset to forecast the productivity of a particular village. Productivity was compared with national-level and state-level productivity to gain further insight. In both cases, parameters such as climate, nutrient supply, and soil fertility status are not taken into consideration; including them could improve accuracy.

3 Results and Discussion
Crop forecasting is the science of estimating crop yields and production in advance, usually a couple of months ahead. A crucial part of crop production forecasting is defining the time horizon in terms of time series forecasting approaches. This study included three algorithms for productivity prediction: random forest (RF), gradient boosting (GBM), and XGBoost, which are among the most commonly used for agricultural modeling [6]. We tried five different algorithms for modeling yield [4]; their performance is shown in Table 2. Performance is limited because extrinsic parameters that also affect crop production, such as climate, rainfall, soil fertility, and management skill, are not considered in the current study.

Table 2. Sugarcane Yield Prediction Model Performance

Machine Learning Algorithm    Accuracy
Linear Regression             62%
Random Forest                 65%
XGBoost                       66%
Gradient Boost                63%
Decision Tree                 63%

For sugarcane productivity prediction, only the village-wise cultivated area and sugarcane production for that particular season are used, and the target variable is productivity.
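As a minimal sketch of the productivity calculation described above (production per unit of cultivated area), the following Python snippet applies the ratio to the season-wise totals from Table 1. The function and variable names are illustrative assumptions, not the authors' code, and the area unit is assumed to be hectares, which the table does not state. The same ratio would be computed per village and per year in the village-wise dataset.

```python
# Season-wise figures copied from Table 1:
# (season, cultivated area, total sugarcane production in tons).
# Assumption: the area is in hectares; Table 1 does not state the unit.
TABLE_1 = [
    ("2010-2011", 14556.33, 1344688.952),
    ("2011-2012", 13032.36, 1229240.511),
    ("2012-2013", 10824.94, 1196219.045),
    ("2013-2014", 11139.9, 1191862.504),
    ("2014-2015", 337.8816667, 67610.66034),
    ("2015-2016", 11425.21, 1294479.054),
    ("2016-2017", 15058.3365, 1224696.921),
    ("2017-2018", 10524.99, 1187021.203),
    ("2018-2019", 12118.64, 1212491.125),
    ("2019-2020", 11637.48, 1047024.887),
    ("2020-2021", 11272.86, 1192268.53),
]


def productivity(production_tons, area):
    """Return production per unit of cultivated area."""
    if area <= 0:
        raise ValueError("cultivated area must be positive")
    return production_tons / area


def season_productivity(rows):
    """Map each season label to its productivity (tons per unit area)."""
    return {season: productivity(prod, area) for season, area, prod in rows}


if __name__ == "__main__":
    for season, value in season_productivity(TABLE_1).items():
        print(f"{season}: {value:.2f} tons per unit area")
```

For the 2010-2011 season this gives roughly 92.4 tons per unit area. The 2014-2015 row records a far smaller cultivated area than any other season, so its ratio should be read with caution.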
This modeling does not consider climate data, rainfall, or soil quality; the parameters used are the area cultivated, the type of breed, the planting date, the type of water supply, and the harvest date. The average sugarcane productivity of India is 70-80 and that of Maharashtra is 80.72 [R20], whereas the average productivity of the study area is 95. The random forest regressor gives 65% accuracy, and the other two, XGBoost and gradient boosting, give 66% accuracy. Compared with using a single model to predict a response, using several models can improve the robustness and accuracy of predictions.

4 Conclusion
The goal of this study was to see whether a machine learning approach could provide fresh insights into sugarcane productivity in western Maharashtra. Predicting crop production may help sugar mills boost industry revenues by implementing more effective and focused forward selling tactics and logistics planning. The methodology described in this research can readily be applied to other sugarcane-growing regions and agricultural businesses around the world to improve agricultural methods. The sugarcane productivity prediction results showed a prediction accuracy of around 65%, which is promising given that only sugar factory data were used.

Acknowledgements
The authors are highly grateful to Shree Dutta Sugar Factory for providing the data necessary to carry out this research, and thankful to Thadomal Shahani Engineering College, Bandra, as well as Xavier Institute of Engineering, Mumbai, India.

References
[1] P. Mishra, M. A. G. A. Khatib, I. Sardar, J. Mohammed, K. Karakaya, A. Dash, M. Ray, L. Narsimhaiah, A. Dubey, “Modeling and Forecasting of Sugarcane Production in India”, Sugar Tech, 23(6), 1317-1324, 2021.
[2] L. A. Monteiro and P. C. Sentelhas, “Sugarcane yield gap: can it be determined at national level with a simple agrometeorological model?”, Crop and Pasture Science, 68(3), 272-284, 2017.
[3] I. MAHARASHTRA CELL, “Agriculture Contingency Plan for District: KOLHAPUR”, ICAR_CRIDA_NICRA, 2019.
[4] Y. Everingham, J. Sexton, D. Skocaj, G. Inman-Bamber, “Accurate prediction of sugarcane yield using a random forest algorithm”, Agron. Sustain. Dev., 36, 27, Springer Verlag/EDP Sciences/INRA, 2016.
[5] R. G. Hammer, P. C. Sentelhas, J. C. Mariano, “Sugarcane yield prediction through data mining and crop simulation models”, Sugar Tech, 22(2), 216-225, 2020.
[6] R. S. Kodeeshwari and K. T. Ilakkiya, “Different types of data mining techniques used in agriculture-a survey”, International Journal of Advanced Engineering Research and Science, 4(6), 237191, 2017.
[7] Shree Datta Shetkari S.S.K. Ltd., Shirol. Available at: http://dattasugar.co.in/