PRES22_0231.docx


                                                                                                                                                                 DOI: 10.3303/CET2294116 
 
 
Paper Received: 19  May  2022; Revised: 15  June  2022; Accepted: 17  June  2022 
Please cite this article as: Sadenova M.A., Beisekenov N.A., Apshikur B., Khrapov S.S., Kapasov A.K., Mamysheva A.M., Klemeš J.J., 2022, 
Modelling of Alfalfa Yield Forecasting Based on Earth Remote Sensing (ERS) Data and Remote Sensing Methods, Chemical Engineering 
Transactions, 94, 697-702  DOI:10.3303/CET2294116 
  

 CHEMICAL ENGINEERING TRANSACTIONS  
 

VOL. 94, 2022 

A publication of 

 
The Italian Association 

of Chemical Engineering 
Online at www.cetjournal.it 

Guest Editors: Petar S. Varbanov, Yee Van Fan, Jiří J. Klemeš, Sandro Nižetić 
Copyright © 2022, AIDIC Servizi S.r.l. 
ISBN 978-88-95608-93-8; ISSN 2283-9216 

Modelling of Alfalfa Yield Forecasting Based on Earth Remote 
Sensing (ERS) Data and Remote Sensing Methods 

Marzhan Anuarbekovna Sadenovaa*, Nail Alikuly Beisekenova, Baitak Apshikura, 
Sergey Sergeevich Khrapovb, Azamat Kaisarovich Kapasova, Asel Mukhtarkanovna 
Mamyshevaa, Jiří Jaromír Klemešа 
aPriority Department Centre «Veritas» D. Serikbayev East Kazakhstan technical university, 19 Serikbayev str. 070000 
 Ust-Kamenogorsk, Kazakhstan 
bVolgograd State University, 100 Prospekt Universitetskiy. 400062  Volgograd, Russian Federation  
 MSadenova@ektu.kz 

This study aims to develop a method for modelling early forecasting of alfalfa yield on a farm scale located in 
East Kazakhstan. The authors evaluated the correlation coefficient between forage crop yield and different data 
sets, including weather data, climate indices, spectral indices from drones and satellite observations. An 
ensemble machine learning model was developed by combining three commonly used basic training modules: 
random forest (RF), support vector method (SVM), and multiple linear regression (MLR). It is found that the best 
yield prediction algorithm in this study is the Random Forest (RF) algorithm, which predicts yields with R2 = 0.94 
and RMSE = 0.25 t/ha. The results of this study showed that combining remote sensing drought indices with 
climatic and weather variables from UAV and satellite imagery using machine learning is a promising approach 
for alfalfa yield prediction. 

1. Introduction 
World and domestic experience show that high and sustainable productivity of farming is possible only when all 
agrochemical and environmental factors necessary for normal growth and development of plants, formation of 
yields and their quality, and prevention of land degradation are taken into account in the complex. Rational use 
of agricultural lands and soil protection under market conditions requires adequate application of new scientific 
and methodological approaches. One of such system-analytical ways for organisations is a combination of 
traditional ground methods with geoinformation systems (GIS) technologies on the basis of the wide use of 
aerospace images of different resolutions. Remote sensing data has become vital for mapping features of 
terrestrial landscapes and infrastructures, managing natural resources, and studying environmental change 
(Habarov et al., 2019). Crop mapping and crop evaluation are the simplest but most important issues in 
agriculture. Sentinel-2 satellite data have been used extensively for these tasks over the past few decades 
(Nihar et al., 2019). In Guo et al. (2021), boundary line analysis shows that relative yield increases of 8-10% 
can be obtained by optimising yield-limiting factors. Timely yield estimation can help in making accurate 
management decisions, but traditional yield estimation approaches are labour-intensive and time-consuming, 
which hinders timely information in the field. Recently, unmanned aerial vehicles (UAVs) have attracted 
considerable attention in precision agriculture because of their efficiency in data collection. In addition, 
compared to other imaging methods, hyperspectral data can provide higher spectral accuracy for constructing 
narrow-band vegetation indices, which are important in yield modelling. Accurate seasonal forecasting of grain 
yields is an important decision support tool (Bouras et al., 2021). Yang et al. (2020) estimated land productivity 
as the potential for agricultural production by considering biophysical properties, including climate, soil, and land 
slope. Land productivity is approximated by the potential yield of six major crops: corn, soybeans, winter wheat, 
spring wheat, cotton, and alfalfa. In Yadav et al. (2021), the combination of climate and normalised difference 
vegetation index (NDVI) variables produced more accurate predictions compared to using NDVI alone to predict 
wheat, sorghum, and corn yields. The reason for the limitations of existing methods for yield forecasting is that 

697


the models are not adapted to the soil and climatic conditions of Kazakhstan. In this study, a seasonal forecast 
of alfalfa yield was performed using a combination of satellite and drone spectral imagery. The goal of our study 
is to develop an ensemble machine learning model by combining three widely used basic training modules to 
evaluate and determine the best algorithm. 

2. Materials and methods 
In this study, a large number of spectral indices were extracted from an array of satellite images and aerial 
photography results from unmanned aerial vehicles (UAVs) in the first phase of the study. Two Geoscan 201 
Agro and DJI Phantom 4 Multispectral UAVs were used. These are specialised aircraft used in agricultural 
enterprises, the forestry industry, and all other industries where plant condition monitoring is required. Unlike a 
conventional optical camera, the DJI model provides maximum information by capturing different spectra and 
allows you to visually detect abnormalities and quickly make decisions to correct them. The additional spectral 
channels (blue, green) provide a unique opportunity to calculate not only the NDVI index (+NDRE) but also the 
Enhanced Vegetation Index (EVI), which provides enhanced baseline data. Geoscan 201 Agro is used for aerial 
surveys with a range of 30 km and a flight time of up to 3 hours. Geoscan 201 Agro allows to survey up to 8,000 
ha/d and get orthophotos with georeferencing accuracy, corresponding to 1:500 scale requirements. 
Necessary parameters were selected to decrease the data dimensionality. Data on weather, crop yield, and 
spectral data were collected from Sentinel-2, Landsat-8, and TERRA (MODIS scanner) satellites. In the next 
step, an ensemble machine learning model was developed by combining three commonly used basic training 
modules: random forest (RF) (Belgiu et al., 2016), support vector method (SVM) (Cihlar et al., 1991), and 
multiple linear regression (MLR) (Eberly, 2007). A schematic diagram of the proposed research methodology 
with an overview of the main input data is presented in Figure 1. 

 
Figure 1: Schematic diagram presenting an overview of the main inputs data and the methodology proposed in 
this study 

Widely used statistical measures were used to evaluate the performance of the developed models in this study 
(Feng et al., 2020). The coefficient of determination (R2) reflects the degree of a linear relationship between 
observed and predicted alfalfa yields Eq(1). The mean absolute error (MAE) indicates the percentage of the 

698


mean deviation of the predicted yield from the observation Eq(2). The root means square error (RMSE) 
measures the discrepancy between the predicted yield and observations Eq(3). 

𝑅𝑅2 =
(∑ (𝑂𝑂𝑖𝑖 − 𝑂𝑂�) (𝐹𝐹𝑖𝑖 − 𝐹𝐹�)𝑛𝑛𝑖𝑖=1 )2

∑ (𝑂𝑂𝑖𝑖 − 𝑂𝑂�)2𝑛𝑛𝑖𝑖=1 ∑ (𝐹𝐹𝑖𝑖 − 𝐹𝐹�)2
𝑛𝑛
𝑖𝑖=1

  (1) 

𝑀𝑀𝑀𝑀𝑀𝑀 =
1
𝑛𝑛
�|𝐹𝐹𝑖𝑖 − 𝑂𝑂𝑖𝑖|
𝑛𝑛

𝑖𝑖=1

 
(2) 

𝑅𝑅𝑀𝑀𝑅𝑅𝑀𝑀 =
1
𝑛𝑛
�� (𝐹𝐹𝑖𝑖 − 𝑂𝑂𝑖𝑖)2

𝑛𝑛

𝑖𝑖=1
 

(3) 

where Oi is the observed return, Fi is the predicted return using the machine learning algorithm, 𝑂𝑂� and 𝐹𝐹� are the 
average values of the observed and predicted returns, and n is the number of samples used for the machine 
learning model. 
The parameters of the selected models were adjusted using NDVI composite indices for the period from 2017 
to 2022. Sixteen-day composite images with a resolution of 250 m obtained from MODIS (TERRA satellite) as 
well as Sentinel-2 and Landsat-8 were used to prepare the initial data. Yield data for the specified time interval 
were obtained from the farm "Experimental farm of oilseed crops" (EFoOC), located in Eastern Kazakhstan. An 
adjustment was made for the alfalfa crop. The total area of the experimental plot was 327 ha and is shown in 
Figure 2. 

 
Figure 2: Study area and experimental field 

Alfalfa is one of the most valuable and intensively grown fodder crops worldwide. The use of remotely sensed 
data (ERS) is the most technologically advanced and progressive method of vegetation monitoring, but it has a 
number of disadvantages compared to UAV imagery. Due to the high altitude of satellite images, the detail of 
the objects has limitations. The main problems connected with the space image use are related to the pixel 
resolution (30 m2 per pixel for Landsat and 500 m2 for MODIS) and circulation period (16 d for Landsat and 26 
d for SPOT). This problem was solved when new satellites were introduced: WorldView 2-3 (DigitalGlobe, 
Longmont, Colorado, USA). WorldView-2 is the first commercial high-resolution satellite with eight spectral 
sensors ranging from visible to near-infrared. A feature of the satellite is that each sensor is narrowly focused 
on a specific range of the electromagnetic spectrum, which is sensitive to a particular feature of the earth or 
property of the atmosphere. However, images from this platform are very expensive. The re-visit time, which 
averages 16 d, also complicates agricultural applications, especially those related to water and nutrient 
management (Xue et al., 2017). On-board and/or unmanned platforms serve these two major challenges. 
Carrying out simultaneously with space monitoring and aerial surveys of land, followed by summarising the 

699


results of statistical processing of the dataset of climatic, agrochemical and others in the form of a mathematical 
model, seems an important and necessary task for crop yield forecasting. 

3 Results and discussion 
Analysis of seasonal dynamics of NDVI revealed that index values are higher in summer months (June, August) 
than in autumn months (October). This is explained by the seasonal dynamics of the vegetation index. The 
timing of the phases of development varies depending on the weather conditions of the year. As the phases of 
vegetative development change, the composition and content of pigments in the leaves of plants change, 
biomass increases and the amount of chlorophyll in the green leaves of plants increases. As chlorophyll 
accumulates, plant brightness decreases in the visible part of the spectrum, especially in the red zone, and 
increases in the infrared. Consequently, the NDVI value increases. Shortwave infrared (SWIR) measurements 
can estimate the amount of water in plants and soil because water absorbs SWIR wavelengths. Shortwave 
infrared bands (band - region of the electromagnetic spectrum; a satellite sensor can image the Earth in different 
bands) are also useful for distinguishing between cloud types (water clouds and ice clouds), snow and ice that 
appear white in visible light. In this composite image, vegetation appears in shades of green, soils and built-up 
areas have different shades of brown, and water appears black. The recently burned ground is strongly reflected 
in the SWIR bands, making them valuable for mapping fire damage. Each type of rock reflects shortwave 
infrared light differently, allowing you to map the geology by comparing the reflected SWIR light. 
Aerial photography from a drone gives larger and more detailed data in high resolution (including digital images) 
and allows to work in any weather (except wind) and surveys up to 5,000 ha of crops. Visualisation of 
comparison of space imagery data with the results of aerial photos processing is presented in Figures 3 and 4. 

 
Figure 3: Calculation of NDVI index. Alfalfa crop: 1 - based on Sentinel satellite imagery, 2 - EO Browser web-
application, 3 - based on aerial imagery from GEOScan 201 Agro UAV 

 
Figure 4: Calculation of NDVI index. Alfalfa crop: 1 - based on Landsat 8 satellite imagery, 2 - one soil web 
application, 3 - based on DJI Phantom 4 Multispectral aerial imagery 

700


Figure 3 shows the calculation of the NDVI index of the experimental plot in the period from September 6-12, 
2021. NDVI index values allow not only for comparing the performance of two levels of information acquisition 
but also for using accumulated values of NDVI indices as predictors in the yield forecasting model. Figure 4 
shows the results of calculating the NDVI index of the experimental plot from 6 – 10 May 2022. 
Three machine learning methods were developed (1 y before harvest) to determine the best combination of 
input data for modelling and forecasting alfalfa yield among satellite indices, weather data, and climate indices. 
To select the best hyperparameters of the machine learning algorithms, this study used complex grid search 
(GS) to explore all possible combinations of hyperparameters and also used cross-validation (CV) to evaluate 
the performance of the algorithms. In GS, a set of values was assigned to each hyperparameter, and a set of 
tests was generated by assembling all possible combinations of values. The evaluation was performed using k-
fold cross-validation. CV is the most commonly used method for algorithm selection and evaluation because of 
its simplicity and ability to avoid overtraining. In k-fold cross-validation, the training data are randomly divided 
into k subsets, and the delay method is repeated k times so that each time one of the k subsets is used as the 
validation set of the model built using (k - 1) subsets.  
Statistical measures for different combinations of input datasets and for different methods are presented in Table 
1. Cross-validation is used to avoid over-training of the neural network. All data on which the model is built are 
divided into k blocks of equal size. Training is done on k-1 blocks, and testing is done on the kth block. The 
procedure is repeated k times, and each time a different block is chosen for testing. As a result, all blocks turn 
out to be used both as training and testing blocks. This prevents and guarantees that the model is not be 
retrained in the future. 

Table 1. Statistical performance of prediction models for several combinations of raw data and three machine 
learning methods (1 y before harvest) 

Input data Models RMSE (t/ha) MAE (t/ha) R2 
Satellite drought indices 
only 

SVM 
MLR 
RF 

0.58 
0.64 
0.51 

0.41 
0.52 
0.40 

0.75 
0.63 
0.78 

Hyperspectral indices from 
UAV and weather data 

SVM 
MLR 
RF 

0.47 
0.38 
0.32 

0.36 
0.41 
0.30 

0.76 
0.86 
0.89 

Satellite drought indices, 
UAV data, weather data, 
and climate indices 

SVM 
MLR 
RF 

0.33 
0.45 
0.25 

0.25 
0.32 
0.20 

0.87 
0.78 
0.94 

     
The results presented in Table 1 show that the satellite drought indices, the UAV data, weather data, and climate 
indices show better results than the other data. The results showed that the statistical performance of the model 
improves as the number of data sets used for prediction increases. All statistical performance improves with the 
addition of datasets for all methods tested. Results showed that yield variability correlated with satellite drought 
index values R2 ranging from 0.63 (for MLR) to 0.78 (for RF) and RMSE from 0.58/ha (for MLR) to 0.51/ha (for 
RF).  
By combining satellite drought indices and weather data, the performance of all models improves by 3 – 8 % for 
R2 and 25 – 32 % for RMSE. The best statistical performance is obtained by combining the three data sets with 
a further statistical improvement of about 14 – 43 % for RMSE and 3 – 9 % for R2, depending on the method 
used. This means that climate indices such as the NAO, SCA, and SST models, as well as the use of UAV data, 
contribute to improved model performance. In addition, nonlinear machine learning (RF, SVM) approaches 
outperformed linear approaches (MLR) when comparing different methods. This indicates that most of the 
relationships between returns and the predictors in question are nonlinear, and those nonlinear methods are 
obviously better at capturing these relationships than linear methods. Finally, the best yield prediction algorithm 
in our study is RF, which predicts yield with R2 = 0.94 and RMSE = 0.25 t/ha. This result was confirmed by 
several studies on seasonal yield prediction, which showed better performance of the RF method compared to 
other nonlinear machine learning approaches such as SVM and MLR. This is consistent with the results of Adam 
et al. (2021), where the authors state that regression models using a coefficient of determination (R2), standard 
error of estimate (SEE), and root mean square error (RMSE) will allow farmers to properly develop soil 
management plans and prevent acidification problems when combined with other soil property data. 

701


4. Conclusion 

When testing the existing models, shortcomings in predicting various cereals, legumes, and oilseeds were 
identified. Alfalfa was selected to adapt the existing algorithms since it is harvested three times during one 
calendar year, so it is possible to make adjustments when there is a discrepancy in the yield forecast. Yield 
forecasting provides critical and timely information that allows farmers to make quick decisions to improve yields 
by improving farming practices during the growing season. The main objective of this study was to develop an 
approach to forage crop yield forecasting in East Kazakhstan based on data from multiple sources and machine 
learning techniques. To this end, this study presents a methodology based on different machine learning 
approaches (MLR, SVM, RF) to predict alfalfa yield in the year before harvest using freely available datasets, 
including satellite drought indices, weather data, and climate indices. The results show that the combination of 
satellite drought indices, weather, and climate data as predictors of forage crop yields provides higher predictive 
accuracy than using any single data source. The results revealed that the RF method is superior to other 
machine learning methods. The RF method predicts yield with R2 = 0.94 and RMSE = 0.25 t/ha. which is one of 
the closest to the actual data. In addition, the proposed approach provides a source of timely information for 
decision-making during the growing season. This work can be used to map yield gains and analyse yield gaps 
nationwide. The identified hotspot areas in terms of yield gaps are suggested for practice improvement and 
further research work. 

Acknowledgements 

This research has been supported by Project BR10865102 "Development of technologies for remote sensing 
of the earth (RSE) to improve agricultural management", funded by the Ministry of Agriculture of the Republic 
of Kazakhstan 

References 

Adam M., Ibrahim I., Sulieman M., Zeraatpisheh M., Mishra G., Brevik E. C. 2021, Predicting Soil Cation 
Exchange Capacity in Entisols with Divergent Textural Classes, The Case of Northern Sudan Soils. Air, Soil 
and Water Research, 14, 11786221211042381. 

Beisekenov N.A., Sadenova M.A., Varbanov P.S., 2021. Mathematical Optimization as A Tool for the 
Development of "Smart" Agriculture in Kazakhstan, Chemical Engineering Transactions, 88, 1219-1224. 

Belgiu M., Drăguţ L. 2016, Random forest in remote sensing: A review of applications and future 
directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24-31. 

Bouras E.h., Jarlan L., Er-Raki, S., Balaghi, R., Amazirh A., Richard B., Khabba S. 2021, Cereal Yield 
Forecasting with Satellite Drought-Based Indices, Weather Data and Regional Climate Indices Using 
Machine Learning in Morocco. Remote Sens. 13, 3101. doi: 10.3390/rs13163101.  

Cihlar J., Laurent L. S., Dyer J. A. 1991, Relation between the normalized difference vegetation index and 
ecological variables. Remote sensing of Environment, 35(2-3), 279-298. 

Eberly, L. E. 2007, Multiple linear regression. Topics in Biostatistics, 165-187. 
Feng L., Zhang Z., Ma Y., Du Q., Williams P., Drewry J., Luck B. 2020. Alfalfa yield prediction using UAV-based 

hyperspectral imagery and ensemble learning. Remote Sensing, 12(12), 2028. 
Guo X., Shukla M. K., Wu D., Chen S., Li D., Du T. 2021. Plant density, irrigation and nitrogen management: 

three major practices in closing yield gaps for agricultural sustainability in North-West China. Frontiers of 
Agricultural Science and Engineering, 8(4), 525-544.  

Habarov D.A., Adiev T.S., Popova O.O., Chugunov V.A., Kozhevnikov V.A. 2019, Analysis of modern 
technologies for remote sensing of the Earth, Moskovskij ekonomicheskij zhurnal. 181-190. doi 
10.24411/2413-046Х-2019-11068 

Nihar A., Patel N. R., Pokhariyal S., Danodia A. 2022, Sugarcane Crop Type Discrimination and Area Mapping 
at Field Scale Using Sentinel Images and Machine Learning Methods, Journal of the Indian Society of 
Remote Sensing, 1-9. 

Xue J., Su B. 2017, Significant Remote Sensing Vegetation Indices: A Review of Developments and 
Applications, Journal of Sensors, 2017, ID 1353691, doi: 10.1155/2017/1353691 

Yadav K., Geli H. M. 2021. Prediction of Crop Yield for New Mexico Based on Climate and Remote Sensing 
Data for the 1920–2019 Period. Land, 10(12), 1389.  

Yang P., Zhao Q., Cai X. 2020. Machine learning based estimation of land productivity in the contiguous US 
using biophysical predictors. Environmental Research Letters, 15(7), 074013. 

702


	PRES22_0234.pdf
	Modelling of Alfalfa Yield Forecasting Based on Earth Remote Sensing (ERS) Data and Remote Sensing Methods