Research Article http://dx.doi.org/10.4314/mejs.v15i1.7
Momona Ethiopian Journal of Science (MEJS), V15(1): 89-104, 2023 ©CNCS, Mekelle University, ISSN: 2220-184X
Submitted on: 26th July 2020; Revised and Accepted on: 31st March 2023

Design and Analysis of Urban Land Lease Price Predicting Model Using Batch Gradient Descent Algorithm

Kifle Berhane Niguse*
Department of Computer Science and Engineering, Mekelle Institute of Technology, Mekelle University, P.O. Box 231, Mekelle, Ethiopia (*kifle.berhane@mu.edu.et).

ABSTRACT
Standard and econometric models are appropriate for causal relationships and interpretations among facets of the economy, but for prediction they tend to over-fit samples and generalize poorly to new, unseen data. This paper presents a batch gradient descent algorithm for predicting the price of land with large datasets. The algorithm minimizes the cost function J(θ) iteratively over possible combinations of θ0 and θ1, with i = 1500 iterations and learning rates α = 0.01, 0.02, and 0.03 for the linear regression (LR) case, and i = 100 iterations and α = 0.3, 0.2, and 0.1 for the multiple linear regression (MLR) case. The experiments are implemented in Octave-4.0.3 (GUI) using 129 samples of the lease bid price of Mekelle City as training sets, with two feature inputs for LR and three for MLR. Using α = 0.01, the best fitting parameters found by training on the dataset are θ0 = 6.02 and θ1 = 2.30, with a cost of J = 67.82. For a 315 m² land size, the model predicts with an accuracy of 92.6% using LR and 91.15% using MLR. As the learning rate increases, θ0 increases and θ1 decreases while the cost stays the same, but the model's prediction error increases slowly. With MLR, as the learning rate is lowered, gradient descent under-fits drastically (an accuracy of 60%), whereas the normal equation predicts with an accuracy of 91.15%.
So, prediction with the normal equation provides the best fit for multiple regression.

Keywords: Batch Gradient Descent Algorithm, Cost Function, Feature Scaling, Learning Rate, Machine Learning, Regression.

1. INTRODUCTION
Human beings have passed through different revolutionary ages, each aimed at making their living standards better than those of their ancestors. The industrial revolution, the information technology age, and now the era of data science and Artificial Intelligence (AI) are among the dominant ones, and all of them brought significant economic changes for every household. Recently, data science and AI, and Machine Learning (ML) in particular, have caught the attention of economists (Athey and Imbens, 2019). ML is a set of procedures (algorithms) suited to large datasets and used for prediction, classification, fraud detection, and product recommendation (Athey, 2017). ML applications can help achieve the improved forecasting that is indispensable for economic policy formulation and targeting (El Naqa and Murphy, 2015). ML is not much concerned with questions of identification, but it gives better results when the goal is a quasi-parametric approximation or when there are many covariates relative to the number of observations (Giles, 2018). ML is used in development interventions and impact evaluations for measuring outcomes and targeting treatments (McKenzie, 2018). There is also a gap in the measurement of fundamental statistics, and ML is well suited to this problem at both the macro and micro levels (Elena Badilo, 2019). Specialists are increasingly curious whether ML can help target interventions, i.e., decide when, where, and for whom to intervene (McKenzie, 2018). ML techniques can therefore be helpful for econometricians.
ML approaches are computationally powerful. Consequently, combining econometric models with ML approaches can yield robust estimates and hence richer models (Kleinberg and Ludwig, 2016). The correlation between land lease prices and the macroeconomy is an important motivation for predicting land lease values. Land lease prices matter not only to consumers and suppliers (the Mekelle City Administration) but also signal present-day economic conditions. It is therefore important to predict these prices without bias to help both consumers and suppliers make their decisions. Standard economic models are well suited to studying causal relationships between different facets of the economy, but for prediction they tend to over-fit samples and sometimes generalize poorly to new, unseen data. Econometric models are relatively simple and easy to interpret; ML methods handle enormous amounts of aggregated data, frequently without sacrificing interpretation. Econometrics aims to establish causation, while ML aims to make accurate and actionable predictions. A significant benefit of ML is that it treats empirical analysis as algorithms that estimate and compare many alternative models. This contrasts with econometrics, where the researcher picks a model based on principles and estimates it once (Giles, 2018). However, the adoption of ML in the social sciences has been slow, although social scientists have steadily begun leveraging ML techniques to gain new insights from data (Kleinberg and Ludwig, 2016). Most research in economics focuses on causal questions (e.g., how do consumers get into financial distress?) (Varian, 2016). However, many policy problems also pose prediction questions that need to be addressed (e.g., which consumers will become financially distressed?) (Gharehchopogh, 2013).
There have been studies on retail sales forecasting, stock market prices, gold prices, and GDP estimation using regression models, in which goodness of fit, model fit and specification, and statistical significance are explored with linear and multiple linear regression (Wu et al., 2003). Other ML prediction algorithms, such as SVR and flexible ANN (employing a special adaptive regularization term), have been applied to travel-time prediction and to the daily cash demand of ATMs, respectively (Wang and Zhu, 2010). However, none of these studies used the batch gradient descent algorithm for optimal parameter estimation and for minimizing the MSE (the cost). This motivated the author to apply linear regression, multiple regression, and an ML algorithm (batch gradient descent) to the training set and feature inputs of the Mekelle City Administration land lease prices, in order to contribute to better prediction of macroeconomic questions by blending econometric and ML approaches. Using the model coefficients, the line of best fit should lie as close as possible to the actual data, which is achieved by minimizing the sum of the squared distances between each data point and the line $h_\theta(x) = \theta_0 + \theta_1 x$. Once θ0 and θ1 are known, the model can be used to predict the response (price). The objective of the research is to minimize the cost function iteratively using the gradient descent algorithm with an optimal learning rate, over possible combinations of θ0 and θ1, and finally to design a price and profit predictive model for better predictability and analysis with new feature inputs of the city administration, such as land grade, land size, land location, and bid price.

2. METHODOLOGY

2.1. Conceptual Framework
The model combines machine learning algorithms, training data, and feature inputs to design a predictive model with a low cost (MSE), an optimal fitting line, and fast learning (Fig 1). Specifically, the cost function, the hypothesis function, and the batch gradient descent algorithm make up the ML algorithm. The rows (the training set) are based on Mekelle City land lease sizes of 140 m², 175 m², 250 m², and 300 m². The columns (feature inputs) are land size, land grade, and bid price. The whole conceptual framework is presented pictorially in Figure 1.

Figure 1. Conceptual Framework.

2.2. Description of the Study Area
The data used for the study are secondary data collected from the Mekelle City Administration in the form of reports. The study employed correlational and experimental quantitative research. Correlation methods were used to show the relation between the explained and explanatory variables. Quantitative analysis is applied to predict the cause-and-effect relationship between the dependent variable and the independent variables. All these procedures use machine learning and econometric tools built on linear regression and multiple regression models. Because land lease bids in Mekelle City are held every three months, that is, at discrete and equally spaced time intervals, the study uses time series data.

2.3. Research Approach of the Study
Because predicting land lease prices is numerical in its mode of inquiry, a quantitative approach is used to capture the large volume of data and information needed for systematic analysis and understanding of performance. As prices are predicted by linearly relating the parameters (features) of the model, the research is correlational in its objective and experimental in the nature of its investigation.
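The correlational step and the line of best fit described above can be illustrated with a short sketch. The Python below is an illustration, not the author's Octave code: it computes the Pearson correlation coefficient between an explanatory and an explained variable, and the closed-form least-squares intercept and slope, θ1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and θ0 = ȳ - θ1·x̄, which gradient descent approaches iteratively. The function names and the toy data (an exact straight line, not the Mekelle dataset) are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Closed-form (theta0, theta1) minimising the sum of squared residuals
    of y = theta0 + theta1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = my - theta1 * mx
    return theta0, theta1


# Hypothetical toy data lying exactly on price = 5000 + 25 * size (NOT the Mekelle data).
sizes = [140, 175, 250, 315]
prices = [5000 + 25 * s for s in sizes]

print(pearson_r(sizes, prices))           # 1.0 for an exact straight line
print(least_squares_line(sizes, prices))  # (5000.0, 25.0)
```

Because the toy prices lie exactly on a line, r is 1 and the fit recovers the intercept 5000 and slope 25; on real bid data the correlation would be below 1 and the fitted line would only approximate the points.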
Octave-4.0.3 (GUI), Notepad++, and MATLAB are used to implement the experiment.

In this paper, the researcher implemented regression (univariate and multivariate) and tested it on the given dataset. The regression parameters θ are fitted to the dataset using the batch gradient descent algorithm, tuning suitable learning rates and numbers of iterations to achieve the minimum cost (mean squared error). The objective of linear regression is to minimize the cost function in equation (1):

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right)^2 \qquad (1)$$

where the hypothesis function $h_\theta(x)$ is given by the linear model:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 \qquad (2)$$

Hence, the parameters of the model are the θj values. The candidate feature inputs for the regression are land lease size (m²), land grade, and bid price. The θj values are adjusted to minimize the cost function J(θ). The researcher approached this by:
• Trying a number of combinations of θ0 and θ1, learning rates, and numbers of iterations.
• Arriving at the line equation that fits most of the dataset.
• Using the final line equation to predict new sets of input values.

The cost function thus helps identify the optimal equation, with the right combination of θ0 and θ1, that gives the best fitting line and an optimized model. For each candidate line equation, the researcher:
• Predicted the $h_\theta$ value and took its difference from the actual value $y$, which gives the error of that particular equation.
• Repeated the same for all the sample data points available.
• Squared all the errors, since the interest is in the magnitude of the error, and positive and negative errors could otherwise cancel out and lead to no conclusion.
• Took the average by dividing by the number of sample data points.

The values obtained for each candidate line equation are compared to derive the optimal line equation; the one with the lowest Mean Squared Error (MSE) is picked as the model for future predictions. One approach to minimizing the cost function $J(\theta_0, \theta_1)$ is the batch gradient descent algorithm:

Gradient descent algorithm:
Repeat until convergence {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{for } j = 0, 1$$
}
where
$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)^2,$$
so that, for $j = 0$:
$$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)$$
and for $j = 1$:
$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) x_i$$

In the batch gradient descent algorithm, each iteration performs the update

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) x_{i,j} \qquad \text{(simultaneously update } \theta_j \text{ for all } j\text{)} \qquad (3)$$

where $x_{i,j}$ is the j-th feature of the i-th training example and $x_{i,0} = 1$. With each step of gradient descent, the parameters θj move closer to the optimal values that achieve the lowest cost $J(\theta_0, \theta_1)$.

For multivariate linear regression, the possible feature inputs are land lease size (m²), land grade, and price per m². The land sizes and prices in this paper's dataset are much larger than the land grade, almost 100 times larger, which prevents gradient descent from converging quickly. To avoid this, feature scaling is applied so that gradient descent converges more quickly. Feature scaling works by subtracting the mean of each feature from the dataset and then dividing the feature values by their respective standard deviations; the standard deviation measures how much the values of a particular feature vary. Gradient descent for multivariate linear regression is the same as for univariate linear regression, except that there are more features: the hypothesis function and the batch gradient descent update remain unchanged (see equations 2 and 3).
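A minimal sketch of feature scaling followed by batch gradient descent, as described above, is given below in Python rather than the Octave used in the paper. It assumes θ starts at zero, uses the sample standard deviation (dividing by m - 1) for scaling, and runs on hypothetical, noise-free toy data of [size, grade] rows; all function names are illustrative. The key detail is that every partial derivative in equation (3) is computed from the same error vector before any θj changes, which is what "simultaneously update" means.

```python
def normalize_features(X):
    """Z-score each column of X (a list of rows): subtract the column mean and
    divide by the column standard deviation. Assumes no column is constant.
    Returns the scaled rows plus the means and stds needed to scale new inputs."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    # Sample standard deviation (divides by m - 1).
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / (m - 1)) ** 0.5
            for j in range(n)]
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(n)] for row in X]
    return scaled, means, stds


def compute_cost(rows, y, theta):
    """J(theta) = 1/(2m) * sum of squared errors; each row already starts with 1."""
    m = len(y)
    return sum((sum(t * v for t, v in zip(theta, r)) - yi) ** 2
               for r, yi in zip(rows, y)) / (2 * m)


def batch_gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression (eq. 3), starting from theta = 0.
    X holds feature rows without the intercept; a 1 is prepended to each row.
    Returns the final theta and the cost after every iteration."""
    m = len(y)
    rows = [[1.0] + list(r) for r in X]
    theta = [0.0] * len(rows[0])
    history = []
    for _ in range(num_iters):
        errors = [sum(t * v for t, v in zip(theta, r)) - yi for r, yi in zip(rows, y)]
        # All partial derivatives use the same errors, so every theta_j is
        # updated simultaneously, as eq. (3) requires.
        grads = [sum(e * r[j] for e, r in zip(errors, rows)) / m
                 for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
        history.append(compute_cost(rows, y, theta))
    return theta, history


# Hypothetical, noise-free toy data (NOT the Mekelle dataset): [size in m^2, grade].
X = [[140, 1], [175, 2], [250, 1], [315, 3], [200, 2]]
y = [1000 + 20 * s + 500 * g for s, g in X]

X_scaled, mu, sigma = normalize_features(X)
theta, J_history = batch_gradient_descent(X_scaled, y, alpha=0.3, num_iters=500)

# A new input must be scaled with the SAME means and stds before predicting.
new = [300, 2]
z = [(v - mu_j) / s_j for v, mu_j, s_j in zip(new, mu, sigma)]
prediction = theta[0] + sum(t * v for t, v in zip(theta[1:], z))
print(prediction)  # close to the noise-free target 1000 + 20*300 + 500*2 = 8000
```

A new input has to be scaled with the training-set means and standard deviations before the learned θ is applied; skipping that step is a common source of wrong predictions after feature scaling.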
In the multivariate case, the cost function can be written in the vectorized form of equation (4):

$$J(\theta) = \frac{1}{2m} \left( X\theta - \vec{y} \right)^T \left( X\theta - \vec{y} \right) \qquad (4)$$

where
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$

• $m$ is the number of samples (training examples).
• $h_\theta(x_i)$ is the predicted output value (price): the hypothesis takes the features and estimates the price.
• $y_i$ is the actual output value (price).
• $\alpha$ is the learning rate.
• $x$ is the input variable (features).
• $x_i$ is the i-th training example.
• The factor $\frac{1}{2}$ means that half the mean squared error is minimized; this does not change the minimizer, and it cancels the factor 2 produced by differentiating the square, which keeps the gradient and its Octave/MATLAB implementation simple.

2.4. Methods of Data Processing and Analysis
The collected dataset is first cleaned (edited) through central editing to handle missing values. Next, it is coded and organized in vector-matrix form. Finally, it is processed with LR, MLR, the gradient descent algorithm, and normalization procedures. The data processing flow chart of the algorithm is shown in Figure 2.

Figure 2. Data processing flow chart.

The quantitative dataset generated from the sample selected from the Mekelle City Administration land lease bid prices is analyzed with programs written in Octave, varying relevant parameters such as theta (θ), the number of iterations, and the learning rate alpha (α), for both univariate and multivariate LR. The predicted price, the cost function value, and the gradient descent values are used in reporting and explaining the results.

3. RESULTS AND DISCUSSIONS

3.1. Results with Linear Regression

3.1.1. Scenario-A: α = 0.01 and i* = 1500
With a learning rate of α = 0.01 and i* = 1500 iterations, the fitting parameters and the minimum cost (MSE) obtained for the sample land sizes of 315 m² and 140 m² are θ0 = 6.02, θ1 = 2.30, and J(θ0, θ1) = 67.38 (Fig 3a, b and c).
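The reported parameters and prices are only mutually consistent under a particular choice of units. Evaluating $h_\theta(x) = \theta_0 + \theta_1 x$ with θ0 = 6.02 and θ1 = 2.30 on the raw size of 315 m² would give a price of about 730, not the thousands of Birr reported. A possible explanation, which is an inference and is not stated in the paper, is that the model was trained with size in hundreds of m² and price in thousands of Birr. The Python sketch below (the helper name is hypothetical) checks this assumption against the Scenario A prices in Table 1; each agrees to within 0.1%, and the small gaps are consistent with θ0 and θ1 being rounded to two decimals.

```python
def predict_price(theta0, theta1, size_m2):
    """Evaluate h(x) = theta0 + theta1 * x and convert back to Birr.
    ASSUMPTION (inferred, not stated in the paper): the univariate model was
    trained with x = size in hundreds of m^2 and y = price in thousands of Birr."""
    x = size_m2 / 100.0
    return 1000.0 * (theta0 + theta1 * x)


# Scenario A parameters (rounded to two decimals in the text) and Table 1 prices.
reported_a = {315: 13274.74, 250: 11778.05, 175: 10051.10, 140: 9245.20}
for size, reported in reported_a.items():
    estimate = predict_price(6.02, 2.30, size)
    rel_diff = abs(estimate - reported) / reported
    print(f"{size} m^2: {estimate:.2f} vs reported {reported:.2f} ({rel_diff:.3%} apart)")
```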
With these gradient descent fitting parameters, the predicted prices for the two samples are Birr 13,274.74 and 9,245.20, respectively (Table 1), whereas the actual average prices are 14,330.31 and 12,228.97. The prediction errors are 7.4% and 24.4% for the sample land sizes of 315 m² and 140 m², so the model predicts with an accuracy of 92.6% and 75.6% for the first and second samples, respectively (Fig 3a). The simulated results for the cost and the contour for Scenario-A are illustrated in Figure 3.

Figure 3. Linear regression fit line, cost and contour for α = 0.01 and i* = 1500: (a) training data with linear regression fit, (b) surface, (c) contour, showing the minimum.

3.1.2. Scenario-B: α = 0.02 and i* = 1500
If the learning rate is increased to α = 0.02 with i* = 1500 iterations, the fitting parameter θ0 increases to 7.76 and θ1 decreases to 1.57, while the minimum cost (MSE) remains the same for the sample land sizes of 315 m² and 140 m² (Fig 4a, b and c). With these fitting parameters, the predicted prices are Birr 12,722.51 and 9,968.26, respectively (Table 1). As a result, the model predicts with an accuracy of 88.78% and 81.5% for the first and second samples, respectively (Fig 4a). The simulated results for price, cost and contour for Scenario-B are illustrated in Figure 4.

Figure 4. Linear regression fit line, cost and contour for α = 0.02 and i* = 1500: (a) training data with linear regression fit, (b) surface, (c) contour, showing the minimum.

3.1.3. Scenario-C: α = 0.03 and i* = 1500
If the learning rate is further increased to α = 0.03 with i* = 1500 iterations, θ0 keeps increasing, to 8.47, θ1 decreases to 1.28, and the minimum cost (MSE) again remains the same for the sample land sizes of 315 m² and 140 m² (Fig 5a, b and c). With these fitting parameters, the predicted prices are Birr 12,499.09 and 10,260.81, respectively (Table 1). The prediction accuracies are therefore 87.22% and 83.9% for the first and second samples, respectively (Fig 5a). The simulated results for price, cost and contour for Scenario-C are illustrated in Figure 5.

Figure 5. Linear regression fit line, cost and contour for α = 0.03 and i* = 1500: (a) training data with linear regression fit, (b) surface, (c) contour, showing the minimum.

3.1.4. Scenario-D: α = 0.01 and i* = 750
If the learning rate is set back to α = 0.01 and the number of iterations is lowered to i* = 750, θ0 decreases to 4.35, θ1 increases to 3.00, and the minimum cost (MSE) remains the same for the sample land sizes of 315 m² and 140 m² (Fig 6a, b and c). With these fitting parameters, the predicted prices are Birr 13,805.13 and 8,550.72, respectively (Table 1). The prediction accuracies are therefore 96.34% and 70% for the first and second samples, respectively (Fig 6a). The simulated results for price, cost and contour for Scenario-D are illustrated in Figure 6.

Figure 6. Linear regression fit line, cost and contour for α = 0.01 and i* = 750: (a) training data with linear regression fit, (b) surface, (c) contour, showing the minimum (i* = number of iterations).
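The accuracies quoted in Scenarios A-D can be reproduced as 100 × (1 - |predicted - actual| / actual), using the actual average prices of Birr 14,330.31 (315 m²) and 12,228.97 (140 m²); the prediction error quoted in the text is 100 minus this value. The short Python sketch below recomputes them from the predicted prices in Table 1. The formula is inferred from the reported numbers, and the helper name is hypothetical.

```python
def prediction_accuracy(predicted, actual):
    """Accuracy in percent, 100 * (1 - |predicted - actual| / actual);
    the prediction error quoted in the text is 100 minus this value."""
    return 100.0 * (1.0 - abs(predicted - actual) / actual)


actual = {315: 14330.31, 140: 12228.97}  # actual average prices (Birr) from the text
predicted = {                            # predicted prices (Birr) from Table 1
    "A": {315: 13274.74, 140: 9245.20},
    "B": {315: 12722.51, 140: 9968.26},
    "C": {315: 12499.09, 140: 10260.81},
    "D": {315: 13805.13, 140: 8550.72},
}

for scenario, prices in predicted.items():
    accs = {size: round(prediction_accuracy(p, actual[size]), 2)
            for size, p in prices.items()}
    print(scenario, accs)
```

The recomputed values match the text to the precision it reports, which suggests that the accuracy figures are simple relative-error complements rather than a cross-validated measure.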
The quantitative outputs (model parameters, MSE, and price) obtained for Scenarios A-D are summarized in Table 1 for the case of LR.

Table 1. Model parameters and results for LR.

| Scenario | α | i* | θ0 | θ1 | Cost (MSE) J(θ0, θ1) | Price, 315 m² (Birr) | Price, 250 m² (Birr) | Price, 175 m² (Birr) | Price, 140 m² (Birr) |
|---|---|---|---|---|---|---|---|---|---|
| A | 0.01 | 1500 | 6.02 | 2.30 | 67.38 | 13274.74 | 11778.05 | 10051.10 | 9245.20 |
| B | 0.02 | 1500 | 7.76 | 1.57 | 67.38 | 12722.51 | 11699.51 | 10519.11 | 9968.26 |
| C | 0.03 | 1500 | 8.47 | 1.28 | 67.38 | 12499.09 | 11667.73 | 10708.46 | 10260.81 |
| D | 0.01 | 750 | 4.35 | 3.00 | 67.38 | 13805.13 | 11853.49 | 9601.60 | 8550.72 |

Note: θ0 and θ1 are found by gradient descent; prices are predicted for the sample land sizes; i* = number of iterations and α = learning rate.

3.2. Results with Multiple Regression
For α = 0.3 and i* = 100, the fitting parameters computed from the experiment are θ0 = 11110.98, θ1 = 640.15, and θ2 = 674.27 (Table 2), giving the minimum cost. The predicted prices for the two sample land sizes (using the batch gradient descent algorithm) are Birr 13,062.20 and 9,745.72, so the prediction accuracy is 91.15% for the 315 m² sample land size and 80% for 140 m². The prediction accuracy and the best fitting parameters remain almost the same for learning rates of 0.2 and 0.1 (Table 2).

Table 2. Model parameters and results for MLR.
| Scenario | α | i* | Method | θ0 | θ1 | θ2 | Price, 315 m² (Birr) | Price, 140 m² (Birr) |
|---|---|---|---|---|---|---|---|---|
| A | 0.3 | 100 | Gradient descent | 11110.98 | 640.15 | 674.27 | 13062.20 | 9745.72 |
| A | 0.3 | 100 | Normal equation | 7701.69 | 10.25 | 304.58 | 13062.20 | 9745.72 |
| B | 0.2 | 100 | Gradient descent | 11110.98 | 640.15 | 674.27 | 13062.20 | 9745.71 |
| B | 0.2 | 100 | Normal equation | 7701.69 | 10.25 | 304.58 | 13062.20 | 9745.72 |
| C | 0.1 | 100 | Gradient descent | 11110.69 | 640.36 | 674.06 | 13061.89 | 9745.29 |
| C | 0.1 | 100 | Normal equation | 7701.69 | 10.25 | 304.58 | 13062.20 | 9745.72 |
| D | 0.01 | 100 | Gradient descent | 7044.00 | 513.94 | 525.91 | 8587.19 | 5959.41 |
| D | 0.01 | 100 | Normal equation | 7701.69 | 10.25 | 304.58 | 13062.20 | 9745.72 |

Note: i* = number of iterations; α = learning rate.

However, when the learning rate is small (α = 0.01), the prediction accuracy of the gradient descent algorithm is very low: 60% and 49% for land sizes of 315 m² and 140 m², respectively. By contrast, the normal equation, which involves neither a learning rate nor iterations, predicts with an accuracy of 91.15% for the 315 m² sample land size and 80% for 140 m² in every scenario, including α = 0.01. Hence, gradient descent predicts best with α = 0.3 and i* = 100, where it matches the accuracy of the normal equation. At α = 0.3, 0.2, and 0.1 with i* = 100, the batch gradient descent algorithm learns fast with an appropriate learning rate, as shown in Figure 7 (a to d) in order of preference. When α = 0.01 and i* = 100, the algorithm learns too slowly to converge within 100 iterations, whereas the normal equation's predictions are unaffected.

Figure 7. Gradient descent convergence for (a) α = 0.3, i* = 100, (b) α = 0.2, i* = 100, (c) α = 0.1, i* = 100, and (d) α = 0.01, i* = 100.
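Table 2 shows different θ values for gradient descent and the normal equation but identical predicted prices. This is what one would expect if, as Section 2 describes, gradient descent was run on mean-normalized features while the normal equation θ = (XᵀX)⁻¹Xᵀy was solved on the unscaled features: the two parameter vectors describe the same fitted plane in different coordinates. The Python sketch below illustrates this on hypothetical toy data, and also shows why α = 0.01 underfits in 100 iterations. Assuming θ starts at zero, mean-centred features reduce the intercept update to θ0 ← θ0 - α(θ0 - ȳ), so after k iterations θ0 = ȳ(1 - (1 - α)^k). For α = 0.01 and k = 100 this factor is about 0.634, which matches the ratio 7044.00 / 11110.98 ≈ 0.634 between the Scenario D and Scenario A intercepts in Table 2; for α = 0.1 it gives 11110.98 × (1 - 0.9^100) ≈ 11110.68, which agrees with the reported 11110.69 up to rounding. The solver and all names are illustrative, not the author's code.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y on UNSCALED features; each row of X starts with 1."""
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)


def gd_scaled(X_raw, y, alpha, iters):
    """Batch gradient descent from theta = 0 on z-scored features.
    Returns theta plus the means and stds used for scaling."""
    m, n = len(X_raw), len(X_raw[0])
    mu = [sum(r[j] for r in X_raw) / m for j in range(n)]
    sd = [(sum((r[j] - mu[j]) ** 2 for r in X_raw) / (m - 1)) ** 0.5 for j in range(n)]
    rows = [[1.0] + [(r[j] - mu[j]) / sd[j] for j in range(n)] for r in X_raw]
    theta = [0.0] * (n + 1)
    for _ in range(iters):
        err = [sum(t * v for t, v in zip(theta, row)) - yi for row, yi in zip(rows, y)]
        # Simultaneous update: every component uses the same err vector.
        theta = [t - alpha * sum(e * row[j] for e, row in zip(err, rows)) / m
                 for j, t in enumerate(theta)]
    return theta, mu, sd


# Hypothetical toy data (NOT the Mekelle dataset): [size in m^2, grade] -> price.
X_raw = [[140, 1], [175, 2], [250, 1], [315, 3], [200, 2]]
offsets = [30, -20, 10, -40, 20]  # small deterministic deviations from a plane
y = [1000 + 20 * s + 500 * g + e for (s, g), e in zip(X_raw, offsets)]

theta_ne = normal_equation([[1.0, s, g] for s, g in X_raw], y)
theta_gd, mu, sd = gd_scaled(X_raw, y, alpha=0.3, iters=500)

new = [300, 2]
pred_ne = theta_ne[0] + theta_ne[1] * new[0] + theta_ne[2] * new[1]
z = [(v - mu[j]) / sd[j] for j, v in enumerate(new)]
pred_gd = theta_gd[0] + theta_gd[1] * z[0] + theta_gd[2] * z[1]
print(theta_ne, theta_gd)  # different parameter values...
print(pred_ne, pred_gd)    # ...but essentially the same prediction

# Too small a learning rate: after 100 steps theta0 is only (1 - 0.99**100) of mean(y).
slow, _, _ = gd_scaled(X_raw, y, alpha=0.01, iters=100)
print(slow[0] / (sum(y) / len(y)), 1 - 0.99 ** 100)
```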
Therefore, the best price predictor is obtained with α = 0.3 and i* = 100 (Fig 7a), where θ0 = 11110.98, θ1 = 640.15, and θ2 = 674.27 give the minimum cost value for MLR. Figure 7 shows how and when gradient descent converges or diverges. The quantitative outputs (model parameters and price) obtained for Scenarios A-D are summarized in Table 2 for the case of MLR.

4. CONCLUSION
In each iteration of this paper's experiments, all parameters are updated simultaneously using the gradient descent update. With every step of gradient descent, each fitting parameter gets closer to the optimal values that achieve the lowest cost. The cost decreased with every iteration and never increased, which indicates that the gradient descent algorithm functioned correctly and converged to a steady-state value at which the algorithm terminates. The final parameter values were then used to make price predictions on new input features. When the training set becomes large, the cost function value increases.

Learning rates chosen on a roughly logarithmic scale, such as 0.01, 0.02, 0.03, 0.1, 0.2, and 0.3, were in a suitable range. A learning rate that is too small makes batch gradient descent converge slowly. A learning rate that is too large makes gradient descent overshoot the minimum and fail to converge; the cost then blows up, increasing with each iteration until it reaches infinity.

In the case of LR, using a learning rate of 0.01, the best fitting parameters found by training on the dataset are θ0 = 6.02 and θ1 = 2.30, with a cost of 67.82, and the model predicts with an accuracy of 92.6%. For MLR, the batch gradient descent algorithm converges fast with a learning rate of 0.3 and 100 iterations.
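The learning-rate behaviour summarized above can be reproduced on a toy problem. The Python sketch below is an illustration with hypothetical, already mean-centred data (not the Mekelle dataset): it runs batch gradient descent for h(x) = θ0 + θ1x from θ = 0 and records J(θ) after each of 50 iterations for three learning rates. For this data the curvature of J along θ1 is mean(x²) = 2, so any α above 1 makes that error component grow at every step; more generally, for a quadratic cost, gradient descent diverges once α exceeds 2 divided by the largest curvature.

```python
def gd_cost_history(x, y, alpha, iters):
    """Batch gradient descent for h(x) = theta0 + theta1 * x, starting at zero.
    Returns J(theta) (eq. 1) after each iteration."""
    m = len(y)
    t0 = t1 = 0.0
    history = []
    for _ in range(iters):
        err = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        g0 = sum(err) / m
        g1 = sum(e * xi for e, xi in zip(err, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1  # simultaneous update
        history.append(sum((t0 + t1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m))
    return history


# Hypothetical, mean-centred toy data on y = 3 + 2x (NOT the Mekelle dataset).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [3.0 + 2.0 * xi for xi in x]

for alpha in (0.01, 0.3, 1.1):  # too small, suitable, too large for this data
    J = gd_cost_history(x, y, alpha, iters=50)
    print(f"alpha={alpha}: J after 1 step = {J[0]:.3g}, after 50 steps = {J[-1]:.3g}")
```

With α = 0.01 the cost is still well above zero after 50 iterations (slow convergence), with α = 0.3 it is essentially zero, and with α = 1.1 it grows without bound, mirroring the too-small, suitable, and too-large regimes described in the conclusion.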
The model achieves a prediction accuracy of 91.15% for the 315 m² sample land size and 80% for 140 m², with best fitting parameters θ0 = 11110.98, θ1 = 640.15, and θ2 = 674.27. This model predicts the price of urban land leases for the Mekelle City Administration from time series data, combining machine learning and econometric models for better optimality. Designing such economic models enhances the predictability of important macroeconomic quantities. For example, consumers can use it to decide how much to bid per square meter of land lease in the city. The output of this research may also support the government's macroeconomic decisions, interventions, policy formulation, and targeting.

5. ACKNOWLEDGMENTS
The author is highly indebted to the Mekelle City Administration's core process of Land Management and Development and to Ms. Seble Assefa, Urban Land Management Expert at Tigray Regional State, for generously providing the historical land lease price datasets. The author is also thankful to Maarig Aregawi (Assistant Professor), Mekelle Institute of Technology, Mekelle University, and Teame Hailemariam (Assistant Professor), College of Business and Economics, Mekelle University, for their meticulous and relentless support and encouragement.

6. REFERENCES
Athey, S. 2017. The Impact of Machine Learning on Economics. In: Ajay Agrawal, Joshua Gans, and Avi Goldfarb (eds.), The Economics of Artificial Intelligence: An Agenda, University of Chicago Press, ISBN: 978-0-226-61333-8, http://www.nber.org/books/agra-1, Conference volume, pp. 507-547.
Athey, S. & Imbens, G. W. 2019. Machine Learning Methods That Economists Should Know About. Annual Review of Economics, 11: 685-725.
El Naqa, I. & Murphy, M. J. 2015. What Is Machine Learning? In: El Naqa, I., Li, R. & Murphy, M. (eds.), Machine Learning in Radiation Oncology, 1: 3-11.
Elena Badilo. 2019. The Impact of Machine Learning on Economics: What Machine Learning Can (and Cannot) Do for Economic Research. Chicago Policy Review. Available at: https://chicagopolicyreview.org/2019/01/21/the-impact-of-machine-learning-on-economics-what-machine-learning-can-and-cannot-do-for-economic-research/
Gharehchopogh, F. S. 2013. A Linear Regression Approach to Prediction of Stock Market Trading Volume: A Case Study. International Journal of Managing Value and Supply Chains, 4(3): 1-7.
Giles, A. 2018. Machine Learning and Economics. Financial Conduct Authority. Available at: https://spe.org.uk/site/assets/files/5204/adam_giles_slides.pdf
Kleinberg, J. & Ludwig, J. 2016. Prediction Policy Problems. American Economic Review, 105(5): 491-95.
McKenzie, D. 2018. How can machine learning and artificial intelligence be used in development interventions and impact evaluations? World Bank Blogs. Available at: https://blogs.worldbank.org/impactevaluations/how-can-machine-learning-and-artificial-intelligence-be-used-development-interventions-and-impact
Varian, H. R. 2016. Causal inference in economics and marketing. PNAS, 113(27): 7310-15.
Wang, L. & Zhu, J. 2010. Financial market forecasting using a two-step kernel learning method for the support vector regression. Annals of Operations Research, 174: 103-120.
Wu, C.-H., Wei, C.-C., Su, D.-C., Chang, M.-H. & Ho, J.-M. 2003. Travel time prediction with support vector regression. Proceedings of the 2003 IEEE International Conference on Intelligent Transportation Systems, 2(2): 1438-1442.