http://journal.uir.ac.id/index.php/JGEET
E-ISSN: 2541-5794, P-ISSN: 2503-216X
Journal of Geoscience, Engineering, Environment, and Technology, Vol 08 No 02-2 2023, Special Edition
Special Issue from "The 1st International Conference on Upstream Energy Technology and Digitalization (ICUPERTAIN) 2022"
Akmal, F. et al./ JGEET Vol 08 No 02-2 2023

RESEARCH ARTICLE

Machine learning prediction of tortuosity in digital rock

Fadhillah Akmal1, M. Cisco Ramadhan Dzulizar1, Muhammad Faizal Rafli1, Fatimah Az-Zahra1, M. I. Khoirul Haq1, Irwan Ary Dharmawan1,*

1 Department of Geophysics, Faculty of Mathematics and Natural Science, Universitas Padjadjaran, Raya Bandung Sumedang km. 21 Street, Jatinangor 45363, Indonesia
* Corresponding author: iad@geophys.unpad.ac.id

Received: May 20, 2023. Revised: May 31, 2023. Accepted: June 10, 2023. Published: July 31, 2023
DOI: 10.25299/jgeet.2023.8.02-2.13875

Abstract

Physical rock property measurement is an important stage in energy exploration, both for hydrocarbons and geothermal sources. The value of physical rock properties can provide information about reservoir quality, and one of these properties is tortuosity. Tortuosity is an intrinsic property of porous materials that describes the level of complexity of the porous arrangement when a fluid passes through it. Conventionally, tortuosity values are measured through laboratory analysis and numerical simulation, but these measurements can take a long time. An alternative method for measuring tortuosity is using machine learning with a convolutional neural network (CNN). A CNN is a type of deep neural network designed to analyze multi-channel images and has been applied successfully to classification and non-linear regression problems.
By training a CNN on a dataset of digital rock samples whose tortuosity values were obtained by numerical computation, we demonstrate that CNNs can predict the tortuosity of digital rock. Of the models tested, the Xception model was the most accurate, with the lowest RMSE value of 0.90962.

Keywords: Tortuosity, Digital Rock, Machine Learning, Convolutional Neural Network

1. Introduction

Porous media are important in the field of energy exploration, where the properties of porous rocks can provide a lot of useful information, especially on hydrocarbon or geothermal sources. One of those properties is tortuosity. Tortuosity is an intrinsic property of porous media that describes the level of complexity of fluid pathways. It is defined as the length of the pathway relative to its effective length, as shown in Fig. 1. To assess the condition of a reservoir, it is crucial to identify reservoir rock features such as tortuosity. However, since analytical solutions cannot be used, this requires laboratory testing or numerical simulations, which makes it challenging to complete (Ladopoulos, 2014). Tortuosity values may also be determined using machine learning techniques.

Fig. 1. Depiction of tortuosity in porous media. The red line indicates the pathway that can be travelled by fluid, the blue line indicates the effective length of the pathway.

Machine learning has been developed and applied widely in science and technology fields such as production optimization and hydrocarbon drilling. It has also begun to be used to simplify and accelerate the computational process in the estimation of physical parameters of porous rocks. Artificial neural networks (ANNs) are one of the popular machine learning models used to tackle complex problems. ANN algorithms are modelled on human nerves that adaptively train to complete a task.
A typical ANN structure consists of several layers, each with a number of perceptrons. Perceptrons are the fundamental building blocks of ANNs and are modelled on the neurons in human brain networks. In an ANN, the output of one layer serves as the input for the following layer. One of the ANN algorithms that is frequently applied to image recognition problems is the convolutional neural network (CNN). A CNN applies filters of a specific size at various locations of the input data, and the convolution of the input data with the filters produces new representative information. This output from the convolution is then used as the input for the next layer of the neural network. Because feature extraction and training in the CNN algorithm are carried out simultaneously by the computer, it is a good solution for estimating the physical properties of porous rocks with intricate patterns. CNNs have been shown to predict fundamental quantities of porous media (Graczyk and Matyka, 2020). In recent studies, transfer learning, the method of using pre-trained CNN models, has obtained good results in determining rock parameters, especially on small datasets and out-of-range problems (Tang et al., 2022).

Table 1. Digital rock sample and the amount of digital rock data used in the study. There is an experimental error of ± 0.5 % for porosity and ± 10 % for permeability (Neumann et al., 2021).
| No | Name | Porosity | Permeability | Formation | Number of Data |
|----|------|----------|--------------|-----------|----------------|
| 1 | Bandera brown sandstone | 24.11 % | 63 mD | Desmoinesian | 2740 |
| 2 | Bentheimer sandstone | 22.64 % | 22.64 mD | Valanginian | 2740 |
| 3 | Berea cores sandstone | 18.96 % | 18.96 mD | Upper Devonian | 2740 |
| 4 | Buff berea sandstone | 24.02 % | 275 mD | Upper Devonian | 2700 |
| 5 | Castlegate sandstone | 26.54 % | 269 mD | Late Cretaceous | 2700 |
| 6 | Leopard sandstone | 20.22 % | 327 mD | Paleozoic | 2690 |
| 7 | Parker sandstone | 14.77 % | 10 mD | Paleozoic | 2250 |
| 8 | Kirby sandstone | 19.95 % | 62 mD | - | 2740 |

The general layout of the layers of the CNN architecture is shown in the next section. The CNN was chosen for this research because it allows physical parameter values to be calculated in a shorter time and without laboratory testing that injects fluid into porous rocks, thereby avoiding damage to the rocks. To conduct this research, 3D digital rock samples of sandstone were obtained from the Digital Rock Portal (Neumann et al., 2020) and pre-processed before being input into the CNN architecture.

2. Material and methods

This research begins with the creation of datasets from several types of digital rock data. The rock data used is porous rock data from CT-scan images. The digital rocks used in this study are a set of sandstone samples. Sandstone is a classic sedimentary rock composed primarily of quartz, silica, and sand-sized minerals. Each CT-scan sample is converted into a three-dimensional array of size 1000 x 1000 x 1000 voxels. Table 1 shows the digital rock samples and the amount of digital rock data used in the study. Each array consists of the value 0, representing a rock pore, and the value 1, representing an obstacle. The CT-scan array is then resampled to smaller arrays of size 128 x 128 x 128 voxels in order to lighten the computational load and create a larger dataset. The tortuosity value of each three-dimensional array is then calculated from the connected paths between its boundaries. This was done with the help of Tort3D software.
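To make the idea of a path-based tortuosity concrete, the following is a minimal sketch, not the Tort3D algorithm itself. It assumes 6-connected voxels, uses a breadth-first search to find the shortest pore path from each inlet pore voxel to the outlet face, and divides the mean path length by the sample length in the flow direction. The function name `geometric_tortuosity` and the connectivity choice are illustrative assumptions.

```python
from collections import deque

import numpy as np


def geometric_tortuosity(volume, axis=0):
    """Mean shortest pore-path length across the sample divided by its length.

    volume: 3D array with 0 = pore and 1 = solid (the convention in the text).
    axis:   flow direction. Returns NaN when no pore path spans the sample.
    Simplified illustration only; Tort3D may differ in every detail.
    """
    pore = np.moveaxis(np.asarray(volume), axis, 0) == 0
    nx, ny, nz = pore.shape
    dist = np.full(pore.shape, -1, dtype=np.int64)  # -1 means "not reached"

    # Multi-source BFS: start from every pore voxel on the outlet face.
    queue = deque()
    for j, k in zip(*np.nonzero(pore[-1])):
        dist[nx - 1, j, k] = 0
        queue.append((nx - 1, int(j), int(k)))

    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in steps:
            a, b, c = i + di, j + dj, k + dk
            inside = 0 <= a < nx and 0 <= b < ny and 0 <= c < nz
            if inside and pore[a, b, c] and dist[a, b, c] < 0:
                dist[a, b, c] = dist[i, j, k] + 1
                queue.append((a, b, c))

    # Path lengths (in voxel steps) for inlet pore voxels connected to the outlet.
    inlet_lengths = dist[0][dist[0] >= 0]
    if inlet_lengths.size == 0:
        return float("nan")
    return float(inlet_lengths.mean()) / (nx - 1)


# A straight channel has tortuosity 1; a detour lengthens the path.
straight = np.ones((5, 5, 5), dtype=np.uint8)
straight[:, 2, 2] = 0
bent = np.ones((3, 3, 1), dtype=np.uint8)
for voxel in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 2, 0), (2, 2, 0)]:
    bent[voxel] = 0
print(geometric_tortuosity(straight), geometric_tortuosity(bent))
```

Searching from the outlet face once, rather than once per inlet voxel, keeps the cost linear in the number of pore voxels, which matters for 128 x 128 x 128 arrays.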
The software works by reading the digital rock data and searching for connected paths through the void space. Once all valid paths have been found, the tortuosity value in one flow direction is calculated as the average of all path lengths divided by the size of the image in the direction of flow (Al-Raoush and Madhoun, 2017). The calculation results are then used as dataset labels in the machine learning model architecture. The dataset totals 21,300 images, with tortuosity values ranging from 1.08 to 113.09. Most of the data, 19,764 samples, lies in the range of 1.08 to 13.52. The distribution of dataset tortuosity values is shown in Fig. 2.

Instead of using a three-dimensional array as input for machine learning, certain parts of the array are selected to represent the entire array. These selected parts are plane slices on the three main axes, each 128 x 128 in size. The three plane slices are then stacked into a three-channel image as a synthetic RGB image. The total dataset consists of 21,300 samples as shown in Table 1, with 1,300 set aside as testing data that are not used in training. The remaining 20,000 samples are divided into a train-validation dataset, with 85% designated as the training set and 15% as the validation set. The creation of synthetic RGB images is illustrated in Fig. 3.

Fig. 2. Distribution of dataset tortuosity value.

This research uses a Transfer Learning strategy by utilizing four types of pre-trained model architectures in order to obtain the best model. Transfer Learning is a machine learning technique in which a model is trained and developed for one task and then reused for a second, related task. It involves exploiting what has been learned in one setting to improve optimization in another setting (Gao and Mosalam, 2018).
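The slice stacking and the 85/15 train-validation split described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not say which slices are taken, so the central slice on each axis is used here, and the split is a plain random shuffle. The names `synthetic_rgb` and `train_val_split` are hypothetical.

```python
import numpy as np


def synthetic_rgb(volume):
    """Stack one slice per principal axis into an (N, N, 3) image.

    Assumption: the central slice on each axis is used; the paper does not
    specify the slice positions. The volume must be cubic so shapes match.
    """
    v = np.asarray(volume)
    if v.ndim != 3 or len(set(v.shape)) != 1:
        raise ValueError("expected a cubic 3D array")
    c = v.shape[0] // 2
    slices = [v[c, :, :], v[:, c, :], v[:, :, c]]
    return np.stack(slices, axis=-1).astype(np.float32)


def train_val_split(n_samples, val_fraction=0.15, seed=0):
    """Return shuffled (train_indices, val_indices) for the given fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]


# Example with a random binary volume of the size used in the text.
rock = (np.random.default_rng(1).random((128, 128, 128)) > 0.2).astype(np.uint8)
image = synthetic_rgb(rock)
train_idx, val_idx = train_val_split(20_000)
print(image.shape, len(train_idx), len(val_idx))
```

With 20,000 samples this split yields 17,000 training and 3,000 validation samples, in line with the proportions stated in the text.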
The Transfer Learning strategy is applied in this study to reduce training time and speed up the process of obtaining models with small errors. A pre-trained model is a model that has been trained on a large benchmark dataset and is capable of solving problems similar to the new problem that needs to be solved (Iorga and Neagoe, 2019). In this research, the problem to be solved is image recognition. The pre-trained model architectures used in this research are those available in the Keras library (https://keras.io/api/applications/), which have been tested and have good performance. The four types of pre-trained models used in this research are DenseNet201, Xception, InceptionV3, and MobileNetV2. How a convolutional neural network processes images is illustrated in Fig. 4.

Fig. 3. Stages of synthetic RGB image creation.

Fig. 4. Convolutional neural network work in projecting images.

Dense convolutional networks, or DenseNet, is a pre-trained model architecture built with a structure where each layer is connected to subsequent layers (Huang et al., 2017), as shown in Fig. 5. The DenseNet architecture can reduce the occurrence of overfitting by utilizing dense connection techniques, especially when the dataset used is small (Talo, 2019).

Fig. 5. DenseNet201 architecture.

Extreme Inception, better known as Xception, is a model that uses the Depthwise Separable Convolution technique in its architecture, as shown in Fig. 6. The Xception architecture consists of three main parts, namely Entry Flow, Middle Flow, and Exit Flow. Xception is noted to have better performance than InceptionV3 even though it has fewer parameters (Chollet, 2017).
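To show why a depthwise separable convolution needs fewer parameters than a standard convolution, the short sketch below counts the weights of a single layer both ways. The layer sizes are illustrative and are not taken from Xception or MobileNet, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 128 input channels, 256 output channels.
standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(standard, separable, round(standard / separable, 2))
```

For this example layer the separable form uses 33,920 weights against 294,912 for the standard form, a reduction of roughly a factor of nine.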
MobileNet is a pre-trained model architecture that utilizes depthwise separable convolution, which is a combination of depthwise convolution and pointwise convolution. The MobileNet architecture has a total of 28 layers, and its architecture is illustrated in Fig. 7 (Howard et al., 2017).

The architecture of the CNN model used can be seen in Fig. 8. The model is trained by learning the relationship between the input, which is a synthetic RGB image of digital rock, and the label, which is the actual tortuosity value, until the error between the predicted value and the actual value is minimized. The predicted value is the tortuosity value that the model estimates from the synthetic RGB image, while the actual value is the dataset label, that is, the tortuosity value obtained from the software calculation.

In order to compare the performance and prediction accuracy of the different algorithms, three metrics are used to quantify the error: mean absolute error (MAE), root mean square error (RMSE), and R2. MAE is a function used for regression models (Eqn. 1). As written in Eqn. 1, it sums the absolute differences between the actual and predicted values, each scaled by the actual value, where n represents the number of observations, F_i is the predicted value for sample i, and A_i is the actual value. MAE is robust to outliers in the data and has a simple interpretation (Ansari and Binninger, 2022).

$$MAE = \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right| \tag{1}$$

Fig. 6. Xception architecture.

Fig. 7. MobileNetV2 architecture.

Fig. 8. Architecture of the CNN models. The input of the model is the synthetic RGB image data created from the digital rock, and the output of the model is the predicted tortuosity value of the rock.
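The three metrics named in this section can be computed with a few lines of NumPy. The sketch below follows the equations as they appear in the text (Eqns. 1 to 3): Eqn. 1 is a sum of relative absolute errors without a 1/n factor, and Eqn. 3 normalises by the sum of squared actual values. The more common R2 definition subtracts the mean of the actual values in the denominator, so it is offered here as an optional flag. All function names are hypothetical.

```python
import numpy as np


def mae_as_printed(actual, predicted):
    """Eqn. 1 as printed: sum over i of |(A_i - F_i) / A_i| (no division by n)."""
    a, f = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sum(np.abs((a - f) / a)))


def rmse(actual, predicted):
    """Eqn. 2: square root of the mean squared residual."""
    a, f = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def r_squared(actual, predicted, centered=False):
    """Eqn. 3 as printed when centered=False (denominator is sum of A_i^2).

    centered=True gives the conventional definition, whose denominator is
    the sum of (A_i - mean(A))^2.
    """
    a, f = np.asarray(actual, float), np.asarray(predicted, float)
    reference = a - a.mean() if centered else a
    return float(1.0 - np.sum((a - f) ** 2) / np.sum(reference ** 2))


# Tiny worked example with made-up tortuosity values.
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 1.0, 5.0]
print(mae_as_printed(actual, predicted), rmse(actual, predicted),
      r_squared(actual, predicted), r_squared(actual, predicted, centered=True))
```

Because the uncentered denominator is never smaller than the centered one, the printed form of Eqn. 3 gives values at least as high as the conventional R2 on the same predictions.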
RMSE is another commonly used metric to evaluate the accuracy of predictions obtained by a model (Eqn. 2). It is computed from the residuals between actual and predicted values and can be used to compare the prediction errors of different models on the same data. This metric is very useful for measuring how close the prediction is to the actual value, and it gives a larger penalty to large errors. It is therefore suitable in cases where the difference between the predicted and actual values is critical, such as in rock modelling (Dandekar et al., 2018).

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2} \tag{2}$$

The coefficient of determination R2 is a widely used statistical measure in regression-based machine learning (Eqn. 3). It indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. The closer the value of R2 is to 1, the better the model fits. R2 provides information on how well the model fits the observed data and how much variation in the data can be explained by the model.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (A_i - F_i)^2}{\sum_{i=1}^{n} A_i^2} \tag{3}$$

3. Results and discussion

Four CNN models were trained using 20,000 digital porous rock samples that were converted into synthetic RGB images, and each model was then tested using 1,300 images from the dataset. The performance of the CNN models was measured with three metrics, namely MAE, RMSE, and R2. The performance of each model is given in Table 2.

Table 2. Evaluation result of DenseNet201, InceptionV3, MobileNetV2, and Xception in tortuosity value prediction.
| No | Model | MAE | RMSE | R2 |
|----|-------|-----|------|----|
| 1 | DenseNet201 | 33.539 | 1.02666 | 0.98635 |
| 2 | InceptionV3 | 34.538 | 0.98445 | 0.98724 |
| 3 | MobileNetV2 | 32.552 | 1.02465 | 0.98737 |
| 4 | Xception | 33.901 | 0.90962 | 0.98636 |

Table 2 shows that MobileNetV2 has the lowest MAE (32.552) and the highest R2 (0.98737), while Xception has the lowest RMSE (0.90962). All four models have R2 values over 0.98 and similar MAE values, indicating that they are all quite accurate. Their RMSE values are also comparable, with the Xception model having the best RMSE value at 0.90962. On this basis, the Xception model is taken to be the best of the four: compared to the other three models, it has the least error as measured by RMSE and makes the most accurate predictions.

The ability of the CNN model to predict values can also be seen using a scatter plot with the horizontal axis being the actual tortuosity value and the vertical axis being the predicted tortuosity value. The scatter plots of the models can be seen in Fig. 9. The data plotted in Fig. 9 is prediction data that has undergone outlier reduction. This reduction also shows that the CNN model is able to perform well under certain conditions and in a certain range. The out-of-range problem was also found in research on using a CNN model to predict the permeability of synthetic rocks (Tang et al., 2022).

Fig. 9. Predicted tortuosity values compared to actual values in small tortuosity value, (a) result from Xception model, (b) result from InceptionV3 model, (c) result from DenseNet201 model, and (d) result from MobileNetV2.

The black diagonal line represents the real tortuosity value of the data, with the blue dots indicating the distribution of model-predicted values. Of the 1,300 test data, only predictions within a small range of values were taken into account. As shown in Table 2, the Xception model had more accurate results compared to the other models. This can be seen from the predictions, which are close to the actual values. The Xception model has a smaller RMSE value and has similar MAE and R2 values. These models are only able to predict accurately at smaller tortuosity values, with increasing inaccuracy as the tortuosity values increase.

In this research, the input to the CNN model is not the three-dimensional rock data itself, but rather slices of the data that are combined to form a synthetic RGB image representing the whole volume. The data also has a non-uniform distribution and is largely comprised of data with small tortuosity values. These factors contribute to the model's low prediction accuracy at large tortuosity values and its ability to accurately predict only data within a narrow range of small tortuosity values.

Fig. 10. Boxplot showing the distribution of tortuosity values of the dataset.

As shown in Fig. 10, a box plot is used to visualise the distribution of the data. In the plot, the red line shows the median of the dataset, and the two whisker lines bound the values where most of the dataset is located. The circle marks above the upper whisker line are outlier data values that have a large tortuosity value. This reveals that the data in the dataset is concentrated at smaller values of tortuosity. There is also a large number of outlier data points that can potentially impact the quality of the data.
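The whiskers and outlier marks of a box plot such as Fig. 10 are usually derived from the quartiles of the data. The sketch below assumes the common Tukey rule, in which points farther than 1.5 times the interquartile range from the quartiles are flagged as outliers. The paper does not state which rule was used, so the factor `k=1.5` and the function name `tukey_summary` are assumptions, and the values in the example are made up.

```python
import numpy as np


def tukey_summary(values, k=1.5):
    """Median, whisker ends, and outliers under the 1.5 x IQR rule (assumed)."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "median": float(median),
        # Whiskers extend to the most extreme data points inside the fences.
        "lower_whisker": float(inside.min()),
        "upper_whisker": float(inside.max()),
        "outliers": v[(v < low_fence) | (v > high_fence)],
    }


# Made-up values: mostly small, with one very large value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = tukey_summary(sample)
print(summary["median"], summary["upper_whisker"], summary["outliers"])
```

Applied to the tortuosity labels, a summary of this kind would give the number of samples above the upper whisker, which is one way to quantify how unbalanced the dataset is.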
This clustering of the data distribution is also reflected in the model's predictions, which are most accurate within the small value range where the data is most densely distributed.

The results obtained reveal the limitations of using the CNN method in this study. The accuracy of the model prediction depends on the consistency of the dataset used. An out-of-range problem interferes with the performance of the model: where an outlier value is vastly different from the training data, the model prediction will be inaccurate. Another limitation is the difficulty of predicting rock properties from larger inputs, such as three-dimensional rock data or higher-resolution rock images, due to limited memory and processing time. Moreover, only sandstone rocks are used in this study. Results for other types of rocks have not yet been evaluated.

4. Conclusion

This research developed a machine learning model using a CNN algorithm to estimate the tortuosity of digital rock. The CNN model was selected from among several pre-trained model architectures, namely MobileNetV2, DenseNet201, InceptionV3, and Xception, based on its performance. The results suggest that all four models are quite accurate, with the Xception model being the most accurate, with the lowest RMSE value of 0.90962 and MAE and R2 values comparable to those of the other models. However, the model's predictions were found to be most accurate for small tortuosity values, with decreasing accuracy as tortuosity values increased. Further research is needed to improve the performance and accuracy of the model, including the inclusion of rock types other than sandstone and a more balanced distribution of tortuosity values in the dataset.
Acknowledgements

The authors acknowledge the Department of Geophysics Universitas Padjadjaran supercomputing resources "RockExplorer" made available for conducting the research reported in this paper.

References

Al-Raoush, R.I., Madhoun, I.T., 2017. TORT3D: A MATLAB code to compute geometric tortuosity from 3D images of unconsolidated porous media. Powder Technology 320, 99–107. https://doi.org/10.1016/j.powtec.2017.06.066

Ansari, O.B., Binninger, F.-M., 2022. A deep learning approach for estimation of price determinants. International Journal of Information Management Data Insights 2, 100101. https://doi.org/10.1016/j.jjimei.2022.100101

Chollet, F., 2017. Deep learning with Python. Manning Publications, New York.

Dandekar, A.Y., Sondergeld, C.H., Rai, C.S., 2018. Machine learning for digital rock characterization: Opportunities and challenges. Geophysics 83, MR13–MR23.

Gao, Y., Mosalam, K.M., 2018. Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering 33, 748–768.

Graczyk, K.M., Matyka, M., 2020. Predicting porosity, permeability, and tortuosity of porous media from images by deep learning. Scientific Reports 10, 21488. https://doi.org/10.1038/s41598-020-78415-x

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. https://doi.org/10.48550/arXiv.1704.04861

Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4700–4708.

Iorga, C., Neagoe, V.-E., 2019. A deep CNN approach with transfer learning for image recognition, in: 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). Presented at the 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), IEEE, Pitesti, Romania, pp. 1–6. https://doi.org/10.1109/ECAI46879.2019.9042173

Ladopoulos, E.G., 2014. Non-linear three-dimensional porous medium analysis in petroleum reservoir engineering. Universal Journal of Fluid Mechanics 2, 1–11.

Neumann, R., Andreeta, M., Lucas-Oliveira, E., 2020. 11 Sandstones: raw, filtered and segmented data [WWW Document]. URL www.digitalrocksportal.org (accessed 12.27.22).

Neumann, R.F., Barsi-Andreeta, M., Lucas-Oliveira, E., Barbalho, H., Trevizan, W.A., Bonagamba, T.J., Steiner, M.B., 2021. High accuracy capillary network representation in digital rock reveals permeability scaling functions. Scientific Reports 11, 11370. https://doi.org/10.1038/s41598-021-90090-0

Talo, M., 2019. Convolutional neural networks for multi-class histopathology image classification. arXiv:1903.10035.

Tang, P., Zhang, D., Li, H., 2022. Predicting permeability from 3D rock images based on CNN with physical information. Journal of Hydrology 606, 127473. https://doi.org/10.1016/j.jhydrol.2022.127473

© 2023 Journal of Geoscience, Engineering, Environment and Technology. All rights reserved. This is an open access article distributed under the terms of the CC BY-SA License (http://creativecommons.org/licenses/by-sa/4.0/).