http://journal.uir.ac.id/index.php/JGEET 
 

E-ISSN : 2541-5794  
   P-ISSN : 2503-216X  

Journal of Geoscience,  
Engineering, Environment, and Technology 
Vol 08 No 02-2 2023 Special Edition 
Special Issue from “The 1st International Conference on Upstream Energy Technology and 
Digitalization (ICUPERTAIN) 2022” 

 
Akmal, F. et al./ JGEET Vol 08 No 02-2 2023 6 

Special Issue from The 1st International Conference on Upstream Energy Technology and Digitalization (ICUPERTAIN) 2022 

RESEARCH ARTICLE 
 

Machine learning prediction of tortuosity in digital rock 

Fadhillah Akmal1, M. Cisco Ramadhan Dzulizar1, Muhammad Faizal Rafli1, Fatimah Az-
Zahra1, M. I. Khoirul Haq1, Irwan Ary Dharmawan1,* 

1 Department of Geophysics, Faculty of Mathematics and Natural Science, Universitas Padjadjaran, Raya Bandung Sumedang km. 21 Street, Jatinangor 45363, 
Indonesia 

 
* Corresponding author : iad@geophys.unpad.ac.id 
Received: May 20, 2023. Revised : May 31, 2023, Accepted: June 10, 2023, Published: July 31, 2023 
DOI: 10.25299/jgeet.2023.8.02-2.13875 
 

Abstract 

Physical rock property measurement is an important stage in energy exploration, both for hydrocarbons and geothermal sources. The 
value of physical rock properties can provide information about reservoir quality, and one of these properties is tortuosity.  Tortuosity is 
an intrinsic property of porous materials that describes the level of complexity of the porous arrangement when a fluid passes through it. 
Conventionally, tortuosity values are measured through laboratory analysis and numerical simulation, but these measurements can take 
a long time. An alternative method for measuring tortuosity is using machine learning with a convolutional neural network (CNN). A CNN 
is a type of deep neural network designed to analyze multi-channel images and has been applied successfully to classification and non-
linear regression problems. By training a CNN on a dataset of digital rock samples that have been simulated using numerical computation 
to obtain their tortuosity values, it is possible to demonstrate that CNNs can accurately predict the tortuosity of digital rock. The results 
show that CNN models can predict tortuosity values, with the Xception model being the most accurate with the lowest RMSE value of 0.90962. 
 
Keywords: Tortuosity, Digital Rock, Machine Learning, Convolutional Neural Network 
 

1. Introduction  

Porous media are important in the field of energy 
exploration, where the properties of porous rocks can 
provide a lot of useful information, especially on 
hydrocarbon sources or geothermal sources. One of those 
properties is tortuosity. Tortuosity is an intrinsic property 
of porous media that describes the level of complexity of 
fluid pathways, defined as the length of the 
pathway relative to its effective length, as shown in Fig. 1. 
To assess the condition of a reservoir, it is crucial to 
identify reservoir rock properties such as tortuosity. However, 
since analytical solutions are not available, this requires 
laboratory testing or numerical simulations, making it 
challenging to complete (Ladopoulos, 2014). The 
determination of tortuosity values may also be carried out 
using machine learning techniques. 

 
Fig. 1. Depiction of tortuosity in porous media. The red line indicates 
the pathway that can be travelled by fluid, the blue line indicates the 
effective length of the pathway. 

Machine learning has been developed and applied 
widely in science and technology fields such as production 
optimization and hydrocarbon drilling. It has also begun to 
be used to simplify and accelerate the computational 
process in the estimation of physical parameters of porous 
rocks. Artificial neural networks (ANNs) are one of the 
popular machine learning models used to tackle complex 
problems. ANN algorithms are modelled on biological neurons 
and are trained adaptively to complete a task. A typical ANN 
structure consists of several layers, each with a number of 
perceptrons. Perceptrons are the fundamental building 
blocks of ANNs and are modelled on the neurons in human 
brain networks. In an ANN, the output of one layer serves as 
the input for the following layer. 

One of the ANN algorithms that is frequently applied to 
solve image recognition problems is the convolutional neural 
network (CNN). A CNN uses a convolution operation that 
applies filters of a specific size at various locations of the 
input data, creating new representative information from 
the convolution of the input data and the filters. This output 
from the convolution is then used as the input for the next 
layer of the neural network. Because the feature extraction 
and training processes in the CNN algorithm are carried out 
simultaneously by the computer, it is a good solution for 
estimating the physical properties of porous rocks with 
intricate patterns. CNNs have been shown to predict 
fundamental quantities of porous media (Graczyk and 
Matyka, 2020). In recent studies, transfer learning, that is, 
the use of pre-trained CNN models, has obtained good results 
in determining rock parameters, especially on small datasets 
and out-of-range problems (Tang et al., 2022).
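
The convolution step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name conv2d_valid, the toy 4 x 4 image, and the 2 x 2 filter are all hypothetical, and the operation is written as the sliding-window product (cross-correlation) that deep learning libraries commonly call convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the filter and the image patch, then sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy binary image (hypothetical): 0 = pore, 1 = solid
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# Hypothetical filter that responds to a pore-to-solid transition from left to right
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # the middle column responds to the pore/solid boundary
```

In a trained CNN the filter values are not chosen by hand as they are here; they are learned during training, which is why feature extraction and training happen together.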


Table 1. Digital rock samples and the amount of digital rock data used in the study. There is an experimental error of ± 0.5 % for porosity 
and ± 10 % for permeability (Neumann et al., 2021). 

No  Name                     Porosity  Permeability  Formation        Number of Data 
1   Bandera brown sandstone  24.11 %   63 mD         Desmoinesian     2740 
2   Bentheimer sandstone     22.64 %   22.64 mD      Valanginian      2740 
3   Berea cores sandstone    18.96 %   18.96 mD      Upper Devonian   2740 
4   Buff berea sandstone     24.02 %   275 mD        Upper Devonian   2700 
5   Castlegate sandstone     26.54 %   269 mD        Late Cretaceous  2700 
6   Leopard sandstone        20.22 %   327 mD        Paleozoic        2690 
7   Parker sandstone         14.77 %   10 mD         Paleozoic        2250 
8   Kirby sandstone          19.95 %   62 mD         -                2740 

 
The general layout of the layers of the CNN architecture is 
shown in the next section. 

The CNN was chosen for this research because it allows 
for the calculation of physical parameter values in a shorter 
time without the need for laboratory testing by injecting 
fluid into porous rocks, thereby avoiding damage to the 
porous rocks. In order to conduct this research, a 3D digital 
rock sample of sandstone was obtained from the Digital 
Rock Portal and pre-processed before being input into the 
CNN architecture (Neumann et al., 2020). 

2. Material and methods 

This research begins with the creation of datasets from 
several types of digital rock data. The rock data used is 
porous rock data from CT-scan images. The digital rocks 
used in this study are a set of sandstone samples. Sandstone 
is a clastic sedimentary rock composed primarily of quartz, 
silica, and other sand-sized mineral grains. Each sample is 
converted into a three-dimensional array of size 1000 x 
1000 x 1000 voxels. Table 1 shows the digital rock samples 
and the amount of digital rock data used in the study. 

This array consists of values of 0, representing rock pore, 
and 1, representing an obstacle (solid). The CT-scan array 
is then resampled into smaller arrays of size 128 x 128 x 128 
voxels in order to lighten the computational load and create 
a larger dataset. The tortuosity value of each 
three-dimensional array is then calculated from the 
connected paths between its boundaries. This was done 
with the help of Tort3D software. The software reads the 
digital rock data and searches for connected paths through 
the void space. After all the valid paths are found, the 
tortuosity value in a given flow direction is calculated as 
the average of all path lengths divided by the size of the 
image in the direction of flow (Al-Raoush and Madhoun, 
2017). 
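
The idea of dividing the mean path length by the image size can be illustrated with a simplified sketch. This is not the Tort3D algorithm: as an assumption for illustration, it uses a breadth-first search over 6-connected pore voxels, measures path length as the number of voxels visited, and averages the shortest path reaching each pore voxel on the outlet face. The function name geometric_tortuosity and the toy volumes are hypothetical.

```python
from collections import deque
import numpy as np

def geometric_tortuosity(volume, axis=0):
    """Mean shortest pore-path length (in voxels) from the inlet face to each
    outlet pore voxel, divided by the image size along `axis`.
    Pore voxels are 0, solid voxels are 1. Returns None if no path connects the faces."""
    vol = np.moveaxis(np.asarray(volume), axis, 0)  # flow direction becomes axis 0
    n0, n1, n2 = vol.shape
    dist = np.full(vol.shape, -1, dtype=int)  # -1 marks voxels not reached yet
    queue = deque()
    # Multi-source BFS: start from every pore voxel on the inlet face
    for j in range(n1):
        for k in range(n2):
            if vol[0, j, k] == 0:
                dist[0, j, k] = 1
                queue.append((0, j, k))
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in neighbours:
            a, b, c = i + di, j + dj, k + dk
            inside = 0 <= a < n0 and 0 <= b < n1 and 0 <= c < n2
            if inside and vol[a, b, c] == 0 and dist[a, b, c] == -1:
                dist[a, b, c] = dist[i, j, k] + 1
                queue.append((a, b, c))
    outlet = dist[-1]
    lengths = outlet[outlet > 0]  # path lengths of reachable outlet pore voxels
    if lengths.size == 0:
        return None
    return float(lengths.mean()) / n0

# Straight channel through a 4 x 3 x 3 block: the path length equals the image size
straight = np.ones((4, 3, 3), dtype=int)
straight[:, 1, 1] = 0
tau_straight = geometric_tortuosity(straight)  # 1.0

# Bent channel through a 5 x 3 x 3 block: 7 voxels visited over an image size of 5
bent = np.ones((5, 3, 3), dtype=int)
bent[0:3, 0, 0] = 0
bent[2, 0:3, 0] = 0
bent[2:5, 2, 0] = 0
tau_bent = geometric_tortuosity(bent)  # 1.4
```

A value of 1 corresponds to a straight pathway, and the value grows as the pathway becomes more winding.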

The calculation results are then used as dataset labels in 
the machine learning model architecture. The dataset 
totals 21,300 images, with tortuosity values ranging from 
1.08 to 113.09. Most of the data, 19,764 samples, fall in the 
range of 1.08 to 13.52. The distribution of the dataset 
tortuosity values is shown in Fig. 2. 

Instead of using the full three-dimensional array as input 
for machine learning, certain parts of the array are selected 
to represent the entire array. These selected parts are plane 
slices along the three main axes, each 128 x 128 in size. 
The three plane slices are then stacked into a three-channel 
image as a synthetic RGB image. The total dataset 
consists of 21,300 samples as shown in Table 1, with 1,300 
set aside as testing data that are not used in training. The 
remaining 20,000 samples are divided into a training set 
(85%) and a validation set (15%). The creation of synthetic 
RGB images is illustrated in Fig. 3.  
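
The slicing and splitting steps can be sketched as follows. The text does not state which slice along each axis is taken, so the central slice used here is an assumption, and the function names synthetic_rgb and split_counts as well as the random test volume are hypothetical.

```python
import numpy as np

def synthetic_rgb(volume, index=None):
    """Stack one plane slice per main axis of a cubic volume into an (N, N, 3) image."""
    vol = np.asarray(volume)
    n = vol.shape[0]
    if vol.shape != (n, n, n):
        raise ValueError("expected a cubic volume")
    i = n // 2 if index is None else index  # assumption: central slice on each axis
    return np.stack([vol[i, :, :], vol[:, i, :], vol[:, :, i]], axis=-1)

def split_counts(total, n_test, train_percent):
    """Return (train, validation, test) sample counts using integer arithmetic."""
    remaining = total - n_test
    n_train = remaining * train_percent // 100
    return n_train, remaining - n_train, n_test

# Hypothetical random binary volume standing in for a 128 x 128 x 128 digital rock
volume = (np.random.default_rng(0).random((128, 128, 128)) > 0.2).astype(np.uint8)
image = synthetic_rgb(volume)
print(image.shape)                    # (128, 128, 3)
print(split_counts(21300, 1300, 85))  # (17000, 3000, 1300)
```

With the numbers given in the text, the split corresponds to 17,000 training, 3,000 validation, and 1,300 testing samples.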

 
Fig. 2. Distribution of dataset tortuosity value. 

 
This research uses a Transfer Learning strategy by 
utilizing four types of pre-trained model architectures in 
order to obtain the best model. Transfer Learning is a 
machine learning technique in which a model is trained and 
developed for one task and then reused for a second, related 
task. It involves exploiting what has been learned in one 
setting to improve optimization in another setting (Gao and 
Mosalam, 2018). The Transfer Learning strategy is applied 
in this study to reduce training time and speed up the 
process of obtaining models with small errors. A pre-
trained model is a model that has been trained on a large 
benchmark dataset and is capable of solving problems 
similar to the new problem that needs to be solved (Iorga 
and Neagoe, 2019). In this research, the problem to be 
solved is image recognition. The pre-trained model 
architectures used in this research are those available in the 
Keras library (https://keras.io/api/applications/), which 
have been tested and have good performance. The four 
types of pre-trained models used in this research are 
DenseNet201, Xception, InceptionV3, and MobileNetV2. 
The way a convolutional neural network processes images 
is illustrated in Fig. 4.



 
Fig. 3. Stages of synthetic RGB image creation. 

 
Fig. 4. Convolutional neural network work in projecting images. 

 
Dense convolutional network, or DenseNet, is a pre-trained 
model architecture built with a structure in which 
each layer is connected to all subsequent layers (Huang et 
al., 2017), as shown in Fig. 5. The DenseNet architecture can 
reduce overfitting through its dense connections, especially 
when the dataset used is small (Talo, 2019). 

 
Fig. 5. DenseNet201 architecture. 

 
Extreme Inception, better known as Xception, is a 
model that uses the Depthwise Separable Convolution 
technique in its architecture, as shown in Fig. 6. The 
Xception architecture consists of three main parts, namely 
Entry Flow, Middle Flow, and Exit Flow. Xception is noted 
to have better performance than InceptionV3 even though 
it has fewer parameters (Chollet, 2017). 

MobileNet is a pre-trained model architecture that 
utilizes depthwise separable convolution, which is a 
combination of depthwise convolution and pointwise 
convolution. The MobileNet architecture has a total of 28 
layers, and its architecture is illustrated in 
Fig. 7 (Howard et al., 2017). 
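
The saving offered by depthwise separable convolution can be seen by counting weights. The sketch below ignores biases, and the layer size (3 x 3 filters, 32 input channels, 64 output channels) is a hypothetical example rather than a layer taken from the models used here.

```python
def standard_conv_weights(k, c_in, c_out):
    """Weights in a standard k x k convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def separable_conv_weights(k, c_in, c_out):
    """Weights in a depthwise separable convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64  # hypothetical layer size
standard = standard_conv_weights(k, c_in, c_out)    # 18432
separable = separable_conv_weights(k, c_in, c_out)  # 2336
ratio = separable / standard                        # equals 1/c_out + 1/k**2
print(standard, separable, round(ratio, 4))
```

For this example the separable layer needs roughly one eighth of the weights of the standard layer, which is the reason such architectures are comparatively light.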

The architecture of the CNN model used can be seen in 
Fig. 8. The model is trained by learning the relationship 
between the input, which is a synthetic RGB image of 
digital rock, and the label, which is the actual tortuosity 
value, until the error between the predicted value and the 
actual value is minimized. The predicted value is the 
tortuosity value that the model estimates from the 
synthetic RGB image, while the actual value is the dataset 
label, that is, the tortuosity value obtained from the 
software calculation. 

In order to compare the performance and prediction 
accuracy of the different models, three metrics are used 
to quantify the error: mean absolute error (MAE), root 
mean square error (RMSE), and R². 

MAE is an error function used for regression models (Eqn. 1). 
It is computed from the absolute differences between the 
actual and predicted values, where n represents the number 
of observations, F_i is the predicted value for sample i, and 
A_i is the actual value. As written in Eqn. 1, each absolute 
difference is divided by the actual value and the results are 
summed. MAE is robust when there are outliers in the data 
and has a simple interpretation (Ansari and Binninger, 2022). 

 
$$ MAE = \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right| \qquad (1) $$

 

 
Fig. 6. Xception architecture. 

 
Fig. 7. MobileNetV2 architecture. 

 
Fig. 8. Architecture of the CNN models. The input of the model is the synthetic RGB image data created from the digital rock, and the 
output of the model is the predicted tortuosity value of the rock. 



RMSE is another commonly used metric to evaluate the 
accuracy of predictions obtained by a model (Eqn. 2). It 
takes the residuals between actual and predicted values and 
compares the prediction errors of different models for 
particular data. This metric is very useful for measuring 
how close the prediction is to the actual value, and gives a 
larger penalty to large errors. It is therefore suitable in 
cases where the difference between the predicted and 
actual values is critical, such as in rock modelling (Dandekar 
et al., 2018). 
 

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (A_i - F_i)^2} \qquad (2) $$

 
The coefficient of determination, R², is a widely used 
statistical measure in regression-based machine learning 
(Eqn. 3). It indicates the proportion of the variance in the 
dependent variable that the independent variables explain 
collectively. The closer the value of R² is to 1, the better the 
model fits. R² thus provides information on how well the 
model fits the observed data and how much variation in the 
data can be explained by the model. 

 
$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (A_i - F_i)^2}{\sum_{i=1}^{n} A_i^2} \qquad (3) $$
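
The three metrics can be sketched directly from Eqns. 1 to 3 as they are written in this paper. The function names and the three sample values are hypothetical. Note that these forms differ from the more common definitions, in which MAE is the sum of absolute differences divided by n and the denominator of R² uses deviations of A_i from its mean.

```python
import numpy as np

def mae_eq1(actual, predicted):
    """Eqn. 1 as written: sum of absolute differences relative to the actual values."""
    a, f = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sum(np.abs((a - f) / a)))

def rmse_eq2(actual, predicted):
    """Eqn. 2: square root of the mean squared difference."""
    a, f = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def r2_eq3(actual, predicted):
    """Eqn. 3 as written: the denominator is the sum of squared actual values."""
    a, f = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((a - f) ** 2) / np.sum(a ** 2))

# Hypothetical actual and predicted tortuosity values
actual = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.4]

print(mae_eq1(actual, predicted))   # 0.1 + 0.1 + 0.1, about 0.3
print(rmse_eq2(actual, predicted))  # sqrt(0.21 / 3), about 0.2646
print(r2_eq3(actual, predicted))    # 1 - 0.21 / 21, about 0.99
```

Because Eqn. 1 sums rather than averages, its value grows with the number of test samples, so it is not on the same scale as the RMSE.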

3. Results and discussion 

Four CNN models were trained using 20,000 digital 
porous rock data that were converted into synthetic RGB 
images, and the models were then tested using 1,300 
images from the dataset. The performance of each CNN 
model was measured with three error metrics, namely MAE, 
RMSE, and R². The performance of each model is shown in 
Table 2. 

Table 2. Evaluation results of DenseNet201, InceptionV3, 
MobileNetV2, and Xception in tortuosity value prediction.  

No  Model        MAE     RMSE     R² 
1   DenseNet201  33.539  1.02666  0.98635 
2   InceptionV3  34.538  0.98445  0.98724 
3   MobileNetV2  32.552  1.02465  0.98737 
4   Xception     33.901  0.90962  0.98636 

 
Based on Table 2, the MobileNetV2 architecture produces 
the smallest MAE (32.552) and the highest R² (0.98737), 
while the Xception model produces the smallest RMSE 
(0.90962). All four models have R² values over 0.98 and 
similar MAE values, indicating that they are all quite 
accurate. The RMSE values of the four models are also 
comparable, with the Xception model having the best RMSE 
value at 0.90962. On this basis, the Xception model is 
considered the best of the four models. This indicates 
that, compared to the other three models, the Xception 
model has the least error in terms of RMSE and makes the 
most accurate predictions. 
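
The comparison can be reproduced from the values in Table 2 with a short sketch. The dictionary layout and variable names are hypothetical; lower is better for MAE and RMSE, and higher is better for R².

```python
# Values copied from Table 2
results = {
    "DenseNet201": {"MAE": 33.539, "RMSE": 1.02666, "R2": 0.98635},
    "InceptionV3": {"MAE": 34.538, "RMSE": 0.98445, "R2": 0.98724},
    "MobileNetV2": {"MAE": 32.552, "RMSE": 1.02465, "R2": 0.98737},
    "Xception":    {"MAE": 33.901, "RMSE": 0.90962, "R2": 0.98636},
}

best_mae = min(results, key=lambda m: results[m]["MAE"])    # lowest MAE
best_rmse = min(results, key=lambda m: results[m]["RMSE"])  # lowest RMSE
best_r2 = max(results, key=lambda m: results[m]["R2"])      # highest R2

print(best_mae, best_rmse, best_r2)  # MobileNetV2 Xception MobileNetV2
```

No single model leads on all three metrics, so the ranking depends on which metric is given priority; here priority is given to RMSE.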

The ability of the CNN models to predict tortuosity values 
can also be seen using a scatter plot with the horizontal axis 
being the actual tortuosity value and the vertical axis being 
the predicted tortuosity value. The scatter plots of the 
models can be seen in Fig. 9. The data plotted in Fig. 9 are 
prediction data that have undergone outlier reduction. This 
reduction also shows that the CNN models perform well 
under certain conditions and within a certain range of 
values. The out-of-range problem was also found in research 
on using a CNN model to predict the permeability of 
synthetic rocks (Tang et al., 2022). 

 
Fig. 9. Predicted tortuosity values compared to actual values in the small tortuosity value range: (a) result from Xception model, (b) result from 
InceptionV3 model, (c) result from DenseNet201 model, and (d) result from MobileNetV2 model. 



In Fig. 9, the black diagonal line marks where the predicted 
value equals the actual tortuosity value, and the blue dots 
show the distribution of model-predicted values. Of the 
1,300 test data, only predictions within a small range of 
values are plotted. As shown in Table 2, the Xception model 
had more accurate results compared to the other models. 
This can also be seen from its predictions, which are close 
to the actual values. The Xception model has a smaller RMSE 
value and similar MAE and R² values. These models are 
only able to predict accurately at smaller tortuosity values, 
with increasing inaccuracy as the tortuosity values 
increase. 

In this research, the input to the CNN model is not the 
three-dimensional rock data itself, but rather slices of the 
data that are combined to form a synthetic RGB image 
representing the whole volume. The data also have a 
non-uniform distribution and consist largely of samples 
with small tortuosity values. These factors reduce the 
model's prediction accuracy and limit its ability to predict 
accurately to data within a narrow range of small tortuosity 
values.  

 
Fig. 10. Boxplot showing the distribution of tortuosity values of 
the dataset. 

 
As shown in Fig. 10, a box plot is used to visualise the 
distribution of the data. In the plot, the red line shows the 
median of the dataset, and the two whisker lines bound the 
values where most of the dataset is located. The circle 
marks above the upper whisker line are outlier data with 
large tortuosity values. The plot reveals that the data in the 
dataset are concentrated at smaller values of tortuosity. 
There is also a large number of outlier data points that can 
potentially impact the quality of the data. This clustering 
of the data distribution is also reflected in the model's 
predictions, which are most accurate when predicting 
values within the small value range where the data is most 
densely distributed. 
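
How a box plot separates outliers from the bulk of the data can be sketched as follows. The whisker rule used for Fig. 10 is not stated in this paper; the sketch assumes Tukey's rule, in which the whiskers extend at most 1.5 times the interquartile range beyond the quartiles (the Matplotlib default). The function name and the sample values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def tukey_outliers(values, whis=1.5):
    """Return the median, the (lower, upper) whisker limits, and the outlier values."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    outliers = v[(v < lower) | (v > upper)]
    return float(median), (float(lower), float(upper)), outliers

# Hypothetical tortuosity values: mostly small, with one very large value
sample = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 50.0]
median, limits, outliers = tukey_outliers(sample)
print(median, limits, outliers)  # the value 50.0 lies above the upper limit
```

Values flagged in this way correspond to the circle marks above the upper whisker, and they are the values for which out-of-range predictions are most likely.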

The results obtained reveal several limitations of the 
CNN method used in this study. First, the accuracy of the 
model prediction depends on the consistency of the dataset 
used. An out-of-range problem interferes with the 
performance of the model: where an outlier value is vastly 
different from the training data, the model prediction will 
be inaccurate. Another limitation is the difficulty of making 
predictions from larger inputs, such as three-dimensional 
rock data or higher-resolution rock images, due to limited 
memory and processing time. Moreover, only sandstone 
rocks are used in this study; results for other types of rock 
have not yet been evaluated.  

 
4. Conclusion 

This research developed a machine learning model 
using a CNN algorithm to estimate the tortuosity of digital 
rock. Four pre-trained model architectures, MobileNetV2, 
DenseNet201, InceptionV3, and Xception, were compared 
based on their performance. The results suggest that all four 
models are quite accurate, with the Xception model being 
the most accurate with the lowest RMSE value of 0.90962 
and MAE and R² values that are comparable to the other 
models. However, the models' predictions were found to be 
most accurate for small tortuosity values, with decreasing 
accuracy as tortuosity values increased. Further research is 
needed to improve the performance and accuracy of the 
model, including the inclusion of rock types other than 
sandstone and a more balanced distribution of 
tortuosity values in the dataset. 

Acknowledgements 

The authors acknowledge the Department of 
Geophysics Universitas Padjadjaran supercomputing 
resources "RockExplorer" made available for conducting 
the research reported in this paper.   



References 

Al-Raoush, R.I., Madhoun, I.T., 2017. TORT3D: A MATLAB 
code to compute geometric tortuosity from 3D images 
of unconsolidated porous media. Powder Technology 
320, 99–107. 
https://doi.org/10.1016/j.powtec.2017.06.066 

Ansari, O.B., Binninger, F.-M., 2022. A deep learning 
approach for estimation of price determinants. 
International Journal of Information Management 
Data Insights 2, 100101. 
https://doi.org/10.1016/j.jjimei.2022.100101 

Chollet, F., 2017. Deep learning with Python. Manning 
Publications, New York. 

Dandekar, A.Y., Sondergeld, C.H., Rai, C.S., 2018. Machine 
learning for digital rock characterization: 
Opportunities and challenges. Geophysics 83, MR13–
MR23. 

Gao, Y., Mosalam, K.M., 2018. Deep transfer learning for 
image-based structural damage recognition. 
Computer-Aided Civil and Infrastructure Engineering 
33, 748–768. 

Graczyk, K.M., Matyka, M., 2020. Predicting porosity, 
permeability, and tortuosity of porous media from 
images by deep learning. Scientific Reports 10, 21488. 
https://doi.org/10.1038/s41598-020-78415-x 

Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., 
Weyand, T., Andreetto, M., Adam, H., 2017. 
MobileNets: Efficient convolutional neural networks 
for mobile vision applications. 
https://doi.org/10.48550/arXiv.1704.04861 

Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q., 2017. 
Densely connected convolutional networks. 
Presented at the Proceedings of the IEEE Conference 
on Computer Vision and Pattern Recognition, IEEE, 
pp. 4700–4708.

 
Iorga, C., Neagoe, V.-E., 2019. A deep CNN approach with 
transfer learning for image recognition, in: 2019 11th 
International Conference on Electronics, Computers 
and Artificial Intelligence (ECAI). IEEE, 
Pitesti, Romania, pp. 1–6. 
https://doi.org/10.1109/ECAI46879.2019.9042173 

Ladopoulos, E.G., 2014. Non-linear three-dimensional 
porous medium analysis in petroleum reservoir 
engineering. Universal Journal of Fluid Mechanics 2, 
1–11. 

Neumann, R., Andreeta, M., Lucas-Oliveira, E., 2020. 11 
Sandstones: raw, filtered and segmented data [WWW 
Document]. URL www.digitalrocksportal.org 
(accessed 12.27.22). 

Neumann, R.F., Barsi-Andreeta, M., Lucas-Oliveira, E., 
Barbalho, H., Trevizan, W.A., Bonagamba, T.J., Steiner, 
M.B., 2021. High accuracy capillary network 
representation in digital rock reveals permeability 
scaling functions. Scientific Reports 11, 11370. 
https://doi.org/10.1038/s41598-021-90090-0 

Talo, M., 2019. Convolutional neural networks for multi-
class histopathology image classification. 
arXiv:1903.10035. 

Tang, P., Zhang, D., Li, H., 2022. Predicting permeability 
from 3D rock images based on CNN with physical 
information. Journal of Hydrology 606, 127473. 
https://doi.org/10.1016/j.jhydrol.2022.127473 

 
© 2023 Journal of Geoscience, Engineering, 
Environment and Technology. All rights reserved. 
This is an open access article distributed under the 
terms of the CC BY-SA License (http://creativecommons.org/licenses/by-sa/4.0/). 

 