Performance evaluation of underwater image pre-processing algorithms for the improvement of multi-view 3D reconstruction


ACTA IMEKO 
ISSN: 2221-870X 
September 2019, Volume 8, Number 3, 69 – 77 

 

 
ACTA IMEKO | www.imeko.org September 2019 | Volume 8 | Number 3 | 69 

Performance evaluation of underwater image pre-processing 
algorithms for the improvement of multi-view 3D 
reconstruction 

Alessandro Gallo1, Fabio Bruno1, Loris Barbieri1, Antonio Lagudi1, Maurizio Muzzupappa1 

1 Department of Mechanical, Energy and Management Engineering (DIMEG), University of Calabria, Via P. Bucci 46C, 87036 Rende, Italy 

 

Section: RESEARCH PAPER  

Keywords: 3D reconstruction; Underwater Cultural Heritage; Image enhancement; Underwater imaging 

Citation: Alessandro Gallo, Fabio Bruno, Loris Barbieri, Antonio Lagudi, Maurizio Muzzupappa, Performance evaluation of underwater image pre-processing 
algorithms for the improvement of multi-view 3D reconstruction, Acta IMEKO, vol. 8, no. 3, article 11, September 2019, identifier: IMEKO-ACTA-08 (2019)-
03-11 

Section Editor: Egidio De Benedetto, University of Salento, Italy 

Received November 12, 2018; In final form May 28, 2019; Published September 2019 

Copyright: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, 
distribution, and reproduction in any medium, provided the original author and source are credited. 

Corresponding author: Loris Barbieri, e-mail: loris.barbieri@unical.it  

 

1. INTRODUCTION 

The 3D reconstruction of submerged structures or 
archaeological finds has achieved notable popularity in 
Underwater Cultural Heritage (UCH) preservation, as the 
method may allow for the exploration of sites located in 
inaccessible and hostile environments.  

In the last decade, techniques and tools for 3D 
reconstructions have been widely employed in the underwater 
archaeology field according to the guidelines of UNESCO, 
which suggest the in-situ preservation of underwater heritage [1]. 
Among the different 3D imaging techniques that are suitable for 
underwater applications, photogrammetry represents a valid 
method of reconstructing 3D scenes from a set of images taken 
from different viewpoints [2]. The popularity of this technique is 
also due to its acquisition devices (a still or movie camera with an 
appropriate waterproof casing), which are affordable and easy to 
use compared to dedicated devices like LIDAR, multi-beam sonar, etc. 
[3]. Furthermore, these devices can be handled by scuba divers 
or mounted on underwater robots [4]. Unfortunately, image-based 
acquisition suffers from the poor environmental 
conditions. The depth of the water, flora, fauna, weather 
conditions, and sea currents are all factors that affect visibility, 
refraction, and lighting conditions. Consequently, these factors 
limit underwater photogrammetry’s scope to close-range 
applications, and further efforts are required to improve the 
radiometric quality of the images. For these reasons, the 
enhancement of underwater images is still a necessary step in 
improving the accuracy of 3D reconstruction and creating 
realistic textures.  

This article presents a performance evaluation of underwater 
image pre-processing algorithms for the improvement of multi-
view 3D reconstruction. Two existing colour enhancement 
models, i.e. ACE (Automatic Colour Equalisation) and PCA 
(Principal Component Analysis) algorithms, have been tested to 
compare their results with those provided by a new method 
based on histogram stretching and manual retouching (HIST). 
To this end, an experimental campaign has been planned using 
Design of Experiment (DOE) [5] criteria to investigate the 
factors that affect reconstruction accuracy. The experimental 

ABSTRACT 
3D models of submerged structures and underwater archaeological finds are widely used in many different applications, such as 
monitoring, analysis, dissemination, and inspection. Underwater environments are characterised by poor visibility conditions and the 
presence of marine flora and fauna. Consequently, the adoption of passive optical techniques for the 3D reconstruction of underwater 
scenarios is a highly challenging task.  
This article presents a performance analysis conducted on a multi-view technique that is commonly used in air in order to highlight its 
limits in the underwater environment and then provide guidelines for the accurate modelling of a submerged site in poor visibility 
conditions. A performance analysis has been performed by comparing different image enhancement algorithms, and the results have 
been adopted to reconstruct an area of 40 m2 at a depth of about 5 m at the underwater archaeological site of Baiae (Italy).  



 

 

campaign has been carried out in the underwater archaeological 
site of Baiae (Naples, Italy), where the seafloor, with a water 
depth ranging between 2.5 and 20 m, offers a particularly 
interesting environment, as it encompasses a submerged area of 
many hectares and presents a wide range of different 
architectural structures with a number of decorations that are still 
preserved. The research has been conducted in the context of the 
iMARECulture project [6], [7], [8], which aims to develop new 
tools and technologies for improving public awareness of UCH.  

The article is organised as follows. Section 2 presents related 
works about underwater image pre-processing methods. Section 
3 describes the image acquisition and pre-processing stage. In 
section 4, the results of the statistical analysis are detailed. Section 
5 presents the 3D reconstruction obtained by means of the 
enhanced images, and finally, conclusions are presented in 
section 6. 

2. UNDERWATER IMAGE PRE-PROCESSING ALGORITHMS 

Underwater pictures generally suffer from light absorption, 
which causes some defects mostly on the red channel (the first 
component of the light spectrum that is absorbed), and this 
effect is already noticeable at only a few metres of depth. The 
pre-processing of underwater images can be conducted with two 
different approaches: image restoration techniques or image 
enhancement methods [9], [10]. Image restoration techniques 
need some environmental parameters to be entered, such as 
scattering and attenuation coefficients, while image enhancement 
methods do not require a priori knowledge of the underwater 
environment. 

The physical effects of visibility degradation have been 
analysed in [11], showing that the degradation effects can be 
associated mainly with the partial polarisation of light. The 
developed algorithm is based on a pair of images taken 
through a polariser at different orientations, improving contrast 
and colour and doubling the underwater visibility range. The 
work of [12] presents an image restoration filter based on a 
simplified version of the Jaffe-McGlamery underwater image 
formation model, which can be used for images with limited 
backscatter in diffuse lighting. 

The ACE algorithm [13] is inspired by the human vision, 
which is able to adapt to highly variable lighting conditions, 
extracting visual information from the underwater environment 
[14]. The algorithm combines the Patch White algorithm with the 
Gray World algorithm, taking into account the spatial 
distribution of colour information. In the first stages of the ACE 
method, chromatic data and pixels are processed and adjusted 
according to the information contained in the image. 
Subsequently, colours in the output image are restored and 
enhanced [15]. Unlike the ACE algorithm, which adapts 
to widely varying lighting conditions in order to extract visual 
information, the Principal Component Analysis (PCA) algorithm 
can be adopted to reduce the number of variables considerably 
while still retaining much of the information in the original 
dataset. PCA is one of the most popular multivariate statistical 
techniques: it analyses a data table of observations described by 
several dependent variables and extracts the important 
information as a set of new orthogonal variables called principal 
components. In this specific application, the PCA algorithm 
allows the dominant colour of the image, which in most cases is 
the water colour, to be extracted, providing good results in terms 
of colour enhancement. 
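A minimal sketch of this idea, assuming NumPy and not reproducing the authors' implementation, projects the RGB pixels onto their first principal component, yielding the single-channel output mentioned above:

```python
import numpy as np

def pca_channel(rgb):
    """Project RGB pixels onto the first principal component.

    rgb: (H, W, 3) array in [0, 1]. Returns a single-channel (H, W)
    image aligned with the dominant colour axis of the scene.
    """
    h, w, _ = rgb.shape
    x = rgb.reshape(-1, 3).astype(float)
    x -= x.mean(axis=0)                   # centre each colour channel
    cov = np.cov(x, rowvar=False)         # 3x3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, np.argmax(eigvals)]  # direction of maximum variance
    proj = x @ pc1
    # rescale to [0, 1] for display
    span = proj.max() - proj.min()
    return ((proj - proj.min()) / (span + 1e-12)).reshape(h, w)
```

Because the projection collapses the three channels into one, a PCA-enhanced image has no separate R, G, and B components to analyse later in the experiment.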

An automatic enhancement algorithm that does not require 
any correction parameter has been proposed in [16], where each 
source of errors is corrected sequentially. The first step removes 
the moiré effect, then a homomorphic or frequency filter is 
applied to equalise brightness and to enhance the contrast. 
Regarding the acquisition noise, a wavelet denoising filter 
followed by an anisotropic filtering has been applied. Finally, 
dynamic expansion is applied to increase contrast, followed by 
colour equalisation. The process is performed on one channel, 
specifically the YCbCr colour space, in order to optimise the 
computation time. Even if this last step speeds up all the 
following processes avoiding the need to process each RGB 
channel each time, it is important to point out that the use of a 
homomorphic filter affects the geometry and could generate 
errors on the reconstructed scene. The effectiveness of the use 
of different colour spaces for the enhancement of underwater 
images has been demonstrated in [17], where a slide stretching 
algorithm has been used both on RGB and HSI colour spaces. 
After a contrast stretching on RGB colour space has been 
performed, the resulting images have been converted to HSI 
colour space and processed through saturation and intensity 
stretching in order to increase the true colour and solve the 
problem of lighting. The aim of underwater colour correction is 
not only to obtain better quality images, but also to improve the 
performance of feature extraction algorithms in terms of the 
detection of feature points. The effects of different image pre-
processing methods on the performance of the SURF (Speeded 
Up Robust Features) detector [18] have been investigated in [20], 
and the IACE (Image Adaptive Contrast Enhancement) method 
has been proposed. In particular, the IACE method enhances the 
intrinsic features in images, like corners, edges, and blobs, along 
with maintaining the relative contrast between the pixels. Thanks 
to this capability, the IACE method has proven better than other 
techniques, like Histogram Equalisation and Multiscale Retinex 
algorithm for Color Enhancement, in terms of the repeatability 
of their SURF detector and the robustness and distinctiveness of 
their SURF descriptor. 
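The slide-stretching scheme of [17] can be sketched as follows; this is an illustrative reconstruction rather than the original code, and the HSV space of the Python standard library stands in for the HSI space used in the paper:

```python
import numpy as np
import colorsys

def stretch(channel, lo=0.0, hi=1.0):
    """Linear (slide) stretch of a channel to the range [lo, hi]."""
    cmin, cmax = channel.min(), channel.max()
    if cmax == cmin:
        return np.full_like(channel, lo)
    return lo + (channel - cmin) * (hi - lo) / (cmax - cmin)

def enhance(rgb):
    """Contrast stretch in RGB, then saturation/value stretch in HSV."""
    out = np.stack([stretch(rgb[..., c]) for c in range(3)], axis=-1)
    h, w, _ = out.shape
    hsv = np.empty_like(out)
    for i in range(h):
        for j in range(w):
            hsv[i, j] = colorsys.rgb_to_hsv(*out[i, j])
    hsv[..., 1] = stretch(hsv[..., 1])   # saturation stretch
    hsv[..., 2] = stretch(hsv[..., 2])   # intensity stretch
    for i in range(h):
        for j in range(w):
            out[i, j] = colorsys.hsv_to_rgb(*hsv[i, j])
    return out
```

The per-pixel loops keep the sketch readable; a vectorised HSV conversion would be used in practice.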

Unlike previous works [20], which focus on the 
comparison of different image enhancement algorithms, with all 
other conditions being equal, this article presents a performance 
analysis based on a DOE approach, which takes into account the 
main influential factors that affect 3D reconstruction quality. 

3. EXPERIMENTATION 

The experiment was undertaken in the underwater 
archaeological site of Baiae, which is located a few kilometres 
north of Naples (Italy). The submerged environment of Baiae is 
characterised by highly critical visibility conditions due to water 
turbidity and the heavy presence of flora and fauna. The area 
selected for the experimentation is the thermal room of ‘Villa 
Protiro’, with a size of 5 x 8 m at an average depth from the sea 
level of 5 m. The choice of this area is due to the presence of 
different building materials (bricks, mortar, tile floors, etc.) and a 
strong colonisation of various bio-fouling agents. Because of the 
critical visibility conditions, 3D reconstruction techniques based 
on the multi-view stereo method are not sufficient for 
performing an accurate 3D reconstruction of the submerged 
archaeological area. The underwater images therefore require a 
pre-processing stage based on image enhancement algorithms, 
whose impact on the quality of the final reconstructed 3D model 
can vary considerably. 

 



 

 

3.1. Experimental setup 

 The experimental setup consists of a camera, its underwater 
housing, and two underwater strobes. The camera is a Nikon 
D7000 reflex device equipped with a CMOS (Complementary 
Metal-Oxide Semiconductor) sensor size of 23.6 x 15.8 mm and 
a resolution of 4928 × 3264 pixels (16.2 effective megapixels), as 
well as an AF-Nikkor 20 mm lens. The underwater housing, 
manufactured by Ikelite, is equipped with a spherical port. The 
flashguns are connected to the camera housing with a pair of 
articulated arms at a distance of 45 cm. The two strobes have 
been fixed at a distance of 5 cm behind the dome pointing 
outwards to illuminate the object with the ‘edge’ of the light 
beam. A calibration panel, produced by Lastolite, has been used 
in the beginning of the survey to acquire a colour calibration 
image to perform in-situ white balance correction, while a digital 
depth gauge has been used to maintain a constant depth from 
the seabed. 

3.2. Image acquisition 

The photogrammetric survey of the submerged area has been 
carried out in two different dive sessions, in the north and south 
parts of the site. The survey has been carried out according to a 
standard aerial photography layout: the diver swims at a distance 
of about 2.5 m from the submerged structures, taking 
overlapping pictures along straight lines that cover the whole area 
in the north-south direction. Another set of images has been 
acquired in the east-west direction. The occluded areas have been 
acquired using oblique photographs. At the end of the survey 
activity, the dataset included a total of around 700 images. 

3.3. Colour enhancement of underwater images 

The original images (OR) have been then enhanced by means 
of three algorithms (ACE, PCA, HIST) and corrected through 
an in-situ white balance correction procedure.  

The ACE (Automatic Colour Equalisation) algorithm 
proposed in [9] and the PCA algorithm proposed in [21] have 
been adopted. 

The HIST (Histogram Stretching and Manual Retouching) 
algorithm is a semiautomatic enhancement methodology that has 
been developed for this study. It is based on histogram stretching 
and a manual colour retouching procedure and has been 
implemented using batch actions in a graphics editor to rescue 
the maximum amount of information from a set of defective and 
noisy pictures. In particular, the HIST method consists of the 
following three-step procedure: preliminary histogram stretching 
to improve the contrast; mixing of the colour channels to balance 
the missing information on the red channel; creation of a set of 
adjustment layers, including saturation enhancement for some 
missing hues, contrast masks, colour balancing, and equalising.  
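The first two steps of the procedure might look like the following sketch; the percentiles and channel-mixing weights are illustrative assumptions rather than values from the paper, and the third step, being manual retouching, is not reproducible in code:

```python
import numpy as np

def hist_stretch(img, p_low=1, p_high=99):
    """Step 1: percentile-based histogram stretching per channel."""
    out = np.empty_like(img, dtype=float)
    for c in range(3):
        lo, hi = np.percentile(img[..., c], [p_low, p_high])
        out[..., c] = np.clip((img[..., c] - lo) / max(hi - lo, 1e-12), 0, 1)
    return out

def mix_red(img, g_weight=0.6, b_weight=0.4):
    """Step 2: rebuild the attenuated red channel as a weighted mix
    of the green and blue channels (weights are hypothetical)."""
    out = img.copy()
    out[..., 0] = np.clip(0.5 * img[..., 0]
                          + 0.5 * (g_weight * img[..., 1]
                                   + b_weight * img[..., 2]), 0, 1)
    return out
```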

In addition to the enhancement algorithms, the images have been 
processed by performing an in-situ white balance correction 
procedure (WB) performed by means of a Lastolite waterproof 
panel.  

Figure 1 shows an original uncorrected image (Figure 1a) and 
the versions enhanced with the WB procedure (Figure 1b) and 
with the ACE (Figure 1c), HIST (Figure 1d), and PCA (Figure 1e) 
algorithms. 

3.4. Design of the experimental campaign 

The experimental campaign has been planned according to 
the DOE criteria with the purpose of identifying the most 
influential factors affecting the results of the 3D reconstruction 
in the underwater environment. Particular attention has been 

given to the effect of the image enhancement methods on the 
camera orientation and the self-calibration bundle adjustment 
process. The measured data has been compared and analysed by 
means of standard statistical tools to verify if a particular factor 
(or a combination of factors) has an impact on a parameter with 
a certain confidence level. On the basis of the results of this 
analysis, it is possible to find the best combination of factors that 
should be used for an accurate and dense 3D reconstruction by 
using a multi-view stereo technique. 

3.5. Influencing factors 

The first step is the selection of the influencing factors that 
could affect the quality of a 3D reconstruction performed with a 
multi-view technique in the underwater environment. The 
influencing factors have been chosen by excluding those that 
cannot be controlled in situ, such as the presence of marine 
organisms in motion and the level of water turbidity. Likewise, 
factors that can be considered a direct consequence of others 
have not been treated independently: for instance, the focus 
settings depend on the distance from the subject, and the focal 
length can be set according to the required field of view and the 
working distance. The factors selected for the experiment are 
reported in Table 1. 

The first factor (EN) is related to the original images (OR) 
and to the colour enhancement algorithm (WB, HIST, ACE and 
PCA) adopted to improve underwater images. 

 The second factor refers to image resolution (PYR). The full 
resolution images have not been used in order to save 
computational time. The raw images (4928 x 3264 pixels) have 
been resized by means of the Mitchell-Netravali Cubic Filter [22] 
in order to create the following levels: level 1 for images of 2464 
x 1632 pixels; level 2 for images of 1232 x 816 pixels; and level 3 
for images of 616 x 408 pixels. 
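For reference, the Mitchell-Netravali kernel (with the usual parameters B = C = 1/3) and a 1-D halving step built on it can be sketched as follows; 2-D resizing applies the same filter separably on rows and columns:

```python
import numpy as np

def mitchell(x, B=1/3, C=1/3):
    """Mitchell-Netravali cubic filter kernel."""
    x = abs(x)
    if x < 1:
        return ((12 - 9*B - 6*C) * x**3
                + (-18 + 12*B + 6*C) * x**2
                + (6 - 2*B)) / 6
    if x < 2:
        return ((-B - 6*C) * x**3 + (6*B + 30*C) * x**2
                + (-12*B - 48*C) * x + (8*B + 24*C)) / 6
    return 0.0

def downsample2(row):
    """Halve a 1-D signal with the Mitchell kernel (separable in 2-D)."""
    # kernel sampled at the input offsets around each output centre,
    # scaled for a downsampling factor of 2
    taps = np.array([mitchell((t - 0.5) / 2) / 2 for t in range(-3, 4)])
    taps /= taps.sum()                       # normalise to unit gain
    n = len(row) // 2
    padded = np.pad(row, 3, mode='edge')
    return np.array([padded[2*i:2*i + 7] @ taps for i in range(n)])
```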

 The third factor is represented by the composite RGB image 
and its three R (Red), G (Green), and B (Blue) components. This 
factor has been taken into consideration in order to investigate 
the influence of a single-colour channel on the reconstruction 

 

Figure 1. Sample original image (a) corrected with the in situ white balance 
measurement (b), enhanced with the ACE method (c), HIST method (d), and 
PCA method (e). 



 

 

quality with respect to the grayscale image obtained by 
combining the RGB components. 

In order to perform a quantitative analysis of the impact of 
camera layout on the processing results, the last influencing 
factor refers to the type of image set (SET). Seven subsets 
have been created, which differ among each other according to 
the type of shot (aerial vs. oblique), working distance, and 
overlapping pictures. In particular, the first two sets include 
photos that have been taken with a standard aerial layout. The 
third set includes pictures characterised by high overlap and good 
visibility due to the reduced distance from the submerged 
structures. The fourth and fifth sets cover the outside masonry 
structures of the outer walls, while the sixth and seventh sets 
group oblique pictures with variable working distances. 

3.6. Measured parameters 

The 3D reconstruction quality has been evaluated by means 
of four different parameters: the mean number of extracted 
features; the percentage of matched features; the percentage of 
oriented cameras; and bundle adjustment mean re-projection 

error. The mean number of extracted features ($\overline{numF}$) has been 
calculated according to the SIFT (Scale Invariant Feature 
Transform) operator [23] through the following relationship: 

$\overline{numF}_{EN,PYR,CH,SET} = \dfrac{numF_{EN,PYR,CH,SET}}{numIm_{SET}}$  (1) 

where $numF$ is the total number of extracted features for 
each configuration, and $numIm$ is the number of images 
included in each set.  

The percentage of matched features ($matchedF\%$) and the 
percentage of oriented cameras ($cam\%$) have been 
evaluated using Bundler [24] through the following 
relationships: 

$matchedF\%_{EN,PYR,CH,SET} = \dfrac{matchedF_{EN,PYR,CH,SET}}{numF_{EN,PYR,CH,SET}}$  (2) 

$cam\%_{EN,PYR,CH,SET} = \dfrac{cam_{EN,PYR,CH,SET}}{numIm_{SET}}$  (3) 

where $matchedF$ represents the number of matched 3D 
points in the sparse scene reconstruction resulting from 
the bundle adjustment process, and $cam$ is the number of 
oriented images for each configuration.  

The bundle adjustment mean re-projection error (available as 
output in the Bundler log file and measured in pixels) is the result 
of a minimisation problem applied to the sum of distances 

between the projections of each track (a connected set of 
matching key points across multiple images) and its 
corresponding image features. 
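Restated as code, Equations (1)-(3) reduce to simple ratios (a trivial sketch; the function names are illustrative, and the last two ratios are expressed here as percentages, as in Table 2):

```python
def mean_extracted_features(num_f, num_im):
    """Eq. (1): total extracted features / number of images in the set."""
    return num_f / num_im

def matched_features_pct(matched_f, num_f):
    """Eq. (2): matched 3D points as a percentage of extracted features."""
    return 100.0 * matched_f / num_f

def oriented_cameras_pct(cam, num_im):
    """Eq. (3): oriented images as a percentage of the set size."""
    return 100.0 * cam / num_im
```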

3.7. Dataset generation 

As mentioned above in section 3.5, the whole dataset has been 
grouped into seven subsets according to: camera orientation; 
distance from the subject; pictures taken with flash; and the 
heavy presence of dark and bright areas. The grouping procedure 
involved a selection that reduced the dataset to 196 pictures. 

A Matlab script has been programmed in order to manage the 
selected images and apply the different image enhancement 
algorithms. Firstly, the enhancement methods (ACE, PCA, 
HIST) and the WB correction technique have been applied to 
the original full-resolution images. Secondly, the red, green, and 
blue colour components have been extracted only from the images 
enhanced with the WB correction, ACE, and HIST methods. No 
action has been taken on the images enhanced with the PCA method, 
because this method produces a single-channel output. Lastly, all 
the images have been resized according to the pyramid levels. 
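The dataset-generation loop described above can be sketched as follows; this is a NumPy stand-in for the Matlab script, and the 2x2 box average is a simplification of the Mitchell-Netravali resampling actually used:

```python
import numpy as np

def channels(img):
    """Composite RGB plus its three colour components."""
    if img.ndim == 2:                 # single-channel (e.g. PCA output)
        return {'mono': img}
    return {'RGB': img, 'R': img[..., 0], 'G': img[..., 1], 'B': img[..., 2]}

def halve(img):
    """2x2 box average standing in for Mitchell-Netravali resampling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4

def generate(images):
    """Build the EN x CH x PYR grid for a dict of enhanced images."""
    grid = {}
    for name, img in images.items():
        for ch, data in channels(img).items():
            level = data
            for pyr in (1, 2, 3):     # level 1 = half resolution, etc.
                level = halve(level)
                grid[(name, ch, pyr)] = level
    return grid
```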

4. STATISTICAL ANALYSIS 

Table 2 shows the mean values of the measured parameters, 
described in section 3.6, computed for each influential factor. 

The data has been analysed by means of standard statistical tools 
and compared by performing an ANOVA test with a 
95 % confidence level. The summary of the results is presented 
in Table 3, in which only the main effects of the measured 
parameters have been considered. The Tukey post hoc test 
has been performed in order to identify significant differences 
between the groups of each influential factor. 
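For illustration, the F statistic underlying a one-way ANOVA can be computed as below; the data here is hypothetical, not the study's measurements, and in practice the p-value is then obtained from the F distribution (e.g. scipy.stats.f.sf):

```python
import numpy as np

def one_way_anova(*groups):
    """One-way ANOVA F statistic for k groups of measurements."""
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    k, n = len(groups), len(all_x)
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) indicates that at least one group mean differs significantly, which is what Table 3 reports for each influential factor.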

4.1. Mean extracted features 

Figure 2 reports the mean values of the extracted features for 
all the factors of influence (summarised in Table 2). The results 
show that the number of extracted points strongly depends on 
image resolution: A greater number of features is obtained from 
images with a higher resolution. In this regard, the data 
summarised in Table 3 shows a statistically significant difference 

Table 1. Influential factors and related symbols and levels. 

Influential factor          Symbol   Levels 
Colour enhancement method   EN       OR, HIST, ACE, PCA, WB 
Image pyramid level         PYR      1, 2, 3 
Colour channel              CH       RGB, R, G, B 
Image set                   SET      SET 1, SET 2, ..., SET 7 

 

Figure 2. The mean values of extracted features for all the influential factors. 



 

 

among the three levels of the PYR factor, confirming that the 
number of features is strongly related to image resolution. 
Nevertheless, the images at level 1 (four times more pixels than 
level 2) produce only three times more features than the images 
at level 2 and nine times more features than the images at level 3 
(which contain 16 times fewer pixels). The image enhancement 
methods ACE and HIST have returned the best results. On the 
contrary, the colour channel appears to be less influential than 
the enhancement algorithms, because the latter mix the various 
colour channels in different ways. The results related to the 
factor ‘image set’ show 
a difference in behaviour within the seven datasets. In particular, 
the highest number of features is extracted from the images 
belonging to set 3, which includes pictures taken at a reduced 
distance from the subject. The first two sets are related to the 
same area: The second set has been acquired after having 
removed the sand that covered the tiled floor in order to improve 
the reliability of point detection. Set 6 shows a lower number of 
features, as the oblique pictures have been taken from a greater 
distance, and the presence of the blue background is more 
evident. 

4.2. Percentage of matched features 

The most influential factor on the parameter ‘percentage of 
matched features’ (Table 2) is the image enhancement algorithm 
used. As depicted in Figure 3, it is noticeable that the HIST 
algorithm allows for matching a higher number of features. The 
second factor in terms of influence is the colour channel: RGB 
images and the green channel allow for matching the maximum 
number of features. 

The results related to the factor ‘image pyramid level’ reflect 
the low influence deduced from the ANOVA analysis (Table 3), 
but it must be pointed out that resized images lead to a higher 
percentage of matched points. For all three enhancement 
algorithms used in the experimentation, PYR levels 1 and 2 lead 
to a higher performance of the feature-matching algorithm. One 
of the reasons for this behaviour is the fact that by reducing the 
resolution, it is possible to find more robust features, as they are 
extracted from the more evident details. The images at level 3 
have not shown good results due to the lack of reliable details. 

Regarding the ‘image set’ factor, the matching of the images 
included in sets 5, 6, and 7 leads to poor results, since these are 
composed of oblique photographs only. The best results have 

Table 2. Mean values of the measured parameters computed for each influential factor. 

Factor  Level   Mean extracted    % matched       % oriented     Bundle adjustment mean 
                features (1)      features (2)    cameras (3)    re-projection error (pixels) 

EN      HIST    5344.9            2.41            43.24          0.19 
EN      ACE     6984.8            1.45            37.72          0.22 
EN      PCA     1841.2            2.07            33.38          0.18 
EN      WB      2452.1            1.21            21.23          0.22 
EN      OR       260.1            1.14             9.26          0.10 

PYR     1       8806.0            1.61            37.22          0.24 
PYR     2       2829.6            2.04            32.58          0.16 
PYR     3        954.2            1.35            20.42          0.13 

CH      RGB     3848.9            2.20            35.98          0.18 
CH      R       4315.0            1.36            30.51          0.17 
CH      G       3300.1            2.19            34.82          0.18 
CH      B       5322.4            0.91            18.97          0.16 

SET     1       2813.7            1.71            28.44          0.14 
SET     2       4120.1            1.81            25.97          0.15 
SET     3       5986.0            2.39            48.46          0.13 
SET     4       3647.2            2.56            47.02          0.22 
SET     5       4654.6            0.96            22.62          0.24 
SET     6       3040.7            1.43            26.59          0.24 
SET     7       5022.6            0.98            20.90          0.22 

Table 3. Summary of the results of the ANOVA analysis. 

         Mean extracted       % matched          % oriented         Bundle adjustment mean 
         features (1)         features (2)       cameras (3)        re-projection error 
Factor   F         p-value    F       p-value    F       p-value    F       p-value 

EN       1065.41   0          21.07   0          91.08   0           9.17   0.0002 
PYR      1457.10   0           5.93   0.005      20.61   0          13.82   0 
CH         47.80   0          14.54   0          12.39   0           0.18   0.9129 
SET        92.79   0           7.76   0.0001     18.64   0           6.11   0.0002 



 

 

been obtained for sets 3 and 4, which include images taken with 
a reduced working distance and a greater picture overlap. 

4.3. Percentage of oriented cameras 

The image enhancement algorithm is the most influential 
factor on the parameter ‘percentage of oriented cameras’ (Table 
2). The performances obtained using each method are clearly 
better than the results obtained with the original images. For the 
latter, only 9 % of the cameras have been oriented, while about 
43 %, 37 %, and 33 % of cameras have been successfully oriented 
while using the HIST, ACE, and PCA enhancement methods, 
respectively.  

The second most influential factor is image resolution. By 
analysing the results in Figure 4 and the outcomes of the Tukey 
post hoc test, it is noticeable that there are no statistically relevant 
differences between the values for the first and second levels of 
the image pyramid. This means that it is possible to obtain the 
maximum number of oriented cameras with a lower resolution, 
also saving computational time. 

4.4. Bundle adjustment mean re-projection error 

The ANOVA results (Table 3) show that the most influential 
factor for the parameter ‘mean re-projection error’ is image 
resolution. If we consider the pixel size of the subsampled images 
and the mean distance from the subject of 2.5 m, the first and 
second pyramid levels have demonstrated errors measured on 
the ground of 0.29 and 0.38 mm, respectively. 
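These ground-level figures can be verified with a quick computation, assuming the pixel footprint on the subject scales as sensor pitch x working distance / focal length:

```python
# Ground sampling distance (GSD) check for the Nikon D7000 setup:
# 23.6 mm sensor width over 4928 px, 20 mm lens, 2.5 m working distance.
sensor_mm, full_px, focal_mm, dist_mm = 23.6, 4928, 20.0, 2500.0

def gsd_mm(pyr_level):
    """Pixel footprint on the subject at a given pyramid level
    (level 1 = half resolution, level 2 = quarter resolution)."""
    px = full_px / 2 ** pyr_level
    return (sensor_mm / px) / focal_mm * dist_mm

# Re-projection errors of 0.24 px (level 1) and 0.16 px (level 2)
# translate to roughly 0.29 mm and 0.38 mm on the ground:
err_l1 = 0.24 * gsd_mm(1)
err_l2 = 0.16 * gsd_mm(2)
```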

By analysing the results presented in Table 2 and depicted in 
Figure 5, it is noticeable that a higher error has been measured 
on the image sets containing oblique photographs only. The 
presence of the blue background in almost all the pictures 
reduces the accuracy of the bundle adjustment process.  

Furthermore, the ANOVA analysis results (Table 3) reveal 
that there is not a statistically significant difference among the 
different colour channels (CH). 

4.5. Discussion 

The statistical analysis allowed for choosing the best 
combination of factors that should be used to perform the 3D 
reconstruction of the site. The accuracy of the SfM (Structure 
from Motion) procedure (namely the mean re-projection error) 
is mainly related to the camera network orientation. Sets 1, 2 and 
3 are characterised by convergent images with a high overlap, 
forming a more robust network. Moreover, as reported in the 
previous section, both levels 1 and 2 led to errors below the 
acceptable value of 0.5 mm. For these reasons, it is possible to 
save computational time using subsampled images, which also 
result in a higher percentage of matched features. In fact, in the 
same conditions and varying only the image resolution, the 
average reconstruction time for the datasets considered in the 
study shows a time saving of 81 % for PYR2 compared to 
PYR1, and of 92 % for PYR3 compared to PYR1. 

HIST and ACE methods considerably increase the 
performance of image matching. The first method returns better 
results in terms of the percentage of oriented cameras and 
matched features, increasing the performance by about 150 % 
and 50 %, respectively. The analysis of the effects of white 
balance correction has shown good results for the parameters 
‘mean number of extracted features’ and ‘percentage of oriented 
cameras’. In particular, white balance corrected images have 
shown a higher number of extracted features and better 
performances compared with the PCA method. The images 

 

Figure 4. Mean values of the percentage of oriented cameras for all the 
influential factors. 

 

Figure 3. The mean values of the percentage of the matched features for all 
the influential factors. 

 

Figure 5. Bundle adjustment mean re-projection error for all the influential 
factors. 



 

 

obtained using the custom white balance adjustment lead to less 
stable results, since the correction is performed at the beginning 
of the survey. 

The results in terms of the number of oriented cameras have 
shown similar values for the first and second image pyramid 
levels. Concerning the re-projection error, as described in section 
4.4, in this case the best choice is also to use a lower resolution 
in order to save computational time without affecting 
reconstruction accuracy. Regarding the colour channel, the data 
has demonstrated the better performance of the RGB images and 
of the green channel, which outperforms the other single channels 
in terms of matched features and oriented images. 

Considering the results related to the factor ‘image set’, it is 
evident that the highest number of features is extracted from the 
sequences of images that have been taken using a standard aerial 
photography layout, with a reduced distance to the subject. On 
the contrary, oblique pictures, where the presence of the blue 
background is more evident, returned poor results in terms of 
percentage of matched features and percentage of oriented 
cameras. 

5. RESULTS 

The 3D reconstruction pipeline starts by performing the 
orientation of the whole dataset of 722 pictures by means of the 
Bundler software [25]. In the first instance, the 3D 
reconstruction has been performed on the dataset composed of 
the original images. The image orientation process failed to 
orient all the pictures in a single block: the dataset has been 
divided into two non-overlapping groups, the north and south 
parts, which have been reconstructed separately. In particular, 
384 images have been oriented for the north block and 116 for 
the south block. This failure is mainly due to the sandy seabed 
present in the central part of the room, which makes the 
extraction and matching of features difficult, as a consequence 
of the low contrast. Furthermore, the lack of overlapping areas 
in the reconstructed model prevented the alignment of the two 
blocks. 
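As an illustration of why the low-contrast sandy seabed defeats feature extraction, the sketch below counts the pixels whose local gradient magnitude is strong enough to anchor a feature. This is only a rough proxy (real detectors such as SIFT are far more elaborate), and the function name and threshold are hypothetical:

```python
import numpy as np

def count_candidate_features(img, grad_threshold=10.0):
    """Count pixels whose local gradient magnitude exceeds a threshold,
    a rough proxy for how many features a detector could anchor."""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    return int(np.sum(magnitude > grad_threshold))

rng = np.random.default_rng(0)
texture = rng.uniform(0, 255, (100, 100))      # high-contrast textured patch
seabed = 120 + rng.uniform(-5, 5, (100, 100))  # low-contrast sandy seabed

print(count_candidate_features(texture) > count_candidate_features(seabed))  # → True
```

The low-contrast patch yields almost no candidate pixels, which mirrors the failure of the matching step over the central part of the room.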

The results of the statistical analysis allow for choosing the 
best combination of factors to be used in order to improve the 
reconstruction process, represented by RGB images resized to 25 % (second pyramid level) and enhanced with the HIST method.
The enhanced dataset has been processed with Bundler, and a 
subset of 533 images related to the whole area has been aligned, 
allowing for the generation of a complete 3D point cloud without 
the need to register the different meshes (Figure 6). This result 
shows that colour correction considerably improves the 
matching process. 
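The chosen pre-processing can be sketched as follows. Note that this is a generic per-channel histogram equalisation combined with a 4 × 4 block-average resize (25 %, i.e. the second pyramid level); the HIST method used in the experiment also involves a manual retouching step, and the function names here are illustrative:

```python
import numpy as np

def hist_equalise(channel):
    """Classic histogram equalisation of a single uint8 channel."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[channel]

def preprocess(rgb):
    """Resize an RGB image to 25 % by 4x4 block averaging, then
    equalise each colour channel independently."""
    h, w, _ = rgb.shape
    cropped = rgb[:h - h % 4, :w - w % 4]
    small = cropped.reshape(h // 4, 4, w // 4, 4, 3).mean(axis=(1, 3)).astype(np.uint8)
    return np.dstack([hist_equalise(small[..., c]) for c in range(3)])

rng = np.random.default_rng(1)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
print(preprocess(img).shape)  # → (16, 16, 3)
```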

The data returned by Bundler (camera positions and camera 
parameters computed by a self-calibration procedure) and the 
undistorted images have been processed with PMVS2 (Patch 
Based Multi-View Stereo) [26] in order to create a dense cloud of 
about 10 million points related to the whole site.  

This algorithm estimates the surface orientation while 
enforcing the local photometric consistency, which is important 
for obtaining accurate models for low-textured objects or for 
images affected by blur due to the turbidity in the underwater 
environment. Furthermore, PMVS2 automatically rejects 
moving objects such as fish and algae. The dense stereo matching algorithm implemented in PMVS2 receives, as inputs, an undistorted set of images and the 3 × 4 camera projection matrices computed by Bundler. The output is a coloured dense 3D
point cloud. The PMVS2 parameters used to fine-tune the 3D 
reconstruction are the size of the correlation window and the 
level in the internal image pyramid used for the computation. In 
our experiment, a fixed correlation window with a size of 7 × 7 
pixels was adopted, while the image resolution (image pyramid 
level) was chosen according to the results obtained through the 
variational analysis. Moreover, image triplets instead of pairs 
were used to increase the robustness of the reconstruction. 
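As a sketch, these settings map onto a PMVS2 option file roughly as below: `wsize` and `minImageNum` correspond to the 7 × 7 correlation window and the image triplets described above, while the remaining values (pyramid level, cell size, threshold, image range) are illustrative assumptions rather than the exact values used in the experiment:

```python
# Sketch of a PMVS2 option file matching the settings in the text.
options = {
    "level": 1,        # internal image pyramid level (assumed)
    "csize": 2,        # cell size controlling reconstruction density (assumed)
    "threshold": 0.7,  # photometric consistency threshold (assumed)
    "wsize": 7,        # 7 x 7 correlation window, as in the experiment
    "minImageNum": 3,  # require triplets instead of pairs
}

with open("option.txt", "w") as f:
    for key, value in options.items():
        f.write(f"{key} {value}\n")
    f.write("timages -1 0 533\n")  # use all 533 oriented images (assumed range)
    f.write("oimages 0\n")
```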

The 3D point cloud has been elaborated with Meshlab tools. 
The first operation was the manual selection and deletion of 
unwanted areas and outliers caused by the presence of 
underwater flora and fauna and bad visibility conditions. Then, a watertight surface with about 25 million triangles (Figure 7) was obtained through the Poisson Surface Reconstruction algorithm.

The resulting surface has been subsequently decimated to a mesh of 6.5 million triangles and 3 million points in order to be handled more efficiently without losing details. Since the camera
orientation procedure has been carried out with an unknown 
scale factor, it is necessary to scale the model by selecting two 
points with a known distance. In this experimentation, a scale bar 
has been placed in the scene and reconstructed in order to 
evaluate the scale factor. 
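The scaling step reduces to a single ratio: the known distance on the scale bar over the same distance measured in the reconstruction. A minimal sketch, with hypothetical points and a 1 m scale bar:

```python
import numpy as np

def apply_scale(points, p_a, p_b, known_distance):
    """Scale a point cloud so that the distance between two reconstructed
    points p_a and p_b matches a distance measured on the scale bar."""
    measured = np.linalg.norm(np.asarray(p_a) - np.asarray(p_b))
    s = known_distance / measured
    return np.asarray(points) * s

cloud = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
scaled = apply_scale(cloud, cloud[0], cloud[1], known_distance=1.0)  # 1 m bar
print(scaled[1])  # → [1. 0. 0.]
```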

Figure 6. Results of the camera orientation process (enhanced pictures with HIST method): sparse point cloud and 533 oriented cameras.

Figure 7. Reconstructed surface.

The last step consists in the application of the texture on the 3D surface. Colour information can be extracted directly from the coloured point cloud, but this method does not allow for the creation of a high-quality texture, because its resolution depends
on the point cloud density. Moreover, since the enhancement 
procedure is often performed to improve the feature extraction 
process (by increasing the contrast without taking into account 
the fidelity of the colours – usually single-component or 
greyscale images are used), the colour information stored in the 
pixels cannot be used. Since the camera positions are known, the texture mapping has been carried out by means of the projection and blending of high-resolution images directly onto the 3D surface. In particular, an image subset has been selected because the averaging among neighbouring values during the blending of the images works better when only a small overlapping area is present.
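The projection step underlying this texture mapping can be sketched with the 3 × 4 matrix P = K[R|t] returned by Bundler for each camera; the intrinsics below describe a hypothetical camera, not one from the survey:

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P into pixel coordinates."""
    X_h = np.append(X, 1.0)  # homogeneous coordinates
    x = P @ X_h
    return x[:2] / x[2]      # perspective division

# Hypothetical camera: identity rotation, focal length 1000 px,
# principal point (960, 540), centre at the origin looking down +Z.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

u, v = project(P, np.array([0.0, 0.0, 2.0]))  # vertex on the optical axis
print(u, v)  # → 960.0 540.0
```

Each mesh vertex is projected in this way into the selected high-resolution images, and the sampled colours are blended over the overlapping areas.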

This subset of images has been extracted from the images 
enhanced with the HIST method, which also gave the best results 
in terms of texture quality. This is mainly due to the manual 
retouching step performed on a sample image and then exported 
to the whole dataset. 

The result of this procedure is a texture with a resolution 
comparable to the original images (Figure 8). 

6. CONCLUSIONS 

This paper has presented a performance analysis, based on a 
DOE approach, of the main influential factors that affect 3D 
reconstruction quality. The performance of three different colour enhancement algorithms, ACE, PCA, and HIST, has been evaluated by using a variance analysis, including the effects of image resolution and colour channels. The results of the
ANOVA analysis show that the factors EN (image enhancement 
method), PYR (image pyramid level), and SET (image set) are 
influential, with a confidence level of 95 %, for all the parameters, 
while the results related to the factor CH (colour channel) have 
shown a limited influence, since each enhancement method 
performs mixing operations among the channels. 

The ANOVA data allowed for choosing the best combination 
of factors to optimise the SfM bundle adjustment mean re-
projection error, the number of extracted features, oriented 
cameras, and matched features, also taking the processing time 
into account. More precisely, the best combination is 
characterised by RGB images resized to 25 % and enhanced with 
the HIST method, which returns more stable results. 

By using the results of the statistical analysis to correct and 
process the underwater images, it has been possible to align an 
unordered sequence of more than 500 images belonging to the 
entire site. On the contrary, the original images could not be used 
to align all the cameras. Moreover, the corrected images allowed for creating a model mapped with a high-quality texture, comparable with the original images in terms of resolution and with a fair colour balance, since the whole dataset shares the same colour statistics.

Even if these techniques have been used in other works 
related to underwater archaeology, this experiment represents a 
significant case study for verifying their robustness in the 
presence of strong turbidity and poor environmental conditions, 
providing useful guidelines for an accurate modelling of a 
submerged site.  

ACKNOWLEDGEMENT 

This work has been supported by the iMARECulture project 
that has received funding from the European Union’s Horizon 
2020 research and innovation programme under grant agreement 
No. 727153. 

REFERENCES 

[1] UNESCO, Convention on the Protection of the Underwater 
Cultural Heritage, 2 November 2001, http://www.unesco.org. 

[2] G. Telem, S. Filin, Photogrammetric modelling of underwater 
environments, ISPRS Journal of Photogrammetry and Remote 
Sensing 65, 5 (2010), pp. 433-444. 

[3] F. Menna, P. Agrafiotis, A. Georgopoulos, State of the art and 
applications in archaeological underwater 3D recording and 
mapping, Journal of Cultural Heritage 33 (2018) pp. 231-248.  

[4] F. Bruno, A. Lagudi, L. Barbieri, D. Rizzo, M. Muzzupappa, L. De 
Napoli, Augmented Reality visualization of scene depth for aiding 
ROV pilots in underwater manipulation, Ocean Engineering 168C 
(2018) pp. 140-154. 

[5] D. C. Montgomery, Design and Analysis of Experiments, John 
Wiley & Sons, New York, 2017, ISBN: 978-1-119-11347-8. 

[6] IMARECulture, http://www.iMARECulture.eu 
[7] F. Bruno, A. Lagudi, G. Ritacco, J. Cejka, P. Kouril, F. Liarokapis, P. Agrafiotis, D. Skarlatos, O. Philpin-Briscoe, E. C. Poullis, Development and integration of digital technologies addressed to raise awareness and access to European underwater cultural heritage. An overview of the H2020 iMARECulture project, Proc. of the MTS/IEEE Conference Oceans'17, Aberdeen, UK, 19-22 June, 2017.

[8] D. Skarlatos, P. Agrafiotis, T. Balogh, F. Bruno, F. Castro, B. D. Petriaggi, S. Demesticha, A. Doulamis, P. Drap, A. Georgopoulos, Project iMARECulture: Advanced VR, iMmersive serious games and augmented reality as tools to raise awareness and access to European underwater cultural heritage, Proc. of the International Conference on Cultural Heritage, Nicosia, Cyprus, 1-5 November, 2016.

[9] A. Mahiddine, J. Seinturier, D. Peloso, J. M. Boï, P. Drap, D. 
Merad, Underwater image pre-processing for automated 
photogrammetry in high turbidity water, VSMM2012, 2012, pp. 
189-194.  

[10] R. Schettini, S. Corchs, Imaging for underwater archaeology, 
American Journal of Field Archaeology 27, 3 (2000), pp. 319-328.  

[11] Y. Y. Schechner, N. Karpel, Recovery of underwater visibility and 
structure by polarization analysis, IEEE Journal of Oceanic 
Engineering, 2005, 30(3), pp. 570-587.  

[12] E. Trucco, A.T. Olmos-Antillon, Self-tuning underwater image 
restoration, IEEE Journal of Oceanic Engineering 31, 2 (2006) pp. 
511-519.  

[13] A. Rizzi, C. Gatta, From Retinex to Automatic Color Equalization: 
issues in developing a new algorithm for unsupervised color 
equalization, Journal of Electronic Imaging 13 (2004) pp.75-84.  

[14] M. Chambah, D. Semani, A. Renouf, P. Courtellemont, A. Rizzi, 
Underwater color constancy: enhancement of automatic live fish 
recognition, Proc. of the 16th Annual Symposium on Electronic 
Imaging, 2003, United States, 5293, pp. 157-169. 

 

Figure 8. Final textured 3D model. 



 

 

[15] F. Petit, Traitement et analyse d’images couleur sous-marines: 
modèles physiques et représentation quaternionique, Doctorat, 
Sciences et Ingénierie pour l'Information, Poitier, 2010.  

[16] S. Bazeille, I. Quidu, L. Jaulin, J. P. Malkasse, Automatic 
underwater image pre-processing, CMM’06 - Caracterisation Du 
Milieu Marin, 2006.  

[17] K. Iqbal, R. Abdul Salam, A. Osman, A. Z. Talib, Underwater 
image enhancement using an integrated colour model, IAENG 
International Journal of Computer Science 32, 2 (2007) pp.239-
244.  

[18] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust 
features (SURF), Comput. Vis. Image Underst. 110 (2008) pp.346-
359.  

[19] R. Kalia, K.-D. Lee, B.V.R. Samir, S.-K. Je, W.-G. Oh, An analysis 
of the effect of different image pre-processing techniques on the 
performance of SURF: Speeded Up Robust Feature, Proc. of the 
17th Korea-Japan Joint Workshop on Frontiers of Computer 
Vision (FCV), Ulsan, South Korea, 9-11 February 2011, pp.1-6.  

[20] M. Mangeruga, F. Bruno, M. Cozza, P. Agrafiotis, D. Skarlatos, Guidelines for underwater image enhancement based on benchmarking of different methods, Remote Sensing 10, 10 (2018) 1652, pp. 1-27.

[21] A. Tonazzini, E. Salerno, M. Mochi, L. Bedini, Blind source 
separation techniques for detecting hidden texts and textures in 
document images, Image Analysis and Recognition Lecture Notes 
in Computer Science 3212 (2004) pp. 241-248. 

[22] D. P. Mitchell, A. N. Netravali, Reconstruction filters in computer 
graphics, Computer Graphics 22, 4 (1988) pp. 221-228. 

[23] D. G. Lowe, Distinctive image features from scale-invariant 
keypoints, International Journal of Computer Vision 60, 2 (2004) 
pp. 91-110. 

[24] Z. Zhang, A flexible new technique for camera calibration, IEEE 
Transactions on Pattern Analysis and Machine Intelligence 22, 11 
(2000), pp.1330-1334. 

[25] Bundler software, http://www.cs.cornell.edu/~snavely/bundler 
[26] Y. Furukawa, J. Ponce, Accurate, dense, and robust multi-view stereopsis, IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 8 (2010), pp. 1362-1376.