ACTA IMEKO
ISSN: 2221-870X
March 2023, Volume 12, Number 1, 1 - 6

A tool for curating and searching databases providing traceable analysis of data and workflows

Frederic Brochu1, Michael Chrubasik1, Spencer A. Thomas1

1 Data Science, National Physical Laboratory, Hampton Road, Teddington, Middlesex, TW11 0LW, United Kingdom

Section: RESEARCH PAPER

Keywords: searchable metadata; reproducibility; data curation; data traceability; FAIR

Citation: Frederic Brochu, Michael Chrubasik, Spencer A. Thomas, A tool for curating and searching databases providing traceable analysis of data and workflows, Acta IMEKO, vol. 12, no. 1, article 12, March 2023, identifier: IMEKO-ACTA-12 (2023)-01-12

Section Editor: Daniel Hutzschenreuter, PTB, Germany

Received November 18, 2022; In final form February 16, 2023; Published March 2023

Copyright: This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded by the Department for Business, Energy & Industrial Strategy through the UK's National Measurement System.

Corresponding author: Frederic Brochu, e-mail: frederic.brochu@npl.co.uk

ABSTRACT
We present a framework for easily annotating, archiving, retrieving, and searching measurement data from a large-scale data archival system. Our tool extends and simplifies interaction with the database and is implemented in popular scientific applications used for data analysis, namely MATLAB and Python. This allows scientists to execute complex interactions with the database for data curation and retrieval tasks in a few simple lines of accessible, templated code. Scientists can now ensure their measurement data are well curated and FAIR (findable, accessible, interoperable, and reusable) compliant without requiring specific data skills or knowledge. Our tools allow users to perform SQL-type (Structured Query Language) queries on the data from simple templated scripts, allowing data retrieval from long-term storage systems.

1. INTRODUCTION

Technological and scientific advances over the last 20+ years have led to the ability to generate and store vast amounts of data. Furthermore, the emphasis on reducing acquisition times has significantly increased the throughput of data from experiments. In parallel with developments in measurement technologies, there have been significant advancements in data storage, allowing these data to be efficiently captured and stored. However, the lack of systems or standards to organise or curate data leads to ad-hoc file structures, inconsistent conventions for recording metadata, and a loss of data provenance. Consequently, the vast amounts of data being recorded are not findable, accessible, interoperable, and reusable (FAIR) [1]. Without a well-curated database [2] that includes rich metadata, data will not be findable. For research institutions that generate or collect large volumes of data, this is highly problematic as it significantly restricts the data's reusability and therefore its value. This is compounded by the potentially high financial costs associated with repeated acquisitions and duplicate storage if the data are not discoverable. The inability to retrieve data may have significant repercussions for the reproducibility of results, traceability, or adherence to funder requirements.

The concept of measurement traceability, where any instrument's measurement can be linked to a known standard through an unbroken chain of comparisons, is well established. Data traceability extends this concept to data and analysis pipelines, where a given output (processed data, figures, statistical tests, etc.) is linked back to the data at the point of measurement through an unbroken chain of steps in a data workflow. These steps include data conversion, data processing (for example, noise reduction), and data analysis steps (e.g., statistical tests, machine learning) [2]. Throughout the rest of this paper, we use the term traceability to refer to data traceability.
Previously, at the National Physical Laboratory (NPL), we developed methods for curating data at the point of measurement that can be used to establish a FAIR and traceable database [2], [3]. A curated database with relevant, well-structured metadata tags permits searches using Structured Query Language (SQL) or similar, where investigators can return a list of datasets within the database that match given criteria. The metadata itself can be analysed to reveal insights into the data. For example, in radiology, analysis of the sensitivity and radiation exposure over time, at different sites, uncovered inter-site and temporal differences [4]. However, establishing such a system requires a high level of computational skill and often requires bespoke software, creating significant barriers for measurement scientists. We use our internal database for long-term curated storage of measurement data along with the experimental conditions that form the basis of the metadata and are vital for traceability and reproducibility.

In this work, we introduce a tool that combines and extends the functionality of the application programming interfaces (APIs) provided with NPL's archive for file transfer, annotation, and metadata queries into a single, convenient interface accessible from the popular scientific applications MATLAB and Python. This tool not only simplifies interactions with the archive, it also makes the entire data management process more accessible for scientists. The data archive we use in this work is an 'Objectstore', a database storing data as objects that have their own attributes comprising system metadata (size, creation date, etc.) and custom metadata (user-definable fields). In this work we exploit the archive's use of data objects to store any number of multimodal data files, in any format, as a single Hierarchical Data Format (HDF5) file that we use as a 'container' for data files. A single HDF5 container file, consisting of any number of data files, corresponds to one data object uploaded to the data archive. We refer to the data objects as HDF5 container files throughout the manuscript. We can further exploit the Objectstore functionality by using the custom metadata to define domain-specific attributes that 'tag' the HDF5 file in the Objectstore, enabling highly specialised and domain-specific searching of the data. For large organisations this provides a flexible approach to automatically curating large and diverse databases without domain-specific infrastructure.
For users, this enables multiple data files to be wrapped in a single container that is stored in a data archive for long-term storage alongside relevant metadata for searching and retrieval. Our tool provides a user-friendly interface for users to perform all necessary steps (generating the container, tagging with metadata, uploading to the database, and performing searches and retrieval of the data) without any expertise in these areas. We provide a case study using large and complex multimodal cohort data from experiments conducted at multiple institutions and across multiple instruments.

2. BACKGROUND

Data curation is the organised storing of data in a structured way with rich, machine-actionable information about the data and its provenance. Analogous to finding a book in a library, data curation enables users to locate specific datasets based on a defined list of attributes, for example, a dataset consisting of an image of a cat in winter captured with a mobile phone camera in the countryside. Although many databases enable searching on the criteria 'images', 'cat', 'camera phone', 'winter', and 'countryside', there is no strict matching of the attributes that these keywords map to. In a curated database we can perform such searches as type = 'images', subject = 'cat', device = 'camera phone', season = 'winter', and location = 'countryside'. This search provides exact matches to our criteria rather than matches of the keywords to any attribute, as in the former case.

The demand for long-term data curation arises from researchers themselves, as it enables them to utilise the data in future studies; from funding bodies, through data retention requirements; and from the community, which promotes open science with FAIR data. Well-curated data has the additional benefit of enabling meta-analysis across the database, which can provide more precise estimates than individual studies, as well as an assessment of variability [5] and the development of computational models [6]. For example, meta-analysis has identified a higher proportion of positive COVID-19 tests in low and low-middle income countries compared to higher income countries [7], evaluated treatment effects [8], and assessed the impact of missing data on outcomes [9].

There is currently no data curation tool or platform suited to managing experimental data, owing to its inherent complexities. This is particularly problematic for research that is typically subject to funding bodies' data storage and retention policies. Due to the high cost (financial, expertise, and time) involved in some experimental studies, researchers want to maximise the future utility of the data in other studies. For example, healthcare or pharmaceutical studies involving tissue imaging have very complex data collection pipelines, with different centres responsible for collecting the samples (e.g. biopsies), sample preparation (e.g. embedding and sectioning), and the measurement data (e.g. imaging). In this case, one experiment can involve multiple institutions and the provenance of the sample is highly complex. Data quality controls and future meta-analysis require this information to be captured in a machine-actionable way. Current solutions range from individual-level record keeping to universal data repositories; we argue these are insufficient for curating measurement data.
Record keeping, such as spreadsheets or a database (SQL, Access, etc.), that captures information such as data storage locations does not constitute a data archive; it simply lists the locations and possibly some metadata. That this information is unstandardised, prone to error, not machine-actionable, and not searchable (in a database sense) is even more problematic, as it prevents the data from being FAIR. Universal data repositories such as Zenodo, Figshare, Scientific Data, Dropbox, re3data, etc., offer storage of data and user flexibility with regard to the files and formats stored, as well as providing some scope for metadata. However, capturing the metadata is far from trivial [2], [3], [10] and there may be many terms to include. Although many of these platforms offer a search functionality, it is a basic implementation that prevents highly specialised searches such as structured database queries. Furthermore, there are no specific checks on the entered fields, so information may be missed, incorrectly added, or exist in several forms (e.g., acronyms, capitalisation).

3. METHOD

The developed tool is a set of MATLAB and Python scripts handling interactions with the archive data storage and the encapsulation of experimental data with metadata. This enables "behind the scenes" operations at the command of scientists wishing to archive their data, without requiring the programming skills that would otherwise be necessary. The interface tool is invoked through MATLAB, which serves as a user interface for Python code operating as a two-layer program. The first layer is a MATLAB master class describing a connection "object" with different call-back functions. In the second layer, these functions are mapped to a Python layer handling "representational state transfer" (REST) calls [11] to the Objectstore APIs for file handling and metadata queries. This configuration not only provides access to our tool for MATLAB and Python users, it also provides a simple interface (in MATLAB) for users with little programming expertise. The Python libraries are installed as dependencies of the code following our documented installation procedure, which also covers the straightforward MATLAB installation. The tool is designed to work within an organisation's digital infrastructure and thus can be preloaded onto institutional machines such as laptops and lab machines. File transfers are performed with the AWS S3 protocol [11], with all data stored on our institution's internal data storage infrastructure and access permissions fully controllable by our IT team. Our code is available on a private GitLab repository as a PyPI package, and a public release may be possible in the future.

To be an effective solution, our tool must: store all relevant and related datasets as a single instance (see Section 3.1); link the data to the metadata (see Section 3.2); archive all data in a common location (see Section 3.3); and make the metadata searchable for data retrieval (see Section 3.4). The layout of the tool and its different components is presented in Figure 1.

3.1. HDF5 container

We store the data in a Hierarchical Data Format (HDF5) container file that can hold an arbitrary number of data files and formats. This allows the storage of experimental data with associated datasets, such as calibration data or processing scripts, as a complete and unbroken data pipeline [3]. Containerising the data and the associated steps in a workflow ensures reproducibility through the data pipeline. The data parsed into the container are stored in binary form, as this allows any data format to be supported. They are also compressed at the creation of the container to optimise data replication to the archive system.
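As a rough illustration of this packing step, the following Python sketch builds a compressed HDF5 container from a directory of raw files. The function name, dataset layout, and compression choice are illustrative assumptions for this paper, not the tool's actual implementation.

```python
# Minimal sketch of building an HDF5 container from a directory of raw files.
# Names and layout are illustrative, not the tool's actual API.
from pathlib import Path
import numpy as np
import h5py

def build_container(source_dir: str, container_path: str) -> None:
    """Pack every file under source_dir into one compressed HDF5 container."""
    with h5py.File(container_path, "w") as container:
        for file_path in Path(source_dir).rglob("*"):
            if file_path.is_file():
                # Store the raw bytes of the file, whatever its native format.
                raw = np.frombuffer(file_path.read_bytes(), dtype=np.uint8)
                container.create_dataset(
                    str(file_path.relative_to(source_dir)),  # keep folder structure
                    data=raw,
                    compression="gzip",  # compress to optimise replication to the archive
                )

build_container("experiment_001", "experiment_001.h5")
```

Because each file is stored as an opaque byte stream under its original relative path, the container can later be unpacked to exactly the same folder structure and native formats.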
3.2. Annotation

Annotation refers to the process of attaching information to a data file such that it becomes the metadata for that file, providing terms with which the file can be searched. The metadata for complex experiments is multi-source [10] and takes different formats. We collect and aggregate it all in a single 'well-formed' XML file [12]. XML is a metadata format that is both human- and machine-readable, and is the only format supported by the NPL archive described in Section 3.3. The metadata flow is duplicated: one copy is embedded in the HDF5 container, and another is used to link the HDF5 container with the associated metadata in the archive annotation database. Linking the HDF5 file with its associated metadata ensures the data are FAIR compliant prior to uploading the file to the database. Although this aids reproducibility and interoperability for individual files, linking the data with metadata is not by itself sufficient to provide a curated database that can be easily searched or mined for meta-analysis. By using standardised, well-formed metadata to link to the HDF5 files we can automatically establish a curated database. That is, all HDF5 containers have the same metadata structure and are therefore well organised and can be viewed and searched in a systematic way. Further details are given in Section 3.4.
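To make the annotation step concrete, the sketch below aggregates a flat set of metadata fields into a well-formed XML record and embeds one copy in the HDF5 container. The element names and the attribute key used here are hypothetical and chosen only for illustration.

```python
# Minimal sketch of aggregating metadata into a well-formed XML record and
# embedding a copy in the HDF5 container. Element names are illustrative only.
import xml.etree.ElementTree as ET
import h5py

def build_metadata_xml(fields: dict) -> str:
    """Serialise a flat dictionary of metadata key/value pairs as XML."""
    root = ET.Element("metadata")
    for key, value in fields.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode", xml_declaration=True)

fields = {"Technique": "DESI", "Study": "SLC7A5", "Polarity": "Negative"}
xml_record = build_metadata_xml(fields)

# One copy is embedded in the container; the duplicate copy would be sent to
# the archive's annotation database when the container is uploaded.
with h5py.File("experiment_001.h5", "a") as container:
    container.attrs["metadata_xml"] = xml_record
```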
3.3. Data Archive (Objectstore)

The annotated HDF5 container files are stored in our "Objectstore" database. The Objectstore is NPL's large-scale data archival system, an instance of the Hitachi Content Platform (HCP), which in its most basic form is a flexible database for storing and annotating data. File annotation tags the data with associated metadata for curation and database searching. The metadata can be defined by the user, known as 'custom metadata', and can be used for curating complex experimental data [2]. When tagged with metadata, data are stored in an internal database with a dedicated API allowing simple SQL-type queries for searching the data. The Objectstore consists of "Tenants", organisational-level divisions of the system (e.g., departments), and "namespaces", logical groupings of objects (e.g., projects). This archival system supports file versioning, where the history of any changes to the data is recorded, as well as annotation. Both the data and the metadata can be updated for new versions. A database with versioning enabled can enhance traceability when the data are tagged with the associated metadata [3]. The ability to trace software, file, and documentation changes through an unbroken chain (i.e., the version history) can help identify bugs or errors and track the evolution of the system. The ordered nature of versioning enables the user to return to a point in the evolution of the data and create a new branching point. This is particularly useful if errors have been identified or new methods have been developed, such as improved data processing. The tool provides specific functions for connecting to the database, uploading data after archiving as an HDF5 container file, downloading datasets, and performing metadata queries, described in the remainder of this section.

An example of the MATLAB script to establish the database connection is given in Figure 2.

Figure 1. Interface layout and functionality described in Section 3. User functions are represented by orange arrows. The desired datasets are each converted to binary and added to an HDF5 container file that is annotated (or tagged) with relevant metadata. The HDF5 container file is uploaded to the curated database where the metadata can be queried. Any desired data can be easily downloaded and automatically converted back to its native format.

Figure 2. MATLAB script to set up the database connection. Comments are given in lines beginning with % and coloured green. Users only need to specify the namespace and Tenant they wish to connect to, which will be fixed for each project.

Once a connection is established, data can be uploaded simply by specifying the location of the data to be uploaded (local_dir) and the target storage location on the database (database_dir) in a function call. The function first creates an HDF5 container file that is populated with all the data in the directory specified by the user. Note that this directory can contain any number or format of data files and also supports the use of shortcuts/links to other directories. The latter is vital when dealing with very large files that may be stored on multiple disparate devices, such as different laboratory instruments, and avoids the need to transfer or duplicate data prior to using our tool. Next, the tool tags the container with the (well-formed XML) metadata as outlined in Section 3.2, enabling the search functions of the database. Finally, this tagged HDF5 file is uploaded to the database. The script for this is given in Figure 3, which highlights the simplicity of our tool's interface, vital for non-expert users.

Figure 3. MATLAB script to upload local data to the database. Comments are given in lines beginning with % and coloured green. Users specify the location of the data they wish to upload (any number or format of data files) in local_dir, which also supports shortcuts, and the destination folder on the database in database_dir. The structure of folders on the database can resemble folder structures on computers and does not impact the searchability of the database.

Similarly, data can be easily downloaded from the database. In this case, "local_dir" is the folder location to which the HDF5 container is downloaded and then unpacked to the same folder structure and data formats as the originally uploaded data. Directories that were originally shortcuts are unpacked as subdirectories within the data parent directory, i.e., there are no shortcuts when data are downloaded and unpacked. The script for downloading the data is given in Figure 4.

Figure 4. MATLAB script for downloading data from the database to the local computer. Comments are given in lines beginning with % and coloured green. Here database_dir is the data users wish to retrieve from the database and local_dir is the target directory on the local machine, to which the data are downloaded and automatically converted back into the original format they had prior to upload.

3.4. List contents and SQL queries

The curated database consists of HDF5 files tagged with associated metadata that allows the entire database to be searched. The contents of the entire namespace, or of a specific directory, can be listed as shown in Figure 5.

Figure 5. MATLAB script to list the objects contained within the Objectstore namespace. Comments are given in lines beginning with % and coloured green.

We can also search the contents of our database, filtering on any of the tagged metadata attributes using SQL-type queries. The queries return a list of HDF5 files that match the criteria of the query. In addition to attributes of the data from the metadata, the queries can also include the filename or a unique identifier, making all the data findable. Using SQL-type queries, which can be standardised through template queries for metadata attributes in a specific domain (see the Results section), ensures that the data are accessible to all users in line with the GO FAIR principles [13]. Database permissions can be set to restrict the visibility of data for different users as required, though this is outside the scope of this work.
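For readers interested in what the Python layer does beneath these functions, the sketch below shows the kind of S3-style calls used for transferring and listing container files, assuming an S3-compatible archive endpoint. The endpoint URL, namespace (bucket) name, object keys, and credential handling are placeholders, not NPL's actual configuration, and the metadata-query REST calls are intentionally omitted.

```python
# Sketch of plausible S3-protocol calls for transfer and listing against an
# S3-compatible archive endpoint. All names and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.org",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload the tagged HDF5 container to a namespace ("bucket") and path ("key").
s3.upload_file(
    Filename="experiment_001.h5",
    Bucket="project-namespace",
    Key="study_slc7a5/experiment_001.h5",
)

# List the contents of a directory within the namespace.
listing = s3.list_objects_v2(Bucket="project-namespace", Prefix="study_slc7a5/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Retrieve a container for local unpacking.
s3.download_file(
    Bucket="project-namespace",
    Key="study_slc7a5/experiment_001.h5",
    Filename="experiment_001.h5",
)
```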
4. RESULTS

We demonstrate our tool with a case study using a database from the Cancer Research UK Rosetta Grand Challenge (A24034) project led by NPL's NiCE-MSI group [14]. Previously, we established a curated database of complex experimental data [2] that can be used for traceable data processing workflows that are fully reproducible [3]. This database consists of experimental data acquired from several different instruments (vendors and models), across multiple sites, with a large number of experimental parameters and operating procedures that depend on the sample. The project also conducts cohort, longitudinal, and inter-laboratory studies, and aims to conduct meta-analysis on the data once complete. Some instruments write data in a proprietary format that is accessible internally, but this access is lost when archiving the data, as the proprietary format requires the instrument vendor's software to open. We convert these data into an open community standard format called imzML [15], making the data accessible.

Opening the database connection as shown in Figure 2, the database can be searched with simple SQL-type queries of the form conn.metadata_query('key', 'value'). This provides a simple, user-friendly means for experimentalists to search the curated database for any subset of files, making the data findable. We provide standard and simplified code for common queries, allowing non-programmers to utilise this functionality. Some examples of domain-specific queries that users may want to perform are listed below; a sketch showing how such queries can be combined follows Table 1.

• data from a specific measurement technique, for example DESI and MALDI:
conn.metadata_query('Technique','DESI')
conn.metadata_query('Technique','MALDI')
• data from a particular experimental study:
conn.metadata_query('Study','SLC7A5')
• data acquired from samples from a particular collaborator:
conn.metadata_query('SampleSource','AstraZeneca')
• data from a specific vendor instrument model, for example a 'SYNAPT G2-Si' model:
conn.metadata_query('Instrument','SYNAPT G2-Si')
• data from a particular sample (a unique barcode from a separate sample management database; this information is integrated into our curated database prior to upload, see [2]):
conn.metadata_query('BARCODE','1000202')
• datasets with the same experimental parameters, e.g., data of a particular instrument polarity:
conn.metadata_query('Polarity','Negative')
• the size of the acquisition area for each pixel:
conn.metadata_query('PixelSize','100 microns')
• the acquisition time for each pixel:
conn.metadata_query('ScanTime','0.485 sec')
• data acquired over the same measurement range:
conn.metadata_query('massRange','m/z 50-1200')

Note that the last three examples can be executed with or without the units. Example results for the conn.metadata_query('massRange', 'm/z 50-1200') query are given in Table 1.

Table 1. Example output from the query conn.metadata_query('massRange', 'm/z 50-1200'), with some additional fields and categorised results for clarity for the general reader. Several objects come from the same measurement site or are of the same modality, or both, allowing inter-laboratory and multimodal studies and analysis. The file version also gives an indication of the provenance of the files and analysis.

ID        File Version   Measurement Site   Modality
Object A  1st            A                  B
Object B  2nd            A                  A
Object C  1st            C                  A
Object D  4th            B                  C
Object E  2nd            C                  C
Object F  2nd            A                  B
Object G  1st            C                  C
…
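Individual queries of this form can also be combined into templated helpers for cohort selection, for example when preparing a meta-analysis. The Python sketch below assumes, purely for illustration, that conn.metadata_query returns a list of matching object names; the helper itself is not part of the tool's API.

```python
# Sketch of a templated helper combining several metadata queries to select a
# cohort. Assumes conn.metadata_query(key, value) returns a list of matching
# object (HDF5 container) names; the helper is illustrative only.
def query_all(conn, criteria: dict) -> set:
    """Return the objects matching every key/value pair in `criteria`."""
    results = None
    for key, value in criteria.items():
        matches = set(conn.metadata_query(key, value))
        results = matches if results is None else results & matches
    return results or set()

# e.g. all negative-polarity DESI acquisitions from the SLC7A5 study
cohort = query_all(conn, {"Technique": "DESI",
                          "Study": "SLC7A5",
                          "Polarity": "Negative"})
```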
As the data are contained in HDF5 containers with well-structured metadata, all relevant associated datasets, including calibration files, processing scripts, and the data itself, are interoperable and reusable, meaning external entities can more easily access, exchange, and make use of the information contained within the HDF5 file.

5. CONCLUSIONS

We have introduced and demonstrated a tool that allows non-experts to interact with a well-curated database using the popular programming languages MATLAB and Python. By encapsulating multiple complex APIs into a single, user-friendly application, the tool allows measurement scientists to easily ensure that their data are saved in a well-curated, FAIR-compliant, and traceable database without the need for specialised computational skills. The ability to collate all relevant data, tag the collated data with relevant machine-actionable metadata, and upload it to a database with a single function call significantly reduces the computational barrier. A single line of code is used to perform SQL-type searches on the database, ensuring all data are findable and accessible. An integrated function for downloading and extracting the data in its original format allows the HDF5 container to be utilised without requiring the user to interact with it directly. One benefit of the container file is the storage of data with associated metadata, processing, and analysis code, providing interoperable and reusable data that is also traceable. We demonstrate this through a case study of data collected from a large-scale, multi-site imaging project with large volumes of highly complex measurement data. Template scripts further reduce this barrier and enable the capture of metadata at the point of measurement as well as at any stage throughout the data processing pipeline. This enables experimentalists to easily retrieve data and maximise the usefulness of a FAIR and curated database without requiring any knowledge of these principles.

ACKNOWLEDGEMENT

This work was funded by the Department for Business, Energy & Industrial Strategy through the UK's National Measurement System.

REFERENCES

[1] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton (+48 more authors), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 6(1) 2019, 9 pp. DOI: 10.1038/sdata.2016.18
[2] S. A. Thomas, F. Brochu, A framework for traceable storage and curation of measurement data, Measurement: Sensors, 18(100201) 2021, 5 pp. DOI: 10.1016/j.measen.2021.100201
[3] S. A. Thomas, F. Brochu, Curation at the point of measurement and traceability of measurement workflows, Measurement: Sensors, 23(100399) 2022, 7 pp. DOI: 10.1016/j.measen.2022.100399
[4] M. Santos, P. Sá-Couto, A. Silva, N. Rocha, DICOM metadata-mining in PACS for computed radiography X-ray exposure analysis: a mammography multisite study, European Congress of Radiology-ECR 2014, Vienna, Austria, 6-10 March 2014, 7 pp. DOI: 10.1594/ecr2014/B-0276
[5] A. B. Haidich, Meta-analysis in medical research, Hippokratia, 14(Suppl 1), 2010, pp. 29–37. Online [Accessed 17 March 2023] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049418/
[6] N. Mikolajewicz, S. V. Komarova, Meta-Analytic Methodology for Basic Research: A Practical Guide, Front. Physiol., Sec. Computational Physiology and Medicine, 2019, 20 pp. DOI: 10.3389/fphys.2019.00203
[7] I. Bergeri, M. G. Whelan, H. Ware, L. Subissi, A. Nardone (+25 more authors), Global SARS-CoV-2 seroprevalence from January 2020 to April 2022: A systematic review and meta-analysis of standardized population-based studies, PLOS Medicine, 19(11): e1004107, 2022, 24 pp. DOI: 10.1371/journal.pmed.1004107
[8] C. B. Joy, C. E. Adams, S. Lawrie, Haloperidol versus placebo for schizophrenia, Cochrane Database of Systematic Reviews, John Wiley & Sons, Ltd, 2001. DOI: 10.1002/14651858.CD003082
[9] J. P. Higgins, I. R. White, A. M. Wood, Imputation methods for missing outcome data in meta-analysis of clinical trials, Clinical Trials, 5(3) 2008, pp. 225-239. DOI: 10.1177/1740774508091600
[10] N. Smith, D. Sinden, S. A. Thomas, M. Romanchikova, J. E. Talbott, M. Adeogun, Building confidence in digital health through metrology, The British Journal of Radiology, 93(1109) 2020, 3 pp. DOI: 10.1259/bjr.20190574
[11] AWS, AWS S3 REST API protocol. Online [Accessed 17 March 2023] https://docs.aws.amazon.com/AmazonS3/latest/API/s3-api.pdf#Welcome
[12] w3resource.com, Well-formed XML. Online [Accessed 17 March 2023] https://www.w3resource.com/xml/well-formed.php
[13] GO FAIR Int. Support and Coordination Office (GFISCO), GO FAIR Initiative. Online [Accessed 17 March 2023] https://www.go-fair.org/fair-principles/
[14] Cancer Research UK, Rosetta Project. Online [Accessed 17 March 2023] https://cancergrandchallenges.org/teams/rosetta
[15] A. Römpp, Th. Schramm, A. Hester, I. Klinkert, J.-P. Both, R. M. A. Heeren, M. Stoeckli, B. Spengler, imzML: Imaging mass spectrometry markup language: A common data format for mass spectrometry imaging, Methods Mol Biol., 696, 2011, pp. 205–224. DOI: 10.1007/978-1-60761-987-1_12