Database for tropospheric product evaluations - implementation aspects

Jan Douša, Gabriel Győri
Research Institute of Geodesy, Topography and Cartography, Geodetic Observatory Pecný
Ústecká 98, Zdiby 250 66
jan.dousa@pecny.cz

Abstract

The high-performance GOP Tropo database for evaluating tropospheric products has been developed at the Geodetic Observatory Pecný. The paper describes the initial database structure and its intended functionality. Special focus is given to the optimization effort required to handle billions of records. Evaluation examples demonstrate the current functionality, and future extensions and developments are outlined too.

Keywords: troposphere, zenith delays, database, GNSS, radiosonde, meteorological data

1. Introduction

The potential of GNSS observations for troposphere monitoring was described in Bevis et al. (1992). Since that time, various projects have aimed at developing GNSS meteorology in Europe. The first benchmark of near real-time ground-based GNSS tropospheric products, Zenith Total Delays (ZTD), was provided within the COST Action 716 - Exploitation of Ground-based GNSS for Meteorology and Climatology (1999-2004). The extended routine analyses were supported by the EU FP5 project Targeting Optimal Use of GNSS for Humidity (TOUGH, 2003-2006). More recently, E-GVAP I-III (2006-2016) aimed at establishing operational ZTD estimation and its assimilation in numerical weather models (NWM), and at developing active quality control for GNSS products. An additional effort on the enhanced capability of GNSS troposphere monitoring and on the exploitation of NWM data for GNSS positioning has recently been prepared and approved within the COST Action ES1206 - GNSS for Severe Weather Events and Climate (GNSS4SWEC, 2013-2017).

The Geodetic Observatory Pecný (GOP) analysis centre has contributed to the above projects since 2000 and provided one of the first operational GPS tropospheric products, a near real-time regional GPS solution officially available since 2001 (Douša, 2001). Additional tropospheric products in support of meteorology have been developed at GOP during recent years: a near real-time regional multi-GNSS product, 2011-present (Douša, 2012a), the first near real-time global product, 2010-present (Douša, 2012b), and a real-time ZTD product, 2012-present (Douša et al., 2013). A GOP routine post-processing tropospheric solution has also been available since 1996 for a part of the EUREF Permanent Network (EPN). The complete European network was recently reprocessed at GOP for the entire period 1996-2012. Being the most accurate and homogeneous solution over the whole interval, the reprocessing could be used in regional studies for climatology.

The regular and long-term evaluation of all these products is an important task, both for getting relevant feedback about the accuracy and for studying the potential for improvements. Initial comparisons of GOP tropospheric products were done occasionally (Douša, 2003) using PERL scripts for data stored in an internal text format. With an increasing data period, such a design was recognized as inconvenient and it was replaced by a simple MySQL database used during 2002-2010 for routine evaluations of GOP GNSS-based zenith total delays.
Recently, a more flexible database structure was requested for fully automating tropospheric data comparisons, including the searching of nearby points, filtering, converting, interpolating, generating various statistics and extracting them for web-based plots. The new database (labelled 'GOP Tropo database') was required to provide a high-performance system able to deal with billions of data records. As a free alternative to enterprise solutions, the PostgreSQL server (POSTGRESQL) was selected for this task and the database structure was completely revised in order to support its flexible utilization.

The GOP Tropo database structure and functionality design is described in Section 2, the optimization aspects in Section 3 and initial comparison examples in Section 4. The conclusion and outlook are given in the last section.

2. Database design

This section provides a rough description of the data structure with a focus on specific details. The data organization is the most important aspect of any database since it predefines any future utilization (and its flexibility for extensions). The main GOP Tropo database features and functionalities, as well as the requirements for future developments, were initially defined to:

• accommodate and compare different tropospheric data types in a single database,
• provide geo-referencing and collocated point searching for data comparisons and interpolations,
• apply vertical corrections (potentially supported with geoid and orography models),
• generate comparison differences and statistics in a yearly, monthly, weekly and hourly mode,
• support data type conversions (e.g. zenith wet delay to integrated water vapour and others in future),
• interpolate values from grid points and calculate grid points from available data (in future),
• study trends, temporal/spatial variations and correlations (in future).

Target and potential utilizations are foreseen as follows: a) to compare (near) real-time GNSS tropospheric products with respect to the final ZTD products, b) to assess GNSS results with respect to radiosondes, radiometers, numerical weather models and other independent datasets, c) to compare troposphere estimates from different space geodetic techniques (GNSS, VLBI, DORIS), d) to evaluate different strategies of interpolation or kriging of meteorological data or tropospheric delays, e) to evaluate in-situ meteorological observations provided in M-RINEX, and f) to evaluate global pressure, temperature or specific tropospheric models in future.

2.1. Database structure

In any relational database the user data are placed in database tables, the structure of which is designed optimally for the purpose of the utilization. The data are organized as records (represented by table rows) while each record has specific values (represented by table columns). The data should not be duplicated and, when any duplication occurs, a new database table with the relevant records is created to unify any dualities. Such a unique record is then related to the original table record by using a relation provided with a reference key (which is why such a database is called relational).

Figure 1: Basic structure of GOP Tropo database including original data representation.
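To illustrate this normalization, the geo-referencing core of Fig. 1 could be declared roughly as follows; all column names and types here are illustrative assumptions, not the actual GOP Tropo schema:

   -- Minimal sketch of the relational core (illustrative names/types):
   -- sources and sites are stored once and referenced from tPoint.
   CREATE TABLE tSource (
     ID          SERIAL PRIMARY KEY,          -- surrogate key
     Name        VARCHAR(32) UNIQUE NOT NULL  -- e.g. 'EUREF Repro1'
   );
   CREATE TABLE tSite (
     ID          SERIAL PRIMARY KEY,
     SiteID      VARCHAR(16) NOT NULL,        -- e.g. 'GOPE'
     Description VARCHAR(128)
   );
   CREATE TABLE tPoint (
     ID          SERIAL PRIMARY KEY,          -- referenced by data tables
     FK_Source   INTEGER NOT NULL REFERENCES tSource(ID) ON DELETE CASCADE,
     FK_Site     INTEGER NOT NULL REFERENCES tSite(ID)   ON DELETE CASCADE,
     Lat         DOUBLE PRECISION NOT NULL,   -- [deg]
     Lon         DOUBLE PRECISION NOT NULL,   -- [deg]
     Height      DOUBLE PRECISION NOT NULL,   -- [m]
     GeoidUndu   DOUBLE PRECISION,            -- optional geoid undulation [m]
     OroHeight   DOUBLE PRECISION             -- optional orography height [m]
   );

Every data record then stores only a reference key to tPoint instead of repeating the site and source information.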
The GOP Tropo database is designed to accommodate different tropospheric and meteorological data types such as GNSS, VLBI, DORIS, radiosondes, radiometers, synoptic data, in-situ meteorological data, data extracted from NWMs and other supporting data/models, e.g. geoid and orography. Given the variability of tropospheric products, meteorological data and other supporting data, it would be neither easy nor efficient to accommodate them within a single data table, because each data source has its own specific stored values. Fortunately, data from each source can be processed independently and it was found useful to define a specific data table for each data type. Different data sources are also considered for each data type, e.g. GNSS ZTDs from EUREF, from E-GVAP analysis centres, from the International GNSS Service and from many others. Additionally, various products are available even for a single analysis centre; for GOP these are final, reprocessed, near real-time (global, GPS, GPS+GLONASS), real-time and others. All the sources within a single data type are accommodated within a single table providing a unique source identification for the data.

The common structure of the data organization within the database is shown in Fig. 1. All data are georeferenced via the reference key to the tPoint table ('t' is always used to identify a table name), which provides additional information about the data source (tSource), the site identification (tSite) and the point location. Optionally, the geoid undulation and orography height are provided too. Each record in the tPoint table is uniquely defined by its name, source and position (latitude, longitude, height) along with the position accuracy used in the identification of a unique point.

Specific data tables are currently predefined for the data types (tGNSS, tVLBI, tDORIS, tRAOBS, tWVR, tINSIT, tUSER, tGEOID, tSURF and others), where the name helps to identify the data content. Other specific data tables could be added later, e.g. for mapping function coefficients, slant tropospheric delays, synoptic data, NWP grid data (or, more likely, their reduction to specific parameters at a reference surface) and others.
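A data-type table itself can then stay very small, as in the following sketch with assumed columns; its two-column primary key is explained in the next subsection:

   -- Sketch of one data-type table (tGNSS); other types (tRAOBS, tWVR, ...)
   -- would follow the same pattern with their own value columns.
   CREATE TABLE tGNSS (
     Epoch    TIMESTAMP WITHOUT TIME ZONE NOT NULL,
     FK_Point INTEGER NOT NULL REFERENCES tPoint(ID) ON DELETE CASCADE,
     ZTD      DOUBLE PRECISION NOT NULL,  -- zenith total delay (unit assumed)
     Sigma    DOUBLE PRECISION,           -- formal error, if available
     PRIMARY KEY (Epoch, FK_Point)        -- natural key over two columns
   );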
2.2. Database feeding, record uniqueness

The database filling is done in three steps: 1) data download, 2) decoding and converting the original data, including the preparation of an SQL command calling a specific database insert procedure, and 3) executing the SQL command within the GOP Tropo database. Input data are collected from various sources via standard ftp or http downloads to a local disk, all in original and usually compressed formats. This process is controlled via the unix cron job scheduler. The data decoding and filtering is done by in-house developed input filters written in the PERL scripting language. Their main tasks consist of extracting and converting data from text files and preparing (and executing) SQL commands calling the GOP Tropo database insert procedure. The input filters are designed for various input formats (e.g. for tGNSS these are the Tropo-SINEX format, the Bernese 'TRP' output and the COST-716 format) and a specific database insert procedure is called for each data type (e.g. tGNSS uses the fInsertGNSS procedure). The list of supported formats and input filters is given in Tab. 1.

Table 1: List of existing input filters (PERL) for data decoding and inserting into the database

Input filter     Input format    Procedure     Remarks
tro-snx2DB.pl    Tropo-SINEX     fInsertGNSS   ZTDs, IGS/EUREF products
bsw-trp2DB.pl    Bernese TRP     fInsertGNSS   ZTDs, GOP products
cost-trp2DB.pl   COST-716        fInsertGNSS   ZTDs, E-GVAP
rt-flt2DB.pl     Tefnut output   fInsertGNSS   ZTDs, GOP real-time analysis
raobs2DB.pl      BADC profiles   fInsertRAOBS  Integrated data, radiosondes
wvr2DB.pl        Radiometrics    fInsertWVR    Integrated data, radiometers
met-rnx2DB.pl    Meteo RINEX     fInsertINSIT  In-situ meteo data (GNSS)
cost-met2DB.pl   COST Meteo      fInsertSYNOP  COST-716 meteorological data

The radiosonde profile is an example of a data type whose content is not fully included in the database; only selected parameters derived at the surface are stored (e.g. pressure, temperature, water vapour pressure, zenith hydrostatic delay, zenith wet delay, relevant lapse rates for possible vertical conversions etc.). The data decoding and database filling is also started regularly from the cron scheduler.

The internal database stored procedure for inserting is performed either via the INSERT or the UPDATE SQL command. The former supports direct inclusion, but works for new records only. The latter consists of an initial check whether the record already exists in the database and, if true, it is replaced with the current data. In this context it is important to understand how records are identified as unique within the data tables. Database systems use primary keys to specify the column (or columns) that uniquely identify each record, and these can be either natural or surrogate. A natural key is one composed of columns directly related to the data, while a surrogate key is a specific column added to the data table only to serve as a primary key (e.g. a unique identifier such as an auto-incremented numerical value). In many cases we use surrogate keys, but not for the main data tables, where the primary key is designed as a unique index over two columns: Epoch and tPoint. The epoch value is of the type 'TIMESTAMP WITHOUT TIME ZONE', commonly handling date and time. The point column refers to the record in the tPoint table, where the uniqueness is provided via an auto-incremented numerical value. The record in the tPoint table is checked and updated automatically any time a new entry is filled into a main data table. This proceeds as follows: the requested point is searched within the tPoint table and, if found, the reference key is returned; if not, a new point is created. The uniqueness of the point records is thus managed within the point searching/inserting internal procedure (fInsertPoint), taking into account the unique point characteristics: SiteID, Source, PointType and location (latitude, longitude, height) within a predefined point accuracy (horizontal and vertical).
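The search-or-insert logic of fInsertPoint can be sketched in PL/pgSQL as follows; the signature, column names and tolerance values are illustrative assumptions rather than the actual GOP implementation:

   -- Sketch of the point search/insert logic (hypothetical signature);
   -- the tolerances stand in for the predefined point accuracy.
   CREATE OR REPLACE FUNCTION fInsertPoint(
     p_src INTEGER, p_sit INTEGER,
     p_lat DOUBLE PRECISION, p_lon DOUBLE PRECISION, p_hgt DOUBLE PRECISION)
   RETURNS INTEGER AS $$
   DECLARE
     pid INTEGER;
   BEGIN
     -- return the key of an existing point within the tolerance ...
     SELECT ID INTO pid FROM tPoint
      WHERE FK_Source = p_src AND FK_Site = p_sit
        AND abs(Lat - p_lat) < 1e-4 AND abs(Lon - p_lon) < 1e-4
        AND abs(Height - p_hgt) < 0.1
      LIMIT 1;
     IF FOUND THEN
       RETURN pid;
     END IF;
     -- ... or create a new point and return its auto-incremented key
     INSERT INTO tPoint (FK_Source, FK_Site, Lat, Lon, Height)
     VALUES (p_src, p_sit, p_lat, p_lon, p_hgt)
     RETURNING ID INTO pid;
     RETURN pid;
   END;
   $$ LANGUAGE plpgsql;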
Figure 2: Scheme of the comparison within GOP Tropo database

2.3. Data and product comparison

The main functionality of the current GOP Tropo database implementation is the comparison of various tropospheric data and products. It consists of the following sequence of steps (Fig. 2):

• comparison configuration (manual),
• search for collocated points (with respect to the setting criteria),
• generation of data differences for the identified pairs,
• statistics over the data differences,
• extraction and visualization (provided outside the database).

The user configures a data or product comparison by defining two data sources from one or two data tables. The setting consists of the criteria for searching pairs of collocated points: the limits for the horizontal and vertical distance of two points. If applicable, a maximum σ is set for the initial data filtering and the limit of the confidence interval is provided for the statistical procedures to detect and reject outliers. Optionally, a mask or an explicit site list can be provided if only selected stations should be compared (implicitly all stations). Individual comparison settings are stored as records in a special database table (tPairConf) where a surrogate key is set for further referencing.

The candidates for collocated points are searched within the tPoint table for the two sources specified in the setting. The station pairs are generated by the fGeneratePair procedure and the pair list is stored in the tPair table, used afterwards for the generation of data differences (fGeneratePairDiff) within the period of request. Compared products usually have different sampling rates and the values to be compared may generally refer to different epochs. For this case, the database supports a sampling rate argument defining the interval within which all single-product values are extracted and a mean value is calculated for the comparison. An optimum differencing step should be set to the larger sampling interval of the two products in order to grab common values and, on the other hand, kept below one hour, since the variation of the troposphere would be significantly smoothed by averaging product values over a longer time. A setting between 10 and 60 minutes is usually reasonable. In future, we consider supporting other functional fittings of the single-product values for differencing instead of simple averaging. Generated data differences are always stored in the tPairDiff table, enabling the accommodation of differences from recently collected data or analyzed products.

The statistical procedure (fGeneratePairStat) applies three iterations to estimate biases, standard deviations and root mean squares (r.m.s.) for each compared pair individually. The first iteration serves for the median estimation as a robust initial mean value. It is used in the second step for calculating differences reduced by the mean value in order to estimate a reliable standard deviation for the outlier detection. In the third step, the final statistics (bias, standard deviation, r.m.s., number of all observations and outliers) are calculated after outliers have been rejected using the confidence interval based on the standard deviation from the second iteration and, optionally, the data-excluding r.m.s. limit from the setting. For individual comparisons, five statistic modes are provided for a period of request (defined by 'beg' and 'end' arguments): all, yearly, monthly, weekly and hourly. The first calculates the statistics over the whole requested period; the last works over the same period, but provides statistics for data filtered according to the hour of day. All other modes calculate statistics individually for the specified windows within the whole period. Because the differences are saved in the tPairDiff table, all statistic modes can be efficiently repeated in a regular update for generating time-series as demonstrated in Fig. 3. Finally, the extraction and visualization of results are described in the section with sample evaluations.

Table 2 summarizes the comparison procedures. Note that each comparison procedure uses a key to the configuration record in tPairConf as its first argument, which has been omitted in the table.

Table 2: List of comparison procedures ('f') and their input/output tables ('t')

Procedure          Arguments             Input         Output     Remarks
fGeneratePair                            tPairConf     tPair      generate pairs
fGeneratePairDiff  'beg','end','sample'  tPair, tData  tPairDiff  generate differences
fGeneratePairStat  'beg','end','type'    tPairDiff     tPairStat  generate statistics
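For illustration only, the three iterations for a single pair might look roughly as follows in SQL; the table and column names are assumed, and percentile_cont (used here for the median) requires a newer PostgreSQL release than the versions discussed in Section 3:

   -- Sketch of the three-iteration statistics for one pair (assumed schema):
   WITH med AS (                      -- 1) robust initial mean (median)
     SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY diff) AS m
       FROM tPairDiff WHERE FK_Pair = 1
   ), sig AS (                        -- 2) std. deviation of reduced values
     SELECT stddev(d.diff - med.m) AS s
       FROM tPairDiff d, med
      WHERE d.FK_Pair = 1
   )
   SELECT avg(d.diff)                 AS bias,    -- 3) final statistics
          stddev(d.diff)              AS sdev,
          sqrt(avg(d.diff * d.diff))  AS rms,
          count(*)                    AS nobs
     FROM tPairDiff d, med, sig
    WHERE d.FK_Pair = 1
      AND abs(d.diff - med.m) < 3 * sig.s;        -- e.g. 3-sigma rejection

In the database itself this logic is encapsulated in fGeneratePairStat, which adds the windowing for the individual statistic modes.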
2.4. Other procedures for data maintenance

Since the data are structured using various database tables, any of the removing, viewing, extracting and statistics operations requires more complex SQL commands, which are implemented as specific stored procedures in the GOP Tropo database. Following the relationships between the records in the various data tables, such procedures combine data into a transparent and user-friendly output. Additionally, for easy data selection, three common arguments for a name or mask input (via a simple regular expression using an asterisk) are supported for the most common maintenance procedures. Table 3 provides an incomplete list of maintenance procedures with respect to three operation categories: view, delete and info. While the view procedures are designed to extract specific data combinations from various tables, the info procedures extract general information about table contents, e.g. start/end of data, number of records etc. All these procedures are, however, only minimal implementations to simplify common queries, while any other specific query can still be requested using a standard SQL command. The delete procedures, on the contrary, can be very tricky, in particular due to the possibility of losing consistency within tables and their relations. Any removal is implicitly driven by the 'DELETE CASCADE' attribute used for most tables. The attribute specifies that when a referenced row is deleted, the row(s) referencing it are automatically deleted as well. However, the cleaning of any specific point- or pair-related data or differences should be done with the relevant delete procedure. A specific stored procedure also exists for cleaning sites in the tSite table which are no longer referenced.

Table 3: List of other procedures for database content maintenance

Procedure     Arguments                      Category  Remarks
vPoint        'source','site','description'  View      List filtered points
vSource       'source'                       View      List filtered sources
vSite         'source','site','description'  View      List filtered sites
vData         'source','site','description'  View      List filtered data
vDiff         'source','site','description'  View      List filtered differences
fDataInfo                                    Info      Information on data tables
fPairInfo                                    Info      Information on pair tables
fStatInfo                                    Info      Information on statistics tables
fDeletePoint  'source','site','description'  Delete    Remove selected data and site
fDeleteSite                                  Delete    Remove unreferenced sites
fDeletePair   'source','site','description'  Delete    Remove pairs and differences
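A few illustrative calls follow; they assume the view procedures are implemented as set-returning functions, and the argument values are made up:

   SELECT * FROM vPoint('EUREF', 'GOPE', '*');      -- list matching points
   SELECT * FROM fDataInfo();                       -- summarize data tables
   SELECT fDeletePoint('IGS Repro1', 'ALBH', '*');  -- point + data (cascades)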
3. Database optimization

The database runs on a dedicated server with the GNU/Linux operating system Debian 6.0.7 and currently has 12 GB of memory and an 8-thread 64-bit Intel(R) Xeon(R) CPU reserved. This hardware configuration is sufficient for current operations. However, more than 750 million records are stored in each of the largest tables, tGNSS and tRAOBS, and there are more than one billion records in total in the database, utilizing almost 100 GB of hard drive space. This amount of data can cause a lack of performance and rapidly decrease the execution speed of some SQL queries. The highest performance can be reached by an optimal configuration of several parameters stored in the postgresql.conf settings file, because they are set to extremely low values by default and, in most cases, they are not related to the available hardware configuration. For that reason, we discuss the recommended and applied settings in the next paragraphs.

The first important parameter, work_mem, defines the maximum amount of memory which can be used for one sorting operation. The memory usage increases with each additional sorting. Each client connected to the same database server typically uses a maximum of two sorting operations at one time. This implies that the value of work_mem can be set to the amount of unused memory divided by the maximum number of connections and then by two; with roughly 10 GB of memory left unused and max_connections = 10, for example, this gives 10 GB / (10 × 2) ≈ 500 MB, matching the 512 MB applied in Tab. 4. The number of connections can be reduced via the max_connections setting. When sorting, PostgreSQL first estimates the amount of memory it may use. If the work_mem value is not high enough, the system will use swap operations along with free space on the hard drive, decreasing the performance. The parameter shared_buffers then determines the maximum amount of memory which can be used for caching data in memory after they are read from the hard drive. The higher the value, the bigger the set of data that can be kept in memory, which reduces the number of swap operations. The optimal value is quite difficult to set, but 30% of the available memory is recommended for a dedicated server.

The VACUUM operation is one of the most important commands in the PostgreSQL database server. It is a kind of garbage collector which releases the space allocated by invalid records. The VACUUM operation is closely related to the ANALYZE operation, which is used to collect statistics for optimizing the server query plan. The automatic maintenance of the database is provided by the AUTOVACUUM facility, which is turned on by default. The maintenance_work_mem parameter in the settings then defines the amount of memory which can be used for AUTOVACUUM. The last important parameter is default_statistics_target, whose value determines the quantity of data used for statistics. The higher the value, the higher the CPU utilization generated by the server. On the other hand, a higher value of default_statistics_target likely yields more precise statistics and may thus increase the speed of subsequent SQL queries. Table 4 summarizes the important settings of the PostgreSQL configuration file.

Table 4: List of selected important settings from the postgresql.conf configuration file

Name of variable           Value
shared_buffers             3000 MB
work_mem                   512 MB
max_connections            10
maintenance_work_mem       256 MB
default_statistics_target  300
effective_cache_size       5000 MB
constraint_exclusion       partition
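The effective values can be cross-checked from any database session using standard PostgreSQL commands, e.g.:

   SHOW work_mem;            -- display the currently effective value
   SHOW shared_buffers;
   SELECT pg_reload_conf();  -- re-read postgresql.conf without a restart
                             -- (shared_buffers still requires a restart)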
As already mentioned above, the biggest tables in the GOP Tropo database contain over half a billion rows each, and this can easily grow when new data are introduced (e.g. from the E-GVAP project). The SQL queries usually work with data in a specific time range, for example when generating monthly statistics between different products. PostgreSQL supports basic table partitioning by splitting a single large table into several smaller pieces. This is done by applying an inheritance scheme, which defines a parent table similar to the original table, while the data are stored in a sequence of child tables (partitions). Such partitioning can rapidly improve the system performance, in particular when a query works with data in a single partition. On the other hand, additional overheads apply to all 'INSERT' commands calling a special trigger function when the table is partitioned. The inheritance implementation guarantees that the partitioning has no direct influence on scripts or applications used within the database, since the data stored in the child tables inherit their behaviour from the empty parent table. In the GOP Tropo database, range partitioning was applied in a yearly scheme to all large tables; each tGNSS child table thus contains about 20 million records on average. New partitions are created by specific PL/pgSQL trigger functions called when data are inserted into the table. Table 5 summarizes the execution time of computing an average value from data stored in a single partition restricted by a WHERE clause, using the following (repeated) command. It is obvious that such a query is much faster on the partitioned table than on the table without partitioning.

EXPLAIN ANALYZE SELECT AVG(ztd) FROM tGNSS AS t1
 WHERE t1.epoch >= '2007-01-01 00:00:00'
   AND t1.epoch <  '2008-01-01 00:01:00'
   AND t1.FK_Point = XX;

Table 5: The statistics from the partitioning test via the analyze function for a specific (repeated) command

Sequence  Station  Source        Execution time [ms]  Execution time [ms]
number                           (partitioned table)  (single table)
First     GOPE     EUREF Repro1  1363                 1300
Second    GOPE     EUREF Repro1  13                   48
Third     GOPE     EUREF Repro1  13                   48
First     ALBH     IGS Repro1    47                   125576
Second    ALBH     IGS Repro1    0.1                  4422
Third     ALBH     IGS Repro1    0.1                  4422
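The partitioning follows the standard PostgreSQL inheritance recipe. The sketch below shows a single, fixed yearly child table for illustration, whereas the production trigger functions additionally create missing partitions on the fly:

   -- Yearly range partition of tGNSS (PostgreSQL 8.4-style inheritance;
   -- names are illustrative).
   CREATE TABLE tGNSS_2007 (
     CHECK (Epoch >= '2007-01-01' AND Epoch < '2008-01-01')
   ) INHERITS (tGNSS);

   CREATE OR REPLACE FUNCTION tGNSS_insert_trigger() RETURNS trigger AS $$
   BEGIN
     IF NEW.Epoch >= '2007-01-01' AND NEW.Epoch < '2008-01-01' THEN
       INSERT INTO tGNSS_2007 VALUES (NEW.*);  -- route the row to its child
     ELSE
       RAISE EXCEPTION 'tGNSS: no partition for epoch %', NEW.Epoch;
     END IF;
     RETURN NULL;  -- the row is stored in the child table only
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER tGNSS_insert BEFORE INSERT ON tGNSS
     FOR EACH ROW EXECUTE PROCEDURE tGNSS_insert_trigger();

With constraint_exclusion set to 'partition' (Tab. 4), the planner can then skip all child tables whose CHECK constraints exclude the queried time range.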
This was not expected, but in general we decided to Geoinformatics FCE CTU 10, 2013 47 Douša J., Győri, G.: Database for tropospheric product evaluation keep partitioning since only very small degradations and overhead costs were found. Further benefit can be expected for new PostgreSQL releases as well as in with growth of the data in database. The latter was the primary argument for preserving partitioned tables for large datasets. Table 6: Variant settings of the database optimization and their performance Cluster File Partition PostgreSQL Compared Insert Total Diff system data/diff vers/tuning samples [s] [s] [s] A ext3 1/1 8.4 / no 60 min 4712 1130 835 B ext4 1/1 8.4 / no 60 min 5050 1361 864 C ext4 1/1 8.4 / yes 60 min 1348 944 660 D ext4 Yearly/1 8.4 / yes 60 min 1663 1324 1033 E ext4 Yearly / Yearly 8.4 / yes 60 min 1653 1377 1079 F ext4 Yearly / Yearly 8.4 / yes 10 min 1637 7855 7563 G ext4 Yearly / Yearly 9.2 / no 60 min 5759 1568 1228 H ext4 Yearly / Yearly 9.2 / yes 60 min 1739 1554 1207 4. Evaluation examples In order to demonstrate the initial database functionality, we provide several samples from routine evaluations with a focus on GNSS tropospheric products comparisons. The following products were used in the demonstration figures: • IGS operational and re-processed (Repro1) tropospheric products (Byram, 2012), • EUREF combined tropospheric results from the operational and re-processed (Repro1) solutions (Soehne and Weber, 2005), • GOP global near real-time tropospheric product (Douša 2012b), • GOP near real-time GPS and multi-GNSS(GPS+GLONASS) tropospheric products (Douša, 2012a). We do not intend to go into details in the examples and we do not thus discuss details about product differences, e.g. applied software, processing strategy, models, constraints, precise products and many others affecting the ZTD solution. It is out of the scope of this paper to study effects of various changes clearly visible in statistics. Sample comparisons demostrate the calculated biases and standard deviations, either for the entire period or within period split into regular discrete intervals (e.g. years, months, weeks). The statistic results are provided with the database procedures for predefined configurations and the results are extracted from the database and visualized with Gnuplot (Williams and Kelley, 2011) or GMT (Wessel and Smith, 1998) plotting tools. Various figures are generated – total bias and standard deviations for the whole period and all common stations, the geographical distribution of the values or time-series of mean statistics over all stations from the comparison. The samples are given in Figs. 3-6. Figure 3 shows the assessment of the GOP near real-time multi-GNSS ZTD solution with re- spect to the post-processed EUREF combined ZTD product over three months in 2011. For Geoinformatics FCE CTU 10, 2013 48 Douša J., Győri, G.: Database for tropospheric product evaluation Figure 3: Example total bias and standard deviations for GOP GPS and GNSS ZTD near real-time solutions with respect to EUREF combined ZTD solution (three months in 2011). The circles indicates multi-GNSS stations. Figure 4: Weekly mean and its r.m.s. of ZTD biases and standard deviations from all stations each station, which was identified as common to both products, the bias and standard devia- tion is calculated and plotted. Such plot provides information about the internal accuracy of GNSS ZTD products on a station by station basis. 
Figure 3 shows the assessment of the GOP near real-time multi-GNSS ZTD solution with respect to the post-processed EUREF combined ZTD product over three months in 2011. For each station identified as common to both products, the bias and standard deviation are calculated and plotted. Such a plot provides information about the internal accuracy of GNSS ZTD products on a station-by-station basis. This is useful for assessing the consistency of various strategies (and models) for compared pairs when different product timelinesses, input products, GPS or multi-GNSS observations and so on are used.

Figure 3: Example total biases and standard deviations for the GOP GPS and GNSS ZTD near real-time solutions with respect to the EUREF combined ZTD solution (three months in 2011). The circles indicate multi-GNSS stations.

Figures 4 and 5 show time-series of a long-term comparison of two products at weekly and monthly intervals, respectively. The biases and standard deviations were calculated as mean values over all common stations for each interval individually, so that any effect common to all stations can be visualized. Additionally, the r.m.s. of each mean is calculated and plotted. Figure 4 compares the historical and the homogeneously reprocessed IGS ZTD products. The evolution of models and strategies within the historical product is clearly seen when compared to the re-processed ZTDs using the same strategy and models over the entire period.

Figure 4: Weekly mean and its r.m.s. of ZTD biases and standard deviations from all stations

Figure 5 compares two different re-processing products, IGS (global) and EUREF (regional), where only the common stations in Europe could be compared. Although both products are consistent over the whole time, the standard deviation clearly shows an improvement in time, which can be attributed to the steadily increasing quality of data and products as more permanent stations in the global and European networks become involved.

Figure 5: Monthly mean and its r.m.s. of ZTD biases and standard deviations from all stations

Finally, Figure 6 shows a comparison of the IGS Final and IGS Repro1 ZTD values for all dates covered by Repro1. (Whereas Fig. 4 compared the IGS Final and IGS Repro1 ZTD estimates according to date, Fig. 6 compares them according to site.) The ZTD standard deviation is typically lower (about 2 mm) in the northern hemisphere as well as in low latitudes in general. The effect of isolated stations, e.g. in central Asia, Africa and the oceans, with standard deviations up to 4-5 mm, can easily be identified in the figure. This can be attributed to the lower accuracy of the precise global orbit and clock products due to the lack of contributing stations, in particular during the 1990s.

Figure 6: Geographical representation of ZTD standard deviations from two IGS global solutions

5. Summary and outlook

The development of the high-performance GOP Tropo database for the evaluation of tropospheric data and products was described with a special focus on its implementation aspects. The structure was designed in a flexible way to fulfil the requirements specified in the introduction. Although the initial and primary motivation aimed at routine comparisons and evaluations of GNSS tropospheric products, the currently implemented functionality already goes beyond this scope. A special effort was devoted to the database optimization to support billions of records, a number already easy to reach with several products. The optimization showed the need for a revision of the PostgreSQL default settings, which improved the overall performance by a factor of 2-4. Although careful partitioning did not improve the performance, it was decided to keep it for the future since it is expected to become very important in handling huge quantities of data. Finally, sample results were provided in order to demonstrate the currently implemented functionality on selected examples.

Acknowledgements

The database development was supported by the Czech Science Foundation (No. P209/12/2207).
The authors also thank Dr. Christine Hackman from the U.S. Naval Observatory and two anonymous reviewers for useful comments improving the manuscript.

References

[1] Byram, S., Hackman, C. and Tracey, J. (2012): Computation of a High-Precision GPS-Based Troposphere Product by the USNO, In: Proceedings of the 24th International Technical Meeting of The Satellite Division of the Institute of Navigation (ION GNSS 2011), Portland, OR, September 2011, pp. 572-578.
[2] Douša, J. (2001): Towards an Operational Near-real Time Precipitable Water Vapor Estimation, Physics and Chemistry of the Earth, Part A, 26/3, pp. 189-194.
[3] Douša, J. (2003): Evaluation of tropospheric parameters estimated in various routine analyses, Physics and Chemistry of the Earth, 29/2-3, pp. 167-175.
[4] Douša, J. and Bennitt, G.V. (2012): Estimation and evaluation of hourly updated global GPS Zenith Total Delays over ten months, GPS Solutions, Springer, Online First.
[5] Douša, J. (2012): Development of the GLONASS ultra-rapid orbit determination at Geodetic Observatory Pecný, In: Geodesy for Planet Earth, S. Kenyon, M.C. Pacino, U. Marti (eds.), International Association of Geodesy Symposia, Vol. 136, pp. 1029-1036.
[6] Douša, J., Václavovic, P. and Győri, G. (2013): Development of real-time GNSS ZTD products, presented at the EGU General Assembly 2013, April 7-12, 2013.
[7] Williams, T. and Kelley, C. (2011): Gnuplot 4.5: an interactive plotting program, URL: http://www.gnuplot.info.
[8] POSTGRESQL: PostgreSQL 8.4 reference documentation, URL: http://www.postgresql.org/docs/8.4/static/reference.html.
[9] Soehne, W. and Weber, G. (2005): Status Report of the EPN Special Project "Troposphere Parameter Estimation", EUREF Publication No. 15, Mitteilungen des Bundesamtes fuer Kartographie und Geodaesie, Vienna, Austria, Band 38, pp. 79-82.
[10] Wessel, P. and Smith, W.H.F. (1998): New, improved version of the Generic Mapping Tools released, EOS Trans. AGU, 79, 579.