CHEMICAL ENGINEERINGTRANSACTIONS VOL. 61, 2017 A publication of The Italian Association of Chemical Engineering Online at www.aidic.it/cet Guest Editors:Petar SVarbanov, Rongxin Su, Hon Loong Lam, Xia Liu, Jiří J Klemeš Copyright © 2017, AIDIC Servizi S.r.l. ISBN978-88-95608-51-8; ISSN 2283-9216 Verification of Information in Large Databases by Mathematical Programming in Waste Management Radovan Šompláka*, Vlastimír Nevrlýa, Veronika Smejkalováb, Martin Pavlasa, Jakub Kůdelab a Institute of Process Engineering, Faculty of Mechanical Engineering, Brno University of Technology – VUT Brno, Technická 2896/2, 616 69 Brno, Czech Republic b Institute of Mathematics, Faculty of Mechanical Engineering, Brno University of Technology – VUT Brno, Technická 2896/2 616 69 Brno, Czech Republic Radovan.Somplak@vutbr.cz The obligation to register production and waste management leads to a formation of a large-scale database. The reporting obligation concerns immediately a large number of subject that may cause discrepancies in reported data. The paper presents an approach for error detection in large data files. Errors are reflected as inconsistency in total production and processing or in transportation between two nodes. In this case, the area (node), that has sent the waste, registers a different quantity than the node that has received the waste. The database of waste management is an essential source of information for many calculations and analysis, which further open up the scope for the realisation of projects, so it is important to have accurate data. This paper presents an approach for identifying errors in the database using mathematical programming techniques. This issue was solved as a task of network flow with an emphasis on the force of mass balance in nodes. The objective is to make the amount of produced and delivered waste to each node equal to the amount that was there processed or removed. This is required with the minimum modification of the input data. Weights are introduced to distinguish high and low-quality data by assigning bigger values to arcs where sent amount correspond with quantity received. In this case, there is no reason to consider the data as erroneous. This tool has been tested through a case study on the database of waste management in the Czech Republic. The considered network consists of 206 nodes representing municipalities, which corresponds to 42,230 edges (possible flows). The output from the calculation is a large amount of data, which are in terms of approximation to initial values interpreted as maps. However, the tool could be used for other areas of records and databases, where there is a transfer of any material flow. In the further research, the model can be supplemented by specific constraints arising from additional information for the specific application. In this case, decision-making about network flow would be done with taking into account the shortest distance between producers and treatment facilities. 1. Introduction In many fields, the central databases, devoted to acquiring, producing, handing over and processing of certain material flow, contain some inconsistencies in reported data. This problem arises mainly because of the involvement of a large number of entities that have the obligation to report. As an example of fields, where inconsistencies can emerge in the registration database, can be mentioned: records of income and expenses - tax documents, receipt and delivery of goods, reporting in waste management. The dependence of conclusive results from many optimisation-based models on accurate data inputs is indisputable. An inaccuracy in input data can lead to erroneous or suboptimal decisions, especially in network flow models such as Šomplák et al. (2013) or in supply chain models like Čuček et al. (2011) and Stille et al. (2011) for P-graphs, or other problems considered in process engineering, where uncertainties are included. Several different approaches have been proposed to tackle this issue. Roupec et al. (2013) proposed a hybrid algorithm for network design problem with uncertain demands. A solid assessment of the Origin-Destination DOI: 10.3303/CET1761162 Please cite this article as: Šomplák R., Nevrlý V., Smejkalová V., Pavlas M., Kůdela J., 2017, Verification of information in large databases by mathematical programming in waste management, Chemical Engineering Transactions, 61, 985-990 DOI:10.3303/CET1761162 985 matrix estimation techniques was presented by Bera and Rao (2011). The study by Karlaftis and Vlahogianni (2011), and Sun et al. (2006) investigate the trade-off between using more traditional statistical methods and neural networks in transportation problems. A similar investigation was conducted in the problems concerning failure and reliability predictions by Zio et al. (2012). In the case of the supply chain problems, the input data which creates the network (edges and nodes) have to be often verified using different techniques. The proposed articles utilise only one value for each parameter. In some databases, the particular information might be reported by more subjects differently. This fact makes it easier to identify potential errors. First entity gives the amount of handover material, and the second one notes down the takeover. These amounts are supposed to be equal, but often this is not true. The flow principle is illustrated in Figure 1. It schematically describes the material flow through the network (from a source node over transfer nodes to target facility). The inconsistencies arise always between two neighbouring nodes, where for example source node handover certain amount (H1), but transfer node does not takeover the same amount (T1). The equivalent situation can be detected in H2 and T2, see Figure 1. Figure 1: Schematic representation of reporting flows Due to error illustrated in Figure 1, accurate values of material flow are unknown. These errors significantly restrict supply chain planning, where it is necessary to have a quality data from the previous period. Usually the errors are supposed to be random, but in this case, the systematic error is assumed. It is because of reporting one value by more subjects, where the error is expected from either of them. This paper presents an approach based on optimisation methods which identify errors leading to inconsistencies in databases (Section 2). Section 3 focuses on a case study and the mentioned process is applied to the issue of waste management registration in the Czech Republic. The result is the model of production and assessment of transportation effectivity, which identify deficiencies in processing capacities within regions. 2. Mathematical model The mathematical model is described within the waste management application. As noted previously, there are two sources of information for each shipment of waste. The first one is the node sending the waste, and these values are marked by index (-). The second information comes from a node receiving the waste designated by index (+). These values are considered as two equivalent scenarios. The verification of data should guarantee the validity of the mass balance in the nodes with minimal change in input data. Sets: ∈ index of the node ∈ index of the arc Parameters: incidence matrix for sending incidence matrix for receiving amount of waste shipped on the arc according to the scenario - amount of waste shipped on the arc according to the scenario + weight of the arc , ∈ 0;1 waste production in the node waste processing in the node threshold of zero penalization Variables: error on the arc j, scenario - error on the arc j, scenario + 986 error in the node penalization Positive variables: positive part of the error negative part of the error positive part of the error negative part of the error positive part of the penalization negative part of the penalization min ∈∈ (1) s.t. ∈ 0,∈ ∀ ∈ (2) , ∀ ∈ (3) 0, ∀ ∈ (4) 0, ∀ ∈ (5) , ∀ ∈ (6) , ∀ ∈ (7) sgn ), ∀ ∈ (8) , ∀ ∈ (9) The objective function in Eq(1) minimises weighted total error on arcs and sum of positive part of the penalization for exceeding waste production. The construction of weights is described below. Constraint Eq(2) is the equation for conservation of the mass balance in each node. The balance on each arc ensures equality of scenarios (+,-) for the sum of the data and found error. Constraints Eq(4) specifies non-negativity of flow on each arc for scenario (-). The same property for scenario (+) is ensured due to Eq(3). Similarly, Eq(5) is condition for non-negativity production. The errors for both scenarios (+,-) are decomposed into positive and negative part in Eq(6) and Eq(7). The assessment of penalty function for each node provides Eq(8). The calculation of parameter (threshold for zero penalty) is described below. The last Eq(9) splits penalization into positive and negative part. In the case of the waste registration, inconsistencies can be found also in total waste production and processing. This is probably caused by incorrect records and the errors are expected especially on the production side. This error rate on the production side is caused by recalculating the production according to the rules of the annual report, so a high precision of processing data may be assumed. This inconsistency problem can be solved by integrating penalty into optimisation tasks, specifically in the objective function Eq(1). The illustration of penalty function is shown in Figure 2 and arising equation that this principle transforms into a model is Eq(8). The computation of threshold for zero penalty was set according to Eq(10), which correspond to the ratio of an average change of production and average production to maintain the balance with the processing. Parameter determines the ratio of error for each producer, which can be reached without being penalized, assuming that each producer participates in the error equally. The sign of is determined by differences between production and treatment from Eq(10) and affects the form of equation Eq(8). Figure 2: The choice of penalty function 987 ∑ ∈∑ ∈ (10) More significant errors occur in the data about the waste streams. In addition, the recalculation of production does not take into account the flow due to lack of awareness of processing site for a specific producer. To sum up, an inconsistency in the database may arise in the following manner: incorrect evidence, incorrect data processing, reporting to company headquarters, missing re-count of production and others. To distinguish between these inconsistencies, the weights were introduced. Each weight is based on the similarity of the transmitted and received quantities. The more these values are equal, the greater weight is chosen, to ensure that the respective error is lower. To ensure this feature, weights were selected as in the equation Eq(11). 1, , 01 max , , ∀ ∈ (11) As a result of the presented mathematical model, errors in production and transported amount of waste for all arcs are estimated. The results indicate the efficiency of the transport of waste and self-sufficiency of each region with regard to the processing capacities and the appropriateness of the current deployment. On this basis, the region with great potential for economic and environmental improvements can be identified. 3. Case study The case study is an example of model utilisation for the balance of mixed municipal waste in the Czech Republic in 2015. The input data were provided by the Ministry of the Environment of the Czech Republic. One of the contributions of this computation is information about the waste production and therefore also about the necessity of waste export. The modelled production for each node consist of sum of input data ( and error ( ). In the most of territorial units, the resulting has ended with threshold value ( ), only few of them has been penalized. It has been given by waste flows with a big weight ( ). Figure 3 illustrates a comparison between the modelled waste production and processing for each region. At first sight, significant differences can be observed. Some regions have enough capacity for the processing the waste they produce (their own waste) and can even import a foreign waste, e.g. Central Bohemian Region, Pardubice Region, Liberec Region, or South Moravian Region. Transport over long distances is not desirable, but it should be noted that some exports between the regions are advantageous for geographical and environmental reasons. Regions Plzeň and Moravian-Silesian are characterised by a lack of capacity. These areas are dependent on the processing in other territories. It points out on the potential for more efficient transport of waste relative to the economic and environmental aspects, in the case, when processing capacities would be increased. Figure 3: Comparing the production and processing of mixed municipal waste in the various regions. 988 The optimum operation has Karlovy Vary Region, where the quantities of treated waste are nearly the waste production. If the production after the balance is in accordance with processing, it does not mean that there is not import and export. The Ústí nad Labem Region is worth mentioning. It has an acceptable ratio of production and processing, however, considerable interregional transport takes place. This is due to the size of the territory and a small number of processing sites. The similar situation is in the Prague, the Capital City. This is a specific case in the Czech Republic, the region comprises of the only city. The attention should be also concentrated on the regions Zlín and Vysočina, which show an almost ideal data in terms of processing on their own territory. The average distance in km for transportation of produced waste is depicted in Figure 4, which shows two types of transport (within and outside the region). In this respect, the Hradec Králové Region greatly exceeds all others and this is the only region that is so different. The problem is not the fact that region exports waste beyond its own area, but overall travelled kilometres, which cause a high impact on the environment. This happens due to the inappropriate deployment of processing capacities, but reveal the space for efficiency improvement. Figure 4: The average transported distance in km related to one produced tonne of waste Generally, exporting usually dominates, with some exceptions. The reason is the inefficient distribution of processing facilities within the particular region, even though some regions are almost self-sufficient. According to the stated aspects, it can be concluded, that waste is treated effectively in regions Vysočina and Zlín despite the little lack of their own processing capacity. Although they are forced to process part of the production elsewhere, it is multiple times less than in other cases in terms of km. Table 1: The aggregated results of waste flows beyond the Karlovy Vary Region Micro-region Handover [t] —>O Reporting [t] Result [t] Reporting [t] O—> Takeover [t] Result [t] Aš 56 9 9 930 492 1,320 Cheb 398 11 16 11,760 12,427 11,756 Karlovy Vary 3,664 658 664 3,955 10,966 4,613 Kraslice 0 0 0 53 46 57 Mariánské Lázně 83 0 0 4,012 6,511 4,360 Ostrov 328 0 25 4,771 10,750 5,055 Sokolov 31,656 8,780 27,105 3,346 3,138 4,905 989 The detailed illustration of results for the Karlovy Vary Region is in Table 1. This region includes a total of seven lower territorial units – micro-regions. The symbol —>O means the total import to the micro-region and handover (tagged as H in Figure 1) denotes the amount of waste which flows from micro-regions beyond the Karlovy Vary Region. The reporting (tagged as T in Figure 1) in this context means the amount of waste which was reported in the particular micro-region. Conversely, O—> is a total export from the particular micro-region and the reporting here swaps the role with the takeover. 4. Conclusions This paper has introduced the mathematical model, which allows estimation of errors in current flows which were reported in the waste management database. The computation was presented at the case study of mixed municipal waste in the Czech Republic in 2015. The results demonstrate the potential for more efficient transport of waste within individual regions. It shows the dependence on other regions and the degree of export. A big contribution is in the identification of regions with insufficient capacities (high export) or bad deployment of processing facilities (high export and import). This analysis could serve as the initialization process before computing another optimisation tasks about potential construction of new treatment facilities. For routing in cities, Viktorín et al. (2016) proposed differential evolution algorithm while the whole topic is covered by Šomplák et al. (2013), where authors allocate facilities to optimally transport the waste. In the future research, these approaches could be integrated into one multiphase tool. Acknowledgments The authors gratefully acknowledge financial support provided by Technology Agency of the Czech Republic within the research project No. TE02000236 "Waste-to-Energy (WtE) Competence Centre" and by the project Sustainable Process Integration Laboratory – SPIL, funded as project No. CZ.02.1.01/0.0/0.0/15_003/0000456, by Czech Republic Operational Programme Research and Development, Education, Priority 1: Strengthening capacity for quality research. References Bera S., Rao K.V.K., 2011. Estimation of origin-destination matrix from traffic counts: the state of the art. European Transport, 49, 3-23. Čuček L., Klemeš J. J., Varbanov P.S., Kravanja Z., 2011. Life cycle assessment and multi-criteria optimization of regional biomass and bioenergy supply chains. Chemical Engineering Transactions, 25(1), 575 - 580. Karlaftis M.G., Vlahogianni E.I., 2011. Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transportation Research Part C: Emerging Technologies, 19(3), 387-399. Roupec J., Popela P., Hrabec D., Novotný J., Olstad A., Haugen K.K., 2013. Hybrid algorithm for network design problem with uncertain demands. World Congress on Engineering and Computer Science, 1, 554- 559. Stille Z., Bertók B., Friedler F., Fan L. T., 2011. Optimal design of supply chains by P-graph framework under uncertainties. Chemical Engineering Transactions, 25(1), 453 - 458. Sun S., Zhang C., Yu G., 2006. A Bayesian Network Approach to Traffic Flow Forecasting. IEEE Transactions on Intelligent Transportation Systems, 7(1), 124-132. Šomplák R., Procházka V., Pavlas M., Popela P., 2013. The Logistic Model for Decision Making in Waste Management. Chemical Engineering Transactions, 35(1), 817 - 822. Viktorín A., Hrabec D., Pluháček M., 2016. Multi-chaotic differential evolution for vehicle routing problem with profits. 30 th European Conference on Modelling and Simulation, 30, 245-251. Zio E., Broggi M., Golea L.R., Pedroni N., 2012. Failure and reliability predictions by infinite impulse response locally recurrent neural networks. Chemical Engineering Transactions, 26(1), 117 - 122. 990