DOI: 10.3303/CET2188077 Paper Received: 3 June 2021; Revised: 11 September 2021; Accepted: 3 October 2021 Please cite this article as: Pluskal J., Šomplák R., Kůdela J., 2021, Novel Approaches for Transport Infrastructure Reduction to Effective Optimisation of Flow Tasks, Chemical Engineering Transactions, 88, 463-468 DOI:10.3303/CET2188077 CHEMICAL ENGINEERING TRANSACTIONS VOL. 88, 2021 A publication of The Italian Association of Chemical Engineering Online at www.cetjournal.it Guest Editors: Petar S. Varbanov, Yee Van Fan, Jiří J. Klemeš Copyright © 2021, AIDIC Servizi S.r.l. ISBN 978-88-95608-86-0; ISSN 2283-9216 Novel Approaches for Transport Infrastructure Reduction to Effective Optimisation of Flow Tasks Jaroslav Pluskala,*, Radovan Šompláka, Jakub Kůdelab a Institute of Process Engineering, Faculty of Mechanical Engineering, Brno University of Technology, Technická 2896/2, 616 69 Brno, Czech Republic b Institute of Automation and Computer Science, Faculty of Mechanical Engineering, Brno University of Technology, Technická 2896/2, 616 69 Brno, Czech Republic Jaroslav.Pluskal@vutbr.cz Nowadays, increasing complexity of solved optimisation problems leads to necessity of dealing with computation time demand. In the case of network flow tasks, computation time is highly dependent on detail of transport infrastructure. The presented paper is concerned with developing novel approaches for transport infrastructure reduction using clustering analysis. According to the required outputs of the task, it is possible to variably change the detail of the network in individual territorial units to ensure the solvability of the task, but without significant distortion of the results. The main idea and novelty of the presented research is to have a finer construction only in the vicinity of the monitored subject. With a greater distance, it is possible to reduce the level of detail in the transport network. The principle of reduction technique is based on transformation of geographic coordinates with subsequent cluster analysis. K-means and hierarchical clustering are introduced and results of developed approach are shown on municipalities in Czech Republic. Consistency within clusters of both methods is evaluated using silhouettes. The presented methodology allows to solve optimisation of case studies more efficiently with greater detail in monitored region, which leads to more accurate solutions. 1. Introduction Nowadays, the mathematical programming produces comprehensive supporting tools describing real tasks and providing appropriate insight into strategic planning. In logistics and in optimisation of network related problems, there are real-world problem instances that are too computationally demanding even for today’s computer technology. Even a relatively simple looking mathematical model on a real-world transportation infrastructure network can be effectively unsolvable, as it would require several years for the computations to terminate (Pavlas et al., 2015). Typical examples of these problems are the traveling salesman problem and the vehicle routing problem (Archetti et al., 2011). Solving these types of problems often requires the use of different heuristics, that have no guarantee of finding the optimal solutions, but are not burdened by the exponential computational times (Borthen et al., 2018). A different approach for speeding up the computations is to reduce the problem in question and focus on obtaining a solution that concentrates only on selected aspects of the problem. This is achieved by pre- processing the data that serve as the input for the optimisation models. For the problems in logistics, one of the steps of the pre-processing procedures is the reduction of the considered transportation infrastructure, decreasing the number of nodes (or cities) and arcs that connect them. Lam et al. (2011) studied the infrastructure reduction in a biomass supply chain. The presented techniques were comprised of the reduction of the connectivity of the considered transportation network based on the distance between individual nodes and the removal of unnecessary model variables (such as flows from nodes with zero production). Li and Balakrishan (2016) used optimisation models for transportation network reduction, where the main objective was to minimise the sum of the shortest paths between nodes or to minimise the number of connecting arcs. The presented method is mainly used for approaching vehicle routing problems, while its application in flow allocation problems might be problematic, since it does not consider capacities of the nodes in the graph and the resulting reduction could make the problem infeasible. Different analogies of these network reduction 463 methods are also used in various other application areas. Chardy et al. (2012) used a node and arc elimination method in telecommunication problems. In the energy domain, Svendsen (2015) used grid reduction techniques for the analysis of large scale renewable energy integration, and Hinners et al. (2019) developed a methodology for model order reduction of active distribution network models for interconnected power flow control. A different angle at network reduction is represented by the utilisation of cluster analysis (Grabusts, 2018). Ishizaki et al. (2014) developed a clustering algorithm for undirected graphs based on Krylov subspaces and tridiagonalisation, which was further extended to directed weighted graphs with positive weights in Ishizaki et al. (2015). This approach uses advanced mathematical methods, and its applicability depends on the eigenvectors of the matrix that describes the transportation network. For weakly connected networks, Cheng and Sherpen (2020) presented a clustering algorithm that guarantees that the nodes within a cluster form a connected subgraph of the original network. The study Lam et al. (2010) presented two-level strategy for supply chain network optimisation. The principles are based on cluster analysis, when supply chain synthesis is carried out by process graph within each cluster and afterwards, optimisation on clusters infrastructure level is performed. A review of more than 140 articles that present inventive network reduction techniques was carried out by Đukić and Sarić (2012) and shows the remarkable intricacy of the newly developed methods. However, there were no methods with a connection to logistics optimisation problems and their practical solutions. In this paper, two novel clustering approaches for transport infrastructure reduction are presented. The aim is to cluster points in a transport network unevenly, with fine detail (more smaller clusters) in the vicinity of a chosen reference point, where any change in the infrastructure in the reference point (e.g., increased capacity) might have large impact on the optimal flow allocations. On the other hand, further away from the reference point, the level of detail should get coarser (less larger clusters), resulting in a high level of aggregation and, consequently, a reduction of the number of nodes and arcs used for the description of the whole network, which should have positive effect on the computational requirements for solving optimisation problems on the reduced network. 2. Proposed clustering approaches The main goal is to determine a general rule or criterion for changing the detail of the infrastructure. For this purpose, the cluster analysis can be used. However, in order to ensure detail in the vicinity of the monitored subject, it is necessary to properly transform the original coordinates to create a tendency to form larger clusters further from the monitored node. The individual node transformations are described below, but generally depend on the clustering algorithm, which differs in approach and applicability to different types of tasks. 2.1 Approach based on the k-means algorithm The k-means algorithm is one of the most widely used clustering algorithms (Hennig et al, 2016). It is an iteration- based method, that aims at partitioning the set of 𝑁 points 𝑥𝑖 , 𝑖 = 1, … , 𝑁 , in an 𝑚-dimensional space into 𝐾 clusters 𝑆𝑘 , 𝑘 = 1, … , 𝐾, with centers 𝑐𝑘, in which each point belongs to the cluster with the nearest cluster centre (often referred to as cluster mean, or cluster centroid). It assumes that there is a distance (or similarity) function defined between points 𝑥𝑖 = [𝑋𝑖 , 𝑌𝑖 ] and cluster centers 𝑐𝑘 = [𝐶𝑋𝑘 , 𝐶𝑌𝑘 ]. The most used function is the Euclidean distance The k-means square-error criterion is then defined as the summary distance between points and their cluster centres The smaller the value 𝑊(𝑆, 𝑐), the better the resulting clustering. Unfortunately, the minimisation of Eq(2) is an NP-hard optimisation problem. However, there are efficient heuristic algorithms that converge quickly to a local optimum, one of which is the Lloyd’s algorithm (also known as naïve k-means) (Lloyd, 1982) used for the computations in this paper. The resulting clusters are regular and have roughly the same size. In order to use the k-means algorithm to cluster points based on a distance from a reference point 𝑥𝑟 = [𝑋𝑟 , 𝑌𝑟 ] (such that the clusters that are closer to the reference point are smaller), a modification of either the input data or the algorithm is needed. The proposed transformation of the data points 𝑥𝑖 into new coordinates [�̂�𝑖 , �̂�𝑖 ] is done with the following equation whose effect can be seen on municipalities of Czech Republic in Figure 1. On the left are the original data points bounded by a red box, with the reference point highlighted by a red dot, a point close to the reference point 𝑑(𝑥𝑖 , 𝑐𝑘 ) = (𝑋𝑖 − 𝐶𝑋𝑘 ) 2 + (𝑌𝑖 − 𝐶𝑌𝑘 ) 2. (1) 𝑊(𝑆, 𝑐) = ∑ ∑ 𝑑(𝑥𝑖 , 𝑐𝑘 )𝑖∈𝑆𝑘 𝐾 𝑘=1 . (2) [�̂�𝑖 , �̂�𝑖 ] = [𝑋𝑟 , 𝑌𝑟 ] + [𝑋𝑖 ,𝑌𝑖 ]−[𝑋𝑟,𝑌𝑟] √(𝑋𝑖−𝑋𝑟) 2+(𝑌𝑖 −𝑌𝑟) 2 , (3) 464 highlighted by a blue dot, and a point far from the reference point highlighted by a green dot. The transformed data are shown on the right, with the corresponding points and bounding box highlighted in the same way. Figure 1: Visualisation of the dependence of the transformation on the distance from the reference node Figure 1 shows the dependence of the transformation on the distance from the reference node. The position of the red dot remains unchanged, but the others are moved. Significant scaling can be noticed, with most points being transformed outside the original territory. For illustration, a rectangle of 500 x 280 km is shown in the original and transformed network. The blue dot closest to the reference point is shifted furthest, and the slightly distant green dot has hardly changed position within the rectangle. The scale of the entire network has increased significantly. When applying the k-means algorithm, where the initial distribution of centroids will evenly cover the whole new space, the areas where the concentration of points is smaller will form small clusters. Thus, the desired detail near the reference point in the inverse transformation is achieved. 2.2 Approach based on the hierarchical clustering method Hierarchical clustering is a method of groups points based on their distances relative to each other (Hennig et al., 2016). This distance (also known as similarity) can be measured by an arbitrary function (usually, metric distances are used). This brings the advantage of using real-world distances from the considered traffic network. Strategies for hierarchical clustering can be generally divided into two types: agglomerative (bottom-up) – each point starts in its own cluster and clusters are gradually merged; divisive (top-down) – all points start in one cluster and clusters are gradually split. Typically, these merges and splits are determined in a greedy manner, and the result of hierarchical clustering is represented by a dendrogram. As the agglomerative approach is currently the most used one, it is also the approach adopted in this paper. Apart from the distance between the individual points, the method also needs a way to measure the distance between the different clusters. This can be achieved by several different approaches: nearest neighbour distance, furthest neighbour distance, average neighbour distance, etc. In this work, the furthest neighbour distance (the distance between two clusters is the maximum distance between any two points, each in one cluster) is adapted, as it produces cluster of roughly the same size and prevents chaining, where a cluster becomes a long chain of points. To get the desired effect of having smaller clusters close to the reference point and less larger clusters further away from the reference point, a specific way of measuring distances between points is needed. Let 𝐷𝑖,𝑗 be the original distance between points 𝑥𝑖 and 𝑥𝑗 . Using the reference point 𝑥𝑟 , the proposed modified distance �̂�𝑖,𝑗 between points 𝑥𝑖 and 𝑥𝑗 is computed as The effect of Eq(4) is the following – it assigns lower distances to points that are both further away from the reference point, which should force the algorithm to form larger clusters of these points. 3. Comparison of the proposed approaches The two proposed approaches were compared on a dataset containing 6,258 municipalities in the Czech Republic (shown in Figure 1), that was used in the optimisation problem developed in Kůdela et al. (2019). This infrastructure represents the finest detail of legislative division, which can be used in order to obtain relevant data for tasks in waste management. As the reference point was chosen the capital city (Prague), that is denoted �̂�𝑖,𝑗 = Di,j min(Di,r,Dj,r) . (4) 465 by the red dot in Figure 1. For both methods, the desired number of clusters was set to 200. For illustration on variables of network flow optimisation, the number of arcs is reduced by 1,000 times in the case of bipartite graph, where each two nodes are connected. In further research, it would be useful to perform many test calculations in order to estimate the relationship between the reduction in number of nodes and edges due to aggregation into clusters. The number of edges usually represents variables in transport tasks and they are important in terms of resolvability and time demand of computation. Figure 2a shows the resulting clustering of the municipalities by one run of the k-means based approach, with Figure 2b showing the placement of the centroids. a) b) Figure 2: a) clustering of the municipalities by one run of the approach based on the k-means algorithm, b) placement of the resulting centroids The resulting clustering satisfies the desired principle of transport infrastructure. In the vicinity of reference point, there are separate municipalities or clusters containing only few of them. With greater distance, the size of clusters increases and the furthest cluster represents almost the entire Moravian-Silesian Region. Figure 3a shows the resulting clustering of the municipalities by one run of the hierarchical clustering based approach. As hierarchical clustering itself does not use centroids in the computations, the centroids were computed separately (as the mean of the points in a cluster) and are shown in Figure 3b. a) b) Figure 3: a) clustering of the municipalities by one run of the approach based on hierarchical clustering, b) placement of the resulting centroids When comparing the images, one can see that the k-means based method produced a clustering with a higher emphasis around the reference point than the method based on hierarchical clustering (the number of clusters closer to the reference point is higher for the k-means based method). The k-means based method also produced clusters that are more spherical than the method based on hierarchical clustering, where the resulting shapes were more general. On the other hand, hierarchical clustering allows to use data based on real distances between nodes. The difference between the two proposed approaches should not be noticeable when they are applied on state without natural barriers. It is not desirable to merge two nodes, which are close to each other, but cannot be accessible due to mountainous terrain. Therefore, it is favourable to include even profile of a terrain and accessibility of each node. 466 A common method of interpretation and validation of the consistency within clusters of data is by using silhouettes (Rousseeuw, 1987). The value of a silhouette gives an insight into how well is a given point classified, with values closer to plus one indicating good clustering, and values closer to minus one indicating bad clustering. The histograms of the values of the silhouettes for the two methods are shown on Figure 4a (for the k-means based method) and Figure 4b (for the hierarchical clustering based method). The average value of the silhouette for the k-means based method was 0.3137, while for the hierarchical clustering based method the average value was 0.2608. By this measure, the k-means based approach comes out as the more favourable one, but a more conclusive tests, preferably ones that employ the clustering methods for some large-scale optimisation problems, will be subjected to further research. a) b) Figure 4: Histograms of the values of silhouettes for a) k-means based approach, b) hierarchical clustering based approach To use the presented methods in transport tasks, it is necessary to select a representative node from each cluster. The representative node aggregates characteristic parameters such as the production of the selected commodity. A significant disadvantage of both methods is the fact that the resulting centroid is not comparable to any input node, because its position is calculated as coordinates mean in post-processing. For afterwards utilisation of proposed approach, it is desirable to include final representative into decision process of cluster analysis method. Depending on the size of cluster, it is further appropriate to incorporate into model the extra costs associated with transport within individual clusters and take into account the inaccuracy given by node aggregation. 4. Conclusions The presented study introduces a proposal of a methodology for the reduction of transport infrastructure with regard to the solution of optimisation flow tasks. The main contribution lies in the novel approach based on coordinates transformation and clustering analysis, which allows to propose graph network in relation to monitored subject and perform effective computation. The principle of the procedures consists in the keeping the finer detail of infrastructure in the vicinity of the monitored node and it is possible to have coarser detail with increasing distance. For this purpose, the transformation of the coordinates of the nodes was used, followed by the application of cluster analysis. The subject of the research were two approaches, the k-means algorithm and hierarchical agglomerative clustering. In general, K-Means provides better results with respect to the silhouette validation method. On the other hand, hierarchical clustering offers considerable freedom in designing further transformations because it is not necessary to meet the conditions of metric space. At the same time, it is possible to use the distance between nodes, not only with an estimate using air distances from GPS coordinates. This fact can be especially important when applied to a transport network located on a very mountainous terrain, where two nodes can be placed close in detail, but through obstacles it is not be desirable to assign these two nodes to a single cluster. Subsequent research will focus on including other factors such as population at individual nodes, available capacity in relation to total production, etc. These factors may have a major impact on the results of optimisation tasks in the application of proposed procedures for transport infrastructure reduction. Furthermore, it is desirable to tune the equations and control the detail of infrastructure in the vicinity of monitored node. The possible modification of k-means could be k-medoids, which deals with existing nodes instead of coordinates mean and allows to simply aggregation of key input data with using real distances. At the same time, the extension to multiple reference points can be useful while solving the complex real tasks related to industry requirements. Finally, all developed approaches must be tested on case studies to evaluate the error between original and proposed infrastructure. 467 Nomenclature 𝑐𝑘 – centroid of cluster 𝑘, - 𝑑(𝑥𝑖 , 𝑐𝑘 ) – distance between node 𝑥𝑖 and centroid 𝑐𝑘 , m Di,j – original distance between node 𝑖 and 𝑗, m �̂�𝑖,𝑗 – transformed distance between node 𝑖 and 𝑗, m 𝐾 – number of clusters, - 𝑁 – number of nodes, - 𝑥𝑖 – input node 𝑖, - 𝑥𝑟 – input reference node, - [𝐶𝑋𝑘 , 𝐶𝑌𝑘 ] – coordinates of centroid of cluster 𝑘, - [𝑋𝑖 , 𝑌𝑖 ] – original coordinates of node 𝑖, - [�̂�𝑖 , �̂�𝑖 ] – transformed coordinates of node 𝑖, - 𝑊(𝑆, 𝑐) – summary distance between nodes and their clusters, m Acknowledgements This research was funded by the Czech Ministry of Education, Youth and Sports/EU Operational Programme Research, Development and Education, grant No. CZ.02.1.01/0.0/0.0/16_026/0008413 “Strategic partnership for environmental technologies and energy production”. This work was supported by IGA BUT: FSI-S-20-6538. References Archetti C., Feillet D., Gendreau G., Speranza M.G., 2011, Complexity of the VRP and SDVRP, Transportation Research Part C: Emerging Technologies, 19, 5, 741–750. Borthen T., Loennechen H., Wang X., Fagerholt K., Vidal T., 2018. A genetic search-based heuristic for a fleet size and periodic routing problem with application to offshore supply planning, EURO Journal on Transportation and Logistics, 7, 2, 121–150. Chardy M., Costa M.-C., Faye A., Trampont M., 2012, Optimizing splitter and fiber location in a multilevel optical FTTH network, European Journal of Operational Research, 222, 3, 430–440. Cheng X., Scherpen J.M.A., 2020, Clustering-Based Model Reduction of Laplacian Dynamics With Weakly Connected Topology, IEEE Transactions on Automatic Control, 65, 10, 4393–4399. Đukić S., Sarić A., 2012, Dynamic Model Reduction: An Overview of Available Techniques with Application to Power Systems. Serbian Journal of Electrical Engineering, 9, 131–169. Grabusts P., 2018, Numerical Data Clustering Ontology Approach, Mendel Journal, 24, 31–38. Hennig C., Meila M., Murtagh F., Rocci R., 2016, Handbook of cluster analysis, CRC Press, Taylor & Francis Group, Boca Raton, USA. Hinners H., Gonzalez D. M., Myrzik J.M.A., 2019. Model Order Reduction of Active Distribution Networks with TSO-DSO Interconnection Power Flow Control, IEEE Milan PowerTech, Milan, Italy, 1–6. Ishizaki T., Kashima K., Imura J., Aihara K., 2014, Model reduction and clusterization of large-scale bidirectional networks, IEEE Transactions on Automatic Control, 59, 1, 48–63. Ishizaki T., Kashima K., Girard A., Imura J., Chen L., Aihara K., 2015, Clustered model reduction of positive directed networks, Automatica, 59, 238–247. Kůdela J., Šomplák R., Nevrlý V., Lipovský T., Smejkalová V., Dobrovský L., 2019, Multi-objective strategic waste transfer station planning, Journal of Cleaner Production, 230, 1294–1304. Lam H. L., Klemeš J. J., Kravanja Z., 2011, Model-size reduction techniques for large-scale biomass production and supply networks, Energy, 36, 8, 4599–4608. Lam H. L., Varbanov P. S., Klemeš J. J., 2010, Optimisation of regional energy supply chains utilising renewables: P-graph approach, Computers & Chemical Engineering, 34, 782-792. Li G., Balakrishnan A., 2016, Models and algorithms for network reduction, European Journal of Operational Research, 248, 3, 930–942. Lloyd S.P., 1982, Least Squares Quantization in PCM, IEEE Transactions on Information Theory, 28, 129–137. Pavlas M., Nevrlý V., Popela P., Šomplák R., 2015, Heuristic for generation of waste transportation test networks, Mendel Journal, 21, 189–194. Rousseeuw P.J., 1987, Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis, Computational and Applied Mathematics, 20, 53–65. Svendsen H.G., 2015, Grid Model Reduction for Large Scale Renewable Energy Integration Analyses, Energy Procedia, 80, 349–356. 468