CHEMICAL ENGINEERING TRANSACTIONS, VOL. 46, 2015
A publication of The Italian Association of Chemical Engineering. Online at www.aidic.it/cet
Guest Editors: Peiyu Ren, Yancang Li, Huiping Song
Copyright © 2015, AIDIC Servizi S.r.l., ISBN 978-88-95608-37-2; ISSN 2283-9216

A Data Field Clustering Method for Classification of Concrete Dam Cracks

Long Zhao*a, LinFeng Jiangb
a State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
b QiLu University of Technology, Shandong, China
zxcvbnm9515@163.com

Crack detection based on digital image processing is increasingly applied in the maintenance of concrete dams. However, because crack images are complex, high classification accuracy is difficult to achieve. To address the shortcomings of the crack extraction and classification algorithms used in crack detection systems, this article applies the general data field to the problem of crack classification and proposes a new data field clustering method for classifying concrete dam cracks. Clustering is an important step when building a classifier for dam cracks. It is a process of discovering densely populated regions in the data space: it groups a set of data so as to maximize the similarity within clusters and minimize the similarity between different clusters. The mutual information (MI) of two grids is a measure of their mutual dependence. This definition is useful in clustering because it gives a way to quantify the relevance between different grids. The proposed method uses the potential values within each grid and the potential values between different grids. Well-known crack classification methods are compared with our method. The experimental results show that the proposed method clearly improves precision and interpretability.

1.
Introduction

Using digital image processing to identify and classify dam cracks helps engineers obtain complete crack information and provides a reference for dam maintenance management, disaster warning, and forecasting. The crack category is one of the most important pieces of crack information. After a concrete crack has been recognized and separated from the background, an appropriate method is needed to determine its type, because different kinds of cracks affect the dam body differently. Based on the characteristics of the various crack types, this paper uses a projection method built on the generalized data field to extract feature points that effectively distinguish the cracks, and obtains better classification results.

To overcome the over-segmentation that appears after crack extraction, this paper proposes the concept of the general data field. It is based on the observation that, in dam crack images, the ratio of the area with steep grayscale change to the total area is statistically small, and it uses the increase or decrease of gradient entropy after region merging as the criterion for deciding whether to merge regions. Combined with that criterion, this paper designs an adaptive algorithm that calculates the threshold for watershed edge segmentation and effectively solves the over-segmentation problem in crack image extraction.

Clustering is one of the effective ways to solve the crack classification problem. It uses evaluation criteria to choose the most significant grids and reduce the dimensionality. This paper mainly studies the classification of crack images. The contribution of a grid is given by the sum of its potential values in the data field. To meet the crack parameter needs of maintenance, the lengths of transverse and longitudinal cracks and the minimum circumscribed rectangle areas of block and reticular cracks are selected and calculated.
The rest of this paper is organized as follows. Section 2 discusses related classification methods introduced for crack detection. Section 3 explains the data field and the steps of clustering. Our method is given in Section 4. Experiments comparing rival clustering methods are analyzed in Section 5. A conclusion is drawn in Section 6.

Please cite this article as: Zhao L., Jiang L.F., 2015, A data field clustering method for classification of concrete dam cracks, Chemical Engineering Transactions, 46, 685-690 DOI:10.3303/CET1546115

2. Related work

Dam cracks have characteristics that can be used to classify them and to assess their degree of damage. Geometrical features such as length-to-width ratio, area, and density have also been proposed. K. Ohno and M. Ohtsu observe that the cracking mode in concrete normally changes from tensile mode to shear mode at impending failure. A crack classifier based on a radial basis function neural network has also been designed. Li, H. proposed a method for blade crack classification based on signals monitored with a squared envelope spectrum. The goal is accurate, automatic, real-time extraction of dam crack type and severity information without manual intervention. After extraction, crack images are often fragmented and incomplete. At the same time, the traditional watershed algorithm has the advantage of preserving weak edge information well. Wen-Pei Sung et al. proposed artificial neural networks to detect dam damage in real time and obtained better classification results, but the algorithm requires more memory and runs much more slowly. A new compound error function for the BP neural network has been designed to improve the training speed of the network. The distribution density is the percentage of crack pixels in the dam crack image.
Under this definition, the distribution density of one-way cracks is small and that of mesh cracks is larger. However, depending on the number of noise points and the crack length, the distribution densities of different crack types can be very close to one another. Distribution density is therefore a simple and easy-to-use feature, but it can easily lead to misclassification of some cracks. Ioannis Valavanis and Dimitrios Kosmopoulos use geometric and texture features to classify damage images. Noorsuhada Md No et al. used the relationship between average frequency and RA value, which showed a clear trend with respect to crack classification. Hence, this paper applies the general data field to crack extraction.

3. Definition of generalized data field

A data field treats each data object in the dataset as an energy source that radiates energy into the space and thereby generates a field. The mutual effect among different data objects is described by a field strength function, which can take various forms, such as the nuclear form and the gravitational form. The effects of different sources at one location can be superposed; the superposed result is called the potential value. The nuclear-form field strength at x of the data field generated by y is calculated as follows:

$$\varphi(x) = m \, e^{-\left(\frac{\lVert x - y \rVert}{\sigma}\right)^{2}} \qquad (1)$$

The first derivative of the field strength at x of the data field generated by y is:

$$F(x) = (y - x) \cdot m \cdot e^{-\left(\frac{\lVert x - y \rVert}{\sigma}\right)^{2}} \qquad (2)$$

where $m$ denotes the mass of the data object $y$. The impact factor $\sigma$ is a parameter that controls the distance of mutual influence between two data objects. With an appropriate impact factor, the data field model describes the data distribution rather well. The clustering centers are the points with local maximum potential values, where the potential is

$$\varphi(x) = \frac{1}{|H|} \sum_{i=1}^{n} m_i \times K\!\left(H^{-1}(x - X_i)\right) \qquad (3)$$

where $K(x)$ is a multivariate potential function, $H$ is a positive-definite $d \times d$ matrix, and $n$ is the feature number of $D$. $H$ is a non-singular constant matrix.
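The superposition described above can be illustrated with a minimal sketch. It assumes the nuclear-form potential m·exp(−(‖x − y‖/σ)²) and the matching field strength (y − x)·m·exp(−(‖x − y‖/σ)²); the function names `potential` and `field_strength`, the sample points, and the value of σ are illustrative choices, not taken from the paper.

```python
import math

def potential(x, sources, masses, sigma):
    """Superposed nuclear-form potential at x from every source (assumed form)."""
    total = 0.0
    for y, m in zip(sources, masses):
        total += m * math.exp(-(math.dist(x, y) / sigma) ** 2)
    return total

def field_strength(x, sources, masses, sigma):
    """Superposed field strength at x; each term points from x toward its source."""
    force = [0.0] * len(x)
    for y, m in zip(sources, masses):
        weight = m * math.exp(-(math.dist(x, y) / sigma) ** 2)
        for k in range(len(x)):
            force[k] += (y[k] - x[k]) * weight
    return force

# Hypothetical 2-D data objects with unit mass.
sources = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8)]
masses = [1.0, 1.0, 1.0]
sigma = 0.7

x = (0.2, 0.1)
print(potential(x, sources, masses, sigma))
print(field_strength(x, sources, masses, sigma))
```

With this form, the field strength equals the gradient of the potential up to the constant factor 2/σ², so a point where it vanishes is a stationary point of the potential and a candidate clustering center.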
Let $h_j$ be the $j$th element of $H$, with ℎ = ∑ ( ( ) − ( )) . The data field measurement is calculated by a grid-based importance measure algorithm. The potential function of the data points is estimated within each grid. The potential values of each data point are calculated in different grids, and all the potential values are then integrated to calculate the weight of each grid.

3.1 The steps of crack clustering

The input and output of this method are as follows.
Input: feature vectors of the multidimensional data items, user-specified grid number, user-specified impact factor parameter, noise threshold function (optional).
Output: clusters of cracks.

The method is divided into the following steps. First, the feature space is divided into grids, and each data point is put into a grid to form a grid-based data space. Second, an adaptive generalized data field is built to calculate the potential value of each data point, and the distribution of the potential values is calculated over the grids. Third, the clusters are computed according to the distribution of potential values over the feature space. Starting from the clustering center, the edge of a cluster is reached where the absolute value of the first-derivative potential value stops increasing.

In detail: partition the feature space into grids and assign data items to grids, initialize the impact factor, and calculate the potential value and the first-derivative value. The last step is searching for the clustering centers, that is, the points where the first-derivative potential value equals 0. Following these steps, the edges of cracks can be detected. Search the neighborhood of the clustering centers and mark all candidate grids. Filter the candidate grids and detect full clusters with the Flood-Fill algorithm.

Calculate the impact factor σ:

$$\sigma = \max_i (s_i) \times ifp \qquad (4)$$

where $s_i$ is the side length of each grid along dimension $i$ and $ifp$ is the user-specified impact factor parameter. All grids that contain a local maximum potential value are marked as candidate grids. Calculate the potential value and the first-derivative potential value according to the equations above.
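The grid steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each occupied grid is treated as a single source located at its center with mass equal to its point count, the potential uses the nuclear form exp(−(d/σ)²), σ is taken as the largest grid side length times `ifp`, and center grids are found by comparing neighboring potentials instead of evaluating the first derivative. The names `grid_index`, `grid_potentials`, and `center_grids` are hypothetical.

```python
import math
from itertools import product

def grid_index(point, lows, sides, n_grids):
    """Map a point to its grid cell, clamping the upper boundary into the last cell."""
    return tuple(min(int((p - lo) / s), n_grids - 1)
                 for p, lo, s in zip(point, lows, sides))

def grid_potentials(points, n_grids, ifp):
    """Partition the feature space, then compute a potential value for every grid."""
    dims = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dims)]
    highs = [max(p[k] for p in points) for k in range(dims)]
    # Side length per dimension; fall back to 1.0 if a dimension has zero extent.
    sides = [(hi - lo) / n_grids or 1.0 for lo, hi in zip(lows, highs)]
    sigma = max(sides) * ifp  # impact factor: largest side length times ifp

    # Assumption: an occupied grid acts as one source whose mass is its point count.
    counts = {}
    for p in points:
        idx = grid_index(p, lows, sides, n_grids)
        counts[idx] = counts.get(idx, 0) + 1

    def center(idx):
        return [lo + (i + 0.5) * s for i, lo, s in zip(idx, lows, sides)]

    potentials = {}
    for cell in product(range(n_grids), repeat=dims):
        c = center(cell)
        potentials[cell] = sum(
            mass * math.exp(-(math.dist(c, center(src)) / sigma) ** 2)
            for src, mass in counts.items())
    return potentials, sigma

def center_grids(potentials, n_grids):
    """Return grids whose potential is a strict local maximum on every dimension."""
    centers = []
    for cell, value in potentials.items():
        is_peak = True
        for k in range(len(cell)):
            for step in (-1, 1):
                neighbor = list(cell)
                neighbor[k] += step
                if 0 <= neighbor[k] < n_grids and potentials[tuple(neighbor)] >= value:
                    is_peak = False
        if is_peak:
            centers.append(cell)
    return sorted(centers)

# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.1), (0.05, 0.0), (0.0, 0.1),
          (1.0, 1.0), (0.9, 0.9), (0.95, 1.0), (1.0, 0.9)]
potentials, sigma = grid_potentials(points, n_grids=4, ifp=1.0)
print(sigma)
print(center_grids(potentials, n_grids=4))
```

Requiring a local maximum on every dimension mirrors the rule that a grid is a center grid only if it is a candidate on all dimensions; a full implementation would then grow each cluster outward from its center grid, for example with Flood-Fill.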
A grid $j$ is a candidate grid on dimension $i$ if

$$F_i^{\,j-1} > 0 \;\&\&\; F_i^{\,j+1} < 0, \quad 1 \le j < n - 1 \qquad (5)$$

where $F_i^{\,j}$ is the first-derivative potential value of grid $j$ on dimension $i$. A candidate grid can be a grid that contains a clustering center (center grid for short) only if it is a candidate grid on all dimensions. For each center grid $Center = (v_1, v_2, \ldots, v_d)$, search its neighborhood and mark the grids M that satisfy: ≥ (6) where ,1 ≤ ≤ , 1 ≤ ≤ is the location of the grid on dimension i of the feature space, and satisfies the same condition as well. Calculate the noise threshold t using the noise threshold function [ ( )], where is the quantity of data objects in the grids inside the edges. Filter all grids whose