key: cord-020814-1ty7wzlv
authors: Berrendorf, Max; Faerman, Evgeniy; Melnychuk, Valentyn; Tresp, Volker; Seidl, Thomas
title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned
date: 2020-03-24
journal: Advances in Information Retrieval
DOI: 10.1007/978-3-030-45442-5_1
sha:
doc_id: 20814
cord_uid: 1ty7wzlv

In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches, and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper, and after a thorough audit of the code provided by the authors, we concluded that their implementation differs from the architecture described in the paper. In addition, several tricks are required to make the model work, and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects that these tricks and architecture changes have on the final performance. Furthermore, we examine current evaluation approaches and systematize available benchmark datasets. We believe that people interested in KG matching might profit from our work, as well as novices entering the field. (Code: https://github.com/Valentyn1997/kg-alignment-lessons-learned)

Introduction. The success of information retrieval in a given task critically depends on the quality of the underlying data. Another issue is that in many domains knowledge bases are spread across various data sources [14], and it is crucial to be able to combine information from different sources. In this work, we focus on knowledge bases in the form of Knowledge Graphs (KGs), which are particularly suited for information retrieval [17]. Joining information from different KGs is non-trivial, as there is no unified schema or vocabulary. The goal of the entity alignment task is to overcome this problem by learning a matching between entities in different KGs. In the typical setting, some of the alignments are known in advance (seed alignments), and the task is therefore supervised. More formally, we are given graphs $G_L = (V_L, E_L)$ and $G_R = (V_R, E_R)$ with a seed alignment $A = \{(l_i, r_i)\}_i \subseteq V_L \times V_R$. It is commonly assumed that an entity $v \in V_L$ can match at most one entity $v' \in V_R$. Thus, the goal is to infer alignments for the remaining nodes only (a minimal sketch of this matching step is given at the end of this section).

Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignment in KGs [3, 6, 22, 24, 27]. In this paper, we thoroughly analyze one of the first GCN-based entity alignment methods, GCN-Align [22]. Since the other methods we are studying can be considered as extensions of this first paper and have a similar architecture, our goal is to understand the importance of its individual components and architecture choices. In summary, our contribution is as follows:

1. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation.
2. We perform an ablation study to demonstrate the individual components' contribution.
3. We apply the method to numerous additional datasets of different sizes to investigate the consistency of results across datasets.
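To make the task setting concrete before reviewing related work, the following minimal sketch (toy data and function names are our own illustration, not taken from any of the papers discussed) shows the inference step shared by the embedding-based methods below: embed the entities of both graphs into a common space and match each left entity to its nearest right neighbor.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 16                                   # embedding dimension
emb_left = rng.normal(size=(5, d))       # embeddings for entities in V_L (toy data)
emb_right = rng.normal(size=(8, d))      # embeddings for entities in V_R (toy data)

def nearest_neighbor_alignment(emb_l: np.ndarray, emb_r: np.ndarray) -> np.ndarray:
    """Match each left entity to its closest right entity (Euclidean distance)."""
    # Pairwise squared distances via broadcasting, shape (|V_L|, |V_R|).
    diff = emb_l[:, None, :] - emb_r[None, :, :]
    distances = (diff ** 2).sum(axis=-1)
    # Under the 1:1 assumption, a greedy argmin per row is the simplest decoder.
    return distances.argmin(axis=1)

predicted = nearest_neighbor_alignment(emb_left, emb_right)
print(predicted)  # for each entity in V_L, the index of its predicted match in V_R
```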
Related Work. In this section, we review previous work on entity alignment for Knowledge Graphs and revisit the current evaluation process. We believe that this is useful for practitioners, since we discovered some pitfalls, especially when implementing evaluation scores and selecting datasets for comparison. An overview of methods, datasets and metrics is provided in Table 1.

Methods. While the problem of entity alignment in Knowledge Graphs has historically been tackled by designing vocabularies which are as broad as possible and establishing them as a standard, recent approaches take a more data-driven view. Early methods use classical knowledge graph link prediction models such as TransE [2] to embed the entities of the individual knowledge graphs using an intra-KG link prediction loss, and differ in what they do with the aligned entities. For instance, MTransE [5] learns a linear transformation between the embedding spaces of the individual graphs using an $L_2$ loss. BootEA [19] adopts a bootstrapping approach and iteratively labels the most likely alignments to utilize them for further training. In addition to the alignment loss, embeddings of aligned entities are swapped regularly to calibrate the embedding spaces against each other. SEA [15] learns a mapping between the embedding spaces in both directions and additionally adds a cycle-consistency loss. Thereby, the distance between the original embedding of an entity and the result of translating this embedding to the opposite space and back again is penalized. IPTransE [26] embeds both KGs into the same embedding space and uses a margin-based loss to enforce the embeddings of aligned entities to become similar. RSN [8] generates sequences using different types of random walks which can move between graphs when visiting aligned entities. The generated sequences are fed to an adapted recurrent model. JAPE [18], KDCoE [4], MultiKE [25] and AttrE [20] utilize attributes available for some entities and additional information like the names of entities and relationships. Graph Convolutional Network (GCN) based models [3, 6, 22, 24, 27] have in common that they use a GCN to create node representations by aggregating node representations together with representations of their neighbors. Most GCN approaches do not distinguish between different relations and either consider all neighbors equally [6, 22, 24] or use attention [3] to weight the representations of the neighbors for the aggregation.

Datasets. The datasets used by entity alignment methods are generally based on large-scale open-source data sources such as DBpedia [1], YAGO [13], or Wikidata [23]. While there is the DWY-100K dataset, which comprises 100K aligned entities across the three aforementioned individual knowledge graphs, most of the datasets, such as DBP15K or WK3l, are generated from a single multi-lingual database. There, subsets are formed according to a specific language, and entities which are linked across languages are used as alignments. A detailed description of the most-used datasets can be found in Table 2. As an interesting observation, we found that all papers which evaluate on DBP15K do not evaluate on the full DBP15K dataset (which we refer to as DBP15K (full)), but rather use a smaller subset provided by the authors of JAPE [18] in their GitHub repository, which we call DBP15K (JAPE). The smaller subsets were created by selecting a portion of entities (around 20K of 100K) which are popular, i.e. appear in many triples as head or tail. The number of aligned entities stays the same (15K). As [18] only reports the dataset statistics of the larger dataset and does not mention the reduction of the dataset, subsequent papers also report the statistics of the larger dataset, although the experiments use the smaller variant [3, 18, 19, 22, 26]. As the metrics rely on absolute ranks, the numbers are better than on the full dataset (cf. Table 3).

Scores. It is common practice to only consider the entities being part of the test alignment as potential matching candidates. Although we argue that ignoring entities exclusive to a single graph as potential candidates does not reflect well the real-world scenario, we follow this practice to remain comparable with previously published results.
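To make this evaluation protocol explicit, here is a minimal sketch of Hits@k under the candidate restriction just described, i.e. only entities occurring in the test alignment are considered as matching candidates. The function name and data layout are our own assumptions, not the evaluation code of any particular paper.

```python
import numpy as np

def hits_at_k(emb_l, emb_r, test_alignment, k=1):
    """Hits@k where only entities from the test alignment are candidates."""
    left = np.array([l for l, _ in test_alignment])
    right = np.array([r for _, r in test_alignment])
    # Distances from each test left entity to the restricted candidate set.
    diff = emb_l[left][:, None, :] - emb_r[right][None, :, :]
    distances = (diff ** 2).sum(axis=-1)       # shape (n_test, n_test)
    # 0-based rank of the true counterpart among the candidates.
    order = distances.argsort(axis=1)
    truth = np.arange(len(test_alignment))     # row i's true match is column i
    ranks = (order == truth[:, None]).argmax(axis=1)
    return float((ranks < k).mean())
```

For example, hits_at_k(emb_left, emb_right, test_pairs, k=1) returns the fraction of test entities whose true counterpart is the closest candidate.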
Table 2. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. For WK3l, the alignment is provided as a directed mapping on an entity level. However, there are additional triple alignments. Following a common practice, as e.g. in [15], we can assume that an alignment should be symmetric and that we can extract entity alignments from the triple alignments. Thereby, we obtain the number of alignments given in brackets.

GCN-Align. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space. Each entity $i$ is associated with structural features $h_i \in \mathbb{R}^d$, which are initialized randomly and updated during training. The features of all entities in a single graph are combined into the feature matrix $H$. Subsequently, a two-layer GCN is applied. A single GCN layer is described by

$H^{(i+1)} = \sigma\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(i)} W^{(i)}\right)$

where $\hat{A} = A + I$ is the adjacency matrix with added self-loops and $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ is the diagonal node degree matrix. The input of the first layer is set to $H^{(0)} = H$, and $\sigma$ is a non-linear activation function, chosen as ReLU. The output of the last layer is considered as the structural representation, denoted by $s_i = H^{(2)}_i \in \mathbb{R}^d$. Both graphs are equipped with their own node features, but the convolution weights $W^{(i)}$ are shared across the graphs.

The adjacency matrix is derived from the knowledge graph by first computing a score, called functionality, for each relation $r$ as the ratio $\alpha_r$ between the number of different entities which occur as head and the number of triples in which the relation occurs. Analogously, the inverse functionality $\alpha'_r$ is obtained by replacing the numerator by the number of different tail entities. The final adjacency matrix is obtained as

$A_{ij} = \sum_{(e_i, r, e_j)} \alpha'_r + \sum_{(e_j, r, e_i)} \alpha_r$

where the sums run over all triples containing the respective entity pair. Note that, analogously to the structural features, GCN-Align is able to process attributes and integrate them into the final representation. However, since the attributes have little effect on the final score, and to be consistent with other GNN models, we focus here only on the structural representations.

Implementation Specifics. The code provided by the authors differs in a few aspects from the method described in the paper. First, when computing the adjacency matrix, fun(r) and ifun(r) are set to at least 0.3. Second, the node embeddings are initialized with values drawn from a normal distribution with variance $n^{-1/2}$, where $n$ is the number of nodes. Additionally, the node features are always normalised to unit Euclidean length before passing them into the network. Finally, there are no convolution weights. This means that the whole GCN does not contain a single parameter, but is just a fixed function on the learned node embeddings.
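To illustrate the adjacency construction described above, the following sketch computes fun(r) and ifun(r) and assembles the weighted adjacency matrix, including the clamping at 0.3 from the authors' code. It is a reconstruction from the textual description under our own assumptions (triples given as integer index tuples), not the authors' implementation.

```python
from collections import defaultdict
import numpy as np

def functionality_adjacency(triples, num_nodes, clamp=0.3):
    """Adjacency weighted by functionality fun(r) and inverse functionality
    ifun(r), both clamped from below at 0.3 as in the authors' code."""
    heads, tails, n_triples = defaultdict(set), defaultdict(set), defaultdict(int)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
        n_triples[r] += 1
    # fun(r) = #distinct heads / #triples; ifun(r) = #distinct tails / #triples.
    fun = {r: max(len(heads[r]) / n_triples[r], clamp) for r in n_triples}
    ifun = {r: max(len(tails[r]) / n_triples[r], clamp) for r in n_triples}
    A = np.zeros((num_nodes, num_nodes))
    for h, r, t in triples:
        A[h, t] += ifun[r]  # triple direction, weighted by inverse functionality
        A[t, h] += fun[r]   # reverse direction, weighted by functionality
    return A

# Toy usage: triples as (head, relation, tail) index tuples.
A = functionality_adjacency([(0, 0, 1), (2, 0, 1), (1, 1, 2)], num_nodes=3)
```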
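The implementation specifics above amount to a parameter-free forward pass. The following sketch puts them together: initialization with variance $n^{-1/2}$, input features normalized to unit Euclidean length, and two propagation steps without convolution weights. The symmetric flag also covers the row normalization $\hat{D}^{-1}\hat{A}$ used in the simplified variant described in the next section. Again, this is a sketch under our assumptions, not the authors' code.

```python
import numpy as np

def normalized_adjacency(A, symmetric=True):
    """Add self-loops; symmetric D^{-1/2} Â D^{-1/2} (paper) or row-wise D^{-1} Â."""
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1)
    if symmetric:
        d_inv_sqrt = deg ** -0.5
        return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return A_hat / deg[:, None]

def weightless_gcn(A, H, num_layers=2, symmetric=True):
    """Forward pass as in the authors' code: no weight matrices W^{(i)},
    ReLU activation, input features normalized to unit Euclidean length."""
    H = H / np.linalg.norm(H, axis=1, keepdims=True)
    A_norm = normalized_adjacency(A, symmetric=symmetric)
    for _ in range(num_layers):
        H = np.maximum(A_norm @ H, 0.0)  # H^{(i+1)} = ReLU(A_norm @ H^{(i)})
    return H

# Node embeddings initialized with variance n^{-1/2}, i.e. standard deviation
# n^{-1/4}, as in the authors' code (a standard normal performed better in our study).
n, d = 1000, 200
H0 = np.random.default_rng(0).normal(scale=n ** -0.25, size=(n, d))
```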
Experiments. In initial experiments, we were able to reproduce the results reported in the paper using the implementation provided by the authors. Moreover, we are able to reproduce the results using our own implementation and settings adjusted to the authors' code. In addition, we replaced the adjacency matrix based on functionality and inverse functionality by a simpler version, where $A_{ij} = 1$ if the entities $e_i$ and $e_j$ are connected by a triple in either direction, and $A_{ij} = 0$ otherwise. We additionally use $\hat{D}^{-1}\hat{A}$ instead of the symmetric normalization. In total, we see no difference in performance between our simplified adjacency matrix and the authors' one. We identified two aspects which affect the model's performance: not using convolution weights, and normalizing the variance when initializing the node embeddings. We provide empirical evidence for this finding across numerous datasets. Our results regarding Hits@1 (H@1) are summarised in Table 3.

Node Embedding Initialization. Comparing the columns of Table 3, we can observe the influence of the node embedding initialization. Using the settings from the authors' code, i.e. not using weights, choosing a variance of $n^{-1/2}$ actually results in inferior performance in terms of H@1, as compared to using a standard normal distribution. These findings are consistent across datasets.

Convolution Weights. The first column of Table 3 corresponds to the weight usage and initialization settings used in the code for GCN-Align. We achieve slightly better results than published in [22], which we attribute to a more exhaustive parameter search. Interestingly, all best configurations use the Adam optimizer instead of SGD. Adding convolution weights degrades the performance across all datasets and subsets thereof but one, as witnessed by comparing the first two columns with the last two columns.

Conclusion. In this work, we reported our experiences when implementing the Knowledge Graph alignment method GCN-Align. We pointed out important differences between the model described in the paper and the actual implementation and quantified their effects in the ablation study. For future work, we plan to include other methods for entity alignment in our framework.
References

[1] DBpedia: a nucleus for a web of open data
[2] Translating embeddings for modeling multi-relational data
[3] Multi-channel graph neural network for entity alignment
[4] Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment
[5] Multilingual knowledge graph embeddings for cross-lingual knowledge alignment
[6] Deep graph matching consensus
[7] Neural message passing for quantum chemistry
[8] Learning to exploit long-term relational dependencies in knowledge graphs
[9] Semi-supervised classification with graph convolutional networks
[10] Proceedings of the 57th Conference of the Association for Computational Linguistics (ACL 2019)
[11] Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019)
[12] Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018)
[13] YAGO3: a knowledge base from multilingual wikipedias
[14] A review of relational machine learning for knowledge graphs
[15] Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference
[16] Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017)
[17] Introducing the knowledge graph: things, not strings
[18] Cross-lingual entity alignment via joint attribute-preserving embedding
[19] Bootstrapping entity alignment with knowledge graph embedding
[20] Entity alignment between knowledge graphs using attribute embeddings
[21] Graph attention networks
[22] Cross-lingual knowledge graph alignment via graph convolutional networks
[23] Wikidata: a free collaborative knowledgebase
[24] Cross-lingual knowledge graph alignment via graph matching neural network
[25] Multi-view knowledge graph embedding for entity alignment
[26] Iterative entity alignment via joint knowledge embeddings
[27] Neighborhood-aware attentional representation for multilingual knowledge graphs

Acknowledgements. This work has been funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A and by the Bavarian Ministry for Economic Affairs, Infrastructure, Transport and Technology through the Center for Analytics-Data-Applications (ADA-Center) within the framework of "BAYERN DIGITAL II". The authors of this work take full responsibility for its content.