February 08, 2024
Knowledge Graphs (KGs) have become increasingly common for representing large-scale linked data. However, their immense size has required graph learning systems to assist humans in analysis, interpretation, and pattern detection. While there have been promising results for researcher and clinician empowerment through a variety of KG learning systems, we identify four key deficiencies in state-of-the-art graph learning that simultaneously limit KG learning performance and diminish the ability of humans to interface optimally with these learning systems. These deficiencies are: 1) lack of expert knowledge integration, 2) instability to node degree extremity in the KG, 3) lack of consideration for uncertainty and relevance while learning, and 4) lack of explainability. Furthermore, we characterise state-of-the-art attempts to solve each of these problems and note that each attempt has largely been isolated from attempts to solve the other problems. Through a formalisation of these problems and a review of the literature that addresses them, we adopt the position that not only are deficiencies in these four key areas holding back human-KG empowerment, but that the divide-and-conquer approach to solving these problems as individual units rather than as a whole is a significant barrier to the interface between humans and KG learning systems. We propose that it is only through integrated, holistic solutions to the limitations of KG learning systems that human and KG learning co-empowerment will be efficiently effected. Finally, we present our "Veni, Vidi, Vici" framework, which sets a roadmap for effectively and efficiently shifting to a holistic co-empowerment model in both the KG learning and the broader machine learning domain.
Knowledge Graphs, Knowledge Graph Embedding, Relational Learning, Neuro-Symbolic Learning, GNNs
Knowledge Graphs (KGs) are semantic data stores that model data as a set of nodes and the relations between them [1]. The atomic unit of knowledge in a knowledge graph is the triple, which consists of a single labelled head node, a labelled tail node, and a directed, labelled edge that relates the head to the tail. For example, the fact "Sauron created the One Ring" could be written as (Sauron, created, One-Ring).
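As a minimal sketch of the triple representation described above, a KG can be held in memory as a set of (head, relation, tail) records. The `Triple` type and the second fact (Frodo, carries, One-Ring) are illustrative assumptions, not part of any dataset discussed here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic fact: a labelled head, a directed labelled edge, a labelled tail."""
    head: str
    relation: str
    tail: str


# A toy KG is simply a set of triples (the second fact is a made-up example).
kg = {
    Triple("Sauron", "created", "One-Ring"),
    Triple("Frodo", "carries", "One-Ring"),
}

# The node and relation vocabularies follow directly from the triples.
entities = {t.head for t in kg} | {t.tail for t in kg}
relations = {t.relation for t in kg}
```

Storing triples in a set makes membership checks ("is this fact in the graph?") direct, which is the basic operation that link prediction tries to generalise to unseen facts.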
The graph structure of KGs lends them very naturally to a variety of intrinsically networked datasets, such as social networks, computer networked systems, and biomedical drug-gene interaction networks. However, KG data has grown to a size that precludes human analysis without computational assistance. For example, Hetionet, a common benchmark dataset for drug repurposing in bioinformatics, contains 2.25M triples – far more than a human can analyse alone [2]. Other large KGs can range from 1M to 21M triples [3], [4].
In order to empower human use of these massive KGs, various KG learning systems have been proposed to detect patterns and predict new information in the domain of a given KG. Knowledge Graph Embeddings (KGEs) use machine learning to attempt to represent the semantics of a KG in vector space [1], [5], [6]. Other KG learning approaches include logical and rule-based methods [7]–[9], path-based methods [7], [10], and Graph Neural Networks (GNNs) [11]–[14].
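To make "representing the semantics of a KG in vector space" concrete, the sketch below scores a triple in the style of TransE, which treats a relation as a translation so that head + relation should land near tail. The two-dimensional vectors are hand-picked for illustration, not trained embeddings, and the function name `transe_score` is our own.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is judged more plausible.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hand-picked toy vectors (assumption: real embeddings are learned from data).
sauron = [0.0, 1.0]
created = [1.0, 0.0]
one_ring = [1.0, 1.0]   # exactly sauron + created
mordor = [3.0, -2.0]    # far from sauron + created

plausible = transe_score(sauron, created, one_ring)
implausible = transe_score(sauron, created, mordor)
```

With these vectors, (Sauron, created, One-Ring) scores higher than (Sauron, created, Mordor), which is how a trained model would rank candidate tails during link prediction.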
KG learning systems have made significant progress in assisting with research tasks, such as drug repurposing [7], [15] and drug-drug interaction prediction [16], [17]. However, recent literature has identified some major shortcomings of the current state-of-the-art approaches to KG learning systems.
The first shortcoming is that most existing KG learning systems do not account for the logical structure of the data, which means that these systems typically cannot use expert-curated ontologies even when they are available [1], [5], [6], [12], [18]. Moreover, applications of KGEs to real-world problems, notably including high-potential-impact uses such as drug repurposing and predictions about drugs and diseases in the biomedical context, mostly rely on methods that make no attempt to model the logical structure of the data [19]–[22]. This means that many of the logical and causal inferences that humans care about most are not well accounted for, either in theory or in practice.
Moreover, KG learning systems have been shown to suffer from biases due to varying distributions of node degrees: high-degree nodes are generally learned much more reliably and with much greater embedding quality than low-degree nodes across various KG learning methods [23]–[25]. Despite their generally superior embedding quality, high-degree nodes can also have detrimental effects on KG learning systems [15], [23], to the point that removing high-degree domain-relevant data from the KG can boost performance [15].
KG learning systems also often lack the focus that is desired by human researchers and clinicians: while in many cases KG learning systems are only used for predicting certain types of facts in a graph, those models are trained to predict all types of facts with equal emphasis on each one [15]. Most widely-used KG learning systems do not calculate or use uncertainty or relevance scores for triples, which further prevents the model from focusing on the most relevant and most reliable information [1], [5], [6], [26].
Finally, most KG learning systems were not created or evaluated for explainability [5]–[7], [10], [12], [24], [25], [27]. Some logical methods do allow explainability [7], [8], [28]; however, with few exceptions, these logical approaches generally cannot handle large KGs [7]. Moreover, there is no agreed-upon definition or concrete metric for explainability for KG learning systems, which makes comparison of progress in this area a challenge in itself [27], [29]. As a result, not only are KG learning systems opaque to humans, but critically detecting the presence and cause of the other 3 problems is severely hindered by the black-box nature of most KG learning systems.
Individually, these problems are each a major setback for human-AI empowerment and advancements in the realm of KGs. Taken together, they present a bleak outlook for the use of KG learning techniques and suggest an urgent need for corrective innovation and development within the field.
In this position paper, we first formalise definitions and nomenclatures for each of these four problems: lack of expert knowledge integration, instability to extreme topological variation, inability to perform focused learning, and lack of explainability. We present the details of each problem and give a review of state-of-the-art techniques that attempt to address each one. Our analysis leads us to the conclusion that the state-of-the-art is deficient not only because of a general lack of robust answers to these problems, but because existing solutions tend to address only one of these problems while ignoring the others. We conclude by presenting our "Veni, Vidi, Vici" framework for how to represent and consider diverse problems in the same conceptual space to best empower humans with robust, interpretable, and reliable learning systems. Finally, we propose that use of the "Veni, Vidi, Vici" model will allow researchers to not only greatly advance KG learning systems past these major challenges, but also fundamentally strengthen the machine learning-human interface in many different domains.
We first define expert knowledge as follows: expert knowledge is knowledge about the logic, dependency patterns, pragmatics, inferences, and analytic approaches of a specific domain. From this, we formally define expert knowledge integration as the act of making a machine learning model explicitly aware of expert knowledge in how it models tasks and/or data. While in different domains this will have widely different manifestations, in the domain of KG learning systems it can be presented in a very straightforward manner. We can say, without loss of generality, that a KG learning system expresses expert knowledge integration if and only if it is able to recognise different information in graphs with the exact same topology.
For example, take Figure 1. To a human, it is obvious that if Gimli is a friend of Legolas and Legolas is an enemy of Sauron, then Gimli should also be an enemy of Sauron. On the other hand, there is no reason to assume a direct connection between Gandalf and Mithril Armour based on the information given in the left-hand graph. The two graphs are topologically identical, but very distinct in information content.
 
Modern KGEs, such as TransE, ComplEx, DistMult, RotatE, and ConvE [5], [6], [9], as well as rule mining systems such as AMIE+ [28], GNNs [12]–[14], and path-based KG learning systems [10], cannot take this expert information into account and would learn these two graphs identically. Using this example as a guide, we formalise the problem of expert knowledge integration as follows: By what methods is it possible to teach a KG learning system to distinguish knowledge with identical topology but variable semantics? We call methods that attempt to distinguish such graphs "Expert Knowledge Methods".
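The notion of "identical topology but variable semantics" can be illustrated with a small sketch. Since Figure 1 is not reproduced here, the second graph below (Gandalf, Bilbo, Mithril-Armour and the `owns` relation) is a hypothetical stand-in, and `unlabelled_shape` is a simple helper for this toy case rather than a general graph isomorphism test.

```python
def unlabelled_shape(triples):
    """Replace node names with indices (by first appearance) and drop relation labels.

    Note: this only compares edge lists given in matching order; it is a toy
    check, not a general isomorphism algorithm.
    """
    index = {}
    shape = []
    for head, _relation, tail in triples:
        for node in (head, tail):
            index.setdefault(node, len(index))
        shape.append((index[head], index[tail]))
    return shape


graph_a = [("Gimli", "friend-of", "Legolas"), ("Legolas", "enemy-of", "Sauron")]
# Hypothetical second graph with the same two-hop chain structure.
graph_b = [("Gandalf", "friend-of", "Bilbo"), ("Bilbo", "owns", "Mithril-Armour")]

same_topology = unlabelled_shape(graph_a) == unlabelled_shape(graph_b)
same_relations = [r for _, r, _ in graph_a] == [r for _, r, _ in graph_b]
```

Here `same_topology` is true while `same_relations` is false: both graphs are two-hop chains, but only the first licenses the inference that the first node is an enemy of the last. Knowing which chains license which inferences is exactly the domain knowledge that an Expert Knowledge Method would need to be given.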
Note: although rule-mining and query-based approaches are at times discussed as if they contained expert knowledge, rule-based systems such as AMIE+ and IterE [28], [30] and query-answering methods that allow logical operations in queries (such as Query2box and CQD) [31], [32] are not Expert Knowledge Systems. We exclude them because none of them can distinguish the two graphs in Figure 1; i.e., their reasoning derives entirely from topological patterns, not from domain expertise or expert knowledge.
Modern literature has principally focused on two main methods for creating expert-knowledge aware systems: use of ontological rules [8], [9], [18], [33]–[35] and use of non-ontological domain expertise [7], [15].
Ontology-based methods incorporate background knowledge expressed as ontological rules at various stages of the learning pipeline. For example, Alshahrani et al. use an ontology to complete an existing KG before attempting to learn on it, meaning that all rules inferred by the ontology are explicitly present in the graph [33]. In contrast, DL-Learner mines rules from a KG and uses them to refine or expand an existing ontology, taking the initial form of that ontology into account [8]. Injecting ontologies into training is also done, either to create negative examples to learn from [18] or to explicitly model logical relationships during training [34], [35]. A more complete analysis of the stages of ontology integration and ontology-guided learning can be found in [9].
A second, more recent approach to integrating expert knowledge is using "meta-paths" procured by domain experts to guide learning along specific parts of a KG. Meta-paths are paths of entities and relationships in a KG that are identified not by the identity of nodes along them, but by their broader type. For example, (Gandalf, friends-with, Aragorn) would have the meta-path (Person, friends-with, Person). These meta-paths allow experts to create rules that graph learners can use to bias their training. For example, PoLo uses expert-curated meta-paths to guide a path-based reinforcement learner on KGs [7]. Ratajczak et al. use expert-curated meta-paths to prune out information from a graph that experts would not consider relevant for specific learning tasks [15].
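The mapping from a concrete path to its meta-path can be sketched as follows. The type assignments, the `wields` relation, and the `Weapon` type are assumptions made for illustration, and the filtering step is a generic sketch, not the procedure of PoLo or of Ratajczak et al.

```python
# Hypothetical node-type assignments for a toy KG.
node_types = {"Gandalf": "Person", "Aragorn": "Person", "Anduril": "Weapon"}


def meta_path(path, node_types):
    """Map a path [n0, r0, n1, r1, n2, ...] to its meta-path.

    Nodes (even positions) are replaced by their type; relations are kept.
    """
    return tuple(
        node_types[item] if position % 2 == 0 else item
        for position, item in enumerate(path)
    )


paths = [
    ["Gandalf", "friends-with", "Aragorn"],
    ["Aragorn", "wields", "Anduril"],
]

# Meta-paths an expert has (hypothetically) marked as relevant.
expert_meta_paths = {("Person", "friends-with", "Person")}

# Keep only the concrete paths whose meta-path the expert selected.
relevant_paths = [p for p in paths if meta_path(p, node_types) in expert_meta_paths]
```

Because a meta-path abstracts away node identity, a single expert-written pattern covers every concrete path of that type in the graph.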
The second issue facing KG learning systems to date is their inability to handle extremities in node degree and topological variation. We formally define this problem as follows: handling topological variation means capturing the semantics of a node or edge equally well regardless of its local connectivity patterns. An example of this can be seen in Figure 2. Both graphs contain identical semantic information. However, the topology of the graphs varies, which means that the degree of each node is greatly reduced in the right-hand graph as opposed to the left-hand one.
 
While the fact that Gandalf is an enemy of Sauron is implicit in the graph and obvious to a human, it is not labelled explicitly in the graph. As the remainder of this section will show, this means that traditional KG learning methods will be much less apt to learn that Gandalf and Sauron are enemies. Using this example as a guide, we formalise the problem of handling topological variation as follows: By what methods is it possible to teach a KG learning system to robustly and consistently learn knowledge with distinct topology but identical semantics? We call methods that attempt to do so "connectivity-tolerant methods".
Topological imbalance in KGs has a variety of negative effects on learning using traditional KGE models, GNNs, and path-based KG learning systems [15], [23]–[25]. Low-degree nodes embed at a much lower quality relative to high-degree nodes [24], and high-degree nodes are sometimes predicted as answers during inference simply because of their higher degree, not because of domain relevance [23]. So-called "super-hubs", or nodes with extremely high degree in the graph, also dilute information and hinder learning; this occurs even when those nodes are highly relevant to the given domain [15], [23].
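Degree imbalance of this kind is straightforward to measure before training. The sketch below counts node degrees in a toy graph and flags candidate super-hubs; the triples and the `hub_threshold` cut-off are arbitrary assumptions, since the works cited above do not prescribe a single threshold.

```python
from collections import Counter


def node_degrees(triples):
    """Count how many triples each node takes part in (as head or tail)."""
    degrees = Counter()
    for head, _relation, tail in triples:
        degrees[head] += 1
        degrees[tail] += 1
    return degrees


# Toy graph in which one node is connected to almost everything.
triples = [
    ("Sauron", "enemy-of", name)
    for name in ["Gandalf", "Frodo", "Aragorn", "Legolas", "Gimli"]
] + [("Frodo", "friends-with", "Samwise")]

degrees = node_degrees(triples)

hub_threshold = 4  # hypothetical cut-off; in practice this is dataset-specific
hubs = sorted(node for node, degree in degrees.items() if degree >= hub_threshold)
low_degree = sorted(node for node, degree in degrees.items() if degree == 1)
```

In this toy graph Sauron is the only hub, while most other nodes appear in a single triple. That is the situation in which low-degree nodes are reported to embed poorly and hubs are reported to be over-predicted.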
Existing solutions to topological imbalance include the work of Liu et al., which found that using meta-learning to make low-degree node embeddings more similar to high-degree node embeddings improved performance of KGEs on low-degree nodes [24]. The authors replicated this in GNNs with similar results [25]. A study by Tang et al. found that explicitly modelling node degree in the GNN allowed the model to more robustly learn embeddings for low-degree nodes [11]. However, even with these systems the core problem of poor representations for low-degree nodes remains at best mitigated, not solved, since modern KG learning paradigms rely principally on (a plurality of) local connections for learning [11], [24], [25].
The third issue facing KG learning systems is that, even though many KG learning-based tasks only have one type of prediction in mind, they attempt to learn all predictive tasks with equal strength [15]. Moreover, all triples in the input graph are generally considered equally relevant and true [5], [6]. This means that KG learning models cannot estimate how certain or relevant a fact is, nor use human-known uncertainty in how they model KG data. We formally define focused learning as follows: focused learning is learning that explicitly models which triples are less certain or less relevant during training.
We split this into two (somewhat overlapping) cases: dealing with uncertainty and dealing with relevance. For example, take Figure 3. On the left, we have a graph where scores are given representing how uncertain (near 0) or certain (near 1) a fact is. On the right, another graph contains triples with no annotations, but where it is possible that some connections are uncertain. For instance, if Frodo is friends with Samwise, and Samwise dislikes Gollum, we may be less certain that Gollum is friends with Frodo. Similarly, we might not care about predicting friendship, but about predicting who travels with whom. This would make only the Travels-with relationship directly relevant; other relationships would be of use only to the extent that they assist learning to predict triples about who travels with whom.
 
Using this example as a guide, we formalise the problem of explicitly modelling uncertainty as follows: By what methods is it possible to distinguish certainty and relevance of triples within a KG? We call methods that attempt to model uncertainty / relevance "focused learning methods".
One method for focused learning in the state-of-the-art is that taken by UKGE and FocusE for KGEs: to use uncertainty / relevance information given as additional triple-level labels in the KG, as is shown in the left of Figure 3 [26], [36]. Specifically, UKGE uses a set of out-of-band logical rules to model how uncertainties interact, and thus only works with uncertainty [36]. FocusE takes a broader approach: it directly uses numeric labels to modify the scoring layer of a KGE model with fewer assumptions, allowing it to model both uncertainty and relevance equally well [26].
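The general idea of letting triple-level numeric labels shape training can be sketched with a confidence-weighted loss. This is a generic illustration under our own assumptions; it is not the formulation used by UKGE or FocusE, and the function name `weighted_nll` and the example scores are hypothetical.

```python
import math


def weighted_nll(scores, weights):
    """Weighted negative log-likelihood over positive triples.

    Each triple's log-sigmoid loss is scaled by its label in [0, 1], so
    uncertain or irrelevant triples contribute less to the training signal.
    """
    total = 0.0
    for score, weight in zip(scores, weights):
        probability = 1.0 / (1.0 + math.exp(-score))
        total += -weight * math.log(probability)
    return total / sum(weights)


# Hypothetical model scores for two triples and their certainty labels.
scores = [2.0, -3.0]
certain_only = weighted_nll(scores, [1.0, 0.0])   # second triple is ignored
equal_weight = weighted_nll(scores, [1.0, 1.0])   # the usual unweighted case
```

With equal weights the poorly scored second triple dominates the loss. Setting its label to zero removes it entirely, which is the behaviour one would want for a triple known to be unreliable or irrelevant to the task.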
Uncertainty can also be modelled implicitly, in the absence of certainty or relevance labels in the graph [11]. One method for this is used in SL-DSGCN, a GNN method that models uncertainty using a Bayesian-based teacher-student model [11]. The Bayesian Neural Network is trained not only to teach a student network, but to also give uncertainty scores as it teaches [11]. These scores are not input to the model, but learned and created automatically by the Bayesian Network during training [11].
Finally, using expert-curated meta-path information can also help focus learning on those specific paths [7], [15]. Two recent applications are filtering methods that use meta-paths to select for only task-relevant information as a pre-processing step [15] and the PoLo model, which uses reinforcement learning and logical rules to explicitly reward learning done on the given meta-paths.
For our purposes, we define explainability in keeping with what Lipton calls "post-hoc interpretability", or the ability to explain why a model made a certain prediction [37]. Similarly, we say a method is explainable by design if it was created to natively give such explanations. Under this definition of explainability, almost all modern KGEs, GNNs, and path-based methods would not be explainable by design since their predictions do not provide any method for post-hoc explanation [5], [6], [10], [12].
 
Suppose we have a trained KG learning system that performs link prediction. As illustrated in Figure 4, it will attempt to predict if a triple is true (light green) or false (dark red); or in other words, distinguish why one graph structure is predicted to exist while the other is not. This leads us to formalise the problem of explanation of KG learning systems as follows: By what methods can we extract the reason(s) that a KG learning system predicted the presence of a certain graph structure or feature rather than any other? We call systems that implement these methods "Explainability Systems".
The first general approach to explainability is to provide a post-hoc explanation of a KG learning system that is not explainable by design, such as TransE [5], RDF2vec [10], or SL-DSGCN [11]. Methods for post-hoc interpretability of KGEs are generally based on estimating the influence of a triple on a given prediction [38]. This includes various Instance Attribution Methods such as Influence Functions and Instance and Gradient Similarity computation [39]. For GNNs, many post-hoc explanation systems are not GNN-specific [40], [41], although some, such as GNNExplainer, PGExplainer, and GraphMask explicitly take advantage of the triple-based structure of graphs to identify the most important triples for a prediction [40]. More detailed analyses can be found in [38], [40], [41].
The second approach is to create models that are explainable by design. This includes rule-based models such as AMIE+ and DL-Learner, which use symbolic logic to infer rules that are used to predict new statements in a graph [8], [28]. Since those rules are written in human-readable logical clauses, they are direct explanations for the models’ predictions. PoLo also falls in this category: it uses expert-curated rules (in the form of meta-paths) to create a policy-based model able to predict new triples [7].
In examining the four major challenges faced by modern KG learning systems, we observe that, while many papers attempt to solve each problem individually, none attempt to solve all four at once. The current state-of-the-art follows a divide-and-conquer methodology that results in tunnel vision with respect to specific problems with KG learning systems and a lack of attention to the broader systemic shortcomings of the field as a whole. To address this, we propose a new design methodology, called "Veni, Vidi, Vici", for helping human researchers to conceptualise, frame, and solve distinct problems as one. While we present this in the context of KG learning systems, it is potentially applicable to any case where multiple challenges exist within the same broader domain. An overview of this model and its use is shown in Figure 5.
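To illustrate why a human-readable rule doubles as an explanation, the sketch below applies a single hand-written rule, friend-of(x, y) and enemy-of(y, z) implies enemy-of(x, z), taken from the Gimli example earlier in this paper. The rule is hard-coded for illustration; it is not mined by AMIE+ or DL-Learner, and `apply_rule` is our own helper name.

```python
def apply_rule(triples):
    """Apply friend-of(x, y) AND enemy-of(y, z) => enemy-of(x, z).

    Returns a dict mapping each newly predicted triple to the two known
    triples that triggered the rule, which serve as its explanation.
    """
    facts = set(triples)
    predictions = {}
    for x, first_relation, y in facts:
        if first_relation != "friend-of":
            continue
        for middle, second_relation, z in facts:
            if second_relation == "enemy-of" and middle == y:
                predicted = (x, "enemy-of", z)
                if predicted not in facts:
                    predictions[predicted] = [
                        (x, first_relation, y),
                        (middle, second_relation, z),
                    ]
    return predictions


kg = [("Gimli", "friend-of", "Legolas"), ("Legolas", "enemy-of", "Sauron")]
explained = apply_rule(kg)
```

Each prediction comes paired with the facts that produced it, so the question "why was this triple predicted?" has a direct answer, in contrast to embedding-based scores.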
 
The "Veni, Vidi, Vici" approach occurs in three phases. The first, "Veni" (meaning "I came" in Latin), is to phrase all problems in a shared conceptual space. The second, "Vidi" (meaning "I saw" in Latin), is to deeply explore and analyse this shared conceptual space: for example, what information is lost when it splits into various sub-problems, and what elements of this space do human users care most about? The third and final, "Vici" (meaning "I conquered" in Latin), is to determine how to solve the underlying problem represented by this shared conceptual space as a single, atomic unit.
As an example of the use of the "Veni, Vidi, Vici" system, consider the four challenges to KG learning systems presented here. We carried out the first step, Veni, by taking each of these challenges and placing it in the same shared conceptual space of distinguishing elements of graphs. Our Vidi step was to examine each of these problems in the state-of-the-art and their limitations, especially in terms of the opportunity lost by treating each problem as distinct. Finally, our Vici step of creating, evaluating, and publishing a joint solution is left as a future direction – one that this methodology prepares us well to take.
We expand on these three phases below.
The Veni phase is characterised by asking "how?". Specifically, it asks: "How can I understand these problems in a shared conceptual space?" It aims to express each of the four issues – expert knowledge integration, wide topological variation, attention to certainty and relevance, and explainability – in the same terms.
The idea here is twofold. First, considering the problems in isolation will lead to less general and less generalisable solutions to advancing KGE systems, as outlined in the review above. Second, since there is a single common goal (improving the predictive performance of KGE systems), the various barriers to this goal should be expressible in common, unifying terms. The Veni step thus would not ask "To what extent can some KGE method reduce the detrimental effects of topological extremity in a KG?" – such a question, while valuable, only focusses on one part of the problem, and does not take a fully holistic approach.
Instead, it begins by looking for common ground in these problems. The review above highlights that the methods that seek to strongly and meaningfully integrate expert knowledge for inference on a graph (such as PoLo) tend to provide ways of estimating relevance and reducing the effect of extreme topological variation [7]. We also note that most logic-based methods tend to be more natively explainable than those that do not directly consider logic [5], [6], [8], [28].
As a result, one possible question in the Veni phase would be the following: "To what extent can path-based models of graph logic represent the relevant semantic content of a KG?" Here, the common term chosen is logic – most commonly supplied by an ontology for KGEs. It then asks how a logic-based method can represent content that is relevant and semantic, meaning that it should be able to distinguish different semantics in identical structures, and focus learning on the most important parts thereof. In other words, this is all about distinguishing graph elements, as outlined in the previous section – distinguishing the meaning of regions with the same structure (logic and semantics), distinguishing the relevance of distinct triples, and identifying the structures of a graph that actually encode similar information (logic and topological awareness).
Once this core question is obtained, the Veni step is complete – each problem has been presented in terms of the same basic building-blocks.
The Vidi phase is characterised by asking "why?". Specifically, it asks two questions: "Why did these problems originally separate?" and "How can I see what information was lost in that separation?" The purpose of this phase is to solidify understanding of the question asked in the Veni phase and, critically, to ensure that the common terms being used to represent the problems in the same conceptual space actually address the core issue of why those problems should be modelled as one, rather than as individual units.
The review conducted here provides a hypothesis to answer these questions for KGEs: that these problems originally separated because of the distinct deficiencies identified by different researchers, and because of the differing aims and purposes for which various groups developed and used KGEs. Similarly, the main loss was that each of these problems was solved for a very specific purpose, and that therefore general advancement of KGEs was second to advancement in the specific areas most pressing at the time the research was conducted. In other words, there remains a large region of untapped potential for KGE development in providing more general and generalisable solutions to KG learning.
The Vici phase is characterised by asking "what?". Specifically, it asks: "What goals can I set to holistically address the patterns I see?" This is the action step – the step where the goals set are directly actionable. The output here is a specific model or set of models, evaluation protocols, and tests to ensure that any proposed solution not only answers the main question asked in the Veni phase, but also actually improves on each of the distinct problems identified and recovers what was lost when each problem was taken separately (as identified in the Vidi phase).
The purpose of the Veni, Vidi, Vici model is to invite a new paradigm of developing KGEs – to focus on a holistic, broad goal rather than a variety of distinct sub-goals. Based on the existing literature and the review conducted here, it is the opinion of the authors that a holistic approach to furthering KGE development will provide a novel, and potentially very successful, method for approaching semantic modelling and KG representation.
It is critical to note that the proposed "Veni, Vidi, Vici" is not a linear model – it is expected that each phase will involve multiple steps of backtracking to the previous (or first) stage for iterative refinement. It also is meant to be used in any case where several challenges exist within a single broader domain, and has potential to drive innovation in many fields outside of KG learning systems, although exploration of how it can be applied in other domains is left as a future direction.
In this paper, we provide a survey of the state-of-the-art in KG learning systems and the challenges they face. We conclude that the current "divide and conquer" approach to advancing individual problems facing KG learning systems is actively detrimental to advancing KG learning systems. To address this, we propose the "Veni, Vidi, Vici" system that allows human experts to reconsider these problems in a common framework. This framework explicitly models human desiderata and aims to help guide development along the directions that human experts desire most. It simultaneously aims to ensure co-empowerment not only of all areas of learning systems, but also how humans can interface with those systems. Finally, we note that "Veni, Vidi, Vici" is broadly applicable to any domain with multiple sub-problems.