KDD ’19: Heterogeneous Graph Neural Network
🤗 Recommendation system paper challenge (28/50)
🤔 What problem do they solve?
They want to generate heterogeneous graph embeddings that capture both graph structure information and node content information.
😮 What are the challenges?
Few existing methods can jointly and effectively consider heterogeneous structural (graph) information and the heterogeneous content information of each node.
- Many nodes are not connected to all types of neighbors.
- A node can carry unstructured content.
- Different types of neighbors contribute differently to the node embedding.
😎 Overview of the models: HetGNN
They propose a heterogeneous graph neural network model to resolve this issue.
Specifically, they first introduce a random walk with restart strategy to sample a fixed-size set of strongly correlated heterogeneous neighbors for each node and group them by node type.
Next, they design a neural network architecture with two modules to aggregate the feature information of those sampled neighboring nodes. The first module encodes “deep” feature interactions of heterogeneous contents and generates a content embedding for each node.
The second module aggregates the content (attribute) embeddings of the different neighboring groups (types) and further combines them by considering the impact of each group to obtain the final node embedding.
Finally, they leverage a graph context loss and a mini-batch gradient descent procedure to train the model in an end-to-end manner.
Sampling Heterogeneous Neighbors (C1)
Most other GNN models have some issues:
- They cannot capture feature information from different types of neighbors.
- They are weakened by varying neighbor sizes.
- They are not suitable for aggregating heterogeneous neighbors that have different content features.
They propose a heterogeneous neighbor sampling strategy based on random walk with restart (RWR):
- RWR collects all types of neighbors for each node.
- The sampled neighbor size of each node is fixed, and the most frequently visited neighbors are selected.
- Neighbors of the same type (having the same content features) are grouped so that type-based aggregation can be applied.
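The sampling steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `rwr_sample_neighbors`, the parameters `restart_prob`, `num_steps` and `per_type_k`, and the toy academic graph are all assumptions made for the example. It also uses a fixed number of walk steps, whereas the paper's procedure may differ in how it bounds the walk.

```python
import random
from collections import Counter


def rwr_sample_neighbors(adj, node_type, start, per_type_k,
                         restart_prob=0.5, num_steps=1000, seed=0):
    """Sample type-grouped neighbors of `start` via random walk with restart.

    adj:        dict node -> list of neighboring nodes (hypothetical format)
    node_type:  dict node -> type label
    per_type_k: dict type -> number of neighbors to keep for that type
    """
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(num_steps):
        # With probability restart_prob (or at a dead end), jump back to start.
        if rng.random() < restart_prob or not adj[current]:
            current = start
            continue
        current = rng.choice(adj[current])
        if current != start:
            visits[current] += 1

    # Group by type, keeping the most frequently visited nodes of each type.
    grouped = {t: [] for t in per_type_k}
    for node, _count in visits.most_common():
        t = node_type[node]
        if t in grouped and len(grouped[t]) < per_type_k[t]:
            grouped[t].append(node)
    return grouped


# Toy academic graph (assumed data): authors write papers, papers appear in a venue.
adj = {
    "a1": ["p1", "p2"],
    "a2": ["p2", "p3"],
    "p1": ["a1", "v1"],
    "p2": ["a1", "a2", "v1"],
    "p3": ["a2", "v1"],
    "v1": ["p1", "p2", "p3"],
}
node_type = {"a1": "author", "a2": "author",
             "p1": "paper", "p2": "paper", "p3": "paper",
             "v1": "venue"}

neighbors = rwr_sample_neighbors(
    adj, node_type, "a1", per_type_k={"author": 1, "paper": 2, "venue": 1})
print(neighbors)
```

Because the result is a fixed number of neighbors per type, every node ends up with the same input shape for the type-based aggregation that follows, regardless of its actual degree.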
Encoding Heterogeneous Contents (C2)
A given node has neighbors of different types.
Input: 1 neighboring node
Output: 1 embedding
For each neighboring node, we would like to get its encoding, but different types of nodes have different contents. How do we aggregate the information together? Bi-LSTM!
We can apply a pre-trained model to get an embedding of each piece of content, feed these embeddings into a Bi-LSTM (to capture deep interactions), and then use mean pooling to aggregate the hidden states.
Note that the Bi-LSTM operates on an unordered content set.
The authors list three advantages of this content encoder:
(1) it has a concise structure with relatively low complexity (fewer parameters), making the model implementation and tuning relatively easy;
(2) it is capable of fusing the heterogeneous content information, leading to strong expressive power;
(3) it is flexible enough to add extra content features, making the model extension convenient.
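The content encoder (per-content embeddings, Bi-LSTM, mean pooling) can be written compactly as follows. This is my reading of the paper's formulation, and the notation is partly an assumption: C_v is the content set of node v, x_i is the pre-trained feature of the i-th content, FC is a feature transformer, and ⊕ is concatenation.

```latex
f_1(v) = \frac{1}{|C_v|} \sum_{i \in C_v}
  \Big[ \overrightarrow{\mathrm{LSTM}}\{\mathrm{FC}(x_i)\}
        \oplus
        \overleftarrow{\mathrm{LSTM}}\{\mathrm{FC}(x_i)\} \Big]
```

The average over all hidden states is the mean pooling step, so f_1(v) is a single fixed-size content embedding no matter how many contents a node carries.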
Aggregating Heterogeneous Neighbors (C3)
As in C1, given one node, we have its sampled neighboring nodes. Thanks to C2, each neighboring node can be encoded as one embedding.
But how do we aggregate these neighboring nodes together?
Group them by type and, for each type, run a Bi-LSTM!
Same Type Neighbors Aggregation
After grouping the neighboring nodes, each group contains several nodes of the same type, so we can run same-type neighbor aggregation.
We employ a Bi-LSTM to aggregate the content embeddings of all t-type neighbors and use the average over all hidden states as the aggregated embedding for that type.
We use a different Bi-LSTM for each node type to distinguish the types during neighbor aggregation. Note that the Bi-LSTM operates on an unordered neighbor set, which is inspired by GraphSAGE.
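In formula form, the same-type aggregation mirrors the content encoder, with the content embeddings f_1(v') of the sampled t-type neighbors as input. Again this is a sketch of how I read the paper, with N_t(v) denoting the sampled t-type neighbors of v and the LSTM parameters being specific to type t.

```latex
f_2^{t}(v) = \frac{1}{|N_t(v)|} \sum_{v' \in N_t(v)}
  \Big[ \overrightarrow{\mathrm{LSTM}}\{f_1(v')\}
        \oplus
        \overleftarrow{\mathrm{LSTM}}\{f_1(v')\} \Big]
```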
Again, given a node, we have neighboring nodes and we group them into several groups by type. So for each type, we have one embedding.
How do we aggregate these embeddings? With an attention layer.
To combine these type-based neighbor embeddings with v’s content embedding, we employ the attention mechanism.
The motivation is that different types of neighbors will make different contributions to the final representation of v.
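A minimal sketch of this attention step in plain Python is shown below. It assumes the commonly described form: each candidate embedding (v's own content embedding plus one embedding per neighbor type) is concatenated with v's content embedding, scored by a learned vector u through a LeakyReLU, and normalized with a softmax. The function name `attention_combine`, the LeakyReLU slope, and the toy numbers are assumptions for illustration, and u would be learned during training rather than set by hand.

```python
import math


def leaky_relu(x, slope=0.01):
    # The slope value is an assumption; it is a hyper-parameter.
    return x if x > 0 else slope * x


def attention_combine(content_emb, type_embs, u):
    """Combine v's content embedding with its type-based neighbor embeddings.

    content_emb: list of d floats, the content embedding of v
    type_embs:   list of d-float lists, one aggregated embedding per type
    u:           list of 2*d floats, the attention parameter vector
    Returns (combined_embedding, attention_weights).
    """
    candidates = [content_emb] + list(type_embs)

    # Score each candidate from its concatenation with v's content embedding.
    scores = []
    for f in candidates:
        concat = f + content_emb
        scores.append(leaky_relu(sum(ui * xi for ui, xi in zip(u, concat))))

    # Softmax over the scores (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the candidate embeddings.
    dim = len(content_emb)
    combined = [sum(w * f[k] for w, f in zip(weights, candidates))
                for k in range(dim)]
    return combined, weights


# Toy example: 2-dimensional embeddings, two neighbor types.
content = [1.0, 0.0]
by_type = [[0.0, 1.0], [1.0, 1.0]]
u = [0.5, -0.2, 0.1, 0.3]
embedding, weights = attention_combine(content, by_type, u)
print(embedding, weights)
```

The weights sum to one, so the final embedding is a convex combination of the node's own content embedding and its per-type neighbor embeddings.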
Objective and Model Training
The model is trained with a graph context loss and mini-batch gradient descent.
They apply negative sampling (one negative sample per positive pair) and, similar to DeepWalk, use random walks to generate the training pairs.
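As a rough illustration, a skip-gram style context loss with one negative sample per positive pair can be computed as below. This is a sketch under assumptions, not the paper's code: the function name `graph_context_loss` and the triple format (node embedding, context embedding, negative embedding) are hypothetical, and averaging over the batch is a choice made here for readability.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def graph_context_loss(triples):
    """Average negative-sampling loss over (E_v, E_context, E_negative) triples.

    Each positive pair (v, context) is matched with exactly one negative node.
    The loss pushes E_v toward E_context and away from E_negative.
    """
    total = 0.0
    count = 0
    for e_v, e_pos, e_neg in triples:
        total += log_sigmoid(dot(e_pos, e_v)) + log_sigmoid(-dot(e_neg, e_v))
        count += 1
    return -total / count


# Toy mini-batch with a single triple (assumed values).
batch = [([1.0, 0.0], [2.0, 0.0], [-2.0, 0.0])]
loss = graph_context_loss(batch)
print(loss)
```

In a real training loop the embeddings would come from the model and the loss would be minimized with mini-batch gradient descent in an autodiff framework.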
Finally, we have this whole picture.
🥴 What else in this paper?
In this paper, they also discuss several experiments.
(RQ1) How does HetGNN perform vs. state-of-the-art baselines for various graph mining tasks, such as link prediction (RQ1–1), personalized recommendation (RQ1–2), and node classification & clustering (RQ1–3)?
(RQ2) How does HetGNN perform vs. state-of-the-art baselines for inductive graph mining tasks, such as inductive node classification & clustering?
(RQ3) How do different components, e.g., node heterogeneous contents encoder or heterogeneous neighbors aggregator, affect the model performance?
- HetGNN has better performance than No-Neigh in most cases, demonstrating that aggregating neighbors information is effective for generating better node embeddings.
- HetGNN outperforms Content-FC, indicating that the Bi-LSTM based content encoding is better than “shallow” encoding like FC for capturing “deep” content feature interactions.
- HetGNN achieves better results than Type-FC, showing that self-attention is better than FC for capturing node type impact.
(RQ4) How do various hyper-parameters, e.g., embedding dimension or the size of sampled heterogeneous neighbors set, impact the model performance?
🙃 Other related blogs:
WWW’17: Visual Discovery at Pinterest
ICCV: International Conference on Computer Vision
CVPR: Conference on Computer Vision and Pattern Recognition