GADNR
- class pygod.detector.GADNR(hid_dim=64, num_layers=1, deg_dec_layers=4, fea_dec_layers=3, backbone=<class 'torch_geometric.nn.models.basic_gnn.GCN'>, sample_size=2, sample_time=3, neigh_loss='KL', lambda_loss1=0.01, lambda_loss2=0.1, lambda_loss3=0.8, real_loss=True, lr=0.01, epoch=100, dropout=0.0, weight_decay=0.0003, act=<function relu>, gpu=-1, batch_size=0, num_neigh=-1, contamination=0.1, verbose=0, save_emb=False, compile_model=False, **kwargs)
Bases: DeepDetector
Graph Anomaly Detection via Neighborhood Reconstruction
GAD-NR is a graph autoencoder (GAE) variant for graph anomaly detection based on neighborhood reconstruction. From each node's representation, it aims to reconstruct the node's entire neighborhood: the local structure, the node's own attributes, and its neighbors' attributes.
See [RSL+24] for details.
- Parameters:
hid_dim (int, optional) – Hidden dimension of the model. Default: 64.
num_layers (int, optional) – Total number of layers in the backbone encoder model. Default: 1.
deg_dec_layers (int, optional) – The number of layers in the node degree decoder. Default: 4.
fea_dec_layers (int, optional) – The number of layers in the node feature decoder. Default: 3.
backbone (torch.nn.Module, optional) – The backbone of the deep detector implemented in PyG. Default: torch_geometric.nn.GCN.
sample_size (int, optional) – The number of samples drawn for the neighborhood distribution. Default: 2.
sample_time (int, optional) – The number of sampling rounds used to reduce noise during node feature and neighborhood distribution reconstruction. Default: 3.
neigh_loss (str, optional) – The neighbor reconstruction loss. KL selects the KL divergence loss; W2 selects the W2 loss. Default: KL.
lambda_loss1 (float, optional) – The weight of the neighborhood reconstruction loss term. Default: 0.01.
lambda_loss2 (float, optional) – The weight of the node feature reconstruction loss term. Default: 0.1.
lambda_loss3 (float, optional) – The weight of the node degree reconstruction loss term. Default: 0.8.
real_loss (bool, optional) – Whether to use the original loss proposed in the paper as the decision score. If False, the weighted decision score is used instead. Default: True.
lr (float, optional) – Learning rate. Default: 0.01.
epoch (int, optional) – Maximum number of training epochs. Default: 100.
dropout (float, optional) – Dropout rate. Default: 0.0.
weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0003.
act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.
gpu (int) – GPU index, -1 for using CPU. Default: -1.
batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.
num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
contamination (float, optional) – The amount of contamination of the dataset in (0., 0.5], i.e., the proportion of outliers in the dataset. Used when fitting to define the threshold on the decision function. Default: 0.1.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger values print more log information. Default: 0.
save_emb (bool, optional) – Whether to save the embedding. Default: False.
compile_model (bool, optional) – Whether to compile the model with torch_geometric.compile. Default: False.
**kwargs (optional) – Other parameters for the backbone.
- decision_score_
The outlier scores of the training data. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
- threshold_
The threshold is based on contamination. It is set so that the \(N \times\) contamination most abnormal samples in decision_score_ are flagged as outliers, and it is used to generate binary outlier labels.
- Type:
- label_
The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.
- Type:
- emb
The learned node hidden embeddings of shape \(N \times\) hid_dim. Only available when save_emb is True. When the detector has not been fitted, emb is None. When the detector has multiple embeddings, emb is a tuple of torch.Tensor.
- Type:
torch.Tensor or tuple of torch.Tensor or None
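The relationship between contamination, threshold_, and label_ described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name threshold_and_labels is hypothetical, and the choice of the k-th largest score as the cutoff (with ties included) is an assumption, not necessarily how the library computes its threshold.

```python
def threshold_and_labels(scores, contamination=0.1):
    """Return (threshold, labels) so that about N * contamination scores are flagged.

    Hypothetical helper: the library may use a percentile-based rule instead.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n = len(scores)
    # Number of samples to flag; at least one so the threshold is defined.
    k = max(1, round(n * contamination))
    # Assumption: the k-th largest score serves as the threshold.
    threshold = sorted(scores, reverse=True)[k - 1]
    # 1 marks an outlier, 0 an inlier; tied scores are all flagged.
    labels = [1 if s >= threshold else 0 for s in scores]
    return threshold, labels


scores = [0.2, 0.1, 0.9, 0.3, 0.25, 0.15, 0.05, 0.4, 0.35, 0.12]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With ten scores and contamination=0.1, exactly one node (the one scoring 0.9) is flagged here.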
- fit(data, label=None, h_loss_weight=1.0, degree_loss_weight=0.0, feature_loss_weight=2.5, loss_step=20)
Overrides the base model's fit function because GAD-NR uses several model-specific loss functions.
- Parameters:
data (torch_geometric.data.Data) – Input graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
h_loss_weight (float, optional) – The weight of the neighborhood reconstruction loss term used in the weighted decision score. Default: 1.0.
degree_loss_weight (float, optional) – The weight of the node degree reconstruction loss term used in the weighted decision score. Default: 0.0.
feature_loss_weight (float, optional) – The weight of the node feature reconstruction loss term used in the weighted decision score. Default: 2.5.
loss_step (int, optional) – The epoch interval to update the loss terms. Default: 20.
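To make the role of the three fit weights concrete, here is a minimal sketch of a weighted decision score. It assumes the score is a plain linear combination of per-node loss terms; the function name weighted_decision_score is hypothetical, and whether the library normalizes each term before combining them is not stated here.

```python
def weighted_decision_score(h_loss, degree_loss, feature_loss,
                            h_loss_weight=1.0,
                            degree_loss_weight=0.0,
                            feature_loss_weight=2.5):
    """Combine per-node loss terms into one score per node.

    Assumption: a simple weighted sum; the actual implementation may
    rescale each term first.
    """
    if not (len(h_loss) == len(degree_loss) == len(feature_loss)):
        raise ValueError("all loss lists must have one entry per node")
    return [
        h_loss_weight * h + degree_loss_weight * d + feature_loss_weight * f
        for h, d, f in zip(h_loss, degree_loss, feature_loss)
    ]


# Two nodes; with the default weights the degree term is ignored.
node_scores = weighted_decision_score(
    h_loss=[0.5, 2.0],
    degree_loss=[3.0, 1.0],
    feature_loss=[0.2, 0.4],
)
```

Under this assumed form, the default degree_loss_weight of 0.0 means the degree reconstruction error does not affect the score at all.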
- predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False, return_emb=False)
Prediction for testing data using the fitted detector. Returns predicted labels by default.
- Parameters:
data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
return_pred (bool, optional) – Whether to return the predicted binary labels. The labels are determined by the outlier contamination on the raw outlier scores. Default: True.
return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.
return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.
prob_method (str, optional) – The method to convert the outlier scores to probabilities. Two approaches are possible:
1. 'linear': use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. 'unify': use unifying scores, see [KKSZ11].
Default: 'linear'.
return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
return_emb (bool, optional) – Whether to return the learned node representations. Default: False.
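The 'linear' option for prob_method can be illustrated with a small min-max sketch. It assumes the minimum and maximum come from the training scores (consistent with the requirement that the model be fitted first) and that out-of-range test scores are clipped to [0, 1]; the function name linear_probabilities is hypothetical.

```python
def linear_probabilities(train_scores, test_scores):
    """Min-max scale test scores into [0, 1] using the training score range.

    Assumptions: the range is taken from the training scores, and values
    outside that range are clipped.
    """
    lo, hi = min(train_scores), max(train_scores)
    if hi == lo:
        # Degenerate case: all training scores are equal.
        return [0.0 for _ in test_scores]
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in test_scores]


probs = linear_probabilities(train_scores=[1.0, 2.0, 5.0],
                             test_scores=[1.0, 3.0, 6.0])
```

In this example the test score 6.0 lies above the training maximum, so it is clipped to a probability of 1.0.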
- Returns:
pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.
score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.
prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.
conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.
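Putting the constructor, fit, and predict together, a typical workflow might look like the sketch below. It uses only the names and keyword arguments documented on this page; the wrapper function detect_outliers is hypothetical, the caller is assumed to supply a torch_geometric.data.Data graph, and the assumption that requested outputs come back in the order listed under Returns (pred, then score) should be checked against the installed version.

```python
def detect_outliers(data, contamination=0.1):
    """Fit GADNR on a graph and return (pred, score) for its nodes.

    `data` is assumed to be a torch_geometric.data.Data object with node
    features and edges. Hypothetical wrapper for illustration only.
    """
    # Imported here so this sketch can be loaded without PyGOD installed.
    from pygod.detector import GADNR

    detector = GADNR(
        hid_dim=64,                   # documented default
        epoch=100,                    # documented default
        contamination=contamination,  # expected share of outliers
        gpu=-1,                       # -1 selects the CPU
    )
    detector.fit(data)

    # With data=None, predict() uses the training graph.
    # Assumption: outputs are returned in the order pred, score.
    pred, score = detector.predict(return_pred=True, return_score=True)
    return pred, score
```

After fitting, the same information is also exposed through the label_ and decision_score_ attributes described above.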