CARD

class pygod.detector.CARD(hid_dim=64, num_layers=2, dropout=0.0, weight_decay=0.0, act=<function relu>, backbone=<class 'torch_geometric.nn.models.basic_gnn.GCN'>, contamination=0.1, lr=0.004, epoch=100, gpu=-1, batch_size=0, num_neigh=-1, subgraph_num_neigh=4, fp=0.6, gama=0.5, alpha=0.1, verbose=0, save_emb=False, compile_model=False, **kwargs)

Bases: DeepDetector

Community-Guided Contrastive Learning with Anomaly-Aware Reconstruction for Anomaly Detection on Attributed Networks.

CARD is a contrastive-learning-based method that uses mask reconstruction and community information to make anomalies more distinct. The model is trained with a contrastive loss together with local and global attribute reconstruction losses. This implementation samples the subgraph for each node with random neighbor sampling rather than random walk sampling. Because random neighbor sampling cannot precisely control the number of neighbors drawn in each sample, it may run slower than the implementation in the original paper.

See [Wang2024Card] for details.
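The remark about neighbor counts can be illustrated with a toy sketch. This is not PyGOD's sampler: the function `sample_subgraph`, the adjacency dictionary, and the one-hop setup are all hypothetical, and the sketch reflects only one reading of the note above, namely that a neighbor sampler returns at most the requested number of neighbors, so subgraph sizes vary with node degree.

```python
import random


def sample_subgraph(adj, node, num_neigh, rng):
    """Hypothetical one-hop neighbor sampler (illustration only).

    Returns the center node plus up to `num_neigh` of its neighbors.
    Nodes with fewer neighbors simply yield smaller subgraphs.
    """
    neighbors = adj.get(node, [])
    k = min(num_neigh, len(neighbors))
    return [node] + rng.sample(neighbors, k)


# Toy star-like graph: node 0 has six neighbors, the rest have one or two.
adj = {
    0: [1, 2, 3, 4, 5, 6],
    1: [0],
    2: [0, 3],
    3: [0, 2],
    4: [0],
    5: [0],
    6: [0],
}

rng = random.Random(0)
sizes = {n: len(sample_subgraph(adj, n, 4, rng)) for n in adj}
print(sizes)  # {0: 5, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2, 6: 2}
```

With a requested fan-out of 4 (the default of subgraph_num_neigh), only the high-degree node reaches the full size. A fixed-length random walk, by contrast, always visits the same number of steps.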

Parameters:

hid_dim (int, optional) – Hidden dimension of model. Default: 64.
num_layers (int, optional) – Total number of layers in model. Default: 2.
dropout (float, optional) – Dropout rate. Default: 0.0.
weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.
act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.
backbone (torch.nn.Module) – The backbone of the deep detector implemented in PyG. Default: torch_geometric.nn.GCN.
contamination (float, optional) – The amount of contamination of the dataset in (0., 0.5], i.e., the proportion of outliers in the dataset. Used when fitting to define the threshold on the decision function. Default: 0.1.
lr (float, optional) – Learning rate. Default: 0.004.
epoch (int, optional) – Maximum number of training epochs. Default: 100.
gpu (int) – GPU Index, -1 for using CPU. Default: -1.
batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.
num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
subgraph_num_neigh (int, optional) – Number of neighbors in subgraph sampling for each node. Values not exceeding 4 are recommended for efficiency. Default: 4.
fp (float, optional) – The balance parameter between the mask autoencoder module and contrastive learning. Default: 0.6.
gama (float, optional) – The proportion of the local reconstruction in the contrastive learning module. Default: 0.5.
alpha (float, optional) – The proportion of the community embedding in the conbine_encoder. Default: 0.1.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger value for printing out more log information. Default: 0.
save_emb (bool, optional) – Whether to save the embedding. Default: False.
compile_model (bool, optional) – Whether to compile the model with torch_geometric.compile. Default: False.
**kwargs – Other parameters for the backbone.

decision_score_

The outlier scores of the training data. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type: torch.Tensor

threshold_

The threshold is derived from contamination. It is set so that the \(N \times\) contamination most abnormal samples in decision_score_ fall above it. The threshold is used to generate binary outlier labels.

Type: float

label_

The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.

Type: torch.Tensor
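The relationship between contamination, threshold_, and label_ can be sketched in plain Python. This is a minimal illustration, assuming the threshold is the linearly interpolated percentile of the scores at 100 × (1 − contamination); the function name `threshold_and_labels` and the sample scores are hypothetical and are not part of the library.

```python
def threshold_and_labels(scores, contamination=0.1):
    """Illustrative only: derive a threshold and binary labels from scores.

    Assumes a percentile-based threshold at 100 * (1 - contamination),
    with linear interpolation between neighboring ranks.
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n = len(scores)
    ranked = sorted(scores)
    # Fractional rank of the requested percentile.
    pos = (n - 1) * (1.0 - contamination)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    threshold = ranked[lo] + (ranked[hi] - ranked[lo]) * (pos - lo)
    # Scores strictly above the threshold are labeled as outliers (1).
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


scores = [0.1, 0.2, 0.15, 0.9, 0.3, 0.25, 0.05, 0.12, 0.18, 0.22]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(labels)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

With ten scores and contamination=0.1, exactly one sample (the score 0.9) ends up above the threshold, which matches the \(N \times\) contamination count described above.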

emb

The learned node hidden embeddings of shape \(N \times\) hid_dim. Only available when save_emb is True. When the detector has not been fitted, emb is None. When the detector has multiple embeddings, emb is a tuple of torch.Tensor.

Type: torch.Tensor or tuple of torch.Tensor or None

fit(data, label=None)

Fit detector with training data.

Parameters:

data (torch_geometric.data.Data) – The training graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns:

self – Fitted detector.

Return type:

object

predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False, return_emb=False)

Predict on testing data using the fitted detector. Returns the predicted labels by default.

Parameters:

data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
return_pred (bool, optional) – Whether to return the predicted binary labels. The labels are obtained by applying the contamination-based threshold to the raw outlier scores. Default: True.
return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.
return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.
prob_method (str, optional) –
The method to convert the outlier scores to probabilities. Two approaches are possible:

1. 'linear': simply use min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

2. 'unify': use unifying scores, see [KKSZ11].

Default: 'linear'.
return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
return_emb (bool, optional) – Whether to return the learned node representations. Default: False.

Returns:

pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.
score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.
prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.
conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.
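The two prob_method options can be sketched without any dependencies. This is a rough illustration rather than the library's code: it assumes that 'linear' rescales scores by the minimum and maximum of the training scores and clips to [0, 1], and that 'unify' applies Gaussian scaling with the error function to standardized scores. The function names `linear_prob` and `unify_prob` are hypothetical.

```python
import math


def linear_prob(scores, train_scores):
    """Min-max conversion using the training score range, clipped to [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    if span == 0:
        # Degenerate case: all training scores are identical.
        return [0.0 for _ in scores]
    return [min(1.0, max(0.0, (s - lo) / span)) for s in scores]


def unify_prob(scores, train_scores):
    """Gaussian scaling (assumed form): erf of standardized scores, clipped.

    Requires at least two training scores with non-zero spread.
    """
    n = len(train_scores)
    mu = sum(train_scores) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in train_scores) / (n - 1))
    return [
        min(1.0, max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2)))))
        for s in scores
    ]


train_scores = [1.0, 2.0, 3.0, 5.0]
test_scores = [1.0, 3.0, 5.0, 6.0]

print(linear_prob(test_scores, train_scores))  # [0.0, 0.5, 1.0, 1.0]
print(unify_prob(test_scores, train_scores))
```

Under these assumptions, both conversions depend on statistics of the training scores, which is why the detector has to be fitted before probabilities can be computed. In the 'unify' sketch, any score at or below the training mean maps to 0.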