Utility Functions#

pygod.utils.early_stopping module#

Early stopping counter, adapted from DGL

class pygod.utils.early_stopping.EarlyStopping(patience: int = 10, verbose: bool = True)[source]#

Bases: object

Early Stopping Counter

Parameters
  • patience (int) – Number of epochs to wait after the best score before stopping. Default: 10

  • verbose (bool) – Whether to print progress information. Default: True

step(score: float, model: torch.nn.modules.module.Module) → bool[source]#

Record the current score and the best model seen so far; return True when the score has not improved for patience consecutive epochs.
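The counter logic described above can be sketched as follows. This is a minimal re-implementation for illustration, not the library's code: the real class also tracks the passed-in torch model, and the exact attribute names here (counter, best_score, early_stop) are assumptions.

```python
class EarlyStoppingSketch:
    """Minimal sketch of an early stopping counter (model tracking omitted)."""

    def __init__(self, patience=10, verbose=True):
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def step(self, score):
        """Record `score`; return True once `patience` epochs pass
        without improving on the best score seen so far."""
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.counter = 0
        else:
            self.counter += 1
            if self.verbose:
                print(f"EarlyStopping counter: {self.counter}/{self.patience}")
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop
```

In a training loop this would be called once per epoch with the validation score, breaking out of the loop when it returns True.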

pygod.utils.metric module#

Metrics used to evaluate the anomaly detection performance

pygod.utils.metric.eval_precision_at_k(labels, pred, k, threshold=0.5)[source]#

Precision score for top k instances with the highest outlier scores.

Parameters
  • labels (numpy.array) – Labels in shape of (N, ), where 1 represents outliers, 0 represents normal nodes.

  • pred (numpy.array) – Outlier scores in shape of (N, ).

  • k (int) – The number of instances to evaluate.

  • threshold (float) – The binary classification threshold.

Returns

precision_at_k – Precision for top k instances with the highest outlier scores.

Return type

float
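One common reading of this metric is the fraction of the k highest-scored instances that are labeled outliers. The sketch below follows that definition; the library's threshold parameter (which binarizes the scores) is omitted here for simplicity, so the exact semantics may differ.

```python
import numpy as np

def precision_at_k_sketch(labels, pred, k):
    """Fraction of the k highest-scored instances that are true outliers.

    A sketch of the metric described above; the `threshold` parameter
    from the library API is intentionally left out.
    """
    top_k = np.argsort(pred)[::-1][:k]  # indices of the k largest scores
    return labels[top_k].sum() / k
```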

pygod.utils.metric.eval_recall_at_k(labels, pred, k, threshold=0.5)[source]#

Recall score for top k instances with the highest outlier scores.

Parameters
  • labels (numpy.array) – Labels in shape of (N, ), where 1 represents outliers, 0 represents normal nodes.

  • pred (numpy.array) – Outlier scores in shape of (N, ).

  • k (int) – The number of instances to evaluate.

  • threshold (float) – The binary classification threshold.

Returns

recall_at_k – Recall for top k instances with the highest outlier scores.

Return type

float
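Analogously to precision@k, recall@k is commonly defined as the fraction of all true outliers that appear among the k highest-scored instances. A numpy sketch under that assumption (again omitting the library's threshold parameter):

```python
import numpy as np

def recall_at_k_sketch(labels, pred, k):
    """Fraction of all true outliers found among the k highest-scored
    instances (a sketch of the metric described above)."""
    top_k = np.argsort(pred)[::-1][:k]  # indices of the k largest scores
    return labels[top_k].sum() / labels.sum()
```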

pygod.utils.metric.eval_roc_auc(labels, pred)[source]#

ROC-AUC score for binary classification.

Parameters
  • labels (numpy.array) – Labels in shape of (N, ), where 1 represents outliers, 0 represents normal nodes.

  • pred (numpy.array) – Outlier scores in shape of (N, ).

Returns

roc_auc – ROC-AUC score for the given labels and outlier scores.

Return type

float

pygod.utils.outlier_generator module#

This file includes functions that generate different types of outliers from the input dataset for benchmarking

pygod.utils.outlier_generator.gen_attribute_outliers(data, n, k)[source]#

Generate attribute outliers following the paper “Deep Anomaly Detection on Attributed Networks” <https://epubs.siam.org/doi/abs/10.1137/1.9781611975673.67>.

We randomly select n nodes as the attribute perturbation candidates. For each selected node i, we randomly pick another k nodes from the data and, among those k nodes, select the node j whose attributes deviate the most from node i by maximizing the Euclidean distance ||x_i − x_j||_2. We then replace the attributes x_i of node i with x_j.

Parameters
  • data (torch_geometric.data.Data) – PyG data object.

  • n (int) – Number of nodes converting to outliers.

  • k (int) – Number of candidate nodes for each outlier node.

Returns

  • data (torch_geometric.data.Data) – The attribute outlier graph with modified node attributes.

  • y_outlier (torch.Tensor) – The outlier label tensor where 1 represents outliers and 0 represents regular nodes.
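The perturbation scheme described above can be sketched on a plain (N, d) feature matrix instead of a PyG Data object. The function name, the numpy backend, and the candidate-sampling details (e.g. whether node i can appear among its own k candidates) are simplifying assumptions, not the library's implementation.

```python
import numpy as np

def gen_attribute_outliers_sketch(x, n, k, seed=0):
    """Sketch of attribute outlier injection on a feature matrix `x`.

    Returns the perturbed features and a 0/1 outlier label vector.
    """
    rng = np.random.default_rng(seed)
    x = x.copy()
    y_outlier = np.zeros(len(x), dtype=int)
    candidates = rng.choice(len(x), size=n, replace=False)
    for i in candidates:
        # Pick k random nodes and keep the one farthest from node i
        # in Euclidean distance, then overwrite x_i with its features.
        others = rng.choice(len(x), size=k, replace=False)
        j = others[np.argmax(np.linalg.norm(x[others] - x[i], axis=1))]
        x[i] = x[j]
        y_outlier[i] = 1
    return x, y_outlier
```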

pygod.utils.outlier_generator.gen_structure_outliers(data, m, n)[source]#

Generate structural outliers following the paper “Deep Anomaly Detection on Attributed Networks” <https://epubs.siam.org/doi/abs/10.1137/1.9781611975673.67>.

We randomly select m nodes from the network and make them fully connected; all m nodes in the resulting clique are regarded as outliers. We repeat this process until n cliques are generated, so the total number of structural outliers is m×n.

Parameters
  • data (torch_geometric.data.Data) – PyG data object.

  • m (int) – Number of nodes in each outlier clique.

  • n (int) – Number of outlier cliques.

Returns

  • data (torch_geometric.data.Data) – The structural outlier graph with injected edges.

  • y_outlier (torch.Tensor) – The outlier label tensor where 1 represents outliers and 0 represents regular nodes.
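The clique-injection scheme above can be sketched on a plain edge list rather than a PyG Data object. The function name, the (2, E) edge-array convention, and the use of directed edge pairs are assumptions made for brevity; the library operates on the Data object's edge_index directly.

```python
import numpy as np

def gen_structure_outliers_sketch(num_nodes, m, n, seed=0):
    """Sketch of structural outlier injection: pick m*n distinct nodes,
    split them into n groups of m, and fully connect each group.

    Returns the injected edges as a (2, E) array and the 0/1 outlier
    label vector; all m*n selected nodes are marked as outliers.
    """
    rng = np.random.default_rng(seed)
    outliers = rng.choice(num_nodes, size=m * n, replace=False)
    y_outlier = np.zeros(num_nodes, dtype=int)
    y_outlier[outliers] = 1
    new_edges = []
    for clique in outliers.reshape(n, m):
        for u in clique:            # fully connect each group of m nodes,
            for v in clique:        # storing both directions of each edge
                if u != v:
                    new_edges.append((u, v))
    return np.array(new_edges).T, y_outlier
```

Each clique contributes m×(m−1) directed edge entries, so E = n×m×(m−1) in this sketch.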