Models#

AdONE#

class pygod.models.AdONE(hid_dim=32, num_layers=4, dropout=0.0, weight_decay=0.0, act=<function leaky_relu>, a1=0.2, a2=0.2, a3=0.2, a4=0.2, a5=0.2, contamination=0.1, lr=0.005, epoch=5, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

AdONE (Adversarial Outlier Aware Attributed Network Embedding) consists of an attribute autoencoder and a structure autoencoder. It estimates five losses to optimize the model: an attribute proximity loss, an attribute homophily loss, a structure proximity loss, a structure homophily loss, and an alignment loss. It calculates three outlier scores and averages them into an overall score.

See [BVM20] for details.
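
The weighting and averaging described above can be sketched in a few lines. This is only an illustration, not the PyGOD implementation: the helper names `combine_losses` and `overall_score` and all input values are hypothetical placeholders.

```python
def combine_losses(losses, weights):
    """Weighted sum of the loss terms (hypothetical helper)."""
    if len(losses) != len(weights):
        raise ValueError("need one weight per loss term")
    return sum(w * l for w, l in zip(weights, losses))


def overall_score(scores):
    """Average several per-node outlier scores into one score."""
    return sum(scores) / len(scores)


# Placeholder values for one node: one loss term per weight a1..a5.
losses = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]  # the documented defaults for a1..a5
total = combine_losses(losses, weights)

# Three placeholder outlier scores, averaged into the overall score.
score = overall_score([0.3, 0.6, 0.9])
```

With equal weights the total is simply a scaled sum of the five terms; raising one weight makes the corresponding loss dominate training.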

Parameters
  • hid_dim (int, optional) – Hidden dimension for both attribute autoencoder and structure autoencoder. Default: 32.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0..

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.leaky_relu.

  • a1 (float, optional) – Loss balance weight for structure proximity. Default: 0.2.

  • a2 (float, optional) – Loss balance weight for structure homophily. Default: 0.2.

  • a3 (float, optional) – Loss balance weight for attribute proximity. Default: 0.2.

  • a4 (float, optional) – Loss balance weight for attribute homophily. Default: 0.2.

  • a5 (float, optional) – Loss balance weight for alignment. Default: 0.2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
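
The encoder/decoder split described for num_layers (ceil half for the encoder, floor half for the decoders) can be checked with a small sketch. The helper `split_layers` is a hypothetical name used only for illustration.

```python
import math


def split_layers(num_layers):
    """Return (encoder_layers, decoder_layers) for a total layer count."""
    encoder = math.ceil(num_layers / 2)   # ceil half goes to the encoder
    decoder = math.floor(num_layers / 2)  # floor half goes to the decoders
    return encoder, decoder


for n in (3, 4, 5):
    print(n, split_layers(n))
```

For an even count such as the default 4 the two halves are equal; for an odd count the encoder gets the extra layer.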

Examples

>>> from pygod.models import AdONE
>>> model = AdONE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.
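
How contamination turns raw scores into 0/1 labels can be sketched as follows. The assumption here, which the text above does not confirm, is that the threshold is the (1 - contamination) percentile of the training scores, as is common in PyOD-style detectors. The helpers `percentile` and `label_outliers` are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def label_outliers(train_scores, scores, contamination=0.1):
    """Label a score 1 (outlier) if it exceeds the training threshold."""
    # Assumption: threshold = (1 - contamination) percentile of train scores.
    threshold = percentile(train_scores, 100 * (1 - contamination))
    return [1 if s > threshold else 0 for s in scores], threshold


train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
labels, threshold = label_outliers(train_scores, train_scores, contamination=0.1)
```

With ten training scores and contamination=0.1, only the single largest score ends up above the threshold.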

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
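
A minimal sketch of the 'linear' conversion, assuming the min and max are taken from the training scores and that out-of-range scores are clipped (both are assumptions, as is the helper name `linear_proba`):

```python
def linear_proba(train_scores, scores):
    """Min-max scale scores into [0, 1] using the training score range."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    rows = []
    for s in scores:
        p = 0.0 if span == 0 else (s - lo) / span
        p = min(1.0, max(0.0, p))  # clip scores outside the training range
        rows.append([1.0 - p, p])  # [proba of normal, proba of outlier]
    return rows


proba = linear_proba([1.0, 2.0, 5.0], [1.0, 3.0, 9.0])
```

Each row has two entries that sum to 1, matching the (n_samples, n_classes) shape with two classes.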

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object
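
The `<component>__<parameter>` convention can be illustrated with a toy class. `ToyEstimator` is a hypothetical stand-in written for this sketch; it is neither sklearn.base.BaseEstimator nor a PyGOD class, and only mimics the naming scheme.

```python
class ToyEstimator:
    """Minimal stand-in for an estimator with get_params/set_params."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Expose nested estimator parameters as '<component>__<parameter>'.
            for name, value in self._params.items():
                if isinstance(value, ToyEstimator):
                    for sub, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub = key.partition("__")
            if name not in self._params:
                raise ValueError(f"invalid parameter {name!r}")
            if sep:
                # Delegate '<component>__<parameter>' to the nested object.
                self._params[name].set_params(**{sub: value})
            else:
                self._params[name] = value
        return self


inner = ToyEstimator(lr=0.004, epoch=5)
outer = ToyEstimator(detector=inner, contamination=0.1)
outer.set_params(detector__lr=0.01, contamination=0.05)
```

Returning self from set_params is what allows calls to be chained.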

ANOMALOUS#

class pygod.models.ANOMALOUS(gamma=1.0, weight_decay=0.01, lr=0.004, epoch=100, gpu=0, contamination=0.1, verbose=False)[source]#

Bases: BaseDetector

ANOMALOUS (A Joint Modeling Approach for Anomaly Detection on Attributed Networks) is an anomaly detector with CUR decomposition and residual analysis. This model is transductive only.

See [PLL+18] for details.
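
One common reading of residual analysis is to score each node by the size of its row in a residual matrix. The sketch below assumes an L2 row norm and a hand-written residual matrix; the helper `residual_scores` and the values are hypothetical, and the CUR decomposition step is not shown.

```python
import math


def residual_scores(residual):
    """Score each node by the L2 norm of its row in the residual matrix."""
    return [math.sqrt(sum(v * v for v in row)) for row in residual]


# Hypothetical residual matrix with one row per node.
R = [
    [0.0, 0.0],
    [3.0, 4.0],
    [0.1, 0.0],
]
scores = residual_scores(R)
```

The node whose attributes are reconstructed worst (the second row here) receives the largest score.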

Parameters
  • gamma (float, optional) – Loss balance weight for attribute and structure. Default: 1..

  • weight_decay (float, optional) – Weight decay (alpha and beta in the original paper). Default: 0.01.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • epoch (int, optional) – Maximum number of training epochs. Default: 100.

  • lr (float, optional) – Learning rate. Default: 0.004.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import ANOMALOUS
>>> model = ANOMALOUS()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(None)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

AnomalyDAE#

class pygod.models.AnomalyDAE(embed_dim=8, out_dim=4, dropout=0.2, weight_decay=1e-05, act=<function relu>, alpha=None, theta=1.01, eta=1.01, contamination=0.1, lr=0.004, epoch=5, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

AnomalyDAE (Dual autoencoder for anomaly detection on attributed networks) is an anomaly detector that consists of a structure autoencoder and an attribute autoencoder to learn both node embedding and attribute embedding jointly in latent space. The structure autoencoder uses Graph Attention layers. The reconstruction mean squared errors of the decoders are defined as the structure anomaly score and the attribute anomaly score, respectively, with two additional penalties on the reconstructed adjacency matrix and node attributes (forcing entries to be nonzero).

See [FZL20] for details.
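
The penalty on non-zero entries can be sketched as an element-wise weight on the reconstruction error. This is an illustration under assumptions: the weight is applied before squaring, zero entries keep weight 1, and the helper `penalized_error` is a hypothetical name. The same pattern would apply to theta (adjacency matrix) and eta (node attributes).

```python
def penalized_error(target_row, recon_row, penalty=1.01):
    """Squared error where non-zero target entries are weighted by `penalty`."""
    total = 0.0
    for t, r in zip(target_row, recon_row):
        weight = penalty if t != 0 else 1.0  # penalize only non-zero targets
        total += ((t - r) * weight) ** 2
    return total


adj_row = [1.0, 0.0, 1.0]   # one row of a toy adjacency matrix
recon = [0.5, 0.5, 1.0]     # its hypothetical reconstruction
err = penalized_error(adj_row, recon, penalty=2.0)
```

A penalty greater than 1 makes missing an existing edge (or a non-zero attribute) cost more than wrongly predicting one where the target is zero.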

Parameters
  • embed_dim (int, optional) – Hidden dimension of model. Defaults: 8.

  • out_dim (int, optional) – Dimension of the reduced representation after passing through the structure autoencoder and attribute autoencoder. Defaults: 4.

  • dropout (float, optional) – Dropout rate. Defaults: 0.2.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 1e-5.

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. None for balancing by standard deviation. Default: None.

  • theta (float, optional) – Greater than 1. Imposes a penalty on the reconstruction error of the non-zero elements in the adjacency matrix. Defaults: 1.01.

  • eta (float, optional) – Greater than 1. Imposes a penalty on the reconstruction error of the non-zero elements in the node attributes. Defaults: 1.01.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Defaults: 0.1.

  • lr (float, optional) – Learning rate. Defaults: 0.004.

  • epoch (int, optional) – Maximum number of training epochs. Defaults: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.

Examples

>>> from pygod.models import AnomalyDAE
>>> model = AnomalyDAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score of the input graph using the fitted detector. The anomaly score of an input sample is computed based on different detector algorithms. For consistency, outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

anomaly_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

CoLA#

class pygod.models.CoLA(lr=0.001, epoch=10, embedding_dim=64, negsamp_ratio=1, readout='avg', weight_decay=0.0, batch_size=0, subgraph_size=4, contamination=0.1, gpu=0, verbose=False)[source]#

Bases: BaseDetector

CoLA (Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning) is a contrastive self-supervised learning based method for graph anomaly detection. (beta)

See [LLP+21] for details.

Parameters
  • lr (float, optional) – Learning rate. Default: 1e-3.

  • epoch (int, optional) – Maximum number of training epochs. Default: 10.

  • embedding_dim (int, optional) – The node embedding dimension obtained by the GCN module of CoLA. Default: 64.

  • negsamp_ratio (int, optional) – Number of negative samples for each instance used by the contrastive learning module. Default: 1.

  • readout (str, optional) – The readout layer type used by CoLA model. Default: 'avg'.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • subgraph_size (int, optional) – Number of nodes in the subgraph sampled by random walk. Default: 4.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • gpu (int, optional) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool, optional) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import CoLA
>>> model = CoLA()
>>> model.fit(data)
>>> prediction = model.predict(data)
decision_function(G, rounds=10)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • rounds (int, optional) – Number of rounds to generate the decision score. Default: 10.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray
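
The role of rounds can be sketched as averaging a per-round score. The sketch assumes the scoring rule described in the CoLA paper [LLP+21], where a node's score is the negative-pair score minus the positive-pair score, averaged over sampling rounds; it has not been checked against the PyGOD code, and the helper name and all inputs are hypothetical.

```python
def cola_style_score(pos_scores, neg_scores):
    """Average (negative - positive) discriminator scores over rounds."""
    if len(pos_scores) != len(neg_scores):
        raise ValueError("need one positive and one negative score per round")
    diffs = [n - p for p, n in zip(pos_scores, neg_scores)]
    return sum(diffs) / len(diffs)


# Hypothetical discriminator outputs for two nodes over 4 rounds.
normal_node = cola_style_score([0.9, 0.8, 0.9, 0.8], [0.1, 0.2, 0.1, 0.2])
odd_node = cola_style_score([0.5, 0.4, 0.6, 0.5], [0.5, 0.6, 0.4, 0.5])
```

A node that matches its own neighborhood well gets a strongly negative score, while a node that the discriminator cannot separate from negatives scores higher. More rounds reduce the variance introduced by random subgraph sampling.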

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_function(logits, batch_size)[source]#
predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

CONAD#

class pygod.models.CONAD(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, alpha=None, eta=0.5, contamination=0.1, lr=0.005, epoch=5, gpu=0, batch_size=0, num_neigh=-1, margin=0.5, r=0.2, m=50, k=50, f=10, verbose=False)[source]#

Bases: BaseDetector

CONAD (Contrastive Attributed Network Anomaly Detection) is an anomaly detector consisting of a shared graph convolutional encoder, a structure reconstruction decoder, and an attribute reconstruction decoder. The model is trained with both contrastive loss and structure/attribute reconstruction loss. The reconstruction mean squared errors of the two decoders are defined as the structure anomaly score and the attribute anomaly score, respectively.

See [XHZ+22] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. None for balancing by standard deviation. Default: None.

  • eta (float, optional) – Loss balance weight for contrastive and reconstruction. Default: 0.5.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • r (float, optional) – The rate of augmented anomalies. Default: .2.

  • m (int, optional) – For densely connected nodes, the number of edges to add. Default: 50.

  • k (int, optional) – same as k in pygod.generator.gen_contextual_outliers. Default: 50.

  • f (int, optional) – For disproportionate nodes, the scale factor applied on their attribute value. Default: 10.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import CONAD
>>> model = CONAD()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

DOMINANT#

class pygod.models.DOMINANT(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, alpha=None, contamination=0.1, lr=0.005, epoch=5, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

DOMINANT (Deep Anomaly Detection on Attributed Networks) is an anomaly detector consisting of a shared graph convolutional encoder, a structure reconstruction decoder, and an attribute reconstruction decoder. The reconstruction mean squared errors of the two decoders are defined as the structure anomaly score and the attribute anomaly score, respectively.

See [DLBL19] for details.
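
How a balance weight such as alpha might combine the two reconstruction errors into one score per node can be sketched as a convex combination. Which term alpha multiplies is an assumption here, the helper `combine_errors` is hypothetical, and the alpha=None case (balancing by standard deviation) is not shown.

```python
def combine_errors(attr_errors, struct_errors, alpha=0.5):
    """Per-node score as a convex combination of the two error terms."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: alpha weights the attribute error, (1 - alpha) the structure error.
    return [
        alpha * a + (1.0 - alpha) * s
        for a, s in zip(attr_errors, struct_errors)
    ]


attr_errors = [0.2, 4.0, 0.4]    # hypothetical attribute reconstruction errors
struct_errors = [0.2, 0.0, 2.0]  # hypothetical structure reconstruction errors
scores = combine_errors(attr_errors, struct_errors, alpha=0.5)
```

Moving alpha toward 1 emphasizes attribute anomalies; moving it toward 0 emphasizes structural ones.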

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in model. A half (floor) of the layers are for the encoder, the other half (ceil) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. None for balancing by standard deviation. Default: None.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import DOMINANT
>>> model = DOMINANT()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
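The 'linear' method can be pictured as a min-max rescaling. The sketch below assumes the minimum and maximum come from the scores seen during fitting (which would explain why the model must be fitted first) and that out-of-range test scores are clipped. `linear_proba` is a hypothetical helper, not a PyGOD function.

```python
def linear_proba(train_scores, test_scores):
    """Min-max scale scores into [0, 1] and return [p_normal, p_outlier] rows."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    rows = []
    for s in test_scores:
        # Guard against a degenerate range where all training scores are equal.
        p = (s - lo) / span if span > 0 else 0.0
        # Clip scores that fall outside the range seen during fitting.
        p = min(1.0, max(0.0, p))
        rows.append([1.0 - p, p])
    return rows


proba = linear_proba([1.0, 3.0, 5.0], [1.0, 3.0, 6.0])
```

Each row sums to 1, matching the documented shape (n_samples, n_classes) with two classes.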

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object
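The `<component>__<parameter>` naming convention can be illustrated by splitting keys at the first double underscore. This is only a sketch of the convention; the component name `encoder` is a made-up example and `split_nested_params` is a hypothetical helper.

```python
def split_nested_params(params):
    """Separate plain parameters from nested '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits at the first '__' only, so deeper nesting is preserved.
        name, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"lr": 0.01, "encoder__hid_dim": 16, "encoder__dropout": 0.2}
)
```

Here `own` holds the parameters of the estimator itself, while `nested` groups the remaining values by component so that each group can be forwarded to that component.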

DONE#

class pygod.models.DONE(hid_dim=32, num_layers=4, dropout=0.0, weight_decay=0.0, act=<function leaky_relu>, a1=0.2, a2=0.2, a3=0.2, a4=0.2, a5=0.2, contamination=0.1, lr=0.005, epoch=5, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

DONE (Deep Outlier Aware Attributed Network Embedding) consists of an attribute autoencoder and a structure autoencoder. It optimizes the model with five losses: an attribute proximity loss, an attribute homophily loss, a structure proximity loss, a structure homophily loss, and a combination loss. It calculates three outlier scores and averages them into an overall score.

See [BVM20] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension for both attribute autoencoder and structure autoencoder. Default: 32.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0..

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.leaky_relu.

  • a1 (float, optional) – Loss balance weight for structure proximity. Default: 0.2.

  • a2 (float, optional) – Loss balance weight for structure homophily. Default: 0.2.

  • a3 (float, optional) – Loss balance weight for attribute proximity. Default: 0.2.

  • a4 (float, optional) – Loss balance weight for attribute homophily. Default: 0.2.

  • a5 (float, optional) – Loss balance weight for combination. Default: 0.2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
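Two of the parameters above lend themselves to a small worked example: how `num_layers` is divided between encoder and decoder, and how the weights `a1` to `a5` balance the five losses. The sketch assumes the total objective is a plain weighted sum; both helper names are hypothetical.

```python
import math


def split_layers(num_layers):
    """Encoder gets the ceiling half of the layers, decoder the floor half."""
    return math.ceil(num_layers / 2), math.floor(num_layers / 2)


def combine_losses(losses, weights):
    """Weighted sum of loss terms (assumed form of the total objective)."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss term")
    return sum(w * l for w, l in zip(weights, losses))


encoder_layers, decoder_layers = split_layers(5)
# Illustrative loss values only; the default weights are all 0.2.
total = combine_losses([1.0, 2.0, 3.0, 4.0, 5.0], [0.2] * 5)
```

With an odd `num_layers` such as 5, the encoder receives the extra layer (3 versus 2).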

Examples

>>> from pygod.models import DONE
>>> model = DONE()
>>> model.fit(data)
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)
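One way such a confidence could be computed, loosely following the idea referenced in [PVD20], is sketched below. It assumes a Bayesian estimate of each sample's outlier probability from its rank among the training scores, followed by a binomial tail probability. Whether PyGOD computes the value exactly this way is an assumption, and all function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def prediction_confidence(train_scores, test_scores, threshold, contamination=0.1):
    n = len(train_scores)
    # Number of training samples expected to be inliers.
    k = n - int(n * contamination)
    confidences = []
    for s in test_scores:
        # Posterior estimate that a training score falls at or below s.
        n_below = sum(1 for t in train_scores if t <= s)
        posterior = (1 + n_below) / (2 + n)
        # Probability that s would land among the outliers, clamped to [0, 1].
        p_outlier = min(1.0, max(0.0, 1.0 - binom_cdf(k, n, posterior)))
        # Report the confidence in whichever label was actually predicted.
        confidences.append(p_outlier if s > threshold else 1.0 - p_outlier)
    return confidences


train = [float(i) for i in range(20)]
conf = prediction_confidence(train, [0.0, 19.0], threshold=17.5)
```

A score far below the threshold yields a confidence close to 1 for the inlier label, while a score just above the largest training score yields a more moderate confidence for the outlier label.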

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

GAAN#

class pygod.models.GAAN(noise_dim=16, hid_dim=64, generator_layers=2, encoder_layers=2, dropout=0.1, weight_decay=0.01, act=<function relu>, alpha=None, contamination=0.1, lr=0.01, epoch=5, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

GAAN (Generative Adversarial Attributed Network Anomaly Detection) is a generative adversarial framework for anomaly detection on attributed networks. It comprises a generator module, an encoder module, and a discriminator module, and makes predictions using anomaly evaluation measures that consider both sample reconstruction error and real-sample recognition confidence. This model is transductive only.

See [CLW+20] for details.

Parameters
  • noise_dim (int, optional) – Dimension of the Gaussian random noise. Defaults: 16.

  • hid_dim (int, optional) – Hidden dimension of MLP layer 3. Defaults: 64.

  • generator_layers (int, optional) – Number of layers in generator. Defaults: 2.

  • encoder_layers (int, optional) – Number of layers in encoder. Defaults: 2.

  • dropout (float, optional) – Dropout rate. Defaults: 0.1.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 0.01.

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. None for balancing by standard deviation. Default: None.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Defaults: 0.1.

  • lr (float, optional) – Learning rate. Defaults: 0.01.

  • epoch (int, optional) – Maximum number of training epochs. Defaults: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.

Examples

>>> from pygod.models import GAAN
>>> model = GAAN()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(None)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object
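When `y_true` is supplied, training progress can be monitored with a ranking metric computed from the current anomaly scores. The sketch below uses ROC AUC as an example; which metric the library actually logs is an assumption, and `roc_auc` is a hypothetical pure-Python helper.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    # A tie between an outlier and an inlier counts as half a correct pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The labels only feed the metric here; they never influence the scores, in line with the note that they are not used to optimize the unsupervised model.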

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

GCNAE#

class pygod.models.GCNAE(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, contamination=0.1, lr=0.005, epoch=100, gpu=0, batch_size=0, num_neigh=-1, verbose=False)[source]#

Bases: BaseDetector

Vanilla Graph Convolutional Network Autoencoder.

See [YZY+21] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 100.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import GCNAE
>>> model = GCNAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
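For the 'unify' method, a sketch in the spirit of the Gaussian scaling from [KKSZ11] is given below. It assumes the mean and standard deviation are taken from the scores obtained during fitting and that negative values are clipped to 0. `unify_proba` is a hypothetical helper and may differ from the library's implementation.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def unify_proba(train_scores, test_scores):
    """Map scores to [0, 1] with a Gaussian error-function scaling."""
    mu = mean(train_scores)
    sigma = pstdev(train_scores)
    rows = []
    for s in test_scores:
        if sigma > 0:
            # Scores at or below the mean are clipped to probability 0.
            p = max(0.0, erf((s - mu) / (sigma * sqrt(2))))
        else:
            p = 0.0
        rows.append([1.0 - p, p])
    return rows


proba = unify_proba([1.0, 2.0, 3.0], [2.0, 0.0, 100.0])
```

Unlike the linear conversion, this scaling is less sensitive to a single extreme training score, because it depends on the mean and spread instead of the minimum and maximum.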

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

GUIDE#

class pygod.models.GUIDE(a_hid=32, s_hid=4, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, alpha=None, contamination=0.1, lr=0.001, epoch=10, gpu=0, batch_size=0, num_neigh=-1, graphlet_size=4, selected_motif=True, cache_dir=None, verbose=False)[source]#

Bases: BaseDetector

GUIDE (Higher-order Structure based Anomaly Detection on Attributed Networks) is an anomaly detector consisting of an attribute graph convolutional autoencoder and a structure graph attentive autoencoder (not the same as graph attention networks). Instead of the adjacency matrix, the node motif degree is used as the input of the structure autoencoder. The reconstruction mean squared errors of the two autoencoders are defined as the attribute anomaly score and the structure anomaly score, respectively.

Note: The calculation of node motif degree in preprocessing has high time complexity. It may take longer than you expect.

See [YZY+21] for details.

Parameters
  • a_hid (int, optional) – Hidden dimension for attribute autoencoder. Default: 32.

  • s_hid (int, optional) – Hidden dimension for structure autoencoder. Default: 4.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. None for balancing by standard deviation. Default: None.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.001.

  • epoch (int, optional) – Maximum number of training epochs. Default: 10.

  • gpu (int, optional) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • graphlet_size (int, optional) – The maximum graphlet size used to compute structure input. Default: 4.

  • selected_motif (bool, optional) – Use selected motifs which are defined in the original paper. Default: True.

  • cache_dir (str, optional) – The directory for the node motif degree caching. Default: None.

  • verbose (bool, optional) – Verbosity mode. Turn on to print out log information. Default: False.
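The role of `alpha` can be illustrated with a small sketch. It assumes that `alpha` weights the attribute error and `1 - alpha` the structure error, and that "balancing by standard deviation" means deriving `alpha` from the spread of the two inputs. Both the formula and the helper names are assumptions made for illustration.

```python
from statistics import pstdev


def balance_weight(attr_values, struct_values, alpha=None):
    """Return alpha, deriving it from standard deviations when it is None."""
    if alpha is not None:
        return alpha
    std_attr, std_struct = pstdev(attr_values), pstdev(struct_values)
    total = std_attr + std_struct
    # Assumed rule: the input with the larger spread receives the smaller weight.
    return std_struct / total if total > 0 else 0.5


def combined_error(attr_err, struct_err, alpha):
    """Per-node weighted mix of attribute and structure errors."""
    return [alpha * a + (1.0 - alpha) * s for a, s in zip(attr_err, struct_err)]


alpha = balance_weight([0.0, 2.0], [0.0, 6.0])
errors = combined_error([1.0], [3.0], alpha)
```

Passing an explicit `alpha` skips the automatic balancing and uses the supplied weight directly.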

Examples

>>> from pygod.models import GUIDE
>>> model = GUIDE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

MLPAE#

class pygod.models.MLPAE(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, contamination=0.1, lr=0.005, epoch=5, gpu=0, batch_size=0, verbose=False)[source]#

Bases: BaseDetector

Vanilla Multilayer Perceptron Autoencoder.

See [YZY+21] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import MLPAE
>>> model = MLPAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray
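This page does not spell out how the raw score is derived. For an autoencoder-based detector, a common choice is the per-node reconstruction error, which is what the sketch below assumes. `reconstruction_scores` is a hypothetical helper operating on plain lists of feature rows.

```python
from math import sqrt


def reconstruction_scores(x, x_hat):
    """Euclidean distance between each node's features and its reconstruction."""
    return [
        sqrt(sum((a - b) ** 2 for a, b in zip(row, rec)))
        for row, rec in zip(x, x_hat)
    ]


features = [[0.0, 0.0], [3.0, 4.0]]
reconstructed = [[0.0, 0.0], [0.0, 0.0]]
node_scores = reconstruction_scores(features, reconstructed)
```

Nodes that the autoencoder reconstructs poorly receive larger scores, consistent with the statement that outliers are assigned larger anomaly scores.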

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

OCGNN#

class pygod.models.OCGNN(n_hidden=256, n_layers=4, contamination=0.1, dropout=0.3, lr=0.005, weight_decay=0, eps=0.001, nu=0.5, gpu=0, epoch=5, warmup_epoch=2, verbose=False, act=<function relu>, batch_size=0, num_neigh=-1)[source]#

Bases: BaseDetector

OCGNN (One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks) is an anomaly detector that measures the distance of each node to the centroid, in a similar fashion to the support vector machine, but in the embedding space produced by several GCN layers.

See [WJD+21] for details.

Parameters
  • n_hidden (int, optional) – Hidden dimension of model. Defaults: 256.

  • n_layers (int, optional) – Number of layers in the underlying GCN. Defaults: 4.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • dropout (float, optional) – Dropout rate. Defaults: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • eps (float, optional) – A small value used when determining the center, ensuring that it does not collapse to 0. Defaults: 0.001.

  • nu (float, optional) – Regularization parameter. Defaults: 0.5.

  • lr (float, optional) – Learning rate. Defaults: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Defaults: 5.

  • warmup_epoch (int, optional) – Number of epochs to update radius and center in the beginning of training. Defaults: 2.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
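The purpose of `eps` can be shown with a short sketch of center initialization. It assumes the center is the mean of the node embeddings and that coordinates too close to zero are pushed out to `±eps`. This mirrors a common one-class initialization and is not confirmed as the library's exact procedure; `init_center` is a hypothetical helper.

```python
def init_center(embeddings, eps=0.001):
    """Mean embedding with near-zero coordinates pushed away from zero."""
    n, dim = len(embeddings), len(embeddings[0])
    center = [sum(z[j] for z in embeddings) / n for j in range(dim)]
    adjusted = []
    for c in center:
        if abs(c) < eps:
            # Keep the sign, but enforce a minimum magnitude of eps.
            c = -eps if c < 0 else eps
        adjusted.append(c)
    return adjusted


center = init_center([[1.0, 0.0002], [3.0, -0.0004]])
```

In this sketch an exactly zero coordinate is moved to `+eps`; other conventions are possible.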

Examples

>>> from pygod.models import OCGNN
>>> model = OCGNN()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. The anomaly score of an input sample is computed based on its distance to the centroid and its position relative to the radius.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

anomaly_scores – The anomaly score of the input samples of shape (n_samples,).

Return type

numpy.ndarray
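A distance-based score of this kind can be sketched as the squared distance of each embedding to the center minus the squared radius, so that points outside the hypersphere score positive. The exact formula used by the library is an assumption here, and `distance_scores` is a hypothetical helper.

```python
def distance_scores(embeddings, center, radius):
    """Squared distance to the center minus the squared radius, per sample."""
    scores = []
    for z in embeddings:
        dist_sq = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
        # Positive values lie outside the hypersphere, negative values inside.
        scores.append(dist_sq - radius**2)
    return scores


ocgnn_scores = distance_scores([[0.0, 0.0], [3.0, 4.0]], center=[0.0, 0.0], radius=1.0)
```

The first sample sits at the center and scores below zero, while the second lies well outside the radius and receives a large positive score.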

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.
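
The relationship between `contamination`, the threshold, and the 0/1 labels can be sketched as follows. This is a minimal illustration under the assumption that the threshold is chosen so that roughly a `contamination` fraction of the training scores exceed it; `fit_threshold` and `label` are hypothetical helpers, not PyGOD functions.

```python
def fit_threshold(train_scores, contamination=0.1):
    """Pick a cut-off so that about `contamination` of the training scores exceed it."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0, 0.5)")
    ranked = sorted(train_scores)
    n_outliers = int(round(len(ranked) * contamination))
    # Index of the largest score that is still treated as an inlier.
    cut = max(len(ranked) - n_outliers - 1, 0)
    return ranked[cut]


def label(scores, threshold):
    """Return 1 for scores above the threshold (outliers), 0 otherwise (inliers)."""
    return [1 if s > threshold else 0 for s in scores]


train_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.12, 0.18, 0.22, 0.95]
threshold = fit_threshold(train_scores, contamination=0.1)
labels = label(train_scores, threshold)
```

With ten training scores and `contamination=0.1`, only the single largest score is labeled as an outlier.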

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
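
The two conversion methods can be sketched in plain Python as below. This is an illustration under stated assumptions, not the library's code: the 'linear' variant rescales against the minimum and maximum of the training scores, and the 'unify' variant uses the Gaussian scaling described in [KKSZ11]. The helper names `proba_linear` and `proba_unify` are hypothetical.

```python
import math


def _clip01(p):
    """Clamp a value to the range [0, 1]."""
    return min(max(p, 0.0), 1.0)


def proba_linear(train_scores, scores):
    """Min-max scale scores using the range seen during fitting."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    rows = []
    for s in scores:
        p = _clip01((s - lo) / span) if span > 0 else 0.0
        rows.append([1.0 - p, p])  # [proba of normal, proba of outlier]
    return rows


def proba_unify(train_scores, scores):
    """Gaussian scaling: erf of the standardized score, clipped to [0, 1]."""
    mu = sum(train_scores) / len(train_scores)
    var = sum((s - mu) ** 2 for s in train_scores) / len(train_scores)
    sigma = math.sqrt(var)
    rows = []
    for s in scores:
        p = _clip01(math.erf((s - mu) / (sigma * math.sqrt(2.0)))) if sigma > 0 else 0.0
        rows.append([1.0 - p, p])
    return rows


train = [0.0, 1.0, 2.0, 4.0]
linear = proba_linear(train, [1.0, 5.0])
unified = proba_unify(train, [1.75, 0.0])
```

Both helpers return one row per sample with two columns, mirroring the (n_samples, n_classes) shape described above. Scores outside the fitted range are clipped rather than extrapolated.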

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object
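
The `<component>__<parameter>` naming convention can be illustrated with a small key-splitting sketch. The helpers `split_param_key` and `group_params` and the component name `detector` are hypothetical and only show how such keys separate into own and nested parameters; the actual behavior is defined by sklearn.base.BaseEstimator.

```python
def split_param_key(key):
    """Split 'component__parameter' into ('component', 'parameter').

    Keys without the double-underscore delimiter belong to the estimator itself.
    """
    name, delim, sub_key = key.partition("__")
    return (name, sub_key) if delim else (key, None)


def group_params(params):
    """Separate an estimator's own parameters from those of nested components."""
    own, nested = {}, {}
    for key, value in params.items():
        name, sub_key = split_param_key(key)
        if sub_key is None:
            own[name] = value
        else:
            nested.setdefault(name, {})[sub_key] = value
    return own, nested


own, nested = group_params(
    {"contamination": 0.05, "detector__epoch": 10, "detector__lr": 0.01}
)
```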

ONE#

class pygod.models.ONE(K=36, iter=5, contamination=0.1, verbose=False)[source]#

Bases: BaseDetector

ONE (Outlier Aware Network Embedding for Attributed Networks)

See [BLM19] for details.

Parameters
  • K (int, optional) – Every vertex is a K dimensional vector, K < min(N, D). Default: 36.

  • iter (int, optional) – Number of outer iterations for optimization. Default: 5.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import ONE
>>> model = ONE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outl2 – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

Radar#

class pygod.models.Radar(gamma=1.0, weight_decay=0.01, lr=0.004, epoch=100, gpu=0, contamination=0.1, verbose=False)[source]#

Bases: BaseDetector

Radar (Residual Analysis for Anomaly Detection in Attributed Networks) is an anomaly detector with residual analysis. This model is transductive only.

See [LDHL17] for details.

Parameters
  • gamma (float, optional) – Loss balance weight for attribute and structure. Default: 1..

  • weight_decay (float, optional) – Weight decay (alpha and beta in the original paper). Default: 0.01.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • epoch (int, optional) – Maximum number of training epochs. Default: 100.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import Radar
>>> model = Radar()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(None)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

SCAN#

class pygod.models.SCAN(eps=0.5, mu=2, contamination=0.1, verbose=False)[source]#

Bases: BaseDetector

SCAN (Structural Clustering Algorithm for Networks) is a clustering algorithm that takes only the graph structure, without the node features, as input. Note: This model will output detected clusters instead of “outliers” described in the original paper.

See [XYFS07] for details.

Parameters
  • eps (float, optional) – Neighborhood threshold. Default: 0.5.

  • mu (int, optional) – Minimal size of clusters. Default: 2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import SCAN
>>> model = SCAN()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(None)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

hasLabel(cliques, vertex)[source]#
neighborhood(G, v)[source]#
predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

G – NetworkX Graph

Return type

networkx.classes.graph.Graph

sameClusters(G, clusters, u)[source]#
set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object

similarity(G, v, u)[source]#
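
For readers unfamiliar with the quantities behind `eps` and `mu`, the sketch below follows the structural similarity defined in [XYFS07]: the size of the shared closed neighborhood of two vertices divided by the geometric mean of their closed neighborhood sizes. It works on a plain adjacency dictionary rather than a NetworkX graph, and the function names are hypothetical; it is not the code behind the `similarity` and `neighborhood` helpers listed above.

```python
import math


def closed_neighborhood(adj, v):
    """Neighbors of v plus v itself."""
    return set(adj[v]) | {v}


def structural_similarity(adj, v, u):
    """Shared closed neighborhood, normalized by the geometric mean of sizes."""
    nv, nu = closed_neighborhood(adj, v), closed_neighborhood(adj, u)
    return len(nv & nu) / math.sqrt(len(nv) * len(nu))


def eps_neighborhood(adj, v, eps):
    """Members of v's closed neighborhood whose similarity to v is at least eps."""
    return {
        u for u in closed_neighborhood(adj, v)
        if structural_similarity(adj, v, u) >= eps
    }


def is_core(adj, v, eps, mu):
    """A vertex is a core if its eps-neighborhood has at least mu members."""
    return len(eps_neighborhood(adj, v, eps)) >= mu


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
sim_triangle = structural_similarity(adj, 0, 1)
sim_pendant = structural_similarity(adj, 2, 3)
```

In this toy graph the triangle vertices are highly similar to each other, while the pendant vertex is only weakly tied to the triangle, so raising `eps` separates it from the cluster.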

Reference#

BLM19

Sambaran Bandyopadhyay, N Lokesh, and M Narasimha Murty. Outlier aware network embedding for attributed networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 12–19. 2019.

BVM20

Sambaran Bandyopadhyay, Saley Vishal Vivek, and MN Murty. Outlier resistant unsupervised deep architectures for attributed network embedding. In Proceedings of the 13th International Conference on Web Search and Data Mining, 25–33. 2020.

CCL+21

Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, and Haifeng Chen. Structural temporal graph neural networks for anomaly detection in dynamic graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3747–3756. 2021.

CLW+20

Zhenxing Chen, Bo Liu, Meiqing Wang, Peng Dai, Jun Lv, and Liefeng Bo. Generative adversarial attributed network anomaly detection. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1989–1992. 2020.

DLBL19

Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, 594–602. SIAM, 2019.

DLS+20

Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 315–324. 2020.

FZL20

Haoyi Fan, Fengbin Zhang, and Zuoyong Li. Anomalydae: dual autoencoder for anomaly detection on attributed networks. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5685–5689. 2020. doi:10.1109/ICASSP40776.2020.9053387.

KKSZ11

Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. Interpreting and unifying outlier scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, 13–24. SIAM, 2011.

LDHL17

Jundong Li, Harsh Dani, Xia Hu, and Huan Liu. Radar: residual analysis for anomaly detection in attributed networks. In IJCAI, 2152–2158. 2017.

LLP+21

Yixin Liu, Zhao Li, Shirui Pan, Chen Gong, Chuan Zhou, and George Karypis. Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE transactions on neural networks and learning systems, 2021.

PLL+18

Zhen Peng, Minnan Luo, Jundong Li, Huan Liu, and Qinghua Zheng. Anomalous: a joint modeling approach for anomaly detection on attributed networks. In IJCAI, 3513–3519. 2018.

PVD20

Lorenzo Perini, Vincent Vercruyssen, and Jesse Davis. Quantifying the confidence of anomaly detectors in their example-wise predictions. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 227–243. Springer, 2020.

WJD+21

Xuhong Wang, Baihong Jin, Ying Du, Ping Cui, Yingshui Tan, and Yupu Yang. One-class graph neural networks for anomaly detection in attributed networks. Neural computing and applications, 33(18):12073–12085, 2021.

XYFS07

Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 824–833. 2007.

XHZ+22

Zhiming Xu, Xiao Huang, Yue Zhao, Yushun Dong, and Jundong Li. Contrastive attributed network anomaly detection with data augmentation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). 2022.

YZY+21

Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, and Feng Xia. Higher-order structure based anomaly detection on attributed networks. In 2021 IEEE International Conference on Big Data (Big Data), 2691–2700. IEEE, 2021.