Models#

pygod.models.adone module#

Adversarial Outlier Aware Attributed Network Embedding (AdONE)

class pygod.models.adone.AdONE(Adversarial Outlier Aware Attributed Network Embedding)[source]#

Bases: pygod.models.base.BaseDetector

AdONE consists of an attribute autoencoder and a structure autoencoder. It estimates five losses to optimize the model: an attribute proximity loss, an attribute homophily loss, a structure proximity loss, a structure homophily loss, and an alignment loss. It calculates three outlier scores and averages them into an overall score.

See [BVM20] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension for both attribute autoencoder and structure autoencoder. Default: 0.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.0.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • a1 (float, optional) – Loss balance weight for structure proximity. Default: 0.2.

  • a2 (float, optional) – Loss balance weight for structure homophily. Default: 0.2.

  • a3 (float, optional) – Loss balance weight for attribute proximity. Default: 0.2.

  • a4 (float, optional) – Loss balance weight for attribute homophily. Default: 0.2.

  • a5 (float, optional) – Loss balance weight for alignment. Default: 0.2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.004.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
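As a rough illustration of how the balance weights a1 to a5 might enter the objective, the sketch below forms a weighted sum of five loss terms. The weighted-sum form, the helper name combine_losses, and the placeholder loss values are assumptions made for illustration; the actual computation is in loss_func and may differ.

```python
# Hypothetical per-term loss values; in practice they come from the model.
losses = {
    "structure_proximity": 0.80,
    "structure_homophily": 0.40,
    "attribute_proximity": 0.60,
    "attribute_homophily": 0.30,
    "alignment": 0.10,
}

# Balance weights a1..a5 at their documented default of 0.2 each.
weights = {
    "structure_proximity": 0.2,  # a1
    "structure_homophily": 0.2,  # a2
    "attribute_proximity": 0.2,  # a3
    "attribute_homophily": 0.2,  # a4
    "alignment": 0.2,            # a5
}


def combine_losses(losses, weights):
    """Return the weighted sum of the individual loss terms (assumed form)."""
    return sum(weights[name] * value for name, value in losses.items())


total_loss = combine_losses(losses, weights)
```

With equal weights, every term contributes in proportion to its raw magnitude; raising one weight makes the optimizer favor that term.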

Examples

>>> from pygod.models import AdONE
>>> model = AdONE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)

decision_function(G)[source]#

Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object
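Because y_true only monitors training, one plausible use is to report a ranking metric on the current anomaly scores after each epoch. The sketch below computes ROC AUC in plain Python. The choice of metric and the helper name roc_auc are assumptions, not a description of what fit does internally.

```python
def roc_auc(y_true, scores):
    """ROC AUC: probability that a random outlier outscores a random inlier."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = outlier) and anomaly scores for five nodes.
y_true = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.9, 0.2, 0.3]
auc = roc_auc(y_true, scores)
```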

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_func(x, x_, s, s_, h_a, h_s, dna, dns, dis_a, dis_s)[source]#

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.
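The link between contamination and the binary labels can be sketched as follows. It assumes the threshold is the (1 - contamination) quantile of the training scores, which is a common convention for detectors with this interface. The helpers percentile and label_outliers are hypothetical and written with the standard library only.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)  # pos is non-negative, so int() floors it
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def label_outliers(train_scores, test_scores, contamination=0.1):
    """Label scores above the contamination-based threshold as outliers (1)."""
    threshold = percentile(train_scores, 100.0 * (1.0 - contamination))
    labels = [1 if s > threshold else 0 for s in test_scores]
    return labels, threshold


# Hypothetical anomaly scores for ten nodes.
train_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.9, 0.22]
labels, threshold = label_outliers(train_scores, train_scores, contamination=0.1)
```

With contamination=0.1 and ten nodes, only the single highest-scoring node ends up above the threshold.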

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)
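One way to picture this confidence estimate is the sketch below, which loosely follows the idea in [PVD20]: estimate how likely a training score is to fall at or below the test score, then ask how probable it is that the score still lands above the contamination cutoff when the training set is resampled. The exact formula, the helper names, and the example values are assumptions and are not taken from the PyGOD source.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def confidence_sketch(train_scores, test_scores, threshold, contamination=0.1):
    """Confidence that each prediction would be repeated (assumed estimator)."""
    n = len(train_scores)
    cutoff = n - int(n * contamination)
    result = []
    for score in test_scores:
        # Smoothed estimate of P[training score <= score].
        n_lower = sum(1 for t in train_scores if t <= score)
        posterior = (1 + n_lower) / (2 + n)
        # Probability that the score ranks above the cutoff.
        p_outlier = 1.0 - binom_cdf(cutoff, n, posterior)
        p_outlier = min(max(p_outlier, 0.0), 1.0)  # guard against rounding
        # Report confidence in the label that was actually predicted.
        result.append(p_outlier if score > threshold else 1.0 - p_outlier)
    return result


# Hypothetical training scores and a threshold chosen for illustration.
train_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.9, 0.22]
confidence = confidence_sketch(train_scores, [0.1, 0.9], threshold=0.405)
```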

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
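The two conversions can be sketched in plain Python as shown below. The 'linear' variant rescales scores with the minimum and maximum of the training scores. The 'unify' variant is written here as Gaussian scaling with the error function, which is one reading of [KKSZ11]. Both helper names and the use of training-score statistics are assumptions.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def proba_linear(train_scores, test_scores):
    """Min-max scaling of scores into [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    if span == 0:
        raise ValueError("training scores must not all be equal")
    rows = []
    for s in test_scores:
        p = min(max((s - lo) / span, 0.0), 1.0)
        rows.append([1.0 - p, p])  # [proba of normal, proba of outlier]
    return rows


def proba_unify(train_scores, test_scores):
    """Gaussian scaling: erf of the standardized score, clipped to [0, 1]."""
    mu, sigma = mean(train_scores), pstdev(train_scores)
    if sigma == 0:
        raise ValueError("training scores must not all be equal")
    rows = []
    for s in test_scores:
        p = min(max(erf((s - mu) / (sigma * sqrt(2))), 0.0), 1.0)
        rows.append([1.0 - p, p])
    return rows


# Hypothetical training scores.
train = [0.1, 0.2, 0.5, 0.9]
linear = proba_linear(train, [0.1, 0.5, 0.9])
unify = proba_unify(train, [0.1, 0.9])
```

Under the Gaussian variant, scores at or below the training mean map to an outlier probability of 0, so it is stricter than min-max scaling for mid-range scores.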

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • s (torch.Tensor) – Adjacency matrix of the graph.

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object
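The get_params and set_params convention can be illustrated with a toy class. ToyDetector below is hypothetical and not part of PyGOD; it only mirrors the scikit-learn style contract in which constructor arguments are stored as attributes of the same name.

```python
import inspect


class ToyDetector:
    """Hypothetical estimator used only to illustrate the convention."""

    def __init__(self, hid_dim=32, dropout=0.3, lr=0.005):
        self.hid_dim = hid_dim
        self.dropout = dropout
        self.lr = lr

    def get_params(self, deep=True):
        # `deep` is accepted for interface compatibility; this toy class
        # has no nested estimators, so it is ignored.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"invalid parameter: {name}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


params = ToyDetector().set_params(lr=0.01).get_params()
```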

pygod.models.anomalydae module#

AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks

class pygod.models.anomalydae.AnomalyDAE(embed_dim=8, out_dim=4, dropout=0.2, weight_decay=1e-05, act=<function relu>, alpha=0.5, theta=1.01, eta=1.01, contamination=0.1, lr=0.004, epoch=5, gpu=0, verbose=False)[source]#

Bases: pygod.models.base.BaseDetector

AnomalyDAE (Dual autoencoder for anomaly detection on attributed networks) is an anomaly detector that consists of a structure autoencoder and an attribute autoencoder, which jointly learn node embeddings and attribute embeddings in the latent space. The structure autoencoder uses graph attention layers. The reconstruction mean squared errors of the decoders are defined as the structure anomaly score and the attribute anomaly score, respectively, with two additional penalties on the reconstructed adjacency matrix and node attributes (forcing entries to be nonzero).

See [fan2020anomalydae] for details.

Parameters
  • embed_dim (int, optional) – Hidden dimension of model. Defaults: 8.

  • out_dim (int, optional) – Dimension of the reduced representation after passing through the structure autoencoder and attribute autoencoder. Defaults: 4.

  • dropout (float, optional) – Dropout rate. Defaults: 0.2.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 1e-5.

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • alpha (float, optional) – loss balance weight for attribute and structure. Defaults: 0.5.

  • theta (float, optional) – Greater than 1. Imposes a penalty on the reconstruction error of the non-zero elements in the adjacency matrix. Defaults: 1.01.

  • eta (float, optional) – Greater than 1. Imposes a penalty on the reconstruction error of the non-zero elements in the node attributes. Defaults: 1.01.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Defaults: 0.1.

  • lr (float, optional) – Learning rate. Defaults: 0.004.

  • epoch (int, optional) – Maximum number of training epochs. Defaults: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.
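The role of theta and eta can be pictured with the sketch below, in which squared reconstruction errors on non-zero target entries are multiplied by a penalty greater than 1. The exact placement of the penalty (applied to the squared error and summed per row) and the helper name penalized_squared_error are assumptions; see loss_func for the actual form.

```python
def penalized_squared_error(target, recon, penalty):
    """Per-row squared error with non-zero target entries up-weighted."""
    errors = []
    for t_row, r_row in zip(target, recon):
        row_error = 0.0
        for t, r in zip(t_row, r_row):
            weight = penalty if t != 0 else 1.0
            row_error += weight * (t - r) ** 2
        errors.append(row_error)
    return errors


# Hypothetical 2-node graph: adjacency matrix and its reconstruction.
adj = [[0, 1], [1, 0]]
adj_hat = [[0.1, 0.8], [0.9, 0.2]]

structure_errors = penalized_squared_error(adj, adj_hat, penalty=1.01)  # theta
plain_errors = penalized_squared_error(adj, adj_hat, penalty=1.0)
```

The same helper applies to node attributes with eta as the penalty. A penalty of exactly 1 recovers the unweighted squared error.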

Examples

>>> from pygod.models import AnomalyDAE
>>> model = AnomalyDAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)

decision_function(G)[source]#

Predict raw anomaly scores for the input graph G using the fitted detector. The anomaly score of an input sample is computed based on different detector algorithms. For consistency, outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

anomaly_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_func(adj, A_hat, attrs, X_hat)[source]#

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • adj (torch.Tensor) – Adjacency matrix of the graph.

  • edge_index (torch.Tensor) – Edge list of the graph.
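The relationship between the edge list and the adjacency matrix can be shown without PyTorch. In PyTorch Geometric, edge_index has shape (2, num_edges), with source nodes in the first row and target nodes in the second. The sketch below uses nested lists in place of tensors, and the helper name dense_adjacency is hypothetical.

```python
def dense_adjacency(edge_index, num_nodes):
    """Build a dense 0/1 adjacency matrix from a (2, num_edges) edge list."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    sources, targets = edge_index
    for src, dst in zip(sources, targets):
        adj[src][dst] = 1
    return adj


# Undirected path graph 0 - 1 - 2, stored with both edge directions.
edge_index = [[0, 1, 1, 2],
              [1, 0, 2, 1]]
adj = dense_adjacency(edge_index, num_nodes=3)
```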

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.dominant module#

Deep Anomaly Detection on Attributed Networks (DOMINANT)

class pygod.models.dominant.DOMINANT(Deep Anomaly Detection on Attributed Networks)[source]#

Bases: pygod.models.base.BaseDetector

DOMINANT is an anomaly detector consisting of a shared graph convolutional encoder, a structure reconstruction decoder, and an attribute reconstruction decoder. The reconstruction mean squared errors of the decoders are defined as the structure anomaly score and the attribute anomaly score, respectively.

See [DLBL19] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 0.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.0.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. Default: 0.5.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.004.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
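A minimal sketch of how alpha could balance the two reconstruction errors into one score per node is given below. It assumes alpha weights the attribute error and (1 - alpha) weights the structure error. That assignment, the helper names, and the example matrices are illustrative assumptions; loss_func holds the actual definition.

```python
def row_mse(target, recon):
    """Mean squared reconstruction error for each row (node)."""
    return [
        sum((t - r) ** 2 for t, r in zip(t_row, r_row)) / len(t_row)
        for t_row, r_row in zip(target, recon)
    ]


def combine_scores(attr_errors, struct_errors, alpha=0.5):
    """Weighted combination of attribute and structure errors (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        alpha * a + (1.0 - alpha) * s
        for a, s in zip(attr_errors, struct_errors)
    ]


# Hypothetical attributes, adjacency, and their reconstructions for two nodes.
x = [[1.0, 0.0], [0.0, 1.0]]
x_hat = [[0.8, 0.1], [0.5, 0.5]]
adj = [[0, 1], [1, 0]]
adj_hat = [[0.1, 0.9], [0.6, 0.2]]

scores = combine_scores(row_mse(x, x_hat), row_mse(adj, adj_hat), alpha=0.5)
```

In this toy input the second node is reconstructed worse on both views, so it receives the larger score.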

Examples

>>> from pygod.models import DOMINANT
>>> model = DOMINANT()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)

decision_function(G)[source]#

Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_func(x, x_, adj, adj_)[source]#

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • adj (torch.Tensor) – Adjacency matrix of the graph.

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.done module#

Deep Outlier Aware Attributed Network Embedding (DONE)

class pygod.models.done.DONE(Deep Outlier Aware Attributed Network Embedding)[source]#

Bases: pygod.models.base.BaseDetector

DONE consists of an attribute autoencoder and a structure autoencoder. It estimates five losses to optimize the model: an attribute proximity loss, an attribute homophily loss, a structure proximity loss, a structure homophily loss, and a combination loss. It calculates three outlier scores and averages them into an overall score.

See [BVM20] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension for both attribute autoencoder and structure autoencoder. Default: 0.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.0.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • a1 (float, optional) – Loss balance weight for structure proximity. Default: 0.2.

  • a2 (float, optional) – Loss balance weight for structure homophily. Default: 0.2.

  • a3 (float, optional) – Loss balance weight for attribute proximity. Default: 0.2.

  • a4 (float, optional) – Loss balance weight for attribute homophily. Default: 0.2.

  • a5 (float, optional) – Loss balance weight for combination. Default: 0.2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.004.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
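The num_layers rule (the ceiling half for the encoder, the floor half for the decoders) can be written out directly. The helper name split_layers is hypothetical.

```python
import math


def split_layers(num_layers):
    """Split a total layer count into (encoder_layers, decoder_layers)."""
    encoder_layers = math.ceil(num_layers / 2)
    decoder_layers = math.floor(num_layers / 2)
    return encoder_layers, decoder_layers


even_split = split_layers(4)  # default num_layers
odd_split = split_layers(3)   # the encoder receives the extra layer
```

For the default of 4 this yields two encoder layers and two decoder layers. An odd count gives the extra layer to the encoder.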

Examples

>>> from pygod.models import DONE
>>> model = DONE()
>>> model.fit(data)
>>> prediction = model.predict(data)

decision_function(G)[source]#

Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_func(x, x_, s, s_, h_a, h_s, dna, dns)[source]#

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • s (torch.Tensor) – Adjacency matrix of the graph.

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.gaan module#

Generative Adversarial Attributed Network Anomaly Detection (GAAN)

class pygod.models.gaan.GAAN(Generative Adversarial Attributed Network Anomaly Detection)[source]#

Bases: pygod.models.base.BaseDetector

GAAN is a generative adversarial attributed network anomaly detection framework. It comprises a generator module, an encoder module, and a discriminator module, and makes predictions with an anomaly evaluation measure that considers both sample reconstruction error and real-sample recognition confidence.

See [CLW+20] for details.

Parameters
  • noise_dim (int, optional) – Dimension of the Gaussian random noise. Defaults: 32.

  • latent_dim (int, optional) – Dimension of the latent space. Defaults: 32.

  • hid_dim1 (int, optional) – Hidden dimension of MLP layer 1. Defaults: 32.

  • hid_dim2 (int, optional) – Hidden dimension of MLP layer 2. Defaults: 64.

  • hid_dim3 (int, optional) – Hidden dimension of MLP layer 3. Defaults: 128.

  • num_layers (int, optional) – Total number of layers in model. A half (ceil) of the layers are for the encoder, the other half (floor) of the layers are for decoders. Defaults: 3.

  • dropout (float, optional) – Dropout rate. Defaults: 0.0.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 0.0.

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • alpha (float, optional) – loss balance weight for attribute and structure. Defaults: 0.2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Defaults: 0.05.

  • lr (float, optional) – Learning rate. Defaults: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Defaults: 10.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: -1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.

Examples

>>> from pygod.models import GAAN
>>> model = GAAN()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)

decision_function(G)[source]#

Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_function(X, X_, Y_true_pre, Y_fake_pre, edge_index, criterion)[source]#

Obtain the generator and discriminator losses separately.

Parameters
  • X (torch.Tensor) – Attribute (feature) of nodes.

  • X_ (torch.Tensor) – Fake attribute (feature) of nodes.

  • Y_true_pre (torch.Tensor) – Labels predicted from the true attribute.

  • Y_fake_pre (torch.Tensor) – Labels predicted from the fake attribute.

  • edge_index (torch.Tensor) – Edge list of the graph.

  • criterion (torch.nn.modules.loss.BCELoss) – Binary cross-entropy loss criterion.

Returns

  • loss_D (torch.Tensor) – Discriminator loss.

  • loss_GE (torch.Tensor) – Generator loss.

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • X (torch.Tensor) – Attribute (feature) of nodes.

  • edge_index (torch.Tensor) – Edge list of the graph.

score_function(X, X_, Y_true_pre, Y_fake_pre, edge_index, criterion)[source]#

Compute the anomaly score after model training as a weighted combination of the context reconstruction loss and the structure discriminator loss.

Parameters
  • X (torch.Tensor) – Attribute (feature) of nodes.

  • X_ (torch.Tensor) – Fake attribute (feature) of nodes.

  • Y_true_pre (torch.Tensor) – Labels predicted from the true attribute.

  • Y_fake_pre (torch.Tensor) – Labels predicted from the fake attribute.

  • edge_index (torch.Tensor) – Edge list of the graph.

  • criterion (torch.nn.modules.loss.BCELoss) – Binary cross-entropy loss criterion.

Returns

score – Anomaly score.

Return type

torch.Tensor

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

train_model(X, gaussian_noise, edge_index)[source]#

Complete the entire process from noise to generator, to encoder, and finally to discriminator.

Parameters
  • X (torch.Tensor) – Attribute (feature) of nodes.

  • gaussian_noise (torch.Tensor) – Gaussian noise for generator.

  • edge_index (torch.Tensor) – Edge list of the graph.

Returns

  • X_ (torch.Tensor) – Fake attribute (feature) of nodes.

  • Y_true_pre (torch.Tensor) – Labels predicted from the true attribute.

  • Y_fake_pre (torch.Tensor) – Labels predicted from the fake attribute.

pygod.models.gcnae module#

Graph Convolutional Network Autoencoder

class pygod.models.gcnae.GCNAE(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, contamination=0.1, lr=0.005, epoch=100, gpu=0, verbose=False)[source]#

Bases: pygod.models.base.BaseDetector

Vanilla Graph Convolutional Network Autoencoder

See [YZY+21] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epoch. Default: 100.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import GCNAE
>>> model = GCNAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.
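As a rough illustration of how contamination can turn raw anomaly scores into the 0/1 labels returned by predict, the sketch below thresholds scores at the (1 - contamination) quantile of the training scores. The helper names threshold_from_contamination and label_outliers are hypothetical and written in plain Python; the library's own thresholding may differ in detail.

```python
import math


def threshold_from_contamination(train_scores, contamination=0.1):
    """Score at the (1 - contamination) quantile, linearly interpolated."""
    ordered = sorted(train_scores)
    pos = (1.0 - contamination) * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def label_outliers(scores, threshold):
    """1 for scores strictly above the threshold (outliers), 0 otherwise."""
    return [1 if s > threshold else 0 for s in scores]


# Hypothetical anomaly scores for ten nodes; the last one is clearly unusual.
train_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.28, 2.5]
threshold = threshold_from_contamination(train_scores, contamination=0.1)
labels = label_outliers(train_scores, threshold)
```

With contamination=0.1, roughly the top 10% of the training scores end up above the threshold, which is why the parameter is described as the expected proportion of outliers.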

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
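The 'linear' conversion can be sketched as a min-max rescaling of the scores against the range seen during fitting. The function name linear_outlier_probability is hypothetical, and clipping values outside the training range to [0, 1] is an assumption rather than a documented behavior.

```python
def linear_outlier_probability(train_scores, test_scores):
    """Min-max scale test scores with the training range; rows are [p_normal, p_outlier]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    proba = []
    for s in test_scores:
        p = (s - lo) / span if span > 0 else 0.0
        p = min(1.0, max(0.0, p))  # assumption: clip scores outside the training range
        proba.append([1.0 - p, p])
    return proba


proba = linear_outlier_probability([1.0, 2.0, 5.0], [1.0, 3.0, 6.0])
# proba == [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Each row has two entries, matching the (n_samples, n_classes) shape with the default two classes.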

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object
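To make the <component>__<parameter> naming concrete, the following sketch groups a flat parameter dictionary by component. It only illustrates the naming convention; split_nested_params and the component name "detector" are hypothetical and not part of the library or of scikit-learn.

```python
def split_nested_params(params):
    """Group 'component__parameter' keys by component; plain keys go under ''."""
    grouped = {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # Nested form: everything after the first '__' belongs to the component.
            grouped.setdefault(component, {})[sub_key] = value
        else:
            # Simple form: the parameter belongs to the estimator itself.
            grouped.setdefault("", {})[key] = value
    return grouped


grouped = split_nested_params(
    {"lr": 0.01, "detector__epoch": 50, "detector__dropout": 0.2}
)
# grouped == {"": {"lr": 0.01}, "detector": {"epoch": 50, "dropout": 0.2}}
```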

pygod.models.guide module#

Higher-order Structure based Anomaly Detection on Attributed Networks (GUIDE)

class pygod.models.guide.GUIDE(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, alpha=0.1, contamination=0.1, lr=0.001, epoch=10, gpu=0, graphlet_size=4, selected_motif=False, verbose=False)[source]#

Bases: pygod.models.base.BaseDetector

GUIDE (Higher-order Structure based Anomaly Detection on Attributed Networks) is an anomaly detector consisting of an attribute graph convolutional autoencoder and a structure graph attentive autoencoder (not the same as graph attention networks). Instead of the adjacency matrix, the node motif degree (graphlet degree by default in this implementation) is used as the input of the structure autoencoder. The reconstruction mean squared errors of the attribute and structure autoencoders are defined as the attribute anomaly score and the structure anomaly score, respectively.

Note: The graph preprocessing in the model has high time complexity and may take longer than you expect.

See [YZY+21] for details.
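The scoring idea can be sketched in plain Python as a per-node weighted sum of the two reconstruction errors. The weighting alpha * attribute + (1 - alpha) * structure is an assumption about how alpha balances the two terms, and row_mse and combined_score are hypothetical helper names.

```python
def row_mse(a, b):
    """Mean squared error per row between two equally shaped nested lists."""
    return [
        sum((p - q) ** 2 for p, q in zip(ra, rb)) / len(ra)
        for ra, rb in zip(a, b)
    ]


def combined_score(x, x_, s, s_, alpha=0.1):
    """Assumed combination: alpha * attribute error + (1 - alpha) * structure error."""
    attr_err = row_mse(x, x_)
    struct_err = row_mse(s, s_)
    return [alpha * a + (1.0 - alpha) * t for a, t in zip(attr_err, struct_err)]


# Two nodes: the first is reconstructed perfectly, the second is not.
x = [[1.0, 0.0], [0.0, 1.0]]    # node attributes
x_ = [[1.0, 0.0], [1.0, 0.0]]   # reconstructed attributes
s = [[2.0, 1.0], [3.0, 0.0]]    # node motif degrees
s_ = [[2.0, 1.0], [1.0, 0.0]]   # reconstructed motif degrees
scores = combined_score(x, x_, s, s_, alpha=0.5)
# scores == [0.0, 1.5]
```

Nodes that either autoencoder reconstructs poorly receive larger scores, which matches the convention that outliers are assigned larger anomaly scores.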

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • alpha (float, optional) – Loss balance weight for attribute and structure. Default: 0.1.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.001.

  • epoch (int, optional) – Maximum number of training epoch. Default: 10.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • graphlet_size (int) – The maximum graphlet size used to compute structure input. Default: 4.

  • selected_motif (bool) – Use selected motifs which are defined in the original paper. Default: False.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import GUIDE
>>> model = GUIDE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

loss_func(x, x_, s, s_)[source]#

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)
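For the 'unify' option, one common reading of [KKSZ11] is Gaussian scaling: standardize the scores with the training mean and standard deviation, pass them through the error function, and clip negatives to zero. The sketch below follows that reading; it is an assumption about the conversion, not a copy of the library code, and unify_outlier_probability is a hypothetical name.

```python
import math


def unify_outlier_probability(train_scores, test_scores):
    """Gaussian scaling of scores; rows are [p_normal, p_outlier]."""
    n = len(train_scores)
    mean = sum(train_scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in train_scores) / n)
    proba = []
    for s in test_scores:
        p = math.erf((s - mean) / (std * math.sqrt(2.0))) if std > 0 else 0.0
        p = min(1.0, max(0.0, p))  # scores at or below the mean map to probability 0
        proba.append([1.0 - p, p])
    return proba


proba = unify_outlier_probability([1.0, 2.0, 3.0], [1.0, 2.0, 10.0])
```

Unlike the linear conversion, a single extreme training score does not compress all other probabilities toward zero, because the scaling depends on the mean and spread rather than on the minimum and maximum.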

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model. Part of this function is adapted from https://github.com/benedekrozemberczki/OrbitalFeatures.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • s (torch.Tensor) – Structure matrix (node motif degree/graphlet degree)

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.mlpae module#

Multilayer Perceptron Autoencoder

class pygod.models.mlpae.MLPAE(hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=<function relu>, contamination=0.1, lr=0.005, epoch=5, gpu=0, verbose=False)[source]#

Bases: pygod.models.base.BaseDetector

Vanilla Multilayer Perceptron Autoencoder

See [YZY+21] for details.

Parameters
  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in autoencoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epoch. Default: 5.

  • gpu (int) – GPU Index, -1 for using CPU. Default: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import MLPAE
>>> model = MLPAE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outlier_scores – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

x – Attribute (feature) of nodes.

Return type

torch.Tensor

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.ocgnn module#

One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks

class pygod.models.ocgnn.OCGNN(n_hidden=256, n_layers=4, contamination=0.1, dropout=0.3, lr=0.005, weight_decay=0, eps=0.001, nu=0.5, gpu=0, epoch=5, warmup_epoch=2, verbose=False, act=<function relu>)[source]#

Bases: pygod.models.base.BaseDetector

OCGNN (One-Class Graph Neural Networks for Anomaly Detection in Attributed Networks) is an anomaly detector that measures the distance of each node to the centroid, in a fashion similar to the support vector machine, but in the embedding space obtained after several GCN layers.

See [WJD+21] for details.

Parameters
  • n_hidden (int, optional) – Hidden dimension of model. Defaults: 256.

  • n_layers (int, optional) – Number of layers of the underlying GCN. Defaults: 4.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • dropout (float, optional) – Dropout rate. Defaults: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Defaults: 0..

  • act (callable activation function or None, optional) – Activation function if not None. Defaults: torch.nn.functional.relu.

  • eps (float, optional) – A small valid number for determining the center and make sure it does not collapse to 0. Defaults: 0.001.

  • nu (float, optional) – Regularization parameter. Defaults: 0.5

  • lr (float, optional) – Learning rate. Defaults: 0.005.

  • epoch (int, optional) – Maximum number of training epoch. Defaults: 5.

  • warmup_epoch (int, optional) – Number of epochs to update radius and center in the beginning of training. Defaults: 2.

  • gpu (int) – GPU Index, -1 for using CPU. Defaults: 0.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Defaults: False.

Examples

>>> from pygod.models import OCGNN
>>> model = OCGNN()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
anomaly_scores(outputs)[source]#

Calculate the anomaly score given by Euclidean distance to the center.

Parameters

outputs (torch.Tensor) – The output in the reduced space by GCN.

Returns

  • dist (torch.Tensor) – Average distance.

  • scores (torch.Tensor) – Anomaly scores.
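A minimal sketch of this kind of distance-based scoring, written with plain lists instead of tensors: each embedding's squared Euclidean distance to the center is computed, and the squared radius is subtracted so that points outside the hypersphere get positive scores. Using the squared distance and subtracting radius ** 2 are assumptions borrowed from one-class (Deep SVDD style) formulations; distance_scores is a hypothetical name.

```python
def distance_scores(outputs, center, radius):
    """Squared distance of each embedding to the center, and distance minus radius^2."""
    dist = [sum((o - c) ** 2 for o, c in zip(row, center)) for row in outputs]
    scores = [d - radius ** 2 for d in dist]
    return dist, scores


dist, scores = distance_scores(
    [[0.0, 0.0], [3.0, 4.0]], center=[0.0, 0.0], radius=2.0
)
# dist == [0.0, 25.0]; scores == [-4.0, 21.0]
```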

decision_function(G)[source]#

Predict the raw anomaly score of the input graph using the fitted detector. The anomaly score of an input sample is computed based on distance to the centroid and measurement within the radius.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

anomaly_scores – The anomaly score of the input samples of shape (n_samples,).

Return type

numpy.array

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

get_radius(dist)[source]#

Optimally solve for radius R via the (1-nu)-quantile of distances.

Parameters

dist (torch.Tensor) – Distance of the data points, calculated by the loss function.

Returns

r – New radius.

Return type

numpy.array
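The quantile step can be sketched as follows. The example assumes dist holds squared distances, so a square root is taken before the (1 - nu)-quantile is computed with linear interpolation; both choices are assumptions, and radius_from_quantile is a hypothetical name.

```python
import math


def radius_from_quantile(dist, nu=0.5):
    """Radius as the (1 - nu)-quantile of the distances (sqrt of squared distances)."""
    roots = sorted(math.sqrt(d) for d in dist)
    pos = (1.0 - nu) * (len(roots) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return roots[lo] + (roots[hi] - roots[lo]) * (pos - lo)


# Squared distances 1, 4, 9, 16, 25 correspond to distances 1 through 5.
radius = radius_from_quantile([1.0, 4.0, 9.0, 16.0, 25.0], nu=0.2)
```

With nu=0.2 the radius lands near 4.2, so about 20% of the points fall outside the hypersphere. A larger nu shrinks the radius and treats more points as lying outside.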

init_center(x, edge_index)[source]#

Initialize hypersphere center c as the mean from an initial forward pass on the data.

Parameters
  • x (torch.Tensor) – Node features.

  • edge_index (torch.Tensor) – Edge indices for the graph data

Returns

c – The new centroid.

Return type

torch.Tensor
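The role of eps in the center initialization can be illustrated with a small sketch: the center is the column-wise mean of the embeddings, and any coordinate closer to zero than eps is pushed to -eps or +eps. This mirrors the description of eps above but is an assumed reading, and the function below works on plain lists rather than on the output of a GCN forward pass.

```python
def init_center(outputs, eps=0.001):
    """Column-wise mean of the embeddings, with near-zero coordinates moved to +/- eps."""
    n = len(outputs)
    center = [sum(col) / n for col in zip(*outputs)]
    adjusted = []
    for c in center:
        if abs(c) < eps:
            # Keep the sign, but avoid a coordinate that is (almost) exactly zero.
            c = -eps if c < 0 else eps
        adjusted.append(c)
    return adjusted


center = init_center([[1.0, 0.0002, -0.0004], [3.0, 0.0002, -0.0004]], eps=0.001)
# center == [2.0, 0.001, -0.001]
```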

loss_function(outputs, update=False)[source]#

Calculate the loss in paper Equation (4)

Parameters
  • outputs (torch.Tensor) – The output in the reduced space by GCN.

  • update (bool, optional (default=False)) – If you need to update the radius, set update=True.

Returns

  • dist (torch.Tensor) – Average distance.

  • scores (torch.Tensor) – Anomaly scores.

  • loss (torch.Tensor) – A combined loss of radius and average scores.
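Equation (4) of the paper is not reproduced here. As a hedged illustration of what "a combined loss of radius and average scores" can look like, the sketch below uses the soft-boundary one-class form: the squared radius plus the mean of the positive scores scaled by 1 / nu. The exact form used by the library is an assumption, and soft_boundary_loss is a hypothetical name.

```python
def soft_boundary_loss(scores, radius, nu=0.5):
    """radius^2 + (1 / nu) * mean(max(0, score)); assumed soft-boundary form."""
    hinge = [max(0.0, s) for s in scores]  # only points outside the sphere contribute
    return radius ** 2 + (sum(hinge) / len(hinge)) / nu


# Scores are squared distance minus squared radius, e.g. [-4.0, 21.0] for radius 2.
loss = soft_boundary_loss([-4.0, 21.0], radius=2.0, nu=0.5)
# loss == 25.0
```

Under this form a smaller nu penalizes points outside the hypersphere more heavily, while the radius term keeps the sphere from growing without bound.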

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

  • x (torch.Tensor) – Attribute (feature) of nodes.

  • adj (torch.Tensor) – Adjacency matrix of the graph.

  • edge_index (torch.Tensor) – Edge list of the graph.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.one module#

Outlier Aware Network Embedding for Attributed Networks (ONE)

class pygod.models.one.ONE(Outlier Aware Network Embedding for Attributed Networks)[source]#

Bases: pygod.models.base.BaseDetector

Reference: <https://arxiv.org/pdf/1811.07609.pdf>

See [BLM19] for details.

Parameters
  • K (int, optional) – Every vertex is a K dimensional vector, K < min(N, D). Default: 36.

  • iter (int, optional) – Number of outer Iterations for optimization. Default: 5.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.

Examples

>>> from pygod.models import ONE
>>> model = ONE()
>>> model.fit(data) # PyG graph data object
>>> prediction = model.predict(data)
cal_outlierScore(A, C)[source]#

Calculate the outlier scores.

Parameters
  • A (numpy.array) – The adjacency matrix.

  • C (numpy.array) – The node attribute matrix.

Returns

outlier_scores – Three sets of outlier scores from three different layers.

Return type

Tuple(numpy.array, numpy.array, numpy.array)

calc_lossValues(A, C, G_mat, H, U, V, W, outl1, outl2, outl3, alpha, beta, gamma)[source]#

Calculate the loss. This function is called inside the fit() function.

Parameters

Multiple variables inside the fit() function.

Return type

None

decision_function(G)[source]#

Predict raw anomaly score using the fitted detector. Outliers are assigned with larger anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns

outl2 – The anomaly score of shape N.

Return type

numpy.ndarray

fit(G, y_true=None)[source]#

Fit detector with input data.

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

  • y_true (numpy.array, optional (default=None)) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)#

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(G, return_confidence=False)#

Predict if a particular sample is an outlier or not.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].

Return type

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)#

Predict the probability of a sample being outlier. Two approaches are possible:

  1. simply use Min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

  2. use unifying scores, see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, tells whether it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).

Return type

numpy array of shape (n_samples, n_classes)

process_graph(G)[source]#

Process the raw PyG data object into a tuple of sub data objects needed for the model.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

  • A (numpy.array) – The adjacency matrix.

  • C (numpy.array) – The node attribute matrix.

set_params(**params)#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

pygod.models.one.calculate_G(G_mat, alpha, outl1, H, A, gamma, outl3, U, W)[source]#

References

BLM19

Sambaran Bandyopadhyay, N Lokesh, and M Narasimha Murty. Outlier aware network embedding for attributed networks. In Proceedings of the AAAI conference on artificial intelligence, volume 33, 12–19. 2019.

BVM20

Sambaran Bandyopadhyay, Saley Vishal Vivek, and MN Murty. Outlier resistant unsupervised deep architectures for attributed network embedding. In Proceedings of the 13th International Conference on Web Search and Data Mining, 25–33. 2020.

CCL+21

Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, and Haifeng Chen. Structural temporal graph neural networks for anomaly detection in dynamic graphs. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3747–3756. 2021.

CLW+20

Zhenxing Chen, Bo Liu, Meiqing Wang, Peng Dai, Jun Lv, and Liefeng Bo. Generative adversarial attributed network anomaly detection. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1989–1992. 2020.

DLBL19

Kaize Ding, Jundong Li, Rohit Bhanushali, and Huan Liu. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, 594–602. SIAM, 2019.

DLS+20

Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 315–324. 2020.

KKSZ11

Hans-Peter Kriegel, Peer Kroger, Erich Schubert, and Arthur Zimek. Interpreting and unifying outlier scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, 13–24. SIAM, 2011.

PVD20

Lorenzo Perini, Vincent Vercruyssen, and Jesse Davis. Quantifying the confidence of anomaly detectors in their example-wise predictions. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 227–243. Springer, 2020.

WJD+21

Xuhong Wang, Baihong Jin, Ying Du, Ping Cui, Yingshui Tan, and Yupu Yang. One-class graph neural networks for anomaly detection in attributed networks. Neural computing and applications, 33(18):12073–12085, 2021.

YZY+21

Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, and Feng Xia. Higher-order structure based anomaly detection on attributed networks. In 2021 IEEE International Conference on Big Data (Big Data), 2691–2700. IEEE, 2021.