API CheatSheet

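The following APIs apply to all detector models:

  • fit(G): Fit the detector.

  • decision_function(G): Predict raw anomaly scores of PyG graph G using the fitted detector.

  • predict(G): Predict whether each sample is an outlier.

  • predict_proba(G): Predict the probability of each sample being an outlier.

  • predict_confidence(G): Predict the model’s confidence in each prediction.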

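Key attributes of a fitted model:

  • decision_scores_: The outlier scores of the training data. Outliers tend to have higher scores.

  • threshold_: The score threshold derived from contamination.

  • labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers.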

For the inductive setting:

Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.

See the base class definition below:

pygod.models.base module

Base class for all outlier detector models

class pygod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

Parameters

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is based on contamination: it is the value of decision_scores_ that separates the n_samples * contamination most abnormal samples from the rest. It is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type

numpy array of shape (n_samples,), with each value either 0 or 1
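As a sketch of how threshold_ and labels_ can be derived from decision_scores_, the snippet below assumes the percentile rule used by PyOD-style detectors: the threshold is the (1 - contamination) quantile of the training scores, and samples strictly above it are labeled as outliers. The helper name threshold_and_labels is hypothetical and not part of PyGOD.

```python
import numpy as np


def threshold_and_labels(decision_scores, contamination=0.1):
    """Hypothetical helper: derive threshold_ and labels_ from training scores.

    Assumes the PyOD-style percentile rule, so that roughly
    n_samples * contamination samples score above the threshold.
    """
    scores = np.asarray(decision_scores, dtype=float)
    # (1 - contamination) quantile of the training scores
    threshold = np.percentile(scores, 100 * (1 - contamination))
    # 1 = outlier (score strictly above the threshold), 0 = inlier
    labels = (scores > threshold).astype(int)
    return threshold, labels


scores = np.arange(1, 11, dtype=float)  # ten samples with scores 1..10
threshold, labels = threshold_and_labels(scores, contamination=0.2)
print(threshold)  # about 8.2
print(labels)     # [0 0 0 0 0 0 0 0 1 1]
```

With contamination=0.2 and ten samples, the two highest-scoring samples end up labeled as outliers.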

decision_function(G)

Predict raw anomaly scores of PyG graph G using the fitted detector. The anomaly score of each input sample is computed by the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

anomaly_scores – The anomaly scores of the input graph.

Return type

numpy array of shape (n_samples,)

fit(G)

Fit the detector. Ground-truth labels (y), if present, are ignored by unsupervised methods.

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

self – Fitted estimator.

Return type

object
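The fit/decision_function contract can be illustrated with a toy detector. DegreeDetector is hypothetical and not a PyGOD model: it scores each node by its out-degree, reuses the percentile thresholding assumption from PyOD-style detectors, and uses a types.SimpleNamespace holding NumPy arrays in place of a real torch_geometric.data.Data object so that the sketch runs without PyTorch.

```python
from types import SimpleNamespace

import numpy as np


class DegreeDetector:
    """Hypothetical toy detector mimicking the BaseDetector contract.

    The anomaly score of a node is its out-degree, a stand-in for a real model.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, G):
        # Count how often each node appears as an edge source (out-degree).
        n_nodes = G.x.shape[0]
        return np.bincount(G.edge_index[0], minlength=n_nodes).astype(float)

    def fit(self, G):
        self.decision_scores_ = self.decision_function(G)
        # Assumed percentile rule for the threshold (PyOD-style).
        self.threshold_ = np.percentile(
            self.decision_scores_, 100 * (1 - self.contamination)
        )
        self.labels_ = (self.decision_scores_ > self.threshold_).astype(int)
        return self  # fit returns the fitted estimator

    def predict(self, G):
        # Apply the threshold learned during fit to new scores.
        return (self.decision_function(G) > self.threshold_).astype(int)


# Stand-in for torch_geometric.data.Data: 5 nodes, node 0 is a hub.
G = SimpleNamespace(
    x=np.zeros((5, 2)),
    edge_index=np.array([[0, 0, 0, 0, 1], [1, 2, 3, 4, 2]]),
)
detector = DegreeDetector(contamination=0.2).fit(G)
print(detector.labels_)  # [1 0 0 0 0]
```

Because fit returns self, construction and fitting can be chained as in the last lines.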

get_params(deep=True)

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
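The sketch below shows one way sklearn-style get_params can work, assuming (as sklearn.base.BaseEstimator does) that every constructor argument is stored as an attribute of the same name. SketchEstimator is a hypothetical class; with deep=True, the parameters of a nested estimator are reported under <component>__<parameter> keys.

```python
import inspect


class SketchEstimator:
    """Hypothetical estimator illustrating sklearn-style get_params."""

    def __init__(self, contamination=0.1, base=None):
        self.contamination = contamination
        self.base = base

    @classmethod
    def _get_param_names(cls):
        # Parameter names are read from the __init__ signature.
        sig = inspect.signature(cls.__init__)
        return sorted(p.name for p in sig.parameters.values() if p.name != "self")

    def get_params(self, deep=True):
        out = {}
        for name in self._get_param_names():
            value = getattr(self, name)
            if deep and hasattr(value, "get_params"):
                # Expose nested estimator parameters as <component>__<parameter>.
                for key, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{key}"] = sub_value
            out[name] = value
        return out


inner = SketchEstimator(contamination=0.05)
outer = SketchEstimator(contamination=0.2, base=inner)
print(sorted(outer.get_params(deep=True)))
# ['base', 'base__base', 'base__contamination', 'contamination']
```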

predict(G, return_confidence=False)

Predict if a particular sample is an outlier or not.

Parameters

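  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.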

Returns

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

confidence – For each observation, how consistently the model would make the same prediction if the training set were perturbed, returned as a probability in [0, 1].

Return type

numpy array of shape (n_samples,)
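A sketch of the example-wise confidence from [PVD20], modeled on PyOD's predict_confidence (PyGOD's implementation may differ): each test score gets a Bayesian estimate p of the fraction of training scores at or below it, and the confidence is the binomial probability that the sample would keep its predicted label on a redrawn training set. The function names are hypothetical, and the binomial CDF is computed with the standard library instead of scipy.stats.binom to keep the block self-contained.

```python
import math

import numpy as np


def binom_cdf(k, n, p):
    """Return P[X <= k] for X ~ Binomial(n, p), with 0 < p < 1.

    Each pmf term is computed in log space to avoid overflow for large n.
    """
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def predict_confidence(train_scores, test_scores, threshold, contamination):
    """Hypothetical ExCeeD-style confidence, assuming PyOD's formulation."""
    train = np.asarray(train_scores, dtype=float)
    n = train.shape[0]
    k = n - int(n * contamination)
    confidence = []
    for s in np.asarray(test_scores, dtype=float):
        # Posterior estimate (uniform prior) of P(training score <= s); in (0, 1).
        p = (1 + np.count_nonzero(train <= s)) / (2 + n)
        # Probability that more than k of n redrawn scores fall at or below s,
        # i.e. that s stays among the top n * contamination scores.
        conf_outlier = 1.0 - binom_cdf(k, n, p)
        # For a predicted inlier, report confidence in the inlier label instead.
        confidence.append(conf_outlier if s > threshold else 1.0 - conf_outlier)
    return np.array(confidence)


train = np.arange(1, 101, dtype=float)  # 100 training scores: 1, 2, ..., 100
conf = predict_confidence(train, [100.0, 50.0, 90.0], threshold=90.0, contamination=0.1)
```

Scores far from the threshold (100.0 and 50.0) get a confidence close to 1, while the score sitting at the threshold (90.0) gets a noticeably lower one.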

predict_proba(G, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores; see [KKSZ11].

Parameters
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0, 1]. The number of columns depends on the number of classes, which is 2 by default ([probability of inlier, probability of outlier]).

Return type

numpy array of shape (n_samples, n_classes)
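Both conversion methods can be sketched with NumPy, assuming the PyOD-style definitions: 'linear' min-max scales test scores using the minimum and maximum of the training scores (clipping to [0, 1]), and 'unify' applies the Gaussian scaling erf((s - mu) / (sigma * sqrt(2))), where mu and sigma are the mean and standard deviation of the training scores. The standalone predict_proba(train_scores, test_scores, method) signature is hypothetical; the documented method instead takes a graph G and works on a fitted detector.

```python
import math

import numpy as np


def predict_proba(train_scores, test_scores, method="linear"):
    """Hypothetical sketch of score-to-probability conversion."""
    train = np.asarray(train_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    probs = np.zeros((test.shape[0], 2))  # columns: [inlier, outlier]
    if method == "linear":
        # Min-max scaling fitted on the training scores; test scores outside
        # the training range are clipped into [0, 1].
        lo, hi = train.min(), train.max()
        span = hi - lo if hi > lo else 1.0  # guard against constant scores
        probs[:, 1] = np.clip((test - lo) / span, 0.0, 1.0)
    elif method == "unify":
        # Gaussian scaling: erf of the standardized score, negatives clipped to 0.
        mu, sigma = train.mean(), train.std()
        erf = np.vectorize(math.erf)
        probs[:, 1] = np.clip(erf((test - mu) / (sigma * math.sqrt(2))), 0.0, 1.0)
    else:
        raise ValueError("method must be 'linear' or 'unify'")
    probs[:, 0] = 1.0 - probs[:, 1]
    return probs


train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
linear = predict_proba(train, [2.0, 6.0], method="linear")
unify = predict_proba(train, [2.0, 6.0], method="unify")
```

The linear method maps a score at the middle of the training range to 0.5, whereas the unify method maps a score at the training mean to 0, since it treats only above-average scores as evidence of being an outlier.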

process_graph(G)

Process the raw PyG data object into a tuple of sub data objects needed by the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns

processed_data – The necessary information from the raw PyG Data object.

Return type

tuple of data object
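As an illustration, a detector whose underlying model needs the node features, the edge index, and a dense adjacency matrix might implement process_graph as below. AdjacencyDetector is hypothetical, and a types.SimpleNamespace holding plain lists stands in for torch_geometric.data.Data so that the sketch needs only NumPy.

```python
from types import SimpleNamespace

import numpy as np


class AdjacencyDetector:
    """Hypothetical detector whose model consumes features plus a dense adjacency."""

    def process_graph(self, G):
        x = np.asarray(G.x, dtype=float)
        edge_index = np.asarray(G.edge_index)
        n = x.shape[0]
        # Densify the 2 x n_edges edge list into an n x n 0/1 matrix.
        adj = np.zeros((n, n))
        adj[edge_index[0], edge_index[1]] = 1.0
        return x, edge_index, adj


# Stand-in for torch_geometric.data.Data with 3 nodes and 3 directed edges.
G = SimpleNamespace(
    x=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    edge_index=[[0, 1, 1], [1, 0, 2]],
)
x, edge_index, adj = AdjacencyDetector().process_graph(G)
```

Keeping this extraction in one method lets fit and decision_function share the same preprocessing.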

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns

self

Return type

object
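The <component>__<parameter> routing of set_params can be sketched as follows. This is a simplified, hypothetical version of the sklearn behavior, using a hypothetical SketchEstimator class: keys without a double underscore are set directly, while keys with one are grouped by component and forwarded to that component's own set_params.

```python
class SketchEstimator:
    """Hypothetical estimator illustrating sklearn-style set_params."""

    def __init__(self, contamination=0.1, base=None):
        self.contamination = contamination
        self.base = base

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            # Only attributes set in __init__ count as valid parameters.
            if name not in vars(self):
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            if sep:
                # <component>__<parameter>: collect and forward to the component.
                nested.setdefault(name, {})[sub_key] = value
            else:
                setattr(self, name, value)
        # Forward nested parameters after top-level ones, so a replaced
        # component receives its own settings.
        for name, sub_params in nested.items():
            getattr(self, name).set_params(**sub_params)
        return self


inner = SketchEstimator(contamination=0.05)
outer = SketchEstimator(base=inner)
returned = outer.set_params(contamination=0.2, base__contamination=0.1)
```

Returning self allows calls such as SketchEstimator().set_params(contamination=0.2) to be chained with further method calls.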