API CheatSheet

The following APIs apply to all detector models.

pygod.models.base.BaseDetector.fit(): Fit the detector on the input graph. Unsupervised methods ignore ground-truth labels (y).
pygod.models.base.BaseDetector.decision_function(): Predict raw anomaly scores of PyG graph G using the fitted detector.
pygod.models.base.BaseDetector.predict(): Predict whether each sample is an outlier using the fitted detector.
pygod.models.base.BaseDetector.predict_proba(): Predict the probability of each sample being an outlier using the fitted detector.
pygod.models.base.BaseDetector.predict_confidence(): Predict the model’s sample-wise confidence (also available through the return_confidence argument of predict and predict_proba).
pygod.models.base.BaseDetector.process_graph(): Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. You do not need to call this explicitly.

Key Attributes of a fitted model:

pygod.models.base.BaseDetector.decision_scores_: The outlier scores of the training data. The higher the score, the more abnormal the sample; outliers tend to have higher scores.
pygod.models.base.BaseDetector.labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.

Input of PyGOD: Pass in a PyTorch Geometric (PyG) data object (torch_geometric.data.Data). See the PyG data processing examples.

See the base class definition below:

pygod.models.base module

Base class for all outlier detector models

class pygod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

Parameters: contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
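To make the contract of this abstract class concrete, here is a minimal pure-Python sketch of how a subclass might fill in process_graph and decision_function while the base class supplies fit. ToyBaseDetector, DegreeDetector and the degree-based score are hypothetical, and the graph is a types.SimpleNamespace with PyG-like x and edge_index attributes holding plain lists rather than tensors. This is an illustration under those assumptions, not PyGOD's implementation, and thresholding is omitted.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class ToyBaseDetector(ABC):
    """Hypothetical stand-in for BaseDetector (illustration only)."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    @abstractmethod
    def process_graph(self, G):
        """Return the pieces of G that the model needs."""

    @abstractmethod
    def decision_function(self, G):
        """Return one anomaly score per node; higher means more abnormal."""

    def fit(self, G):
        # Real detectors train a model here; this toy has nothing to learn,
        # so fitting only records the training scores and returns self.
        self.decision_scores_ = self.decision_function(G)
        return self


class DegreeDetector(ToyBaseDetector):
    """Hypothetical detector that scores each node by its degree."""

    def process_graph(self, G):
        # Mirrors the documented example of returning (G.x, G.edge_index).
        return G.x, G.edge_index

    def decision_function(self, G):
        x, edge_index = self.process_graph(G)
        degree = [0] * len(x)
        # Each undirected edge is listed once here as a (source, target) pair.
        for src, dst in zip(edge_index[0], edge_index[1]):
            degree[src] += 1
            degree[dst] += 1
        return degree


# A 4-node star graph with PyG-like attribute names (plain lists, not tensors).
G = SimpleNamespace(x=[[0.0], [1.0], [2.0], [3.0]],
                    edge_index=[[0, 0, 0], [1, 2, 3]])
detector = DegreeDetector(contamination=0.25).fit(G)
print(detector.decision_scores_)  # [3, 1, 1, 1]
```

Keeping model-specific input handling in process_graph lets fit and the prediction methods stay generic across detectors.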

decision_scores_

The outlier scores of the training data. The higher the score, the more abnormal the sample; outliers tend to have higher scores. This value is available once the detector is fitted.

Type: numpy array of shape (n_samples,)

threshold_

The threshold is derived from contamination: it is chosen so that the n_samples * contamination most abnormal samples in decision_scores_ lie above it. The threshold is used to generate the binary outlier labels.

Type: float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.

Type: numpy array of shape (n_samples,), with integer values 0 or 1
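The relationship between decision_scores_, contamination, threshold_ and labels_ can be sketched in plain Python. This assumes a PyOD-style rule: take the (1 - contamination) percentile of the training scores with linear interpolation (NumPy's default percentile method) and label strictly greater scores as outliers. The helper names percentile and threshold_and_labels are hypothetical.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100], like NumPy's default."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def threshold_and_labels(decision_scores, contamination=0.1):
    # Assumed rule: scores above the (1 - contamination) percentile are outliers,
    # so roughly n_samples * contamination samples receive label 1.
    threshold = percentile(decision_scores, 100 * (1 - contamination))
    labels = [int(s > threshold) for s in decision_scores]
    return threshold, labels


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(round(threshold, 2), labels)  # 1.31 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With ten scores and contamination=0.1, only the single most abnormal score lands above the threshold. Applying the same comparison to the scores of a new graph is one plausible way predict reuses threshold_, assuming it follows the PyOD convention.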

decision_function(G)

Predict raw anomaly scores of PyG graph G using the fitted detector. The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters: G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
Returns: anomaly_scores – The anomaly scores of the input graph.
Return type: numpy array of shape (n_samples,)

fit(G)

Fit the detector on the input graph. Unsupervised methods ignore ground-truth labels (y).

Parameters: G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
Returns: self – Fitted estimator.
Return type: object

get_params(deep=True)

Get parameters for this estimator. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters: deep (boolean, optional (default=True)) – If True, will return the parameters for this estimator and contained sub-objects that are estimators.

Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
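A simplified sketch of the scikit-learn-style behavior referenced above: parameter names are read from the constructor signature, and with deep=True any nested object that itself has get_params contributes "<name>__<param>" entries. ParamsMixin, Scaler and ToyDetector are hypothetical classes for illustration, not PyGOD or scikit-learn code.

```python
import inspect


class ParamsMixin:
    """Hypothetical, simplified sklearn-style get_params (illustration only)."""

    def get_params(self, deep=True):
        # Parameter names come from the constructor signature.
        sig = inspect.signature(type(self).__init__)
        names = [p.name for p in sig.parameters.values()
                 if p.name != "self" and p.kind != p.VAR_KEYWORD]
        params = {}
        for name in names:
            value = getattr(self, name)
            # With deep=True, nested estimators add "<name>__<param>" entries.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
            params[name] = value
        return params


class Scaler(ParamsMixin):
    def __init__(self, factor=1.0):
        self.factor = factor


class ToyDetector(ParamsMixin):
    def __init__(self, contamination=0.1, scaler=None):
        self.contamination = contamination
        self.scaler = scaler


det = ToyDetector(contamination=0.05, scaler=Scaler(factor=2.0))
print(sorted(det.get_params(deep=False)))  # ['contamination', 'scaler']
print(sorted(det.get_params(deep=True)))   # ['contamination', 'scaler', 'scaler__factor']
```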

predict(G, return_confidence=False)

Predict if a particular sample is an outlier or not.

Parameters

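G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.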

Returns

outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
confidence (numpy array of shape (n_samples,)) – Returned only if return_confidence is set to True.

predict_confidence(G)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters: G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
Returns: confidence – For each observation, how consistently the model would make the same prediction if the training set were perturbed, given as a probability in [0,1].
Return type: numpy array of shape (n_samples,)
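One way such a confidence can be computed is sketched below, assuming the ExCeeD-style approach of [PVD20] as it is commonly implemented in PyOD-style detectors: estimate the chance that a random training score falls at or below the test score, then use a binomial model to ask how often the test sample would still land in the top contamination share under a resampled training set. The function names binom_cdf and exceed_confidence are hypothetical, and this is not PyGOD's exact code.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def exceed_confidence(train_scores, test_scores, threshold, contamination=0.1):
    """Sketch of ExCeeD-style confidence [PVD20] (illustration only)."""
    n = len(train_scores)
    k = n - int(n * contamination)
    confidence = []
    for s in test_scores:
        # Bayesian estimate of the chance that a training score is <= s.
        n_below = sum(1 for t in train_scores if t <= s)
        p = (1 + n_below) / (2 + n)
        # Probability that s ranks above the contamination cut-off,
        # i.e. would be predicted an outlier, under resampled training data.
        conf_outlier = 1 - binom_cdf(k, n, p)
        # For predicted inliers, report confidence in the inlier prediction.
        predicted_outlier = s > threshold
        confidence.append(conf_outlier if predicted_outlier else 1 - conf_outlier)
    return confidence


train = list(range(10))
conf = exceed_confidence(train, [0, 9], threshold=8.1, contamination=0.1)
print([round(c, 4) for c in conf])  # [1.0, 0.4189]
```

The lowest score is an inlier under almost any resampling, while the highest score barely clears the cut-off, so its outlier prediction receives a noticeably lower confidence.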

predict_proba(G, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

‘linear’: use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.
‘unify’: use unifying scores, see [KKSZ11].

Parameters

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.

Returns

outlier_probability – For each observation, the probability of being an outlier according to the fitted model, ranging in [0,1]. The number of columns equals the number of classes, which is 2 by default ([probability of inlier, probability of outlier]). If return_confidence is True, the confidence of prediction is also returned.

Return type

numpy array of shape (n_samples, n_classes)
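The two conversion methods can be sketched in plain Python. For ‘linear’, test scores are min-max scaled using the training score range and clipped to [0,1]. For ‘unify’, the sketch assumes the Gaussian-scaling variant used by PyOD-style detectors: the erf of the score standardized by the training mean and population standard deviation, clipped at 0. The function predict_proba here is a hypothetical standalone helper operating on score lists, not the method's real signature.

```python
import math
import statistics


def predict_proba(train_scores, test_scores, method="linear"):
    """Return [[p_inlier, p_outlier], ...] for each test score (sketch only)."""
    if method == "linear":
        lo, hi = min(train_scores), max(train_scores)
        span = (hi - lo) or 1.0  # guard against a constant training score
        # Min-max scale with the training range, then clip into [0, 1].
        p_out = [min(max((s - lo) / span, 0.0), 1.0) for s in test_scores]
    elif method == "unify":
        mu = statistics.fmean(train_scores)
        sigma = statistics.pstdev(train_scores) or 1.0
        # Gaussian scaling: erf of the standardized score, clipped so that
        # below-average scores map to an outlier probability of 0.
        p_out = [min(max(math.erf((s - mu) / (sigma * math.sqrt(2))), 0.0), 1.0)
                 for s in test_scores]
    else:
        raise ValueError("method must be one of 'linear' or 'unify'")
    return [[1.0 - p, p] for p in p_out]


train = [0, 1, 2, 3, 4]
test = [2, 6]
print(predict_proba(train, test, "linear"))  # [[0.5, 0.5], [0.0, 1.0]]
print([[round(a, 4), round(b, 4)] for a, b in predict_proba(train, test, "unify")])
# [[1.0, 0.0], [0.0047, 0.9953]]
```

Because ‘linear’ depends only on the training minimum and maximum, a single extreme training score compresses all other probabilities; ‘unify’ is less sensitive to that, at the cost of assuming roughly Gaussian scores.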

process_graph(G)

Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).

Parameters: G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
Returns: processed_data – The necessary information from the raw PyG Data object.
Return type: tuple of data objects

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns: self – The estimator instance.
Return type: object
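The <component>__<parameter> convention can be sketched as follows: a plain name is set directly, while a double-underscore name is split once and the remainder is delegated to the nested object's own set_params. ParamsMixin, Scaler and ToyDetector are hypothetical classes, and checking names with hasattr is a simplification; scikit-learn validates names against get_params instead.

```python
class ParamsMixin:
    """Hypothetical, simplified sklearn-style set_params (illustration only)."""

    def set_params(self, **params):
        for key, value in params.items():
            # "scaler__factor" -> ("scaler", "__", "factor"); plain names have no separator.
            name, sep, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            if sep:
                # <component>__<parameter>: delegate to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class Scaler(ParamsMixin):
    def __init__(self, factor=1.0):
        self.factor = factor


class ToyDetector(ParamsMixin):
    def __init__(self, contamination=0.1, scaler=None):
        self.contamination = contamination
        self.scaler = scaler


det = ToyDetector(scaler=Scaler())
det.set_params(contamination=0.05, scaler__factor=3.0)
print(det.contamination, det.scaler.factor)  # 0.05 3.0
```

Returning self allows chained calls such as ToyDetector().set_params(contamination=0.05), matching the documented return value.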