API CheatSheet
The following APIs are available on all detector models.
pygod.models.base.BaseDetector.fit()
: Fit detector. y is ignored in unsupervised methods.
pygod.models.base.BaseDetector.decision_function()
: Predict raw anomaly scores of PyG Graph G using the fitted detector.
Key Attributes of a fitted model:
pygod.models.base.BaseDetector.decision_scores_
: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
pygod.models.base.BaseDetector.labels_
: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
For the inductive setting:
pygod.models.base.BaseDetector.predict()
: Predict if a particular sample is an outlier or not using the fitted detector.
pygod.models.base.BaseDetector.predict_proba()
: Predict the probability of a sample being an outlier using the fitted detector.
pygod.models.base.BaseDetector.predict_confidence()
: Predict the model’s sample-wise confidence (available in predict and predict_proba).
Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.
pygod.models.base.BaseDetector.process_graph()
(you do not need to call this explicitly): Process the raw PyG data object into a tuple of sub data objects needed for the underlying model.
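The call pattern above can be sketched with a toy stand-in. ToyDegreeDetector below is hypothetical and is not a PyGOD model; it only mirrors the fit / decision_function / predict interface and scores nodes by degree. The graph objects are plain namespaces standing in for a PyG data object, and the thresholding rule is an assumption made for illustration.

```python
from types import SimpleNamespace


class ToyDegreeDetector:
    """Hypothetical stand-in, NOT a PyGOD model: scores each node by its degree."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, G):
        # Raw anomaly score of a node = number of edge endpoints it occupies.
        scores = [0.0] * G.num_nodes
        for src, dst in zip(*G.edge_index):
            scores[src] += 1.0
            scores[dst] += 1.0
        return scores

    def fit(self, G):
        # Store training scores, then a threshold and binary labels from them.
        self.decision_scores_ = self.decision_function(G)
        k = max(1, int(len(self.decision_scores_) * self.contamination))
        self.threshold_ = sorted(self.decision_scores_, reverse=True)[k - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def predict(self, G):
        # Inductive use: score an unseen graph with the stored threshold.
        return [int(s >= self.threshold_) for s in self.decision_function(G)]


# Training graph: a 5-node star, node 0 is connected to every other node.
G_train = SimpleNamespace(num_nodes=5, edge_index=[[0, 0, 0, 0], [1, 2, 3, 4]])
# Unseen graph: a 6-node star centred on node 5.
G_new = SimpleNamespace(num_nodes=6, edge_index=[[5, 5, 5, 5, 5], [0, 1, 2, 3, 4]])

detector = ToyDegreeDetector(contamination=0.2).fit(G_train)
print(detector.decision_scores_)  # [4.0, 1.0, 1.0, 1.0, 1.0]
print(detector.labels_)           # [1, 0, 0, 0, 0]
print(detector.predict(G_new))    # [0, 0, 0, 0, 0, 1]
```

With a real detector, the same three calls would be made on a PyTorch Geometric data object instead of the namespaces used here.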
See base class definition below:
pygod.models.base module
Base class for all outlier detector models
- class pygod.models.base.BaseDetector(contamination=0.1)
Bases: object
Abstract class for all outlier detection algorithms.
- Parameters:
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
- decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- threshold_
The threshold is based on contamination: it separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- Type:
- labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type:
int, either 0 or 1
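How contamination, threshold_ and labels_ relate can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name threshold_and_labels is hypothetical, the threshold is taken as the k-th largest score with k = n_samples * contamination, and scores equal to the threshold are flagged. The library may instead use a percentile with interpolation or a strict comparison.

```python
def threshold_and_labels(decision_scores, contamination=0.1):
    """Sketch: threshold at the k-th largest score, k = n_samples * contamination."""
    n_samples = len(decision_scores)
    k = max(1, int(n_samples * contamination))  # number of samples to flag
    threshold = sorted(decision_scores, reverse=True)[k - 1]
    # 1 = outlier, 0 = inlier; ties at the threshold are flagged as outliers.
    labels = [1 if score >= threshold else 0 for score in decision_scores]
    return threshold, labels


scores = [0.10, 0.20, 0.30, 0.90, 0.15, 0.25, 0.05, 0.12, 0.80, 0.22]
threshold, labels = threshold_and_labels(scores, contamination=0.2)
print(threshold)  # 0.8
print(labels)     # [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```

With 10 samples and contamination=0.2, the two highest scores are labelled 1 and all others 0.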
- decision_function(G)
Predict raw anomaly scores of PyG Graph G using the fitted detector. The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
anomaly_scores – The anomaly scores of the input graph.
- Return type:
numpy array of shape (n_samples,)
- fit(G)
Fit detector. y is ignored in unsupervised methods.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
self – Fitted estimator.
- Return type:
- get_params(deep=True)
Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.
- Parameters:
deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- predict(G, return_confidence=False)
Predict if a particular sample is an outlier or not.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
confidence (numpy array of shape (n_samples,)) – Returned only if return_confidence is set to True.
- predict_confidence(G)
Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in [0, 1].
- Return type:
numpy array of shape (n_samples,)
- predict_proba(G, method='linear', return_confidence=False)
Predict the probability of a sample being an outlier. Two approaches are possible:
1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. Use unifying scores; see [KKSZ11].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
method (str, optional (default='linear')) – Probability conversion method. It must be one of 'linear' or 'unify'.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_probability – For each observation, the probability of being an outlier according to the fitted model, in [0, 1]. The second dimension depends on the number of classes, which is 2 by default ([probability of normal, probability of outlier]).
- Return type:
numpy array of shape (n_samples, n_classes)
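The two conversion methods can be sketched as follows. Both helpers are hypothetical and written in plain Python rather than NumPy. The 'linear' sketch assumes the min and max are taken from the training scores and that out-of-range values are clipped. The 'unify' sketch assumes one common reading of unifying scores (Gaussian scaling through the error function); the library's exact formula may differ.

```python
import math


def proba_linear(train_scores, test_scores):
    """Min-max scale test scores using the training range, clipped to [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = hi - lo
    rows = []
    for score in test_scores:
        p = (score - lo) / span if span > 0 else 0.0
        p = min(1.0, max(0.0, p))
        rows.append([1.0 - p, p])  # [proba of normal, proba of outlier]
    return rows


def proba_unify(train_scores, test_scores):
    """Assumed Gaussian scaling: erf of the standardized score, clipped to [0, 1]."""
    n = len(train_scores)
    mu = sum(train_scores) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in train_scores) / n)
    rows = []
    for score in test_scores:
        p = math.erf((score - mu) / (sigma * math.sqrt(2))) if sigma > 0 else 0.0
        p = min(1.0, max(0.0, p))
        rows.append([1.0 - p, p])
    return rows


train = [1.0, 2.0, 3.0, 5.0]
test = [0.0, 3.0, 6.0]
linear = proba_linear(train, test)
unified = proba_unify(train, test)
print(linear)  # [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Each row has two columns, matching the (n_samples, n_classes) shape described above with n_classes = 2.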
- process_graph(G)
Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and the edge index, return (G.x, G.edge_index).
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
processed_data – The necessary information from the raw PyG Data object.
- Return type:
tuple of data object
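The (G.x, G.edge_index) case mentioned above might look like the sketch below. ToyData is a hypothetical stand-in for torch_geometric.data.Data that holds plain lists instead of tensors, and the function is written as a free function rather than a method, purely for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a PyG data object; a real one stores tensors.
ToyData = namedtuple("ToyData", ["x", "edge_index", "y"])


def process_graph(G):
    """Return only the pieces this (imaginary) model needs for training."""
    return G.x, G.edge_index


G = ToyData(
    x=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # one feature row per node
    edge_index=[[0, 1], [1, 2]],             # edges 0->1 and 1->2
    y=[0, 0, 1],                             # not needed by this model
)
x, edge_index = process_graph(G)
print(len(x), edge_index)  # 3 [[0, 1], [1, 2]]
```

A model that needs other fields would return a different tuple; the point is that callers pass the whole graph and the detector selects what it needs.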
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.
- Returns:
self
- Return type:
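The <component>__<parameter> naming used by get_params and set_params can be illustrated with a small stand-alone sketch. The classes ToyEstimator, ToyEncoder and ToyDetector and the parameter hidden_dim are hypothetical; they imitate the scikit-learn convention and are not the PyGOD or scikit-learn implementation.

```python
class ToyEstimator:
    """Hypothetical base class that imitates the nested-parameter convention."""

    param_names = ()

    def get_params(self, deep=True):
        params = {}
        for name in self.param_names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, expose sub-estimator parameters as name__sub.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if name not in self.param_names:
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Delegate "component__parameter" to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class ToyEncoder(ToyEstimator):
    param_names = ("hidden_dim",)

    def __init__(self, hidden_dim=64):
        self.hidden_dim = hidden_dim


class ToyDetector(ToyEstimator):
    param_names = ("contamination", "encoder")

    def __init__(self, contamination=0.1, encoder=None):
        self.contamination = contamination
        self.encoder = encoder if encoder is not None else ToyEncoder()


detector = ToyDetector()
detector.set_params(contamination=0.05, encoder__hidden_dim=32)
print(detector.contamination)                       # 0.05
print(detector.get_params()["encoder__hidden_dim"])  # 32
```

Because set_params returns self, calls can be chained, and get_params(deep=False) lists only the top-level parameters.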