API CheatSheet
The following APIs are available on all detector models:
pygod.models.base.BaseDetector.fit()
: Fit the detector. y_true is optional and is not used to optimize unsupervised methods.
pygod.models.base.BaseDetector.decision_function()
: Predict raw anomaly scores of the PyG graph G using the fitted detector.
Key attributes of a fitted model:
pygod.models.base.BaseDetector.decision_scores_
: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
pygod.models.base.BaseDetector.labels_
: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
For the inductive setting:
pygod.models.base.BaseDetector.predict()
: Predict whether a particular sample is an outlier using the fitted detector.
pygod.models.base.BaseDetector.predict_proba()
: Predict the probability of a sample being an outlier using the fitted detector.
pygod.models.base.BaseDetector.predict_confidence()
: Predict the model’s sample-wise confidence (also available from predict and predict_proba via return_confidence=True).
Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.
pygod.models.base.BaseDetector.process_graph()
(you do not need to call this explicitly): Process the raw PyG data object into a tuple of sub data objects needed for the underlying model.
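As a rough illustration of the call order listed above, the sketch below uses a toy stand-in class, not a PyGOD model. `ToyDetector`, its distance-to-mean scoring rule, and the `SimpleNamespace` graph are all hypothetical; only the method and attribute names mirror the cheat sheet.

```python
from types import SimpleNamespace


class ToyDetector:
    """Hypothetical stand-in that mirrors the BaseDetector method names."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def _score(self, x):
        # Toy rule (an assumption): distance of each 1-D feature to the training mean.
        return [abs(v - self.mean_) for v in x]

    def fit(self, G, y_true=None):
        # y_true is accepted for interface parity but never used for training.
        self.mean_ = sum(G.x) / len(G.x)
        self.decision_scores_ = self._score(G.x)
        # Threshold at the score of the k-th most abnormal training sample.
        k = max(1, round(len(G.x) * self.contamination))
        self.threshold_ = sorted(self.decision_scores_, reverse=True)[k - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def decision_function(self, G):
        return self._score(G.x)

    def predict(self, G):
        return [int(s >= self.threshold_) for s in self.decision_function(G)]


# SimpleNamespace stands in for a torch_geometric.data.Data object here.
train = SimpleNamespace(x=[1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 9.0])
test = SimpleNamespace(x=[1.0, 12.0])

detector = ToyDetector(contamination=0.1).fit(train)
print(detector.labels_)        # training labels, 1 marks an outlier
print(detector.predict(test))  # labels for unseen data (inductive use)
```

With a real detector the same sequence applies, with a PyG data object passed in place of the stand-in graph.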
See base class definition below:
pygod.models.base module
Base classes for all outlier detectors.
- class pygod.models.base.BaseDetector(contamination=0.1)
Bases: object
Abstract class for all outlier detection algorithms.
- Parameters:
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
- decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type:
numpy array of shape (n_samples,)
- threshold_
The threshold is based on contamination. It is derived from the n_samples * contamination most abnormal samples in decision_scores_. The threshold is used to generate binary outlier labels.
- Type:
- labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
- Type:
int, either 0 or 1
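The relationship between contamination, threshold_, and labels_ can be sketched in plain Python. This is one plausible reading of the description above, not PyGOD's actual code: the helper name `threshold_and_labels`, the rounding of `n_samples * contamination`, and the `>=` comparison are assumptions.

```python
def threshold_and_labels(decision_scores, contamination=0.1):
    """Return (threshold, labels) for a list of training scores."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0., 0.5)")
    # Number of samples treated as outliers: n_samples * contamination.
    k = max(1, round(len(decision_scores) * contamination))
    # Score of the k-th most abnormal sample (highest scores first).
    threshold = sorted(decision_scores, reverse=True)[k - 1]
    labels = [1 if s >= threshold else 0 for s in decision_scores]
    return threshold, labels


scores = [0.2, 0.1, 0.4, 3.5, 0.3, 0.2, 0.5, 0.1, 0.3, 2.9,
          0.2, 0.4, 0.1, 0.3, 0.2, 0.5, 0.4, 0.3, 0.1, 0.2]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(threshold)    # 2.9
print(sum(labels))  # 2
```

With 20 samples and contamination=0.1, the two highest scores end up labeled 1.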
- decision_function(G)
Predict raw outlier scores of the PyG graph G using the fitted detector. The outlier score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
outlier_scores – The outlier scores of shape (n_samples,).
- Return type:
numpy.ndarray
- fit(G, y_true=None)
Fit detector.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.
- Returns:
self – Fitted estimator.
- Return type:
- get_params(deep=True)
Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.
- Parameters:
deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- predict(G, return_confidence=False)
Predict whether a particular sample is an outlier.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
confidence (numpy array of shape (n_samples,)) – Returned only if return_confidence is set to True.
- predict_confidence(G)
Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returned as a probability in [0, 1].
- Return type:
numpy array of shape (n_samples,)
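The text does not spell out how the confidence is computed. One way to read the idea behind [PVD20], shown here purely as an assumption, is to estimate how likely a score is to stay on the same side of the threshold if the training set were redrawn, using a binomial model. The function names, the threshold rule, and the formula below are illustrative and may differ from what PyGOD does.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def predict_confidence(train_scores, test_scores, contamination=0.1):
    n = len(train_scores)
    n_outliers = max(1, round(n * contamination))
    # Assumed threshold: score of the n_outliers-th most abnormal training sample.
    threshold = sorted(train_scores, reverse=True)[n_outliers - 1]
    confidences = []
    for s in test_scores:
        # Smoothed estimate of the chance that a training score is <= s.
        n_below = sum(1 for t in train_scores if t <= s)
        p = (1 + n_below) / (2 + n)
        # Chance that s would still rank among the outliers, clamped to [0, 1].
        conf_outlier = min(1.0, max(0.0, 1 - binom_cdf(n - n_outliers, n, p)))
        is_outlier = s >= threshold
        # For predicted inliers, report confidence in the inlier label instead.
        confidences.append(conf_outlier if is_outlier else 1 - conf_outlier)
    return confidences


train = [float(i) for i in range(1, 21)]
conf = predict_confidence(train, [0.0, 19.0, 100.0])
print([round(c, 3) for c in conf])
```

In this toy run the clearly normal score (0.0) and the extreme score (100.0) receive higher confidence than the score sitting exactly on the threshold (19.0).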
- predict_proba(G, method='linear', return_confidence=False)
Predict the probability of a sample being an outlier. Two approaches are possible:
1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. Use unifying scores; see [KKSZ11].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0, 1]. The number of columns depends on the number of classes, which is 2 by default ([proba of normal, proba of outliers]).
- Return type:
numpy array of shape (n_samples, n_classes)
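The two conversion methods can be sketched as follows. This is a minimal stdlib illustration under stated assumptions, not PyGOD's implementation: the 'linear' branch rescales by the training minimum and maximum, and the 'unify' branch is assumed to apply a Gaussian error function to standardized scores, which is one common reading of [KKSZ11]. The standalone function and its list-based inputs are hypothetical.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def predict_proba(train_scores, test_scores, method="linear"):
    """Return [[p_normal, p_outlier], ...] for each test score."""
    if method == "linear":
        # Min-max scaling fitted on the training scores (assumes max > min).
        lo, hi = min(train_scores), max(train_scores)
        raw = [(s - lo) / (hi - lo) for s in test_scores]
    elif method == "unify":
        # Standardize, then squash with the error function (assumption).
        mu, sigma = mean(train_scores), pstdev(train_scores)
        raw = [erf((s - mu) / (sigma * sqrt(2))) for s in test_scores]
    else:
        raise ValueError("method must be 'linear' or 'unify'")
    # Clip so unseen scores outside the training range stay in [0, 1].
    p_outlier = [min(1.0, max(0.0, r)) for r in raw]
    return [[1.0 - p, p] for p in p_outlier]


train = [0.0, 1.0, 2.0, 3.0, 4.0]
print(predict_proba(train, [0.0, 2.0, 4.0, 10.0]))
# [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
print(predict_proba(train, [2.0, 4.0], method="unify"))
```

Each row has two columns, matching the (n_samples, n_classes) return shape with the default two classes.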
- process_graph(G)
Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
processed_data – The necessary information from the raw PyG Data object.
- Return type:
tuple of data object
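The (G.x, G.edge_index) case mentioned above might look like the sketch below. To keep it dependency-free, `SimpleNamespace` stands in for a `torch_geometric.data.Data` object and plain lists stand in for tensors; both substitutions are assumptions made for illustration.

```python
from types import SimpleNamespace


def process_graph(G):
    """Pick out the pieces a model needs from a graph-like object."""
    # A model that trains on node features and connectivity needs only these two.
    return (G.x, G.edge_index)


G = SimpleNamespace(
    x=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # one feature row per node
    edge_index=[[0, 1, 2], [1, 2, 0]],        # source row, target row
    y=[0, 0, 1],                              # extra field this model ignores
)
x, edge_index = process_graph(G)
print(len(x), len(edge_index[0]))  # 3 nodes, 3 edges
```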
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.
- Returns:
self
- Return type:
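The `<component>__<parameter>` naming used by get_params and set_params can be illustrated with a small stand-in. The `Params`, `Encoder`, and `Detector` classes below are hypothetical and far simpler than the scikit-learn machinery the text refers to; they only show how a double underscore routes a value to a nested object.

```python
class Params:
    """Hypothetical minimal mixin; not sklearn's or PyGOD's implementation."""

    def get_params(self, deep=True):
        out = dict(vars(self))
        if deep:
            for name, value in vars(self).items():
                if hasattr(value, "get_params"):
                    # Expose nested parameters as <component>__<parameter>.
                    for sub, sub_value in value.get_params().items():
                        out[f"{name}__{sub}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub = key.partition("__")
            if name not in vars(self):
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Delegate everything after the first "__" to the component.
                getattr(self, name).set_params(**{sub: value})
            else:
                setattr(self, name, value)
        return self


class Encoder(Params):
    def __init__(self, hid_dim=64):
        self.hid_dim = hid_dim


class Detector(Params):
    def __init__(self, contamination=0.1, encoder=None):
        self.contamination = contamination
        self.encoder = encoder or Encoder()


det = Detector().set_params(contamination=0.05, encoder__hid_dim=32)
print(det.get_params()["encoder__hid_dim"])  # 32
print(sorted(det.get_params(deep=False)))    # ['contamination', 'encoder']
```

Because set_params returns self, calls can be chained as shown.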
- class pygod.models.base.DeepDetector(in_dim=None, hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=torch.nn.functional.relu, contamination=0.1, lr=0.005, epoch=5, gpu=-1, batch_size=0, num_neigh=-1, scalable=False, verbose=False, **kwargs)
Bases: BaseDetector, ABC
Abstract class for deep outlier detection algorithms.
- Parameters:
TODO (update the docstring) –
hid_dim (int, optional) – Hidden dimension of model. Default: 64.
num_layers (int, optional) – Total number of layers in model. Half (floor) of the layers are for the encoder; the other half (ceil) are for the decoders. Default: 4.
dropout (float, optional) – Dropout rate. Default: 0.3.
weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.
act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.
contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.
lr (float, optional) – Learning rate. Default: 0.005.
epoch (int, optional) – Maximum number of training epochs. Default: 5.
gpu (int) – GPU index, -1 for using CPU. Default: -1.
batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.
num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
scalable (bool, optional) – Whether to use the scalable version of the model. TODO: add more info about the scalable version. Default: False.
verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
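Two of the parameters above encode small rules that are easy to misread: the floor/ceil split of num_layers and the meaning of gpu=-1. The helpers below are hypothetical and only restate those rules; the `cuda:<index>` device string follows the usual PyTorch naming and is an assumption about how the index is used.

```python
def split_layers(num_layers):
    """Return (encoder_layers, decoder_layers) for a total layer count."""
    encoder_layers = num_layers // 2              # floor half
    decoder_layers = num_layers - encoder_layers  # ceil half
    return encoder_layers, decoder_layers


def device_name(gpu):
    """Map the gpu index to a device string; -1 selects the CPU."""
    return "cpu" if gpu < 0 else f"cuda:{gpu}"


print(split_layers(4))  # (2, 2)
print(split_layers(5))  # (2, 3)
print(device_name(-1))  # cpu
print(device_name(0))   # cuda:0
```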
- decision_function(G)
Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.
- Returns:
outlier_scores – The anomaly scores of shape (n_samples,).
- Return type:
numpy.ndarray
- fit(G, y_true=None)
Fit detector with input data.
- Parameters:
G (torch_geometric.data.Data) – The input data.
y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.
- Returns:
self – Fitted estimator.
- Return type:
- get_params(deep=True)
Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.
- Parameters:
deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.
- Returns:
params – Parameter names mapped to their values.
- Return type:
mapping of string to any
- predict(G, return_confidence=False)
Predict whether a particular sample is an outlier.
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
confidence (numpy array of shape (n_samples,)) – Returned only if return_confidence is set to True.
- predict_confidence(G)
Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returned as a probability in [0, 1].
- Return type:
numpy array of shape (n_samples,)
- predict_proba(G, method='linear', return_confidence=False)
Predict the probability of a sample being an outlier. Two approaches are possible:
1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. Use unifying scores; see [KKSZ11].
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.
return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.
- Returns:
outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0, 1]. The number of columns depends on the number of classes, which is 2 by default ([proba of normal, proba of outliers]).
- Return type:
numpy array of shape (n_samples, n_classes)
- process_graph(G)
Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).
- Parameters:
G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.
- Returns:
processed_data – The necessary information from the raw PyG Data object.
- Return type:
tuple of data object
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.
- Returns:
self
- Return type: