CIMtools.metrics package

CIMtools.metrics.balanced_accuracy_score_with_ad(Y_true, Y_pred, AD)

This metric is used to optimize the thresholds and (hyper)parameters of applicability domain (AD) definitions. It shows how well an AD definition detects Y-outliers. First, property prediction errors are estimated in cross-validation for all reactions in a dataset. Reactions whose absolute prediction error exceeds 3×RMSE are identified as Y-outliers; the rest are considered Y-inliers. Y-outliers (poorly predicted reactions) that the AD definition predicts as X-outliers (outside AD) are called true outliers (TO), while Y-inliers that it predicts as X-inliers (within AD) are called true inliers (TI). False outliers (FO) are Y-inliers wrongly predicted by the AD definition as X-outliers, and false inliers (FI) are Y-outliers wrongly predicted as X-inliers. The quality of the outlier/inlier determination is assessed using an analogue of balanced accuracy.

Parameters

Y_true (array-like, shape = [n_samples]) – The target values (real numbers in regression).
Y_pred (array-like, shape = [n_samples]) – The predicted values of Y_true.
AD (array-like, shape = [n_samples]) – Boolean array: True for a reaction within AD, False for a reaction outside AD.

Returns

balanced_accuracy

Return type

float
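
The TO/TI/FO/FI bookkeeping described above can be sketched in NumPy. This is an illustrative re-implementation, not the CIMtools source. The function name is hypothetical. Three things are assumptions: the RMSE is taken over all supplied samples, the score is the mean of TO/(TO+FI) and TI/(TI+FO) by analogy with ordinary balanced accuracy, and a class with no members contributes 0.

```python
import numpy as np


def balanced_accuracy_with_ad_sketch(y_true, y_pred, ad):
    """Hypothetical sketch of the metric; not the library implementation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ad = np.asarray(ad, dtype=bool)

    # Y-outliers: absolute error above 3 x RMSE (RMSE over all samples, assumed).
    errors = y_true - y_pred
    rmse = np.sqrt(np.mean(errors ** 2))
    y_outlier = np.abs(errors) > 3 * rmse
    x_outlier = ~ad  # outside AD

    to = np.sum(y_outlier & x_outlier)    # true outliers
    fi = np.sum(y_outlier & ~x_outlier)   # false inliers
    ti = np.sum(~y_outlier & ~x_outlier)  # true inliers
    fo = np.sum(~y_outlier & x_outlier)   # false outliers

    # Assumption: an empty class contributes 0 instead of raising an error.
    outlier_rate = to / (to + fi) if (to + fi) else 0.0
    inlier_rate = ti / (ti + fo) if (ti + fo) else 0.0
    return float(0.5 * (outlier_rate + inlier_rate))
```

In practice Y_pred would hold cross-validated predictions, so that the errors are not estimated on the training fit.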

CIMtools.metrics.rmse_score_with_ad(Y_true, Y_pred, AD)

This metric is used to optimize the thresholds and (hyper)parameters of AD definitions. It is the difference between the RMSE of property prediction for reactions outside AD and within AD. The metric was first proposed by Sahigara et al. [1]. Negative values indicate that the reactions detected as X-outliers (outside AD) are predicted better than the X-inliers (within AD), which points to possible drawbacks in the definition of the interpolation space. Positive values indicate a reliable partition of the reactions into those inside and outside AD, with higher predictive performance within AD than outside it. If no reactions are left inside or outside AD, the metric (OIR) is considered equal to 0.

Parameters

Y_true (array-like, shape = [n_samples]) – The target values (real numbers in regression).
Y_pred (array-like, shape = [n_samples]) – The predicted values of Y_true.
AD (array-like, shape = [n_samples]) – Boolean array: True for a reaction within AD, False for a reaction outside AD.

Returns

Difference between the RMSE of property prediction for reactions outside AD and within AD.

Return type

float

References

1: Sahigara F., Mansouri K., Ballabio D., Mauri A., Consonni V., Todeschini R. Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. Molecules, 2012, vol. 17, pp. 4791-4810. doi: 10.3390/molecules17054791.
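
The rmse_score_with_ad description can be illustrated with a short NumPy sketch. The function name is hypothetical and the code is a reading of the text above (RMSE outside AD minus RMSE within AD, with 0 returned when either group is empty), not the CIMtools source.

```python
import numpy as np


def rmse_difference_with_ad_sketch(y_true, y_pred, ad):
    """Hypothetical sketch of the metric; not the library implementation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ad = np.asarray(ad, dtype=bool)

    # If every reaction is inside AD or every reaction is outside it,
    # the difference is undefined and the metric is taken as 0.
    if ad.all() or not ad.any():
        return 0.0

    squared_errors = (y_true - y_pred) ** 2
    rmse_outside = np.sqrt(squared_errors[~ad].mean())
    rmse_inside = np.sqrt(squared_errors[ad].mean())
    return float(rmse_outside - rmse_inside)
```

A positive result means that the reactions flagged as outside AD are predicted worse than those within it, which is the behaviour expected of a useful AD definition.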

CIMtools.metrics.tanimoto_kernel(x, y)

Calculate the pairwise Tanimoto coefficients between the elements of arrays x and y.

Parameters

x (2D array) – Array of features.
y (2D array) – Array of features.

Note

The features in arrays x and y should be the same and in the same order.

Returns

Pairwise Tanimoto coefficients.

Return type

2D array
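
A minimal sketch of a pairwise Tanimoto kernel is given here for illustration. It assumes that rows are samples and columns are features, and it uses the common real-valued form T(a, b) = ⟨a, b⟩ / (‖a‖² + ‖b‖² − ⟨a, b⟩). The function name is hypothetical, and assigning 0 to pairs with a zero denominator is an assumption, not documented library behaviour.

```python
import numpy as np


def tanimoto_kernel_sketch(x, y):
    """Hypothetical sketch of a pairwise Tanimoto kernel; not the library code."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    dot = x @ y.T                            # shape (n_x, n_y)
    x_sq = (x ** 2).sum(axis=1)[:, None]     # squared norms of x rows, as a column
    y_sq = (y ** 2).sum(axis=1)[None, :]     # squared norms of y rows, as a row
    denominator = x_sq + y_sq - dot

    # Assumption: pairs with a zero denominator (two all-zero rows) get 0.
    result = np.zeros_like(dot)
    np.divide(dot, denominator, out=result, where=denominator != 0)
    return result
```

The result has one row per row of x and one column per row of y, so it could be passed to estimators that accept a precomputed kernel matrix.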