These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Surrogacy Efficacy Score (SESc)
The Surrogacy Efficacy Score is a technique for gaining a better understanding of the inner workings of complex "black box" models. For example, by using a tree-based model, this method provides a more interpretable representation of the model’s behavior by...
Objectives:
G-Eval
G-Eval is a framework designed to evaluate the outputs of large language models (LLMs) using the interpretive and reasoning capabilities of the models themselves. Introduced in the paper “G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment”,...
Objectives:
Earth Mover’s Distance
Objectives:
Log odds-ratio
Objectives:
SAFE Artificial Intelligence in finance
We propose a set of interrelated metrics, all based on the notion of AI output concentration, and the related Lorenz curve/Lorenz area under the curve, able to measure the Sustainability/robustness, Accuracy, Fairness/privacy, Explainability/accountability ...
Objectives:
Local Explanation Method using Nonlinear Approximation (LEMNA)
Given an input data sample, LEMNA generates a small set of interpretable features to explain how the input sample is classified. The core idea is to approximate a local area of the complex deep learning decision boundary using a simple interpretable model. ...
Objectives:
Shapley Additive Explanation (SHAP)
Shapley Additive Explanations (SHAP) is a method that quantifies the contribution of each feature to the output of a predictive model. Rooted in cooperative game theory, SHAP values provide a theoretically sound approach for interpreting complex models by d...
Objectives:
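To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over every ordering of the features. The names `exact_shapley` and `toy_model`, the zero baseline, and the convention of replacing absent features with baseline values are all illustrative assumptions; this is not the API of any SHAP library, and full enumeration is only practical for a handful of features.

```python
from itertools import permutations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values by enumerating all feature orderings.

    Features outside the current coalition take their baseline value
    (an assumption of this sketch; other conventions exist).
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]            # add feature i to the coalition
            value = model(current)
            phi[i] += value - previous   # marginal contribution of feature i
            previous = value
    return [p / factorial(n) for p in phi]


def toy_model(features):
    # Hypothetical model: two linear terms plus one interaction term.
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
```

In this toy case the attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term `a * c` is split evenly between the two features involved.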
Local Interpretable Model-agnostic Explanation (LIME)
Local Interpretable Model-agnostic Explanations (LIME) is a method developed to enhance the explainability and transparency of machine learning models, particularly those that are complex and difficult to interpret. It is designed to provide clear, localize...
Objectives:
Variable Importance Cloud (VIC)
Objectives:
Beta Shapley
Objectives:
Spearman's rank correlation coefficient (SRCC)
In statistics, Spearman's rank correlation coefficient or Spearman's ρ is a non-parametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be describ...
Objectives:
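As an illustration of the rank-based definition above, the following sketch computes Spearman's ρ as the Pearson correlation of the ranks, with tied values given their average rank. The function names and sample data are hypothetical, and the code assumes neither input is constant (otherwise the denominator is zero).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y_monotone = [1, 8, 27, 64, 125]   # nonlinear but strictly increasing in x
y_reversed = [5, 4, 3, 2, 1]
rho_monotone = spearman_rho(x, y_monotone)
rho_reversed = spearman_rho(x, y_reversed)
```

Because only the ranks matter, the cubic relationship in `y_monotone` scores the same as a perfectly linear one.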
Partial Dependence Complexity (PDC)
The Partial Dependence Complexity metric uses the concept of the partial dependence curve to evaluate how simply this curve can be represented. The partial dependence curve shows how model predictions are affected on average by each feature. Curves repres...
Objectives:
α-Feature Importance (αFI)
The α-Feature Importance metric quantifies the minimum proportion of features required to represent α of the total importance. In other words, this metric focuses on finding the minimum number of features necessary to obtain no less than α × 100% of th...
Objectives:
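One way to read the αFI definition above is sketched below: sort the importance scores in descending order, accumulate them until they reach α of the total, and report the share of features used. The function name, the default α, the small floating-point tolerance, and the example scores are assumptions made for illustration, not part of the catalogue entry.

```python
def alpha_feature_importance(importances, alpha=0.8):
    """Smallest proportion of features whose importances reach alpha of the total."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    target = alpha * total
    running = 0.0
    # Take the most important features first.
    for count, value in enumerate(sorted(importances, reverse=True), start=1):
        running += value
        if running >= target - 1e-12:   # tolerance for floating-point error
            return count / len(importances)
    return 1.0


scores = [50, 30, 10, 5, 5]   # hypothetical non-negative importance scores
afi_80 = alpha_feature_importance(scores, alpha=0.8)
afi_90 = alpha_feature_importance(scores, alpha=0.9)
```

A lower value means the importance is concentrated in fewer features, which is usually easier to explain.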
Predictions Groups Contrast (PGC)
The PGC metric compares the top-K ranking of feature importance drawn from the entire dataset with the top-K ranking induced from specific subgroups of predictions. It can be applied to both categorical and regression problems, being useful for quantifying...
Objectives:
Local Feature Importance Spread Stability (LFISS)
Local Feature Importance refers to the assignment of normalized feature importance to different regions of the input data space. For a given dataset D with N samples, it is possible to compute a vector of feature importance for each individual observation d...
Objectives:
SAFE (Sustainable, Accurate, Fair and Explainable)
Machine learning models, at the core of AI applications, typically achieve high accuracy at the expense of explainability. Moreover, according to the proposed regulations, AI applications based on machine learning must be "trus...
Objectives:
Normalized Mutual Information (NMI)
Normalized Mutual Information is a metric calculated between two clusterings and is a normalization of the Mutual Information (MI) score to scale the results between 0 (no mutual information) and 1 (perfect correlation).
Objectives:
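The calculation described above can be sketched in a few lines. The version below normalizes the mutual information by the arithmetic mean of the two clustering entropies, which is one common choice and an assumption here (geometric mean, minimum, and maximum are also used). Function names and the example labelings are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    joint = Counter(zip(a, b))
    mi = 0.0
    for (u, v), c in joint.items():
        mi += (c / n) * log(c * n / (count_a[u] * count_b[v]))
    return mi


def normalized_mutual_information(a, b):
    """MI divided by the arithmetic mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0   # both labelings put everything in a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)


clusters_a = [0, 0, 1, 1]
clusters_b = [1, 1, 0, 0]   # same partition, cluster names swapped
clusters_c = [0, 1, 0, 1]   # partition independent of clusters_a
nmi_same = normalized_mutual_information(clusters_a, clusters_b)
nmi_indep = normalized_mutual_information(clusters_a, clusters_c)
```

The score depends only on how samples are grouped, not on the label names, so swapping cluster identifiers leaves it unchanged.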
Kendall rank correlation coefficient (KRCC)
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical de...
Objectives:
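A minimal sketch of the ordinal association idea above: count the pairs of observations that the two variables order the same way (concordant) and the opposite way (discordant), then divide the difference by the total number of pairs. This is the tau-a variant, which makes no adjustment for ties; variants such as tau-b correct for them. The function name and sample data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y; tau-a counts it as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one adjacent swap relative to x
tau = kendall_tau_a(x, y)
```

With four observations there are six pairs; the single swap makes one pair discordant and leaves five concordant.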
Learned Perceptual Image Patch Similarity (LPIPS)
The learned perceptual image patch similarity (LPIPS) is used to judge the perceptual similarity between two images. LPIPS is computed with a model that is trained on a labeled dataset of human-judged perceptual similarity. The perception-measuring model co...
Objectives:
Fréchet Inception Distance (FID)
The Fréchet inception distance (FID) typically measures the quality of image generative models. More specifically, FID is a semimetric commonly applied to generative models based on generative adversarial networks (GANs), which was among the first generativ...
Objectives: