The FRIES Trust Score: How AI Trustworthiness Can Be Quantified

Illustration. Two hands holding a monitor with features about trustworthy and responsible AI displayed.
©stock.adobe.com – nateejindakum

Artificial intelligence is no longer an abstract, intangible vision of the future – it has become part of our everyday life, helping to make medical diagnoses and stock market predictions, solving computer vision tasks, writing elaborate paragraphs of text, and generating mesmerizing images. For this purpose, vast amounts of data and adequate models are needed. However, not all datasets and models are created equal: What if decisions are based on uncertain foundations? What if the models are biased, unstable, or incomprehensible? What if the data in question was collected in an illicit manner? In a world in which AI algorithms are increasingly encroaching on key parts of our lives, one term is becoming central: trust.


The question that thus begs to be asked is: How can the trustworthiness of AI models be measured – as objectively, holistically, and systematically as possible? Even though AI now seems to be omnipresent, researchers have not yet answered this crucial question. Without an answer, however, no meaningful comparison between models and datasets is possible. Comparability and a common understanding of such key terminology should nevertheless be of utmost interest to every AI researcher.


This is exactly where the FRIES Trust Score comes in. The score is not just another technical metric; it offers a structured path to greater transparency and comparability of models and datasets. Before broaching the score itself, we need to discuss a few key terms.

What does “trust” mean in the context of artificial intelligence?

Trust is a term often used in everyday language – but in (AI) research it needs to be defined precisely, so as to provide a common foundation of understanding. Since no formal definition has been provided thus far, we propose one, deducing from the relevant literature that trustworthy AI models and datasets must fulfill a variety of ethical, technical, and social requirements:

The five pillars of trust

  1. Fairness: Models and datasets must not systematically disadvantage individual groups. Biased training data or discriminatory decision-making logic jeopardize fairness.
  2. Robustness: Models and datasets must function reliably even under adverse conditions such as noisy input data or targeted manipulation attempts (e.g., adversarial attacks).
  3. Integrity: Models and datasets must not be tampered with unnoticed. Changes have to be traceable and verifiable.
  4. Explainability: Models must make their decisions as transparent as possible for developers, users, and regulators. Full insight into the data used to train a model is equally important.
  5. Safety: The protection of sensitive data and of access to models must be guaranteed, and data breaches must be avoided at all costs.

These five aspects form the foundation of the conceptualization of trustworthiness and serve as the evaluation dimensions for the FRIES Trust Score. Given this background, we define the term “trust” in the context of Machine Learning as follows:


The concept of trust in Machine Learning comprises the fair use of data, robust performance when encountering anomalous data, the assurance of data and model integrity, the provision of explainable decisions as well as the safe use of confidential information.


Now that the five pillars of trust and the resulting definition have been introduced, we can discuss the FRIES Trust Score itself.

From quality assurance to trust assessment

The concepts of quality and trustworthiness have a lot in common. In both cases, an individual relies on, e.g., a product, service, or prediction. For instance, if one believes in the quality of a product, it will be used with confidence and no thought will be wasted on potential breakdowns or shortcomings (e.g., a car from a manufacturer that is deemed reliable). This feeling – one might call it a feeling of trust – can be highly subjective in nature, but methods and tools for quality assurance do of course exist, such as Six Sigma or Failure Mode and Effects Analysis (FMEA). Given the commonalities between quality and trustworthiness, and the lack of dedicated trust assessment methods, adapting quality assurance methods is a logical consequence.


Since FMEA has established itself as a proven method for risk identification and assessment in engineering, and especially in quality management, adapting this approach to the context of trust makes sense: uncertainties, sources of error, and potential consequences must be assessed in the context of trust as well. However, some changes need to be made along the way.

The classic FMEA assesses quality risks on the basis of three factors, rated on a scale from 1 to 10:

  • Occurrence (O) – How likely is the error to occur?
  • Significance (S) – How significant would the error’s consequences be?
  • Detection (D) – How likely is it that the error will be detected?

These values are multiplied with one another to obtain the so-called Risk Priority Number (RPN). The goal of FMEA is then to reduce the resulting RPN by implementing additional quality assurance steps.
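To make the arithmetic concrete, consider a hypothetical risk (the ratings below are purely illustrative, not taken from any real assessment) rated O = 3, S = 7, and D = 4:

\[
\text{RPN} = O \times S \times D = 3 \times 7 \times 4 = 84
\]

Risks with a high RPN are then prioritized for mitigation.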

The FRIES Trust Score: Structure and Function

The FMEA was adopted and adapted for the quantification of trustworthiness; we want to assess risks related solely to trustworthiness. In addition, we do not want to improve trust at this point, but merely aim to quantify it. Since FMEA is concerned with quality improvement, the list of risks that can be evaluated is entirely up to the user, i.e., in theory endless and in practice not comparable, since every user might evaluate different risks. To mitigate this source of subjectivity, we provide users with a list of risks per aspect of trust, both specific to models and to datasets. For ease of use, this list is available in a CLI (command line interface), which also provides the final score and a JSON file that can be used for subsequent analysis and comparison.

In the context of trust, risks are identified for each of the five pillars of trust (fairness, robustness, integrity, explainability, safety). They are assessed by the above-mentioned O, S, and D values, which we score from 0 to 10 (0 = not trustworthy, 10 = entirely trustworthy) and also multiply with one another. However, a cube root is applied to the product to improve the score distribution. Allowing the value 0 also makes it possible to flag particularly serious risks that would render a model or dataset untrustworthy overall (i.e., the overall score becomes 0, regardless of the remaining subscores). The resulting scores per aspect are then aggregated and weighted, depending on the specific task. Finally, the user is provided with a FRIES Trust Score, also ranging from 0 to 10, which can be compared either with the assessments of other users on the same model or dataset or with the score obtained for a different model or dataset.
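The following Python sketch illustrates this aggregation logic. It is not the actual CLI implementation: the function names, the averaging of multiple risks within a pillar, the equal default weights, and the example ratings are assumptions made purely for illustration, since the real weighting depends on the task at hand.

    # Minimal sketch of the FRIES aggregation logic (illustrative, not the official tool).
    Risk = tuple[int, int, int]  # (O, S, D), each rated 0 (not trustworthy) to 10 (entirely trustworthy)

    def risk_score(o: int, s: int, d: int) -> float:
        """Cube root of O*S*D; a single rating of 0 forces the subscore to 0."""
        return (o * s * d) ** (1 / 3)

    def fries_trust_score(risks_per_pillar: dict[str, list[Risk]],
                          weights: dict[str, float]) -> float:
        """Weighted aggregation of per-pillar subscores into one 0-10 trust score."""
        pillar_scores = {}
        for pillar, risks in risks_per_pillar.items():
            subscores = [risk_score(o, s, d) for o, s, d in risks]
            if any(score == 0 for score in subscores):
                return 0.0  # a fatal risk overrides everything else
            # Assumption: multiple risks within one pillar are averaged.
            pillar_scores[pillar] = sum(subscores) / len(subscores)
        total_weight = sum(weights[p] for p in pillar_scores)
        return sum(weights[p] * pillar_scores[p] for p in pillar_scores) / total_weight

    # Hypothetical assessment with one risk per pillar and equal weights:
    risks = {
        "fairness":       [(6, 7, 5)],
        "robustness":     [(8, 6, 7)],
        "integrity":      [(7, 7, 6)],
        "explainability": [(5, 6, 8)],
        "safety":         [(9, 8, 7)],
    }
    weights = {pillar: 1.0 for pillar in risks}
    print(f"FRIES Trust Score: {fries_trust_score(risks, weights):.2f} / 10")

The hard veto for a rating of 0 mirrors the rule described above: a single fatal risk renders the entire model or dataset untrustworthy, regardless of all other subscores.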

A practical example of how the score works

An AI model used for automated applicant selection could pose the following risks:

  • Fairness: User input leads to biased decisions (O=9, S=5, D=8)
  • Robustness: Repeated model executions do not generate the same or similar outputs (O=4, S=5, D=7)
  • Integrity: No output uncertainties are given (O=9, S=4, D=9)
  • Explainability: Decisions cannot be validated by stakeholders (O=8, S=3, D=9)
  • Safety: Insufficient access to the model (O=7, S=4, D=6)

The assessment of these risks – depending on how they are weighted – results in an overall FRIES Trust Score of 6.24/10. Such a value would be categorized as slightly above-average performance. The score shows that the system fulfills basic requirements, but leaves room for improvement.
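For transparency, the cube-root rule yields the following subscores for the five risks above:

\[
\sqrt[3]{9 \cdot 5 \cdot 8} \approx 7.11,\quad
\sqrt[3]{4 \cdot 5 \cdot 7} \approx 5.19,\quad
\sqrt[3]{9 \cdot 4 \cdot 9} \approx 6.87,\quad
\sqrt[3]{8 \cdot 3 \cdot 9} = 6.00,\quad
\sqrt[3]{7 \cdot 4 \cdot 6} \approx 5.52
\]

An unweighted mean of these subscores would come out at roughly 6.1; the 6.24 reported above reflects the task-specific weighting chosen for this example.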

Where do we go from here?

The FRIES Trust Score represents a first step towards a common understanding of the term “trust” and towards a commonly used method to quantify it. Naturally, valid criticism can be voiced regarding the approach, which in turn also opens up new research questions: (How) can the subjectivity of the assessments be reduced? What training do users need in order to use the score correctly? Can the method be scaled to efficiently assess large systems? Are the risks and aspects that are included holistic?


All these questions and many more are entirely legitimate to ask and will hopefully be answered soon. What is certain is the necessity for a common understanding of crucial terminology in each and every field of research, including trustworthy AI. With the definition and quantification metric presented herein, we hope to have contributed to this. Perhaps more importantly, we hope to have kindled the imagination of other researchers, so that a commonly used definition and quantification metric for the concept of trust in AI and beyond may one day be established.

Want to dive deeper into the topic? See the paper on Benchmarking Trust: A Metric for Trustworthy Machine Learning and the author’s dissertation Verlässliche Identifikation logistischer Entitäten anhand inhärenter visueller Merkmale (in German).

Dr. Jérôme Rutinowski

Jérôme Rutinowski is the Deputy Head of Research & Operations at the Chair of Material Handling and Warehousing (FLW), TU Dortmund University. He earned his PhD in Mechanical Engineering (Dr.-Ing.) from TU Dortmund University in 2024. At FLW he coordinates research activities and supervises other research associates. His responsibilities include coordinating research proposals and managing international collaborations. He holds a Master of Science in Mechanical Engineering from Ruhr-University Bochum, which […]
