AI Labels – How They Can Provide Guidance and Build Trust

AI label for MobileNetV3Small (rated A) shown next to a symbolic image of a green tree growing from a computer chip. The AI label highlights energy efficiency, robustness, and accuracy, promoting sustainable and trustworthy AI.
© stock.adobe.com – Vadym

Artificial Intelligence is rapidly gaining significance and presence in today’s society. Not only large corporations and technical professionals, but also smaller companies and individuals without technical backgrounds are now using AI. When a company decides to use AI, many people with varying levels of expertise are involved in the decision-making process. Despite their different backgrounds and knowledge, all parties need to agree on a final decision, and it is often difficult to bring everyone to the same level of understanding. These information and communication gaps make well-informed, evidence-based decisions harder to reach. In the long term, this problem stands in the way of the sustainable and trustworthy use of AI. To ensure such responsible use, existing communication gaps must be closed.

So how can these pressing problems be addressed? To simplify communication within these decision-making processes, researchers at the Lamarr Institute at TU Dortmund have developed the concept of AI labels, inspired by the EU energy label. These labels are intended to present complex information about AI models in a way that is understandable even for non-experts. AI labels can thus be a valuable tool for creating transparency, bridging communication gaps between stakeholders, and making the use and development of AI more sustainable. In theory, AI labeling shows great promise. However, since other labeling systems and AI trust seals have recently faced criticism regarding their effectiveness, it is particularly important to evaluate the AI labeling approach at an early stage. This enables a realistic assessment of its potential and allows for adjustments where needed.
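To make the rating idea tangible, the following minimal Python sketch bins a measured model metric into the familiar A–E letter scale of the EU energy label. The bin boundaries, the linear scaling, and the example energy values are illustrative assumptions, not the rating scheme actually used for the prototypes.

```python
# Hypothetical sketch: mapping a measured model metric onto an A-E letter
# rating, in the spirit of the EU energy label. All values and the linear
# binning are illustrative assumptions.

def letter_rating(value: float, best: float, worst: float) -> str:
    """Map a metric linearly onto the letter scale A (best) to E (worst)."""
    # Normalize so that 0.0 is the best and 1.0 the worst observed value
    # across the compared models (assumes worst > best).
    position = (value - best) / (worst - best)
    position = min(max(position, 0.0), 1.0)  # clamp to [0, 1]
    scale = "ABCDE"
    # Five equally wide bins; the top edge maps to the last letter.
    return scale[min(int(position * len(scale)), len(scale) - 1)]

# Example: rate three models by energy per inference (Wh), lower is better.
# The numbers are made up purely for illustration.
models = {"MobileNetV3Small": 0.4, "ResNet50": 4.5, "ViT-L": 9.8}
best, worst = min(models.values()), max(models.values())
for name, energy in models.items():
    print(f"{name}: {letter_rating(energy, best, worst)}")
```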

Figure 1: Prototype AI label, designed in the style of the EU energy efficiency label, evaluating the MobileNetV3Small model on the ImageNet (ILSVRC2012) task, with inference performed on 8× A100 GPUs using TensorFlow 2.8.0. Prototypes like this were also presented to the interviewees.
© Raphael Fischer

Qualitative Study on Practical Evaluation

To evaluate the practical usefulness of AI labeling, a qualitative user study was conducted. Through interviews, the proposed AI labels were examined in relation to the following key research questions: 

  1. Who is interested in AI labels, and what challenges do they face in using or developing AI?
  2. What are the practical benefits and limitations of labeling AI model behavior?
  3. How are AI labels perceived in comparison to other forms of information presentation?
  4. How do AI labels and their associated certifying authorities affect trust in AI systems?

Based on these research questions, an interview guide was developed and a call for participation was published. A total of 16 participants, including developers, users, and non-experts interested in AI, were interviewed in a semi-structured format to gather their experiences and assessments.

Following the guide, participants were first asked to introduce themselves, describe their connection to AI, and discuss their everyday challenges. They were then shown a prototype of an AI label, which they were asked to describe and assess in terms of perceived advantages and disadvantages. After that, a second label was presented, allowing participants to compare the two and identify further aspects they found helpful or confusing. In the third part of the interview, the interviewers introduced alternative reporting and visualization formats, which participants were asked to compare with the AI label format. Finally, the conversation turned to the trustworthiness, certification, and regulation of labels: participants were asked who could develop and issue labels and which type of institution they would most likely trust. The interviews were recorded, transcribed, coded, and analyzed qualitatively. An inductive approach was used, the coding scheme was iteratively refined, and intercoder reliability was checked.
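Intercoder reliability is typically checked with a chance-corrected agreement coefficient. The study does not state which coefficient was used; the sketch below uses Cohen’s kappa via scikit-learn on hypothetical codes, purely for illustration.

```python
# Minimal sketch of an intercoder reliability check, assuming two coders
# assigned one code per transcript segment. Cohen's kappa is a common
# choice; the codes below are hypothetical.
from sklearn.metrics import cohen_kappa_score

coder_a = ["benefit", "risk", "trust", "benefit", "risk",
           "trust", "benefit", "benefit", "risk", "trust"]
coder_b = ["benefit", "risk", "trust", "risk", "risk",
           "trust", "benefit", "benefit", "benefit", "trust"]

# Kappa corrects raw agreement for agreement expected by chance;
# values above roughly 0.6 are often read as substantial agreement.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```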

From Theory to Practice: What AI Labels Need to Deliver 

To determine what should be considered in the development of AI labels, various user groups and their needs and requirements were first identified. The analysis showed that labeling can be useful for users, managers, customers, experts, and developers of AI. The prior knowledge and requirements of these different groups are as diverse as the resulting challenges in dealing with AI. 

Most participants emphasized predictive accuracy and performance as central requirements for artificial intelligence. The availability of data and the protection of personal and sensitive information were also particularly important to the respondents. Many participants found communication about the models especially challenging, often citing problems related to comprehensibility and transparency as the main cause. Knowledge gaps between stakeholders and a lack of education within companies also seem to hinder employee involvement and increase uncertainty in handling new AI tools. 

Participants particularly highlighted communication gaps as well as the comprehensibility and transparency of AI models. The AI label presented to them was seen as a helpful tool to support communication and knowledge transfer. The clear layout of the AI label was considered especially helpful, as it could facilitate model comparison and support decision-making processes. The design and color coding were also praised. Among other things, the way information was presented helped raise awareness of the link between sustainability and efficiency. AI labels can therefore not only inform but also serve as decision-making aids for sustainability by drawing attention to sustainability-related features such as energy consumption.

Although the simplified presentation of the AI labels was mentioned as an advantage, some participants also saw a risk: the loss of depth could lead to misunderstandings. In particular, the term “robustness,” the weighting of metrics, and the overall score led to misinterpretations. The color coding of AI labels was also criticized by some for not being accessible to color-blind individuals. As solutions, participants suggested additional letter ratings, explanatory reverse sides, and interactive AI labels that could display different information depending on the target group.
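To illustrate what such an interactive, audience-dependent label could look like in software, here is a small hypothetical sketch. The field names, audience groups, and example values are assumptions for illustration, not part of the studied prototypes.

```python
# Hypothetical sketch of an "interactive" label that reveals different
# detail levels per audience. All fields and values are illustrative.
from dataclasses import dataclass

@dataclass
class AILabel:
    model: str
    rating: str            # overall letter grade, e.g. "A"
    accuracy: float        # top-1 accuracy on the benchmark task
    energy_wh: float       # energy per inference in watt-hours
    summary: str           # plain-language explanation for non-experts

    def view(self, audience: str) -> dict:
        """Return only the fields relevant to the given audience."""
        if audience == "developer":
            return {"model": self.model, "accuracy": self.accuracy,
                    "energy_wh": self.energy_wh}
        if audience == "manager":
            return {"model": self.model, "rating": self.rating,
                    "energy_wh": self.energy_wh}
        # Default: non-expert view with the plain-language summary.
        return {"model": self.model, "rating": self.rating,
                "summary": self.summary}

# Example values are made up for illustration only.
label = AILabel("MobileNetV3Small", "A", 0.68, 0.4,
                "Predictions stay reliable under small input changes.")
print(label.view("manager"))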

Compared to other forms of communication, the lack of depth was seen as a potential disadvantage. However, no other reporting format was considered as simple and time-efficient as the label, which also offered an at-a-glance overview of the most important aspects. Many participants stated that they would see the AI label as an interim solution and would use it as a supplement to other reporting formats, which could help bring all stakeholders together.

Regarding the question of whether AI labels are suitable for building trust, views varied. Some saw AI labels reviewed by experts as a helpful tool, while others doubted that metrics alone could create trust. A key issue in this context was the credibility of the issuing institution. Some preferred to test the models themselves, while others considered the reliability of the responsible authorities. Half of the interviewees expressed doubts about the objectivity of possible issuing institutions, fearing they could be bribed or could manipulate the system, as has happened with other seals in the past. Who exactly should take on this responsibility remained unclear, as no one could name a clear authority. What became clear, however, was that the trustworthiness of AI labels depends heavily on the prior knowledge of the respective target group: depending on their level of expertise, expectations regarding credibility can differ significantly.

Conclusion: Why AI Labels Are Important, Relevant, and Valuable 

The increasing use of artificial intelligence presents new challenges for organizations, especially when it comes to communicating complex systems. Different levels of knowledge, lack of transparency, and uncertainty in dealing with AI make informed decision-making difficult. This is precisely where AI labels come in. They aim to make key information about AI models understandable, comparable, and accessible both for experts and for people without a background in AI. 

The user study shows that there is a real need for such simplified representations. AI labels can serve as a bridge between experts and users, reduce misunderstandings, and facilitate employee involvement. Especially for user groups without technical knowledge, AI labels offer quick access to relevant information and thus support responsible and reflective AI usage. The main challenge is to present information in an understandable way without losing important technical detail. Since different user groups require different types of information, a “one-size-fits-all” AI label is not sufficient. AI labels should therefore be designed to be interactive, so that each audience can access the information that is most relevant to them. In addition, AI labels should be linked to other reporting formats, allowing interested users to explore more deeply, which could enhance both the effectiveness and trustworthiness of the labels. Furthermore, AI labels are not only informative: they can also guide decision-making by highlighting certain performance aspects of models. In the context of sustainability, for example, labels could help shift the focus away from pure performance and toward environmental concerns.

In summary, the AI labels were well received and attracted much positive feedback from participants. However, they still need to be improved and refined in line with the points mentioned, so that their full potential can be realized by all user groups.

Limitations of the Study and Outlook for Future Research

As insightful as the study may be, it is not without limitations. The recruitment of participants could have led to a sampling bias, which limits the generalizability of the findings. Caution is also advised when presenting the AI labels: since the AI labels were designed by the researchers themselves, unintentional influence on participants cannot be ruled out. 

The findings of the study will be presented and published at this year’s “Conference on AI, Ethics and Society” (AIES). To further validate the results, a follow-up quantitative study with a larger sample size would be beneficial, as it would allow for stronger statistical support. At the same time, such a follow-up study could test different AI label designs in more realistic application scenarios to determine which aspects actually lead to trust.
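For context, the sample size needed for such statistical support can be estimated in advance with a power analysis. The sketch below, using statsmodels, is purely illustrative: the effect size and the two-group design are assumptions, since the follow-up study is not yet specified.

```python
# Illustrative power analysis for a hypothetical follow-up study that
# compares two label designs between independent groups. Effect size,
# alpha, and power are conventional placeholders, not study parameters.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed medium effect (Cohen's d)
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64
```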

Further Information

Paper: https://doi.org/10.1609/aies.v8i1.36601

GitHub: https://github.com/raphischer/strep 

Katharina Poitz

Katharina Poitz studies sociology and psychology at TU Dortmund University. At the Lamarr Institute, she works as a student assistant and focuses her research on the impact and effectiveness of AI labels.

Raphael Fischer

Raphael Fischer is an AI researcher at the Lamarr Institute / TU Dortmund University, where he recently defended his PhD, and he also heads the Young AI Leaders Hub in Dortmund. His work focuses on the sustainable and trustworthy development and use of AI, with specialized topics such as high-level AI labeling for increased transparency and automated model selection via resource-aware meta-learning. Through collaborations across disciplines and application areas, he promotes responsibility in AI.
