A growing body of research in psychology and sociology has shown that profound dissatisfaction with body image is an increasingly pressing problem among young people, driven in part by media exposure, and one that is expected to worsen with the spread of AI-generated idealised bodies.
In the DIZH-funded UnRealBody project, the HCHAI group developed a video-based intervention to raise awareness among young people of the potential harm caused by unrealistic AI-generated body images. The video combined avatar animation created with Unreal Engine and a curated selection of generated images, and was piloted in a qualitative and quantitative interview study. Participants were recruited from within UZH and internationally through social media and personal networks.
The survey yielded highly relevant feedback that informed the refinement of the video. A first surprising finding was the pre-existing level of awareness and knowledge about the technology and its potential for harm among our survey participants. As a result, the final version of the video includes more detailed information, including references to relevant studies. Feedback also influenced the design of the avatar and the choice of images, leading to a shift toward more photorealistic visuals.
The final version of the video is available. A manuscript reporting on the development and acceptability of the video intervention is in progress.
This project arose out of an earlier study in which the HCHAI group explored the types and nature of errors in AI-generated images of humans. In that study, we observed that as AI models produced fewer anatomical errors, their images became increasingly unrealistic and over-idealised.
Evaluating Text-to-Image Generated Photorealistic Images of Human Anatomy
Paula Muhr, Yating Pan, Charlotte Tumescheit, Ann-Kathrin Kübler, Hatice Kübra Parmaksiz, Cheng Chen, Pablo Sebastián Bolaños Orozco, Soeren S. Lienkamp, and Janna Hastings
Cureus, Nov 2024
Background: Generative artificial intelligence (AI) models that can produce photorealistic images from text descriptions have many applications in medicine, including medical education and the generation of synthetic data. However, it can be challenging to evaluate their heterogeneous outputs and to compare different models. There is a need for a systematic approach enabling image and model comparisons.

Method: To address this gap, we developed an error classification system for annotating errors in AI-generated photorealistic images of humans and applied our method to a corpus of 240 images generated with three different models (DALL-E 3, Stable Diffusion XL, and Stable Cascade) using 10 prompts with eight images per prompt.

Results: The error classification system identifies five different error types with three different severities across five anatomical regions and specifies an associated quantitative scoring method based on aggregated proportions of errors per expected count of anatomical components for the generated image. We assessed inter-rater agreement by double-annotating 25% of the images and calculating Krippendorff's alpha, and compared results across the three models and 10 prompts quantitatively using a cumulative score per image. The error classification system, accompanying training manual, generated image collection, annotations, and all associated scripts are available from our GitHub repository at https://github.com/hastingslab-org/ai-human-images. Inter-rater agreement was relatively poor, reflecting the subjectivity of the error classification task. Model comparisons revealed that DALL-E 3 performed consistently better than Stable Diffusion; however, the latter generated images reflecting more diversity in personal attributes. Images with groups of people were more challenging for all the models than individuals or pairs; some prompts were challenging for all models.
Conclusion: Our method enables systematic comparison of AI-generated photorealistic images of humans; our results can serve to catalyse improvements in these models for medical applications.
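To illustrate the kind of scoring the abstract describes, the sketch below aggregates severity-weighted errors per expected count of anatomical components into a cumulative score per image. The severity weights, region names, and function are hypothetical illustrations, not the exact formula from the paper; see the GitHub repository above for the actual scripts.

```python
# Hypothetical sketch of a cumulative per-image error score: each annotated
# error contributes its (assumed) severity weight divided by the expected
# number of anatomical components in its region.

SEVERITY_WEIGHT = {"minor": 1, "moderate": 2, "severe": 3}  # assumed weights

def image_error_score(annotations, expected_counts):
    """Cumulative error score for one generated image.

    annotations: list of (region, severity) tuples, one per annotated error.
    expected_counts: region -> expected number of anatomical components
        visible in the image (depends on the prompt, e.g. 10 fingers).
    """
    score = 0.0
    for region, severity in annotations:
        # proportion of errors relative to expected components in that region,
        # weighted by how severe the error is
        score += SEVERITY_WEIGHT[severity] / expected_counts[region]
    return score

# Example: two hand errors (10 expected fingers) and one severe facial error
errors = [("hands", "minor"), ("hands", "moderate"), ("face", "severe")]
expected = {"hands": 10, "face": 1}
print(image_error_score(errors, expected))  # ~3.3 (0.1 + 0.2 + 3.0)
```

Under this scheme, a higher score means a more anatomically flawed image, and scores are comparable across images only when errors are normalised by the expected component counts, as the abstract's "proportions of errors per expected count" phrasing suggests.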