What is the Role of Generative AI in Content Moderation?
In the Trust and Safety industry, there has been ongoing dialogue around the proliferation of Generative AI, its potential harms, and how companies can deploy it safely for public use.
This conversation showcases the complexities of GenAI content moderation, particularly when it comes to managing large-scale content operations.
Several GenAI tools are already in regular public use, and adoption of Generative AI has more than doubled in the past five years.
The Rise of Generative AI in Content Moderation
Tools like ChatGPT and GPT-4, DALL-E and its successors DALL-E 2 and 3, and Midjourney have gained significant user bases. Their reach has even prompted academic institutions to issue policies to students on using tools like ChatGPT for assignments and projects, citing academic integrity and the risk of exposing personal, private, confidential, or proprietary information.
Challenges of GenAI Content Moderation at Scale
A recent Gartner article predicts that by 2026, GenAI “will automate 60% of the design effort for new websites and mobile apps” and that “over 100 million humans will engage robocolleagues to contribute to their work”.
Gartner also predicts that by 2027, “nearly 15% of new applications will be automatically generated by AI without a human in the loop”. These forecasts may evoke fear in some individuals and excitement in others.
Balancing Innovation and Safety in AI Deployments
For the Trust and Safety industry, there is a trade-off: deploying GenAI tools for public consumption can enhance working practices and advance technology in sectors like healthcare, improving patient outcomes, but users must also be kept safe from bad actors and malicious attacks online.
What are the Challenges in CSAM Moderation with GenAI?
One of the most topical challenges discussed in the Trust and Safety industry is addressing the trade-off mentioned above.
Manipulation of GenAI Guardrails by Bad Actors
Although GenAI tools are ostensibly developed with guardrails in place (the same way that other online platforms have user terms of service to curb bad online behavior), it is not uncommon for bad actors to find methods of manipulating and avoiding these guardrails.
Harmful Content Generated by GenAI – A Growing Concern
Most of us have seen media coverage of incidents where GenAI tools have been used to create harmful imagery, such as the Taylor Swift deepfakes, which some have described as image-based sexual violence.
Other problematic GenAI usage runs the gamut, including scammers using voice cloning, ongoing election disinformation campaigns, and even risks to journalistic integrity.
The Escalation of GenAI-Generated CSAM
Unfortunately, the rise of GenAI-generated child sexual abuse material (CSAM) is one of the most deeply harmful issues the industry needs to address.
CSAM moderation becomes increasingly challenging as generative AI content moderation must adapt to the scale and complexity of AI-generated material.
In fact, Stanford researchers found that Stable Diffusion, a text-to-image GenAI tool, could generate photo-realistic abusive imagery, including CSAM, because its models were trained on an open-source dataset that included hundreds of known CSAM images scraped from the internet.
Unfortunately, when these materials are generated through AI tools, it is up to Content Moderators to tackle the issue.
Addressing Psychological Effects on Content Moderators
As highlighted above, some AI tools like Stable Diffusion have been found to generate CSAM content because their models were trained on datasets that contained this imagery.
The issue develops partially because of the way these models are trained and partially due to bad actors manipulating or circumventing the tools’ guardrails.
Challenges Faced by Human Moderators in Tackling GenAI CSAM
The ability for users to generate this imagery then becomes a problem that platforms must tackle at a large scale. This highlights the ongoing issues of content moderation AI and the question of scale, where traditional methods, built largely around matching known material, may fall short.
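To illustrate why, consider the traditional first line of defence: matching uploads against hashes of known, previously verified material. The Python sketch below is a simplified, hypothetical illustration (real systems rely on perceptual hashing such as PhotoDNA and vetted industry hash lists rather than a plain file hash); its point is simply that a newly generated image never matches an existing entry, so novel GenAI material falls through to human review.

```python
import hashlib
from pathlib import Path

# Illustrative stand-in for a hash list of known, previously verified material.
# Production systems use perceptual hashes shared by bodies such as NCMEC;
# a plain cryptographic hash is used here only to keep the sketch short.
KNOWN_HASHES: set[str] = set()  # would be populated from a vetted hash list

def file_hash(path: Path) -> str:
    """Hash of the raw file bytes (simplified; real matching is perceptual)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def triage(path: Path) -> str:
    """Route an uploaded image: automatic action on a known-hash match,
    human review for everything else."""
    if file_hash(path) in KNOWN_HASHES:
        return "auto-remove-and-report"   # previously identified material
    return "queue-for-human-review"       # novel images, including GenAI output
```

Because every freshly generated image produces a hash that has never been seen before, the automated branch never fires for novel GenAI material, and the volume lands with Content Moderators.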
Whether they are data labellers for AI companies or traditional Content Moderators for social media platforms, it is the Content Moderator’s role to ensure that users are kept from generating or viewing these harmful and illegal materials online.
Mental Health Risks Associated with GenAI CSAM Moderation
While reviewing CSAM is not new for many Content Moderators, there are unique challenges that come with reviewing GenAI CSAM versus ‘real’ CSAM. The added stressors underscore the importance of sympathetic leadership in supporting moderators’ mental health.
Some of these challenges include:
- An exponentially increased volume of CSAM content,
- Discerning GenAI CSAM from ‘real’ CSAM based on policies, and
- The risk of vicarious traumatization and other mental health difficulties.
Managing the Increased Volume of GenAI CSAM
The increased volume of CSAM content is one challenge that Content Moderators must contend with. This underscores the urgent need for a content moderation service for children’s websites that can effectively manage and scale operations.
In 2023, the Internet Watch Foundation (IWF) published a report on its investigation into the proliferation of GenAI CSAM online. It found 20,245 AI-generated images posted to a single dark web CSAM forum in a one-month period; of these, over 11,000 were investigated because they were judged the most likely to be criminal.
Stress and Specialized Training Needs
The IWF stated in their report that “chief among those differences [from previous technologies] is the potential for offline generation of images at scale – with the clear potential to overwhelm those working to fight online child sexual abuse and divert significant resources from real CSAM towards AI CSAM.”
A Complex Task
The exponentially increased volume of CSAM not only places undue stress on Content Moderators to remove the material quickly, but also means they will require more specialized training to escalate these matters to law enforcement, the National Center for Missing and Exploited Children (NCMEC), and other third-party agencies.
Unfortunately, discerning GenAI CSAM from ‘real’ CSAM is challenging.
Discerning GenAI CSAM from ‘Real’ CSAM
It is up to Content Moderators to discern whether these images and videos are AI-generated or whether they are ‘real’ CSAM content – and in some cases, a combination of both.
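One automated signal that can help, where it survives, is provenance metadata embedded by the generating tool. The sketch below is a minimal illustration, assuming the Pillow library and a hypothetical set of metadata keys that some generators are known to write; because such metadata is routinely stripped or forged, it is at best a weak hint, which is why the final judgement still rests with a human reviewer.

```python
from PIL import Image, ExifTags

# Metadata keys some generation tools are known to write; treated here purely
# as illustrative examples, not an authoritative or complete list.
GENERATION_HINT_KEYS = {"parameters", "prompt", "Software"}

def generator_hint(path: str) -> str | None:
    """Return any embedded metadata hinting that the file was AI-generated.

    An empty result proves nothing: metadata is often stripped on upload or
    deliberately removed by bad actors, so this check cannot replace a
    Content Moderator's judgement.
    """
    with Image.open(path) as img:
        # Text chunks written into the file by some generation tools.
        for key in GENERATION_HINT_KEYS:
            if key in img.info:
                return f"{key}: {img.info[key]}"
        # EXIF 'Software' field, which may name the generating application.
        for tag_id, value in img.getexif().items():
            if ExifTags.TAGS.get(tag_id) == "Software":
                return f"Software: {value}"
    return None
```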
Circumvention of Guardrails by Bad Actors
Bad actors are not only creating novel images that don’t depict real children, but they are also circumventing the tools’ guardrails to generate hundreds of new images of previous victims, sharing tips with other predators about how they are navigating around the safeguards in place, and re-victimizing children in the process.
Platform Policies and the Evolution of GenAI Moderation
Platform policies and user terms of service are what guide a Content Moderator to make an accurate decision on violative content. These policies are regularly reviewed and updated in response to changing user behavior, regulatory and legal requirements, and the advancement of technologies like GenAI.
The question is whether platform policies or terms of service have caught up to the proliferation of GenAI CSAM.
While many platforms’ terms of service classify digitally generated imagery, fictional characters, art, and other non-real depictions of CSAM as violations, it falls to the Content Moderator to interpret these policies accurately and make executive decisions about the material.
Addressing Vicarious Trauma and Mental Health Issues in CSAM Moderation
As with moderating real CSAM imagery and videos, Content Moderators now tasked with reviewing AI-generated CSAM are at higher risk of developing vicarious traumatization and other mental health difficulties.
The added stressors of increased volumes of content, swift takedowns, and accurate decision-making only heighten the level of risk to Content Moderators.
Symptoms of Secondary Traumatic Stress and Vicarious Trauma
Based on research conducted amongst adjacent populations including law enforcement and mental health professionals who are similarly exposed to child abuse materials in their line of work, repetitive exposure to CSAM can result in:
- Secondary traumatic stress indicated by irritability, social withdrawal, marriage difficulties, intrusive thoughts, autonomic system arousal
- Acute reactivity such as shock, anger, and sadness resulting from displays of emotions by victims, norm violations, and personal relevance to the viewer
- Vicarious trauma symptomatology including changes in cognitive schemas and core beliefs
- Post-traumatic stress symptoms, especially when CSAM included violence beyond the sexual assault
- Discomfort expressing intimacy with their own children – more prevalent in males than females
Factors Influencing Mental Health in CSAM Moderation
In research conducted with ICAC (Internet Crimes Against Children) investigators, other factors that influenced mental health difficulties, such as elevated post-traumatic stress symptoms, anxiety and depressive symptoms, and lower subjective wellbeing, included:
- Less control over work assigned,
- Not knowing about final case resolutions,
- Not attending training programs related to CSAM, and
- Unavailability of process-oriented staff discussions, access to mental health professionals, and individual case reviews.
Support Systems for Content Moderators Handling GenAI CSAM
There are several ways that companies can support Content Moderators investigating cases of AI-generated or real CSAM.
At Zevo, we highly recommend reviewing working practices and policies, including offering AI moderation mental health support programs tailored to moderators’ specific challenges.
Implementing Effective Coping Mechanisms and Support Systems
The literature suggests that work-related factors can minimize the potential risk of harm to individuals exposed to this type of egregious content.
These include giving moderators a sense of agency or autonomy in choosing case work, offering ample personal time off, and providing opportunities for shared debriefing and process-oriented discussions between colleagues, facilitated by mental health professionals.
Enhancing Mental Health Through Feedback Loops and Autonomy
Finally, knowing the outcomes of their investigations has been demonstrated to increase wellbeing scores and reduce mental ill health symptomatology amongst adjacent populations such as law enforcement and mental health professionals similarly exposed to CSAM.
Therefore, we advocate for organizations to develop feedback loops between all stakeholders that allow Content Moderators to acknowledge the positive outcomes of the work they are conducting.
While the proliferation of GenAI CSAM will undoubtedly place additional pressures on organizations and Content Moderators alike to swiftly and accurately remove harmful materials and discern what is AI-generated versus real CSAM, there are indeed measures that can be implemented to protect moderation teams from further harm.
Get in Touch
Zevo Health provides comprehensive solutions to support Content Moderators dealing with the challenges of GenAI CSAM.
Our tailored programs focus on mental wellbeing, resilience, and effective coping mechanisms. To learn more about how we can assist your team, feel free to get in touch.