Teaching AI to Spot Its Own Biases

It is a very real problem that the AI writing company diversity reports sometimes creates the very biases it is intended to eliminate. Here at Cultural Infusion’s Atlas, we’ve built an AI system that can spot when other AI systems are being subtly discriminatory in their writing.

The Problem: AI’s Hidden Prejudices

Companies increasingly rely on artificial intelligence to help write their diversity, equity and inclusion reports, and this makes sense! AI can produce professional-sounding reports quickly and cost-effectively.

The catch is that these AI systems often carry hidden biases from their training data, which subtly reinforce the very stereotypes that diversity programs are meant to combat.

Consider this sentence from a hypothetical report: “Our senior leadership team, headed by capable male executives, collaborates with supportive female team members.”

While seemingly innocuous, this sentence quietly suggests that men are natural leaders, whereas women are naturally supportive. Arguably, this is exactly the kind of thinking that good diversity policies try to change.

The Detective AI

We decided to fight fire with fire by building an AI system specifically designed to catch biases in other AI-generated content. Our “bias detective” spots five types of bias: gender, religious, age, disability and sexuality.

To this end, we generated nearly 1,000 fake diversity reports using Google’s Gemini AI, then had human reviewers painstakingly analyse over 10,000 sentences to identify subtle biases. This created a training dataset to teach the detection system what bias looks like in corporate contexts.

The Findings

The results show that even when explicitly asked to write inclusive content, the AI consistently produced biased language:

Disability bias appeared most frequently (7.6% of sentences), often framing accommodations as “special needs” rather than standard practice.
Gender bias (6.3%) typically reinforced traditional role expectations.
Religious bias (6.2%) often portrayed religious diversity as a management challenge rather than an asset.

All companies analysed (including tech giants like GitHub and Okta) showed measurable bias in their AI-generated reports.
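
As a rough illustration of how per-category rates like those listed above can be computed, the sketch below counts how often each label appears across annotated sentences. The label names, the `bias_rates` helper and the toy annotations are assumptions made for this example; they are not the Atlas dataset or pipeline.

```python
from collections import Counter

# Hypothetical label names mirroring the five categories in the article.
CATEGORIES = ("gender", "religion", "age", "disability", "sexuality")


def bias_rates(annotations):
    """Return the share of sentences flagged for each category.

    `annotations` is a list of sets; each set holds the categories a
    reviewer assigned to one sentence (an empty set means no bias found).
    """
    total = len(annotations)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    counts = Counter(label for labels in annotations for label in labels)
    return {category: counts[category] / total for category in CATEGORIES}


# Toy annotations, invented for this example only.
toy = [{"gender"}, set(), {"disability", "age"}, set()]
rates = bias_rates(toy)
print({category: f"{rate:.1%}" for category, rate in rates.items()})
```

Because a sentence can carry more than one label, the rates are independent of each other and need not sum to 100%.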

The Technical Brilliance

Our detection system, built using RoBERTa (a sophisticated language model), proved to be remarkably accurate. It caught bias with 97% accuracy for gender issues and performed well across all categories. We deliberately tuned it to err on the side of caution: it is better to flag a potentially innocent sentence for human review than to miss actual bias.
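
One common way to make a classifier err on the side of caution is to lower its decision threshold until it catches a target share of the truly biased sentences. The sketch below shows that idea in plain Python. It is an assumption about how such tuning could be done, not a description of the actual Atlas configuration; `pick_threshold`, the target recall and the toy scores are all hypothetical.

```python
def pick_threshold(scores, labels, target_recall=0.95):
    """Return the highest threshold whose recall meets `target_recall`.

    `scores` are model probabilities that a sentence is biased;
    `labels` are the human judgements (True means biased).
    A sentence is flagged when its score is >= the threshold.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one biased example")
    # Try candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        caught = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
        if caught / positives >= target_recall:
            return threshold
    # Only reached if target_recall is above 1.0; flag everything.
    return min(scores)


# Toy validation data, invented for this example only.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]
print(pick_threshold(scores, labels, target_recall=1.0))
```

With these toy values, catching every biased sentence means accepting one false alarm (the sentence scored 0.60), which is the trade-off described above.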

Why This Matters for Everyone

As organisations increasingly use AI for important communications, hidden biases can:

Undermine genuine diversity efforts
Damage company credibility when biased content goes public
Reinforce harmful stereotypes amongst employees and customers
Create legal and reputational risks

This tool offers a practical solution: organisations can automatically scan their AI-generated content before publication, catching problematic language that humans might miss.
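
A pre-publication scan of this kind could look roughly like the sketch below: split a draft into sentences, score each one, and list those above a threshold for human review. Everything named here is hypothetical. In particular, `toy_score` is a keyword placeholder standing in for a trained model, and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def scan(text, score_fn, threshold=0.5):
    """Return (sentence, score) pairs whose score meets the threshold."""
    flagged = []
    for sentence in split_sentences(text):
        score = score_fn(sentence)
        if score >= threshold:
            flagged.append((sentence, score))
    return flagged


def toy_score(sentence):
    """Placeholder scorer: NOT a real bias model, just a keyword check."""
    return 0.9 if "special needs" in sentence.lower() else 0.1


draft = "We value every employee. Staff with special needs get extra help."
for sentence, score in scan(draft, toy_score):
    print(f"REVIEW ({score:.2f}): {sentence}")
```

Passing the scorer in as a function keeps the scanning loop independent of whichever model sits behind it.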

The Bigger Picture

This research highlights a crucial point about our AI-powered future: the systems we’re building to solve social problems can inadvertently perpetuate them. It’s not enough to ask AI to “be inclusive”; we need active, ongoing monitoring to ensure it delivers.

Rezza Moieni
