User-led algorithmic auditing with IndieLabel

Empowering communities to discover and manage AI harms is something a lot of us care about -- but it can be unclear how to do it in practice. That's why I was so impressed and inspired when Michelle Lam and colleagues at the Stanford HCI Group developed IndieLabel, a prototype tool that enables everyday users to perform algorithmic audits.

IndieLabel’s approach is ingenious. As its core case study, it guides users through an audit of the Perspective API, a toxicity-detection tool commonly used for online content moderation. The Perspective API suffers from known issues, such as a tendency to over-flag comments containing identity terms or African-American Vernacular English. When these comments are inappropriately flagged as toxic, the result can be suppression of dialogue within marginalized communities. IndieLabel helps users identify these kinds of issues by having them rate the toxicity of a small number of comments. A lightweight model is trained to mimic the user’s ratings and then used to predict the user’s opinion on a much larger set of comments. This allows the app to compare the user’s (predicted) opinion against the actual ratings by the Perspective API, and to surface potential areas of disagreement for the user to explore.
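To make the audit loop concrete, here is a minimal sketch of that idea in Python. This is not IndieLabel’s actual implementation; the function, variable names, and the choice of a TF-IDF + ridge model are all illustrative, and it assumes you already have Perspective API scores for the larger comment set.

```python
# A hypothetical sketch of the audit loop: train a lightweight model on a
# user's toxicity ratings, predict their opinions on a much larger comment
# set, and surface the comments where those predictions diverge most from
# the Perspective API's scores. Not IndieLabel's actual implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

def audit_disagreements(user_comments, user_ratings, all_comments,
                        perspective_scores, top_k=20):
    # Fit a simple text model to mimic the user's ratings
    # (e.g., 0 = not toxic, 1 = toxic).
    vectorizer = TfidfVectorizer(min_df=2)
    X_user = vectorizer.fit_transform(user_comments)
    model = Ridge().fit(X_user, user_ratings)

    # Predict the user's opinion on the much larger comment set.
    X_all = vectorizer.transform(all_comments)
    predicted_user = model.predict(X_all)

    # Rank comments by disagreement between the user's predicted opinion
    # and the Perspective API's actual score, largest gaps first.
    gap = predicted_user - np.asarray(perspective_scores, dtype=float)
    worst = np.argsort(-np.abs(gap))[:top_k]
    return [(all_comments[i], perspective_scores[i], predicted_user[i])
            for i in worst]
```

Comments where the predicted user rating is low but the Perspective score is high are exactly the over-flagging cases described above, and are the natural starting point for an audit.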

Although the IndieLabel code has always been publicly available, until now there was no public-facing, live deployment of the app. In my role as Machine Learning Lead at ARVA, the AI Risk and Vulnerability Alliance, I initiated a collaboration with Michelle and her colleagues to launch a public version of the app. We’re thrilled to announce that the app is now live on Hugging Face Spaces. We’ve also modified the app to enable users to convert their audit findings into vulnerability reports and submit them directly to the AI Vulnerability Database.

Read more details about the release on the AVID blog, and try out the app or view the demo!
