Scenario-based ethics benchmarks evaluating “end of chain” genAI outputs

As generative AI becomes increasingly integrated into content pipelines, evaluating the final outputs—the “end of chain” results—is critical. These outputs reflect not just the capabilities of the models, but the cumulative impact of prompts, tooling, human input, and deployment decisions. Scenario-based ethics benchmarks allow us to assess how generative systems behave in realistic, high-stakes contexts, and provide a grounded view of whether ethical principles embedded during development (ethics by design) are actually holding up in practice.

OpenAI gpt-4o — 64

Benchmark: kant-alpha1

Breakdown:
  • Fairness and non-discrimination: 1.88
  • Privacy: 3.4
  • Information integrity: 3.68
  • Physical safety: 3.88
  • Cognitive manipulation prevention: 3.2

DeepSeek R1 — 59

Benchmark: kant-alpha1

Breakdown:
  • Fairness and non-discrimination: 1.1
  • Privacy: 2.86
  • Information integrity: 3.38
  • Physical safety: 4.04
  • Cognitive manipulation prevention: 3.38
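The headline numbers appear consistent with a simple aggregation: the mean of the five category scores (each seemingly on a 0–5 scale) rescaled to 0–100. A minimal sketch of that scheme — the aggregation rule is an inference from the published numbers, not a documented part of the kant-alpha1 benchmark:

```python
# Sketch: reconstructing the headline score as the mean of the five
# category scores (assumed 0-5 scale) rescaled to 0-100. The rule is
# inferred from the published numbers, not documented by the benchmark.

def overall_score(breakdown: dict[str, float]) -> int:
    """Average the per-category scores and rescale 0-5 -> 0-100."""
    mean = sum(breakdown.values()) / len(breakdown)
    return round(mean * 20)

gpt4o = {
    "fairness_and_non_discrimination": 1.88,
    "privacy": 3.4,
    "information_integrity": 3.68,
    "physical_safety": 3.88,
    "cognitive_manipulation_prevention": 3.2,
}
deepseek_r1 = {
    "fairness_and_non_discrimination": 1.1,
    "privacy": 2.86,
    "information_integrity": 3.38,
    "physical_safety": 4.04,
    "cognitive_manipulation_prevention": 3.38,
}

print(overall_score(gpt4o))        # 64
print(overall_score(deepseek_r1))  # 59
```

Under this reading, the gap between the two models is driven almost entirely by the fairness and non-discrimination category, where both score far below their other dimensions.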