Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Conference contribution › Conference paper › 2026

Citation

Simbeck, Katharina; Mahran, Mariam: Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models. In: AEQUITAS 2025: Workshop on Fairness and Bias in AI | co-located with ECAI 2025. Bologna, Italy: 2026, pp. 1-12.

Link

https://ceur-ws.org/Vol-4147/paper13.pdf

Language

English
