Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models
Conference contribution › Conference paper › 2026
Citation
Conference contribution, full paper
Simbeck, Katharina; Mahran, Mariam: Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models. In: AEQUITAS 2025: Workshop on Fairness and Bias in AI | co-located with ECAI 2025. Bologna, Italy: 2026, pp. 1-12.