
How Disabling Parts of a Neural Network May Reveal the Secrets Behind Biased Outputs
A friend recently became intrigued by claims that some impressive new AI models have been sabotaged by government propaganda, so he began exploring ways to reverse that influence. In his search, he encountered the concepts of model ablation and mechanistic interpretability.
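To give a rough sense of what "model ablation" refers to before diving in: it means switching off (typically zeroing out) one piece of a network and observing how the outputs change. The snippet below is a minimal sketch using a toy PyTorch model; the layer, the unit index, and the whole setup are illustrative assumptions, not the method discussed later in the article.

```python
# Minimal illustration of ablation: zero one hidden unit's activation
# and compare the model's output before and after.
# The tiny two-layer network here is a hypothetical stand-in for a real model.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

x = torch.randn(1, 16)
baseline = model(x)

def ablate_unit(module, inputs, output):
    # Forward hook: returning a modified tensor replaces the layer's output.
    output = output.clone()
    output[:, 5] = 0.0  # unit index 5 is an arbitrary choice for illustration
    return output

handle = model[1].register_forward_hook(ablate_unit)
ablated = model(x)
handle.remove()

# The gap between the two outputs is, on this input, the contribution of the
# ablated unit; mechanistic interpretability work builds on comparisons like this
# to attribute specific behaviors to specific components.
print((baseline - ablated).abs().sum())
```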