Research

RISE Reveals How Large Language Models Really Think

By swapping human labels for sparse auto-encoders, researchers expose LLMs' hidden reasoning and show how to steer it.

by Analyst Agentnews

Researchers have introduced RISE, an unsupervised framework that uses sparse auto-encoders to map the internal logic of large language models. By identifying reasoning vectors, the system offers a rare glimpse behind the curtain, potentially letting us steer AI behavior without retraining.

Why does this matter? Most interpretability work tries to explain AI using human terms. We tag behaviors as "reflection" or "overthinking" because that’s how people think. But LLMs aren’t human. Their internal logic likely follows patterns we haven’t named yet. RISE cuts through this bias by letting the AI reveal its own reasoning without human labels.

At its core, RISE (Reasoning-aware Interpretability via Sparse auto-Encoders) uses sparse auto-encoders (SAEs). Zhenyu Zhang, Lun Wang, and colleagues from UT Austin and Google trained SAEs on step-level activations from chain-of-thought traces [arXiv:2512.23988v1]. By breaking those traces into sentence-level steps, the SAEs uncovered distinct features tied to behaviors like backtracking and confidence. It's like giving the AI a way to explain its thought process in its own language.
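The basic mechanics of an SAE are simple to sketch: it maps an activation vector into a much larger, mostly-zero feature space and then reconstructs the original vector, trained to balance reconstruction error against a sparsity penalty. The sketch below is illustrative only; the dimensions, weights, and example activation are fabricated, and the paper's actual architecture and training details may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d_model is the LLM's hidden width,
# d_sae is the larger, overcomplete SAE dictionary.
d_model, d_sae = 64, 512

# Randomly initialised weights stand in for a trained SAE.
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm decoder rows
b_dec = np.zeros(d_model)

def sae_encode(x):
    """Map a step-level activation to non-negative feature coefficients."""
    return np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU zeroes most features

def sae_decode(f):
    """Reconstruct the activation from the sparse features."""
    return f @ W_dec + b_dec

# One "step-level" activation, e.g. the vector for a single sentence
# in a chain-of-thought trace (fabricated here for illustration).
x = rng.normal(size=d_model)
f = sae_encode(x)
x_hat = sae_decode(f)

# Training would minimise reconstruction error plus an L1 sparsity term.
recon_loss = np.mean((x - x_hat) ** 2)
l1_penalty = np.abs(f).sum()
loss = recon_loss + 1e-3 * l1_penalty
```

After training, each surviving dictionary entry tends to fire on a coherent pattern, which is what lets researchers match individual features to behaviors like backtracking.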

RISE isn’t just a diagnostic tool—it’s a steering wheel. The team found these reasoning behaviors live in separate regions of the decoder’s vector space. More importantly, they showed targeted tweaks to these SAE-derived vectors can amplify or suppress specific behaviors. Want the AI to be more reflective? Adjust the vector. Need to dial down unwarranted confidence? There’s a setting for that. This lets us fine-tune AI responses on the fly, without costly retraining.
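In practice, this kind of steering usually means adding (or subtracting) a feature's decoder direction to the model's activations at inference time. A minimal sketch, assuming a unit-normalized decoder matrix and a hypothetical feature index; the feature names and scaling here are illustrative, not the paper's actual values:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_sae = 64, 512

# Stand-in decoder: row i is the direction feature i writes back
# into the model's activation space (unit-normalised).
W_dec = rng.normal(size=(d_sae, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

REFLECTION_FEATURE = 42  # hypothetical index of a "reflection" feature

def steer(activation, feature_idx, alpha):
    """Shift an activation along one SAE decoder direction.

    alpha > 0 amplifies the behavior the feature encodes;
    alpha < 0 suppresses it. The edit happens at inference
    time, so no retraining is needed.
    """
    return activation + alpha * W_dec[feature_idx]

h = rng.normal(size=d_model)            # an intermediate activation
h_more = steer(h, REFLECTION_FEATURE, alpha=4.0)
h_less = steer(h, REFLECTION_FEATURE, alpha=-4.0)
```

Because the decoder row has unit norm, the activation's projection onto that feature direction moves by exactly alpha, which is what makes the dial-up/dial-down framing literal.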

Beyond known behaviors, RISE captured structural traits like response length and uncovered new behaviors that humans hadn’t labeled. For example, by isolating confidence-related vectors, researchers could control how certain the model sounded. It’s brain surgery with a scalpel, not a sledgehammer—a precise tool for safer, more predictable AI.

RISE marks a big step toward transparency, but beware the "interpretability trap." Mapping a vector doesn’t mean we fully understand these complex systems. Still, as LLMs move into critical roles, shifting from guessing why AI acts a certain way to knowing why is essential. RISE gives us the map. Now, we just have to learn to drive.
