Uncertainty quantification for LLMs is a key research direction toward addressing hallucination and other issues that limit their reliable deployment. In this work, we show that reasoning trace length is a simple and useful confidence estimator in large reasoning models. Through comprehensive experiments across multiple models, datasets, and prompts, we show that trace length performs in comparable but complementary ways to other zero-shot confidence estimators such as verbalized confidence. Our work reveals that reasoning post-training fundamentally alters the relationship between trace length and accuracy, going beyond prior work showing that post-training causes traces to grow longer in general (e.g., “overthinking”). We investigate the mechanisms behind trace length’s effectiveness as a confidence signal, observing that the effect persists even after adjusting for confounders such as problem difficulty and GRPO-induced length bias. We identify high-entropy or “forking” tokens as playing a key role in the mechanism. Our findings demonstrate that reasoning post-training enhances uncertainty quantification beyond verbalized expressions, and establish trace length as a practical confidence measure for large reasoning models.
- † University of Southern California
- ‡ Stanford University
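
The abstract's core claim is that shorter reasoning traces tend to indicate higher confidence and accuracy. As a minimal illustrative sketch (not the authors' implementation), the snippet below scores each response by its negated trace length and measures, via AUROC against correctness labels, how well that score separates correct from incorrect answers; the token counts and labels are hypothetical placeholders.

```python
# Minimal sketch: trace length as a zero-shot confidence estimator.
# Assumption: shorter reasoning traces signal higher confidence, so we
# negate the token count to obtain a confidence score.
from sklearn.metrics import roc_auc_score

# Hypothetical data: tokens in each reasoning trace, and whether the
# model's final answer was correct (1) or not (0).
trace_lengths = [512, 1980, 340, 2750, 610, 1450]
is_correct = [1, 0, 1, 0, 1, 0]

# Confidence score: higher = more confident (i.e., shorter trace).
confidence = [-n for n in trace_lengths]

# AUROC near 1.0 means trace length cleanly separates correct from
# incorrect answers; 0.5 means it carries no signal.
print(f"AUROC: {roc_auc_score(is_correct, confidence):.3f}")
```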

