POPPER Framework
An AI agent that automates hypothesis validation with statistical rigor


Authors
Kexin Huang1, Ying Jin2, Ryan Li1, Michael Y. Li1, Emmanuel Candès3,4, Jure Leskovec1
1Department of Computer Science, Stanford University
2Data Science Initiative & Department of Health Care Policy, Harvard University
3Department of Statistics, Stanford University
4Department of Mathematics, Stanford University
How POPPER Works
POPPER is a novel framework for rigorous and automated validation of free-form natural language hypotheses using LLM agents.

Leverages reasoning capabilities and domain knowledge to identify measurable implications of the main hypothesis and design falsification experiments.
Implements the designed experiments through data collection, simulations, statistical analyses, or real-world procedures to produce p-values.
Converts p-values into e-values and aggregates evidence while strictly controlling the Type-I error rate for statistically sound decisions.
Systematically stress-tests the hypothesis by iteratively testing adaptively solicited implications while adhering to rigorous statistical principles.
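The p-value-to-e-value step above can be sketched with a standard calibrator. This is an illustrative sketch only: the calibrator e(p) = κ·p^(κ−1) and the stop-when-evidence-reaches-1/α rule are well-known tools from e-value theory, but the specific calibrator, threshold, and function names here are assumptions, not necessarily what POPPER implements.

```python
def p_to_e(p: float, kappa: float = 0.5) -> float:
    # Standard p-to-e calibrator: e(p) = kappa * p**(kappa - 1),
    # a valid e-value for any kappa in (0, 1).
    # kappa = 0.5 gives e(p) = 1 / (2 * sqrt(p)).
    return kappa * p ** (kappa - 1)

def aggregate(p_values, alpha: float = 0.1):
    """Multiply e-values from successive falsification tests.
    By Ville's inequality, stopping as soon as the running product
    reaches 1/alpha keeps the Type-I error rate at most alpha,
    even though the tests are chosen adaptively."""
    e_running = 1.0
    for p in p_values:
        e_running *= p_to_e(p)
        if e_running >= 1.0 / alpha:
            return True, e_running   # enough evidence: reject the null
    return False, e_running          # evidence insufficient so far

# Hypothetical example: p-values from three falsification experiments.
decision, evidence = aggregate([0.01, 0.02, 0.04], alpha=0.1)
```

In this sketch the procedure stops early once the accumulated evidence crosses 1/α, which is what makes adaptive, sequential testing statistically sound.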
POPPER vs. Human Experts
Our framework matches human performance while dramatically reducing validation time.

POPPER completes hypothesis validation tasks 9.7 times faster than human experts.
POPPER generates 3.6 times more lines of code than human experts, enabling more thorough analysis.
POPPER performs 2.5 times more statistical tests than human experts, providing more robust evidence for or against hypotheses.
See POPPER in Action
Watch our demo showcasing POPPER's capabilities for hypothesis validation in target validation scenarios.
See how POPPER validates hypotheses like "Gene A regulates Phenotype B" with statistical rigor and automated experimentation.
Ready to Accelerate Your Research?
Contact us to learn how POPPER can help validate your hypotheses and accelerate your biomedical discoveries.