POPPER Framework

AI Agent that automates hypothesis validation with statistical rigor

Stanford University
Harvard University

Authors

Kexin Huang¹, Ying Jin², Ryan Li¹, Michael Y. Li¹, Emmanuel Candès³,⁴, Jure Leskovec¹

¹ Department of Computer Science, Stanford University

² Data Science Initiative & Department of Health Care Policy, Harvard University

³ Department of Statistics, Stanford University

⁴ Department of Mathematics, Stanford University

How POPPER Works

POPPER is a novel framework for rigorous and automated validation of free-form natural language hypotheses using LLM agents.

[Figure: POPPER framework diagram]

Experiment Design Agent
Proposes falsification experiments

Leverages reasoning capabilities and domain knowledge to identify measurable implications of the main hypothesis and design falsification experiments.

Experiment Execution Agent
Implements and runs experiments

Implements the designed experiments through data collection, simulations, statistical analyses, or real-world procedures to produce p-values.
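As an illustration of how a single falsification experiment might end in a p-value, here is a minimal sketch of a two-sample permutation test written with the standard library only. It is not taken from POPPER itself: the function name `permutation_p_value`, the choice of test, and the toy measurements are assumptions made for this example.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    treated: Sequence[float],
    control: Sequence[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; the null hypothesis is that
    group labels are exchangeable (no effect of the treatment).
    """
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            hits += 1

    # The +1 correction keeps the p-value strictly positive and valid.
    return (hits + 1) / (n_permutations + 1)


# Toy data (invented): phenotype readout with and without perturbing a gene.
knockout = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4]
wild_type = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
p = permutation_p_value(knockout, wild_type)
```

With clearly separated groups such as the toy data above, the returned p-value is small; with identical groups it equals 1.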

Sequential Error Control
Maintains statistical rigor

Converts p-values into e-values and aggregates evidence while strictly controlling the Type-I error rate for statistically sound decisions.
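The conversion and aggregation step can be sketched as follows. This is a minimal example under stated assumptions, not necessarily the exact procedure POPPER uses: it assumes the power calibrator e = κ·p^(κ−1) with κ in (0, 1), which is one standard way to turn a p-value into an e-value, and it assumes evidence is aggregated by multiplying e-values and stopping once the product reaches 1/α. The names `p_to_e` and `aggregate_evidence` and the defaults κ = 0.5 and α = 0.1 are chosen for illustration.

```python
from typing import Iterable, List, Tuple


def p_to_e(p: float, kappa: float = 0.5) -> float:
    """Calibrate a p-value into an e-value with the power calibrator.

    Assumed form: e = kappa * p ** (kappa - 1), valid for 0 < kappa < 1.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    return kappa * p ** (kappa - 1.0)


def aggregate_evidence(
    p_values: Iterable[float], alpha: float = 0.1, kappa: float = 0.5
) -> Tuple[bool, List[float]]:
    """Multiply e-values one test at a time and stop at the 1/alpha threshold.

    Returns (rejected, running_products). If each e-value is valid given the
    earlier tests, the running product is a nonnegative supermartingale under
    the null, so Ville's inequality bounds the chance of ever crossing
    1/alpha by alpha.
    """
    product = 1.0
    history: List[float] = []
    for p in p_values:
        product *= p_to_e(p, kappa)
        history.append(product)
        if product >= 1.0 / alpha:
            return True, history
    return False, history


rejected, history = aggregate_evidence([0.01, 0.04], alpha=0.1)
```

In this usage example the two p-values calibrate to e-values of about 5 and 2.5, so the running product reaches about 12.5 and crosses the threshold 1/α = 10 on the second test. Note that a p-value of 1 yields an e-value of κ, so weak experiments shrink the accumulated evidence rather than leaving it unchanged.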

Iterative Testing
Accumulates evidence systematically

Probes a hypothesis from multiple angles by iteratively testing its implications, with each new implication chosen adaptively in light of earlier results, while adhering to rigorous statistical principles.
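The overall loop might be organized along these lines. This is a hedged sketch, not POPPER's actual interface: `design` and `execute` are hypothetical stand-ins for the two agents, `max_tests` is an assumed budget, and the evidence update assumes a power calibrator with multiplicative aggregation against a 1/α threshold.

```python
from typing import Callable, List, Tuple

# Each log entry pairs an experiment description with its p-value.
Log = List[Tuple[str, float]]


def run_validation(
    hypothesis: str,
    design: Callable[[str, Log], str],   # hypothetical design agent
    execute: Callable[[str], float],     # hypothetical execution agent
    alpha: float = 0.1,
    kappa: float = 0.5,
    max_tests: int = 5,
) -> Tuple[bool, float, Log]:
    """Iteratively design and run experiments until evidence suffices.

    Returns (null_rejected, accumulated_evidence, log).
    """
    evidence = 1.0
    log: Log = []
    for _ in range(max_tests):
        # The design step sees all earlier results, so it can adapt.
        experiment = design(hypothesis, log)
        p = execute(experiment)
        if not 0.0 < p <= 1.0:
            raise ValueError("execute must return a p-value in (0, 1]")
        evidence *= kappa * p ** (kappa - 1.0)  # assumed p-to-e calibrator
        log.append((experiment, p))
        if evidence >= 1.0 / alpha:
            return True, evidence, log
    return False, evidence, log


# Stub agents for demonstration only; real agents would call an LLM and data.
def stub_design(hypothesis: str, log: Log) -> str:
    return f"experiment-{len(log) + 1} for: {hypothesis}"


def stub_execute(experiment: str) -> float:
    return 0.01 if experiment.startswith("experiment-1") else 0.04


null_rejected, evidence, log = run_validation(
    "Gene A regulates Phenotype B", stub_design, stub_execute
)
```

With the stub agents above, the loop stops after two experiments because the accumulated evidence exceeds 1/α = 10. In this sketch, crossing the threshold is read as enough evidence to reject the null, and exhausting the budget without crossing it is read as insufficient evidence rather than as proof that the hypothesis is false.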

POPPER vs. Human Experts

POPPER matches human experts at hypothesis validation while sharply reducing validation time.

[Figure: POPPER vs. human experts comparison]

9.7x Faster

POPPER completes hypothesis validation tasks 9.7 times faster than human experts, dramatically accelerating the research process.

3.6x More Code

POPPER generates 3.6 times more lines of code than human experts, enabling more thorough analysis.

2.5x More Tests

POPPER performs 2.5 times more statistical tests than human experts, providing more robust evidence for or against hypotheses.

See POPPER in Action

Watch our demo of POPPER validating hypotheses in target validation scenarios.

Hypothesis Validation for Target Validation

See how POPPER validates hypotheses like "Gene A regulates Phenotype B" with statistical rigor and automated experimentation.

Ready to Accelerate Your Research?

Contact us to learn how POPPER can help validate your hypotheses and accelerate your biomedical discoveries.