The Virtual Embryo Challenge
Generative modeling of mouse embryogenesis across space, scale, and time — under genetic perturbation.
Embryogenesis is fundamental — and largely unmodelled
A single fertilised cell becomes a complete organism through spatiotemporally coordinated gene regulation, cell-fate transitions, tissue morphogenesis, and organ formation. Disruptions to these processes cause congenital defects, which still affect about 1 in 33 newborns and remain a leading cause of infant mortality.
Large embryo atlases and spatial-transcriptomics datasets give us snapshots, but they don't reveal how cell states transition, how local molecular changes propagate to tissue- and organ-level phenotypes, or how development responds to perturbation.
The Virtual Embryo Challenge establishes a standardised benchmark for predictive embryogenesis: a curated dataset, an evaluation pipeline, baseline models, and three tasks that jointly stress spatial context, multiscale reasoning, temporal dynamics, and perturbation response.
Three tasks, one shared atlas
Each task uses staged train / validation / hidden-test splits over the same whole-embryo + heart-focused resource. Hidden labels are not released during the competition; final rankings reflect generalisation to held-out stages, embryos, and genotypes.
Task 1: Forecast the gene-expression distribution at unseen future stages from earlier single-cell data.
Task 2: Jointly predict expression, cell-type composition, and 3D spatial organisation across stages from the 4D MERFISH atlas.
Task 3: Predict mutant gene expression and 3D spatial organisation under unseen genetic knock-outs.
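For the forecasting task (predicting expression at unseen future stages), one naive reference point is to fit a straight line per gene through pseudobulk (mean) expression against embryonic day and extrapolate. The sketch below is a hypothetical illustration, not one of the official baselines: the function name `linear_pseudobulk_forecast` and the use of embryonic day as the time axis are assumptions. It also forecasts only each gene's mean, not the full expression distribution the task asks for.

```python
import numpy as np

def linear_pseudobulk_forecast(stage_days, pseudobulks, target_day):
    """Hypothetical baseline: extrapolate per-gene mean expression to an unseen stage.

    stage_days:  embryonic days of the training stages, shape (S,)
    pseudobulks: mean expression per stage, shape (S, G)
    target_day:  embryonic day to forecast
    Returns a length-G vector of forecast mean expression.
    """
    t = np.asarray(stage_days, dtype=float)
    Y = np.asarray(pseudobulks, dtype=float)
    A = np.column_stack([t, np.ones_like(t)])     # design matrix: [day, intercept]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)  # shape (2, G): slope, intercept per gene
    forecast = np.array([target_day, 1.0]) @ coef
    return np.clip(forecast, 0.0, None)           # expression cannot be negative

# Toy usage: three training stages, two genes with linear trends.
days = [6.75, 7.5, 8.25]
Y = [[2 * d + 1, 0.5 * d + 3] for d in days]
print(linear_pseudobulk_forecast(days, Y, target_day=9.0))
```

Because it predicts a single mean per gene, a forecast like this can only be compared on pseudobulk-style metrics; it says nothing about cell-to-cell variation.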
Human-designed vs agent-designed, scored side by side
Both tracks address the same three tasks and are scored on the same metrics and hidden test sets. Prizes are awarded separately so the leaderboards directly contrast human-designed and agent-designed approaches to predictive embryogenesis modelling.
Human Team track: Methods designed and supervised by human participants (algorithm/model design → submission → evaluation). Standard NeurIPS competition track.
Agent Team track: Methods produced by coding agents or LLM-based evolutionary algorithms that iteratively select, mutate, and evaluate candidate solutions without human supervision. The agent system and its complete evolutionary trace must be shared before prize evaluation.
Multimodal whole-embryo perturbation resource
~1 million cells across 11 developmental time points, spanning early gastrulation through cardiac progenitor emergence, heart-tube formation, looping, and later morphogenesis.
Single-cell Multiome: Whole-embryo per-cell RNA + chromatin accessibility across staged embryos from E6.75 to E12.5.
3D MERFISH: Coronal sections decoded into per-cell 3D positions plus measured RNA across the same developmental window; this is the 4D atlas powering Task 2.
Annotations: Per-cell cell-type, tissue-domain, and anatomical-region calls, plus morphology-derived features where available.
Genetic perturbations: Knock-outs of β-catenin and Gata4 (E8.75) and Mab21l2 (E9.5), three cardiac developmental regulators, each with paired wild-type controls and profiled by 3D MERFISH.
Three metrics, automatic scoring, hidden labels
Scores are computed on held-out embryos after schema validation (gene order, cell-type vocabulary, coordinate convention, missing-value policy). Sub-scores per task; an overall composite for ranking.
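The official submission schema is not yet published, so the checker below is only a sketch of what the four named checks could look like. The `Schema` class, the `validate` function, the field names (`genes`, `expression`, `cell_types`, `coords`), the 3-coordinate convention, and the "no NaN allowed" policy are all hypothetical. The gene and cell-type names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Schema:
    """Hypothetical submission schema; the official one ships with the starter kit."""
    gene_order: list             # required gene (column) order
    cell_types: set              # allowed cell-type vocabulary
    coord_dims: int = 3          # assumed (x, y, z) per cell
    allow_missing: bool = False  # assumed missing-value policy: no NaN/None

def validate(sub: dict, schema: Schema) -> list:
    """Return human-readable violations; an empty list means the submission passes."""
    errors = []
    if list(sub.get("genes", [])) != list(schema.gene_order):
        errors.append("gene order does not match the reference gene list")
    n_genes = len(schema.gene_order)
    for i, row in enumerate(sub.get("expression", [])):
        if len(row) != n_genes:
            errors.append(f"cell {i}: expected {n_genes} expression values, got {len(row)}")
        elif not schema.allow_missing and any(
            v is None or (isinstance(v, float) and math.isnan(v)) for v in row
        ):
            errors.append(f"cell {i}: missing expression value")
    for i, ct in enumerate(sub.get("cell_types", [])):
        if ct not in schema.cell_types:
            errors.append(f"cell {i}: unknown cell type {ct!r}")
    for i, xyz in enumerate(sub.get("coords", [])):
        if len(xyz) != schema.coord_dims:
            errors.append(f"cell {i}: expected {schema.coord_dims} coordinates, got {len(xyz)}")
    return errors

schema = Schema(gene_order=["Nkx2-5", "Tbx5", "Hand1"],
                cell_types={"cardiomyocyte", "endocardium"})
submission = {
    "genes": ["Nkx2-5", "Tbx5", "Hand1"],
    "expression": [[1.2, 0.0, 3.4], [0.5, float("nan"), 0.1]],
    "cell_types": ["cardiomyocyte", "epicardium"],
    "coords": [(0.1, 0.2, 0.3), (1.0, 2.0)],
}
for msg in validate(submission, schema):
    print(msg)
```

Collecting every violation, rather than stopping at the first, lets a participant fix a malformed file in one pass.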
Expression: Primarily pseudobulk Pearson correlation, complemented by MSE/MAE, gene-wise marginal-variance agreement, distributional distances (MMD, energy, sliced Wasserstein, FID-like), and overlap between predicted and observed differentially expressed genes (DEGs) relative to the preceding stage. Bootstrap confidence intervals are computed across embryos.
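The primary expression score and its bootstrap interval can be sketched as follows. This assumes one pseudobulk profile (mean expression per gene) per held-out embryo and a percentile bootstrap that resamples embryos with replacement. The official pipeline may instead aggregate per cell type, stage, or region, or normalise counts first. The function names and the toy data are hypothetical.

```python
import numpy as np

def pseudobulk(X):
    """Average a (cells x genes) matrix into one mean-expression value per gene."""
    return np.asarray(X, dtype=float).mean(axis=0)

def pseudobulk_pearson(pred, obs):
    """Pearson r between predicted and observed pseudobulk profiles (same gene order)."""
    return float(np.corrcoef(pseudobulk(pred), pseudobulk(obs))[0, 1])

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean score, resampling embryos with replacement."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(scores.mean()), float(lo), float(hi)

# Toy data: 6 held-out embryos, 200 cells x 50 genes each, with gene-specific means.
rng = np.random.default_rng(1)
obs = {f"embryo_{k}": rng.poisson(rng.uniform(0.5, 5.0, size=50), size=(200, 50))
       for k in range(6)}
pred = {e: X + rng.normal(0.0, 0.5, size=X.shape) for e, X in obs.items()}

per_embryo = [pseudobulk_pearson(pred[e], obs[e]) for e in sorted(obs)]
mean_r, lo, hi = bootstrap_ci(per_embryo)
print(f"pseudobulk r = {mean_r:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Resampling whole embryos, rather than individual cells, keeps cells from the same embryo together. The interval then reflects embryo-to-embryo variability rather than an inflated cell count.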
Cell-type composition: A frozen cell-type probe classifier, trained by the organisers and locked before evaluation, labels both predicted and observed cells. The resulting cell-type proportions are compared via base-2 Jensen-Shannon divergence at global, regional, and per-condition levels.
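Once the probe has labelled cells, the comparison reduces to a base-2 Jensen-Shannon divergence between two proportion vectors, which lies between 0 (identical compositions) and 1 (disjoint ones). The sketch below assumes the probe's labels are already available. The vocabulary, region names, and counts are made up for illustration.

```python
import numpy as np
from collections import Counter

def proportions(labels, vocab):
    """Cell-type proportions over a fixed vocabulary (types absent from labels get 0)."""
    counts = Counter(labels)
    p = np.array([counts.get(c, 0) for c in vocab], dtype=float)
    return p / p.sum()

def jsd_base2(p, q):
    """Jensen-Shannon divergence with log base 2, so 0 <= JSD <= 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # 0 * log(0 / x) counts as 0; b = m is > 0 wherever a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

vocab = ["cardiomyocyte", "endocardium", "epicardium", "mesenchyme"]
# Labels assumed to come from the frozen probe applied to observed and predicted cells.
observed = {"region_A": ["cardiomyocyte"] * 60 + ["endocardium"] * 30 + ["epicardium"] * 10,
            "region_B": ["mesenchyme"] * 80 + ["endocardium"] * 20}
predicted = {"region_A": ["cardiomyocyte"] * 50 + ["endocardium"] * 40 + ["epicardium"] * 10,
             "region_B": ["mesenchyme"] * 90 + ["cardiomyocyte"] * 10}

regional = {r: jsd_base2(proportions(predicted[r], vocab), proportions(observed[r], vocab))
            for r in observed}
global_jsd = jsd_base2(proportions(sum(predicted.values(), []), vocab),
                       proportions(sum(observed.values(), []), vocab))
print(regional, global_jsd)
```

Note that SciPy's `scipy.spatial.distance.jensenshannon(p, q, base=2)` returns the Jensen-Shannon distance, which is the square root of this divergence. Its output must be squared to match the quantity above.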
Spatial organisation: Fused Gromov-Wasserstein (FGW) distance, combining expression similarity with preservation of spatial structure. It penalises predictions that get the marginals right but the geometry wrong, and is augmented with neighbourhood-level MMD, energy distance, and sliced Wasserstein distance.
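For reference, the standard fused Gromov-Wasserstein objective between a predicted embryo (cells i, j) and an observed embryo (cells k, l) can be written as below. The notation is an assumption for illustration, and the challenge's exact costs, exponent, and α are not yet published:

- M_{ik} is an expression cost between predicted cell i and observed cell k.
- C^{pred} and C^{obs} are within-embryo spatial distance matrices.
- a and b are cell weights (for example uniform).
- α ∈ [0, 1] trades expression against geometry.

```latex
\mathrm{FGW}_{\alpha}
  = \min_{T \in \Pi(a, b)}
    (1-\alpha) \sum_{i,k} M_{ik}\, T_{ik}
    \;+\; \alpha \sum_{i,j,k,l} \bigl( C^{\mathrm{pred}}_{ij} - C^{\mathrm{obs}}_{kl} \bigr)^{2} T_{ik}\, T_{jl},
\qquad
\Pi(a, b) = \{\, T \ge 0 : T\mathbf{1} = a,\; T^{\top}\mathbf{1} = b \,\}
```

With α = 0 this reduces to an optimal-transport cost on expression alone. That cost can be small for a prediction whose cells have the right expression but the wrong positions. The α-weighted second term grows whenever two cells that are close in the observed embryo are matched to two cells that are far apart in the prediction, which is how geometry errors are penalised. The Python Optimal Transport (POT) library implements a version of this objective as `ot.gromov.fused_gromov_wasserstein2`.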
Three phases · launch → development → final
- 2026-06-30: Site + submission portal + eval platform live
- 2026-07-20: Starter kit released; website opens to participants
- 2026-07-30: P1 · Test phase begins (workflow + leaderboard validation)
- 2026-08-15: P2 · Development phase begins; validation dataset released
- 2026-10-25: P3 · Final test phase begins (new held-out dataset)
- 2026-11-02: Final submissions due; official evaluation starts
- 2026-11-18: Winners announced at NeurIPS
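Counting the days between consecutive milestones makes the length of each phase explicit. The sketch below uses Python's standard `datetime` module and assumes each window runs from one milestone to the next. The window names are descriptive labels, not official terms.

```python
from datetime import date

# Milestone dates as listed in the timeline (subject to change before launch).
milestones = {
    "starter_kit": date(2026, 7, 20),
    "p1_start": date(2026, 7, 30),
    "p2_start": date(2026, 8, 15),
    "p3_start": date(2026, 10, 25),
    "final_due": date(2026, 11, 2),
    "winners": date(2026, 11, 18),
}

# Assumption: each window runs from its start milestone to the next milestone.
windows = [
    ("Starter kit to P1", "starter_kit", "p1_start"),
    ("P1 test phase", "p1_start", "p2_start"),
    ("P2 development phase", "p2_start", "p3_start"),
    ("P3 final test phase", "p3_start", "final_due"),
    ("Official evaluation", "final_due", "winners"),
]
durations = {name: (milestones[end] - milestones[start]).days for name, start, end in windows}
for name, days in durations.items():
    print(f"{name}: {days} days")
```

This gives 10, 16, 71, 8, and 16 days respectively. Most development time therefore falls in P2, and the final test window is about one week.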
$104K total from the Laude Institute Moonshots Seed Grant
Per track ($27K × 2): one $8K first prize, two $5K second prizes, three $3K third prizes. Tracks are scored on the same hidden tests but awarded separately.
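The per-track figure follows directly from the prize breakdown. Together, the two prize pools account for $54K of the $104K total:

```latex
\$8\text{K} + 2 \times \$5\text{K} + 3 \times \$3\text{K} = \$27\text{K per track},
\qquad
2 \times \$27\text{K} = \$54\text{K across both tracks}
```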
15–20 grants for early-career researchers to attend the NeurIPS workshop.
Website, starter-kit repo, tutorials, reproducible walkthroughs, baseline documentation, participant communication channels.
Evaluation runs on the Stanford Sherlock and GenBio GPU clusters (NVIDIA H100 80 GB / H200 141 GB).
Explore the data that powers it
The challenge is grounded in the same atlas you can browse on this site: 3D spatial-transcriptomics specimens by Theiler stage, a whole-embryo single-cell time-lapse from gastrula to birth, and the EMA anatomical references. Use them now to understand the modality coverage and stage spacing before the starter kit is released.
Common questions
More detailed answers — on submission format, data schema (gene order, cell-type vocabulary, coordinate convention, missing-value policy), and worked metric examples — land alongside the starter kit.
- Who can participate?
- Anyone — academic, industry, independent — who can submit a prediction file conforming to the required format. Each person joins one team; each team works on one track.
- What does the starter kit include?
- Baseline implementations for each task, data-loading utilities, evaluation scripts, example submission files, documentation, and reproducibility walkthroughs. Released on 2026-07-20, ahead of the P1 test phase.
- How is cheating prevented?
- All ground-truth labels are hidden. Submission counts are limited in the final phase to reduce leaderboard probing. Duplicate registrations, unauthorized data use, code sharing across teams, or falsified results are grounds for disqualification.
- Can the Agent Team track use any framework?
- Yes — coding agents, recursive LLM systems, evolutionary search. To be eligible for prizes, the agent system and full evolutionary trace must be shared before prize evaluation.
- Is the data really released?
- Public training data + validation targets are released for method development. Hidden test ground truth (expression, cell-type composition, spatial organisation, perturbation outcomes) is withheld until competition close.
- Will the test data overlap with public atlases?
- Hidden splits are defined by developmental stage, perturbation condition, or combinations thereof — distinct from public splits. Models must learn generalisable developmental dynamics, not memorise.
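The sketch below shows one way a held-out split can be defined by stage, genotype, or a (stage, genotype) combination. It guarantees that no held-out condition appears in training, which is why memorisation does not help. The `Embryo` record, the `split_held_out` function, and the genotype labels are hypothetical. The actual hidden splits are not disclosed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Embryo:
    embryo_id: str
    stage: str     # e.g. "E8.75"
    genotype: str  # "WT" or a knock-out label

def split_held_out(embryos, held_out_stages=(), held_out_genotypes=(), held_out_pairs=()):
    """Hypothetical split: an embryo is held out if its stage, its genotype,
    or its (stage, genotype) combination is on a held-out list."""
    train, test = [], []
    for e in embryos:
        hidden = (e.stage in held_out_stages
                  or e.genotype in held_out_genotypes
                  or (e.stage, e.genotype) in held_out_pairs)
        (test if hidden else train).append(e)
    return train, test

embryos = [
    Embryo("e1", "E8.25", "WT"), Embryo("e2", "E8.75", "WT"),
    Embryo("e3", "E8.75", "Gata4-KO"), Embryo("e4", "E9.5", "WT"),
    Embryo("e5", "E9.5", "Mab21l2-KO"), Embryo("e6", "E8.75", "Ctnnb1-KO"),
]
train, test = split_held_out(embryos, held_out_stages={"E9.5"},
                             held_out_pairs={("E8.75", "Gata4-KO")})
print([e.embryo_id for e in train], [e.embryo_id for e in test])
```

Holding out E9.5 entirely removes a future stage, as in forecasting. Holding out the Gata4 knock-out at E8.75 leaves a perturbation unseen, as in the knock-out task.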
Hosted by the Qiu Lab at Stanford University, in collaboration with researchers at Harvard, UC San Diego, MBZUAI, CMU, and GenBio. The full organising committee and contributor roles will be listed alongside the starter-kit release. The challenge is supported by the Laude Institute Moonshots Seed Grant.
Reach the organisers at the address below for questions about the competition, data, prize logistics, or partnerships. Public submission portal, GitHub repository, and discussion forum go live ~2 weeks before P1.
This page summarises the NeurIPS 2026 competition proposal currently under review. Dates, datasets, prize amounts, and exact metric formulations are subject to change between proposal acceptance and launch.