Temporal gene-expression distribution prediction
Predict the gene-expression distribution of an entire embryo at developmental stages the model has never seen, from earlier single-cell multi-omics observations. This is the time-extrapolation core of the challenge — models must understand developmental trajectories, not just interpolate between adjacent stages.
Single-cell multi-omics (Multiome RNA + ATAC) profiled across staged whole embryos from E6.75 to E12.5. Each cell contributes gene expression, chromatin-accessibility peaks, and developmental-stage metadata. The task uses the RNA modality as primary; ATAC is available as auxiliary signal.
What the model gets, what it must predict
Full single-cell multi-omics atlases for these three stages, including cell-type labels and lineage trajectories.
Released during P2 (development phase). Participants can submit predictions for this stage and receive sub-scores on the public leaderboard as diagnostics.
Hidden throughout the competition. Predictions for E12.5 are scored only at the final phase; ground truth is not released until the competition closes.
A model receives the public training stages as input — per-cell RNA matrices, ATAC peak matrices, paired stage labels, and any cell-type / lineage annotations released with the starter kit. At inference time, the model is given only the target stage label and must emit a predicted per-cell expression matrix (or distribution) for that stage.
A prediction file containing the predicted single-cell expression matrix for the held-out stage: genes × cells with a fixed gene ordering, cell counts matching the released schema, and the project-standard cell-type label vocabulary.
Metrics & formulas
Final ranking is a composite over these sub-scores; sub-scores are also reported individually so participants can diagnose where their model is strong and weak. Hidden labels are never released; all evaluation runs on the organisers' platform.
x and y are the mean-aggregated (pseudobulk) expression vectors over the predicted and the observed cells of the target stage (or of a cell-type subset). The score is reported globally and per cell type, with bootstrap confidence intervals across embryos.
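As an illustration of the pseudobulk correlation described here, the following minimal sketch mean-aggregates two small matrices and computes Pearson's r between the resulting vectors. The function names, the cells × genes list-of-lists layout, and the toy values are assumptions made for illustration; this is not the organisers' scoring code, and the bootstrap over embryos is omitted.

```python
import math

def pseudobulk(cells):
    """Mean expression per gene over a list of cells (each cell is a list of gene values)."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]

def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: 3 cells x 4 genes (hypothetical values).
observed = [[1.0, 0.0, 2.0, 4.0],
            [3.0, 0.0, 2.0, 6.0],
            [2.0, 3.0, 2.0, 5.0]]
predicted = [[1.5, 0.5, 2.0, 4.5],
             [2.5, 0.5, 2.5, 5.5],
             [2.0, 2.0, 1.5, 5.0]]

x = pseudobulk(predicted)  # predicted pseudobulk vector
y = pseudobulk(observed)   # observed pseudobulk vector
r = pearson_r(x, y)
print(round(r, 3))
```

In this toy case the individual cells differ but the per-gene means coincide, so the correlation is perfect. A mean-level score of this kind cannot see cell-level differences, which is why the distributional metrics are reported separately.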
Catches absolute-magnitude mismatch, to which Pearson is invariant: a prediction that is wrong by a constant scale factor can still score high on r. G is the number of genes.
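The formula for this metric is not reproduced in this section, so the sketch below assumes a root-mean-square error over the G genes of the pseudobulk vectors. That choice, the function name `rmse`, and the toy values are assumptions. The sketch shows how a prediction with the right pattern but the wrong scale still incurs a non-zero error.

```python
import math

def rmse(x, y):
    """Root-mean-square error over G genes: sqrt((1/G) * sum_g (x_g - y_g)^2)."""
    G = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / G)

observed_mean = [2.0, 1.0, 2.0, 5.0]            # hypothetical pseudobulk vector, G = 4
scaled_pred = [2.0 * v for v in observed_mean]  # right pattern, doubled magnitude

error = rmse(scaled_pred, observed_mean)
print(error)
```

Because `scaled_pred` is a positive multiple of `observed_mean`, its Pearson correlation with the observation is 1, yet the error above is clearly non-zero.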
Compares the full predicted single-cell distribution to the observed distribution, not just the mean. The FID-like term uses a frozen expression embedding. Penalises mode collapse.
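The pitfalls listed later in this section name MMD and sliced Wasserstein as distributional scores. As a rough sketch of the second, the code below approximates a sliced Wasserstein-1 distance by projecting cells onto random directions and comparing the sorted projections. The function name, the number of projections, the equal-cell-count restriction, and the toy data are assumptions; the official metric may differ, for example by operating in the frozen embedding space.

```python
import math
import random

def sliced_wasserstein(pred, obs, n_proj=64, seed=0):
    """Approximate sliced Wasserstein-1 distance between two equal-sized sets of cells.

    Each cell is a list of gene values. Cells are projected onto random unit
    directions; in one dimension the Wasserstein-1 distance between equal-sized
    samples is the mean absolute difference of the sorted projections.
    """
    if len(pred) != len(obs):
        raise ValueError("this sketch assumes equal cell counts")
    rng = random.Random(seed)
    dim = len(pred[0])
    total = 0.0
    for _ in range(n_proj):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]  # random unit direction
        p = sorted(sum(a * b for a, b in zip(cell, d)) for cell in pred)
        o = sorted(sum(a * b for a, b in zip(cell, d)) for cell in obs)
        total += sum(abs(a - b) for a, b in zip(p, o)) / len(p)
    return total / n_proj

# Hypothetical observed stage: two well-separated populations, 2 genes.
observed = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
# Mode-collapsed prediction: every cell equals the observed mean.
mean_cell = [sum(col) / len(observed) for col in zip(*observed)]
collapsed = [list(mean_cell) for _ in observed]

d_self = sliced_wasserstein(observed, observed)
d_collapsed = sliced_wasserstein(collapsed, observed)
print(d_self, d_collapsed)
```

The collapsed prediction has exactly the observed mean, so a mean-level metric would not penalise it, while its distance to the observed distribution is strictly positive.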
Jaccard index on differentially expressed gene sets between the target stage and the preceding observed stage. Checks whether the predicted stage shows the right developmental change vs the prior observed stage.
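A minimal sketch of this comparison follows. The Jaccard index is standard; the way the differentially expressed sets are built here (an absolute change in mean log-normalised expression above a threshold) is a simplified stand-in for a proper differential-expression test and is an assumption, as are the gene names and values.

```python
def de_genes(stage_mean, prev_mean, genes, threshold=1.0):
    """Genes whose mean log-normalised expression changes by more than `threshold`.

    Simplified stand-in for a real differential-expression test (assumption).
    """
    return {g for g, a, b in zip(genes, stage_mean, prev_mean) if abs(a - b) > threshold}

def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

genes = ["g1", "g2", "g3", "g4", "g5"]       # hypothetical gene names
prev_stage = [2.0, 2.0, 2.0, 2.0, 2.0]       # preceding observed stage (means)
observed_target = [4.0, 2.1, 0.5, 3.5, 2.0]  # observed target stage (means)
predicted_target = [3.8, 2.0, 1.5, 2.2, 3.5] # predicted target stage (means)

de_observed = de_genes(observed_target, prev_stage, genes)
de_predicted = de_genes(predicted_target, prev_stage, genes)
score = jaccard(de_predicted, de_observed)
print(sorted(de_observed), sorted(de_predicted), score)
```

Both sets are computed against the same preceding stage, so the score rewards predictions that change in the same genes as the real embryo, regardless of how good the absolute expression levels are.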
File contract
Submissions are validated against this schema before scoring; mismatches result in a validation error rather than a low score. The starter kit ships a self-check script.
- Format: AnnData .h5ad with X = predicted RNA expression
- Cells: Match released cell count for the target stage
- Genes: Project-standard gene order (provided in starter kit)
- Layers: X = predicted log-normalised; optional `raw` layer for counts
- Obs columns: `cell_type` (required), `lineage` (optional)
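The checks implied by this contract could look roughly like the sketch below. It is not the starter-kit self-check script: the function `check_submission`, its arguments, and the placeholder gene names are hypothetical, and it works on plain Python values so that it runs without extra packages. With a real AnnData object, the inputs would presumably come from `adata.shape`, `adata.var_names`, and `adata.obs.columns`; note that AnnData stores X as cells × genes.

```python
def check_submission(shape, var_names, obs_columns, expected_n_cells, expected_genes):
    """Return a list of schema problems; an empty list means all checks passed.

    shape:        (n_cells, n_genes) of X, following AnnData's cells x genes layout
    var_names:    gene identifiers in submission order
    obs_columns:  names of the per-cell annotation columns
    """
    problems = []
    n_cells, n_genes = shape
    if n_cells != expected_n_cells:
        problems.append(f"expected {expected_n_cells} cells, found {n_cells}")
    if n_genes != len(var_names):
        problems.append("X has a different number of columns than gene names")
    if list(var_names) != list(expected_genes):
        problems.append("gene order does not match the project-standard order")
    if "cell_type" not in obs_columns:
        problems.append("required obs column 'cell_type' is missing")
    return problems

expected_genes = ["GeneA", "GeneB", "GeneC"]  # placeholder for the released gene order

ok = check_submission((5, 3), ["GeneA", "GeneB", "GeneC"],
                      ["cell_type", "lineage"], 5, expected_genes)
bad = check_submission((4, 3), ["GeneB", "GeneA", "GeneC"],
                       ["lineage"], 5, expected_genes)
print(ok)
print(bad)
```

Returning every problem at once, rather than stopping at the first, mirrors the idea that a schema mismatch is a validation error to be fixed before scoring.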
Starting points the starter kit will provide
- ▸Nearest-stage carry-forward: predict E12.5 = E9.5 distribution (sanity floor).
- ▸Linear extrapolation per gene across the three observed stages (no cell structure).
- ▸Conditional VAE / flow trained on observed stages with stage as condition.
- ▸Diffusion model with developmental-time conditioning and cell-type-aware decoder.
- ▸Foundation-model fine-tune: e.g. Geneformer / scGPT-style backbone with stage prompt.
- !E10.5 → E12.5 spans cardiac looping and major organ-system emergence — cell-type composition shifts dramatically, with whole new lineages appearing.
- !Models must produce a full distribution, not a point estimate; collapsing to a mean expression vector forfeits the distributional sub-scores.
- !No interpolation safety net: there is no observed stage between E9.5 and E12.5, so naive temporal interpolation fails.
- !Cell-type proportions in the target stage are not given as input — the model must infer composition as well as gene-level expression.
- ×Predicting a tightly clustered distribution (mode collapse): this can score well on pseudobulk Pearson but scores poorly on MMD / sliced Wasserstein.
- ×Forgetting to include emerging cell types absent from training stages.
- ×Re-using training-set cell counts verbatim instead of matching the target-stage schema.
- ×Using ATAC as primary signal without aligning peak coordinates to the released schema.
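The per-gene linear-extrapolation baseline listed among the starting points can be sketched as follows. This is an illustration, not the starter-kit implementation. The time points and mean-expression values are placeholders, because this section does not list the three observed stages. Clipping negative predictions to zero is an added assumption, on the grounds that log-normalised expression is non-negative.

```python
def fit_line(ts, ys):
    """Ordinary least-squares slope and intercept for points (t, y)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return slope, my - slope * mt

def extrapolate(stage_times, stage_means, target_time):
    """Per-gene linear extrapolation of mean expression, clipped at zero."""
    predicted = []
    for ys in zip(*stage_means):  # one tuple of per-stage means for each gene
        slope, intercept = fit_line(stage_times, ys)
        predicted.append(max(0.0, slope * target_time + intercept))
    return predicted

stage_times = [1.0, 2.0, 3.0]     # placeholder time points for three observed stages
stage_means = [[1.0, 4.0, 2.0],   # rows: stages, columns: genes (toy values)
               [2.0, 3.0, 2.0],
               [3.0, 2.0, 2.0]]

pred = extrapolate(stage_times, stage_means, target_time=6.0)
print(pred)
```

The result is a single mean vector with no cell structure, so on its own it would forfeit the distributional sub-scores. It is useful mainly as a reference point for the pseudobulk metrics.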
Detailed task documents, the starter-kit repo, the submission portal, and the discussion forum land alongside the P1 launch (2026-07-30). Until then, reach the organisers at [email protected].