Mutant perturbation prediction
Predict how a mutant embryo develops differently from wild type — both gene expression and 3D spatial organization — under genetic knock-outs the model has never seen. Train on one perturbation, validate on a second, test on a third with a different collection time. This is the causal-generalisation core of the challenge.
3D MERFISH from genetically perturbed embryos plus matched wild-type controls. Three knock-outs across two developmental time points: β-catenin (E8.75), Gata4 (E8.75), Mab21l2 (E9.5). All three are cardiac developmental regulators with characterised loss-of-function phenotypes.
What the model gets, what it must predict
Train (β-catenin KO, E8.75): one observed perturbation. Wild-type MERFISH from the full developmental window is also available as background.
Validation (Gata4 KO, E8.75): same developmental time as the training perturbation but a different gene. Tests within-stage perturbation generalisation.
Test (Mab21l2 KO, E9.5): different gene AND different developmental stage from training. Tests full perturbation × time generalisation. Ground truth is hidden until competition close.
Input: wild-type MERFISH from the full developmental window, plus the training mutant (β-catenin at E8.75) paired with its wild-type control. The model must predict the held-out mutant spatial snapshot given only the knocked-out gene's identity and the matched-stage wild-type data.
Output: a 4D prediction of the mutant embryo (per-cell (x, y, z) coordinates, predicted expression and predicted cell-type composition), in the same format as Task 2.
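The split design can be written down as a small configuration that makes the two generalisation axes explicit: validation changes only the knocked-out gene, while test changes both the gene and the stage. This is an illustrative sketch; the `Split` class, `by_role` and the field names are hypothetical and not the starter kit's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    role: str      # "train", "validation" or "test"
    knockout: str  # knocked-out gene
    stage: str     # developmental stage at collection


# Illustrative layout of the task splits (hypothetical names, not the starter kit's API).
SPLITS = (
    Split("train", "β-catenin", "E8.75"),
    Split("validation", "Gata4", "E8.75"),
    Split("test", "Mab21l2", "E9.5"),
)


def by_role(role: str) -> Split:
    """Return the split with the given role."""
    return next(s for s in SPLITS if s.role == role)


train_split, val_split, test_split = by_role("train"), by_role("validation"), by_role("test")

# Validation changes only the gene; test changes both the gene and the stage.
assert val_split.stage == train_split.stage and val_split.knockout != train_split.knockout
assert test_split.stage != train_split.stage
assert test_split.knockout not in {train_split.knockout, val_split.knockout}
```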
Metrics & formulas
Final ranking is a composite over these sub-scores; sub-scores are also reported individually so participants can diagnose where their model is strong and weak. Hidden labels are never released; all evaluation runs on the organisers' platform.
Restricted to the cell types affected by the perturbation. Reported globally and per affected cell type. Per-perturbation reporting is the headline scoreboard for this task.
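One way to implement the restriction to affected cell types, with both a pooled and a per-type score, is sketched below. This section does not name the underlying metric, so the sketch assumes it compares pseudobulk (per-cell-type mean) expression and uses mean squared error purely as a placeholder. The function name, the unweighted pooling across types and the example cell-type labels are all hypothetical.

```python
import numpy as np


def affected_type_errors(pred_means, obs_means, affected):
    """Placeholder error restricted to affected cell types (hypothetical metric).

    pred_means / obs_means: dict mapping cell type -> 1-D array of mean
    expression per gene (pseudobulk). affected: cell types to score.
    Cell types outside `affected` are ignored entirely.
    Returns (pooled score, {cell type: score}); pooling is an unweighted mean.
    """
    per_type = {}
    for ct in affected:
        diff = np.asarray(pred_means[ct], dtype=float) - np.asarray(obs_means[ct], dtype=float)
        per_type[ct] = float(np.mean(diff ** 2))  # mean squared error as a stand-in
    pooled = float(np.mean(list(per_type.values())))
    return pooled, per_type
```

For example, with `affected=["cardiomyocyte"]`, a large error on an unaffected type such as `"endoderm"` does not change the score.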
Base-2 Jensen–Shannon divergence (JSD) between predicted and observed mutant cell-type proportions. Uses the same frozen probe classifier as Task 2; reported at global and regional levels.
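Base-2 JSD between two proportion vectors can be computed as below. Using log base 2 bounds the score to [0, 1], with 0 for identical compositions. The sketch assumes cell-type labels have already been assigned to predicted and observed cells by the frozen probe classifier; `proportions` and `jsd_base2` are hypothetical helper names.

```python
import numpy as np


def proportions(labels, cell_types):
    """Fraction of cells assigned to each cell type, in the order given by `cell_types`."""
    labels = np.asarray(labels)
    counts = np.array([np.sum(labels == ct) for ct in cell_types], dtype=float)
    return counts / counts.sum()


def _kl_base2(a, b):
    """KL(a || b) in bits; entries where a == 0 contribute nothing."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))


def jsd_base2(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)  # m > 0 wherever p or q is, so every log term is finite
    return 0.5 * _kl_base2(p, m) + 0.5 * _kl_base2(q, m)
```

A global score is `jsd_base2(proportions(pred_labels, types), proportions(obs_labels, types))`; regional scores follow by applying the same call to the cells within each region.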
Same fused Gromov–Wasserstein (FGW) definition as Task 2, applied to the mutant snapshot. Penalises predictions that miss the spatial pattern of the perturbation (e.g. a change in heart-field shape).
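For reference, the standard fused Gromov–Wasserstein objective is written out below, assuming Task 2 follows that formulation. Here $\hat{x}_i$ and $x_j$ are the expression profiles of predicted cell $i$ and observed cell $j$, $\hat{C}_{ik}$ and $C_{jl}$ are within-embryo spatial distances, $a$ and $b$ are cell weights (for example uniform), $\Pi(a, b)$ is the set of couplings with those marginals, and $\alpha \in [0, 1]$ trades expression against geometry. The values of $\alpha$, $q$, the expression distance $d$ and any normalisation are fixed by Task 2 and are not given in this section.

```latex
\mathrm{FGW}_{\alpha,q}(\hat{\mathcal{E}}, \mathcal{E})
  = \min_{\pi \in \Pi(a, b)}
    (1 - \alpha) \sum_{i,j} d(\hat{x}_i, x_j)^{q} \, \pi_{ij}
    + \alpha \sum_{i,j,k,l} \bigl| \hat{C}_{ik} - C_{jl} \bigr|^{q} \, \pi_{ij} \, \pi_{kl}
```

Because the second term compares distances within each embryo, it penalises a prediction that arranges cells incorrectly even when their expression is right, which is how non-cell-autonomous spatial phenotypes register in this score.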
Rewards models that get the direction of the perturbation right, even when the absolute magnitudes are off.
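The directional sub-score is not defined precisely in this section. One plausible instantiation, shown here as an assumption, is the cosine similarity between the predicted and observed mutant-minus-wild-type deltas: it is 1 when the predicted shift points the same way as the observed one regardless of its size, and -1 when it points the opposite way. The vectors could be, for example, per-gene pseudobulk means within an affected cell type; `direction_score` is a hypothetical name.

```python
import numpy as np


def direction_score(pred_mut, wt, obs_mut):
    """Cosine similarity between predicted and observed (mutant - wild type) deltas.

    Returns a value in [-1, 1]; returns 0.0 if either delta is all zeros.
    """
    wt = np.asarray(wt, dtype=float)
    d_pred = np.asarray(pred_mut, dtype=float) - wt
    d_obs = np.asarray(obs_mut, dtype=float) - wt
    denom = np.linalg.norm(d_pred) * np.linalg.norm(d_obs)
    if denom == 0.0:
        return 0.0
    return float(np.dot(d_pred, d_obs) / denom)
```

A prediction that gets the shift's direction right but only half its magnitude still scores 1.0, which matches the stated intent of rewarding direction over absolute size.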
File contract
Submissions are validated against this schema before scoring; mismatches result in a validation error rather than a low score. The starter kit ships a self-check script.
- Format: AnnData .h5ad, same schema as the Task 2 spatial submission
- Perturbation: .uns["perturbation"] = "Mab21l2_KO" (test set)
- Stage: .uns["stage"] = "E9.5" (test set)
- Cells: match the released schema for the held-out mutant condition
- Coordinates: µm, embryo-centred, axes as documented
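A local pre-check against this contract might look like the sketch below. It is not the starter kit's self-check script and does not verify the cell schema or centring. It duck-types any object exposing `.uns`, `.obsm` and `.n_obs` (as `anndata.AnnData` does), and it assumes coordinates are stored as an (n_cells, 3) array under `.obsm["spatial"]`, a common convention that this section does not confirm.

```python
from types import SimpleNamespace
from typing import List

import numpy as np

EXPECTED_UNS = {"perturbation": "Mab21l2_KO", "stage": "E9.5"}
COORD_KEY = "spatial"  # assumption: the coordinate key is not stated in this section


def check_submission(adata) -> List[str]:
    """Return a list of contract violations; an empty list means these checks passed."""
    errors = []
    for key, expected in EXPECTED_UNS.items():
        got = adata.uns.get(key)
        if got != expected:
            errors.append(f'.uns["{key}"] is {got!r}, expected {expected!r}')
    if COORD_KEY not in adata.obsm:
        errors.append(f'missing .obsm["{COORD_KEY}"]')
    else:
        xyz = np.asarray(adata.obsm[COORD_KEY], dtype=float)
        if xyz.ndim != 2 or xyz.shape[1] != 3:
            errors.append(f"coordinates must be (n_cells, 3) in µm, got shape {xyz.shape}")
        elif xyz.shape[0] != adata.n_obs:
            errors.append("coordinate rows do not match the number of cells")
        elif not np.all(np.isfinite(xyz)):
            errors.append("coordinates contain NaN or inf")
    return errors


# Usage with a stand-in object; an anndata.AnnData exposes the same attributes.
demo = SimpleNamespace(uns=dict(EXPECTED_UNS), obsm={COORD_KEY: np.zeros((5, 3))}, n_obs=5)
print(check_submission(demo))  # []
```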
Starting points the starter kit will provide
- ▸WT-copy baseline: predict mutant = matched-stage wild type (sanity floor).
- ▸Linear perturbation delta: learned (mut − wt) shift on β-catenin transferred to test conditions.
- ▸Compositional perturbation autoencoder (CPA-style) with gene-embedding conditioning.
- ▸Diffusion model on (position, expression) with perturbation-token conditioning.
- ▸LLM-prompted retrieval: condition on perturbation gene function and known phenotype from literature.
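The first two starter baselines can be sketched on a cells × genes expression matrix. This is a minimal illustration under assumptions: the function names are hypothetical, the matrices hold (for example) log-normalised expression, and predicted cells keep their wild-type (x, y, z) positions. For the Mab21l2 test set, `test_wt` should be the E9.5 wild type, not the E8.75 control.

```python
import numpy as np

# Hypothetical helpers. Inputs are cells x genes expression matrices;
# predicted cells keep the positions of the wild-type cells they were copied from.


def wt_copy_baseline(wt_expr):
    """Sanity floor: predict the mutant as an unchanged copy of the matched-stage wild type."""
    return np.array(wt_expr, dtype=float, copy=True)


def fit_linear_delta(train_mut, train_wt):
    """Mean per-gene shift (mutant minus wild type) seen for the training knock-out."""
    mut = np.asarray(train_mut, dtype=float)
    wt = np.asarray(train_wt, dtype=float)
    return mut.mean(axis=0) - wt.mean(axis=0)


def apply_linear_delta(test_wt, delta):
    """Add the learned shift to every cell of the matched-stage wild type."""
    return np.asarray(test_wt, dtype=float) + delta  # broadcasts over cells
```

The linear-delta baseline reuses the β-catenin shift for a different gene and leaves positions untouched, so it cannot capture spatial phenotypes; it is a reference point rather than a strong model.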
- !Only one observed perturbation in training: models must disentangle the wild-type developmental trajectory from the genetic-perturbation effect with very little supervision.
- !Test perturbation is at a different stage (E9.5 vs E8.75 training) so any time-confounded perturbation representation will overfit.
- !The three KOs hit different cardiac developmental regulators with non-overlapping sets of affected cell types, so a model that just averages perturbation effects will be wrong everywhere.
- !Spatial perturbation phenotypes are not always cell-autonomous: a KO can change where cells are without dramatically changing their expression, and FGW exposes that.
- ×Treating β-catenin as a "generic perturbation" and copying its delta to other genes — will fail badly on FGW for Mab21l2.
- ×Ignoring the E9.5 stage shift in the test set — predicting an E8.75-shaped embryo at E9.5.
- ×Hand-engineering a gene-function feature from public databases without crediting the source — reproducibility checks will catch this.
- ×Reporting WT-vs-mutant deltas globally without controlling for embryo-to-embryo variability.
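Controlling for embryo-to-embryo variability means treating embryos, not cells, as the replicates. A minimal sketch, assuming more than one embryo per condition is available: summarise each embryo to one number (for example the fraction of cells in one affected cell type), then report the mutant-minus-wild-type difference alongside the same difference scaled by the pooled between-embryo standard deviation (a Cohen's d-style effect size). The function name is hypothetical.

```python
import numpy as np


def embryo_level_delta(mut_by_embryo, wt_by_embryo):
    """Mutant-minus-WT difference of a per-embryo summary, raw and standardised.

    Each argument holds one value per embryo. The standardised delta divides the
    raw delta by the pooled between-embryo standard deviation (Cohen's d style).
    """
    mut = np.asarray(mut_by_embryo, dtype=float)
    wt = np.asarray(wt_by_embryo, dtype=float)
    if mut.size < 2 or wt.size < 2:
        raise ValueError("need at least two embryos per condition to estimate variability")
    raw = mut.mean() - wt.mean()
    pooled_var = ((mut.size - 1) * mut.var(ddof=1) + (wt.size - 1) * wt.var(ddof=1)) / (
        mut.size + wt.size - 2
    )
    sd = np.sqrt(pooled_var)
    standardised = raw / sd if sd > 0 else float("nan")
    return float(raw), float(standardised)
```

A raw shift that is small relative to the spread between wild-type embryos yields a small standardised delta, which guards against reading replicate noise as a perturbation effect.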
Detailed task documents, the starter-kit repo, the submission portal, and the discussion forum land alongside the P1 launch (2026-07-30). Until then, reach the organisers at [email protected].