T3

Mutant perturbation prediction

Predict how a mutant embryo develops differently from wild type — both gene expression and 3D spatial organization — under genetic knock-outs the model has never seen. Train on one perturbation, validate on a second, test on a third with a different collection time. This is the causal-generalisation core of the challenge.

The data are 3D MERFISH measurements from genetically perturbed embryos plus matched wild-type controls. There are three knock-outs across two developmental time points: β-catenin (E8.75), Gata4 (E8.75), and Mab21l2 (E9.5). All three are cardiac developmental regulators with characterised loss-of-function phenotypes.

What the model gets, what it must predict

Train: β-catenin KO at E8.75 + wild-type controls

One observed perturbation. Wild-type MERFISH from the full window is also available as background.

Val: Gata4 KO at E8.75

Same developmental time as training perturbation but a different gene. Tests within-stage perturbation generalisation.

Test: Mab21l2 KO at E9.5

Different gene AND different developmental stage from training. Tests full perturbation × time generalisation. Ground truth hidden until competition close.

Input: wild-type MERFISH from the full developmental window, plus the training mutant (β-catenin at E8.75) paired with its wild-type control. For a held-out mutant, the model receives only the gene identity and the matched-stage wild-type data, and must predict the mutant spatial snapshot.

Output: a 4D prediction of the mutant embryo, consisting of per-cell (x, y, z) coordinates, predicted expression, and predicted cell-type composition. Same format as Task 2.

Metrics & formulas

Final ranking is a composite over these sub-scores; sub-scores are also reported individually so participants can diagnose where their model is strong and weak. Hidden labels are never released; all evaluation runs on the organisers' platform.

Per-perturbation pseudobulk Pearson

Restricted to the cell types affected by the perturbation. Reported globally and per affected cell type. Per-perturbation reporting is the headline scoreboard for this task.
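
The page does not give the exact formula, so the following is only a minimal sketch of one common reading: pseudobulk is taken to be the per-gene mean over cells, and Pearson correlation is computed across genes between predicted and observed pseudobulk profiles. The function names, the cell-type labels, and the choice of raw mean expression (rather than, say, log-normalised counts or mutant-minus-wild-type deltas) are assumptions made for illustration.

```python
import numpy as np

def pseudobulk(expr, labels, cell_types):
    """Average expression per gene over all cells whose label is in cell_types."""
    mask = np.isin(labels, cell_types)
    if not mask.any():
        raise ValueError(f"no cells found for {cell_types}")
    return expr[mask].mean(axis=0)

def pseudobulk_pearson(pred_expr, pred_labels, obs_expr, obs_labels, affected_types):
    """Return (global_r, {cell_type: r}), restricted to the affected cell types.

    expr arrays are (n_cells, n_genes); predicted and observed embryos may
    contain different numbers of cells because only the means are compared.
    """
    per_type = {}
    for ct in affected_types:
        p = pseudobulk(pred_expr, pred_labels, [ct])
        o = pseudobulk(obs_expr, obs_labels, [ct])
        per_type[ct] = float(np.corrcoef(p, o)[0, 1])
    # "Global" here pools every cell that belongs to any affected type.
    p_all = pseudobulk(pred_expr, pred_labels, affected_types)
    o_all = pseudobulk(obs_expr, obs_labels, affected_types)
    return float(np.corrcoef(p_all, o_all)[0, 1]), per_type

# Toy data with hypothetical cell-type names.
rng = np.random.default_rng(0)
obs_expr = rng.random((6, 4))
labels = np.array(["cardiomyocyte"] * 3 + ["endocardium"] * 3)
pred_expr = obs_expr + 0.05 * rng.standard_normal(obs_expr.shape)
global_r, per_type = pseudobulk_pearson(
    pred_expr, labels, obs_expr, labels, ["cardiomyocyte", "endocardium"]
)
print(global_r, per_type)
```

A pseudobulk profile that is constant across genes yields an undefined correlation (NaN), which a real scorer would need to handle explicitly.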

Cell-type composition shift (Jensen-Shannon divergence)

Base-2 JSD between predicted and observed mutant cell-type proportions. Uses the same frozen probe classifier as Task 2; reported at global and regional levels.
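
As an illustration of the quantity described above, here is a small sketch of base-2 Jensen-Shannon divergence between two cell-type proportion vectors. With base-2 logarithms the value lies in [0, 1]: 0 for identical compositions, 1 for compositions with no shared cell types. The helper names and the cell-type labels are hypothetical, and the frozen probe classifier that produces the labels is not modelled here.

```python
import numpy as np

def proportions(labels, cell_types):
    """Fraction of cells assigned to each entry of cell_types, in that order."""
    labels = np.asarray(labels)
    counts = np.array([(labels == ct).sum() for ct in cell_types], dtype=float)
    return counts / counts.sum()

def jsd_base2(p, q):
    """Jensen-Shannon divergence (not its square root) with base-2 logs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        nz = a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical labels for a predicted and an observed mutant embryo.
cell_types = ["cardiomyocyte", "endocardium", "mesoderm"]
pred = ["cardiomyocyte"] * 5 + ["endocardium"] * 3 + ["mesoderm"] * 2
obs = ["cardiomyocyte"] * 2 + ["endocardium"] * 3 + ["mesoderm"] * 5
print(jsd_base2(proportions(pred, cell_types), proportions(obs, cell_types)))
```

Note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, which is the square root of this divergence, so the two are not interchangeable.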

Fused Gromov-Wasserstein

Same definition as Task 2, applied to the mutant snapshot. Penalises predictions that miss the spatial pattern of the perturbation (e.g. heart-field shape change).
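
The Task 2 definition is not reproduced on this page, so the sketch below only shows the general shape of a fused Gromov-Wasserstein objective: a feature cost between matched cells plus a penalty on mismatched within-embryo distances. It evaluates the objective for a given coupling `T` and does not search for the optimal coupling; in practice a solver (for example the one in the POT library) would do that. The weight `alpha`, the squared-difference structure loss, and the function names are assumptions.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance matrix for points x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fgw_objective(M, C1, C2, T, alpha=0.5):
    """Fused GW cost of coupling T.

    M  : (n, m) feature cost between predicted and observed cells
    C1 : (n, n) within-embryo distances, predicted
    C2 : (m, m) within-embryo distances, observed
    T  : (n, m) coupling; its entries sum to 1
    alpha weights the structure term, (1 - alpha) the feature term.
    """
    feature_term = np.sum(M * T)
    # sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l], expanded so that
    # no 4-index tensor is ever materialised.
    p = T.sum(axis=1)
    q = T.sum(axis=0)
    structure_term = (
        p @ (C1 ** 2) @ p
        + q @ (C2 ** 2) @ q
        - 2.0 * np.sum((C1 @ T @ C2.T) * T)
    )
    return (1.0 - alpha) * feature_term + alpha * structure_term

# Three cells on a line; the "prediction" is identical to the observation.
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
C = pairwise_dist(xyz)
M = np.zeros((3, 3))                      # identical expression everywhere
T_identity = np.eye(3) / 3                # each cell matched to itself
T_swapped = np.eye(3)[[1, 0, 2]] / 3      # first two cells swapped
print(fgw_objective(M, C, C, T_identity), fgw_objective(M, C, C, T_swapped))
```

With zero feature cost, the swapped coupling is penalised purely because it distorts pairwise distances, which is the sense in which a prediction with the right expression but the wrong spatial pattern scores poorly.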

WT-vs-mutant change consistency

Rewards models that get the direction of the perturbation right, even when the absolute magnitudes are off.
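
The page does not state how this score is computed. One plausible reading, sketched here purely as an assumption, is the fraction of genes whose predicted mutant-minus-wild-type change has the same sign as the observed change, ignoring genes that barely move in the observed data. The function name and the `min_abs_delta` threshold are hypothetical.

```python
import numpy as np

def direction_consistency(wt_mean, pred_mut_mean, obs_mut_mean, min_abs_delta=0.0):
    """Fraction of genes where predicted and observed deltas share a sign.

    All inputs are per-gene mean expression vectors. Genes whose observed
    |mutant - wild type| is not above min_abs_delta are excluded.
    """
    wt_mean = np.asarray(wt_mean, dtype=float)
    pred_delta = np.asarray(pred_mut_mean, dtype=float) - wt_mean
    obs_delta = np.asarray(obs_mut_mean, dtype=float) - wt_mean
    keep = np.abs(obs_delta) > min_abs_delta
    if not keep.any():
        return float("nan")
    agree = np.sign(pred_delta[keep]) == np.sign(obs_delta[keep])
    return float(agree.mean())

wt = [1.0, 1.0, 1.0, 1.0]
obs_mut = [2.0, 0.0, 1.5, 1.0]    # up, down, up, unchanged
pred_mut = [1.1, 0.9, 0.5, 3.0]   # up, down, down, up
# The unchanged gene is dropped; two of the remaining three directions agree.
print(direction_consistency(wt, pred_mut, obs_mut))
```

The first two predictions have the right direction with much too small a magnitude and still count as correct, which matches the stated intent of rewarding direction over absolute size.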

File contract

Submissions are validated against this schema before scoring; mismatches result in a validation error rather than a low score. The starter kit ships a self-check script.

Format: AnnData .h5ad, same schema as the Task 2 spatial submission
Perturbation: .uns["perturbation"] = "Mab21l2_KO" (test set)
Stage: .uns["stage"] = "E9.5" (test set)
Cells: match the released schema for the held-out mutant condition
Coordinates: µm, embryo-centred, axes documented
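
Until the official self-check script is available, the contract above can be approximated with a small local check. This is a sketch, not the organisers' validator: it covers only the two `.uns` entries quoted above, and it assumes coordinates live in `.obsm["spatial"]` as an (n_cells, 3) array, which is a common convention but is not stated on this page. The check is duck-typed, so a stand-in object is used here in place of a real AnnData.

```python
from types import SimpleNamespace

import numpy as np

# Values quoted in the file contract for the test set.
REQUIRED_UNS = {"perturbation": "Mab21l2_KO", "stage": "E9.5"}

def check_submission(adata, coord_key="spatial"):
    """Return a list of human-readable problems; an empty list means none found.

    coord_key is an assumed location for the per-cell (x, y, z) coordinates.
    """
    errors = []
    for key, expected in REQUIRED_UNS.items():
        if key not in adata.uns:
            errors.append(f'missing .uns["{key}"]')
        elif adata.uns[key] != expected:
            errors.append(f'.uns["{key}"] is {adata.uns[key]!r}, expected {expected!r}')
    if coord_key not in adata.obsm:
        errors.append(f'missing .obsm["{coord_key}"]')
    else:
        coords = np.asarray(adata.obsm[coord_key])
        if coords.ndim != 2 or coords.shape[1] != 3:
            errors.append(f"coordinates have shape {coords.shape}, expected (n_cells, 3)")
        elif not np.isfinite(coords).all():
            errors.append("coordinates contain NaN or infinite values")
    return errors

# Stand-in with the same attribute names as AnnData. With a real file one
# would load it first, e.g. adata = anndata.read_h5ad("submission.h5ad").
good = SimpleNamespace(
    uns={"perturbation": "Mab21l2_KO", "stage": "E9.5"},
    obsm={"spatial": np.zeros((5, 3))},
)
wrong_stage = SimpleNamespace(
    uns={"perturbation": "Mab21l2_KO", "stage": "E8.75"},
    obsm={"spatial": np.zeros((5, 3))},
)
print(check_submission(good))
print(check_submission(wrong_stage))
```

Returning every problem at once, rather than stopping at the first, mirrors the stated behaviour that schema mismatches produce a validation error before any scoring happens.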

Starting points the starter kit will provide

  • WT-copy baseline: predict mutant = matched-stage wild type (sanity floor).
  • Linear perturbation delta: learned (mut − wt) shift on β-catenin transferred to test conditions.
  • Causal perturbation autoencoder (CPA-style) with gene-embedding conditioning.
  • Diffusion model on (position, expression) with perturbation-token conditioning.
  • LLM-prompted retrieval: condition on perturbation gene function and known phenotype from literature.

Known challenges

  • Only one observed perturbation in training: models must separate the wild-type developmental trajectory from the genetic-perturbation effect with very little supervision.
  • The test perturbation is at a different stage (E9.5 vs E8.75 in training), so any time-confounded perturbation representation will overfit.
  • The three KO genes are different cardiac developmental regulators with non-overlapping affected cell types, so a model that just averages perturbation effects will be wrong everywhere.
  • Spatial perturbation phenotypes are not always cell-autonomous: a KO can change where cells are without dramatically changing their expression, and FGW exposes that.

Common pitfalls

  • Treating β-catenin as a "generic perturbation" and copying its delta to other genes. This will fail badly on FGW for Mab21l2.
  • Ignoring the E9.5 stage shift in the test set, i.e. predicting an E8.75-shaped embryo at E9.5.
  • Hand-engineering a gene-function feature from public databases without crediting the source. Reproducibility checks will catch this.
  • Reporting WT-vs-mutant deltas globally without controlling for embryo-to-embryo variability.
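
The first two starting points listed above (WT-copy and linear perturbation delta) are simple enough to sketch on expression alone. This is an illustration rather than the starter-kit code: it uses a single per-gene mean shift, ignores cell type and spatial position, and clips at zero on the assumption that expression values are non-negative. All function names are hypothetical.

```python
import numpy as np

def wt_copy_baseline(wt_expr):
    """Sanity floor: predict that the mutant looks exactly like wild type."""
    return np.array(wt_expr, dtype=float, copy=True)

def fit_mean_delta(train_mut_expr, train_wt_expr):
    """Per-gene (mut - wt) shift learned from the training perturbation."""
    return np.mean(train_mut_expr, axis=0) - np.mean(train_wt_expr, axis=0)

def apply_delta(wt_expr, delta):
    """Add the learned shift to matched-stage wild-type cells, clipped at 0."""
    return np.clip(np.asarray(wt_expr, dtype=float) + delta, 0.0, None)

# Toy matrices of shape (n_cells, n_genes).
train_wt = np.array([[1.0, 2.0], [3.0, 2.0]])
train_mut = np.array([[0.0, 3.0], [0.0, 5.0]])
delta = fit_mean_delta(train_mut, train_wt)      # per-gene shift

test_wt = np.array([[1.0, 1.0], [4.0, 0.0]])
print(wt_copy_baseline(test_wt))
print(apply_delta(test_wt, delta))
```

Because the shift is learned on β-catenin alone, applying it unchanged to another gene is exactly the "generic perturbation" shortcut this page warns against; it is useful as a reference point, not as a solution.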

Detailed task documents, the starter-kit repo, the submission portal, and the discussion forum land alongside the P1 launch (2026-07-30). Until then, reach the organisers at [email protected].