Convert AnnData (.h5ad) to GRAVITY CSV
GRAVITY expects a cellDancer-style long-format CSV containing
spliced/unspliced counts, embeddings, and optional cluster labels. Use
gravity.export_intermediate_from_h5ad to produce this CSV once per dataset.
from gravity import export_intermediate_from_h5ad

export_intermediate_from_h5ad(
    input_h5ad="data/postprocessed.h5ad",
    output_csv="data/PancreaticEndocrinogenesis_cell_type_u_s.csv",
    retain_genes=["GCG", "INS2"],
    n_top_genes=1000,
    embed_key="X_umap",
    celltype_key="celltype",
    overwrite=True,
)
The helper performs three steps:
1. Reads the AnnData file and checks that the spliced and unspliced layers exist.
2. Runs the preprocessing steps needed to populate the spliced/unspliced count table.
3. Exports a cellDancer-style CSV with embedding coordinates plus optional cluster labels.
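The layer check in the first step can be approximated before calling the helper. The sketch below is not GRAVITY's internal code. `missing_layers` is a hypothetical function that works on any object with a mapping-like `.layers` attribute, which is how AnnData exposes its layers. A stand-in object is used so the example runs without a data file.

```python
from types import SimpleNamespace

# Layer names the export step needs, per the description above.
REQUIRED_LAYERS = ("spliced", "unspliced")


def missing_layers(adata, required=REQUIRED_LAYERS):
    """Return the required layer names that are absent from adata.layers."""
    return [name for name in required if name not in adata.layers]


# Stand-in for an AnnData object (illustration only). With real data,
# load the file first, e.g. adata = anndata.read_h5ad("data/postprocessed.h5ad").
incomplete = SimpleNamespace(layers={"spliced": None})
print(missing_layers(incomplete))  # ['unspliced']
```

An empty list means both layers are present and the export can proceed.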
Keep large generated CSV files outside git and document their expected paths so subsequent pipeline runs can reuse them without recomputing AnnData steps.
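One way to reuse an existing CSV is to guard the export with a path check. The sketch below is a minimal illustration. `export_once` is a hypothetical wrapper, not part of GRAVITY, and it takes the export call as a zero-argument callable so it makes no assumptions about the helper's behavior.

```python
from pathlib import Path


def export_once(output_csv, export_fn):
    """Run export_fn() only if output_csv is missing; return True if it ran."""
    path = Path(output_csv)
    if path.exists():
        return False  # reuse the existing CSV, skip the AnnData steps
    path.parent.mkdir(parents=True, exist_ok=True)
    export_fn()
    return True


# Intended use (commented out because it needs GRAVITY and the input file):
# export_once(
#     "data/PancreaticEndocrinogenesis_cell_type_u_s.csv",
#     lambda: export_intermediate_from_h5ad(
#         input_h5ad="data/postprocessed.h5ad",
#         output_csv="data/PancreaticEndocrinogenesis_cell_type_u_s.csv",
#     ),
# )
```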
Gene Order
By default, GRAVITY preserves the gene order found in the exported CSV. This is
fine for training a new model, but pretrained checkpoints require the same gene
index order used during their original run. If you plan to reuse a checkpoint,
keep the checkpoint’s genes.txt file and pass it later as gene_order_path
when running the pipeline.
For the provided pancreas reference checkpoints, use:
gene_order_path = "data/pancreas/reference_checkpoints/pancreas_genes.txt"
The gene set alone is not sufficient for checkpoint reproduction; the order also matters because model weights and attention matrices are indexed by gene position.
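A quick check makes the difference between matching gene sets and matching gene order concrete. This is a sketch under stated assumptions. `read_gene_order` and `first_mismatch` are hypothetical helpers, and the gene file is assumed to hold one gene name per line, which the source does not confirm.

```python
def read_gene_order(path):
    """Read gene names from a text file, assuming one name per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def first_mismatch(reference_genes, candidate_genes):
    """Return the first index where the two orders differ, or None if identical."""
    for i, (ref, cand) in enumerate(zip(reference_genes, candidate_genes)):
        if ref != cand:
            return i
    if len(reference_genes) != len(candidate_genes):
        # One list is a strict prefix of the other.
        return min(len(reference_genes), len(candidate_genes))
    return None


reference = ["GCG", "INS2", "SST"]  # illustrative order, not a real checkpoint
shuffled = ["INS2", "GCG", "SST"]   # same genes, different positions

print(set(reference) == set(shuffled))      # True: the gene sets match
print(first_mismatch(reference, shuffled))  # 0: the orders diverge at index 0
```

A result of `None` from `first_mismatch` indicates that the candidate order matches the reference position by position.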