# Convert AnnData (.h5ad) to GRAVITY CSV

GRAVITY expects a cellDancer-style long-format CSV containing spliced/unspliced counts, embeddings, and optional cluster labels. Use `gravity.export_intermediate_from_h5ad` to produce this CSV once per dataset.

```python
from gravity import export_intermediate_from_h5ad

export_intermediate_from_h5ad(
    input_h5ad="data/postprocessed.h5ad",
    output_csv="data/PancreaticEndocrinogenesis_cell_type_u_s.csv",
    retain_genes=["GCG", "INS2"],
    n_top_genes=1000,
    embed_key="X_umap",
    celltype_key="celltype",
    overwrite=True,
)
```

The helper performs:

1. Reading the AnnData file and checking that `spliced`/`unspliced` layers exist.
2. Computing the preprocessing steps needed to populate the spliced/unspliced count table.
3. Exporting a cellDancer-style CSV with embedded coordinates plus optional cluster labels.

Keep large generated CSV files outside git and document their expected paths so subsequent pipeline runs can reuse them without recomputing AnnData steps.

## Gene Order

By default, GRAVITY preserves the gene order found in the exported CSV. This is fine for training a new model, but pretrained checkpoints require the same gene index order used during their original run.

If you plan to reuse a checkpoint, keep the checkpoint's `genes.txt` file and pass it later as `gene_order_path` when running the pipeline. For the provided pancreas reference checkpoints, use:

```python
gene_order_path = "data/pancreas/reference_checkpoints/pancreas_genes.txt"
```

The gene set alone is not sufficient for checkpoint reproduction; the order also matters because model weights and attention matrices are indexed by gene position.
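
Before reusing a checkpoint, it can help to confirm whether the exported CSV and the checkpoint's gene list agree in order, or only as a set. The sketch below is not part of GRAVITY. It rests on several assumptions: the gene list file holds one gene name per line, the CSV stores gene names in a `gene_name` column (the usual cellDancer column name), and "CSV gene order" means order of first appearance. The helper names `read_gene_order`, `csv_gene_order`, and `compare_gene_order` are hypothetical.

```python
import csv
from pathlib import Path


def read_gene_order(path):
    """Read a gene list, assuming one gene name per line (blank lines skipped)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def csv_gene_order(csv_path, gene_column="gene_name"):
    """Return genes in order of first appearance in a long-format CSV.

    `gene_column` is an assumption; adjust it if your export uses another name.
    """
    seen = {}  # dict keys keep insertion order, so this de-duplicates in order
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            seen.setdefault(row[gene_column], None)
    return list(seen)


def compare_gene_order(csv_genes, checkpoint_genes):
    """Classify how two gene lists relate.

    Returns "match" when both the genes and their positions agree,
    "same_set_different_order" when only the positions differ,
    and "different_set" when the genes themselves differ.
    """
    if list(csv_genes) == list(checkpoint_genes):
        return "match"
    if set(csv_genes) == set(checkpoint_genes):
        return "same_set_different_order"
    return "different_set"
```

Calling `compare_gene_order(csv_gene_order(output_csv), read_gene_order(gene_order_path))` with your own paths gives a quick diagnosis. A result of `"same_set_different_order"` is the case described above, where the gene set matches but `gene_order_path` is still needed. A result of `"different_set"` suggests the CSV was exported with a different gene selection than the checkpoint was trained on.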