Genomi

Libraries

Genomi install purposes, local public reference libraries, managed tools, and network-backed source behavior.

Genomi manages every external data source it uses through one consistent model. Downloadable reference libraries are cached under GENOMI_HOME and tracked so an update re-fetches only what changed upstream; live public APIs (gnomAD, PGS Catalog, and similar) are queried directly and never cached locally. Install, update, and on-demand use during a query all go through the same path, so a library behaves the same however it was first materialized.

A library is only downloaded with your consent. When a tool needs one that is not installed, it returns a requires_library_install state with the exact install command instead of silently fetching it. When a live API source cannot be reached, the tool returns a source_unavailable state rather than treating the gap as biological evidence.

Install purposes

Install a purpose (a named set of libraries, such as common-questions):

python3 scripts/install_for_agents.py --libraries common-questions

Install exact libraries:

python3 scripts/install_for_agents.py --libraries clinvar-grch38,hpo,gencc

Current local libraries

| Library | Enables | Typical size |
|---|---|---|
| clinvar-grch38 | GRCh38 ClinVar exact allele matching and candidate triage | ~180 MB |
| clinvar-grch37 | GRCh37 ClinVar exact allele matching and candidate triage | ~180 MB |
| hpo | HPO phenotype-to-gene and disease annotation | ~100 MB |
| gencc | GenCC gene-disease validity records | ~25 MB |
| reference-grch38 | GRCh38 reference FASTA and .fai | ~3.2 GB |
| reference-grch37 | GRCh37/hg19 reference FASTA and .fai | ~3.1 GB |
| gencode-grch38 | GENCODE v49 transcript annotation for GRCh38 | ~100 MB |
| gencode-grch37 | GENCODE v49lift37 transcript annotation for GRCh37 | ~100 MB |
| encode-ccre-grch38 | ENCODE SCREEN cCRE BED for GRCh38 | ~30 MB |
| panglaodb-markers | PanglaoDB cell-type marker table | ~5 MB |
| cellmarker-human | CellMarker 2.0 human marker table normalized for Genomi | ~10 MB |
| pharmcat | PharmCAT JAR for broad pharmacogenomic calling | ~30 MB |
| ancestry-1000g-30x-grch38 | 1000 Genomes 30x GRCh38 compact ancestry PCA panel | ~3 MB |
| liftover-chains | UCSC GRCh37/GRCh38 liftover chain files | ~3 MB |
| ancestry-1000g-30x-grch37 | GRCh37 ancestry panel built locally from the GRCh38 panel and liftover chains | ~3 MB |
| minimap2-binary | Long-read FASTQ alignment path on supported platforms | ~5 MB |
| bwa-mem2-binary | Short-read FASTQ alignment path on supported platforms | ~50 MB |
| msigdb-hallmark | MSigDB Hallmark pathway members from an official user-supplied GMT | user supplied |

msigdb-hallmark is not part of any default purpose because its license requires the user to provide the official GMT export.
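Before installing a selection, it can help to add up the typical sizes from the table. The snippet below is a minimal sketch, not part of Genomi: the function name estimate_download_mb is hypothetical, the sizes are the approximate values listed above (with 1 GB taken as 1000 MB), and msigdb-hallmark is left out because its size depends on the file you supply.

```python
# Typical sizes in MB, copied from the table above (approximate values).
TYPICAL_SIZE_MB = {
    "clinvar-grch38": 180,
    "clinvar-grch37": 180,
    "hpo": 100,
    "gencc": 25,
    "reference-grch38": 3200,
    "reference-grch37": 3100,
    "gencode-grch38": 100,
    "gencode-grch37": 100,
    "encode-ccre-grch38": 30,
    "panglaodb-markers": 5,
    "cellmarker-human": 10,
    "pharmcat": 30,
    "ancestry-1000g-30x-grch38": 3,
    "liftover-chains": 3,
    "ancestry-1000g-30x-grch37": 3,
    "minimap2-binary": 5,
    "bwa-mem2-binary": 50,
}


def estimate_download_mb(libraries):
    """Sum the typical sizes of the named libraries (hypothetical helper)."""
    unknown = [name for name in libraries if name not in TYPICAL_SIZE_MB]
    if unknown:
        raise KeyError(f"no typical size listed for: {', '.join(unknown)}")
    return sum(TYPICAL_SIZE_MB[name] for name in libraries)


# The exact-library example from the install section: 180 + 100 + 25 MB.
print(estimate_download_mb(["clinvar-grch38", "hpo", "gencc"]))  # 305
```

The reference FASTA libraries dominate any total that includes them, so a selection without them stays in the hundreds of megabytes.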

Updating libraries

genomi update (an alias of genomi install) refreshes everything that can be updated. For each installed library, it runs a conditional check against the upstream source and re-downloads only what changed: rolling sources like ClinVar pick up new releases, while unchanged caches transfer nothing. Any library in the selected set that is not yet installed is fetched. With no arguments, genomi update selects all default libraries, refreshing the installed ones and fetching the missing ones.

Pass --force to re-download regardless of freshness. msigdb-hallmark is never auto-refreshed because it has no public source; re-supply its GMT export to update it.
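The update rules above reduce to a small decision function. This page does not say how Genomi performs its conditional check, so the sketch below assumes HTTP-style validators (an ETag or Last-Modified value recorded at install time); the Validator class and needs_download function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Validator:
    """Hypothetical freshness marker, e.g. an HTTP ETag or Last-Modified value."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def needs_download(installed: bool,
                   cached: Optional[Validator],
                   upstream: Validator,
                   force: bool = False) -> bool:
    """Decide whether a library should be fetched during an update."""
    if force:
        return True  # --force: re-download regardless of freshness
    if not installed or cached is None:
        return True  # missing libraries in the selected set are fetched
    if cached.etag is not None and upstream.etag is not None:
        return cached.etag != upstream.etag
    if cached.last_modified is not None and upstream.last_modified is not None:
        return cached.last_modified != upstream.last_modified
    return True  # nothing comparable: be conservative and re-download


same = Validator(etag='"abc"')
print(needs_download(True, same, same))              # False: unchanged cache
print(needs_download(True, same, same, force=True))  # True: forced refresh
```

Falling back to a download when no validator can be compared trades bandwidth for correctness; a real implementation might choose differently.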

Checking library state

Call the base MCP tool:

genomi.check_libraries({})

Pass a libraries array to check a subset. Results include install status, missing paths, how the library helps, and the installer command to fetch it. Live network-backed sources (gnomAD, PGS Catalog, PGxDB, and the FDA pharmacogenomic tables) also appear in the inventory as online entries: they are never cached locally, so they report as available without a download.
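For hosts that build MCP requests by hand, a subset check might look like the sketch below. The envelope follows the standard MCP tools/call JSON-RPC shape. The tool name is taken from the call shown above, and the libraries argument key is inferred from the sentence about passing a libraries array, so both should be confirmed against the tool's published schema. The helper name check_libraries_request is hypothetical.

```python
import json


def check_libraries_request(libraries=None, request_id=1):
    """Build an MCP tools/call request for genomi.check_libraries.

    With libraries=None the arguments object is empty, which matches the
    genomi.check_libraries({}) call that checks everything.
    """
    arguments = {} if libraries is None else {"libraries": list(libraries)}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "genomi.check_libraries",
            "arguments": arguments,
        },
    }


# Check only two libraries instead of the full inventory.
print(json.dumps(check_libraries_request(["clinvar-grch38", "hpo"]), indent=2))
```

Most MCP client libraries build this envelope for you, in which case only the tool name and arguments object matter.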

Missing libraries

If a tool returns status="requires_library_install", the host should:

  1. Explain what the named library enables for the current request.
  2. Ask before installing it.
  3. Run the returned install command only after approval.
  4. Avoid treating the missing library as evidence that a variant, condition, or drug response is absent.
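The four steps above can be sketched as a host-side handler. Only the status value requires_library_install comes from this page; the result field names (library, how_it_helps, install_command), the return values, and the ask_user callback are assumptions made for illustration, so adapt them to the actual tool output.

```python
import shlex
import subprocess


def handle_tool_result(result, ask_user, run=subprocess.run):
    """Consent-gated install flow for a requires_library_install result.

    ask_user is a callable that shows a prompt and returns True or False.
    run defaults to subprocess.run and can be swapped out in tests.
    """
    if result.get("status") != "requires_library_install":
        return "proceed"

    # Field names below are hypothetical; check the real result schema.
    library = result.get("library", "the required library")
    helps = result.get("how_it_helps", "")
    command = result["install_command"]

    # Steps 1 and 2: explain what the library enables, then ask.
    prompt = f"{library} is not installed. {helps} Run: {command} ?"
    if not ask_user(prompt):
        # Step 4: a declined install means "not checked", never "absent".
        return "declined"

    # Step 3: run the returned command only after approval.
    run(shlex.split(command), check=True)
    return "installed"
```

A caller that receives "declined" should report the question as unanswered for that library rather than as a negative finding.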

Network-backed sources

Some operations use public APIs or source pages instead of local libraries. Examples include population frequency lookup from gnomAD, pathway retrieval from Reactome or KEGG, Human Protein Atlas marker retrieval, and some live pharmacogenomic source fetches.

These are declared in operation metadata. If the source is unavailable, Genomi returns a source_unavailable state that reports the gap, rather than fabricating a result or treating the absence as biological evidence.
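On the consuming side, the same rule means an unreachable source maps to "unknown", not to a negative finding. The sketch below is one way a host might encode that. The source_unavailable status is named earlier on this page; the "ok" status, the records field, and the Evidence enum are hypothetical.

```python
from enum import Enum


class Evidence(Enum):
    OBSERVED = "observed"
    NOT_OBSERVED = "not_observed"
    UNKNOWN = "unknown"


def classify_lookup(result):
    """Map a network-backed lookup result to an evidence category."""
    status = result.get("status")
    if status == "source_unavailable":
        # The source could not be reached, so nothing was learned.
        return Evidence.UNKNOWN
    if status == "ok":  # hypothetical success status
        # "records" is a hypothetical field holding the returned entries.
        return Evidence.OBSERVED if result.get("records") else Evidence.NOT_OBSERVED
    # Any unrecognized status is also treated as "no information".
    return Evidence.UNKNOWN


print(classify_lookup({"status": "source_unavailable"}))  # Evidence.UNKNOWN
```

Keeping UNKNOWN separate from NOT_OBSERVED stops an outage from being reported as, for example, a variant being absent from a population database.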