Libraries
Genomi install purposes, local public reference libraries, managed tools, and network-backed source behavior.
Genomi manages every external data source it uses through one consistent model.
Downloadable reference libraries are cached under GENOMI_HOME and tracked so
an update re-fetches only what changed upstream; live public APIs (gnomAD, PGS
Catalog, and similar) are queried directly and never cached locally. Install,
update, and on-demand use during a query all go through the same path, so a
library behaves the same however it was first materialized.
A library is only downloaded with your consent. When a tool needs one that is
not installed, it returns a requires_library_install state with the exact
install command instead of silently fetching it. When a live API source cannot
be reached, the tool returns a source_unavailable state rather than treating
the gap as biological evidence.
Install purposes
Install a purpose:

    python3 scripts/install_for_agents.py --libraries common-questions

Install exact libraries:

    python3 scripts/install_for_agents.py --libraries clinvar-grch38,hpo,gencc

Current local libraries
| Library | Enables | Typical size |
|---|---|---|
| clinvar-grch38 | GRCh38 ClinVar exact allele matching and candidate triage | ~180 MB |
| clinvar-grch37 | GRCh37 ClinVar exact allele matching and candidate triage | ~180 MB |
| hpo | HPO phenotype-to-gene and disease annotation | ~100 MB |
| gencc | GenCC gene-disease validity records | ~25 MB |
| reference-grch38 | GRCh38 reference FASTA and .fai | ~3.2 GB |
| reference-grch37 | GRCh37/hg19 reference FASTA and .fai | ~3.1 GB |
| gencode-grch38 | GENCODE v49 transcript annotation for GRCh38 | ~100 MB |
| gencode-grch37 | GENCODE v49lift37 transcript annotation for GRCh37 | ~100 MB |
| encode-ccre-grch38 | ENCODE SCREEN cCRE BED for GRCh38 | ~30 MB |
| panglaodb-markers | PanglaoDB cell-type marker table | ~5 MB |
| cellmarker-human | CellMarker 2.0 human marker table normalized for Genomi | ~10 MB |
| pharmcat | PharmCAT JAR for broad pharmacogenomic calling | ~30 MB |
| ancestry-1000g-30x-grch38 | 1000 Genomes 30x GRCh38 compact ancestry PCA panel | ~3 MB |
| liftover-chains | UCSC GRCh37/GRCh38 liftover chain files | ~3 MB |
| ancestry-1000g-30x-grch37 | GRCh37 ancestry panel built locally from the GRCh38 panel and liftover chains | ~3 MB |
| minimap2-binary | Long-read FASTQ alignment path on supported platforms | ~5 MB |
| bwa-mem2-binary | Short-read FASTQ alignment path on supported platforms | ~50 MB |
| msigdb-hallmark | MSigDB Hallmark pathway members from an official user-supplied GMT | user supplied |

msigdb-hallmark is not part of any default purpose because its license
requires the user to provide the official GMT export.
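
Before installing an exact set, it can help to estimate the download from the sizes in the table. The following is a minimal sketch, not part of Genomi: the `TYPICAL_SIZE_MB` mapping and the `estimate_download_mb` helper are illustrative names, and the numbers are the approximate values from the "Typical size" column.

```python
# Approximate sizes in MB, copied from the "Typical size" column.
# Only a few libraries are listed; extend the mapping as needed.
TYPICAL_SIZE_MB = {
    "clinvar-grch38": 180,
    "hpo": 100,
    "gencc": 25,
    "reference-grch38": 3200,  # listed as ~3.2 GB
    "gencode-grch38": 100,
}


def estimate_download_mb(libraries):
    """Sum the approximate sizes of the requested libraries.

    Raises KeyError for names missing from the mapping, for example
    msigdb-hallmark, which is user supplied and has no listed size.
    """
    return sum(TYPICAL_SIZE_MB[name] for name in libraries)


selection = ["clinvar-grch38", "hpo", "gencc"]
print(f"{','.join(selection)}: ~{estimate_download_mb(selection)} MB")
```

For the exact-library example earlier on this page (clinvar-grch38, hpo, gencc), the estimate is 180 + 100 + 25 = 305 MB.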
Updating libraries
genomi update (an alias of genomi install) refreshes everything that can be
updated. For each installed library it runs a conditional check against the
upstream source and re-downloads only what actually changed — rolling sources
like ClinVar pick up new releases, while unchanged caches transfer nothing.
Libraries in the selected set that are not yet installed are fetched. A plain
genomi update selects every default library, so it refreshes the ones already
installed and fetches any that are missing.
Pass --force to re-download unconditionally regardless of freshness.
msigdb-hallmark is never auto-refreshed because it has no public source —
re-supply its GMT export to update it.
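
The update rules above can be summarized as a small decision function. This is a sketch under stated assumptions, not Genomi's implementation: the page does not say how the conditional check works, so the sketch assumes each library stores some validator (an ETag or release label, for instance) that can be compared with the upstream value. `LibraryState` and `plan_action` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryState:
    name: str
    installed: bool
    cached_validator: Optional[str]    # assumed: value saved at install time
    upstream_validator: Optional[str]  # assumed: value reported by upstream
    user_supplied: bool = False        # True for msigdb-hallmark


def plan_action(lib: LibraryState, force: bool = False) -> str:
    """Return "skip", "fetch", or "refresh" for one selected library."""
    if lib.user_supplied:
        # No public source: the user must re-supply the file.
        return "skip"
    if not lib.installed:
        return "fetch"
    if force:
        # --force re-downloads regardless of freshness.
        return "refresh"
    if lib.cached_validator != lib.upstream_validator:
        return "refresh"
    return "skip"


clinvar = LibraryState("clinvar-grch38", True, "release-A", "release-B")
hpo = LibraryState("hpo", True, "v1", "v1")
gencc = LibraryState("gencc", False, None, "v7")

for lib in (clinvar, hpo, gencc):
    print(lib.name, plan_action(lib))
```

In this sketch a changed validator triggers a refresh, an unchanged one transfers nothing, and a library that is not installed is fetched; passing `force=True` turns every installed, publicly sourced library into a refresh.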
Checking library state
Call the base MCP tool:

    genomi.check_libraries({})

Pass a libraries array to check a subset. Results include install status,
missing paths, how the library helps, and the installer command to fetch it.
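
As an illustration of the two call shapes, the sketch below builds the arguments object for `genomi.check_libraries`. The `libraries` key comes from the description above; the helper name `check_libraries_arguments` is hypothetical, and how the arguments are sent depends on your MCP client, which is not shown here.

```python
import json


def check_libraries_arguments(libraries=None):
    """Build the arguments object for genomi.check_libraries.

    With no selection this returns an empty object, matching the call
    shown above; otherwise it returns a "libraries" array for a subset.
    """
    if not libraries:
        return {}
    return {"libraries": list(libraries)}


print(json.dumps(check_libraries_arguments()))
print(json.dumps(check_libraries_arguments(["clinvar-grch38", "hpo"])))
```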
Live network-backed sources (gnomAD, PGS Catalog, PGxDB, and the FDA
pharmacogenomic tables) also appear in the inventory as online entries: they
are never cached locally, so they report as available without a download.
Missing libraries
If a tool returns status="requires_library_install", the host should:
- Explain what the named library enables for the current request.
- Ask before installing it.
- Run the returned install command only after approval.
- Avoid treating the missing library as evidence that a variant, condition, or drug response is absent.
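
The host steps above could be wired together roughly as follows. This is a hedged sketch: only the `status="requires_library_install"` value is taken from this page. The `library`, `enables`, and `install_command` keys, the `handle_tool_result` function, and the two callbacks are hypothetical and stand in for whatever your host actually uses.

```python
def handle_tool_result(result, ask_user, run_command):
    """Apply the host steps to one tool result (a dict).

    ask_user(prompt) -> bool asks for approval.
    run_command(command) runs the returned install command.
    """
    if result.get("status") != "requires_library_install":
        return "continue"

    library = result.get("library", "<unknown>")
    enables = result.get("enables", "n/a")
    # Explain what the library enables, then ask before installing.
    prompt = f"{library} is not installed. It enables: {enables}. Install it now?"
    if ask_user(prompt):
        run_command(result["install_command"])
        return "installed"

    # Declined: report the gap. A missing library says nothing about
    # whether a variant, condition, or drug response is absent.
    return "declined"


ran = []
example = {
    "status": "requires_library_install",
    "library": "gencc",
    "enables": "GenCC gene-disease validity records",
    "install_command": "python3 scripts/install_for_agents.py --libraries gencc",
}
outcome = handle_tool_result(example, ask_user=lambda prompt: True, run_command=ran.append)
print(outcome, ran)
```

Passing the approval prompt and the command runner in as callbacks keeps the consent step explicit and makes the flow easy to test without installing anything.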
Network-backed sources
Some operations use public APIs or source pages instead of local libraries. Examples include population frequency lookup from gnomAD, pathway retrieval from Reactome or KEGG, Human Protein Atlas marker retrieval, and some live pharmacogenomic source fetches.
These are declared in operation metadata. If the source is unavailable, Genomi returns a source_unavailable state rather than fabricating a result or treating the absence of data as biological evidence.
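
A caller can keep the same distinction by separating "the source had no record" from "the source was not consulted". The sketch below is an assumption-laden illustration: `source_unavailable` is the state named on this page, while the `ok` status, the `records` key, and the `LookupOutcome` and `interpret_lookup` names are hypothetical.

```python
from enum import Enum


class LookupOutcome(Enum):
    RECORD_FOUND = "record_found"
    NO_RECORD = "no_record"      # source answered and returned nothing
    NOT_CHECKED = "not_checked"  # source could not be consulted


def interpret_lookup(result):
    """Classify a network-backed lookup without over-reading a gap."""
    status = result.get("status")
    if status == "source_unavailable":
        return LookupOutcome.NOT_CHECKED
    if status == "ok":
        if result.get("records"):
            return LookupOutcome.RECORD_FOUND
        return LookupOutcome.NO_RECORD
    # Any unrecognized status is treated as unknown, not as absence.
    return LookupOutcome.NOT_CHECKED


print(interpret_lookup({"status": "source_unavailable"}).value)
print(interpret_lookup({"status": "ok", "records": []}).value)
```

Only `NO_RECORD` reflects an answer from the source; `NOT_CHECKED` should be reported as a gap and retried or surfaced to the user.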