lookup_cache
Thread-safe, disk-backed key-value cache for external API lookups.
Designed to be shared by:
- Gene sequence lookups (Ensembl, RefSeq) → namespaces ensembl_sequences / refseq_sequences
- SMILES lookups (future) → namespace smiles_metabolite
- Any other external API calls that are expensive and should survive process restarts
Cache files live in {project_root}/.lookup_cache/ by default, or in a directory
set by the VmaxBuilder_CACHE_DIR environment variable. Each namespace is a separate
JSON file, e.g. .lookup_cache/ensembl_sequences.json.
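The directory lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_cache_dir` and `namespace_file` are hypothetical helper names (the module's own `get_default_cache_dir` may work differently), and the project root is passed in explicitly because the text does not say how it is discovered.

```python
import os
from pathlib import Path

# Environment variable name as given in the text above.
ENV_VAR = "VmaxBuilder_CACHE_DIR"


def resolve_cache_dir(project_root: Path) -> Path:
    """Hypothetical helper: prefer the env var override, else the default."""
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override)
    return project_root / ".lookup_cache"


def namespace_file(cache_dir: Path, namespace: str) -> Path:
    """Hypothetical helper: one JSON file per namespace."""
    return cache_dir / f"{namespace}.json"
```

With the variable unset, `namespace_file(resolve_cache_dir(root), "ensembl_sequences")` points at `.lookup_cache/ensembl_sequences.json` under `root`, matching the example path above.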
Thread-safety
All public methods acquire an internal threading.Lock so the cache can safely
be used from ThreadPoolExecutor workers that call set() concurrently.
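To illustrate the locking pattern, here is a minimal sketch of a lock-guarded store written to from ThreadPoolExecutor workers. `TinyLockedStore` is a hypothetical stand-in, not the real LookupCache: it keeps data in memory only and omits the disk persistence.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TinyLockedStore:
    """Hypothetical in-memory stand-in for a lock-guarded cache."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:  # only one thread mutates the dict at a time
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def __contains__(self, key):
        with self._lock:
            return key in self._data

    def __len__(self):
        with self._lock:
            return len(self._data)


store = TinyLockedStore()
with ThreadPoolExecutor(max_workers=8) as pool:
    # Each worker stores a distinct key; list() forces all tasks to finish.
    list(pool.map(lambda i: store.set(f"key-{i}", i * i), range(100)))
```

A plain `threading.Lock` is not re-entrant, so in a design like this a public method must not call another public method while holding the lock.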
Atomic writes
Each save writes the full contents to {file}.tmp and then calls Path.replace() to swap
it into place, so a crash mid-write never leaves a corrupt cache file.
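The write-then-replace pattern can be sketched like this. `atomic_write_json` is a hypothetical helper name, and the sketch assumes the temporary file sits next to the target on the same filesystem, which is what makes the rename atomic on POSIX systems. It does not call fsync, so it protects against a crashed process rather than a sudden power loss.

```python
import json
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Hypothetical helper: write to {file}.tmp, then replace the target."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    # Readers see either the old file or the new one, never a partial write.
    tmp.replace(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "ensembl_sequences.json"
    atomic_write_json(target, {"k": "v1"})
    atomic_write_json(target, {"k": "v2"})  # overwrites the earlier file
    reloaded = json.loads(target.read_text(encoding="utf-8"))
    leftover_tmp = target.with_name(target.name + ".tmp").exists()
```

If the process dies while writing the temporary file, the previous cache file is untouched and only a stale `.tmp` file remains.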
Usage example
from src.VmaxBuilder.utils.lookup_cache import LookupCache, get_default_cache_dir
cache = LookupCache(get_default_cache_dir(), "ensembl_sequences")
key = sequence_cache_key("homo_sapiens", "ENSG00000139618", "canonical_only")
if key not in cache:
    result = expensive_api_call(...)
    cache.set(key, gene_result_to_dict(result))  # saved to disk immediately
data = cache.get(key)  # returns the stored dict
Classes
- Container for sequence lookup results for one gene symbol.
- Thread-safe, disk-backed key-value store for a single namespace.
- JSON-friendly record describing one retrieved sequence.
Functions
- Generated: validation needed.
- Generated: validation needed.
- Resolve the default cache directory.
- Canonical cache key for a
- Canonical cache key for a metabolite SMILES lookup.