cache

Two-layer disk cache for sayt2 datasets.

Layer 1 (L1, data freshness): tracks whether the index is up to date. Expires after expire seconds, triggering a downloader run and an index rebuild.

Layer 2 (L2, query results): caches SearchResult objects keyed by (query, limit). Invalidated whenever L1 expires and a rebuild happens.

Both layers live in a single diskcache.Cache instance, distinguished by key prefixes and linked by a shared tag for bulk eviction.
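
The prefix-plus-tag layout described above can be pictured with a small in-memory stand-in. This is only a sketch: the TaggedStore class, the "l1:" and "l2:" prefixes, and the tag format are hypothetical and not taken from sayt2. The real class delegates to diskcache.Cache, whose set() accepts a tag argument and whose evict(tag) removes every entry carrying that tag.

```python
from __future__ import annotations


class TaggedStore:
    """Hypothetical in-memory stand-in for a tag-aware key/value cache."""

    def __init__(self) -> None:
        # key -> (value, tag)
        self._data: dict[str, tuple[object, str]] = {}

    def set(self, key: str, value: object, tag: str) -> None:
        self._data[key] = (value, tag)

    def get(self, key: str, default: object = None) -> object:
        entry = self._data.get(key)
        return default if entry is None else entry[0]

    def evict(self, tag: str) -> int:
        """Remove every entry stored under `tag`; return how many were removed."""
        doomed = [k for k, (_, t) in self._data.items() if t == tag]
        for k in doomed:
            del self._data[k]
        return len(doomed)


store = TaggedStore()
tag = "books-ab12cd"  # assumed format: dataset name plus schema hash

# One freshness marker (layer 1) and one query result (layer 2) share the tag.
store.set("l1:books-ab12cd:fresh", True, tag=tag)
store.set("l2:books-ab12cd:python:10", ["hit-1", "hit-2"], tag=tag)
# A second dataset in the same store uses a different tag.
store.set("l1:films-99ff00:fresh", True, tag="films-99ff00")

# Bulk eviction by tag clears both layers of one dataset and nothing else.
removed = store.evict(tag)
```

Because both layers carry the same tag, one eviction call clears a dataset without scanning for key prefixes, and other datasets in the same cache directory are left alone.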

class sayt2.cache.DataSetCache(dir_cache: Path, dataset_name: str, schema_hash: str, expire: int | None = None)

Manages a two-layer cache backed by diskcache.

Parameters:
  • dir_cache – Directory for the diskcache.Cache files.

  • dataset_name – Logical name of the dataset (e.g. "books").

  • schema_hash – Short hash of the field definitions. Because the hash changes whenever the schema changes, a schema change automatically invalidates all cached data.

  • expire – Seconds before L1 (data freshness) expires. None means “never expire automatically”.
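
The page does not say how sayt2 derives schema_hash. As an illustration only, a short hash of field definitions could be computed as below; the short_schema_hash function, the dict shape of a field, and the 8-character length are all assumptions made for this sketch.

```python
import hashlib
import json


def short_schema_hash(fields: list, length: int = 8) -> str:
    """Hypothetical helper: hash a list of field-definition dicts."""
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable
    # regardless of the order in which dict keys were written.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


fields_v1 = [{"name": "title", "type": "text"}]
fields_v2 = [
    {"name": "title", "type": "text"},
    {"name": "year", "type": "numeric"},
]

hash_v1 = short_schema_hash(fields_v1)
hash_v2 = short_schema_hash(fields_v2)
```

Adding the year field yields a different hash, so any cache keys or tags that embed the hash no longer match entries written under the old schema.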

is_fresh() → bool

Return True if the dataset index is still considered fresh.

mark_fresh() → None

Mark the dataset as fresh. Starts the L1 expiry countdown.

Called after a successful downloader() → build_index() cycle.
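
The freshness contract can be sketched without diskcache. The FreshnessMarker class below is hypothetical; it assumes that a dataset never marked fresh counts as stale and that the entry stops being fresh once exactly expire seconds have passed. The real class presumably leaves the countdown to the expiry handling of diskcache instead of tracking timestamps itself.

```python
from __future__ import annotations

import time
from typing import Callable


class FreshnessMarker:
    """Hypothetical model of the L1 freshness flag and its countdown."""

    def __init__(
        self,
        expire: float | None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._expire = expire
        self._clock = clock  # injectable so the example needs no sleeping
        self._marked_at: float | None = None

    def mark_fresh(self) -> None:
        # Starts (or restarts) the countdown.
        self._marked_at = self._clock()

    def is_fresh(self) -> bool:
        if self._marked_at is None:
            return False  # assumption: never marked means stale
        if self._expire is None:
            return True  # None: never expire automatically
        return self._clock() - self._marked_at < self._expire


now = [0.0]  # fake clock, advanced by hand
marker = FreshnessMarker(expire=60, clock=lambda: now[0])

before_mark = marker.is_fresh()
marker.mark_fresh()
now[0] = 59.0
within_window = marker.is_fresh()
now[0] = 60.0
after_window = marker.is_fresh()
```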

get_query_result(query: str, limit: int) → SearchResult | None

Return the cached result for (query, limit), or None on miss.

Query results are always SearchResult objects (never None), so a None return unambiguously means cache miss.
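
A caller can rely on that guarantee with an explicit `is not None` check. The sketch below uses hypothetical stand-ins (a minimal SearchResult dataclass, a dict-backed DictQueryCache, and run_search and cached_search helpers); only the two method names and their signatures come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical stand-in; the real SearchResult has its own fields."""

    hits: list[str] = field(default_factory=list)


class DictQueryCache:
    """Dict-backed stand-in exposing the two documented query methods."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, int], SearchResult] = {}

    def get_query_result(self, query: str, limit: int) -> SearchResult | None:
        return self._results.get((query, limit))

    def set_query_result(self, query: str, limit: int, result: SearchResult) -> None:
        self._results[(query, limit)] = result


calls: list[tuple[str, int]] = []


def run_search(query: str, limit: int) -> SearchResult:
    """Hypothetical backend search; records each call and finds nothing."""
    calls.append((query, limit))
    return SearchResult(hits=[])  # empty, but still a SearchResult


def cached_search(cache: DictQueryCache, query: str, limit: int) -> SearchResult:
    cached = cache.get_query_result(query, limit)
    if cached is not None:  # None is the only miss signal
        return cached
    result = run_search(query, limit)
    cache.set_query_result(query, limit, result)
    return result


cache = DictQueryCache()
first = cached_search(cache, "zzz", 5)
second = cached_search(cache, "zzz", 5)        # served from the cache
other_limit = cached_search(cache, "zzz", 10)  # different key, new search
```

The empty result for "zzz" is cached like any other, so a query with no hits does not reach the backend again until the entries are evicted.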

set_query_result(query: str, limit: int, result: SearchResult) → None

Cache a query result. L2 entries never expire on their own; they are bulk-evicted via evict_all() when an L1 expiry triggers a rebuild.

evict_all() → None

Remove all entries (L1 + L2) belonging to this dataset.

Called before a rebuild so that stale query results are not served.
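
Put together, the documented call order (evict, rebuild, mark fresh) might look like the sketch below. The refresh_if_stale helper, the FakeDataSetCache class, and the downloader and build_index callables are hypothetical; how sayt2 wires these steps internally is not shown on this page.

```python
from __future__ import annotations

from typing import Callable


class FakeDataSetCache:
    """In-memory stand-in exposing the three documented methods used here."""

    def __init__(self, log: list[str]) -> None:
        self.fresh = False
        self.results = {("python", 10): "stale result"}
        self.log = log

    def is_fresh(self) -> bool:
        return self.fresh

    def mark_fresh(self) -> None:
        self.fresh = True
        self.log.append("mark_fresh")

    def evict_all(self) -> None:
        self.fresh = False
        self.results.clear()
        self.log.append("evict_all")


def refresh_if_stale(
    cache: FakeDataSetCache,
    downloader: Callable[[], list],
    build_index: Callable[[list], None],
) -> bool:
    """Rebuild only when L1 has expired; return True if a rebuild ran."""
    if cache.is_fresh():
        return False
    cache.evict_all()      # drop stale L1 + L2 entries before rebuilding
    records = downloader()
    build_index(records)
    cache.mark_fresh()     # only reached if both steps succeeded
    return True


log: list[str] = []


def downloader() -> list:
    log.append("download")
    return [{"title": "Example"}]


def build_index(records: list) -> None:
    log.append("build")


cache = FakeDataSetCache(log)
rebuilt_first = refresh_if_stale(cache, downloader, build_index)
rebuilt_second = refresh_if_stale(cache, downloader, build_index)
```

If downloader() or build_index() raises, mark_fresh() is never reached, so the next call attempts the rebuild again.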

close() → None

Close the underlying diskcache.Cache.
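
This page does not say whether DataSetCache can be used as a context manager. One way to make sure close() runs even when an error occurs is the standard library's contextlib.closing, which works with any object that has a close() method. The FakeCache class below is a hypothetical stand-in used only to keep the sketch self-contained.

```python
from contextlib import closing


class FakeCache:
    """Hypothetical stand-in with the same close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


fake = FakeCache()
try:
    with closing(fake) as cache:
        # Work with the cache here; closing() calls close() on exit,
        # including when an exception propagates out of the block.
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```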