Tiered storage: Moving old vectors to S3
Tiered storage keeps frequently accessed (hot) vectors on fast media—NVMe, local SSD, or RAM—and moves older or rarely accessed (cold) data to cheaper object storage such as S3. The vector database can still search the full dataset, but cold data may be loaded on demand or served from a separate index, trading some latency for lower cost and scale beyond RAM. This topic covers tiering policies, the tiering pipeline, and the operational trade-offs.
Summary
- Hot data on fast media (NVMe, SSD, RAM); cold data moved to object storage (e.g. S3). Full dataset remains searchable; cold access trades latency for lower cost and scale beyond RAM.
- Tiering policies: by age, access frequency, or segment (e.g. segments older than 30 days or below a query-frequency threshold → S3). Active index and recent immutable segments stay on fast storage.
- Some systems keep a compact index in memory and fetch only candidate vectors from disk/S3 for nearest neighbor refinement. Aligns with cloud-native compute–storage separation.
- Trade-offs: cold queries slower; backup and consistency must account for multiple tiers. Key for billion-scale, cost-efficient VDBs when not all data needs sub‑ms access.
- Practical tip: set tiering thresholds so most queries hit hot data; include all tiers in backup and disaster recovery; monitor cold-query latency and cost.
Tiering policies and layout
Policies typically tier by age, access frequency, or segment: e.g. segments older than 30 days or below a query threshold get written to S3 (or equivalent), while the active index and recent immutable segments stay on fast storage. Some systems keep a compact index in memory that points to vector data on disk or S3 and fetch only the candidate vectors needed for nearest neighbor refinement. This aligns with cloud-native separation of compute and storage.
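A policy like this can be sketched as a pure function over per-segment metadata. The `Segment` fields and both thresholds below are illustrative assumptions, not any particular vendor's schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical per-segment metadata; real systems track similar fields.
@dataclass
class Segment:
    segment_id: str
    created_at: datetime
    query_count_30d: int

# Assumed policy: tier out segments older than 30 days OR queried fewer
# than 10 times in the last 30 days. Thresholds are illustrative.
MAX_AGE = timedelta(days=30)
MIN_QUERIES = 10

def is_cold(seg: Segment, now: datetime) -> bool:
    too_old = now - seg.created_at > MAX_AGE
    rarely_queried = seg.query_count_30d < MIN_QUERIES
    return too_old or rarely_queried
```

A background job would run `is_cold` over all immutable segments and queue the cold ones for migration to S3.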
Pipeline: new data lands on fast tier → background job or policy moves cold segments to S3 (and updates index to point to object store) → query path checks hot first, then may load cold segments on demand or search a separate cold index and merge results.
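The three pipeline stages can be sketched end to end. A plain dict stands in for S3 here (in a real deployment the put/get below would be object-store calls, e.g. via boto3); all names and the index layout are illustrative:

```python
# Minimal sketch of the tiering pipeline. A dict stands in for S3.
hot_tier = {}        # segment_id -> list of vectors (fast media)
object_store = {}    # object key -> list of vectors (cold tier)
index = {}           # segment_id -> {"tier": "hot"|"cold", "location": key}

def ingest(segment_id, vectors):
    # Stage 1: new data always lands on the fast tier.
    hot_tier[segment_id] = vectors
    index[segment_id] = {"tier": "hot", "location": segment_id}

def tier_out(segment_id):
    # Stage 2: a background job copies a cold segment to the object store
    # and repoints the index before deleting the hot copy.
    key = f"segments/{segment_id}.bin"
    object_store[key] = hot_tier[segment_id]
    index[segment_id] = {"tier": "cold", "location": key}
    del hot_tier[segment_id]

def load_segment(segment_id):
    # Stage 3: the query path checks the hot tier first, then falls back
    # to an on-demand load from the object store.
    entry = index[segment_id]
    if entry["tier"] == "hot":
        return hot_tier[entry["location"]]
    return object_store[entry["location"]]
```

Note the ordering in `tier_out`: the index is repointed only after the cold copy exists, so a crash mid-migration leaves the segment readable from one tier or the other.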
Trade-offs and operations
Cold queries can be slower; consistency and backup semantics must account for multiple tiers. Tiered storage is a key strategy for billion-scale and cost-efficient VDB deployments where not all data needs sub‑millisecond access.
Trade-off: lower cost and near-unlimited scale vs. higher and more variable latency for cold data. Practical tip: tune tiering thresholds (age, access count) so that the majority of queries are served from the hot tier, and use monitoring to detect when the cold path is too slow or expensive.
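One concrete way to monitor this is to track the fraction of queries served entirely from the hot tier and alert when it drops below a target. A minimal sketch; the 0.95 target is an illustrative assumption:

```python
def hot_hit_ratio(tier_log):
    """Fraction of queries served entirely from the hot tier.

    tier_log: list of "hot"/"cold" labels, one per query.
    """
    if not tier_log:
        return 1.0
    return sum(1 for t in tier_log if t == "hot") / len(tier_log)

def thresholds_need_tuning(tier_log, target=0.95):
    # Too many queries touching cold segments means the age/access
    # thresholds are too aggressive (or the latency/cost is accepted).
    return hot_hit_ratio(tier_log) < target
```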
Frequently Asked Questions
How do I decide what is hot vs. cold?
Common criteria: age (e.g. last 7 days hot), access frequency (query count or recency), or segment creation time. Configure thresholds so that the majority of queries hit hot data; tune based on latency and cost goals.
Can I still run vector search over cold data?
Yes. The VDB can probe cold tiers (e.g. load segments from S3 or a separate on-disk index) and merge with hot results. Latency will be higher for cold segments; some systems support “search all tiers” with configurable timeout or limit.
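A "search all tiers" call with a cold-tier budget can be sketched as follows. Brute-force L2 distance stands in for the real index, and `max_cold_segments` is an illustrative stand-in for the configurable timeout or limit some systems expose:

```python
import heapq
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_all_tiers(query, hot_vectors, cold_segments, k=3, max_cold_segments=2):
    """Brute-force k-NN across tiers.

    hot_vectors: list of (id, vector) on fast media -- always searched.
    cold_segments: list of segments, each a list of (id, vector); only
    the first max_cold_segments are probed, capping cold-tier latency.
    """
    candidates = [(l2(query, v), vid) for vid, v in hot_vectors]
    for seg in cold_segments[:max_cold_segments]:
        candidates.extend((l2(query, v), vid) for vid, v in seg)
    return [vid for _, vid in heapq.nsmallest(k, candidates)]
```

The cap makes the trade-off explicit: cold segments beyond the budget are skipped, so recall on cold data is traded for bounded latency.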
How do backups work with tiered storage?
Backup and snapshot design must include all tiers (fast storage + object storage) and a consistent point-in-time view. WAL and segment references across tiers need to be coordinated; check your vendor’s disaster recovery docs.
Does tiered storage work with in-memory indexes?
Yes. Hot data can live in an in-memory index while cold data stays on disk or S3. The query path may hit both: in-memory for hot, on-demand load or separate index for cold. Same idea as in-memory vs. on-disk but with multiple tiers.
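The "compact index in memory, vectors on disk or S3" pattern mentioned above can be sketched with coarsely quantized vectors in RAM and full vectors fetched only for a shortlist. Everything here is illustrative (real systems use product quantization or similar, not rounding):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_compact_index(full_vectors):
    # Compact in-memory index: id -> coarsely rounded vector. Full
    # vectors stay in cold storage and are fetched only on demand.
    return {vid: [round(x, 1) for x in v] for vid, v in full_vectors.items()}

def search(query, compact_index, cold_store_fetch, k=1, shortlist=3):
    # Coarse pass entirely in memory.
    coarse = sorted(compact_index, key=lambda vid: l2(query, compact_index[vid]))
    # Refinement: fetch only the shortlist's full vectors from cold storage.
    candidates = coarse[:shortlist]
    return sorted(candidates, key=lambda vid: l2(query, cold_store_fetch(vid)))[:k]
```

Only `shortlist` objects are read from the cold tier per query, which is what keeps this pattern affordable at billion scale.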