Weaviate
Open-source vector database with built-in ML and GraphQL.
Publisher review
Weaviate is an open-source vector database designed to unify search, retrieval-augmented generation, and AI agent workflows into a single data layer. Built in Amsterdam and maintained under a BSD-3 license, it has become a primary reference implementation for teams evaluating vector-native data platforms in production.
The core distinction is Weaviate's native hybrid search: a single query combines vector similarity, BM25 keyword matching, and metadata filtering, with no separate retrieval pipeline to build and merge. This matters in practice. Teams using vector-only systems often discover that exact matches (product SKUs, error codes, proper nouns) drop out of results, forcing workarounds. Weaviate handles this at the database level.
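To make the exact-match problem concrete, here is a toy sketch of score fusion in plain Python. It is not Weaviate's implementation: the min-max normalization and the `alpha` weighting are only loosely modeled on the idea of blending a vector score with a keyword score, and every document ID and score below is invented for illustration.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(vector_scores: Dict[str, float],
         keyword_scores: Dict[str, float],
         alpha: float = 0.5) -> Dict[str, float]:
    """Blend normalized scores: alpha=1.0 is vector-only, alpha=0.0 is keyword-only."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    docs = set(v) | set(k)
    # A document missing from one result list contributes 0 for that signal.
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}


def rank(fused: Dict[str, float]) -> List[str]:
    """Order document IDs by descending fused score, breaking ties by ID."""
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores for the query "SKU-4471 replacement filter".
vector_scores = {"filter-guide": 0.82, "sku-4470": 0.80,
                 "sku-4471": 0.79, "returns-policy": 0.70}
keyword_scores = {"sku-4471": 7.4, "filter-guide": 1.1}

vector_only = rank(fuse(vector_scores, keyword_scores, alpha=1.0))
hybrid = rank(fuse(vector_scores, keyword_scores, alpha=0.5))

print("vector-only:", vector_only)
print("hybrid:     ", hybrid)
```

With these made-up numbers, the exact SKU match sits third in the vector-only ranking and moves to first once the keyword signal is blended in.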
The platform supports multiple deployment models: self-hosted (Docker, Kubernetes), managed cloud (Flex, Professional, Premium tiers), and an embedded mode for evaluation. This flexibility appeals to regulated industries, SaaS teams needing multi-tenancy, and organizations unwilling to lock critical data into a third-party API. Weaviate's module ecosystem (pluggable vectorizers, embedding models, and rerankers) reduces boilerplate when models evolve. Teams can swap embedding providers (OpenAI, Cohere, local models) or add reranking without rewriting application code.
The trade-offs are real. Setup is more complex than Pinecone's serverless offering. The GraphQL API has a learning curve. Self-hosting means running the Go-based server yourself and budgeting compute and memory for it. Built-in vectorization adds latency and external API calls; teams seeking full pipeline control may prefer externally managed embeddings. The October 2025 pricing restructure moved Weaviate from AI Units to dimension-based billing: Flex ($45/month minimum), Professional ($280/month, billed annually), Premium ($400/month and up). Self-hosted remains free. Cost advantages emerge at scale: above 5M vectors, self-hosting often undercuts managed competitors. Community presence is strong in Europe, particularly among Dutch companies favoring open-source infrastructure.
How it works
- Hybrid search: Vector similarity, BM25 keyword matching, and metadata filtering combined in a single query without separate ranking passes.
- Modular vectorization: Swap embedding models, vectorizers, and rerankers (OpenAI, Cohere, local) without application code changes as models evolve.
- Self-hosted deployment: Docker, Kubernetes, and on-premise options for data sovereignty and regulated environments; free BSD-3 license.
- Multi-modal retrieval: Text, images, and audio stored in shared vector space for unified semantic search across media types.
- GraphQL and REST APIs: Flexible query interfaces alongside SDKs for Python, Go, TypeScript, and JavaScript.
- Native multi-tenancy: Built-in tenant isolation without separate schema management, simplifying SaaS architecture.
- RAG integration: Direct integration with LLM workflows for retrieval-augmented generation without external orchestration.
Strengths and trade-offs
Strengths
- Hybrid search combines vector + keyword + metadata filtering in a single query—eliminates separate ranking passes and catches exact matches pure vector systems miss.
- Open-source (BSD-3) with self-hosting option—eliminates vendor lock-in, enables data sovereignty, and removes API cost scaling at large vector counts.
- Modular architecture swaps embedding models and rerankers without application rebuild; supports multi-modal data in shared vector space.
Trade-offs
- Self-hosted setup requires operational expertise (Weaviate is a Go service, typically run on Docker or Kubernetes) and more infrastructure management than managed competitors like Pinecone; the GraphQL API has a steeper learning curve than REST or SQL.
- Built-in vectorization adds external API calls and latency; teams optimizing for pipeline control or pure throughput may need externally-managed embeddings.
- The October 2025 pricing restructure raised the minimum entry point for managed cloud ($45/month Flex vs. the prior Serverless tier); Professional tier discounts require an annual commitment.
Pricing context
Weaviate offers free self-hosting (BSD-3 license) and a 14-day free cloud sandbox. Managed Cloud tiers start at a $45/month minimum (Flex, shared cloud, 99.5% SLA) with usage-based overage charges (~$0.01–$0.02 per million vector dimensions), followed by Professional ($280/month billed annually, 99.9% SLA, SOC 2) and Premium (custom contract, BYOC, HIPAA). Per-GiB object storage and backup retention are billed on top.
The October 2025 restructure replaced the prior AI Units model with dimension-based pricing. Self-hosted remains unmetered and free.
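As a rough illustration of dimension-based billing, the arithmetic below applies the approximate $0.01–$0.02 per million vector dimensions quoted above to a hypothetical workload. The 5M vector count and the 1,536-dimension embedding size are assumptions chosen for the example. How this charge combines with the tier minimum, storage, and backup fees is not specified here, so the sketch covers the dimension component only.

```python
def dimension_charge(num_vectors: int, dims_per_vector: int,
                     rate_per_million_dims: float) -> float:
    """Dimension-based charge: total stored dimensions, priced per million."""
    total_dims = num_vectors * dims_per_vector
    return total_dims / 1_000_000 * rate_per_million_dims


# Assumed workload: 5M vectors from a 1,536-dimension embedding model.
NUM_VECTORS = 5_000_000
DIMS = 1536

low = dimension_charge(NUM_VECTORS, DIMS, 0.01)   # lower quoted rate
high = dimension_charge(NUM_VECTORS, DIMS, 0.02)  # upper quoted rate

print(f"Total dimensions: {NUM_VECTORS * DIMS:,}")
print(f"Estimated dimension charge: ${low:,.2f} to ${high:,.2f}")
```

Because the charge scales with vectors times dimensions, halving the embedding size halves this component of the bill at the same vector count.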
Getting started with Weaviate
- Deploy or access Weaviate: Choose self-hosted (Docker, Kubernetes, free BSD-3 license) or managed cloud (Flex at $45/month, Professional, or Premium). Deploy locally via Docker or create a managed cloud account. Either way you get a running Weaviate instance with REST and GraphQL APIs.
- Configure vectorization module: Select your embedding model (OpenAI, Cohere, or local) and provide API credentials if needed. Weaviate uses this module to vectorize your data automatically during ingestion. This modular approach lets you swap embedding providers later without rewriting application code.
- Create schema and classes: Define your data structure by creating object classes and properties. Specify vectorization settings and which fields support hybrid search (vector + keyword + metadata filtering). This schema determines how Weaviate stores and searches your data.
- Ingest data via APIs: Import data (text, images, audio) using the REST API or the SDKs (Python, Go, TypeScript, JavaScript). Weaviate vectorizes content automatically according to your schema. Monitor ingestion via API responses or logs to confirm data is stored and ready to search.
- Run hybrid search queries: Combine vector similarity, BM25 keyword matching, and metadata filters in one query. Test with real business queries to verify that exact matches (SKUs, IDs) and semantic similarity work together. Use successful queries as templates for production deployment and monitoring.
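The steps above can be sketched against the REST and GraphQL endpoints using only the Python standard library. Treat this as a minimal sketch under stated assumptions rather than a reference setup: the `Product` class, its properties, the sample object, and the `localhost:8080` address are all hypothetical, and it assumes the `text2vec-openai` module is enabled on the instance. Provider API credentials and authentication headers are omitted.

```python
import json
import urllib.request

# Assumption: a local Docker instance listening on Weaviate's default port.
BASE_URL = "http://localhost:8080"


def schema_payload() -> dict:
    """Class definition for POST /v1/schema (hypothetical 'Product' class)."""
    return {
        "class": "Product",
        # Assumption: the text2vec-openai module is enabled on the instance.
        "vectorizer": "text2vec-openai",
        "properties": [
            {"name": "sku", "dataType": ["text"]},
            {"name": "description", "dataType": ["text"]},
            {"name": "category", "dataType": ["text"]},
        ],
    }


def object_payload(sku: str, description: str, category: str) -> dict:
    """Single object for POST /v1/objects."""
    return {
        "class": "Product",
        "properties": {"sku": sku, "description": description, "category": category},
    }


def hybrid_query_payload(text: str, category: str,
                         alpha: float = 0.5, limit: int = 5) -> dict:
    """GraphQL body for POST /v1/graphql: hybrid search plus a metadata filter."""
    # json.dumps produces a quoted, escaped string literal for user-supplied text.
    query = (
        "{ Get { Product("
        f"hybrid: {{query: {json.dumps(text)}, alpha: {alpha}}} "
        f'where: {{path: ["category"], operator: Equal, valueText: {json.dumps(category)}}} '
        f"limit: {limit}"
        ") { sku description _additional { score } } } }"
    )
    return {"query": query}


def post(path: str, payload: dict) -> dict:
    """Send a JSON POST request to the instance and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def run_demo() -> dict:
    """Create the class, ingest one object, then run a hybrid query."""
    post("/v1/schema", schema_payload())
    post("/v1/objects", object_payload("SKU-4471", "Replacement water filter", "filters"))
    return post("/v1/graphql", hybrid_query_payload("SKU-4471 filter", "filters"))
```

Calling `run_demo()` requires a reachable instance; the payload builders can be inspected on their own, for example `print(hybrid_query_payload("SKU-4471 filter", "filters")["query"])`. In practice the official SDKs wrap these calls and are usually the more convenient choice.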
Frequently Asked Questions
What is Weaviate?
Weaviate is an open-source vector database that combines vector similarity, keyword matching, and metadata filtering in a single query. It supports retrieval-augmented generation and AI agent workflows, and can be deployed self-hosted on Docker or Kubernetes, on managed cloud tiers, or in an embedded mode for evaluation.
What is hybrid search in Weaviate?
Hybrid search in Weaviate combines vector similarity, BM25 keyword matching, and metadata filtering in a single query without separate ranking passes. This catches exact matches—SKUs, error codes, proper nouns—that pure vector-only systems miss, solving a critical production search problem.
What deployment options does Weaviate offer?
Weaviate supports three deployment models: self-hosted Docker and Kubernetes for data sovereignty in regulated industries, managed cloud with Flex, Professional, and Premium tiers, and embedded evaluation. This flexibility eliminates vendor lock-in and lets organizations choose based on compliance, cost, and operational capacity.
How much does Weaviate cost?
Weaviate offers free self-hosted deployment under the BSD-3 license and managed cloud tiers starting at $45/month (Flex). The Professional tier costs $280/month billed annually with a 99.9% SLA; Premium requires a custom contract. Since the October 2025 restructure, usage-based billing charges approximately $0.01–$0.02 per million vector dimensions.
When is self-hosted Weaviate more cost-effective?
Self-hosted Weaviate becomes cost-effective above 5 million vectors, where infrastructure expenses typically undercut managed competitors like Pinecone. For smaller deployments or teams prioritizing simplicity, managed cloud tiers remain viable. Scale, data sovereignty, and operational capacity determine the optimal choice for your organization.
What are Weaviate's main operational trade-offs?
Self-hosted Weaviate requires operational expertise with a Go-based service (typically run on Docker or Kubernetes) and more infrastructure management than managed competitors. The GraphQL API has a steeper learning curve than REST. Built-in vectorization adds external API calls and latency; teams optimizing for pipeline control may prefer externally managed embeddings.
How Weaviate compares
Direct head-to-head against 3 competitors. Picked by 7wData.
Weaviate
- Pricing: Weaviate offers free self-hosting (BSD-3 license) and a 14-day free cloud sandbox. Managed Cloud tiers start at a $45/month minimum (Flex, shared cloud, 99.5% SLA) with usage-based overage charges (~$0.01–$0.02 per million vector dimensions), followed by Professional ($280/month billed annually, 99.9% SLA, SOC 2) and Premium (custom contract, BYOC, HIPAA). Per-GiB object storage and backup retention are billed on top. The October 2025 restructure replaced the prior AI Units model with dimension-based pricing. Self-hosted remains unmetered and free.
- Target: Teams that want an open-source vector database unifying search, retrieval-augmented generation, and AI agent workflows in a single data layer.
- Deployment: Self-hosted (Docker, Kubernetes) or managed cloud (Flex, Professional, Premium tiers).
- Strength: Hybrid search combines vector + keyword + metadata filtering in a single query, which eliminates separate ranking passes and catches exact matches pure vector systems miss.
- Watch for: Self-hosted setup requires operational expertise (Weaviate is a Go service, typically run on Docker or Kubernetes) and more infrastructure management than managed competitors like Pinecone; the GraphQL API has a steeper learning curve than REST or SQL.
Pinecone
- Pricing: Free (1 index). Builder $20/mo. Standard $50/mo minimum. Enterprise $500/mo minimum. Serverless: $0.0000004 per write unit, $3.60/GB/mo storage.
- Target: Teams building RAG pipelines and semantic search who want zero-ops managed vector storage without provisioning clusters.
- Deployment: Fully managed SaaS only. No self-host option. Serverless default, pod-based indexes are legacy.
- Strength: Serverless architecture scales to zero on idle, sub-second cold start, pre-built connectors for LangChain and OpenAI.
- Watch for: Costs escalate sharply at scale: read-heavy workloads at 100M vectors exceed $700/mo. No self-host path means full vendor lock-in.
MongoDB Atlas Vector Search
- Pricing: Free (M0, 512 MB). Flex pay-as-you-go capped ~$30/mo. Dedicated M10+ from ~$57/mo. Search Nodes billed separately per node-hour.
- Target: Engineering teams already on MongoDB who want to add vector search without running a separate database system.
- Deployment: Fully managed on AWS, GCP, Azure. On-premises not production-supported.
- Strength: Unified cluster handles transactional and vector workloads together, with BM25 hybrid search at no separate product fee.
- Watch for: Performance degrades past 10-20M vectors. Search Node costs are additive and rise steeply with high-dimensional embeddings.
Qdrant
- Pricing: Self-hosted: free (open source). Cloud free tier: $0 (1 GB RAM). Standard: ~$57/mo per GB RAM. Premium and Hybrid: Custom/Contact sales.
- Target: Teams with data-residency requirements or cost pressure at scale who need a self-hostable, Rust-based vector store.
- Deployment: Open-source self-hosted, managed cloud (AWS/GCP/Azure), Hybrid Cloud (your infra, Qdrant-operated).
- Strength: Rust implementation delivers low memory footprint and high throughput on dense and sparse vectors, with resource-based billing avoiding per-query charges.
- Watch for: Premium and Hybrid pricing are sales-gated with no public rates. Free tier is single-node only. RAM-based billing spikes sharply at scale.
Sources
Reporting on this tool draws on these publicly available sources.
- weaviate.io — Current pricing tiers (Flex $45/month, Professional $280/month, Premium), usage-based billing model, October 2025 restructure details.
- docs.weaviate.io — Core features: hybrid search, RAG support, multi-deployment options (Docker, Kubernetes, Cloud), GraphQL/REST APIs, multi-modal support.
- pecollective.com — Weaknesses (setup complexity, learning curve, steeper operational overhead); strengths (hybrid search, self-hosting, cost at scale); comparison to Pinecone trade-offs.
- www.marktechpost.com — Architecture (Java runtime resource consumption, built-in vectorization latency), trade-offs (hybrid search vs pipeline control), October 2025 pricing model, when Weaviate fits best.
- dzone.com — Production deployment differences, infrastructure model trade-offs (Weaviate as infrastructure vs Pinecone as service), operational complexity, self-hosting cost advantages above 5M vectors.