pgvector vs managed vector database for semantic search
Comparing pgvector vs managed vector database for semantic search: setup effort, scaling concerns, filtering support, and fit in a typical app stack.

What problem semantic search solves in a business app
Semantic search helps people find the right answer even when they don't use the "right" keywords. Instead of matching exact words, it matches meaning. Someone who types "reset my login" should still see an article titled "Change your password and sign in again" because the intent is the same.
Keyword search breaks down in business apps because real users are inconsistent. They use shorthand, make typos, mix product names, and describe symptoms instead of official terms. In FAQs, support tickets, policy docs, and onboarding guides, the same issue shows up in many different phrasings. A keyword engine often returns nothing useful, or a long list that forces people to click around.
Embeddings are the usual building block. Your app turns each document (an article, a ticket, a product note) into a vector, a long list of numbers that represents meaning. When a user asks a question, you embed the question too and look for the closest vectors. A "vector database" is simply where you store those vectors and how you search them quickly.
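The core retrieval step can be sketched in a few lines. This is a toy illustration: the 3-dimensional vectors are hand-made stand-ins for real embeddings, which typically have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    # Higher means "closer in meaning" for embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": in a real app these come from an embedding model.
documents = {
    "reset-article": [0.9, 0.1, 0.0],    # "Change your password and sign in again"
    "billing-article": [0.0, 0.2, 0.9],  # "Update your payment method"
}

def nearest(query_vector, docs, top_k=1):
    scored = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

query = [0.8, 0.2, 0.1]  # pretend embedding of "reset my login"
print(nearest(query, documents))  # the password article ranks first
```

Both pgvector and managed vector databases do exactly this job, just with indexes that avoid comparing the query against every stored vector.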
In a typical business stack, semantic search touches four areas: the content store (knowledge base, docs, ticket system), the embedding pipeline (imports plus updates when content changes), the query experience (search box, suggested answers, agent assist), and the guardrails (permissions plus metadata like team, customer, plan, and region).
For most teams, "good enough" beats perfect. The practical target is relevance on the first try, responses under a second, and costs that stay predictable as content grows. That goal matters more than debating tools.
Two common options: pgvector and managed vector databases
Most teams end up choosing between two patterns for semantic search: keep everything inside PostgreSQL with pgvector, or add a separate managed vector database next to your main database. The right choice depends less on "which is better" and more on where you want complexity to live.
pgvector is a PostgreSQL extension that adds a vector data type and indexes so you can store embeddings in a normal table and run similarity search with SQL. In practice, your documents table might include text, metadata (customer_id, status, visibility), plus an embedding column. Search becomes "embed the query, then return rows whose embeddings are closest."
A managed vector database is a hosted service built mainly for embeddings. It usually gives you APIs for inserting vectors and querying by similarity, plus operational features you'd otherwise build yourself.
Both options do the same core job: store embeddings with an ID and metadata, find the nearest neighbors for a query, and return top matches so your app can show relevant items.
The key difference is the system of record. Even if you use a managed vector database, you almost always keep PostgreSQL for business data: accounts, permissions, billing, workflow state, and audit logs. The vector store becomes a retrieval layer, not where you run the whole app.
A common architecture looks like this: keep the authoritative record in Postgres, store embeddings either in Postgres (pgvector) or in the vector service, run a similarity search to get matching IDs, then fetch full rows from Postgres.
If you build apps in a platform like AppMaster, PostgreSQL is already a natural home for structured data and permissions. The question becomes whether embedding search should live there too, or sit in a specialized service while Postgres stays the source of truth.
Setup effort: what you actually have to do
Teams often pick based on features and then get surprised by the day-to-day work. The real decision is where you want the complexity: inside your existing Postgres setup, or in a separate service.
With pgvector, you're adding vector search to the database you already run. The setup is usually straightforward, but it's still database work, not just application code.
A typical pgvector setup includes enabling the extension, adding an embedding column, creating an index that matches your query pattern (and tuning it later), deciding how embeddings update when content changes, and writing similarity queries that also apply your normal filters.
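As a sketch of those steps, the migration statements might look like the following. The SQL is expressed here as string constants; table and column names (`documents`, `embedding`) are hypothetical, while the extension name, `vector(n)` type, `ivfflat` index syntax, and `<=>` cosine distance operator follow pgvector's documented DDL.

```python
# Hypothetical migration for a "documents" table; DDL syntax follows the
# pgvector extension's documentation.
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

ADD_EMBEDDING_COLUMN = """
ALTER TABLE documents
    ADD COLUMN embedding vector(1536);  -- must match your embedding model's dimensions
"""

# IVFFlat index for cosine distance; 'lists' needs tuning as data grows.
CREATE_INDEX = """
CREATE INDEX documents_embedding_idx
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
"""

# Similarity query that also applies normal business filters.
# <=> is pgvector's cosine distance operator (smaller = closer).
SEARCH_QUERY = """
SELECT id, title
FROM documents
WHERE customer_id = %(customer_id)s AND status = 'published'
ORDER BY embedding <=> %(query_embedding)s
LIMIT 10;
"""

for stmt in (ENABLE_EXTENSION, ADD_EMBEDDING_COLUMN, CREATE_INDEX, SEARCH_QUERY):
    print(stmt.strip())
```

Because these are ordinary DDL statements, they slot into whatever migration tooling you already use, which is part of pgvector's appeal.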
With a managed vector database, you create a new system next to your main database. That can mean less SQL, but more integration glue.
A typical managed setup includes creating an index (dimensions and distance metric), wiring API keys into your secrets, building an ingestion job to push embeddings and metadata, keeping a stable ID mapping between app records and vector records, and locking down network access so only your backend can query it.
CI/CD and migrations differ too. pgvector fits naturally into your existing migrations and review process. Managed services shift changes into code plus admin settings, so you'll want a clear process for configuration changes and for reindexing.
Ownership usually follows the choice. pgvector leans on app dev plus whoever owns Postgres (sometimes a DBA). A managed service is often owned by a platform team, with app devs handling ingestion and query logic. That's why this decision is as much about team structure as it is about technology.
Filtering and permissions: the make-or-break detail
Semantic search only helps if it respects what a user is allowed to see. In a real business app, every record has metadata next to the embedding: org_id, user_id, role, status (open, closed), and visibility (public, internal, private). If your search layer can't filter on that metadata cleanly, you'll get confusing results or, worse, data leaks.
The biggest practical difference is filtering before vs after the vector search. Filtering after sounds simple (search everything, then drop forbidden rows), but it fails in two common ways. First, the best matches might get removed, leaving you with worse results. Second, it increases security risk if any part of the pipeline logs, caches, or exposes unfiltered results.
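The first failure mode is easy to show with a toy example: post-filtering a small top-k can leave an allowed user with nothing, while pre-filtering searches only eligible rows. The scores and row data below are made up for illustration.

```python
# (doc_id, org_id, similarity score) — already sorted, higher score = better match.
results_by_score = [
    ("doc-a", "org-2", 0.95),  # best match overall, but wrong org
    ("doc-b", "org-2", 0.90),
    ("doc-c", "org-1", 0.70),  # best match the user may actually see
    ("doc-d", "org-1", 0.60),
]

def post_filter(rows, org, top_k=2):
    # Search everything, take top_k, THEN drop forbidden rows.
    top = rows[:top_k]
    return [doc_id for doc_id, row_org, _ in top if row_org == org]

def pre_filter(rows, org, top_k=2):
    # Restrict to eligible rows first, then take top_k.
    eligible = [r for r in rows if r[1] == org]
    return [doc_id for doc_id, _, _ in eligible[:top_k]]

print(post_filter(results_by_score, "org-1"))  # [] — user sees an empty page
print(pre_filter(results_by_score, "org-1"))   # ['doc-c', 'doc-d']
```

In practice you can compensate by post-filtering a much larger top-k, but that wastes work and still leaves the security exposure described above.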
With pgvector, vectors live in PostgreSQL alongside metadata, so you can apply permissions in the same SQL query and let Postgres enforce them.
PostgreSQL: permissions and joins are native
If your app already uses Postgres, pgvector often wins on simplicity: search can be "just another query." You can join across tickets, customers, and memberships, and you can use Row Level Security so the database itself blocks unauthorized rows.
A common pattern is to narrow the candidate set with org and status filters, then run vector similarity on whatâs left, optionally mixing in keyword matching for exact identifiers.
Managed vector DB: filters vary, permissions are usually on you
Most managed vector databases support metadata filters, but the filter language can be limited, and joins aren't available. You typically denormalize metadata into each vector record and re-implement permission checks in your application.
For hybrid search in business apps, you usually want all of these to work together: hard filters (org, role, status, visibility), keyword match (exact terms like an invoice number), vector similarity (meaning), and ranking rules (boost recent or open items).
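As a sketch, those signals can be combined into a single score once the hard filters have already been applied. The weights below are arbitrary and the candidate data is made up; a real system would tune these against an evaluation set.

```python
from datetime import date

# Toy candidates after hard filters (org, role, visibility) have been applied.
candidates = [
    {"id": "t-1", "vector_score": 0.82, "text": "Invoice INV-1042 overdue",
     "updated": date(2024, 5, 1)},
    {"id": "t-2", "vector_score": 0.91, "text": "Login loop after password reset",
     "updated": date(2023, 1, 10)},
]

def rank(items, query, today):
    scored = []
    for item in items:
        score = item["vector_score"]              # vector similarity (meaning)
        if query.lower() in item["text"].lower():
            score += 0.5                          # exact keyword hit (e.g. an invoice number)
        age_days = (today - item["updated"]).days
        score += max(0.0, 0.2 - age_days / 3650)  # small recency boost
        scored.append((score, item["id"]))
    return [item_id for _, item_id in sorted(scored, reverse=True)]

# An exact identifier outranks a semantically closer but unrelated item:
print(rank(candidates, "INV-1042", date(2024, 6, 1)))
```

The point of the sketch is the structure, not the weights: hard filters gate what is searchable at all, and keyword, similarity, and recency signals only reorder what survives.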
Example: a support portal built in AppMaster can keep tickets and permissions in PostgreSQL, making it straightforward to enforce âagent sees only their orgâ while still getting semantic matches on ticket summaries and replies.
Search quality and performance basics
Search quality is the mix of relevance (are results actually useful?) and speed (does it feel instant?). With both pgvector and a managed vector database, you usually trade a bit of relevance for lower latency by using approximate search. That tradeoff is often fine for business apps, as long as you measure it with real queries.
At a high level, you tune three things: the embedding model (what "meaning" looks like), the index settings (how hard the engine searches), and the ranking layer (how results are ordered once you add filters, recency, or popularity).
In PostgreSQL with pgvector, you typically pick an index like IVFFlat or HNSW. IVFFlat is faster and lighter to build, but you need to tune how many "lists" it uses and you generally want enough data before it shines. HNSW often gives better recall at low latency, but it can use more memory and takes longer to build. Managed systems expose similar choices, just with different names and defaults.
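The two index types and their query-time knobs look like this in DDL form. The statements are shown as string constants; the table name is hypothetical, while the index syntax and the `ivfflat.probes` / `hnsw.ef_search` settings follow pgvector's documentation.

```python
# Hypothetical DDL comparing the two pgvector index types.
IVFFLAT = """
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- rule of thumb: scale lists with row count, then tune
"""

HNSW = """
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);  -- better recall, more memory, slower build
"""

# Query-time knobs: how hard the engine searches on each query.
# Raising them improves recall at the cost of latency.
IVFFLAT_TUNING = "SET ivfflat.probes = 10;"
HNSW_TUNING = "SET hnsw.ef_search = 40;"
```

Because these knobs can be set per session or per query, a common pattern is a cheap default for typeahead suggestions and a higher-recall setting for a full results page.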
A few tactics matter more than people expect: cache popular queries, batch work where you can (for example, prefetch the next page), and consider a two-stage flow where you do a fast vector search and then rerank the top 20 to 100 with business signals like recency or customer tier. Also watch network hops. If search lives in a separate service, every query is another round trip.
To measure quality, start small and concrete. Collect 20 to 50 real user questions, define what a "good" answer looks like, and track top 3 and top 10 hit rate, median and p95 latency, percent of queries with no good result, and how much quality drops once permissions and filters apply.
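Those metrics need very little machinery. A minimal sketch, with a made-up labeled set and made-up latency samples:

```python
# Toy evaluation set: each query has the doc_id a human marked as "good".
labeled = [
    {"query": "reset my login", "expected": "doc-pw",
     "results": ["doc-pw", "doc-2fa", "doc-bill"]},
    {"query": "invoice overdue", "expected": "doc-late",
     "results": ["doc-bill", "doc-late", "doc-tax"]},
    {"query": "export data", "expected": "doc-exp",
     "results": ["doc-api", "doc-csv", "doc-gdpr"]},
]

def hit_rate(rows, k):
    # Fraction of queries whose expected answer appears in the top k.
    hits = sum(1 for r in rows if r["expected"] in r["results"][:k])
    return hits / len(rows)

def percentile(values, p):
    # Nearest-rank percentile; fine for a small sample like this.
    ordered = sorted(values)
    index = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[index]

latencies_ms = [120, 95, 480, 110, 130]

print(f"top-3 hit rate: {hit_rate(labeled, 3):.2f}")
print(f"p95 latency: {percentile(latencies_ms, 95)} ms")
print(f"no-good-result rate: {1 - hit_rate(labeled, 3):.2f}")
```

Run the same script against the same labeled set every time you change chunking, embeddings, or index settings, and regressions stop being a matter of opinion.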
This is where the choice stops being theoretical. The best option is the one that hits your relevance target at a latency users accept, with tuning you can actually maintain.
Scaling concerns and ongoing operations
Many teams start with pgvector because it keeps everything in one place: app data and embeddings. For a lot of business apps, a single PostgreSQL node is enough, especially if you have tens of thousands to a few hundred thousand vectors and search isn't the top traffic driver.
You usually hit limits when semantic search becomes a core user action (on most pages, in every ticket, in chat), or when you store millions of vectors and need tight response times during peak hours.
Common signs a single Postgres setup is straining include p95 search latency jumping during normal write activity, having to choose between fast indexes and acceptable write speed, maintenance tasks turning into "schedule it at night" events, and needing different scaling for search than for the rest of the database.
With pgvector, scaling often means adding read replicas for query load, partitioning tables, tuning indexes, and planning around index builds and storage growth. It's doable, but it becomes ongoing work. You also face design choices like keeping embeddings in the same table as business data versus separating them to reduce bloat and lock contention.
Managed vector databases shift much of this to the vendor. They often offer independent scaling of compute and storage, built-in sharding, and simpler high availability. The tradeoff is operating two systems (Postgres plus the vector store) and keeping metadata and permissions in sync.
Cost tends to surprise teams more than performance. The big drivers are storage (vectors plus indexes grow quickly), peak query volume (often what sets the bill), update frequency (re-embedding and upserts), and data movement (extra calls when your app needs heavy filtering).
If you're deciding between pgvector and a managed service, pick which pain you prefer: deeper Postgres tuning and capacity planning, or paying more for easier scaling while managing another dependency.
Security, compliance, and reliability questions to ask
Security details often decide the choice faster than speed benchmarks do. Ask early where data will live, who can see it, and what happens during an outage.
Start with data residency and access. Embeddings can still leak meaning, and many teams also store raw snippets for highlighting. Be clear about which system will hold raw text (tickets, notes, documents) versus only embeddings. Also decide who inside your company can query the store directly, and whether you need strict separation between production and analytics access.
Controls to confirm before you build
Ask these questions for either option:
- How is data encrypted at rest and in transit, and can you manage your own keys?
- What's the backup plan, how often are restores tested, and what recovery time target do you need?
- Do you get audit logs for reads and writes, and can you alert on unusual query volume?
- How do you enforce multi-tenant isolation: separate databases, separate schemas, or row-level rules?
- What's the retention policy for deleted content, including embeddings and caches?
Multi-tenant separation is the one that trips people up. If one customer must never influence another, you need strong tenant scoping in every query. With PostgreSQL, this can be enforced with row-level security and careful query patterns. With a managed vector database, you often rely on namespaces or collections plus application logic.
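A minimal sketch of the PostgreSQL side of that, again as SQL string constants. The table name and the `app.current_org` setting are hypothetical; the `ENABLE ROW LEVEL SECURITY` and `CREATE POLICY` syntax follows PostgreSQL's documented RLS DDL.

```python
# Hypothetical row-level security setup for tenant isolation.
ENABLE_RLS = "ALTER TABLE tickets ENABLE ROW LEVEL SECURITY;"

# Every query on tickets — including similarity searches — only sees rows
# whose org_id matches the tenant the backend set for this session.
TENANT_POLICY = """
CREATE POLICY tenant_isolation ON tickets
    USING (org_id = current_setting('app.current_org')::bigint);
"""

# The backend sets the tenant per connection or transaction before querying:
SET_TENANT = "SET app.current_org = '42';"

for stmt in (ENABLE_RLS, TENANT_POLICY, SET_TENANT):
    print(stmt.strip())
```

The appeal of this pattern is that the tenant check cannot be forgotten by one code path: the database applies the policy to every query, vector searches included.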
Reliability and failure modes
Plan for search downtime. If the vector store is down, what will users see? A safe default is to fall back to keyword search, or show recent items only, instead of breaking the page.
Example: in a support portal built with AppMaster, you might keep tickets in PostgreSQL and treat semantic search as an optional feature. If embeddings fail to load, the portal can still show ticket lists and allow exact keyword search while you recover the vector service.
Step-by-step: how to choose with a low-risk pilot
The safest way to decide is to run a small pilot that looks like your real app, not a demo notebook.
Start by writing down what you're searching and what must be filtered. "Search our docs" is vague. "Search help articles, ticket replies, and PDF manuals, but only show items the user is allowed to see" is a real requirement. Permissions, tenant ID, language, product area, and "only published content" filters often decide the winner.
Next, pick an embedding model and a refresh plan. Decide what gets embedded (full document, chunks, or both) and how often it updates (on every edit, nightly, or on publish). If content changes often, measure how painful re-embedding is, not just how fast queries are.
Then build a thin search API in your backend. Keep it boring: one endpoint that takes a query plus filter fields, returns top results, and logs what happened. If you're building with AppMaster, you can implement the ingestion and update flow as a backend service plus a Business Process that calls your embedding provider, writes vectors and metadata, and enforces access rules.
Run a two-week pilot with real users and real tasks. Use a handful of common questions users actually ask, track "found answer" rate and time to first useful result, review bad results weekly, watch re-embedding volume and query load, and test failure modes like missing metadata or stale vectors.
At the end, decide based on evidence. Keep pgvector if it meets quality and filtering needs with acceptable ops work. Switch to managed if scaling and reliability dominate. Or run a hybrid setup (PostgreSQL for metadata and permissions, vector store for retrieval) if that fits your stack.
Common mistakes teams run into
Most mistakes show up after the first demo works. A quick proof of concept can look great, then fall apart when you add real users, real data, and real rules.
The issues that most often cause rework are consistent:
- Assuming vectors handle access control. Similarity search doesn't know who is allowed to see what. If your app has roles, teams, tenants, or private notes, you still need strict permission filters and tests so search never leaks restricted content.
- Trusting "feels good" demos. A few hand-picked queries aren't an evaluation. Without a small labeled set of questions and expected results, regressions are hard to catch when you change chunking, embeddings, or indexes.
- Embedding whole documents as a single vector. Large pages, tickets, and PDFs usually need chunking. Without chunks, results get vague. Without versioning, you can't tell which embedding matches which revision.
- Ignoring updates and deletes. Real apps edit and remove content. If you don't re-embed on update and clean up on delete, you'll serve stale matches that point to missing or outdated text.
- Over-tuning performance before nailing UX. Teams spend days on index settings while skipping basics like metadata filters, good snippets, and a keyword fallback when the query is very specific.
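The chunking point above can be sketched simply. This splits text into overlapping word windows and tags each chunk with the source ID and a version number, so stale embeddings can be traced back to a revision later. The window and overlap sizes are arbitrary; real systems often chunk by sentences, headings, or tokens instead of words.

```python
def chunk(doc_id, version, text, size=50, overlap=10):
    """Split text into overlapping word windows, one embedding per chunk."""
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, max(len(words), 1), step):
        piece = " ".join(words[start:start + size])
        if piece:
            chunks.append({
                "source_id": doc_id,
                "version": version,  # lets you match embeddings to revisions
                "offset": start,
                "text": piece,
            })
    return chunks

# A 120-word document becomes three overlapping chunks:
parts = chunk("kb-17", 3, "word " * 120, size=50, overlap=10)
print(len(parts), parts[0]["source_id"], parts[-1]["offset"])
```

The overlap means an idea that straddles a window boundary still lands intact in at least one chunk, and the `(source_id, version, offset)` triple is what makes the update-and-delete cleanup in the next bullet possible.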
A simple "day-2" test catches these early: add a new permission rule, update 20 items, delete 5, then ask the same 10 evaluation questions again. If you're building on a platform like AppMaster, plan these checks alongside your business logic and database model, not as an afterthought.
Example scenario: semantic search in a support portal
A mid-size SaaS company runs a support portal with two main content types: customer tickets and help center articles. They want a search box that understands meaning, so typing "can't log in after changing phone" surfaces the right article and similar past tickets.
The non-negotiables are practical: each customer must only see their own tickets, agents need to filter by status (open, pending, solved), and results should feel instant because suggestions show as the user types.
Option A: pgvector inside the same PostgreSQL
If the portal already stores tickets and articles in PostgreSQL (common if you build on a stack that includes Postgres, like AppMaster), adding pgvector can be a clean first move. You keep embeddings, metadata, and permissions in one place, so "only tickets for customer_123" is just a normal WHERE clause.
This tends to work well when your dataset is modest (tens or hundreds of thousands of items), your team is comfortable tuning Postgres indexes and query plans, and you want fewer moving parts with simpler access control.
The tradeoff is that vector search can compete with transactional workload. As usage grows, you may need extra capacity, careful indexing, or even a separate Postgres instance to protect ticket writes and SLAs.
Option B: managed vector DB for embeddings, PostgreSQL for metadata
With a managed vector database, you typically store embeddings and an ID there, then keep the "truth" (ticket status, customer_id, permissions) in PostgreSQL. In practice, teams either filter in Postgres first and then search eligible IDs, or they search first and re-check permissions before showing results.
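The first pattern (filter in Postgres, then search only eligible IDs) can be sketched with stand-in data structures. Real code would query Postgres and then call the vector service's query API, which typically accepts an ID allowlist or a metadata filter; everything below is made up for illustration.

```python
# Stand-ins for the two systems: Postgres holds the truth, the vector
# store holds similarity scores for this query.
POSTGRES_TICKETS = {
    "t-1": {"customer_id": "c-1", "status": "open"},
    "t-2": {"customer_id": "c-2", "status": "open"},
    "t-3": {"customer_id": "c-1", "status": "solved"},
}
VECTOR_SCORES = {"t-1": 0.71, "t-2": 0.95, "t-3": 0.88}  # pretend similarity

def eligible_ids(customer_id):
    # Step 1: permissions live in Postgres — filter there first.
    return {tid for tid, row in POSTGRES_TICKETS.items()
            if row["customer_id"] == customer_id}

def search(allowed_ids, top_k=2):
    # Step 2: similarity search restricted to the eligible set.
    ranked = sorted(allowed_ids, key=lambda tid: VECTOR_SCORES[tid], reverse=True)
    return ranked[:top_k]

allowed = eligible_ids("c-1")
print(search(allowed))  # t-3 then t-1; t-2 never enters the search
```

Note that the other customer's best-scoring ticket never reaches the vector query at all, which is exactly the property you lose if you search first and filter afterward.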
This option often wins when growth is uncertain or the team doesn't want to spend time nursing performance. But the permission flow needs real care, or you risk leaking results across customers.
A practical call is to start with pgvector if you need tight filtering and simple ops now, and plan for a managed vector database if you expect rapid growth, heavy query volume, or can't afford search to slow down your core database.
Quick checklist and next steps
If you're stuck, stop debating features and write down what your app must do on day one. The real requirements usually show up during a small pilot with real users and real data.
These questions usually decide the winner faster than benchmarks:
- What filters are non-negotiable (tenant, role, region, status, time range)?
- How big will the index get in 6 to 12 months (items and embeddings)?
- What latency feels instant for your users, including at peak?
- Who owns the budget and on-call responsibility?
- Where should the source of truth live: PostgreSQL tables or an external index?
Also plan for change. Embeddings aren't "set and forget." Text changes, models change, and relevance drifts until someone complains. Decide upfront how you'll handle updates, how you'll detect drift, and what you'll monitor (query latency, error rate, recall on a small test set, and "no results" searches).
If you want to move fast on the full business app around search, AppMaster (appmaster.io) can be a practical fit: it gives you PostgreSQL data modeling, backend logic, and web or mobile UI in one no-code workflow, and you can add semantic search as an iteration once the core app and permissions are in place.
FAQ
What does semantic search do that keyword search doesn't?
Semantic search returns useful results even when the user's words don't match the document's exact wording. It's especially helpful when people use typos, shorthand, or describe symptoms instead of official terms, which is common in support portals, internal tools, and knowledge bases.
When should I use pgvector?
Use pgvector when you want fewer moving parts, tight SQL-based filtering, and your dataset and traffic are still modest. It's often the fastest path to a secure, permission-aware search because vectors and metadata live in the same PostgreSQL queries you already trust.
When is a managed vector database the better fit?
A managed vector database is a good fit when you expect rapid growth in vectors or query volume, or you want search scaling and availability handled outside your main database. You'll gain simpler scaling operations in exchange for extra integration work and careful permission handling.
What are embeddings, and what does a vector database do?
Embedding is the process of turning text into a numeric vector that represents meaning. A vector database (or pgvector in PostgreSQL) stores those vectors and can quickly find the closest ones to a user's embedded query, which is how you get "similar by intent" results.
Why is filtering after the vector search risky?
Filtering after the vector search often removes the best matches and can leave users with worse results or empty pages. It also increases the risk of accidental exposure through logs, caches, or debugging, so it's safer to apply tenant and role filters as early as possible.
How does pgvector handle permissions?
With pgvector, you can apply permissions, joins, and Row Level Security in the same SQL query that does similarity search. That makes it easier to guarantee "never show forbidden rows," because PostgreSQL enforces it where the data lives.
How do managed vector databases handle filtering and permissions?
Most managed vector databases support metadata filters, but they typically don't support joins, and the filter language can be limited. You usually end up denormalizing permission-related metadata into each vector record and enforcing the final authorization checks in your application.
What is chunking, and do I need it?
Chunking is splitting large documents into smaller parts before embedding, which usually improves precision because each vector represents a focused idea. Full-document embeddings can work for short items, but long tickets, policies, and PDFs often become vague unless you chunk and track versions.
How should I handle content updates and deletes?
Plan updates from day one: re-embed on publish or edit for content that changes often, and always remove vectors when the source record is deleted. If you skip this, you'll serve stale results that point to outdated text or missing records.
How do I run a pilot that actually settles the choice?
A practical pilot uses real queries and strict filters, measures relevance and latency, and tests failure cases like missing metadata or stale vectors. Pick the option that reliably returns good top results under your permission rules, with costs and operational work your team can keep up with.


