pgvector vs managed vector database for semantic search
Comparing pgvector vs managed vector database for semantic search: setup effort, scaling concerns, filtering support, and fit in a typical app stack.

What problem semantic search solves in a business app
Semantic search helps people find the right answer even when they don't use the "right" keywords. Instead of matching exact words, it matches meaning. Someone who types "reset my login" should still see an article titled "Change your password and sign in again" because the intent is the same.
Keyword search breaks down in business apps because real users are inconsistent. They use shorthand, make typos, mix product names, and describe symptoms instead of official terms. In FAQs, support tickets, policy docs, and onboarding guides, the same issue shows up in many different phrasings. A keyword engine often returns nothing useful, or a long list that forces people to click around.
Embeddings are the usual building block. Your app turns each document (an article, a ticket, a product note) into a vector, a long list of numbers that represents meaning. When a user asks a question, you embed the question too and look for the closest vectors. A "vector database" is simply where you store those vectors and how you search them quickly.
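The core retrieval step can be sketched in a few lines. This is a toy illustration: the 3-dimensional vectors are hand-made stand-ins for real embeddings, which typically have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    # Higher means "closer in meaning" for embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": in a real app these come from an embedding model.
documents = {
    "reset-article": [0.9, 0.1, 0.0],    # "Change your password and sign in again"
    "billing-article": [0.0, 0.2, 0.9],  # "Update your payment method"
}

def nearest(query_vector, docs, top_k=1):
    scored = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

query = [0.8, 0.2, 0.1]  # pretend embedding of "reset my login"
print(nearest(query, documents))  # the password article ranks first
```

Both pgvector and managed vector databases do exactly this job, just with indexes that avoid comparing the query against every stored vector.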
In a typical business stack, semantic search touches four areas: the content store (knowledge base, docs, ticket system), the embedding pipeline (imports plus updates when content changes), the query experience (search box, suggested answers, agent assist), and the guardrails (permissions plus metadata like team, customer, plan, and region).
For most teams, "good enough" beats perfect. The practical target is relevance on the first try, responses under a second, and costs that stay predictable as content grows. That goal matters more than debating tools.
Two common options: pgvector and managed vector databases
Most teams end up choosing between two patterns for semantic search: keep everything inside PostgreSQL with pgvector, or add a separate managed vector database next to your main database. The right choice depends less on "which is better" and more on where you want complexity to live.
pgvector is a PostgreSQL extension that adds a vector data type and indexes so you can store embeddings in a normal table and run similarity search with SQL. In practice, your documents table might include text, metadata (customer_id, status, visibility), plus an embedding column. Search becomes "embed the query, then return rows whose embeddings are closest."
A managed vector database is a hosted service built mainly for embeddings. It usually gives you APIs for inserting vectors and querying by similarity, plus operational features you'd otherwise build yourself.
Both options do the same core job: store embeddings with an ID and metadata, find the nearest neighbors for a query, and return top matches so your app can show relevant items.
The key difference is the system of record. Even if you use a managed vector database, you almost always keep PostgreSQL for business data: accounts, permissions, billing, workflow state, and audit logs. The vector store becomes a retrieval layer, not where you run the whole app.
A common architecture looks like this: keep the authoritative record in Postgres, store embeddings either in Postgres (pgvector) or in the vector service, run a similarity search to get matching IDs, then fetch full rows from Postgres.
If you build apps in a platform like AppMaster, PostgreSQL is already a natural home for structured data and permissions. The question becomes whether embedding search should live there too, or sit in a specialized service while Postgres stays the source of truth.
Setup effort: what you actually have to do
Teams often pick based on features and then get surprised by the day-to-day work. The real decision is where you want the complexity: inside your existing Postgres setup, or in a separate service.
With pgvector, you're adding vector search to the database you already run. The setup is usually straightforward, but it's still database work, not just application code.
A typical pgvector setup includes enabling the extension, adding an embedding column, creating an index that matches your query pattern (and tuning it later), deciding how embeddings update when content changes, and writing similarity queries that also apply your normal filters.
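As a sketch of those steps, the migration statements might look like the following. The SQL is expressed here as string constants; table and column names (`documents`, `embedding`) are hypothetical, while the extension name, `vector(n)` type, `ivfflat` index syntax, and `<=>` cosine distance operator follow pgvector's documented DDL.

```python
# Hypothetical migration for a "documents" table; DDL syntax follows the
# pgvector extension's documentation.
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

ADD_EMBEDDING_COLUMN = """
ALTER TABLE documents
    ADD COLUMN embedding vector(1536);  -- must match your embedding model's dimensions
"""

# IVFFlat index for cosine distance; 'lists' needs tuning as data grows.
CREATE_INDEX = """
CREATE INDEX documents_embedding_idx
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
"""

# Similarity query that also applies normal business filters.
# <=> is pgvector's cosine distance operator (smaller = closer).
SEARCH_QUERY = """
SELECT id, title
FROM documents
WHERE customer_id = %(customer_id)s AND status = 'published'
ORDER BY embedding <=> %(query_embedding)s
LIMIT 10;
"""

for stmt in (ENABLE_EXTENSION, ADD_EMBEDDING_COLUMN, CREATE_INDEX, SEARCH_QUERY):
    print(stmt.strip())
```

Because these are ordinary DDL statements, they slot into whatever migration tooling you already use, which is part of pgvector's appeal.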
With a managed vector database, you create a new system next to your main database. That can mean less SQL, but more integration glue.
A typical managed setup includes creating an index (dimensions and distance metric), wiring API keys into your secrets, building an ingestion job to push embeddings and metadata, keeping a stable ID mapping between app records and vector records, and locking down network access so only your backend can query it.
CI/CD and migrations differ too. pgvector fits naturally into your existing migrations and review process. Managed services shift changes into code plus admin settings, so you'll want a clear process for configuration changes and for reindexing.
Ownership usually follows the choice. pgvector leans on app dev plus whoever owns Postgres (sometimes a DBA). A managed service is often owned by a platform team, with app devs handling ingestion and query logic. That's why this decision is as much about team structure as it is about technology.
Filtering and permissions: the make-or-break detail
Semantic search only helps if it respects what a user is allowed to see. In a real business app, every record has metadata next to the embedding: org_id, user_id, role, status (open, closed), and visibility (public, internal, private). If your search layer can't filter on that metadata cleanly, you'll get confusing results or, worse, data leaks.
The biggest practical difference is filtering before vs after the vector search. Filtering after sounds simple (search everything, then drop forbidden rows), but it fails in two common ways. First, the best matches might get removed, leaving you with worse results. Second, it increases security risk if any part of the pipeline logs, caches, or exposes unfiltered results.
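The first failure mode is easy to show with a toy example: post-filtering a small top-k can leave an allowed user with nothing, while pre-filtering searches only eligible rows. The scores and row data below are made up for illustration.

```python
# (doc_id, org_id, similarity score) — already sorted, higher score = better match.
results_by_score = [
    ("doc-a", "org-2", 0.95),  # best match overall, but wrong org
    ("doc-b", "org-2", 0.90),
    ("doc-c", "org-1", 0.70),  # best match the user may actually see
    ("doc-d", "org-1", 0.60),
]

def post_filter(rows, org, top_k=2):
    # Search everything, take top_k, THEN drop forbidden rows.
    top = rows[:top_k]
    return [doc_id for doc_id, row_org, _ in top if row_org == org]

def pre_filter(rows, org, top_k=2):
    # Restrict to eligible rows first, then take top_k.
    eligible = [r for r in rows if r[1] == org]
    return [doc_id for doc_id, _, _ in eligible[:top_k]]

print(post_filter(results_by_score, "org-1"))  # [] — user sees an empty page
print(pre_filter(results_by_score, "org-1"))   # ['doc-c', 'doc-d']
```

In practice you can compensate by post-filtering a much larger top-k, but that wastes work and still leaves the security exposure described above.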
With pgvector, vectors live in PostgreSQL alongside metadata, so you can apply permissions in the same SQL query and let Postgres enforce them.
PostgreSQL: permissions and joins are native
If your app already uses Postgres, pgvector often wins on simplicity: search can be "just another query." You can join across tickets, customers, and memberships, and you can use Row Level Security so the database itself blocks unauthorized rows.
A common pattern is to narrow the candidate set with org and status filters, then run vector similarity on whatâs left, optionally mixing in keyword matching for exact identifiers.
Managed vector DB: filters vary, permissions are usually on you
Most managed vector databases support metadata filters, but the filter language can be limited, and joins aren't available. You typically denormalize metadata into each vector record and re-implement permission checks in your application.
For hybrid search in business apps, you usually want all of these to work together: hard filters (org, role, status, visibility), keyword match (exact terms like an invoice number), vector similarity (meaning), and ranking rules (boost recent or open items).
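As a sketch, those signals can be combined into a single score once the hard filters have already been applied. The weights below are arbitrary and the candidate data is made up; a real system would tune these against an evaluation set.

```python
from datetime import date

# Toy candidates after hard filters (org, role, visibility) have been applied.
candidates = [
    {"id": "t-1", "vector_score": 0.82, "text": "Invoice INV-1042 overdue",
     "updated": date(2024, 5, 1)},
    {"id": "t-2", "vector_score": 0.91, "text": "Login loop after password reset",
     "updated": date(2023, 1, 10)},
]

def rank(items, query, today):
    scored = []
    for item in items:
        score = item["vector_score"]              # vector similarity (meaning)
        if query.lower() in item["text"].lower():
            score += 0.5                          # exact keyword hit (e.g. an invoice number)
        age_days = (today - item["updated"]).days
        score += max(0.0, 0.2 - age_days / 3650)  # small recency boost
        scored.append((score, item["id"]))
    return [item_id for _, item_id in sorted(scored, reverse=True)]

# An exact identifier outranks a semantically closer but unrelated item:
print(rank(candidates, "INV-1042", date(2024, 6, 1)))
```

The point of the sketch is the structure, not the weights: hard filters gate what is searchable at all, and keyword, similarity, and recency signals only reorder what survives.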
Example: a support portal built in AppMaster can keep tickets and permissions in PostgreSQL, making it straightforward to enforce âagent sees only their orgâ while still getting semantic matches on ticket summaries and replies.
Search quality and performance basics
Search quality is the mix of relevance (are results actually useful?) and speed (does it feel instant?). With both pgvector and a managed vector database, you usually trade a bit of relevance for lower latency by using approximate search. That tradeoff is often fine for business apps, as long as you measure it with real queries.
At a high level, you tune three things: the embedding model (what "meaning" looks like), the index settings (how hard the engine searches), and the ranking layer (how results are ordered once you add filters, recency, or popularity).
In PostgreSQL with pgvector, you typically pick an index like IVFFlat or HNSW. IVFFlat is faster and lighter to build, but you need to tune how many "lists" it uses and you generally want enough data before it shines. HNSW often gives better recall at low latency, but it can use more memory and takes longer to build. Managed systems expose similar choices, just with different names and defaults.
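The two index types and their query-time knobs look like this in DDL form. The statements are shown as string constants; the table name is hypothetical, while the index syntax and the `ivfflat.probes` / `hnsw.ef_search` settings follow pgvector's documentation.

```python
# Hypothetical DDL comparing the two pgvector index types.
IVFFLAT = """
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- rule of thumb: scale lists with row count, then tune
"""

HNSW = """
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);  -- better recall, more memory, slower build
"""

# Query-time knobs: how hard the engine searches on each query.
# Raising them improves recall at the cost of latency.
IVFFLAT_TUNING = "SET ivfflat.probes = 10;"
HNSW_TUNING = "SET hnsw.ef_search = 40;"
```

Because these knobs can be set per session or per query, a common pattern is a cheap default for typeahead suggestions and a higher-recall setting for a full results page.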
A few tactics matter more than people expect: cache popular queries, batch work where you can (for example, prefetch the next page), and consider a two-stage flow where you do a fast vector search and then rerank the top 20 to 100 with business signals like recency or customer tier. Also watch network hops. If search lives in a separate service, every query is another round trip.
To measure quality, start small and concrete. Collect 20 to 50 real user questions, define what a "good" answer looks like, and track top 3 and top 10 hit rate, median and p95 latency, percent of queries with no good result, and how much quality drops once permissions and filters apply.
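Those metrics need very little machinery. A minimal sketch, with a made-up labeled set and made-up latency samples:

```python
# Toy evaluation set: each query has the doc_id a human marked as "good".
labeled = [
    {"query": "reset my login", "expected": "doc-pw",
     "results": ["doc-pw", "doc-2fa", "doc-bill"]},
    {"query": "invoice overdue", "expected": "doc-late",
     "results": ["doc-bill", "doc-late", "doc-tax"]},
    {"query": "export data", "expected": "doc-exp",
     "results": ["doc-api", "doc-csv", "doc-gdpr"]},
]

def hit_rate(rows, k):
    # Fraction of queries whose expected answer appears in the top k.
    hits = sum(1 for r in rows if r["expected"] in r["results"][:k])
    return hits / len(rows)

def percentile(values, p):
    # Nearest-rank percentile; fine for a small sample like this.
    ordered = sorted(values)
    index = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[index]

latencies_ms = [120, 95, 480, 110, 130]

print(f"top-3 hit rate: {hit_rate(labeled, 3):.2f}")
print(f"p95 latency: {percentile(latencies_ms, 95)} ms")
print(f"no-good-result rate: {1 - hit_rate(labeled, 3):.2f}")
```

Run the same script against the same labeled set every time you change chunking, embeddings, or index settings, and regressions stop being a matter of opinion.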
This is where the choice stops being theoretical. The best option is the one that hits your relevance target at a latency users accept, with tuning you can actually maintain.
Scaling concerns and ongoing operations
Many teams start with pgvector because it keeps everything in one place: app data and embeddings. For a lot of business apps, a single PostgreSQL node is enough, especially if you have tens of thousands to a few hundred thousand vectors and search isn't the top traffic driver.
You usually hit limits when semantic search becomes a core user action (on most pages, in every ticket, in chat), or when you store millions of vectors and need tight response times during peak hours.
Common signs a single Postgres setup is straining include p95 search latency jumping during normal write activity, having to choose between fast indexes and acceptable write speed, maintenance tasks turning into "schedule it at night" events, and needing different scaling for search than for the rest of the database.
With pgvector, scaling often means adding read replicas for query load, partitioning tables, tuning indexes, and planning around index builds and storage growth. It's doable, but it becomes ongoing work. You also face design choices like keeping embeddings in the same table as business data versus separating them to reduce bloat and lock contention.
Managed vector databases shift much of this to the vendor. They often offer independent scaling of compute and storage, built-in sharding, and simpler high availability. The tradeoff is operating two systems (Postgres plus the vector store) and keeping metadata and permissions in sync.
Cost tends to surprise teams more than performance. The big drivers are storage (vectors plus indexes grow quickly), peak query volume (often what sets the bill), update frequency (re-embedding and upserts), and data movement (extra calls when your app needs heavy filtering).
If you're deciding between pgvector and a managed service, pick which pain you prefer: deeper Postgres tuning and capacity planning, or paying more for easier scaling while managing another dependency.
Security, compliance, and reliability questions to ask
Security details often decide the choice faster than speed benchmarks do. Ask early where data will live, who can see it, and what happens during an outage.
Start with data residency and access. Embeddings can still leak meaning, and many teams also store raw snippets for highlighting. Be clear about which system will hold raw text (tickets, notes, documents) versus only embeddings. Also decide who inside your company can query the store directly, and whether you need strict separation between production and analytics access.
Controls to confirm before you build
Ask these questions for either option:
- How is data encrypted at rest and in transit, and can you manage your own keys?
- What's the backup plan, how often are restores tested, and what recovery time target do you need?
- Do you get audit logs for reads and writes, and can you alert on unusual query volume?
- How do you enforce multi-tenant isolation: separate databases, separate schemas, or row-level rules?
- What's the retention policy for deleted content, including embeddings and caches?
Multi-tenant separation is the one that trips people up. If one customer must never influence another, you need strong tenant scoping in every query. With PostgreSQL, this can be enforced with row-level security and careful query patterns. With a managed vector database, you often rely on namespaces or collections plus application logic.
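A minimal sketch of the PostgreSQL side of that, again as SQL string constants. The table name and the `app.current_org` setting are hypothetical; the `ENABLE ROW LEVEL SECURITY` and `CREATE POLICY` syntax follows PostgreSQL's documented RLS DDL.

```python
# Hypothetical row-level security setup for tenant isolation.
ENABLE_RLS = "ALTER TABLE tickets ENABLE ROW LEVEL SECURITY;"

# Every query on tickets — including similarity searches — only sees rows
# whose org_id matches the tenant the backend set for this session.
TENANT_POLICY = """
CREATE POLICY tenant_isolation ON tickets
    USING (org_id = current_setting('app.current_org')::bigint);
"""

# The backend sets the tenant per connection or transaction before querying:
SET_TENANT = "SET app.current_org = '42';"

for stmt in (ENABLE_RLS, TENANT_POLICY, SET_TENANT):
    print(stmt.strip())
```

The appeal of this pattern is that the tenant check cannot be forgotten by one code path: the database applies the policy to every query, vector searches included.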
Reliability and failure modes
Plan for search downtime. If the vector store is down, what will users see? A safe default is to fall back to keyword search, or show recent items only, instead of breaking the page.
Example: in a support portal built with AppMaster, you might keep tickets in PostgreSQL and treat semantic search as an optional feature. If embeddings fail to load, the portal can still show ticket lists and allow exact keyword search while you recover the vector service.
Step-by-step: how to choose with a low-risk pilot
The safest way to decide is to run a small pilot that looks like your real app, not a demo notebook.
Start by writing down what you're searching and what must be filtered. "Search our docs" is vague. "Search help articles, ticket replies, and PDF manuals, but only show items the user is allowed to see" is a real requirement. Permissions, tenant ID, language, product area, and "only published content" filters often decide the winner.
Next, pick an embedding model and a refresh plan. Decide what gets embedded (full document, chunks, or both) and how often it updates (on every edit, nightly, or on publish). If content changes often, measure how painful re-embedding is, not just how fast queries are.
Then build a thin search API in your backend. Keep it boring: one endpoint that takes a query plus filter fields, returns top results, and logs what happened. If you're building with AppMaster, you can implement the ingestion and update flow as a backend service plus a Business Process that calls your embedding provider, writes vectors and metadata, and enforces access rules.
Run a two-week pilot with real users and real tasks. Use a handful of common questions users actually ask, track "found answer" rate and time to first useful result, review bad results weekly, watch re-embedding volume and query load, and test failure modes like missing metadata or stale vectors.
At the end, decide based on evidence. Keep pgvector if it meets quality and filtering needs with acceptable ops work. Switch to managed if scaling and reliability dominate. Or run a hybrid setup (PostgreSQL for metadata and permissions, vector store for retrieval) if that fits your stack.
Common mistakes teams run into
Most mistakes show up after the first demo works. A quick proof of concept can look great, then fall apart when you add real users, real data, and real rules.
The issues that most often cause rework are consistent:
- Assuming vectors handle access control. Similarity search doesn't know who is allowed to see what. If your app has roles, teams, tenants, or private notes, you still need strict permission filters and tests so search never leaks restricted content.
- Trusting "feels good" demos. A few hand-picked queries aren't an evaluation. Without a small labeled set of questions and expected results, regressions are hard to catch when you change chunking, embeddings, or indexes.
- Embedding whole documents as a single vector. Large pages, tickets, and PDFs usually need chunking. Without chunks, results get vague. Without versioning, you can't tell which embedding matches which revision.
- Ignoring updates and deletes. Real apps edit and remove content. If you don't re-embed on update and clean up on delete, you'll serve stale matches that point to missing or outdated text.
- Over-tuning performance before nailing UX. Teams spend days on index settings while skipping basics like metadata filters, good snippets, and a keyword fallback when the query is very specific.
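The chunking point above can be sketched simply. This splits text into overlapping word windows and tags each chunk with the source ID and a version number, so stale embeddings can be traced back to a revision later. The window and overlap sizes are arbitrary; real systems often chunk by sentences, headings, or tokens instead of words.

```python
def chunk(doc_id, version, text, size=50, overlap=10):
    """Split text into overlapping word windows, one embedding per chunk."""
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, max(len(words), 1), step):
        piece = " ".join(words[start:start + size])
        if piece:
            chunks.append({
                "source_id": doc_id,
                "version": version,  # lets you match embeddings to revisions
                "offset": start,
                "text": piece,
            })
    return chunks

# A 120-word document becomes three overlapping chunks:
parts = chunk("kb-17", 3, "word " * 120, size=50, overlap=10)
print(len(parts), parts[0]["source_id"], parts[-1]["offset"])
```

The overlap means an idea that straddles a window boundary still lands intact in at least one chunk, and the `(source_id, version, offset)` triple is what makes the update-and-delete cleanup in the next bullet possible.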
A simple "day-2" test catches these early: add a new permission rule, update 20 items, delete 5, then ask the same 10 evaluation questions again. If you're building on a platform like AppMaster, plan these checks alongside your business logic and database model, not as an afterthought.
Example scenario: semantic search in a support portal
A mid-size SaaS company runs a support portal with two main content types: customer tickets and help center articles. They want a search box that understands meaning, so typing "can't log in after changing phone" surfaces the right article and similar past tickets.
The non-negotiables are practical: each customer must only see their own tickets, agents need to filter by status (open, pending, solved), and results should feel instant because suggestions show as the user types.
Option A: pgvector inside the same PostgreSQL
If the portal already stores tickets and articles in PostgreSQL (common if you build on a stack that includes Postgres, like AppMaster), adding pgvector can be a clean first move. You keep embeddings, metadata, and permissions in one place, so "only tickets for customer_123" is just a normal WHERE clause.
This tends to work well when your dataset is modest (tens or hundreds of thousands of items), your team is comfortable tuning Postgres indexes and query plans, and you want fewer moving parts with simpler access control.
The tradeoff is that vector search can compete with transactional workload. As usage grows, you may need extra capacity, careful indexing, or even a separate Postgres instance to protect ticket writes and SLAs.
Option B: managed vector DB for embeddings, PostgreSQL for metadata
With a managed vector database, you typically store embeddings and an ID there, then keep the "truth" (ticket status, customer_id, permissions) in PostgreSQL. In practice, teams either filter in Postgres first and then search eligible IDs, or they search first and re-check permissions before showing results.
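The first pattern (filter in Postgres, then search only eligible IDs) can be sketched with stand-in data structures. Real code would query Postgres and then call the vector service's query API, which typically accepts an ID allowlist or a metadata filter; everything below is made up for illustration.

```python
# Stand-ins for the two systems: Postgres holds the truth, the vector
# store holds similarity scores for this query.
POSTGRES_TICKETS = {
    "t-1": {"customer_id": "c-1", "status": "open"},
    "t-2": {"customer_id": "c-2", "status": "open"},
    "t-3": {"customer_id": "c-1", "status": "solved"},
}
VECTOR_SCORES = {"t-1": 0.71, "t-2": 0.95, "t-3": 0.88}  # pretend similarity

def eligible_ids(customer_id):
    # Step 1: permissions live in Postgres — filter there first.
    return {tid for tid, row in POSTGRES_TICKETS.items()
            if row["customer_id"] == customer_id}

def search(allowed_ids, top_k=2):
    # Step 2: similarity search restricted to the eligible set.
    ranked = sorted(allowed_ids, key=lambda tid: VECTOR_SCORES[tid], reverse=True)
    return ranked[:top_k]

allowed = eligible_ids("c-1")
print(search(allowed))  # t-3 then t-1; t-2 never enters the search
```

Note that the other customer's best-scoring ticket never reaches the vector query at all, which is exactly the property you lose if you search first and filter afterward.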
This option often wins when growth is uncertain or the team doesn't want to spend time nursing performance. But the permission flow needs real care, or you risk leaking results across customers.
A practical call is to start with pgvector if you need tight filtering and simple ops now, and plan for a managed vector database if you expect rapid growth, heavy query volume, or can't afford search to slow down your core database.
Quick checklist and next steps
If you're stuck, stop debating features and write down what your app must do on day one. The real requirements usually show up during a small pilot with real users and real data.
These questions usually decide the winner faster than benchmarks:
- What filters are non-negotiable (tenant, role, region, status, time range)?
- How big will the index get in 6 to 12 months (items and embeddings)?
- What latency feels instant for your users, including at peak?
- Who owns the budget and on-call responsibility?
- Where should the source of truth live: PostgreSQL tables or an external index?
Also plan for change. Embeddings aren't "set and forget." Text changes, models change, and relevance drifts until someone complains. Decide upfront how you'll handle updates, how you'll detect drift, and what you'll monitor (query latency, error rate, recall on a small test set, and "no results" searches).
If you want to move fast on the full business app around search, AppMaster (appmaster.io) can be a practical fit: it gives you PostgreSQL data modeling, backend logic, and web or mobile UI in one no-code workflow, and you can add semantic search as an iteration once the core app and permissions are in place.
FAQ
What does semantic search do that keyword search doesn't?
Semantic search returns useful results even when the user's words don't match the document's exact wording. It's especially helpful when people use typos, shorthand, or describe symptoms instead of official terms, which is common in support portals, internal tools, and knowledge bases.
When should I use pgvector?
Use pgvector when you want fewer moving parts, tight SQL-based filtering, and your dataset and traffic are still modest. It's often the fastest path to a secure, permission-aware search because vectors and metadata live in the same PostgreSQL queries you already trust.
When is a managed vector database the better fit?
A managed vector database is a good fit when you expect rapid growth in vectors or query volume, or you want search scaling and availability handled outside your main database. You'll gain simpler scaling operations in exchange for extra integration work and careful permission handling.
What are embeddings, and what does a vector database do?
Embedding is the process of turning text into a numeric vector that represents meaning. A vector database (or pgvector in PostgreSQL) stores those vectors and can quickly find the closest ones to a user's embedded query, which is how you get "similar by intent" results.
Why is filtering after the vector search risky?
Filtering after the vector search often removes the best matches and can leave users with worse results or empty pages. It also increases the risk of accidental exposure through logs, caches, or debugging, so it's safer to apply tenant and role filters as early as possible.
How does pgvector handle permissions?
With pgvector, you can apply permissions, joins, and Row Level Security in the same SQL query that does similarity search. That makes it easier to guarantee "never show forbidden rows," because PostgreSQL enforces it where the data lives.
How do managed vector databases handle filtering and permissions?
Most managed vector databases support metadata filters, but they typically don't support joins, and the filter language can be limited. You usually end up denormalizing permission-related metadata into each vector record and enforcing the final authorization checks in your application.
What is chunking, and do I need it?
Chunking is splitting large documents into smaller parts before embedding, which usually improves precision because each vector represents a focused idea. Full-document embeddings can work for short items, but long tickets, policies, and PDFs often become vague unless you chunk and track versions.
How should I handle content updates and deletes?
Plan updates from day one: re-embed on publish or edit for content that changes often, and always remove vectors when the source record is deleted. If you skip this, you'll serve stale results that point to outdated text or missing records.
How do I run a pilot that actually settles the choice?
A practical pilot uses real queries and strict filters, measures relevance and latency, and tests failure cases like missing metadata or stale vectors. Pick the option that reliably returns good top results under your permission rules, with costs and operational work your team can keep up with.


