2025年5月22日·阅读约1分钟

用 PostgreSQL 的生成列加速管理员筛选

Q: When should I use a PostgreSQL generated column for an admin screen?

Generated columns help when you keep repeating the same expression in `WHERE` or `ORDER BY`, like normalizing names, mapping statuses, or building a sorting key. They’re especially useful for admin lists that are opened all day and need predictable filtering and sorting.

Q: What’s the difference between a stored generated column and an expression index?

A stored generated column is computed on insert or update and saved like a normal column, so reads can be fast and indexable. An expression index stores the result in the index without adding a new table column, but your queries still need to use the exact expression for the planner to match it.

Q: Will a generated column automatically make my query faster?

No, not by itself. A generated column mainly makes the query simpler and makes indexing a computed value straightforward, but you still need an index that matches your common filters and sorts if you want real speedups at scale.

Q: What are the best generated columns to add for admin search and sorting?

Usually it’s a field you filter or sort on constantly: a normalized search key, a “full name” sort key, a derived boolean like `is_overdue`, or a ranking number that matches how people expect results to sort. Pick one value that removes repeated work from many queries, not a one-off calculation.

Q: How do I choose the right index for an admin list that filters and sorts?

Start with the most common filter columns, then put the main sort key last, like `(workspace_id, status, full_name_key)` if that matches the screen. This lets PostgreSQL filter quickly and then return rows already ordered without extra work.

Q: Can generated columns fix slow contains search like ILIKE '%term%'?

Not very. A generated column can normalize text so behavior is consistent, but `ILIKE '%term%'` still tends to be slow with basic btree indexes on large tables. If performance matters, prefer prefix-style search where you can, reduce the searched dataset with other filters, or adjust the UI behavior for big tables.

Q: Can I create a generated column that depends on now() for “inactive” flags?

Stored generated columns have to be based on immutable expressions, so functions like `now()` typically aren’t allowed and would also be conceptually wrong because the value would go stale. For time-based flags like “inactive for 90 days,” consider a normal column maintained by a job, or compute it at query time if it’s not heavily used.

Q: What happens if I need to change the formula of a generated column later?

Yes, but plan it like a real migration. Changing the expression means updating the schema and recomputing values for existing rows, which can take time and add write load, so do it in a controlled deployment window if the table is large.

Q: Do generated columns add overhead to inserts and updates?

Yes. The database has to compute and store the value on every insert and update, so heavy write workloads (imports, sync jobs) can slow down if you add too many generated fields or complex expressions. Keep expressions short, add only what you use, and measure write performance on busy tables.

Q: What’s the safest way to roll out generated columns to speed up an existing admin screen?

Add a generated column, validate a few real rows, then add the index that matches the screen’s main filter and sort. Update the admin query to use the new column directly, and compare query time and rows scanned before and after to confirm the change helped.

了解如何使用 PostgreSQL 生成列加速管理员界面的筛选和排序，同时保持 SQL 可读性，包含实用示例和快速检查要点。

为什么管理员界面很快会变慢且混乱

管理员界面通常从简单开始：一张表，几个筛选，或许按“最新”排序。后来就开始有真正的需求。支持团队想要按姓名、邮箱和电话查找客户，销售要按“最后活动”排序，财务要“逾期余额”。每一项需求都会增加条件、连接和额外计算。

大多数管理员列表变慢的原因相同：每次点击都会改变查询。筛选和排序会迫使数据库扫描大量行，尤其是当查询需要为每行计算一个值才能确定是否匹配时。

一个常见的拐点是当 WHERE 和 ORDER BY 中充满了表达式。你不是在简单列上筛选，而是在 lower(email)、date_trunc('day', last_seen_at) 或将多个状态映射到同一个“桶”的 CASE 语句上筛选。这些表达式不仅更慢，还让 SQL 更难读、难以建立索引，也更容易出错。

混乱的管理员 SQL 通常来自几个反复出现的模式：

一个“搜索”输入检查多个字段且规则不同
按派生值排序（全名、优先级分数、“最后有意义事件”）
业务规则在多个界面间复制（活跃 vs 非活跃，已付 vs 逾期）
到处都有小“辅助”调整（trim、lower、coalesce）
相同的计算值在列表、筛选和排序中重复使用

团队经常尝试在应用层隐藏这些：动态查询生成器、有条件的连接，或在代码中预计算值。这可能有效，但会把逻辑分散到 UI 和数据库之间，调试慢查询会变得痛苦。

目标很明确：查询要快且保持可读。当一个计算值在多个管理员界面中反复出现时，PostgreSQL 的生成列可以把规则放在一个地方，同时让数据库进行优化。

用通俗的语言理解生成列

生成列是一个普通的表列，其值由其他列计算得出。你不需要手动写入这个值，PostgreSQL 会根据你定义的表达式填充它。

在 PostgreSQL 中，生成列是存储的。PostgreSQL 在插入或更新行时计算该值，然后像其他列一样保存到磁盘上。这通常正是管理员界面所需要的：快速读取，并且可以对计算值建立索引。

这与在每个查询中做相同计算不同。如果你不断写 WHERE lower(email) = lower($1) 或按 last_name || ', ' || first_name 排序，就要重复付出代价，SQL 也会变得嘈杂。生成列把这种重复计算移动到表定义里。你的查询变得更简单，结果在各处一致。

当源数据改变时，PostgreSQL 会自动为那一行更新生成值。你的应用不需要记住去同步它。

一个有用的思路：

把公式只定义一次。
PostgreSQL 在写入时计算它。
查询像读取普通列一样读取它。
因为它是存储的，你可以为它建立索引。

如果你后来更改公式，需要做一次模式变更。像处理任何迁移一样规划它，因为现有行需要更新以匹配新的表达式。

在筛选和排序中使用计算字段的良好场景

当值总是从其他列派生且你经常基于它进行筛选或排序时，生成列很有价值。对于一次性的报表则帮助不大。

面向搜索的常用字段

管理员的搜索很少是“纯粹”的搜索。用户期望输入框能处理脏数据、不一致的大小写和多余空格。如果你存一个已经规范化的“搜索键”，WHERE 子句会保持可读，并且在各处行为一致。

合适的候选项包括合并的全名、用于不区分大小写搜索的去空格小写文本、折叠多余空白的清理版本，或从多字段派生的状态标签。

例如，不要在每个查询中重复 lower(trim(first_name || ' ' || last_name))，而是在表中生成 full_name_key 并基于它筛选。

与人类期望一致的排序键

排序是计算字段最先见效的场景之一，因为排序可能迫使 PostgreSQL 对许多行评估表达式。

常见的排序键包括数值排名（将套餐等级映射到 1、2、3）、单一的“最新活动”时间戳（如两个时间戳的最大值），或按文本正确排序的带填充代码。

当排序键是普通的有索引列时，ORDER BY 会便宜很多。

快速筛选用的派生标志

管理员用户喜欢像“逾期”或“高价值”这样的复选框。当逻辑稳定且仅基于行数据时，这些很适合用生成列实现。

例如，如果客户列表需要“有未读消息”和“是否逾期”，生成一个基于 unread_count > 0 的布尔列 has_unread 和基于 due_date < now() 且 paid_at is null 的 is_overdue，界面筛选就能映射到简单条件。

在生成列、索引和其他选项之间做选择

管理员界面需要三样东西：快速筛选、快速排序、以及数月后仍能读懂的 SQL。真正的决定点是把计算放在哪：表里、索引里、视图里，还是在应用代码里。

生成列在你希望该值像真实列那样表现时是合适的：易于引用、在 SELECT 中可见、不容易在添加新筛选时忘记。它们也能自然地配合常规索引。

表达式索引可以更快地添加，因为你不用改表结构。如果你主要关心速度且不介意 SQL 变丑，表达式索引通常就足够。缺点是可读性差，而且你依赖于 planner 精确匹配你的表达式。

视图在你想共享“数据形状”时有用，尤其是当管理员列表连接了很多表。但复杂视图可能隐藏昂贵的工作并增加调试点。

触发器可以保持普通列的同步，但它们增加了活动部件，可能使批量更新变慢，也容易在排查问题时被忽视。

有时最好的选择是由应用填充的普通列。如果用户可以编辑它，或者公式经常根据业务决策（而不是仅仅行数据）更改，把它显式保存会更清晰。

一个快速决策法则：

想要可读查询且公式稳定并只基于行数据？使用生成列。
想为某个特定筛选迅速提速且不介意杂乱 SQL？用表达式索引。
需要跨表连接的报告型形状在多处复用？考虑视图。
需要跨表逻辑或副作用？优先在应用实现，触发器作为最后手段。

逐步指导：添加生成列并在查询中使用它

Ship better admin search

Add a normalized search key and hook it to simple filters your team understands.

Build Admin

从一个你在 UI 上能明显感觉到慢的管理员列表查询开始。写下该屏幕最常用的筛选和排序。先优化那条查询。

挑一个能去除重复工作的计算字段，并用 snake_case 明确命名，让别人无需重读表达式就能猜到它的含义。

1) 添加生成列（STORED）

ALTER TABLE customers
ADD COLUMN full_name_key text
GENERATED ALWAYS AS (
  lower(concat_ws(' ', last_name, first_name))
) STORED;

在添加索引前在真实行上验证：

SELECT id, first_name, last_name, full_name_key
FROM customers
ORDER BY id DESC
LIMIT 5;

如果输出不对，请现在修正表达式。STORED 表示 PostgreSQL 会在每次插入和更新时维护它。

2) 添加匹配管理员界面的索引

如果你的管理员界面按状态筛选并按姓名排序，就按该模式建立索引：

CREATE INDEX customers_status_full_name_key_idx
ON customers (status, full_name_key);

3) 更新管理员查询以使用新列

之前你可能有一个混乱的 ORDER BY。改用后就清晰了：

SELECT id, status, first_name, last_name
FROM customers
WHERE status = 'active'
ORDER BY full_name_key ASC
LIMIT 50 OFFSET 0;

把生成列用于人们每天筛选和排序的部分，而不是罕见的屏幕。

与真实管理员界面匹配的索引模式

Automate admin actions

Use the Business Process Editor to automate actions from admin screens.

Build Workflow

管理员界面重复几种行为：按少数字段筛选、按一列排序并分页。最佳方案很少是“给所有字段建索引”，更常见的是“为最常见查询的形状建索引”。

一个实用规则：把最常用的筛选列放前，把最常用的排序列放最后。如果是多租户，workspace_id（或类似字段）通常放在最前面：(workspace_id, status, created_at)。

文本搜索是另一个问题。很多搜索框最终会变成 ILIKE '%term%'，用基本的 btree 索引难以加速。有用的模式是对原文本建立规范化的辅助列（小写、去空格、或拼接），而不是直接对原字段搜索。如果 UI 可以接受前缀搜索（term%），那么对规范化列建立 btree 索引会有帮助。如果必须是包含搜索（%term%），考虑收紧 UI 行为（例如只允许“邮箱以...开头”），或将搜索限制在更小的子集上。

在添加索引前还要检查选择性。如果 95% 的行都有相同值（比如 status = 'active'），单独为该列建索引意义不大。把它和更具选择性的列组合，或为少数情况建立部分索引。

现实示例：一个始终快速的客户管理员列表

想象一个典型的客户管理员页面：一个搜索框、几个筛选（不活跃、余额区间）、以及可排序的“最后访问”列。随着时间推移，它会变成难以阅读的 SQL：LOWER()、TRIM()、COALESCE()、日期计算和重复的 CASE 块。

把这些重复表达式推到生成列中是一种保持快速且可读的方法。

表和生成列

假设有一个包含 name、email、last_seen 和 balance 的 customers 表。添加三个计算字段：

search_key：用于简单搜索的规范化文本块
is_inactive：一个布尔值，可以在不重复日期逻辑的情况下筛选
balance_bucket：用于快速分段的标签

ALTER TABLE customers
  ADD COLUMN search_key text
    GENERATED ALWAYS AS (
      lower(trim(coalesce(name, ''))) || ' ' || lower(trim(coalesce(email, '')))
    ) STORED,
  ADD COLUMN is_inactive boolean
    GENERATED ALWAYS AS (
      last_seen IS NULL OR last_seen < (now() - interval '90 days')
    ) STORED,
  ADD COLUMN balance_bucket text
    GENERATED ALWAYS AS (
      CASE
        WHEN balance < 0 THEN 'negative'
        WHEN balance < 100 THEN '0-99'
        WHEN balance < 500 THEN '100-499'
        ELSE '500+'
      END
    ) STORED;

现在管理员查询读起来就像界面一样直观。

可读的筛选 + 排序

“不活跃客户，按最新活动降序”变为：

SELECT id, name, email, last_seen, balance
FROM customers
WHERE is_inactive = true
ORDER BY last_seen DESC NULLS LAST
LIMIT 50;

而一个基本搜索变为：

SELECT id, name, email, last_seen, balance
FROM customers
WHERE search_key LIKE '%' || lower(trim($1)) || '%'
ORDER BY last_seen DESC NULLS LAST
LIMIT 50;

真正的收获是统一性。相同的字段驱动多个界面而无需重复逻辑：

客户列表的搜索框使用 search_key
“不活跃客户”标签页使用 is_inactive
余额筛选选项使用 balance_bucket

常见错误与陷阱

Turn rules into columns

Keep repeated PostgreSQL expressions in generated columns and use them across screens.

Try Now

生成列看起来像是简单的胜利：把计算放到表里，让查询干净。但它们只有在匹配界面筛选和排序方式并添加合适索引时才有用。

最常见的错误：

以为只添加生成列就能提速。计算值在大规模下仍需要索引才能快速筛选或排序。
把太多逻辑塞到一个字段里。如果生成列变成一个小程序，人们就不再信任它。保持表达式简短并清晰命名。
使用非不可变函数。PostgreSQL 要求 STORED 生成列的表达式是不可变的。像 now() 和 random() 这类函数不符合此要求，也不合常理。
忽视写入成本。插入和更新需要维护计算值。如果导入或集成很多，写入变慢可能得不偿失。
创建近似重复列。标准化一两种模式（比如单一规范化键），而不是累积五个相似列。

如果你的管理员列表做的是包含搜索（如 ILIKE '%ann%'），仅靠生成列通常不能解决性能问题。你可能需要不同的搜索方案。但对于日常的“筛选和排序”查询，生成列加上合适的索引通常能让性能更可预测。

上线前的快速检查清单

Add common modules fast

Add authentication and Stripe payments modules when your admin tool needs them.

Add Modules

在把更改推到管理员列表前，确认计算值、查询和索引是否匹配。

公式稳定且能用一句话解释清楚。
你的查询在 WHERE 和/或 ORDER BY 中实际使用了生成列。
索引匹配真实使用场景，而不是一次性测试。
在边缘情况（NULL、空字符串、奇怪空格、混合大小写）上比较过新旧逻辑的结果。
如果表很忙（导入、后台更新、集成），测试过写入性能。

下一步：把它应用到你的管理员界面

挑一个小而高影响的起点：团队全天打开的 2-3 个管理员界面（订单、客户、工单）。记录感觉慢的点（日期范围筛选、按“最后活动”排序、按合并姓名搜索、按状态标签筛选）。然后标准化一组可在各界面复用的短小计算字段。

一个易于衡量且容易回滚的上线计划：

添加生成列，使用清晰的名字。
在替换现有逻辑时短暂并行运行旧逻辑和新逻辑。
添加匹配主要筛选或排序的索引。
切换界面查询以使用新列。
对比上线前后（查询时间和扫描行数），然后移除旧的变通做法。

如果你在 AppMaster (appmaster.io) 构建内部管理工具，这些计算字段很适合放在共享数据模型中：数据库承载规则，UI 筛选可以直接指向简单的字段名，而无需在各界面重复表达式。

常见问题

Generated columns help when you keep repeating the same expression in WHERE or ORDER BY, like normalizing names, mapping statuses, or building a sorting key. They’re especially useful for admin lists that are opened all day and need predictable filtering and sorting.

A stored generated column is computed on insert or update and saved like a normal column, so reads can be fast and indexable. An expression index stores the result in the index without adding a new table column, but your queries still need to use the exact expression for the planner to match it.

No, not by itself. A generated column mainly makes the query simpler and makes indexing a computed value straightforward, but you still need an index that matches your common filters and sorts if you want real speedups at scale.

Usually it’s a field you filter or sort on constantly: a normalized search key, a “full name” sort key, a derived boolean like is_overdue, or a ranking number that matches how people expect results to sort. Pick one value that removes repeated work from many queries, not a one-off calculation.

Start with the most common filter columns, then put the main sort key last, like (workspace_id, status, full_name_key) if that matches the screen. This lets PostgreSQL filter quickly and then return rows already ordered without extra work.

Not very. A generated column can normalize text so behavior is consistent, but ILIKE '%term%' still tends to be slow with basic btree indexes on large tables. If performance matters, prefer prefix-style search where you can, reduce the searched dataset with other filters, or adjust the UI behavior for big tables.

Stored generated columns have to be based on immutable expressions, so functions like now() typically aren’t allowed and would also be conceptually wrong because the value would go stale. For time-based flags like “inactive for 90 days,” consider a normal column maintained by a job, or compute it at query time if it’s not heavily used.

Yes, but plan it like a real migration. Changing the expression means updating the schema and recomputing values for existing rows, which can take time and add write load, so do it in a controlled deployment window if the table is large.

Yes. The database has to compute and store the value on every insert and update, so heavy write workloads (imports, sync jobs) can slow down if you add too many generated fields or complex expressions. Keep expressions short, add only what you use, and measure write performance on busy tables.

Add a generated column, validate a few real rows, then add the index that matches the screen’s main filter and sort. Update the admin query to use the new column directly, and compare query time and rows scanned before and after to confirm the change helped.