浅谈KMP算法 —— 高效字符串匹配

pjh123 字数: 11820 阅读耗时: 29 分钟 2025/08/14 博客独享热度: 21 评论: 0

引言

字符串匹配是计算机科学里非常常见的问题，例如：

文本编辑器中的「查找」功能
DNA 序列比对
搜索引擎的关键词检索

设文本串长度为 n，模式串长度为 m，则朴素算法的时间复杂度是 O(nm)，当文本和模式串都很长时效率很差。

KMP 算法 通过在匹配失败时利用已知的匹配信息来 减少回退，最终实现最优 O(n+m) 的复杂度。

问题定义

给定一个文本串 T ，和一个模式串 P，要求判断 P 是否在 T 中出现，如果出现则给出所有匹配位置。

输入：

文本串：T = "ababcabcacbab"
模式串：P = "abcac"

输出

匹配位置：[6]

算法思想

直观解释

朴素的匹配算法，在单个字符匹配失败时会 全部回退，导致最坏的情况下可以达到 O(nm) 的复杂度。

KMP 算法则在匹配失败时根据 前缀函数（next数组） 回到 上一个可能匹配的位置，避免了从头开始匹配。

前缀函数（next数组）

定义 next[i] 表示模式串前缀串 P[0...i] 的 最长相等前后缀长度。

这意味着每当后缀匹配失败，本应重新开始的部分自动根据 相等的前缀 平移了若干位置来避免从头回退。

构造示例

对于模式串 P=abdabcab 的构造如下：

i	1	2	3	4	5	6	7	8
P	a	b	d	a	b	c	a	b
next	0	0	0	1	2	0	1	2

解释如下：

i = 4 时，前缀串为 abda，最长相等前后缀为 a；
i = 5 时，前缀串为 abdab，最长相等前后缀为 ab；
i = 6 时，前缀串为 abdabc，无相等前后缀；
i = 7 时，前缀串为 abdabca，最长相等前后缀为 a；
i = 8 时，前缀串为 abdabcab，最长相等前后缀为 ab。

代码实现

以 P3375 【模板】KMP 为例

#include <iostream>
#include <string>
#include <vector>

std::vector<int> build_fail(const std::string &str)
{
    std::vector<int> fails;
    fails.reserve(str.size());
    fails.push_back(0);
    int ptr = 0;
    for (int idx = 1; idx < str.size(); ++idx)
    {
        while (ptr > 0 && str[idx] != str[ptr])
            ptr = fails[ptr - 1];
        fails.push_back(ptr = (str[idx] == str[ptr] ? ++ptr : 0));
    }
    return fails;
}

std::vector<int> compare(const std::string &text, const std::string &cmp)
{
    std::vector<int> pos, fail = build_fail(cmp);
    for (int idx1 = 0, idx2 = 0; idx1 < text.size(); ++idx1)
    {
        if (idx2 < cmp.size() && text[idx1] == cmp[idx2])
            ++idx2;
        else
            while (idx2 > 0)
            {
                idx2 = fail[idx2 - 1];
                if (text[idx1] == cmp[idx2])
                {
                    idx2++;
                    break;
                }
            }

        if (idx2 == cmp.size())
            pos.push_back(idx1 - cmp.size() + 2);
    }
    return pos;
}

int main()
{
    std::string text, cmp;
    std::cin >> text >> cmp;
    auto pos = compare(text, cmp);
    for (const auto p : pos)
        std::cout << p << std::endl;

    auto fail = build_fail(cmp);
    for (const auto f : fail)
        std::cout << f << " ";
}