🚨 Real Google 2026 Interview Question: String Matching With Transliteration. Why Does the Algorithm Behind Google Docs Keep Tripping Up Candidates?

Raising the Stakes: Shattering the Illusion

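A Google string interview question has recently been widely discussed among job seekers in North America.

Many people's first reaction on reading it: "Isn't this just string matching? Python's in operator solves it instantly!"

Completely wrong.

The core of this problem is the transliteration map: one character may map to several characters (for example æ → ae), and a needle typed in plain letters must still match accented text. A plain substring search on the raw strings therefore fails.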
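
A quick check makes the failure concrete. This is only a minimal sketch: the two-entry `tl_map` and the helper name `to_plain` are assumptions made for illustration, not part of the original problem.

```python
# Assumed sample entries, not the full map from the problem statement.
tl_map = {"è": "e", "æ": "ae"}


def to_plain(s: str) -> str:
    """Replace each mapped character with its transliteration."""
    return "".join(tl_map.get(ch, ch) for ch in s)


# Raw substring search compares code points, so "è" never equals "e".
raw_hit = "eme" in "crème"

# Transliterating first lets the comparison happen on plain letters.
plain_hit = "eme" in to_plain("crème")

# A one-to-many entry such as "æ" -> "ae" also changes the string length.
lengths = (len("Solskjær"), len(to_plain("Solskjær")))

print(raw_hit, plain_hit, lengths)  # False True (8, 9)
```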

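After a detailed analysis, our oavoservice team found that this problem tests a combination of string preprocessing, two-pointer matching, and boundary handling.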

Breaking Down the Problem: Showing Expertise

🔍 Original problem statement

String Matching With Transliteration

You may or may not know that Google Docs has a feature to find and replace even for characters not in the standard English alphabet ([A-Z]).

To do this, we first need a transliteration map:

tl_map = {
    "ë": "e",
    "é": "e",
    "û": "u",
    "æ": "ae",
    "ž": "zed",
    # ...
}

Given a haystack string, a needle string, and the transliteration mapping, return whether the needle is matched within the haystack.

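📝 In plain terms

Google Docs can find and replace text even when it contains characters outside the standard English alphabet, such as accented letters.

To support this, we need a transliteration map that converts each non-standard character into a string of standard English letters.

Given:

haystack: the string being searched
needle: the pattern to look for
tl_map: the transliteration map

Question: decide whether the transliterated needle is a substring of the transliterated haystack.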

🎯 Worked examples

Example 1:

haystack: "I love crème brûlée"
needle:   "eme"
return:   True

Explanation:
- The "è" in "crème" transliterates to "e"
- "crème" → "creme"
- "eme" is a substring of "creme" ✓

Example 2:

haystack: "I love crème brûlée"
needle:   "ême"
return:   True

Explanation:
- The "ê" in the needle "ême" transliterates to "e"
- needle → "eme"
- In the haystack, "crème" → "creme"
- "eme" is a substring of "creme" ✓

Example 3:

haystack: "Ole Gunnar Solskjær is amazing"
needle:   "Solskja"
return:   True

Explanation:
- The "æ" in "Solskjær" transliterates to "ae"
- "Solskjær" → "Solskjaer"
- "Solskja" is a prefix of "Solskjaer", so a match may end partway through the expansion of "æ" ✓

Example 4: ⚠️ The key trap

haystack: "Ole Gunnar Solskjær is amazing"
needle:   "Solskjear"
return:   False

Explanation:
- "Solskjær" → "Solskjaer"
- "Solskjear" ≠ "Solskjaer" (the "e" and "a" are swapped)
- Note: the "ea" in the needle is not the same as "ae", the transliteration of "æ" ✗

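Deep Review: Building Trust

🧠 What the problem really tests

The difficulty lies in:

Test point	Details
One-to-many mapping	One character may map to several characters (æ → ae)
Transliteration on both sides	Both haystack and needle must be transliterated
Length changes	Transliteration can change the length of the string
Boundary handling	Partial matches and matches that span expanded characters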

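📊 Key insights

Approach 1: transliterate up front

The most direct approach: transliterate both haystack and needle into standard English letters, then do a substring match.

haystack: "crème" → "creme"
needle:   "ême"   → "eme"
match: "eme" in "creme" → True

Approach 2: transliterate while matching

Skip the preprocessing pass and expand characters on demand during matching. This suits very long strings.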

Introducing the Solution: Core Algorithms

Solution 1: preprocessing + substring match (recommended)

def transliterate(s: str, tl_map: dict) -> str:
    """
    Transliterate the non-standard characters in a string into standard English letters.

    Args:
        s: the original string
        tl_map: the transliteration map

    Returns:
        The transliterated string
    """
    result = []
    for char in s:
        if char in tl_map:
            result.append(tl_map[char])
        else:
            result.append(char)
    return ''.join(result)


def string_match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Google interview question: string matching with transliteration

    Args:
        haystack: the string being searched
        needle: the pattern to look for
        tl_map: the transliteration map

    Returns:
        bool: whether the transliterated needle is a substring of the transliterated haystack
    """
    # Step 1: transliterate both strings
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)

    # Step 2: substring match
    return trans_needle in trans_haystack

🧪 Test cases

tl_map = {
    "ë": "e",
    "é": "e",
    "è": "e",
    "ê": "e",
    "û": "u",
    "ù": "u",
    "æ": "ae",
    "ž": "zed",
}

# Test 1: basic match
print(string_match_with_transliteration(
    "I love crème brûlée", "eme", tl_map
))  # Expected: True

# Test 2: the needle also needs transliteration
print(string_match_with_transliteration(
    "I love crème brûlée", "ême", tl_map
))  # Expected: True

# Test 3: one-to-many mapping
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskja", tl_map
))  # Expected: True

# Test 4: the key trap, letters in the wrong order
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjear", tl_map
))  # Expected: False

# Test 5: full-word match
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjaer", tl_map
))  # Expected: True

# Test 6: empty needle
print(string_match_with_transliteration(
    "Hello World", "", tl_map
))  # Expected: True

# Test 7: no special characters
print(string_match_with_transliteration(
    "Hello World", "World", tl_map
))  # Expected: True

# Test 8: multi-character mapping ("ž" → "zed")
print(string_match_with_transliteration(
    "ženeva", "zed", tl_map
))  # Expected: True

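Solution 2: transliterate while matching (lower memory)

When the haystack is very long, preprocessing builds a second full-length copy in memory. A "lazy expansion" strategy avoids that: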

from collections import deque


def lazy_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Lazy-expansion match: never builds the full transliterated haystack.
    Suited to very long haystacks.
    """
    # Transliterate the needle up front (it is usually short)
    trans_needle = transliterate(needle, tl_map)

    if not trans_needle:
        return True

    # Generator: expand the haystack one character at a time
    def expand_haystack():
        for char in haystack:
            yield from tl_map.get(char, char)

    # Sliding-window match; deque(maxlen=...) drops the oldest character in O(1)
    needle_len = len(trans_needle)
    window = deque(maxlen=needle_len)

    for char in expand_haystack():
        window.append(char)
        if len(window) == needle_len and ''.join(window) == trans_needle:
            return True

    return False
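
The lazy approach pays off most when the haystack never exists as a single string, for example when it arrives in chunks. The sketch below illustrates that idea under stated assumptions: `stream_match` and its `chunks` parameter are hypothetical names, and the window comparison is the same simple O(m) check per step rather than an optimized matcher.

```python
from collections import deque
from typing import Dict, Iterable


def stream_match(chunks: Iterable[str], needle: str, tl_map: Dict[str, str]) -> bool:
    """Hypothetical streaming variant: `chunks` yields pieces of the haystack."""
    target = "".join(tl_map.get(ch, ch) for ch in needle)
    if not target:
        return True  # an empty needle matches anything

    # Keep only the last len(target) expanded characters.
    window = deque(maxlen=len(target))
    for chunk in chunks:
        for ch in chunk:
            # Expand one source character into one or more output characters.
            for out in tl_map.get(ch, ch):
                window.append(out)
                if len(window) == len(target) and "".join(window) == target:
                    return True
    return False


pieces = ["Ole Gunnar Solsk", "jær is amazing"]
print(stream_match(pieces, "Solskjae", {"æ": "ae"}))  # True
```

Because the window lives outside the chunk loop, a match that straddles two chunks is still found.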

Solution 3: KMP (a bonus in interviews)

If the interviewer asks how to improve the time complexity, you can bring up the KMP algorithm:

def kmp_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    KMP version
    Time complexity: O(n + m), where n and m are the lengths of the
    transliterated haystack and needle
    """
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    
    if not trans_needle:
        return True
    
    # Build the KMP failure function
    def build_failure(pattern):
        m = len(pattern)
        failure = [0] * m
        j = 0
        for i in range(1, m):
            while j > 0 and pattern[i] != pattern[j]:
                j = failure[j - 1]
            if pattern[i] == pattern[j]:
                j += 1
            failure[i] = j
        return failure
    
    failure = build_failure(trans_needle)
    
    # KMP matching
    j = 0
    for i, char in enumerate(trans_haystack):
        while j > 0 and char != trans_needle[j]:
            j = failure[j - 1]
        if char == trans_needle[j]:
            j += 1
        if j == len(trans_needle):
            return True
    
    return False

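Complexity analysis

Approach	Time complexity	Space complexity	Best for
Preprocessing + `in`	O(n × m)	O(n + m)	General use
Lazy expansion	O(n × m)	O(m)	Very long haystack
KMP	O(n + m)	O(n + m)	High performance requirements

Here n and m are the lengths of haystack and needle after transliteration; each is at most k times the original length, where k is the length of the longest replacement in tl_map. The O(n × m) entries are worst-case bounds for comparing the needle at every position. Since Python 3.10, CPython's substring search sometimes uses the Two-Way algorithm to avoid quadratic behavior on long strings, so `in` can do better than this bound.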
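
Since transliteration can lengthen a string, the lengths that matter for these bounds are the transliterated ones. A small sketch of how far they can grow, assuming a three-entry sample map; `expansion_factor` is a hypothetical helper name.

```python
from typing import Dict


def expansion_factor(tl_map: Dict[str, str]) -> int:
    """Length of the longest replacement; unmapped characters count as 1."""
    return max([1] + [len(v) for v in tl_map.values()])


def transliterate(s: str, tl_map: Dict[str, str]) -> str:
    return "".join(tl_map.get(ch, ch) for ch in s)


sample_map = {"é": "e", "æ": "ae", "ž": "zed"}  # assumed sample entries
k = expansion_factor(sample_map)
text = "žæé"
expanded = transliterate(text, sample_map)

# The transliterated length never exceeds k times the original length.
print(k, len(text), len(expanded))  # 3 3 6
```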

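🤯 The interviewer's Followup traps

Followup 1: what if the mapping is bidirectional?

Question: what if, besides æ → ae, there is also ae → æ?

Answer: build equivalence classes and normalize every equivalent representation to one canonical form. If the canonical form is the expanded one (ae), a literal "ae" is already canonical and only "æ" needs rewriting.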

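def normalize(s: str, equivalences: list) -> str:
    """
    Normalize: rewrite every equivalent representation to one canonical form.
    equivalences: [("æ", "ae"), ("ë", "e"), ...]
    Assumes the first element of each pair is a single character.
    """
    # Look up each character once instead of chaining str.replace, so text
    # that has already been rewritten is never rewritten a second time.
    canonical = dict(equivalences)
    return ''.join(canonical.get(ch, ch) for ch in s)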
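
To see why one canonical form covers both directions, compare the two spellings after normalization. This is a sketch under assumptions: the `EQUIVALENCES` list and the helper names `canonical` and `contains` are made up for illustration, and the canonical form is taken to be the expanded ASCII spelling.

```python
from typing import List, Tuple

# Assumed pairs: (special character, canonical spelling).
EQUIVALENCES: List[Tuple[str, str]] = [("æ", "ae"), ("ë", "e")]


def canonical(s: str, pairs: List[Tuple[str, str]]) -> str:
    """Rewrite each special character to its canonical spelling, one pass."""
    table = dict(pairs)
    return "".join(table.get(ch, ch) for ch in s)


def contains(haystack: str, needle: str, pairs: List[Tuple[str, str]]) -> bool:
    """Substring test on the canonical forms of both strings."""
    return canonical(needle, pairs) in canonical(haystack, pairs)


# "æ" typed in the needle finds "ae" in the haystack, and the reverse.
print(contains("Solskjaer is amazing", "skjær", EQUIVALENCES))  # True
print(contains("Solskjær is amazing", "skjaer", EQUIVALENCES))  # True
```

Choosing the expanded spelling as canonical avoids a multi-character search for "ae" in the input, which would be needed if "æ" were the canonical form.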

Followup 2: what if you need to return every match position?

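def find_all_matches(haystack: str, needle: str, tl_map: dict) -> list:
    """Return the start position of every match, as an index into the original haystack."""
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)

    # An empty needle matches at every offset, including one past the end,
    # which would index outside pos_map. Report no positions instead.
    if not trans_needle:
        return []

    # pos_map[i] = index in the original haystack of the character that
    # produced the i-th transliterated character
    pos_map = []
    for i, char in enumerate(haystack):
        expanded = tl_map.get(char, char)
        for _ in expanded:
            pos_map.append(i)

    # Find every match
    results = []
    start = 0
    while True:
        idx = trans_haystack.find(trans_needle, start)
        if idx == -1:
            break
        pos = pos_map[idx]  # convert back to an original position
        # Two matches can start inside the same expanded character; report it once.
        if not results or results[-1] != pos:
            results.append(pos)
        start = idx + 1

    return results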
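
For find and replace, a start index alone is not enough: the caller also needs to know where the match ends in the original text. The sketch below extends the position map idea to return half-open spans. `find_match_spans` is a hypothetical name, and treating a partially covered character such as "æ" as fully inside the span is an assumption, not something the problem statement specifies.

```python
from typing import Dict, List, Tuple


def find_match_spans(haystack: str, needle: str, tl_map: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs into the original haystack, end exclusive."""
    target = "".join(tl_map.get(ch, ch) for ch in needle)
    if not target:
        return []  # assumed convention: an empty needle yields no spans

    pieces = [tl_map.get(ch, ch) for ch in haystack]
    converted = "".join(pieces)
    # origin[i] = original index that produced the i-th converted character
    origin = [i for i, piece in enumerate(pieces) for _ in piece]

    spans = []
    start = converted.find(target)
    while start != -1:
        last = start + len(target) - 1
        spans.append((origin[start], origin[last] + 1))
        start = converted.find(target, start + 1)
    return spans


text = "Ole Gunnar Solskjær is amazing"
spans = find_match_spans(text, "Solskja", {"æ": "ae"})
print(spans, [text[a:b] for a, b in spans])  # [(11, 18)] ['Solskjæ']
```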

Followup 3: what about case-insensitive matching?

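def case_insensitive_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """Case-insensitive transliteration match"""
    # Both inputs are lowercased before lookup, so only lowercase keys are
    # needed. Uppercase entries would never be hit, and for caseless keys
    # they would overwrite the lowercase value with an uppercase one.
    lowered_map = {k.lower(): v.lower() for k, v in tl_map.items()}

    return string_match_with_transliteration(
        haystack.lower(),
        needle.lower(),
        lowered_map
    )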

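Followup 4: how do you handle Unicode combining characters?

Question: é may be the single character \u00e9, or e + \u0301 (combining acute accent).

Answer: use Unicode normalization (NFC/NFD). NFC fits when the tl_map keys are precomposed characters, as in the map above.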

import unicodedata

def normalize_unicode(s: str) -> str:
    """Unicode normalization: convert combining sequences to precomposed form"""
    return unicodedata.normalize('NFC', s)

def match_with_unicode_normalization(haystack: str, needle: str, tl_map: dict) -> bool:
    """Normalize first, then match"""
    haystack = normalize_unicode(haystack)
    needle = normalize_unicode(needle)
    return string_match_with_transliteration(haystack, needle, tl_map)
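
The two encodings of é really are different strings until they are normalized, which the sketch below checks with the standard `unicodedata` module. It also shows a possible fallback for accented letters missing from tl_map: decompose with NFD and drop the combining marks. `strip_marks` is a hypothetical helper, and it is not a substitute for the map, since "æ" has no decomposition and "ž" would become "z" rather than "zed".

```python
import unicodedata

composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by a combining acute accent

# The strings look identical on screen but compare unequal until normalized.
same_before = composed == decomposed
same_after = unicodedata.normalize("NFC", decomposed) == composed


def strip_marks(s: str) -> str:
    """Hypothetical fallback: decompose, then drop combining marks."""
    decomposed_form = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed_form if not unicodedata.combining(ch))


print(same_before, same_after, strip_marks("crème brûlée"))
```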

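🔥 Why do most candidates fail?

Common mistake	Correct approach
Transliterating only the haystack	Transliterate both haystack and needle
Ignoring one-to-many mappings	`æ → ae` changes the string length
Chaining `replace` calls	Output may get replaced again; process character by character
Forgetting the empty-string check	An empty needle should return True
Mapping positions incorrectly	Map returned positions back to the original string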
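
The chained `replace` pitfall is easiest to see with a map whose output characters are themselves keys. The map below is deliberately contrived and is not taken from the problem; `chained` and `per_char` are illustrative names.

```python
# Contrived map: the output of the first entry is the key of the second.
tricky_map = {"a": "b", "b": "c"}


def chained(s: str, tl_map: dict) -> str:
    """Apply str.replace once per entry; earlier output can be rewritten again."""
    for src, dst in tl_map.items():
        s = s.replace(src, dst)
    return s


def per_char(s: str, tl_map: dict) -> str:
    """Look up each original character exactly once."""
    return "".join(tl_map.get(ch, ch) for ch in s)


print(chained("ab", tricky_map))   # cc  ("a" became "b", then "c")
print(per_char("ab", tricky_map))  # bc
```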

Code template (ready to use)

def transliterate(s: str, tl_map: dict) -> str:
    """Transliterate a string"""
    return ''.join(tl_map.get(c, c) for c in s)

def match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Google interview question: string matching with transliteration
    """
    return transliterate(needle, tl_map) in transliterate(haystack, tl_map)


This article is an original work by the oavoservice team. Please credit the source when reposting.