Google

🚨 Google 2026 Interview: String Matching With Transliteration — The Algorithm Behind Google Docs

2026-01-19

Breaking the Illusion: Harder Than It Looks

Recently, a Google string interview question has been causing a stir in North American job hunting circles.

Many people's first reaction: "Isn't this just string matching? Python's in operator, easy!"

Dead wrong.

The core of this problem is transliteration mapping — one character can map to multiple characters (like æ → ae), which makes ordinary substring matching completely fail.
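A minimal sketch (using a one-entry map and made-up strings) shows why the naive `in` check breaks:

```python
tl_map = {"æ": "ae"}

haystack = "Solskjær"
needle = "jae"

# Naive check fails: "æ" is a single code point, not the two characters "ae"
print(needle in haystack)  # False

# After expanding "æ" -> "ae", the match succeeds
expanded = "".join(tl_map.get(c, c) for c in haystack)
print(expanded)            # Solskjaer
print(needle in expanded)  # True
```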

Our oavoservice team dug into it and found that this question is a comprehensive test of string preprocessing, two-pointer matching, and edge case handling.


Question Breakdown

🔍 Original Problem

String Matching With Transliteration

You may or may not know that Google Docs has a feature to find and replace even for characters not in the standard English alphabet ([A-Z]).

To do this, we first need a transliteration map:

tl_map = {
    "ë": "e",
    "é": "e",
    "û": "u",
    "æ": "ae",
    "ž": "zed",
    # ...
}

Given a haystack string, a needle string, and the transliteration mapping, return whether the needle is matched within the haystack.

🎯 Example Analysis

Example 1:

haystack: "I love crème brûlée"
needle:   "eme"
return:   True

Explanation:
- "è" in "crème" transliterates to "e"
- "crème" → "creme"
- "eme" is a substring of "creme" ✓

Example 2:

haystack: "I love crème brûlée"
needle:   "ême"
return:   True

Explanation:
- "ê" in needle "ême" transliterates to "e"
- needle → "eme"
- "crème" in haystack → "creme"
- "eme" is a substring of "creme" ✓

Example 3:

haystack: "Ole Gunnar Solskjær is amazing"
needle:   "Solskja"
return:   True

Explanation:
- "æ" in "Solskjær" transliterates to "ae"
- "Solskjær" → "Solskjaer"
- "Solskja" is a prefix of "Solskjaer" ✓

Example 4: ⚠️ Key Trap

haystack: "Ole Gunnar Solskjær is amazing"
needle:   "Solskjear"
return:   False

Explanation:
- "Solskjær" → "Solskjaer"
- "Solskjear" ≠ "Solskjaer" (e and a positions differ)
- Note: the "ea" in the needle is not the "ae" that "æ" expands to ✗

Deep Dive

🧠 Problem Essence Analysis

The difficulties in this problem:

- One-to-many mapping: a single character can expand to several (æ → ae)
- Transliterating both sides: both the haystack and the needle can contain special characters
- Length changes: strings get longer after transliteration
- Edge cases: partial matches, and matches that start or end inside an expanded character (as in Example 3)

📊 Key Insight

Method 1: Preprocess Transliteration

Most intuitive: transliterate both haystack and needle to standard English first, then do substring matching.

haystack: "crème" → "creme"
needle:   "ême"   → "eme"
Match: "eme" in "creme" → True

Method 2: Match While Transliterating

No preprocessing, expand characters on-demand during matching. Suitable for very long strings.


Solution: Core Algorithm

Solution 1: Preprocessing + Substring Match (Recommended)

def transliterate(s: str, tl_map: dict) -> str:
    """
    Transliterate non-standard characters to standard English
    
    Args:
        s: Original string
        tl_map: Transliteration mapping
    
    Returns:
        Transliterated string
    """
    result = []
    for char in s:
        if char in tl_map:
            result.append(tl_map[char])
        else:
            result.append(char)
    return ''.join(result)


def string_match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Google Interview: String Matching With Transliteration
    
    Args:
        haystack: String to search in
        needle: Pattern to find
        tl_map: Transliteration mapping
    
    Returns:
        bool: Whether needle (transliterated) is substring of haystack (transliterated)
    """
    # Step 1: Transliterate both strings
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    
    # Step 2: Substring matching
    return trans_needle in trans_haystack

🧪 Test Cases

tl_map = {
    "ë": "e",
    "é": "e",
    "è": "e",
    "ê": "e",
    "û": "u",
    "ù": "u",
    "æ": "ae",
    "ž": "zed",
}

# Test 1: Basic match
print(string_match_with_transliteration(
    "I love crème brûlée", "eme", tl_map
))  # Expected: True

# Test 2: Needle also needs transliteration
print(string_match_with_transliteration(
    "I love crème brûlée", "ême", tl_map
))  # Expected: True

# Test 3: One-to-many mapping
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskja", tl_map
))  # Expected: True

# Test 4: Key trap - wrong order
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjear", tl_map
))  # Expected: False

# Test 5: Complete match
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjaer", tl_map
))  # Expected: True

# Test 6: Empty needle
print(string_match_with_transliteration(
    "Hello World", "", tl_map
))  # Expected: True

# Test 7: No special characters
print(string_match_with_transliteration(
    "Hello World", "World", tl_map
))  # Expected: True

# Test 8: Multi-character mapping
print(string_match_with_transliteration(
    "ženeva", "zed", tl_map
))  # Expected: True

Solution 2: Lazy Expansion (Memory Optimized)

When haystack is very long, preprocessing uses too much memory. Use "lazy expansion":

from collections import deque

def lazy_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Lazy expansion matching: don't preprocess the entire haystack
    Suitable for very long haystack scenarios
    """
    # Transliterate needle first (usually much shorter)
    trans_needle = transliterate(needle, tl_map)
    
    if not trans_needle:
        return True
    
    # Generator: expand haystack characters on demand
    def expand_haystack():
        for char in haystack:
            expanded = tl_map.get(char, char)
            yield from expanded
    
    # Sliding window over the expanded stream
    needle_len = len(trans_needle)
    window = deque(maxlen=needle_len)  # old characters fall off automatically
    
    for char in expand_haystack():
        window.append(char)
        if len(window) == needle_len and ''.join(window) == trans_needle:
            return True
    
    return False

Solution 3: KMP Optimization (Interview Bonus)

If interviewer asks "how to optimize time complexity", mention KMP:

def kmp_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    KMP optimized version
    Time Complexity: O(n + m), where n = len(haystack), m = len(needle)
    """
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    
    if not trans_needle:
        return True
    
    # Build KMP failure function
    def build_failure(pattern):
        m = len(pattern)
        failure = [0] * m
        j = 0
        for i in range(1, m):
            while j > 0 and pattern[i] != pattern[j]:
                j = failure[j - 1]
            if pattern[i] == pattern[j]:
                j += 1
            failure[i] = j
        return failure
    
    failure = build_failure(trans_needle)
    
    # KMP matching
    j = 0
    for char in trans_haystack:
        while j > 0 and char != trans_needle[j]:
            j = failure[j - 1]
        if char == trans_needle[j]:
            j += 1
        if j == len(trans_needle):
            return True
    
    return False
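To see what the failure function computes, here is build_failure extracted as a standalone snippet (same logic as in kmp_match above):

```python
def build_failure(pattern):
    m = len(pattern)
    failure = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = failure[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        failure[i] = j
    return failure

# failure[i] = length of the longest proper prefix of pattern[:i+1]
# that is also a suffix of it
print(build_failure("abab"))   # [0, 0, 1, 2]
print(build_failure("aabaa"))  # [0, 1, 0, 1, 2]
```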

Complexity Analysis

- Preprocess + in: O(n × m) worst-case time, O(n + m) space. General use.
- Lazy expansion: O(n × m) time, O(m) extra space. Very long haystacks.
- KMP: O(n + m) time, O(n + m) space. When guaranteed linear time matters.

Where n and m are the lengths of the transliterated haystack and needle (expansion can make them longer than the originals). Note that CPython's substring search is itself heavily optimized, so in practice the preprocess-and-in approach is usually close to linear.


🤯 Interviewer's Followup Traps

Followup 1: What if mapping is bidirectional?

Question: Not only æ → ae, but also ae → æ?

Answer: Need to build equivalence classes, normalize all equivalent representations.

def normalize(s: str, equivalences: list) -> str:
    """
    Normalize: convert all equivalent representations to one canonical form
    equivalences: [("æ", "ae"), ("ë", "e"), ...]
    
    Caution: order matters. If one rule's replacement can contain another
    rule's key, chained replace() will double-convert; in that case, scan
    the string once and map each original character exactly once.
    """
    for orig, replacement in equivalences:
        s = s.replace(orig, replacement)
    return s

Followup 2: What if you need to return all match positions?

def find_all_matches(haystack: str, needle: str, tl_map: dict) -> list:
    """Return all match starting positions (in original haystack)"""
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    
    # Build transliterated position to original position mapping
    pos_map = []  # pos_map[i] = original position of transliterated char i
    for i, char in enumerate(haystack):
        expanded = tl_map.get(char, char)
        for _ in expanded:
            pos_map.append(i)
    
    # Find all matches
    results = []
    start = 0
    while True:
        idx = trans_haystack.find(trans_needle, start)
        if idx == -1:
            break
        results.append(pos_map[idx])  # Convert back to original position
        start = idx + 1
    
    return results
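A quick standalone check of the position-mapping idea, using a one-entry map (the construction mirrors find_all_matches above):

```python
tl_map = {"æ": "ae"}

haystack = "Solskjær"
# Build pos_map exactly as in find_all_matches: one entry per expanded char
pos_map = []
trans = []
for i, char in enumerate(haystack):
    expanded = tl_map.get(char, char)
    for c in expanded:
        trans.append(c)
        pos_map.append(i)

trans_haystack = "".join(trans)
print(trans_haystack)            # Solskjaer
idx = trans_haystack.find("ae")  # index 6 in the transliterated string
print(idx, pos_map[idx])         # 6 6 -> "æ" sits at index 6 in the original
```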

Followup 3: What about case-insensitive matching?

def case_insensitive_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """Case-insensitive transliteration matching"""
    # Both inputs are lowercased below, so only lowercase mapping
    # entries can ever be hit
    lower_map = {k.lower(): v.lower() for k, v in tl_map.items()}
    
    return string_match_with_transliteration(
        haystack.lower(),
        needle.lower(),
        lower_map
    )
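One detail that makes the lowercase-everything approach workable: Python's str.lower() already handles accented letters:

```python
# str.lower() covers the full Unicode case mapping, not just [A-Z]
print("Æ".lower())         # æ
print("É".lower())         # é
print("SOLSKJÆR".lower())  # solskjær
```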

Followup 4: How to handle Unicode combining characters?

Question: é could be single character \u00e9, or e + \u0301 (combining accent).

Answer: Use Unicode normalization (NFC/NFD).

import unicodedata

def normalize_unicode(s: str) -> str:
    """Unicode normalization: convert combining chars to precomposed form"""
    return unicodedata.normalize('NFC', s)

def match_with_unicode_normalization(haystack: str, needle: str, tl_map: dict) -> bool:
    """Normalize first, then match"""
    haystack = normalize_unicode(haystack)
    needle = normalize_unicode(needle)
    return string_match_with_transliteration(haystack, needle, tl_map)
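A small demo of why normalization matters; this is standard unicodedata behavior:

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + combining acute accent, two code points
precomposed = "\u00e9"   # 'é' as a single code point

# They render identically but compare unequal as raw strings
print(decomposed == precomposed)                                # False

# NFC composes the pair into the single precomposed code point
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# NFD goes the other way: 'é' decomposes back into two code points
print(len(unicodedata.normalize("NFD", precomposed)))           # 2
```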

🔥 Why Most People Fail

- Only transliterating the haystack. Both haystack and needle need transliteration.
- Ignoring one-to-many mappings. æ → ae changes the string length.
- Chaining replace() calls. This may double-convert; process character by character instead.
- Forgetting the empty-string check. An empty needle should return True.
- Wrong position mapping. Returned positions must be mapped back to the original string.
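The chained-replace pitfall is easy to reproduce with a deliberately contrived map where one rule's output contains another rule's key:

```python
# Contrived map for illustration: the "ae" produced by the first rule
# contains "a", which the second rule also rewrites
tl_map = {"æ": "ae", "a": "aa"}

s = "æ"

# Chained replace: the output of the first replace is re-processed
# by the second one, double-converting the "a"
chained = s
for k, v in tl_map.items():
    chained = chained.replace(k, v)
print(chained)  # aae  (wrong)

# Character-by-character: each original character is mapped exactly once
correct = "".join(tl_map.get(c, c) for c in s)
print(correct)  # ae
```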

Code Template (Ready to Use)

def transliterate(s: str, tl_map: dict) -> str:
    """Transliterate string"""
    return ''.join(tl_map.get(c, c) for c in s)

def match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Google Interview: String Matching With Transliteration
    """
    return transliterate(needle, tl_map) in transliterate(haystack, tl_map)

📞 oavoservice Services

This type of string processing + mapping conversion problem is classic Google interview style.

If you encounter similar problems in your interviews, we can help.


👉 Add WeChat Now: Coding0201

Don't let a string transliteration problem ruin your Google Offer.


Original content by oavoservice team. Please credit when sharing.