Google's Trending Interview Question: String Matching With Transliteration
Recently, a Google string interview question has been causing a stir in North American job hunting circles.
Many people's first reaction: "Isn't this just string matching? Python's in operator, easy!"
Dead wrong.
The core of this problem is transliteration mapping — one character can map to multiple characters (like æ → ae), which makes ordinary substring matching completely fail.
Our oavoservice team analyzed this deeply and found that this question tests string preprocessing, two-pointer matching, and edge case handling comprehensively.
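To see why the naive approach breaks, here is a minimal check using Example 3 from the problem below (only the one map entry the example needs):

```python
# Plain substring search misses the match because 'æ' != 'a'
haystack = "Ole Gunnar Solskjær is amazing"
print("Solskja" in haystack)  # False, no transliteration applied

# After mapping æ -> ae, the match appears
print("Solskja" in haystack.replace("æ", "ae"))  # True
```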
Question Breakdown
🔍 Original Problem
String Matching With Transliteration
You may or may not know that Google Docs has a feature to find and replace even for characters not in the standard English alphabet ([A-Z]).
To do this, we first need a transliteration map:
```python
tl_map = {
    "ë": "e",
    "é": "e",
    "û": "u",
    "æ": "ae",
    "ž": "zed",
    # ...
}
```
Given a haystack string, a needle string, and the transliteration mapping, return whether the needle is matched within the haystack.
🎯 Example Analysis
Example 1:
haystack: "I love crème brûlée"
needle: "eme"
return: True
Explanation:
- "è" in "crème" transliterates to "e"
- "crème" → "creme"
- "eme" is a substring of "creme" ✓
Example 2:
haystack: "I love crème brûlée"
needle: "ême"
return: True
Explanation:
- "ê" in needle "ême" transliterates to "e"
- needle → "eme"
- "crème" in haystack → "creme"
- "eme" is a substring of "creme" ✓
Example 3:
haystack: "Ole Gunnar Solskjær is amazing"
needle: "Solskja"
return: True
Explanation:
- "æ" in "Solskjær" transliterates to "ae"
- "Solskjær" → "Solskjaer"
- "Solskja" is a prefix of "Solskjaer" ✓
Example 4: ⚠️ Key Trap
haystack: "Ole Gunnar Solskjær is amazing"
needle: "Solskjear"
return: False
Explanation:
- "Solskjær" → "Solskjaer"
- "Solskjear" ≠ "Solskjaer" (e and a positions differ)
- Note: "ea" in needle ≠ transliteration "ae" of "æ" ✗
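A quick sketch of this trap, using a minimal one-entry map:

```python
tl_map = {"æ": "ae"}
# Transliterate the haystack word character by character
trans = "".join(tl_map.get(c, c) for c in "Solskjær")
print(trans)                 # "Solskjaer"
print("Solskjear" in trans)  # False: 'e' and 'a' are in the wrong order
print("Solskjaer" in trans)  # True
```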
Deep Dive
🧠 Problem Essence Analysis
The difficulties in this problem:
| Test Point | Specific Content |
|---|---|
| One-to-Many Mapping | One character can map to multiple (æ → ae) |
| Bidirectional Transliteration | Both haystack and needle need transliteration |
| Length Changes | String length changes after transliteration |
| Edge Case Handling | Partial matches, cross-character matching |
📊 Key Insight
Method 1: Preprocess Transliteration
Most intuitive: transliterate both haystack and needle to standard English first, then do substring matching.
haystack: "crème" → "creme"
needle: "ême" → "eme"
Match: "eme" in "creme" → True
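The preprocessing step from Example 2, sketched with a minimal map containing only the entries that example needs:

```python
tl_map = {"è": "e", "ê": "e", "û": "u", "é": "e"}

# Transliterate both sides before matching
trans_haystack = "".join(tl_map.get(c, c) for c in "I love crème brûlée")
trans_needle = "".join(tl_map.get(c, c) for c in "ême")

print(trans_haystack)                  # "I love creme brulee"
print(trans_needle in trans_haystack)  # True
```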
Method 2: Match While Transliterating
No preprocessing: expand characters on demand during matching. Suitable for very long haystacks.
Solution: Core Algorithm
Solution 1: Preprocessing + Substring Match (Recommended)
```python
def transliterate(s: str, tl_map: dict) -> str:
    """
    Transliterate non-standard characters to standard English.

    Args:
        s: Original string
        tl_map: Transliteration mapping
    Returns:
        Transliterated string
    """
    result = []
    for char in s:
        if char in tl_map:
            result.append(tl_map[char])
        else:
            result.append(char)
    return ''.join(result)


def string_match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Google Interview: String Matching With Transliteration

    Args:
        haystack: String to search in
        needle: Pattern to find
        tl_map: Transliteration mapping
    Returns:
        bool: Whether needle (transliterated) is a substring of haystack (transliterated)
    """
    # Step 1: Transliterate both strings
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    # Step 2: Substring matching
    return trans_needle in trans_haystack
```
🧪 Test Cases
```python
tl_map = {
    "ë": "e",
    "é": "e",
    "è": "e",
    "ê": "e",
    "û": "u",
    "ù": "u",
    "æ": "ae",
    "ž": "zed",
}

# Test 1: Basic match
print(string_match_with_transliteration(
    "I love crème brûlée", "eme", tl_map
))  # Expected: True

# Test 2: Needle also needs transliteration
print(string_match_with_transliteration(
    "I love crème brûlée", "ême", tl_map
))  # Expected: True

# Test 3: One-to-many mapping
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskja", tl_map
))  # Expected: True

# Test 4: Key trap, wrong order
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjear", tl_map
))  # Expected: False

# Test 5: Complete match
print(string_match_with_transliteration(
    "Ole Gunnar Solskjær is amazing", "Solskjaer", tl_map
))  # Expected: True

# Test 6: Empty needle
print(string_match_with_transliteration(
    "Hello World", "", tl_map
))  # Expected: True

# Test 7: No special characters
print(string_match_with_transliteration(
    "Hello World", "World", tl_map
))  # Expected: True

# Test 8: Multi-character mapping
print(string_match_with_transliteration(
    "ženeva", "zed", tl_map
))  # Expected: True
```
Solution 2: Lazy Expansion (Memory Optimized)
When the haystack is very long, preprocessing the entire string can use too much memory. Instead, use "lazy expansion":
```python
from collections import deque


def lazy_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    Lazy-expansion matching: never materializes the full transliterated haystack.
    Suitable when the haystack is very long.
    """
    # Transliterate the needle first (it is usually the shorter string)
    trans_needle = transliterate(needle, tl_map)
    if not trans_needle:
        return True

    # Generator: expand the haystack character by character
    def expand_haystack():
        for char in haystack:
            for c in tl_map.get(char, char):
                yield c

    # Sliding window over the expanded stream; maxlen evicts old characters
    needle_len = len(trans_needle)
    window = deque(maxlen=needle_len)
    for char in expand_haystack():
        window.append(char)
        if len(window) == needle_len and ''.join(window) == trans_needle:
            return True
    return False
```
Solution 3: KMP Optimization (Interview Bonus)
If interviewer asks "how to optimize time complexity", mention KMP:
```python
def kmp_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """
    KMP-optimized version.
    Time complexity: O(n + m), where n = len(haystack), m = len(needle)
    """
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    if not trans_needle:
        return True

    # Build the KMP failure function
    def build_failure(pattern):
        m = len(pattern)
        failure = [0] * m
        j = 0
        for i in range(1, m):
            while j > 0 and pattern[i] != pattern[j]:
                j = failure[j - 1]
            if pattern[i] == pattern[j]:
                j += 1
            failure[i] = j
        return failure

    failure = build_failure(trans_needle)

    # KMP matching
    j = 0
    for char in trans_haystack:
        while j > 0 and char != trans_needle[j]:
            j = failure[j - 1]
        if char == trans_needle[j]:
            j += 1
        if j == len(trans_needle):
            return True
    return False
```
Complexity Analysis
| Method | Time Complexity | Space Complexity | Use Case |
|---|---|---|---|
| Preprocess + `in` | O(n × m) | O(n + m) | General |
| Lazy Expansion | O(n × m) | O(m) | Very long haystack |
| KMP Optimization | O(n + m) | O(n + m) | High performance |
Where: n = len(haystack), m = len(needle)
🤯 Interviewer's Followup Traps
Followup 1: What if mapping is bidirectional?
Question: Not only æ → ae, but also ae → æ?
Answer: Build equivalence classes and normalize every equivalent representation to one canonical form.
```python
import re


def normalize(s: str, equivalences: list) -> str:
    """
    Normalize: convert all equivalent representations to a unified form.
    equivalences: [("æ", "ae"), ("ë", "e"), ...]

    A single regex pass replaces each original character exactly once.
    (A chained s.replace() loop could re-replace text produced by an
    earlier rule; see the pitfalls table below.)
    """
    mapping = dict(equivalences)
    # Longest keys first, so multi-character keys like "ae" win over "a"
    pattern = re.compile("|".join(
        re.escape(k) for k in sorted(mapping, key=len, reverse=True)
    ))
    return pattern.sub(lambda m: mapping[m.group(0)], s)
```
Followup 2: What if you need to return all match positions?
```python
def find_all_matches(haystack: str, needle: str, tl_map: dict) -> list:
    """Return all match starting positions (in the original haystack)"""
    trans_haystack = transliterate(haystack, tl_map)
    trans_needle = transliterate(needle, tl_map)
    if not trans_needle:
        # An empty needle matches at every original position
        return list(range(len(haystack)))

    # Map each transliterated index back to its original index
    pos_map = []  # pos_map[i] = original position of transliterated char i
    for i, char in enumerate(haystack):
        expanded = tl_map.get(char, char)
        for _ in expanded:
            pos_map.append(i)

    # Find all matches
    results = []
    start = 0
    while True:
        idx = trans_haystack.find(trans_needle, start)
        if idx == -1:
            break
        results.append(pos_map[idx])  # convert back to the original position
        start = idx + 1
    return results
```
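The position-mapping idea in isolation: expanding "æ" to "ae" shifts every later index, so each transliterated character must remember the original index it came from.

```python
tl_map = {"æ": "ae"}
haystack = "Solskjær"

# Build the transliterated string and, in parallel, the index map
pos_map = []
trans = []
for i, ch in enumerate(haystack):
    for out in tl_map.get(ch, ch):
        trans.append(out)
        pos_map.append(i)

print("".join(trans))     # "Solskjaer"
idx = "".join(trans).find("aer")
print(idx, pos_map[idx])  # 6 6: the 'a' at transliterated index 6 came from 'æ' at original index 6
```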
Followup 3: What about case-insensitive matching?
```python
def case_insensitive_match(haystack: str, needle: str, tl_map: dict) -> bool:
    """Case-insensitive transliteration matching"""
    # Lowercase both inputs, so the mapping only needs lowercase entries
    lower_map = {k.lower(): v.lower() for k, v in tl_map.items()}
    return string_match_with_transliteration(
        haystack.lower(),
        needle.lower(),
        lower_map,
    )
```
Followup 4: How to handle Unicode combining characters?
Question: "é" can be the single code point \u00e9 or the sequence e + \u0301 (a combining acute accent).
Answer: Apply Unicode normalization (NFC/NFD) before matching.
```python
import unicodedata


def normalize_unicode(s: str) -> str:
    """Unicode normalization: convert combining chars to precomposed form"""
    return unicodedata.normalize('NFC', s)


def match_with_unicode_normalization(haystack: str, needle: str, tl_map: dict) -> bool:
    """Normalize first, then match"""
    haystack = normalize_unicode(haystack)
    needle = normalize_unicode(needle)
    return string_match_with_transliteration(haystack, needle, tl_map)
```
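A minimal demonstration of why this matters: the two spellings of "é" are unequal as raw strings, and NFC makes them compare equal.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
combining = "e\u0301"    # e followed by a combining acute accent

print(precomposed == combining)  # False, raw strings differ
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```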
🔥 Why Most People Fail
| Common Mistake | Correct Approach |
|---|---|
| Only transliterating the haystack | Both haystack and needle need transliteration |
| Ignoring one-to-many mappings | æ → ae changes the string length |
| Chained `replace` calls | Can double-replace; process character by character |
| Forgetting the empty-string check | An empty needle should return True |
| Wrong position mapping | Returned positions must map back to the original string |
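The chained-`replace` pitfall, demonstrated with hypothetical rules invented for illustration (they are not from the problem's map; the trap needs a rule whose key also appears in another rule's output):

```python
rules = {"ž": "zed", "e": "ë"}
s = "že"

# Chained str.replace: the 'e' produced by "zed" gets transformed again
chained = s
for k, v in rules.items():
    chained = chained.replace(k, v)
print(chained)  # "zëdë", wrong: double replacement

# Char-by-char lookup touches each ORIGINAL character exactly once
char_by_char = "".join(rules.get(c, c) for c in s)
print(char_by_char)  # "zedë", correct
```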
Code Template (Ready to Use)
```python
def transliterate(s: str, tl_map: dict) -> str:
    """Transliterate a string"""
    return ''.join(tl_map.get(c, c) for c in s)


def match_with_transliteration(haystack: str, needle: str, tl_map: dict) -> bool:
    """Google Interview: String Matching With Transliteration"""
    return transliterate(needle, tl_map) in transliterate(haystack, tl_map)
```
📞 oavoservice Services
This type of string processing + mapping conversion problem is classic Google interview style.
If you encounter similar problems in interviews, we can provide:
- ✅ OA Assistance: CodeSignal / HackerRank perfect score guarantee
- ✅ VO Support: Live Coding real-time off-screen assistance
- ✅ Followup Prep: Help you anticipate Unicode, position mapping questions
👉 Add WeChat Now: Coding0201
Don't let a string transliteration problem ruin your Google Offer.
Original content by oavoservice team. Please credit when sharing.