
Snowflake Onsite Debrief: LRU + Range Module Phone Screens, Sliding-Window Mode + Line Sweep + Log Ingestion Design

2026-06-06

Snowflake's interviews weigh engineering completeness and system-design trade-offs heavily, especially on the Infra track. The full loop is two phone screens (pure coding) followed by a four-round Virtual Onsite (two coding rounds, one behavioral (BQ) round, one system design round). Difficulty is medium-high, but interviewers scrutinize details: boundary handling and the rationale behind data-structure choices drive the score. Here is the full debrief.

Snowflake Interview at a Glance

| Stage | Rounds | Content |
|---|---|---|
| Phone screen | 2 × 1h | Pure coding: LRU Cache / Range Module |
| Onsite | 4 rounds | 2 coding + 1 BQ + 1 system design |
| Difficulty | Medium-high | Boundary handling + DS-choice rationale matter |
| Style | Infra-oriented | System design must reach concrete implementation |

Phone Screen Quick Notes
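
For the LRU Cache screen listed in the table above, a minimal sketch might look like the following. It assumes the usual get/put interface with a fixed capacity and a -1 return value on a miss (neither is stated in this debrief), and it leans on Python's OrderedDict instead of a hand-written hash map plus doubly linked list.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache; interface and miss value are assumptions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self.data:
            return -1  # assumed miss value
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```

Both operations run in O(1) average time. Being able to explain why an ordered map gives that bound is the kind of data-structure rationale the scoring emphasizes.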

Onsite Problem 1: Sliding-Window Mode

Given an integer array and a window size k, return the most frequent element (the mode) of each length-k window. Queries must be efficient: rescanning the whole window on every slide is not acceptable.
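
As a reference point, the rescanning baseline that the problem rules out can be sketched as below. It is useful mainly as a correctness oracle for a faster solution. The function name is hypothetical, k is assumed to be at least 1, and breaking ties toward the smallest value is an assumption, since the problem statement does not say how ties are resolved.

```python
from collections import Counter


def window_mode_bruteforce(nums, k):
    """O(n * k) baseline: recount every window from scratch (assumes k >= 1)."""
    res = []
    for i in range(len(nums) - k + 1):
        counts = Counter(nums[i:i + k])
        best = max(counts.values())
        # Assumed tie-break: the smallest value among the most frequent ones.
        res.append(min(v for v, c in counts.items() if c == best))
    return res
```

For example, `window_mode_bruteforce([1, 2, 2, 3, 3, 3], 3)` returns `[2, 2, 3, 3]`.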

Reframe: a HashMap for frequencies plus a max-heap with lazy deletion. As the window slides, update the frequencies and push a (frequency, value) snapshot for the element that just entered. Before reading the answer, pop the top for as long as its recorded frequency disagrees with the true count.

import heapq
from collections import defaultdict

def window_mode(nums, k):
    freq = defaultdict(int)
    heap = []                           # (-count, value)
    res = []
    for i, x in enumerate(nums):
        freq[x] += 1
        heapq.heappush(heap, (-freq[x], x))
        if i >= k - 1:
            # lazy deletion: drop a stale top
            while -heap[0][0] != freq[heap[0][1]]:
                heapq.heappop(heap)
            res.append(heap[0][1])
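            # evict the leftmost element of the window; no push is needed because an
            # older snapshot with the lower count is still on the heap (or the count is 0)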
            left = nums[i - k + 1]
            freq[left] -= 1
    return res

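Discussion focus: the interviewer repeatedly asked how to handle already-removed elements efficiently. The heart of lazy deletion is "skip a stale frequency snapshot on the heap." Time: O(n log n), since each step pushes one snapshot and each snapshot is popped at most once. Space: O(n), since stale snapshots can accumulate on the heap.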

Onsite Problem 2: Interval Merge With Frequency (Line Sweep)

Given a list of intervals that may overlap one another several times over, output the merged segments along with the overlap frequency of each (the number of intervals covering it).

Reframe: line sweep. Split each interval into a start (+1) event and an end (-1) event, sort by time, and scan while tracking the active interval count (the frequency). Emit one segment for each gap between consecutive event positions where the count is positive.

def merge_with_freq(intervals):
    events = []
    for s, e in intervals:
        events.append((s, 1))           # enter: frequency +1
        events.append((e, -1))          # leave: frequency -1
    events.sort()                       # tuple sort: at equal positions, -1 (leave) sorts before +1 (enter)
    res = []
    active = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev and active > 0:
            res.append((prev, pos, active))   # [prev, pos) at frequency active
        active += delta
        prev = pos
    return res

Discussion focus: boundary handling, namely whether a start and an end can share a timestamp and whether intervals are inclusive. The key is defining how -1 and +1 events are ordered when endpoints tie in the sort. Time: O(n log n). Space: O(n).
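
One boundary case worth rehearsing is two intervals that merely touch, such as [1, 3) and [3, 5). The sketch below is a variant, not the version discussed in the interview: it assumes half-open intervals and joins adjacent segments that carry the same frequency, so touching intervals come out as a single segment. The function name is hypothetical.

```python
def merge_with_freq_coalesced(intervals):
    """Line sweep over half-open [start, end) intervals (an assumption).

    Adjacent output segments with equal frequency are joined into one.
    """
    events = []
    for s, e in intervals:
        events.append((s, 1))    # enter: frequency +1
        events.append((e, -1))   # leave: frequency -1
    events.sort()                # at equal positions, -1 sorts before +1
    res = []
    active = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev and active > 0:
            if res and res[-1][1] == prev and res[-1][2] == active:
                # Same frequency and contiguous: extend the previous segment.
                res[-1] = (res[-1][0], pos, active)
            else:
                res.append((prev, pos, active))
        active += delta
        prev = pos
    return res
```

With this variant, `[(1, 3), (3, 5)]` yields `[(1, 5, 1)]`, while `[(1, 4), (2, 6)]` still splits into `[(1, 2, 1), (2, 4, 2), (4, 6, 1)]` because the frequency changes at 2 and at 4. Closed intervals would need a different tie rule, which is the question the interviewer was probing.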

Onsite System Design: Log Ingestion Service

Design a (simplified) log ingestion service supporting high-throughput ingest, durable storage, and queries for the most recent N logs.

Architecture (it must reach concrete implementation, not just high level):

| Layer | Choice | Purpose |
|---|---|---|
| Entry | Load Balancer | Spread write pressure |
| Buffer | Kafka / homegrown ring buffer | Absorb spikes, decouple ingest from storage |
| Storage | S3 (persist) + RocksDB (index) | Durability + fast queries |
| Query | Time-based index | Accelerate "most recent N" |

Trade-offs the interviewer pushed on: log dedup, out-of-order handling, consistency. The conversation leaned Infra and probed your grasp of bottlenecks (write hotspots, index size, query latency) and how you balance them.
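
To make the buffer, dedup, out-of-order, and "most recent N" pieces concrete, here is a toy in-memory model of the data flow. It is a sketch under heavy assumptions, not the design from the interview: a deque stands in for Kafka or a ring buffer, a sorted list stands in for the RocksDB time index, and every name (LogRecord, LogIngestSketch, log_id, and so on) is hypothetical.

```python
import bisect
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class LogRecord:
    ts: int                                   # event time, assumed integer milliseconds
    log_id: str                               # assumed unique per record, used for dedup
    body: str = field(default="", compare=False)


class LogIngestSketch:
    def __init__(self, buffer_capacity=1024):
        self.buffer = deque()                 # stand-in for Kafka / a ring buffer
        self.capacity = buffer_capacity
        self.seen_ids = set()                 # dedup state (unbounded in this toy)
        self.index = []                       # records kept sorted by (ts, log_id)

    def ingest(self, record):
        """Accept a record into the buffer; False signals backpressure."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(record)
        return True

    def flush(self):
        """Drain the buffer into the time index, dropping duplicate ids."""
        written = 0
        while self.buffer:
            rec = self.buffer.popleft()
            if rec.log_id in self.seen_ids:
                continue
            self.seen_ids.add(rec.log_id)
            bisect.insort(self.index, rec)    # late arrivals land in time order
            written += 1
        return written

    def recent(self, n):
        """Return the n most recent flushed records, newest first."""
        if n <= 0:
            return []
        return self.index[-n:][::-1]
```

Each simplification maps to a follow-up question: the dedup set grows without bound, sorted insertion is O(n) per record, and records are invisible to queries until flushed, which is a consistency trade-off.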

BQ Round Quick Notes

The interviewer asked several project-handling questions:

I described how I balanced batch latency against resource utilization in an async data-processing system. The interviewer cared about whether you can make judgment calls in ambiguous situations.

Prep Suggestions


FAQ

How many rounds is the Snowflake interview?

Two phone screens (pure coding) plus a four-round onsite (two coding + one BQ + one system design). It leans Infra and weighs engineering completeness.

Is Snowflake's coding hard?

Medium-high. The problems themselves (LRU, Range Module, sliding window, line sweep) aren't the hardest, but interviewers look extremely closely and repeatedly probe corner cases like lazy deletion, TreeMap complexity, and interval splitting.

How do I prep Snowflake's system design?

Reach concrete implementation, not just high-level structure. For log ingestion, expand the buffer (Kafka / ring buffer), storage (S3 + RocksDB), index (time-based), and the dedup / out-of-order / consistency trade-offs.

What's the efficient way to prep the Snowflake VO?

Drill the high-frequency data-structure problems until you can justify your choices, and prepare an Infra-grounded system design.

