Google VO Interview: UDP Out-of-Order Packet Reassembly — Sequencer Design (HashMap + seq pointer)

Google VO Interview

This Google VO question combines system design thinking with concrete implementation. It comes from real scenarios in log systems, stream processing pipelines, and distributed replication modules — all contexts where UDP's lack of ordering guarantees is a real challenge.

Problem Statement

UDP does not guarantee delivery order. An upstream sender continuously sends data fragments, each as (data, seq_num). Packets may arrive out of order, be duplicated, or be lost entirely.

Implement a Sequencer component on the receiver side that:

Outputs data strictly in ascending seq_num order, with no gaps.
If a middle sequence number has not arrived yet, buffer subsequent packets and wait until the gap is filled before outputting.

Approach

Core data structures: HashMap + next_seq pointer

Use a dict to buffer received-but-not-yet-outputtable fragments: key = seq_num, value = data.
Maintain a variable next_seq — the sequence number we expect to output next.

On receiving each new fragment:

Store (seq_num, data) in the buffer (deduplicate: ignore if already seen or already output).
Check if next_seq is in the buffer.
If yes: output it, remove from buffer, increment next_seq, repeat step 2.
If no: wait — do not output anything.

Why this works: HashMap gives O(1) lookup. Each packet is stored and retrieved exactly once. No sorting of the buffer is needed.

Code

class Sequencer:
    def __init__(self):
        self.buffer = {}      # seq_num -> data
        self.next_seq = 0     # next sequence number to output
        self.output = []      # ordered output so far

    def receive(self, seq_num: int, data: str):
        """Receive a data fragment."""
        # Deduplicate: ignore already-output or already-buffered
        if seq_num < self.next_seq or seq_num in self.buffer:
            return

        self.buffer[seq_num] = data

        # Flush as many in-order packets as possible
        while self.next_seq in self.buffer:
            self.output.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def get_output(self):
        return self.output

Dry Run

Arrival order: (2, "C") → (0, "A") → (1, "B") → (3, "D")

Call	buffer after store	next_seq	Flush action	output
receive(2,"C")	{2:"C"}	0	0 not in buffer, stop	[]
receive(0,"A")	{2:"C",0:"A"}	0	output "A", next=1; 1 not in buffer, stop	["A"]
receive(1,"B")	{2:"C",1:"B"}	1	output "B", next=2; output "C", next=3; 3 not in buffer	["A","B","C"]
receive(3,"D")	{3:"D"}	3	output "D", next=4	["A","B","C","D"]

Final ordered output: A → B → C → D ✓

Complexity Analysis

Time: O(1) per receive call (amortized) — each packet is stored once and flushed once.
Space: O(k) where k is the number of out-of-order packets currently buffered (worst case O(n)).

Follow-up: Packet Loss Handling Strategies

The interviewer will likely ask: what if a seq_num never arrives?

Strategy 1: Timeout + Retransmission Request (ARQ)

Track how long we've been waiting for next_seq. If it exceeds a threshold, send a retransmission request (NACK) to the sender.

import time

class SequencerWithTimeout(Sequencer):
    def __init__(self, timeout=5.0):
        super().__init__()
        self.wait_start = None
        self.timeout = timeout

    def receive(self, seq_num, data):
        super().receive(seq_num, data)
        if self.next_seq not in self.buffer:
            if self.wait_start is None:
                self.wait_start = time.time()
        else:
            self.wait_start = None

    def check_timeout(self):
        """Returns seq_num to retransmit, or None."""
        if self.wait_start and time.time() - self.wait_start > self.timeout:
            return self.next_seq
        return None

Strategy 2: Skip and Forward

For latency-sensitive applications (live streaming, real-time audio/video), waiting is unacceptable. Skip the missing sequence number, insert a placeholder, and advance next_seq to the next available buffered packet.

def skip_missing(self):
    """Skip the missing seq and continue from next available."""
    if not self.buffer:
        return
    next_available = min(self.buffer.keys())
    self.output.append(None)  # placeholder for lost packet
    self.next_seq = next_available
    while self.next_seq in self.buffer:
        self.output.append(self.buffer.pop(self.next_seq))
        self.next_seq += 1

Strategy Comparison

Strategy	Best for	Pros	Cons
Timeout + ARQ	Log systems, replication	Data completeness	Added latency
Skip & Forward	Live streaming, real-time	Low latency	Data loss accepted

Extended Discussion Points

Duplicates: handled by if seq_num < self.next_seq or seq_num in self.buffer.
Sliding window: add a maximum buffer size to prevent unbounded memory growth — analogous to TCP's receive window.
Thread safety: in production, receive and check_timeout run on different threads — protect buffer and next_seq with a lock.

💬 Need VO Support?

Real-time VO assistance for Google, Amazon, TikTok, Microsoft, Meta, Uber, and more.

WeChat: Coding0201

📬 Contact

Email: [email protected]
Telegram: @OAVOProxy