← Back to all recaps

Google VO Interview: UDP Out-of-Order Packet Reassembly — Sequencer Design (HashMap + seq pointer)

2 min read

Google VO Interview

This Google VO question combines system design thinking with concrete implementation. It comes from real scenarios in log systems, stream processing pipelines, and distributed replication modules — all contexts where UDP's lack of ordering guarantees is a real challenge.


Problem Statement

UDP does not guarantee delivery order. An upstream sender continuously sends data fragments, each as (data, seq_num). Packets may arrive out of order, be duplicated, or be lost entirely.

Implement a Sequencer component on the receiver side that:

  • Outputs data strictly in ascending seq_num order, with no gaps.
  • If a middle sequence number has not arrived yet, buffer subsequent packets and wait until the gap is filled before outputting.

Approach

Core data structures: HashMap + next_seq pointer

  • Use a dict to buffer received-but-not-yet-outputtable fragments: key = seq_num, value = data.
  • Maintain a variable next_seq — the sequence number we expect to output next.

On receiving each new fragment:

  1. Store (seq_num, data) in the buffer (deduplicate: ignore if already seen or already output).
  2. Check if next_seq is in the buffer.
  3. If yes: output it, remove from buffer, increment next_seq, repeat step 2.
  4. If no: wait — do not output anything.

Why this works: HashMap gives O(1) lookup. Each packet is stored and retrieved exactly once. No sorting of the buffer is needed.


Code

class Sequencer:
    def __init__(self):
        self.buffer = {}      # seq_num -> data
        self.next_seq = 0     # next sequence number to output
        self.output = []      # ordered output so far

    def receive(self, seq_num: int, data: str):
        """Receive a data fragment."""
        # Deduplicate: ignore already-output or already-buffered
        if seq_num < self.next_seq or seq_num in self.buffer:
            return

        self.buffer[seq_num] = data

        # Flush as many in-order packets as possible
        while self.next_seq in self.buffer:
            self.output.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def get_output(self):
        return self.output

Dry Run

Arrival order: (2, "C")(0, "A")(1, "B")(3, "D")

Call buffer after store next_seq Flush action output
receive(2,"C") {2:"C"} 0 0 not in buffer, stop []
receive(0,"A") {2:"C",0:"A"} 0 output "A", next=1; 1 not in buffer, stop ["A"]
receive(1,"B") {2:"C",1:"B"} 1 output "B", next=2; output "C", next=3; 3 not in buffer ["A","B","C"]
receive(3,"D") {3:"D"} 3 output "D", next=4 ["A","B","C","D"]

Final ordered output: A → B → C → D


Complexity Analysis

  • Time: O(1) per receive call (amortized) — each packet is stored once and flushed once.
  • Space: O(k) where k is the number of out-of-order packets currently buffered (worst case O(n)).

Follow-up: Packet Loss Handling Strategies

The interviewer will likely ask: what if a seq_num never arrives?

Strategy 1: Timeout + Retransmission Request (ARQ)

Track how long we've been waiting for next_seq. If it exceeds a threshold, send a retransmission request (NACK) to the sender.

import time

class SequencerWithTimeout(Sequencer):
    def __init__(self, timeout=5.0):
        super().__init__()
        self.wait_start = None
        self.timeout = timeout

    def receive(self, seq_num, data):
        super().receive(seq_num, data)
        if self.next_seq not in self.buffer:
            if self.wait_start is None:
                self.wait_start = time.time()
        else:
            self.wait_start = None

    def check_timeout(self):
        """Returns seq_num to retransmit, or None."""
        if self.wait_start and time.time() - self.wait_start > self.timeout:
            return self.next_seq
        return None

Strategy 2: Skip and Forward

For latency-sensitive applications (live streaming, real-time audio/video), waiting is unacceptable. Skip the missing sequence number, insert a placeholder, and advance next_seq to the next available buffered packet.

def skip_missing(self):
    """Skip the missing seq and continue from next available."""
    if not self.buffer:
        return
    next_available = min(self.buffer.keys())
    self.output.append(None)  # placeholder for lost packet
    self.next_seq = next_available
    while self.next_seq in self.buffer:
        self.output.append(self.buffer.pop(self.next_seq))
        self.next_seq += 1

Strategy Comparison

Strategy Best for Pros Cons
Timeout + ARQ Log systems, replication Data completeness Added latency
Skip & Forward Live streaming, real-time Low latency Data loss accepted

Extended Discussion Points

  • Duplicates: handled by if seq_num < self.next_seq or seq_num in self.buffer.
  • Sliding window: add a maximum buffer size to prevent unbounded memory growth — analogous to TCP's receive window.
  • Thread safety: in production, receive and check_timeout run on different threads — protect buffer and next_seq with a lock.

💬 Need VO Support?

Real-time VO assistance for Google, Amazon, TikTok, Microsoft, Meta, Uber, and more.

WeChat: Coding0201


📬 Contact

Email: [email protected]
Telegram: @OAVOProxy