HubSpot SDE OA Real Question Deep Dive | Per-Visitor Session Grouping + Full Python Solution

HubSpot's SDE / New Grad OA has kept a consistent shape through the current hiring cycle. It doesn't test pure LeetCode tricks; it tests "half of a real engineering task": you're given a stream of page-visit events, slice it into per-visitor sessions, format the output to a strict JSON schema, and POST the result back to HubSpot's grader.

In other words, being good at LeetCode mediums is not enough — you also have to ship a tiny "data post-processing + API interaction" pipeline in 90 minutes.

This article walks through the canonical "Sessions" problem end to end: statement, approach, full Python solution, edge-case checklist, submission flow, and 7 FAQs. If HubSpot's OA is on your calendar, this is the single question most worth mastering.

HubSpot OA Platform & Format

Dimension	Detail
Platform	HubSpot's in-house OA (some roles use HackerRank)
Total time	90 - 120 minutes
Number of questions	2 - 3 (1 flagship + 1-2 supporting)
Difficulty	LeetCode Medium with a practical bent, few hard tricks
Languages	Python / Java / JavaScript / C++
Submission	Flagship question: HTTP GET to pull data, HTTP POST to submit
Pass bar	Usually ≥ 2 problems, with the flagship fully working

Key insight: HubSpot's OA isn't trying to find LeetCode champions. It wants to see whether you can take a full pipeline — read data → transform → output a valid JSON schema → POST it back — and ship it under time pressure.

The Question: Slice Events into Per-Visitor Sessions

Problem statement

HubSpot gives you a JSON event stream where each event has url, visitorId, and timestamp (Unix ms):

{
  "events": [
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754583000},
    {"url": "/pages/a-small-dog", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754631000},
    {"url": "/pages/a-big-talk",  "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709065294},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512711000000},
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754436000},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709024000}
  ]
}

Task: For each visitorId, sort events by timestamp, then group events whose consecutive gap is ≤ 10 minutes (600,000 ms) into a single session. A gap larger than 10 minutes starts a new session. Each session must report:

startTime — timestamp of the first event in the session
duration — last event timestamp − first event timestamp
pages — chronological list of URLs in this session
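To make the ≤ 10-minute rule concrete, here is the gap arithmetic for visitor f877b96c-9969-4abc-bbe2-54b17d030f8b from the sample, as a few lines of Python (the variable names are illustrative only):

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms

# Visitor f877b96c-...'s timestamps from the sample, sorted ascending
timestamps = [1512709024000, 1512709065294, 1512711000000]

# Gap between each pair of consecutive events
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]

# A gap strictly greater than the threshold opens a new session
starts_new_session = [g > SESSION_GAP_MS for g in gaps]
```

The first gap (41,294 ms) keeps both events in one session and equals that session's duration; the second (1,934,706 ms, about 32 minutes) opens a new session.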

Expected output:

{
  "sessionsByUser": {
    "f877b96c-9969-4abc-bbe2-54b17d030f8b": [
      {
        "duration": 41294,
        "pages": ["/pages/a-sad-story", "/pages/a-big-talk"],
        "startTime": 1512709024000
      },
      {
        "duration": 0,
        "pages": ["/pages/a-sad-story"],
        "startTime": 1512711000000
      }
    ],
    "d1177368-2310-11e8-9e2a-9b860a0d9039": [
      {
        "duration": 195000,
        "pages": ["/pages/a-big-river", "/pages/a-big-river", "/pages/a-small-dog"],
        "startTime": 1512754436000
      }
    ]
  }
}
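One way to pin down the field names before writing any logic is to transcribe both schemas as TypedDicts. This is a sketch inferred from the sample input and expected output above; the class names `Event`, `Session`, and `Output` are hypothetical, and the grader's real schema may be stricter.

```python
from typing import Dict, List, TypedDict  # TypedDict needs Python 3.8+


class Event(TypedDict):
    url: str
    visitorId: str
    timestamp: int  # Unix epoch milliseconds


class Session(TypedDict):
    startTime: int    # ms, timestamp of the session's first event
    duration: int     # ms, last event timestamp minus first
    pages: List[str]  # URLs in chronological order


class Output(TypedDict):
    sessionsByUser: Dict[str, List[Session]]  # visitorId -> sessions


# The d1177368 visitor's entry from the expected output, typed as Output
example: Output = {
    "sessionsByUser": {
        "d1177368-2310-11e8-9e2a-9b860a0d9039": [
            {
                "startTime": 1512754436000,
                "duration": 195000,
                "pages": ["/pages/a-big-river", "/pages/a-big-river", "/pages/a-small-dog"],
            }
        ]
    }
}
```

TypedDicts add no runtime checks; they let an editor or a type checker such as mypy flag a typo like `startime` before you POST.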

Finally, POST this object back to the HubSpot-provided endpoint.

Three-step approach

There's no clever algorithm here — what wins is decomposing the engineering task crisply:

Group by visitorId using a defaultdict(list).
Sort each visitor's events by timestamp ascending, then linear-scan and start a new session whenever the gap exceeds 10 minutes.
Wrap into the target JSON and POST it — watch the field names, the millisecond unit, and the fact that single-event sessions still carry duration: 0.

Python full solution

import requests
from collections import defaultdict

SESSION_GAP_MS = 10 * 60 * 1000  # 10 minutes = 600,000 ms

def build_sessions(events):
    """Convert the event stream into HubSpot's sessionsByUser structure."""
    # Step 1: group by visitorId
    visitor_events = defaultdict(list)
    for ev in events:
        visitor_events[ev["visitorId"]].append(ev)

    sessions_by_user = {}

    # Step 2: per-visitor sort + session cut
    for visitor_id, evs in visitor_events.items():
        evs.sort(key=lambda x: x["timestamp"])

        sessions = []
        current = None

        for ev in evs:
            ts = ev["timestamp"]
            url = ev["url"]

            if current is None:
                current = {"startTime": ts, "_lastTime": ts, "pages": [url]}
            else:
                if ts - current["_lastTime"] <= SESSION_GAP_MS:
                    current["pages"].append(url)
                    current["_lastTime"] = ts
                else:
                    sessions.append(_finalize(current))
                    current = {"startTime": ts, "_lastTime": ts, "pages": [url]}

        if current is not None:
            sessions.append(_finalize(current))

        sessions_by_user[visitor_id] = sessions

    return {"sessionsByUser": sessions_by_user}


def _finalize(session):
    """Strip internal fields and emit the HubSpot-facing shape."""
    return {
        "startTime": session["startTime"],
        "duration": session["_lastTime"] - session["startTime"],
        "pages": session["pages"],
    }


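def run(get_url, post_url, timeout=10):
    """Fetch events, build sessions, and POST the result back to the grader."""
    resp = requests.get(get_url, timeout=timeout)
    resp.raise_for_status()  # fail fast if the GET itself failed
    payload = build_sessions(resp.json()["events"])
    # json= serializes the dict and sets Content-Type: application/json
    resp = requests.post(post_url, json=payload, timeout=timeout)
    # Print the reply even on a 4xx: graders often name the offending field
    print("POST status:", resp.status_code, resp.text[:200])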

Time complexity: O(N log N), N = total events (sort dominates). Space complexity: O(N).
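An alternative design (not the one above) is to sort the whole stream once by (visitorId, timestamp) and walk it with `itertools.groupby`. It is still O(N log N), and it drops the defaultdict and the `_finalize` step by updating `duration` as each event is appended. The name `build_sessions_sorted_once` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms


def build_sessions_sorted_once(events):
    """Sort once by (visitorId, timestamp), then cut sessions per visitor group."""
    ordered = sorted(events, key=itemgetter("visitorId", "timestamp"))
    sessions_by_user = {}
    # groupby only merges adjacent items, which the sort above guarantees
    for visitor_id, group in groupby(ordered, key=itemgetter("visitorId")):
        sessions = []
        last_ts = None
        for ev in group:
            ts = ev["timestamp"]
            if last_ts is not None and ts - last_ts <= SESSION_GAP_MS:
                current = sessions[-1]
                current["pages"].append(ev["url"])
                current["duration"] = ts - current["startTime"]  # keep duration current
            else:
                sessions.append({"startTime": ts, "duration": 0, "pages": [ev["url"]]})
            last_ts = ts
        sessions_by_user[visitor_id] = sessions
    return {"sessionsByUser": sessions_by_user}
```

Visitors come out in sorted-id order rather than first-seen order. Assuming the grader parses the JSON rather than comparing raw text, that is harmless, since JSON object key order carries no meaning.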

5 edge cases you must handle

Most failed submissions on this question are not algorithmic; they come from small details like these:

#	Edge case	Correct handling
1	Gap exactly 10 minutes	The rule is ≤ 10 minutes (inclusive), so a gap of exactly 600,000 ms stays in the same session
2	Single-event session	`duration = 0`, `pages` is still a length-1 list — don't drop it
3	Raw events arrive out of order	Always sort each visitor's events by timestamp first
4	Multiple sessions per visitor	Output is a list, ordered by `startTime` ascending
5	`duration` unit	Milliseconds, not seconds — don't divide by 1000
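A compact single-visitor sessionizer with one assertion per table row makes a quick pre-submit self-test. `cut_sessions` and `check_edge_cases` are hypothetical helpers, and the input is assumed to be (timestamp, url) pairs for one visitor.

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms


def cut_sessions(pairs, gap_ms=SESSION_GAP_MS):
    """Split one visitor's (timestamp, url) pairs into sessions (hypothetical helper)."""
    sessions = []
    last_ts = None
    # Row 3: sort by timestamp only; Python's sort is stable for equal timestamps
    for ts, url in sorted(pairs, key=lambda p: p[0]):
        # Row 1: "<=" keeps a gap of exactly gap_ms in the same session
        if last_ts is not None and ts - last_ts <= gap_ms:
            sessions[-1]["pages"].append(url)
            # Row 5: plain millisecond difference, no division by 1000
            sessions[-1]["duration"] = ts - sessions[-1]["startTime"]
        else:
            # Row 2: a new session starts with duration 0 and one page
            sessions.append({"startTime": ts, "duration": 0, "pages": [url]})
        last_ts = ts
    # Row 4: sessions are appended in time order, so startTime is ascending
    return sessions


def check_edge_cases():
    # Row 1: exactly 10 minutes apart -> one session; one ms more -> two
    assert len(cut_sessions([(0, "/a"), (600_000, "/b")])) == 1
    assert len(cut_sessions([(0, "/a"), (600_001, "/b")])) == 2
    # Row 2: a single event still yields a session with duration 0
    assert cut_sessions([(5, "/a")]) == [{"startTime": 5, "duration": 0, "pages": ["/a"]}]
    # Row 3: out-of-order input is sorted before cutting
    assert cut_sessions([(2_000, "/b"), (0, "/a")])[0]["pages"] == ["/a", "/b"]
    # Row 4: multiple sessions come out in ascending startTime
    starts = [s["startTime"] for s in cut_sessions([(2_000_000, "/c"), (0, "/a"), (1_000_000, "/b")])]
    assert starts == [0, 1_000_000, 2_000_000]
    # Row 5: duration stays in milliseconds
    assert cut_sessions([(0, "/a"), (195_000, "/b")])[0]["duration"] == 195_000
    return True
```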

Common bugs

❌ Iterating events as a flat list and trying to cut sessions globally → events for the same visitorId may be scattered anywhere in the stream. Group first.
❌ Assuming the input stream is already in timestamp order → each visitor's list keeps arrival order, so always sort it explicitly.
❌ Missing the outer sessionsByUser key on POST → schema validation fails.
❌ Building the JSON string by hand instead of using json.dumps (or requests' json= argument) → it is easy to quote numeric timestamps as strings or emit invalid JSON, and schema validation fails.
❌ Writing 10 * 60 instead of 10 * 60 * 1000 → the threshold becomes 600 ms, 1000× too small, so nearly every event becomes its own session.
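Several of these bugs are mechanical enough to catch with a pre-POST check. Below is a minimal validator sketch; `validate_payload` is a hypothetical helper, and it assumes the grader wants exactly the shape of the sample output (one top-level key; sessions with exactly `startTime`, `duration`, `pages`).

```python
import json

SESSION_KEYS = {"startTime", "duration", "pages"}


def validate_payload(payload):
    """Return a list of schema problems; an empty list means the payload looks OK."""
    problems = []
    try:
        json.dumps(payload)  # catches values json cannot serialize
    except (TypeError, ValueError) as exc:
        problems.append(f"not JSON-serializable: {exc}")
    if not isinstance(payload, dict) or set(payload) != {"sessionsByUser"}:
        problems.append("top level must be exactly {'sessionsByUser': ...}")
        return problems
    for visitor_id, sessions in payload["sessionsByUser"].items():
        prev_start = None
        for i, s in enumerate(sessions):
            where = f"{visitor_id}[{i}]"
            if set(s) != SESSION_KEYS:
                problems.append(f"{where}: keys {sorted(s)} != {sorted(SESSION_KEYS)}")
                continue
            for key in ("startTime", "duration"):
                # type(...) is int rejects bools and numeric strings such as "1512709024000"
                if type(s[key]) is not int:
                    problems.append(f"{where}: {key} must be an integer (ms)")
            if not isinstance(s["pages"], list) or not s["pages"]:
                problems.append(f"{where}: pages must be a non-empty list")
            start = s["startTime"]
            if type(start) is int:
                if prev_start is not None and start < prev_start:
                    problems.append(f"{where}: sessions not sorted by startTime")
                prev_start = start
    return problems
```

A schema check cannot spot a wrong threshold, though; for that, compare your output on the sample against the expected output.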

HubSpot OA Prep Strategy (90-minute speedrun)

Time slice	What to do
0 – 10 min	Read carefully + write the exact field names (`visitorId`, `startTime`, `duration`, `pages`) into a scratch cell so you never typo them later
10 – 25 min	Wire up the GET and print the first event to confirm the schema matches your assumptions
25 – 65 min	Implement `build_sessions` — run it on the sample first before pointing at the live dataset
65 – 80 min	POST + read the response message; many graders tell you exactly which field is wrong
80 – 90 min	Patch edge cases (=10 min boundary, single-event session, duration units)
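For the 10 to 25 minute slot, a small shape check on the fetched payload confirms your assumptions before any business logic exists. This is a sketch: `peek_events` and `SAMPLE` are hypothetical, and `SAMPLE` is an offline stand-in for the GET response built from one sample event.

```python
import json

EXPECTED_EVENT_KEYS = {"url", "visitorId", "timestamp"}


def peek_events(raw):
    """Sanity-check the fetched payload's shape and print the first event."""
    events = raw["events"]  # a KeyError here means the top-level key differs
    if not events:
        raise ValueError("no events in payload")
    first = events[0]
    missing = EXPECTED_EVENT_KEYS - set(first)
    if missing:
        raise ValueError(f"first event is missing keys: {sorted(missing)}")
    if type(first["timestamp"]) is not int:
        raise ValueError("timestamp is not an integer; check its unit and format")
    print(json.dumps(first, indent=2))
    return len(events)


# Offline stand-in for the GET response: one event copied from the sample
SAMPLE = json.loads(
    '{"events": [{"url": "/pages/a-big-river", '
    '"visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", '
    '"timestamp": 1512754583000}]}'
)
```

Once `peek_events(SAMPLE)` behaves, pass it the parsed live response instead.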

LeetCode warmup that transfers well

#	Problem	Why it helps
1834	Single-Threaded CPU	Sort by time + priority queue
1429	First Unique Number	Maintain stats in a dict
911	Online Election	Process votes in time order, then binary-search a query timestamp
1244	Design A Leaderboard	dict + sorted-output pattern
359	Logger Rate Limiter	Minimal "time-gap threshold" template
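For reference, here is a standard Python solution to 359 Logger Rate Limiter. Its `timestamp < next_allowed` check is the same kind of time-gap threshold the session cut uses, with a 10-second window instead of 10 minutes.

```python
class Logger:
    """LeetCode 359: each distinct message may print at most once per 10 seconds."""

    def __init__(self, window=10):
        self.window = window
        self.next_allowed = {}  # message -> earliest timestamp it may print again

    def shouldPrintMessage(self, timestamp, message):
        # Unseen messages default to "allowed now"
        if timestamp < self.next_allowed.get(message, timestamp):
            return False
        self.next_allowed[message] = timestamp + self.window
        return True
```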

FAQ

Q1: What types of questions appear on the HubSpot SDE New Grad OA? A: Engineering-flavored problems: take a stream of events / logs / orders, apply business rules, output a strict JSON. Algorithmically, it's mostly hashmaps, sorting, and string handling — rarely hard DP or graphs.

Q2: How long is the HubSpot OA and how many problems are there? A: Usually 90 – 120 minutes with 2 – 3 problems. You typically need ≥ 2 working solutions to advance, with the flagship question (e.g., the Sessions problem above) weighted heaviest.

Q3: Do I need a specific language? A: Python / Java / JavaScript / C++ are all allowed. Python's defaultdict, json, and requests make it the smoothest choice for a 90-minute timed assessment.
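requests is a third-party package. If the OA environment happens not to have it installed (an assumption to check on the day), the same GET/POST flow works with the standard library. The helper names below are hypothetical.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=10):
    """GET a URL and parse its JSON body (stdlib stand-in for requests.get(...).json())."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def build_post_request(url, payload):
    """Build a JSON POST request; json.dumps keeps timestamps as numbers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def post_json(url, payload, timeout=10):
    """POST the payload and return (status, body), including error bodies."""
    try:
        with urllib.request.urlopen(build_post_request(url, payload), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error body may say which field is wrong
        return err.code, err.read().decode("utf-8", "replace")
```

Unlike requests, `urlopen` raises on 4xx/5xx responses, which is why `post_json` catches `HTTPError` and returns the grader's message instead.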

Q4: How exactly is a "session" defined? A: Per visitorId, events sorted by timestamp ascending; consecutive events whose gap is ≤ 10 minutes (inclusive) belong to one session, larger gaps start a new session. Each session must include startTime, duration, and pages.

Q5: Does the flagship question really require an HTTP POST? A: Yes. You GET the raw events from a given URL and POST your transformed result to another URL. Plan for retries and timeouts — flaky networks can eat your last 10 minutes.
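A minimal retry-with-backoff wrapper is enough for this setting. This is a sketch; `with_retries` is a hypothetical helper, and retrying on `OSError` is a choice, not a requirement. It covers socket timeouts and urllib's `URLError`, and requests' `RequestException` also subclasses it.

```python
import time


def with_retries(call, attempts=3, base_delay=0.5, retry_on=(OSError,)):
    """Run call() up to `attempts` times with exponential backoff between failures."""
    for i in range(attempts):
        try:
            return call()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** i)  # 0.5 s, 1 s, 2 s, ...
```

For example, `with_retries(lambda: requests.get(get_url, timeout=10))`. Retrying a 4xx schema rejection will not help, so pass a narrower `retry_on` if your HTTP client raises on those.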

Q6: What are the top 3 bugs on this question? A: ① Forgetting to sort per-visitor events; ② writing the 10-minute threshold as 600 instead of 600,000; ③ missing the outer sessionsByUser key on POST. Self-test these three before submitting.

Q7: What comes after passing the HubSpot OA? A: Typically 1 Recruiter Call → 2 – 3 technical interviews (one usually does a deep follow-up on your OA code) → 1 – 2 Hiring Manager / Team Match rounds. End-to-end timeline is 3 – 5 weeks.

Prepping for HubSpot OA / VO?

HubSpot's OA looks deceptively simple, but 90 minutes is short for a pipeline that touches data fetch → parse → business rules → schema → submit. If you'd rather not blow time on environment quirks and POST schema typos, we offer HubSpot-specific OA assist / OA proxy support: from setup walkthrough to per-problem co-piloting, designed to compress uncertainty.

We also run a complete VO assist / VO proxy track for the downstream interviews: candidate-and-question matchups, mock interviews, behavioral story templates — covering SDE / New Grad / Intern levels.

Add WeChat Coding0201 to grab the real-question set and a personalized prep plan.

Contact

Telegram: @OAVOProxy WeChat: Coding0201