HubSpot SDE OA Real Question Deep Dive | Per-Visitor Session Grouping + Full Python Solution

2026-05-28

HubSpot's SDE / New Grad OA has a remarkably stable shape across the current hiring cycle — it doesn't test pure LeetCode tricks; it tests "half of a real engineering task": you're given a stream of page-visit events, and you have to slice it into per-visitor sessions, format the output to a strict JSON schema, then POST the result back to HubSpot's grader.

In other words, being good at LeetCode mediums is not enough — you also have to ship a tiny "data post-processing + API interaction" pipeline in 90 minutes.

This article walks through the canonical "Sessions" problem end to end: statement, approach, full Python solution, edge-case checklist, submission flow, and 7 Google-style FAQs. If HubSpot's OA is on your calendar, mastering this one question moves the needle on your pass rate more than any other prep item.


HubSpot OA Platform & Format

Dimension           | Detail
Platform            | HubSpot's in-house OA (some roles use HackerRank)
Total time          | 90 - 120 minutes
Number of questions | 2 - 3 (1 flagship + 1-2 supporting)
Difficulty          | LeetCode Medium with a practical bent, few hard tricks
Languages           | Python / Java / JavaScript / C++
Submission          | Flagship question: HTTP GET to pull data, HTTP POST to submit
Pass bar            | Usually ≥ 2 problems, with the flagship fully working

Key insight: HubSpot's OA isn't trying to find LeetCode champions. It wants to see whether you can take a full pipeline — read data → transform → output a valid JSON schema → POST it back — and ship it under time pressure.


The Question: Slice Events into Per-Visitor Sessions

Problem statement

HubSpot gives you a JSON event stream where each event has url, visitorId, and timestamp (Unix ms):

{
  "events": [
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754583000},
    {"url": "/pages/a-small-dog", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754631000},
    {"url": "/pages/a-big-talk",  "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709065294},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512711000000},
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754436000},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709024000}
  ]
}

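Task: For each visitorId, sort events by timestamp, then group events whose consecutive gap is ≤ 10 minutes (600,000 ms) into a single session. A gap larger than 10 minutes starts a new session. Each session must report:

  - startTime: timestamp of the session's first event (Unix ms)
  - duration: last event timestamp minus first event timestamp, in ms (0 for a single-event session)
  - pages: the URLs visited in that session, in chronological order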

Expected output:

{
  "sessionsByUser": {
    "f877b96c-9969-4abc-bbe2-54b17d030f8b": [
      {
        "duration": 41294,
        "pages": ["/pages/a-sad-story", "/pages/a-big-talk"],
        "startTime": 1512709024000
      },
      {
        "duration": 0,
        "pages": ["/pages/a-sad-story"],
        "startTime": 1512711000000
      }
    ],
    "d1177368-2310-11e8-9e2a-9b860a0d9039": [
      {
        "duration": 195000,
        "pages": ["/pages/a-big-river", "/pages/a-big-river", "/pages/a-small-dog"],
        "startTime": 1512754436000
      }
    ]
  }
}

Finally, POST this object back to the HubSpot-provided endpoint.
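
To see where the numbers in the expected output come from, it can help to work the gap arithmetic by hand for one visitor. The snippet below is only an illustrative sketch: it takes the three timestamps of visitor f877b96c-... from the sample input and classifies each consecutive gap against the 10-minute threshold.

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms

# Timestamps for visitor f877b96c-... copied from the sample input, in arrival order.
timestamps = sorted([1512709065294, 1512711000000, 1512709024000])

# Gap between each pair of consecutive events, in milliseconds.
gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

for gap in gaps:
    verdict = "same session" if gap <= SESSION_GAP_MS else "new session"
    print(f"{gap} ms -> {verdict}")

# 41294 ms -> same session
# 1934706 ms -> new session
```

The first gap (41,294 ms) is the duration of the first session; the second gap (about 32 minutes) is what splits off the single-event session with duration 0.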


Three-step approach

There's no clever algorithm here — what wins is decomposing the engineering task crisply:

  1. Group by visitorId using a defaultdict(list).
  2. Sort each visitor's events by timestamp ascending, then linear-scan and start a new session whenever the gap exceeds 10 minutes.
  3. Wrap into the target JSON and POST it — watch the field names, the millisecond unit, and the fact that single-event sessions still carry duration: 0.

Python full solution

import json
import requests
from collections import defaultdict

SESSION_GAP_MS = 10 * 60 * 1000  # 10 minutes = 600,000 ms

def build_sessions(events):
    """Convert the event stream into HubSpot's sessionsByUser structure."""
    # Step 1: group by visitorId
    visitor_events = defaultdict(list)
    for ev in events:
        visitor_events[ev["visitorId"]].append(ev)

    sessions_by_user = {}

    # Step 2: per-visitor sort + session cut
    for visitor_id, evs in visitor_events.items():
        evs.sort(key=lambda x: x["timestamp"])

        sessions = []
        current = None

        for ev in evs:
            ts = ev["timestamp"]
            url = ev["url"]

            if current is None:
                current = {"startTime": ts, "_lastTime": ts, "pages": [url]}
            else:
                if ts - current["_lastTime"] <= SESSION_GAP_MS:
                    current["pages"].append(url)
                    current["_lastTime"] = ts
                else:
                    sessions.append(_finalize(current))
                    current = {"startTime": ts, "_lastTime": ts, "pages": [url]}

        if current is not None:
            sessions.append(_finalize(current))

        sessions_by_user[visitor_id] = sessions

    return {"sessionsByUser": sessions_by_user}


def _finalize(session):
    """Strip internal fields and emit the HubSpot-facing shape."""
    return {
        "startTime": session["startTime"],
        "duration": session["_lastTime"] - session["startTime"],
        "pages": session["pages"],
    }


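def run(get_url, post_url):
    """Fetch the events, build the payload, and submit it."""
    # Timeouts keep a hung connection from eating the clock.
    resp = requests.get(get_url, timeout=10)
    resp.raise_for_status()
    payload = build_sessions(resp.json()["events"])
    resp = requests.post(post_url, json=payload, timeout=10)
    # Print the body even on failure: the grader's message may name the bad field.
    print("POST status:", resp.status_code, resp.text[:200])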

Time complexity: O(N log N), N = total events (sort dominates). Space complexity: O(N).


5 edge cases you must handle

The majority of failed submissions aren't algorithmic — they're tiny details like these:

# | Edge case                       | Correct handling
1 | Gap exactly 10 minutes          | Spec says "no more than 10 minutes apart", so a gap equal to 10 min stays in the same session
2 | Single-event session            | duration = 0, pages is still a length-1 list; don't drop it
3 | Raw events arrive out of order  | Always sort each visitor's events by timestamp first
4 | Multiple sessions per visitor   | Output is a list, ordered by startTime ascending
5 | duration unit                   | Milliseconds, not seconds; don't divide by 1000
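
The edge cases above are cheap to pin down with a few assertions before touching the live dataset. The sketch below uses a hypothetical helper, split_by_gap, that works on bare timestamps rather than full events; it is not part of the solution above, just a minimal stand-in for the same cut rule.

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms

def split_by_gap(timestamps, gap_ms=SESSION_GAP_MS):
    """Hypothetical helper: return (startTime, duration) per session, in ms."""
    sessions = []
    for ts in sorted(timestamps):  # case 3: input may arrive out of order
        # case 1: "<=" keeps a gap of exactly 10 minutes in the same session
        if sessions and ts - sessions[-1][-1] <= gap_ms:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    # case 5: duration stays in milliseconds; case 2: single event gives 0
    return [(s[0], s[-1] - s[0]) for s in sessions]

# Case 1: a gap of exactly 10 minutes is still one session.
assert split_by_gap([0, 600_000]) == [(0, 600_000)]
# One millisecond more produces two single-event sessions (case 2).
assert split_by_gap([0, 600_001]) == [(0, 0), (600_001, 0)]
# Cases 3 and 4: unordered input, sessions come out ordered by startTime.
assert split_by_gap([2_000_000, 0, 100_000]) == [(0, 100_000), (2_000_000, 0)]
```

Because the input is sorted before the scan, sessions are produced in startTime order without a second sort.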

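Common bugs

  - Forgetting to sort each visitor's events before scanning
  - Writing the 10-minute threshold as 600 (seconds) instead of 600,000 (ms)
  - Omitting the outer sessionsByUser key from the POST body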


HubSpot OA Prep Strategy (90-minute speedrun)

Time slice  | What to do
0 – 10 min  | Read carefully + write the exact field names (visitorId, startTime, duration, pages) into a scratch cell so you never typo them later
10 – 25 min | Wire up the GET, print the first event, and confirm the schema matches your assumptions
25 – 65 min | Implement build_sessions; run it on the sample first before pointing at the live dataset
65 – 80 min | POST + read the response message; many graders tell you exactly which field is wrong
80 – 90 min | Patch edge cases (=10 min boundary, single-event session, duration units)
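
One way to save a round trip in the POST step is to check the payload shape locally first. The following is a minimal sketch, assuming the output schema shown in the expected output; find_schema_problems and REQUIRED_KEYS are hypothetical names, not something the grader provides.

```python
REQUIRED_KEYS = {"startTime", "duration", "pages"}

def find_schema_problems(payload):
    """Hypothetical pre-submit check; returns a list of readable problems."""
    if set(payload) != {"sessionsByUser"}:
        return ["top-level key must be exactly 'sessionsByUser'"]

    problems = []
    for visitor_id, sessions in payload["sessionsByUser"].items():
        for i, session in enumerate(sessions):
            where = f"{visitor_id}[{i}]"
            if set(session) != REQUIRED_KEYS:
                # Catches typos and leaked internal fields such as "_lastTime".
                problems.append(f"{where}: unexpected keys {sorted(session)}")
                continue
            if not session["pages"]:
                problems.append(f"{where}: pages is empty")
            if session["duration"] < 0:
                problems.append(f"{where}: negative duration")
        starts = [s["startTime"] for s in sessions if "startTime" in s]
        if starts != sorted(starts):
            problems.append(f"{visitor_id}: sessions not ordered by startTime")
    return problems

# Example: a lower-cased field name is reported before it ever reaches the grader.
payload = {"sessionsByUser": {"v1": [{"starttime": 0, "duration": 0, "pages": ["/a"]}]}}
print(find_schema_problems(payload))
```

An empty list means the payload at least has the right shape; it says nothing about whether the session boundaries themselves are correct.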

LeetCode warmup that transfers well

#    | Problem               | Why it helps
1834 | Single-Threaded CPU   | Sort by time + priority queue
1429 | First Unique Number   | Maintain stats in a dict
911  | Online Election       | Aggregate by time window
1244 | Design A Leaderboard  | dict + sorted-output pattern
359  | Logger Rate Limiter   | Minimal "time-gap threshold" template
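
As an illustration of the "time-gap threshold" template in the last row, here is one possible sketch of the Logger Rate Limiter idea: a message may print only if the same message has not printed in the previous 10 seconds. The method name is adapted to Python style and the next_allowed dict is just one way to store the state.

```python
class Logger:
    """Sketch of a per-message rate limiter with a fixed time window."""

    def __init__(self, window=10):
        self.window = window
        self.next_allowed = {}  # message -> earliest timestamp it may print again

    def should_print_message(self, timestamp, message):
        if timestamp < self.next_allowed.get(message, 0):
            return False
        self.next_allowed[message] = timestamp + self.window
        return True

logger = Logger()
print(logger.should_print_message(1, "foo"))   # True
print(logger.should_print_message(3, "foo"))   # False, only 2 s after the last print
print(logger.should_print_message(11, "foo"))  # True, 10 s have passed
```

The difference from the Sessions problem is what the gap is measured against: the rate limiter compares with the last accepted event, while a session compares with the previous event of any kind, so a session can keep extending as long as events keep arriving.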

FAQ

Q1: What types of questions appear on the HubSpot SDE New Grad OA? A: Engineering-flavored problems: take a stream of events / logs / orders, apply business rules, output a strict JSON. Algorithmically, it's mostly hashmaps, sorting, and string handling — rarely hard DP or graphs.

Q2: How long is the HubSpot OA and how many problems are there? A: Usually 90 – 120 minutes with 2 – 3 problems. You typically need ≥ 2 working solutions to advance, with the flagship question (e.g., the Sessions problem above) weighted heaviest.

Q3: Do I need a specific language? A: Python / Java / JavaScript / C++ are all allowed. Python's defaultdict, json, and requests make it the smoothest choice for a 90-minute timed assessment.

Q4: How exactly is a "session" defined? A: Per visitorId, events sorted by timestamp ascending; consecutive events whose gap is ≤ 10 minutes (inclusive) belong to one session, larger gaps start a new session. Each session must include startTime, duration, and pages.

Q5: Does the flagship question really require an HTTP POST? A: Yes. You GET the raw events from a given URL and POST your transformed result to another URL. Plan for retries and timeouts — flaky networks can eat your last 10 minutes.
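
A small retry wrapper is one way to plan for a flaky network. The sketch below is an assumption-laden example, not part of the OA: with_retries is a hypothetical helper that takes any zero-argument callable, so it can wrap a requests call without depending on the library itself.

```python
import time

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical helper: run call() until it succeeds or attempts run out."""
    last_error = None
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # in real use, narrow this to network errors
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # back off: 1 s, 2 s, 4 s, ...
    raise last_error

# Possible usage with requests (post_url and payload are placeholders):
#   resp = with_retries(lambda: requests.post(post_url, json=payload, timeout=10))
```

Note that requests only raises on connection problems and timeouts unless raise_for_status() is called, so a 400 from the grader comes back as a normal response. That is usually what you want here: a schema error needs a code fix, not a retry.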

Q6: What are the top 3 bugs on this question? A: ① Forgetting to sort per-visitor events; ② writing the 10-minute threshold as 600 instead of 600,000; ③ missing the outer sessionsByUser key on POST. Self-test these three before submitting.

Q7: What comes after passing the HubSpot OA? A: Typically 1 Recruiter Call → 2 – 3 technical interviews (one usually does a deep follow-up on your OA code) → 1 – 2 Hiring Manager / Team Match rounds. End-to-end timeline is 3 – 5 weeks.



