HubSpot's SDE / New Grad OA has kept a remarkably stable shape through the current hiring cycle. Rather than pure LeetCode tricks, it tests "half of a real engineering task": you receive a stream of page-visit events, slice it into per-visitor sessions, format the output to a strict JSON schema, then POST the result back to HubSpot's grader.
In other words, being good at LeetCode mediums is not enough — you also have to ship a tiny "data post-processing + API interaction" pipeline in 90 minutes.
This article walks through the canonical "Sessions" problem end to end: statement, approach, full Python solution, edge-case checklist, submission flow, and 7 FAQs. If HubSpot's OA is on your calendar, this is the single question most worth mastering.
HubSpot OA Platform & Format
| Dimension | Detail |
|---|---|
| Platform | HubSpot's in-house OA (some roles use HackerRank) |
| Total time | 90 - 120 minutes |
| Number of questions | 2 - 3 (1 flagship + 1-2 supporting) |
| Difficulty | LeetCode Medium with a practical bent, few hard tricks |
| Languages | Python / Java / JavaScript / C++ |
| Submission | Flagship question: HTTP GET to pull data, HTTP POST to submit |
| Pass bar | Usually ≥ 2 problems, with the flagship fully working |
Key insight: HubSpot's OA isn't trying to find LeetCode champions. It wants to see whether you can take a full pipeline — read data → transform → output a valid JSON schema → POST it back — and ship it under time pressure.
The Question: Slice Events into Per-Visitor Sessions
Problem statement
HubSpot gives you a JSON event stream where each event has url, visitorId, and timestamp (Unix ms):
```json
{
  "events": [
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754583000},
    {"url": "/pages/a-small-dog", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754631000},
    {"url": "/pages/a-big-talk", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709065294},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512711000000},
    {"url": "/pages/a-big-river", "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039", "timestamp": 1512754436000},
    {"url": "/pages/a-sad-story", "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b", "timestamp": 1512709024000}
  ]
}
```
Task: For each visitorId, sort events by timestamp, then group events whose consecutive gap is ≤ 10 minutes (600,000 ms) into a single session. A gap larger than 10 minutes starts a new session. Each session must report:
- startTime: timestamp of the first event in the session
- duration: last event timestamp − first event timestamp (in ms)
- pages: chronological list of URLs in this session
Expected output:
```json
{
  "sessionsByUser": {
    "f877b96c-9969-4abc-bbe2-54b17d030f8b": [
      {
        "duration": 41294,
        "pages": ["/pages/a-sad-story", "/pages/a-big-talk"],
        "startTime": 1512709024000
      },
      {
        "duration": 0,
        "pages": ["/pages/a-sad-story"],
        "startTime": 1512711000000
      }
    ],
    "d1177368-2310-11e8-9e2a-9b860a0d9039": [
      {
        "duration": 195000,
        "pages": ["/pages/a-big-river", "/pages/a-big-river", "/pages/a-small-dog"],
        "startTime": 1512754436000
      }
    ]
  }
}
```
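To see where the expected numbers come from, here is the gap arithmetic as a small Python check. The timestamps are copied from the sample above and sorted per visitor by hand; `consecutive_gaps` is a hypothetical helper used only for this illustration.

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms

# Sample timestamps, grouped by visitor and sorted ascending by hand.
SAMPLE = {
    "f877b96c-9969-4abc-bbe2-54b17d030f8b": [1512709024000, 1512709065294, 1512711000000],
    "d1177368-2310-11e8-9e2a-9b860a0d9039": [1512754436000, 1512754583000, 1512754631000],
}


def consecutive_gaps(timestamps):
    """Return (gap_ms, starts_new_session) for each adjacent pair of sorted timestamps."""
    return [(b - a, b - a > SESSION_GAP_MS) for a, b in zip(timestamps, timestamps[1:])]


for visitor, ts in SAMPLE.items():
    print(visitor[:8], consecutive_gaps(ts))
# f877b96c [(41294, False), (1934706, True)]
# d1177368 [(147000, False), (48000, False)]
```

So visitor f877b96c splits into two sessions (durations 41294 and 0), while d1177368 stays in one session lasting 1512754631000 − 1512754436000 = 195000 ms.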
Finally, POST this object back to the HubSpot-provided endpoint.
Three-step approach
There's no clever algorithm here — what wins is decomposing the engineering task crisply:
- Group by visitorId using a defaultdict(list).
- Sort each visitor's events by timestamp ascending, then linear-scan and start a new session whenever the gap exceeds 10 minutes.
- Wrap into the target JSON and POST it. Watch the field names, the millisecond unit, and the fact that single-event sessions still carry duration: 0.
Python full solution
```python
import requests
from collections import defaultdict

SESSION_GAP_MS = 10 * 60 * 1000  # 10 minutes = 600,000 ms


def build_sessions(events):
    """Convert the event stream into HubSpot's sessionsByUser structure."""
    # Step 1: group by visitorId
    visitor_events = defaultdict(list)
    for ev in events:
        visitor_events[ev["visitorId"]].append(ev)

    sessions_by_user = {}
    # Step 2: per-visitor sort + session cut
    for visitor_id, evs in visitor_events.items():
        evs.sort(key=lambda x: x["timestamp"])
        sessions = []
        current = None
        for ev in evs:
            ts = ev["timestamp"]
            url = ev["url"]
            if current is None:
                current = {"startTime": ts, "_lastTime": ts, "pages": [url]}
            elif ts - current["_lastTime"] <= SESSION_GAP_MS:
                # Gap <= 10 min (inclusive): same session
                current["pages"].append(url)
                current["_lastTime"] = ts
            else:
                # Gap > 10 min: close the current session and open a new one
                sessions.append(_finalize(current))
                current = {"startTime": ts, "_lastTime": ts, "pages": [url]}
        if current is not None:
            sessions.append(_finalize(current))
        sessions_by_user[visitor_id] = sessions

    # Step 3: wrap in the required outer key
    return {"sessionsByUser": sessions_by_user}


def _finalize(session):
    """Strip internal fields and emit the HubSpot-facing shape."""
    return {
        "startTime": session["startTime"],
        "duration": session["_lastTime"] - session["startTime"],  # milliseconds
        "pages": session["pages"],
    }


def run(get_url, post_url):
    """GET the events, build sessions, POST the result (URLs come from the OA prompt)."""
    resp = requests.get(get_url, timeout=10)
    resp.raise_for_status()
    payload = build_sessions(resp.json()["events"])
    # json= serializes the dict as JSON (numbers stay numbers) and sets
    # the Content-Type: application/json header
    resp = requests.post(post_url, json=payload, timeout=10)
    # Don't raise here: the grader's response body often says which field is wrong
    print("POST status:", resp.status_code, resp.text[:200])
```
Time complexity: O(N log N), N = total events (sort dominates). Space complexity: O(N).
5 edge cases you must handle
Failed submissions on this question are rarely algorithmic; they usually come down to small details like these:
| # | Edge case | Correct handling |
|---|---|---|
| 1 | Gap exactly 10 minutes | Spec says "no more than 10 minutes apart" — equal to 10 min stays in the same session |
| 2 | Single-event session | duration = 0, pages is still a length-1 list — don't drop it |
| 3 | Raw events arrive out of order | Always sort each visitor's events by timestamp first |
| 4 | Multiple sessions per visitor | Output is a list, ordered by startTime ascending |
| 5 | duration unit | Milliseconds, not seconds: don't divide by 1000 |
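The five edge cases translate directly into a handful of asserts you can run before touching the live dataset. The sketch below uses a compact, hypothetical `sessionize` helper that takes one visitor's `(timestamp, url)` tuples instead of raw event dicts; it applies the same rules as the full solution above, but it is an illustration, not HubSpot's grader logic.

```python
SESSION_GAP_MS = 10 * 60 * 1000  # 600,000 ms


def sessionize(events):
    """Split one visitor's (timestamp_ms, url) tuples into session dicts."""
    sessions = []
    # Edge case 3: never trust arrival order; sort by timestamp first.
    for ts, url in sorted(events, key=lambda e: e[0]):
        if sessions:
            last = sessions[-1]
            last_ts = last["startTime"] + last["duration"]  # timestamp of its latest event
            if ts - last_ts <= SESSION_GAP_MS:  # edge case 1: exactly 10 min still joins
                last["pages"].append(url)
                last["duration"] = ts - last["startTime"]
                continue
        # Edge case 2: a new session starts with duration 0 and a one-element pages list.
        sessions.append({"startTime": ts, "duration": 0, "pages": [url]})
    return sessions


# Edge case 1: a gap of exactly 600,000 ms stays in one session; 1 ms more splits it.
assert len(sessionize([(0, "/a"), (600_000, "/b")])) == 1
assert len(sessionize([(0, "/a"), (600_001, "/b")])) == 2
# Edge case 2: a lone event is still reported, with duration 0.
assert sessionize([(5, "/a")]) == [{"startTime": 5, "duration": 0, "pages": ["/a"]}]
# Edge cases 3 and 4: out-of-order input; sessions come back ordered by startTime.
out = sessionize([(2_000_000, "/c"), (0, "/a"), (1_000, "/b")])
assert [s["startTime"] for s in out] == [0, 2_000_000]
assert out[0]["pages"] == ["/a", "/b"]
# Edge case 5: duration stays in milliseconds (1,000 ms here, not 1 s).
assert out[0]["duration"] == 1_000
```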
Common bugs
- ❌ Iterating events as a flat list and trying to cut sessions globally → events for the same visitorId may be scattered anywhere in the stream. Group first.
- ❌ Assuming events arrive (and therefore get grouped) in timestamp order → always sort explicitly.
- ❌ Missing the outer sessionsByUser key on POST → schema validation fails.
- ❌ Hand-concatenating JSON instead of using json.dumps → numeric timestamps easily end up quoted as strings, and schema validation fails.
- ❌ Writing 10 * 60 instead of 10 * 60 * 1000 → the threshold is 1000× too small, so nearly every event becomes its own session.
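Several of these bugs can be caught locally before you spend a submission on them. Below is a hypothetical `validate_payload` helper that encodes only the schema described in this article (outer `sessionsByUser` key, integer millisecond fields, non-empty `pages`, sessions ordered by `startTime`); the real grader may check more, or check differently.

```python
import json


def validate_payload(payload):
    """Return a list of problems; an empty list means the payload matches the assumed shape."""
    problems = []
    try:
        json.dumps(payload)  # raises TypeError on sets, bytes, custom objects, ...
    except (TypeError, ValueError) as exc:
        problems.append(f"payload is not JSON-serializable: {exc}")

    if not isinstance(payload, dict) or set(payload) != {"sessionsByUser"}:
        problems.append("top level must be exactly {'sessionsByUser': {...}}")
        return problems
    by_user = payload["sessionsByUser"]
    if not isinstance(by_user, dict):
        problems.append("sessionsByUser must map visitorId -> list of sessions")
        return problems

    for visitor, sessions in by_user.items():
        if not isinstance(sessions, list):
            problems.append(f"{visitor}: sessions must be a list")
            continue
        prev_start = None
        for i, s in enumerate(sessions):
            where = f"{visitor}[{i}]"
            if not isinstance(s, dict) or set(s) != {"startTime", "duration", "pages"}:
                problems.append(f"{where}: expected keys startTime, duration, pages")
                continue
            for key in ("startTime", "duration"):
                value = s[key]
                # bool is a subclass of int in Python, so rule it out explicitly
                if not isinstance(value, int) or isinstance(value, bool):
                    problems.append(f"{where}.{key} must be an int in ms, got {type(value).__name__}")
            if isinstance(s["duration"], int) and s["duration"] < 0:
                problems.append(f"{where}.duration is negative")
            if not isinstance(s["pages"], list) or not s["pages"]:
                problems.append(f"{where}.pages must be a non-empty list")
            start = s["startTime"]
            if isinstance(start, int) and isinstance(prev_start, int) and start < prev_start:
                problems.append(f"{where}: sessions are not ordered by startTime")
            prev_start = start
    return problems
```

Calling `print(validate_payload(payload))` right before the POST costs a second and turns a vague grader rejection into a specific message.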
HubSpot OA Prep Strategy (90-minute speedrun)
| Time slice | What to do |
|---|---|
| 0 – 10 min | Read carefully + write the exact field names (visitorId, startTime, duration, pages) into a scratch cell so you never typo them later |
| 10 – 25 min | Wire up the GET, print the first event — confirm schema matches your assumptions |
| 25 – 65 min | Implement build_sessions — run it on the sample first before pointing at the live dataset |
| 65 – 80 min | POST + read the response message; many graders tell you exactly which field is wrong |
| 80 – 90 min | Patch edge cases (=10 min boundary, single-event session, duration units) |
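For the "wire up the GET, confirm the schema" step, a quick sanity check can save a lot of debugging later. `inspect_events` is a hypothetical helper; it assumes the response has the same `events` / `url` / `visitorId` / `timestamp` shape as the sample, and its `looks_like_ms` flag is a heuristic for catching seconds-vs-milliseconds surprises early.

```python
def inspect_events(raw):
    """Sanity-check the GET response and return a short summary dict.

    Raises ValueError if an event lacks a field or has an unexpected type.
    """
    events = raw["events"]  # assumption: top-level key is "events", as in the sample
    expected = {"url": str, "visitorId": str, "timestamp": int}
    for i, ev in enumerate(events):
        for key, typ in expected.items():
            if not isinstance(ev.get(key), typ):
                raise ValueError(f"event {i}: {key!r} missing or not {typ.__name__}: {ev!r}")
    return {
        "count": len(events),
        "visitors": len({ev["visitorId"] for ev in events}),
        "first_event": events[0] if events else None,
        # Unix-ms timestamps for any date after early 1973 exceed 1e11; Unix-second
        # timestamps won't reach 1e11 for thousands of years.
        "looks_like_ms": all(ev["timestamp"] > 10**11 for ev in events),
    }
```

In the OA you would feed it the parsed GET body, e.g. `inspect_events(requests.get(get_url, timeout=10).json())`, and print the summary before writing any session logic.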
LeetCode warmup that transfers well
| # | Problem | Why it helps |
|---|---|---|
| 1834 | Single-Threaded CPU | Sort by time + priority queue |
| 1429 | First Unique Number | Maintain stats in a dict |
| 911 | Online Election | Precompute the leader at each vote time, then binary-search by query time |
| 1244 | Design A Leaderboard | dict + sorted-output pattern |
| 359 | Logger Rate Limiter | Minimal "time-gap threshold" template |
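LeetCode 359 boils down to the same "compare against a stored timestamp" move as session cutting. A minimal Python sketch, using a snake_case method name rather than LeetCode's `shouldPrintMessage` and a configurable window (LeetCode fixes it at 10 seconds):

```python
from typing import Dict


class Logger:
    """Allow each message at most once per `window` time units."""

    def __init__(self, window: int = 10):
        self.window = window
        self.next_allowed: Dict[str, int] = {}  # message -> earliest timestamp it may print again

    def should_print_message(self, timestamp: int, message: str) -> bool:
        allowed_at = self.next_allowed.get(message)
        if allowed_at is not None and timestamp < allowed_at:
            return False  # still inside the window opened by the last accepted print
        self.next_allowed[message] = timestamp + self.window
        return True


logger = Logger()
print([logger.should_print_message(t, m) for t, m in
       [(1, "foo"), (2, "bar"), (3, "foo"), (8, "bar"), (10, "foo"), (11, "foo")]])
# [True, True, False, False, False, True]
```

Note the subtle difference from the Sessions problem: the rate limiter measures from the last *accepted* message, whereas session cutting measures from the previous event of any kind and treats an exact 10-minute gap as inclusive.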
FAQ
Q1: What types of questions appear on the HubSpot SDE New Grad OA?
A: Engineering-flavored problems: take a stream of events / logs / orders, apply business rules, and output strict JSON. Algorithmically, it's mostly hashmaps, sorting, and string handling; hard DP or graph problems are rare.
Q2: How long is the HubSpot OA and how many problems are there?
A: Usually 90 – 120 minutes with 2 – 3 problems. You typically need ≥ 2 working solutions to advance, with the flagship question (e.g., the Sessions problem above) weighted heaviest.
Q3: Do I need a specific language?
A: Python / Java / JavaScript / C++ are all allowed. Python's defaultdict, json, and requests make it the smoothest choice for a 90-minute timed assessment.
Q4: How exactly is a "session" defined?
A: Per visitorId, events sorted by timestamp ascending; consecutive events whose gap is ≤ 10 minutes (inclusive) belong to one session, larger gaps start a new session. Each session must include startTime, duration, and pages.
Q5: Does the flagship question really require an HTTP POST?
A: Yes. You GET the raw events from a given URL and POST your transformed result to another URL. Plan for retries and timeouts: a flaky network can eat your last 10 minutes.
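One way to plan for flaky networks is a small retry wrapper with exponential backoff. `call_with_retries` is a hypothetical helper, not part of `requests`; it uses only the standard library so it runs anywhere, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(); on a listed exception, sleep and retry with exponential backoff.

    Re-raises the last exception once all attempts are used up; exceptions not
    listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")
```

With `requests`, you could wrap a small function that calls `requests.post(post_url, json=payload, timeout=10)` followed by `resp.raise_for_status()`, and pass `retry_on=(requests.RequestException,)`; `raise_for_status` raises `requests.HTTPError`, a subclass of `RequestException`. Retrying a 400 won't fix a schema error, though, so you may prefer to retry only timeouts and 5xx responses.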
Q6: What are the top 3 bugs on this question?
A: ① Forgetting to sort per-visitor events; ② writing the 10-minute threshold as 600 instead of 600,000; ③ missing the outer sessionsByUser key on POST. Self-test these three before submitting.
Q7: What comes after passing the HubSpot OA?
A: Typically 1 Recruiter Call → 2 – 3 technical interviews (one usually does a deep follow-up on your OA code) → 1 – 2 Hiring Manager / Team Match rounds. End-to-end timeline is 3 – 5 weeks.