
Indigo Fair Data Engineer interview typically runs 5 rounds: Python/Technical Execution, SQL Execution, Data Modeling, Product Sense, Core Values/Behavioral. It took about one onsite loop and was notably algorithm-heavy and fast-paced.
$195K
Avg. Base Comp
$360K
Avg. Total Comp
6
Typical Rounds
3-5 weeks
Process Length
Our candidates report a very clear pattern at Indigo Fair: the role may read like data engineering or analytics engineering, but the bar is split between warehouse thinking and a surprisingly unforgiving algorithmic screen. The non-obvious make-or-break is that Python execution can function as a hard gate, even when the work itself is much more SQL- and modeling-heavy. That mismatch caught the candidate off guard, and it’s the kind of signal we’ve seen before at companies that want engineers who can move comfortably between product data and pure technical rigor.
What Indigo Fair seems to care about most is whether you can turn messy event streams into a clean analytical backbone. The strongest signal in the experience was the modeling discussion: defining grain, separating facts, and reasoning through incremental layers and session logic. We also see a strong emphasis on precision under time pressure in SQL, especially around window functions, joins, and edge cases. The product conversation appears less about polished opinions and more about whether you can translate ambiguous stakeholder asks into metrics that won’t collapse under changing definitions.
A recurring theme is that the company values practical architecture, but it expects you to defend your choices with rigor. The candidate’s notes on utilization, rush-hour boundaries, and overlapping trip types suggest that Indigo Fair is probing for people who think ahead about how the model will break as the business evolves. In other words, they are not just checking whether you can build a fact table; they want to know whether you can anticipate the next feature, the next metric dispute, and the next data edge case before it becomes a production problem.
Synthetized from 1 candidates reports by our editorial team.
Had an interview recently?
Share your experience. Unlock the full guide.
Real interview reports from people who went through the Indigo Fair process.
The Onsite Loop (5 Rounds):
Python / Technical Execution (60 minutes) This was a massive bait-and-switch. I was told beforehand that this round was potentially being deprecated and wasn't a core focus. Instead, it was a brutal, full-hour live-coding LeetCode session, all LeetCode hard in python. It was entirely algorithmic and totally misaligned with the day-to-day realities of an Analytics Engineer. If you are not actively grinding LeetCode Hard/Medium sliding window or binary tree problems, you will fail this round. It acts as a hard pass/fail floor for the entire loop, regardless of your other scores.
SQL Execution (60 minutes) A standard, live-coding SQL round. It was highly rigorous and tested speed as much as accuracy. You need to be incredibly fast with window functions, complex joins, and edge-case handling under a ticking clock.
Data Modeling (60 minutes) This was the most relevant round. It was a deep dive into warehouse architecture, requiring you to lay out facts and dimensions, define the grain, and explain incremental materialization strategies (STG/INT layers). They are looking for thoughtful architects who understand how data flows and how to keep a warehouse performant.
Product Sense (30 minutes) A conversational round focused on business alignment. They tested how well I could translate messy stakeholder requests into clean metrics and bridge the gap between business needs and data realities.
Core Values / Behavioral (30 minutes) Standard behavioral round indexing on Faire's specific cultural pillars.
Final Takeaway for Future Candidates: Do not let the "Analytics Engineer" title fool you. Even if you crush the Product and Data Modeling rounds (which I did), the engineering purists on the panel will absolutely fail you if you do not pass the 60-minute algorithmic Python round. Prepare for this as if you are interviewing for a pure Software Engineering backend role, or don't take the loop at all.
The problem is my recruiter explicitly told me two HMs were "fighting over me," and I know it wasn't a hard "no." Ultimately, it was a massive waste of my time. Frankly, it's time that we all start boycotting tech interviews.
Questions asked: Python: this was the leetcode problem i got: https://leetcode.com/problems/sliding-window-maximum/
SQL (i'll just give it to you cause I copied that shit!):
CREATE TABLE app_events ( event_uuid VARCHAR, -- unique event ID event_at TIMESTAMP, -- when it happened rider_id BIGINT, -- NULL on driver_online/offline driver_id BIGINT, -- present on all events trip_id BIGINT, -- NULL on driver_online/offline event_type VARCHAR, -- 5 types below event_metadata JSON -- payload varies by event_type );
**Event types and what they carry:**
| event_type | rider_id | driver_id | trip_id | event_metadata (confirmed) | event_metadata (likely) |
|---|---|---|---|---|---|
| driver_online | NULL | yes | NULL | unknown | `{"lat": X, "lng": Y}` |
| driver_offline | NULL | yes | NULL | unknown | `{"lat": X, "lng": Y}` |
| trip_fare_accept | yes | yes | yes | `{"amount": X}` | `{"amount": X, "surge": 1.5, "est_minutes": 12}` |
| trip_start | yes | yes | yes | unknown | `{"pickup_lat": X, "pickup_lng": Y}` |
| trip_end | yes | yes | yes | unknown | `{"dropoff_lat": X, "dropoff_lng": Y, "distance_miles": 4.2, "rider_rating": 5, "driver_rating": 4}` |
**Constraints:**
- Event stream is "perfect" — no missing events, no duplicates
- A driver can go online/offline multiple times per day
- Multiple trips within a single online session
- No separate dimension tables exist in the source
- event_metadata contains 'amount' (fare_amount) only when event_type = 'trip_fare_accept' (i.e., event_metadata::JSON->>'amount'::NUMERIC)
and here were the questions and my answers:
/*
1. Determine the average revenue of all completed trips that took 20 minutes or more.
*/
WITH trips AS (
SELECT
trip_id,
MAX(event_at) FILTER (WHERE event_type = 'trip_start') AS trip_start_ts,
MAX(event_at) FILTER (WHERE event_type = 'trip_end') AS trip_end_ts,
MAX((event_metadata::JSON->>'amount')::NUMERIC) FILTER (WHERE event_type = 'trip_fare_accept') AS fare_amount
FROM dbt_faire_rideshare.app_events
WHERE
event_type IN ('trip_start', 'trip_end', 'trip_fare_accept')
GROUP BY
1
),
cte_diff AS (
SELECT
trip_id,
(
EXTRACT(EPOCH FROM trip_end_ts) - EXTRACT(EPOCH FROM trip_start_ts)
) / 60 AS trip_length,
fare_amount
FROM trips
)
SELECT
ROUND(AVG(fare_amount), 2) AS avg_revenue
FROM cte_diff
WHERE
trip_length >= 20.0;
/*
2. Write a query to return the week that has the highest average
long-trip revenue (use the trip fare accept event to recognize revenue).
Break ties if needed by the earliest week.
-- */
WITH trips AS (
SELECT
trip_id,
MAX(event_at) FILTER (WHERE event_type = 'trip_start') AS trip_start_ts,
MAX(event_at) FILTER (WHERE event_type = 'trip_end') AS trip_end_ts,
MAX(event_at) FILTER (WHERE event_type = 'trip_fare_accept') AS fare_accept_ts,
MAX((event_metadata::JSON->>'amount')::NUMERIC) FILTER (WHERE event_type = 'trip_fare_accept') AS fare_amount
FROM dbt_faire_rideshare.app_events
WHERE
event_type IN ('trip_start', 'trip_end', 'trip_fare_accept')
GROUP BY
1
),
cte_diff AS (
SELECT
trip_id,
DATE_TRUNC('WEEK', fare_accept_ts)::DATE AS wk,
(
EXTRACT(EPOCH FROM trip_end_ts) - EXTRACT(EPOCH FROM trip_start_ts)
) / 60 AS trip_length,
fare_amount
FROM trips
),
cte_rnk AS (
SELECT
wk,
ROUND(AVG(fare_amount), 2) AS avg_revenue,
DENSE_RANK() OVER(
ORDER BY AVG(fare_amount) DESC NULLS LAST, wk ASC
) AS rnk
FROM cte_diff
WHERE
trip_length >= 20.0
GROUP BY
1
)
SELECT
wk,
avg_revenue,
rnk
FROM cte_rnk
WHERE
rnk = 1;
/*
3. Write a query to determine driver utilization for
each completed driver session.
*/
/*
i. define sessions: driver session defined: from driver_online to driver_offline
ii define trips within sessions; calculate the minutes between trip_start and end for each trip within session. aggregate that for each session. this is the numerator; the denominator is session length (driver_offline - driver_online)
*/
WITH int_ordered_events AS (
SELECT
driver_id,
event_at,
event_type,
-- Look ahead to the next event for this driver
LEAD(event_at) OVER(
PARTITION BY driver_id
ORDER BY event_at ASC
) AS next_event_at,
LEAD(event_type) OVER(
PARTITION BY driver_id
ORDER BY event_at ASC
) AS next_event_type
FROM app_events
WHERE
event_type IN ('driver_online', 'driver_offline')
),
driver_sessions AS (
SELECT
driver_id,
'S' || driver_id::VARCHAR || '-' || EXTRACT(EPOCH FROM event_at)::VARCHAR AS session_id,
event_at AS session_start_at,
next_event_at AS session_end_at,
(EXTRACT(EPOCH FROM next_event_at) - EXTRACT(EPOCH FROM event_at)) / 60.0 AS session_duration_minutes
FROM int_ordered_events
WHERE
event_type = 'driver_online'
-- Defensive check: ensure the next event is actually an offline event
AND next_event_type = 'driver_offline'
AND next_event_at IS NOT NULL
),
trips AS (
SELECT
trip_id,
driver_id,
MAX(event_at) FILTER (WHERE event_type = 'trip_start') AS trip_start_ts,
MAX(event_at) FILTER (WHERE event_type = 'trip_end') AS trip_end_ts,
-- Calculate the duration here so the final SELECT stays clean
(EXTRACT(EPOCH FROM MAX(event_at) FILTER (WHERE event_type = 'trip_end')) -
EXTRACT(EPOCH FROM MAX(event_at) FILTER (WHERE event_type = 'trip_start'))) / 60.0 AS trip_duration_minutes
FROM app_events
WHERE
event_type IN ('trip_start', 'trip_end', 'trip_fare_accept')
GROUP BY
1, 2
)
SELECT
s.session_id,
s.driver_id,
-- Utilization (Mathematically hard-capped at 100% because the denominator scales with the numerator)
ROUND(
(COALESCE(SUM(t.trip_duration_minutes), 0) * 100.0) /
NULLIF(
(EXTRACT(EPOCH FROM GREATEST(s.session_end_at, MAX(t.trip_end_ts)))
- EXTRACT(EPOCH FROM s.session_start_at)) / 60.0,
0
)::NUMERIC,
1
) AS utilization_pct
FROM driver_sessions s
LEFT JOIN trips t
ON s.driver_id = t.driver_id
AND t.trip_start_ts BETWEEN s.session_start_at AND s.session_end_at
GROUP BY
1, 2, s.session_start_at, s.session_end_at
ORDER BY
s.session_start_at;
---------------------
data modeling
---------------------
1. build the model -> write it down.
something like this:
┌──────────────────────────────────────────────────────────────┐
│ fact_trips │
│ Grain: ONE ROW PER trip_id │
│ Source: int_trips │
├──────────────────────────────────────────────────────────────┤
│ PK trip_id │
│ FK driver_id │
│ FK rider_id │
│ fare_accepted_at (timestamp) │
│ trip_started_at (timestamp) │
│ trip_ended_at (timestamp) │
│ fare_amount (numeric — from JSON) │
│ surge_multiplier (numeric — from JSON) │
│ pickup_lat / pickup_lng (numeric — from JSON) │
│ dropoff_lat / dropoff_lng (numeric — from JSON) │
│ wait_time_minutes (fare_accept → trip_start) │
│ trip_duration_minutes (trip_start → trip_end) │
│ trip_date (date — for partitioning) │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ fact_driver_sessions │
│ Grain: ONE ROW PER driver_id PER session │
│ Source: int_driver_sessions LEFT JOIN int_trips │
├──────────────────────────────────────────────────────────────┤
│ PK session_id (surrogate: dbt_utils hash of │
│ driver_id + session_start_at) │
│ FK driver_id │
│ session_start_at (timestamp) │
│ session_end_at (timestamp — GREATEST of raw │
│ offline and last trip_ended_at) │
│ session_end_reason (varchar — from offline metadata)│
│ session_minutes (total online time, stretched) │
│ trip_minutes (sum of trip durations) │
│ idle_minutes (session - trip) │
│ trips_completed (count) │
│ utilization_pct (trip / session, capped at 100%) │
│ session_date (date — for partitioning) │
└──────────────────────────────────────────────────────────────┘
you can explain the underlying logic from the sql used to build it. offer to build dims but the focus will be on the final fact tables
2. how would analysts analyze driver utilization during rush hour? edge cases: how do deal with sessions that start before (but end during) rush hour? -> answer: require that in order for this to be valid, each driver session needs to start and end in the time period you are analyzing / defining as 'rush hour'.
3. what if we introduced a new feature that gives different trip_types: 'shared' vs 'single'. do you need to account for overlapping trips in your design and if so, how and what would you change about the data model?
---------------------
product case:
Q1: given this we want to define a metric called x, how would you go about defining a metric. what are the pros and cons and tradeoffs with each of these metrics?
q2: will be around 'search volume' as a north star and talk about the pros and cons of using that as a north star
q3 will be about changing requirements; how do you adapt when things are changing to the requirements
---------------------
Core Values (behavioral): This interview will be focused on the core values of Faire. Be prepared to speak on examples from your personal/professional life that map to the following: Serve our community, make it happen (fast), raise the bar, seek the truth, one faire.
Prep tip from this candidate
Treat the Python round as a hard gate and study deque-based sliding window problems (specifically LC "Sliding Window Maximum") and binary tree algorithms at the Hard level — the round is pass/fail for the entire loop regardless of your other scores, and you will not have time to derive optimal solutions from scratch. For SQL, practice reconstructing multi-event-type trip lifecycles from a single raw events table using conditional aggregation (`MAX(...) FILTER (WHERE event_type = ...)`) and session-boundary logic with `LEAD()`, as the questions require you to pivot, join, and compute derived metrics like utilization entirely from one denormalized source under timed conditions.
Share your own interview experience to unlock all reports, or subscribe for full access.
Sourced from candidate reports and verified by our team.
Topics based on recent interview experiences.
Featured question at Indigo Fair
There was a robbery from the ATM at the bank where you work. Some unauthorized withdrawals were made, and you need to help your bank find out more about those withdrawals.
| Question | |
|---|---|
| Your Strengths and Weaknesses | |
| Azure Kubernetes Infrastructure | |
| Empty Neighborhoods | |
| 2nd Highest Salary | |
| Comments Histogram | |
| Customer Orders | |
| Top Three Salaries | |
| Subscription Overlap | |
| Merge Sorted Lists | |
| Prime to N | |
| Random SQL Sample | |
| Experiment Validity | |
| Download Facts | |
| Last Transaction | |
| Rolling Bank Transactions | |
| Average Quantity | |
| Monthly Customer Report | |
| Top 3 Users | |
| Closest SAT Scores | |
| Average Order Value | |
| Manager Team Sizes | |
| Cumulative Sales Since Last Restocking | |
| Over-Budget Projects | |
| Month Over Month | |
| Flight Records | |
| Paired Products | |
| Upsell Transactions | |
| Completed Shipments | |
| Total Spent on Products |
Synthesized from candidate reports. Individual experiences may vary.
A recruiter first aligns on the role, background, and interview logistics. In the reported experience, the recruiter also shared expectations about the loop, though the candidate later felt one technical round had been misrepresented.
This live-coding round tests SQL speed and accuracy under time pressure. Candidates should expect complex joins, window functions, edge cases, and analytical query design based on event-stream data.
A full-hour live coding round focused on algorithmic Python, including hard LeetCode-style problems. The reported question was Sliding Window Maximum, and the round was described as a high-stakes pass/fail screen.
This round dives into warehouse architecture and dimensional modeling. Candidates are expected to define grains, facts, dimensions, and incremental modeling strategies, and to reason through how to structure driver sessions and trip facts from raw event data.
A conversational round focused on translating messy stakeholder requests into metrics and business logic. The discussion includes defining a metric, evaluating a north star like search volume, and adapting when requirements change.
A behavioral interview centered on Faire’s core values. Candidates should prepare examples that demonstrate serving the community, moving fast, raising the bar, seeking the truth, and working as one team.