
The Swiggy Data Scientist interview typically runs 2 rounds: a resume deep-dive and an ML coding/SQL round. The process is highly resume-driven and technical.
Avg. Base Comp: $1800K
Avg. Total Comp: $3520K
Typical Rounds: 2
Process Length: 1-2 weeks
We’ve seen Swiggy lean hard into whether candidates can explain their own work under pressure, not just name the right tools. In the experience shared here, the interviewer kept pulling on the thread of one project until the candidate had to justify the data source, preprocessing, log transforms, and even basic correlation intuition. That pattern tells us Swiggy is looking for people who can defend modeling decisions with clarity, especially when the choice isn’t the textbook one. The same theme showed up again in the NLP discussion, where the candidate had to walk through why TF-IDF fit the problem, why random forest beat logistic regression in that case, and how the evaluation metric changed depending on the business setting.
A second signal is that Swiggy seems to care about mechanics, not memorization. The candidate wasn’t just asked what OOB score means; they were pushed to derive the 37% figure associated with it, which most plausibly refers to the roughly 37% of training rows left out of each bootstrap sample. That kind of follow-up is a recurring marker of this process: if you mention a concept, expect to unpack the math or the tradeoff behind it. We also notice a practical streak in the coding round (clustering setup, NumPy/Pandas, and a simple but time-sensitive SQL query), which suggests the bar is less about exotic algorithms and more about whether you can translate ML thinking into working code and clean reasoning. Candidates who do best here are the ones whose resumes can survive a detailed audit.
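For reference, the 37% figure is usually derived as the expected fraction of training rows that a single bootstrap sample leaves out, rather than as a value of the OOB score itself. A sketch of that derivation, assuming each tree draws $n$ rows uniformly with replacement from $n$ training rows:

$$
P(\text{row } i \text{ is never drawn}) = \left(1 - \frac{1}{n}\right)^{n} \xrightarrow{\; n \to \infty \;} e^{-1} \approx 0.368
$$

Under that assumption, about 36.8% of rows are out-of-bag for any given tree, and the OOB score is the model's performance measured on those held-out rows.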
Synthesized from 1 candidate report by our editorial team.
Real interview reports from people who went through the Swiggy process.
The first round felt like a full walkthrough of my resume, so I’d say the biggest mistake would be going in unprepared to defend every project choice. The interviewer started with my introduction and immediately asked whether I had a favorite project or wanted him to pick one. I chose one, and from there he drilled into where the data came from, what preprocessing I did, why I used a log transform on certain features, and even basic concepts like correlation, the correlation formula, and what a heatmap actually shows. He also asked about handling class imbalance, what data size is ideal for an ML model, and then moved into model behavior: why decision trees are more prone to overfitting than random forests, why a decision tree had higher accuracy than a random forest in my case, and what OOB score means. One question that stood out was a mathematical derivation of why the OOB score comes out to 37%, so it wasn’t just definitions — he wanted the reasoning too. After that, we switched to an NLP project and covered the problem statement, motivation, TF-IDF and its formula, bag of words, Keras vs TensorFlow, and why I chose random forest instead of logistic regression. He also asked about evaluation metrics beyond accuracy, with precision and recall formulas and when each should be prioritized, especially for a healthcare classification use case, plus the loss function and probability formula for logistic regression.
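The precision, recall, and logistic regression formulas mentioned in this report can be sketched in a few lines of plain Python. This is a minimal illustration, not the answer the interviewer expected; the function names and the `eps` clipping value are our own choices.

```python
import math

def precision(tp, fp):
    # Of everything predicted positive, the share that was truly positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of everything truly positive, the share the model caught.
    return tp / (tp + fn) if (tp + fn) else 0.0

def sigmoid(z):
    # Logistic regression's probability: P(y = 1 | x) = 1 / (1 + e^(-z)),
    # where z is the linear score w . x + b.
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross-entropy, averaged over samples:
    # -(y * log(p) + (1 - y) * log(1 - p))
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

A common argument for the healthcare case is that recall deserves priority when a missed positive (a false negative) is costlier than a false alarm, while precision matters more when acting on a false positive is expensive.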
Round two was more practical and focused on ML coding in Python, NumPy, Pandas, and SQL. There was no DSA. The coding prompt was to work through a clustering-style setup with k=5, 50-dimensional data, and 1000 data points, where I had to randomly define cluster centroids, map data points to the nearest centroid, and create the data. The SQL question was straightforward but time-sensitive: given a table with employee_name and salary, return the employee_name with the 5th highest salary. Overall the process was pretty technical and very resume-driven, and I didn’t get an offer.
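The clustering prompt described above can be sketched with NumPy as follows. This is one plausible reading of the task, under stated assumptions: the data is drawn from a standard normal distribution (the report does not say how it was generated), centroids are initialized by sampling existing points, and distance is Euclidean. The helper name `assign_to_nearest` is hypothetical.

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d) pairwise differences.
    diffs = X[:, None, :] - centroids[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)  # squared Euclidean, shape (n, k)
    return sq_dists.argmin(axis=1)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility

n_points, n_dims, k = 1000, 50, 5

# Create the data (assumed distribution: standard normal).
X = rng.normal(size=(n_points, n_dims))

# Randomly define centroids by picking k distinct data points.
centroid_idx = rng.choice(n_points, size=k, replace=False)
centroids = X[centroid_idx]

# Map every data point to its nearest centroid.
labels = assign_to_nearest(X, centroids)
```

Squared distances are enough here because the square root does not change which centroid is closest.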
Prep tip from this candidate
Be ready to defend every choice in your projects, especially preprocessing, feature transforms, model selection, and evaluation metrics. For the second round, practice writing NumPy/Pandas code for centroid assignment and SQL for nth-highest salary queries without relying on DSA patterns.
Sourced from candidate reports and verified by our team.
Topics based on recent interview experiences.
Featured question at Swiggy
Write a function can_shift to return whether or not A can be shifted some number of places to get B
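One way to approach the featured question, assuming A and B are strings and that "shifted" means a circular rotation (the prompt as quoted here does not spell this out):

```python
def can_shift(a: str, b: str) -> bool:
    # A rotation preserves length, so differing lengths can never match.
    if len(a) != len(b):
        return False
    # Every rotation of `a` appears as a substring of `a + a`.
    return b in a + a
```

For example, `can_shift("abcde", "cdeab")` is true under this reading, while `can_shift("abc", "acb")` is false.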
| Question |
|---|
| Size of Joins |
| Z and t-Tests |
| Forecasting New Year Revenue |
| Choosing k |
| Empty Neighborhoods |
| 2nd Highest Salary |
| Experiment Validity |
| Rolling Bank Transactions |
| Customer Orders |
| Comments Histogram |
| Closest SAT Scores |
| Subscription Overlap |
| Top Three Salaries |
| Upsell Transactions |
| Button AB Test |
| Monthly Customer Report |
| First to Six |
| Merge Sorted Lists |
| Compute Deviation |
| Download Facts |
| SELECTive Wine Connoisseur |
| Average Quantity |
| Network Experiment Design |
| 500 Cards |
| Random Bucketing |
| Random SQL Sample |
| Longest Streak Users |
| Manager Team Sizes |
| Group Success |
Synthesized from candidate reports. Individual experiences may vary.
The first round is a very detailed walkthrough of your resume and past projects. Expect the interviewer to pick one project and drill into the problem statement, data source, preprocessing choices, feature engineering, model selection, and evaluation metrics, along with theory questions on topics like correlation, class imbalance, decision trees vs. random forests, OOB score, TF-IDF, bag of words, and logistic regression.
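Since TF-IDF and bag of words both come up in this round, here is a small sketch of one common textbook variant: term frequency as count divided by document length, and inverse document frequency as log(N / df). Library implementations often use smoothed or normalized variants, so treat the exact formula as an assumption. The function name `tf_idf` is our own.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # the bag-of-words representation
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets a weight of zero under this variant, which is the intuition interviewers tend to probe: TF-IDF down-weights words that do not distinguish documents.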
The second round focuses on practical implementation in Python, NumPy, Pandas, and SQL, with no DSA. Candidates may be asked to solve a clustering-style coding problem, such as generating synthetic data and assigning points to the nearest centroid, followed by a SQL query like finding the employee with the 5th highest salary.
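The 5th-highest-salary query can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `employees` and the sample rows are hypothetical, and tied salaries are treated as a single rank (hence `DISTINCT`). An interviewer may want a different tie-breaking rule, so it is worth asking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 100), ("B", 90), ("C", 90), ("D", 80),
     ("E", 70), ("F", 60), ("G", 50)],  # made-up sample data
)

# OFFSET 4 skips the four highest distinct salaries; LIMIT 1 keeps the fifth.
query = """
SELECT employee_name
FROM employees
WHERE salary = (
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 1 OFFSET 4
)
"""
rows = conn.execute(query).fetchall()
```

If the table has fewer than five distinct salaries, the subquery yields no row and the outer query returns an empty result rather than an error.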
Round out your preparation with examples that show ownership, clear communication, and how you work with cross-functional partners or technical peers. The available candidate evidence is sparse, so treat this as general preparation guidance rather than a description of a separate formal round that every candidate will see.