Citi is one of the largest global banks, serving over 200 million customer accounts across consumer, institutional, and markets businesses. Every credit decision, trading strategy, and digital interaction produces data that Citi increasingly relies on to manage risk, personalize products, and meet regulatory expectations. Data scientists sit inside this engine, turning noisy financial and customer data into models, tools, and insights that shape how Citi operates.
If you are preparing for a Citi data scientist interview, this guide walks through the role, how Citi typically evaluates candidates, and what to expect from the interview process. You will see how the loop blends machine learning, statistics, SQL, and business judgment, and you can use resources like the data science interview learning path and machine learning interview questions on Interview Query to structure your prep and get exposure to Citi data science interview questions that mirror the firm’s style.
Citi data scientists build models and analytics that support decisions in areas like consumer banking, risk, fraud, and operations. They work across large internal datasets, design experiments, and partner with business and technology teams to ship solutions into production.
Typical responsibilities include:
Many candidates strengthen their foundations with the data science interview learning path and practice SQL-heavy questions using the SQL interview learning path before interviewing.
The Citi data scientist role is a good fit if you want to work on applied machine learning problems that directly affect real customers, risk outcomes, and regulatory obligations at global scale.
Reasons candidates are drawn to this role:
If you want more practice with realistic end-to-end problems similar to what Citi might ask, you can work through take-home-style challenges or join mock interviews to refine how you communicate your thinking.
This section breaks down the Citi data scientist interview process end to end. The process typically combines online assessments, technical deep dives, and behavioral conversations that test both your modeling skills and your ability to work in a highly regulated financial environment. While the exact sequence can vary by team and location, most candidates see a structure like the one below.
| Stage | What it focuses on |
|---|---|
| Online assessment | Aptitude, reasoning, and basic analytics |
| Phone screen | Background fit, communication, and high-level technical check |
| Technical interviews (1–2 rounds) | ML, statistics, coding, and project discussion |
| Behavioral or HR interview | Culture fit, teamwork, and long-term motivation |
Many candidates use the data science interview learning path to review core concepts before starting, then move to more advanced practice using the modeling and machine learning interview path.
This initial screen is often referred to as the Citi data science assessment, which evaluates baseline reasoning, statistics, and analytics comfort.
What to expect:
Interviewers use this stage to ensure you are comfortable with numbers and can reason under time pressure.
Tip: Treat this like a warm up for later rounds. Practice timed questions, and review basic statistics and probability using curated sets from the data science interview learning path so the mechanics feel familiar.
The phone screen is usually a short call with a recruiter or, in some cases, a hiring manager or senior data scientist.
Typical topics:
This stage focuses more on communication and alignment than deep technical grilling.
Tip: Prepare one or two concise project summaries that cover problem, approach, and impact in under 90 seconds. Practicing this format in mock interviews can make the conversation smoother.
The Citi data science technical interview typically blends ML fundamentals, statistics, SQL, and applied financial reasoning, though the exact mix varies by team.
| Topic Area | What They Test | What It Looks Like |
|---|---|---|
| Machine learning fundamentals | Understanding of core algorithms, evaluation metrics, and model tradeoffs | Questions on regression, classification, ensemble methods, regularization, cross-validation, the bias-variance tradeoff, and model monitoring |
| Statistics and probability | Ability to reason about uncertainty in financial and operational data | Questions on distributions, hypothesis testing, correlations, sampling, confidence intervals, and inference |
| Python coding | Ability to manipulate data, implement logic, and explain modeling workflows | Small coding tasks in Python, often involving pandas, NumPy, or basic algorithmic patterns |
| SQL | Comfort querying structured banking and operational data | Joins, aggregations, window functions, filtering, and writing clean, efficient queries |
| Project deep dives | How you design models end to end and navigate real challenges | Detailed walkthroughs of past data science projects including assumptions, modeling choices, and impact |
| Applied financial problem solving | Ability to translate ambiguous business questions into modeling approaches | Hypothetical prompts on churn prediction, fraud detection, credit scoring, or operational optimization |
To practice these skills in a structured way, candidates often use the data science interview learning path and supplement it with the machine learning interview learning path for deeper ML pattern recognition.
Tip: Use a consistent structure in every technical answer: clarify the objective, outline your approach, explain tradeoffs, then describe how you would validate the model. Citi values clear thinking as much as technical correctness.
The final round is often an HR or hiring manager conversation that evaluates how you work on teams and how you align with Citi's culture and long-term needs.
Common themes:
Tip: Prepare a small set of STAR stories around topics like driving a project to completion, handling setbacks in a model deployment, or simplifying results for business partners. Practicing these stories in mock interviews can help you sound more confident and structured.
Citi data scientist interviews test a blend of machine learning depth, statistical reasoning, SQL proficiency, and structured thinking in financial problem solving. Below are the most common categories of questions you will encounter, with examples drawn from real Interview Query problems and Citi style interview patterns.
If you want a deeper library of practice questions, explore the machine learning interview learning path or the SQL interview learning path, which mirror the types of challenges seen in Citi’s process.
SQL interviews for Citi data scientist roles test your ability to manipulate large datasets, resolve messy data issues, interpret financial metrics, and write clean, reliable queries. You will work with transaction tables, payroll schemas, customer events, operational risk data, and time series metrics. Interviewers want to see whether you can reason cleanly about granularity, business rules, and reliable outputs.
Start by filtering for transactions occurring in 2020, then extract the month and group by it. You need to count distinct users, count total transactions, and sum the order amounts. Pay attention to duplicate user records or test transactions that may need to be excluded. Citi uses similar grouped reporting queries for monthly performance, fraud monitoring, and operational dashboards.
Tip: Clarify whether the user count should be distinct per month or lifetime distinct.
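A minimal sketch of the monthly 2020 report, run against an in-memory SQLite database so it is self-contained. The table name `transactions`, its columns (`user_id`, `created_at`, `amount`, `is_test`), and the test-transaction flag are hypothetical stand-ins for whatever the real schema uses. The user count here is distinct per month, which is one of the two interpretations the tip above asks you to clarify.

```python
import sqlite3

# Hypothetical schema for illustration; real column names will differ.
SCHEMA = """
CREATE TABLE transactions (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER,
    created_at TEXT,     -- ISO-8601 text such as 2020-01-05 09:00:00
    amount     REAL,
    is_test    INTEGER   -- 1 marks test transactions that should be excluded
);
"""

MONTHLY_2020_SQL = """
SELECT
    CAST(strftime('%m', created_at) AS INTEGER) AS month,
    COUNT(DISTINCT user_id)                     AS num_customers,  -- distinct per month
    COUNT(*)                                    AS num_orders,
    SUM(amount)                                 AS total_amount
FROM transactions
WHERE created_at >= '2020-01-01'
  AND created_at <  '2021-01-01'   -- half-open range covers all of 2020
  AND is_test = 0
GROUP BY month
ORDER BY month;
"""


def monthly_report(conn: sqlite3.Connection) -> list:
    """Return (month, distinct users, transaction count, total amount) rows for 2020."""
    return conn.execute(MONTHLY_2020_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO transactions (user_id, created_at, amount, is_test) VALUES (?, ?, ?, ?)",
        [
            (10, "2020-01-05 09:00:00", 50.0, 0),
            (10, "2020-01-20 12:30:00", 25.0, 0),
            (11, "2020-01-21 08:00:00", 40.0, 0),
            (12, "2020-02-02 10:00:00", 99.0, 1),  # test transaction, excluded
            (12, "2020-02-03 10:00:00", 10.0, 0),
            (13, "2019-12-31 23:59:59", 70.0, 0),  # outside 2020, excluded
        ],
    )
    return conn


if __name__ == "__main__":
    for row in monthly_report(demo_connection()):
        print(row)
```

User 10 transacts twice in January but is counted once, which is exactly the distinct-per-month behavior worth confirming with the interviewer before writing the query.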
You need to join the employee and department tables, compute the total employees per department, filter out departments with fewer than ten employees, then calculate the percentage of employees above the salary threshold. A window function such as RANK, or a simple ORDER BY clause, can then rank the resulting percentages. This question evaluates your ability to chain together grouping, filtering, and ranking. Citi often asks similar compensation and workforce reporting questions across operations and HR analytics.
Tip: State how you would handle null salaries, which can affect calculation accuracy.
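One way to chain these steps is a CTE followed by a ranking window function, sketched below in SQLite. The tables `employees(id, department_id, salary)` and `departments(id, name)` and the threshold value are hypothetical. This sketch counts employees with NULL salaries toward the ten-person minimum but never as "above threshold"; that is an assumption you would state out loud, not the only valid choice.

```python
import sqlite3

DEPT_SQL = """
WITH dept_stats AS (
    SELECT
        d.id     AS department_id,
        d.name   AS department_name,
        COUNT(*) AS num_employees,              -- NULL salaries still count as headcount
        SUM(CASE WHEN e.salary > :threshold     -- NULL > x is NULL, so NULLs fall to ELSE
                 THEN 1 ELSE 0 END) AS num_above
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    GROUP BY d.id, d.name
    HAVING COUNT(*) >= 10
)
SELECT
    department_name,
    ROUND(100.0 * num_above / num_employees, 2)                 AS pct_above,
    RANK() OVER (ORDER BY 1.0 * num_above / num_employees DESC) AS pct_rank
FROM dept_stats
ORDER BY pct_rank;
"""


def rank_departments(conn: sqlite3.Connection, threshold: float) -> list:
    """Return (department, percent above threshold, rank) for departments with 10+ employees."""
    return conn.execute(DEPT_SQL, {"threshold": threshold}).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build made-up departments and employees for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, salary REAL);
    """)
    conn.executemany("INSERT INTO departments VALUES (?, ?)",
                     [(1, "Risk"), (2, "Ops"), (3, "Legal")])
    salaries = (
        [(1, 120_000)] * 4 + [(1, 80_000)] * 6                     # Risk: 4 of 10 above
        + [(2, 150_000)] * 9 + [(2, 90_000)] * 2 + [(2, None)]     # Ops: 9 of 12 above, one NULL
        + [(3, 200_000)] * 3                                       # Legal: under 10, filtered out
    )
    conn.executemany("INSERT INTO employees (department_id, salary) VALUES (?, ?)", salaries)
    return conn


if __name__ == "__main__":
    for row in rank_departments(demo_connection(), 100_000):
        print(row)
```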
How would you identify and exclude duplicate client trades caused by a system replay using SQL?
This question tests real-world banking data intuition. You must group or partition by keys such as trade ID, client ID, trade date, and product type. Then identify duplicates using COUNT(*) > 1 or ROW_NUMBER and keep only the canonical record. Citi uses these patterns heavily in trade reconciliation and operational risk analytics. You should also describe how you would validate that no legitimate trades are removed.
Tip: Explain at least one rule for selecting the correct canonical record such as earliest ingestion time.
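A sketch of the ROW_NUMBER approach, using earliest ingestion time as the canonical-record rule. The `trades` table and its columns (`row_id`, `ingested_at`, `notional`, and so on) are hypothetical. It also includes two validation checks: the deduplicated row count should equal the number of distinct business keys, and any key whose "duplicates" disagree on notional is flagged, since a true replay should produce identical copies.

```python
import sqlite3

KEY = "trade_id, client_id, trade_date, product_type"

DEDUP_SQL = f"""
WITH ranked AS (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY {KEY}
               ORDER BY ingested_at ASC, row_id ASC   -- canonical = earliest ingestion
           ) AS rn
    FROM trades AS t
)
SELECT row_id, trade_id, client_id, trade_date, product_type, notional, ingested_at
FROM ranked
WHERE rn = 1
ORDER BY trade_id;
"""

DISTINCT_KEYS_SQL = f"SELECT COUNT(*) FROM (SELECT DISTINCT {KEY} FROM trades);"

# Same key but different economics suggests something other than a clean replay.
SUSPICIOUS_SQL = f"""
SELECT {KEY}
FROM trades
GROUP BY {KEY}
HAVING COUNT(*) > 1 AND COUNT(DISTINCT notional) > 1;
"""


def dedupe_trades(conn):
    deduped = conn.execute(DEDUP_SQL).fetchall()
    distinct_keys = conn.execute(DISTINCT_KEYS_SQL).fetchone()[0]
    # Validation: exactly one surviving row per business key, so no legitimate trade was dropped.
    assert len(deduped) == distinct_keys, "dedup removed or kept the wrong number of rows"
    return deduped


def suspicious_duplicates(conn):
    return conn.execute(SUSPICIOUS_SQL).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE trades (row_id INTEGER PRIMARY KEY, trade_id TEXT, client_id TEXT,
                    trade_date TEXT, product_type TEXT, notional REAL, ingested_at TEXT)""")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, "T1", "C1", "2024-03-01", "FX", 1_000_000.0, "2024-03-01 10:00:00"),
        (2, "T2", "C2", "2024-03-01", "IRS", 5_000_000.0, "2024-03-01 10:05:00"),
        (3, "T1", "C1", "2024-03-01", "FX", 1_000_000.0, "2024-03-01 18:00:00"),  # replayed copy
        (4, "T3", "C1", "2024-03-02", "FX", 250_000.0, "2024-03-02 09:00:00"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(dedupe_trades(conn))
    print(suspicious_duplicates(conn))
```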
Write a query to calculate month over month default rate changes for loan accounts.
Start by grouping loan data by month and computing default counts and total accounts. Then calculate the default rate, followed by a window function to compare it to the previous month. Be explicit about whether accounts defaulted in that month or were already defaulted from prior periods. Citi uses similar KPI calculations in consumer risk and credit quality reviews. Interviewers expect attention to detail around time granularity.
Tip: Mention how you would segment the analysis by borrower risk bands for interpretability.
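A sketch of the month-over-month calculation with LAG. It assumes a hypothetical monthly snapshot table `loan_monthly(account_id, month, new_default)` where `new_default` is 1 only in the month an account first defaults, and accounts already in default are assumed to have been dropped upstream. That is one way of settling the "defaulted this month versus already defaulted" question raised above. To segment by risk band, you would add a band column to both the GROUP BY and a PARTITION BY in the window.

```python
import sqlite3

MOM_SQL = """
WITH monthly AS (
    SELECT
        month,
        SUM(new_default)                  AS defaults,
        COUNT(*)                          AS active_accounts,
        1.0 * SUM(new_default) / COUNT(*) AS default_rate   -- 1.0 * avoids integer division
    FROM loan_monthly
    GROUP BY month
)
SELECT
    month,
    default_rate,
    default_rate - LAG(default_rate) OVER (ORDER BY month) AS mom_change  -- NULL for first month
FROM monthly
ORDER BY month;
"""


def mom_default_rates(conn: sqlite3.Connection) -> list:
    return conn.execute(MOM_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loan_monthly (account_id TEXT, month TEXT, new_default INTEGER)")
    conn.executemany("INSERT INTO loan_monthly VALUES (?, ?, ?)", [
        ("A1", "2024-01", 1), ("A2", "2024-01", 0), ("A3", "2024-01", 0), ("A4", "2024-01", 0),
        # A1 defaulted in January, so it no longer appears as an active account in February
        ("A2", "2024-02", 1), ("A3", "2024-02", 1), ("A4", "2024-02", 0),
        ("A5", "2024-02", 0), ("A6", "2024-02", 0),
    ])
    return conn


if __name__ == "__main__":
    for row in mom_default_rates(demo_connection()):
        print(row)
```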
You need to dedupe employees by keeping only the most recent record for each person, using either ROW_NUMBER or MAX over an ID or timestamp. With ROW_NUMBER, partition by employee ID (or by name if no stable ID exists), order by the timestamp or ID descending, and keep the first row. The interviewer wants to see if you can diagnose upstream ETL issues and still produce clean analytical outputs. Citi places high importance on reconciliation and correcting schema inconsistencies.
Tip: Always mention how you would patch the upstream ETL pipeline to avoid repeated inserts.
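A sketch of the MAX-based variant. The `employees(id, first_name, last_name, salary)` table is hypothetical, and the key assumption is that the faulty ETL re-inserted updated rows with a new, larger auto-increment `id`, so the largest `id` per person is the current record. If a trustworthy update timestamp exists, ordering by that is safer. The duplicate-count helper is the kind of check you could also run upstream to catch repeated inserts.

```python
import sqlite3

CURRENT_SALARY_SQL = """
SELECT e.first_name, e.last_name, e.salary
FROM employees AS e
JOIN (
    SELECT first_name, last_name, MAX(id) AS latest_id   -- assumes larger id = newer insert
    FROM employees
    GROUP BY first_name, last_name                       -- prefer a stable employee key if available
) AS latest
  ON latest.latest_id = e.id
ORDER BY e.last_name, e.first_name;
"""

DUPLICATE_COUNT_SQL = """
SELECT COUNT(*) - (SELECT COUNT(*) FROM (SELECT DISTINCT first_name, last_name FROM employees))
FROM employees;
"""


def current_salaries(conn):
    return conn.execute(CURRENT_SALARY_SQL).fetchall()


def duplicate_rows(conn):
    """Number of extra rows created by repeated inserts."""
    return conn.execute(DUPLICATE_COUNT_SQL).fetchone()[0]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT, "
                 "last_name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
        (1, "Ava", "Lee", 100_000),
        (2, "Ben", "Ng", 90_000),
        (3, "Ava", "Lee", 110_000),   # re-inserted row carrying the updated salary
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(current_salaries(conn), duplicate_rows(conn))
```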
Citi data scientist interviews emphasize structured ML reasoning, statistical clarity, and an ability to evaluate models used in credit risk, fraud detection, operations, and customer analytics. Interviewers want to see that you understand modeling tradeoffs, measurement frameworks, and how ML ties into actual financial outcomes.
The first model is a binary classification model predicting loan approval. To compare two models, you would evaluate performance on metrics like AUC, precision and recall, false positive rate, and calibration curves. Citi expects you to incorporate the fact that loans are paid monthly, so metrics such as time-dependent default curves and cohort analysis matter. Interviewers want to see that you can connect ML evaluation to real financial risk exposure.
Tip: Mention that business impact metrics such as expected loss or bad rate lift are often more important than pure accuracy.
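To make the model comparison concrete, here is a small stdlib sketch that scores two models on the same labeled loans. It computes AUC as the probability that a random defaulter outscores a random non-defaulter, precision and recall at a cutoff, and the bad rate among approved applicants as a simple business-facing metric. Labels (1 = default), scores (predicted default probability), and the cutoff are illustrative assumptions. The pairwise AUC is O(n²) and meant for clarity; in practice you would use a library routine such as scikit-learn's `roc_auc_score`.

```python
from itertools import product


def auc(labels, scores):
    """P(random defaulter scores higher than random non-defaulter); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Treat score >= threshold as a predicted default."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def approved_bad_rate(labels, scores, cutoff):
    """Default rate among applicants the model would approve (score below cutoff)."""
    approved = [y for y, s in zip(labels, scores) if s < cutoff]
    return sum(approved) / len(approved) if approved else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    model_a = [0.10, 0.40, 0.35, 0.80]
    model_b = [0.10, 0.20, 0.60, 0.90]
    for name, scores in [("A", model_a), ("B", model_b)]:
        print(name, auc(labels, scores), precision_recall(labels, scores, 0.5),
              approved_bad_rate(labels, scores, 0.5))
```

In this toy data model A approves one eventual defaulter at a 0.5 cutoff while model B approves none, which is the kind of difference that matters more to a credit portfolio than a small change in accuracy.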
How would you measure model drift and detect degradation in a production ML model used for fraud or credit decisioning?
Start by comparing the distribution of predicted scores over time against a baseline distribution using the population stability index (PSI) or KL divergence. Monitor feature drift, target drift, and the stability of odds ratios in logistic models. Investigate whether changes come from upstream data quality issues or genuine population shifts. Citi values strong operational thinking around model risk controls, so your explanation should include alerts, dashboards, and regular backtesting cycles.
Tip: Clarify the cadence of monitoring, especially monthly or quarterly monitoring for regulated financial models.
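A minimal PSI sketch, assuming scores are bucketed with fixed interior bin edges taken from the baseline period (the decile edges below are illustrative). PSI sums (a − e) · ln(a / e) over bins, where e and a are the baseline and current shares, and a small epsilon keeps empty bins from producing an infinite log. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the actual alert thresholds would come from the model risk policy, not from this snippet.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Share of values in each bin defined by sorted interior edges (len(edges) + 1 bins)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(baseline, current, edges, eps=1e-6):
    """Population stability index between a baseline sample and a current sample."""
    total = 0.0
    for e, a in zip(bin_shares(baseline, edges), bin_shares(current, edges)):
        e, a = max(e, eps), max(a, eps)   # floor empty bins so the log stays finite
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    edges = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    baseline = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
    shifted = [0.55, 0.65, 0.75, 0.85, 0.95] * 2   # current scores all in the upper half
    print(psi(baseline, baseline, edges), psi(baseline, shifted, edges))
```

The same function can be applied per feature as well as to the output score, which helps separate "the population changed" from "the model's behavior changed".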
How would you choose between a gradient boosting model and a logistic regression model for a credit scoring problem?
Begin by discussing interpretability requirements because many Citi models fall under regulatory scrutiny that favors simpler models. Then discuss nonlinear interactions, feature complexity, and predictive power. Gradient boosting may provide stronger lift but requires more extensive validation. Logistic regression is transparent, easier to stress test, and simpler to explain to auditors. Interviewers want to see that you can weigh technical performance against regulatory and operational constraints.
Tip: Always acknowledge that regulated credit models require documentation that often favors explainability.
How would you build a model to predict customer churn for Citi’s consumer banking division?
Start by defining churn clearly, such as no transactions for a specific time window. Engineer behavioral, transactional, demographic, and product engagement features. Test multiple models such as logistic regression, random forest, or XGBoost. Evaluate using precision and recall for the positive class, since churn events can be rare. Citi interviewers want to see your ability to tie outputs to retention strategies.
Tip: Explain how you would validate the model by running lift calculations on top decile risk scores.
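Top-decile lift compares the churn rate among the 10 percent of customers with the highest risk scores to the overall churn rate. The sketch below assumes binary churn labels and one score per customer; a lift of 1.0 means the model ranks no better than random.

```python
def top_decile_lift(labels, scores):
    """Churn rate in the top 10% of scores divided by the overall churn rate."""
    n = len(labels)
    if n == 0 or sum(labels) == 0:
        raise ValueError("need at least one churner to compute lift")
    k = max(1, n // 10)  # size of the top decile, at least one customer
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate


if __name__ == "__main__":
    labels = [1, 1] + [0] * 18            # 10% overall churn
    good_model = [0.9, 0.8] + [0.1] * 18  # churners ranked at the top
    print(top_decile_lift(labels, good_model))
```

Here both churners land in the top decile, so the top-decile churn rate is 100 percent against a 10 percent base rate, a lift of 10.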
How would you design a feature-selection process for a fraud detection model with thousands of potential predictors?
Start with filtering steps such as removing low variance or highly correlated features. Add embedded methods such as regularization or tree-based importance scoring. Use cross-validation to assess stability of selected features. Citi emphasizes feature governance, so explain how you would document feature definitions. You should also mention checking for leakage.
Tip: State explicitly how you would validate that features do not capture future information.
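The filtering stage described above can be sketched with the standard library: drop near-constant features, then greedily drop any feature whose absolute Pearson correlation with an already-selected feature exceeds a cutoff. The feature names, thresholds, and the greedy "first seen wins" order are illustrative assumptions; in practice you would order candidates by a priority such as importance or interpretability. Leakage checks are a separate step and are not shown here.

```python
import math
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_features(features, min_variance=1e-8, max_abs_corr=0.95):
    """features: dict of name -> equal-length list of values. Returns kept names in order."""
    # Step 1: remove low-variance features.
    candidates = [name for name, vals in features.items() if pvariance(vals) > min_variance]
    # Step 2: greedily remove features highly correlated with one already kept.
    selected = []
    for name in candidates:
        if all(abs(pearson(features[name], features[s])) <= max_abs_corr for s in selected):
            selected.append(name)
    return selected


if __name__ == "__main__":
    features = {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],   # perfectly correlated with "a"
        "c": [5, 5, 5, 5],   # zero variance
        "d": [4, 1, 3, 2],
    }
    print(filter_features(features))
```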
These questions test whether you can translate ambiguous business problems into structured data solutions. Citi heavily values problem framing, data intuition, and communication clarity because data scientists work across risk, operations, credit, digital banking, and customer analytics.
Start by defining a mismatch and identifying the most relevant location fields such as GPS coordinate history, request location, and pickup confirmation. Calculate deviations between expected and actual coordinates using distance thresholds. Plot distributions to see natural outliers versus systematic issues. Citi looks for structured diagnosis that separates user behavior, device errors, and upstream data issues. You should also mention validating time windows and sampling rates.
Tip: Explain how you would segment the error rate by region or device type for deeper insights.
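A sketch of the distance-threshold step: compute the great-circle (haversine) distance between the expected and actual coordinates, flag records beyond a threshold, and report the mismatch rate per segment such as device type or region. The 200-metre threshold and the record layout are hypothetical choices for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mismatch_rate_by_segment(records, threshold_m=200.0):
    """records: iterable of (segment, expected_lat, expected_lon, actual_lat, actual_lon).

    Returns {segment: share of records farther apart than threshold_m}.
    """
    totals, mismatches = defaultdict(int), defaultdict(int)
    for segment, elat, elon, alat, alon in records:
        totals[segment] += 1
        if haversine_m(elat, elon, alat, alon) > threshold_m:
            mismatches[segment] += 1
    return {seg: mismatches[seg] / totals[seg] for seg in totals}


if __name__ == "__main__":
    records = [
        ("ios", 40.0, -74.0, 40.0, -74.0),
        ("ios", 40.0, -74.0, 40.01, -74.0),   # about 1.1 km off
        ("android", 40.0, -74.0, 40.0, -74.0),
    ]
    print(mismatch_rate_by_segment(records))
```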
Start by adjusting for the fact that each participant saw a different subset of shows. Normalize scores, compute average ratings, and test for statistical significance. Explore variance across segments such as age groups or prior genre preference. Citi interviewers want to see whether you can extract signal from sparse and unbalanced experimental data.
Tip: Describe how you would check rating bias caused by show ordering or participant fatigue.
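One simple way to normalize scores when each participant rated a different subset of shows is to convert every rating to a z-score within that participant, then average the z-scores per show. This removes harsh-versus-generous rater bias. The `(participant, show, score)` tuple layout is an assumption, and participants whose ratings are all identical get a z-score of 0 in this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalized_show_scores(ratings):
    """ratings: list of (participant, show, score). Returns {show: mean within-rater z-score}."""
    by_participant = defaultdict(list)
    for participant, _, score in ratings:
        by_participant[participant].append(score)
    stats = {p: (mean(v), pstdev(v)) for p, v in by_participant.items()}

    by_show = defaultdict(list)
    for participant, show, score in ratings:
        mu, sd = stats[participant]
        by_show[show].append((score - mu) / sd if sd > 0 else 0.0)
    return {show: mean(zs) for show, zs in by_show.items()}


if __name__ == "__main__":
    ratings = [
        (1, "A", 2), (1, "B", 4),    # harsh rater
        (2, "A", 8), (2, "B", 10),   # generous rater
        (3, "B", 9), (3, "C", 7),
    ]
    print(normalized_show_scores(ratings))
```

With this toy data, raw averages would put show C (7.0) well above show A (5.0), while the normalized scores treat both as equally below each rater's personal average. Significance testing would then be run on the normalized values.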
You are building a model that classifies clients into risk tiers for operational efficiency. How would you validate that the tiering system is meaningful?
Begin by defining measurable outcomes tied to risk, such as complaint frequency, payment delinquencies, or manual case touches. Test whether tiers separate clients on these outcomes using ANOVA or lift analysis. Then validate stability across segments, time periods, and product types. This mirrors Citi’s operational risk segmentation practices. The interviewer wants to see thoughtful validation rather than blind modeling.
Tip: Mention building a confusion matrix of predicted versus actual outcomes to measure separation quality.
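A stdlib sketch of the ANOVA check: compute the one-way F statistic for an outcome (for example, manual case touches) across tiers, and confirm that tier means increase in the expected risk order. The tier names and outcome values are made up. For a p-value you would compare F against an F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """groups: dict of tier -> list of outcome values. Returns the one-way ANOVA F statistic."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
    if ss_within == 0:
        return float("inf")  # perfect separation within this sample
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def tiers_monotonic(groups, order):
    """True if the mean outcome strictly increases along the given tier order."""
    means = [mean(groups[tier]) for tier in order]
    return all(a < b for a, b in zip(means, means[1:]))


if __name__ == "__main__":
    touches = {"low": [1, 2, 3], "mid": [4, 5, 6], "high": [7, 8, 9]}
    print(one_way_anova_f(touches), tiers_monotonic(touches, ["low", "mid", "high"]))
```

A large F with monotonic means is evidence the tiers separate clients; rerunning the same check by segment and time period tests the stability the paragraph above describes.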
Citi wants to reduce manual reviews in a specific operations workflow. How would you design an experimentation framework to test an automation algorithm?
First identify which cases qualify for automation. Split workflow items into control and treatment groups while ensuring fair randomization. Set metrics such as time saved, reduction in errors, or case resolution success. Monitor unintended consequences like increased escalation rates. Citi expects you to understand both experimentation and risk control.
Tip: Clarify the guardrail metrics that must stay within acceptable thresholds.
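A sketch of three pieces of such a framework. It uses deterministic hash-based assignment, so each case always lands in the same arm, and a two-proportion z-test on a primary error-rate metric. It also includes a guardrail check that fails if the escalation rate rises by more than a tolerance. The salt string, the 1-percentage-point tolerance, and the metric choices are hypothetical.

```python
import hashlib
import math


def assign_arm(case_id: str, salt: str = "automation-test-v1") -> str:
    """Deterministically assign a case to control or treatment via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{case_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for p1 - p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def guardrail_ok(escalations_t, n_t, escalations_c, n_c, max_increase=0.01):
    """Pass if treatment's escalation rate exceeds control's by at most max_increase."""
    return escalations_t / n_t - escalations_c / n_c <= max_increase


if __name__ == "__main__":
    # Primary metric: error rate (30 of 1,000 automated cases vs 50 of 1,000 manual cases).
    z = two_proportion_z(30, 1000, 50, 1000)
    print(round(z, 3), round(two_sided_p(z), 4), guardrail_ok(12, 1000, 10, 1000))
```

Hashing on a stable case ID rather than drawing random numbers at runtime makes the assignment reproducible for audit, which fits the risk-control emphasis in a regulated workflow.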
You receive a dataset with millions of customer transactions but significant missing values and inconsistent timestamp formatting. How would you clean and prepare this for modeling?
Start by diagnosing missingness patterns across features. Parse every timestamp format into one standard representation and convert it to a single time zone, such as UTC. Apply imputation strategies based on feature type and distribution. Validate that no duplicate or out-of-order transactions remain. Citi interviewers test whether you can bring order to messy operational data before modeling begins.
Tip: Highlight one or two automated checks you would build to prevent similar issues in the future.
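A sketch of timestamp standardization plus a few automated checks. The list of known formats, the assumption that naive timestamps are already UTC, and the record layout (`txn_id`, `ts`, `amount`) are all hypothetical; in a real pipeline the source system's time zone would determine how naive values are localized. The out-of-order check assumes records are expected to arrive in time order.

```python
from datetime import datetime, timezone

# Hypothetical formats observed in the raw feed.
KNOWN_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M")


def to_utc(raw, assume_tz=timezone.utc):
    """Parse a timestamp in any known format and return an aware UTC datetime."""
    for fmt in KNOWN_FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=assume_tz)  # assumption: naive values are in assume_tz
        return dt.astimezone(timezone.utc)
    raise ValueError(f"unrecognized timestamp: {raw!r}")


def quality_report(transactions):
    """Count duplicate IDs, missing amounts, and out-of-order timestamps."""
    seen, parsed = set(), []
    duplicates = missing_amount = 0
    for txn in transactions:
        if txn["txn_id"] in seen:
            duplicates += 1
            continue
        seen.add(txn["txn_id"])
        if txn.get("amount") is None:
            missing_amount += 1
        parsed.append(to_utc(txn["ts"]))
    out_of_order = sum(1 for earlier, later in zip(parsed, parsed[1:]) if later < earlier)
    return {"duplicates": duplicates, "missing_amount": missing_amount,
            "out_of_order": out_of_order}


if __name__ == "__main__":
    print(to_utc("2020-03-01T10:00:00+0100"))
    print(quality_report([
        {"txn_id": 1, "ts": "2020-03-01 09:00:00", "amount": 5.0},
        {"txn_id": 1, "ts": "2020-03-01 09:00:00", "amount": 5.0},   # duplicate
        {"txn_id": 2, "ts": "03/01/2020 08:00", "amount": None},     # missing, out of order
    ]))
```

Running a report like this before every model refresh, and failing the job when counts exceed agreed limits, is one way to keep the same issues from recurring.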
Behavioral interviews at Citi test how you operate in a regulated, cross-functional environment. Expect questions about collaboration, ownership, risk awareness, and how you communicate complex ideas to non-technical partners. Strong answers show structured thinking, transparency, and a habit of validating assumptions early.
Describe a data science project you worked on. What were the main challenges you faced?
Citi looks for candidates who can manage ambiguity, navigate incomplete data, and work with multiple stakeholders across risk, product, and operations. Interviewers want to understand how you handled technical challenges such as data quality issues or model limitations, while balancing pressure from business partners for quick results. They also expect you to discuss how you validated assumptions and protected model integrity. A strong answer shows both the analytical depth of your work and your ability to communicate issues early. This question gives you space to demonstrate end-to-end ownership.
Tip: Mention one technical challenge and one stakeholder or process challenge to show full-spectrum problem solving.
Sample Answer: I built a churn prediction model for a retail banking segment where data quality varied significantly across regions. Midway through the project, I discovered inconsistent timestamp formats and missing behavioral signals for one market. I worked with data engineering to standardize ingestion logic and partnered with product managers to redefine churn windows. While engineering worked on the fix, I developed a temporary proxy model so leadership could still prioritize at-risk customers. After completion, the final model improved targeted retention performance by more than 15 percent.
Citi values communication as much as technical capability because data scientists frequently partner with risk, compliance, operations, and marketing teams. Interviewers want to understand how you simplify advanced concepts and provide actionable recommendations rather than model details. Strong answers highlight storytelling, clarity, and structured presentation. They also look for evidence that you create artifacts such as dashboards, documentation, or walkthroughs that empower teams to self-serve. Focus on impact rather than tools.
Tip: Tie your communication strategy to better decisions, faster execution, or reduced operational risk.
Sample Answer: When presenting a credit model, I focus first on the business question and the decision the stakeholder needs to make. Then I explain only the core mechanics of the model using visuals like score distributions and key drivers instead of equations. I also include a short one-page guide that clarifies definitions and common misinterpretations. For operations teams, I built a simple dashboard with explanations next to each KPI so analysts could review cases without needing my involvement. As a result, adoption increased and weekly review meetings became faster and more aligned.
What would your manager say about your strengths and weaknesses?
This question tests your self-awareness and whether you demonstrate habits aligned with Citi’s culture of control, collaboration, and accountability. Strong answers identify a strength that benefits cross-functional partners and a weakness that you are actively working to improve. Avoid generic traits and instead choose specific examples tied to work outcomes. Interviewers also look for humility combined with a growth mindset. Make sure the weakness does not undermine your core responsibilities as a data scientist.
Tip: Anchor each strength or weakness to a short example so it feels credible and measurable.
Sample Answer: My manager would say one of my strengths is methodical problem solving, especially when diagnosing data quality issues that could impact risk reporting. When two dashboards showed conflicting delinquency rates, I traced the issue back to mismatched filters and helped align definitions across teams. A weakness I have been improving is sharing early drafts instead of waiting until analyses are polished. I now schedule mid-project check-ins, which has sped up stakeholder feedback and reduced rework.
Talk about a time when you had difficulty aligning with a stakeholder. How did you resolve it?
Citi’s data scientists work with teams that have competing priorities, so interviewers want to see your ability to negotiate expectations and build alignment. A strong answer describes how you listened to concerns, clarified misunderstandings, and guided the discussion using data. You should demonstrate empathy without compromising analytical rigor. Interviewers expect you to highlight both the process and the outcome.
Tip: Show that you stay objective and use transparent communication to de-escalate misalignment.
Sample Answer: In a fraud analytics project, an operations manager disagreed with my recommendation to tighten certain thresholds because they feared increased manual reviews. I walked them through segmented data showing that the false-positive impact would be minimal, and I proposed a two-week pilot with monitoring safeguards. After reviewing results with them, we agreed on the updated thresholds, which improved fraud catch rate by more than 20 percent without materially increasing workload.
Tell me about a time you detected a major data quality issue. What did you do?
Data integrity is critical in financial services, and Citi wants data scientists who proactively identify and mitigate risks. Strong answers include how you validated assumptions, what triggered your suspicion, and how you escalated the issue. Interviewers also expect long-term fixes, not just quick patches. Demonstrating awareness of downstream impact is essential.
Tip: Emphasize both the immediate remediation and the systemic fix that prevented recurrence.
Sample Answer: While analyzing credit utilization trends, I noticed a sudden drop in one region that did not match market conditions. I traced the issue to an upstream ETL job that failed to load several days of transactions. I halted reporting, coordinated with engineering to restore missing data, and added a validation script that checks daily transaction counts before model refreshes. The fix prevented similar issues from reaching risk dashboards again.
You might think that behavioral interview questions are the least important, but they can quietly cost you the entire interview. In this video, Interview Query co-founder Jay Feng breaks down the most common behavioral questions and offers a clean framework for answering them effectively.
Preparing for a Citi data scientist interview requires strong machine learning fundamentals, crisp SQL skills, and an ability to reason about data in a regulated financial environment. Citi expects candidates to think clearly under constraints, validate assumptions, and communicate risks as effectively as they communicate insights. Every interview stage is designed to evaluate whether you can explain your models, work with complex operational data, and collaborate with stakeholders across risk, credit, compliance, consumer banking, and operations. Many candidates prepare using the machine learning interview questions library, the SQL interview learning path, and mock sessions focused on financial modeling.
Below are seven preparation strategies.
Master core machine learning fundamentals and how they apply to banking
Citi’s technical rounds emphasize ML intuition, model selection, and evaluation metrics more than obscure algorithms. Interviewers expect you to explain models clearly and justify design choices for financial applications such as credit risk, fraud detection, or churn. You should be able to discuss overfitting, leakage, monitoring, drift, and how you validate model robustness. Practice walking through end-to-end ML pipelines, not just the modeling portion.
Tip: Link every model explanation back to business impact, risk reduction, or operational efficiency.
Practice SQL for analytics, risk reporting, and messy data scenarios
Citi relies on SQL-heavy workflows for customer analytics, regulatory reporting, and operational data processing. Practice window functions, segmentation logic, deduplication, event ordering, and time series transformations using real datasets. Many problems in interviews mirror questions in the SQL interview learning path, especially multi-step queries. Be ready to explain your logic and validate assumptions.
Tip: Before writing any query, state the expected grain to avoid double counting.
Prepare to explain end-to-end data science projects with clarity and structure
Citi evaluates how well you communicate problem framing, data preparation, modeling choices, and business outcomes. Use a structured walkthrough: problem, data, approach, model, evaluation, business impact. Focus on decisions you made, trade-offs you considered, and how you validated results. This is often tested in both technical and behavioral rounds.
Tip: Choose one project where you handled a real constraint like imbalanced data or incomplete features.
Strengthen your understanding of model evaluation, monitoring, and governance
As a financial institution, Citi emphasizes transparency, fairness, and control in model development. Review evaluation metrics for classification, regression, time series, and anomaly detection, and understand when each is appropriate. Interviewers may also ask how you would monitor drift or set up alerts for model degradation. Strong candidates link evaluation to risk reduction.
Tip: Be ready to propose monitoring dashboards or validation steps that non-technical partners can interpret.
Review key concepts in statistics, probability, and hypothesis testing
Citi uses statistical rigor in risk modeling, A/B experiments, fraud detection, and operational forecasting. Expect questions on sampling, distributions, confidence intervals, significance testing, and Bayesian reasoning. Many candidates prepare with the statistics interview practice sets. Emphasize intuition and real examples instead of formula memorization.
Tip: Show how statistical reasoning catches issues early, such as data drift or inconsistent feature distributions.
Practice communicating complex models to stakeholders across risk, operations, and product
Clear communication is essential because Citi’s DS teams work closely with risk, credit, product, marketing, and compliance. Interviewers will test whether you can simplify an ML model, explain key drivers, and highlight limitations. Strong candidates show that they protect data quality and escalate issues responsibly. For practice, speak through examples from your projects during mock interviews.
Tip: Imagine explaining your model to a non-technical stakeholder in 90 seconds and practice that version.
Prepare for case-style business problems with structured problem solving
Citi often presents applied scenarios such as loan default prediction, fraud scoring, customer segmentation, and operational efficiency. Interviewers want to see structured reasoning: clarifying assumptions, outlining data needs, proposing a modeling approach, and describing metrics for success. Practicing with real cases in the machine learning case library helps simulate this style.
Tip: Think aloud so interviewers can evaluate your reasoning process step by step.
According to Levels.fyi, Citi data scientists in the United States earn between $120,000 and $168,000 per year, depending on level and location. Citi’s compensation is primarily base salary and bonus because the firm grants little to no equity for most data science roles.
Below is an overview of U.S. salary ranges by level.
| Level | Typical Title | Estimated Total Compensation (Annual) | Base (Annual) | Bonus (Annual) |
|---|---|---|---|---|
| C10 | Analyst | $120,000 | $108,000 | $13,200 |
| C11 | — | $120,000 | $120,000 | $0 |
| C12 | AVP | $120,000 | $120,000 | $2,900 |
| C13 | VP | $168,000 | $156,000 | $6,200 |
| C14 | SVP | Data incomplete | Data incomplete | Data incomplete |
These figures reflect national averages and may vary depending on the role's alignment with risk, fraud, operations, or consumer banking.
Compensation for Citi data scientists varies significantly across major hubs. New York typically pays the highest, while Tampa and Dallas offer competitive salaries aligned with lower cost-of-living markets.
| Location | C10 Total (Annual) | C11 Total (Annual) | C12 Total (Annual) | C13 Total (Annual) |
|---|---|---|---|---|
| New York City Area | $120,000 | $120,000 | $144,000 | $168,000 |
| Tampa – St. Pete – Sarasota | — | — | $98,400 | $144,000 |
| Greater Dallas Area | — | — | $120,000 | $144,000 |
Insights:
Citi’s process is selective, especially for roles supporting risk, fraud, and consumer analytics. Candidates are evaluated on statistical depth, modeling intuition, SQL fluency, and clarity of communication. Practicing with structured problems in the machine learning interview questions library and the SQL interview learning path helps mirror Citi’s interview style.
Citi expects strong Python skills, solid SQL fundamentals, and a deep understanding of machine learning techniques. Experience with model evaluation, fairness, monitoring, and handling imbalanced datasets is important, especially for credit and fraud use cases. Candidates should also be comfortable with end-to-end project work, from data cleaning to communication.
Not always. Many roles focus on modeling, analytics, and experimentation rather than full production engineering. However, understanding how models are validated, monitored, and governed in a regulated environment is valuable. Highlighting experience with model pipelines, feature engineering, or MLOps tools is a plus.
Yes. SQL is a core component of Citi’s workflow across risk, operations, and consumer banking. Expect multi-step queries involving joins, window functions, segmentation logic, and data quality checks. Problems often resemble those found in the SQL practice set.
Very important. Citi values structured communication, collaboration, and risk awareness. Behavioral interviews assess how you handle ambiguity, escalate issues responsibly, and partner with cross-functional teams. Strong answers clearly connect actions to business impact.
Choose projects where you solved a real problem, handled imperfect data, and produced measurable outcomes. Citi interviewers prefer projects involving classification, forecasting, anomaly detection, resource optimization, or financial modeling. Emphasize your reasoning process and the decisions your work enabled.
Often, yes. Because Citi operates in a regulated industry, data scientists frequently collaborate with risk, compliance, and audit teams. This makes transparency, model explainability, and careful documentation essential skills.
Most candidates complete the process in three to five weeks. Stages include an online assessment, phone or video technical screens, and a final round with a combination of technical and behavioral interviews. Timelines vary by team availability and role type.
To get hired at Citi, you need consistent fundamentals across machine learning, SQL, statistics, risk awareness, and communication. Top candidates show that they can reason clearly under constraints, validate data quality, and work within a regulated environment. Reviewing structured problems in the machine learning interview library and practicing SQL in the SQL learning path helps you build that consistency early in the process.
Preparing for Citi means learning to think like a modeler, analyst, and risk partner at the same time. The fastest way to sharpen those instincts is through structured practice that mirrors real interviews. Explore hands-on challenges in the machine learning interview library, reinforce your SQL fundamentals with the SQL interview learning path, and build your confidence through guided mock interviews.
Build a preparation routine that strengthens your clarity, structure, and modeling intuition so you can walk into your Citi interview ready to perform at your best.