Top 24 Revolut Data Scientist Interview Questions + Guide in 2025

Overview

Revolut is a global financial technology company that offers a wide range of financial services including banking, cryptocurrency trading, and stock trading, all through its innovative app.

As a Data Scientist at Revolut, you will play a critical role in leveraging data to drive strategic decisions and enhance customer experiences. This position involves analyzing complex datasets, developing predictive models, and communicating insights to stakeholders across various teams. Key responsibilities include conducting exploratory data analysis, implementing machine learning algorithms, and collaborating with product and engineering teams to integrate data-driven solutions into the company's products.

To excel in this role, candidates should possess strong programming skills in Python and SQL, a solid foundation in statistics and machine learning techniques, and the ability to visualize and present data findings effectively. Furthermore, a keen understanding of the financial industry and a passion for using data to solve real-world problems align closely with Revolut's mission to revolutionize the way people manage their finances.

This guide will help you prepare for your interview by providing insights into the expectations for the role, the types of questions you may encounter, and how to showcase your skills and experiences effectively.

Introduction

Founded in 2015 by Nik Storonsky and Vlad Yatsenko, Revolut stands among the leading fintech providers, processing more than 400 million transactions a month. It offers cross-border payments, stock trading, crypto trading, personal loans, and commodity trading.

As a data scientist looking to join Revolut, your responsibilities will likely include developing fraud detection algorithms, analyzing customer spending patterns, optimizing user experience, and enhancing financial services through collaboration with cross-functional teams.

If you’re curious about how to respond to Revolut data scientist interview questions and prepare for the entirety of the process, then this article is definitely for you.

Revolut Data Scientist Interview Questions

Experience and Background

In this section, we’ll review the various interview questions that might be asked during a Data Scientist interview at Revolut. The interview process will assess a combination of technical skills, statistical knowledge, and problem-solving abilities. Candidates should be prepared to demonstrate their understanding of machine learning algorithms, data analysis techniques, and coding proficiency.

Machine Learning

1. Explain the difference between Lasso and Ridge regression.

Understanding regularization techniques is crucial for a Data Scientist, as they help prevent overfitting in models.

How to Answer

Discuss the key differences in how Lasso and Ridge apply penalties to the coefficients of the regression model, and when you would use each.

Example

"Lasso regression applies an L1 penalty, which can shrink some coefficients to zero, effectively performing variable selection. Ridge regression, on the other hand, applies an L2 penalty, which shrinks coefficients but does not eliminate them. I would use Lasso when I suspect that many features are irrelevant, while Ridge is useful when I believe all features contribute to the prediction."

2. What is PCA and how does it work?

PCA (Principal Component Analysis) is a fundamental technique for dimensionality reduction.

How to Answer

Explain the mathematical basis of PCA, including how it transforms data into a new coordinate system.

Example

"PCA works by identifying the directions (principal components) in which the data varies the most. It computes the eigenvectors and eigenvalues of the covariance matrix, allowing us to project the data onto a lower-dimensional space while retaining as much variance as possible."

3. Describe the concept of overfitting and how to prevent it.

Overfitting is a common issue in machine learning that candidates should be able to identify and address.

How to Answer

Discuss the signs of overfitting and various techniques to mitigate it.

Example

"Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern. To prevent it, I use techniques such as cross-validation, regularization, and pruning in decision trees, as well as ensuring that the model is not overly complex relative to the amount of training data."

4. Can you explain the concept of ensemble methods?

Ensemble methods combine multiple models to improve performance.

How to Answer

Describe the different types of ensemble methods and their advantages.

Example

"Ensemble methods like bagging and boosting combine the predictions of multiple models to improve accuracy. Bagging reduces variance by averaging predictions from multiple models, while boosting focuses on correcting errors made by previous models, leading to a strong final model."

5. What is the difference between supervised and unsupervised learning?

Understanding the distinction between these two types of learning is fundamental for a Data Scientist.

How to Answer

Define both terms and provide examples of each.

Example

"Supervised learning involves training a model on labeled data, where the outcome is known, such as predicting house prices. Unsupervised learning, on the other hand, deals with unlabeled data, where the goal is to find hidden patterns, like clustering customers based on purchasing behavior."

Statistics & Probability

1. What is the Central Limit Theorem?

The Central Limit Theorem is a key concept in statistics that candidates should be familiar with.

How to Answer

Explain the theorem and its implications for sampling distributions.

Example

"The Central Limit Theorem states that the distribution of the sample mean will approach a normal distribution as the sample size increases, regardless of the original distribution of the data. This is crucial for making inferences about population parameters based on sample statistics."

2. How do you handle missing data?

Handling missing data is a common challenge in data analysis.

How to Answer

Discuss various strategies for dealing with missing values.

Example

"I handle missing data by first assessing the extent and pattern of the missingness. Depending on the situation, I might use imputation techniques, such as filling in missing values with the mean or median, or I might choose to remove records with missing data if they are not significant."

3. Explain the difference between Type I and Type II errors.

Understanding these errors is essential for hypothesis testing.

How to Answer

Define both types of errors and their implications in statistical testing.

Example

"A Type I error occurs when we reject a true null hypothesis, while a Type II error happens when we fail to reject a false null hypothesis. Balancing these errors is crucial in hypothesis testing, as it affects the reliability of our conclusions."

4. What is a p-value?

P-values are a fundamental concept in statistical hypothesis testing.

How to Answer

Define p-values and their significance in hypothesis testing.

Example

"A p-value measures the strength of evidence against the null hypothesis. A low p-value indicates that the observed data is unlikely under the null hypothesis, leading us to reject it. However, it’s important to consider the context and not rely solely on p-values for decision-making."

5. Describe the Wilcoxon test and when to use it.

The Wilcoxon test is a non-parametric statistical test.

How to Answer

Explain the test and its applications.

Example

"The Wilcoxon test is used to compare two paired samples to assess whether their population mean ranks differ. It’s particularly useful when the data does not meet the assumptions of normality required for a t-test."

Coding and Data Manipulation

1. How would you implement a binary search algorithm?

Coding proficiency is essential for a Data Scientist.

How to Answer

Describe the algorithm and its efficiency.

Example

"A binary search algorithm works by repeatedly dividing a sorted array in half to locate a target value. It has a time complexity of O(log n), making it efficient for large datasets. I would implement it recursively or iteratively, depending on the requirements."

2. Explain how you would handle large datasets in Python.

Handling large datasets is a common task for Data Scientists.

How to Answer

Discuss techniques and libraries that facilitate working with large data.

Example

"I would use libraries like Pandas for data manipulation and Dask for parallel computing to handle large datasets that don’t fit into memory. Additionally, I would consider using SQL databases for efficient querying and data retrieval."

3. What is the purpose of using Jupyter Notebooks?

Jupyter Notebooks are widely used in data science for various reasons.

How to Answer

Explain the advantages of using Jupyter Notebooks for data analysis.

Example

"Jupyter Notebooks allow for interactive data analysis, combining code execution, visualization, and documentation in one place. This makes it easier to share insights and collaborate with others, as well as to present findings in a clear and organized manner."

4. How do you optimize a SQL query?

Optimizing SQL queries is crucial for performance in data retrieval.

How to Answer

Discuss strategies for improving SQL query performance.

Example

"I optimize SQL queries by using indexing, avoiding SELECT *, and ensuring that joins are performed on indexed columns. Additionally, I analyze query execution plans to identify bottlenecks and make adjustments accordingly."

5. Describe a time you used data visualization to communicate findings.

Data visualization is key in conveying insights effectively.

How to Answer

Provide an example of how you used visualization to present data.

Example

"I created a series of visualizations using Matplotlib and Seaborn to illustrate customer behavior trends over time. By presenting the data in a clear and engaging manner, I was able to effectively communicate the insights to stakeholders, leading to data-driven decisions."


Revolut Data Scientist Interview Tips

Here are some tips to help you excel in your interview.

Understand the Interview Structure

The interview process at Revolut typically consists of multiple stages, including an initial screening call, a take-home challenge, technical interviews, and a final interview focused on cultural fit. Familiarize yourself with this structure and prepare accordingly. Knowing what to expect at each stage will help you manage your time and energy effectively.

Prepare for Technical Challenges

Expect to face a variety of technical questions that assess your knowledge in data science, machine learning, and programming. Brush up on key concepts such as linear regression, regularization techniques, and algorithms like random forests and gradient boosting. Additionally, practice coding challenges on platforms like LeetCode or HackerRank, as many candidates reported live coding exercises during their interviews.

Focus on Real-World Applications

Revolut values practical knowledge and the ability to apply theoretical concepts to real-world problems. When preparing for your take-home challenge, ensure that your solutions are not only correct but also well-documented and easy to understand. Be ready to discuss your thought process and the rationale behind your decisions during the technical interviews.

Communicate Clearly and Confidently

During interviews, especially when presenting your take-home challenge, clarity is key. Practice explaining your findings and methodologies in a straightforward manner. Be prepared to answer follow-up questions and defend your approach. This will demonstrate your ability to communicate complex ideas effectively, which is crucial in a collaborative environment.

Emphasize Cultural Fit

Revolut places a strong emphasis on cultural fit, so be prepared to discuss your values and how they align with the company's mission. Research Revolut's culture and be ready to articulate why you want to work there and how you can contribute to the team. Show enthusiasm for the company's goals and a willingness to adapt to its fast-paced environment.

Seek Feedback and Learn

If you receive feedback during the interview process, take it seriously and use it as an opportunity to improve. Many candidates appreciated the constructive feedback they received, which helped them understand areas for growth. If you don’t receive feedback, don’t hesitate to ask for it, as it can provide valuable insights for future interviews.

Stay Positive and Resilient

The interview process can be lengthy and sometimes frustrating. Maintain a positive attitude throughout, even if you encounter setbacks. Many candidates reported mixed experiences, but those who remained resilient and focused on their goals ultimately found success. Remember that each interview is a learning opportunity, regardless of the outcome.

By following these tips and preparing thoroughly, you can enhance your chances of success in the interview process at Revolut. Good luck!

Revolut Data Scientist Interview Process

The interview process for a Data Scientist role at Revolut is structured and consists of several key stages designed to assess both technical skills and cultural fit.

1. Initial Screening

The process begins with an initial phone interview with a recruiter or HR representative. This conversation typically lasts around 30 minutes and focuses on your background, motivation for applying to Revolut, and preliminary technical questions related to data science, Python, and SQL. This stage is crucial for establishing a rapport and understanding your fit within the company culture.

2. Take-Home Assignment

Following the initial screening, candidates are usually given a take-home assignment that can take up to a week to complete. This assignment often involves practical data science tasks, such as data preparation, analysis, and building machine learning models. The goal is to evaluate your problem-solving skills and ability to work with real-world data. Candidates are expected to present their findings in a clear and structured manner.

3. Technical Interviews

After successfully completing the take-home assignment, candidates typically participate in one or more technical interviews. These interviews may involve discussions with senior data scientists where you present your assignment results, answer technical questions related to data science and machine learning, and engage in live coding exercises. The focus here is on your technical expertise, analytical thinking, and ability to communicate complex ideas effectively.

4. Final Interview

The final stage of the interview process usually involves a conversation with team members or project managers. This interview assesses cultural fit and alignment between your skills and the team's goals. Expect questions that explore your previous experiences, how you approach teamwork, and your understanding of the business context in which data science operates.

Throughout the process, candidates can expect to receive feedback at each stage, which is a hallmark of Revolut's approach to recruitment.

As you prepare for your interview, it's essential to be ready for the specific questions that may arise during these stages.

What Revolut Looks for in a Data Scientist

1. Tell me about a time when you exceeded expectations during a project. What did you do, and how did you accomplish it?

A Revolut interviewer may ask this question to understand how you handle challenges and achieve success in projects.

How to Answer

Describe a project where you set ambitious goals, took initiative, and used your skills effectively to exceed expectations. Highlight your approach to problem-solving, collaboration with team members, and any innovative strategies you employed.

Example

“In a recent project, I was tasked with optimizing our fraud detection system. I took the initiative to analyze historical transaction data, identifying patterns and anomalies using advanced machine learning algorithms. By collaborating closely with the engineering team, we implemented a real-time monitoring system that significantly reduced false positives by 30%, exceeding the initial target. My proactive approach and data-driven insights were key to achieving this success.”

2. What are your three greatest strengths and weaknesses?

The interviewer at Revolut may ask this to gauge your understanding of your capabilities and areas for improvement as a data scientist.

How to Answer

Identify three strengths relevant to the position, highlighting skills like data analysis, problem-solving, and communication. For weaknesses, mention areas you’re actively working to improve, such as learning new programming languages or enhancing statistical modeling techniques.

Example

“In terms of strengths, I excel in data analysis, particularly in extracting actionable insights from complex datasets. My problem-solving skills allow me to tackle challenges efficiently, and my strong communication skills enable me to convey technical findings to non-technical stakeholders effectively. As for weaknesses, I’m currently focusing on improving my proficiency in advanced statistical modeling techniques to enhance predictive analytics.”

3. Tell me about a time when you had to deal with a large and complex dataset. How did you approach the problem, and what tools or techniques did you use to analyze the data?

This question assesses your experience handling complex data and your proficiency with analytical tools and techniques.

How to Answer

Describe a specific project where you dealt with a large and complex dataset. Discuss your approach to data cleaning, preprocessing, and analysis, as well as the tools and techniques you used, such as Python libraries like Pandas, NumPy, and scikit-learn, or SQL queries for data manipulation.

Example

“In a previous role, I worked on a project that involved analyzing customer transaction data from multiple sources to identify spending patterns and detect potential fraud. I first performed data cleaning and preprocessing to handle missing values and outliers using Python libraries like Pandas and NumPy. Then, I applied advanced statistical techniques and machine learning algorithms, such as clustering and anomaly detection, to uncover actionable insights from the dataset. Tools like Jupyter Notebooks and SQL were instrumental in managing and analyzing the large volume of data efficiently.”

4. Describe a situation where you faced a significant challenge or setback in a data analysis project. How did you overcome it, and what did you learn from the experience?

Your resilience and problem-solving skills when facing obstacles in a data analysis project will be assessed through this question.

How to Answer

Describe a challenge or setback you encountered in a data analysis project, how you addressed it, and the lessons learned. Emphasize your ability to adapt, collaborate with team members, and implement alternative solutions to overcome obstacles.

Example

“In a recent data analysis project, we faced challenges with data quality issues, resulting in inconsistencies and inaccuracies in the dataset. To address this setback, I worked closely with the data engineering team to identify the root causes and implement robust data validation processes. Additionally, I recalibrated our analytical models and performed sensitivity analyses to mitigate the impact of unreliable data. This experience taught me the importance of proactive data quality management and effective collaboration across teams to ensure accurate and reliable insights.”

5. Tell me about a time when you had to prioritize multiple competing tasks or projects. How did you decide what to focus on first, and how did you manage your workload to meet deadlines?

Revolut may ask this to check your organizational skills and time management abilities as a data scientist who may be required to work on multiple projects simultaneously.

How to Answer

Describe a situation where you had to juggle multiple tasks or projects simultaneously. Discuss your approach to prioritization, including the factors you considered and the strategies you employed to meet deadlines. Emphasize your ability to delegate tasks, set realistic timelines, and adapt to changing priorities.

Example

“My previous role required frequently dealing with competing priorities and tight deadlines. When faced with multiple tasks, I first evaluated the urgency and importance of each task, considering factors like project deadlines and impact on business objectives. I then organized tasks into a prioritized list, focusing on high-impact projects while ensuring essential tasks were addressed promptly. To manage my workload effectively, I used project management tools like Trello to track progress and allocate time efficiently. By staying organized and adaptable, I consistently met deadlines and delivered quality results across multiple projects.”

6. You are given a dataframe with a single column, 'var'. Calculate the t-value for the mean of 'var' against a null hypothesis that μ = μ0.

Note: You do not have to calculate the p-value of the test or run the test.

Example:

Input:

mu0 = 1
print(df)
...
   var
0  -34
1   40
2  -89
3    5
4  -26

Output:

t_score(mu0, df) ->

var   -1.015614
dtype: float64

With this question, your data science interviewer at Revolut will assess your ability to calculate the t-value for a mean against a null hypothesis using a Pandas DataFrame.

How to Answer

To calculate the t-value for the mean of a single column against a null hypothesis, you can use the formula: t = (mean - μ0) / (std / sqrt(n)), where mean is the sample mean, μ0 is the hypothesized population mean, std is the standard deviation of the sample, and n is the sample size.

Example

import pandas as pd
import numpy as np

def t_score(mu0, df):
    # One-sample t statistic: t = (mean - mu0) / (std / sqrt(n))
    mean = df.mean()   # column-wise sample mean (returns a Series)
    std = df.std()     # column-wise sample standard deviation (ddof=1)
    n = len(df)
    return (mean - mu0) / (std / np.sqrt(n))

mu0 = 1
df = pd.DataFrame({'var': [-34, 40, -89, 5, -26]})
print(t_score(mu0, df))
# var   -1.015614
# dtype: float64

7. Let’s say you are designing a marketplace for your website. Selling firearms is prohibited by your website’s Terms of Service Agreement (not to mention the laws of your country). To this end, you want to create a system that can automatically detect if a listing on the marketplace is selling a gun. How would you do this?

Revolut may ask this to gauge your ability to implement solutions for compliance and regulatory purposes.

How to Answer

Describe a systematic approach to build a model that detects firearm sales listings in a marketplace. This may involve using natural language processing (NLP), image recognition, and machine learning algorithms to classify listings as firearm sales or non-firearm sales.

Example

“To detect firearm sales listings, I would first preprocess listing descriptions using NLP techniques to extract relevant keywords related to firearms. Additionally, I would use image recognition algorithms to scan listing images for recognizable firearm objects. Finally, I would train a machine learning model on labeled data to classify listings as either firearm sales or non-firearm sales based on textual and visual features.”

8. Imagine you are asked to build a machine learning model to decide on new loan approvals for a financial firm.

You ask the data department in the company for a subset of data to get started working on the problem. The data includes different features about applicants such as age, occupation, zip code, height, number of children, favorite color, etc. You decide to build multiple machine-learning models to test out different ideas before settling on the best one. How would you explain the bias-variance tradeoff with regards to building and choosing a model to use?

This question examines your understanding of the bias-variance tradeoff in the context of machine learning model selection.

How to Answer

Explain the bias-variance tradeoff in the context of machine learning model selection. Discuss how models with high bias may oversimplify the data, leading to underfitting, while models with high variance may capture noise in the data, leading to overfitting. Emphasize the need to find the right balance between bias and variance to optimize model performance.

Example

“The bias-variance tradeoff refers to the delicate balance between the simplicity and flexibility of a machine learning model. Models with high bias, such as linear regression, may oversimplify the underlying relationships in the data, resulting in underfitting and poor performance on both training and test datasets. On the other hand, models with high variance, such as decision trees with no constraints, may capture noise in the training data, leading to overfitting and poor generalization to unseen data. To find the optimal model, it’s essential to strike a balance between bias and variance by selecting a model complexity that minimizes both training and test errors.”

9. You are given a singly linked list. Write a function to find and return the last node of the list. If the list is empty, return null.

Your interviewer at Revolut may ask this to assess your proficiency in data structures and algorithms. This question evaluates your ability to manipulate linked lists in programming.

How to Answer

Implement a function to traverse the singly linked list until it reaches the last node and returns it. If the list is empty, return null.

Example

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def last_node(head):
    if not head:
        return None
    while head.next:
        head = head.next
    return head

# Example usage:
# head = ListNode(1)
# head.next = ListNode(2)
# head.next.next = ListNode(3)
# print(last_node(head).val)  # Output: 3

10. You are given an array of numerical values, a number of bootstrap samples, and a confidence level. Write a function that performs bootstrap sampling on the given array and calculates the confidence interval at the given level.

Note: The function should return a tuple containing the minimum and maximum values of the confidence interval rounded to the tenths place.

Example

Input:

values = [1, 2, 3, 4, 5]

Output:

bootstrap_conf_interval(values, 1000, 0.95) -> (1.2, 4.8)

In this case, the function returns a tuple indicating that, based on our bootstrap samples, we are 95% confident that the population parameter lies between 1.2 and 4.8.

Note: Results may vary due to the randomness of bootstrap sampling.

Your ability to implement bootstrapping and calculate confidence intervals will be assessed through this question. You may be asked this to evaluate your statistical reasoning and coding skills.

How to Answer

Implement a function that performs bootstrap sampling on the given array and calculates the confidence interval at the given confidence level. The interval can be obtained by taking percentiles of the bootstrap sample-mean distribution.

Example

import numpy as np

def bootstrap_conf_interval(values, num_samples, confidence_level):
    bootstraps = np.random.choice(values, size=(num_samples, len(values)), replace=True)
    sample_means = np.mean(bootstraps, axis=1)
    lower_percentile = (1 - confidence_level) / 2
    upper_percentile = 1 - lower_percentile
    lower_bound = np.percentile(sample_means, lower_percentile * 100)
    upper_bound = np.percentile(sample_means, upper_percentile * 100)
    return round(lower_bound, 1), round(upper_bound, 1)

# Example usage:
# values = [1, 2, 3, 4, 5]
# print(bootstrap_conf_interval(values, 1000, 0.95))  # Output: (1.2, 4.8)

11. Suppose we have a binary classification model that determines whether an applicant is qualified for a loan.

Because we are a financial company, we must provide each rejected applicant with a reason. Given that we don’t have access to the feature weights, how would we give each rejected applicant a reason?

This question examines your problem-solving skills in providing reasons for rejection in a binary classification model without access to feature weights.

How to Answer

Discuss a systematic approach to provide reasons for rejection to unqualified applicants without access to feature weights. The solution may involve analyzing misclassified instances, identifying common patterns or features among rejected applicants, and developing rules or decision trees based on these patterns.

Example

“To provide reasons for rejection without access to feature weights, I would first analyze misclassified instances to identify common patterns among rejected applicants. For example, if a significant portion of rejected applicants have low credit scores and high debt-to-income ratios, these factors could be potential reasons for rejection. I would then develop rules or decision trees based on these patterns to explain to applicants why their application was rejected.”
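
One way to turn that idea into code is a surrogate model: fit a shallow, interpretable tree to the black-box model's decisions and read off its rules. This is a sketch with synthetic stand-in data and made-up feature names:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                   # e.g. credit score, debt-to-income
decisions = (X[:, 0] - X[:, 1] > 0).astype(int)  # stand-in for the opaque model's output

# A shallow surrogate tree mimics the decisions and yields human-readable rules
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, decisions)
print(export_text(surrogate, feature_names=["credit_score", "debt_to_income"]))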

12. Write a function knn that returns the nearest data point from a list of data points to a given query point. Use Euclidean distance as the similarity measure. For the purpose of this task, consider the scenario where k=1, meaning you only need to find the single closest data point.

Note: Using external libraries such as NumPy and scikit-learn is not allowed.

Example:

Input:

data_points = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]

querying_point = [1, 9, 7]

Output:

knn(data_points, querying_point) -> [4, 5, 6]

The interviewer at Revolut may ask this to evaluate your proficiency in algorithmic coding and computational thinking as a data scientist.

How to Answer

Implement a function that calculates the Euclidean distance between the querying point and each data point in the list of data points. Then, return the data point that is nearest to the querying point based on the calculated distances.

Example

from typing import List

def knn(data_points: List[List[float]], query_point: List[float]) -> List[float]:
    min_distance = float('inf')
    nearest_point = None
    for point in data_points:
        distance = sum((x - y) ** 2 for x, y in zip(point, query_point)) ** 0.5
        if distance < min_distance:
            min_distance = distance
            nearest_point = point
    return nearest_point

# Example usage:
data_points = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
querying_point = [1, 9, 7]
print(knn(data_points, querying_point))  # Output: [4, 5, 6]

13. Write a query to identify the manager with the largest team. Assume there is only one.

Example:

Input:

employees table

Column Type
id INTEGER
name VARCHAR
manager_id INTEGER

managers table

Column Type
id INTEGER
name VARCHAR
team VARCHAR

Output:

Column Type
manager VARCHAR
team_size INTEGER

As a data scientist candidate, you may be asked this question to assess your ability to extract relevant information from a database and solve complex SQL queries.

How to Answer

Write an SQL query to join the employees and managers tables, group by manager, and calculate the size of each manager’s team. Then, select the manager with the largest team.

Example

SELECT managers.name AS manager,
       COUNT(employees.id) AS team_size
FROM employees
JOIN managers ON employees.manager_id = managers.id
GROUP BY managers.name
ORDER BY team_size DESC
LIMIT 1;

14. Can you explain the difference between supervised and unsupervised learning? Provide examples of each.

This question assesses your understanding and ability to provide examples of the fundamental differences between supervised and unsupervised learning.

How to Answer

Explain what supervised learning and unsupervised learning are, and give examples of each.

Example

“Supervised learning involves training a model on labeled data, where the model learns to make predictions based on input-output pairs. An example of supervised learning is training a spam email classifier using labeled emails (spam or not spam).

In contrast, unsupervised learning involves training a model on unlabeled data, where the model learns to find patterns or structures in the data without explicit guidance. An example of unsupervised learning is clustering customer data to identify distinct customer segments based on their purchasing behavior.”

15. How would you handle imbalanced classes in a classification problem?

Revolut may ask this to assess your ability to address challenges commonly encountered in real-world data analysis scenarios.

How to Answer

Explain the techniques involved in handling imbalanced classes. Emphasize the importance of understanding the problem context and selecting the most suitable approach based on the specific dataset and business requirements.

Example

“To handle imbalanced classes, I would first explore resampling techniques such as oversampling the minority class using methods like Synthetic Minority Over-sampling Technique (SMOTE) or undersampling the majority class. Additionally, I would consider using evaluation metrics like precision-recall instead of accuracy to assess model performance more effectively in imbalanced datasets. Lastly, I would experiment with algorithms like Random Forest or Gradient Boosting Machines, which can handle class imbalance by adjusting class weights or incorporating sampling strategies, to improve model performance on imbalanced datasets.”
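
A compact sketch of the class-weighting approach (synthetic 95/5 data; SMOTE from the imbalanced-learn package would be the resampling alternative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic dataset with a 95/5 class imbalance
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' penalizes mistakes on the minority class more heavily
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))  # read precision/recall, not accuracy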

16. Explain the concept of regularization in machine learning. Why is it important, and what are some common regularization techniques?

Revolut may ask this to evaluate your knowledge of techniques for preventing overfitting and improving model generalization, which are necessary skills for a data scientist.

How to Answer

Explain what regularization is and how it’s used to prevent overfitting and improve the generalization of machine learning models by adding a penalty term to the loss function.

Example

“Regularization in machine learning is a technique used to prevent overfitting by adding a penalty term to the loss function. This penalty term discourages overly complex models with large coefficients, leading to improved generalization performance. Common regularization techniques include L1 regularization (Lasso), which adds the absolute values of coefficients to the loss function; L2 regularization (Ridge), which adds the squared values of coefficients to the loss function; and elastic net regularization, which combines both L1 and L2 penalties.”

17. Describe the differences between a decision tree and a random forest algorithm. When would you choose one over the other?

This question assesses your knowledge of decision trees and random forests, including their differences and when to choose one over the other.

How to Answer

Describe the differences between decision trees and random forests. Explain how decision trees are used when interpretability is important or the dataset is small, and describe how random forests are used when robustness and performance are priorities.

Example

“Decision trees are simple, interpretable models that recursively split the data based on feature thresholds to make predictions. However, they are prone to overfitting, especially with complex datasets. On the other hand, random forests are ensembles of decision trees where each tree is trained on a random subset of the data and features. Random forests reduce overfitting by averaging predictions from multiple trees, leading to better generalization performance. I would choose decision trees when interpretability is crucial or when working with a small dataset. In contrast, I would choose random forests when robustness and performance are priorities, especially with large and complex datasets.”

18. What are some key metrics you would track to evaluate user engagement with a mobile banking app like Revolut?

Your ability to identify key metrics for evaluating user engagement with a mobile banking app like Revolut will be evaluated through this question.

How to Answer

Identify and discuss key metrics that may help evaluate user engagement with Revolut. Include metrics such as active users, user retention rate, average session duration, and frequency of app usage.

Example

“To evaluate user engagement with a mobile banking app like Revolut, I would track key metrics such as active users, user retention rate, average session duration, frequency of app usage (daily, weekly, monthly), number of transactions per user, user satisfaction ratings through surveys or app store reviews, and conversion rates for specific features like account opening or card activation. These metrics provide insights into how users interact with the app and indicate the overall engagement level.”

19. Can you explain the concept of cohort analysis and how it can be useful in understanding user behavior?

The interviewer at Revolut may ask this to evaluate your proficiency as a data scientist in user behavior analytics and your ability to derive insights from data.

How to Answer

Explain what cohort analysis is and how it can be based on characteristics like signup date, acquisition channel, or demographic attributes. Mention that it helps identify trends, patterns, and differences in user behavior.

Example

“Cohort analysis is a powerful method in understanding user behavior by grouping users based on common characteristics or actions and analyzing their behavior over time. For example, we can create cohorts based on the signup date, acquisition channel, or demographic attributes of users. By tracking metrics like retention rate, engagement, and conversion rate for each cohort over time, we can identify trends, patterns, and differences in user behavior. Cohort analysis helps us understand how user behavior evolves and provides valuable insights for product improvement and targeted marketing strategies.”
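
A toy Pandas sketch of a signup-month retention cohort (the event data is invented):

import pandas as pd

# One row per user activity, tagged with the user's signup cohort
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "signup_month": ["2024-01", "2024-01", "2024-01", "2024-01",
                     "2024-02", "2024-02", "2024-02"],
    "activity_month": ["2024-01", "2024-02", "2024-01", "2024-03",
                       "2024-02", "2024-03", "2024-04"],
})

# For each signup cohort, count distinct active users in each month
cohort = (events.groupby(["signup_month", "activity_month"])["user_id"]
          .nunique()
          .unstack(fill_value=0))
print(cohort)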

20. Explain the central limit theorem and its importance in statistics.

This question examines your knowledge of the central limit theorem and how it works in statistics.

How to Answer

Explain the central limit theorem and how it enables us to make inferences about population parameters based on sample statistics, even when the population distribution is unknown or non-normal.

Example

“The central limit theorem is a fundamental concept in statistics that states that the distribution of sample means from a population approaches a normal distribution as the sample size increases, regardless of the population distribution. This theorem is important because it enables us to make inferences about population parameters, such as the mean or variance, based on sample statistics, even when the population distribution is unknown or non-normal. For example, when estimating the population mean from a sample, we can use the normal distribution to calculate confidence intervals or conduct hypothesis tests, assuming the sample size is sufficiently large.”

21. We’re given a table of bank transactions with three columns, user_id, a deposit or withdrawal value, and created_at time for each transaction. Write a query to get the total three-day rolling average for deposits by day.

This question is likely asked in a Revolut Data Scientist interview to assess your ability to work with time series data and financial metrics, which are crucial for a fintech company like Revolut.

How to Answer

When answering, explain that the key is to first aggregate the transactions by day to isolate daily deposits. Then, build a rolling three-day window, either with a window-function frame (where the database supports one) or with a self-join. This allows you to compute the rolling average over the relevant rows within each window.

Example

“To tackle this problem, I would first aggregate the transactions by day to focus solely on deposits. After that, I would use a self-join to simulate a rolling three-day window, which also works on databases without window-function frames. This approach allows me to calculate the rolling average by averaging the deposits over the last three days for each date, providing a clear view of the trend in deposit activity.”
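
A sketch of the self-join approach in MySQL-style SQL (assuming a bank_transactions table where rows with positive transaction_value are deposits):

WITH daily_deposits AS (
    SELECT DATE(created_at) AS dt,
           SUM(transaction_value) AS total_deposits
    FROM bank_transactions
    WHERE transaction_value > 0          -- keep deposits only
    GROUP BY DATE(created_at)
)
SELECT d1.dt,
       AVG(d2.total_deposits) AS rolling_three_day_avg
FROM daily_deposits d1
JOIN daily_deposits d2
  ON d2.dt BETWEEN d1.dt - INTERVAL 2 DAY AND d1.dt  -- three-day window
GROUP BY d1.dt
ORDER BY d1.dt;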

22. Given two strings, write a function to return True if the strings are anagrams of each other and False if they are not.

This question might be asked in a Revolut Data Scientist interview to assess your problem-solving skills and ability to write efficient code. Anagram detection is a common string manipulation problem that tests your understanding of algorithms and data structures, particularly around sorting and hash maps.

How to Answer

If the two strings are not the same length, or they are the identical word, they are not a valid anagram. Otherwise, convert the two strings into lists of characters and sort them: two anagrams become equal when sorted, since an anagram is simply a rearrangement of the same letters.

Example

“I would start by checking if the two strings are the same length and if they are identical, as these conditions would immediately rule out them being anagrams. Then, I could convert both strings into lists of characters and sort them. If the sorted lists are identical, it would mean the strings are anagrams since sorting arranges the characters in the same order.”
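
A short Python version of that approach:

def is_anagram(s1: str, s2: str) -> bool:
    # Different lengths (or the identical word, per the prompt) rule it out
    if len(s1) != len(s2) or s1 == s2:
        return False
    return sorted(s1) == sorted(s2)  # anagrams are equal once sorted

print(is_anagram("listen", "silent"))  # True
print(is_anagram("hello", "world"))    # False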

23. Given a table exam_scores containing the data about all of the exams that students took, form a new table to track the scores for each student.

This task demonstrates the use of conditional aggregation in SQL to transform exam data into a pivot table. The objective is to create a single row for each student, showing their scores for specific exams.

How to Answer

To track scores for all four exams, use conditional logic (IF or CASE WHEN) to filter scores by exam_id and aggregate them with SUM() to ensure one row per student. Group by student_id to consolidate results, creating columns for each exam’s score within the same row.

Example

“To solve this, I’d first create separate columns for each exam by filtering scores using IF or CASE WHEN. Then, I’d use SUM() to aggregate these scores, ensuring each student’s exam scores appear in one row. Finally, I’d group by student_id to produce the required structure.”
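
A sketch of the conditional-aggregation query (assuming exam_scores has student_id, exam_id, and score columns, with exam ids 1 through 4):

SELECT student_id,
       SUM(CASE WHEN exam_id = 1 THEN score END) AS exam_1,
       SUM(CASE WHEN exam_id = 2 THEN score END) AS exam_2,
       SUM(CASE WHEN exam_id = 3 THEN score END) AS exam_3,
       SUM(CASE WHEN exam_id = 4 THEN score END) AS exam_4
FROM exam_scores
GROUP BY student_id;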

24. A product manager decides to add threading to comments on group posts. The number of comments per user increased by 10%, but posts decreased by 2%. Why would that be?

This question examines your ability to analyze user behavior changes due to feature updates and suggests metrics to validate hypotheses. It evaluates your skills in identifying user interaction patterns, hypothesizing behavioral shifts, and proposing measurable outcomes.

How to Answer

Explain how threading organizes discussions, encouraging more comments within posts while reducing new posts. Highlight targeted notifications and reduced duplication as key drivers of this behavior. Suggest validating these hypotheses through metrics like average comments per post and posts per group member.

Example

“Threading organizes discussions and focuses notifications, increasing comments within posts but reducing new posts. It also prevents duplicate posts as users find answers in threads. To validate, I’d compare metrics like average comments per post and posts per member in a before-and-after analysis or an island test.”

How to Prepare for a Data Scientist Role at Revolut

Technical, behavioral, and analytical skills are critical in proving yourself as an efficient data scientist to Revolut. Here is a rough guideline on how to prepare for the role:

Master Data Science Fundamentals

Understand and learn to apply the core concepts of data science, such as algorithms, statistical modeling, data manipulation, and data visualization. Also, dive deeper into popular Python libraries and frameworks, such as NumPy and Pandas. An extensive understanding of statistics and A/B testing will also help you succeed in the data science interview at Revolut.

Technical Skills in Big Data Technologies

Acquire knowledge of big data technologies such as Apache Flink, Spark, and Hadoop to solidify your claim to the data scientist role at Revolut. Also, consider learning about financial and product metrics, which are often used in real-world data science projects involving marketing and risk management. Additionally, make sure you are familiar with distributed computing frameworks and batch-processing modes.

Advanced Machine Learning Techniques

Modeling and machine learning have become integral parts of the data science domain and are used for fraud detection, risk assessment, and personalization. Revolut especially focuses on deep learning, natural language processing (NLP), machine learning system design, and reinforcement learning. Put in the hours of learning and practice on these topics to make a strong case for the data science role at Revolut.

Collaboration and Communication Skills

It’s not enough to know concepts and answers; you need to convey your thought process to the Revolut interviewers. To that end, practice plenty of data science behavioral questions and participate in our P2P mock interviews to refine your collaboration and communication skills. Moreover, prepare the data science case study questions thoroughly to avoid being caught off guard during the interview rounds.

Prepare for Technical Interviews

During the technical rounds, you’ll be asked to solve a take-home challenge and multiple Python and SQL interview questions. Prepare for these challenges in advance to avoid fumbling in front of the hiring manager.

For more details, follow our extensive data science interview guide.

FAQs

How much do data scientists at Revolut make in a year?

Average Base Salary: $112,792
Average Total Compensation: $130,738

Base Salary (9 data points): Min $80K, Median $111K, Mean $113K, Max $146K
Total Compensation (9 data points): Min $40K, Median $128K, Mean $131K, Max $201K

View the full Data Scientist at Revolut salary guide

The salary of data scientists at Revolut can vary based on factors such as experience, location, and specific job responsibilities. On average, you can expect a base salary of around $113K and total compensation of around $131K as a data scientist at Revolut. However, as per our data scientist salary guide, senior employees command a more robust package.

Where can I read about others’ interview experiences for the Revolut Data Scientist role?

You’re welcome to explore our Slack community to read about other people’s interview experiences for the Revolut Data Scientist role. And after your interview, feel free to share your experience. We have real-time discussions about job interviews and share informative tips to help our candidates improve.

Does Interview Query have job postings for the Revolut Data Scientist role?

Yes. We have up-to-date info on job postings for the Revolut Data Scientist role. Check our Job Board to gain insight into the available positions and leave your application with them.

The Bottom Line

As a data scientist candidate at Revolut, you must have a deep understanding of machine learning models, Python, algorithms, and product metrics. We hope this guide has helped you navigate the complexities of the Revolut interview process and the technical and project interview questions. If you have more queries, follow our main Revolut interview guide and explore other positions such as data analyst and software engineer.