Revolut Data Scientist Interview Questions + Guide in 2024

Overview

Founded in 2015 by Nik Storonsky and Vladyslav Yatsenko, Revolut stands among the leading fintech providers, with more than 400 million transactions a month. It offers services that include cross-border transactions, stock trading, crypto trading, personal loans, and commodity trading.

As a data scientist looking to join Revolut, your responsibilities will likely include developing fraud detection algorithms, analyzing customer spending patterns, optimizing user experience, and enhancing financial services through collaboration with cross-functional teams.

If you’re curious about how to pass Revolut’s data scientist interview and secure a position, this article is for you.

What Is the Interview Process Like for the Data Scientist Role at Revolut?

Your experience throughout the interview process may vary depending on the data scientist role (risk, credit, computer vision, etc.) you’re applying for. However, as far as we can tell, Revolut follows a predefined pattern in interviews with data scientist candidates.

Application Submission and CV Screening

If you aren’t encouraged to apply by a Revolut recruiter, you can directly apply for your preferred role through the Revolut Career portal. The application process for data scientist candidates involves a questionnaire covering your contact details, relocation preferences, and CV submission. Questions about your experience developing algorithms, participation in similar projects, and programming skills may also follow.

Submit the application and wait for a recruiter to screen your CV. If you pass this phase, you’ll be invited to attend an initial screening call.

Initial Screening Call/Video Interview

Revolut prioritizes soft skills and job-related interview questions. During this stage of the data science interview, which usually lasts 15-20 minutes, interviewers assess your soft skills and knowledge of data science. They may ask behavioral and basic algorithm questions and discuss problems with machine learning models.

Revolut interviewers typically refrain from asking generic coding questions unrelated to your potential position. Your hiring manager may also request a video meeting during this round.

Take-Home Assignment

After passing the initial screening, you’ll be assigned a take-home task to help the technical managers assess your understanding of machine learning models, product metrics, and programming skills. The task usually involves a simulated work environment assignment that you would be asked to resolve as a data scientist. Depending on the task, you may or may not be provided datasets to find a solution.

Technical On-Site Interview

If you’ve been successful in the previous rounds, you’ll be asked to appear for an on-site interview at your nearest Revolut office. During this stage, you’ll meet your potential immediate seniors, colleagues, and the hiring manager. They may conduct multiple one-on-one interviews and organize a group exercise with other data science candidates to judge your collaborative and leadership skills.

This round often includes discussions of cross-validation techniques, neural network architectures, regularization and data visualization techniques, and big data technologies.

If Revolut hires you as a data scientist, you’ll be notified via email or phone after the on-site meeting. They may also host a partner interview or integration round to help you settle into the office environment.

What Questions Are Asked in a Revolut Data Scientist Interview?

Revolut data science interviewers typically discuss machine learning, programming skills, and statistics. They will also ask behavioral questions to draw out your understanding of Revolut’s visions, goals, and culture.

Our experienced members have found the following questions to be representative of a Revolut data science interview.

1. Tell me about a time when you exceeded expectations during a project. What did you do, and how did you accomplish it?

A Revolut interviewer may ask this question to understand how you handle challenges and achieve success in projects.

How to Answer

Describe a project where you set ambitious goals, took initiative, and used your skills effectively to exceed expectations. Highlight your approach to problem-solving, collaboration with team members, and any innovative strategies you employed.

Example

“In a recent project, I was tasked with optimizing our fraud detection system. I took the initiative to analyze historical transaction data, identifying patterns and anomalies using advanced machine learning algorithms. By collaborating closely with the engineering team, we implemented a real-time monitoring system that significantly reduced false positives by 30%, exceeding the initial target. My proactive approach and data-driven insights were key to achieving this success.”

2. What are your three greatest strengths and weaknesses?

The interviewer at Revolut may ask this to gauge your understanding of your capabilities and areas for improvement as a data scientist.

How to Answer

Identify three strengths relevant to the position, highlighting skills like data analysis, problem-solving, and communication. For weaknesses, mention areas you’re actively working to improve, such as learning new programming languages or enhancing statistical modeling techniques.

Example

“In terms of strengths, I excel in data analysis, particularly in extracting actionable insights from complex datasets. My problem-solving skills allow me to tackle challenges efficiently, and my strong communication skills enable me to convey technical findings to non-technical stakeholders effectively. As for weaknesses, I’m currently focusing on improving my proficiency in advanced statistical modeling techniques to enhance predictive analytics.”

3. Tell me about a time when you had to deal with a large and complex dataset. How did you approach the problem, and what tools or techniques did you use to analyze the data?

This question assesses your experience in handling complex data and your proficiency with analytical tools and techniques.

How to Answer

Describe a specific project where you dealt with a large and complex dataset. Discuss your approach to data cleaning, preprocessing, and analysis, as well as the tools and techniques you used, such as Python libraries like Pandas, NumPy, and scikit-learn, or SQL queries for data manipulation.

Example

“In a previous role, I worked on a project that involved analyzing customer transaction data from multiple sources to identify spending patterns and detect potential fraud. I first performed data cleaning and preprocessing to handle missing values and outliers using Python libraries like Pandas and NumPy. Then, I applied advanced statistical techniques and machine learning algorithms, such as clustering and anomaly detection, to uncover actionable insights from the dataset. Tools like Jupyter Notebooks and SQL were instrumental in managing and analyzing the large volume of data efficiently.”

4. Describe a situation where you faced a significant challenge or setback in a data analysis project. How did you overcome it, and what did you learn from the experience?

Your resilience and problem-solving skills when facing obstacles in a data analysis project will be assessed through this question.

How to Answer

Describe a challenge or setback you encountered in a data analysis project, how you addressed it, and the lessons learned. Emphasize your ability to adapt, collaborate with team members, and implement alternative solutions to overcome obstacles.

Example

“In a recent data analysis project, we faced challenges with data quality issues, resulting in inconsistencies and inaccuracies in the dataset. To address this setback, I worked closely with the data engineering team to identify the root causes and implement robust data validation processes. Additionally, I recalibrated our analytical models and performed sensitivity analyses to mitigate the impact of unreliable data. This experience taught me the importance of proactive data quality management and effective collaboration across teams to ensure accurate and reliable insights.”

5. Tell me about a time when you had to prioritize multiple competing tasks or projects. How did you decide what to focus on first, and how did you manage your workload to meet deadlines?

Revolut may ask this to check your organizational skills and time management abilities as a data scientist who may be required to work on multiple projects simultaneously.

How to Answer

Describe a situation where you had to juggle multiple tasks or projects simultaneously. Discuss your approach to prioritization, including the factors you considered and the strategies you used to meet deadlines. Emphasize your ability to delegate tasks, set realistic timelines, and adapt to changing priorities.

Example

“My previous role required frequently dealing with competing priorities and tight deadlines. When faced with multiple tasks, I first evaluated the urgency and importance of each task, considering factors like project deadlines and impact on business objectives. I then organized tasks into a prioritized list, focusing on high-impact projects while ensuring essential tasks were addressed promptly. To manage my workload effectively, I used project management tools like Trello to track progress and allocate time efficiently. By staying organized and adaptable, I consistently met deadlines and delivered quality results across multiple projects.”

6. You are given a dataframe with a single column, 'var'. Calculate the t-value for the mean of ‘var’ against a null hypothesis that μ=μ0.

Note: You do not have to calculate the p-value of the test or run the test.

Example:

Input:

mu0 = 1
print(df)
...
   var
0  -34
1   40
2  -89
3    5
4  -26

Output:

def t_score(mu0, df) ->
var   -1.015614
dtype: float64

With this question, your data science interviewer at Revolut will assess your ability to calculate the t-value for a mean against a null hypothesis using a Pandas DataFrame.

How to Answer

To calculate the t-value for the mean of a single column against a null hypothesis, you can use the formula: t = (mean - μ0) / (std / sqrt(n)), where mean is the sample mean, μ0 is the hypothesized population mean, std is the standard deviation of the sample, and n is the sample size.

Example

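import pandas as pd
import numpy as np

def t_score(mu0, df):
    # Sample mean and sample standard deviation (pandas uses ddof=1 by default)
    mean = df['var'].mean()
    std = df['var'].std()
    n = len(df)
    # t = (sample mean - hypothesized mean) / standard error of the mean
    t = (mean - mu0) / (std / np.sqrt(n))
    return t

mu0 = 1
df = pd.DataFrame({'var': [-34, 40, -89, 5, -26]})
print(t_score(mu0, df))  # approximately -1.015614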
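As a quick sanity check, the same statistic can be reproduced with only the standard library. This is a minimal sketch rather than the expected interview answer: `t_score_plain` is a hypothetical helper name, and it relies on `statistics.stdev` computing the sample (n - 1) standard deviation, which matches the default behaviour of `Series.std()` in Pandas.

```python
import math
import statistics

def t_score_plain(mu0, values):
    """One-sample t-statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_std = statistics.stdev(values)  # sample std, divides by n - 1
    return (sample_mean - mu0) / (sample_std / math.sqrt(n))

# Same data as the example in the question
t = t_score_plain(1, [-34, 40, -89, 5, -26])
print(round(t, 6))
```

If the two versions disagree, the usual culprit is a population (divide by n) standard deviation being used in place of the sample one.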

7. Let’s say you are designing a marketplace for your website. Selling firearms is prohibited by your website’s Terms of Service Agreement (not to mention the laws of your country). To this end, you want to create a system that can automatically detect if a listing on the marketplace is selling a gun. How would you do this?

Revolut may ask this to gauge your ability to implement solutions for compliance and regulatory purposes.

How to Answer

Describe a systematic approach to build a model that detects firearm sales listings in a marketplace. This may involve using natural language processing (NLP), image recognition, and machine learning algorithms to classify listings as firearm sales or non-firearm sales.

Example

“To detect firearm sales listings, I would first preprocess listing descriptions using NLP techniques to extract relevant keywords related to firearms. Additionally, I would use image recognition algorithms to scan listing images for recognizable firearm objects. Finally, I would train a machine learning model on labeled data to classify listings as either firearm sales or non-firearm sales based on textual and visual features.”
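The keyword-extraction step in the answer above can be sketched as a simple baseline. This is only an illustration under stated assumptions: the `FIREARM_TERMS` lexicon and the `flag_listing` helper are hypothetical, and a production system would learn its features from labeled listings rather than rely on a fixed word list.

```python
import re

# Hypothetical keyword lexicon, for illustration only
FIREARM_TERMS = {"rifle", "pistol", "handgun", "shotgun", "firearm", "ammo"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def flag_listing(text, terms=FIREARM_TERMS):
    """Return the sorted firearm-related terms found in a listing description."""
    return sorted(set(tokenize(text)) & terms)

print(flag_listing("Vintage hunting RIFLE, barely used, ammo included"))
print(flag_listing("Water pistol for kids"))  # a false positive for a keyword rule
```

The second call shows why keywords alone are not enough: a toy listing is flagged too, which is the kind of error a classifier trained on text and image features is meant to reduce.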

8. Imagine you are asked to build a machine learning model to decide on new loan approvals for a financial firm.

You ask the data department in the company for a subset of data to get started working on the problem. The data includes different features about applicants such as age, occupation, zip code, height, number of children, favorite color, etc. You decide to build multiple machine learning models to test out different ideas before settling on the best one. How would you explain the bias-variance tradeoff with regards to building and choosing a model to use?

This question examines your understanding of the bias-variance tradeoff in the context of machine learning model selection.

How to Answer

Explain the bias-variance tradeoff in the context of machine learning model selection. Discuss how models with high bias may oversimplify the data, leading to underfitting, while models with high variance may capture noise in the data, leading to overfitting. Emphasize the need to find the right balance between bias and variance to optimize model performance.

Example

“The bias-variance tradeoff refers to the delicate balance between the simplicity and flexibility of a machine learning model. Models with high bias, such as linear regression, may oversimplify the underlying relationships in the data, resulting in underfitting and poor performance on both training and test datasets. On the other hand, models with high variance, such as decision trees with no constraints, may capture noise in the training data, leading to overfitting: very low training error but poor generalization to unseen data. To find the optimal model, it’s essential to strike a balance between bias and variance by selecting the model complexity that minimizes error on held-out validation or test data, rather than training error alone.”
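The tradeoff can be made concrete with a small experiment. The sketch below is an assumption-laden illustration, not part of the original question: it uses synthetic sine-curve data, a constant-mean predictor as the high-bias model, and k-nearest-neighbour regression (a stand-in for the models named in the answer) as the flexible model.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: noisy samples of a sine curve on two interleaved grids
train = [(i / 4, math.sin(i / 4) + random.gauss(0, 0.3)) for i in range(25)]
test = [(i / 4 + 0.125, math.sin(i / 4 + 0.125) + random.gauss(0, 0.3)) for i in range(24)]

def predict_mean(x):
    """High-bias model: always predicts the training mean, whatever x is."""
    return sum(y for _, y in train) / len(train)

def predict_knn(x, k):
    """k-nearest-neighbour regression: average the k closest training targets."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

models = [
    ("constant mean (high bias)", predict_mean),
    ("1-NN (high variance)", lambda x: predict_knn(x, 1)),
    ("5-NN (in between)", lambda x: predict_knn(x, 5)),
]
for name, model in models:
    print(f"{name}: train MSE={mse(model, train):.3f}, test MSE={mse(model, test):.3f}")
```

The 1-NN model reaches zero training error because it memorizes every training point, which is why training error on its own is a poor criterion for choosing a model. The test numbers depend on the seed and the noise level, so treat them as illustrative.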

9. You are given a singly linked list. Write a function to find and return the last node of the list. If the list is empty, return null.

Your interviewer at Revolut may ask this to assess your proficiency in data structures and algorithms. This question evaluates your ability to manipulate linked lists in programming.

How to Answer

Implement a function to traverse the singly linked list until it reaches the last node and returns it. If the list is empty, return null.

Example

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def last_node(head):
    if not head:
        return None
    while head.next:
        head = head.next
    return head

# Example usage:
# head = ListNode(1)
# head.next = ListNode(2)
# head.next.next = ListNode(3)
# print(last_node(head).val)  # Output: 3

10. You are given an array of numerical values, bootstrap samples, and size for a confidence interval. Write a function that performs bootstrap sampling on the given array and calculates the confidence interval based on the given size.

Note: The function should return a tuple containing the minimum and maximum values of the confidence interval rounded to the tenths place.

Example

Input:

values = [1, 2, 3, 4, 5]

Output

bootstrap_conf_interval(values, 1000, 0.95) -> (1.2, 4.8)

In this case, the function returns a tuple indicating that, based on our bootstrap samples, we are 95% confident that the population parameter lies between 1.2 and 4.8.

Note: Results may vary due to the randomness of bootstrap sampling.

Your ability to implement bootstrapping and calculate confidence intervals will be assessed through this question. You may be asked this to evaluate your statistical reasoning and coding skills.

How to Answer

Implement a function to perform bootstrap sampling on the given array and calculate the confidence interval based on the given size. The confidence interval can be calculated by taking percentiles of the bootstrap sample distribution.

Example

import numpy as np

def bootstrap_conf_interval(values, num_samples, confidence_level):
    bootstraps = np.random.choice(values, size=(num_samples, len(values)), replace=True)
    sample_means = np.mean(bootstraps, axis=1)
    lower_percentile = (1 - confidence_level) / 2
    upper_percentile = 1 - lower_percentile
    lower_bound = np.percentile(sample_means, lower_percentile * 100)
    upper_bound = np.percentile(sample_means, upper_percentile * 100)
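    return round(float(lower_bound), 1), round(float(upper_bound), 1)

# Example usage:
# values = [1, 2, 3, 4, 5]
# print(bootstrap_conf_interval(values, 1000, 0.95))
# With the sample mean as the statistic, this typically prints about (1.8, 4.2),
# which is narrower than the illustrative (1.2, 4.8) in the prompt; exact bounds vary by run.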

11. Suppose we have a binary classification model that determines whether an applicant is qualified for a loan.

Because we are a financial company, we must provide each rejected applicant with a reason. Given that we don’t have access to the feature weights, how would we give each rejected applicant a reason?

This question examines your problem-solving skills in providing reasons for rejection in a binary classification model without access to feature weights.

How to Answer

Discuss a systematic approach to giving rejected applicants a reason without access to feature weights. The solution may involve analyzing rejected applications, identifying common patterns or features among them, and developing rules or decision trees based on these patterns.

Example

“To provide reasons for rejection without access to feature weights, I would first analyze the rejected applications to identify common patterns among them. For example, if a significant portion of rejected applicants have low credit scores and high debt-to-income ratios, these factors could be potential reasons for rejection. I would then develop rules or decision trees based on these patterns to explain to applicants why their application was rejected.”
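A complementary technique, not described in the answer above, is to probe the model as a black box: replace one feature at a time with a reference value (for instance, the median among approved applicants) and see which change raises the score the most. The sketch below assumes a hypothetical `toy_score` function standing in for the real model, and made-up feature names and reference values.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, float]

def rejection_reasons(
    applicant: Applicant,
    reference: Applicant,
    score_fn: Callable[[Applicant], float],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Rank features by how much the score improves when each one is set to its reference value."""
    base = score_fn(applicant)
    gains = []
    for feature, ref_value in reference.items():
        perturbed = dict(applicant)      # copy so the original is untouched
        perturbed[feature] = ref_value   # change exactly one feature
        gains.append((feature, score_fn(perturbed) - base))
    gains.sort(key=lambda item: item[1], reverse=True)
    return [(f, g) for f, g in gains[:top_n] if g > 0]

def toy_score(a: Applicant) -> float:
    """Stand-in for the black-box model (hypothetical formula)."""
    return a["credit_score"] / 850 - a["debt_to_income"]

applicant = {"credit_score": 520, "debt_to_income": 0.35, "years_employed": 4}
approved_profile = {"credit_score": 720, "debt_to_income": 0.30, "years_employed": 5}
reasons = rejection_reasons(applicant, approved_profile, toy_score)
print([feature for feature, _ in reasons])
```

Only the model's outputs are needed here, so the approach works even when the weights are hidden. It changes one feature at a time, so it can miss reasons that only appear when several features interact.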

12. Write a function knn that returns the nearest data point from a list of data points to a given query point. Use Euclidean distance as the similarity measure. For the purpose of this task, consider the scenario where k=1, meaning you only need to find the single closest data point.

Note: Using external libraries such as NumPy and scikit-learn is not allowed.

Example:

Input:

data_points = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]

querying_point = [1, 9, 7]

Output:

def knn(data_point: List[List[float]], query_point: List[float]) -> [4, 5, 6]

The interviewer at Revolut may ask this to evaluate your proficiency in algorithmic coding and computational thinking as a data scientist.

How to Answer

Implement a function that calculates the Euclidean distance between the querying point and each data point in the list of data points. Then, return the data point that is nearest to the querying point based on the calculated distances.

Example

from typing import List

def knn(data_points: List[List[float]], query_point: List[float]) -> List[float]:
    min_distance = float('inf')
    nearest_point = None
    for point in data_points:
        distance = sum((x - y) ** 2 for x, y in zip(point, query_point)) ** 0.5
        if distance < min_distance:
            min_distance = distance
            nearest_point = point
    return nearest_point

# Example usage:
data_points = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
querying_point = [1, 9, 7]
print(knn(data_points, querying_point))  # Output: [4, 5, 6]

13. Write a query to identify the manager with the largest team. Assume there is only one.

Example:

Input:

employees table

Column      Type
id          INTEGER
name        VARCHAR
manager_id  INTEGER

managers table

Column  Type
id      INTEGER
name    VARCHAR
team    VARCHAR

Output:

Column     Type
manager    VARCHAR
team_size  INTEGER

As a data scientist candidate, you may be asked this question to assess your ability to extract relevant information from a database and solve complex SQL queries.

How to Answer

Write an SQL query to join the employees and managers tables, group by manager, and calculate the size of each manager’s team. Then, select the manager with the largest team.

Example

SELECT managers.name AS manager,
       COUNT(employees.id) AS team_size
FROM employees
JOIN managers ON employees.manager_id = managers.id
GROUP BY managers.id, managers.name
ORDER BY team_size DESC
LIMIT 1;
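One way to sanity-check a query like this before the interview is to run it against a tiny in-memory SQLite database. The rows below are made up for illustration, and the query groups by manager id as well as name so that two managers who share a name would not be merged into one team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE managers (id INTEGER, name VARCHAR, team VARCHAR);
CREATE TABLE employees (id INTEGER, name VARCHAR, manager_id INTEGER);
-- Hypothetical sample rows
INSERT INTO managers VALUES (1, 'Alice', 'Risk'), (2, 'Bob', 'Credit');
INSERT INTO employees VALUES
    (1, 'Eve', 1), (2, 'Dan', 1), (3, 'Kim', 1), (4, 'Raj', 2);
""")

query = """
SELECT managers.name AS manager,
       COUNT(employees.id) AS team_size
FROM employees
JOIN managers ON employees.manager_id = managers.id
GROUP BY managers.id, managers.name
ORDER BY team_size DESC
LIMIT 1;
"""
result = conn.execute(query).fetchone()
print(result)  # ('Alice', 3)
conn.close()
```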

14. Can you explain the difference between supervised and unsupervised learning? Provide examples of each.

This question assesses your understanding and ability to provide examples of the fundamental differences between supervised and unsupervised learning.

How to Answer

Explain what supervised learning and unsupervised learning are and give examples of each.

Example

“Supervised learning involves training a model on labeled data, where the model learns to make predictions based on input-output pairs. An example of supervised learning is training a spam email classifier using labeled emails (spam or not spam).

In contrast, unsupervised learning involves training a model on unlabeled data, where the model learns to find patterns or structures in the data without explicit guidance. An example of unsupervised learning is clustering customer data to identify distinct customer segments based on their purchasing behavior.”
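The contrast can be shown on a toy one-dimensional dataset. In this sketch, a nearest-centroid classifier uses the labels, while a two-cluster k-means ignores them and still recovers similar groups. The transaction amounts, labels, and function names are all hypothetical.

```python
from statistics import mean

def fit_centroids(points, labels):
    """Supervised: average the points belonging to each known label."""
    groups = {}
    for x, label in zip(points, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in groups.items()}

def predict(centroids, x):
    """Assign x to the label with the closest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(points, iterations=10):
    """Unsupervised: 1-D k-means with k=2; assumes at least two distinct values."""
    low, high = min(points), max(points)  # initial cluster centres
    for _ in range(iterations):
        a = [x for x in points if abs(x - low) <= abs(x - high)]
        b = [x for x in points if abs(x - low) > abs(x - high)]
        low, high = mean(a), mean(b)
    return low, high

amounts = [5, 7, 6, 950, 1020, 980]                      # hypothetical amounts
labels = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]   # known only in the supervised case

centroids = fit_centroids(amounts, labels)
print(predict(centroids, 900))  # uses what was learned from the labels
print(two_means(amounts))       # finds two cluster centres without any labels
```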

15. How would you handle imbalanced classes in a classification problem?

Revolut may ask this to assess your ability to address challenges commonly encountered in real-world data analysis scenarios.

How to Answer

Explain the techniques involved in handling imbalanced classes. Emphasize the importance of understanding the problem context and selecting the most suitable approach based on the specific dataset and business requirements.

Example

“To handle imbalanced classes, I would first explore resampling techniques such as oversampling the minority class using methods like Synthetic Minority Over-sampling Technique (SMOTE) or undersampling the majority class. Additionally, I would consider using evaluation metrics like precision-recall instead of accuracy to assess model performance more effectively in imbalanced datasets. Lastly, I would experiment with algorithms like Random Forest or Gradient Boosting Machines, which can handle class imbalance by adjusting class weights or incorporating sampling strategies, to improve model performance on imbalanced datasets.”
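Two of the ideas in this answer can be sketched without any libraries. Note the swap: the code below uses plain random oversampling (duplicating minority rows), not SMOTE, which synthesizes new points. The data and helper names are hypothetical.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

rows = [[10], [12], [11], [9], [13], [500]]
labels = [0, 0, 0, 0, 0, 1]  # one positive out of six

_, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))

# A model that always predicts the majority class is 5/6 accurate but has zero recall.
print(precision_recall(labels, [0] * 6))
```

Resampling should be applied to the training split only, so that the evaluation data keeps the real class balance.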

16. Explain the concept of regularization in machine learning. Why is it important, and what are some common regularization techniques?

Revolut may ask this to evaluate your knowledge of techniques for preventing overfitting and improving model generalization, which are necessary skills for a data scientist.

How to Answer

Explain what regularization is and how it’s used to prevent overfitting and improve the generalization of machine learning models by adding a penalty term to the loss function.

Example

“Regularization in machine learning is a technique used to prevent overfitting by adding a penalty term to the loss function. This penalty term discourages overly complex models with large coefficients, leading to improved generalization performance. Common regularization techniques include L1 regularization (Lasso), which adds the absolute values of coefficients to the loss function; L2 regularization (Ridge), which adds the squared values of coefficients to the loss function; and elastic net regularization, which combines both L1 and L2 penalties.”
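For a single feature with no intercept, both penalties have closed-form solutions, which makes their different behaviour easy to see. The sketch below assumes the losses written in the docstrings (sum of squared errors plus the penalty); the data and function names are illustrative.

```python
def ridge_weight(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def lasso_weight(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * |w|, obtained by soft-thresholding."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x
for lam in (0.0, 10.0, 200.0):
    print(lam, round(ridge_weight(xs, ys, lam), 3), round(lasso_weight(xs, ys, lam), 3))
```

With no penalty, both functions return the ordinary least-squares weight. As the penalty grows, the ridge weight shrinks toward zero without reaching it, while the lasso weight becomes exactly zero once the penalty is large enough, which is why L1 regularization is often used for feature selection.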

17. Describe the differences between a decision tree and a random forest algorithm. When would you choose one over the other?

This question assesses your knowledge of decision trees and random forests, including their differences and when to choose one over the other.

How to Answer

Describe the differences between decision trees and random forests. Explain how decision trees are used when interpretability is important or the dataset is small, and describe how random forests are used when robustness and performance are priorities.

Example

“Decision trees are simple, interpretable models that recursively split the data based on feature thresholds to make predictions. However, they are prone to overfitting, especially with complex datasets. On the other hand, random forests are ensembles of decision trees where each tree is trained on a random subset of the data and features. Random forests reduce overfitting by averaging predictions from multiple trees, leading to better generalization performance. I would choose decision trees when interpretability is crucial or when working with a small dataset. In contrast, I would choose random forests when robustness and performance are priorities, especially with large and complex datasets.”

18. What are some key metrics you would track to evaluate user engagement with a mobile banking app like Revolut?

Your ability to identify key metrics for evaluating user engagement with a mobile banking app like Revolut will be evaluated through this question.

How to Answer

Identify and discuss key metrics that may help evaluate user engagement with Revolut. Include metrics such as active users, user retention rate, average session duration, and frequency of app usage.

Example

“To evaluate user engagement with a mobile banking app like Revolut, I would track key metrics such as active users, user retention rate, average session duration, frequency of app usage (daily, weekly, monthly), number of transactions per user, user satisfaction ratings through surveys or app store reviews, and conversion rates for specific features like account opening or card activation. These metrics provide insights into how users interact with the app and indicate the overall engagement level.”
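Some of these metrics are straightforward to compute from an event log. As a rough sketch, the code below derives monthly active users and a DAU/MAU "stickiness" ratio from hypothetical (user, date) app-open events; the event data and function names are assumptions made for illustration.

```python
import calendar
from datetime import date

# Hypothetical event log: (user_id, date the user opened the app)
events = [
    ("u1", date(2024, 3, 1)), ("u1", date(2024, 3, 1)),  # two opens on the same day
    ("u1", date(2024, 3, 2)), ("u1", date(2024, 3, 15)),
    ("u2", date(2024, 3, 1)),
    ("u3", date(2024, 3, 2)), ("u3", date(2024, 3, 20)),
]

def monthly_active_users(events, year, month):
    """Number of distinct users with at least one event in the month."""
    return len({u for u, d in events if (d.year, d.month) == (year, month)})

def stickiness(events, year, month):
    """Average daily active users divided by monthly active users."""
    # A set of (user, day) pairs counts each user at most once per day
    user_days = {(u, d) for u, d in events if (d.year, d.month) == (year, month)}
    mau = len({u for u, _ in user_days})
    if mau == 0:
        return 0.0
    days_in_month = calendar.monthrange(year, month)[1]
    avg_dau = len(user_days) / days_in_month
    return avg_dau / mau

print(monthly_active_users(events, 2024, 3))
print(round(stickiness(events, 2024, 3), 4))
```

A ratio close to 1 would mean most monthly users open the app every day; for a banking app, it is worth pairing this with transaction-based metrics, since opening the app is not the same as using a feature.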

19. Can you explain the concept of cohort analysis and how it can be useful in understanding user behavior?

The interviewer at Revolut may ask this to evaluate your proficiency as a data scientist in user behavior analytics and your ability to derive insights from data.

How to Answer

Explain what cohort analysis is and how it can be based on characteristics like signup date, acquisition channel, or demographic attributes. Mention that it helps identify trends, patterns, and differences in user behavior.

Example

“Cohort analysis is a powerful method in understanding user behavior by grouping users based on common characteristics or actions and analyzing their behavior over time. For example, we can create cohorts based on the signup date, acquisition channel, or demographic attributes of users. By tracking metrics like retention rate, engagement, and conversion rate for each cohort over time, we can identify trends, patterns, and differences in user behavior. Cohort analysis helps us understand how user behavior evolves and provides valuable insights for product improvement and targeted marketing strategies.”
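A retention table is the most common output of a cohort analysis. The sketch below groups users by signup month and reports the share of each cohort that was active N months after signing up. The users, months, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: signup month per user, and the months in which each user was active
signup_month = {"u1": "2024-01", "u2": "2024-01", "u3": "2024-02", "u4": "2024-02"}
activity = [
    ("u1", "2024-01"), ("u1", "2024-02"), ("u1", "2024-03"),
    ("u2", "2024-01"),
    ("u3", "2024-02"), ("u3", "2024-03"),
    ("u4", "2024-02"),
]

def month_index(month):
    """Convert 'YYYY-MM' to a running month count so months can be subtracted."""
    year, mon = month.split("-")
    return int(year) * 12 + int(mon)

def retention_table(signup_month, activity):
    """Return {cohort: {months_since_signup: share of the cohort that was active}}."""
    cohort_sizes = defaultdict(int)
    for month in signup_month.values():
        cohort_sizes[month] += 1
    active = defaultdict(set)
    for user, month in activity:
        cohort = signup_month[user]
        offset = month_index(month) - month_index(cohort)
        active[(cohort, offset)].add(user)
    table = defaultdict(dict)
    for (cohort, offset), users in sorted(active.items()):
        table[cohort][offset] = len(users) / cohort_sizes[cohort]
    return dict(table)

table = retention_table(signup_month, activity)
for cohort, row in table.items():
    print(cohort, row)
```

Reading across a row shows how one cohort decays over time; reading down a column compares cohorts at the same age, which helps separate product changes from seasonal effects.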

20. Explain the central limit theorem and its importance in statistics.

This question examines your knowledge of the central limit theorem and how it works in statistics.

How to Answer

Explain the central limit theorem and how it enables us to make inferences about population parameters based on sample statistics, even when the population distribution is unknown or non-normal.

Example

“The central limit theorem is a fundamental concept in statistics that states that the distribution of sample means from independent draws approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has a finite variance. This theorem is important because it enables us to make inferences about population parameters, such as the mean or variance, based on sample statistics, even when the population distribution is unknown or non-normal. For example, when estimating the population mean from a sample, we can use the normal distribution to calculate confidence intervals or conduct hypothesis tests, assuming the sample size is sufficiently large.”
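A short simulation makes the theorem tangible. The sketch below draws repeated samples from an exponential distribution, which is strongly skewed, and checks that the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of n. The sample size, trial count, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

def sample_means(n, trials):
    """Means of `trials` independent samples of size n from Exponential(rate=1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

n = 50
means = sample_means(n=n, trials=2000)

# Exponential(1) has mean 1 and standard deviation 1,
# so the sample means should centre near 1 with spread near 1 / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("std of sample means: ", round(statistics.stdev(means), 3))
print("theoretical std:     ", round(1 / n ** 0.5, 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though individual draws are heavily right-skewed.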

How to Prepare for a Data Scientist Role at Revolut

Technical, behavioral, and analytical skills are critical in proving yourself as an efficient data scientist to Revolut. Here is a rough guideline on how to prepare for the role:

Master Data Science Fundamentals

Understand and learn to apply the core concepts of data science, such as algorithms, statistical modeling, data manipulation, and data visualization. Also, dive deeper into the popular Python libraries and frameworks, such as NumPy and Pandas. An extensive understanding of statistics and A/B testing could also help you succeed in the data science interview at Revolut.

Technical Skills in Big Data Technologies

Acquire knowledge of big data technologies such as Apache Flink, Spark, and Hadoop to solidify your claim to the data scientist role at Revolut. Also, consider learning about financial and product metrics that are often used in real-world data science projects involving marketing and risk management. Additionally, be sure you know something about distributed computing frameworks and batch-processing modes.

Advanced Machine Learning Techniques

Modeling and machine learning have become integral parts of the data science domain, and they are used for fraud detection, risk assessment, and personalization. Revolut especially focuses on deep learning, natural language processing (NLP), machine learning system design, and reinforcement learning. Ensure that you have hours of learning and practice in these topics to stake your claim to the data science role at Revolut.

Collaboration and Communication Skills

It’s not enough to know concepts and answers. You need to convey your thought process to the Revolut interviewers. For that, practice a lot of data science behavioral questions and participate in our P2P mock interviews to refine your collaboration and communication skills. Moreover, religiously prepare the data science case study questions to avoid being caught off-guard during the interview rounds.

Prepare for Technical Interviews

During the technical rounds, you’ll be asked to solve a Take-Home Challenge and multiple Python and SQL interview questions. Be well prepared for the challenges to avoid fumbling in front of the hiring manager.

For more details, follow our extensive data science interview guide.

FAQs

How much do data scientists at Revolut make in a year?

The salary of data scientists at Revolut can vary based on factors such as experience, location, and specific job responsibilities. Depending on your level of experience, you may expect an average base salary of about $123,000 and an average total compensation of about $136,000 as a data scientist at Revolut. However, as per our data scientist salary guide, senior employees command a more robust package.

Average Base Salary: $123,102
Average Total Compensation: $136,009

Base Salary (6 data points): Min $101K, Max $146K, Median $121K, Mean (Average) $123K
Total Compensation (6 data points): Min $28K, Max $202K, Median $161K, Mean (Average) $136K

View the full Data Scientist at Revolut salary guide

Where can I read about others’ interview experiences for the Revolut Data Scientist role?

You’re welcome to explore our Slack community to read about other people’s interview experiences for the Revolut Data Scientist role. And after your interview, feel free to share your experience. We have real-time discussions about job interviews and share informative tips to help our candidates improve.

Does Interview Query have job postings for the Revolut Data Scientist role?

Yes. We have up-to-date info on job postings for the Revolut Data Scientist role. Check our Job Board to gain insight into the available positions and leave your application with them.

The Bottom Line

As a data scientist candidate at Revolut, you must have a deep understanding of machine learning models, Python, algorithms, and product metrics. We hope this guide has helped you navigate the Revolut interview process and prepare for the technical and project interview questions. If you have more queries, follow our main Revolut interview guide and explore other positions such as data analyst and software engineer.