Revolut is a global financial technology company that offers a wide range of financial services including banking, cryptocurrency trading, and stock trading, all through its innovative app.
As a Data Scientist at Revolut, you will play a critical role in leveraging data to drive strategic decisions and enhance customer experiences. This position involves analyzing complex datasets, developing predictive models, and communicating insights to stakeholders across various teams. Key responsibilities include conducting exploratory data analysis, implementing machine learning algorithms, and collaborating with product and engineering teams to integrate data-driven solutions into the company's products.
To excel in this role, candidates should possess strong programming skills in Python and SQL, a solid foundation in statistics and machine learning techniques, and the ability to visualize and present data findings effectively. Furthermore, a keen understanding of the financial industry and a passion for using data to solve real-world problems align closely with Revolut's mission to revolutionize the way people manage their finances.
This guide will help you prepare for your interview by providing insights into the expectations for the role, the types of questions you may encounter, and how to showcase your skills and experiences effectively.
Founded in 2015 by Nik Storonsky and Vladyslav Yatsenko, Revolut stands among the leading fintech providers, processing more than 400 million transactions a month. It offers cross-border transactions, stock trading, crypto trading, personal loans, and commodity trading.
As a data scientist looking to join Revolut, your responsibilities will likely include developing fraud detection algorithms, analyzing customer spending patterns, optimizing user experience, and enhancing financial services through collaboration with cross-functional teams.
If you’re curious about how to respond to Revolut data scientist interview questions and prepare for the entirety of the process, then this article is definitely for you.
In this section, we’ll review the various interview questions that might be asked during a Data Scientist interview at Revolut. The interview process will assess a combination of technical skills, statistical knowledge, and problem-solving abilities. Candidates should be prepared to demonstrate their understanding of machine learning algorithms, data analysis techniques, and coding proficiency.
Understanding regularization techniques is crucial for a Data Scientist, as they help prevent overfitting in models.
Discuss the key differences in how Lasso and Ridge apply penalties to the coefficients of the regression model, and when you would use each.
"Lasso regression applies an L1 penalty, which can shrink some coefficients to zero, effectively performing variable selection. Ridge regression, on the other hand, applies an L2 penalty, which shrinks coefficients but does not eliminate them. I would use Lasso when I suspect that many features are irrelevant, while Ridge is useful when I believe all features contribute to the prediction."
PCA (Principal Component Analysis) is a fundamental technique for dimensionality reduction.
Explain the mathematical basis of PCA, including how it transforms data into a new coordinate system.
"PCA works by identifying the directions (principal components) in which the data varies the most. It computes the eigenvectors and eigenvalues of the covariance matrix, allowing us to project the data onto a lower-dimensional space while retaining as much variance as possible."
Overfitting is a common issue in machine learning that candidates should be able to identify and address.
Discuss the signs of overfitting and various techniques to mitigate it.
"Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern. To prevent it, I use techniques such as cross-validation, regularization, and pruning in decision trees, as well as ensuring that the model is not overly complex relative to the amount of training data."
Ensemble methods combine multiple models to improve performance.
Describe the different types of ensemble methods and their advantages.
"Ensemble methods like bagging and boosting combine the predictions of multiple models to improve accuracy. Bagging reduces variance by averaging predictions from multiple models, while boosting focuses on correcting errors made by previous models, leading to a strong final model."
Understanding the distinction between supervised and unsupervised learning is fundamental for a Data Scientist.
Define both terms and provide examples of each.
"Supervised learning involves training a model on labeled data, where the outcome is known, such as predicting house prices. Unsupervised learning, on the other hand, deals with unlabeled data, where the goal is to find hidden patterns, like clustering customers based on purchasing behavior."
The Central Limit Theorem is a key concept in statistics that candidates should be familiar with.
Explain the theorem and its implications for sampling distributions.
"The Central Limit Theorem states that the distribution of the sample mean will approach a normal distribution as the sample size increases, regardless of the original distribution of the data. This is crucial for making inferences about population parameters based on sample statistics."
Handling missing data is a common challenge in data analysis.
Discuss various strategies for dealing with missing values.
"I handle missing data by first assessing the extent and pattern of the missingness. Depending on the situation, I might use imputation techniques, such as filling in missing values with the mean or median, or I might choose to remove records with missing data if they are not significant."
Understanding Type I and Type II errors is essential for hypothesis testing.
Define both types of errors and their implications in statistical testing.
"A Type I error occurs when we reject a true null hypothesis, while a Type II error happens when we fail to reject a false null hypothesis. Balancing these errors is crucial in hypothesis testing, as it affects the reliability of our conclusions."
P-values are a fundamental concept in statistical hypothesis testing.
Define p-values and their significance in hypothesis testing.
"A p-value measures the strength of evidence against the null hypothesis. A low p-value indicates that the observed data is unlikely under the null hypothesis, leading us to reject it. However, it’s important to consider the context and not rely solely on p-values for decision-making."
The Wilcoxon test is a non-parametric statistical test.
Explain the test and its applications.
"The Wilcoxon test is used to compare two paired samples to assess whether their population mean ranks differ. It’s particularly useful when the data does not meet the assumptions of normality required for a t-test."
Coding proficiency is essential for a Data Scientist.
Describe the binary search algorithm and its efficiency.
"A binary search algorithm works by repeatedly dividing a sorted array in half to locate a target value. It has a time complexity of O(log n), making it efficient for large datasets. I would implement it recursively or iteratively, depending on the requirements."
Handling large datasets is a common task for Data Scientists.
Discuss techniques and libraries that facilitate working with large data.
"I would use libraries like Pandas for data manipulation and Dask for parallel computing to handle large datasets that don’t fit into memory. Additionally, I would consider using SQL databases for efficient querying and data retrieval."
Jupyter Notebooks are widely used in data science for various reasons.
Explain the advantages of using Jupyter Notebooks for data analysis.
"Jupyter Notebooks allow for interactive data analysis, combining code execution, visualization, and documentation in one place. This makes it easier to share insights and collaborate with others, as well as to present findings in a clear and organized manner."
Optimizing SQL queries is crucial for performance in data retrieval.
Discuss strategies for improving SQL query performance.
"I optimize SQL queries by using indexing, avoiding SELECT *, and ensuring that joins are performed on indexed columns. Additionally, I analyze query execution plans to identify bottlenecks and make adjustments accordingly."
Data visualization is key in conveying insights effectively.
Provide an example of how you used visualization to present data.
"I created a series of visualizations using Matplotlib and Seaborn to illustrate customer behavior trends over time. By presenting the data in a clear and engaging manner, I was able to effectively communicate the insights to stakeholders, leading to data-driven decisions."
Here are some tips to help you excel in your interview.
The interview process at Revolut typically consists of multiple stages, including an initial screening call, a take-home challenge, technical interviews, and a final interview focused on cultural fit. Familiarize yourself with this structure and prepare accordingly. Knowing what to expect at each stage will help you manage your time and energy effectively.
Expect to face a variety of technical questions that assess your knowledge in data science, machine learning, and programming. Brush up on key concepts such as linear regression, regularization techniques, and algorithms like random forests and gradient boosting. Additionally, practice coding challenges on platforms like LeetCode or HackerRank, as many candidates reported live coding exercises during their interviews.
Revolut values practical knowledge and the ability to apply theoretical concepts to real-world problems. When preparing for your take-home challenge, ensure that your solutions are not only correct but also well-documented and easy to understand. Be ready to discuss your thought process and the rationale behind your decisions during the technical interviews.
During interviews, especially when presenting your take-home challenge, clarity is key. Practice explaining your findings and methodologies in a straightforward manner. Be prepared to answer follow-up questions and defend your approach. This will demonstrate your ability to communicate complex ideas effectively, which is crucial in a collaborative environment.
Revolut places a strong emphasis on cultural fit, so be prepared to discuss your values and how they align with the company's mission. Research Revolut's culture and be ready to articulate why you want to work there and how you can contribute to the team. Show enthusiasm for the company's goals and a willingness to adapt to its fast-paced environment.
If you receive feedback during the interview process, take it seriously and use it as an opportunity to improve. Many candidates appreciated the constructive feedback they received, which helped them understand areas for growth. If you don’t receive feedback, don’t hesitate to ask for it, as it can provide valuable insights for future interviews.
The interview process can be lengthy and sometimes frustrating. Maintain a positive attitude throughout, even if you encounter setbacks. Many candidates reported mixed experiences, but those who remained resilient and focused on their goals ultimately found success. Remember that each interview is a learning opportunity, regardless of the outcome.
By following these tips and preparing thoroughly, you can enhance your chances of success in the interview process at Revolut. Good luck!
The interview process for a Data Scientist role at Revolut is structured and consists of several key stages designed to assess both technical skills and cultural fit.
The process begins with an initial phone interview with a recruiter or HR representative. This conversation typically lasts around 30 minutes and focuses on your background, motivation for applying to Revolut, and preliminary technical questions related to data science, Python, and SQL. This stage is crucial for establishing a rapport and understanding your fit within the company culture.
Following the initial screening, candidates are usually given a take-home assignment that can take up to a week to complete. This assignment often involves practical data science tasks, such as data preparation, analysis, and building machine learning models. The goal is to evaluate your problem-solving skills and ability to work with real-world data. Candidates are expected to present their findings in a clear and structured manner.
After successfully completing the take-home assignment, candidates typically participate in one or more technical interviews. These interviews may involve discussions with senior data scientists where you present your assignment results, answer technical questions related to data science and machine learning, and engage in live coding exercises. The focus here is on your technical expertise, analytical thinking, and ability to communicate complex ideas effectively.
The final stage of the interview process usually involves a conversation with team members or project managers. This interview assesses cultural fit and alignment between your skills and the team's goals. Expect questions that explore your previous experiences, how you approach teamwork, and your understanding of the business context in which data science operates.
Throughout the process, candidates can expect to receive feedback at each stage, which is a hallmark of Revolut's approach to recruitment.
As you prepare for your interview, it's essential to be ready for the specific questions that may arise during these stages.
A Revolut interviewer may ask this question to understand how you handle challenges and achieve success in projects.
How to Answer
Describe a project where you set ambitious goals, took initiative, and used your skills effectively to exceed expectations. Highlight your approach to problem-solving, collaboration with team members, and any innovative strategies you employed.
Example
“In a recent project, I was tasked with optimizing our fraud detection system. I took the initiative to analyze historical transaction data, identifying patterns and anomalies using advanced machine learning algorithms. By collaborating closely with the engineering team, we implemented a real-time monitoring system that significantly reduced false positives by 30%, exceeding the initial target. My proactive approach and data-driven insights were key to achieving this success.”
The interviewer at Revolut may ask this to gauge your understanding of your capabilities and areas for improvement as a data scientist.
How to Answer
Identify three strengths relevant to the position, highlighting skills like data analysis, problem-solving, and communication. For weaknesses, mention areas you’re actively working to improve, such as learning new programming languages or enhancing statistical modeling techniques.
Example
“In terms of strengths, I excel in data analysis, particularly in extracting actionable insights from complex datasets. My problem-solving skills allow me to tackle challenges efficiently, and my strong communication skills enable me to convey technical findings to non-technical stakeholders effectively. As for weaknesses, I’m currently focusing on improving my proficiency in advanced statistical modeling techniques to enhance predictive analytics.”
This question assesses your experience handling complex data and your proficiency with analytical tools and techniques.
How to Answer
Describe a specific project where you dealt with a large and complex dataset. Discuss your approach to data cleaning, preprocessing, and analysis, as well as the tools and techniques you used, such as Python libraries like Pandas, NumPy, and scikit-learn, or SQL queries for data manipulation.
Example
“In a previous role, I worked on a project that involved analyzing customer transaction data from multiple sources to identify spending patterns and detect potential fraud. I first performed data cleaning and preprocessing to handle missing values and outliers using Python libraries like Pandas and NumPy. Then, I applied advanced statistical techniques and machine learning algorithms, such as clustering and anomaly detection, to uncover actionable insights from the dataset. Tools like Jupyter Notebooks and SQL were instrumental in managing and analyzing the large volume of data efficiently.”
Your resilience and problem-solving skills when facing obstacles in a data analysis project will be assessed through this question.
How to Answer
Describe a challenge or setback you encountered in a data analysis project, how you addressed it, and the lessons learned. Emphasize your ability to adapt, collaborate with team members, and implement alternative solutions to overcome obstacles.
Example
“In a recent data analysis project, we faced challenges with data quality issues, resulting in inconsistencies and inaccuracies in the dataset. To address this setback, I worked closely with the data engineering team to identify the root causes and implement robust data validation processes. Additionally, I recalibrated our analytical models and performed sensitivity analyses to mitigate the impact of unreliable data. This experience taught me the importance of proactive data quality management and effective collaboration across teams to ensure accurate and reliable insights.”
Revolut may ask this to check your organizational skills and time management abilities as a data scientist who may be required to work on multiple projects simultaneously.
How to Answer
Express a situation where you had to juggle multiple tasks or projects simultaneously. Discuss your approach to prioritization, including factors considered and strategies employed to meet deadlines. Emphasize your ability to delegate tasks, set realistic timelines, and adapt to changing priorities.
Example
“My previous role required frequently dealing with competing priorities and tight deadlines. When faced with multiple tasks, I first evaluated the urgency and importance of each task, considering factors like project deadlines and impact on business objectives. I then organized tasks into a prioritized list, focusing on high-impact projects while ensuring essential tasks were addressed promptly. To manage my workload effectively, I used project management tools like Trello to track progress and allocate time efficiently. By staying organized and adaptable, I consistently met deadlines and delivered quality results across multiple projects.”
Write a function t_score that takes a hypothesized population mean mu0 and a DataFrame df with a single column ‘var’. Calculate the t-value for the mean of ‘var’ against a null hypothesis that μ = μ0.

Note: You do not have to calculate the p-value of the test or run the test.

Example:

Input:

mu0 = 1
print(df)
...
   var
0  -34
1   40
2  -89
3    5
4  -26

Output:

def t_score(mu0, df) ->
var   -1.015614
dtype: float64
With this question, your data science interviewer at Revolut will assess your ability to calculate the t-value for a mean against a null hypothesis using a Pandas DataFrame.
How to Answer
To calculate the t-value for the mean of a single column against a null hypothesis, you can use the formula: t = (mean - μ0) / (std / sqrt(n)), where mean is the sample mean, μ0 is the hypothesized population mean, std is the standard deviation of the sample, and n is the sample size.
Example
import pandas as pd
import numpy as np

def t_score(mu0, df):
    # t = (sample mean - hypothesized mean) / standard error
    mean = df['var'].mean()
    std = df['var'].std()
    n = len(df)
    t = (mean - mu0) / (std / np.sqrt(n))
    return t

mu0 = 1
df = pd.DataFrame({'var': [-34, 40, -89, 5, -26]})
print(t_score(mu0, df))  # approximately -1.015614
Revolut may ask this to gauge your ability to implement solutions for compliance and regulatory purposes.
How to Answer
Describe a systematic approach to build a model that detects firearm sales listings in a marketplace. This may involve using natural language processing (NLP), image recognition, and machine learning algorithms to classify listings as firearm sales or non-firearm sales.
Example
“To detect firearm sales listings, I would first preprocess listing descriptions using NLP techniques to extract relevant keywords related to firearms. Additionally, I would use image recognition algorithms to scan listing images for recognizable firearm objects. Finally, I would train a machine learning model on labeled data to classify listings as either firearm sales or non-firearm sales based on textual and visual features.”
Suppose you're building a model to screen applicants. You ask the data department in the company for a subset of data to get started working on the problem. The data includes different features about applicants such as age, occupation, zip code, height, number of children, favorite color, etc. You decide to build multiple machine-learning models to test out different ideas before settling on the best one. How would you explain the bias-variance tradeoff with regards to building and choosing a model to use?
This question examines your understanding of the bias-variance tradeoff in the context of machine learning model selection.
How to Answer
Explain the bias-variance tradeoff in the context of machine learning model selection. Discuss how models with high bias may oversimplify the data, leading to underfitting, while models with high variance may capture noise in the data, leading to overfitting. Emphasize the need to find the right balance between bias and variance to optimize model performance.
Example
“The bias-variance tradeoff refers to the delicate balance between the simplicity and flexibility of a machine learning model. Models with high bias, such as linear regression, may oversimplify the underlying relationships in the data, resulting in underfitting and poor performance on both training and test datasets. On the other hand, models with high variance, such as decision trees with no constraints, may capture noise in the training data, leading to overfitting and poor generalization to unseen data. To find the optimal model, it’s essential to strike a balance between bias and variance by selecting a model complexity that minimizes both training and test errors.”
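A quick way to see the tradeoff is to vary model complexity and watch training versus test error diverge; a sketch with synthetic data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree 1 underfits (high bias); degree 15 overfits (high variance).
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))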
Your interviewer at Revolut may ask this to assess your proficiency in data structures and algorithms. This question evaluates your ability to manipulate linked lists in programming.
How to Answer
Implement a function that traverses the singly linked list until it reaches the last node and returns it. If the list is empty, return None.
Example
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def last_node(head):
    # An empty list has no last node.
    if not head:
        return None
    # Walk forward until the node with no successor.
    while head.next:
        head = head.next
    return head

# Example usage:
# head = ListNode(1)
# head.next = ListNode(2)
# head.next.next = ListNode(3)
# print(last_node(head).val)  # Output: 3
Write a function bootstrap_conf_interval that takes a list of values, the number of bootstrap samples to draw, and a confidence level, and returns the confidence interval for the mean.

Note: The function should return a tuple containing the minimum and maximum values of the confidence interval rounded to the tenths place.
Example
Input:
values = [1, 2, 3, 4, 5]
Output
bootstrap_conf_interval(values, 1000, 0.95) -> (1.2, 4.8)
In this case, the function returns a tuple indicating that, based on our bootstrap samples, we are 95% confident that the population parameter lies between 1.2 and 4.8.
Note: Results may vary due to the randomness of bootstrap sampling.
Your ability to implement bootstrapping and calculate confidence intervals will be assessed through this question. You may be asked this to evaluate your statistical reasoning and coding skills.
How to Answer
Implement a function to perform bootstrap sampling on the given array and calculate the confidence interval based on the given size. The confidence interval can be calculated by taking percentiles of the bootstrap sample distribution.
Example
import numpy as np

def bootstrap_conf_interval(values, num_samples, confidence_level):
    # Draw num_samples bootstrap resamples (with replacement) of the data.
    bootstraps = np.random.choice(values, size=(num_samples, len(values)), replace=True)
    sample_means = np.mean(bootstraps, axis=1)
    # Take symmetric percentiles of the bootstrap distribution of the mean.
    lower_percentile = (1 - confidence_level) / 2
    upper_percentile = 1 - lower_percentile
    lower_bound = np.percentile(sample_means, lower_percentile * 100)
    upper_bound = np.percentile(sample_means, upper_percentile * 100)
    return round(lower_bound, 1), round(upper_bound, 1)

# Example usage:
# values = [1, 2, 3, 4, 5]
# print(bootstrap_conf_interval(values, 1000, 0.95))  # Output: (1.2, 4.8)
Suppose we've built a binary classification model that decides whether to approve or reject applicants. Because we are a financial company, we must provide each rejected applicant with a reason. Given that we don’t have access to the feature weights, how would we give each rejected applicant a reason?
This question examines your problem-solving skills in providing reasons for rejection in a binary classification model without access to feature weights.
How to Answer
Discuss a systematic approach to provide reasons for rejection to unqualified applicants without access to feature weights. The solution may involve analyzing misclassified instances, identifying common patterns or features among rejected applicants, and developing rules or decision trees based on these patterns.
Example
“To provide reasons for rejection without access to feature weights, I would first analyze misclassified instances to identify common patterns among rejected applicants. For example, if a significant portion of rejected applicants have low credit scores and high debt-to-income ratios, these factors could be potential reasons for rejection. I would then develop rules or decision trees based on these patterns to explain to applicants why their application was rejected.”
Write a function knn that returns the nearest data point from a list of data points to a given query point. Use Euclidean distance as the similarity measure. For the purpose of this task, consider the scenario where k=1, meaning you only need to find the single closest data point.

Note: Using external libraries such as NumPy and scikit-learn is not allowed.

Example:

Input:

data_points = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
query_point = [1, 9, 7]

Output:

def knn(data_points: List[List[float]], query_point: List[float]) -> [4, 5, 6]
The interviewer at Revolut may ask this to evaluate your proficiency in algorithmic coding and computational thinking as a data scientist.
How to Answer
Implement a function that calculates the Euclidean distance between the querying point and each data point in the list of data points. Then, return the data point that is nearest to the querying point based on the calculated distances.
Example
from typing import List

def knn(data_points: List[List[float]], query_point: List[float]) -> List[float]:
    min_distance = float('inf')
    nearest_point = None
    for point in data_points:
        # Euclidean distance between the candidate point and the query point.
        distance = sum((x - y) ** 2 for x, y in zip(point, query_point)) ** 0.5
        if distance < min_distance:
            min_distance = distance
            nearest_point = point
    return nearest_point

# Example usage:
data_points = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
query_point = [1, 9, 7]
print(knn(data_points, query_point))  # Output: [4, 5, 6]
Write a query to select the manager with the largest team size, returning the manager's name and the size of their team.

Example:

Input:

employees table

Column | Type
--- | ---
id | INTEGER
name | VARCHAR
manager_id | INTEGER

managers table

Column | Type
--- | ---
id | INTEGER
name | VARCHAR
team | VARCHAR

Output:

Column | Type
--- | ---
manager | VARCHAR
team_size | INTEGER
As a data scientist candidate, you may be asked this question to assess your ability to extract relevant information from a database and solve complex SQL queries.
How to Answer
Write an SQL query to join the employees and managers tables, group by manager, and calculate the size of each manager’s team. Then, select the manager with the largest team.
Example
SELECT managers.name AS manager,
COUNT(employees.id) AS team_size
FROM employees
JOIN managers ON employees.manager_id = managers.id
GROUP BY managers.name
ORDER BY team_size DESC
LIMIT 1;
This question assesses your understanding and ability to provide examples of the fundamental differences between supervised and unsupervised learning.
How to Answer
Explain what supervised learning and unsupervised learning are and give examples of each.
Example
“Supervised learning involves training a model on labeled data, where the model learns to make predictions based on input-output pairs. An example of supervised learning is training a spam email classifier using labeled emails (spam or not spam).
In contrast, unsupervised learning involves training a model on unlabeled data, where the model learns to find patterns or structures in the data without explicit guidance. An example of unsupervised learning is clustering customer data to identify distinct customer segments based on their purchasing behavior.”
Revolut may ask this to assess your ability to address challenges commonly encountered in real-world data analysis scenarios.
How to Answer
Explain the techniques involved in handling imbalanced classes. Emphasize the importance of understanding the problem context and selecting the most suitable approach based on the specific dataset and business requirements.
Example
“To handle imbalanced classes, I would first explore resampling techniques such as oversampling the minority class using methods like Synthetic Minority Over-sampling Technique (SMOTE) or undersampling the majority class. Additionally, I would consider using evaluation metrics like precision-recall instead of accuracy to assess model performance more effectively in imbalanced datasets. Lastly, I would experiment with algorithms like Random Forest or Gradient Boosting Machines, which can handle class imbalance by adjusting class weights or incorporating sampling strategies, to improve model performance on imbalanced datasets.”
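One of those levers, class weighting, is a one-line change in scikit-learn; a sketch with a synthetic 95/5 class split:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# A 95/5 class split to mimic an imbalanced problem like fraud detection.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" upweights errors on the rare class.
model = RandomForestClassifier(class_weight="balanced", random_state=0)
model.fit(X_train, y_train)

# Per-class precision and recall are more telling than plain accuracy here.
print(classification_report(y_test, model.predict(X_test)))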
Revolut may ask this to evaluate your knowledge of techniques for preventing overfitting and improving model generalization, which are necessary skills for a data scientist.
How to Answer
Explain what regularization is and how it’s used to prevent overfitting and improve the generalization of machine learning models by adding a penalty term to the loss function.
Example
“Regularization in machine learning is a technique used to prevent overfitting by adding a penalty term to the loss function. This penalty term discourages overly complex models with large coefficients, leading to improved generalization performance. Common regularization techniques include L1 regularization (Lasso), which adds the absolute values of coefficients to the loss function; L2 regularization (Ridge), which adds the squared values of coefficients to the loss function; and elastic net regularization, which combines both L1 and L2 penalties.”
This question assesses your knowledge of decision trees and random forests, including their differences and when to choose one over the other.
How to Answer
Describe the differences between decision trees and random forests. Explain how decision trees are used when interpretability is important or the dataset is small, and describe how random forests are used when robustness and performance are priorities.
Example
“Decision trees are simple, interpretable models that recursively split the data based on feature thresholds to make predictions. However, they are prone to overfitting, especially with complex datasets. On the other hand, random forests are ensembles of decision trees where each tree is trained on a random subset of the data and features. Random forests reduce overfitting by averaging predictions from multiple trees, leading to better generalization performance. I would choose decision trees when interpretability is crucial or when working with a small dataset. In contrast, I would choose random forests when robustness and performance are priorities, especially with large and complex datasets.”
Your ability to identify key metrics for evaluating user engagement with a mobile banking app like Revolut will be evaluated through this question.
How to Answer
Identify and discuss key metrics that may help evaluate user engagement with Revolut. Include metrics such as active users, user retention rate, average session duration, and frequency of app usage.
Example
“To evaluate user engagement with a mobile banking app like Revolut, I would track key metrics such as active users, user retention rate, average session duration, frequency of app usage (daily, weekly, monthly), number of transactions per user, user satisfaction ratings through surveys or app store reviews, and conversion rates for specific features like account opening or card activation. These metrics provide insights into how users interact with the app and indicate the overall engagement level.”
The interviewer at Revolut may ask this to evaluate your proficiency as a data scientist in user behavior analytics and your ability to derive insights from data.
How to Answer
Explain what cohort analysis is and how it can be based on characteristics like signup date, acquisition channel, or demographic attributes. Mention that it helps identify trends, patterns, and differences in user behavior.
Example
“Cohort analysis is a powerful method in understanding user behavior by grouping users based on common characteristics or actions and analyzing their behavior over time. For example, we can create cohorts based on the signup date, acquisition channel, or demographic attributes of users. By tracking metrics like retention rate, engagement, and conversion rate for each cohort over time, we can identify trends, patterns, and differences in user behavior. Cohort analysis helps us understand how user behavior evolves and provides valuable insights for product improvement and targeted marketing strategies.”
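A minimal Pandas sketch of a cohort table (toy data; a real pipeline would derive the months from timestamps):

import pandas as pd

# Hypothetical activity log: user id, signup month, and month of activity.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "signup_month": ["2024-01", "2024-01", "2024-01", "2024-01",
                     "2024-02", "2024-02", "2024-02"],
    "active_month": ["2024-01", "2024-02", "2024-01", "2024-03",
                     "2024-02", "2024-03", "2024-04"],
})

# Count distinct active users per signup cohort per month.
cohorts = (events.groupby(["signup_month", "active_month"])["user_id"]
                 .nunique()
                 .unstack(fill_value=0))
print(cohorts)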
This question examines your knowledge of the central limit theorem and how it works in statistics.
How to Answer
Explain the central limit theorem and how it enables us to make inferences about population parameters based on sample statistics, even when the population distribution is unknown or non-normal.
Example
“The central limit theorem is a fundamental concept in statistics that states that the distribution of sample means from a population approaches a normal distribution as the sample size increases, regardless of the population distribution. This theorem is important because it enables us to make inferences about population parameters, such as the mean or variance, based on sample statistics, even when the population distribution is unknown or non-normal. For example, when estimating the population mean from a sample, we can use the normal distribution to calculate confidence intervals or conduct hypothesis tests, assuming the sample size is sufficiently large.”
This question is likely asked in a Revolut Data Scientist interview to assess your ability to work with time series data and financial metrics, which are crucial for a fintech company like Revolut.
How to Answer
When answering, explain that the key is to first aggregate the transactions by day to isolate daily deposits. Then, use a self-join to create a rolling three-day window; a self-join works even in SQL dialects that lack window functions with rolling frames. This allows you to compute the rolling average by averaging the relevant rows within each window.
Example
“To tackle this problem, I would first aggregate the transactions by day to focus solely on deposits. After that, I would use a self-join technique to simulate a rolling three-day window, which works even in SQL dialects that don't support rolling window frames. This approach allows me to calculate the rolling average by averaging the deposits over the last three days for each date, providing a clear view of the trend in deposit activity.”
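A sketch of that approach, assuming a transactions table with created_at, amount, and transaction_type columns (adjust names and date arithmetic to the actual schema and dialect):

-- Daily deposit totals, then a self-join forming a three-day window.
WITH daily AS (
    SELECT DATE(created_at) AS dt, SUM(amount) AS deposits
    FROM transactions
    WHERE transaction_type = 'deposit'
    GROUP BY DATE(created_at)
)
SELECT d1.dt,
       AVG(d2.deposits) AS rolling_three_day_avg
FROM daily d1
JOIN daily d2
  ON d2.dt BETWEEN d1.dt - INTERVAL 2 DAY AND d1.dt  -- syntax varies by dialect
GROUP BY d1.dt
ORDER BY d1.dt;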
This question might be asked in a Revolut Data Scientist interview to assess your problem-solving skills and ability to write efficient code. Anagram detection is a common string manipulation problem that tests your understanding of algorithms and data structures, particularly around sorting and hash maps.
How to Answer
If the two strings are not of equal length, or if they are the same word, they are not a valid anagram. Convert the two strings into lists of characters and sort them; two anagrams, when sorted, become equal, since an anagram is a rearrangement of the same letters.
Example
“I would start by checking if the two strings are the same length and if they are identical, as these conditions would immediately rule out them being anagrams. Then, I could convert both strings into lists of characters and sort them. If the sorted lists are identical, it would mean the strings are anagrams since sorting arranges the characters in the same order.”
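That approach in Python, following the question's definition that identical words don't count:

def is_anagram(s1, s2):
    # Different lengths or the exact same word rule out a valid anagram.
    if len(s1) != len(s2) or s1 == s2:
        return False
    # Sorting rearranges both words' letters into a canonical order.
    return sorted(s1) == sorted(s2)

print(is_anagram("listen", "silent"))  # True
print(is_anagram("listen", "listen"))  # False (same word)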
This task demonstrates the use of conditional aggregation in SQL to transform exam data into a pivot table. The objective is to create a single row for each student, showing their scores for specific exams.
How to Answer
To track scores for all four exams, use conditional logic (IF or CASE WHEN) to filter scores by exam_id and aggregate them with SUM() to ensure one row per student. Group by student_id to consolidate results, creating columns for each exam’s score within the same row.
Example
“To solve this, I’d first create separate columns for each exam by filtering scores using IF or CASE WHEN. Then, I’d use SUM() to aggregate these scores, ensuring each student’s exam scores appear in one row. Finally, I’d group by student_id to produce the required structure.”
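A sketch of that query, assuming a table exam_scores(student_id, exam_id, score) with exam ids 1 through 4:

SELECT student_id,
       SUM(CASE WHEN exam_id = 1 THEN score ELSE 0 END) AS exam_1,
       SUM(CASE WHEN exam_id = 2 THEN score ELSE 0 END) AS exam_2,
       SUM(CASE WHEN exam_id = 3 THEN score ELSE 0 END) AS exam_3,
       SUM(CASE WHEN exam_id = 4 THEN score ELSE 0 END) AS exam_4
FROM exam_scores
GROUP BY student_id;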
This question examines your ability to analyze user behavior changes due to feature updates and suggests metrics to validate hypotheses. It evaluates your skills in identifying user interaction patterns, hypothesizing behavioral shifts, and proposing measurable outcomes.
How to Answer
Explain how threading organizes discussions, encouraging more comments within posts while reducing new posts. Highlight targeted notifications and reduced duplication as key drivers of this behavior. Suggest validating these hypotheses through metrics like average comments per post and posts per group member.
Example
“Threading organizes discussions and focuses notifications, increasing comments within posts but reducing new posts. It also prevents duplicate posts as users find answers in threads. To validate, I’d compare metrics like average comments per post and posts per member in a before-and-after analysis or an island test.”
Technical, behavioral, and analytical skills are critical in proving yourself as an efficient data scientist to Revolut. Here is a rough guideline on how to prepare for the role:
Understand and learn to apply the core concepts of data science, such as algorithms, statistical modeling, data manipulation, and data visualization. Also, dive deeper into the popular Python libraries and frameworks, such as NumPy and Pandas. An extensive understanding of statistics and A/B testing will also help you succeed in the data science interview at Revolut.
Acquire knowledge of big data technologies such as Apache Flink, Spark, and Hadoop to solidify your claim to the data scientist role at Revolut. Also, consider learning about financial and product metrics, which are often used in real-world data science projects involving marketing and risk management. Additionally, make sure you are familiar with distributed computing frameworks and batch processing.
Modeling and machine learning have become integral parts of the data science domain and are used for fraud detection, risk assessment, and personalization. Revolut especially focuses on deep learning, natural language processing (NLP), machine learning system design, and reinforcement learning. Ensure you have hours of learning and practice in these topics to stake your claim to the data science role at Revolut.
It’s not enough to know concepts and answers. You need to convey your thought process to the Revolut interviewers. For that, practice a lot of data science behavioral questions and participate in our P2P mock interviews to refine your collaboration and communication skills. Moreover, religiously prepare the data science case study questions to avoid being caught off-guard during the interview rounds.
During the technical rounds, you’ll be asked to solve a Take-Home Challenge and multiple Python and SQL interview questions. Be prepared for the challenges to avoid fumbling in front of the hiring manager.
For more details, follow our extensive data science interview guide.
The salary of data scientists at Revolut can vary based on factors such as experience, location, and specific job responsibilities. Depending on your level of experience, you may expect an average base salary of $123,000 and a total compensation of $179,000 as a data scientist at Revolut. However, as per our data scientist salary guide, senior employees command a more robust package.
You’re welcome to explore our Slack community to read about other people’s interview experiences for the Revolut Data Scientist role. And after your interview, feel free to share your experience. We have real-time discussions about job interviews and share informative tips to help our candidates improve.
Yes. We have up-to-date info on job postings for the Revolut Data Scientist role. Check our Job Board to gain insight into the available positions and submit your application.
As a data scientist candidate at Revolut, you must have a deep understanding of machine learning models, Python, algorithms, and product metrics. We were, hopefully, able to guide you through the complexities of the Revolut interview process and answer the technical and project interview questions. If you have more queries, follow our main Revolut interview guide and explore other positions such as data analyst and software engineer.