Position: Data Scientist
How was the interview process? What was it like?
What technical questions were asked?
What was one of your solutions?
(Q&A from Glassdoor) A crawler's "number of visits" and "time between visits" will differ from those of a normal user.
Intuitively, a crawler will make more visits than an average user,
and the time differences between its visits will have less variance than a normal user's.
The number of visits and the average time difference between consecutive visits can be calculated for each user with a simple GROUP BY SQL query.
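A minimal sketch of that GROUP BY query, run through Python's built-in sqlite3 module. The table name visits and the columns user_id and visit_ts (a timestamp in seconds) are assumptions made for illustration, not part of the original question. Since the gaps between consecutive visits sum to (last visit - first visit), the average gap can be computed as (MAX - MIN) / (COUNT - 1) without a window function.

```python
import sqlite3

def visit_stats(rows):
    """Return (user_id, n_visits, avg_gap) per user.

    rows: iterable of (user_id, visit_ts) pairs; visit_ts in seconds.
    The schema here is hypothetical and only serves the example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user_id TEXT, visit_ts REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    query = """
        SELECT user_id,
               COUNT(*) AS n_visits,
               -- consecutive gaps telescope to MAX - MIN;
               -- NULLIF avoids dividing by zero for single-visit users
               (MAX(visit_ts) - MIN(visit_ts)) * 1.0
                   / NULLIF(COUNT(*) - 1, 0) AS avg_gap
        FROM visits
        GROUP BY user_id
        ORDER BY user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Users with a single visit get an avg_gap of NULL (None in Python), since no gap exists for them.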
Number of visits:
From our data we can come up with a cutoff (say, the 99th percentile of the number of visits) and then tag IDs whose number of visits is above that cutoff as crawlers.
Time difference between visits:
The variance of the time difference between visits should be lower than some cutoff; the cutoff is again some percentile of the whole data.
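The two percentile cutoffs could be sketched roughly as follows, using only the standard library. The function names, the input shape (a dict mapping each user id to a list of visit timestamps), and the 5th-percentile default for the variance cutoff are assumptions for illustration; the answer only says "some percentile". The two flags are returned separately because the answer does not say how they should be combined.

```python
from statistics import pvariance, quantiles

def percentile(values, pct):
    """pct-th percentile (1..99) of values; needs at least two values.

    method="inclusive" treats the data as the whole population, so the
    cutoff always lies between the observed minimum and maximum.
    """
    return quantiles(values, n=100, method="inclusive")[pct - 1]

def flag_crawlers(visits_by_user, count_pct=99, var_pct=5):
    """Return (high_count_ids, low_variance_ids) as two sets."""
    counts = {u: len(ts) for u, ts in visits_by_user.items()}

    gap_var = {}
    for u, ts in visits_by_user.items():
        ts = sorted(ts)
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        if len(gaps) >= 2:  # variance of a single gap says nothing
            gap_var[u] = pvariance(gaps)

    count_cut = percentile(list(counts.values()), count_pct)
    var_cut = percentile(list(gap_var.values()), var_pct)

    high_count = {u for u, c in counts.items() if c > count_cut}
    low_var = {u for u, v in gap_var.items() if v < var_cut}
    return high_count, low_var
```

With real traffic both cutoffs would need tuning, since a fixed percentile always flags some fraction of users whether or not any crawler is present.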