Text Search System

Suppose that you’re building a tool for in-house analytics at LinkedIn. Design a pipeline that ingests images and PDFs of resumes and transforms them into queryable text data.

Your data pipeline will have the following outflows:

  • A data mart that allows machine learning models to tap into the text data for natural language processing.
  • A data product that analysts in your company use to track keywords.
  • A search API that allows recruiters to scout candidates using certain keywords.
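The two keyword-driven outflows (analyst keyword tracking and recruiter search) both come down to an inverted index over the extracted text. Below is a minimal in-memory sketch. It assumes the image-to-text step has already produced plain text for each resume. The names `build_index`, `search`, and `keyword_counts` and the sample resumes are hypothetical and only illustrate the idea. A production pipeline would more likely put the search API on a dedicated search engine and compute keyword metrics in batch jobs.

```python
import re
from collections import defaultdict

# Naive tokenizer: lowercase word-like runs; keeps "+" and "#" so "c++" and "c#" survive.
TOKEN_RE = re.compile(r"[a-z0-9+#]+")


def tokenize(text: str) -> list[str]:
    """Split already-extracted resume text into lowercase tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical helper: map each token to the set of resume IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], keywords: list[str]) -> list[str]:
    """Recruiter-style AND search: resume IDs that contain every keyword."""
    if not keywords:
        return []
    postings = [index.get(kw.lower(), set()) for kw in keywords]
    return sorted(set.intersection(*postings))


def keyword_counts(index: dict[str, set[str]], tracked: list[str]) -> dict[str, int]:
    """Analyst-style tracking: number of resumes mentioning each tracked keyword."""
    return {kw: len(index.get(kw.lower(), set())) for kw in tracked}


if __name__ == "__main__":
    # Made-up sample text standing in for the output of the image-to-text models.
    resumes = {
        "r1": "Senior data engineer: Python, Spark, Airflow.",
        "r2": "ML engineer with Python and PyTorch experience.",
        "r3": "Recruiter turned analyst; SQL and Tableau.",
    }
    index = build_index(resumes)
    print(search(index, ["python", "engineer"]))  # ['r1', 'r2']
    print(keyword_counts(index, ["python", "sql", "spark"]))  # {'python': 2, 'sql': 1, 'spark': 1}
```

Both outflows read from the same index. Search intersects posting sets, and keyword tracking uses posting-set sizes, so one indexing job can serve both outflows. The ML data mart would more naturally store the cleaned full text or token lists per resume instead of the index.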

You may assume the following:

  • The image-to-text models are adequately accurate and are ready for use.
  • Data does not have to be available in real time, but turnaround should be minimized.
  • Another team is working on the privacy and security filters. You do not need to consider these in your design.

State other assumptions you might have upfront.
