Text Search System
Suppose that you’re building a tool for in-house analytics at LinkedIn. Design a pipeline that ingests images and PDFs of resumes and transforms them into queryable text data.
Your data pipeline will have the following outputs:
- A data mart that allows machine learning models to tap into the text data for natural language processing.
- A data product that analysts in your company use to track keywords.
- A search API that allows recruiters to scout candidates using certain keywords.
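One possible approach, offered only as an illustration: the recruiter search API and the analysts' keyword tracker could both read from an inverted index built over the extracted resume text. The sketch below assumes the image/PDF-to-text stage has already produced plain text per resume. `tokenize`, `InvertedIndex`, and the resume IDs are hypothetical names. It is an in-memory toy; a production system would more likely use a dedicated search engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter, digit, '+' or '#'.

    Keeping '+' and '#' is an assumption so that skills like "C++" and "C#"
    survive as single tokens.
    """
    return [t for t in re.split(r"[^a-z0-9+#]+", text.lower()) if t]


class InvertedIndex:
    """Hypothetical keyword -> resume-ID index over extracted resume text."""

    def __init__(self) -> None:
        # Each token maps to the set of resume IDs whose text contains it.
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, resume_id: str, text: str) -> None:
        """Index one resume's extracted text."""
        for token in tokenize(text):
            self.postings[token].add(resume_id)

    def search(self, *keywords: str) -> set[str]:
        """Return IDs of resumes containing ALL keywords (AND semantics)."""
        # .get() avoids inserting empty entries into the defaultdict.
        sets = [self.postings.get(k.lower(), set()) for k in keywords]
        if not sets:
            return set()
        # intersection() returns a new set, so callers cannot mutate postings.
        return set.intersection(*sets)

    def keyword_counts(self, keywords: list[str]) -> dict[str, int]:
        """Number of resumes mentioning each keyword (for keyword tracking)."""
        return {k: len(self.postings.get(k.lower(), ())) for k in keywords}


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("r1", "Senior Data Engineer: Python, Spark, Kafka")
    idx.add("r2", "ML Engineer with Python and PyTorch")
    print(idx.search("python", "spark"))
    print(idx.keyword_counts(["python", "kafka", "rust"]))
```

With the two sample resumes, `search("python", "spark")` returns only `r1`, and the keyword counts are 2 for `python`, 1 for `kafka`, and 0 for `rust`. Building the index as a batch step after text extraction fits the assumption below that data need not be real-time.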
You may assume the following:
- The image-to-text models are adequately accurate and are ready for use.
- Data does not have to be real-time, but turnaround time should be minimized.
- Another team is handling the privacy and security filters. You do not need to consider these in your design.
State any other assumptions you make upfront.