Training vs Validation vs Test Data

0:00:00

You’re building a machine learning model to predict hospital readmission risk using patient records from the past five years. The data includes patient demographics, diagnoses, treatment history, and outcomes, but hospitals periodically change treatment protocols and data collection standards over time.

How would you structure your training, validation, and test splits to ensure your model generalizes to future patients? What risks would arise if these datasets were not clearly separated, and how would you validate that your final evaluation reflects real-world performance?

Training vs Validation vs Test Data

Comments