Suppose we are training a binary classification algorithm. 

The outcome is imbalanced: 99.8% of individuals in our sample have an outcome value of 0, and 0.2% have an outcome value of 1. To build a model, we down-sample the training data by taking a random 1% sample of the individuals with an outcome of 0 and keeping all individuals with an outcome of 1.
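
To make the sampling step concrete, here is a minimal sketch using only the Python standard library. The function name `downsample_negatives`, the population size of 100,000, and the fixed seed are illustrative assumptions, not part of the original setup. With these numbers, the share of 1s in the down-sampled data rises from 0.2% to 200/1198, or roughly 16.7%.

```python
import random


def downsample_negatives(labels, keep_rate, seed=0):
    """Return sorted indices to keep: every positive (label 1) plus a
    random keep_rate fraction of the negatives (label 0)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = round(len(negatives) * keep_rate)
    kept_negatives = rng.sample(negatives, n_keep)  # sample without replacement
    return sorted(positives + kept_negatives)


# Hypothetical population of 100,000 individuals with a 0.2% positive rate.
labels = [1] * 200 + [0] * 99_800

kept = downsample_negatives(labels, keep_rate=0.01)

# Fraction of positives in the down-sampled training set.
positive_share = sum(labels[i] for i in kept) / len(kept)
print(len(kept), positive_share)
```

A model trained on this sample sees a base rate of about 16.7% rather than 0.2%, so its raw probabilities will be inflated relative to the full population.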

We then train the model on this smaller sample, producing a binary classifier that predicts the probability that an individual has an outcome of 1.

How would we then adjust our output probabilities to use this model on the total, imbalanced population?
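
One commonly used adjustment, sketched here under stated assumptions rather than as a definitive answer, rescales the predicted odds by the negative-class sampling rate. It assumes the 0s were sampled uniformly at random at rate `beta` (0.01 here), that all 1s were kept, and that the model is reasonably calibrated on the down-sampled data. Down-sampling the 0s multiplies the odds of a 1 by `1 / beta`, so multiplying the predicted odds by `beta` undoes it, giving `p = beta * p_s / (beta * p_s + 1 - p_s)`. The function name `correct_probability` is hypothetical.

```python
def correct_probability(p_sampled, beta):
    """Map a probability predicted on down-sampled data back to the
    full-population scale.

    p_sampled: model output in [0, 1], trained on the down-sampled data.
    beta: fraction of negatives (label 0) that were kept, in (0, 1].
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    if not 0.0 <= p_sampled <= 1.0:
        raise ValueError("p_sampled must be in [0, 1]")
    # Equivalent to multiplying the predicted odds by beta and converting
    # the result back to a probability.
    return beta * p_sampled / (beta * p_sampled + (1.0 - p_sampled))


# Sanity check: the down-sampled base rate (200 / 1198) should map back
# to the population base rate of 0.2%.
base_rate_sampled = 200 / 1198
print(correct_probability(base_rate_sampled, beta=0.01))
```

Under these assumptions, a raw score of 0.5 maps to roughly 0.0099. The mapping is monotonic, so the ranking of individuals is unchanged; only the scale of the probabilities is corrected.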
