Unsafe Content ML Design
Meta’s trust and safety team has been running an ML model that detects accounts posting prohibited content (adult content, illegal goods, scam ads, policy-restricted ads) and bans them. Currently, approximately 5% of all content on the platform is prohibited.
The ML team has developed a new model intended to replace the existing one. You don’t yet know whether the new model performs better or worse.
How would you determine whether the new system is working? Walk through the stakeholders affected, the key metrics you'd track, and how you'd distinguish a well-performing model from a poorly-performing one.
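One way to frame the "better or worse" comparison is an offline evaluation on a human-labeled sample before any online rollout. The sketch below is purely illustrative (the function name, counts, and ~5% prevalence figure are assumptions for the example): it computes precision, recall, and false-positive rate for both models from confusion-matrix counts, the kind of trade-off data an answer to this question would weigh.

```python
# Hypothetical offline-evaluation sketch: compare the old and new
# models on a human-labeled review sample. All names and numbers
# here are illustrative, not from a real evaluation.

def eval_metrics(tp, fp, fn, tn):
    """Return (precision, recall, false-positive rate) from confusion counts."""
    precision = tp / (tp + fp)   # of accounts we banned, how many deserved it
    recall = tp / (tp + fn)      # of bad accounts, how many did we catch
    fpr = fp / (fp + tn)         # fraction of good accounts wrongly banned
    return precision, recall, fpr

# Illustrative counts from a 1,000-account labeled sample (~5% prevalence).
old = eval_metrics(tp=40, fp=10, fn=10, tn=940)   # existing model
new = eval_metrics(tp=45, fp=25, fn=5,  tn=925)   # candidate model

# Here the candidate catches more prohibited accounts (higher recall)
# but wrongly bans more legitimate ones (higher FPR) -- exactly the
# trade-off that affects different stakeholders differently.
print(f"old: precision={old[0]:.2f} recall={old[1]:.2f} fpr={old[2]:.3f}")
print(f"new: precision={new[0]:.2f} recall={new[1]:.2f} fpr={new[2]:.3f}")
```

A strong answer would tie each metric to a stakeholder: recall matters most to users exposed to prohibited content, while false positives fall on legitimate posters and advertisers who get wrongly banned.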