When Trusting 11 Nines Nearly Broke a Production ML Pipeline
https://padlet.com/interworldradiogvnkf/bookmarks-icdnmp5i5qpmrgew/wish/4b3zaM0bpVGrW2j7
When Trusting 11 Nines Nearly Broke Training: A Production Incident We had a model training cluster that churned through terabytes of image tiles every week