Avi Chawla (@_avichawla): You're in an ML Engineer interview at Apple. The interviewer asks:

2026.05.04 07:41

You're in an ML Engineer interview at Apple. The interviewer asks:

"Two models are 88% accurate.
- Model A is 89% confident.
- Model B is 99% confident.
Which one would you pick?"

You: "Any would work since both have the same accuracy."

Interview over.

Here's what you missed:

Modern neural networks can be misleading. They are often overconfident in their predictions.

For instance, I saw an experiment that used the CIFAR-100 dataset to compare LeNet with ResNet.

LeNet produced:
- Accuracy = ~0.55
- Average confidence = ~0.54

ResNet produced:
- Accuracy = ~0.7
- Average confidence = ~0.9

Despite being more accurate, the ResNet model is overconfident in its predictions. While the model reports ~90% confidence on average, it is only ~70% accurate.

Calibration solves this.

A model is calibrated if the predicted probabilities align with the actual outcomes. For instance, say a model predicts an event with a 70% probability. Then, ideally, out of 100 such predictions, ~70 should result in the event.

Handling this is important because the model will be used in decision-making. In fact, an overly confident model that is not equally accurate can be highly misleading.

To exemplify, say a government hospital wants to conduct an expensive medical test on patients. To ensure that the government funding is used optimally, a reliable probability estimate can help the doctors make this decision. If the model isn't calibrated, it will produce overly confident predictions.

Reliability Diagrams are a visual way to inspect how well the model is currently calibrated. More specifically, this diagram plots the expected sample accuracy as a function of the corresponding confidence value (softmax) output by the model. If the model is perfectly calibrated, then the diagram should look like the identity function.

That said, it is often also useful to compute a scalar value that measures the amount of miscalibration, called expected calibration error (ECE).
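
To make the reliability diagram and ECE concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code behind the experiment mentioned above: the function names `reliability_bins` and `expected_calibration_error`, the choice of 10 equal-width bins, and the toy data are all hypothetical. Each input pairs a model's top-class confidence with whether that prediction was correct.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (mean_confidence, accuracy, count) for every non-empty bin.
    These are the points a reliability diagram plots.
    """
    # One accumulator per bin: [confidence_sum, n_correct, count]
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        acc[idx][0] += conf
        acc[idx][1] += int(ok)
        acc[idx][2] += 1
    return [(s / n, c / n, n) for s, c, n in acc if n > 0]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate: size-weighted mean of |accuracy - confidence|."""
    bins = reliability_bins(confidences, correct, n_bins)
    total = sum(n for _, _, n in bins)
    return sum(n / total * abs(a - c) for c, a, n in bins)


# Toy data (made up): 10 predictions at 0.9 confidence, 7 of them correct.
toy_conf = [0.9] * 10
toy_correct = [1] * 7 + [0] * 3
print(reliability_bins(toy_conf, toy_correct))
print(expected_calibration_error(toy_conf, toy_correct))
```

On this toy input all predictions fall into one bin with accuracy 0.7 and mean confidence of about 0.9, so the estimate comes out at roughly 0.2. A perfectly calibrated model would have every bin on the diagonal and an ECE of 0.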
One way to approximate the expected calibration error is by partitioning predictions into equally spaced confidence bins and taking a weighted average, by bin size, of the absolute difference between each bin's accuracy and its average confidence.

These are some common techniques to calibrate ML models:

> For binary classification models:
- Histogram binning
- Isotonic regression
- Platt scaling

> For multiclass classification models:
- Binning methods
- Matrix and vector scaling

👉 If you care about probabilities and both models are operationally similar, which model would you prefer?

____

Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.
