Pick the wrong loss function and your model optimises the wrong thing — perfectly. The loss is the single number training tries to shrink, so it quietly defines what "wrong" even means. I built an interactive visualiser of MSE, MAE, and cross-entropy so you can see why the choice matters.
🎯 Drag the prediction: https://dev48v.infy.uk/dl/day6-loss-functions.html
This is Day 6 of DeepLearningFromZero.
Loss = one number for "how wrong"
The network's output is compared to the truth and collapsed into one scalar. Everything in training exists to make that number smaller. Choose the loss and you've defined the network's entire goal.
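In practice a network makes many predictions per batch. The per-example losses are then averaged (or summed) into that one scalar. Here is a minimal sketch that assumes mean reduction. `meanLoss` is a hypothetical helper name, not part of the demo:

```javascript
// Squared error for a single prediction, same as the per-example MSE below.
const squaredError = (pred, y) => (pred - y) ** 2;

// Hypothetical helper: apply a per-example loss to every (pred, y) pair
// and average the results into the single scalar that training minimises.
function meanLoss(lossFn, preds, ys) {
  if (preds.length !== ys.length || preds.length === 0) {
    throw new Error("preds and ys must be non-empty and the same length");
  }
  let total = 0;
  for (let i = 0; i < preds.length; i++) total += lossFn(preds[i], ys[i]);
  return total / preds.length;
}

const preds = [2.5, 0.0, 2.0];
const ys = [3.0, -0.5, 2.0];
// Errors are 0.5, 0.5 and 0, so this averages 0.25, 0.25 and 0.
console.log(meanLoss(squaredError, preds, ys));
```

Swapping `squaredError` for any other per-example loss changes what "wrong" means. It does not change how the batch collapses into one number.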
MSE — square the error (regression)
const mse = (pred, y) => (pred - y) ** 2;
Squaring means off-by-4 hurts 16×, off-by-1 hurts 1×. MSE obsesses over large errors — great when big misses are unacceptable, risky when outliers will drag the model around.
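To see how far a single big miss can take over, this sketch computes one outlier's share of the total loss under squared error and under absolute error. The residuals are made up for illustration:

```javascript
// Illustrative residuals (pred - y): four misses of 1 and one outlier of 8.
const errors = [1, -1, 1, -1, 8];

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

const squared = errors.map((e) => e ** 2);        // [1, 1, 1, 1, 64]
const absolute = errors.map((e) => Math.abs(e));  // [1, 1, 1, 1, 8]

// Fraction of the total loss contributed by the outlier alone.
const outlierShareSquared = squared[4] / sum(squared);     // 64 / 68
const outlierShareAbsolute = absolute[4] / sum(absolute);  // 8 / 12

console.log(outlierShareSquared.toFixed(2));  // "0.94"
console.log(outlierShareAbsolute.toFixed(2)); // "0.67"
```

Under squaring, the one outlier supplies about 94% of the loss, so most of the training signal goes toward fixing that single point.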
MAE — absolute error, outlier-robust
const mae = (pred, y) => Math.abs(pred - y);
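Linear penalty: off-by-4 hurts exactly 4× as much as off-by-1, so one wild outlier can't dominate the way it does under MSE. The trade-off is the gradient. Its magnitude is always 1, no matter how close the prediction is, so with a fixed step size training tends to hop back and forth near the answer instead of settling precisely.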
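A classic way to see the robustness difference is to suppose the model can only output one constant. The constant that minimises MSE is the mean of the targets. The one that minimises MAE is the median. This brute-force sketch searches a grid of candidates. The targets are invented for illustration:

```javascript
const targets = [10, 11, 12, 13, 100]; // 100 is the outlier

const mse = (pred, y) => (pred - y) ** 2;
const mae = (pred, y) => Math.abs(pred - y);

// Average loss if the model predicted the constant c for every target.
const avgLoss = (lossFn, c) =>
  targets.reduce((acc, y) => acc + lossFn(c, y), 0) / targets.length;

// Try c = 0.0, 0.1, ..., 100.0 and keep the one with the lowest average loss.
function bestConstant(lossFn) {
  let best = 0;
  let bestLoss = Infinity;
  for (let i = 0; i <= 1000; i++) {
    const c = i / 10;
    const l = avgLoss(lossFn, c);
    if (l < bestLoss) {
      bestLoss = l;
      best = c;
    }
  }
  return best;
}

console.log(bestConstant(mse)); // 29.2: the mean, dragged up by the outlier
console.log(bestConstant(mae)); // 12: the median, unmoved by it
```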
Cross-entropy — for classification
When the output is a probability, you usually don't use MSE. Cross-entropy gives near-zero loss for confident-and-right predictions and brutally punishes confident-and-wrong ones:
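const clamp = (p, eps = 1e-12) => Math.min(Math.max(p, eps), 1 - eps); // keeps Math.log away from 0, avoiding Infinity and 0 * -Infinity = NaN
const bce = (p, y) => { const q = clamp(p); return -(y * Math.log(q) + (1 - y) * Math.log(1 - q)); };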
Predict 1% for the true class and the loss is already −ln(0.01) ≈ 4.6. As p approaches 0, it grows without bound. In the demo, switch to Classification and slide p toward 0 to watch it explode.
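This sketch puts numbers on "brutally punishes". It compares binary cross-entropy with squared error when the true label is 1 and the predicted probability falls. The `eps` clamp is an assumption of the sketch. It is there so `Math.log` never receives exactly 0:

```javascript
// Clamp p into (0, 1) so Math.log never receives 0 (an assumption of this sketch).
const clamp = (p, eps = 1e-12) => Math.min(Math.max(p, eps), 1 - eps);

const bce = (p, y) => {
  const q = clamp(p);
  return -(y * Math.log(q) + (1 - y) * Math.log(1 - q));
};
const squaredError = (p, y) => (p - y) ** 2;

// True label is 1; the predicted probability slides toward 0.
for (const p of [0.99, 0.5, 0.1, 0.01, 0.001]) {
  console.log(
    `p=${p}  BCE=${bce(p, 1).toFixed(3)}  squared error=${squaredError(p, 1).toFixed(3)}`
  );
}
```

For a probability, squared error can never exceed 1. Cross-entropy keeps climbing as p falls: at p = 0.01 it is about 4.6, while squared error is about 0.98.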
The slope is what learning actually uses
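Backprop doesn't follow the loss value. It follows the loss's gradient (slope) downhill, which is why the shape matters. With a sigmoid output, cross-entropy keeps a steep slope when the prediction is very wrong, giving a strong corrective push. MSE's gradient shrinks as the sigmoid saturates. That difference is why classifiers usually learn faster with cross-entropy than with MSE.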
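Here is the standard argument, assuming a sigmoid output p = σ(z) on a logit z. With squared error, the gradient with respect to z picks up a factor σ'(z) = p(1 − p), which vanishes when the unit saturates. With binary cross-entropy, that factor cancels and the gradient is simply p − y. A small sketch of both:

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Gradients with respect to the logit z, for one example with label y.
// Squared error L = (p - y)^2   ->  dL/dz = 2 (p - y) * p (1 - p)
const gradSquaredErrorLogit = (z, y) => {
  const p = sigmoid(z);
  return 2 * (p - y) * p * (1 - p);
};
// Binary cross-entropy          ->  dL/dz = p - y (the sigmoid derivative cancels)
const gradBceLogit = (z, y) => sigmoid(z) - y;

// True label is 1, but the network is confidently wrong: z = -10, so p is about 0.000045.
const z = -10;
console.log(gradSquaredErrorLogit(z, 1)); // tiny: the corrective push nearly vanishes
console.log(gradBceLogit(z, 1));          // close to -1: a strong corrective push
```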
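// slope of each per-example loss w.r.t. the prediction; gradient descent steps against it: pred -= lr * grad
const dMse = (pred, y) => 2 * (pred - y), dMae = (pred, y) => Math.sign(pred - y);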
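Here is the slope put to work: a sketch of plain gradient descent on a single prediction. It uses the derivatives of the per-example losses: 2(pred − y) for squared error and sign(pred − y) for absolute error. The target, starting point, learning rate and step count are arbitrary choices for illustration:

```javascript
const dSquared = (pred, y) => 2 * (pred - y);       // slope shrinks as pred nears y
const dAbsolute = (pred, y) => Math.sign(pred - y); // slope is always ±1 (0 only exactly at y)

// Plain gradient descent: step against the gradient, pred -= lr * grad.
function descend(gradFn, { y = 3, start = 0, lr = 0.4, steps = 20 } = {}) {
  let pred = start;
  for (let i = 0; i < steps; i++) pred -= lr * gradFn(pred, y);
  return pred;
}

console.log(descend(dSquared));  // settles essentially on 3
console.log(descend(dAbsolute)); // keeps hopping by 0.4 around 3 without settling
```

With squared error, each step shrinks the remaining error by a constant factor. With absolute error, the step size never shrinks, so the prediction ends up bouncing between about 2.8 and 3.2.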
Choosing the loss is a design decision
Predicting a price? MSE or MAE. Yes/no? Binary cross-entropy. One-of-many? Categorical cross-entropy. Same network, different loss, genuinely different behaviour — because the loss encodes what you actually care about.
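For the one-of-many case, a common setup is a softmax over the network's raw scores (logits) followed by categorical cross-entropy. That loss is just −log of the probability given to the true class. Below is a minimal sketch. Subtracting the max logit is the usual numerical-stability trick, and the small clamp before the log is an assumption of the sketch:

```javascript
// Softmax: turn raw scores into probabilities that sum to 1.
function softmax(logits) {
  const m = Math.max(...logits); // subtract the max so Math.exp cannot overflow
  const exps = logits.map((z) => Math.exp(z - m));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Categorical cross-entropy for one example whose true class index is `target`.
const categoricalCrossEntropy = (probs, target) =>
  -Math.log(Math.max(probs[target], 1e-12)); // clamp avoids log(0)

const probs = softmax([2.0, 1.0, 0.1]);
console.log(probs);                             // most probability on class 0
console.log(categoricalCrossEntropy(probs, 0)); // small: the model favours the true class
console.log(categoricalCrossEntropy(probs, 2)); // larger: the true class got little probability
```

With exactly two classes and probabilities [1 − p, p], this is the same quantity as the binary cross-entropy formula earlier in the post.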
The takeaway
The loss is the goal. Match it to the task, and remember that its slope is what drives the learning. Drag the prediction in the demo and watch MSE's parabola tower over MAE's gentle V once the error grows past 1.
