*Link to the Spanish version
Materials: [ ModelsClasif1ENG.pdf]
This video discusses the applicability (or rather the non-applicability) of least
squares in classification. The video is a continuation of the videos [
Although least-squares fitting of a function to training data may be a reasonable first option for approaching the problem, it suffers from several drawbacks:
The cost is ‘symmetric’, but some problems fare better with an asymmetric goal,
The only thing that really matters in the fit is the sign of the fitted output, not its value,
Probability distributions (or their logarithms) for binary outcomes (Bernoulli, or binomial if we count repetitions) are not proportional to a ‘squared error’, whereas the logarithm of the normal distribution is (up to an additive constant); this is what underlies least squares for fitting continuous data, under certain assumptions such as additive normal measurement noise.
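The last drawback can be seen by writing the two log-likelihoods side by side. This is only a sketch: the symbols $y_i$ (target), $f(x_i)$ (fitted function), $\sigma$ (noise standard deviation) and $p_i$ (model probability of class 1, with labels coded as 0/1) are notation assumed here, not taken from the video.

```latex
% Continuous data with additive normal noise: y_i = f(x_i) + e_i,  e_i ~ N(0, sigma^2)
\log p(y_i \mid x_i) = -\frac{\bigl(y_i - f(x_i)\bigr)^2}{2\sigma^2} - \frac{1}{2}\log\bigl(2\pi\sigma^2\bigr)

% Binary data: y_i in {0, 1}, with P(y_i = 1 \mid x_i) = p_i
\log P(y_i \mid x_i) = y_i \log p_i + (1 - y_i)\log(1 - p_i)
```

Maximizing the first expression over the data is the same as minimizing the sum of squared errors; the second contains no squared-error term at all.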
Therefore, a ‘naive’ solution of applying ‘linear regression’ or, in general, optimizing a least-squares index for binary data, may not perform as desired or may lack a formally adequate statistical interpretation.
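A small numerical sketch of the ‘only the sign matters’ drawback, using a made-up one-dimensional data set (the data, the ±1 label coding and the helper names `fit_least_squares` and `accuracy` are assumptions for illustration, not material from the video). A straight line is fitted to the labels by least squares and the sign of the fit is used as the classifier; then a few clearly positive points far from the boundary are added.

```python
import numpy as np

def fit_least_squares(x, y):
    """Fit y ~ a*x + b by ordinary least squares and return (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b

def accuracy(x, y, a, b):
    """Fraction of points whose label equals the sign of the fitted line."""
    return float(np.mean(np.sign(a * x + b) == y))

# Hypothetical training set: labels are -1 / +1, separable at x = 0.
x_near = np.array([-2.0, -1.0, 1.0, 2.0])
y_near = np.array([-1.0, -1.0, 1.0, 1.0])
a1, b1 = fit_least_squares(x_near, y_near)
acc_before = accuracy(x_near, y_near, a1, b1)

# Add 'easy' positive points far from the boundary. Their sign is already
# right, but least squares still penalizes how far the fit is from +1.
x_all = np.concatenate([x_near, [30.0, 30.0, 30.0]])
y_all = np.concatenate([y_near, [1.0, 1.0, 1.0]])
a2, b2 = fit_least_squares(x_all, y_all)
acc_after = accuracy(x_all, y_all, a2, b2)

print("accuracy without far points:", acc_before)
print("accuracy with far points:   ", acc_after)
print("decision threshold with far points:", -b2 / a2)
```

In this toy case the first fit separates the classes perfectly, while the second one flattens the line to accommodate the far points and moves the threshold past x = 1, so a positive point near the boundary ends up misclassified even though the data are still linearly separable.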
*Link to my [ whole collection] of videos in English. Link to the larger [ Colección completa] in Spanish.