Below are short expositions of machine learning topics, with a mathematical focus. The posts are not meant as a first introduction to each topic (except perhaps for a mathematician); rather, they exhibit how I think about these topics and what I personally focus on.
The oft-repeated mantra goes as follows: “Gradient descent takes a step in the direction of steepest descent.” There is nothing wrong with this, but it needs to be put under the microscope.
For a loss function \(\ell : \Theta \to \mathbb{R}\) and a step size \(\alpha > 0\), the update rule is \[\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha \nabla \ell(\boldsymbol{\theta}).\] The intuitive picture is that we stand on a hilly landscape in a thick morning fog and want to go downhill. We can only sense the immediate steepness, so we take a step downhill along the negative gradient direction.... Read full post →
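As a quick illustration of the update rule above, here is a minimal sketch in Python; the quadratic loss, the step size, and the iteration count are illustrative assumptions, not taken from the post:

```python
import numpy as np

def gradient_descent(grad, theta, alpha=0.1, steps=100):
    """Iterate theta <- theta - alpha * grad(theta)."""
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# Illustrative loss l(theta) = 0.5 * ||theta||^2, whose gradient
# is simply theta; the unique minimizer is the origin.
theta0 = np.array([3.0, -2.0])
theta_star = gradient_descent(lambda t: t, theta0)
print(theta_star)  # close to [0., 0.]
```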
Let \(\mathcal{X}\times \mathcal{Y}\) be the data space split along an input-label axis. The hypothesis class is a collection of functions \(f \in \mathcal{F}\) \[f : \mathcal{X}\times \Theta \to \mathcal{Y}.\] For example, the hypothesis class could be a neural network with \(P\) weights (and biases), then \(\Theta = \mathbb{R}^P\) and \(f(\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{\theta}})\) would be the function defined by the network.... Read full post →
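To make the notation concrete, here is a minimal sketch of such an \(f(\boldsymbol{x}, \boldsymbol{\theta})\), assuming a tiny one-hidden-layer network so that \(\Theta = \mathbb{R}^P\); the dimensions and parameter layout are hypothetical, chosen only for illustration:

```python
import numpy as np

# A toy hypothesis class f : X x Theta -> Y with X = R^d, Y = R.
# All weights and biases are packed into one flat vector theta in R^P.
d, h = 3, 4                # input dimension, hidden width (assumed)
P = h * d + h + h + 1      # total parameter count: W1, b1, w2, b2

def f(x, theta):
    """Evaluate the network at input x with parameters theta."""
    W1 = theta[: h * d].reshape(h, d)          # hidden-layer weights
    b1 = theta[h * d : h * d + h]              # hidden-layer biases
    w2 = theta[h * d + h : h * d + 2 * h]      # output weights
    b2 = theta[-1]                             # output bias
    return w2 @ np.tanh(W1 @ x + b1) + b2

theta = np.random.default_rng(0).normal(size=P)
x = np.ones(d)
print(f(x, theta))  # a scalar prediction in Y
```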
This index was automatically generated on June 11, 2025 at 11:55 AM