\( \newcommand{\matr}[1] {\mathbf{#1}} \newcommand{\vertbar} {\rule[-1ex]{0.5pt}{2.5ex}} \newcommand{\horzbar} {\rule[.5ex]{2.5ex}{0.5pt}} \newcommand{\E} {\mathrm{E}} \)

Posts

Linux filesystem quiz

Quiz question-answer pairs sourced from ChatGPT-5.2. Unix/Linux filesystem One of my favourite explanations, from a Reddit answer by AbsolutelyLudicrous: My own cheatsheet, given that I mostly work on Ubuntu or Fedora: A binary already installed or one I installed using apt or dnf? Then look under /usr/bin or /bin/ (which is often a symlink to /usr/bin/). Read more...

Inverse problems and variational auto-encoders

This post is going to trace a line from inverse probability problems to variational auto-encoders, where nothing really changes except for the symbols and terminology. An underlying concept that will remain intact is that we have a deterministic forward function, and from among the set of input parameters for this function, it makes sense to optimize over some of them and to integrate over the rest. Integration in the inverse setting relies on the stability of Gaussian distributions through linear maps in order to convert this integration to a linear transformation involving covariance matrices. Read more...
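
The stability property mentioned above can be stated compactly. The symbols here (\( A \), \( b \), \( \mu \), \( \Sigma \), \( \Sigma_\epsilon \)) are notation assumed for this sketch and may differ from the notation in the post itself. If \( x \sim \mathcal{N}(\mu, \Sigma) \) and the forward map is linear, then:

\[ y = A x + b \quad \Longrightarrow \quad y \sim \mathcal{N}\left(A \mu + b,\; A \Sigma A^\top\right) \]

With independent additive noise \( \epsilon \sim \mathcal{N}(0, \Sigma_\epsilon) \), the covariances simply add:

\[ y = A x + \epsilon \quad \Longrightarrow \quad y \sim \mathcal{N}\left(A \mu,\; A \Sigma A^\top + \Sigma_\epsilon\right) \]

This is why integrating out a Gaussian input reduces to matrix operations on covariances rather than an explicit integral.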

Four ways (two bad, two good) to calculate spike-triggered receptive fields

A spike of a neuron picks out a snippet of the visual stimulus being presented at that time. Collected together, these are the spike-triggered snippets, and their mean is the spike-triggered average (STA). If the stimulus is a video, the STA will be a 3D volume. An array of snippets with shape \( (N, T, H, W) \). There are \( N \) snippets, one per spike. Read more...
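
As a rough illustration of the definition above, here is a minimal NumPy sketch that gathers the \( (N, T, H, W) \) snippet array and averages it. The function name, the argument names, and the choice that each snippet ends at the spike frame are assumptions made for this example, not necessarily what the post does.

```python
import numpy as np

def spike_triggered_average(stimulus, spike_frames, snippet_len):
    """Collect spike-triggered snippets and return them with their mean.

    stimulus:     array of shape (num_frames, H, W)
    spike_frames: frame index of each spike (assumed already binned to frames)
    snippet_len:  T, the number of frames per snippet
    """
    spike_frames = np.asarray(spike_frames)
    # Drop spikes that do not have a full window of preceding frames.
    in_range = (spike_frames >= snippet_len - 1) & (spike_frames < stimulus.shape[0])
    valid = spike_frames[in_range]
    # Assumption: each snippet is the T frames ending at the spike frame.
    offsets = np.arange(-snippet_len + 1, 1)
    idx = valid[:, None] + offsets[None, :]   # shape (N, T)
    snippets = stimulus[idx]                  # shape (N, T, H, W)
    sta = snippets.mean(axis=0)               # shape (T, H, W)
    return snippets, sta
```

For a video stimulus, `sta` is the 3D volume described above: one averaged frame for each of the `T` time lags.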

Shifted noise for receptive field estimation

For receptive field mapping of stimulus-driven neuronal responses, Gaussian checkerboard noise is popular due to its statistical properties. There is a trade-off when choosing the resolution of the grid. Small boxes increase the resolution of the estimated 2D receptive field, but as the box size is reduced, the likelihood that a group of nearby pixels will collectively elicit a response from a cell is reduced. One solution put forward is to use a coarse grid of large boxes and to add random offsets to this grid, with the offsets being multiples of the desired finer-resolution box size. Read more...
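
The shifted-grid idea can be sketched in a few lines of NumPy. This is only an illustration under assumed names (`shifted_checkerboard`, `box`, `fine`), with one independent offset drawn per frame; the post's actual stimulus generation may differ.

```python
import numpy as np

def shifted_checkerboard(rng, height, width, box, fine):
    """One frame of Gaussian checkerboard noise on a randomly shifted grid.

    box:  side length of the coarse boxes, in pixels
    fine: finer resolution step, in pixels (must divide box)
    """
    if box % fine != 0:
        raise ValueError("fine must divide box")
    steps = box // fine
    # Random offsets that are multiples of the fine step, in [0, box).
    dy = fine * int(rng.integers(steps))
    dx = fine * int(rng.integers(steps))
    # Enough coarse boxes to cover the frame after shifting (ceiling division).
    ny = -(-(height + dy) // box)
    nx = -(-(width + dx) // box)
    coarse = rng.standard_normal((ny, nx))
    # Expand each coarse value into a box-by-box block of pixels.
    full = np.kron(coarse, np.ones((box, box)))
    frame = full[dy:dy + height, dx:dx + width]
    return frame, (dy, dx)
```

Each frame still has large boxes that can drive a cell, while the box edges land on the finer grid across frames.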

Frame registration and cell detection for 2-photon recordings

I worked on a project that involved carrying out 2-photon recording of the tectum of a zebrafish larva. For a single larva, we made 14 recordings, each 15 minutes long and followed by an approximately 2-minute gap. The larva moved around, especially at the beginning, and to extract cell traces, we first needed a way to align frames and detect cells within and across recordings. Concatenation of 14 ~15-minute recordings with ~2-minute gaps. Read more...

Robust learning rate finder with Kalman smoothing

Kalman smoothing can be applied to the learning rate range test to produce smooth learning rate curves from which a learning rate can be chosen. Some example runs: A handful of lr-curves for various (dataset, batch size) combinations. Datasets vary left-to-right, batch size increases going down. The offset of the smoothed curve is just approximate, and doesn't need to be accurate for choosing the learning rate. I recently needed an automated way to choose a reasonable learning rate for a large number of (model, dataset, batch size) combinations. Read more...
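
To give a feel for the smoothing step, here is a minimal sketch of a scalar Kalman filter followed by a Rauch-Tung-Striebel backward pass, applied to a noisy loss curve. The random-walk state model, the function name, and the default variances are assumptions chosen for this example; the post's actual model and tuning may be different.

```python
import numpy as np

def kalman_smooth(y, process_var=1e-3, obs_var=1e-1):
    """Smooth a 1D sequence with a random-walk Kalman filter and RTS smoother.

    Assumed model: x[t] = x[t-1] + w, y[t] = x[t] + v,
    with w ~ N(0, process_var) and v ~ N(0, obs_var).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    m_pred, p_pred = np.empty(n), np.empty(n)   # one-step predictions
    m_filt, p_filt = np.empty(n), np.empty(n)   # filtered estimates
    m, p = y[0], 1.0                            # loose prior centred on first point
    for t in range(n):
        # Predict: a random walk keeps the mean and inflates the variance.
        m_pred[t] = m
        p_pred[t] = p + (process_var if t > 0 else 0.0)
        # Update with the observation y[t].
        gain = p_pred[t] / (p_pred[t] + obs_var)
        m = m_pred[t] + gain * (y[t] - m_pred[t])
        p = (1.0 - gain) * p_pred[t]
        m_filt[t], p_filt[t] = m, p
    # Backward (RTS) pass: refine each estimate using later observations.
    m_smooth = m_filt.copy()
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + g * (m_smooth[t + 1] - m_pred[t + 1])
    return m_smooth
```

In a learning rate range test, `y` would be the per-step training loss recorded as the learning rate is increased, and a learning rate would then be chosen from the smoothed curve.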

Inside Neural Network Training

Below are some videos that show how weights, activations and gradients change as a network is trained. The videos were made while trying to test the idea that layers closer to the input stabilize earlier than layers closer to the output. The videos below suggest this hypothesis is wrong. In fact, quite often the updates to the last layer are the first to slow down, and the updates to the first layer are the last to slow down. Read more...

Motivating ELBO From Importance Sampling

Every derivation of the evidence lower bound I've seen has been unsatisfying in terms of motivation—they seem to be just moving symbols around. For me, the most convincing way to arrive at the evidence lower bound is from the perspective of importance sampling. Imagine you are trying to calculate an average, but each value contributing to the average requires carrying out importance sampling to estimate. Once the average is calculated, you will be maximizing it for some optimization task. Read more...
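
For orientation, one standard route from importance sampling to the bound is sketched below. The notation (\( p(x, z) \) for the joint, \( q(z) \) for the proposal) is assumed for this sketch, and the post's own derivation may be organized differently. The evidence is an integral that importance sampling rewrites as an expectation under \( q \):

\[ p(x) = \int p(x, z) \, dz = \mathrm{E}_{z \sim q}\left[ \frac{p(x, z)}{q(z)} \right] \]

Taking logs and applying Jensen's inequality to the concave \( \log \) gives the evidence lower bound:

\[ \log p(x) = \log \mathrm{E}_{z \sim q}\left[ \frac{p(x, z)}{q(z)} \right] \ge \mathrm{E}_{z \sim q}\left[ \log \frac{p(x, z)}{q(z)} \right] \]

The bound is tight when \( q(z) = p(z \mid x) \), since the ratio inside the expectation is then the constant \( p(x) \).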

Origin of Lebesgue Integration

This article follows the steps of Henri Lebesgue as he came upon his theory of integration. The story could be started earlier, but we don't lose too much by starting with Borel, Lebesgue's adviser, at the end of the 19th century. Borel and the measure of a set At the end of the 19th century, Émile Borel was thinking about the problem of measure, that is, the problem of describing the size of things. Read more...

Visualizing a Perceptron

A lot of machine learning techniques can be viewed as an attempt to represent high-dimensional data in fewer dimensions without losing any important information. In a sense, it is lossy compression—compressing the data to be small and amenable before being passed to some next stage of data processing. If our data consists of elements of \( \mathbb{R}^D \), we are trying to find interesting functions of the form: \[ f : \mathbb{R}^D \to \mathbb{R}^d \] where \( D \) is fixed, but \( d \) can be chosen freely. Read more...