Why does Kilosort scale linearly with the number of recorded neurons?

The time taken to run Kilosort scales linearly with the number of recorded neurons, rather than the number of channels, due to the low-dimensional parametrization of template waveforms.

How many passes through the data does the main loop, which alternates template matching and inference, need?

The main loop alternating template matching and inference is run until the cost function approaches convergence (typically less than six full passes through the data).

How variable can the amplitudes of spikes assigned to the same cluster be?

The authors also anneal the ratio ε/λ from small to large; this ratio controls the relative impact of the reconstruction term and the amplitude bias term in equation 2. Therefore, at the beginning of the optimization, spikes assigned to the same cluster are allowed to have more variable amplitudes.

How did the authors calculate the possible score after operator merges?

To estimate the best achievable score after operator merges, the authors took advantage of the ground truth data, and automatically merged together candidate clusters so as to greedily maximize their score.

What does the adaptive running average achieve?

Since firing rates vary over two orders of magnitude in typical recordings (from < 0.5 to 50 spikes/s), this adaptive running average procedure allows clusters with low firing rates to nonetheless average enough of their spikes to generate a smooth running-average template.

How many batches are used to obtain the templates?

After processing every hundred batches (or more, depending on their time length), the templates are obtained from the running average waveform.

How did the authors avoid increasing the spike density at any location on the probe?

To avoid increasing the spike density at any location on the probe, the authors also subtracted off the denoised waveform from its original location.

(Open Access) Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels (2016) | Marius Pachitariu

Q: What are the contributions mentioned in the paper "Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels" ?

The authors introduce Kilosort, a spike sorting framework that is fast, scalable and accurate, and whose output requires minimal time for manual curation. They show that it allows rapid and accurate sorting of large-scale in vivo data. They compare Kilosort to an established algorithm on data obtained from 384-channel electrodes, and show superior performance at much reduced processing times.

Q: What is the oldest and most reliable method for recording neural activity?

The oldest and most reliable method for recording neural activity involves lowering an electrode into the brain and recording the local electrical activity around the electrode tip.

Q: What is the generative model of the electrical recorded voltage?

To define a generative model of the electrical recorded voltage, the authors take advantage of the approximately linear summation of electrical potentials from different sources in the extracellular medium.

Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels

Marius Pachitariu (1,2,*), Nicholas Steinmetz (1,2), Shabnam Kadir (1,2), Matteo Carandini and Kenneth D. Harris (1,2)

1 — UCL Institute of Neurology, London WC1E 6DE, United Kingdom.

2 — UCL Department of Neuroscience, Physiology, and Pharmacology, London WC1E 6DE,

United Kingdom.

3 — UCL Institute of Ophthalmology, London EC1V 9EL, United Kingdom.

* — Correspondence to marius.pachitariu.10@ucl.ac.uk.

Abstract

Advances in silicon probe technology mean that in vivo electrophysiological recordings from hundreds of channels will soon become commonplace. To interpret these recordings we need fast, scalable and accurate methods for spike sorting, whose output requires minimal time for manual curation. Here we introduce Kilosort, a spike sorting framework that meets these criteria, and show that it allows rapid and accurate sorting of large-scale in vivo data. Kilosort models the recorded voltage as a sum of template waveforms triggered on the spike times, allowing overlapping spikes to be identified and resolved. Rapid processing is achieved thanks to a novel low-dimensional approximation for the spatiotemporal distribution of each template, and to batch-based optimization on GPUs. A novel post-clustering merging step based on the continuity of the templates substantially reduces the requirement for subsequent manual curation operations. We compare Kilosort to an established algorithm on data obtained from 384-channel electrodes, and show superior performance, at much reduced processing times. Data from 384-channel electrode arrays can be processed in approximately realtime. Kilosort is an important step towards fully automated spike sorting of multichannel electrode recordings, and is freely available (github.com/cortex-lab/Kilosort).

1 Introduction

The oldest and most reliable method for recording neural activity involves lowering an electrode into the brain and recording the local electrical activity around the electrode tip. Action potentials of single neurons generate a stereotypical temporal deflection of the voltage, known as a spike waveform. When multiple neurons close to the electrode fire action potentials, their spikes must be identified and assigned to the correct neuron based on the features of the recorded waveforms, a process known as spike sorting [1–15].

Measuring voltage at multiple closely spaced sites in the extracellular medium can substantially improve spike sorting accuracy. In this case, the recorded waveforms also have characteristic spatial shapes, determined by each neuron’s location and physiological characteristics. Together, the spatial and temporal shapes of the waveform provide all the information that can be used to assign a given spike to a neuron.

Current methods for spike sorting, however, will struggle to meet the requirements raised by a new generation of high-count, high-density electrodes that are soon to become commonplace. These electrodes have several hundred closely spaced recording sites, and initial tests suggest that they can reveal the activity of 100 to 1,000 neurons firing tens of millions of spikes. When applied to such data, algorithms designed for tens of recording sites [18,19] suffer from substantial limitations. The automatic sorting software can take days to weeks to run, and requires hours to days of manual curation. Furthermore, with more channels and higher density, resolution of spatiotemporally overlapping spikes becomes both more tractable and more important.

[Figure 1 axes: panel a, channels against samples (25 kHz); panel c, "correlation of channel noise", channels against channels.]

Figure 1. Data from high-channel count recordings. a, High-pass filtered and channel-whitened data. Negative peaks are action potentials. b, Example mean waveforms, centered on their peaks. c, Example cross-correlation matrix across channels (before whitening, no spikes included).

bioRxiv preprint, doi: https://doi.org/10.1101/061481; this version posted June 30, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Here we overcome these limitations and present Kilosort, a new algorithm which takes advantage of a novel mathematical approach that greatly reduces the amount of calculation required, together with the computing capabilities of low-cost commercially available graphics processing units (GPUs). To illustrate its abilities we show that it accurately spike sorts the output of 384-channel dense probes in approximately real time.

1.1 High-density electrophysiology and structured sources of noise

With high-density neural probes (i.e. site spacing in the range ∼20 µm), the waveforms of each neuron can typically be detected on 5 to 50 channels simultaneously (Fig. 1a,b; example data available at http://data.cortexlab.net/dualPhase3). This provides a substantial amount of information per spike, but because other neurons also fire on the same channels, a clustering algorithm is required to unmix the signals and assign spikes to the correct neuron. Furthermore, structured sources of noise can make this assignment more difficult. For example, neurons that are too distant from the electrode to be sortable provide myriad superimposed spike waveforms, a continuous random background against which the features of sortable spikes must be distinguished.

1.2 Previous work

A traditional approach to spike sorting divides the problem into several stages. First, spike times are detected, for example as times when the negative voltage crosses a pre-defined threshold. Second, these spike waveforms are extracted and projected into a common low-dimensional space, typically obtained by principal component analysis (PCA). Third, the spikes are clustered in this low-dimensional space using a variety of approaches, such as mixtures of Gaussians or peak-density detection. Some algorithms also include a fourth stage of template matching that scans the raw data for overlapping spikes, which may have been missed in the first detection phase [11,12,14]. Finally, a manual curation stage is required, in which a human operator corrects the imperfect automated results using a graphical user interface (GUI). This last step is particularly necessary for recordings subject to electrode drift, where the waveforms of a given neuron vary over time and may be assigned to multiple clusters.

Here we describe a system that omits spike detection and PCA and instead combines the identification of template waveforms and associated spike times in a single unified model. This model seeks to reconstruct the entire raw voltage dataset with the templates of candidate neurons. We define a cost function for this reconstruction, and derive approximate inference and learning algorithms that can be successfully applied to very large channel count data. This approach is related to a previous method, but that method requires a generic convex optimization that is slow for recordings with large numbers of channels.

As we demonstrate with constructed ground-truth datasets, our system is more accurate than a current widely-used method. Furthermore, we demonstrate that on real datasets with 384 channels, this implementation is fast enough to run in nearly real time.

2 Model formulation

We start with a generative model of the raw electrical voltage. Unlike the traditional pipeline, this algorithm does not start with a spike detection step, nor project the spike waveforms to a lower-dimensional PCA space. As we show below, both of these steps would discard potentially useful information.

2.1 Pre-processing: common average referencing, temporal filtering and spatial whitening

To remove low-frequency fluctuations, such as the local field potential (LFP), we high-pass filter each channel of the raw data at 300 Hz. To diminish the effect of artifacts shared across all channels, we subtract at each timepoint the median of the signal across all recording sites, an operation known as common average referencing. This step is best performed after high-pass filtering, because the LFP magnitude is variable across channels and can be comparable in size to the artifacts.
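The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Kilosort implementation: the brick-wall FFT filter is an assumption made for brevity (the text does not say which filter design is used), and the function names and toy signals are hypothetical.

```python
import numpy as np

def highpass_fft(data, fs, cutoff_hz=300.0):
    """Remove frequency components below cutoff_hz on each channel.

    data has shape (n_channels, n_samples). The brick-wall FFT filter is
    an assumption for this sketch, not the filter used by the authors.
    """
    spectrum = np.fft.rfft(data, axis=1)
    freqs = np.fft.rfftfreq(data.shape[1], d=1.0 / fs)
    spectrum[:, freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=data.shape[1], axis=1)

def common_median_reference(data):
    """Subtract, at each timepoint, the median across all channels."""
    return data - np.median(data, axis=0, keepdims=True)

def preprocess(raw, fs, cutoff_hz=300.0):
    """High-pass filter first, then subtract the across-channel median."""
    return common_median_reference(highpass_fft(raw, fs, cutoff_hz))

# Toy recording: 4 channels, 1 s at 25 kHz.
fs = 25000.0
t = np.arange(25000) / fs
# Slow 10 Hz fluctuation whose size differs across channels (stands in for LFP).
lfp = np.outer([1.0, 2.0, 3.0, 4.0], np.sin(2 * np.pi * 10.0 * t))
# 1 kHz artifact that is identical on every channel.
artifact = np.sin(2 * np.pi * 1000.0 * t)
raw = lfp + artifact

clean = preprocess(raw, fs)                      # filter, then reference
wrong_order = common_median_reference(raw)       # reference without filtering
```

In this toy case `clean` is numerically zero, whereas `wrong_order` still contains a large slow component, which illustrates why the referencing is applied after the high-pass filter.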

[Figure 2 panel labels: a, raw waveform, temporal PC (common), spatiotemporal PC (private); b, residual waveform variance for temporal PC (common) and spatiotemporal PC (private).]

Figure 2. Spike reconstruction from three private PCs. a, Four example average waveforms (black) with their respective reconstruction from three common temporal PCs/channel (blue), and with reconstruction from three spatiotemporal PCs private to each spike (red). The red traces mostly overlap the black traces. b, Summary of residual waveform variance for all neurons in one dataset.

Next, we whiten the data across channels to remove correlated noise. In the frequency range typical of spikes, spatially correlated noise arises primarily from neurons far from the probe, whose spikes are too small to sort directly [16,23] and have a large spatial spread over the surface of the probe. Since there are many such neurons, their noise averages out to have a stereotypical cross-correlation pattern across channels (Fig. 1c). To estimate this noise covariance, we first remove the times of putative spikes (detected with a threshold criterion). We then estimate the covariance matrix Σ, and use its singular vectors and singular values E, D to obtain a symmetrical whitening matrix that maintains the spatial structure of the data, known as zero-phase component analysis (ZCA): $W_{\mathrm{ZCA}} = \Sigma^{-1/2} = E D^{-1/2} E^{\top}$. To regularize D, we add a small value to its diagonal. For very large channel counts, estimation of the full covariance matrix Σ is noisy, and we therefore compute the columns of the whitening matrix $W_{\mathrm{ZCA}}$ independently for each channel, based on its nearest 32 neighbors. We then multiply the raw data matrix containing all channels with this whitening matrix.
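A minimal NumPy sketch of the global ZCA step follows. It assumes the spike times have already been excluded from the noise samples, uses a hypothetical regularization value `eps`, and does not implement the per-channel variant based on the 32 nearest neighbors; the function name and toy data are illustrative only.

```python
import numpy as np

def zca_whitening_matrix(cov, eps=1e-6):
    """Return the symmetric whitening matrix E (D + eps)^(-1/2) E^T.

    cov is a symmetric (n_channels, n_channels) covariance matrix. For such a
    matrix the singular vectors coincide with the eigenvectors, so eigh is used.
    eps is a small, hypothetical regularizer added to the diagonal of D.
    """
    d, e = np.linalg.eigh(cov)
    d = np.maximum(d, 0.0)  # guard against tiny negative eigenvalues
    return e @ np.diag(1.0 / np.sqrt(d + eps)) @ e.T

# Toy noise: independent noise per channel plus a component shared by all
# channels, which makes the channels correlated.
rng = np.random.default_rng(0)
n_channels, n_samples = 8, 20000
shared = rng.normal(size=(1, n_samples))
noise = rng.normal(size=(n_channels, n_samples)) + 0.5 * shared

cov = noise @ noise.T / n_samples
W = zca_whitening_matrix(cov)
whitened = W @ noise
cov_whitened = whitened @ whitened.T / n_samples
```

After whitening, `cov_whitened` is close to the identity matrix, and `W` is symmetric, which is what distinguishes ZCA from other whitening transforms such as PCA whitening.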

2.2 Modelling mean spike waveforms with SVD

When single spike waveforms are recorded across a large number of channels, most channels will have no signal and only noise. To prevent the large total energy on these many noise channels from swamping the signal present on the smaller number of signal channels, previous approaches have estimated a “mask” to exclude channels with insufficient SNR to identify any given spike [18,19]; to further reduce noise and lower dimensionality, the spikes are usually projected into a small number of temporal principal components per channel, typically three.

Here we introduce a different method for simultaneous spatial denoising/masking and for lowering the dimensionality of spikes. This method is based on the observation that any mean spike waveform can be well explained by a singular value decomposition (SVD) of its spatiotemporal waveform, with as few as three components, but that the spatial and temporal components required can vary substantially between neurons (Fig. 2a). This approach of tailoring “private PCs” to each spike allows us to fit the spikes with ∼5 times less residual variance than the standard approach of applying a single PCA approximation per channel to all neurons on that channel (Fig. 2b). This decomposition results in an automated masking strategy, which allows the waveforms to be denoised and irrelevant channels ignored, and also speeds up the algorithm, by allowing the use of standard low-rank filtering techniques (see below).
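As an illustration of the "private PCs" idea, the sketch below truncates the SVD of one spatiotemporal waveform to three components. The waveform is synthetic and the function name, shapes and noise level are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def low_rank_template(waveform, rank=3):
    """Approximate a (n_channels, n_timepoints) waveform with `rank` SVD components.

    Returns spatial factors of shape (n_channels, rank) and temporal factors of
    shape (rank, n_timepoints) whose product is the low-rank approximation.
    """
    u, s, vt = np.linalg.svd(waveform, full_matrices=False)
    spatial = u[:, :rank] * s[:rank]  # fold the singular values into the spatial part
    temporal = vt[:rank]
    return spatial, temporal

# Synthetic mean waveform: a negative deflection localized around channel 10,
# plus a little noise on every channel.
rng = np.random.default_rng(1)
n_channels, n_timepoints = 32, 61
time_course = -np.exp(-0.5 * ((np.arange(n_timepoints) - 20) / 3.0) ** 2)
spatial_profile = np.exp(-0.5 * ((np.arange(n_channels) - 10) / 2.0) ** 2)
waveform = np.outer(spatial_profile, time_course)
waveform += 0.01 * rng.normal(size=waveform.shape)

U, W = low_rank_template(waveform, rank=3)
approximation = U @ W
residual_fraction = np.sum((waveform - approximation) ** 2) / np.sum(waveform ** 2)
```

Because the spatial factors are close to zero on channels far from the neuron, the approximation acts as an automatic mask: channels that carry only noise contribute almost nothing to the reconstructed template.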

2.3 Integrated template matching framework

To define a generative model of the electrical recorded voltage, we take advantage of the approximately linear summation of electrical potentials from different sources in the extracellular medium. We combine the spike times of all neurons into an $N_{\mathrm{spikes}}$-dimensional vector $s$, such that the waveforms start at time samples $s + 1$. We define the cluster identity of spike $k$ as $\sigma(k)$, taking values in the set $\{1, 2, 3, \ldots, N\}$, where $N$ is the total number of neurons. We represent the normalized waveform of neuron $n$ as the matrix $K_n$ of size number of channels by number of sample timepoints $t_s$ (typically 61). The matrix $K_n$ is approximated by a three-dimensional singular value decomposition, $K_n = U_n W_n$, whereby $K_n$ is deconstructed into three pairs of spatial and temporal basis functions, $U_n$ and $W_n$, such that the norm of $U_n W_n$ is 1. The value of the electrical voltage at time $t$ on channel $i$ is modeled by

$$V(i, t) = V_0(i, t) + \mathcal{N}(0, \epsilon)$$

where the noise is modelled as independent Gaussian of variance $\epsilon$. $V_0(i, t)$ is defined as

$$V_0(i, t) = \sum_{k:\; s(k) \ge t - t_s,\; s(k) < t} x_k \, K_{\sigma(k)}(i, t - s(k)) \qquad (1)$$

where the index $k$ picks out those spikes that overlap with the timepoint $t$, because they happen at nearby times $s(k)$, and $x_k > 0$ is the amplitude of spike $k$, further constrained by

$$x_k \sim \mathcal{N}\left(\mu_{\sigma(k)},\; \lambda \mu_{\sigma(k)}^2\right)$$

This last equation models variations in spike amplitudes for spikes from the same neuron, due to factors like burst adaptation and drift. We modelled this variability with a Gaussian distribution whose variance scales with the square of its mean, to capture the fact that the spikes of neurons closer to the probe vary in relative, not absolute amplitude. $\lambda$ and $\epsilon$ are hyperparameters that control the relative scaling with respect to each other of the reconstruction error and the prior on the amplitude.
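The generative model can be simulated directly, which makes the role of each symbol concrete. The sketch below is an illustration under stated assumptions: random unit-norm matrices stand in for the templates, the values of the mean amplitudes, `lam` (λ) and `eps` (noise variance) are hypothetical, and the function name is not taken from the Kilosort code.

```python
import numpy as np

def synthesize_voltage(templates, spike_times, spike_clusters, amplitudes, n_samples):
    """Sum scaled templates at the spike times (the noise-free part of the model).

    templates: (n_neurons, n_channels, t_s) array of unit-norm waveforms.
    Following the text, the waveform of spike k starts at sample s(k) + 1.
    Waveforms running past the end of the recording are truncated.
    """
    n_neurons, n_channels, t_s = templates.shape
    v0 = np.zeros((n_channels, n_samples))
    for s, n, x in zip(spike_times, spike_clusters, amplitudes):
        start = s + 1
        stop = min(start + t_s, n_samples)
        if stop > start:
            v0[:, start:stop] += x * templates[n][:, : stop - start]
    return v0

rng = np.random.default_rng(0)
n_channels, t_s, n_samples = 4, 61, 500

# Random stand-ins for the templates, normalized to unit (Frobenius) norm.
templates = rng.normal(size=(2, n_channels, t_s))
templates /= np.linalg.norm(templates, axis=(1, 2), keepdims=True)

mu = np.array([10.0, 20.0])   # mean amplitude of each neuron (hypothetical)
lam, eps = 0.01, 1.0          # hypothetical hyperparameter values

spike_times = np.array([100, 120, 300])   # the first two spikes overlap in time
spike_clusters = np.array([0, 1, 0])

# Amplitudes: Gaussian with mean mu and variance lam * mu**2.
means = mu[spike_clusters]
amplitudes = rng.normal(means, np.sqrt(lam) * means)

v0 = synthesize_voltage(templates, spike_times, spike_clusters, amplitudes, n_samples)
v = v0 + rng.normal(scale=np.sqrt(eps), size=v0.shape)   # add independent noise
```

Because the model is a linear sum, the two overlapping spikes at samples 100 and 120 simply add in `v0`; this is the property that lets template matching resolve overlapping spikes.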

This model formulation leads to the following cost function, which we minimize with respect to spike times $s$, spike amplitudes $x$, templates $K$, and cluster assignments $\sigma$:

$$L(s, x, K, \sigma) = \lVert V - V_0 \rVert^2 + \frac{\epsilon}{\lambda} \sum_k \left( \frac{x_k}{\mu_{\sigma(k)}} - 1 \right)^2 \qquad (2)$$

The second term in this expression has the purpose of limiting the number of spikes that are assigned amplitudes that deviate strongly from the mean of the relevant cluster. It is scaled by the ratio $\epsilon/\lambda$, which we usually set to a constant between 1 and 10.
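One way to see where the ratio $\epsilon/\lambda$ comes from is to write out the negative log-likelihood of the Gaussian noise model together with the Gaussian amplitude prior. The following is a sketch of that step, assuming the normalization terms of the two Gaussians are dropped; it is offered as a reading aid rather than as the authors' derivation.

```latex
\begin{aligned}
-\log p(V, x \mid s, K, \sigma)
  &\approx \frac{1}{2\epsilon} \lVert V - V_0 \rVert^2
     + \sum_k \frac{\left(x_k - \mu_{\sigma(k)}\right)^2}{2 \lambda \mu_{\sigma(k)}^2} \\
  &= \frac{1}{2\epsilon} \left[ \lVert V - V_0 \rVert^2
     + \frac{\epsilon}{\lambda} \sum_k \left( \frac{x_k}{\mu_{\sigma(k)}} - 1 \right)^2 \right]
\end{aligned}
```

The bracketed expression has the same form as the cost function in equation 2, and the overall factor $1/(2\epsilon)$ does not change the location of the minimum. A small $\epsilon/\lambda$ therefore corresponds to a weak amplitude penalty, and a large one to a strong penalty.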

3 Learning and inference in the model

To optimize the cost function, we first initialize the templates and then alternate between two steps: finding the best spike times $s$, cluster assignments $\sigma$, and amplitudes $x$ (template matching); and optimizing the template waveforms $K$ for a given $s$, $\sigma$, $x$ (template optimization). After the final spike times and amplitudes have been extracted, we run a final post-optimization merging algorithm which finds pairs of clusters whose spikes form a single continuous density. These steps are described in detail below.
