The NumPy Array: A Structure for Efficient Numerical Computation

Abstract
In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.


HAL Id: inria-00564007
https://hal.inria.fr/inria-00564007
Submitted on 7 Feb 2011
The NumPy array: a structure for efficient numerical computation
Stéfan van der Walt, S. Chris Colbert, Gaël Varoquaux
To cite this version:
Stéfan van der Walt, S. Chris Colbert, Gaël Varoquaux. The NumPy array: a structure for efficient numerical computation. Computing in Science and Engineering, Institute of Electrical and Electronics Engineers, 2011, 13 (2), pp. 22-30. 10.1109/MCSE.2011.37. inria-00564007.

The NumPy array: a structure for efficient
numerical computation
Stéfan van der Walt, Stellenbosch University, South Africa
S. Chris Colbert, Enthought, USA
Gaël Varoquaux, INRIA Saclay, France
This article is published in IEEE Computing in Science and Engineering. Please refer to
the published version if accessible, as it contains editor’s improvements. (c) 2011 IEEE.
In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
Introduction
The Python programming language provides a rich set of high-level data structures: lists for enumerating a collection of objects, dictionaries to build hash tables, etc. However, these structures are not ideally suited to high-performance numerical computation.
In the mid-90s, an international team of volunteers started to develop a data structure for efficient array computation. This structure evolved into what is now known as the N-dimensional NumPy array.
The NumPy package, which comprises the NumPy array as well as a set of accompanying mathematical functions, has found widespread adoption in academia, national laboratories, and industry, with applications ranging from gaming to space exploration.
Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
A NumPy array is a multidimensional, uniform collection of elements. An array is characterized by the type of elements it contains and by its shape. For example, a matrix may be represented as an array of shape (M × N) that contains numbers, e.g., floating point or complex numbers. Unlike matrices, NumPy arrays can have any dimensionality. Furthermore, they may contain other kinds of elements (or even combinations of elements), such as booleans or dates.
Underneath the hood, a NumPy array is really just a convenient way of describing one or more blocks of computer memory, so that the numbers represented may be easily manipulated.
Basic usage
Throughout the code examples in the article, we assume that NumPy is imported as follows:
import numpy as np
Code snippets are shown as they appear inside an [IPython] prompt, such as this:
In [3]: np.__version__
Out[3]: '1.4.1'
Elements contained in an array can be indexed using the [] operator. In addition, parts of an array may be retrieved using standard Python slicing of the form start:stop:step. For instance, the first two rows of an array x are given by x[:2, :] or columns 1 through 3 by x[:, 1:4]. Similarly, every second row is given by x[::2, :]. Note that Python uses zero-based indexing.
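These indexing forms can be demonstrated on a small example array (the 3×4 array below is our own illustration, not from the article):

```python
import numpy as np

# A small 3x4 demonstration array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
x = np.arange(12).reshape((3, 4))

first_two_rows = x[:2, :]     # rows 0 and 1
cols_1_to_3 = x[:, 1:4]       # columns 1, 2 and 3
every_second_row = x[::2, :]  # rows 0 and 2
```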

The structure of a NumPy array: a view on memory
A NumPy array (also called an "ndarray", short for N-dimensional array) describes memory, using the following attributes:
Data pointer: the memory address of the first byte in the array.
Data type description: the kind of elements contained in the array, for example floating point numbers or integers.
Shape: the shape of the array, for example (10, 10) for a ten-by-ten array, or (5, 5, 5) for a block of data describing a mesh grid of x-, y- and z-coordinates.
Strides: the number of bytes to skip in memory to proceed to the next element. For a (10, 10) array of bytes, for example, the strides may be (10, 1), in other words: proceed one byte to get to the next column and ten bytes to locate the next row.
Flags: which define whether we are allowed to modify the array, whether memory layout is C- or Fortran-contiguous¹, and so forth.
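Each of these attributes can be inspected directly on an array; the short illustration below (our own, matching the (10, 10) byte-array example above) uses the public attributes NumPy exposes for them:

```python
import numpy as np

x = np.zeros((10, 10), dtype=np.uint8)

print(x.ctypes.data)  # data pointer: address of the first byte
print(x.dtype)        # data type description: uint8
print(x.shape)        # (10, 10)
print(x.strides)      # (10, 1) for a C-contiguous 10x10 byte array
print(x.flags)        # writeability, C/Fortran contiguity, etc.
```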
NumPy's strided memory model deserves particular attention, as it provides a powerful way of viewing the same memory in different ways without copying data. For example, consider the following integer array:
# Generate the integers from zero to eight and
# repack them into a 3x3 array
In [1]: x = np.arange(9).reshape((3, 3))
In [2]: x
Out[2]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [3]: x.strides
Out[3]: (24, 8)
On our 64-bit system, the default integer data-type occupies 64 bits, or 8 bytes, in memory. The strides therefore describe skipping 3 integers (24 bytes) in memory to get to the next row and one integer (8 bytes) to get to the next column. We can now generate a view on the same memory where we only examine every second element.
¹ In C, memory is laid out in "row major" order, i.e., rows are stored one after another in memory. In Fortran, columns are stored successively.
In [4]: y = x[::2, ::2]
In [5]: y
Out[5]:
array([[0, 2],
       [6, 8]])
In [6]: y.strides
Out[6]: (48, 16)
The arrays x and y point to the same memory
(i.e., if we modify the values in y we also modify
those in x), but the strides for y have been
changed so that only every second element is
seen along either axis. y is said to be a view on x:
In [7]: y[0, 0] = 100
In [8]: x
Out[8]:
array([[100,   1,   2],
       [  3,   4,   5],
       [  6,   7,   8]])
Views need not be created using slicing only. By modifying strides, for example, an array can be transposed or reshaped at zero cost (no memory needs to be copied). Moreover, the strides, shape and dtype attributes of an array may be specified manually by the user (provided they are all compatible), enabling a plethora of ways in which to interpret the underlying data.
# Transpose the array, using the shorthand "T"
In [9]: xT = x.T
In [10]: xT
Out[10]:
array([[100,   3,   6],
       [  1,   4,   7],
       [  2,   5,   8]])
In [11]: xT.strides
Out[11]: (8, 24)
# Change the shape of the array
In [12]: z = x.reshape((1, 9))
In [13]: z
Out[13]: array([[100, 1, 2, 3, 4, 5, 6, 7, 8]])
In [14]: z.strides
Out[14]: (72, 8)
# i.e., for the two-dimensional z: 72 bytes (9 * 8) to skip
# over a row of 9 eight-byte integers, and 8 bytes to skip
# a single element

# View data as bytes in memory rather than
# 64-bit integers
In [15]: z.dtype = np.dtype('uint8')
In [16]: z
Out[16]:
array([[100, 0, 0, ...,
          0, 0, 0, 0, 0, 0]], dtype=uint8)
In [17]: z.shape
Out[17]: (1, 72)
In [18]: z.strides
Out[18]: (72, 1)
In each of these cases, the resulting array points to
the same memory. The difference lies in the way
the data is interpreted, based on shape, strides
and data-type. Since no data is copied in memory,
these operations are extremely efficient.
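NumPy exposes a utility for constructing such manually-strided views directly: np.lib.stride_tricks.as_strided. The sliding-window sketch below is our own illustration, not from the article; as_strided must be used with care, since incorrect strides can read outside the array's memory:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(6)
s = a.strides[0]  # bytes per element (8 for a 64-bit integer)

# Overlapping windows of length 3: each row starts one element
# further along the same memory, so no data is copied.
windows = as_strided(a, shape=(4, 3), strides=(s, s))
print(windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
```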
Numerical operations on arrays: vectorization
In any scripting language, injudicious use of for-loops may lead to poor performance, particularly in the case where a simple computation is applied to each element of a large data-set.
Grouping these element-wise operations together, a process known as vectorization, allows NumPy to perform such computations much more rapidly.
Suppose we have a vector a and wish to multiply its magnitude by 3. A traditional for-loop approach would look as follows:
In [21]: a = [1, 3, 5]
In [22]: b = [3*x for x in a]
In [23]: b
Out[23]: [3, 9, 15]
The vectorized approach applies this operation to
all elements of an array:
In [24]: a = np.array([1, 3, 5])
In [25]: b = 3 * a
In [26]: b
Out[26]: array([ 3, 9, 15])
Vectorized operations in NumPy are implemented
in C, resulting in a significant speed improvement.
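The speed difference can be measured with the standard timeit module; the snippet below is our own sketch (the absolute numbers depend on the machine, so we only print them):

```python
import timeit
import numpy as np

a_list = list(range(100000))
a_arr = np.arange(100000)

# Time the pure-Python loop against the vectorized expression
t_loop = timeit.timeit(lambda: [3 * x for x in a_list], number=10)
t_vec = timeit.timeit(lambda: 3 * a_arr, number=10)

print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```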
Operations are not restricted to interactions between scalars and arrays. For example, here NumPy performs a fast element-wise subtraction of two arrays:
In [27]: b - a
Out[27]: array([ 2, 6, 10])
When the shapes of the two arguments are not the same, but share a common shape dimension, the operation is broadcast across the array. In other words, NumPy expands the arrays such that the operation becomes viable:
In [28]: m = np.arange(6).reshape((2, 3))
In [29]: m
Out[29]:
array([[0, 1, 2],
       [3, 4, 5]])
In [30]: b + m
Out[30]:
array([[ 3, 10, 17],
       [ 6, 13, 20]])
Refer to “Broadcasting Rules” to see when these
operations are viable.
To save memory, the broadcasted arrays are never physically constructed; NumPy simply uses the appropriate array elements during computation.²
Broadcasting Rules
Before broadcasting two arrays, NumPy verifies that all dimensions are suitably matched. Dimensions match when they are equal, or when either is 1 or None. In the latter case, the dimension of the output array is expanded to the larger of the two.
For example, consider arrays x and y with shapes (2, 4, 3) and (4, 1) respectively. These arrays are to be combined in a broadcasting operation such as z = x + y. We match their dimensions as follows:
x  (2, 4, 3)
y  (   4, 1)
z  (2, 4, 3)
Therefore, the dimensions of these arrays are compatible, and yield an output of shape (2, 4, 3).
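The shape matching described above can be checked directly in NumPy; the arrays below (filled with ones, our own illustration) have the stated shapes:

```python
import numpy as np

x = np.ones((2, 4, 3))
y = np.ones((4, 1))

# y's shape is aligned to the right against x's: (4, 1) vs (4, 3).
# The 4s match, the 1 is expanded to 3, and the leading 2 carries over.
z = x + y
print(z.shape)  # (2, 4, 3)
```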
Vectorization and broadcasting examples
Evaluating Functions
Suppose we wish to evaluate a function f over
a large set of numbers, x, stored as an array.
² Under the hood, this is achieved by using strides of zero.

Using a for-loop, the result is produced as follows:
In [31]: def f(x):
...: return x**2 - 3*x + 4
...:
In [32]: x = np.arange(1e5)
In [33]: y = [f(i) for i in x]
On our machine, this loop executes in approximately 500 milliseconds. Applying the function f on the NumPy array x engages the fast, vectorized loop, which operates on each element individually:
In [34]: y = f(x)
In [35]: y
Out[35]:
array([  4.0000e+00,   2.0000e+00,   2.0000e+00, ...,
         9.9991e+09,   9.9993e+09,   9.9995e+09])
The vectorized computation executes in 1 millisecond.
As the length of the input array x grows, however, execution speed decreases due to the construction of large temporary arrays. For example, the operation above roughly translates to
a = x**2
b = 3*x
c = a - b
fx = c + 4
Most array-based systems do not provide a way to circumvent the creation of these temporaries. With NumPy, the user may choose to perform operations "in-place", in other words, in such a way that no new memory is allocated and all results are stored in the current array.
def g(x):
# Allocate the output array with x-squared
fx = x**2
# In-place operations: no new memory allocated
fx -= 3*x
fx += 4
return fx
Applying g to x takes 600 microseconds; almost
twice as fast as the naive vectorization. Note that
we did not compute 3*x in-place, as it would mod-
ify the original data in x.
This example illustrates the ease with which
NumPy handles vectorized array operations,
without relinquishing control over performance-
critical aspects such as memory allocation.
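As a quick sanity check (our own, not from the article), the in-place variant g produces the same values as the naive vectorized expression f:

```python
import numpy as np

def f(x):
    return x**2 - 3*x + 4

def g(x):
    fx = x**2   # allocates the output array
    fx -= 3*x   # in-place: reuses fx's memory
    fx += 4
    return fx

x = np.arange(1e5)
print(np.allclose(f(x), g(x)))  # True
```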
Note that performance may be boosted even further by using tools such as [Cython], [Theano] or [numexpr], which lessen the load on the memory bus. Cython, a Python-to-C compiler discussed later in this issue, is especially useful in cases where code cannot be vectorized easily.
Finite Differencing
The derivative on a discrete sequence is often
computed using finite differencing. Slicing makes
this operation trivial.
Suppose we have an n + 1 length vector and perform a forward divided difference.
In [36]: x = np.arange(0, 10, 2)
In [37]: x
Out[37]: array([0, 2, 4, 6, 8])
In [38]: y = x**2
In [39]: y
Out[39]: array([ 0,  4, 16, 36, 64])
In [40]: dy_dx = (y[1:]-y[:-1])/(x[1:]-x[:-1])
In [41]: dy_dx
Out[41]: array([ 2,  6, 10, 14])
In this example, y[1:] takes a slice of the y array starting at index 1 and continuing to the end of the array. y[:-1] takes a slice which starts at index 0 and continues to one index short of the end of the array. Thus y[1:] - y[:-1] has the effect of subtracting, from each element in the array, the element directly preceding it. Performing the same differencing on the x array and dividing the two resulting arrays yields the forward divided difference.
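The same adjacent-element differencing is also available as the built-in np.diff, so the forward divided difference can be written more compactly (our own variant of the example above):

```python
import numpy as np

x = np.arange(0, 10, 2)
y = x**2

# np.diff computes a[1:] - a[:-1] along the last axis by default
dy_dx = np.diff(y) / np.diff(x)
print(dy_dx)  # the forward differences: 2, 6, 10, 14
```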
If we assume that the vectors are length n + 2, then calculating the central divided difference is simply a matter of modifying the slices:
In [42]: dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2])
In [43]: dy_dx_c
Out[43]: array([ 4,  8, 12])
In pure Python, these operations would be written using a for-loop. For x containing 1000 elements, the NumPy implementation is 100 times faster.
Creating a grid using broadcasting
Suppose we want to produce a three-dimensional grid of distances R_ijk = √(i² + j² + k²) with i = −100 . . . 99, j = −100 . . . 99, and
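A sketch of how such a grid can be constructed with broadcasting, assuming k spans the same range as i and j (this sketch is ours; np.ogrid returns open, mutually broadcastable index grids):

```python
import numpy as np

# ogrid returns arrays of shapes (200, 1, 1), (1, 200, 1), (1, 1, 200);
# broadcasting expands them against each other without copying data.
i, j, k = np.ogrid[-100:100, -100:100, -100:100]
R = np.sqrt(i**2 + j**2 + k**2)

print(R.shape)           # (200, 200, 200)
print(R[100, 100, 100])  # 0.0 -- the distance at the origin (i = j = k = 0)
```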

References
[IPython] Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing. Computing in Science and Engineering, 9(3):21-29, 2007.