
The NumPy Array: A Structure for Efficient Numerical Computation

TL;DR: The authors show how NumPy arrays enable efficient numerical computation in Python, and how performance can be improved by vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
Abstract: In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.

Summary (2 min read)

Introduction

  • The Python programming language provides a rich set of high-level data structures: lists for enumerating a collection of objects, dictionaries to build hash tables, etc.
  • In the mid-90s, an international team of volunteers started to develop a data-structure for efficient array computation.
  • Underneath the hood, a NumPy array is really just a convenient way of describing one or more blocks of computer memory, so that the numbers represented may be easily manipulated.

Basic usage

  • Throughout the code examples in the article, the authors assume that NumPy is imported as import numpy as np, and code snippets are shown as they appear inside an [IPython] prompt (for example, In [3]: np.__version__ yields Out [3]: '1.4.1'). Elements contained in an array can be indexed using the [] operator.
  • The structure of a NumPy array is a view on memory: a NumPy array (also called an "ndarray", short for N-dimensional array) describes memory using attributes such as its data pointer, the memory address of the first byte in the array.
  • Moreover, the strides, shape and dtype attributes of an array may be specified manually by the user (provided they are all compatible), enabling a plethora of ways in which to interpret the underlying data.
  • When the shapes of the two arguments are not the same, but share a common shape dimension, the operation is broadcast across the array.

Broadcasting Rules

  • Before broadcasting two arrays, NumPy verifies that all dimensions are suitably matched.
  • In the latter case, the dimension of the output array is expanded to the larger of the two.

Evaluating Functions

  • Under the hood, this is achieved by using strides of zero.
  • As the length of the input array x grows, however, execution speed decreases due to the construction of large temporary arrays.
  • Most array-based systems do not provide a way to circumvent the creation of these temporaries.
  • Note that the authors did not compute 3*x in-place, as it would modify the original data in x.
  • This example illustrates the ease with which NumPy handles vectorized array operations, without relinquishing control over performance-critical aspects such as memory allocation.

Sharing data

  • The authors show how NumPy may make use of foreign memory (memory that is not allocated or controlled by NumPy) without copying data.
  • This example illustrates how NumPy is able to interpret any block of memory, as long as the necessary information is provided via an array interface dictionary.
  • Suppose the authors have a file "foo.dat" that contains binary data structured according to the data-type 'dt' introduced above: for each record, the first 8 bytes are a 64-bit unsigned integer time stamp and the next 16 bytes a position comprised of a 64-bit floating point 'x' position and a 64-bit floating point 'y' position.
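The record layout described in the bullet above can be sketched with a structured dtype. This is a minimal sketch: the field names ('time', 'pos', 'x', 'y') are assumptions for illustration, and a temporary file stands in for the article's hypothetical "foo.dat".

```python
import numpy as np
import tempfile, os

# A structured data-type matching the layout described above:
# an 8-byte unsigned-integer time stamp, followed by an 8-byte
# float 'x' position and an 8-byte float 'y' position.
dt = np.dtype([('time', np.uint64), ('pos', [('x', np.float64),
                                             ('y', np.float64)])])

# Each record therefore occupies 8 + 16 = 24 bytes.
assert dt.itemsize == 24

# Round-trip a couple of records through a binary file, standing
# in for the hypothetical "foo.dat":
records = np.zeros(2, dtype=dt)
records['time'] = [10, 20]
records['pos']['x'] = [1.5, 2.5]

fname = os.path.join(tempfile.mkdtemp(), 'foo.dat')
records.tofile(fname)

# np.fromfile reinterprets the raw bytes according to dt,
# without any parsing step.
loaded = np.fromfile(fname, dtype=dt)
print(loaded['time'])    # [10 20]
```

Because the dtype fully describes the byte layout, no conversion or copying beyond the read itself is needed.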

Conclusion

  • The authors show that the N-dimensional array introduced by NumPy is a high-level data structure that facilitates vectorization of for-loops.
  • Its sophisticated memory description allows a wide variety of operations to be performed without copying any data in memory, bringing significant performance gains as data-sets grow large.
  • Arrays of multiple dimensions may be combined using broadcasting to reduce the number of operations performed during numerical computation.
  • When NumPy is skillfully applied, most computation time is spent on vectorized array operations, instead of in Python for-loops (which are often a bottleneck).
  • Further speed improvements are achieved by optimizing compilers, such as Cython, which better exploit cache effects.



HAL Id: inria-00564007
https://hal.inria.fr/inria-00564007
Submitted on 7 Feb 2011
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
The NumPy array: a structure for ecient numerical
computation
Stefan van Der Walt, S. Chris Colbert, Gaël Varoquaux
To cite this version:
Stefan van Der Walt, S. Chris Colbert, Gaël Varoquaux. The NumPy array: a structure for ecient
numerical computation. Computing in Science and Engineering, Institute of Electrical and Electronics
Engineers, 2011, 13 (2), pp.22-30. �10.1109/MCSE.2011.37�. �inria-00564007�

The NumPy array: a structure for efficient
numerical computation
Stéfan van der Walt, Stellenbosch University, South Africa
S. Chris Colbert, Enthought, USA
Gaël Varoquaux, INRIA Saclay, France
This article is published in IEEE Computing in Science and Engineering. Please refer to
the published version if accessible, as it contains editor’s improvements. (c) 2011 IEEE.
In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
Introduction
The Python programming language provides a
rich set of high-level data structures: lists for enu-
merating a collection of objects, dictionaries to
build hash tables, etc. However, these structures
are not ideally suited to high-performance numer-
ical computation.
In the mid-90s, an international team of volun-
teers started to develop a data-structure for effi-
cient array computation. This structure evolved
into what is now known as the N-dimensional
NumPy array.
The NumPy package, which comprises the NumPy array as well as a set of accompanying mathematical functions, has found widespread adoption in academia, national laboratories, and industry, with applications ranging from gaming to space exploration.
Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
A NumPy array is a multidimensional, uniform
collection of elements. An array is character-
ized by the type of elements it contains and by
its shape. For example, a matrix may be repre-
sented as an array of shape (M × N) that contains
numbers, e.g., floating point or complex numbers.
Unlike matrices, NumPy arrays can have any
dimensionality. Furthermore, they may contain
other kinds of elements (or even combinations of
elements), such as booleans or dates.
Underneath the hood, a NumPy array is really just a convenient way of describing one or more blocks of computer memory, so that the numbers represented may be easily manipulated.
Basic usage
Throughout the code examples in the article, we assume that NumPy is imported as follows:
import numpy as np
Code snippets are shown as they appear inside an [IPython] prompt, such as this:
In [3]: np.__version__
Out [3]: '1.4.1'
Elements contained in an array can be indexed us-
ing the [] operator. In addition, parts of an array
may be retrieved using standard Python slicing
of the form start:stop:step. For instance, the
first two rows of an array x are given by x[:2, :]
or columns 1 through 3 by x[:, 1:4]. Similarly,
every second row is given by x[::2, :]. Note
that Python uses zero-based indexing.
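The slicing forms above can be checked directly; a minimal sketch using a small 3x4 example array:

```python
import numpy as np

x = np.arange(12).reshape((3, 4))  # a 3x4 example array

# First two rows:
print(x[:2, :])      # rows 0 and 1

# Columns 1 through 3:
print(x[:, 1:4])     # columns 1, 2 and 3

# Every second row:
print(x[::2, :])     # rows 0 and 2
```

Each of these expressions returns a view, not a copy, as the next section explains.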

The structure of a NumPy array: a view on
memory
A NumPy array (also called an "ndarray", short for N-dimensional array) describes memory, using the following attributes:
Data pointer the memory address of the first byte in the array.
Data type description the kind of elements contained in the array, for example floating point numbers or integers.
Shape the shape of the array, for example (10, 10) for a ten-by-ten array, or (5, 5, 5) for a block of data describing a mesh grid of x-, y- and z-coordinates.
Strides the number of bytes to skip in memory to proceed to the next element. For a (10, 10) array of bytes, for example, the strides may be (10, 1), in other words: proceed one byte to get to the next column and ten bytes to locate the next row.
Flags which define whether we are allowed to modify the array, whether memory layout is C- or Fortran-contiguous¹, and so forth.
NumPy’s strided memory model deserves
particular attention, as it provides a pow-
erful way of viewing the same memory in
different ways without copying data. For
example, consider the following integer array:
# Generate the integers from zero to eight and
# repack them into a 3x3 array
In [1]: x = np.arange(9).reshape((3, 3))
In [2]: x
Out[2]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [3]: x.strides
Out[3]: (24, 8)
On our 64-bit system, the default integer data-type occupies 64 bits, or 8 bytes, in memory. The strides therefore describe skipping 3 integers in memory to get to the next row and one to get to the next column. We can now generate a view on the same memory where we only examine every second element.
¹ In C, memory is laid out in "row major" order, i.e., rows are stored one after another in memory. In Fortran, columns are stored successively.
In [4]: y = x[::2, ::2]
In [5]: y
Out[5]:
array([[0, 2],
[6, 8]])
In [6]: y.strides
Out[6]: (48, 16)
The arrays x and y point to the same memory
(i.e., if we modify the values in y we also modify
those in x), but the strides for y have been
changed so that only every second element is
seen along either axis. y is said to be a view on x:
In [7]: y[0, 0] = 100
In [8]: x
Out[8]:
array([[100, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]])
Views need not be created u sing slicing only.
By modifying str ides, for example, an array
can be transposed or reshaped at zero cost (no
memory needs to be copied). Moreover, the
strides, shape and dtype attributes of an array
may be specified manually by the user (provided
they are all compatible), enabling a plethora of
ways in which to interpret the underlying data.
# Transpose the array, using the shorthand "T"
In [9]: xT = x.T
In [10]: xT
Out[10]:
array([[100, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]])
In [11]: xT.strides
Out[11]: (8, 24)
# Change the shape of the array
In [12]: z = x.reshape((1, 9))
In [13]: z
Out[13]: array([[100, 1, 2, 3, 4, 5, 6, 7, 8]])
In [14]: z.strides
Out[14]: (72, 8)
# i.e., for the two-dimensional z, 9 * 8 bytes
# to skip over a row of 9 eight-byte integers,
# 8 bytes to skip a single element
2 IEEE Computing in Science and Engineering

# View data as bytes in memory rather than
# 64bit integers
In [15]: z.dtype = np.dtype('uint8')
In [16]: z
Out[16]:
array([[100, 0, 0, ...,
0, 0, 0, 0, 0, 0]], dtype=uint8)
In [17]: z.shape
Out[17]: (1, 72)
In [18]: z.strides
Out[18]: (72, 1)
In each of these cases, the resulting array points to
the same memory. The difference lies in the way
the data is interpreted, based on shape, strides
and data-type. Since no data is copied in memory,
these operations are extremely efficient.
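The same strided reinterpretation can be reproduced explicitly with np.lib.stride_tricks.as_strided, which builds a view from a manually chosen shape and strides. This is a sketch for illustration only: as_strided performs no safety checks, and inconsistent values can read memory out of bounds.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(9).reshape((3, 3))   # default integer dtype
step = x.itemsize                  # bytes per element (8 on a 64-bit system)

# x has strides (3*step, step); doubling both reproduces x[::2, ::2]:
y = as_strided(x, shape=(2, 2), strides=(6 * step, 2 * step))
print(y)            # [[0 2]
                    #  [6 8]]

# y shares memory with x, so writes propagate back:
y[0, 0] = 100
print(x[0, 0])      # 100
```

Slicing (x[::2, ::2]) computes exactly these strides for you; as_strided is only needed when no slicing expression produces the desired view.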
Numerical operations on arrays: vectorization
In any scripting language, injudicious use of for-loops may lead to poor performance, particularly in the case where a simple computation is applied to each element of a large data-set.
Grouping these element-wise operations together,
a process known as vectorization, allows NumPy
to perform such computations much more rapidly.
Suppose we have a vector a and wish to
multiply its magnitude by 3. A traditional
for-loop approach would look as follows:
In [21]: a = [1, 3, 5]
In [22]: b = [3*x for x in a]
In [23]: b
Out[23]: [3, 9, 15]
The vectorized approach applies this operation to
all elements of an array:
In [24]: a = np.array([1, 3, 5])
In [25]: b = 3 * a
In [26]: b
Out[26]: array([ 3, 9, 15])
Vectorized operations in NumPy are implemented
in C, resulting in a significant speed improvement.
Operations are not restricted to interactions between scalars and arrays. For example, here NumPy performs a fast element-wise subtraction of two arrays:
In [27]: b - a
Out [27]: array([2, 6, 10])
When the shapes of the two arguments are
not the same, but share a common shape di-
mension, the operation is broadcast across the
array. In other words, NumPy expands the
arrays such that the operation becomes viable:
In [28]: m = np.arange(6).reshape((2,3))
In [29]: m
Out [29]:
array([[0, 1, 2],
[3, 4, 5]])
In [30]: b + m
Out[30]:
array([[ 3, 10, 17],
[ 6, 13, 20]])
Refer to “Broadcasting Rules” to see when these
operations are viable.
To save memory, the broadcasted arrays are never physically constructed; NumPy simply uses the appropriate array elements during computation.²
Broadcasting Rules
Before broadcasting two arrays, NumPy verifies
that all dimensions are suitably matched. Dimen-
sions match when they are equal, or when either
is 1 or None. In the latter case, the dimension of
the output array is expanded to the larger of the
two.
For example, consider arrays x and y with shapes (2, 4, 3) and (4, 1) respectively. These arrays are to be combined in a broadcasting operation such as z = x + y. We match their dimensions as follows:
x (2, 4, 3)
y (   4, 1)
z (2, 4, 3)
Therefore, the dimensions of these arrays are compatible, and yield an output of shape (2, 4, 3).
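The shape matching described above can be verified directly. As a sketch, np.broadcast_shapes applies exactly this rule without building any arrays (note it was added in NumPy 1.20, well after the version used in the article):

```python
import numpy as np

x = np.zeros((2, 4, 3))
y = np.zeros((4, 1))

# Dimensions are matched from the right: 3 vs 1 (1 broadcasts),
# 4 vs 4 (equal), and x's leading 2 is carried through.
z = x + y
print(z.shape)                                   # (2, 4, 3)

# The rule itself, applied to bare shapes:
print(np.broadcast_shapes((2, 4, 3), (4, 1)))    # (2, 4, 3)
```

Incompatible shapes, such as (2, 4, 3) and (2, 1), raise a ValueError instead.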
Vectorization and broadcasting examples
Evaluating Functions
Suppose we wish to evaluate a function f over
a large set of numbers, x, stored as an array.
² Under the hood, this is achieved by using strides of zero.

Using a for-loop, the result is produced as follows:
In [31]: def f(x):
...: return x**2 - 3*x + 4
...:
In [32]: x = np.arange(1e5)
In [33]: y = [f(i) for i in x]
On our machine, this loop executes in approximately 500 milliseconds. Applying the function f on the NumPy array x engages the fast, vectorized loop, which operates on each element individually:
In [34]: y = f(x)
In [35]: y
Out[35]:
array([ 4.0000e+0, 2.0000e+0, 2.0000e+0, ...,
9.9991e+9, 9.9993e+9, 9.9995e+9])
The vectorized computation executes in 1 millisecond.
As the length of the input array x grows, however, execution speed decreases due to the construction of large temporary arrays. For example, the operation above roughly translates to
a = x**2
b = 3*x
c = a - b
fx = c + 4
Most array-based systems do not provide a
way to circumvent the creation of these tempo-
raries. With NumPy, the user may choose to
perform operations “in-place”—in other words,
in such a way that no new memory is allocated
and all results are stored in the current array.
def g(x):
    # Allocate the output array with x-squared
    fx = x**2
    # In-place operations: no new memory allocated
    fx -= 3*x
    fx += 4
    return fx
Applying g to x takes 600 microseconds; almost
twice as fast as the naive vectorization. Note that
we did not compute 3*x in-place, as it would mod-
ify the original data in x.
This example illustrates the ease with which
NumPy handles vectorized array operations,
without relinquishing control over performance-
critical aspects such as memory allocation.
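The in-place operators above still allocate one temporary for 3*x. As a further sketch (not from the article), most NumPy ufuncs accept an out= argument that writes results into an existing array, which gives the same kind of control over allocation; the name g2 is hypothetical:

```python
import numpy as np

def g2(x):
    # Allocate the output array once, holding x-squared
    fx = x ** 2
    tmp = np.multiply(x, 3)          # the single remaining temporary
    np.subtract(fx, tmp, out=fx)     # fx -= 3*x, written in place
    np.add(fx, 4, out=fx)            # fx += 4, written in place
    return fx

x = np.arange(5, dtype=float)
print(g2(x))    # [4. 2. 2. 4. 8.]
```

As in the article's version, x itself is never modified, since 3*x is computed into a separate buffer.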
Note that performance may be boosted even further by using tools such as [Cython], [Theano] or [numexpr], which lessen the load on the memory bus. Cython, a Python to C compiler discussed later in this issue, is especially useful in cases where code cannot be vectorized easily.
Finite Differencing
The derivative on a discrete sequence is often
computed using finite differencing. Slicing makes
this operation trivial.
Suppose we have an n + 1 length vector and per-
form a forward divided difference.
In [36]: x = np.arange(0, 10, 2)
In [37]: x
Out[37]: array([ 0, 2, 4, 6, 8])
In [38]: y = x**2
In [39]: y
Out[39]: array([ 0, 4, 16, 36, 64])
In [40]: dy_dx = (y[1:]-y[:-1])/(x[1:]-x[:-1])
In [41]: dy_dx
Out[41]: array([ 2,  6, 10, 14])
In this example, y[1:] takes a slice of the y ar-
ray starting at index 1 and continuing to the end
of the array. y[:-1] takes a slice which starts at
index 0 and continues to one index short of the
end of the array. Thus y[1:] - y[:-1] has the
effect of subtracting, from each element in the ar-
ray, the element directly preceding it. Performing
the same differencing on the x array and dividing
the two resulting arrays yields the forward divided
difference.
If we assume that the vectors are length n +
2, then calculating the central divided differ-
ence is simply a matter of modifying the slices:
In [42]: dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2])
In [43]: dy_dx_c
Out[43]: array([ 4,  8, 12])
In pure Python, these operations would be written using a for-loop. For x containing 1000 elements, the NumPy implementation is 100 times faster.
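NumPy also bundles the forward differencing of adjacent elements into np.diff; a sketch equivalent to the sliced expressions above:

```python
import numpy as np

x = np.arange(0, 10, 2)
y = x ** 2

# Forward divided difference, equivalent to
# (y[1:] - y[:-1]) / (x[1:] - x[:-1]):
dy_dx = np.diff(y) / np.diff(x)
print(dy_dx)
```

np.diff(y) subtracts each element from its successor, so the quotient of the two diffs is exactly the forward divided difference.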
Creating a grid using broadcasting
Suppose we want to produce a three-dimensional grid of distances Rijk = √(i² + j² + k²) with i = −100…99, j = −100…99, and k = −100…99.
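The extraction of the article is truncated at this point, but the grid it describes can be sketched with np.ogrid, which returns open (mutually broadcastable) index arrays, so that the full 200x200x200 grid is only materialized by the final arithmetic:

```python
import numpy as np

# Open grids: i has shape (200, 1, 1), j (1, 200, 1), k (1, 1, 200)
i, j, k = np.ogrid[-100:100, -100:100, -100:100]

# Broadcasting expands the three thin axes into a full grid of
# distances from the origin.
R = np.sqrt(i ** 2 + j ** 2 + k ** 2)
print(R.shape)            # (200, 200, 200)
print(R[100, 100, 100])   # 0.0, the distance at i = j = k = 0
```

Only the three small index arrays and the final result are allocated; no intermediate 200x200x200 index grids are ever built.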


References
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.

Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "The NumPy array: a structure for efficient numerical computation"?

The N-dimensional array introduced in this paper is a high-level data structure that facilitates vectorization of for-loops.