
The NumPy Array: A Structure for Efficient Numerical Computation

TL;DR: The authors show how NumPy arrays enable efficient numerical computation in Python, and how performance can be improved by vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
Abstract: In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.

Summary (2 min read)

Introduction

  • The Python programming language provides a rich set of high-level data structures: lists for enumerating a collection of objects, dictionaries to build hash tables, etc.
  • In the mid-90s, an international team of volunteers started to develop a data-structure for efficient array computation.
  • Underneath the hood, a NumPy array is really just a convenient way of describing one or more blocks of computer memory, so that the numbers represented may be easily manipulated.

Basic usage

  • Throughout the code examples in the article, the authors assume that NumPy is imported as import numpy as np, and code snippets are shown as they appear inside an [IPython] prompt (for example, In [3]: np.__version__ yields Out [3]: '1.4.1'). Elements contained in an array can be indexed using the [] operator.
  • The structure of a NumPy array is a view on memory: a NumPy array (also called an "ndarray", short for N-dimensional array) describes memory using attributes such as its data pointer, the memory address of the first byte in the array.
  • Moreover, the strides, shape and dtype attributes of an array may be specified manually by the user (provided they are all compatible), enabling a plethora of ways in which to interpret the underlying data.
  • When the shapes of the two arguments are not the same, but share a common shape dimension, the operation is broadcast across the array.

Broadcasting Rules

  • Before broadcasting two arrays, NumPy verifies that all dimensions are suitably matched.
  • In the latter case, the dimension of the output array is expanded to the larger of the two.

Evaluating Functions

  • Under the hood, this is achieved by using strides of zero.
  • As the length of the input array x grows, however, execution speed decreases due to the construction of large temporary arrays.
  • Most array-based systems do not provide a way to circumvent the creation of these temporaries.
  • Note that the authors did not compute 3*x in-place, as it would modify the original data in x.
  • This example illustrates the ease with which NumPy handles vectorized array operations, without relinquishing control over performance-critical aspects such as memory allocation.

Sharing data

  • The authors show how NumPy may make use of foreign memory (memory that is not allocated or controlled by NumPy) without copying data.
  • This example illustrates how NumPy is able to interpret any block of memory, as long as the necessary information is provided via an array interface dictionary.
  • Suppose the authors have a file "foo.dat" that contains binary data structured according to the data-type 'dt' introduced above: for each record, the first 8 bytes are a 64-bit unsigned integer time stamp and the next 16 bytes a position comprised of a 64-bit floating point 'x' position and a 64-bit floating point 'y' position.
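The record layout described in the bullet above can be sketched with a structured dtype. This is a minimal sketch: the field names ('time', 'pos', 'x', 'y') are assumptions for illustration, and a temporary file stands in for the article's hypothetical "foo.dat".

```python
import numpy as np
import tempfile, os

# A structured data-type matching the layout described above:
# an 8-byte unsigned-integer time stamp, followed by an 8-byte
# float 'x' position and an 8-byte float 'y' position.
dt = np.dtype([('time', np.uint64), ('pos', [('x', np.float64),
                                             ('y', np.float64)])])

# Each record therefore occupies 8 + 16 = 24 bytes.
assert dt.itemsize == 24

# Round-trip a couple of records through a binary file, standing
# in for the hypothetical "foo.dat":
records = np.zeros(2, dtype=dt)
records['time'] = [10, 20]
records['pos']['x'] = [1.5, 2.5]

fname = os.path.join(tempfile.mkdtemp(), 'foo.dat')
records.tofile(fname)

# np.fromfile reinterprets the raw bytes according to dt,
# without any parsing step.
loaded = np.fromfile(fname, dtype=dt)
print(loaded['time'])    # [10 20]
```

Because the dtype fully describes the byte layout, no conversion or copying beyond the read itself is needed.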

Conclusion

  • The authors show that the N-dimensional array introduced by NumPy is a high-level data structure that facilitates vectorization of for-loops.
  • Its sophisticated memory description allows a wide variety of operations to be performed without copying any data in memory, bringing significant performance gains as data-sets grow large.
  • Arrays of multiple dimensions may be combined using broadcasting to reduce the number of operations performed during numerical computation.
  • When NumPy is skillfully applied, most computation time is spent on vectorized array operations, instead of in Python for-loops (which are often a bottleneck).
  • Further speed improvements are achieved by optimizing compilers, such as Cython, which better exploit cache effects.



HAL Id: inria-00564007
https://hal.inria.fr/inria-00564007
Submitted on 7 Feb 2011
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
The NumPy array: a structure for ecient numerical
computation
Stefan van Der Walt, S. Chris Colbert, Gaël Varoquaux
To cite this version:
Stefan van Der Walt, S. Chris Colbert, Gaël Varoquaux. The NumPy array: a structure for ecient
numerical computation. Computing in Science and Engineering, Institute of Electrical and Electronics
Engineers, 2011, 13 (2), pp.22-30. �10.1109/MCSE.2011.37�. �inria-00564007�

The NumPy array: a structure for efficient
numerical computation
Stéfan van der Walt, Stellenbosch University, South Africa
S. Chris Colbert, Enthought, USA
Gaël Varoquaux, INRIA Saclay, France
This article is published in IEEE Computing in Science and Engineering. Please refer to
the published version if accessible, as it contains editor’s improvements. (c) 2011 IEEE.
In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.
We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
Introduction
The Python programming language provides a
rich set of high-level data structures: lists for enu-
merating a collection of objects, dictionaries to
build hash tables, etc. However, these structures
are not ideally suited to high-performance numer-
ical computation.
In the mid-90s, an international team of volun-
teers started to develop a data-structure for effi-
cient array computation. This structure evolved
into what is now known as the N-dimensional
NumPy array.
The NumPy package, which comprises the NumPy array as well as a set of accompanying mathematical functions, has found widespread adoption in academia, national laboratories, and industry, with applications ranging from gaming to space exploration.
Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
A NumPy array is a multidimensional, uniform
collection of elements. An array is character-
ized by the type of elements it contains and by
its shape. For example, a matrix may be repre-
sented as an array of shape (M × N) that contains
numbers, e.g., floating point or complex numbers.
Unlike matrices, NumPy arrays can have any
dimensionality. Furthermore, they may contain
other kinds of elements (or even combinations of
elements), such as booleans or dates.
Underneath the hood, a NumPy array is really just a convenient way of describing one or more blocks of computer memory, so that the numbers represented may be easily manipulated.
Basic usage
Throughout the code examples in the article, we assume that NumPy is imported as follows:
import numpy as np
Code snippets are shown as they appear inside an [IPython] prompt, such as this:
In [3]: np.__version__
Out [3]: '1.4.1'
Elements contained in an array can be indexed us-
ing the [] operator. In addition, parts of an array
may be retrieved using standard Python slicing
of the form start:stop:step. For instance, the
first two rows of an array x are given by x[:2, :]
or columns 1 through 3 by x[:, 1:4]. Similarly,
every second row is given by x[::2, :]. Note
that Python uses zero-based indexing.
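The slicing forms above can be checked directly; a minimal sketch using a small 3x4 example array:

```python
import numpy as np

x = np.arange(12).reshape((3, 4))  # a 3x4 example array

# First two rows:
print(x[:2, :])      # rows 0 and 1

# Columns 1 through 3:
print(x[:, 1:4])     # columns 1, 2 and 3

# Every second row:
print(x[::2, :])     # rows 0 and 2
```

Each of these expressions returns a view, not a copy, as the next section explains.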

The structure of a NumPy array: a view on
memory
A NumPy array (also called an "ndarray", short for N-dimensional array) describes memory, using the following attributes:
Data pointer the memory address of the first byte in the array.
Data type description the kind of elements contained in the array, for example floating point numbers or integers.
Shape the shape of the array, for example (10, 10) for a ten-by-ten array, or (5, 5, 5) for a block of data describing a mesh grid of x-, y- and z-coordinates.
Strides the number of bytes to skip in memory to proceed to the next element. For a (10, 10) array of bytes, for example, the strides may be (10, 1), in other words: proceed one byte to get to the next column and ten bytes to locate the next row.
Flags which define whether we are allowed to modify the array, whether memory layout is C- or Fortran-contiguous¹, and so forth.
NumPy’s strided memory model deserves
particular attention, as it provides a pow-
erful way of viewing the same memory in
different ways without copying data. For
example, consider the following integer array:
# Generate the integers from zero to eight and
# repack them into a 3x3 array
In [1]: x = np.arange(9).reshape((3, 3))
In [2]: x
Out[2]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [3]: x.strides
Out[3]: (24, 8)
On our 64-bit system, the default integer data-type occupies 64 bits, or 8 bytes, in memory. The strides therefore describe skipping 3 integers in memory to get to the next row and one to get to the next column. We can now generate a view on the same memory where we only examine every second element.
¹ In C, memory is laid out in "row major" order, i.e., rows are stored one after another in memory. In Fortran, columns are stored successively.
In [4]: y = x[::2, ::2]
In [5]: y
Out[5]:
array([[0, 2],
[6, 8]])
In [6]: y.strides
Out[6]: (48, 16)
The arrays x and y point to the same memory
(i.e., if we modify the values in y we also modify
those in x), but the strides for y have been
changed so that only every second element is
seen along either axis. y is said to be a view on x:
In [7]: y[0, 0] = 100
In [8]: x
Out[8]:
array([[100, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]])
Views need not be created u sing slicing only.
By modifying str ides, for example, an array
can be transposed or reshaped at zero cost (no
memory needs to be copied). Moreover, the
strides, shape and dtype attributes of an array
may be specified manually by the user (provided
they are all compatible), enabling a plethora of
ways in which to interpret the underlying data.
# Transpose the array, using the shorthand "T"
In [9]: xT = x.T
In [10]: xT
Out[10]:
array([[100, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]])
In [11]: xT.strides
Out[11]: (8, 24)
# Change the shape of the array
In [12]: z = x.reshape((1, 9))
In [13]: z
Out[13]: array([[100, 1, 2, 3, 4, 5, 6, 7, 8]])
In [14]: z.strides
Out[14]: (72, 8)
# i.e., for the two-dimensional z, 9 * 8 bytes
# to skip over a row of 9 eight-byte integers,
# 8 bytes to skip a single element
2 IEEE Computing in Science and Engineering

# View data as bytes in memory rather than
# 64bit integers
In [15]: z.dtype = np.dtype('uint8')
In [16]: z
Out[16]:
array([[100, 0, 0, ...,
0, 0, 0, 0, 0, 0]], dtype=uint8)
In [17]: z.shape
Out[17]: (1, 72)
In [18]: z.strides
Out[18]: (72, 1)
In each of these cases, the resulting array points to
the same memory. The difference lies in the way
the data is interpreted, based on shape, strides
and data-type. Since no data is copied in memory,
these operations are extremely efficient.
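The same strided reinterpretation can be reproduced explicitly with np.lib.stride_tricks.as_strided, which builds a view from a manually chosen shape and strides. This is a sketch for illustration only: as_strided performs no safety checks, and inconsistent values can read memory out of bounds.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(9).reshape((3, 3))   # default integer dtype
step = x.itemsize                  # bytes per element (8 on a 64-bit system)

# x has strides (3*step, step); doubling both reproduces x[::2, ::2]:
y = as_strided(x, shape=(2, 2), strides=(6 * step, 2 * step))
print(y)            # [[0 2]
                    #  [6 8]]

# y shares memory with x, so writes propagate back:
y[0, 0] = 100
print(x[0, 0])      # 100
```

Slicing (x[::2, ::2]) computes exactly these strides for you; as_strided is only needed when no slicing expression produces the desired view.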
Numerical operations on arrays: vectorization
In any scripting language, injudicious use of for-loops may lead to poor performance, particularly in the case where a simple computation is applied to each element of a large data-set.
Grouping these element-wise operations together,
a process known as vectorization, allows NumPy
to perform such computations much more rapidly.
Suppose we have a vector a and wish to
multiply its magnitude by 3. A traditional
for-loop approach would look as follows:
In [21]: a = [1, 3, 5]
In [22]: b = [3*x for x in a]
In [23]: b
Out[23]: [3, 9, 15]
The vectorized approach applies this operation to
all elements of an array:
In [24]: a = np.array([1, 3, 5])
In [25]: b = 3 * a
In [26]: b
Out[26]: array([ 3, 9, 15])
Vectorized operations in NumPy are implemented
in C, resulting in a significant speed improvement.
Operations are not restricted to interactions between scalars and arrays. For example, here NumPy performs a fast element-wise subtraction of two arrays:
In [27]: b - a
Out [27]: array([2, 6, 10])
When the shapes of the two arguments are
not the same, but share a common shape di-
mension, the operation is broadcast across the
array. In other words, NumPy expands the
arrays such that the operation becomes viable:
In [28]: m = np.arange(6).reshape((2,3))
In [29]: m
Out [29]:
array([[0, 1, 2],
[3, 4, 5]])
In [30]: b + m
Out[30]:
array([[ 3, 10, 17],
[ 6, 13, 20]])
Refer to “Broadcasting Rules” to see when these
operations are viable.
To save memory, the broadcasted arrays are never physically constructed; NumPy simply uses the appropriate array elements during computation.²
Broadcasting Rules
Before broadcasting two arrays, NumPy verifies
that all dimensions are suitably matched. Dimen-
sions match when they are equal, or when either
is 1 or None. In the latter case, the dimension of
the output array is expanded to the larger of the
two.
For example, consider arrays x and y with shapes (2, 4, 3) and (4, 1) respectively. These arrays are to be combined in a broadcasting operation such as z = x + y. We match their dimensions as follows:
x (2, 4, 3)
y (   4, 1)
z (2, 4, 3)
Therefore, the dimensions of these arrays are compatible, and yield an output of shape (2, 4, 3).
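The shape matching described above can be verified directly. As a sketch, np.broadcast_shapes applies exactly this rule without building any arrays (note it was added in NumPy 1.20, well after the version used in the article):

```python
import numpy as np

x = np.zeros((2, 4, 3))
y = np.zeros((4, 1))

# Dimensions are matched from the right: 3 vs 1 (1 broadcasts),
# 4 vs 4 (equal), and x's leading 2 is carried through.
z = x + y
print(z.shape)                                   # (2, 4, 3)

# The rule itself, applied to bare shapes:
print(np.broadcast_shapes((2, 4, 3), (4, 1)))    # (2, 4, 3)
```

Incompatible shapes, such as (2, 4, 3) and (2, 1), raise a ValueError instead.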
Vectorization and broadcasting examples
Evaluating Functions
Suppose we wish to evaluate a function f over
a large set of numbers, x, stored as an array.
² Under the hood, this is achieved by using strides of zero.

Using a for-loop, the result is produced as follows:
In [31]: def f(x):
...: return x**2 - 3*x + 4
...:
In [32]: x = np.arange(1e5)
In [33]: y = [f(i) for i in x]
On our machine, this loop executes in approximately 500 milliseconds. Applying the function f on the NumPy array x engages the fast, vectorized loop, which operates on each element individually:
In [34]: y = f(x)
In [35]: y
Out[35]:
array([ 4.0000e+0, 2.0000e+0, 2.0000e+0, ...,
9.9991e+9, 9.9993e+9, 9.9995e+9])
The vectorized computation executes in 1 millisecond.
As the length of the input array x grows, however, execution speed decreases due to the construction of large temporary arrays. For example, the operation above roughly translates to
a = x**2
b = 3*x
c = a - b
fx = c + 4
Most array-based systems do not provide a
way to circumvent the creation of these tempo-
raries. With NumPy, the user may choose to
perform operations “in-place”—in other words,
in such a way that no new memory is allocated
and all results are stored in the current array.
def g(x):
    # Allocate the output array with x-squared
    fx = x**2
    # In-place operations: no new memory allocated
    fx -= 3*x
    fx += 4
    return fx
Applying g to x takes 600 microseconds; almost
twice as fast as the naive vectorization. Note that
we did not compute 3*x in-place, as it would mod-
ify the original data in x.
This example illustrates the ease with which
NumPy handles vectorized array operations,
without relinquishing control over performance-
critical aspects such as memory allocation.
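The in-place operators above still allocate one temporary for 3*x. As a further sketch (not from the article), most NumPy ufuncs accept an out= argument that writes results into an existing array, which gives the same kind of control over allocation; the name g2 is hypothetical:

```python
import numpy as np

def g2(x):
    # Allocate the output array once, holding x-squared
    fx = x ** 2
    tmp = np.multiply(x, 3)          # the single remaining temporary
    np.subtract(fx, tmp, out=fx)     # fx -= 3*x, written in place
    np.add(fx, 4, out=fx)            # fx += 4, written in place
    return fx

x = np.arange(5, dtype=float)
print(g2(x))    # [4. 2. 2. 4. 8.]
```

As in the article's version, x itself is never modified, since 3*x is computed into a separate buffer.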
Note that performance may be boosted even further by using tools such as [Cython], [Theano] or [numexpr], which lessen the load on the memory bus. Cython, a Python to C compiler discussed later in this issue, is especially useful in cases where code cannot be vectorized easily.
Finite Differencing
The derivative on a discrete sequence is often
computed using finite differencing. Slicing makes
this operation trivial.
Suppose we have an n + 1 length vector and per-
form a forward divided difference.
In [36]: x = np.arange(0, 10, 2)
In [37]: x
Out[37]: array([ 0, 2, 4, 6, 8])
In [38]: y = x**2
In [39]: y
Out[39]: array([ 0, 4, 16, 36, 64])
In [40]: dy_dx = (y[1:]-y[:-1])/(x[1:]-x[:-1])
In [41]: dy_dx
Out[41]: array([ 2,  6, 10, 14])
In this example, y[1:] takes a slice of the y ar-
ray starting at index 1 and continuing to the end
of the array. y[:-1] takes a slice which starts at
index 0 and continues to one index short of the
end of the array. Thus y[1:] - y[:-1] has the
effect of subtracting, from each element in the ar-
ray, the element directly preceding it. Performing
the same differencing on the x array and dividing
the two resulting arrays yields the forward divided
difference.
If we assume that the vectors are length n +
2, then calculating the central divided differ-
ence is simply a matter of modifying the slices:
In [42]: dy_dx_c = (y[2:]-y[:-2])/(x[2:]-x[:-2])
In [43]: dy_dx_c
Out[43]: array([ 4,  8, 12])
In pure Python, these operations would be written using a for-loop. For x containing 1000 elements, the NumPy implementation is 100 times faster.
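NumPy also bundles the forward differencing of adjacent elements into np.diff; a sketch equivalent to the sliced expressions above:

```python
import numpy as np

x = np.arange(0, 10, 2)
y = x ** 2

# Forward divided difference, equivalent to
# (y[1:] - y[:-1]) / (x[1:] - x[:-1]):
dy_dx = np.diff(y) / np.diff(x)
print(dy_dx)
```

np.diff(y) subtracts each element from its successor, so the quotient of the two diffs is exactly the forward divided difference.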
Creating a grid using broadcasting
Suppose we want to produce a three-dimensional grid of distances Rijk = √(i² + j² + k²) with i = −100…99, j = −100…99, and k = −100…99.
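The extraction of the article is truncated at this point, but the grid it describes can be sketched with np.ogrid, which returns open (mutually broadcastable) index arrays, so that the full 200x200x200 grid is only materialized by the final arithmetic:

```python
import numpy as np

# Open grids: i has shape (200, 1, 1), j (1, 200, 1), k (1, 1, 200)
i, j, k = np.ogrid[-100:100, -100:100, -100:100]

# Broadcasting expands the three thin axes into a full grid of
# distances from the origin.
R = np.sqrt(i ** 2 + j ** 2 + k ** 2)
print(R.shape)            # (200, 200, 200)
print(R[100, 100, 100])   # 0.0, the distance at i = j = k = 0
```

Only the three small index arrays and the final result are allocated; no intermediate 200x200x200 index grids are ever built.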


References
Journal ArticleDOI
01 May 2007
TL;DR: The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.
Abstract: Python offers basic facilities for interactive work and a comprehensive library on top of which more sophisticated systems can be built. The IPython project provides an enhanced interactive environment that includes, among other features, support for data visualization and facilities for distributed and parallel computation.

Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "The NumPy array: a structure for efficient numerical computation"?

The N-dimensional array introduced in this paper is a high-level data structure that facilitates vectorization of for-loops.