Indexing vectors and arrays in Python

| categories: basic | tags:

Matlab post There are times where you have a lot of data in a vector or array and you want to extract a portion of the data for some analysis. For example, maybe you want to plot column 1 vs column 2, or you want the integral of data between x = 4 and x = 6, but your vector covers 0 < x < 10. Indexing is the way to do these things.

A key point to remember is that in python array/vector indices start at 0. Unlike Matlab, which uses parentheses to index a array, we use brackets in python.

import numpy as np

x = np.linspace(-np.pi, np.pi, 10)
print x

print x[0]  # first element
print x[2]  # third element
print x[-1] # last element
print x[-2] # second to last element
>>> >>> [-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585  0.34906585
  1.04719755  1.74532925  2.44346095  3.14159265]
>>> -3.14159265359
-1.74532925199
3.14159265359
2.44346095279

We can select a range of elements too. The syntax a:b extracts the a^{th} to (b-1)^{th} elements. The syntax a:b:n starts at a, skips nelements up to the index b.

print x[1:4]     # second to fourth element. Element 5 is not included
print x[0:-1:2]  # every other element
print x[:]       # print the whole vector
print x[-1:0:-1] # reverse the vector!
[-2.44346095 -1.74532925 -1.04719755]
[-3.14159265 -1.74532925 -0.34906585  1.04719755  2.44346095]
[-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585  0.34906585
  1.04719755  1.74532925  2.44346095  3.14159265]
[ 3.14159265  2.44346095  1.74532925  1.04719755  0.34906585 -0.34906585
 -1.04719755 -1.74532925 -2.44346095]

Suppose we want the part of the vector where x > 2. We could do that by inspection, but there is a better way. We can create a mask of boolean (0 or 1) values that specify whether x > 2 or not, and then use the mask as an index.

print x[x > 2]
[ 2.44346095  3.14159265]

You can use this to analyze subsections of data, for example to integrate the function y = sin(x) where x > 2.

y = np.sin(x)

print np.trapz( x[x > 2], y[x > 2])
>>> -1.79500162881

1 2d arrays

In 2d arrays, we use row, column notation. We use a : to indicate all rows or all columns.

a = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9]])

print a[0, 0]
print a[-1, -1]

print a[0, :] # row one
print a[:, 0] # column one
print a[:]
... >>> >>> 1
9
>>> [1 2 3]
[1 4 7]
[[1 2 3]
 [4 5 6]
 [7 8 9]]

2 Using indexing to assign values to rows and columns

b = np.zeros((3, 3))
print b

b[:, 0] = [1, 2, 3] # set column 0
b[2, 2] = 12        # set a single element
print b

b[2] = 6  # sets everything in row 2 to 6!
print b
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
>>> >>> >>> [[  1.   0.   0.]
 [  2.   0.   0.]
 [  3.   0.  12.]]
>>> >>> [[ 1.  0.  0.]
 [ 2.  0.  0.]
 [ 6.  6.  6.]]

Python does not have the linear assignment method like Matlab does. You can achieve something like that as follows. We flatten the array to 1D, do the linear assignment, and reshape the result back to the 2D array.

c = b.flatten()
c[2] = 34
b[:] = c.reshape(b.shape)
print b
>>> >>> [[  1.   0.  34.]
 [  2.   0.   0.]
 [  6.   6.   6.]]

3 3D arrays

The 3d array is like book of 2D matrices. Each page has a 2D matrix on it. think about the indexing like this: (row, column, page)

M = np.random.uniform(size=(3,3,3))  # a 3x3x3 array
print M
[[[ 0.78557795  0.36454381  0.96090072]
  [ 0.76133373  0.03250485  0.08517174]
  [ 0.96007909  0.08654002  0.29693648]]

 [[ 0.58270738  0.60656083  0.47703339]
  [ 0.62551477  0.62244626  0.11030327]
  [ 0.2048839   0.83081982  0.83660668]]

 [[ 0.12489176  0.20783996  0.38481792]
  [ 0.05234762  0.03989146  0.09731516]
  [ 0.67427208  0.51793637  0.89016255]]]
print M[:, :, 0]  # 2d array on page 0
print M[:, 0, 0]  # column 0 on page 0
print M[1, :, 2]  # row 1 on page 2
[[ 0.78557795  0.76133373  0.96007909]
 [ 0.58270738  0.62551477  0.2048839 ]
 [ 0.12489176  0.05234762  0.67427208]]
[ 0.78557795  0.58270738  0.12489176]
[ 0.47703339  0.11030327  0.83660668]

4 Summary

The most common place to use indexing is probably when a function returns an array with the independent variable in column 1 and solution in column 2, and you want to plot the solution. Second is when you want to analyze one part of the solution. There are also applications in numerical methods, for example in assigning values to the elements of a matrix or vector.

Copyright (C) 2013 by John Kitchin. See the License for information about copying.

org-mode source

Discuss on Twitter