In [1]:
%load_ext load_style
%load_style talk.css

Numpy

In [2]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)
Out[2]:
In [3]:
from IPython.display import Image, HTML

Numpy is the fundamental building block of the so-called Python Scientific Stack

  • It provides a special data structure, the numpy ndarray (N - dimensional array) which allows the representation of arrays of numerical values, integers, reals, complex

  • Numpy arrays by default can only hold objects of the same type (e.g. cannot mix integers and floats))

  • It also provides with random number capability, some convenience simple statistical functions (mean, std, ...) and some essential linear algebra operations

In [4]:
HTML("<iframe src='http://numpy.org/' width=1000 height=500></iframe>")
Out[4]:

The conventional way to import numpy is

import numpy as np
In [5]:
import numpy as np
In [6]:
import matplotlib.pyplot as plt
%matplotlib inline

Numpy basics

Constructing an numpy array

In [7]:
x = [1,2,3,4] # x is a regular python list
In [8]:
type(x)
Out[8]:
list
In [9]:
x = np.array(x)
In [10]:
type(x)
Out[10]:
numpy.ndarray
In [11]:
a = np.array(range(10))
In [12]:
a
Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [13]:
x = [[1,2,3],[4,5,6]] # nested lists
In [14]:
x = np.array(x)
In [15]:
x
Out[15]:
array([[1, 2, 3],
       [4, 5, 6]])
In [16]:
x.shape
Out[16]:
(2, 3)

The shape attribute is a tuple containing the dimensions along each axis (0 = rows, 1 = columns, ...)

In [17]:
x = np.arange(1000).reshape((10,10,10))
In [18]:
x.shape
Out[18]:
(10, 10, 10)

Be careful: keep in mind references Vs. copies of arrays!

In [19]:
a
Out[19]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [20]:
x = a
In [21]:
x
Out[21]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [22]:
a[2] = 99
In [23]:
a
Out[23]:
array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])
In [24]:
x
Out[24]:
array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

x has changed !, because x = a only means you create a new reference pointing to the same object as the array a

We can see that by using the id function, which returns the object's memory adress

In [25]:
id(a)
Out[25]:
4674173648
In [26]:
id(x)
Out[26]:
4674173648

If you want to create a copy instead of a reference, you need to do it explicitely

In [27]:
x = a.copy()
In [28]:
id(x)
Out[28]:
4674189152
In [29]:
a[3] = 888
In [30]:
a
Out[30]:
array([  0,   1,  99, 888,   4,   5,   6,   7,   8,   9])
In [31]:
x
Out[31]:
array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

Numpy array types

Numpy tries and guess the correct type of the numerical values if you pass a list, here x is of type integer 64

In [32]:
x.dtype
Out[32]:
dtype('int64')

You can change that using the method astype()

In [33]:
x = x.astype(np.float)
In [34]:
x.dtype
Out[34]:
dtype('float64')
In [35]:
x
Out[35]:
array([  0.,   1.,  99.,   3.,   4.,   5.,   6.,   7.,   8.,   9.])

you can mix different types when creating an array from a list, but numpy will cast all values to the safest dtype

In [42]:
x = np.array([1.0,1,1])
In [43]:
x
Out[43]:
array([ 1.,  1.,  1.])

You can (but why would you ?) create arrays of native Python types, i.e. here an array containing a dictionnary and a tuple

In [44]:
x = np.array([{'A':1},(1,3)])

In that case the dtype is Object ('O')

In [45]:
x
Out[45]:
array([{'A': 1}, (1, 3)], dtype=object)

Some convenience functions for creating arrays

np.arange(start, stop, step, dtype=None)

In [46]:
np.arange(100).shape
Out[46]:
(100,)
In [47]:
np.arange(0,100+10,10)
Out[47]:
array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

np.linspace(start, stop, num=50, *endpoint=True*, *retstep=False*)

In [48]:
np.linspace(0,0.5,10, endpoint=False)
Out[48]:
array([ 0.  ,  0.05,  0.1 ,  0.15,  0.2 ,  0.25,  0.3 ,  0.35,  0.4 ,  0.45])

same as np.arange(0,0.5, 0.05)

In [49]:
np.arange(0,0.5,0.05)
Out[49]:
array([ 0.  ,  0.05,  0.1 ,  0.15,  0.2 ,  0.25,  0.3 ,  0.35,  0.4 ,  0.45])

np.ones(shape, dtype=None, order='C') creates an array of ones, similar function is np.zeros

In [50]:
np.ones((10,12))
Out[50]:
array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]])

np.empty can be used to pre-allocate "empty" arrays with small memory footprint

In [51]:
np.empty((5,5))
Out[51]:
array([[  0.00000000e+000,  -2.31584193e+077,   2.12215769e-314,
          1.97626258e-323,   0.00000000e+000],
       [  0.00000000e+000,   0.00000000e+000,   1.10413299e+283,
          0.00000000e+000,   0.00000000e+000],
       [ -2.67371240e+146,   2.16460678e-314,   2.30735947e-314,
          4.77469475e+092,   0.00000000e+000],
       [  0.00000000e+000,  -1.33336491e+267,   0.00000000e+000,
          0.00000000e+000,  -7.21732008e-308],
       [  0.00000000e+000,   0.00000000e+000,   1.28114332e+235,
          0.00000000e+000,   0.00000000e+000]])

Array indexing

Simple indexing

In [52]:
a = np.arange(25,dtype=float).reshape(5,5) # creating a 2D array, note the dtype
In [53]:
print(a)
[[  0.   1.   2.   3.   4.]
 [  5.   6.   7.   8.   9.]
 [ 10.  11.  12.  13.  14.]
 [ 15.  16.  17.  18.  19.]
 [ 20.  21.  22.  23.  24.]]
In [54]:
a.shape
Out[54]:
(5, 5)
In [55]:
a[0,1] # FIRST row, SECOND column, remember, Python indexing is zero-based
Out[55]:
1.0
In [56]:
a[:,0] # first column 
Out[56]:
array([  0.,   5.,  10.,  15.,  20.])
In [57]:
a[1,:] # second row
Out[57]:
array([ 5.,  6.,  7.,  8.,  9.])
In [58]:
a[:,2:] # all rows from third colum
Out[58]:
array([[  2.,   3.,   4.],
       [  7.,   8.,   9.],
       [ 12.,  13.,  14.],
       [ 17.,  18.,  19.],
       [ 22.,  23.,  24.]])
In [59]:
a[-1,:] # last row
Out[59]:
array([ 20.,  21.,  22.,  23.,  24.])
In [60]:
a[:,::-1] # handy: reverses column order 
Out[60]:
array([[  4.,   3.,   2.,   1.,   0.],
       [  9.,   8.,   7.,   6.,   5.],
       [ 14.,  13.,  12.,  11.,  10.],
       [ 19.,  18.,  17.,  16.,  15.],
       [ 24.,  23.,  22.,  21.,  20.]])
In [61]:
a
Out[61]:
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

slices allows to easily select contiguous row / colums

In [62]:
a[0:2,1:3] # first two rows, and columns 2 to 3
Out[62]:
array([[ 1.,  2.],
       [ 6.,  7.]])

When you want to select non-contiguous (or repeated) rows / columns, you can use the ix_ construct

In [63]:
a[np.ix_([0,1,3],[1,1,3])]
Out[63]:
array([[  1.,   1.,   3.],
       [  6.,   6.,   8.],
       [ 16.,  16.,  18.]])
In [64]:
a
Out[64]:
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

Boolean or conditional indexing

In [65]:
a = a[a>10]
In [66]:
print(a)
[ 11.  12.  13.  14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.]

Array broadcasting

Simple possible case: scalar broadcasting

In [67]:
x = np.array([0,1,2,3]); print(x)
[0 1 2 3]
In [68]:
x.shape
Out[68]:
(4,)
In [69]:
x + 3
Out[69]:
array([3, 4, 5, 6])
![](images/broadcast_scalar.png)

2D broadcasting

In [70]:
x = np.arange(12).reshape((3,4))
In [71]:
x
Out[71]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [72]:
x.shape
Out[72]:
(3, 4)
In [73]:
y = np.arange(3).reshape((3,1)); # or y = np.arange(3)[np.newaxis,:] or y = np.arange(3)[None,:]
In [74]:
y.shape
Out[74]:
(3, 1)
In [75]:
x + y
Out[75]:
array([[ 0,  1,  2,  3],
       [ 5,  6,  7,  8],
       [10, 11, 12, 13]])
![](images/broadcast_2D.png)

3D broadcasting

In [76]:
x = np.arange(15).reshape((3, 5))
In [77]:
x.shape
Out[77]:
(3, 5)
In [78]:
y = np.ones(8)
In [79]:
z = x[...,np.newaxis] + y
In [80]:
print(z)
[[[  1.   1.   1.   1.   1.   1.   1.   1.]
  [  2.   2.   2.   2.   2.   2.   2.   2.]
  [  3.   3.   3.   3.   3.   3.   3.   3.]
  [  4.   4.   4.   4.   4.   4.   4.   4.]
  [  5.   5.   5.   5.   5.   5.   5.   5.]]

 [[  6.   6.   6.   6.   6.   6.   6.   6.]
  [  7.   7.   7.   7.   7.   7.   7.   7.]
  [  8.   8.   8.   8.   8.   8.   8.   8.]
  [  9.   9.   9.   9.   9.   9.   9.   9.]
  [ 10.  10.  10.  10.  10.  10.  10.  10.]]

 [[ 11.  11.  11.  11.  11.  11.  11.  11.]
  [ 12.  12.  12.  12.  12.  12.  12.  12.]
  [ 13.  13.  13.  13.  13.  13.  13.  13.]
  [ 14.  14.  14.  14.  14.  14.  14.  14.]
  [ 15.  15.  15.  15.  15.  15.  15.  15.]]]
In [81]:
z.shape
Out[81]:
(3, 5, 8)
![](images/array_3x5x8.png)

Array methods

We have already seen some array methods above, in this section I just dwelve into their general behavior and present some convenient array methods

In python, everything is an object, numpy ndarrays are no exceptions, and they expose a number of methods, which are equivalent to the call the respective numpy function, but can be more convenient to use, and - marginally - faster

In [82]:
a.shape # shape of a (like *size* in matlab)
Out[82]:
(14,)
In [83]:
np.shape(a)
Out[83]:
(14,)
In [84]:
%timeit a.mean() # The %timeit magic cell function times the execution of an expression
The slowest run took 15.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.58 µs per loop
In [85]:
%timeit np.mean(a)
The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.06 µs per loop
In [86]:
b = np.arange(100,dtype=float).reshape(10,10) # creating a 2D array, note the dtype
In [87]:
b.shape
Out[87]:
(10, 10)
In [88]:
b
Out[88]:
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])
In [89]:
b.sum(0) ### summing over the first axis ('rows')
Out[89]:
array([ 450.,  460.,  470.,  480.,  490.,  500.,  510.,  520.,  530.,  540.])
In [90]:
np.sum(b, axis=0)
Out[90]:
array([ 450.,  460.,  470.,  480.,  490.,  500.,  510.,  520.,  530.,  540.])
In [91]:
b[:,0].sum()
Out[91]:
450.0

swapaxes method swap the position of 2 axes

In [92]:
b = b.swapaxes(1,0)
In [93]:
b
Out[93]:
array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90.],
       [  1.,  11.,  21.,  31.,  41.,  51.,  61.,  71.,  81.,  91.],
       [  2.,  12.,  22.,  32.,  42.,  52.,  62.,  72.,  82.,  92.],
       [  3.,  13.,  23.,  33.,  43.,  53.,  63.,  73.,  83.,  93.],
       [  4.,  14.,  24.,  34.,  44.,  54.,  64.,  74.,  84.,  94.],
       [  5.,  15.,  25.,  35.,  45.,  55.,  65.,  75.,  85.,  95.],
       [  6.,  16.,  26.,  36.,  46.,  56.,  66.,  76.,  86.,  96.],
       [  7.,  17.,  27.,  37.,  47.,  57.,  67.,  77.,  87.,  97.],
       [  8.,  18.,  28.,  38.,  48.,  58.,  68.,  78.,  88.,  98.],
       [  9.,  19.,  29.,  39.,  49.,  59.,  69.,  79.,  89.,  99.]])

here equivalent to transpose() on 2D array

In [94]:
b = b.transpose()
In [95]:
b
Out[95]:
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])
In [96]:
b = np.transpose(b)
In [97]:
b
Out[97]:
array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90.],
       [  1.,  11.,  21.,  31.,  41.,  51.,  61.,  71.,  81.,  91.],
       [  2.,  12.,  22.,  32.,  42.,  52.,  62.,  72.,  82.,  92.],
       [  3.,  13.,  23.,  33.,  43.,  53.,  63.,  73.,  83.,  93.],
       [  4.,  14.,  24.,  34.,  44.,  54.,  64.,  74.,  84.,  94.],
       [  5.,  15.,  25.,  35.,  45.,  55.,  65.,  75.,  85.,  95.],
       [  6.,  16.,  26.,  36.,  46.,  56.,  66.,  76.,  86.,  96.],
       [  7.,  17.,  27.,  37.,  47.,  57.,  67.,  77.,  87.,  97.],
       [  8.,  18.,  28.,  38.,  48.,  58.,  68.,  78.,  88.,  98.],
       [  9.,  19.,  29.,  39.,  49.,  59.,  69.,  79.,  89.,  99.]])

argsort, argmin and argmax returns the indices that would sort an array (along an axis), and the indices of min and max values

In [98]:
a = np.array([11,13,15,17,19,18,16,14,12,10])
In [99]:
a.argsort()
Out[99]:
array([9, 0, 8, 1, 7, 2, 6, 3, 5, 4])
In [100]:
a.sort(); print(a)
[10 11 12 13 14 15 16 17 18 19]

array functions

we've seen above array methods, available through the object-oriented interface of Python, now we're gonna see a few useful numpy functions

In [101]:
from scipy import misc
In [115]:
img = misc.ascent()
In [116]:
img.shape
Out[116]:
(512, 512)
In [118]:
plt.imshow(img, cmap=plt.get_cmap('gray'))
Out[118]:
<matplotlib.image.AxesImage at 0x12196e710>
In [119]:
plt.imshow(np.tile(img, (2,1)), cmap=plt.get_cmap('gray'))
Out[119]:
<matplotlib.image.AxesImage at 0x12224a748>
In [120]:
plt.imshow(np.tile(img, (1,2)), cmap=plt.get_cmap('gray'))
Out[120]:
<matplotlib.image.AxesImage at 0x122472c18>
In [122]:
plt.imshow(np.repeat(img, 2, axis=1), cmap=plt.get_cmap('gray'))
Out[122]:
<matplotlib.image.AxesImage at 0x12624a4a8>
In [123]:
plt.imshow(np.resize(img, (750,512)), cmap=plt.get_cmap('gray'))
Out[123]:
<matplotlib.image.AxesImage at 0x126b49e10>

Universal functions

A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.

In [124]:
x = np.sin(np.linspace(-2*np.pi, 2*np.pi, 10000).reshape((100,100)))
In [125]:
plt.imshow(x, cmap=plt.get_cmap('gray_r'))
Out[125]:
<matplotlib.image.AxesImage at 0x126a7e2e8>

vectorization

In [126]:
def max100(x): 
    """
    returns the max between x and 100
    """

    return(max(x,100))
In [127]:
x = np.random.normal(loc=50, scale=100, size=(10,10))
In [128]:
x
Out[128]:
array([[  85.96882073,   -4.85419519,   40.93647613,   61.34115088,
         -74.07013883,  235.86121438,   67.56926204,  165.35244879,
         -11.55018803,  131.88944278],
       [  17.97899749,  -87.03910947,   69.88616634,  148.71165995,
          97.35570989,  -91.62212936,   10.4564738 ,  135.66393545,
          60.79207716,   98.40147212],
       [ 186.73376068,   51.47336231,   35.99202655,  -10.96664629,
          70.02658387,  163.90961498,   21.285523  ,  167.29285624,
         123.9716435 ,  134.59933669],
       [-161.2336334 ,   84.41361207,  242.62581312,  -81.96930962,
          87.03343265,   47.35928614,  181.98841243,   46.51274503,
        -135.52195286,  141.47775003],
       [ 115.18655372,  304.7347711 ,   36.96549786,    7.20365519,
          84.28024952,  140.34426477,   37.46342323,   96.49298757,
          99.2588406 ,   29.08305933],
       [ 183.62732115,  177.41355409,  129.54490258,  -41.45397043,
         175.705348  ,  213.84943131,  141.95564816,  -38.88866584,
         -62.11166971,  176.66332869],
       [  18.15725411,    6.39192946,   -6.9505756 ,  317.57223785,
         102.66031738,  123.0242222 ,  183.36227507,  266.52537873,
          48.47626772,  290.83283544],
       [ 216.58567703,   19.43401723,  -43.4156069 ,  -76.80477233,
          46.48375053,   31.36210677,   30.6872406 ,  -87.4210159 ,
         178.69093388,  128.07763315],
       [  47.06744496,  161.76900474,   85.54501735,   96.88438828,
         207.819342  ,   57.28317261,   26.71382561, -131.37734719,
         112.56385944,  -84.41433904],
       [ 117.54866027,  173.06462723,   34.66758558,  207.8307638 ,
          18.34003621,  173.22540381,   57.65330187,   47.48087741,
         -25.0513813 , -100.45954286]])
In [129]:
max100(x)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-156bf71d3f74> in <module>()
----> 1 max100(x)

<ipython-input-126-f7f82149988e> in max100(x)
      4     """
      5 
----> 6     return(max(x,100))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

that fails because max100 is just a regular python function, that expects a single object

but we can build easily a vectorized version of the same function (i.e. building a ufuncmm)

In [130]:
vmax100 = np.vectorize(max100)
In [131]:
z = vmax100(x)
In [132]:
z.shape
Out[132]:
(10, 10)
In [133]:
z
Out[133]:
array([[100, 100, 100, 100, 100, 235, 100, 165, 100, 131],
       [100, 100, 100, 148, 100, 100, 100, 135, 100, 100],
       [186, 100, 100, 100, 100, 163, 100, 167, 123, 134],
       [100, 100, 242, 100, 100, 100, 181, 100, 100, 141],
       [115, 304, 100, 100, 100, 140, 100, 100, 100, 100],
       [183, 177, 129, 100, 175, 213, 141, 100, 100, 176],
       [100, 100, 100, 317, 102, 123, 183, 266, 100, 290],
       [216, 100, 100, 100, 100, 100, 100, 100, 178, 128],
       [100, 161, 100, 100, 207, 100, 100, 100, 112, 100],
       [117, 173, 100, 207, 100, 173, 100, 100, 100, 100]])

Numpy provides a number of already vectorized mathematical functions (ufuncs again). e.g. round, fix and all basic trigonometric (sin, cos, tan, sic), exponential (exp, exp2, sinh, cosh), and logarithmic functions (log, log10, log2).

Missing values in numpy

You can handle missing values in Numpy in two different ways:

  1. By using the np.nan (not a number) special type
  2. By casting a numpy array into a Masked array using the ma module

Using the masked array approach, while somehow more cumbersome, allows more flexibility

In [134]:
from numpy import ma
In [135]:
b[5,5] = -999.
In [136]:
print(b)
[[   0.   10.   20.   30.   40.   50.   60.   70.   80.   90.]
 [   1.   11.   21.   31.   41.   51.   61.   71.   81.   91.]
 [   2.   12.   22.   32.   42.   52.   62.   72.   82.   92.]
 [   3.   13.   23.   33.   43.   53.   63.   73.   83.   93.]
 [   4.   14.   24.   34.   44.   54.   64.   74.   84.   94.]
 [   5.   15.   25.   35.   45. -999.   65.   75.   85.   95.]
 [   6.   16.   26.   36.   46.   56.   66.   76.   86.   96.]
 [   7.   17.   27.   37.   47.   57.   67.   77.   87.   97.]
 [   8.   18.   28.   38.   48.   58.   68.   78.   88.   98.]
 [   9.   19.   29.   39.   49.   59.   69.   79.   89.   99.]]
In [137]:
c = b.copy()
In [138]:
c[c == -999.] = np.nan
In [139]:
print(c)
[[  0.  10.  20.  30.  40.  50.  60.  70.  80.  90.]
 [  1.  11.  21.  31.  41.  51.  61.  71.  81.  91.]
 [  2.  12.  22.  32.  42.  52.  62.  72.  82.  92.]
 [  3.  13.  23.  33.  43.  53.  63.  73.  83.  93.]
 [  4.  14.  24.  34.  44.  54.  64.  74.  84.  94.]
 [  5.  15.  25.  35.  45.  nan  65.  75.  85.  95.]
 [  6.  16.  26.  36.  46.  56.  66.  76.  86.  96.]
 [  7.  17.  27.  37.  47.  57.  67.  77.  87.  97.]
 [  8.  18.  28.  38.  48.  58.  68.  78.  88.  98.]
 [  9.  19.  29.  39.  49.  59.  69.  79.  89.  99.]]

nan values propagates automatically when using aggregation functions

In [140]:
c.mean(1)
Out[140]:
array([ 45.,  46.,  47.,  48.,  49.,  nan,  51.,  52.,  53.,  54.])

Unless you use one of the - few - NaN functions

In [141]:
np.nansum(c,axis=1)
Out[141]:
array([ 450.,  460.,  470.,  480.,  490.,  445.,  510.,  520.,  530.,  540.])

Now we're gonna see how to use the ma (Masked Arrays) module

In [142]:
b = ma.masked_values(b, -999.)
In [143]:
print(b)
[[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0]
 [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 81.0 91.0]
 [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 82.0 92.0]
 [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 83.0 93.0]
 [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 84.0 94.0]
 [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 85.0 95.0]
 [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 86.0 96.0]
 [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 87.0 97.0]
 [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 88.0 98.0]
 [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 89.0 99.0]]

b has now two new attributes: data, and mask

In [144]:
print(b.data) 
[[   0.   10.   20.   30.   40.   50.   60.   70.   80.   90.]
 [   1.   11.   21.   31.   41.   51.   61.   71.   81.   91.]
 [   2.   12.   22.   32.   42.   52.   62.   72.   82.   92.]
 [   3.   13.   23.   33.   43.   53.   63.   73.   83.   93.]
 [   4.   14.   24.   34.   44.   54.   64.   74.   84.   94.]
 [   5.   15.   25.   35.   45. -999.   65.   75.   85.   95.]
 [   6.   16.   26.   36.   46.   56.   66.   76.   86.   96.]
 [   7.   17.   27.   37.   47.   57.   67.   77.   87.   97.]
 [   8.   18.   28.   38.   48.   58.   68.   78.   88.   98.]
 [   9.   19.   29.   39.   49.   59.   69.   79.   89.   99.]]
In [145]:
print(b.mask)
[[False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False  True False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]]
In [146]:
b.mean(1)
Out[146]:
masked_array(data = [45.0 46.0 47.0 48.0 49.0 49.44444444444444 51.0 52.0 53.0 54.0],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)
In [147]:
c = b.mean(1)
In [148]:
c.shape
Out[148]:
(10,)

Masked values are just ignored in calculations

Other masked array functions allow to set masked values according to some conditions, all (I think) functions exposed for regular numpy arrays are available.

In [ ]:
b = ma.
In [151]:
b
Out[151]:
masked_array(data =
 [[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0]
 [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 81.0 91.0]
 [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 82.0 92.0]
 [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 83.0 93.0]
 [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 84.0 94.0]
 [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 85.0 95.0]
 [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 86.0 96.0]
 [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 87.0 97.0]
 [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 88.0 98.0]
 [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 89.0 99.0]],
             mask =
 [[False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False  True False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]],
       fill_value = -999.0)
In [152]:
b.mean(0)
Out[152]:
masked_array(data = [4.5 14.5 24.5 34.5 44.5 54.44444444444444 64.5 74.5 84.5 94.5],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

Random numbers

You can generate random numbers coming from a wide range of distributions

In [ ]:
np.random.

If you want more distributions, and more flexibility, you need to look into scipy.stats.distributions (see the statistical modelling notebook)

Example 1: Normal distribution

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
In [154]:
x = np.random.normal(0,1,1000)
In [155]:
x = np.random.standard_normal(1000)
In [156]:
_ = plt.hist(x, color='0.6',normed=True)

Example 2: Poisson distribution

\begin{equation*} P\left( x \right) = \frac{{e^{ - \lambda } \lambda ^x }}{{x!}} \end{equation*}
In [157]:
poiss = np.random.poisson(lam=4, size=(10000))
In [158]:
#a = np.array([len(np.where(poiss==x)[0]) for x in np.unique(poiss)])
In [159]:
a = np.array([sum(poiss==x) for x in np.unique(poiss)])
In [160]:
f, ax = plt.subplots()
ax.plot(np.unique(poiss), a/10000., '-', color='grey')
ax.plot(np.unique(poiss), a/10000., 'o', color='purple', markersize=10)
ax.set_title("Poisson Empirical PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];
In [161]:
import math # the math module exposes mathematical functions operating on scalars
In [162]:
lam = 4; end = 20 # lambda is a reserved word in python 
In [165]:
P = [math.exp(-lam)*lam**x/math.factorial(x) for x in range(end)] # note the use of a list comprehension
In [166]:
f, ax = plt.subplots()
ax.plot(P, '-', color='grey')
ax.plot(P, 'o', color='purple', markersize=10)
ax.set_title("Poisson PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];
In [ ]: