%load_ext load_style
%load_style talk.css

Numpy¶

import addutils.toc ; addutils.toc.js(ipy_notebook=True)

from IPython.display import Image, HTML

Numpy is the fundamental building block of the so-called Python Scientific Stack

It provides a special data structure, the numpy ndarray (N - dimensional array) which allows the representation of arrays of numerical values, integers, reals, complex
Numpy arrays by default can only hold objects of the same type (e.g. cannot mix integers and floats))
It also provides with random number capability, some convenience simple statistical functions (mean, std, ...) and some essential linear algebra operations

HTML("<iframe src='http://numpy.org/' width=1000 height=500></iframe>")

The conventional way to import numpy is

import numpy as np

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

Numpy basics¶

Constructing an numpy array

x = [1,2,3,4] # x is a regular python list

type(x)

list

x = np.array(x)

type(x)

numpy.ndarray

a = np.array(range(10))

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x = [[1,2,3],[4,5,6]] # nested lists

x = np.array(x)

x

array([[1, 2, 3],
       [4, 5, 6]])

x.shape

(2, 3)

The shape attribute is a tuple containing the dimensions along each axis (0 = rows, 1 = columns, ...)

x = np.arange(1000).reshape((10,10,10))

x.shape

(10, 10, 10)

Be careful: keep in mind references Vs. copies of arrays!

a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x = a

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a[2] = 99

a

array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

x

array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

x has changed !, because x = a only means you create a new reference pointing to the same object as the array a

We can see that by using the id function, which returns the object's memory adress

id(a)

4674173648

id(x)

4674173648

If you want to create a copy instead of a reference, you need to do it explicitely

x = a.copy()

id(x)

4674189152

a[3] = 888

a

array([  0,   1,  99, 888,   4,   5,   6,   7,   8,   9])

x

array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

Numpy array types¶

Numpy tries and guess the correct type of the numerical values if you pass a list, here x is of type integer 64

x.dtype

dtype('int64')

You can change that using the method astype()

x = x.astype(np.float)

x.dtype

dtype('float64')

x

array([  0.,   1.,  99.,   3.,   4.,   5.,   6.,   7.,   8.,   9.])

you can mix different types when creating an array from a list, but numpy will cast all values to the safest dtype

x = np.array([1.0,1,1])

x

array([ 1.,  1.,  1.])

You can (but why would you ?) create arrays of native Python types, i.e. here an array containing a dictionnary and a tuple

x = np.array([{'A':1},(1,3)])

In that case the dtype is Object ('O')

x

array([{'A': 1}, (1, 3)], dtype=object)

Some convenience functions for creating arrays¶

np.arange(start, stop, step, dtype=None)

np.arange(100).shape

(100,)

np.arange(0,100+10,10)

array([  0,  10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

np.linspace(start, stop, num=50, *endpoint=True*, *retstep=False*)

np.linspace(0,0.5,10, endpoint=False)

array([ 0.  ,  0.05,  0.1 ,  0.15,  0.2 ,  0.25,  0.3 ,  0.35,  0.4 ,  0.45])

same as np.arange(0,0.5, 0.05)

np.arange(0,0.5,0.05)

array([ 0.  ,  0.05,  0.1 ,  0.15,  0.2 ,  0.25,  0.3 ,  0.35,  0.4 ,  0.45])

np.ones(shape, dtype=None, order='C') creates an array of ones, similar function is np.zeros

np.ones((10,12))

array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]])

np.empty can be used to pre-allocate "empty" arrays with small memory footprint

np.empty((5,5))

array([[  0.00000000e+000,  -2.31584193e+077,   2.12215769e-314,
          1.97626258e-323,   0.00000000e+000],
       [  0.00000000e+000,   0.00000000e+000,   1.10413299e+283,
          0.00000000e+000,   0.00000000e+000],
       [ -2.67371240e+146,   2.16460678e-314,   2.30735947e-314,
          4.77469475e+092,   0.00000000e+000],
       [  0.00000000e+000,  -1.33336491e+267,   0.00000000e+000,
          0.00000000e+000,  -7.21732008e-308],
       [  0.00000000e+000,   0.00000000e+000,   1.28114332e+235,
          0.00000000e+000,   0.00000000e+000]])

Array indexing¶

Simple indexing¶

a = np.arange(25,dtype=float).reshape(5,5) # creating a 2D array, note the dtype

print(a)

[[  0.   1.   2.   3.   4.]
 [  5.   6.   7.   8.   9.]
 [ 10.  11.  12.  13.  14.]
 [ 15.  16.  17.  18.  19.]
 [ 20.  21.  22.  23.  24.]]

a.shape

(5, 5)

a[0,1] # FIRST row, SECOND column, remember, Python indexing is zero-based

1.0

a[:,0] # first column

array([  0.,   5.,  10.,  15.,  20.])

a[1,:] # second row

array([ 5.,  6.,  7.,  8.,  9.])

a[:,2:] # all rows from third colum

array([[  2.,   3.,   4.],
       [  7.,   8.,   9.],
       [ 12.,  13.,  14.],
       [ 17.,  18.,  19.],
       [ 22.,  23.,  24.]])

a[-1,:] # last row

array([ 20.,  21.,  22.,  23.,  24.])

a[:,::-1] # handy: reverses column order

array([[  4.,   3.,   2.,   1.,   0.],
       [  9.,   8.,   7.,   6.,   5.],
       [ 14.,  13.,  12.,  11.,  10.],
       [ 19.,  18.,  17.,  16.,  15.],
       [ 24.,  23.,  22.,  21.,  20.]])

a

array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

slices allows to easily select contiguous row / colums

a[0:2,1:3] # first two rows, and columns 2 to 3

array([[ 1.,  2.],
       [ 6.,  7.]])

When you want to select non-contiguous (or repeated) rows / columns, you can use the ix_ construct

a[np.ix_([0,1,3],[1,1,3])]

array([[  1.,   1.,   3.],
       [  6.,   6.,   8.],
       [ 16.,  16.,  18.]])

a

array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

Boolean or conditional indexing¶

a = a[a>10]

print(a)

[ 11.  12.  13.  14.  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.]

Array broadcasting¶

Simple possible case: scalar broadcasting¶

x = np.array([0,1,2,3]); print(x)

[0 1 2 3]

x.shape

(4,)

x + 3

array([3, 4, 5, 6])

![](images/broadcast_scalar.png)

2D broadcasting¶

x = np.arange(12).reshape((3,4))

x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

x.shape

(3, 4)

y = np.arange(3).reshape((3,1)); # or y = np.arange(3)[np.newaxis,:] or y = np.arange(3)[None,:]

y.shape

(3, 1)

x + y

array([[ 0,  1,  2,  3],
       [ 5,  6,  7,  8],
       [10, 11, 12, 13]])

![](images/broadcast_2D.png)

3D broadcasting¶

x = np.arange(15).reshape((3, 5))

x.shape

(3, 5)

y = np.ones(8)

z = x[...,np.newaxis] + y

print(z)

[[[  1.   1.   1.   1.   1.   1.   1.   1.]
  [  2.   2.   2.   2.   2.   2.   2.   2.]
  [  3.   3.   3.   3.   3.   3.   3.   3.]
  [  4.   4.   4.   4.   4.   4.   4.   4.]
  [  5.   5.   5.   5.   5.   5.   5.   5.]]

 [[  6.   6.   6.   6.   6.   6.   6.   6.]
  [  7.   7.   7.   7.   7.   7.   7.   7.]
  [  8.   8.   8.   8.   8.   8.   8.   8.]
  [  9.   9.   9.   9.   9.   9.   9.   9.]
  [ 10.  10.  10.  10.  10.  10.  10.  10.]]

 [[ 11.  11.  11.  11.  11.  11.  11.  11.]
  [ 12.  12.  12.  12.  12.  12.  12.  12.]
  [ 13.  13.  13.  13.  13.  13.  13.  13.]
  [ 14.  14.  14.  14.  14.  14.  14.  14.]
  [ 15.  15.  15.  15.  15.  15.  15.  15.]]]

z.shape

(3, 5, 8)

![](images/array_3x5x8.png)

Array methods¶

We have already seen some array methods above, in this section I just dwelve into their general behavior and present some convenient array methods

In python, everything is an object, numpy ndarrays are no exceptions, and they expose a number of methods, which are equivalent to the call the respective numpy function, but can be more convenient to use, and - marginally - faster

a.shape # shape of a (like *size* in matlab)

(14,)

np.shape(a)

(14,)

%timeit a.mean() # The %timeit magic cell function times the execution of an expression

The slowest run took 15.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.58 µs per loop

%timeit np.mean(a)

The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.06 µs per loop

b = np.arange(100,dtype=float).reshape(10,10) # creating a 2D array, note the dtype

b.shape

(10, 10)

b

array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])

b.sum(0) ### summing over the first axis ('rows')

array([ 450.,  460.,  470.,  480.,  490.,  500.,  510.,  520.,  530.,  540.])

np.sum(b, axis=0)

array([ 450.,  460.,  470.,  480.,  490.,  500.,  510.,  520.,  530.,  540.])

b[:,0].sum()

450.0

swapaxes method swap the position of 2 axes

b = b.swapaxes(1,0)

b

array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90.],
       [  1.,  11.,  21.,  31.,  41.,  51.,  61.,  71.,  81.,  91.],
       [  2.,  12.,  22.,  32.,  42.,  52.,  62.,  72.,  82.,  92.],
       [  3.,  13.,  23.,  33.,  43.,  53.,  63.,  73.,  83.,  93.],
       [  4.,  14.,  24.,  34.,  44.,  54.,  64.,  74.,  84.,  94.],
       [  5.,  15.,  25.,  35.,  45.,  55.,  65.,  75.,  85.,  95.],
       [  6.,  16.,  26.,  36.,  46.,  56.,  66.,  76.,  86.,  96.],
       [  7.,  17.,  27.,  37.,  47.,  57.,  67.,  77.,  87.,  97.],
       [  8.,  18.,  28.,  38.,  48.,  58.,  68.,  78.,  88.,  98.],
       [  9.,  19.,  29.,  39.,  49.,  59.,  69.,  79.,  89.,  99.]])

here equivalent to transpose() on 2D array

b = b.transpose()

b

array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])

b = np.transpose(b)

b

array([[  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90.],
       [  1.,  11.,  21.,  31.,  41.,  51.,  61.,  71.,  81.,  91.],
       [  2.,  12.,  22.,  32.,  42.,  52.,  62.,  72.,  82.,  92.],
       [  3.,  13.,  23.,  33.,  43.,  53.,  63.,  73.,  83.,  93.],
       [  4.,  14.,  24.,  34.,  44.,  54.,  64.,  74.,  84.,  94.],
       [  5.,  15.,  25.,  35.,  45.,  55.,  65.,  75.,  85.,  95.],
       [  6.,  16.,  26.,  36.,  46.,  56.,  66.,  76.,  86.,  96.],
       [  7.,  17.,  27.,  37.,  47.,  57.,  67.,  77.,  87.,  97.],
       [  8.,  18.,  28.,  38.,  48.,  58.,  68.,  78.,  88.,  98.],
       [  9.,  19.,  29.,  39.,  49.,  59.,  69.,  79.,  89.,  99.]])

argsort, argmin and argmax returns the indices that would sort an array (along an axis), and the indices of min and max values

a = np.array([11,13,15,17,19,18,16,14,12,10])

a.argsort()

array([9, 0, 8, 1, 7, 2, 6, 3, 5, 4])

a.sort(); print(a)

[10 11 12 13 14 15 16 17 18 19]

array functions¶

we've seen above array methods, available through the object-oriented interface of Python, now we're gonna see a few useful numpy functions

from scipy import misc

img = misc.ascent()

img.shape

(512, 512)

plt.imshow(img, cmap=plt.get_cmap('gray'))

<matplotlib.image.AxesImage at 0x12196e710>

plt.imshow(np.tile(img, (2,1)), cmap=plt.get_cmap('gray'))

<matplotlib.image.AxesImage at 0x12224a748>

plt.imshow(np.tile(img, (1,2)), cmap=plt.get_cmap('gray'))

<matplotlib.image.AxesImage at 0x122472c18>

plt.imshow(np.repeat(img, 2, axis=1), cmap=plt.get_cmap('gray'))

<matplotlib.image.AxesImage at 0x12624a4a8>

plt.imshow(np.resize(img, (750,512)), cmap=plt.get_cmap('gray'))

<matplotlib.image.AxesImage at 0x126b49e10>

Universal functions¶

A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.

x = np.sin(np.linspace(-2*np.pi, 2*np.pi, 10000).reshape((100,100)))

plt.imshow(x, cmap=plt.get_cmap('gray_r'))

<matplotlib.image.AxesImage at 0x126a7e2e8>

vectorization

def max100(x): 
    """
    returns the max between x and 100
    """

    return(max(x,100))

x = np.random.normal(loc=50, scale=100, size=(10,10))

x

array([[  85.96882073,   -4.85419519,   40.93647613,   61.34115088,
         -74.07013883,  235.86121438,   67.56926204,  165.35244879,
         -11.55018803,  131.88944278],
       [  17.97899749,  -87.03910947,   69.88616634,  148.71165995,
          97.35570989,  -91.62212936,   10.4564738 ,  135.66393545,
          60.79207716,   98.40147212],
       [ 186.73376068,   51.47336231,   35.99202655,  -10.96664629,
          70.02658387,  163.90961498,   21.285523  ,  167.29285624,
         123.9716435 ,  134.59933669],
       [-161.2336334 ,   84.41361207,  242.62581312,  -81.96930962,
          87.03343265,   47.35928614,  181.98841243,   46.51274503,
        -135.52195286,  141.47775003],
       [ 115.18655372,  304.7347711 ,   36.96549786,    7.20365519,
          84.28024952,  140.34426477,   37.46342323,   96.49298757,
          99.2588406 ,   29.08305933],
       [ 183.62732115,  177.41355409,  129.54490258,  -41.45397043,
         175.705348  ,  213.84943131,  141.95564816,  -38.88866584,
         -62.11166971,  176.66332869],
       [  18.15725411,    6.39192946,   -6.9505756 ,  317.57223785,
         102.66031738,  123.0242222 ,  183.36227507,  266.52537873,
          48.47626772,  290.83283544],
       [ 216.58567703,   19.43401723,  -43.4156069 ,  -76.80477233,
          46.48375053,   31.36210677,   30.6872406 ,  -87.4210159 ,
         178.69093388,  128.07763315],
       [  47.06744496,  161.76900474,   85.54501735,   96.88438828,
         207.819342  ,   57.28317261,   26.71382561, -131.37734719,
         112.56385944,  -84.41433904],
       [ 117.54866027,  173.06462723,   34.66758558,  207.8307638 ,
          18.34003621,  173.22540381,   57.65330187,   47.48087741,
         -25.0513813 , -100.45954286]])

max100(x)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-156bf71d3f74> in <module>()
----> 1 max100(x)

<ipython-input-126-f7f82149988e> in max100(x)
      4     """
      5 
----> 6     return(max(x,100))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

that fails because max100 is just a regular python function, that expects a single object

but we can build easily a vectorized version of the same function (i.e. building a ufuncmm)

vmax100 = np.vectorize(max100)

z = vmax100(x)

z.shape

(10, 10)

z

array([[100, 100, 100, 100, 100, 235, 100, 165, 100, 131],
       [100, 100, 100, 148, 100, 100, 100, 135, 100, 100],
       [186, 100, 100, 100, 100, 163, 100, 167, 123, 134],
       [100, 100, 242, 100, 100, 100, 181, 100, 100, 141],
       [115, 304, 100, 100, 100, 140, 100, 100, 100, 100],
       [183, 177, 129, 100, 175, 213, 141, 100, 100, 176],
       [100, 100, 100, 317, 102, 123, 183, 266, 100, 290],
       [216, 100, 100, 100, 100, 100, 100, 100, 178, 128],
       [100, 161, 100, 100, 207, 100, 100, 100, 112, 100],
       [117, 173, 100, 207, 100, 173, 100, 100, 100, 100]])

Numpy provides a number of already vectorized mathematical functions (ufuncs again). e.g. round, fix and all basic trigonometric (sin, cos, tan, sic), exponential (exp, exp2, sinh, cosh), and logarithmic functions (log, log10, log2).

Missing values in numpy¶

You can handle missing values in Numpy in two different ways:

By using the np.nan (not a number) special type
By casting a numpy array into a Masked array using the ma module

Using the masked array approach, while somehow more cumbersome, allows more flexibility

from numpy import ma

b[5,5] = -999.

print(b)

[[   0.   10.   20.   30.   40.   50.   60.   70.   80.   90.]
 [   1.   11.   21.   31.   41.   51.   61.   71.   81.   91.]
 [   2.   12.   22.   32.   42.   52.   62.   72.   82.   92.]
 [   3.   13.   23.   33.   43.   53.   63.   73.   83.   93.]
 [   4.   14.   24.   34.   44.   54.   64.   74.   84.   94.]
 [   5.   15.   25.   35.   45. -999.   65.   75.   85.   95.]
 [   6.   16.   26.   36.   46.   56.   66.   76.   86.   96.]
 [   7.   17.   27.   37.   47.   57.   67.   77.   87.   97.]
 [   8.   18.   28.   38.   48.   58.   68.   78.   88.   98.]
 [   9.   19.   29.   39.   49.   59.   69.   79.   89.   99.]]

c = b.copy()

c[c == -999.] = np.nan

print(c)

[[  0.  10.  20.  30.  40.  50.  60.  70.  80.  90.]
 [  1.  11.  21.  31.  41.  51.  61.  71.  81.  91.]
 [  2.  12.  22.  32.  42.  52.  62.  72.  82.  92.]
 [  3.  13.  23.  33.  43.  53.  63.  73.  83.  93.]
 [  4.  14.  24.  34.  44.  54.  64.  74.  84.  94.]
 [  5.  15.  25.  35.  45.  nan  65.  75.  85.  95.]
 [  6.  16.  26.  36.  46.  56.  66.  76.  86.  96.]
 [  7.  17.  27.  37.  47.  57.  67.  77.  87.  97.]
 [  8.  18.  28.  38.  48.  58.  68.  78.  88.  98.]
 [  9.  19.  29.  39.  49.  59.  69.  79.  89.  99.]]

nan values propagates automatically when using aggregation functions

c.mean(1)

array([ 45.,  46.,  47.,  48.,  49.,  nan,  51.,  52.,  53.,  54.])

Unless you use one of the - few - NaN functions

np.nansum(c,axis=1)

array([ 450.,  460.,  470.,  480.,  490.,  445.,  510.,  520.,  530.,  540.])

Now we're gonna see how to use the ma (Masked Arrays) module

b = ma.masked_values(b, -999.)

print(b)

[[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0]
 [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 81.0 91.0]
 [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 82.0 92.0]
 [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 83.0 93.0]
 [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 84.0 94.0]
 [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 85.0 95.0]
 [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 86.0 96.0]
 [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 87.0 97.0]
 [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 88.0 98.0]
 [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 89.0 99.0]]

b has now two new attributes: data, and mask

print(b.data)

[[   0.   10.   20.   30.   40.   50.   60.   70.   80.   90.]
 [   1.   11.   21.   31.   41.   51.   61.   71.   81.   91.]
 [   2.   12.   22.   32.   42.   52.   62.   72.   82.   92.]
 [   3.   13.   23.   33.   43.   53.   63.   73.   83.   93.]
 [   4.   14.   24.   34.   44.   54.   64.   74.   84.   94.]
 [   5.   15.   25.   35.   45. -999.   65.   75.   85.   95.]
 [   6.   16.   26.   36.   46.   56.   66.   76.   86.   96.]
 [   7.   17.   27.   37.   47.   57.   67.   77.   87.   97.]
 [   8.   18.   28.   38.   48.   58.   68.   78.   88.   98.]
 [   9.   19.   29.   39.   49.   59.   69.   79.   89.   99.]]

print(b.mask)

[[False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False  True False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]]

b.mean(1)

masked_array(data = [45.0 46.0 47.0 48.0 49.0 49.44444444444444 51.0 52.0 53.0 54.0],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

c = b.mean(1)

c.shape

(10,)

Masked values are just ignored in calculations

Other masked array functions allow to set masked values according to some conditions, all (I think) functions exposed for regular numpy arrays are available.

b = ma.

b

masked_array(data =
 [[0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0]
 [1.0 11.0 21.0 31.0 41.0 51.0 61.0 71.0 81.0 91.0]
 [2.0 12.0 22.0 32.0 42.0 52.0 62.0 72.0 82.0 92.0]
 [3.0 13.0 23.0 33.0 43.0 53.0 63.0 73.0 83.0 93.0]
 [4.0 14.0 24.0 34.0 44.0 54.0 64.0 74.0 84.0 94.0]
 [5.0 15.0 25.0 35.0 45.0 -- 65.0 75.0 85.0 95.0]
 [6.0 16.0 26.0 36.0 46.0 56.0 66.0 76.0 86.0 96.0]
 [7.0 17.0 27.0 37.0 47.0 57.0 67.0 77.0 87.0 97.0]
 [8.0 18.0 28.0 38.0 48.0 58.0 68.0 78.0 88.0 98.0]
 [9.0 19.0 29.0 39.0 49.0 59.0 69.0 79.0 89.0 99.0]],
             mask =
 [[False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False  True False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]
 [False False False False False False False False False False]],
       fill_value = -999.0)

b.mean(0)

masked_array(data = [4.5 14.5 24.5 34.5 44.5 54.44444444444444 64.5 74.5 84.5 94.5],
             mask = [False False False False False False False False False False],
       fill_value = 1e+20)

Random numbers¶

You can generate random numbers coming from a wide range of distributions

np.random.

If you want more distributions, and more flexibility, you need to look into scipy.stats.distributions (see the statistical modelling notebook)

Example 1: Normal distribution

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

x = np.random.normal(0,1,1000)

x = np.random.standard_normal(1000)

_ = plt.hist(x, color='0.6',normed=True)

Example 2: Poisson distribution

\begin{equation*} P\left( x \right) = \frac{{e^{ - \lambda } \lambda ^x }}{{x!}} \end{equation*}

poiss = np.random.poisson(lam=4, size=(10000))

#a = np.array([len(np.where(poiss==x)[0]) for x in np.unique(poiss)])

a = np.array([sum(poiss==x) for x in np.unique(poiss)])

f, ax = plt.subplots()
ax.plot(np.unique(poiss), a/10000., '-', color='grey')
ax.plot(np.unique(poiss), a/10000., 'o', color='purple', markersize=10)
ax.set_title("Poisson Empirical PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];

import math # the math module exposes mathematical functions operating on scalars

lam = 4; end = 20 # lambda is a reserved word in python

P = [math.exp(-lam)*lam**x/math.factorial(x) for x in range(end)] # note the use of a list comprehension

f, ax = plt.subplots()
ax.plot(P, '-', color='grey')
ax.plot(P, 'o', color='purple', markersize=10)
ax.set_title("Poisson PMF, $\lambda = 4$", fontsize=20)
ax.set_xlim(-1,17)
ax.set_ylim(0,0.25)
[l.set_fontsize(16) for l in ax.xaxis.get_ticklabels()]
[l.set_fontsize(16) for l in ax.yaxis.get_ticklabels()];

From Wikipedia (http://en.wikipedia.org/wiki/Poisson_distribution)

Discrete variable