A brief introduction to the Python language

NOTE: the addutils package is available at http://www.add-for.com/file/AddUtils-0.5.4-py34.zip

In [4]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)
Out[4]:

Python is a modern, general-purpose, object-oriented, high-level programming language. It is a scripting language in the sense that Python code is executed by the Python interpreter, one statement at a time; there is no separate compilation or linking step:

  • Similar to ruby, perl, php, matlab, R, ...
  • Unlike C, C++, Java, Fortran

It is widely used in science and engineering, and has gained considerable traction in scientific computing over the past few years.

Some positive attributes of Python that are often cited:

  • Simplicity: it is easy to read and easy to learn, and in many instances reads almost like pseudo-code
  • Expressiveness: fewer lines of code mean fewer bugs and code that is easier to maintain
  • Powerful: Python is not a language you grow out of. It can also be used for large projects, Big Data, High Performance Computing applications, etc.
  • Batteries included: The standard library is huge and includes some really cool libraries

The philosophy of Python

In [5]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

1. Some elements of syntax

The basics

Python scripts use the suffix .py

Shebang line:

#!/usr/bin/env python

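or the full path to your Python binary ({HOME} below is a placeholder for your home directory, since the shebang line does not expand variables)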

#!{HOME}/anaconda/bin/python

Comment lines start with #

In the following IPython notebook cell I'm writing the content of the cell to a file

In [10]:
%%writefile print_upper.py 
#!/Users/nicolasf/anaconda/anaconda/bin/python 
# This is a python script 

import sys # I import the sys module, part of the Python standard library

X = sys.argv[1:] # reading the command line arguments, X is a list

X = " ".join(map(str,X)) # transform everything into a string

print(X.upper()) # printing the content, uppercase if applicable
Overwriting print_upper.py
In [11]:
!ls *.py 
load_style.py  print_upper.py talktools.py
In [12]:
!chmod +x print_upper.py # we make the file executable
In [13]:
!./print_upper.py something another thing 1 2 3
SOMETHING ANOTHER THING 1 2 3
In [14]:
!python print_upper.py something another thing 1 2 3
SOMETHING ANOTHER THING 1 2 3
In [15]:
%run print_upper.py something another thing 1 2 3
SOMETHING ANOTHER THING 1 2 3

Variable names

It is a good idea to use meaningful variable names in your scripts / notebooks.

Names can contain only letters, digits and underscores (_) and must NOT begin with a digit. Python reserved words (such as for) cannot be used as names.

In [16]:
for = 1
  File "<ipython-input-16-c8a8281ee30d>", line 1
    for = 1
        ^
SyntaxError: invalid syntax
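
The naming rules can also be checked programmatically. The following is a minimal sketch using the standard library's keyword module and the str.isidentifier method; the helper name is_valid_name and the sample names are made up for illustration. Note that isidentifier also accepts non-ASCII letters.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` is usable as a variable name (hypothetical helper)."""
    # isidentifier() checks the character rules, iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["temperature", "_tmp", "2fast", "for", "my-var"]
results = {name: is_valid_name(name) for name in candidates}
print(results)
```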

Operators

The assignment operator is =

In [17]:
a = 5 
In [18]:
a * 2
Out[18]:
10
In [19]:
a += 2 # same as a = a + 2
In [20]:
a
Out[20]:
7
In [21]:
a -=2
In [22]:
a
Out[22]:
5

** is used for exponentiation

In [23]:
x = 2
In [24]:
x**2
Out[24]:
4
In [25]:
pow(x,2)
Out[25]:
4

NOTE: The case of integer division

In Python 2.7 the ratio of two integers was always an integer: if the exact result was not an integer, it was rounded down (towards negative infinity). This behavior changed with the first version of Python 3, where / always performs true division. To do integer (floor) division in Python 3, use the // operator

In [28]:
9 / 5
Out[28]:
1.8
In [29]:
9 // 5
Out[29]:
1
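
The difference between floor division and truncation only shows up with negative numbers. A small sketch (the variable names are arbitrary):

```python
# Floor division rounds towards negative infinity ...
q_floor = -9 // 5
# ... whereas int() applied to a true division truncates towards zero
q_trunc = int(-9 / 5)
# divmod returns the floor quotient and the remainder in one call
quotient, remainder = divmod(9, 5)

print(q_floor, q_trunc, quotient, remainder)  # -2 -1 1 4
```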

2. Types and Data structures

Floats

In [30]:
x = 2.0 # can use 2. if you are lazy 
In [31]:
type(x)
Out[31]:
float
In [32]:
x = float(2)
In [33]:
type(x)
Out[33]:
float
In [34]:
x
Out[34]:
2.0

Complex numbers

Complex numbers can be created using the j (or J) suffix notation or the complex function

In [35]:
x = 2 + 3J
In [36]:
print(type(x)); print(x)
<class 'complex'>
(2+3j)
In [37]:
x = complex(2, 3)
In [38]:
print(type(x)); print(x)
<class 'complex'>
(2+3j)

Integers

In [46]:
x = 1
In [47]:
type(x)
Out[47]:
int
In [48]:
x = int(1.2) ### will take the integer part 
In [49]:
x
Out[49]:
1
In [50]:
x = 1
In [51]:
type(x)
Out[51]:
int

Since Python 3, long integers and integers have been unified, see https://www.python.org/dev/peps/pep-0237/

In [52]:
x = 2**64
In [53]:
type(x)
Out[53]:
int
In [54]:
x
Out[54]:
18446744073709551616

Booleans

Used to represent True and False. Usually they arise as the result of a logical operation

In [56]:
x = True
In [57]:
type(x)
Out[57]:
bool
In [58]:
x = 1
In [59]:
x == 0
Out[59]:
False
In [60]:
y = (x == 0); y
Out[60]:
False
In [61]:
x = [True, True, False, True]
In [62]:
sum(x)
Out[62]:
3
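
Summing booleans works because bool is a subclass of int: True counts as 1 and False as 0. This makes it easy to count how many items satisfy a condition. A short sketch with made-up sample values (the list comprehension syntax used here is covered later in this notebook):

```python
temperatures = [12.5, 19.0, 23.4, 8.1, 21.7]   # made-up sample values

# One boolean per item: is this temperature above 20?
is_warm = [t > 20 for t in temperatures]

n_warm = sum(is_warm)                  # True counts as 1, False as 0
fraction_warm = n_warm / len(is_warm)  # share of items meeting the condition

print(isinstance(True, int), n_warm, fraction_warm)
```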

Strings

You can define a string as any sequence of valid characters surrounded by single quotes

In [63]:
sentence = 'The Guide is definitive. Reality is frequently inaccurate.'; print(sentence)
The Guide is definitive. Reality is frequently inaccurate.

Or double quotes

In [64]:
sentence = "I'd take the awe of understanding over the awe of ignorance any day."; print(sentence)
I'd take the awe of understanding over the awe of ignorance any day.

Or triple quotes

In [65]:
sentence = """Time is an illusion.

Lunchtime doubly so."""; print(sentence)
Time is an illusion.

Lunchtime doubly so.
In [66]:
len(sentence) #!
Out[66]:
42

And you can convert the types above (floats, complex numbers, integers) to a string with the str function

In [67]:
str(complex(2,3))
Out[67]:
'(2+3j)'

A string is a Python sequence, and therefore iterable

You can INDEX a string variable. Indexing in Python starts at 0 (not 1): the subscript refers to an offset from the starting position of an iterable, so the first element has an offset of zero

If you want to know more, look up the article "Why Python uses 0-based indexing"

In [68]:
sentence[0:4]
Out[68]:
'Time'
In [69]:
sentence[::-1]
Out[69]:
'.os ylbuod emithcnuL\n\n.noisulli na si emiT'
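
A few more slicing patterns, as a sketch on a shorter string (the variable names are arbitrary). The general form is [start:stop:step], where the stop index is excluded and negative indices count from the end:

```python
quote = "Time is an illusion."

first = quote[0]       # first character (offset 0)
last = quote[-1]       # negative indices count from the end
word = quote[5:7]      # characters at offsets 5 and 6; the stop index is excluded
tail = quote[-9:-1]    # from 9 before the end up to, but excluding, the last one
skipped = quote[0:4:2] # every second character of the first four

print(first, last, word, tail, skipped)
```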

However, strings are immutable: you cannot change their characters in place

In [70]:
sentence[2] = "b"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-70-5a1742dc3dfa> in <module>()
----> 1 sentence[2] = "b"

TypeError: 'str' object does not support item assignment
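
Since a string cannot be modified in place, the usual approach is to build a new string. A minimal sketch, using slicing and the replace method:

```python
sentence = "Time is an illusion."

# Build a new string from the pieces around the character to change
patched = sentence[:2] + "b" + sentence[3:]

# str.replace also returns a new string and leaves the original untouched
replaced = sentence.replace("Time", "Lunchtime")

print(patched)
print(replaced)
print(sentence)  # the original is unchanged
```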

A lot of handy methods are available to manipulate strings

In [71]:
sentence.upper()
Out[71]:
'TIME IS AN ILLUSION.\n\nLUNCHTIME DOUBLY SO.'
In [72]:
sentence.endswith('.')
Out[72]:
True
In [73]:
sentence.split() # by default split on whitespaces, returns a list (see below)
Out[73]:
['Time', 'is', 'an', 'illusion.', 'Lunchtime', 'doubly', 'so.']

String concatenation and formatting

In [74]:
"The answer is " + "42"
Out[74]:
'The answer is 42'
In [75]:
";".join(["The answer is ","42"]) # ["The answer is ","42"] is a list with two elements (separated by a ,)
Out[75]:
'The answer is ;42'
In [76]:
a = 42
In [77]:
"The answer is %s" % ( a )
Out[77]:
'The answer is 42'
In [78]:
"The answer is %4.2f" % ( a )
Out[78]:
'The answer is 42.00'
In [79]:
"The answer is {0:<6.4f}, {0:<6.4f} and not {1:<6.4f} ".format(a,42.0001)
Out[79]:
'The answer is 42.0000, 42.0000 and not 42.0001 '
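
The same formatting can be written with formatted string literals (f-strings). This sketch assumes Python 3.6 or later, which is newer than the interpreter version shown elsewhere in this notebook; the format specifications after the colon are the same as those accepted by format:

```python
a = 42
other = 42.0001

plain = f"The answer is {a}"          # the expression in braces is evaluated
fixed = f"The answer is {a:.2f}"      # two decimal places
padded = f"The answer is {a:<6.4f}, and not {other:<6.4f}"

print(plain)
print(fixed)
print(padded)
```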

Lists

In [80]:
int_list = [1,2,3,4,5,6]
In [81]:
int_list
Out[81]:
[1, 2, 3, 4, 5, 6]
In [82]:
str_list = ['thing', 'stuff', 'truc']
In [83]:
str_list
Out[83]:
['thing', 'stuff', 'truc']

Lists can contain items of any type

In [84]:
mixed_list = [1, 1., 2+3J, 'sentence', """
long sentence
"""]
In [85]:
mixed_list
Out[85]:
[1, 1.0, (2+3j), 'sentence', '\nlong sentence\n']
In [86]:
type(mixed_list[1])
Out[86]:
float

Accessing elements and slicing lists

Lists are iterable, and their items (elements) can be accessed in the same way as we saw for strings

In [87]:
int_list[0]
Out[87]:
1
In [88]:
int_list[1]
Out[88]:
2
In [89]:
int_list[::-1] ## same result as int_list.reverse(), but it returns a new list and does NOT operate in place
Out[89]:
[6, 5, 4, 3, 2, 1]
In [90]:
int_list.reverse()
In [91]:
int_list
Out[91]:
[6, 5, 4, 3, 2, 1]

lists can be nested (list of lists)

In [92]:
x = [[1,2,3],[4,5,6]]
In [93]:
x
Out[93]:
[[1, 2, 3], [4, 5, 6]]
In [94]:
from itertools import chain
In [95]:
list(chain(*x))
Out[95]:
[1, 2, 3, 4, 5, 6]
In [96]:
x[0]
Out[96]:
[1, 2, 3]
In [97]:
x[1]
Out[97]:
[4, 5, 6]
In [98]:
x[0][1]
Out[98]:
2

append is one of the most useful list methods

In [99]:
int_list.append(7); print(int_list)
[6, 5, 4, 3, 2, 1, 7]

lists are mutable: you can change their elements in place

In [100]:
int_list[0] = 2; print(int_list)
[2, 5, 4, 3, 2, 1, 7]
In [101]:
int_list.reverse() 
In [102]:
int_list ### ! list object methods are applied 'in place'
Out[102]:
[7, 1, 2, 3, 4, 5, 2]
In [103]:
int_list.count(2)
Out[103]:
2
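
The distinction between methods that work in place and operations that return a new list is a common source of confusion, because in-place methods return None. A short sketch with arbitrary values:

```python
values = [3, 1, 2]

ordered_copy = sorted(values)   # built-in sorted() returns a new list
snapshot = list(values)         # values itself is still in its original order

result = values.sort()          # list.sort() works in place and returns None
reversed_copy = values[::-1]    # slicing always creates a new list

print(ordered_copy, snapshot, result, values, reversed_copy)
```

A frequent mistake is writing values = values.sort(), which replaces the list with None.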

Tuples

Tuples are also iterables, and they can be indexed and sliced like lists

In [104]:
int_tup = (1,2,3,5,6,7)
In [105]:
int_tup[1:3]
Out[105]:
(2, 3)
In [106]:
int_tup.index(2)
Out[106]:
1

This construction is also possible

In [107]:
tup = 1,2,3
In [108]:
tup
Out[108]:
(1, 2, 3)

Tuples ARE NOT mutable, contrary to lists

In [109]:
int_tup[0] = 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-109-e1b3b1603cc4> in <module>()
----> 1 int_tup[0] = 1

TypeError: 'tuple' object does not support item assignment

Useful trick: zipping lists

In [114]:
a = range(5); print(a)
range(0, 5)
In [115]:
b = range(5,10); print(b)
range(5, 10)
In [118]:
a + b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-118-f96fb8f649b6> in <module>()
----> 1 a + b

TypeError: unsupported operand type(s) for +: 'range' and 'range'
In [119]:
a = list(range(5))
b = list(range(5,10))
In [120]:
print(a)
[0, 1, 2, 3, 4]
In [121]:
print(b)
[5, 6, 7, 8, 9]
In [122]:
a + b
Out[122]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In Python 3, range returns a lazy range object, NOT a list, see https://docs.python.org/3.0/whatsnew/3.0.html#views-and-iterators-instead-of-lists

In [126]:
tuple(zip(a,b)) # zip returns an iterator of pairs; tuple() collects them into a tuple of tuples
Out[126]:
((0, 5), (1, 6), (2, 7), (3, 8), (4, 9))
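
A range object and a zip object behave differently once created. The sketch below (names are arbitrary) shows that a range supports len, indexing and repeated traversal, while a zip object is a one-shot iterator that stops at its shortest input:

```python
r = range(5, 10)

# A range knows its length and supports indexing and membership tests
print(len(r), r[0], r[-1], 7 in r)
# It can be traversed as many times as needed
same_twice = list(r) == list(r)

# zip stops as soon as the shortest input is exhausted
pairs = zip(r, "abc")
first_pass = list(pairs)
# A zip object is consumed by iteration: a second pass yields nothing
second_pass = list(pairs)

print(first_pass, second_pass)
```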

List comprehensions

List comprehensions are among the most useful and compact Python expressions. I'm introducing them here, but we'll see more about control flow structures later.

In [127]:
str_list
Out[127]:
['thing', 'stuff', 'truc']
In [128]:
['my ' + x for x in str_list]
Out[128]:
['my thing', 'my stuff', 'my truc']
In [129]:
[x.upper() for x in str_list]
Out[129]:
['THING', 'STUFF', 'TRUC']
In [130]:
[x+y for x,y in zip(a,b)] # using zip (above)
Out[130]:
[5, 7, 9, 11, 13]
In [131]:
a
Out[131]:
[0, 1, 2, 3, 4]
In [132]:
[x + 6 if (x < 3) else x for x in a]
Out[132]:
[6, 7, 8, 3, 4]
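
The position of the if matters in a comprehension. Placed before for (with an else), it chooses a value for every item; placed after the iterable, it filters items out. A sketch comparing both forms with the equivalent explicit loop:

```python
a = [0, 1, 2, 3, 4]

# Conditional expression: every item is kept, but transformed differently
conditional = [x + 6 if x < 3 else x for x in a]

# Filter: only the items passing the test are kept
filtered = [x + 6 for x in a if x < 3]

# The first comprehension written as an explicit loop
looped = []
for x in a:
    if x < 3:
        looped.append(x + 6)
    else:
        looped.append(x)

print(conditional, filtered, looped)
```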

Dictionaries

One of the more flexible built-in data structures is the dictionary. A dictionary maps a set of keys to their associated values. These mappings are mutable and, unlike lists or tuples, are not indexed by position (before Python 3.7 their ordering was not even guaranteed; since 3.7 they preserve insertion order). Hence, rather than using the sequence index to return elements of the collection, the corresponding key must be used. Dictionaries are specified by a comma-separated sequence of key:value pairs, each key separated from its value by a colon. The dictionary is enclosed by curly braces. For example:

In [133]:
my_dict = {'a':16, 'b':(4,5), 'foo':'''(noun) a term used as a universal substitute 
           for something real, especially when discussing technological ideas and 
           problems'''}
my_dict
Out[133]:
{'a': 16,
 'b': (4, 5),
 'foo': '(noun) a term used as a universal substitute \n           for something real, especially when discussing technological ideas and \n           problems'}
In [134]:
my_dict['foo']
Out[134]:
'(noun) a term used as a universal substitute \n           for something real, especially when discussing technological ideas and \n           problems'
In [135]:
'a' in my_dict  # Checks whether 'a' is a key of my_dict
Out[135]:
True
In [138]:
my_dict.items()  # Returns a view of the key/value pairs (as tuples)
Out[138]:
dict_items([('foo', '(noun) a term used as a universal substitute \n           for something real, especially when discussing technological ideas and \n           problems'), ('a', 16), ('b', (4, 5))])
In [139]:
my_dict.keys()  # Returns a view of the keys
Out[139]:
dict_keys(['foo', 'a', 'b'])
In [140]:
my_dict.values()  # Returns a view of the values
Out[140]:
dict_values(['(noun) a term used as a universal substitute \n           for something real, especially when discussing technological ideas and \n           problems', 16, (4, 5)])
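
The dict_items, dict_keys and dict_values objects shown above are views: they reflect later changes to the dictionary and are mostly used for iteration. A short sketch with made-up data:

```python
inventory = {"apples": 3, "pears": 5}   # made-up data

keys = inventory.keys()       # a view, not a copy
inventory["plums"] = 7        # add an entry after the view was created
has_plums = "plums" in keys   # the view reflects the change

key_list = sorted(inventory)           # iterating over a dict yields its keys
total = sum(inventory.values())

# items() yields (key, value) tuples that can be unpacked in a for loop
for name, quantity in inventory.items():
    print(name, quantity)
```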
In [141]:
my_dict['c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-141-0783a8759759> in <module>()
----> 1 my_dict['c']

KeyError: 'c'

If we would rather not get the error, we can use the get method, which returns None if the key is not present, or a default value of your choice

In [142]:
my_dict.get('c')
In [143]:
my_dict.get('c', -1)
Out[143]:
-1
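
A typical use of get with a default value is counting occurrences without first checking whether a key exists. A minimal sketch (the sample text is made up):

```python
text = "the guide is the guide"

counts = {}
for word in text.split():
    # get returns 0 the first time a word is seen, so no KeyError is raised
    counts[word] = counts.get(word, 0) + 1

print(counts)
```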

Conversion between data structures

In [144]:
a = ['a','b','c']
b = [1,2,3]
In [145]:
type(tuple(a))
Out[145]:
tuple
In [146]:
d = dict(zip(a,b))
In [147]:
d
Out[147]:
{'a': 1, 'b': 2, 'c': 3}

3. Logical operators

Logical operators will test for some condition and return a boolean (True, False)

Comparison operators

  • > : Greater than
  • >= : Greater than or equal to
  • < : Less than
  • <= : Less than or equal to
  • == : Equal to
  • != : Not equal to

is / is not

Use == (!=) when comparing values and is (is not) when comparing identities.

In [148]:
x = 5.
In [149]:
type(x)
Out[149]:
float
In [150]:
y = 5
In [151]:
type(y)
Out[151]:
int
In [152]:
x == y
Out[152]:
True
In [153]:
x is y # x is a float, y is an int: they are two distinct objects in memory
Out[153]:
False
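
The difference between equality and identity is easiest to see with mutable objects. In this sketch (names are arbitrary), two lists with the same content are equal but not identical, while an alias refers to the very same object:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # same content, but a separate object
alias = first        # a second name for the same object

equal_values = first == second   # compares content
same_object = first is second    # compares identity
alias_is_first = alias is first

alias.append(4)      # modifying through the alias also changes first

missing = None
is_missing = missing is None     # the idiomatic test for None uses is

print(equal_values, same_object, alias_is_first, first, is_missing)
```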

Some examples of common comparisons

In [154]:
a = 5
b = 6
In [155]:
a == b
Out[155]:
False
In [156]:
a != b
Out[156]:
True
In [157]:
(a > 4) and (b < 7)
Out[157]:
True
In [158]:
(a > 4) and (b > 7)
Out[158]:
False
In [159]:
(a > 4) or (b > 7)
Out[159]:
True

all and any can be used on a collection of booleans

In [160]:
x = [5,6,2,3,3]
In [161]:
cond = [item > 2 for item in x]
In [162]:
cond
Out[162]:
[True, True, False, True, True]
In [163]:
all(cond)
Out[163]:
False
In [164]:
any(cond)
Out[164]:
True
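
The intermediate list of booleans is not required: all and any accept any iterable, including a generator expression, and stop as soon as the result is known. A short sketch:

```python
x = [5, 6, 2, 3, 3]

# The generator expression is evaluated lazily, item by item
all_above_two = all(item > 2 for item in x)
any_above_five = any(item > 5 for item in x)

# Edge case: on an empty collection, all() is True and any() is False
empty_all, empty_any = all([]), any([])

print(all_above_two, any_above_five, empty_all, empty_any)
```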

4. Control flow structures

Indentation is meaningful

In Python, there are no annoying curly braces (I'm looking at you, R), parentheses, brackets etc. to delimit flow control blocks as in other languages. Instead, the INDENTATION plays this role, which forces you to write clear(er) code ...

In [168]:
for x in range(10): 
    if x < 5:
        print(x**2)
    else:
        print(x) 
0
1
4
9
16
5
6
7
8
9

Note: The standard is to use 4 spaces (NOT tabs) for indentation. Set your favorite editor accordingly, for example in vi / vim:

set tabstop=4
set expandtab
set shiftwidth=4
set softtabstop=4

When editing a code cell in IPython, the indentation is handled intelligently, try typing in a new blank cell:

for x in range(10): 
    if x < 5:
        print(x**2)
    else:
        print(x)

if ... elif ... else

In [170]:
x = 10

if x < 10: # not met
    x = x + 1
elif x > 10: 
    x = x - 1 # not met either 
else: 
    x = x * 2
    
print(x)
20
In [171]:
x = 10

if (x > 5 and x < 8): 
    x = x+1
elif (x > 5 and x < 12): 
    x = x * 3
else:
    x = x-1
    
print(x)
30
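
Range tests such as x > 5 and x < 8 can also be written as chained comparisons. The sketch below wraps the same logic as the cell above in a function (the name rescale is made up) so that it can be tried on several values:

```python
def rescale(x):
    """Apply the same branching as above, using chained comparisons."""
    if 5 < x < 8:         # equivalent to (x > 5 and x < 8)
        return x + 1
    elif 5 < x < 12:      # only tested if the first condition failed
        return x * 3
    else:
        return x - 1

print([rescale(value) for value in (3, 6, 10, 12)])
```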

The For loop

The basic structure of FOR loops is 

for item in iterable: 
    expression(s)
In [173]:
count = 0
# range creates a lazy sequence of the integers from 1 to 9
x = range(1,10) 
for i in x:
    count += i
    print(count)
1
3
6
10
15
21
28
36
45
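
Two common variations on the for loop, sketched with arbitrary names: enumerate yields an index together with each item, and for a plain running total the built-in sum avoids the loop altogether.

```python
names = ['thing', 'stuff', 'truc']

labelled = []
# enumerate yields (index, item) pairs; start=1 makes the numbering begin at 1
for index, name in enumerate(names, start=1):
    labelled.append("{}: {}".format(index, name))

# The final value of the running total computed in a loop over range(1, 10)
total = sum(range(1, 10))

print(labelled, total)
```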

try ... except

You can see it as a generalization of the if ... else construction, allowing more flexibility in handling failures in code

In [174]:
text = ('a','1','54.1','43.a')
for t in text:
    try:
        temp = float(t)
        print(temp)
    except ValueError:
        # float() raises ValueError when the string is not a valid number
        print(str(t) + ' is Not convertible to a float')
a is Not convertible to a float
1.0
54.1
43.a is Not convertible to a float
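
The try statement also accepts an else clause, which runs only when no exception was raised. The following sketch wraps the conversion in a helper (to_float is a made-up name) that returns a default value instead of printing a message:

```python
def to_float(text, default=None):
    """Convert text to a float, or return `default` if it is not a valid number."""
    try:
        value = float(text)
    except ValueError:
        # Only ValueError is caught; any other error would still propagate
        return default
    else:
        # Runs only if the try block raised no exception
        return value

converted = [to_float(t) for t in ('a', '1', '54.1', '43.a')]
print(converted)
```

Catching only the specific exception you expect, rather than using a bare except, keeps unrelated errors visible.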

A list of built-in exceptions is available here

http://docs.python.org/3.1/library/exceptions.html

5. Recycling code in Python

As with Matlab and R, it's a good idea to write functions for bits of code that you use often.

The syntax for defining a function in Python is:

def name_of_function(arguments): 
    "Some code here that works on arguments and produces outputs"
    ...
    return outputs

Note that the execution block must be indented ...

You can create a file (a module; the .py extension is required) which contains several functions. A module can also define variables and import functions from other modules

In [175]:
%%writefile some_module.py 

PI = 3.14159 # defining a variable

from numpy import arccos # importing a function from another module

def f(x): 
    """
    This is a function which adds 5 to its argument
     
    """
    return x + 5

def g(x, y): 
    """
    This is a function which sums its 2 arguments
    """
    return x + y
Writing some_module.py
In [176]:
import some_module
In [177]:
%whos
Variable      Type      Data/Info
---------------------------------
X             str       something another thing 1 2 3
a             int       5
addutils      module    <module 'addutils' from '<...>es/addutils/__init__.py'>
b             int       6
chain         type      <class 'itertools.chain'>
cond          list      n=5
count         int       45
d             dict      n=3
i             int       9
int_list      list      n=7
int_tup       tuple     n=6
mixed_list    list      n=5
my_dict       dict      n=3
sentence      str       Time is an illusion.\n\nLunchtime doubly so.
some_module   module    <module 'some_module' fro<...>otebooks/some_module.py'>
str_list      list      n=3
sys           module    <module 'sys' (built-in)>
t             str       43.a
temp          float     54.1
text          tuple     n=4
this          module    <module 'this' from '/Use<...>a/lib/python3.5/this.py'>
tup           tuple     n=3
x             range     range(1, 10)
y             int       5
In [178]:
dir(some_module)
Out[178]:
['PI',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'arccos',
 'f',
 'g']
In [179]:
help(some_module)
Help on module some_module:

NAME
    some_module

FUNCTIONS
    f(x)
        This is a function which adds 5 to its argument
    
    g(x, y)
        This is a function which sums its 2 arguments

DATA
    PI = 3.14159
    arccos = <ufunc 'arccos'>

FILE
    /Users/nicolasf/Documents/talks_seminars/Python-for-data-analysis-and-visualisation/session_1/notebooks/some_module.py


In [180]:
some_module.PI
Out[180]:
3.14159
In [181]:
some_module.arccos?
In [182]:
some_module.f(7)
Out[182]:
12
In [183]:
help(some_module.f)
Help on function f in module some_module:

f(x)
    This is a function which adds 5 to its argument

In [184]:
from some_module import f
In [185]:
f(5)
Out[185]:
10
In [186]:
import some_module as sm
In [187]:
sm.f(10)
Out[187]:
15

The Zen of Python says:

Namespaces are one honking great idea -- let's do more of those!

so don't do:

from some_module import *

which dumps every name from the module into your namespace and can cause name conflicts.
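
As an illustration of the kind of conflict that namespaces prevent: the standard library modules math and cmath both define a function called sqrt. Importing the modules themselves keeps the two apart, whereas star-importing both would leave only the one imported last. A minimal sketch:

```python
import math
import cmath

# Each sqrt lives in its own namespace, so both remain available
real_root = math.sqrt(4)         # real-valued square root
complex_root = cmath.sqrt(-1)    # complex square root; math.sqrt(-1) would raise ValueError

print(real_root, complex_root)
```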

Positional and keyword arguments

Functions can have positional as well as keyword arguments (keyword arguments have default values, which can be None if the function allows and tests for it)

Positional arguments must always come before keyword arguments

In [188]:
def some_function(a,b, c=5,d=1e3): 
    res = (a + b) * c * d
    return res
In [189]:
some_function(2,3)
Out[189]:
25000.0
In [190]:
some_function(2, 3, c=5, d=0.01)
Out[190]:
0.25
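
Two further points, sketched with made-up function names: keyword arguments can be given in any order, and None is the usual default when the real default would be a mutable object such as a list (a list default is created only once and would be shared between calls).

```python
def scaled_sum(a, b, c=5, d=1e3):
    return (a + b) * c * d

by_keyword = scaled_sum(2, 3, d=0.01, c=5)   # keyword order is free
all_named = scaled_sum(b=3, a=2)             # positional parameters can also be passed by name

def append_item(item, bucket=None):
    """Append item to bucket, creating a fresh list when none is given."""
    if bucket is None:       # test for the None default
        bucket = []
    bucket.append(item)
    return bucket

first = append_item(1)
second = append_item(2)      # a new list each time, not shared with the first call

print(by_keyword, all_named, first, second)
```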

You can return more than one output; the outputs are returned together as a tuple

In [191]:
def some_function(a, b): 
    return a+1, b+1, a*b
In [192]:
x = some_function(2,3)
In [193]:
type(x)
Out[193]:
tuple
In [194]:
a,b,c = some_function(2,3)
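
Unpacking requires as many names on the left as there are returned values, unless a starred name collects the remainder. A short sketch with a made-up function:

```python
def min_max_mean(values):
    """Return three results at once; they are packed into a tuple."""
    return min(values), max(values), sum(values) / len(values)

stats = min_max_mean([2, 4, 9])           # a single tuple
lowest, highest, mean = stats             # one name per element
smallest, *rest = min_max_mean([2, 4, 9]) # the starred name collects the remaining values in a list

print(stats, lowest, highest, mean, smallest, rest)
```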