{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to `pandas`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* [**Loading and exploring data files**](#Loading-and-exploring-data-files)\n", " * [Selecting rows and columns](#Selecting-rows-and-columns)\n", " * [Sorting](#Sorting)\n", " * [Duplicates](#Duplicates)\n", " * [Adding, renaming, and removing columns](#Adding,-renaming,-and-removing-columns)\n", " * [Replacing values](#Replacing-values)\n", " * [Changing data types](#Changing-data-types)\n", "* [**Summarising data**](#Summarising-data)\n", " * [Summary statistics](#Summary-statistics)\n", " * [Pivot tables](#Pivot-tables)\n", " * [Split-apply-combine](#Split-apply-combine)\n", "* [**Visualising data**](#Visualising-data)\n", " * [Histograms and density plots](#Histograms-and-density-plots)\n", " * [Scatter plots](#Scatter-plots)\n", " * [Bar plots](#Bar-plots)\n", " * [Box plots](#Box-plots)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start by importing some libraries:\n", "* `numpy` and `pandas` to load, explore, and summarise the data\n", "* `matplotlib` to visualise the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Make sure plots are shown inside the notebook\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10, 6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading and exploring data files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this introduction, we'll use data on student performance from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Student+Performance)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "students = pd.read_csv('https://raw.githubusercontent.com/estimand/teaching-datasets/master/student-performance/student_performance.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how `pandas` can read data from a local file or directly from a URL. You should also explore other `read_` functions such as `read_excel`.\n", "\n", "What's the type of the `students` variable we've just created?" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(students)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`DataFrame`s are at the core of `pandas`. They're organised like Excel spreadsheets, with **rows representing observations** (people, items, etc.), and **columns representing variables** (measurements, attributes, etc.).\n", "\n", "The number of rows and columns in a `DataFrame` is stored in its `shape` attribute." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(649, 31)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can have a quick look at the data using the methods `head` and `tail`, which return the first and last 5 rows (by default), respectively." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "0 Gabriel Pereira F 18 Urban > 3 False \n", "1 Gabriel Pereira F 17 Urban > 3 True \n", "2 Gabriel Pereira F 15 Urban <= 3 True \n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "4 Gabriel Pereira F 16 Urban > 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "0 Higher education Higher education At home \n", "1 Primary education Primary education At home \n", "2 Primary education Primary education At home \n", "3 Higher education Lower secondary education Healthcare \n", "4 Upper secondary education Upper secondary education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "0 Teacher ... False False \n", "1 Other ... True False \n", "2 Other ... True False \n", "3 Civil service ... True True \n", "4 Other ... False False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "0 4 3 4 1 \n", "1 5 3 3 1 \n", "2 4 3 2 2 \n", "3 3 2 2 1 \n", "4 4 3 2 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "0 1 3 4 11 \n", "1 1 3 2 11 \n", "2 3 3 6 12 \n", "3 1 5 0 14 \n", "4 2 5 0 13 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "644 Mouzinho da Silveira F 19 Rural > 3 True \n", "645 Mouzinho da Silveira F 18 Urban <= 3 True \n", "646 Mouzinho da Silveira F 18 Urban > 3 True \n", "647 Mouzinho da Silveira M 17 Urban <= 3 True \n", "648 Mouzinho da Silveira M 18 Rural <= 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "644 Lower secondary education Upper secondary education Civil service \n", "645 Upper secondary education Primary education Teacher \n", "646 Primary education Primary education Other \n", "647 Upper secondary education Primary education Civil service \n", "648 Upper secondary education Lower secondary education Civil service \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "644 Other ... True False \n", "645 Civil service ... True False \n", "646 Other ... False False \n", "647 Civil service ... True False \n", "648 Other ... True False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "644 5 4 2 1 \n", "645 4 3 4 1 \n", "646 1 1 1 1 \n", "647 2 4 5 3 \n", "648 4 4 1 3 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "644 2 5 4 10 \n", "645 1 1 4 16 \n", "646 1 5 6 9 \n", "647 4 2 6 10 \n", "648 4 5 4 11 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also specify the number of rows we want as an argument." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "0 Gabriel Pereira F 18 Urban > 3 False \n", "1 Gabriel Pereira F 17 Urban > 3 True \n", "2 Gabriel Pereira F 15 Urban <= 3 True \n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "4 Gabriel Pereira F 16 Urban > 3 True \n", "5 Gabriel Pereira M 16 Urban <= 3 True \n", "6 Gabriel Pereira M 16 Urban <= 3 True \n", "7 Gabriel Pereira F 17 Urban > 3 False \n", "8 Gabriel Pereira M 15 Urban <= 3 False \n", "9 Gabriel Pereira M 15 Urban > 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "0 Higher education Higher education At home \n", "1 Primary education Primary education At home \n", "2 Primary education Primary education At home \n", "3 Higher education Lower secondary education Healthcare \n", "4 Upper secondary education Upper secondary education Other \n", "5 Higher education Upper secondary education Civil service \n", "6 Lower secondary education Lower secondary education Other \n", "7 Higher education Higher education Other \n", "8 Upper secondary education Lower secondary education Civil service \n", "9 Upper secondary education Higher education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "0 Teacher ... False False \n", "1 Other ... True False \n", "2 Other ... True False \n", "3 Civil service ... True True \n", "4 Other ... False False \n", "5 Other ... True False \n", "6 Other ... True False \n", "7 Teacher ... False False \n", "8 Other ... True False \n", "9 Other ... 
True False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "0 4 3 4 1 \n", "1 5 3 3 1 \n", "2 4 3 2 2 \n", "3 3 2 2 1 \n", "4 4 3 2 1 \n", "5 5 4 2 1 \n", "6 4 4 4 1 \n", "7 4 1 4 1 \n", "8 4 2 2 1 \n", "9 5 5 1 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "0 1 3 4 11 \n", "1 1 3 2 11 \n", "2 3 3 6 12 \n", "3 1 5 0 14 \n", "4 2 5 0 13 \n", "5 2 5 6 13 \n", "6 1 3 0 13 \n", "7 1 1 2 13 \n", "8 1 1 0 17 \n", "9 1 5 0 13 \n", "\n", "[10 rows x 31 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full set of columns in a `DataFrame` is stored in the attribute `columns`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['school', 'sex', 'age', 'home_area', 'family_size', 'parents_cohabit',\n", " 'education_mother', 'education_father', 'occupation_mother',\n", " 'occupation_father', 'reason', 'guardian', 'travel_time', 'study_time',\n", " 'failures', 'extra_school_support', 'family_support', 'extra_tutoring',\n", " 'extracurricular_activities', 'nursery', 'higher_education',\n", " 'internet_access', 'romantic_relationship',\n", " 'family_relationships_quality', 'free_time', 'going_out',\n", " 'alcohol_weekdays', 'alcohol_weekend', 'health_status', 'absences',\n", " 'final_grade'],\n", " dtype='object')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each column has an associated data type (e.g. `int` or `float`). These are stored in the attribute `dtypes`." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "school object\n", "sex object\n", "age int64\n", "home_area object\n", "family_size object\n", "parents_cohabit bool\n", "education_mother object\n", "education_father object\n", "occupation_mother object\n", "occupation_father object\n", "reason object\n", "guardian object\n", "travel_time object\n", "study_time object\n", "failures object\n", "extra_school_support bool\n", "family_support bool\n", "extra_tutoring bool\n", "extracurricular_activities bool\n", "nursery bool\n", "higher_education bool\n", "internet_access bool\n", "romantic_relationship bool\n", "family_relationships_quality int64\n", "free_time int64\n", "going_out int64\n", "alcohol_weekdays int64\n", "alcohol_weekend int64\n", "health_status int64\n", "absences int64\n", "final_grade int64\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how `pandas` stores `str`ings as `object`s.\n", "\n", "Type-specific methods can be accessed using attributes such as `str` (for `str`ings) and `dt` (for `datetime` objects representing dates and times)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 True\n", "3 True\n", "4 True\n", "5 True\n", "6 True\n", "7 True\n", "8 True\n", "9 True\n", "10 True\n", "11 True\n", "12 True\n", "13 True\n", "14 True\n", "15 True\n", "16 True\n", "17 True\n", "18 True\n", "19 True\n", "20 True\n", "21 True\n", "22 True\n", "23 True\n", "24 True\n", "25 True\n", "26 True\n", "27 True\n", "28 True\n", "29 True\n", " ... 
\n", "619 False\n", "620 False\n", "621 False\n", "622 False\n", "623 False\n", "624 False\n", "625 False\n", "626 False\n", "627 False\n", "628 False\n", "629 False\n", "630 False\n", "631 False\n", "632 False\n", "633 False\n", "634 False\n", "635 False\n", "636 False\n", "637 False\n", "638 False\n", "639 False\n", "640 False\n", "641 False\n", "642 False\n", "643 False\n", "644 False\n", "645 False\n", "646 False\n", "647 False\n", "648 False\n", "Name: school, Length: 649, dtype: bool" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['school'].str.contains('Pereira')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting rows and columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Boolean filtering\n", "\n", "Using `[]` and a `bool`ean condition, we can **select rows** that satisfy certain conditions." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "0 Gabriel Pereira F 18 Urban > 3 False \n", "1 Gabriel Pereira F 17 Urban > 3 True \n", "2 Gabriel Pereira F 15 Urban <= 3 True \n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "4 Gabriel Pereira F 16 Urban > 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "0 Higher education Higher education At home \n", "1 Primary education Primary education At home \n", "2 Primary education Primary education At home \n", "3 Higher education Lower secondary education Healthcare \n", "4 Upper secondary education Upper secondary education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "0 Teacher ... False False \n", "1 Other ... True False \n", "2 Other ... True False \n", "3 Civil service ... True True \n", "4 Other ... False False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "0 4 3 4 1 \n", "1 5 3 3 1 \n", "2 4 3 2 2 \n", "3 3 2 2 1 \n", "4 4 3 2 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "0 1 3 4 11 \n", "1 1 3 2 11 \n", "2 3 3 6 12 \n", "3 1 5 0 14 \n", "4 2 5 0 13 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students[students['age'] <= 18].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the result of this operation is another `DataFrame`, meaning that we can call methods such as `head`.\n", "\n", "We can also combine multiple conditions." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "14 Gabriel Pereira M 15 Urban > 3 False \n", "29 Gabriel Pereira M 16 Urban > 3 True \n", "32 Gabriel Pereira M 15 Rural > 3 True \n", "37 Gabriel Pereira M 16 Rural > 3 False \n", "\n", " education_mother education_father occupation_mother \\\n", "3 Higher education Lower secondary education Healthcare \n", "14 Lower secondary education Lower secondary education Other \n", "29 Higher education Higher education Teacher \n", "32 Higher education Upper secondary education Teacher \n", "37 Higher education Higher education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "3 Civil service ... True True \n", "14 Other ... True True \n", "29 Teacher ... True True \n", "32 At home ... True True \n", "37 Teacher ... True True \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "3 3 2 2 1 \n", "14 4 5 2 1 \n", "29 4 4 5 5 \n", "32 4 5 2 1 \n", "37 2 4 3 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "3 1 5 0 14 \n", "14 1 3 0 15 \n", "29 5 5 4 12 \n", "32 1 5 0 15 \n", "37 1 5 4 13 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students[(students['age'] <= 18) & (students['romantic_relationship'])].head()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "2 Gabriel Pereira F 15 Urban <= 3 True \n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "4 Gabriel Pereira F 16 Urban > 3 True \n", "5 Gabriel Pereira M 16 Urban <= 3 True \n", "6 Gabriel Pereira M 16 Urban <= 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "2 Primary education Primary education At home \n", "3 Higher education Lower secondary education Healthcare \n", "4 Upper secondary education Upper secondary education Other \n", "5 Higher education Upper secondary education Civil service \n", "6 Lower secondary education Lower secondary education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "2 Other ... True False \n", "3 Civil service ... True True \n", "4 Other ... False False \n", "5 Other ... True False \n", "6 Other ... True False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "2 4 3 2 2 \n", "3 3 2 2 1 \n", "4 4 3 2 1 \n", "5 5 4 2 1 \n", "6 4 4 4 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "2 3 3 6 12 \n", "3 1 5 0 14 \n", "4 2 5 0 13 \n", "5 2 5 6 13 \n", "6 1 3 0 13 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students[(students['age'] <= 16) | (students['age'] >= 21)].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Slicing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `[]` and a single `str`ing, we can **select specific columns**." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Gabriel Pereira\n", "1 Gabriel Pereira\n", "2 Gabriel Pereira\n", "3 Gabriel Pereira\n", "4 Gabriel Pereira\n", "5 Gabriel Pereira\n", "6 Gabriel Pereira\n", "7 Gabriel Pereira\n", "8 Gabriel Pereira\n", "9 Gabriel Pereira\n", "10 Gabriel Pereira\n", "11 Gabriel Pereira\n", "12 Gabriel Pereira\n", "13 Gabriel Pereira\n", "14 Gabriel Pereira\n", "15 Gabriel Pereira\n", "16 Gabriel Pereira\n", "17 Gabriel Pereira\n", "18 Gabriel Pereira\n", "19 Gabriel Pereira\n", "20 Gabriel Pereira\n", "21 Gabriel Pereira\n", "22 Gabriel Pereira\n", "23 Gabriel Pereira\n", "24 Gabriel Pereira\n", "25 Gabriel Pereira\n", "26 Gabriel Pereira\n", "27 Gabriel Pereira\n", "28 Gabriel Pereira\n", "29 Gabriel Pereira\n", " ... \n", "619 Mouzinho da Silveira\n", "620 Mouzinho da Silveira\n", "621 Mouzinho da Silveira\n", "622 Mouzinho da Silveira\n", "623 Mouzinho da Silveira\n", "624 Mouzinho da Silveira\n", "625 Mouzinho da Silveira\n", "626 Mouzinho da Silveira\n", "627 Mouzinho da Silveira\n", "628 Mouzinho da Silveira\n", "629 Mouzinho da Silveira\n", "630 Mouzinho da Silveira\n", "631 Mouzinho da Silveira\n", "632 Mouzinho da Silveira\n", "633 Mouzinho da Silveira\n", "634 Mouzinho da Silveira\n", "635 Mouzinho da Silveira\n", "636 Mouzinho da Silveira\n", "637 Mouzinho da Silveira\n", "638 Mouzinho da Silveira\n", "639 Mouzinho da Silveira\n", "640 Mouzinho da Silveira\n", "641 Mouzinho da Silveira\n", "642 Mouzinho da Silveira\n", "643 Mouzinho da Silveira\n", "644 Mouzinho da Silveira\n", "645 Mouzinho da Silveira\n", "646 Mouzinho da Silveira\n", "647 Mouzinho da Silveira\n", "648 Mouzinho da Silveira\n", "Name: school, Length: 649, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['school']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can access columns using the `.` 
notation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Gabriel Pereira\n", "1 Gabriel Pereira\n", "2 Gabriel Pereira\n", "3 Gabriel Pereira\n", "4 Gabriel Pereira\n", "5 Gabriel Pereira\n", "6 Gabriel Pereira\n", "7 Gabriel Pereira\n", "8 Gabriel Pereira\n", "9 Gabriel Pereira\n", "10 Gabriel Pereira\n", "11 Gabriel Pereira\n", "12 Gabriel Pereira\n", "13 Gabriel Pereira\n", "14 Gabriel Pereira\n", "15 Gabriel Pereira\n", "16 Gabriel Pereira\n", "17 Gabriel Pereira\n", "18 Gabriel Pereira\n", "19 Gabriel Pereira\n", "20 Gabriel Pereira\n", "21 Gabriel Pereira\n", "22 Gabriel Pereira\n", "23 Gabriel Pereira\n", "24 Gabriel Pereira\n", "25 Gabriel Pereira\n", "26 Gabriel Pereira\n", "27 Gabriel Pereira\n", "28 Gabriel Pereira\n", "29 Gabriel Pereira\n", " ... \n", "619 Mouzinho da Silveira\n", "620 Mouzinho da Silveira\n", "621 Mouzinho da Silveira\n", "622 Mouzinho da Silveira\n", "623 Mouzinho da Silveira\n", "624 Mouzinho da Silveira\n", "625 Mouzinho da Silveira\n", "626 Mouzinho da Silveira\n", "627 Mouzinho da Silveira\n", "628 Mouzinho da Silveira\n", "629 Mouzinho da Silveira\n", "630 Mouzinho da Silveira\n", "631 Mouzinho da Silveira\n", "632 Mouzinho da Silveira\n", "633 Mouzinho da Silveira\n", "634 Mouzinho da Silveira\n", "635 Mouzinho da Silveira\n", "636 Mouzinho da Silveira\n", "637 Mouzinho da Silveira\n", "638 Mouzinho da Silveira\n", "639 Mouzinho da Silveira\n", "640 Mouzinho da Silveira\n", "641 Mouzinho da Silveira\n", "642 Mouzinho da Silveira\n", "643 Mouzinho da Silveira\n", "644 Mouzinho da Silveira\n", "645 Mouzinho da Silveira\n", "646 Mouzinho da Silveira\n", "647 Mouzinho da Silveira\n", "648 Mouzinho da Silveira\n", "Name: school, Length: 649, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.school" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since everything in Python is an object, 
what's the type of the column we've just selected?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(students['school'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `[]` and a `list` of `str`ings, we can also **select multiple columns** at the same time." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " sex age\n", "0 F 18\n", "1 F 17\n", "2 F 15\n", "3 F 15\n", "4 F 16\n", "5 M 16\n", "6 M 16\n", "7 F 17\n", "8 M 15\n", "9 M 15\n", "10 F 15\n", "11 F 15\n", "12 M 15\n", "13 M 15\n", "14 M 15\n", "15 F 16\n", "16 F 16\n", "17 F 16\n", "18 M 17\n", "19 M 16\n", "20 M 15\n", "21 M 15\n", "22 M 16\n", "23 M 16\n", "24 F 15\n", "25 F 16\n", "26 M 15\n", "27 M 15\n", "28 M 16\n", "29 M 16\n", ".. .. ...\n", "619 F 18\n", "620 F 17\n", "621 F 17\n", "622 M 18\n", "623 M 18\n", "624 F 17\n", "625 F 18\n", "626 F 18\n", "627 M 18\n", "628 F 17\n", "629 F 17\n", "630 F 18\n", "631 F 18\n", "632 F 19\n", "633 F 18\n", "634 F 18\n", "635 F 17\n", "636 M 18\n", "637 M 18\n", "638 M 17\n", "639 M 19\n", "640 M 18\n", "641 F 18\n", "642 F 17\n", "643 F 18\n", "644 F 19\n", "645 F 18\n", "646 F 18\n", "647 M 17\n", "648 M 18\n", "\n", "[649 rows x 2 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students[['sex', 'age']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `loc`, `iloc`, and `ix`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These three indexers can be used to **simultaneously select rows and columns** of a `DataFrame`. Unlike methods, they are used with `[]` rather than `()`.\n", "\n", "`loc` selects rows and columns by **name** (label)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 18\n", "1 17\n", "2 15\n", "3 15\n", "4 16\n", "5 16\n", "6 16\n", "7 17\n", "8 15\n", "9 15\n", "10 15\n", "11 15\n", "12 15\n", "13 15\n", "14 15\n", "15 16\n", "16 16\n", "17 16\n", "18 17\n", "19 16\n", "20 15\n", "21 15\n", "22 16\n", "23 16\n", "24 15\n", "25 16\n", "26 15\n", "27 15\n", "28 16\n", "29 16\n", " ..\n", "619 18\n", "620 17\n", "621 17\n", "622 18\n", "623 18\n", "624 17\n", "625 18\n", "626 18\n", "627 18\n", "628 17\n", "629 17\n", "630 18\n", "631 18\n", "632 19\n", "633 18\n", "634 18\n", "635 17\n", "636 18\n", "637 18\n", "638 17\n", "639 19\n", "640
18\n", "641 18\n", "642 17\n", "643 18\n", "644 19\n", "645 18\n", "646 18\n", "647 17\n", "648 18\n", "Name: age, Length: 649, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.loc[:,'age'] # All rows, column 'age'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`iloc` selects rows and columns by **position**" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "0 Gabriel Pereira F 18 Urban > 3 False \n", "1 Gabriel Pereira F 17 Urban > 3 True \n", "2 Gabriel Pereira F 15 Urban <= 3 True \n", "3 Gabriel Pereira F 15 Urban > 3 True \n", "4 Gabriel Pereira F 16 Urban > 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "0 Higher education Higher education At home \n", "1 Primary education Primary education At home \n", "2 Primary education Primary education At home \n", "3 Higher education Lower secondary education Healthcare \n", "4 Upper secondary education Upper secondary education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "0 Teacher ... False False \n", "1 Other ... True False \n", "2 Other ... True False \n", "3 Civil service ... True True \n", "4 Other ... False False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "0 4 3 4 1 \n", "1 5 3 3 1 \n", "2 4 3 2 2 \n", "3 3 2 2 1 \n", "4 4 3 2 1 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "0 1 3 4 11 \n", "1 1 3 2 11 \n", "2 3 3 6 12 \n", "3 1 5 0 14 \n", "4 2 5 0 13 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.iloc[0:5,:] # First five rows, all columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ix` selected rows or columns by **name** (same as `loc`), falling back to **position** (like `iloc`) if needed. It was deprecated in `pandas` 0.20 and removed in `pandas` 1.0, so new code should use `loc` or `iloc` instead." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting\n", "\n", "It may be desirable to sort a `DataFrame` by some column(s), for example to identify unusual observations." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "197 Gabriel Pereira F 17 Urban <= 3 True \n", "212 Gabriel Pereira F 17 Urban > 3 True \n", "256 Gabriel Pereira M 18 Urban > 3 True \n", "150 Gabriel Pereira F 15 Urban > 3 False \n", "325 Gabriel Pereira M 17 Urban <= 3 False \n", "\n", " education_mother education_father occupation_mother \\\n", "197 Upper secondary education Upper secondary education Other \n", "212 Higher education Higher education Civil service \n", "256 Lower secondary education Lower secondary education Other \n", "150 Upper secondary education Upper secondary education Civil service \n", "325 Higher education Primary education Civil service \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "197 Other ... True True \n", "212 Teacher ... True False \n", "256 At home ... True True \n", "150 Civil service ... False True \n", "325 Other ... True True \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "197 5 3 3 2 \n", "212 4 2 4 2 \n", "256 4 4 3 2 \n", "150 1 3 2 2 \n", "325 4 5 4 2 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "197 3 1 32 14 \n", "212 3 2 30 16 \n", "256 2 1 26 8 \n", "150 3 1 24 9 \n", "325 4 5 22 10 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.sort_values('absences', ascending=False).head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " school sex age home_area family_size parents_cohabit \\\n", "29 Gabriel Pereira M 16 Urban > 3 True \n", "61 Gabriel Pereira F 16 Urban > 3 True \n", "66 Gabriel Pereira M 15 Urban > 3 False \n", "100 Gabriel Pereira M 16 Urban > 3 True \n", "237 Gabriel Pereira M 18 Urban > 3 True \n", "\n", " education_mother education_father occupation_mother \\\n", "29 Higher education Higher education Teacher \n", "61 Primary education Primary education Civil service \n", "66 Higher education Higher education Other \n", "100 Higher education Higher education Civil service \n", "237 Lower secondary education Lower secondary education Other \n", "\n", " occupation_father ... internet_access romantic_relationship \\\n", "29 Teacher ... True True \n", "61 Civil service ... True True \n", "66 Civil service ... True True \n", "100 Civil service ... True False \n", "237 Other ... True False \n", "\n", " family_relationships_quality free_time going_out alcohol_weekdays \\\n", "29 4 4 5 5 \n", "61 5 5 5 5 \n", "66 1 3 3 5 \n", "100 4 5 5 5 \n", "237 3 3 3 5 \n", "\n", " alcohol_weekend health_status absences final_grade \n", "29 5 5 4 12 \n", "61 5 5 0 16 \n", "66 5 3 0 12 \n", "100 5 4 12 8 \n", "237 5 4 9 10 \n", "\n", "[5 rows x 31 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.sort_values(['alcohol_weekdays', 'alcohol_weekend'], ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Duplicates\n", "\n", "Duplicate rows can be identified using `duplicated`, which returns `True` if a row (possibly limited to a subset of columns) has been seen previously. We can count the number of duplicate rows by combining `duplicated` with `sum`.\n", "\n", "Duplicate rows can be removed using Boolean filtering, or directly using `drop_duplicates`." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.duplicated().sum()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "647" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['school'].duplicated().sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Count rows whose (school, sex, age) combination has already appeared.\n", "# Calling duplicated() a second time would instead count repeated True/False flags.\n", "students.duplicated(['school', 'sex', 'age']).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding, renaming, and removing columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `[]` notation can also be used to create new columns, for example based on existing information." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "students['minor'] = students['age'] < 18" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True 468\n", "False 181\n", "Name: minor, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['minor'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Columns can be renamed using `rename`, which takes a `dict`ionary mapping old names to new names." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "students.rename(columns={\n", " 'minor': 'is_minor'\n", "}, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rows and columns can be removed using `drop`, with `axis` set to 0 for rows, and to 1 for columns."
] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "students.drop('is_minor', axis=1, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Replacing values\n", "\n", "Values in a `Series` can be replaced using `replace`, which takes a `dict`ionary mapping old values to new values." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 549\n", "1 70\n", "2 16\n", ">= 3 14\n", "Name: failures, dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.failures.value_counts()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Assign the result back rather than calling replace(..., inplace=True) on a\n", "# column selection, which may not update `students` in newer pandas versions\n", "students['failures'] = students['failures'].replace({\n", " '>= 3': '3'\n", "})" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 549\n", "1 70\n", "2 16\n", "3 14\n", "Name: failures, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.failures.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing data types\n", "\n", "Data types can be converted using `astype`."
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['failures'].dtype" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "students['failures'] = students['failures'].astype('int')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['failures'].dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarising data\n", "\n", "### Summary statistics\n", "\n", "A number of summary statistics can be computed using methods such as `mean` and `median`, which can be called on `Series` or `DataFrame`s." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16.7442218798151" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['age'].mean()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17.0" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['age'].median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, `describe` provides an easy way to compute a number of summary statistics in one go." 
] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 649.000000\n", "mean 16.744222\n", "std 1.218138\n", "min 15.000000\n", "25% 16.000000\n", "50% 17.000000\n", "75% 18.000000\n", "max 22.000000\n", "Name: age, dtype: float64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['age'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tabulation and cross-tabulation\n", "\n", "When dealing with categorical variables, it's often useful to:\n", "* Count the number of unique values (using `nunique`)\n", "* Retrieve the set of unique values (using `unique`)\n", "* Tabulate, i.e. count the number of occurrences of each value (using `value_counts`)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['sex'].nunique()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['F', 'M'], dtype=object)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['sex'].unique()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "F 383\n", "M 266\n", "Name: sex, dtype: int64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['sex'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `value_counts` sorts its output by count, in decreasing order. To order the result by the values themselves (which form its index) instead, chain `sort_index`."
] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 451\n", "2 121\n", "3 43\n", "4 17\n", "5 17\n", "Name: alcohol_weekdays, dtype: int64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students['alcohol_weekdays'].value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given two `Series`, we can also cross-tabulate." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "alcohol_weekend 1 2 3 4 5\n", "alcohol_weekdays \n", "1 241 113 64 28 5\n", "2 3 34 43 34 7\n", "3 1 1 9 20 12\n", "4 1 1 4 5 6\n", "5 1 1 0 0 15" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(students['alcohol_weekdays'], students['alcohol_weekend'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pivot tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the method `pivot_table`, we can generate tables of summary statistics (`mean`s by default)." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " age\n", "school sex \n", "Gabriel Pereira F 16.738397\n", " M 16.575269\n", "Mouzinho da Silveira F 16.869863\n", " M 16.925000" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.pivot_table(values='age', index=['school', 'sex'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the call to `pivot_table`, variables used to define rows are specified in `index`, and columns in `columns`. A different statistic can be computed by specifying `aggfunc`." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "sex F M\n", "school \n", "Gabriel Pereira 17 16\n", "Mouzinho da Silveira 17 17" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.pivot_table(values='age', index='school', columns='sex', aggfunc='median')" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "sex F M\n", "school home_area \n", "Gabriel Pereira Rural 1.292683 1.810811\n", " Urban 1.290816 1.671141\n", "Mouzinho da Silveira Rural 1.303797 2.150000\n", " Urban 1.238806 2.025000" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.pivot_table(values='alcohol_weekdays', index=['school', 'home_area'], columns='sex')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split-apply-combine\n", "\n", "[Split-apply-combine](http://pandas.pydata.org/pandas-docs/stable/groupby.html) involves:\n", "* Splitting the data into groups based on some criteria\n", "* Applying a function (e.g. `mean`) to each group independently\n", "* Combining the results back into a `DataFrame`\n", "\n", "![Split-apply-combine](http://i.imgur.com/yjNkiwL.png)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "school\n", "Gabriel Pereira 16.666667\n", "Mouzinho da Silveira 16.889381\n", "Name: age, dtype: float64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.groupby('school')['age'].mean()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "school \n", "Gabriel Pereira 423.0 16.666667 1.244895 15.0 16.0 17.0 18.0 22.0\n", "Mouzinho da Silveira 226.0 16.889381 1.155152 15.0 16.0 17.0 18.0 20.0" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.groupby('school')['age'].describe()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " median min max\n", "school \n", "Gabriel Pereira 17 15 22\n", "Mouzinho da Silveira 17 15 20" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.groupby('school')['age'].agg(['median', 'min', 'max']) # Customisable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualising data\n", "\n", "### Histograms and density plots\n", "\n", "Histograms and density plots show the distribution of a numerical variable." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD8CAYAAABthzNFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFUNJREFUeJzt3X+QJ3V95/HnS8AoygmEhWyAdYAi3KEVVzLHcUfwSDCGH56IdRooSwlyrlygTupyVa56pdRVWYWJaOLlDrMeFOARAoggd+Apcp5UqgK4rMsCAWUhqy67t7sBi8XgQRbe98e3R78OPTvf2Z3+9rDzfFR9a7o//enp9/Z8Z17bn+7+dqoKSZKme0XfBUiSFiYDQpLUyoCQJLUyICRJrQwISVIrA0KS1MqAkCS1MiAkSa0MCElSq737LmB3HHTQQTUxMdF3GZL0snLffff9XVUtma3fyzogJiYmWL16dd9lSNLLSpIfjNLPISZJUisDQpLUyoCQJLUyICRJrQwISVIrA0KS1KqzgEhyeJJvJXk4yUNJPty0H5jkjiSPNl8PaNqT5PNJ1idZl+S4rmqTJM2uyyOIHcAfVtU/AU4ALkxyLLASuLOqjgbubOYBTgOObl4rgMs7rE2SNIvOAqKqNlfVmmb6GeBh4FDgTODqptvVwDub6TOBa2rgbmD/JEu7qk+StHNjuZM6yQTwZuAe4JCq2gyDEElycNPtUOBHQ6ttbNo2j6NGab5NrLytl+1uuPSMXrarPU/nJ6mTvBa4Cbi4qrbvrGtLW7V8vxVJVidZvW3btvkqU5I0TacBkWQfBuFwbVV9pWneMjV01Hzd2rRvBA4fWv0wYNP071lVq6pqsqomlyyZ9bOmJEm7qMurmAJcATxcVZ8dWnQrcG4zfS7w1aH29zdXM50APD01FCVJGr8uz0GcCLwPeCDJ2qbtY8ClwA1Jzgd+CLy7WXY7cDqwHngWOK/D2iRJs+gsIKrqr2g/rwBwSkv/Ai7sqh5J0tx4J7UkqZUBIUlqZUBIkloZEJKkVgaEJKmVASFJamVASJJaGRCSpFYGhCSplQEhSWplQEiSWhkQkqRWBoQkqZUBIUlqZUBIkloZEJKkVgaEJKlVl8+kvjLJ1iQPDrVdn2Rt89ow9SjSJBNJfjq07Atd1SVJGk2Xz6S+Cvgz4Jqphqr6vanpJJcBTw/1f6yqlndYjyRpDrp8JvVdSSbaliUJ8B7gt7vaviRp9/R1DuIkYEtVPTrUdkSS7yb5
dpKTeqpLktTocohpZ84Brhua3wwsq6onk/wGcEuSN1TV9ukrJlkBrABYtmzZWIqVpMVo7EcQSfYG3gVcP9VWVc9V1ZPN9H3AY8Cvta1fVauqarKqJpcsWTKOkiVpUepjiOmtwCNVtXGqIcmSJHs100cCRwOP91CbJKnR5WWu1wF/DRyTZGOS85tFZ/OLw0sAbwHWJbkf+DJwQVU91VVtkqTZdXkV0zkztP9+S9tNwE1d1SJJmjvvpJYktTIgJEmtDAhJUisDQpLUyoCQJLUyICRJrQwISVIrA0KS1MqAkCS1MiAkSa0MCElSKwNCktTKgJAktTIgJEmtDAhJUisDQpLUqrMHBknqx8TK23rb9oZLz+ht25p/XT5y9MokW5M8ONR2SZInkqxtXqcPLftokvVJvpfkd7uqS5I0mi6HmK4CTm1p/1xVLW9etwMkOZbBs6rf0KzzX5Ps1WFtkqRZdBYQVXUX8NSI3c8E/rKqnquqvwXWA8d3VZskaXZ9nKS+KMm6ZgjqgKbtUOBHQ302Nm0vkWRFktVJVm/btq3rWiVp0Rp3QFwOHAUsBzYDlzXtaelbbd+gqlZV1WRVTS5ZsqSbKiVJ4w2IqtpSVS9U1YvAF/n5MNJG4PChrocBm8ZZmyTpF401IJIsHZo9C5i6wulW4Owkv5TkCOBo4N5x1iZJ+kWd3QeR5DrgZOCgJBuBTwInJ1nOYPhoA/AhgKp6KMkNwN8AO4ALq+qFrmqTJM2us4CoqnNamq/YSf9PAZ/qqh5J0tz4URuSpFYGhCSplQEhSWplQEiSWhkQkqRWBoQkqZUBIUlqZUBIkloZEJKkVgaEJKmVASFJamVASJJaGRCSpFYGhCSplQEhSWplQEiSWo0UEEneONdvnOTKJFuTPDjU9sdJHkmyLsnNSfZv2ieS/DTJ2ub1hbluT5I0v0Y9gvhCknuT/MHUH/URXAWcOq3tDuCNVfXrwPeBjw4te6yqljevC0bchiSpIyMFRFX9JvBe4HBgdZK/SPI7s6xzF/DUtLZvVNWOZvZu4LC5lyxJGoeRz0FU1aPAfwQ+AvxL4PPNcNG7dnHbHwC+NjR/RJLvJvl2kpN28XtKkubJ3qN0SvLrwHnAGQyGif5VVa1J8qvAXwNfmctGk3wc2AFc2zRtBpZV1ZNJfgO4Jckbqmp7y7orgBUAy5Ytm8tmJUlzMOoRxJ8Ba4A3VdWFVbUGoKo2MTiqGFmSc4G3A++tqmq+z3NV9WQzfR/wGPBrbetX1aqqmqyqySVLlsxl05KkORjpCAI4HfhpVb0AkOQVwKuq6tmq+tKoG0tyKs0QVVU9O9S+BHiqql5IciRwNPD4qN9XkjT/Rj2C+Cbw6qH5fZu2GSW5jsHw0zFJNiY5n8GRyH7AHdMuZ30LsC7J/cCXgQuq6qnWbyxJGotRjyBeVVU/mZqpqp8k2XdnK1TVOS3NV8zQ9ybgphFrkSSNwahHEH+f5LipmeZE8k+7KUmStBCMegRxMXBjkk3N/FLg97opSZK0EIwUEFX1nST/GDgGCPBIVf1Dp5VJkno16hEEwD8FJpp13pyEqrqmk6okSb0b9Ua5LwFHAWuBF5rmAgwISdpDjXoEMQkcO3VjmyRpzzfqVUwPAr/SZSGSpIVl1COIg4C/SXIv8NxUY1W9o5OqJEm9GzUgLumyCEl7homVt/Wy3Q2XntHLdvd0o17m+u0krweOrqpvNndR79VtaZKkPo36yNEPMviMpD9vmg4FbumqKElS/0Y9SX0hcCKwHX728KCDuypKktS/UQPiuap6fmomyd4M7oOQJO2hRg2Ibyf5GPDq5lnUNwL/o7uyJEl9GzUgVgLbgAeADwG3M8cnyUmSXl5GvYrpReCLzUuStAiM+llMf0vLOYeqOnLeK5IkLQhz+SymKa8C3g0cONtKSa4E3g5srao3Nm0HAtcz+GTYDcB7qurHSQL8KYPnXz8L/H5VrRmxPknSPBvpHERVPTn0eqKq/gT47RFW
vQo4dVrbSuDOqjoauLOZBzgNOLp5rQAuH6U2SVI3Rh1iOm5o9hUMjij2m229qrorycS05jOBk5vpq4H/A3ykab+m+cTYu5Psn2RpVW0epUZJ0vwadYjpsqHpHTRDQ7u4zUOm/uhX1eYkUzfcHQr8aKjfxqbNgJCkHox6FdNvdV0Ig0eZvmTTL+mUrGAwBMWyZcu6rkmSFq1Rh5j+/c6WV9Vn57DNLVNDR0mWAlub9o3A4UP9DgM2tWxrFbAKYHJy0ru5Jakjo94oNwn8WwZDPocCFwDHMjgPMeu5iGluBc5tps8FvjrU/v4MnAA87fkHSerPXB4YdFxVPQOQ5BLgxqr6NztbKcl1DE5IH5RkI/BJ4FLghiTnAz9kcMksDO7OPh1Yz+Ay1/Pm9C+RJM2rUQNiGfD80PzzDO5j2KmqOmeGRae09C0GnxorSVoARg2ILwH3JrmZwYnjs4BrOqtKktS7Ua9i+lSSrwEnNU3nVdV3uytLktS3UU9SA+wLbK+qPwU2Jjmio5okSQvAqI8c/SSDu50/2jTtA/z3roqSJPVv1COIs4B3AH8PUFWbmPvlrZKkl5FRA+L55iqjAkjymu5KkiQtBKMGxA1J/hzYP8kHgW/iw4MkaY826lVMn2meRb0dOAb4RFXd0WllkqRezRoQSfYCvl5VbwUMBUlaJGYdYqqqF4Bnk7xuDPVIkhaIUe+k/n/AA0nuoLmSCaCq/l0nVUmSejdqQNzWvCRJi8ROAyLJsqr6YVVdPa6CJEkLw2znIG6ZmkhyU8e1SJIWkNkCYvgxoEd2WYgkaWGZLSBqhmlJ0h5utpPUb0qyncGRxKubaZr5qqp/1Gl1kqTe7DQgqmqv+d5gkmOA64eajgQ+AewPfBDY1rR/rKpun+/tS5JGM+plrvOmqr4HLIef3aX9BHAzg2dQf66qPjPumiRJLzWXBwZ14RTgsar6Qc91SJKm6TsgzgauG5q/KMm6JFcmOaCvoiRJPQZEklcyeAjRjU3T5cBRDIafNgOXzbDeiiSrk6zetm1bWxdJ0jzo8wjiNGBNVW0BqKotVfVCVb3I4FkTx7etVFWrqmqyqiaXLFkyxnIlaXHpMyDOYWh4KcnSoWVnAQ+OvSJJ0s+M/SomgCT7Ar8DfGio+Y+SLGdwQ96GacskSWPWS0BU1bPAL09re18ftUiS2vUSENK4TKz0U+qlXdX3Za6SpAXKgJAktTIgJEmtDAhJUitPUkt62evrYoQNl57Ry3bHxSMISVIrA0KS1MqAkCS1MiAkSa0MCElSKwNCktTKgJAktTIgJEmtDAhJUisDQpLUyoCQJLXq7bOYkmwAngFeAHZU1WSSA4HrgQkGjx19T1X9uK8aJWkx6/sI4reqanlVTTbzK4E7q+po4M5mXpLUg74DYrozgaub6auBd/ZYiyQtan0GRAHfSHJfkhVN2yFVtRmg+Xrw9JWSrEiyOsnqbdu2jbFcSVpc+nwexIlVtSnJwcAdSR4ZZaWqWgWsApicnKwuC5Skxay3I4iq2tR83QrcDBwPbEmyFKD5urWv+iRpseslIJK8Jsl+U9PA24AHgVuBc5tu5wJf7aM+SVJ/Q0yHADcnmarhL6rqfyX5DnBDkvOBHwLv7qk+SVr0egmIqnoceFNL+5PAKeOvSJI03UK7zFWStEAYEJKkVgaEJKmVASFJamVASJJaGRCSpFYGhCSplQEhSWplQEiSWhkQkqRWBoQkqZUBIUlqZUBIkloZEJKkVgaEJKmVASFJajX2gEhyeJJvJXk4yUNJPty0X5LkiSRrm9fp465NkvRzfTxRbgfwh1W1pnku9X1J7miWfa6qPtNDTZKkacYeEFW1GdjcTD+T5GHg0HHXIUnauV7PQSSZAN4M3NM0XZRkXZIrkxzQW2GSpP4CIslrgZuAi6tqO3A5cBSwnMERxmUzrLciyeokq7dt2za2eiVpseklIJLswyAcrq2qrwBU1ZaqeqGqXgS+CBzftm5Vraqqyaqa
XLJkyfiKlqRFpo+rmAJcATxcVZ8dal861O0s4MFx1yZJ+rk+rmI6EXgf8ECStU3bx4BzkiwHCtgAfKiH2iRJjT6uYvorIC2Lbh93LZKkmXkntSSplQEhSWplQEiSWvVxklqS9ggTK2/rbdsbLj2j820s6oDo64c7jh+sJO0uh5gkSa0MCElSq0U9xKTx6XOsVtKu8QhCktTKgJAktTIgJEmtDAhJUisDQpLUyoCQJLUyICRJrQwISVIrA0KS1GrBBUSSU5N8L8n6JCv7rkeSFqsFFRBJ9gL+C3AacCyD51Qf229VkrQ4LaiAAI4H1lfV41X1PPCXwJk91yRJi9JC+7C+Q4EfDc1vBP5ZT7XscfzAPElzsdACIi1t9QsdkhXAimb2J0m+txvbOwj4u91Yf5fk0yN37aW+ObC+3WN9u2dR1zeHvyNtXj9Kp4UWEBuBw4fmDwM2DXeoqlXAqvnYWJLVVTU5H9+rC9a3e6xv91jf7lno9Y1ioZ2D+A5wdJIjkrwSOBu4teeaJGlRWlBHEFW1I8lFwNeBvYArq+qhnsuSpEVpQQUEQFXdDtw+ps3Ny1BVh6xv91jf7rG+3bPQ65tVqmr2XpKkRWehnYOQJC0Qe3xAzPbRHUl+Kcn1zfJ7kkyMsbbDk3wrycNJHkry4ZY+Jyd5Osna5vWJcdU3VMOGJA8021/dsjxJPt/sw3VJjhtTXccM7Ze1SbYnuXhan7HvvyRXJtma5MGhtgOT3JHk0ebrATOse27T59Ek546xvj9O8kjz87s5yf4zrLvT90KH9V2S5Imhn+PpM6zb+Uf1zFDf9UO1bUiydoZ1O99/86qq9tgXgxPdjwFHAq8E7geOndbnD4AvNNNnA9ePsb6lwHHN9H7A91vqOxn4nz3vxw3AQTtZfjrwNQb3sZwA3NPTz/r/Aq/ve/8BbwGOAx4cavsjYGUzvRL4dMt6BwKPN18PaKYPGFN9bwP2bqY/3VbfKO+FDuu7BPgPI7wHdvr73lV905ZfBnyir/03n689/QhilI/uOBO4upn+MnBKkrYb9uZdVW2uqjXN9DPAwwzuJn+5ORO4pgbuBvZPsnTMNZwCPFZVPxjzdl+iqu4CnprWPPw+uxp4Z8uqvwvcUVVPVdWPgTuAU8dRX1V9o6p2NLN3M7gHqRcz7L9RjOWjenZWX/O34z3AdfO93T7s6QHR9tEd0/8A/6xP8wvyNPDLY6luSDO09WbgnpbF/zzJ/Um+luQNYy1soIBvJLmvuZN9ulH2c9fOZuZfyr73H8AhVbUZBv8xAA5u6bMQ9iPABxgcEbaZ7b3QpYuaIbArZxiiWwj77yRgS1U9OsPyPvffnO3pATHrR3eM2KdTSV4L3ARcXFXbpy1ew2DY5E3AfwZuGWdtjROr6jgGn7J7YZK3TFve6z5sbqp8B3Bjy+KFsP9GtRDeix8HdgDXztBltvdCVy4HjgKWA5sZDONM1/v+A85h50cPfe2/XbKnB8SsH90x3CfJ3sDr2LXD212SZB8G4XBtVX1l+vKq2l5VP2mmbwf2SXLQuOprtrup+boVuJnBofywUfZzl04D1lTVlukLFsL+a2yZGnZrvm5t6dPrfmxOir8deG81A+bTjfBe6ERVbamqF6rqReCLM2y37/23N/Au4PqZ+vS1/3bVnh4Qo3x0x63A1NUi/xr43zP9csy3ZrzyCuDhqvrsDH1+ZeqcSJLjGfzMnhxHfc02X5Nkv6lpBiczH5zW7Vbg/c3VTCcAT08Np4zJjP9r63v/DRl+n50LfLWlz9eBtyU5oBlCeVvT1rkkpwIfAd5RVc/O0GeU90JX9Q2f0zprhu32/VE9bwUeqaqNbQv73H+7rO+z5F2/GFxh830GVzd8vGn7Twx+EQBexWBoYj1wL3DkGGv7TQaHwOuAtc3rdOAC4IKmz0XAQwyuyLgb+Bdj3n9HNtu+v6ljah8O1xgGD3p6DHgAmBxjffsy+IP/uqG2Xvcfg7DaDPwDg//Vns/g
vNadwKPN1wObvpPAfxta9wPNe3E9cN4Y61vPYPx+6n04dWXfrwK37+y9MKb6vtS8t9Yx+KO/dHp9zfxLft/HUV/TftXU+26o79j333y+vJNaktRqTx9ikiTtIgNCktTKgJAktTIgJEmtDAhJUisDQpLUyoCQJLUyICRJrf4/SGphZSC3IWoAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students['final_grade'].plot.hist() # 10 bins by default" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also specify the number of bins as an argument." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD8CAYAAABthzNFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAElpJREFUeJzt3X2QXXV9x/H3lwSKqJXEBIw8uOBkVOpITVeGilpKqAWigI5aHEdTRFMrKtR2SnwYYTrTmdAqiG1HjUINlCoPPkAFqzGiTv8gmGDkKWIiRoiJyapIUKwIfvvHPUtu198mZ3fvuedm9/2auXPPw+/s+c7Ze/ezv/MYmYkkSWPt13YBkqTBZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVDS77QKmYt68eTk0NNR2GZK0T1m/fv1PMnP+3trt0wExNDTEunXr2i5DkvYpEfHDOu3cxSRJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSrap6+klqaroeU3TXrZLSuW9LASzWT2ICRJRQaEJKnIgJAkFTUWEBFxRUTsjIi7uqbNjYjVEbGpep9TTY+I+EhEbI6IOyJiUVN1SZLqabIH8SnglDHTlgNrMnMhsKYaBzgVWFi9lgEfbbAuSVINjQVEZn4T+NmYyWcAq6rhVcCZXdOvzI5bgYMjYkFTtUmS9q7fxyAOzcztANX7IdX0w4AHutptraZJkloyKAepozAtiw0jlkXEuohYNzIy0nBZkjRz9TsgdozuOqred1bTtwJHdLU7HNhW+gGZuTIzhzNzeP78vT5SVZI0Sf0OiBuBpdXwUuCGrulvqs5mOh54aHRXlCSpHY3daiMiPg2cCMyLiK3AhcAK4NqIOAe4H3ht1fxm4DRgM/AIcHZTdUmS6mksIDLz9ePMWlxom8C5TdUiSZq4QTlILUkaMN7NVWrAVO7GKg0KexCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSqa3cZKI+JvgLcACdwJnA0sAD4DzAVuB96YmY+2UZ80Uw0tv2nSy25ZsaSHlWgQ9L0HERGHAe8ChjPz+cAs4CzgYuDSzFwIPAic0+/aJEm7tbWLaTbwpIiYDRwEbAdOAq6v5q8CzmypNkkS
LQREZv4I+CBwP51geAhYD/w8Mx+rmm0FDut3bZKk3drYxTQHOAM4Cngm8GTg1ELTHGf5ZRGxLiLWjYyMNFeoJM1wbexiOhn4QWaOZOZvgM8BLwYOrnY5ARwObCstnJkrM3M4M4fnz5/fn4olaQZqIyDuB46PiIMiIoDFwD3ALcBrqjZLgRtaqE2SVGnjGMRaOgejb6dziut+wErgAuDdEbEZeDpweb9rkyTt1sp1EJl5IXDhmMn3Ace1UI4kqcArqSVJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAkSUUGhCSpqJVbbUhqzlQeGyp1swchSSoyICRJRQaEJKnIgJAkFRkQkqQiA0KSVGRASJKKDAhJUpEBIUkqMiAkSUW1AiIint90IZKkwVK3B/GxiLgtIt4eEQc3WpEkaSDUCojMfAnwBuAIYF1E/GdE/FmjlUmSWlX7GERmbgLeD1wA/AnwkYj4bkS8uqniJEntqXsM4gURcSmwETgJeGVmPq8avrTB+iRJLan7PIh/BT4BvDczfzU6MTO3RcT7G6lMktSqugFxGvCrzHwcICL2Aw7MzEcy86rGqpMktaZuQHwVOBn4RTV+EPAV4MVNFCVp3zPVJ9ltWbGkR5WoV+oepD4wM0fDgWr4oGZKkiQNgroB8cuIWDQ6EhF/BPxqD+0lSfu4uruYzgeui4ht1fgC4C8mu9LqYrtPAs8HEngzcC9wDTAEbAFel5kPTnYdkqSpqXuh3LeA5wJ/DbwdeF5mrp/Cei8D/jsznwscS+f02eXAmsxcCKypxiVJLanbgwB4EZ3/7mcDL4wIMvPKia4wIn4feBnwlwCZ+SjwaEScAZxYNVsFfJ3ORXmSpBbUCoiIuAp4NrABeLyanMCEAwI4GhgB/j0ijgXWA+cBh2bmdoDM3B4Rh4xTyzJgGcCRRx45idVLkuqo24MYBo7JzOzROhcB78zMtRFxGRPYnZSZK4GVAMPDw72oR5JUUPcspruAZ/RonVuBrZm5thq/nk5g7IiIBQDV+84erU+SNAl1exDzgHsi4jbg16MTM/P0ia4wM38cEQ9ExHMy815gMXBP9VoKrKjeb5joz5Yk9U7dgLiox+t9J3B1RBwA3AecTac3c21EnAPcD7y2x+uUJE1ArYDIzG9ExLOAhZn51Yg4CJg12ZVm5gY6xzXGWjzZnylJ6q26t/t+K51jBR+vJh0GfKGpoiRJ7at7kPpc4ARgFzzx8KDiaaiSpOmhbkD8urqgDYCImE3nOghJ0jRVNyC+ERHvBZ5UPYv6OuC/mitLktS2ugGxnM7Vz3cCfwXcTOf51JKkaaruWUy/pfPI0U80W44kaVDUvRfTDygcc8jMo3tekSRpIEzkXkyjDqRzEdvc3pcjSRoUdZ8H8dOu148y88PASQ3XJklqUd1dTIu6Rvej06N4aiMVSZIGQt1dTB/qGn6M6pGgPa9GkjQw6p7F9KdNFyJJGix1dzG9e0/zM/OS3pQjSRoUEzmL6UXAjdX4K4FvAg80UZQkqX0TeWDQosx8GCAiLgKuy8y3NFWYJKlddW+1cSTwaNf4o8BQz6uRJA2Muj2Iq4DbIuLzdK6ofhVwZWNVSZpxhpbfNOllt6xY0sNKNKruWUz/GBFfAl5aTTo7M7/dXFmSpLbV3cUEcBCwKzMvA7ZGxFEN1SRJGgB1Hzl6IXAB8J5q0v7AfzRVlCSpfXV7EK8CTgd+CZCZ2/BWG5I0rdUNiEczM6lu+R0RT26uJEnSIKgbENdGxMeBgyPircBX8eFBkjSt1T2L6YPVs6h3Ac8BPpCZqxutTJLUqr0GRETMAr6cmScDhoIkzRB73cWUmY8Dj0TE0/pQjyRpQNS9kvp/gTsjYjXVmUwAmfmuRqqSJLWubkDcVL0kSTPEHgMiIo7MzPszc1W/CpIkDYa9HYP4wuhARHy24VokSQNkbwERXcNH93LFETErIr4dEV+sxo+KiLURsSkiromIA3q5
PknSxOwtIHKc4V44D9jYNX4xcGlmLgQeBM7p8fokSROwt4A4NiJ2RcTDwAuq4V0R8XBE7JrsSiPicGAJ8MlqPICTgOurJquAMyf78yVJU7fHg9SZOauh9X4Y+Ht23/Dv6cDPM/OxanwrcFhD65Yk1TCR50H0RES8AtiZmeu7JxeaFndpRcSyiFgXEetGRkYaqVGS1EJAACcAp0fEFuAzdHYtfZjOjQBHezSHA9tKC2fmyswczszh+fPn96NeSZqR+h4QmfmezDw8M4eAs4CvZeYbgFuA11TNlgI39Ls2SdJubfQgxnMB8O6I2EznmMTlLdcjSTNa3VttNCIzvw58vRq+DziuzXokSbsNUg9CkjRADAhJUpEBIUkqavUYhCT1wtDyyT+NYMuKJT2sZHqxByFJKrIHoWltKv9ZSjOdPQhJUpEBIUkqMiAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElF3otJ0ozmnWDHZw9CklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRX0PiIg4IiJuiYiNEXF3RJxXTZ8bEasjYlP1PqfftUmSdmujB/EY8LeZ+TzgeODciDgGWA6sycyFwJpqXJLUkr4HRGZuz8zbq+GHgY3AYcAZwKqq2SrgzH7XJknardVjEBExBLwQWAscmpnboRMiwCHtVSZJai0gIuIpwGeB8zNz1wSWWxYR6yJi3cjISHMFStIM10pARMT+dMLh6sz8XDV5R0QsqOYvAHaWls3MlZk5nJnD8+fP70/BkjQDtXEWUwCXAxsz85KuWTcCS6vhpcAN/a5NkrTb7BbWeQLwRuDOiNhQTXsvsAK4NiLOAe4HXttCbZKkSt8DIjP/B4hxZi/uZy2SpPF5JbUkqciAkCQVGRCSpCIDQpJUZEBIkoraOM11IAwtv2lKy29ZsaRHlUjaV03l78i+8DfEHoQkqciAkCQVGRCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKloxl5JrX3HVK96lzQ59iAkSUUGhCSpyICQJBUZEJKkIgNCklRkQEiSigwISVKRASFJKjIgJElFBoQkqciAkCQVGRCSpCIDQpJU5N1c1RfekVXa99iDkCQVDVQPIiJOAS4DZgGfzMwVLZckSY2Yaq96y4olPapkfAPTg4iIWcC/AacCxwCvj4hj2q1KkmauQepBHAdszsz7ACLiM8AZwD2tVjWOqaR/P5K/1zyGIM08A9ODAA4DHuga31pNkyS1YJB6EFGYlr/TKGIZsKwa/UVE3DvJ9c0DfjLJZackLq7VrLX6arK+qRn0+mDwa5zR9dX8OzKeZ9VpNEgBsRU4omv8cGDb2EaZuRJYOdWVRcS6zBye6s9pivVNjfVN3aDXaH3NG6RdTN8CFkbEURFxAHAWcGPLNUnSjDUwPYjMfCwi3gF8mc5prldk5t0tlyVJM9bABARAZt4M3Nyn1U15N1XDrG9qrG/qBr1G62tYZP7OcWBJkgbqGIQkaYBM+4CIiFMi4t6I2BwRywvzfy8irqnmr42IoT7WdkRE3BIRGyPi7og4r9DmxIh4KCI2VK8P9Ku+av1bIuLOat3rCvMjIj5Sbb87ImJRH2t7Ttd22RARuyLi/DFt+r79IuKKiNgZEXd1TZsbEasjYlP1PmecZZdWbTZFxNI+1fbPEfHd6vf3+Yg4eJxl9/hZaLjGiyLiR12/x9PGWXaP3/cG67umq7YtEbFhnGX7sg17JjOn7YvOwe7vA0cDBwDfAY4Z0+btwMeq4bOAa/pY3wJgUTX8VOB7hfpOBL7Y4jbcAszbw/zTgC/RuY7leGBti7/rHwPPanv7AS8DFgF3dU37J2B5NbwcuLiw3Fzgvup9TjU8pw+1vRyYXQ1fXKqtzmeh4RovAv6uxmdgj9/3puobM/9DwAfa3Ia9ek33HsQTt+/IzEeB0dt3
dDsDWFUNXw8sjojSRXs9l5nbM/P2avhhYCP73tXjZwBXZsetwMERsaCFOhYD38/MH7aw7v8nM78J/GzM5O7P2SrgzMKifw6szsyfZeaDwGrglKZry8yvZOZj1eitdK5Bas0426+OOt/3KdtTfdXfjtcBn+71etsw3QOizu07nmhTfUkeAp7el+q6VLu2XgisLcz+44j4TkR8KSL+oK+Fda5m/0pErK+uYh9rUG6Rchbjfynb3H6jDs3M7dD5xwA4pNBmELblm+n0CEv29llo2juq3WBXjLOLbhC230uBHZm5aZz5bW/DCZnuAVHn9h21bvHRpIh4CvBZ4PzM3DVm9u10dpscC/wL8IV+1gackJmL6Nxl99yIeNmY+YOw/Q4ATgeuK8xue/tNRKvbMiLeBzwGXD1Ok719Fpr0UeDZwB8C2+nsxhmr9c8i8Hr23HtocxtO2HQPiDq373iiTUTMBp7G5Lq3kxIR+9MJh6sz83Nj52fmrsz8RTV8M7B/RMzrV32Zua163wl8nk43vlutW6Q07FTg9szcMXZG29uvy47RXW/V+85Cm9a2ZXVA/BXAG7LaWT5Wjc9CYzJzR2Y+npm/BT4xzrpb/SxWfz9eDVwzXps2t+FkTPeAqHP7jhuB0bNFXgN8bbwvSK9V+ysvBzZm5iXjtHnG6DGRiDiOzu/sp32q78kR8dTRYToHM+8a0+xG4E3V2UzHAw+N7krpo3H/a2tz+43R/TlbCtxQaPNl4OURMafahfLyalqjovOgrguA0zPzkXHa1PksNFlj93GtV42z7rZv13My8N3M3Fqa2fY2nJS2j5I3/aJzls336Jzd8L5q2j/Q+TIAHEhn18Rm4Dbg6D7W9hI6XeA7gA3V6zTgbcDbqjbvAO6mc0bGrcCL+1jf0dV6v1PVMLr9uusLOg96+j5wJzDc59/vQXT+4D+ta1qr249OWG0HfkPnv9pz6BzXWgNsqt7nVm2H6Tw9cXTZN1efxc3A2X2qbTOdffejn8HRs/qeCdy8p89CH7ffVdXn6w46f/QXjK2xGv+d73s/6qumf2r0c9fVtpVt2KuXV1JLkoqm+y4mSdIkGRCSpCIDQpJUZEBIkooMCElSkQEhSSoyICRJRQaEJKno/wCg+Qb33ylINwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students['final_grade'].plot.hist(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now compare these with the corresponding density plot ('smooth' version of a histogram)." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students['final_grade'].plot.density()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Grouped histograms\n", "\n", "We can also create histograms grouped by some variable, e.g. `school`." 
] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([,\n", " ],\n", " dtype=object)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.hist(column='final_grade', by='school')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, each panel gets its own axis limits, which makes the groups hard to compare. Passing `sharex=True` and `sharey=True` gives all panels the same x-axis and y-axis." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([,\n", " ],\n", " dtype=object)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.hist(column='final_grade', by='school', sharex=True, sharey=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scatter plots\n", "\n", "Scatter plots show the relationship between two numerical variables (or, in a scatter matrix, between every pair in a set of variables)."
] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAELCAYAAADKjLEqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X2UVOWV7/HvbmiBgAgCAiMgGBgVEInpRTSOXIyRqCFiYpxE48Qk48LMTe4kmeuKxDjqGE0cJyPOa4RJHE2MxowoKhiBMHgxvrcdQIQYGCE2ARpFEFBEsPf9ow5a1ZxT/VT1qTfq91mrF11PPfXUrhdq96lz9j7m7oiIiHSmodIBiIhIbVDCEBGRIEoYIiISRAlDRESCKGGIiEgQJQwREQlS0oRhZsPNbKmZrTGzF83sG9H4kWa22MzWRv/2T7j9pdGctWZ2aSljFRGR/KyUdRhmNhQY6u4tZnY48DxwPvAl4HV3v8nMZgL93f3KDrc9EmgGmgCPbvthd99esoBFRCRRSbcw3H2zu7dEv+8C1gBHA9OBO6Npd5JJIh19Aljs7q9HSWIxcHYp4xURkWRl24dhZiOBDwHPAIPdfTNkkgpwVMxNjgZasy5vjMZERKQCupfjTsysDzAX+Ka77zSzoJvFjB30/ZmZzQBmAPTu3fvDxx9/fFdCFRGpO88///xr7j6os3klTxhm1kgmWfzc3e+PhtvMbKi7b472c2yNuelGYErW5WHAYx0nufscYA5AU1OTNzc3pxi9iMihz8z+EDKv1EdJGfATYI2735J11UPAgaOeLgUejLn5QmCqmfWPjqKaGo2JiEgFlHofxmnAXwAfM7Pl0c+5wE3AWWa2FjgruoyZNZnZjwHc/XXge8Bz0c/10ZiIiFRASQ+rLTd9JSUiUjgze97dmzqbp0pvEREJooQhIiJBlDAi23bvZUXrDrbt3lvpUHJUa1wiUn/KUodR7R5c/keunLuSxoYG9rW3c/MFEzhvYuVrBKs1LhGpT3W/hbFt916unLuSt/e1s2vvft7e1863566s+F/01RqXiNSvuk8YG7fvobEh92lobGhg4/Y9FYooo1rjEpH6VfcJY1j/Xuxrb88Z29fezrD+vSoUUUa1xiUi9avuE8aAPj24+YIJ9Gxs4PAe3enZ2MDNF0xgQJ8eiktEJIsK9yLbdu9l4/Y9DOvfq6o+lKs1LhE5dIQW7ukoqciAPj2q8gO5WuMSkfpT919JiYhIGCUMEREJooQhIiJBlDAiV9zbwknXPcoV97Z0ea11bbu4r7mVdW27Uojs0KfnS6Q2aKc3MHLmgvd+v++3m7nvtwvYcNMni1rrmnkv8NOnX3nv8hdPHcH100/scoyHKj1fIrWj7rcwkrYoitnSWNe2K+fDD+CnT72iv5wT6PkSqS11nzAWr4k7nXjyeD7LW3cUNF7v9HyJ1JZSn9P7djPbamarssbuzTpd6wYzW55w2w1m9kI0r2Sn0TvrhKMKGs9n4vB+BY3XOz1fIrWl1FsYdwBnZw+4++fcfaK7TwTmAvfnuf0Z0dxOKxCL9cPPnVzQeD6jBx/OF08dkTP2xVNHMHrw4UXFdqjT8yVSW0reGsTMRgLz3X18h3EDXgE+5u5rY263AWhy99dC76srrUGuuLeFxWu2ctYJRxWVLLKta9vF8tYdTBzeTx9+AfR8iVRWLbQGOR1oi0sWEQcWmZkDs919TimD6WqSyDZ68OH64CuAni+R2lDJhHERcE+e609z901mdhSw
2Mx+5+7LOk4ysxnADIARI0Z0vFpERFJSkaOkzKw78Bng3qQ57r4p+ncr8AAwKWHeHHdvcvemQYMGlSJcERGhcofVfhz4nbtvjLvSzHqb2eEHfgemAqvi5qZlxh3PcPzVjzDjjme6vNbspWs555+WMXtp0rdtmbblK1p3dHrK1ZAq6NC1mtdv45ZFL9G8flv+B1BmofGXW7XGJVIpJd3pbWb3AFOAgUAbcK27/8TM7gCedvfbsub+CfBjdz/XzI4ls1UBma/N7nb3Gzu7v2J3emdXeh9QbKX3CVc/wp797z+nvboba244N2fOg8v/yJVzV9LY0MC+9nZuvmAC5008+qC1QqqgQ9e65MdP85t17yeK00cP4GeXnVLUY0xTaPyKS6R0Qnd6l3QLw90vcveh7t7o7sPc/SfR+Jeyk0U0tsndz41+f9ndT4p+xoUki2IlbVEUs6Uxe+nanGQBsGe/52xpbNu9lyvnruTtfe3s2ruft/e18+25Kw/6KzakCjp0reb123KSBcDj67ZVfEsjNH7FJVId6r7Se9m6+A/NpPF85q3c3On4xu17aGzIfdobGxrYuH1PzlhIFXToWsvWxh+ZnDReLqHxl1u1xiVSaXWfMCaPHlDQeD7nTxja6fiw/r3Y196ec/2+9naG9e+VMxZSBR261uQxA2PXShovl9D4y61a4xKptLpPGHO+9JGCxvO5/Iwx9OpuOWO9uhuXnzHmvcsD+vTg5gsm0LOxgcN7dKdnYwM3XzDhoNOwhlRBh67VNGoAp3dIgKePHkDTqMKTYppC41dcItWh5JXe5dSVSu8ZdzzDsnXbmDx6QFHJItvspWuZt3Iz508YmpMssm3bvZeN2/cwrH+vvB9EIVXQoWs1r9/GsrWvMXnMwIoni2yh8ZdbtcYlkrbQnd5KGCIida4qjpISEZFDhxKGiIgEUcIQEZEgShiRabc+xqiZC5h262OJc0JbRSxZvYUr71vBktVbuhzXvJZWLrvzOea1tHZ5rWptM6IWHFKPavF9r53ehLUGCW0VMXXWY/y+7c33Lh83uDcLvzWl4JgATvn+YrbsfOe9y0P7HsZTV51V1FrV2mZELTikHlXb+147vQMlbVFkj4e2iliyektOsgB4qe3NorY05rW05iQLgM073ylqS6Na24yoBYfUo1p+39d9wnhxy5udjoe2ili0ui12raTxfOa/EJ9kksbzqdY2I2rBIfWolt/3dZ8wxg3p3el4aKuIqWMHx66VNJ7PtBOHFDSeT7W2GVELDqlHtfy+r/uEMf+bUzodD20VcebYIRw3ODcBHTe4N2eOLfxD/vyThzO072E5Y0P7Hsb5Jw8veK1qbTOiFhxSj2r5fa+d3pFptz7Gi1veZNyQ3olJJLRVxJLVW1i0uo2pYwcXlSyyzWtpZf4LW5h24pCikkW2am0zohYcUo+q6X2v1iAiIhJER0mJiEiqSpowzOx2M9tqZquyxq4zsz+a2fLo59yE255tZi+Z2Tozm1nKOEVEpHOl3sK4Azg7ZnyWu0+Mfh7peKWZdQP+DTgHGAtcZGZjSxno2bcsZeTMBZx9y9IurzVr4Rqm/MNSZi1c0+W1bnx4Faf+4Nfc+PCqxDlpVmeHrnXXk+u58LYnuevJ9V1eqxYrXkXqUcn3YZjZSGC+u4+PLl8H7Hb3H+a5zanAde7+iejydwDc/Qf57quUld6hxnxnAfuyntJGg7U/KG6tY2cuIPvguwbg5SIr0EOqs0PXOum6R3nj7Xffu3xEz26suC7374LQtaqt4lWkHlX7Poyvm9nK6Cur/jHXHw1klzRvjMZSl7RFUcyWxqyFa3KSBcA+p6gtjRsfXkV7h7H2aPyANKuzQ9e668n1OckC4I23383Z0ghdq5YrXkXqUSUSxo+ADwITgc3AP8bMsZix2E0hM5thZs1m1vzqq68WHMzvtr5V0Hg+D66Mr8JOGs9n/qqESu+s8TSrs0PXenDl5ti1ssdD16rlileRelT2
hOHube7+rru3A/8BTIqZthHILjoYBmxKWG+Ouze5e9OgQYMKjuf4oz5Q0Hg+0yfE11wkjeczbXxCpXfWeJrV2aFrTZ8wNHat7PHQtWq54lWkHpU9YZhZ9ifOp4G4vbnPAWPMbJSZHQZ8HnioFPE8+jdnFDSez7c+cQKNHbaNGi0zXqjvfmr8QS9OQzR+QJrV2aFrXfLRURzRs1vO2BE9u3HJR0cVvFYtV7yK1KOS7vQ2s3uAKcBAoA24Nro8kcxXTBuAy919s5n9CfBjdz83uu25wK1AN+B2d7+xs/vrSuHe2bcs5Xdb3+L4oz5QVLLINmvhGh5cuYXpE4YUlSyy3fjwKuav2sK08UNykkW2NKuzQ9e668n1PLhyM9MnDM1JFsWsVU0VryL1SJXeIiISpNqPkhIRkRqjhCEiIkGUMCKn3LCIkTMXcMoNixLnhJ7H+op7Wzjpuke54t6WLseVZqV3mucaT/Oc3mmuleZ5y6tVrccv6SvXe0L7MAir9A49j3WaVeNpVnqnea7xNM/pneZaaZ63vFrVevySvjTeE9qHEShpiyJ7PPQ81klbFMVsaaRZ6Z3mucbTPKd3mmuled7yalXr8Uv6yv2eqPuEsWX3vk7HQ89jvXjN1th5SeP5pFnpnea5xtM8p3eaa6V53vJqVevxS/rK/Z6o+4QxpE9jp+Oh57E+64SjYucljeeTZqV3mucaT/Oc3mmuleZ5y6tVrccv6Sv3e6LuE8bTV0/tdDz0PNY//NzJsWsljeeTZqV3mucaT/Oc3mmuleZ5y6tVrccv6Sv3e0I7vSOn3LCILbv3MaRPY2ISCT2P9RX3trB4zVbOOuGoopJFtjQrvdM813ia5/ROc600z1terWo9fklfV98TqvQWEZEgOkpKRERSpYQhIiJBlDBERCRI90oHUC0+OHMB75Lppf4/CZXZITtUAS6e/QTP/mEHk47px92XnxY7J3QnVcgO9NC1rr5/Bb96sY1zxg3mhs+clDgvhHa8Vk65n3u91nKAdnoT1s4jpO1E6FqhpfzlXiuU2lNUTrmfe73W9UE7vQN9MOaDtON4SNsJyGxZxMkeDy3lD2kzErrW1feviF0raTwftaeonHI/93qtpaO6TxjvBoyHtJ0AePYP8fOyx0NL+UPajISu9asX41uAJI3no/YUlVPu516vtXRU0oRhZreb2VYzW5U19g9m9jszW2lmD5hZbE8HM9tgZi+Y2XIzK1lxRbeA8ZC2EwCTjomflz0eWsof0mYkdK1zxsW3AEkaz0ftKSqn3M+9XmvpqNRbGHcAZ3cYWwyMd/cJwO+B7+S5/RnuPjHku7ViJe3gzh4PaTsBJO7gzh4PLeUPaTMSulbSDu5idnyrPUXllPu512stHZV8p7eZjQTmu/tBfS3M7NPAZ939CzHXbQCa3D24dWlXKr11lFRhdORM5egoKUlb1bQG6SRhPAzc6+53xVy3HtgOODDb3ed0dl9qDSIiUrjQhFGxOgwz+y6wH/h5wpTT3H2TmR0FLDaz37n7sph1ZgAzAEaMGNHxahERSUlFjpIys0uBacAXPGETx903Rf9uBR4AJiXMm+PuTe7eNGjQoFKFLCJS98qeMMzsbOBK4Dx3fythTm8zO/zA78BUYFXc3LQ0Xf8oI2cuoOn6RxPnhJ5ofdqtjzFq5gKm3fpYl+OatXANU/5hKbMWrulyXPNaWrnszueY19La5bjuenI9F972JHc9ub7La4VY17aL+5pbD6p96Sj0uUhL6P01r9/GLYteKur0syLVoqT7MMzsHmAKMBBoA64lc1RUD+DA/5yn3f2rZvYnwI/d/VwzO5bMVgVkvja7291v7Oz+SlnpXYmK6jHfWcC+rJen0WDtD4qL65TvL2bLznfeuzy072E8ddVZRcV10nWP8sbb71eqHNGzGyuu63gwXHpCq+yrtQr6kh8/nXPu8tNHD+Bnl51SsrhEClUVld7ufpG7D3X3Rncf5u4/cffR7j48Olx2
ort/NZq7yd3PjX5/2d1Pin7GhSSLYiVtUWSPh1a8Jm1RFLOlMWvhmpxkAbDPydnSCI1rXktrTrIA2LzznaK2NO56cn1OsgB44+13S7alEVplX61V0M3rt+UkC4DH123TlobUpOCEYWZ/ZmZfjn4fZGajShdW+bz2Vnytd/Z4aMXri1vejF0raTyfB1du6XQ8NK75L8SvlTSeP67NBY13VWiVfbVWQS9bG39UeNK4SDULShhmdi2Z/Q4HiuwagYMOha1FAz8QX+udPR5a8TpuSO55szsbz2f6hPhTqGaPh8Y17cT4tZLG88c1tKDxrgqtsq/WKujJYwbG3j5pXKSahW5hfBo4D3gT3juCKblyrYY0XxP/3Xv2eGjF6/xvToldK2k8n2994gQaLXes0TLjhcZ1/snDGdr3sJyxoX0P4/yThxcc1yUfHcURPXOT7BE9u3HJR0uzwRlaZV+tVdBNowZw+ujc85SfPnpAl89dLlIJQTu9zexZd59kZi3ufnJ05NJTUXuPqtGVwr2m6x/ltbfeZeAHuiUmkdCK12m3PsaLW95k3JDeRSWLbLMWruHBlVuYPmFITrIoJq55La3Mf2EL004cUlSyyHbXk+t5cOVmpk8YWrJkkS20yr5aq6Cb129j2drXmDxmoJKFVJ1UK73N7ApgDHAW8APgK2SOXPqXrgaaJlV6i4gULtVKb3f/oZmdBewEjgOucffFXYxRRERqSHBrkChBKEmIiNSpvAnDzHaRaf4Xy937ph5RhUy4dgE790LfHrDy7+IL7UK/h77w3x+npXUnJw/vy3/979O7FNfkm37NKzv2MqJfD5bN/HiX4rrx4VXMX7WFaeOH8N1PHdQLsiAh+1ZChewPCd2HETovRJr7HdTxVQ4Fofswrge2AD8DDPgCcLi731za8ApTykrv0GrdNCu904zr2JkLyD4ItAF4uYQV6KFCqsZDK71D54VIszpb58WWapd2pfcn3P3f3X2Xu+909x8BF3QtxOow4dr4c3pnj4dW617474/HrpU0ns/km37d6XhoXDc+vIrcigFoj8YLFVKBHiqkajy00jt0Xog0q7N1Xmw5lIQmjHfN7Atm1s3MGszsCySfDrum7Ez4f5s9Hlqt29K6M3Ze0ng+r+yIDyx7PDSu+asSKr0TxvMJqUAPX6vzqvHQSu/QeSHSrM7WebHlUBKaMC4G/pxMA8E24MJorOb1Tfg6OXs8tFr35OHxu3SSxvMZ0S8+sOzx0LimjU+o9E4YzyekAj18rc6rxkMrvUPnhUizOlvnxZZDSVDCcPcN7j7d3Qe6+yB3P9/dN5Q4trJI2sGdPR5arZu0g7uYHd9JO7izx0Pj+u6nxh/0QjdE44UKqUAPFVI1HlrpHTovRJrV2TovthxKQnd69wT+EhgH9Dww7u5fKV1ohetK4Z6OkiqMjpIqjI6SkmqWdqX3fwG/I/M11PVkjpJa4+7f6GqgaVKlt4hI4dI+Smq0u/8t8Ka73wl8EijueEUREalJoQljX/TvDjMbDxwBjCxJRCIiUpVCE8YcM+sPXA08BKwG/r6zG5nZ7Wa21cxWZY0daWaLzWxt9G//hNteGs1Za2aXBsYpIiIl0mnCMLMGYKe7b3f3Ze5+rLsf5e6zA9a/A+jYK3wmsMTdxwBLossd7/NIMuf//ggwCbg2KbGkZfw1Cxg5cwHjr4kv5IPMDtX7mls7LQabccczHH/1I8y445kur3Xx7CcYfdUCLp79RJfXmr10Lef80zJmL12bd16Iu55cz4W3PZn31KzN67dxy6KXOi14W7J6C1fet4Ilq5NrObbt3suK1h2dFryF3meIkPsMjSvkMaYZV5pC76/ccUn5he70Xubuk4u6A7ORwHx3Hx9dfgmY4u6bzWwo8Ji7H9fhNhdFcy6PLs+O5t2T775K2RoktO1Eta51wtWPsGf/+691r+7GmhvOPWheiJB2HqGtNabOeozft71/CtvjBvdm4bem5MwJba1R7nYe
oXGFPMY040pT6P2p/UltS3un92Izu8LMhkdfKR0ZbQUUY7C7bwaI/j0qZs7RQGvW5Y3RWOqStiiyx0PbTiRtUWSPh66VtEWRPR661uyla3OSBcCe/V7UlkZIO4/Q1hpLVm/J+SAFeKntzZy/wkNba5S7nUdoXCGPMc240hR6f2p/Uj9CE8ZXgK8By4Dno59SHr9qMWOxm0JmNsPMms2s+dVXXy34jna/0/l4aNuJZeviP5yyx0PXevYP8fOyx0PXmpfQgiNpPJ+Qdh6hrTUWrW6LnZc9Htpao9ztPELjCnmMacaVptD7U/uT+hFa6T0q5ufYIu+zLfoqiujfrTFzNgLZ5xAdBmxKiG2Ouze5e9OgQYMKDqbPYZ2Ph7admDw6vrgrezx0rUnHxM/LHg9d6/yEFhxJ4/mEtPMIba0xdezg2HnZ46GtNcrdziM0rpDHmGZcaQq9P7U/qR9BCcPMPhPzc6aZxX2d1JmHgANHPV0KPBgzZyEw1cz6Rzu7p0ZjqVt1fXxVd/Z4aNuJOV/6SOxa2eOha919+Wmxa2WPh651+Rlj6NU9d6OtV3fj8jPGxN5HPiHtPEJba5w5dgjHDe6dM3bc4N6cOfb9vlShrTXK3c4jNK6Qx5hmXGkKvT+1P6kfoTu9FwCnAkujoSnA08CfAte7+88SbndPNHcgmaaF1wLzgF8CI4BXgAvd/XUzawK+6u6XRbf9CnBVtNSN7v6fncXZlUrv8dcsYPc7mS2LpCQS2nZixh3PsGzdNiaPHpCYRELXunj2Ezz7hx1MOqZfYhIJXWv20rXMW7mZ8ycMLSpZZAtp5xHaWmPJ6i0sWt3G1LGDEz9IQ1trlLudR2hcIY8xzbjSFHp/an9Su9JuDfIwcJm7t0WXBwM/Ai4Dlh04AqrS1BpERKRwaR8lNfJAsohsBf7U3V/n/SpwERE5hOU9p3eWx81sPvBf0eULgGVm1hso/Aw1IiJSc0K3ML4G/CcwEfgQ8FPga+7+prufUargyunq+1fw4e8t4ur7V3R5rXJXVFeiCnpeSyuX3fkc81paO5/ciTQrhFVtfGjQ61idgvZhdLqI2VPufmoK8XRJKSu9Q5W7oroSVdCnfH8xW3a+X6gytO9hPHXVWUWtlWaFsKqNDw16Hcsv7X0YnenZ+ZTqlLRFUcyWRrkrqitRBT2vpTUnWQBs3vlOUVsaaVYIq9r40KDXsbqllTC6vplSIb96Mb7iNmk8n3JXVFeiCnr+C/EtLZLG80mzQljVxocGvY7VLa2EUbPOGRdfcZs0nk+5K6orUQU97cT4GoKk8XzSrBBWtfGhQa9jdUsrYcT1fqoJN3zmpILG8yl3RXUlqqDPP3k4Q/vm9lMZ2vcwzj95eMItkqVZIaxq40ODXsfqltZO7/HuvqrzmaXVlcK9q+9fwa9ebOOccYOLShbZyl1RXYkq6Hktrcx/YQvTThxSVLLIlmaFsKqNDw16HcsrlUpvM9tF/P4JA9zd+xYfYvpU6S0iUrjQhJG3cM/dkxsTiYhIXQmt9AYg6k773iG07v5KnukiInIICW1vfp6ZrQXWA/8P2AD8qoRxiYhIlQk9Sup7wCnA7919FHAmEH8O0RoV0oIjzbXWte3ivubWg06n2tGS1Vu48r4VeU/pmWZrkNC1QuIPfYyh80Kk2VIizbjSXEsKozYj6Qltb97s7k1mtgL4kLu3m9mz7j6p9CGGK3and0gLjjTXumbeCznn4v7iqSO4fvqJB601ddZjOeeDPm5wbxZ+a0rOnDRbg4SuFRJ/6GMMnRcizZYSacaV5lpSGLUZCZN2a5AdZtaHzDm9f25m/wTs70qA1SKkBUeaa61r25Xz4QHw06deOegvzyWrt+QkC4CX2t7M2dJIszVI6Foh8Yc+xtB5IdJsKZFmXGmuJYVRm5H0hSaM6cAe4FvAo8D/AJ8qVVDlFNKCI821lrfGd4PvOL5odXxrkuzxNFuDhK4VEn/oYwydFyLNlhJpxpXmWlIY
tRlJX1DCiNqYv+vu+939Tnf/Z3cvuke2mR1nZsuzfnaa2Tc7zJliZm9kzbmm2PvLJ6QFR5prTRzeL3ZOx/GpY+Nbk2SPp9kaJHStkPhDH2PovBBptpRIM64015LCqM1I+kKPkvqMma2NPsB3mtkuM9tZ7J26+0vuPtHdJwIfBt4CHoiZ+viBee5+fbH3l09IC4401xo9+HC+eOqInDlfPHXEQefiPnPsEI4b3Dtn7LjBvXPOB51ma5DQtULiD32MofNCpNlSIs240lxLCqM2I+kL3em9DviUu69JPQCzqcC17n5ah/EpwBXuPi10ra5Ueoe04EhzrXVtu1jeuoOJw/vl/fBYsnoLi1a3MXXs4JxkkS3N1iCha4XEH/oYQ+eFSLOlRJpxpbmWFEZtRjqXSmuQrMWe6PiBnhYzux1ocfd/7TA+BZgLbAQ2kUkeL+ZbS61BREQKl0prkCzNZnYvMA947xADd7+/yPgAMLPDgPOA78Rc3QIc4+67zezc6L4P6uRnZjOAGQAjRozoeLWIiKQk9CipvmT2M0wlc3TUp4Dgr4ryOIfM1sVBhwS5+0533x39/gjQaGYH7b119znu3uTuTYMGDUohJBERiRO0heHuXy7R/V8E3BN3hZkNAdrc3c1sEpnkVvSRWZ0p9/fVtf69arU+RrVKFymdvAnDzL7t7jeb2b8Q0+bc3f+62Ds2sw8AZwGXZ419NVr3NuCzwF+Z2X4yNSCf9zRO3hGj3FW9tV59Wq2PMc37rPXXSKQUOjsfxjZ3HxDVSGzveL2731nK4ApVzE7vdW27+PisZQeN//pbkwve0ghZa9vuvZz29//N2/vePz68Z2MDT1z5sZr4K7ZaH2Oa91nrr5FIodJqDdJmZscAXwYejvmpeeWu6q316tNqfYxp3metv0YipdLZPowfkWkFciyQ/ae7kfmK6tgSxVU25a7qrfXq02p9jGneZ62/RiKlkncLw93/xd1PAG5392Ozfka5e80nCyh/VW+tV59W62NM8z5r/TUSKZWgwr1a0ZXCPR0lVZhqfYw6SkqkcKlWetcKVXqLiBQu7fNhiIhInVPCEBGRIEoYJZDmOYR1PmKpFXqvHvpCmw9KIFUbSz3Se7U+aAsjRWmeQ1jnI5Zaofdq/VDCSJGqjaUe6b1aP5QwUqRqY6lHeq/WDyWMFKnaWOqR3qv1Q4V7JaBqY6lHeq/WrrRP0SoFGNCnR2r/YdJcS6SU9F499OkrKRERCaKEISIiQSqWMMxsg5m9YGbLzeygHQ+W8c9mts7MVprZyZWIU0REMiq9D+MMd38t4bpzgDHRz0fInMzpI+UKTGqTdrzHpsICAAALNklEQVSKlE6lE0Y+04GfeuYwrqfNrJ+ZDXX3zZUOTKqT2lOIlFYl92E4sMjMnjezGTHXHw20Zl3eGI2JHETtKURKr5IJ4zR3P5nMV09fM7PJHa63mNscVDRiZjPMrNnMml999dVSxCk1QO0pREqvYgnD3TdF/24FHgAmdZiyERiedXkYsClmnTnu3uTuTYMGDSpVuFLl1J5CpPQqkjDMrLeZHX7gd2AqsKrDtIeAL0ZHS50CvKH9F5JE7SlESq9SO70HAw+Y2YEY7nb3R83sqwDufhvwCHAusA54C/hyhWKVGnHexKM5bfRAHSUlUiIVSRju/jJwUsz4bVm/O/C1csYltU/tKURKR5XeIiISRAlDRESCKGGIiEgQJQwREQmihCEiIkGUMEREJIgShoiIBFHCEBGRIEoYIiISRAlDRESCKGGIiEgQJQwREQmihCEiIkGUMEREJIgShoiIBFHCEBGRIEoYIiISpFLn9B5uZkvNbI2ZvWhm34iZM8XM3jCz5dHPNZWIVUREMip1Tu/9wP919xYzOxx43swWu/vqDvMed/dpFYhPREQ6qMgWhrtvdveW6PddwBrg6ErEIiIiYSq+D8PMRgIfAp6JufpUM1thZr8ys3FlDUxERHJU6ispAMysDzAX+Ka77+xw
dQtwjLvvNrNzgXnAmJg1ZgAzAEaMGFHiiEVE6lfFtjDMrJFMsvi5u9/f8Xp33+nuu6PfHwEazWxgzLw57t7k7k2DBg0qedwiIvWqUkdJGfATYI2735IwZ0g0DzObRCbWbeWLUkREslXqK6nTgL8AXjCz5dHYVcAIAHe/Dfgs8Fdmth/YA3ze3b0SwYqISIUShrv/BrBO5vwr8K/liUhERDpT8aOkRESkNihhiIhIECUMEREJooQhIiJBlDBERCSIEoaIiARRwhARkSBKGJFtu/eyonUH23bvrXQoIiJVqaLNB6vFg8v/yJVzV9LY0MC+9nZuvmAC501Ut3URkWx1v4Wxbfderpy7krf3tbNr737e3tfOt+eu1JaGiEgHdZ8wNm7fQ2ND7tPQ2NDAxu17KhSRiEh1qvuEMax/L/a1t+eM7WtvZ1j/XhWKSESkOtV9whjQpwc3XzCBno0NHN6jOz0bG7j5ggkM6NOj0qGJiFQV7fQGzpt4NKeNHsjG7XsY1r+XkoWISAwljMiAPj2UKERE8qj7r6RERCSMEoaIiASpWMIws7PN7CUzW2dmM2Ou72Fm90bXP2NmI8sfpYiIHFCRhGFm3YB/A84BxgIXmdnYDtP+Etju7qOBWcDflzKmi2c/weirFnDx7CdKeTciIjWrUlsYk4B17v6yu78D/AKY3mHOdODO6Pf7gDPNLO95wIs1cuYCnly/g/3t8OT6HYycuaAUdyMiUtMqlTCOBlqzLm+MxmLnuPt+4A1gQNqBJG1RaEtDRCRXpRJG3JaCFzEHM5thZs1m1vzqq68WHMizf9hR0LiISL2qVMLYCAzPujwM2JQ0x8y6A0cAr3dcyN3nuHuTuzcNGjSo4EAmHdOvoHERkXpVqYTxHDDGzEaZ2WHA54GHOsx5CLg0+v2zwH+7+0FbGF119+WnFTQuIlKvKpIwon0SXwcWAmuAX7r7i2Z2vZmdF037CTDAzNYBfwMcdOhtWjbc9Ek+Oqof3Rvgo6P6seGmT5bqrkREapaV4I/2imlqavLm5uZKhyEiUlPM7Hl3b+psniq9RUQkiBKGiIgEUcIQEZEgShgiIhJECUNERIIcUkdJmdmrwB+6sMRA4LWUwim3Wo4dajv+Wo4dajv+Wo4dqif+Y9y908rnQyphdJWZNYccWlaNajl2qO34azl2qO34azl2qL349ZWUiIgEUcIQEZEgShi55lQ6gC6o5dihtuOv5dihtuOv5dihxuLXPgwREQmiLQwREQmihAGY2dlm9pKZrTOzknXFLRUz22BmL5jZcjOr+u6LZna7mW01s1VZY0ea2WIzWxv927+SMSZJiP06M/tj9PwvN7NzKxljEjMbbmZLzWyNmb1oZt+IxmvluU+Kv+qffzPraWbPmtmKKPa/i8ZHmdkz0XN/b3S6h6pV919JmVk34PfAWWRO2vQccJG7r65oYAUwsw1Ak7tXw/HcnTKzycBu4KfuPj4auxl43d1vipJ2f3e/spJxxkmI/Tpgt7v/sJKxdcbMhgJD3b3FzA4HngfOB75EbTz3SfH/OVX+/JuZAb3dfbeZNQK/Ab5B5tQN97v7L8zsNmCFu/+okrHmoy0MmASsc/eX3f0d4BfA9ArHdEhz92UcfPbE6cCd0e93kvkgqDoJsdcEd9/s7i3R77vInIvmaGrnuU+Kv+p5xu7oYmP048DHgPui8ap97g9Qwsi84VqzLm+kRt6EWRxYZGbPm9mMSgdTpMHuvhkyHwzAURWOp1BfN7OV0VdWVfmVTjYzGwl8CHiGGnzuO8QPNfD8m1k3M1sObAUWA/8D7IhOKAc18NmjhAEWM1Zr39Od5u4nA+cAX4u+NpHy+RHwQWAisBn4x8qGk5+Z9QHmAt90952VjqdQMfHXxPPv7u+6+0RgGJlvNk6Im1beqAqjhJHJ6sOzLg8DNlUolqK4+6bo363AA2TejLWmLfqO+sB31VsrHE8wd2+LPgzagf+gip//6PvzucDP3f3+aLhm
nvu4+Gvp+Qdw9x3AY8ApQD8z6x5dVfWfPUoYmZ3cY6KjFQ4DPg88VOGYgplZ72gHIGbWG5gKrMp/q6r0EHBp9PulwIMVjKUgBz5sI5+mSp//aMfrT4A17n5L1lU18dwnxV8Lz7+ZDTKzftHvvYCPk9kHsxT4bDStap/7A+r+KCmA6DC8W4FuwO3ufmOFQwpmZseS2aoA6A7cXe3xm9k9wBQynTrbgGuBecAvgRHAK8CF7l51O5cTYp9C5usQBzYAlx/YJ1BNzOzPgMeBF4D2aPgqMvsBauG5T4r/Iqr8+TezCWR2ancj84f6L939+uj/7y+AI4HfApe4+97KRZqfEoaIiATRV1IiIhJECUNERIIoYYiISBAlDBERCaKEISIiQZQwRAKY2e7OZ4kc2pQwREQkiBKGSAdmNi9q5PhidjNHM/tHM2sxsyVmNiga+2szWx01vvtFNNY7aoL3nJn91symR+NfMrP7zezR6PwHN2etfXa09gozW9LJOuOicyssj+53TDmfH6lfKtwT6cDMjnT316MWDs8B/wt4jUwV7s/N7BrgKHf/upltAka5+14z6+fuO8zs+8Bqd78ragfxLJnOqhcC10S/7wVeAv4MeBtoASa7+/qs+09a5ybg6SiWw4Bu7r6nfM+Q1KvunU8RqTt/bWafjn4fDowh04ri3mjsLuBA476VwM/NbB6Z9iaQ6ed1npldEV3uSabtBsASd38DwMxWA8cA/YFl7r4eIKstR9I6TwHfNbNhZE6+szadhy2SnxKGSBYzm0KmMdyp7v6WmT1G5oO6owOb5p8EJgPnAX9rZuPItMy/wN1f6rD2R8hsWRzwLpn/g0Z8W+vYdYA1ZvZMdN8Lzewyd//v8EcpUhztwxDJdQSwPUoWx5NpQQ2Z/ysHuopeDPzGzBqA4e6+FPg20A/oAywE/k/UXRUz+1An9/kU8L/MbFQ0/8hoPHadqGHdy+7+z2Q6zU7o4mMWCaItDJFcjwJfNbOVZPYxPB2NvwmMM7PngTeAz5HpPHqXmR1BZmtgVrQP43tkuh+vjD7sNwDTku7Q3V+Ndq7fHyWhrWTOMZ+0zueAS8xsH7AFuD7Fxy+SSDu9RUQkiL6SEhGRIEoYIiISRAlDRESCKGGIiEgQJQwREQmihCEiIkGUMEREJIgShoiIBPn/H3TKkYSJK5wAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.plot.scatter(x='absences', y='final_grade')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When points overlap, it can be useful to add some transparency." 
] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.plot.scatter(x='absences', y='final_grade', alpha=0.3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `scatter_matrix`, we can generate all pairwise scatter plots for a (small) set of numerical variables. By default, the diagonal shows a histogram of each variable." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[,\n", " ,\n", " ],\n", " [,\n", " ,\n", " ],\n", " [,\n", " ,\n", " ]],\n", " dtype=object)" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pd.plotting.scatter_matrix(students[['age', 'absences', 'final_grade']], figsize=(10, 8))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bar plots\n", "\n", "Bar plots compare a numerical quantity (such as a count or a mean) across categories."
] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAADWdJREFUeJzt3WuMnOV5h/HrH0xRUyIO8uJSH2JUuQ2OaA3ZOkhUFSlqOKSqiVQqaEUsSrOpZFSi5gtJVUFUIfGhCVKklMopJEZKILQJwm1RGurmIFoFYiPEyUFxgwsbG9tpIiAlIjLc/bDv1lOy3pk9zI79+PpJq5l55p2Z24N87fDszDpVhSSpXW8Z9QCSpOEy9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY1bNuoBAJYvX15r164d9RiSdFzZtWvXD6pqrN9xx0To165dy86dO0c9hiQdV5L81yDHuXUjSY0z9JLUOEMvSY3rG/okq5N8LcnuJE8nubFbvyXJ95M83n1d0XObjybZk+TZJJcO8w8gSZrdID+MPQx8pKoeS/I2YFeSh7rrbq+qv+49OMl64GrgncAvAf+a5Feq6vXFHFySNJi+r+iran9VPdadfwXYDayc5SabgHur6rWqeg7YA2xcjGElSXM3pz36JGuB84FHuqUbkjyR5K4kZ3RrK4EXem42yQzfGJJMJNmZZOehQ4fmPLgkaTADhz7JqcCXgA9X1cvAHcAvAxuA/cAnpg+d4eY/8+8VVtXWqhqvqvGxsb7v95ckzdNAH5hKcjJTkf98VX0ZoKoO9Fz/GeCfuouTwOqem68C9i3KtLNYe9M/D/sh+tp72/tGPYIk/YxB3nUT4E5gd1V9smf97J7D3g881Z3fDlyd5JQk5wDrgEcXb2RJ0lwM8or+IuBa4Mkkj3drHwOuSbKBqW2ZvcCHAKrq6ST3Ac8w9Y6dLb7jRpJGp2/oq+phZt53f3CW29wK3LqAuSRJi8RPxkpS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDXO0EtS4wy9JDWub+iTrE7ytSS7kzyd5MZu/cwkDyX5bnd6RreeJJ9KsifJE0kuGPYfQpJ0dIO8oj8MfKSqzgUuBLYkWQ/cBOyoqnXAju4ywOXAuu5rArhj0aeWJA2sb+iran9VPdadfwXYDawENgHbusO2AVd25zcBd9eUbwGnJzl70SeXJA1kTnv0SdYC5wOPACuqaj9MfTMAzuoOWwm80HOzyW5NkjQCA4c+yanAl4APV9XLsx06w1rNcH8TSXYm2Xno0KFBx5AkzdFAoU9yMlOR/3xVfblbPjC9JdOdHuzWJ4HVPTdfBex7831W1daqGq+q8bGxsfnOL0nqY5B33QS4E9hdVZ/suWo7sLk7vxl4oGf9A927by4EXpre4pEkLb1lAxxzEXAt8GSSx7u1jwG3AfcluR54Hriqu+5B4ApgD/AqcN2iTixJmpO+oa+qh5l53x3gkhmOL2DLAueSJC0SPxkrSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLU
OEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY0z9JLUOEMvSY3rG/okdyU5mOSpnrVbknw/yePd1xU91300yZ4kzya5dFiDS5IGM8gr+s8Bl82wfntVbei+HgRIsh64Gnhnd5u/SXLSYg0rSZq7vqGvqm8CPxzw/jYB91bVa1X1HLAH2LiA+SRJC7SQPfobkjzRbe2c0a2tBF7oOWayW5Mkjciyed7uDuCvgOpOPwH8MZAZjq2Z7iDJBDABsGbNmnmOoRndctqoJ4BbXhr1BJI683pFX1UHqur1qnoD+AxHtmcmgdU9h64C9h3lPrZW1XhVjY+Njc1nDEnSAOYV+iRn91x8PzD9jpztwNVJTklyDrAOeHRhI0qSFqLv1k2Se4CLgeVJJoGbgYuTbGBqW2Yv8CGAqno6yX3AM8BhYEtVvT6c0SVJg+gb+qq6ZoblO2c5/lbg1oUMJUlaPH4yVpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXGGXpIaZ+glqXF9/3Fw6Xh23rbzRj0CT25+ctQj6ATnK3pJapyhl6TGGXpJapyhl6TGGXpJapyhl6TGGXpJapyhl6TGGXpJapyhl6TGGXpJapyhl6TG9Q19kruSHEzyVM/amUkeSvLd7vSMbj1JPpVkT5InklwwzOElSf0N8or+c8Blb1q7CdhRVeuAHd1lgMuBdd3XBHDH4owpSZqvvqGvqm8CP3zT8iZgW3d+G3Blz/rdNeVbwOlJzl6sYSVJczffPfoVVbUfoDs9q1tfCbzQc9xktyZJGpHF/mFsZlirGQ9MJpLsTLLz0KFDizyGJGnafEN/YHpLpjs92K1PAqt7jlsF7JvpDqpqa1WNV9X42NjYPMeQJPUz39BvBzZ35zcDD/Ssf6B7982FwEvTWzySpNHo+2/GJrkHuBhYnmQSuBm4DbgvyfXA88BV3eEPAlcAe4BXgeuGMLMkaQ76hr6qrjnKVZfMcGwBWxY6lCRp8fjJWElqnKGXpMYZeklqnKGXpMYZeklqXN933Uhqw+53nDvqETj3O7tHPcIJyVf0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktQ4Qy9JjTP0ktS4ZQu5cZK9wCvA68DhqhpPcibwRWAtsBf4g6r60cLGlCTN12K8on9PVW2oqvHu8k3AjqpaB+zoLkuSRmQYWzebgG3d+W3AlUN4DEnSgBYa+gK+mmRXkolubUVV7QfoTs9a4GNIkhZgQXv0wEVVtS/JWcBDSb4z6A27bwwTAGvWrFngGJKko1nQK/qq2tedHgTuBzYCB5KcDdCdHjzKbbdW1XhVjY+NjS1kDEnSLOYd+iS/kORt0+eB9wJPAduBzd1hm4EHFjqkJGn+FrJ1swK4P8n0/Xyhqr6S5NvAfUmuB54Hrlr4mJKk+Zp36Kvqe8Cvz7D+38AlCxlKkobp03/6b6MegS1/+9tL9lh+MlaSGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGmfoJalxhl6SGje00Ce5LMmzSfYkuWlYjyNJmt1QQp/kJODTwOXAeuCaJOuH8ViSpNkN6xX9RmBPVX2vqn4K3AtsGtJjSZJmkapa/DtNfh+4rKr+pLt8LfDuqrqh55gJYKK7
+KvAs4s+yNwtB34w6iGOET4XR/hcHOFzccSx8Fy8varG+h20bEgPnhnW/t93lKraCmwd0uPPS5KdVTU+6jmOBT4XR/hcHOFzccTx9FwMa+tmEljdc3kVsG9IjyVJmsWwQv9tYF2Sc5L8HHA1sH1IjyVJmsVQtm6q6nCSG4B/AU4C7qqqp4fxWIvsmNpKGjGfiyN8Lo7wuTjiuHkuhvLDWEnSscNPxkpS4wy9JDXO0EtS4wy91CPJxiS/0Z1fn+TPk1wx6rmOBUnuHvUMmp9hfWBKx5Ek7wBWAo9U1Y971i+rqq+MbrKlleRmpn4/07IkDwHvBr4O3JTk/Kq6dZTzLaUkb347dID3JDkdoKp+b+mnOjYk+U2mfs3LU1X11VHPMwjfdTODJNdV1WdHPcdSSPJnwBZgN7ABuLGqHuiue6yqLhjlfEspyZNMPQenAC8Cq6rq5SQ/z9Q3wV8b6YBLKMljwDPA3zH1qfYA9zD1mRiq6hujm25pJXm0qjZ25z/I1N+X+4H3Av9YVbeNcr5BuHUzs4+PeoAl9EHgXVV1JXAx8JdJbuyum+lXWbTscFW9XlWvAv9ZVS8DVNVPgDdGO9qSGwd2AX8BvFRVXwd+UlXfOJEi3zm55/wE8DtV9XGmQv9Hoxlpbk7YrZskTxztKmDFUs4yYidNb9dU1d4kFwP/kOTtnHih/2mSt3ahf9f0YpLTOMFCX1VvALcn+fvu9AAnbi/ekuQMpl4Yp6oOAVTV/yQ5PNrRBnOi/oeDqZhfCvzoTesB/mPpxxmZF5NsqKrHAarqx0l+F7gLOG+0oy2536qq1+D/QjftZGDzaEYaraqaBK5K8j7g5VHPMyKnMfV/NwEqyS9W1YtJTuU4eTF0wu7RJ7kT+GxVPTzDdV+oqj8cwVhLLskqprYsXpzhuouq6t9HMJZ0zEvyVmBFVT036ln6OWFDL0knCn8YK0mNM/SS1DhDL0mNM/SS1Lj/BYNbOGnvcROrAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students['alcohol_weekend'].value_counts().sort_index().plot.bar()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.groupby(['school', 'sex'])[['alcohol_weekdays', 'alcohol_weekend']].mean().plot.bar()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Box plots\n", "\n", "Box plots show quartiles (and outliers) for numerical variables (possibly across categories)." 
, " The box spans the first to third quartile with a line at the median; by default the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box, and anything beyond that is drawn as an individual outlier."
] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAADYlJREFUeJzt3X+s3fVdx/Hna7RuGBiy9M6Qyd2NRvBitzE5W1BcoEg2IotzW5g2umBsvG4BBoaYVa6RLaYG5zYzu/ijsxWMeLMt/NDJ4sDZrdYAriUdlF0GMYLpaGgnBjCjrsDbP3pYutrL+XHPuRc+9/lITnru53xOv+/+8+w333POPakqJEkvf69Y7gEkSaNh0CWpEQZdkhph0CWpEQZdkhph0CWpEQZdkhph0CWpEQZdkhqxaikPtmbNmpqamlrKQ0rSy97u3bu/XVUTvfYtadCnpqbYtWvXUh5Skl72kjzazz4vuUhSIwy6JDXCoEtSIwy6JDXCoEtSI3oGPcnpSbYnmU/yQJKruut/lOTBJPcluTXJD41/XGm05ubmWLt2LSeccAJr165lbm5uuUeShtbPGfqzwDVVNQ2cC1ye5CzgTmBtVb0ReAj4nfGNKY3e3Nwcs7OzbN68mUOHDrF582ZmZ2eNul62ega9qvZX1b3d+08D88DrquqOqnq2u+1u4EfGN6Y0eps2bWLr1q2sW7eO1atXs27dOrZu3cqmTZuWezRpKBnkO0WTTAE7OHJm/tRR618APltVf3Oc58wAMwCTk5PnPPpoX++Pl8buhBNO4NChQ6xevfp7a4cPH+ZVr3oVzz333DJOJn2/JLurqtNrX98viiY5CbgZuPqYmM9y5LLMTcd7XlVtqapOVXUmJnp+clVaMtPT0+zcufP71nbu3Mn09PQyTSQtTl9BT7KaIzG/qapuOWr9MuCdwK/UIKf60kvA7OwsGzZsYPv27Rw+fJjt27ezYcMGZmdnl3s0aSg9f5dLkgBbgfmq+uRR6xcDHwbOr6rvjG9EaTzWr18PwJVXXsn8/DzT09Ns2rTpe+vSy03Pa+hJfhb4F+B+4Pnu8rXAnwCvBP6ru3Z3VX3gxf6uTqdT/nIuSRpMv9fQe56hV9VOIMd56IvDDCZJGg8/KSpJjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjTDoktQIgy5JjegZ9CSnJ9meZD7JA0mu6q5f2v35+SQ9v7xUkjRePb8kGngWuKaq7k1yMrA7yZ3AXuA9wF+Mc0BJUn96Br2q9gP7u/efTjIPvK6q7gRIMt4JJUl96ecM/XuSTAFvBu4Z4DkzwAzA5OTkIIeThrZUJxpVtSTHkfrR94uiSU4Cbgaurqqn+n1eVW2pqk5VdSYmJoaZURpYVQ10e/2H/2Hg5xhzvdT0FfQkqzkS85uq6pbxjiRJGkY/73IJsBWYr6pPjn8kSdIw+rmGfh7wfuD+JHu6a9cCrwQ2AxPA7Un2VNU7xjOmJKmXft7lshNY6BWmW0c7jiRpWH5SVJIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIaYdAlqREGXZIa0TPoSU5Psj3JfJIHklzVXX9NkjuTPNz989TxjytJWkg/Z+jPAtdU1TRwLnB5krOAjcCXq+rH
gS93f5YkLZOeQa+q/VV1b/f+08A88DrgXcCN3W03Ar84riElSb0NdA09yRTwZuAe4Ieraj8ciT7w2lEPJ0nqX99BT3IScDNwdVU9NcDzZpLsSrLr4MGDw8woSepDX0FPspojMb+pqm7pLj+e5LTu46cBB4733KraUlWdqupMTEyMYmZJ0nH08y6XAFuB+ar65FEP/T1wWff+ZcDfjX48SVK/VvWx5zzg/cD9SfZ0164Frgc+l2QD8J/ApeMZUZLUj55Br6qdQBZ4+OdGO44kaVh+UlSSGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGmHQJakRBl2SGtEz6Em2JTmQZO9Ra29KcleS+5N8IcmrxzumJKmXfs7QbwAuPmbtL4GNVfUG4Fbgt0c8lyRpQD2DXlU7gCeOWT4T2NG9fyfw3hHPJUka0LDX0PcCv9C9fylw+mjGkSQNa9ig/zpweZLdwMnAdxfamGQmya4kuw4ePDjk4SRJvQwV9Kp6sKreXlXnAHPAv7/I3i1V1amqzsTExLBzSpJ6GCroSV7b/fMVwO8Cfz7KoSRJg+vnbYtzwF3AmUn2JdkArE/yEPAg8BjwV+MdU5LUy6peG6pq/QIPfWrEs0iSFsFPikpSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDXCoEtSIwy6JDWiZ9CTbEtyIMneo9bOTnJ3kj1JdiV563jHlCT10s8Z+g3AxcesfQz4aFWdDfxe92dJ0jLqGfSq2gE8cewy8Oru/VOAx0Y8lyRpQKuGfN7VwJeSfJwj/yn8zEIbk8wAMwCTk5NDHk4r2Zs+egdPPnN47MeZ2nj7WP/+U05czdeve/tYj6GVbdigfxD4raq6Ocn7gK3ARcfbWFVbgC0AnU6nhjyeVrAnnznMI9dfstxjLNq4/8OQhn2Xy2XALd37nwd8UVSSltmwQX8MOL97/0Lg4dGMI0kaVs9LLknmgAuANUn2AdcBvwF8Kskq4BDda+SSpOXTM+hVtX6Bh84Z8SySpEXwk6KS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1AiDLkmNMOiS1IieQU+yLcmBJHuPWvtskj3d2yNJ9ox3TElSLz2/JBq4Afg08NcvLFTVL71wP8kngCdHPpkkaSA9g15VO5JMHe+xJAHeB1w42rEkSYPq5wz9xbwNeLyqHl5oQ5IZYAZgcnJykYfTSnTy9EbecOPG5R5j0U6eBrhkucdQwxYb9PXA3IttqKotwBaATqdTizyeVqCn56/nketf/iGc2nj7co+gxg0d9CSrgPcA54xuHEnSsBbztsWLgAerat+ohpEkDa+fty3OAXcBZybZl2RD96FfpsflFknS0unnXS7rF1j/tZFPI0kamp8UlaRGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RG9PMl0duSHEiy95j1K5N8M8kDST42vhElSf3o5wz9BuDioxeSrAPeBbyxqn4S+PjoR5MkDaJn0KtqB/DEMcsfBK6vqv/t7jkwhtkkSQMY9hr6GcDbktyT5KtJ3jLKoSRJg1u1iOedCpwLvAX4XJIfrao6dmOSGWAGYHJyctg5tcJNbbx9uUdYtFNOXL3cI6hxwwZ9H3BLN+D/luR5YA1w8NiNVbUF2ALQ6XT+X/ClXh65/pKxH2Nq4+1LchxpnIa95HIbcCFAkjOAHwC+PaqhJEmD63mGnmQOuABYk2QfcB2wDdjWfSvj
d4HLjne5RZK0dHoGvarWL/DQr454FknSIvhJUUlqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqhEGXpEYYdElqRM+gJ9mW5ED3C6FfWPtIkm8l2dO9/fx4x5Qk9dLPGfoNwMXHWf/jqjq7e/viaMeSJA2qZ9CragfwxBLMIklahMVcQ78iyX3dSzKnjmwiSdJQhg36nwE/BpwN7Ac+sdDGJDNJdiXZdfDgwSEPJw0myUC3R//wnQM/J8ly/zOl7zNU0Kvq8ap6rqqeBz4DvPVF9m6pqk5VdSYmJoadUxpIVS3JTXopGSroSU476sd3A3sX2itJWhqrem1IMgdcAKxJsg+4DrggydlAAY8AvznGGSVJfegZ9Kpaf5zlrWOYRZK0CH5SVJIaYdAlqREGXZIaYdAlqREGXZIakaX8cESSg8CjS3ZAqX9rgG8v9xDSAl5fVT0/mbmkQZdeqpLsqqrOcs8hLYaXXCSpEQZdkhph0KUjtiz3ANJieQ1dkhrhGbokNcKgS1IjDLokNcKga8VIcluS3UkeSDLTXduQ5KEkX0nymSSf7q5PJLk5yde6t/OWd3qpN18U1YqR5DVV9USSE4GvAe8A/hX4KeBp4J+Br1fVFUn+FvjTqtqZZBL4UlVNL9vwUh96fsGF1JAPJXl39/7pwPuBr1bVEwBJPg+c0X38IuCso74I+tVJTq6qp5dyYGkQBl0rQpILOBLpn66q7yT5CvBNYKGz7ld09z6zNBNKi+c1dK0UpwD/3Y35TwDnAj8InJ/k1CSrgPcetf8O4IoXfuh+h670kmbQtVL8I7AqyX3A7wN3A98C/gC4B/gn4BvAk939HwI6Se5L8g3gA0s/sjQYXxTVipbkpKr6n+4Z+q3Atqq6dbnnkobhGbpWuo8k2QPsBf4DuG2Z55GG5hm6JDXCM3RJaoRBl6RGGHRJaoRBl6RGGHRJaoRBl6RG/B/L0Z4YCCa5HgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students['age'].plot.box()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also include multiple variables and group by some variable, e.g. `sex`." 
] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([,\n", " ],\n", " dtype=object)" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "students.boxplot(column=['alcohol_weekdays', 'alcohol_weekend'], by='sex')" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }