I used the Titanic dataset and all questions and findings pertain to it.

Introduction

Having obtained a dataset with information about the passengers on board the Titanic, my first thought was to understand as much as possible about the survivors. The dataset contains information on 891 of the 2224 passengers who were on board.

Questions regarding the dataset

After brainstorming, I settled on the following two questions about the dataset. For each, I explain the approach and any assumptions, present the analysis, and report the findings that help answer it.

1) Did passenger class, sex and age influence survival? If so, how?

2) Is there a relationship between those who survived and those who had other relations traveling with them?

Method

After importing the data set, I looked at the first few rows to understand the content - column names, data types, size of data set etc. Next, I wanted to make sure to find and handle any missing information for a meaningful and sound analysis. I could see some null values and therefore decided to perform data wrangling to deal with the null values.

Data wrangling
Null values were present in the columns Age (177 null values), Cabin (687 null values) and Embarked (2 null values). For Cabin and Embarked, I filled the NaN values with the string 'Missing'. Since I was not performing any calculations with these two fields and my analysis was not centered on cabin or embarkation data, this seemed satisfactory. Age, however, needed to be handled differently because I did intend to analyze the association between passenger age and survival rate. I considered several options for the missing values:
a) Fill in the missing values with 0.
b) Ignore (discard) the missing values.
c) Replace them with a random number in the range (mean - std dev, mean + std dev).
d) Replace them with the mean age, or with the mean age of specific groups.
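The four options can be compared on a small example. The sketch below applies each of them to a toy Age Series (illustrative values, not the Titanic data); the variable names and the fixed random seed are assumptions made for the illustration, and option c uses NumPy's `Generator.uniform` as one way of drawing the random values.

```python
import numpy as np
import pandas as pd

# Toy Age column with two missing values (illustrative numbers, not the Titanic data)
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 54.0])
mean, std = ages.mean(), ages.std()  # both skip NaN by default

# a) Fill missing ages with 0
option_a = ages.fillna(0)

# b) Ignore the missing ages by dropping them
option_b = ages.dropna()

# c) Replace each missing age with a random value drawn uniformly from [mean - std, mean + std)
rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible
option_c = ages.copy()
missing = option_c.isnull()
option_c[missing] = rng.uniform(mean - std, mean + std, size=missing.sum())

# d) Replace missing ages with the mean of the observed ages
option_d = ages.fillna(mean)

for label, s in [('a', option_a), ('b', option_b), ('c', option_c), ('d', option_d)]:
    print(label, len(s), round(s.mean(), 2))
```

In this toy Series, option a pulls the mean down from 37.25 to about 24.83, mirroring the effect described below for the full dataset, while option d leaves the mean unchanged by construction.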

Option a did not seem a good choice because replacing the missing ages with 0 dragged the mean down and biased the result: the mean age of all passengers dropped to about 24, compared with about 30 for the passengers with a recorded age. It also seemed highly unlikely that the average age of passengers on the maiden voyage of the largest ship of its time was only 24.
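The size of the drop follows from simple arithmetic: zeros add nothing to the sum of ages, but they increase the count from the number of recorded ages to the total number of rows. A minimal sketch, using the figures reported by `titanic_df.describe()` later in the notebook (714 recorded ages with mean 29.699118, out of 891 rows); the function name is hypothetical.

```python
def zero_filled_mean(observed_mean: float, n_observed: int, n_total: int) -> float:
    """Mean of a column after its missing entries are replaced with 0.

    The zeros add nothing to the sum, so only the denominator grows:
    new_mean = observed_mean * n_observed / n_total
    """
    return observed_mean * n_observed / n_total


# Figures from titanic_df.describe(): 714 non-null ages with mean 29.699118, 891 rows in total
print(round(zero_filled_mean(29.699118, 714, 891), 1))  # 23.8
```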

Options c and d were both fair choices, but I decided to go with option b, i.e. to discard the missing values. This was because only 29% of the passengers whose age was missing actually survived (52 of 177). Since my analysis focused on those who survived, I was satisfied working with the information on hand and assumed that the missing data does not significantly alter the results. I therefore filtered out the rows with NaN values for Age, created a new DataFrame, titanic_noNA, from the remaining rows, and used it in the plots that use Age as a factor. I did not go with option c or d because age is missing for about 20% of the passengers (177 of 891), and assigning a random or average value to that many rows might bias the results.
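A minimal sketch of option b on a toy DataFrame that reuses the dataset's `Survived` and `Age` column names (the values are illustrative). It computes the survival rate among rows with a missing age, which is about 0.29 (52 of 177) in the real data, and then drops those rows with `DataFrame.dropna(subset=...)`, an equivalent alternative to the `np.isnan` filter used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with the same column names as the Titanic data (values are illustrative only)
df = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 0, 1],
    'Age':      [22.0, np.nan, 26.0, np.nan, np.nan, 4.0],
})

missing_age = df['Age'].isnull()

# Survival rate among rows whose age is missing
missing_survival_rate = df.loc[missing_age, 'Survived'].mean()

# Option b: keep only rows with a recorded age, as done for titanic_noNA
df_noNA = df.dropna(subset=['Age']).copy()

print(missing_survival_rate)  # 1 of the 3 missing-age rows survived
print(len(df), len(df_noNA))
```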

Next steps - After cleaning the data, I looked at some basic statistics, such as the mean age of the passengers and how many more males there were than females, and then at the same characteristics among those who survived.

Next, I used Pearson's correlation coefficient to measure the correlation between the Survived variable and each of the other numeric variables, to see whether there were any strong positive or negative correlations.
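When one of the two variables is binary, as Survived is, Pearson's coefficient is also known as the point-biserial correlation. The sketch below, on toy data with illustrative values, computes r by hand as the covariance divided by the product of the standard deviations and checks it against pandas `Series.corr`, whose default method is `'pearson'`; the helper name `pearson_r` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: 1 = survived; Pclass 1-3 (illustrative values, not the real dataset)
df = pd.DataFrame({
    'Survived': [1, 1, 0, 1, 0, 0, 0, 1],
    'Pclass':   [1, 1, 1, 2, 2, 3, 3, 3],
})


def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())


r_manual = pearson_r(df['Survived'], df['Pclass'])
r_pandas = df['Survived'].corr(df['Pclass'])  # pandas defaults to method='pearson'
print(round(r_manual, 4), round(r_pandas, 4))
```

As with the real data, the toy coefficient is negative (about -0.29), meaning survival is more common at lower Pclass numbers, i.e. in the higher classes.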

To answer the second question, the two columns of concern were SibSp (number of siblings/spouses on board) and Parch (number of parents/children on board); their relationship to survival is examined in detail in the analysis below.
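The relationship between relations on board and survival can be summarized by adding SibSp and Parch, as the notebook later does for its `relations` column, and grouping by the result. A sketch on toy rows with illustrative values; the `alone` flag is a hypothetical column added here to separate passengers traveling alone.

```python
import pandas as pd

# Toy rows using the dataset's SibSp / Parch / Survived columns (values are illustrative)
df = pd.DataFrame({
    'SibSp':    [0, 0, 1, 1, 0, 4, 1],
    'Parch':    [0, 0, 0, 2, 1, 2, 5],
    'Survived': [0, 1, 1, 1, 1, 0, 1],
})

# Total number of relations on board (siblings/spouses plus parents/children)
df['relations'] = df['SibSp'] + df['Parch']

# Hypothetical flag: True when the passenger has no parent/child/sibling/spouse on board
df['alone'] = df['relations'] == 0

survival_by_relations = df.groupby('relations')['Survived'].mean()
survival_by_alone = df.groupby('alone')['Survived'].mean()
print(survival_by_relations)
print(survival_by_alone)
```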

Based on these findings, I plotted a few graphs that confirm them visually.

Report of findings

Qn. 1 - Did age, passenger class and sex influence survival? If so, how?

Among the 714 passengers with a recorded age, the mean age is about 29.7 years and the median is close to it at 28. The 75th percentile is only 38, i.e. 75% of these passengers were 38 or younger. Note that these figures, taken from titanic_df.describe(), describe all passengers with a recorded age, not only the 342 survivors (of whom only 290 have a recorded age), so they show that the passengers on board were fairly young rather than that survivors specifically were younger.
Looking at the proportion of passengers in each class who survived, 63% of 1st class passengers survived compared with only 24% of 3rd class passengers; about half of 2nd class passengers survived (47.3%). The correlation between Pclass and Survived is about -0.34. This modest negative correlation means that survival was more common as the Pclass number decreased (i.e. moving from 3rd class towards 1st class). This suggests that 1st class passengers may have been prioritized over others, possibly due to their proximity to the lifeboats.
A similar analysis shows that the survival rate was much higher for female passengers (74.2%) than for male passengers (18.9%). We still need to dig deeper to understand how many children contributed to these numbers, but given the wide margin between the two rates, it seems likely that women (and possibly children) were prioritized over male passengers.
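To check whether survivors were younger than passengers overall, the age summary can be computed separately for survivors. A sketch on a toy DataFrame with illustrative values (chosen so that survivors happen to be younger; this says nothing about the real data); in the notebook the same filtering would be applied to `titanic_noNA`.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values); in the notebook, use titanic_noNA instead of df
df = pd.DataFrame({
    'Survived': [1, 0, 1, 0, 1, 0],
    'Age':      [4.0, 22.0, 26.0, 54.0, np.nan, 35.0],
})

with_age = df.dropna(subset=['Age'])

# Summary of ages for all passengers with a recorded age ...
all_stats = with_age['Age'].describe()
# ... versus only those who survived
survivor_stats = with_age.loc[with_age['Survived'] == 1, 'Age'].describe()

print(all_stats[['count', 'mean', '50%', '75%']])
print(survivor_stats[['count', 'mean', '50%', '75%']])
```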

Plot results

a. Fig 1 shows a pairplot of Pclass and Age, from which we can see that the majority of passengers who died were in 3rd class rather than 2nd or 1st class (with 3>2>1), and that passengers who survived were fairly young. This is consistent with the mean age of all passengers with a recorded age being only 29.7 years.
b. The bar plot in Fig 2 shows that the percentage of passengers who survived was higher in the upper classes (order being 1>2>3).
c. Fig 3 visualizes the survival rate by sex and passenger class. The survival rate was much higher for women than for men, and within each sex it was higher in the upper classes than in the lower classes.
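A survival rate within a group (what `groupby('Sex')['Survived'].mean()` returns) is different from the share of survivors belonging to that group. Applying the rates reported above to the group sizes gives roughly 233 of 314 females (0.742 × 314) and 109 of 577 males (0.189 × 577) surviving, which together account for the 342 survivors. The sketch below, on illustrative toy data, shows that the two measures can even point in different directions: here females have the higher survival rate but are the minority of survivors.

```python
import pandas as pd

# Toy frame (illustrative values only): 3 females, 6 males
df = pd.DataFrame({
    'Sex':      ['female'] * 3 + ['male'] * 6,
    'Survived': [1, 1, 0, 1, 1, 1, 0, 0, 0],
})

# Survival rate within each sex: share of females (or males) who survived
rate_by_sex = df.groupby('Sex')['Survived'].mean()

# Composition of the survivors: share of all survivors who were female (or male)
share_of_survivors = df.loc[df['Survived'] == 1, 'Sex'].value_counts(normalize=True)

print(rate_by_sex)
print(share_of_survivors)
```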

Qn. 2 - Is there a relationship between those who survived and those who had other relations traveling with them?

Only about 30% of passengers who travelled alone (without a parent/child/spouse/sibling) survived.
Passengers traveling with many relations had low survival rates: none of the passengers with five or more siblings/spouses on board survived, and among passengers with four or more parents/children on board the only survivor was one 38-year-old woman (Mrs Asplund). There was probably a limit per family for lifeboat access, which might explain the low survival rate in larger families. Passengers traveling with small families also appear to have been prioritized over passengers traveling alone, possibly because of the children involved or because single passengers gave up their places.
Looking at possible single mothers, all 27 females in 1st and 2nd class who traveled without a spouse or sibling (SibSp = 0) but with at least one parent or child (Parch >= 1) survived. Because Parch counts parents as well as children, this group may also include daughters traveling with a parent.
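The filter for this group combines four conditions. A minimal sketch on toy rows with illustrative values, using the same conditions as the notebook's query; each comparison must be wrapped in parentheses because Python's `&` binds more tightly than `==`.

```python
import pandas as pd

# Toy rows (illustrative values) using the dataset's column names
df = pd.DataFrame({
    'Sex':      ['female', 'female', 'female', 'male', 'female'],
    'Pclass':   [1, 2, 3, 1, 2],
    'SibSp':    [0, 0, 0, 0, 1],
    'Parch':    [1, 2, 1, 1, 1],
    'Survived': [1, 1, 0, 0, 1],
})

# Parentheses are required around each comparison because & binds more tightly than ==
mask = (
    (df['Sex'] == 'female')
    & (df['SibSp'] == 0)   # no spouse or sibling on board
    & (df['Parch'] >= 1)   # at least one parent or child on board
    & (df['Pclass'] <= 2)  # 1st or 2nd class
)

n_group = int(mask.sum())
n_survived = int(df.loc[mask, 'Survived'].sum())
print(n_group, n_survived)
```

If the two counts are equal, as they are for the 27 passengers in the real data, everyone in the group survived.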

Note - The records of the large families (the Goodwin and Sage families) were examined in detail in the code below, but nothing significant emerged beyond the fact that none of their members in the dataset survived.

Plot results

a. Fig 4 shows that smaller families, with fewer siblings or few children (assuming each passenger had at most one spouse and/or one or two children), had higher survival rates than larger families with many children or many siblings. It also shows that the survival rate of passengers traveling alone (about 30%) was almost as low as that of larger families with 4 or more relations, which is consistent with priority being given to families with children.
b. Fig 5 shows the histogram of age: the passengers were fairly young, and half of those with a recorded age were between roughly 20 and 38 (the 25th and 75th percentiles).
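The statement about the middle of the age distribution follows from the quartiles: `describe()` reports a 25th percentile of 20.125 and a 75th percentile of 38 for Age, so by definition about half of the passengers with a recorded age fall between them. A sketch computing the quartiles and the share of ages in between on toy values (in the notebook, `titanic_noNA['Age']` would be used instead).

```python
import pandas as pd

# Toy ages (illustrative); in the notebook, use titanic_noNA['Age']
ages = pd.Series([2.0, 18.0, 21.0, 24.0, 28.0, 30.0, 36.0, 40.0, 55.0, 70.0])

# Same linear-interpolation quartiles that describe() reports as 25% and 75%
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)

# between() includes both endpoints by default; the mean of the boolean mask is the share
share_in_iqr = ages.between(q1, q3).mean()
print(q1, q3, share_in_iqr)
```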

# Python code supporting findings

#Importing necessary libraries and loading the data set.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

titanic_df = pd.read_csv('titanic-data.csv')

print(len(titanic_df)) #Total number of rows in file to understand the size of data we are dealing with and to confirm that
#we have all the rows of available data properly loaded.

#snapshot of data to understand names of columns, format and data types presented.
print(titanic_df.dtypes)
titanic_df.head(5)

891
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

'''First slice of the data - broad classification of data to understand how many survivors and 
how many men and women were on board.'''
print(titanic_df['Survived'].sum()) #Total number of survivors

titanic_df.groupby('Sex').count()['PassengerId'] # Total count of women vs men on board (based on dataset)

342

Sex
female    314
male      577
Name: PassengerId, dtype: int64

#Identifying presence of missing data. We notice age is missing for 177 passengers and cabin details are missing for 687. 
#2 passengers missing embarkation status.
titanic_df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

''' __Begin data wrangling phase.__'''
#Exploring how to handle missing 'Age' data by looking at how many passengers with missing age survived.
#About 70% of passengers whose age is missing did not survive: only 52 of the 177 survived, so we can
#perhaps discard the missing data without significantly affecting the analysis of passengers who survived.

missing_age = titanic_df['Age'].isnull()
print(titanic_df[missing_age].groupby("Survived").count()) #count of passengers with missing age, by Survived status.
titanic_df[missing_age].groupby("Survived").count().apply(lambda x: x / x.sum()) #proportion of missing age data by Survived.

          PassengerId  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  \
Survived                                                                    
0                 125     125   125  125    0    125    125     125   125   
1                  52      52    52   52    0     52     52      52    52   

          Cabin  Embarked  relations  
Survived                              
0           125       125        125  
1            52        52         52

#Handling NaN ages by eliminating the rows that contain them and storing the filtered data in a new DataFrame.
titanic_noNA = titanic_df.dropna(subset=['Age']).copy()

#Handling missing values for Cabin
titanic_df['Cabin']=titanic_df['Cabin'].fillna('Missing') 

print(titanic_df['Cabin'].isnull().sum()) #confirming no null values in Cabin

titanic_df.head(5) #looking at a few rows that had null values for 'Cabin' column to ensure 'Missing' was added.

0

#Handling null values in Embarked column
titanic_df['Embarked']=titanic_df['Embarked'].fillna('Missing') 

print(titanic_df['Embarked'].isnull().sum())

titanic_df.loc[titanic_df['Embarked']=='Missing']

0

'''__Begin analysis.__'''
'''First glance at basic statistics: mean, median and percentiles. The average age of passengers
on board was only 29.7, with a standard deviation of 14.5 and a 75th percentile of just 38 years.
Operating within the limitations of this dataset,
it is safe to assume that the majority of passengers were fairly young.'''

print(titanic_df['Survived'].sum()) #understand total number survived for verifying numbers.


titanic_df.describe()

342

print(titanic_df.groupby('Sex')['Survived'].mean()) #proportion of passengers who survived, grouped by sex.
#About 74% of females survived versus about 19% of males, which suggests females were prioritized
#for lifeboat access over male passengers.


titanic_df.groupby('Pclass')['Survived'].mean() #proportion of passengers who survived, grouped by passenger class.
#Survival rate by passenger class is 1>2>3, suggesting priority was given to 1st class
#passengers (63% survival rate) compared with 2nd (47%) or 3rd class (24%).

Sex
female    0.742038
male      0.188908
Name: Survived, dtype: float64

Pclass
1    0.629630
2    0.472826
3    0.242363
Name: Survived, dtype: float64

#Looking at Pearson's correlation coefficient to determine whether there are strong relationships
#between each pair of numeric variables. Focusing on Survived, we see a modest negative correlation of -0.34 with Pclass.
#This indicates that survival was more common as the Pclass number decreased (i.e. from 3rd class towards
#1st class), which reinforces the finding that upper classes were prioritized over lower classes.

titanic_df.corr(numeric_only=True) #numeric_only skips text columns, which recent pandas versions would otherwise reject

'''Next, we analyze passengers who travelled with other relations to see if there are any significant findings. 
Proportion of passengers survived grouped by sibling/spouse column'''
print(titanic_df.groupby('SibSp')['Survived'].mean()) #can see no survivors for passengers with SibSp >=5. 
titanic_df.loc[titanic_df['SibSp'] >= 5] 
#Looking at the passenger records for large families on board, based on the previous query. The Goodwin and Sage
#families were the only 2 large families, and none of their members in the dataset survived. For the Goodwins, we have
#the ages of the 5 children in the dataset, but no information about the parents is available.
#For the Sage family, ages are missing, but SibSp=8 and Parch=2 for every member indicate that they were all
#siblings (children of the Sage parents); the title 'Master' for Thomas Henry suggests at least one was a young boy.

SibSp
0    0.345395
1    0.535885
2    0.464286
3    0.250000
4    0.166667
5    0.000000
8    0.000000
Name: Survived, dtype: float64

#Similarly, proportion of passengers who survived, grouped by parent/child column.
print(titanic_df.groupby('Parch')['Survived'].mean())
#Passengers with more than 3 parent/child relations did not survive, except for 1 passenger with Parch=5.


titanic_df.loc[(titanic_df['Parch']==5) & (titanic_df['Survived']==1)] #each condition in parentheses: & binds tighter than ==
#Looking at the outlier: the passenger(s) who survived with Parch=5. This appears to be a mother (38 years old, 3rd class passenger).

Parch
0    0.343658
1    0.550847
2    0.500000
3    0.600000
4    0.000000
5    0.200000
6    0.000000
Name: Survived, dtype: float64

titanic_df.loc[(titanic_df['Name'].str.contains('Asplund'))]
#Analyzing whether the family of Mrs Asplund survived.
#We do not have information about all the family members, but it appears that 2 of her children made it.
#Interestingly, they all have the same ticket number.

'''Next, we look at the survival rate of possible single mothers: females traveling with one or more
parents/children but without a spouse/sibling, in 1st or 2nd class.
All of them survived, which reinforces the priority given to both females and children.
(Parch counts parents as well as children, so this group may include daughters traveling with a parent.)'''

print(len(titanic_df.loc[(titanic_df['Parch']>=1) & (titanic_df['SibSp']==0)& (titanic_df['Sex']=='female')&(titanic_df['Pclass']<=2)]))

#Looking at those who survived from the previous step. The counts are the same, implying that everyone in this category survived.
len(titanic_df.loc[(titanic_df['Parch']>=1) & 
                   (titanic_df['SibSp']==0)& 
                   (titanic_df['Sex']=='female')&
                   (titanic_df['Pclass']<=2)&
                   (titanic_df['Survived']==1)])

27

27

#Analyzing other rows (no parents/children, 3 siblings/spouses) to see if there is a pattern. No pattern or other observations noted.
titanic_df.loc[(titanic_df['Parch']==0) & (titanic_df['SibSp']==3)]

Plots

Fig 1 - Pairplot between Age and passenger class, colored by 'Survived'.
Analysis - This plot compares the age and passenger class of those who survived with those who died. In the 1st plot on the 2nd row, the points with status 0 ('Died') are clustered more towards the right of the plot (higher ages), and there are more of them at Pclass=3 than at Pclass=1. The 4th plot shows the green bar for status 1 ('Survived') to be much larger for Pclass=1 than for Pclass=3; the exact proportions are shown in the next plot.
Result - The majority of passengers who died were in 3rd class rather than 2nd or 1st class (with 3>2>1), and the passengers who survived were fairly young.

1 denotes passenger survived and 0 denotes passenger died.

agevsclass = sns.pairplot(titanic_noNA, hue='Survived', vars=['Age', 'Pclass'])
plt.suptitle('Age Vs Class comparison', y=1.02) #one title for the whole grid instead of one per subplot


Fig 2 - Bar plot of survival rate by class.
Analysis - The survival rate was 63% for Pclass=1, 47% for Pclass=2 and 24% for Pclass=3. The percentage of passengers who survived was higher in the upper classes than in the lower ones, with the order of survival being 1>2>3.
Result - A survival rate of only 24% among 3rd class passengers suggests either that upper classes were prioritized over lower classes, or that lifeboats were more easily accessible from the 1st class section of the ship than from the 2nd or 3rd class sections.

prop = titanic_df.groupby('Pclass')['Survived'].mean()*100. #survival rate (%) per class

ax = prop.plot(kind="bar", title="Proportion of survived by class")
ax.set_ylabel('Survival rate (%)')
#Label each bar with its height (the survival rate), placed just above the bar
for p in ax.patches:
    ax.annotate("%.2f" % p.get_height(), (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 10), textcoords='offset points')

Fig 3 - Factorplot visualizing passenger survival based on sex and class.
Analysis - Following the green line representing female passengers, the survival rate tapers off as we move right along the x-axis (with increasing Pclass). The blue line follows a similar pattern from left to right. Comparing each male/female pair of data points for each Pclass, the data points for males are much lower on the y-axis (survival rate) than those for females.
Result - The survival rate was much higher for women than for men. It is important to look at these two factors (Sex and Pclass) together, because among women, survival rates were much higher in the upper classes than in the lower classes. This suggests that, in addition to the preference given to the upper classes, priority was also given to female passengers.

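#factorplot was renamed catplot in seaborn 0.9; kind='point' keeps factorplot's default point plot
pclass = sns.catplot(data=titanic_df, x='Pclass', y='Survived', hue='Sex', kind='point')
pclass.set(title='Survived by Pclass and Sex')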


Fig 4 - Factorplot visualizing survival based on the number of relations (parent/child/sibling/spouse).
Analysis - Moving along the x-axis, once the number of relations (sum of Parch and SibSp) goes beyond 3, the survival rate drops sharply to 20% at relations = 4. The highest survival rate is for relations = 3, i.e. families of 4. There is a slight increase to approximately 35% at relations = 6, but as seen earlier, this group includes Mrs Asplund and her family of 7, of whom she and 2 of her children survived. The survival rate for relations = 0 is almost as low as for relations = 4.
Result 1 - There was probably a limit per family for lifeboat access, which might explain the low survival rate in larger families. Passengers traveling with small families also appear to have been prioritized over passengers traveling alone, possibly because of the children involved or because single passengers gave up their places.
Result 2 - The majority of survivors had between 0 and 3 parent/child relations and 0-1 sibling/spouse relations on board. Large families (SibSp >= 5 or Parch >= 4) had very low survival: among passengers with Parch >= 4 the only survivor was 38-year-old Mrs Asplund in 3rd class, as shown earlier, although two of her children (with Parch = 2) also survived.

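titanic_df['relations'] = titanic_df['Parch'] + titanic_df['SibSp'] #total parents/children/siblings/spouses on board

#catplot(kind='point') replaces the older factorplot, whose default was a point plot
relations = sns.catplot(data=titanic_df, x='relations', y='Survived', kind='point')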

#Fig 5 histogram of the age of passengers on board, which shows that about half of the passengers with a recorded age
#were between 20 and about 38 years old. This data excludes passengers with missing (NaN) age values, so as stated
#earlier, new information about age could significantly alter the findings.
plt.hist(titanic_noNA['Age'],bins=9)  
plt.title('Age histogram')
plt.xlabel('Age')
plt.ylabel('Frequency')


Assumptions and Limitations

The dataset contains information about only a portion of the passengers who were on board (891 of 2224, about 40%). Therefore, significant new information about the remaining passengers might alter the findings.
At this point, however, I assume this sample is a fair representation of the population and that conclusions derived from the analysis can be extrapolated to the population.
The data does not specify who is a child and who is a parent, only whether a passenger traveled with a child or parent (and similarly for sibling/spouse relationships). Although this could be inferred from age (at least for parent/child relationships), the large amount of missing data in the Age column makes classification by age difficult. Therefore, while we can classify survival rate by sex, classification by age is limited by the available data.
Finally, the rows with missing values in Age have been eliminated from the age analysis: a new DataFrame, titanic_noNA, contains only the rows with a recorded age. Therefore, any conclusions based on age (e.g. the age histogram in Fig 5) are limited by this lack of information, and new information about the missing rows might alter the results.

Links I referred to

https://www.kaggle.com/c/titanic/data
http://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas
http://seaborn.pydata.org/generated/seaborn.pairplot.html
https://www.kaggle.com/benhamner/python-seaborn-pairplot-example/code
https://bespokeblog.wordpress.com/2011/07/11/basic-data-plotting-with-matplotlib-part-3-histograms/
https://discussions.udacity.com/t/nan-rows-not-showing-up-in-search/248475/7
https://discussions.udacity.com/t/nan-with-random-values/248494/5

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	NaN	S

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	relations
Survived
0	0.706215	0.706215	0.706215	0.706215	NaN	0.706215	0.706215	0.706215	0.706215	0.706215	0.706215	0.706215
1	0.293785	0.293785	0.293785	0.293785	NaN	0.293785	0.293785	0.293785	0.293785	0.293785	0.293785	0.293785

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	A/5 21171	7.2500	Missing	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	STON/O2. 3101282	7.9250	Missing	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	373450	8.0500	Missing	S

	PassengerId	Survived	Pclass	Age	SibSp	Parch	Fare	relations
count	891.000000	891.000000	891.000000	714.000000	891.000000	891.000000	891.000000	891.000000
mean	446.000000	0.383838	2.308642	29.699118	0.523008	0.381594	32.204208	0.904602
std	257.353842	0.486592	0.836071	14.526497	1.102743	0.806057	49.693429	1.613459
min	1.000000	0.000000	1.000000	0.420000	0.000000	0.000000	0.000000	0.000000
25%	223.500000	0.000000	2.000000	20.125000	0.000000	0.000000	7.910400	0.000000
50%	446.000000	0.000000	3.000000	28.000000	0.000000	0.000000	14.454200	0.000000
75%	668.500000	1.000000	3.000000	38.000000	1.000000	0.000000	31.000000	1.000000
max	891.000000	1.000000	3.000000	80.000000	8.000000	6.000000	512.329200	10.000000

	PassengerId	Survived	Pclass	Age	SibSp	Parch	Fare	relations
PassengerId	1.000000	-0.005007	-0.035144	0.036847	-0.057527	-0.001652	0.012658	-0.040143
Survived	-0.005007	1.000000	-0.338481	-0.077221	-0.035322	0.081629	0.257307	0.016639
Pclass	-0.035144	-0.338481	1.000000	-0.369226	0.083081	0.018443	-0.549500	0.065997
Age	0.036847	-0.077221	-0.369226	1.000000	-0.308247	-0.189119	0.096067	-0.301914
SibSp	-0.057527	-0.035322	0.083081	-0.308247	1.000000	0.414838	0.159651	0.890712
Parch	-0.001652	0.081629	0.018443	-0.189119	0.414838	1.000000	0.216225	0.783111
Fare	0.012658	0.257307	-0.549500	0.096067	0.159651	0.216225	1.000000	0.217138
relations	-0.040143	0.016639	0.065997	-0.301914	0.890712	0.783111	0.217138	1.000000

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
61	62	1	1	Icard, Miss. Amelie	female	38.0	0	0	113572	80.0	B28	Missing
829	830	1	1	Stone, Mrs. George Nelson (Martha Evelyn)	female	62.0	0	0	113572	80.0	B28	Missing

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	relations
59	60	3	Goodwin, Master. William Frederick	male	11.0	5	2	CA 2144	46.90	Missing	S	7
71	72	3	Goodwin, Miss. Lillian Amy	female	16.0	5	2	CA 2144	46.90	Missing	S	7
159	160	3	Sage, Master. Thomas Henry	male	NaN	8	2	CA. 2343	69.55	Missing	S	10
180	181	3	Sage, Miss. Constance Gladys	female	NaN	8	2	CA. 2343	69.55	Missing	S	10
201	202	3	Sage, Mr. Frederick	male	NaN	8	2	CA. 2343	69.55	Missing	S	10
324	325	3	Sage, Mr. George John Jr	male	NaN	8	2	CA. 2343	69.55	Missing	S	10
386	387	3	Goodwin, Master. Sidney Leonard	male	1.0	5	2	CA 2144	46.90	Missing	S	7
480	481	3	Goodwin, Master. Harold Victor	male	9.0	5	2	CA 2144	46.90	Missing	S	7
683	684	3	Goodwin, Mr. Charles Edward	male	14.0	5	2	CA 2144	46.90	Missing	S	7
792	793	3	Sage, Miss. Stella Anna	female	NaN	8	2	CA. 2343	69.55	Missing	S	10
846	847	3	Sage, Mr. Douglas Bullen	male	NaN	8	2	CA. 2343	69.55	Missing	S	10
863	864	3	Sage, Miss. Dorothy Edith "Dolly"	female	NaN	8	2	CA. 2343	69.55	Missing	S	10

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
25	26	1	3	Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...	female	38.0	1	5	347077	31.3875	Missing	S
182	183	0	3	Asplund, Master. Clarence Gustaf Hugo	male	9.0	4	2	347077	31.3875	Missing	S
233	234	1	3	Asplund, Miss. Lillian Gertrud	female	5.0	4	2	347077	31.3875	Missing	S
261	262	1	3	Asplund, Master. Edvin Rojj Felix	male	3.0	4	2	347077	31.3875	Missing	S

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
85	86	1	3	Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...	female	33.0	3	0	3101278	15.85	Missing	S
726	727	1	2	Renouf, Mrs. Peter Henry (Lillian Jefferys)	female	30.0	3	0	31027	21.00	Missing	S