Prosper is a San Francisco-based peer-to-peer lending company established in 2005. Prosper generates revenue by charging borrowers an origination fee of 1% to 5% to verify their identities and assess their creditworthiness. It also charges investors a 1% annual servicing fee. The dataset provided to Udacity was last updated on 11 March 2014.
To kick-start the analysis, the following overviews of the dataset were produced, starting with the list of column names:
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
There are 81 columns in total. Excluding the identifier columns (keys and ID numbers such as ListingKey, LoanKey and MemberKey), around 70 variables remain for analysis.
## ListingKey ListingNumber
## 17A93590655669644DB4C06: 6 Min. : 4
## 349D3587495831350F0F648: 4 1st Qu.: 400919
## 47C1359638497431975670B: 4 Median : 600554
## 8474358854651984137201C: 4 Mean : 627886
## DE8535960513435199406CE: 4 3rd Qu.: 892634
## 04C13599434217079754AEE: 3 Max. :1255725
## (Other) :113912
## ListingCreationDate CreditGrade Term
## 2013-10-02 17:20:16.550000000: 6 C : 5649 Min. :12.00
## 2013-08-28 20:31:41.107000000: 4 D : 5153 1st Qu.:36.00
## 2013-09-08 09:27:44.853000000: 4 B : 4389 Median :36.00
## 2013-12-06 05:43:13.830000000: 4 AA : 3509 Mean :40.83
## 2013-12-06 11:44:58.283000000: 4 HR : 3508 3rd Qu.:36.00
## 2013-08-21 07:25:22.360000000: 3 (Other): 6745 Max. :60.00
## (Other) :113912 NA's :84984
## LoanStatus ClosedDate
## Current :56576 2014-03-04 00:00:00: 105
## Completed :38074 2014-02-19 00:00:00: 100
## Chargedoff :11992 2014-02-11 00:00:00: 92
## Defaulted : 5018 2012-10-30 00:00:00: 81
## Past Due (1-15 days) : 806 2013-02-26 00:00:00: 78
## Past Due (31-60 days): 363 (Other) :54633
## (Other) : 1108 NA's :58848
## BorrowerAPR BorrowerRate LenderYield
## Min. :0.00653 Min. :0.0000 Min. :-0.0100
## 1st Qu.:0.15629 1st Qu.:0.1340 1st Qu.: 0.1242
## Median :0.20976 Median :0.1840 Median : 0.1730
## Mean :0.21883 Mean :0.1928 Mean : 0.1827
## 3rd Qu.:0.28381 3rd Qu.:0.2500 3rd Qu.: 0.2400
## Max. :0.51229 Max. :0.4975 Max. : 0.4925
## NA's :25
## EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## Min. :-0.183 Min. :0.005 Min. :-0.183
## 1st Qu.: 0.116 1st Qu.:0.042 1st Qu.: 0.074
## Median : 0.162 Median :0.072 Median : 0.092
## Mean : 0.169 Mean :0.080 Mean : 0.096
## 3rd Qu.: 0.224 3rd Qu.:0.112 3rd Qu.: 0.117
## Max. : 0.320 Max. :0.366 Max. : 0.284
## NA's :29084 NA's :29084 NA's :29084
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## Min. :1.000 C :18345 Min. : 1.00
## 1st Qu.:3.000 B :15581 1st Qu.: 4.00
## Median :4.000 A :14551 Median : 6.00
## Mean :4.072 D :14274 Mean : 5.95
## 3rd Qu.:5.000 E : 9795 3rd Qu.: 8.00
## Max. :7.000 (Other):12307 Max. :11.00
## NA's :29084 NA's :29084 NA's :29084
## ListingCategory..numeric. BorrowerState Occupation
## Min. : 0.000 CA :14717 Other :28617
## 1st Qu.: 1.000 TX : 6842 Professional :13628
## Median : 1.000 NY : 6729 Computer Programmer: 4478
## Mean : 2.774 FL : 6720 Executive : 4311
## 3rd Qu.: 3.000 IL : 5921 Teacher : 3759
## Max. :20.000 (Other):67493 (Other) :55556
## NA's : 5515 NA's : 3588
## EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
## Employed :67322 Min. : 0.00 False:56459
## Full-time :26355 1st Qu.: 26.00 True :57478
## Self-employed: 6134 Median : 67.00
## Not available: 5347 Mean : 96.07
## Other : 3806 3rd Qu.:137.00
## (Other) : 2718 Max. :755.00
## NA's : 2255 NA's :7625
## CurrentlyInGroup GroupKey
## False:101218 783C3371218786870A73D20: 1140
## True : 12719 3D4D3366260257624AB272D: 916
## 6A3B336601725506917317E: 698
## FEF83377364176536637E50: 611
## C9643379247860156A00EC0: 342
## (Other) : 9634
## NA's :100596
## DateCreditPulled CreditScoreRangeLower CreditScoreRangeUpper
## 2013-12-23 09:38:12: 6 Min. : 0.0 Min. : 19.0
## 2013-11-21 09:09:41: 4 1st Qu.:660.0 1st Qu.:679.0
## 2013-12-06 05:43:16: 4 Median :680.0 Median :699.0
## 2014-01-14 20:17:49: 4 Mean :685.6 Mean :704.6
## 2014-02-09 12:14:41: 4 3rd Qu.:720.0 3rd Qu.:739.0
## 2013-09-27 22:04:54: 3 Max. :880.0 Max. :899.0
## (Other) :113912 NA's :591 NA's :591
## FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
## 1993-12-01 00:00:00: 185 Min. : 0.00 Min. : 0.00
## 1994-11-01 00:00:00: 178 1st Qu.: 7.00 1st Qu.: 6.00
## 1995-11-01 00:00:00: 168 Median :10.00 Median : 9.00
## 1990-04-01 00:00:00: 161 Mean :10.32 Mean : 9.26
## 1995-03-01 00:00:00: 159 3rd Qu.:13.00 3rd Qu.:12.00
## (Other) :112389 Max. :59.00 Max. :54.00
## NA's : 697 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## Min. : 0.000 $25,000-49,999:32192 False: 8669
## 1st Qu.: 0.140 $50,000-74,999:31050 True :105268
## Median : 0.220 $100,000+ :17337
## Mean : 0.276 $75,000-99,999:16916
## 3rd Qu.: 0.320 Not displayed : 7741
## Max. :10.010 $1-24,999 : 7274
## NA's :8554 (Other) : 1427
## StatedMonthlyIncome LoanKey TotalProsperLoans
## Min. : 0 CB1B37030986463208432A1: 6 Min. :0.00
## 1st Qu.: 3200 2DEE3698211017519D7333F: 4 1st Qu.:1.00
## Median : 4667 9F4B37043517554537C364C: 4 Median :1.00
## Mean : 5608 D895370150591392337ED6D: 4 Mean :1.42
## 3rd Qu.: 6825 E6FB37073953690388BC56D: 4 3rd Qu.:2.00
## Max. :1750003 0D8F37036734373301ED419: 3 Max. :8.00
## (Other) :113912 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## Min. : 1000 2014-01-22 00:00:00: 491 Q4 2013:14450
## 1st Qu.: 4000 2013-11-13 00:00:00: 490 Q1 2014:12172
## Median : 6500 2014-02-19 00:00:00: 439 Q3 2013: 9180
## Mean : 8337 2013-10-16 00:00:00: 434 Q2 2013: 7099
## 3rd Qu.:12000 2014-01-28 00:00:00: 339 Q3 2012: 5632
## Max. :35000 2013-09-24 00:00:00: 316 Q2 2012: 5061
## (Other) :111428 (Other):60343
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 63CA34120866140639431C9: 9 Min. : 0.0 Min. : -2.35
## 16083364744933457E57FB9: 8 1st Qu.: 131.6 1st Qu.: 1005.76
## 3A2F3380477699707C81385: 8 Median : 217.7 Median : 2583.83
## 4D9C3403302047712AD0CDD: 8 Mean : 272.5 Mean : 4183.08
## 739C338135235294782AE75: 8 3rd Qu.: 371.6 3rd Qu.: 5548.40
## 7E1733653050264822FAA3D: 8 Max. :2251.5 Max. :40702.39
## (Other) :113888
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
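The summaries above give the following for each variable:
- Continuous variables: the range (minimum and maximum), together with the quartiles and mean.
- Discrete variables: the most frequent values, with the remaining values grouped as "(Other)".
- All variables: the number of missing values, shown as "NA's" where applicable.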
## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
## $ CreditGrade : Factor w/ 8 levels "A","AA","B","C",..: 4 NA 7 NA NA NA NA NA NA NA ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2802 levels "2005-11-25 00:00:00",..: 1137 NA 1262 NA NA NA NA NA NA NA ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 7 levels "A","AA","B","C",..: NA 1 NA 1 5 3 6 4 2 2 ...
## $ ProsperScore : num NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
## $ Occupation : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
## $ EmploymentStatus : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
## $ CurrentlyInGroup : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
## $ GroupKey : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
## $ DateCreditPulled : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11585 levels "1947-08-24 00:00:00",..: 8638 6616 8926 2246 9497 496 8264 7684 5542 5542 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : num 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : num 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : num 472 0 NA 10056 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : num 0 3989 NA 1444 6193 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : num 1500 10266 NA 30754 695 ...
## $ TotalTrades : num 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : num 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
The internal structure of each of the 81 variables (its type and first few values) is shown above.
## Q1 2006 Q1 2007 Q1 2008 Q1 2010 Q1 2011 Q1 2012 Q1 2013 Q1 2014 Q2 2006
## 315 3079 3074 1243 1744 4435 3616 12172 1254
## Q2 2007 Q2 2008 Q2 2009 Q2 2010 Q2 2011 Q2 2012 Q2 2013 Q3 2006 Q3 2007
## 3118 4344 13 1539 2478 5061 7099 1934 2671
## Q3 2008 Q3 2009 Q3 2010 Q3 2011 Q3 2012 Q3 2013 Q4 2005 Q4 2006 Q4 2007
## 3602 585 1270 3093 5632 9180 22 2403 2592
## Q4 2008 Q4 2009 Q4 2010 Q4 2011 Q4 2012 Q4 2013
## 532 1449 1600 3913 4425 14450
Loan originations show an increasing trend from late 2005 to early 2014, with one exception between late 2008 and mid-2009. Only 532 loans were originated in Q4 2008, none are recorded for Q1 2009, and only 13 were originated in Q2 2009. This drop could be due to the Global Financial Crisis. There is also a dip from Q4 2012 to Q1 2013, which could be related to the European sovereign debt crisis.
## AA A B C D E HR NA NA's
## 5372 14551 15581 18345 14274 9795 6935 0 29084
About a quarter of the listings (29,084 of 113,937) have no Prosper rating. Among rated borrowers, ‘C’ is the most common rating. ‘AA’ is the highest rating, and relatively few borrowers qualify for it. Excluding the unrated listings, the plot shows a roughly normal (bell-shaped) distribution.
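To put numbers on the rating counts, the table above can be turned into shares. The sketch below hard-codes the printed counts in a vector named `rating_counts`, an illustrative name with "Unrated" standing in for the NA count. On the full data you would instead pass the column itself to `table(..., useNA = "ifany")`.

```r
# Counts of ProsperRating..Alpha. copied from the table above;
# "Unrated" stands in for the NA's count
rating_counts <- c(AA = 5372, A = 14551, B = 15581, C = 18345,
                   D = 14274, E = 9795, HR = 6935, Unrated = 29084)

n_total <- sum(rating_counts)  # should equal the 113,937 rows
share_unrated <- rating_counts[["Unrated"]] / n_total

# Share of each rating among rated listings only
rated <- rating_counts[names(rating_counts) != "Unrated"]
round(prop.table(rated), 3)

n_total
round(share_unrated, 3)
names(which.max(rated))  # most common rating among rated listings
```

With these counts, about 25.5% of the listings are unrated, and C is the most frequent rating among the 84,853 rated ones.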
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 4.00 6.00 5.95 8.00 11.00 29084
About a quarter of the loan listings (29,084) have no Prosper score. Among those scored, the middle 50% have a score between 4 and 8.
## Not employed $0 $1-24,999 $25,000-49,999 $50,000-74,999
## 806 621 7274 32192 31050
## $75,000-99,999 $100,000+ Not displayed
## 16916 17337 7741
The median household income in the USA was $53,657 in 2014 (U.S. Census Bureau). Most borrowers fall in the $25,000-49,999 and $50,000-74,999 brackets, that is, the lower-middle and middle class.
There are fewer borrowers earning more than $75,000, possibly because people in this group usually have savings to cover their needs. It is worth noting that far fewer loans are approved for those earning less than $25,000, as they are likely deemed too risky to lend to.
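The claim that most borrowers sit in the middle income brackets can be checked directly from the IncomeRange counts printed above. This minimal sketch hard-codes those counts; the vector names are illustrative.

```r
# IncomeRange counts copied from the table above
income_counts <- c("Not employed" = 806, "$0" = 621, "$1-24,999" = 7274,
                   "$25,000-49,999" = 32192, "$50,000-74,999" = 31050,
                   "$75,000-99,999" = 16916, "$100,000+" = 17337,
                   "Not displayed" = 7741)

# Share of all listings in each bracket
shares <- prop.table(income_counts)

# Combined share of the two middle brackets ($25,000 to $74,999)
middle <- sum(shares[c("$25,000-49,999", "$50,000-74,999")])

# Combined share reporting less than $25,000 (including $0)
low <- sum(shares[c("$0", "$1-24,999")])

round(shares, 3)
round(middle, 3)
round(low, 3)
```

The two middle brackets together account for about 55.5% of listings, while under 7% report an income below $25,000.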
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
The debt-to-income ratio histogram on the left has a long right tail, with a few borrowers at a ratio of about 10 (the maximum is 10.01). These are risky borrowers, as their income is too low to service their debt. After removing the top 1% of values as outliers, most borrowers have a ratio of around 0.2 (median 0.22).
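Removing the top 1% amounts to cutting at the 99th percentile. The exact code behind the plot is not shown. The sketch below is one way to do it, using a simulated right-skewed vector `dti` in place of the real `DebtToIncomeRatio` column; both the data and the name are assumptions for illustration only.

```r
set.seed(42)
# Simulated stand-in for DebtToIncomeRatio: mostly around 0.2,
# plus a few extreme values at the 10.01 cap and some missing values
dti <- c(rlnorm(10000, meanlog = log(0.22), sdlog = 0.5),
         rep(10.01, 20), rep(NA, 50))

# 99th percentile, ignoring missing values
cutoff <- quantile(dti, probs = 0.99, na.rm = TRUE)

# Keep non-missing values at or below the cutoff
dti_trimmed <- dti[!is.na(dti) & dti <= cutoff]

summary(dti_trimmed)
```

In a ggplot2 histogram, the same cutoff could be passed to the axis limits instead of filtering the data.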
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
Loan amounts are concentrated at the lower end: the median is $6,500, and many loans are around $5,000.
There are spikes at $5k, $10k, $15k, $20k and even up to $35k. These are multiples of $5,000, which are round figures people tend to choose when deciding how much to borrow.
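Whether an amount is a multiple of $5,000 can be tested with the modulo operator `%%`. This sketch applies it to the first ten LoanOriginalAmount values printed by `str()` above.

```r
# First ten LoanOriginalAmount values as printed by str() above
amounts <- c(9425, 10000, 3001, 10000, 15000, 15000, 3000, 10000, 10000, 10000)

# TRUE where the amount is an exact multiple of $5,000
is_round <- amounts %% 5000 == 0

sum(is_round)   # number of "round" amounts
mean(is_round)  # share of round amounts
```

Seven of these ten amounts are multiples of $5,000. Ten rows are far too few to draw conclusions, but the same `mean(... %% 5000 == 0)` applied to the whole column would give the overall share of round amounts.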
## Employed Full-time Part-time Self-employed Retied
## 67322 26355 1088 6134 0
## Not employed Other Not available NA NA's
## 835 3806 5347 0 3050
Most borrowers are employed, whether full-time, part-time, self-employed or simply listed as ‘Employed’. This makes sense, as loan applicants need to demonstrate a stable income to repay the loan.
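The table above has some odd features:
- It lists a level spelled `Retied` with a count of 0.
- Its NA's count (3,050) is higher than the 2,255 missing values that `summary()` reported for EmploymentStatus earlier.

This pattern is consistent with what R does when `factor()` is given a `levels` vector that contains a misspelled label: values matching no listed level silently become `NA`. The sketch below uses made-up values (not the Prosper data) to show the effect and a simple guard against it.

```r
# Toy employment statuses (illustrative values, not the Prosper data)
status <- c("Employed", "Retired", "Full-time", "Retired", NA)

# A misspelled level ("Retied") silently turns every "Retired" into NA
bad <- factor(status, levels = c("Employed", "Full-time", "Retied"))
sum(is.na(bad))   # 3: the two "Retired" values plus the original NA

# Safer reordering: check that every observed value has a matching level first
wanted <- c("Employed", "Full-time", "Retired")
unmatched <- setdiff(unique(status[!is.na(status)]), wanted)
stopifnot(length(unmatched) == 0)
good <- factor(status, levels = wanted)
sum(is.na(good))  # 1: only the original NA
```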
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.00 36.00 36.00 40.83 36.00 60.00
The majority of borrowers have a loan term of 36 months (3 years).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
## [1] "173.71"
More than half of the monthly loan payments are below $250 (the median is $217.70).
The most common monthly installment is $173.71, and only a few borrowers pay more than $1,000 a month.
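The `[1] "173.71"` output above is a character string. That suggests the most common value (the mode) was read from the names of a frequency table, though the original code is not shown. The sketch below takes that approach, with a short hypothetical vector `payments` in place of `MonthlyLoanPayment`.

```r
# Hypothetical monthly payments (not the real data)
payments <- c(173.71, 250.00, 173.71, 98.50, 173.71, 250.00)

# Frequency table of each distinct value; its names are the values as strings
freq <- table(payments)

# The mode is the name of the most frequent entry, converted back to numeric
mode_value <- as.numeric(names(freq)[which.max(freq)])
mode_value
```

For a continuous variable the mode depends on how finely values are recorded, so it is best read alongside the histogram.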
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 19.0 679.0 699.0 704.6 739.0 899.0 591
Line charts are used instead of bar charts so that the distributions of the lower and upper credit score bounds can be overlaid on each other.
For most borrowers the credit score range lies between 650 and 750, and the gap between the upper and lower bounds is about 20 points (for example, 640 to 659).
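The gap between the bounds can be checked by subtracting one column from the other. This sketch uses the first ten CreditScoreRangeLower and CreditScoreRangeUpper values printed by `str()` above. The result agrees with the 19-point difference between the two means in the summary (704.6 and 685.6).

```r
# First ten values of each bound as printed by str() above
lower <- c(640, 680, 480, 800, 680, 740, 680, 700, 820, 820)
upper <- c(659, 699, 499, 819, 699, 759, 699, 719, 839, 839)

gap <- upper - lower
table(gap)  # every listed row has the same 19-point gap
```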
## False True
## 56459 57478
Borrowers are split roughly evenly between homeowners (57,478) and non-homeowners (56,459).
This suggests that homeownership may not be a major factor in deciding whether to extend a loan to a borrower.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0100 0.1242 0.1730 0.1827 0.2400 0.4925
The histograms show a bimodal distribution. The main peak of borrower rates and lender yields lies between 0.1 and 0.2. The second peak, just above 0.3, could possibly be explained by a rate commonly given to borrowers with weaker creditworthiness.
Compared with the borrower rate, lender yield follows a similar pattern shifted slightly to the left, by about 0.01 (the means are 0.1928 and 0.1827). This is consistent with the 1% annual servicing fee that Prosper charges investors.
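The size of the shift can be read from the row-level difference between the two columns. The sketch below uses the first five BorrowerRate and LenderYield values as printed (rounded) by `str()` above. The spread is 0.01 in three of the five rows but larger in the other two, so a flat 1% fee is a simplification.

```r
# First five values as printed (rounded) by str() above
borrower_rate <- c(0.158, 0.092, 0.275, 0.0974, 0.2085)
lender_yield  <- c(0.138, 0.082, 0.240, 0.0874, 0.1985)

spread <- borrower_rate - lender_yield
round(spread, 4)
median(spread)

# The same comparison on the summary means
0.1928 - 0.1827
```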
## AK AL AR AZ CA CO CT DC DE FL GA HI
## 200 1679 855 1901 14717 2210 1627 382 300 6720 5008 409
## IA ID IL IN KS KY LA MA MD ME MI MN
## 186 599 5921 2078 1062 983 954 2242 2821 101 3593 2318
## MO MS MT NC ND NE NH NJ NM NV NY OH
## 2615 787 330 3084 52 674 551 3097 472 1090 6729 4197
## OK OR PA RI SC SD TN TX UT VA VT WA
## 971 1817 2972 435 1122 189 1737 6842 877 3278 207 3048
## WI WV WY NA's
## 1842 391 150 5515
California has by far the most borrowers (14,717), followed by Texas, New York, Florida, Illinois and Georgia, each with between 5,000 and 7,000 borrowers.
The high number of borrowers from these states is not surprising, as they are among the most populous states. However, California's borrower count is disproportionately high relative to its population when compared with Texas. One hypothesis is that Prosper, being based in San Francisco, enjoys higher awareness among Californians as an alternative to bank loans.
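The comparison with Texas can be made concrete by computing borrowers per 100,000 residents. The borrower counts come from the table above. The populations are an outside assumption: approximate 2014 U.S. Census Bureau estimates, rounded to 38.8 million for California and 27.0 million for Texas, which should be verified before relying on them.

```r
# Borrower counts from the BorrowerState table above
borrowers <- c(CA = 14717, TX = 6842)

# ASSUMPTION: approximate 2014 population estimates in millions
# (U.S. Census Bureau, rounded); verify before use
population_m <- c(CA = 38.8, TX = 27.0)

# Borrowers per 100,000 residents
per_100k <- borrowers / (population_m * 1e6) * 1e5
round(per_100k, 1)

# Ratio of borrowers versus ratio of populations
round(borrowers[["CA"]] / borrowers[["TX"]], 2)
round(population_m[["CA"]] / population_m[["TX"]], 2)
```

Under these assumptions, California has about 2.15 times as many borrowers as Texas but only about 1.44 times the population.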
The dataset contains 81 variables and 113,937 observations spanning 2005 to 2014.
The main features of interest are the typical characteristics of borrowers. Various plots were created to observe the distribution of each variable and identify trends.
Income range and debt-to-income ratio are among the variables that may help explain why loans were approved and what rate or yield they carry.
No new variables were created, but I reordered the factor levels of Prosper’s rating (Alpha), Prosper’s score, income range, employment status and loan term (months) so that the charts are easier to read. I also recoded the loan origination quarter so that it can be ordered by year and then by quarter.
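The quarter table near the start of this section is sorted alphabetically ("Q1 2006", "Q1 2007", ...), not chronologically. The author's exact recoding is not shown. One possible approach is to rewrite each label as "YYYY Qn", so that alphabetical order becomes chronological, and store the result as an ordered factor. The vector `quarters` below holds a few sample labels.

```r
# A few LoanOriginationQuarter labels in their original "Qn YYYY" form
quarters <- c("Q4 2013", "Q1 2006", "Q2 2009", "Q4 2005", "Q1 2014")

# Rewrite "Qn YYYY" as "YYYY Qn" so that sorting the text sorts by time
year_first <- sub("^(Q[1-4]) ([0-9]{4})$", "\\2 \\1", quarters)

# Ordered factor whose levels run chronologically
quarter_f <- factor(year_first, levels = sort(unique(year_first)), ordered = TRUE)
levels(quarter_f)
```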
Most features do not have unusual distributions, and those that do can be explained by other factors. The one that interests me is the spike at about 0.3 in borrower rate and lender yield. I had expected the distribution to be skewed towards lower rates, favoring borrowers with better credit histories for risk-management reasons.
Initially, only 36-month loans were offered. 12-month and 60-month loans were introduced in Q4 2010, but only the 60-month term took off. The 12-month term appears to have been discontinued at the end of 2012.
##
## Pearson's product-moment correlation
##
## data: ProsperScore and BorrowerRate
## t = -248.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6536072 -0.6458311
## sample estimates:
## cor
## -0.6497361
##
## Pearson's product-moment correlation
##
## data: ProsperScore and LenderYield
## t = -249.01, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6536541 -0.6458788
## sample estimates:
## cor
## -0.6497835
The boxplots above show that borrower rate and lender yield decrease as Prosper’s score improves (r ≈ -0.65 for both). Applicants with better scores pose less risk and thus have a lower chance of defaulting, so lenders are willing to accept a lower interest rate.
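In the Pearson tests above, the degrees of freedom equal the number of complete pairs minus two, because `cor.test()` drops rows where either value is missing. For example, 113,937 listings minus the 29,084 without a Prosper score leaves 84,853 pairs, hence df = 84,851. The sketch below reproduces this behavior on simulated data. The names `score` and `rate` and the simulated relationship are assumptions for illustration, not the real variables.

```r
set.seed(1)
n <- 1000

# Simulated stand-ins for ProsperScore and BorrowerRate (not the real data)
score <- sample(1:11, n, replace = TRUE)
rate  <- 0.30 - 0.015 * score + rnorm(n, sd = 0.04)
score[1:50] <- NA  # mimic unscored listings

# cor.test() drops incomplete pairs before testing
res <- cor.test(score, rate, method = "pearson")

res$estimate                       # sample correlation, negative by construction
res$parameter                      # degrees of freedom
sum(complete.cases(score, rate)) - 2  # complete pairs minus two
```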
##
## Pearson's product-moment correlation
##
## data: ProsperScore and StatedMonthlyIncome
## t = 24.484, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07707163 0.09043415
## sample estimates:
## cor
## 0.08375665
##
## Pearson's product-moment correlation
##
## data: ProsperScore and LoanOriginalAmount
## t = 80.475, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2600308 0.2725335
## sample estimates:
## cor
## 0.2662933
Turning to monthly income and loan amount, neither boxplot chart presents any surprises: applicants with higher scores tend to have higher monthly incomes and larger loan amounts. Both correlations are weak, however (r ≈ 0.08 for income and r ≈ 0.27 for loan amount).
Looking at the relationship between employment status and loan amount, employed, self-employed and full-time borrowers are usually granted larger loans than part-time, not employed or ‘Not available’ borrowers.
##
## Pearson's product-moment correlation
##
## data: Term and LoanOriginalAmount
## t = 121.6, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3337778 0.3440569
## sample estimates:
## cor
## 0.3389275
##
## Pearson's product-moment correlation
##
## data: Term and BorrowerRate
## t = 6.781, df = 113940, p-value = 1.199e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.01428050 0.02588888
## sample estimates:
## cor
## 0.02008537
##
## Pearson's product-moment correlation
##
## data: Term and LenderYield
## t = 6.94, df = 113940, p-value = 3.941e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.01475137 0.02635952
## sample estimates:
## cor
## 0.02055614
Loans with longer terms tend to be larger (r ≈ 0.34). Longer terms are also associated with slightly higher borrower rates, possibly to compensate for the higher risk exposure, although this correlation is very weak (r ≈ 0.02) despite being statistically significant. Lender yield behaves the same way, as a higher rate is needed to attract investors to lend to riskier borrowers.
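With more than 100,000 observations, even a negligible correlation is highly significant. The reported t statistic can be recovered from r and df with the standard formula t = r * sqrt(df / (1 - r^2)). The check below uses only the numbers from the Term vs BorrowerRate output above.

```r
# Values reported above for Term vs BorrowerRate
r  <- 0.02008537
df <- 113940

# Test statistic for Pearson's r and its two-sided p-value
t_stat  <- r * sqrt(df / (1 - r^2))
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)  # matches the t = 6.781 reported above
p_value           # tiny, even though r is close to zero
r^2               # share of variance in BorrowerRate shared with Term
```

The squared correlation is about 0.0004, so term accounts for well under 1% of the variation in borrower rate, however small the p-value.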
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1956816 0.2068243
## sample estimates:
## cor
## 0.2012595
##
## Pearson's product-moment correlation
##
## data: StatedMonthlyIncome and DebtToIncomeRatio
## t = -40.121, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1286017 -0.1167082
## sample estimates:
## cor
## -0.1226594
When comparing income, borrowers with higher incomes tend to borrow more (r ≈ 0.20). They also tend to have a lower debt-to-income ratio (r ≈ -0.12), which indicates lower risk.
##
## Pearson's product-moment correlation
##
## data: DebtToIncomeRatio and BorrowerRate
## t = 20.465, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05690080 0.06892819
## sample estimates:
## cor
## 0.06291678
##
## Pearson's product-moment correlation
##
## data: DebtToIncomeRatio and LenderYield
## t = 20.147, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05592580 0.06795465
## sample estimates:
## cor
## 0.06194247
A lower debt-to-income ratio is associated with a lower borrower rate and lender yield, although the correlation is weak (r ≈ 0.06). A plausible reason is that borrowers with a lower debt-to-income ratio are better able to service their loan installments and are therefore less likely to default.
In general, most of the relationships observed in the charts are in line with my expectations. Applicants with a higher Prosper’s score and a lower debt-to-income ratio enjoy lower borrower rates.
Even though borrower rate tends to increase with debt-to-income ratio, this does not seem to hold for borrowers with a ratio of more than 1.5. Other factors probably lead to their lower borrower rate, and further investigation is needed to understand this anomaly.
The strongest relationship found is between Prosper’s score and borrower rate: a higher Prosper’s score is associated with a lower borrower rate, with a correlation coefficient of -0.65.
Continuing my investigation of the relationship between debt-to-income ratio and borrower rate, I removed all observations with a debt-to-income ratio below 1.5. Rather surprisingly, the scatterplot shows that many borrowers with low or undisclosed income are able to borrow large sums of more than $10,000 at a low borrower rate (below 0.25). I suspect that other variables are behind this.
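The filtering step above can be expressed with `subset()`. In this minimal sketch, the data frame `loans` and its five rows are hypothetical; only the column names follow the dataset.

```r
# Hypothetical rows standing in for the loan data (column names follow the dataset)
loans <- data.frame(
  DebtToIncomeRatio  = c(0.25, 1.8, 2.5, 3.0, 10.01),
  LoanOriginalAmount = c(15000, 12000, 4000, 20000, 11000),
  BorrowerRate       = c(0.12, 0.18, 0.30, 0.22, 0.35)
)

# Drop debt-to-income ratios below 1.5
high_dti <- subset(loans, DebtToIncomeRatio >= 1.5)

# Large loans (> $10,000) at a low rate (< 0.25) despite a high ratio
surprising <- subset(high_dti, LoanOriginalAmount > 10000 & BorrowerRate < 0.25)
surprising
```

The same `surprising` subset could then be tabulated against Prosper’s score to look for the other variables suspected above.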
Extending the investigation further shows that these low-income borrowers who still obtain low rates have rather good Prosper’s scores (excluding those with an ‘NA’ score). That explains the anomaly in the debt-to-income ratio vs. borrower rate boxplot.
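The filtering steps in the two paragraphs above can be sketched in base R. The rows below are hypothetical toy values, not Prosper data. Column names follow the dataset except `LoanOriginalAmount`, which is assumed here to be the loan-amount column, and the income labels used to flag "low or unidentified income" are likewise assumptions.

```r
# Hypothetical toy rows; LoanOriginalAmount and the IncomeRange labels are assumed
loans <- data.frame(
  DebtToIncomeRatio  = c(0.20, 1.80, 2.50, 3.10, 1.60, 4.00),
  BorrowerRate       = c(0.12, 0.15, 0.30, 0.10, 0.22, 0.18),
  LoanOriginalAmount = c(15000, 12000, 20000, 25000, 4000, 11000),
  IncomeRange        = c("$50,000-74,999", "$1-24,999", "$0",
                         "Not displayed", "$1-24,999", "$0"),
  ProsperScore       = c(8, 9, 3, 10, NA, NA)
)

low_or_unknown_income <- c("$0", "$1-24,999", "Not displayed")

# Anomalous region: DTI above 1.5, low/unknown income, loans over $10,000, rate below 0.25
anomalies <- subset(loans,
                    DebtToIncomeRatio > 1.5 &
                    IncomeRange %in% low_or_unknown_income &
                    LoanOriginalAmount > 10000 &
                    BorrowerRate < 0.25)

# Exclude rows without a ProsperScore, then look at the scores that remain
scored <- anomalies[!is.na(anomalies$ProsperScore), ]
print(table(scored$ProsperScore))
```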
One interesting finding is that each Prosper’s rating tends to have a narrow range of borrower rates regardless of the debt-to-income ratio. Debt-to-income ratio does not seem to matter much as long as borrowers establish a good credit rating.
Similarly, lender yield is more likely to be determined by the Prosper’s rating of the borrowers than by the debt-to-income ratio.
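One way to check the "narrow range within each rating" observation is to compare the spread of `BorrowerRate` inside each `ProsperRating..Alpha.` group with its overall spread. A minimal sketch on hypothetical toy rows, in which debt-to-income ratio varies widely within each rating while the rate barely moves:

```r
# Hypothetical toy rows, not Prosper data
loans <- data.frame(
  ProsperRating..Alpha. = c("AA", "AA", "AA", "A", "A", "A", "HR", "HR", "HR"),
  DebtToIncomeRatio     = c(0.10, 0.40, 1.20, 0.20, 0.60, 2.00, 0.30, 1.00, 3.00),
  BorrowerRate          = c(0.07, 0.08, 0.08, 0.11, 0.12, 0.12, 0.32, 0.33, 0.34)
)

# Spread (max - min) of BorrowerRate within each rating
within_range <- tapply(loans$BorrowerRate, loans$ProsperRating..Alpha.,
                       function(x) diff(range(x)))

# Spread of BorrowerRate across all ratings combined
overall_range <- diff(range(loans$BorrowerRate))

print(within_range)
print(overall_range)
```

The same comparison with `LenderYield` in place of `BorrowerRate` would test the lender-yield observation.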
Borrowers with no credit history are generally allowed to borrow less than $5,000, while borrowers with good ratings are allowed to borrow more, up to $35,000.
It seems that most of the applicants who exhibit the undesirable features of ‘bad borrowers’ are those with no prior Prosper’s rating, low monthly income and a high debt-to-income ratio. However, this is normal, especially for young graduates who are just starting out.
The length of the loan term roughly corresponds to the size of the loan: larger loans typically come with longer repayment periods, regardless of monthly income.
It is also worth noting that 12-month loans were discontinued by 2012, possibly due to lack of interest, as there may have been an upper limit on the loan amount.
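The discontinuation of 12-month loans can be checked with a cross-tabulation of listing year against `Term`. The sketch below uses hypothetical toy rows; in the real data, `ListingCreationDate` is assumed to be a timestamp string whose leading `YYYY-MM-DD` part `as.Date()` can parse.

```r
# Hypothetical toy rows, not Prosper data
loans <- data.frame(
  ListingCreationDate = as.Date(c("2010-05-01", "2010-09-12", "2011-03-03",
                                  "2011-11-20", "2012-06-15", "2013-02-01",
                                  "2013-08-30")),
  Term = c(12, 36, 12, 60, 36, 36, 60)
)

# Extract the listing year as a character label
loans$Year <- format(loans$ListingCreationDate, "%Y")

# Count of loans per year (rows) and term in months (columns)
term_by_year <- table(loans$Year, loans$Term)
print(term_by_year)

# Most recent year in which a 12-month loan was listed
last_12 <- max(as.integer(loans$Year[loans$Term == 12]))
print(last_12)
```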
Using facet wrap, we can clearly observe that borrowers with a lower Prosper’s score are allowed to borrow smaller amounts because of their higher perceived risk of default.
As Prosper has expanded over the years, the growth has come mostly from borrowers with good ratings, while the number of borrowers with no credit history has decreased. This suggests that Prosper is pursuing a more sustainable business model. Another possible explanation is that potential new customers have acquired some credit history over the years.
The bar chart above confirms my hypothesis that Prosper’s expansion comes mostly from performing loans. Over the years, the number of defaulted loans has dropped.
Monthly income is a strong factor in determining the borrower rate. At the same time, borrowers with no credit history tend to have lower salaries, which suggests that they might be young people who have just graduated or are still in school.
It is interesting to note that debt-to-income ratio has minimal effect on the borrower rate when the Prosper’s rating is held constant. The earlier chart showing that debt-to-income ratio drops as salary increases suggests that the ratio depends largely on monthly income.
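"Holding the rating constant" corresponds to adding the rating as a factor in a linear regression. The sketch below uses simulated data, not the Prosper data: it is built so that the rate depends only on the rating while worse ratings come with higher debt-to-income ratios. Under that assumption, the raw debt-to-income slope is sizable, yet it shrinks toward zero once `ProsperRating..Alpha.` enters the model. Running the same two `lm()` calls on the real data would test the claim; their result is not known here.

```r
set.seed(42)
n <- 2000
rating_levels <- c("HR", "E", "D", "C", "B", "A", "AA")
rating_num <- sample(1:7, n, replace = TRUE)   # 1 = HR ... 7 = AA

# Simulated: better ratings come with lower debt-to-income ratios...
dti <- 0.1 * (8 - rating_num) + rnorm(n, sd = 0.05)
# ...while the rate depends on the rating only, not on DTI directly
rate <- 0.35 - 0.04 * rating_num + rnorm(n, sd = 0.01)

sim <- data.frame(
  DebtToIncomeRatio     = dti,
  BorrowerRate          = rate,
  ProsperRating..Alpha. = factor(rating_levels[rating_num], levels = rating_levels)
)

# DTI slope ignoring the rating vs. holding the rating constant
fit_raw <- lm(BorrowerRate ~ DebtToIncomeRatio, data = sim)
fit_adj <- lm(BorrowerRate ~ DebtToIncomeRatio + ProsperRating..Alpha., data = sim)

print(coef(fit_raw)["DebtToIncomeRatio"])
print(coef(fit_adj)["DebtToIncomeRatio"])
```

As noted later in this report, a small adjusted slope would not rule out debt-to-income ratio as an input to the rating itself.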
The boxplots above show that borrowers with a better Prosper’s score tend to enjoy a lower borrower rate. The range of the rate does not fluctuate much for most Prosper’s scores, with the exception of moderate scores between 4 and 7.
However, there are outliers running against this trend for the best and worst scores. These could be due to other factors such as the loan amount, a change in monthly income or a change in employment status.
This graph was chosen because it shows that the Prosper’s rating (alpha) is crucial in determining the borrower rate. Holding the Prosper’s rating (alpha) constant, an increase in debt-to-income ratio has little visible impact on the borrower rate.
Those with an ‘HR’ rating are more likely to have a debt-to-income ratio larger than 1.0. On the other hand, borrowers rated ‘AA’ tend to have a ratio of less than 0.5 and also enjoy a lower borrower rate, usually below 0.1.
The bar chart above shows that Prosper has been expanding its lending operations, except in late 2008 and late 2012, which was possibly caused by the Global Financial Crisis and the European Sovereign Debt Crisis respectively.
Over the years, non-performing loans have decreased substantially. That suggests that Prosper’s ability to predict its applicants’ creditworthiness has been improving. Another reason could be that Prosper decided to pursue more sustainable expansion instead of lending to risky borrowers for higher yield, which might result in bankruptcy if non-performing loans came to outnumber performing loans. When a crisis hit, Prosper seemed to tighten its lending policy, which is in line with most bank practices as well.
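The decline in non-performing loans can be summarized as the yearly share of loans whose `LoanStatus` is defaulted or charged off. The sketch below uses hypothetical toy rows, and the labels `"Defaulted"` and `"Chargedoff"` are assumed to match the dataset's spelling. One caveat worth checking on the real data: recent loans have had less time to default, so part of any decline may reflect loan age rather than better underwriting.

```r
# Hypothetical toy rows, not Prosper data
loans <- data.frame(
  Year       = c(2007, 2007, 2007, 2007, 2012, 2012, 2012, 2012),
  LoanStatus = c("Defaulted", "Chargedoff", "Completed", "Completed",
                 "Current", "Current", "Completed", "Chargedoff")
)

# Status labels treated as non-performing (assumed spelling)
non_performing <- c("Defaulted", "Chargedoff")
loans$NonPerforming <- loans$LoanStatus %in% non_performing

# Share of non-performing loans per listing year
np_share <- tapply(loans$NonPerforming, loans$Year, mean)
print(np_share)
```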
When I started exploring this dataset, I was overwhelmed by the number of variables available, and it was very tedious to study the relationships between all of them. As such, I chose only about 20 variables that I am more familiar with. It would be great if Prosper could better clarify how some of the ratings, scores or borrower rates were determined, although I also understand that these data are Prosper’s confidential proprietary information. That being said, after spending a few days working on the data, I had a better grasp of the dataset. I only included plots related to the storyline and excluded other variables that do not say much about the characteristics of the demographics.
The other challenge that I faced was unfamiliarity with R. Since this was my first time coding in R, I took a lot of notes on everything and did a lot of Googling on forums and documentation to find out how to plot certain graphs or customize the charts. I am glad that my effort paid off, as I was able to complete the chapter and this project in less than two weeks.
Overall, I was able to come up with a great storyline for this report. The variables don’t seem too intimidating after a while, since most are quite self-explanatory. Without specific questions or directions, I was free to venture around and determine the focus of the storyline. That is when I decided to look more into how the borrower rate was determined and how the Prosper’s rating affected other variables.
I was rather surprised that debt-to-income ratio doesn’t seem to play an important role in determining the borrower rate once Prosper’s rating is taken into account. However, I can’t entirely rule out the importance of debt-to-income ratio without knowing how Prosper’s rating is determined, as debt-to-income ratio might be one of its main components.
To move on from here, it would be great to build an equation or predictive model to simulate real-world scenarios. Prosper could also collect other related information that might make the predictions more accurate, such as the age, education level or city of applicants and borrowers. Prosper could also explain how the ratings are determined without revealing too much corporate information, as that would help in building the predictive model.