Loan Data from Prosper by Kai Sheng TEH

Prosper is a San Francisco-based peer-to-peer lending company established in 2005. Prosper generates revenue by charging borrowers an origination fee of 1% to 5% for verifying their identities and assessing their creditworthiness. It also charges investors a 1% annual servicing fee. The dataset provided to Udacity was last updated on 11 March 2014.

Univariate Plots Section

To start the analysis, the column names, summary statistics and internal structure of the dataset are examined:

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

The dataset has 81 columns in total. Excluding identifier columns (for example ListingKey and ListingNumber), around 70 variables remain for analysis.

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6   C      : 5649   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   D      : 5153   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   B      : 4389   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   AA     : 3509   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   HR     : 3508   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   (Other): 6745   Max.   :60.00  
##  (Other)                      :113912   NA's   :84984                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576   2014-03-04 00:00:00:  105  
##  Completed            :38074   2014-02-19 00:00:00:  100  
##  Chargedoff           :11992   2014-02-11 00:00:00:   92  
##  Defaulted            : 5018   2012-10-30 00:00:00:   81  
##  Past Due (1-15 days) :  806   2013-02-26 00:00:00:   78  
##  Past Due (31-60 days):  363   (Other)            :54633  
##  (Other)              : 1108   NA's               :58848  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000           C      :18345         Min.   : 1.00  
##  1st Qu.:3.000           B      :15581         1st Qu.: 4.00  
##  Median :4.000           A      :14551         Median : 6.00  
##  Mean   :4.072           D      :14274         Mean   : 5.95  
##  3rd Qu.:5.000           E      : 9795         3rd Qu.: 8.00  
##  Max.   :7.000           (Other):12307         Max.   :11.00  
##  NA's   :29084           NA's   :29084         NA's   :29084  
##  ListingCategory..numeric. BorrowerState                 Occupation   
##  Min.   : 0.000            CA     :14717   Other              :28617  
##  1st Qu.: 1.000            TX     : 6842   Professional       :13628  
##  Median : 1.000            NY     : 6729   Computer Programmer: 4478  
##  Mean   : 2.774            FL     : 6720   Executive          : 4311  
##  3rd Qu.: 3.000            IL     : 5921   Teacher            : 3759  
##  Max.   :20.000            (Other):67493   (Other)            :55556  
##                            NA's   : 5515   NA's               : 3588  
##       EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
##  Employed     :67322   Min.   :  0.00           False:56459        
##  Full-time    :26355   1st Qu.: 26.00           True :57478        
##  Self-employed: 6134   Median : 67.00                              
##  Not available: 5347   Mean   : 96.07                              
##  Other        : 3806   3rd Qu.:137.00                              
##  (Other)      : 2718   Max.   :755.00                              
##  NA's         : 2255   NA's   :7625                                
##  CurrentlyInGroup                    GroupKey     
##  False:101218     783C3371218786870A73D20:  1140  
##  True : 12719     3D4D3366260257624AB272D:   916  
##                   6A3B336601725506917317E:   698  
##                   FEF83377364176536637E50:   611  
##                   C9643379247860156A00EC0:   342  
##                   (Other)                :  9634  
##                   NA's                   :100596  
##             DateCreditPulled  CreditScoreRangeLower CreditScoreRangeUpper
##  2013-12-23 09:38:12:     6   Min.   :  0.0         Min.   : 19.0        
##  2013-11-21 09:09:41:     4   1st Qu.:660.0         1st Qu.:679.0        
##  2013-12-06 05:43:16:     4   Median :680.0         Median :699.0        
##  2014-01-14 20:17:49:     4   Mean   :685.6         Mean   :704.6        
##  2014-02-09 12:14:41:     4   3rd Qu.:720.0         3rd Qu.:739.0        
##  2013-09-27 22:04:54:     3   Max.   :880.0         Max.   :899.0        
##  (Other)            :113912   NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##  1993-12-01 00:00:00:   185     Min.   : 0.00      Min.   : 0.00  
##  1994-11-01 00:00:00:   178     1st Qu.: 7.00      1st Qu.: 6.00  
##  1995-11-01 00:00:00:   168     Median :10.00      Median : 9.00  
##  1990-04-01 00:00:00:   161     Mean   :10.32      Mean   : 9.26  
##  1995-03-01 00:00:00:   159     3rd Qu.:13.00      3rd Qu.:12.00  
##  (Other)            :112389     Max.   :59.00      Max.   :54.00  
##  NA's               :   697     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

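The summaries of variables above include:

  1. the minimum, quartiles, median, mean and maximum of each continuous variable

  2. the most frequent levels of each discrete variable, with the remaining levels grouped under "(Other)"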

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 8 levels "A","AA","B","C",..: 4 NA 7 NA NA NA NA NA NA NA ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2802 levels "2005-11-25 00:00:00",..: 1137 NA 1262 NA NA NA NA NA NA NA ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 7 levels "A","AA","B","C",..: NA 1 NA 1 5 3 6 4 2 2 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
##  $ Occupation                         : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
##  $ EmploymentStatus                   : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11585 levels "1947-08-24 00:00:00",..: 8638 6616 8926 2246 9497 496 8264 7684 5542 5542 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

The internal structure of the 81 variables is shown above.
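The three listings above look like the output of base R's `names()`, `summary()` and `str()`. As a minimal sketch of those calls, the block below uses a small invented data frame (`toy` and its values are assumptions for illustration, not the Prosper data):

```r
# Toy stand-in for the loan data; the values are invented for illustration.
toy <- data.frame(
  LoanStatus         = factor(c("Current", "Completed", "Current", "Defaulted")),
  Term               = c(36L, 36L, 60L, 12L),
  LoanOriginalAmount = c(10000, 3001, 15000, 4000),
  BorrowerAPR        = c(0.165, NA, 0.283, 0.125)
)

names(toy)    # column names
summary(toy)  # quartiles and mean for numeric columns, level counts for factors
str(toy)      # storage type and first few values of each column

# Missing values per column; summary() reports these as NA's.
na_per_column <- colSums(is.na(toy))
```

Reading the real file would replace the `data.frame()` call; the file name is not given in this report, so it is left out here.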

## Q1 2006 Q1 2007 Q1 2008 Q1 2010 Q1 2011 Q1 2012 Q1 2013 Q1 2014 Q2 2006 
##     315    3079    3074    1243    1744    4435    3616   12172    1254 
## Q2 2007 Q2 2008 Q2 2009 Q2 2010 Q2 2011 Q2 2012 Q2 2013 Q3 2006 Q3 2007 
##    3118    4344      13    1539    2478    5061    7099    1934    2671 
## Q3 2008 Q3 2009 Q3 2010 Q3 2011 Q3 2012 Q3 2013 Q4 2005 Q4 2006 Q4 2007 
##    3602     585    1270    3093    5632    9180      22    2403    2592 
## Q4 2008 Q4 2009 Q4 2010 Q4 2011 Q4 2012 Q4 2013 
##     532    1449    1600    3913    4425   14450

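Loan originations trend upward from the end of 2005 to early 2014, except between late 2008 and mid-2009: the count falls from 3,602 in Q3 2008 to 532 in Q4 2008, no loans are recorded for Q1 2009, and only 13 appear in Q2 2009. This drop could be due to the Global Financial Crisis. There is also a dip in Q4 2012 and Q1 2013, which could be related to the European sovereign debt crisis. Note that the table is sorted alphabetically by label rather than chronologically, and that Q1 2014 is a partial quarter because the data ends on 11 March 2014.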
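Because the quarter labels sort alphabetically, the trend is easier to read once the counts are put in chronological order. A minimal sketch, with the counts copied from the table above (the variable names are illustrative):

```r
# Quarterly loan counts copied from the table above (alphabetical order).
counts <- c(
  "Q1 2006" = 315,  "Q1 2007" = 3079, "Q1 2008" = 3074, "Q1 2010" = 1243,
  "Q1 2011" = 1744, "Q1 2012" = 4435, "Q1 2013" = 3616, "Q1 2014" = 12172,
  "Q2 2006" = 1254, "Q2 2007" = 3118, "Q2 2008" = 4344, "Q2 2009" = 13,
  "Q2 2010" = 1539, "Q2 2011" = 2478, "Q2 2012" = 5061, "Q2 2013" = 7099,
  "Q3 2006" = 1934, "Q3 2007" = 2671, "Q3 2008" = 3602, "Q3 2009" = 585,
  "Q3 2010" = 1270, "Q3 2011" = 3093, "Q3 2012" = 5632, "Q3 2013" = 9180,
  "Q4 2005" = 22,   "Q4 2006" = 2403, "Q4 2007" = 2592, "Q4 2008" = 532,
  "Q4 2009" = 1449, "Q4 2010" = 1600, "Q4 2011" = 3913, "Q4 2012" = 4425,
  "Q4 2013" = 14450
)

# Split each "Qn YYYY" label into quarter and year, then sort by year and quarter.
quarter <- as.integer(substr(names(counts), 2, 2))
year    <- as.integer(substr(names(counts), 4, 7))
chrono  <- counts[order(year, quarter)]

# Quarters with fewer than 1,000 loans, in chronological order.
low <- chrono[chrono < 1000]
print(low)
```

The low quarters are the first two of the series plus Q4 2008, Q2 2009 and Q3 2009; Q1 2009 has no entry at all.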

##    AA     A     B     C     D     E    HR    NA  NA's 
##  5372 14551 15581 18345 14274  9795  6935     0 29084

The largest single group, 29,084 of 113,937 listings (about 26%), has no Prosper rating. Among rated borrowers, ‘C’ is the most common rating. ‘AA’ is the highest rating and relatively few borrowers qualify for it. Excluding the unrated listings, the counts form a roughly bell-shaped distribution centred on ‘C’.
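To check how large the unrated group is relative to the rated ones, the shares can be recomputed from the printed counts. A small sketch (the label `Unrated` stands in for the `NA's` column and is an assumption of this example):

```r
# Counts copied from the rating table above; "Unrated" replaces the NA's column.
rating <- c(AA = 5372, A = 14551, B = 15581, C = 18345,
            D = 14274, E = 9795, HR = 6935, Unrated = 29084)

share         <- rating / sum(rating)   # proportion of all listings
unrated_share <- share[["Unrated"]]

# Most common rating among rated listings only.
rated     <- rating[names(rating) != "Unrated"]
top_rated <- names(rated)[which.max(rated)]

print(round(share, 3))
```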

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29084

About a quarter of the listings (29,084) have no Prosper score. Among those scored, the middle half lie between 4 and 8.

##   Not employed             $0      $1-24,999 $25,000-49,999 $50,000-74,999 
##            806            621           7274          32192          31050 
## $75,000-99,999      $100,000+  Not displayed 
##          16916          17337           7741

The median household income in the USA was $53,657 in 2014 (U.S. Census Bureau). More than half of the borrowers (63,242 of 113,937) report an income between $25,000 and $74,999, so most borrowers are from the middle or lower-middle class.

There are fewer borrowers earning more than $75,000, possibly because they usually have savings to cover their needs. It is worth noting that far fewer loans were approved for those earning less than $25,000, presumably because they are deemed too risky to lend to.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

The debt-to-income ratio histogram on the left has a long right tail: a few borrowers have a ratio of about 10, which marks them as risky because their income is too low to service their debt. After removing the top 1% of values as outliers, we can see that most borrowers have a ratio of around 0.2.
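Trimming the top 1% can be done by cutting at the 99th percentile with `quantile()`. The sketch below uses an invented vector (`dti`, one extreme value and one `NA`), not the Prosper column, and assumes the filter was a simple "keep values at or below the cutoff":

```r
# Invented debt-to-income ratios: 99 ordinary values, one extreme value, one NA.
dti <- c(seq(0.01, 0.99, by = 0.01), 10.01, NA)

# 99th percentile, ignoring missing values.
cutoff <- quantile(dti, 0.99, na.rm = TRUE)

# Keep non-missing values at or below the cutoff.
trimmed <- dti[!is.na(dti) & dti <= cutoff]

summary(trimmed)
```

On the real data the same two lines would be applied to `DebtToIncomeRatio` before plotting the second histogram.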

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

Loan amounts are concentrated around $5,000; the median is $6,500.

There are spikes at $5k, $10k, $15k, $20k and even up to $35k. These are multiples of $5,000, which people tend to choose when deciding how much to borrow.
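One way to quantify the spikes is to test each amount for divisibility by 5,000. As a sketch, the block below uses only the first ten `LoanOriginalAmount` values shown in the `str()` output, so the resulting share describes this tiny sample and not the whole dataset:

```r
# First ten LoanOriginalAmount values from the str() output above.
amount <- c(9425, 10000, 3001, 10000, 15000, 15000, 3000, 10000, 10000, 10000)

# TRUE where the amount is an exact multiple of 5,000.
is_round <- amount %% 5000 == 0

n_round     <- sum(is_round)    # number of round amounts in the sample
share_round <- mean(is_round)   # proportion of round amounts in the sample
```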

##      Employed     Full-time     Part-time Self-employed        Retied 
##         67322         26355          1088          6134             0 
##  Not employed         Other Not available            NA          NA's 
##           835          3806          5347             0          3050

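Most of the borrowers are employed, whether full-time, part-time, self-employed or unspecified. This makes sense, as loan applicants need to demonstrate a stable income to pay back the loan. Note that the ‘Retied’ level in the table is a misspelling of ‘Retired’: its count is 0, and the 795 retired borrowers appear to have been folded into the NA's, which rise from 2,255 in the summary above to 3,050 here.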
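The zero count under "Retied", together with an NA total larger than the 2,255 reported in the earlier summary, suggests that a misspelled level name may have turned the retired borrowers into missing values. A minimal sketch of that effect, using an invented sample and assuming the levels were set explicitly with `factor(..., levels = ...)` (the original code is not shown in this report):

```r
# Invented sample of employment statuses.
status <- c("Employed", "Retired", "Full-time", "Retired", "Not employed")

# Misspelled level: values equal to "Retired" match no level and become NA.
bad <- factor(status, levels = c("Employed", "Full-time", "Retied", "Not employed"))
print(table(bad, useNA = "ifany"))

# Correct spelling: every value is kept.
good <- factor(status, levels = c("Employed", "Full-time", "Retired", "Not employed"))
print(table(good, useNA = "ifany"))
```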

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   36.00   36.00   40.83   36.00   60.00

The majority of borrowers have a loan term of 36 months (3 years).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2251.5
## [1] "173.71"

The majority of monthly loan payments are less than $250.

$173.71 (about $174) is the most common monthly installment, and only a few borrowers have an installment exceeding $1,000.
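Base R has no built-in function for the statistical mode, and the quoted `"173.71"` in the output above suggests it was taken from the names of a frequency table. A sketch of that approach on an invented vector (`payment` is hypothetical; the actual code is not shown in this report):

```r
# Invented monthly payments; 173.71 is deliberately the most frequent value.
payment <- c(173.71, 330.43, 173.71, 0, 563.97, 173.71, 318.93)

# Frequency of each distinct value; names(freq) holds the values as strings.
freq <- table(payment)

# Name of the most frequent value (a character string, hence the quotes).
mode_payment <- names(freq)[which.max(freq)]
print(mode_payment)
```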

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    19.0   679.0   699.0   704.6   739.0   899.0     591

Line charts are chosen instead of bar charts so that the lower and upper score ranges can be overlaid on top of each other.

The credit score range for most borrowers is between 650 and 750, and the gap between the upper and lower bounds is around 20 points for most borrowers.

## False  True 
## 56459 57478

Homeownership is roughly equally split between True and False for borrowers.

From this, it can be deduced that homeownership might not be a top factor in deciding whether to extend loans to borrowers.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925

The histograms show a bimodal distribution. The majority of borrower rates and lender yields are between 0.1 and 0.2. The peak above 0.3 could possibly be explained by a common rate given to borrowers with less stellar creditworthiness.

When compared to the borrower rate, the lender yield shows a similar distribution, shifted to the left along the x-axis by about 0.01. This is consistent with the 1% annual servicing fee that Prosper charges investors.
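One way to check the apparent 0.01 shift is to subtract the two columns loan by loan and tabulate the result. The sketch below uses a small made-up data frame with the `BorrowerRate` and `LenderYield` column names from the dataset; on the real data, the table would show whether the gap is a constant 0.01 or varies between loans.

```r
# Toy stand-ins for BorrowerRate and LenderYield (values are made up).
rates <- data.frame(
  BorrowerRate = c(0.1340, 0.1840, 0.2500, 0.3177),
  LenderYield  = c(0.1240, 0.1740, 0.2400, 0.3077)
)

# Per-loan gap; rounding guards against floating-point noise.
spread <- round(rates$BorrowerRate - rates$LenderYield, 4)

# Frequency of each distinct gap.
table(spread)
```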

##    AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA    HI 
##   200  1679   855  1901 14717  2210  1627   382   300  6720  5008   409 
##    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI    MN 
##   186   599  5921  2078  1062   983   954  2242  2821   101  3593  2318 
##    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY    OH 
##  2615   787   330  3084    52   674   551  3097   472  1090  6729  4197 
##    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT    WA 
##   971  1817  2972   435  1122   189  1737  6842   877  3278   207  3048 
##    WI    WV    WY  NA's 
##  1842   391   150  5515

California has by far the most borrowers at slightly fewer than 15,000, followed by Georgia, Florida, Illinois, New York and Texas, which have between 5,000 and 7,000 borrowers each.

The high number of borrowers from these states does not come as a surprise, as they are among the most populous states. However, the much higher number of borrowers from California is not proportional to its population when compared to Texas. One hypothesis is that Prosper, being based in California, enjoys higher awareness among Californians as an alternative to bank loans.

Univariate Analysis

What is the structure of your dataset?

The dataset contains 81 variables and 113,937 observations from 2005 to 2014.

What is/are the main feature(s) of interest in your dataset?

The typical characteristics of the borrowers are of interest for this dataset. Various plots are created to observe and identify the trend of each variable.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Income range and debt-to-income ratio are a few of the variables that will help to explain why the loans were approved and what the yield or rate of each loan is.

Did you create any new variables from existing variables in the dataset?

No, but I rearranged the levels of factors such as Prosper’s rating (Alpha), Prosper’s score, income range, employment status and loan term (months) so that the charts can be understood more easily. I also created new factor values for the loan origination quarter to facilitate ordering by year and quarter later on.
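The two kinds of factor adjustment described above can be sketched as follows. The level labels and the `"Q4 2010"` quarter format are assumptions made for illustration, as are the variable names; this is not the code used for the report.

```r
# Hypothetical income-range labels, listed in their natural order.
income_levels <- c("$1-24,999", "$25,000-49,999", "$50,000-74,999",
                   "$75,000-99,999", "$100,000+")
income <- c("$25,000-49,999", "$1-24,999", "$100,000+", "$50,000-74,999")

# An ordered factor makes plots and tables follow the natural order
# instead of the default alphabetical order.
income_f <- factor(income, levels = income_levels, ordered = TRUE)

# Hypothetical quarter labels in "Qn YYYY" form, which sort badly as text.
quarters <- c("Q4 2010", "Q1 2009", "Q2 2010", "Q1 2011")

# Rearranged to "YYYY Qn" so that text order matches chronological order.
year_quarter <- sub("^(Q[1-4]) ([0-9]{4})$", "\\2 \\1", quarters)
quarter_f <- factor(year_quarter, levels = sort(unique(year_quarter)))

levels(quarter_f)
```

With the year placed first, any chart that uses `quarter_f` on an axis is laid out chronologically without further sorting.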

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Most features do not have any unusual distributions, and where they do, they can be explained by other factors. The only one that I am interested in is the spike at around 0.3 in the borrower rate and lender yield. My expectation was that the distribution would be skewed towards lower rates, favoring borrowers with better credit history for risk management.

Bivariate Plots Section

Initially, only 36-month loans were offered. 12-month and 60-month loans were introduced in Q4 2010, but only the 60-month loans took off. The 12-month loans appear to have been discontinued at the end of 2012.

## 
##  Pearson's product-moment correlation
## 
## data:  ProsperScore and BorrowerRate
## t = -248.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6536072 -0.6458311
## sample estimates:
##        cor 
## -0.6497361
## 
##  Pearson's product-moment correlation
## 
## data:  ProsperScore and LenderYield
## t = -249.01, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6536541 -0.6458788
## sample estimates:
##        cor 
## -0.6497835

The boxplots above show that borrower rate and lender yield decrease as Prosper’s score improves. Applicants with a better score pose less risk and thus have a lower chance of defaulting. Therefore, lenders are willing to charge a lower interest rate.
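The correlation output above has the layout printed by base R’s `cor.test()`. The sketch below runs the same kind of test on simulated data, assuming a made-up negative relationship between a score and a rate; the numbers it prints are therefore unrelated to the Prosper figures.

```r
set.seed(1)

# Simulated scores from 1 to 10 and rates that fall as the score rises.
score <- sample(1:10, 200, replace = TRUE)
rate  <- 0.35 - 0.02 * score + rnorm(200, sd = 0.03)

# Pearson's product-moment correlation with a two-sided test.
result <- cor.test(score, rate, method = "pearson")

result$estimate  # sample correlation coefficient
result$conf.int  # 95 percent confidence interval
result$p.value   # p-value for the null hypothesis of zero correlation
```

Printing `result` directly gives the full report, including the t statistic and degrees of freedom.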

## 
##  Pearson's product-moment correlation
## 
## data:  ProsperScore and StatedMonthlyIncome
## t = 24.484, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07707163 0.09043415
## sample estimates:
##        cor 
## 0.08375665
## 
##  Pearson's product-moment correlation
## 
## data:  ProsperScore and LoanOriginalAmount
## t = 80.475, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2600308 0.2725335
## sample estimates:
##       cor 
## 0.2662933

Delving into monthly income and loan amount, neither boxplot chart presented any surprises. Applicants with a higher score tend to have a higher monthly income and a larger loan amount, although the correlation with monthly income is weak (about 0.08).

Looking at the relationship between employment status and loan amount, employed, self-employed and full-time borrowers are usually granted larger loans than part-time, not employed or not available borrowers.

## 
##  Pearson's product-moment correlation
## 
## data:  Term and LoanOriginalAmount
## t = 121.6, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3337778 0.3440569
## sample estimates:
##       cor 
## 0.3389275
## 
##  Pearson's product-moment correlation
## 
## data:  Term and BorrowerRate
## t = 6.781, df = 113940, p-value = 1.199e-11
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01428050 0.02588888
## sample estimates:
##        cor 
## 0.02008537
## 
##  Pearson's product-moment correlation
## 
## data:  Term and LenderYield
## t = 6.94, df = 113940, p-value = 3.941e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01475137 0.02635952
## sample estimates:
##        cor 
## 0.02055614

When investigating the effect of loan term (months), it can be said that loans with longer terms usually come with larger amounts (correlation of about 0.34). A slightly higher interest rate is levied on longer terms, possibly due to higher risk exposure, but the correlation between term and borrower rate is very weak (about 0.02). The same holds for lender yield, as a higher interest rate is needed to attract investors to lend to riskier borrowers.

## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and LoanOriginalAmount
## t = 69.353, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1956816 0.2068243
## sample estimates:
##       cor 
## 0.2012595
## 
##  Pearson's product-moment correlation
## 
## data:  StatedMonthlyIncome and DebtToIncomeRatio
## t = -40.121, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1286017 -0.1167082
## sample estimates:
##        cor 
## -0.1226594

When comparing income ranges, those with higher income are able to borrow more, as they also tend to have a lower debt-to-income ratio, which indicates lower risk.

## 
##  Pearson's product-moment correlation
## 
## data:  DebtToIncomeRatio and BorrowerRate
## t = 20.465, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05690080 0.06892819
## sample estimates:
##        cor 
## 0.06291678
## 
##  Pearson's product-moment correlation
## 
## data:  DebtToIncomeRatio and LenderYield
## t = 20.147, df = 105380, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.05592580 0.06795465
## sample estimates:
##        cor 
## 0.06194247

A lower debt-to-income ratio is associated with a lower borrower rate and lender yield, although the correlation is weak (about 0.06). A lower debt-to-income ratio indicates that borrowers are better able to service their loan installments and therefore have a lower probability of defaulting on their loans.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

In general, most of the relationships observed in the charts are aligned with my expectations. Applicants with a higher Prosper’s score and a lower debt-to-income ratio are able to enjoy a lower borrower rate.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Even though borrower rate tends to increase as debt-to-income ratio increases, that seems to not be the case for those with a debt-to-income ratio of more than 1.5. There are probably other factors that lead to lower borrower rate. Further investigation is needed to understand this anomaly.

What was the strongest relationship you found?

The strongest relationship found is between Prosper’s score and borrower rate, where a higher Prosper’s score is associated with a lower borrower rate. Its correlation coefficient is -0.65.

Multivariate Plots Section

Continuing my investigation of the relationship between debt-to-income ratio and borrower rate, I removed all debt-to-income ratios of less than 1.5. From the scatterplot, it is rather surprising that many borrowers with low or unidentified income are able to borrow a large sum of more than $10,000 at a low borrower rate (less than 0.25). I suspect that there are other variables behind it.
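The filtering step described above can be sketched with base R’s `subset()`. The data frame below is made up, and keeping a ratio of exactly 1.5 is an assumption, since the text only says that ratios below 1.5 were removed.

```r
# Toy loans table (values are made up; column names follow the dataset).
loans <- data.frame(
  DebtToIncomeRatio  = c(0.2, 1.5, 2.4, NA, 10.01),
  LoanOriginalAmount = c(4000, 12000, 15000, 3000, 10000),
  BorrowerRate       = c(0.31, 0.12, 0.09, 0.28, 0.15)
)

# Keep only the high-ratio loans; subset() also drops rows where the
# condition is NA, so loans with a missing ratio are excluded.
high_dti <- subset(loans, DebtToIncomeRatio >= 1.5)

high_dti
```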

Extending the investigation further shows that these borrowers, who have low income yet are able to borrow at a low rate, have rather good Prosper’s scores (excluding those with an ‘NA’ score). That explains the anomalies in the debt-to-income ratio vs borrower rate boxplot chart.

One interesting finding is that the borrower rate for each Prosper’s rating tends to fall within a narrow range regardless of the debt-to-income ratio. Debt-to-income ratio seems not to matter much as long as borrowers establish a good credit rating.

Similarly, lender yield is more likely to be determined by the Prosper’s rating of the borrowers than the debt-to-income ratio.

Borrowers with no credit history are generally allowed to borrow less than $5,000, while borrowers with good ratings are allowed to borrow up to $35,000.

It seems that most of the applicants who show the undesirable features of ‘bad borrowers’ are those with no prior Prosper’s rating, low monthly income and a high debt-to-income ratio. However, this is normal, especially for young graduates who are just starting out.

It can be seen that the length of the loan term somewhat corresponds to the amount of the loan. A larger loan typically requires a longer installment period regardless of the monthly income.

It is also worth noting that 12-month loans were discontinued by 2012, possibly due to a lack of interest, as there might have been an upper limit placed on the loan amount.

Using facet wrap, we can observe clearly that borrowers with a lower Prosper’s score are allowed to borrow smaller amounts due to a higher perceived risk of defaulting.

As Prosper expanded over the years, the expansion mostly came from borrowers with good ratings, while the number of those with no credit history decreased. This suggests that Prosper is pursuing a more sustainable business model. Another probable explanation is that potential new customers have acquired some credit history over the years.

The bar chart above confirms my hypothesis that Prosper’s expansion comes mostly from performing loans. Over the years, the number of defaulted loans has dropped.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Monthly income is a strong factor in determining the borrower rate. At the same time, borrowers with no credit history tend to be those with lower salary which suggests that they might be young people who just graduated or still in school.

Were there any interesting or surprising interactions between features?

It is interesting to note that debt-to-income ratio has minimal effect on the borrower rate when the Prosper’s rating is held constant. The previous chart, which shows debt-to-income ratio dropping with increasing salary, suggests that the ratio is largely dependent on monthly income.
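One way to look at a relationship while holding the rating constant is to compute the correlation separately within each rating group. The sketch below does this on simulated data in which the rate depends only on the rating; the data frame, its column names and the base rates are all hypothetical.

```r
set.seed(42)

# Simulated loans: the rate is set by the rating plus a little noise,
# and the debt-to-income ratio is drawn independently of both.
rating    <- rep(c("AA", "C", "HR"), each = 50)
base_rate <- c(AA = 0.08, C = 0.20, HR = 0.32)
loans <- data.frame(
  rating = rating,
  dti    = runif(150, min = 0.05, max = 0.80),
  rate   = unname(base_rate[rating]) + rnorm(150, sd = 0.01)
)

# Correlation between ratio and rate inside each rating group.
within_rating <- sapply(split(loans, loans$rating),
                        function(g) cor(g$dti, g$rate))

within_rating
```

Applied to the real data, within-group coefficients close to zero would support the reading that the rating, rather than the ratio itself, drives the borrower rate.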


Final Plots and Summary

Plot One

Description One

The boxplots above show that borrowers with a better Prosper’s score tend to enjoy a lower borrower rate. The range of the rate does not fluctuate much for most Prosper’s scores, with the exception of moderate scores between 4 and 7.

However, there are outliers running against the trend for those with the best and worst scores. That could be due to other factors such as the loan amount taken, a new monthly income or a change in employment status.

Plot Two

Description Two

This graph is chosen because it shows that the Prosper’s rating (alpha) is crucial in determining the borrower rate. Holding the Prosper’s rating (alpha) constant, an increase in debt-to-income ratio has an insignificant impact on the borrower rate.

Those with an ‘HR’ rating are more likely to have a debt-to-income ratio larger than 1.0. On the other hand, borrowers rated ‘AA’ tend to have a ratio of less than 0.5 and a lower borrower rate that is usually below 0.1.

Plot Three

Description Three

The bar chart above shows that Prosper has been expanding its lending operations, with the exception of late 2008 and late 2012, which were possibly affected by the Global Financial Crisis and the European Sovereign Debt Crisis.

Over the years, non-performing loans have decreased considerably. That suggests that Prosper’s ability to predict its applicants’ creditworthiness has been improving. Another reason could be that Prosper decided to pursue a more sustainable expansion instead of lending to risky borrowers for higher yield, which might result in bankruptcy when non-performing loans outnumber performing loans. When a crisis hit, Prosper seemed to tighten its lending policy, which is in line with the practice of most banks as well.


Reflection

When I started exploring this dataset, I was overwhelmed by the number of variables available. It was very tedious to study the relationships between all variables. As such, I only chose about 20 variables that I am more familiar with. It would be great if Prosper were able to clarify how some of the ratings, scores or borrower rates were determined. However, I also understand that this information is proprietary and confidential to Prosper. That being said, once I had spent a few days working on the data, I had a better grasp of the dataset. I only included plots that are related to the storytelling and excluded other variables that do not tell much about the characteristics of the borrowers.

The other challenge that I faced was unfamiliarity with R. Since this is my first time coding in R, I took a lot of notes on everything and did a lot of Googling on forums and documentation to find out how to plot certain graphs or customize the charts. I am glad that my effort paid off, as I was able to complete the chapter and this project in less than 2 weeks’ time.

Overall, I was able to come up with a great storyline for this report. The variables don’t seem too intimidating after a while since most are quite self-explanatory. Without specific questions or directions, I was free to venture around and determine the focus of the storyline. That is when I decided to look more into how the borrower rate was determined and how the Prosper’s rating affected other variables.

I was rather surprised that debt-to-income ratio doesn’t seem to play an important role in determining the borrower rate after taking Prosper’s rating into account. However, I can’t totally exclude the importance of debt-to-income ratio without knowing how Prosper’s rating is determined, as debt-to-income ratio might be one of its main components.

To move on from here, it would be great to be able to build an equation or predictive model to simulate real-world scenarios. Prosper could also collect other related information that might make the prediction more accurate, such as the age, education level or city of the applicants or borrowers. Prosper could also help by explaining how the ratings were determined, without revealing too much corporate information, as that would help in building the predictive model.