There are three vital questions I want to answer in this presentation:
To answer these questions, I built a stacked bar chart that highlights the relationship between home ownership and the loan amount a borrower gets, then a regplot and a heat map to show the relationship between recommendations and loan amount, and finally a multivariate bar plot that shows the relationship between employment status, home ownership, and Prosper ratings.
As you can see, the Prosper loan system does not emphasize home ownership in determining access to loans. It is also clear that recommendations did not play a significant role in determining the loan amount given to a borrower, since a large number of borrowers with no recommendations also received large loans. Lastly, borrowers who have some source of income or employment have better Prosper ratings.
So my conclusion is that while it is not significant for borrowers to own a home or get recommendations before getting loans, it is important for them to be employed or have some source of income, because this tends to raise their Prosper ratings.
This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. Most of the loans originated between December 2014 and November 2015. The variables are defined in the accompanying data dictionary.
Thirty variables were selected for this analysis, with 76,224 rows each. Primarily, the features of interest that we explored are the IsBorrowerHomeowner, EmploymentStatus, ProsperRatingNumeric, PercentFunded, LoanOriginalAmount, and Recommendations columns.
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")
# load in the dataset into a pandas dataframe
fields=['Term','BorrowerAPR','BorrowerRate','ProsperRating (numeric)','LenderYield','ProsperRating (Alpha)','ProsperScore','Occupation',
'EmploymentStatus','IsBorrowerHomeowner','CurrentlyInGroup','CurrentCreditLines',
'TotalCreditLinespast7years','TotalInquiries','AmountDelinquent','DelinquenciesLast7Years',
'AvailableBankcardCredit','DebtToIncomeRatio','LoanStatus','IncomeRange','LoanNumber','LoanOriginalAmount',
'LoanOriginationDate','LoanOriginationQuarter','MonthlyLoanPayment','PercentFunded','Recommendations',
'InvestmentFromFriendsCount','InvestmentFromFriendsAmount','Investors']
df = pd.read_csv('ProsperLoanData.csv', skipinitialspace=True, usecols=fields)
# Drop rows with missing values in any of the selected columns
df.dropna(inplace=True)
# Rename some columns
df.rename(columns = {'ProsperRating (numeric)':'ProsperRatingNumeric','ProsperRating (Alpha)':'ProsperRatingAlpha'}, inplace = True)
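Dropping every row with a missing value is presumably what reduces the data from the full set of loans to the rows used here. Before doing that, it can help to see which columns account for the loss. The following is only a minimal sketch: `missing_report` is a hypothetical helper name, and the `toy` frame holds made-up values rather than rows from the Prosper data.

```python
import numpy as np
import pandas as pd

def missing_report(frame):
    """Count missing values per column and the rows left after dropna()."""
    per_column = frame.isna().sum().sort_values(ascending=False)
    rows_kept = len(frame.dropna())
    return per_column, rows_kept

# Toy rows with made-up values, only to show the call (not Prosper data)
toy = pd.DataFrame({
    "EmploymentStatus": ["Employed", None, "Retired"],
    "ProsperRatingNumeric": [4.0, 5.0, np.nan],
    "LoanOriginalAmount": [5000, 7000, 9000],
})

per_column, rows_kept = missing_report(toy)
print(per_column)
print(f"{rows_kept} of {len(toy)} rows have no missing values")
```

In the notebook itself, the same helper could be called on the frame returned by `pd.read_csv` before `dropna` is applied.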
During the investigation, we found that borrowers can access any loan amount regardless of whether they own a home.
sb.catplot(data=df, x="IsBorrowerHomeowner", y="LoanOriginalAmount");
plt.xlabel('Home ownership status')
plt.ylabel('Loan Amount');
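The plot gives a visual impression that both groups span the same range of loan amounts. A numeric summary per group is one way to back that up. This is a sketch under assumptions: `loan_amount_by_group` is a hypothetical helper, and the `toy` values are invented for illustration, not taken from the dataset.

```python
import pandas as pd

def loan_amount_by_group(frame, group_col="IsBorrowerHomeowner",
                         amount_col="LoanOriginalAmount"):
    """Return count, median, min and max loan amount for each group."""
    return frame.groupby(group_col)[amount_col].agg(["count", "median", "min", "max"])

# Toy rows with made-up values, only to show the call (not Prosper data)
toy = pd.DataFrame({
    "IsBorrowerHomeowner": [True, True, False, False],
    "LoanOriginalAmount": [4000, 25000, 3000, 25000],
})

summary = loan_amount_by_group(toy)
print(summary)
```

If the minimum and maximum are similar for homeowners and non-homeowners in the real data, that would support the claim that both groups can access the full range of loan amounts, while the medians show whether typical amounts still differ.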
Recommendations did not play a significant role in determining the loan amount given to a borrower: as we can see, a large number of borrowers with no recommendations also received large loans.
plt.figure(figsize = [18, 6])
# PLOT ON LEFT
plt.subplot(1, 2, 1)
sb.regplot(data = df, x = 'Recommendations', y = 'LoanOriginalAmount');
plt.xlabel('Number of recommendations')
plt.ylabel('Loan Amount')
# PLOT ON RIGHT
plt.subplot(1, 2, 2)
plt.hist2d(data = df, x = 'Recommendations', y = 'LoanOriginalAmount')
plt.colorbar()
plt.xlabel('Number of recommendations')
plt.ylabel('Loan Amount');
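A correlation coefficient is one simple way to put a number on the weak relationship the two plots suggest. The sketch below uses the Pearson correlation from pandas; `recommendation_correlation` is a hypothetical helper name and the `toy` values are made up, so the printed number says nothing about the real data.

```python
import pandas as pd

def recommendation_correlation(frame, x="Recommendations", y="LoanOriginalAmount"):
    """Pearson correlation between two numeric columns of the frame."""
    return frame[x].corr(frame[y])

# Toy rows with made-up values, only to show the call (not Prosper data)
toy = pd.DataFrame({
    "Recommendations": [0, 0, 0, 1, 2],
    "LoanOriginalAmount": [5000, 20000, 10000, 8000, 12000],
})

r = recommendation_correlation(toy)
print(f"Pearson r = {r:.3f}")
```

A value close to zero on the actual data would be consistent with the claim that recommendations have little bearing on loan amount. Because most borrowers have zero recommendations, the coefficient should be read alongside the plots rather than on its own.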
Generally, borrowers who have some source of income or employment and are also homeowners have better Prosper ratings; hence we can say that a borrower's employment status affects their rating.
ax = sb.barplot(data = df, x = 'EmploymentStatus', y = 'ProsperRatingNumeric', hue = 'IsBorrowerHomeowner',ci = 'sd')
ax.legend(loc = 0, ncol = 2, framealpha = 0, title = 'Homeowner')
plt.xticks(rotation = 30)
plt.xlabel('Employment Status')
plt.ylabel('Prosper Numeric Rating');
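The bar heights in this plot are group means, so the same comparison can be read from a small table. The following is a minimal sketch, assuming the renamed `ProsperRatingNumeric` column; `mean_rating_table` is a hypothetical helper and the `toy` ratings are invented for illustration.

```python
import pandas as pd

def mean_rating_table(frame):
    """Mean numeric Prosper rating by employment status (rows) and homeownership (columns)."""
    return (
        frame.groupby(["EmploymentStatus", "IsBorrowerHomeowner"])["ProsperRatingNumeric"]
        .mean()
        .unstack("IsBorrowerHomeowner")
    )

# Toy rows with made-up values, only to show the call (not Prosper data)
toy = pd.DataFrame({
    "EmploymentStatus": ["Employed", "Employed", "Not employed", "Not employed"],
    "IsBorrowerHomeowner": [True, False, True, False],
    "ProsperRatingNumeric": [5, 4, 3, 2],
})

table = mean_rating_table(toy)
print(table)
```

Each row of the table corresponds to one pair of bars in the plot, which makes it easier to quote exact differences between employment groups.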