Project information

  • Category: Data Analysis (Python)
  • Data Source: Kaggle
  • Client: Personal Project
  • Project date: 8 August, 2022
  • Project URL: See Project

The Prosper Loan Data Analysis

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. Most of the loans originated between December 2014 and November 2015.
We selected 30 variables for this analysis, with 76,224 rows each. The main features of interest that we explored are the IsBorrowerHomeowner, EmploymentStatus, ProsperRatingNumeric, PercentFunded, LoanOriginalAmount, and Recommendations columns.
A lot of wrangling was done to make the data usable. Here are some general steps taken to clean the data:

  • Imported the data from an external data source.
  • Created a backup copy of the original data in a separate workbook.
  • Removed irrelevant data.
  • Deduplicated the data.
  • Fixed structural errors.
  • Dealt with missing data.
  • Filtered out data outliers.
  • Validated the data.
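
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code. The function name clean_loans, the choice to drop rows with a missing ProsperRatingNumeric, and the 1.5 × IQR outlier rule on LoanOriginalAmount are all assumptions made for the example. The column names are the ones listed in this write-up, and the raw file may spell them differently.

```python
import pandas as pd

# Feature columns as named in the write-up (the raw file may use other spellings).
FEATURES = [
    "IsBorrowerHomeowner",
    "EmploymentStatus",
    "ProsperRatingNumeric",
    "PercentFunded",
    "LoanOriginalAmount",
    "Recommendations",
]


def clean_loans(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the cleaning steps and return a new frame.

    The input frame is never modified, so it doubles as the backup copy.
    """
    # Deduplicate on the full rows first, before any columns are dropped.
    df = raw.drop_duplicates()

    # Remove irrelevant data: keep only the columns of interest.
    df = df[FEATURES].copy()

    # Fix structural errors: normalise whitespace and casing in category labels.
    df["EmploymentStatus"] = df["EmploymentStatus"].str.strip().str.title()

    # Deal with missing data (assumed policy: drop rows without a rating).
    df = df.dropna(subset=["ProsperRatingNumeric"])

    # Filter outliers (assumed rule: keep loan amounts within 1.5 * IQR).
    q1, q3 = df["LoanOriginalAmount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    df = df[df["LoanOriginalAmount"].between(lower, upper)]

    # Validate the result before handing it to the analysis.
    assert df["ProsperRatingNumeric"].notna().all()
    assert df["LoanOriginalAmount"].between(lower, upper).all()

    return df.reset_index(drop=True)
```

After loading the Kaggle CSV with pd.read_csv, calling clean_loans on the resulting frame would return the cleaned subset while leaving the loaded frame untouched. Different missing-data or outlier rules can be swapped in without changing the overall shape of the pipeline.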