Analysis of Potential Company Fleet Additions

Matthew Unrue, July 2019

Western Governors University Data Visualization Project

Restructured into this Jupyter Notebook February, 2020

Vehicle Information by Combined Criteria

This report details the analysis of four potential vehicles for a company fleet addition based on six criteria:

  • Safety Features
  • Maintenance Costs
  • Price Point
  • Insurance Costs
  • Fuel Economy
  • Resale Value

Safety features data has been pulled from the NHTSA WebAPI ("NHTSA WebAPIs", 2019) and the remaining information has been scraped from Kelly Blue Book ("Kelley Blue Book: New and Used Car Price Values, Expert Car Reviews", 2019). The company’s given weighting is ten points for safety ratings, seven points for price point, and five points for maintenance costs. My personal weightings for the additional remaining qualifications are ten points for fuel economy, eight points for resale value, and four points for insurance cost.

The weighting was applied by ranking each vehicle in order within each criterion and granting points to each vehicle based on that placement multiplied by the weighting. The lowest ranked vehicle received the base weighting number of points, the second to lowest ranked vehicle received the base weighting times two, the second highest ranking received the base weighting times three, and the highest ranking was granted four times the base rating of points. The final decision was made by summing the total number of weighted points each vehicle received and comparing them to each other.

The process of creating this report began with the compilation of the appropriate data for each criterion. Safety features pulled from the National Highway Traffic Safety Administration contained the overall rating, the overall front crash rating, the front crash driver side rating, the front crash passenger side rating, the overall side crash rating, the side crash driver side rating, the side crash passenger side rating, the rollover rating, and the rollover possibility. All of these were ratings from one to five, with five being the most favorable, except for the rollover possibility, which was given in a percentage. The safety rating values used for analysis were created by averaging each vehicle’s various test ratings, weighting each test equally. The rollover possibilities were all very similar, with a range of less than four percent and were not factored in to the overall safety features.

Finally, the only anomaly in the process of data gathering occurred here, with the 2019 Honda CR-V not having available safety ratings outside of the rollover rating and rollover possibility ("2019 Honda CR-V SE New Car Prices: Kelley Blue Book", 2019). Given this missing information, the CR-V was given a zero in each missing field and ranked the worst in the safety features category as a precaution. The remaining information for each criterion was scraped, or in the case of resale value, calculated without issue in a dollar amount or the number of miles per gallon a vehicle was rated to achieve.


February, 2020 Note:

When this report was initially created, the above information was true. Now, the previously missing Honda CR-V is available and is accordingly scraped by the below code. The report is not rewritten to accommodate this.


The Safety Features Criteria from the NHTSA:

In [1]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
In [2]:
# Create list of RESTful API URLs to call.
safety_features_urls = [
    "https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/13255?format=xml",
    "https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/14026?format=xml",
    "https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/13087?format=xml",
    "https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/14081?format=xml"
]

# Create list of URLs to scrape.
urls = [
    "https://www.kbb.com/ford/escape/2019/se/?vehicleid=439775&intent=buy-new",
    "https://www.kbb.com/honda/cr-v/2019/ex/?vehicleid=439695&intent=buy-new",
    "https://www.kbb.com/hyundai/santa-fe/2019/24-se/?vehicleid=436474&intent=buy-new",
    "https://www.kbb.com/toyota/rav4/2019/xle/?vehicleid=440125&intent=buy-new"
]
In [3]:
# Create each line of data to append scraped information to
first_line = ""
second_line = ""
third_line = ""
fourth_line = ""

# Create list of csv lines.
lines = [
    first_line,
    second_line,
    third_line,
    fourth_line
]
In [4]:
# Create list of values.
safety_values = [
    "ModelYear",
    "Make",
    "Model",
    "OverallRating",
    "OverallFrontCrashRating",
    "FrontCrashDriversideRating",
    "FrontCrashPassengersideRating",
    "OverallSideCrashRating",
    "SideCrashDriversideRating",
    "SideCrashPassengersideRating",
    "RolloverRating",
    "RolloverPossibility",
    "SidePoleCrashRating"
]
In [5]:
# Create the header line.
header_line = ""

for value in safety_values:
    header_line = header_line + value + ","
    
header_line = header_line + "MaintenanceCosts,MSRP,MSRPFrameURL"


# Open a .csv file to write the information to.
v = open("Criteria 1 Data.csv", 'w', encoding = 'UTF')
v.write(header_line + "\n")
Out[5]:
284
In [6]:
# Scrape the safety ratings data from each RESTful API call.
for i, url in enumerate(safety_features_urls):
    # Request the current URL.
    page = requests.get(url)
    
    # Parse the page with BeautifulSoup
    soup = BeautifulSoup(page.content, 'xml')
    
    # Append values to the appropriate line.
    for value in safety_values:
        lines[i] = lines[i] + soup.find(value).text + ","
In [7]:
# Scrape the remaining values from the final DOM on KBB.
options = Options()
options.headless = True
driver = webdriver.Firefox(options = options)
In [8]:
# Add the 5 year projected maintenance costs and MSRP values to each line.
for i, url in enumerate(urls):
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    frame_url_base = "https://www.kbb.com"
    
    maintenance_cost = soup.find_all("div", class_="col-base-6")[9].text[1:]
    maintenance_cost = str(maintenance_cost).replace(",", "")
    
    
    # This site displays some of the information that needs to be scraped by loading a webpage
    # created by an internal API call into an iFrame. Some manipulation of the API call data and URLs
    # has to occur to replicate these calls and then scrape the information programatically just from the base URLs.
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Get the API call url.
    soup.select("Object#PriceAdvisorFrame")
    
    # Manipulate the call URL into a valid call.
    frame_url_base = "https://www.kbb.com"
    frame_url_data = str(soup.select("Object#PriceAdvisorFrame")).split()[1][6:-1]
    frame_url_data_manipulated = frame_url_data.split("&")
    url_end = "16"
    url_end = url_end + frame_url_data_manipulated[0][12:]
    
    for section in frame_url_data_manipulated[1:]:
        url_end = url_end + "&" + section[4:]
        
    # Combine all of the information into a valid URL that can be scraped.
    frame_url = frame_url_base + frame_url_data + url_end
    
    
    # Open a new webdriver concurrently to the original one to load and scrape the new page.
    driver2 = webdriver.Firefox(options = options)
    driver2.get(frame_url)
    soup2 = BeautifulSoup(driver2.page_source, 'html.parser')
    inner_text = [x.text for x in soup2.find_all("text")]
    msrp_value = inner_text[6]
    msrp_value = msrp_value[1:].replace(",", "")
    
    
    lines[i] = lines[i] + maintenance_cost + "," + msrp_value + "," + frame_url
In [9]:
# Add the line breaks and print the lines to review before writing to file.
for line in lines:
    line = line + "\n"
    print(line)
    v.write(line)
2019,FORD,ESCAPE,5,5,5,4,5,5,5,4,0.1910,5,2219,SRP,https://www.kbb.com/Api/3.9.539.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=439775&hideMonthlyPayment=False16.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=439775&hideMonthlyPayment=False

2019,HONDA,CR-V,5,5,5,4,5,5,5,4,0.1640,5,2182,SRP,https://www.kbb.com/Api/3.9.539.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=439695&hideMonthlyPayment=False16.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=439695&hideMonthlyPayment=False

2019,HYUNDAI,SANTA FE,5,4,4,5,5,5,5,4,0.1550,5,1960,SRP,https://www.kbb.com/Api/3.9.539.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=436474&hideMonthlyPayment=False16.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=436474&hideMonthlyPayment=False

2019,TOYOTA,RAV4,5,4,4,5,5,5,5,4,0.1590,5,2271,SRP,https://www.kbb.com/Api/3.9.539.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=440125&hideMonthlyPayment=False16.0/73115/vehicle/upa/PriceAdvisor/meter.svg?action=Get&intent=buy-new&pricetype=FPP&zipcode=27701&vehicleid=440125&hideMonthlyPayment=False

In [10]:
# Close the browsers that were opened to get the fully loaded DOMs.
driver.quit()
driver2.quit()

# Close the .csv file.
v.close()

The Remaining Criterion from Kelley Blue Book:


February, 2020 Note:

The report is not rewritten to accommodate the changes in price values since the original reprt was created.


In [11]:
# Create list of URLs to scrape.
urls = [
    "https://www.kbb.com/ford/escape/2019/se/?vehicleid=439775&intent=buy-new",
    "https://www.kbb.com/honda/cr-v/2019/ex/?vehicleid=439695&intent=buy-new",
    "https://www.kbb.com/hyundai/santa-fe/2019/24-se/?vehicleid=436474&intent=buy-new",
    "https://www.kbb.com/toyota/rav4/2019/xle/?vehicleid=440125&intent=buy-new"
]
In [12]:
# Create each line of data to append scraped information to
first_line = "2019,Ford,Escape,"
second_line = "2019,Honda,CR-V,"
third_line = "2019,Hyundai,Santa Fe,"
fourth_line = "2019,Toyota,RAV4,"

# Create list of csv lines.
lines = [
    first_line,
    second_line,
    third_line,
    fourth_line
]
In [13]:
# Create the header line.
header_line = "ModelYear,Make,Model,InsuranceCost,CityMPG,HwyMPG,CombMPG,DepreciationAmount"


# Open a .csv file to write the information to.
v = open("Criteria 2 Data.csv", 'w', encoding = 'UTF')
v.write(header_line + "\n")
Out[13]:
77
In [14]:
# Scrape the data values from the final DOM on KBB.
options = Options()
options.headless = True
driver = webdriver.Firefox(options = options)
In [15]:
# Add the 5 year projected insurance costs, fuel economy values, and depreciation values to each line.
for i, url in enumerate(urls):
    driver.get(url)
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    
    # Insurance costs.
    insurance_cost = soup.find_all("div", class_="col-base-6")[3].text[1:]
    insurance_cost = str(insurance_cost).replace(",", "")
    
    # Fuel Economy values.
    mpgs_div = soup.find("div", {"class": "hub-specs-item col-base-4"}).find("div", {"class": "paragraph-three"}).text
    mpgs = mpgs_div.split("/")
    mpgs[0] = mpgs[0][-2:]
    mpgs[1] = mpgs[1][-2:]
    mpgs[2] = mpgs[2][5:7]
    
    fuel_economy_values = mpgs[0] + ',' + mpgs[1] + ',' + mpgs[2]
    
    # Depreciation amount.
    depreciation = soup.find("div", {"class": "cto-box loss-of-value"}).find("div", {"class": "title-two"}).text[1:]
    depreciation = depreciation.replace(",", "")
    
    
    # Add these values to their respective lines.
    lines[i] = lines[i] + insurance_cost + "," + fuel_economy_values + "," + depreciation + "\n"
    print(lines[i])
2019,Ford,Escape,2565,23,30,26,18966

2019,Honda,CR-V,2635,28,34,30,14919

2019,Hyundai,Santa Fe,2635,22,29,25,16539

2019,Toyota,RAV4,2895,26,35,30,15545

In [16]:
# Add the line breaks and print the lines to review before writing to file.
for line in lines:
    v.write(line)
In [17]:
# Close the browsers that were opened to get the fully loaded DOMs.
driver.quit()

# Close the .csv file.
v.close()

With safety features being among the most heavily weighted, the analysis begins there. All three vehicles that had safety ratings exclusively received fours and fives in each test, putting them among the safest vehicles on the market and not putting any of them particularly in the lead. The Ford Escape did receive a single five rating more than the Hyundai Santa Fe and the Toyota RAV4, which tied, and was thus given the highest number of points towards the final decision. The ten different safety ratings are displayed on a single chart to allow for each rating to be easily compared across each vehicle.

The price point (MSRP) was examined next, alongside the projected maintenance cost, since the maintenance costs can be seen as an additional cost on top of the initial investment. A higher MSRP seemed to correspond with a higher projected maintenance in these four vehicles except for one instance. The Honda CR-V has a higher MSRP than the Ford Escape but has a slightly lower project maintenance cost. The Hyundai Santa Fe is the best vehicle given only these three criteria, followed by the Ford Escape, while the Toyota RAV4 and the Honda CR-V are very close to each other.

The second set of criteria tells a very different story than the first set. Fuel economy is a very common concern when purchasing a new vehicle and it is equally applicable in this scenario. City MPG, Highway MPG, and Combined MPG were all available, but the combined rating was the one points were awarded on for this analysis. The Honda CR-V and the Toyota RAV4 were both rated the highest, at thirty miles per gallon. The other two vehicles were considerably lower, at twenty-six and twenty-five miles per gallon. Like the MSRP and projected maintenance costs, the insurance costs were also viewed as an additional cost to the vehicle. The Toyota RAV4 had a significantly higher insurance cost than the others, which were all very comparable regardless of MSRP difference.

Combined Criteria Weighted Decision Matrix.xlsx

Combined%20Criteria%20Weighted%20Decision%20Matrix.png

Vehicle Information by Company Criteria

Vehicle Information by the Analyst's Criteria

Vehicle Information by Combined Criteria

With all of these criteria examined and points awarded to each vehicle for their rankings in them, we have an overall placement for the potential purchases. Despite being the lowest ranked vehicle among the first set of criteria, the 2019 Honda CR-V is found to be the best purchase with these weighted criteria with 119 overall points. This is in spite of having an automatic worst placement in the heavily weighted safety features criterion as a result of missing data. It would very likely have a significantly higher score if these ratings were present, which would make it the obvious best decision by a longshot. If these missing safety ratings are a great cause for concern however, the 2019 Hyundai Santa Fe is still a somewhat close contestant at 116 points. The 2019 Ford Escape and the 2019 Toyota RAV4 are also very good vehicles but have lower scores at 115 and 110 points respectively. After all of these factors, the recommended vehicle choice is the 2019 Honda CR-V.

References

2019 Ford Escape SE New Car Prices: Kelley Blue Book. (2019). Retrieved July 23, 2019, from https://www.kbb.com/ford/escape/2019/se/?vehicleid=439775&intent=buy-new

2019 Honda CR-V SE New Car Prices: Kelley Blue Book. (2019). Retrieved July 23, 2019, from https://www.kbb.com/honda/cr-v/2019/ex/?vehicleid=439695&intent=buy-new

2019 Hyundai Santa Fe New Car Prices: Kelley Blue Book. (2019). Retrieved July 23, 2019, from https://www.kbb.com/hyundai/santa-fe/2019/24-se/?vehicleid=436474&intent=buy-new

2019 Toyota RAV4 XLE New Car Prices: Kelley Blue Book. (2019). Retrieved July 23, 2019, from https://www.kbb.com/toyota/rav4/2019/xle/?vehicleid=440125&intent=buy-new

Kelley Blue Book: New and Used Car Price Values, Expert Car Reviews. (n.d.). Retrieved July 23, 2019, from https://www.kbb.com/

NHTSA WebAPIs. (n.d.). Retrieved July 23, 2019, from https://webapi.nhtsa.gov/

VehicleId 13087. (2019). Retrieved July 23, 2019, from https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/13087?format=xml

VehicleId 13255. (2019). Retrieved July 23, 2019, from https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/13255?format=xml

VehicleId 14026. (2019). Retrieved July 23, 2019, from https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/14026?format=xml

VehicleId 14081. (2019). Retrieved July 23, 2019, from https://webapi.nhtsa.gov/api/SafetyRatings/VehicleId/14081?format=xml