I started by importing my packages.
import pandas as pd
import numpy as np
import geoviews as gv
import geoviews.tile_sources as gvts
from geoviews import dim, opts
from bokeh.models import HoverTool
gv.extension('bokeh')
pd.set_option('mode.chained_assignment', None)
From there, I loaded my data and cleaned it up by dropping the columns I didn't need. Because I needed the price column, I replaced every zero price with the column mean.
a = pd.read_csv('airbnb_listings.csv')
airbnb = a.drop(['neighbourhood_group', 'reviews_per_month', 'last_review', 'host_id', 'host_name', 'calculated_host_listings_count'], axis=1)
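# Assign the result back; calling replace(..., inplace=True) on a selected column may not update the DataFrame.
airbnb['price'] = airbnb['price'].replace(0, airbnb['price'].mean())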
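One thing to keep in mind with this replacement: a mean taken over the raw column still counts the zero-priced listings, which pulls the fill value down. The sketch below uses made-up prices (the `toy` frame and its values are hypothetical, not from the dataset) to compare that with a mean taken over non-zero prices only.

```python
import pandas as pd

# Hypothetical toy prices, not taken from the Airbnb dataset.
toy = pd.DataFrame({'price': [0.0, 100.0, 200.0, 0.0, 300.0]})

# Mean over every row, zeros included.
mean_all = toy['price'].mean()                             # 120.0

# Mean over listings that actually have a price.
mean_nonzero = toy.loc[toy['price'] > 0, 'price'].mean()   # 200.0

# Fill the zeros with each candidate value and compare.
filled_all = toy['price'].replace(0, mean_all)
filled_nonzero = toy['price'].replace(0, mean_nonzero)

print(filled_all.tolist())
print(filled_nonzero.tolist())
```

Which fill value is appropriate depends on how many zero-priced rows there are. With only a handful, the two means will be nearly identical.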
In order to visualize the breadth of the data, I plotted all the points on a map of Chicago. The map is slightly interactive and can be explored via the hover and zoom tools.
tooltips = [('Price', '@price'), ('Longitude', '$x'), ('Latitude', '$y')]
hover = HoverTool(tooltips=tooltips)
airbnb_gv_points = gv.Points(airbnb, ['longitude', 'latitude'], 'price')
# Pass the HoverTool instance so the custom tooltips defined above are actually used.
(gvts.CartoDark * airbnb_gv_points).opts(
    opts.Points(width=900, height=900, alpha=0.5,
                xaxis=None, yaxis=None,
                tools=[hover], hover_fill_alpha=1))
In this next step, I wanted to compare some of the neighborhoods that we visited to some on the north side of the city. I made two subsets, each containing some neighborhoods in their respective areas that had a large number of listings. I wanted to exclude any listings in Downtown Chicago due to possible confounding factors that could impact price.
southside_areas = ['Grand Boulevard', 'Hyde Park', 'Woodlawn', 'Bridgeport', 'South Shore']
northside_areas = ['Lincoln Park', 'Near North Side', 'Logan Square', 'West Town']
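# .copy() gives each subset its own data, so adding a column does not touch a view of airbnb.
southside = airbnb.query('neighbourhood in @southside_areas').copy()
southside['color'] = '#30a2da'
northside = airbnb.query('neighbourhood in @northside_areas').copy()
northside['color'] = '#fc4f30'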
# The two subsets share no rows, so stacking them is enough.
areas = pd.concat([southside, northside], ignore_index=True)
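The subset-and-combine steps above can be tried out on a tiny hypothetical frame. This is only a sketch: the `toy` listings and the shortened neighbourhood lists are invented for illustration, and it uses `isin` with `pd.concat` as one way to express the same idea.

```python
import pandas as pd

# Hypothetical toy listings; the real data has many more columns.
toy = pd.DataFrame({
    'neighbourhood': ['Hyde Park', 'Loop', 'Lincoln Park', 'Woodlawn'],
    'price': [80, 250, 150, 60],
})

south = ['Hyde Park', 'Woodlawn']
north = ['Lincoln Park']

# .copy() gives each subset its own data, so adding a column is safe.
south_df = toy[toy['neighbourhood'].isin(south)].copy()
south_df['color'] = '#30a2da'
north_df = toy[toy['neighbourhood'].isin(north)].copy()
north_df['color'] = '#fc4f30'

# Stack the two subsets; the 'Loop' row is in neither list and drops out.
combined = pd.concat([south_df, north_df], ignore_index=True)
print(combined)
```

Any listing whose neighbourhood appears in neither list is left out of `combined`. The real subsets exclude downtown listings in the same way.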
I made another plot delineating the two areas by color. I also sized the dots by the natural log of the price, so that order-of-magnitude differences in price show up as visible differences in dot size without the most expensive listings swamping the map.
subplot = gv.Points(areas, ['longitude', 'latitude'], ['price', 'color'])
(gvts.CartoDark * subplot).opts(
    opts.Points(width=900, height=900, alpha=0.5,
                xaxis=None, yaxis=None,
                color=dim('color'), size=np.log(dim('price')),
                tools=['hover'], hover_fill_alpha=1))
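To get a feel for what the natural log does to the marker sizes, here is a small sketch on hypothetical prices (the numbers are invented, not drawn from the dataset). It compares the spread of the raw prices with the spread of the resulting sizes.

```python
import math

# Hypothetical nightly prices spanning two orders of magnitude.
prices = [20, 100, 500, 2000]

# Marker size as used in the plot: natural log of the price.
sizes = [math.log(p) for p in prices]

# Ratio between the largest and smallest value, before and after the log.
price_ratio = max(prices) / min(prices)
size_ratio = max(sizes) / min(sizes)

print(f"price spread: {price_ratio:.0f}x, size spread: {size_ratio:.2f}x")
```

In this toy case a 100x spread in price turns into roughly a 2.5x spread in marker size. The ordering is preserved, but the most expensive listing no longer dwarfs the rest.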
Voila! There are clear differences between the density and price of Airbnbs in northern areas versus southern ones. The blue dots (south side) are much more sparsely distributed, with some mild concentration near the lakefront. The red dots (north side) are larger on average and more concentrated. Feel free to explore the data!
All my data came from this dataset: https://www.kaggle.com/jinbonnie/chicago-airbnb-open-data