import pandas as pd
import numpy as np
import plotly
import plotly.express as px
airbnb = pd.read_csv('airbnb.csv') # For Seattle only
There are quite a few ways to show geographic data, most commonly choropleth charts or scatter plots.
Our friend Plotly has them all: https://plotly.com/python/maps/
A quick note about how this works before letting you leaf through the docs page.
Most of the parameters in px.scatter_mapbox() behave much like those in px.scatter, except that we provide latitude and longitude data instead of x and y. Luckily, our dataset already includes both, but oftentimes we'll have to find a lookup table online to convert city names, for example, to lat / lon coordinates.
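When a dataset only has place names, the lookup-table step mentioned above usually boils down to a join. Here is a minimal sketch, assuming a hypothetical listings table and a hypothetical city_coords lookup (neither comes from airbnb.csv, and the coordinates are rough city centers):

```python
import pandas as pd

# Hypothetical listings that only carry a city name (not from airbnb.csv).
listings = pd.DataFrame({
    "city": ["Seattle", "Tacoma", "Bellevue", "Seattle"],
    "price": [120, 85, 150, 95],
})

# Hypothetical lookup table; coordinates are approximate city centers.
city_coords = pd.DataFrame({
    "city": ["Seattle", "Tacoma", "Bellevue"],
    "latitude": [47.61, 47.25, 47.61],
    "longitude": [-122.33, -122.44, -122.20],
})

# A left join keeps every listing, even if its city is missing from the lookup.
# validate="many_to_one" raises if the lookup has duplicate city rows.
located = listings.merge(city_coords, on="city", how="left", validate="many_to_one")
print(located)
```

The resulting latitude and longitude columns can then be passed to lat= and lon= just like the ones that ship with our dataset.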
We don't necessarily have to provide a value to size=, but it usually helps highlight points of interest.
zoom=, on the other hand, just changes how zoomed in the initial view is when the figure first loads.
Finally, we'll have to update the mapbox_style= parameter of the figure to a specific base map to load.
For more information on what options are available here, check out https://plotly.com/python/mapbox-layers/
fig = px.scatter_mapbox(airbnb, lat='latitude', lon='longitude',
color='neighbourhood_group', size='price', opacity=.6,
hover_name='name',hover_data=['neighbourhood'],
color_discrete_sequence=plotly.colors.qualitative.Prism_r,
zoom=10, labels={'neighbourhood_group':'Seattle Neighborhood'}
)
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show('notebook')
Though the Airbnb dataset is great, it doesn't quite show a dynamic picture of how prices have changed over time in Seattle.
Zillow provides some fantastic data for this with their Zillow Rental Index (ZRI).
The dataset below was pulled from https://www.zillow.com/research/data/ and subset to Washington State.
zillow = pd.read_csv('zillow.csv')
zillow.head() # full state of WA, by zip
print(zillow.shape)
print(zillow.columns)
zillow.head(3)
A quick bit of cleaning is necessary here.
Right now, each row in zillow represents a unique region, and the ZRI value for each date is given in its own date column (113 months => 113 columns).
For visualization, however, we'd like each unique time point to be its own row.
To accomplish this "pivot" of sorts, we'll use pd.melt(). Google the documentation, and see if you can get something that looks like:
RegionID | RegionName | City | State | Metro | CountyName | SizeRank | Date | ZRI
ZRI = pd.melt(zillow[zillow.Metro.isin(['Seattle-Tacoma-Bellevue'])],
id_vars=['RegionID','RegionName','City','State','Metro','CountyName','SizeRank'],
var_name='Date',
value_name='ZRI')
# Combine both conditions in one boolean mask instead of chaining [] and .loc
ZRI_Seattle_2020 = ZRI[(ZRI.City == 'Seattle') & (ZRI.Date == '2020-01')]
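To see what pd.melt() does on a smaller scale, here is a toy version of the same wide-to-long reshape. The table and its values are made up for illustration; only the column naming mirrors the Zillow data.

```python
import pandas as pd

# Toy wide table shaped like the Zillow data: one row per region,
# one column per month. The numbers are invented.
wide = pd.DataFrame({
    "RegionName": ["A", "B"],
    "2019-12": [1500, 2100],
    "2020-01": [1520, 2080],
})

# id_vars stay as columns; every other column becomes a (Date, ZRI) row.
long = pd.melt(wide, id_vars=["RegionName"], var_name="Date", value_name="ZRI")
print(long)
```

Two regions times two months gives four rows, each holding a single region, date, and ZRI value.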
Choropleth charts are similar to geographic scatter plots, in that they can overlay some feature of interest, say price, over geographic maps.
However, sometimes we'd want to define explicit areas - city borders, county lines, etc - in order to show some value across an entire region, instead of just a single point given by a lat/lon coordinate.
To do this, we'll need another dataset that defines those boundaries. This information is stored in a .geojson file, a JSON-based format for geographic shapes; we'll explore JSON further in week 9.
import json
with open('seattle.geojson') as boundaries: # We're reading in a special format, called a .geojson file
    neighborhoods = json.load(boundaries)
Think of a JSON object as a set of nested dictionaries. Each element contains information about the lat/lon coordinates that define a region, as well as some names / identifiers for that region.
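To make the nested-dictionary idea concrete, here is a small hand-written GeoJSON document and how we'd walk through it. The "Example Hood" feature and its coordinates are hypothetical and are not taken from seattle.geojson; note that GeoJSON stores each point as [lon, lat].

```python
import json

# A hand-written, hypothetical FeatureCollection with one square "neighborhood".
raw = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example Hood"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.34, 47.60], [-122.32, 47.60],
                         [-122.32, 47.62], [-122.34, 47.62],
                         [-122.34, 47.60]]]
      }
    }
  ]
}
"""

toy = json.loads(raw)                              # dict at the top level
feature = toy["features"][0]                       # list of feature dicts
print(feature["properties"]["name"])               # identifiers live in "properties"
ring = feature["geometry"]["coordinates"][0]       # outer ring of the polygon
print(len(ring), ring[0])                          # points are [lon, lat]
```

The ring's first and last points are identical, which is how a polygon boundary is closed.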
neighborhoods["features"][22]["properties"]
ZRI_Seattle_2020[ZRI_Seattle_2020['RegionName']=='Brighton']
For the choropleth itself, we'll use the px.choropleth_mapbox function, with parameters similar to before.
fig = px.choropleth_mapbox(ZRI_Seattle_2020, geojson=neighborhoods,
locations='RegionName', featureidkey = 'properties.name',
color='ZRI', opacity=.6,
center = {'lat':47.625,'lon':-122.333}, zoom = 10,
hover_name='RegionName',hover_data=['ZRI','SizeRank','Date'],
color_continuous_scale=plotly.colors.sequential.BuPu[2:],
labels={'ZRI':'Zillow Rent Index'}
)
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show('notebook')
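The locations= and featureidkey= pair only works when the names in the dataframe match the names in the GeoJSON exactly; rows without a matching feature typically just don't get drawn. A quick sanity check could look like the sketch below, which uses a toy GeoJSON dictionary and a toy dataframe (both hypothetical) rather than the real files.

```python
import pandas as pd

# Hypothetical stand-in for the loaded GeoJSON; geometry is omitted for brevity.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Belltown"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Capitol Hill"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Fremont"}, "geometry": None},
    ],
}

# Hypothetical stand-in for the rent data; the ZRI values are invented.
toy_df = pd.DataFrame({
    "RegionName": ["Belltown", "Capitol Hill", "Downtown"],
    "ZRI": [2400, 2200, 2600],
})

# Names reachable through featureidkey='properties.name'
geo_names = {f["properties"]["name"] for f in toy_geojson["features"]}

# Regions in the data that have no boundary to draw
unmatched = sorted(set(toy_df["RegionName"]) - geo_names)
print(unmatched)
```

Running the same set difference against neighborhoods and ZRI_Seattle_2020 would show which regions, if any, are silently left off the map.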
Clearly, rental values closer to downtown and the waterfront are much higher than those in the suburbs.
In breakout rooms with your PC, see if you can:
Hint: Google & Plotly Documentation are your best friends here
date_idx = pd.date_range(start='2011-01-01', end='2020-01-01', freq='MS')
def fill_timeseries(df):
    # Index by date, then reindex onto the full monthly range so gaps become NaN rows
    df2 = df.set_index(pd.to_datetime(df['Date'])).drop(columns='Date')
    df3 = df2.reindex(date_idx, fill_value=np.nan)
    return df3.reset_index()
ZRI_f = ZRI.groupby(by=['RegionID']).apply(fill_timeseries).reset_index(drop=True).rename(columns={'index':'Date'})
ZRI_f['missing'] = ZRI_f.ZRI.isna().astype(int)  # x == np.nan is always False, so test with isna()
ZRI_f['ZRI'] = ZRI_f.ZRI.interpolate(method='linear')
ZRI_f
ZRI_f[ZRI_f.missing==1]
ZRI_f[ZRI_f.RegionName == 'Capitol Hill']
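The fill-and-interpolate steps are easier to follow on a tiny series. The sketch below uses made-up values with one missing month; it also shows why missing values should be flagged with isna() rather than an equality test, since NaN never compares equal to anything, including itself.

```python
import numpy as np
import pandas as pd

# NaN is not equal to itself, so `x == np.nan` can never flag a missing value.
print(np.nan == np.nan)

# Made-up series with February missing.
s = pd.Series([100.0, 120.0],
              index=pd.to_datetime(["2020-01-01", "2020-03-01"]))

# Reindex onto every month start in the range; the gap shows up as NaN.
full_idx = pd.date_range("2020-01-01", "2020-03-01", freq="MS")
filled = s.reindex(full_idx)

# Flag the gap before filling it, then interpolate linearly between neighbors.
missing = filled.isna().astype(int)
interpolated = filled.interpolate(method="linear")
print(pd.DataFrame({"missing": missing, "ZRI": interpolated}))
```

The flag has to be computed before interpolating, because afterwards the filled value is indistinguishable from an observed one.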
#1
hoods = ['Denny Triangle', 'First Hill', 'Capitol Hill', 'Belltown', 'Uptown']
fig = px.line(ZRI_f[ZRI_f.RegionName.isin(hoods)], x='Date', y='ZRI',
color='RegionName',
color_discrete_sequence = plotly.colors.sequential.BuPu[2:]
)
fig.show('notebook')
#2
region_names = ['Downtown', 'Uptown', 'Belltown', 'West Queen Anne', 'First Hill', 'Denny Triangle']
ZRI_downtown = ZRI[ZRI.RegionName.isin(region_names)]
# fig = px.choropleth_mapbox(ZRI_downtown, geojson=neighborhoods,
# locations='RegionName', featureidkey = 'properties.name',
# animation_frame='Date', animation_group='RegionName',
# color='ZRI', opacity=.6,
# center = {'lat':47.625,'lon':-122.333}, zoom = 10,
# hover_name='RegionName',hover_data=['ZRI','SizeRank','Date'],
# color_continuous_scale=plotly.colors.sequential.BuPu[2:],
# labels={'ZRI':'Zillow Rent Index'}
# )
# fig.update_layout(mapbox_style="carto-positron")
# fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# fig.show('notebook')