Data Viz w/ Plotly, Part 2

In [1]:
import pandas as pd
import numpy as np

import plotly
import plotly.express as px
In [2]:
airbnb = pd.read_csv('airbnb.csv') # For Seattle only

Geographic Scatter Plots

There are quite a few ways to show geographic data, most commonly choropleth charts or scatter plots.
Our friend Plotly has them all: https://plotly.com/python/maps/

A quick note about how this works before letting you leaf through the docs page.


Most of the params in px.scatter_mapbox() behave pretty similarly to px.scatter, except that we provide latitude and longitude data instead of x and y. Luckily, our dataset already has that included, but oftentimes we'll have to find a lookup table online to convert city names, for example, to lat / lon coordinates.
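
As a rough sketch of the lookup-table idea: if a dataset only had city names, we could join it against a small table of coordinates before plotting. Both DataFrames below are made up for illustration (listings, city_coords, and their columns are not part of the Airbnb data), and the coordinates are approximate city centers.

```python
import pandas as pd

# Hypothetical listings that carry a city name but no coordinates.
listings = pd.DataFrame({
    'city': ['Seattle', 'Tacoma', 'Spokane', 'Seattle'],
    'price': [120, 85, 70, 150],
})

# Hypothetical lookup table; coordinates are approximate city centers.
city_coords = pd.DataFrame({
    'city': ['Seattle', 'Tacoma', 'Spokane'],
    'latitude': [47.61, 47.25, 47.66],
    'longitude': [-122.33, -122.44, -117.43],
})

# A left join keeps every listing, even if its city is missing from the lookup.
located = listings.merge(city_coords, on='city', how='left')
print(located)
```

The resulting latitude and longitude columns could then be passed to lat= and lon= just like the ones that ship with the Airbnb data.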


We don't have to provide a value to size=, but it usually helps highlight points of interest.
zoom=, on the other hand, just sets how zoomed in the map is when it first loads.


Finally, we'll have to update the mapbox_style= parameter of the figure to a specific base map to load.
For more information on what options are available here, check out https://plotly.com/python/mapbox-layers/

In [3]:
fig = px.scatter_mapbox(airbnb, lat='latitude', lon='longitude', 
                        color='neighbourhood_group', size='price', opacity=.6,
                        hover_name='name',hover_data=['neighbourhood'],
                        color_discrete_sequence=plotly.colors.qualitative.Prism_r, 
                        zoom=10, labels={'neighbourhood_group':'Seattle Neighborhood'}
                       )
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

fig.show('notebook')

Zillow Data

Though the Airbnb dataset is great, it doesn't quite show a dynamic picture of how prices have changed over time in Seattle.


Zillow provides some fantastic data for this with their Zillow Rental Index (ZRI).
The dataset below was pulled from https://www.zillow.com/research/data/, and subsetted to Washington State.

In [4]:
zillow = pd.read_csv('zillow.csv')
In [5]:
zillow.head() # full state of WA, one row per region (neighborhood)
Out[5]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 271985 South End Tacoma WA Seattle-Tacoma-Bellevue Pierce County 262 1153.0 1168.0 1188.0 ... 1420.0 1422.0 1427.0 1435.0 1445.0 1454.0 1465.0 1480.0 1490.0 1517.0
1 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1372.0 1401.0 1429.0 ... 2138.0 2146.0 2162.0 2181.0 2199.0 2210.0 2195.0 2183.0 2192.0 2169.0
2 273587 Eastside-ENACT Tacoma WA Seattle-Tacoma-Bellevue Pierce County 356 1218.0 1242.0 1263.0 ... 1451.0 1458.0 1468.0 1478.0 1486.0 1492.0 1496.0 1504.0 1519.0 1547.0
3 344035 Nevada-Lidgerwood Spokane WA Spokane-Spokane Valley Spokane County 419 919.0 939.0 950.0 ... 1040.0 1047.0 1053.0 1057.0 1062.0 1067.0 1087.0 1096.0 1101.0 1100.0
4 272001 University District Seattle WA Seattle-Tacoma-Bellevue King County 449 1313.0 1331.0 1354.0 ... 2051.0 2060.0 2081.0 2106.0 2126.0 2141.0 2087.0 2048.0 2067.0 2056.0

5 rows × 120 columns

In [6]:
print(zillow.shape)
print(zillow.columns)
zillow.head(3)
(260, 120)
Index(['RegionID', 'RegionName', 'City', 'State', 'Metro', 'CountyName',
       'SizeRank', '2010-09', '2010-10', '2010-11',
       ...
       '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09',
       '2019-10', '2019-11', '2019-12', '2020-01'],
      dtype='object', length=120)
Out[6]:
RegionID RegionName City State Metro CountyName SizeRank 2010-09 2010-10 2010-11 ... 2019-04 2019-05 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01
0 271985 South End Tacoma WA Seattle-Tacoma-Bellevue Pierce County 262 1153.0 1168.0 1188.0 ... 1420.0 1422.0 1427.0 1435.0 1445.0 1454.0 1465.0 1480.0 1490.0 1517.0
1 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1372.0 1401.0 1429.0 ... 2138.0 2146.0 2162.0 2181.0 2199.0 2210.0 2195.0 2183.0 2192.0 2169.0
2 273587 Eastside-ENACT Tacoma WA Seattle-Tacoma-Bellevue Pierce County 356 1218.0 1242.0 1263.0 ... 1451.0 1458.0 1468.0 1478.0 1486.0 1492.0 1496.0 1504.0 1519.0 1547.0

3 rows × 120 columns

A quick bit of cleaning is necessary here. Right now, each row in zillow represents a unique region, and the ZRI value for each date is given in its own date column (113 months => 113 columns).


For visualization, however, we'd like each unique time point to be its own row.

To accomplish this "pivot" of sorts, we'll use pd.melt(). Google the documentation, and see if you can get something that looks like:

RegionID | RegionName | City | State | Metro | CountyName | SizeRank | Date | ZRI
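
If pd.melt() is new to you, here is a minimal sketch on a toy table first. The DataFrame and its values are invented for illustration; only the column naming mirrors the Zillow layout.

```python
import pandas as pd

# Toy wide table: one row per region, one column per month (values made up).
wide = pd.DataFrame({
    'RegionName': ['A', 'B'],
    'City': ['Seattle', 'Tacoma'],
    '2019-12': [2000.0, 1500.0],
    '2020-01': [2010.0, 1520.0],
})

# id_vars stay as identifier columns; every other column becomes a (Date, ZRI) pair.
tidy = pd.melt(wide, id_vars=['RegionName', 'City'],
               var_name='Date', value_name='ZRI')
print(tidy)
```

Two regions times two months gives four rows, one per region and date.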

In [7]:
ZRI = pd.melt(zillow[zillow.Metro.isin(['Seattle-Tacoma-Bellevue'])], 
              id_vars=['RegionID','RegionName','City','State','Metro','CountyName','SizeRank'], 
              var_name='Date', 
              value_name='ZRI')
ZRI_Seattle_2020 = ZRI[(ZRI.City == 'Seattle') & (ZRI.Date == '2020-01')]

Choropleth Plots

Choropleth charts are similar to geographic scatter plots, in that they can overlay some feature of interest, say price, over geographic maps.


However, sometimes we'd want to define explicit areas - city borders, county lines, etc - in order to show some value across an entire region, instead of just a single point given by a lat/lon coordinate.


To do this, we'll need another dataset that defines those boundaries. This information is stored in a GeoJSON file, a JSON-based format that we'll explore further in week 9.

In [8]:
import json
with open('seattle.geojson') as boundaries: # We're reading in a special format, called a .geojson file
    neighborhoods = json.load(boundaries)

Think of a json object as a set of nested dictionaries. Each element contains information about the lat/lon coordinates that define a region, as well as some names / identifiers for that region.
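
To make the nesting concrete, here is a small sketch that parses a hand-written, heavily simplified GeoJSON-style string. The feature name, regionid, and coordinates are made up; only the structure mirrors the real file.

```python
import json

# Hand-written stand-in for a GeoJSON feature collection (values are made up).
raw = '''
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Example Hood", "regionid": "0"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.30, 47.60], [-122.29, 47.60],
                         [-122.29, 47.61], [-122.30, 47.60]]]
      }
    }
  ]
}
'''
collection = json.loads(raw)

# Walk down the nesting: collection -> features (list) -> feature (dict).
first = collection['features'][0]
print(first['properties']['name'])

# A Polygon holds a list of rings; each ring is a list of [lon, lat] points.
ring = first['geometry']['coordinates'][0]
print(len(ring), ring[0])
```

Note that GeoJSON stores each point as [longitude, latitude], the reverse of the lat/lon order we usually say out loud.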

In [9]:
neighborhoods["features"][22]["properties"]
Out[9]:
{'city': 'Seattle',
 'name': 'Brighton',
 'regionid': '250146',
 'geo_point_2d': [47.53889531486397, -122.27538443421297],
 'county': 'King',
 'state': 'WA'}
In [10]:
ZRI_Seattle_2020[ZRI_Seattle_2020['RegionName']=='Brighton']
Out[10]:
RegionID RegionName City State Metro CountyName SizeRank Date ZRI
22568 250146 Brighton Seattle WA Seattle-Tacoma-Bellevue King County 1861 2020-01 1834.0
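
The lookup above checks one region by hand. A rough sketch of checking every name at once follows, using small stand-ins for the real objects; toy_geojson, region_names, and the name 'Example Park' are invented for illustration. In the notebook, the same set logic could be applied to neighborhoods['features'] and ZRI_Seattle_2020['RegionName'].

```python
# Stand-ins for the real objects: a tiny GeoJSON-like dict and a list of region names.
# 'Example Park' is a made-up name included to show a mismatch.
toy_geojson = {
    'features': [
        {'properties': {'name': 'Brighton'}},
        {'properties': {'name': 'Capitol Hill'}},
    ]
}
region_names = ['Brighton', 'Capitol Hill', 'Example Park']

geo_names = {feature['properties']['name'] for feature in toy_geojson['features']}
data_names = set(region_names)

unmatched = sorted(data_names - geo_names)  # in the data, but no boundary to draw
unused = sorted(geo_names - data_names)     # boundary exists, but no data row
print(unmatched, unused)
```

Rows whose name has no matching boundary typically won't appear on the map, so a check like this can explain unexpected blank areas.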

For the choropleth itself, we'll use the px.choropleth_mapbox() function, with parameters similar to before.

In [11]:
fig = px.choropleth_mapbox(ZRI_Seattle_2020, geojson=neighborhoods, 
                           locations='RegionName', featureidkey = 'properties.name',
                           color='ZRI', opacity=.6, 
                           center = {'lat':47.625,'lon':-122.333}, zoom = 10,
                           hover_name='RegionName',hover_data=['ZRI','SizeRank','Date'],
                           color_continuous_scale=plotly.colors.sequential.BuPu[2:], 
                           labels={'ZRI':'Zillow Rent Index'}
                           )
fig.update_layout(mapbox_style="carto-positron")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show('notebook')

Clearly, rents closer to downtown and the waterfront are much higher than in the outlying neighborhoods.

Time Series Plots & Animations

In breakout rooms with your PC, see if you can:

  1. Use time series plots to examine how prices have fluctuated over time in various Seattle neighborhoods.
  2. Animate a choropleth map of rental values over the time periods listed in the dataset. This may take some time, but is definitely worth looking at.

Hint: Google & Plotly Documentation are your best friends here

In [12]:
date_idx = pd.date_range(start='2011-01-01', end='2020-01-01', freq='MS')

def fill_timeseries(df):
    df2 = df.set_index(pd.to_datetime(df['Date'])).drop(columns='Date')
    df3 = df2.reindex(date_idx, fill_value=np.nan)
    return df3.reset_index()
In [13]:
ZRI_f = ZRI.groupby(by=['RegionID']).apply(fill_timeseries).reset_index(drop=True).rename(columns={'index':'Date'})
ZRI_f['missing'] = ZRI_f.ZRI.isna().astype(int)  # x == np.nan is always False, so test with isna()
ZRI_f['ZRI'] = ZRI_f.ZRI.interpolate(method='linear')
ZRI_f
Out[13]:
Date RegionID RegionName City State Metro CountyName SizeRank ZRI missing
0 2011-01-01 12207 Kingsgate Kirkland WA Seattle-Tacoma-Bellevue King County 1386 1197.0 0
1 2011-02-01 12207 Kingsgate Kirkland WA Seattle-Tacoma-Bellevue King County 1386 1206.0 0
2 2011-03-01 12207 Kingsgate Kirkland WA Seattle-Tacoma-Bellevue King County 1386 1211.0 0
3 2011-04-01 12207 Kingsgate Kirkland WA Seattle-Tacoma-Bellevue King County 1386 1227.0 0
4 2011-05-01 12207 Kingsgate Kirkland WA Seattle-Tacoma-Bellevue King County 1386 1233.0 0
... ... ... ... ... ... ... ... ... ... ...
21904 2019-09-01 764337 Tyee Park Lakewood WA Seattle-Tacoma-Bellevue Pierce County 2515 1623.0 0
21905 2019-10-01 764337 Tyee Park Lakewood WA Seattle-Tacoma-Bellevue Pierce County 2515 1606.0 0
21906 2019-11-01 764337 Tyee Park Lakewood WA Seattle-Tacoma-Bellevue Pierce County 2515 1620.0 0
21907 2019-12-01 764337 Tyee Park Lakewood WA Seattle-Tacoma-Bellevue Pierce County 2515 1642.0 0
21908 2020-01-01 764337 Tyee Park Lakewood WA Seattle-Tacoma-Bellevue Pierce County 2515 1660.0 0

21909 rows × 10 columns

In [14]:
ZRI_f[ZRI_f.missing==1]
Out[14]:
Date RegionID RegionName City State Metro CountyName SizeRank ZRI missing
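
One caveat when flagging gaps: NaN never compares equal to anything, including itself, so an equality test against np.nan flags no rows, while isna() is the usual check. A minimal sketch on a toy series (values made up):

```python
import numpy as np
import pandas as pd

# Toy series with one gap (values made up).
s = pd.Series([1200.0, np.nan, 1260.0])

print(np.nan == np.nan)  # NaN never compares equal, even to itself

# Equality against np.nan misses the gap; isna() catches it.
eq_flags = s.apply(lambda x: 1 if x == np.nan else 0)
isna_flags = s.isna().astype(int)
print(eq_flags.tolist(), isna_flags.tolist())

# Linear interpolation fills the gap with the midpoint of its neighbors.
filled = s.interpolate(method='linear')
print(filled.tolist())
```

The flag has to be computed before interpolating, since afterwards the gap is no longer visible in the data.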
In [15]:
ZRI_f[ZRI_f.RegionName == 'Capitol Hill']
Out[15]:
Date RegionID RegionName City State Metro CountyName SizeRank ZRI missing
2398 2011-01-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1455.0 0
2399 2011-02-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1458.0 0
2400 2011-03-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1461.0 0
2401 2011-04-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1466.0 0
2402 2011-05-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 1476.0 0
... ... ... ... ... ... ... ... ... ... ...
2502 2019-09-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 2210.0 0
2503 2019-10-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 2195.0 0
2504 2019-11-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 2183.0 0
2505 2019-12-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 2192.0 0
2506 2020-01-01 250206 Capitol Hill Seattle WA Seattle-Tacoma-Bellevue King County 320 2169.0 0

109 rows × 10 columns

In [16]:
#1
hoods = ['Denny Triangle', 'First Hill', 'Capitol Hill', 'Belltown', 'Uptown']
fig = px.line(ZRI_f[ZRI_f.RegionName.isin(hoods)], x='Date', y='ZRI', 
              color='RegionName',
              color_discrete_sequence = plotly.colors.sequential.BuPu[2:]
             )
fig.show('notebook')
In [17]:
#2 
region_names = ['Downtown', 'Uptown', 'Belltown', 'West Queen Anne', 'First Hill', 'Denny Triangle']
ZRI_downtown = ZRI[ZRI.RegionName.isin(region_names)]
# fig = px.choropleth_mapbox(ZRI_downtown, geojson=neighborhoods, 
#                            locations='RegionName', featureidkey = 'properties.name',
#                            animation_frame='Date', animation_group='RegionName',
#                            color='ZRI', opacity=.6, 
#                            center = {'lat':47.625,'lon':-122.333}, zoom = 10,
#                            hover_name='RegionName',hover_data=['ZRI','SizeRank','Date'],
#                            color_continuous_scale=plotly.colors.sequential.BuPu[2:], 
#                            labels={'ZRI':'Zillow Rent Index'}
#                            )
# fig.update_layout(mapbox_style="carto-positron")
# fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# fig.show('notebook')

And More!
