February 2019: Visualizing my discogs collection

In parts one and two of this series I did a whole lot of API pulling and data cleaning to get my discogs collection into a tidy state 🙏 Now I’m finally ready to do something with it!

I want to be able to explore my collection on a map (😱) and also see what styles of music I like, from where, and how that has changed over time.

collection_data
## # A tibble: 169 x 11
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <chr> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 159 more rows, and 2 more variables: lat <dbl>, long <dbl>

So, yes, I want to map my discogs collection all over the world 🌍

Pretty much everything I know about spatial data is from Jesse Sadler’s amazing blog post, Introduction to GIS with R, so I’m pulling this code heavily from there.

First, so that we don’t have legend fatigue, I’m going to lump the least common music styles together. My collection is fairly dominated by a few things:

collection_data %>%
  count(style, sort = TRUE)
## # A tibble: 17 x 2
##    style                n
##    <chr>            <int>
##  1 Hardcore            78
##  2 Punk                37
##  3 Post-Punk           14
##  4 Indie Rock          13
##  5 Black Metal          4
##  6 New Wave             4
##  7 Shoegaze             4
##  8 Experimental         3
##  9 Hip Hop              2
## 10 Indie Pop            2
## 11 Pop Rock             2
## 12 Alternative Rock     1
## 13 Avantgarde           1
## 14 Grunge               1
## 15 Ska                  1
## 16 Stoner Rock          1
## 17 Synth-pop            1

and while I’d love to specifically look at New Wave releases across the world, it just doesn’t make sense for that grand total of 4.

library(forcats)

collection_data <- collection_data %>%
  mutate(style = fct_lump(as_factor(style), 4))
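
If you haven't met `fct_lump()` before, here's a tiny sketch of what it does. The `styles` vector is made up for illustration (it's not my actual collection): the `n` most common levels are kept and everything else is collapsed into "Other".

```r
library(forcats)

# Hypothetical toy data, not the real collection
styles <- c(rep("Hardcore", 5), rep("Punk", 3), "Ska", "Grunge")

# Keep the 2 most common levels; the remaining levels are lumped into "Other"
lumped <- fct_lump(factor(styles), n = 2)

# Count how many values ended up in each level
table(lumped)
```

The "Other" level is added at the end of the factor, so it sorts last in a legend.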

Next, I’m converting my data frame into an sf object using the long and lat fields.

library(sf)
library(dplyr)

points_sf <- collection_data %>%
  filter(!is.na(lat)) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)

points_sf
## Simple feature collection with 167 features and 9 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -123.13 ymin: -33.46 xmax: 139.77 ymax: 63.83
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 167 x 10
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <fct> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 157 more rows, and 1 more variable: geometry <POINT [°]>
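
To make the conversion step a bit more concrete, here's a minimal sketch on a made-up data frame (the `cities` data and its rounded coordinates are assumptions for illustration only). It shows why rows with missing coordinates get dropped first, and that the coordinate columns are folded into a single `geometry` column.

```r
library(sf)

# Hypothetical toy data with one row missing its coordinates
cities <- data.frame(
  city = c("Toronto", "Barcelona", "Unknown"),
  long = c(-79.38, 2.17, NA),
  lat  = c(43.65, 41.39, NA)
)

# st_as_sf() fails on missing coordinates by default, so drop those rows first.
# coords is given in x, y order (longitude, then latitude); 4326 is WGS84.
cities_sf <- st_as_sf(
  cities[!is.na(cities$lat), ],
  coords = c("long", "lat"),
  crs = 4326
)

# The long and lat columns are replaced by a geometry column
names(cities_sf)
st_crs(cities_sf)$epsg
```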

In order to visualize those, I need a map of the world so I have something to plot on top of (I mean, I guess I don’t need to use the actual earth as a reference point, but I think we’d all appreciate it if I did).

library(rnaturalearth)

countries_sf <- ne_countries(scale = "medium", returnclass = "sf")

And then I can plot my collection! I’m using different colours for different music styles, and shapes for different formats.

To no surprise, the vast majority of my collection is from North America, with a real focus on the Pacific Northwest (I used to live in Vancouver ☂️) and Toronto/East Coast USA (there’s just a lot of punk there, in general 🎸).

library(ggplot2)
library(paletteer)
library(plotly)

collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25, alpha = 0.5) +
  geom_sf(
    data = st_jitter(points_sf,
                     amount = 0.75),
    aes(color = style, shape = format,
        text = glue::glue('"{title}" by {artist_name}<br>{city}, {country}<br>{style} {format}')),
    alpha = 0.75,
    show.legend = FALSE,
    size = 2
  ) + 
  theme_bw() + 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text = element_blank(), 
        axis.ticks = element_blank()) + 
  scale_color_paletteer_d("rcartocolor", "Pastel")

ggplotly(collection_plot, 
         tooltip = "text")
## Error: stat_sf requires the following missing aesthetics: geometry

I’m also interested in the different eras of my music taste – do I like different kinds of music from different times? You know how to add the time dimension to a plot?

Animation 😎 🌠

Similar to spatial data, everything I know about animation is from one source: Thomas Lin Pedersen’s talk about the gganimate package from RStudio conf.

I’m going to focus on North America, since that’s where most of my information is from. In a maybe blasphemous move, I’m overlaying the American states and Canadian provinces and territories over the map of the world’s countries 😬

states_sf <- ne_states(country = c("Canada", "United States of America"), returnclass = "sf")

north_america_collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25) +
  geom_sf(data = states_sf, fill = NA, size = 0.25) + 
  geom_sf(
    data = st_jitter(points_sf %>% filter(year > 0),
                     amount = 0.75),
    aes(color = style, shape = format),
    alpha = 0.75,
    show.legend = "point",
    size = 3
  ) +
  theme_bw() + 
  theme(legend.title = element_blank(),
        legend.position = "bottom") + 
  guides(colour = guide_legend(override.aes = list(size=5, alpha = 1)),
         shape = guide_legend(override.aes = list(size=5, alpha = 1))) + 
  scale_color_paletteer_d("rcartocolor", "Pastel") + 
  coord_sf(xlim = c(-130, -65), ylim = c(23, 55), datum = NA)

Without animation, it’s not bad.

north_america_collection_plot

With animation it’s way cooler.

library(gganimate)

north_america_collection_plot + 
  transition_states(as.factor(year),
                    state_length = 3) + 
  ggtitle("{closest_state}") + 
  shadow_mark() 
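
For anyone who wants to poke at the animation pieces on their own, here's a small sketch with a made-up data frame (the `toy` data and the `toy.gif` file name are assumptions, not part of my collection code). `transition_states()` steps through one year at a time, `shadow_mark()` leaves points from earlier years on the plot, and `{closest_state}` puts the current year in the title.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: four points spread across three years
toy <- data.frame(
  x    = c(1, 2, 3, 4),
  y    = c(1, 3, 2, 4),
  year = c(2013, 2013, 2015, 2017)
)

anim <- ggplot(toy, aes(x, y)) +
  geom_point(size = 3) +
  # One state per year; state_length is how long each state is held
  # relative to the transition between states
  transition_states(as.factor(year), state_length = 3) +
  # Keep points from earlier states visible instead of removing them
  shadow_mark() +
  ggtitle("{closest_state}")

# Rendering is the slow part, so it is left commented out here.
# It assumes a GIF renderer (such as the gifski package) is installed.
# rendered <- animate(anim, nframes = 60, fps = 10)
# anim_save("toy.gif", rendered)
```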

I lived in the PNW from 2013 to 2017, and you can see a huuuge increase in music from there during that time. Pretty cool!

I think that’s all I have 💁 Bye!

Sharla Gelfand
Freelance R and Shiny Developer