Sharla Gelfand

February 2019: Visualizing my discogs collection

In parts one and two of this series, I did a whole lot of API pulling and data cleaning to get my discogs collection into a tidy state 🙏 Now I’m finally ready to do something with it!

I want to be able to explore my collection on a map (😱) and also see what styles of music I like, from where, and how that has changed over time.

collection_data
## # A tibble: 169 x 11
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <chr> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 159 more rows, and 2 more variables: lat <dbl>, long <dbl>

So, yes, I want to map my discogs collection all over the world 🌐

Pretty much everything I know about spatial data comes from Jesse Sadler’s amazing blog post "Introduction to GIS with R", so I’m borrowing heavily from its code here.

First, so that we don’t get legend fatigue, I’m going to lump the least common music styles together. My collection is dominated by a few styles:

library(dplyr)

collection_data %>%
  count(style, sort = TRUE)
## # A tibble: 17 x 2
##    style                n
##    <chr>            <int>
##  1 Hardcore            78
##  2 Punk                37
##  3 Post-Punk           14
##  4 Indie Rock          13
##  5 Black Metal          4
##  6 New Wave             4
##  7 Shoegaze             4
##  8 Experimental         3
##  9 Hip Hop              2
## 10 Indie Pop            2
## 11 Pop Rock             2
## 12 Alternative Rock     1
## 13 Avantgarde           1
## 14 Grunge               1
## 15 Ska                  1
## 16 Stoner Rock          1
## 17 Synth-pop            1

While I’d love to look specifically at New Wave releases across the world, it just doesn’t make sense for a grand total of 4. So I keep the four most common styles and lump everything else into "Other":

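library(forcats)

# Keep the 4 most common styles; everything else becomes "Other"
# (newer forcats versions recommend fct_lump_n() for this)
collection_data <- collection_data %>%
  mutate(style = fct_lump(as_factor(style), n = 4))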
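To see what that lumping does to the numbers, here’s a base-R sketch of the same idea applied to the style counts printed above. The helper `lump_counts()` is hypothetical, and it assumes there’s no tie at the cut-off (true here: Indie Rock has 13 releases and the next style has 4). fct_lump() has its own rules for ties (its `ties.method` argument), so treat this as an illustration, not a reimplementation.

```r
# Style counts copied from the count(style, sort = TRUE) output above
style_counts <- c(
  "Hardcore" = 78, "Punk" = 37, "Post-Punk" = 14, "Indie Rock" = 13,
  "Black Metal" = 4, "New Wave" = 4, "Shoegaze" = 4, "Experimental" = 3,
  "Hip Hop" = 2, "Indie Pop" = 2, "Pop Rock" = 2, "Alternative Rock" = 1,
  "Avantgarde" = 1, "Grunge" = 1, "Ska" = 1, "Stoner Rock" = 1,
  "Synth-pop" = 1
)

# Hypothetical helper: keep the n largest counts and sum the rest into "Other".
# Assumes no tie at position n (fct_lump() handles ties via ties.method).
lump_counts <- function(counts, n) {
  ranked <- sort(counts, decreasing = TRUE)
  top <- seq_len(n)
  c(ranked[top], Other = sum(ranked[-top]))
}

lumped <- lump_counts(style_counts, 4)
print(lumped)
```

"Other" ends up holding 27 releases, and the five groups still add up to all 169, which is a handful of colours I can actually tell apart on a map.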

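Next, I’m converting my data frame into an sf (simple features) object using the long and lat fields. Two releases don’t have a location, so I drop them first, which is why 169 rows become 167.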

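library(sf)
library(dplyr)

points_sf <- collection_data %>%
  # st_as_sf() errors on missing coordinates by default (na.fail = TRUE)
  filter(!is.na(lat)) %>%
  # x = longitude, y = latitude; EPSG:4326 is WGS 84 longitude/latitude
  st_as_sf(coords = c("long", "lat"), crs = 4326)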

points_sf
## Simple feature collection with 167 features and 9 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -123.13 ymin: -33.46 xmax: 139.77 ymax: 63.83
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 167 x 10
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <fct> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 157 more rows, and 1 more variable: geometry <POINT [°]>
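The bbox line in that printout is just the smallest and largest longitude (x) and latitude (y) across the points. Here’s a base-R sketch of the two things the conversion relies on: dropping releases without coordinates, and putting longitude first, since sf expects x before y. The rows below are made up for illustration (hypothetical titles, rough city-centre coordinates), not taken from my collection.

```r
# Hypothetical releases; one has no location
releases <- data.frame(
  title = c("Release A", "Release B", "Release C", "Release D"),
  long  = c(-79.38, NA, 2.17, -123.12),
  lat   = c(43.65, NA, 41.39, 49.28)
)

# Same idea as filter(!is.na(lat)): keep only rows that can go on a map
located <- releases[!is.na(releases$lat) & !is.na(releases$long), ]

# Longitude is x and latitude is y, hence coords = c("long", "lat")
coords <- as.matrix(located[, c("long", "lat")])

# Bounding box: the extremes of x and y, in the order sf prints them
bbox <- c(
  xmin = min(coords[, "long"]), ymin = min(coords[, "lat"]),
  xmax = max(coords[, "long"]), ymax = max(coords[, "lat"])
)
print(bbox)
```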

To visualize those points, I need a map of the world to plot them on top of (I mean, I guess I don’t need to use the actual Earth as a reference point, but I think we’d all appreciate it if I did). The rnaturalearth package provides country outlines, which I load as an sf object too:

library(rnaturalearth)

countries_sf <- ne_countries(scale = "medium", returnclass = "sf")

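And then I can plot my collection! I’m using different colours for different music styles and different shapes for different formats. Lots of releases come from the same city, so I jitter the points slightly to keep them from piling up on top of each other. Instead of a legend, ggplotly() turns the plot into an interactive map where hovering over a point shows its title, artist, location, style, and format.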

Unsurprisingly, the vast majority of my collection is from North America, with a real focus on the Pacific Northwest (I used to live in Vancouver ☂️) and Toronto/the East Coast of the USA (there’s just a lot of punk there, in general 🎸).

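library(ggplot2)
library(paletteer)
library(plotly)

collection_plot <- ggplot() +
  # World map as a faint background layer
  # (size sets the border width; newer ggplot2 versions call this linewidth)
  geom_sf(data = countries_sf, fill = "white", size = 0.25, alpha = 0.5) +
  geom_sf(
    # Nudge points by up to 0.75 degrees so same-city releases don't overlap
    data = st_jitter(points_sf, amount = 0.75),
    # `text` isn't a geom_sf aesthetic (ggplot2 warns about it),
    # but ggplotly() picks it up for the hover tooltip
    aes(color = style, shape = format,
        text = glue::glue('"{title}" by {artist_name}<br>{city}, {country}<br>{style} {format}')),
    alpha = 0.75,
    show.legend = FALSE,
    size = 2
  ) +
  theme_bw() +
  # No legends or axis labels: the tooltips carry that information
  theme(legend.position = "none",
        legend.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank()) +
  # paletteer >= 1.0 takes a single "package::palette" string
  scale_color_paletteer_d("rcartocolor::Pastel")

# Interactive version: hovering over a point shows the `text` aesthetic
ggplotly(collection_plot,
         tooltip = "text")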
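Two per-point details do a lot of the work in that plot: the jitter and the hover text. Here’s a base-R sketch of both for three hypothetical releases from the same city (made-up titles and band names). It assumes st_jitter() adds uniform noise of up to `amount` to each coordinate, which is what base R’s jitter() does when you pass `amount`; I’m treating the two as equivalent here. paste0() stands in for glue::glue().

```r
set.seed(2019)  # make the random jitter reproducible

# Three hypothetical releases from the same city: identical coordinates
releases <- data.frame(
  title       = c("Demo", "Split", "Live Tape"),
  artist_name = c("Band One", "Band Two", "Band Three"),
  city        = "Toronto",
  country     = "Canada",
  style       = "Hardcore",
  format      = c("Tape", "7\"", "Tape"),
  long        = -79.38,
  lat         = 43.65
)

# Uniform jitter of up to +/- 0.75 degrees on each coordinate,
# standing in for st_jitter(points_sf, amount = 0.75)
amount <- 0.75
releases$long_j <- jitter(releases$long, amount = amount)
releases$lat_j  <- jitter(releases$lat, amount = amount)

# Tooltip text in the same shape as the glue template in the plot
releases$text <- paste0(
  '"', releases$title, '" by ', releases$artist_name, "<br>",
  releases$city, ", ", releases$country, "<br>",
  releases$style, " ", releases$format
)

print(releases[, c("long_j", "lat_j", "text")])
```

The trade-off is that every point can drift up to 0.75° (tens of kilometres) from its real city, which is invisible at world scale but would be misleading on a city-level map.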