February 2019: Visualizing my discogs collection
In parts one and two of this series I did a whole lot of API pulling and data cleaning to get my discogs collection into a tidy state. Now I'm finally ready to do something with it!
I want to be able to explore my collection on a map and also see what styles of music I like, from where, and how that has changed over time.
collection_data
## # A tibble: 169 x 11
## release_id title format artist_id artist_name year style city country
## <int> <chr> <chr> <int> <chr> <dbl> <chr> <chr> <chr>
## 1 7496378 Demo Tape 4619796 Mollot 2015 Hard… Toro… Canada
## 2 4490852 Obse… "12\"" 3192745 Una Bèstia… 2013 Hard… Barc… Spain
## 3 5556486 Fuck… "12\"" 2876549 Good Throb 2014 Post… Lond… UK
## 4 9827276 I "7\"" 2769828 S.H.I.T. 2017 Hard… Toro… Canada
## 5 9769203 Oído… "12\"" 4282571 Rata Negra 2017 Punk Madr… Spain
## 6 7237138 A Ca… "7\"" 3596548 Ivy 2015 Hard… New … USA
## 7 13117042 Tash… "7\"" 5211980 Tashme 2019 Hard… Toro… Canada
## 8 7113575 Demo Tape 4450861 Desgraciad… 2014 Hard… Calg… Canada
## 9 10540713 Let … Tape 4273896 Phantom He… 2015 Post… Kans… USA
## 10 11260950 Sub … Tape 5694086 Sub Space 2017 Hard… New … USA
## # … with 159 more rows, and 2 more variables: lat <dbl>, long <dbl>
So, yes, I want to map my discogs collection all over the world.
Pretty much everything I know about spatial data is from Jesse Sadler's amazing blog post, Introduction to GIS with R, so I'm pulling this code heavily from there.
First, so that we don't have legend fatigue, I'm going to lump the least common music styles together. My collection is fairly dominated by a few things:
collection_data %>%
  count(style, sort = TRUE)
## # A tibble: 17 x 2
## style n
## <chr> <int>
## 1 Hardcore 78
## 2 Punk 37
## 3 Post-Punk 14
## 4 Indie Rock 13
## 5 Black Metal 4
## 6 New Wave 4
## 7 Shoegaze 4
## 8 Experimental 3
## 9 Hip Hop 2
## 10 Indie Pop 2
## 11 Pop Rock 2
## 12 Alternative Rock 1
## 13 Avantgarde 1
## 14 Grunge 1
## 15 Ska 1
## 16 Stoner Rock 1
## 17 Synth-pop 1
and while I'd love to specifically look at New Wave releases across the world, it just doesn't make sense for that grand total of 4.
library(forcats)
# Keep the 4 most common styles and lump everything else into "Other"
collection_data <- collection_data %>%
  mutate(style = fct_lump(as_factor(style), n = 4))
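To see what the lumping does, here is a minimal sketch on a made-up vector of styles (the toy counts are for illustration only, not my real collection). With `n = 2`, `fct_lump()` keeps the two most common levels and collapses the rest into "Other":

```r
library(forcats)

# Made-up styles for illustration only (not the real collection counts)
toy_styles <- c(
  rep("Hardcore", 5), rep("Punk", 3), rep("Post-Punk", 2), "Ska", "Grunge"
)

# Keep the 2 most frequent levels; everything else becomes "Other"
toy_lumped <- fct_lump(as_factor(toy_styles), n = 2)

levels(toy_lumped)
table(toy_lumped)
```

Applied to my collection with `n = 4`, that should leave Hardcore, Punk, Post-Punk, Indie Rock, and Other, going by the counts above.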
Next, I'm converting my data frame into an sf object using the long and lat fields.
library(sf)
library(dplyr)
points_sf <- collection_data %>%
  filter(!is.na(lat)) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)
points_sf
## Simple feature collection with 167 features and 9 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -123.13 ymin: -33.46 xmax: 139.77 ymax: 63.83
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 167 x 10
## release_id title format artist_id artist_name year style city country
## <int> <chr> <chr> <int> <chr> <dbl> <fct> <chr> <chr>
## 1 7496378 Demo Tape 4619796 Mollot 2015 Hard… Toro… Canada
## 2 4490852 Obse… "12\"" 3192745 Una Bèstia… 2013 Hard… Barc… Spain
## 3 5556486 Fuck… "12\"" 2876549 Good Throb 2014 Post… Lond… UK
## 4 9827276 I "7\"" 2769828 S.H.I.T. 2017 Hard… Toro… Canada
## 5 9769203 Oído… "12\"" 4282571 Rata Negra 2017 Punk Madr… Spain
## 6 7237138 A Ca… "7\"" 3596548 Ivy 2015 Hard… New … USA
## 7 13117042 Tash… "7\"" 5211980 Tashme 2019 Hard… Toro… Canada
## 8 7113575 Demo Tape 4450861 Desgraciad… 2014 Hard… Calg… Canada
## 9 10540713 Let … Tape 4273896 Phantom He… 2015 Post… Kans… USA
## 10 11260950 Sub … Tape 5694086 Sub Space 2017 Hard… New … USA
## # … with 157 more rows, and 1 more variable: geometry <POINT [°]>
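The detail that is easy to get backwards in `st_as_sf()` is the order of `coords`: x (longitude) comes first, then y (latitude). Here is a minimal sketch with two made-up rows (the `toy` data frame and its approximate coordinates are assumptions for illustration only):

```r
library(sf)

# Hypothetical two-row data frame; coordinates are approximate
toy <- data.frame(
  city = c("Toronto", "Madrid"),
  long = c(-79.38, -3.70),
  lat  = c(43.65, 40.42)
)

# coords is c(x, y), i.e. longitude first, then latitude;
# crs = 4326 declares the values as WGS84 longitude/latitude
toy_sf <- st_as_sf(toy, coords = c("long", "lat"), crs = 4326)

st_crs(toy_sf)$epsg      # EPSG code of the coordinate reference system
st_coordinates(toy_sf)   # matrix with X (long) and Y (lat) columns
```

By default the `long` and `lat` columns are dropped and replaced by a `geometry` column, which is why `points_sf` reports 9 fields rather than 11.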
In order to visualize those, I need a map of the world so I have something to plot on top of. (I mean, I guess I don't need to use the actual earth as a reference point, but I think we'd all appreciate it if I did.)
library(rnaturalearth)
countries_sf <- ne_countries(scale = "medium", returnclass = "sf")
And then I can plot my collection! I'm using different colours for different music styles, and shapes for different formats.
To no surprise, the vast majority of my collection is from North America, with a real focus on the Pacific Northwest (I used to live in Vancouver) and Toronto/East Coast USA (there's just a lot of punk there, in general).
library(ggplot2)
library(paletteer)
library(plotly)
collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25, alpha = 0.5) +
  geom_sf(
    data = st_jitter(points_sf, amount = 0.75),
    aes(
      color = style, shape = format,
      text = glue::glue('"{title}" by {artist_name}<br>{city}, {country}<br>{style} {format}')
    ),
    alpha = 0.75,
    show.legend = FALSE,
    size = 2
  ) +
  theme_bw() +
  theme(
    legend.position = "none",
    legend.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank()
  ) +
  scale_color_paletteer_d("rcartocolor", "Pastel")
ggplotly(collection_plot, tooltip = "text")
## Error: stat_sf requires the following missing aesthetics: geometry
I'm also interested in the different eras of my music taste: do I like different kinds of music from different times? You know how to add the time dimension to a plot?
Animation!
Similar to spatial data, everything I know about animation is from one source: Thomas Lin Pedersen's talk about the gganimate package from RStudio conf.
I'm going to focus on North America, since that's where most of my information is from. In a maybe blasphemous move, I'm overlaying the American states and Canadian provinces and territories over the map of the world's countries.
states_sf <- ne_states(country = c("Canada", "United States of America"), returnclass = "sf")
north_america_collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25) +
  geom_sf(data = states_sf, fill = NA, size = 0.25) +
  geom_sf(
    data = st_jitter(points_sf %>% filter(year > 0), amount = 0.75),
    aes(color = style, shape = format),
    alpha = 0.75,
    show.legend = "point",
    size = 3
  ) +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    legend.position = "bottom"
  ) +
  guides(
    colour = guide_legend(override.aes = list(size = 5, alpha = 1)),
    shape = guide_legend(override.aes = list(size = 5, alpha = 1))
  ) +
  scale_color_paletteer_d("rcartocolor", "Pastel") +
  coord_sf(xlim = c(-130, -65), ylim = c(23, 55), datum = NA)
Without animation, it's not bad.
north_america_collection_plot
With animation it's way cooler.
library(gganimate)
north_america_collection_plot +
  transition_states(as.factor(year), state_length = 3) +
  ggtitle("{closest_state}") +
  shadow_mark()
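If the three gganimate pieces are hard to pick apart on the full map, here is a stripped-down sketch on a made-up data frame (the `toy` points are assumptions for illustration only). It builds the animation object without rendering it:

```r
library(ggplot2)
library(gganimate)

# Made-up points for illustration: one row per release, tagged with a year
toy <- data.frame(
  year = c(2013, 2013, 2015, 2017),
  x    = c(1, 2, 3, 4),
  y    = c(1, 3, 2, 4)
)

toy_anim <- ggplot(toy, aes(x, y)) +
  geom_point(size = 3) +
  # One animation state per distinct year, in factor-level order
  transition_states(as.factor(year), state_length = 3) +
  # Glue-style title: gganimate fills in the state closest to the current frame
  ggtitle("{closest_state}") +
  # Leave the points from earlier states on screen
  shadow_mark()

# Rendering needs a renderer such as the gifski package, so it is commented out here
# animate(toy_anim, nframes = 30)
# anim_save("toy.gif")
```

Without `shadow_mark()`, each year's points would disappear when the next state starts, so the map would never fill up.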
I lived in the PNW from 2013 to 2017, and you can see a huuuge increase in music from there during that time. Pretty cool!
I think that's all I have. Bye!