Ever since publishing my first post about the development of scrobbler I’ve been meaning to spend much more time writing and showcasing some of the analyses I wanted to do with it. As with many things, life gets in the way, but this is my effort to show something small and simple.

My idea for this post came from Spotify’s ‘2020 Wrapped’, the video they make for you showing trends in your listening over the previous year. One thing that stood out to me was that I played Abby Christo’s excellent song Mustang over 50 times in a single day! This got me thinking: what if we extended that question back over my whole listening history? Essentially, we can ask “Which song have I played the most times in a single day?”.

This is fairly easy to answer with scrobbler and dplyr, so let’s dive in. If you haven’t read the previous post about scrobbler, I recommend you do that now.

First, let’s load the necessary packages:

library(scrobbler)
library(dplyr)
library(anytime)
library(tidyr)

Now let’s use scrobbler to grab my entire listening history. Note that I’m using environment variables here. There will be a future post about this, but you can just pass the parameters directly, as detailed in the scrobbler readme.

my_scrobbles <- download_scrobbles(
  Sys.getenv('LASTFM_API_USERNAME'),
  Sys.getenv('LASTFM_API_KEY')
  )
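If you do go the environment variable route, keep in mind that Sys.getenv() returns an empty string for a variable that isn’t set, rather than raising an error. A small guard can fail early instead of sending empty credentials to the API. This is only a sketch: require_env() is a hypothetical helper of my own, not part of scrobbler.

```r
# Hypothetical helper, not part of scrobbler: read an environment variable
# and stop with a clear message if it is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage, with the same variable names as above:
# my_scrobbles <- download_scrobbles(
#   require_env("LASTFM_API_USERNAME"),
#   require_env("LASTFM_API_KEY")
# )
```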

Let’s quickly check what the data looks like:

head(my_scrobbles)
##                              song_mbid   song_title
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3      So Soon
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad       B Team
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers
## 4 0f361896-d6fc-4179-9987-47bf59437c83      Stutter
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7      Fallout
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693    Porcelain
##                            artist_mbid          artist X.attr.nowplaying
## 1 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              true
## 2 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 3 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 4 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 5 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 6 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
##                             album_mbid      album  date_unix               date
## 1 180bb020-8349-4031-b8a3-bb544a396d84 Ever After         NA               <NA>
## 2 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945 16 Dec 2020, 20:52
## 3 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694 16 Dec 2020, 20:48
## 4 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493 16 Dec 2020, 20:44
## 5 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238 16 Dec 2020, 20:40
## 6 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918 16 Dec 2020, 20:35

One thing to notice is that the date column contains time information. This is obviously useful in general, but we want to group by day only, which won’t be possible with the time included in the field. This is something scrobbler really should handle better, through a) better naming (this is really a datetime, not a date) and b) also providing a date-only column.

Luckily, a column like this is easy to parse into just a date, and since I’m lazy, I’m using anydate() from Dirk Eddelbuettel’s excellent anytime package, which is designed to parse anything into a date.

my_scrobbles$date_parsed <- anydate(my_scrobbles$date)
head(my_scrobbles)
##                              song_mbid   song_title
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3      So Soon
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad       B Team
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers
## 4 0f361896-d6fc-4179-9987-47bf59437c83      Stutter
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7      Fallout
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693    Porcelain
##                            artist_mbid          artist X.attr.nowplaying
## 1 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              true
## 2 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 3 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 4 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 5 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
## 6 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench              <NA>
##                             album_mbid      album  date_unix               date
## 1 180bb020-8349-4031-b8a3-bb544a396d84 Ever After         NA               <NA>
## 2 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945 16 Dec 2020, 20:52
## 3 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694 16 Dec 2020, 20:48
## 4 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493 16 Dec 2020, 20:44
## 5 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238 16 Dec 2020, 20:40
## 6 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918 16 Dec 2020, 20:35
##   date_parsed
## 1        <NA>
## 2  2020-12-16
## 3  2020-12-16
## 4  2020-12-16
## 5  2020-12-16
## 6  2020-12-16
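If you’d rather not add a dependency, the date_unix column offers another route. Here’s a minimal base R sketch on made-up rows shaped like the output above. Treating the timestamps as UTC is my assumption, although it does agree with the date strings shown (1608151945 works out to 20:52 on 16 Dec 2020 in UTC).

```r
# Toy rows mimicking two columns of the scrobbles data frame shown above
toy <- data.frame(
  date_unix = c(NA, 1608151945, 1608151694),
  date = c(NA, "16 Dec 2020, 20:52", "16 Dec 2020, 20:48"),
  stringsAsFactors = FALSE
)

# Convert seconds since the Unix epoch to a date-time, then drop the time part.
# tz = "UTC" is an assumption; use your own zone if plays near midnight matter.
toy$date_from_unix <- as.Date(
  as.POSIXct(toy$date_unix, origin = "1970-01-01", tz = "UTC"),
  tz = "UTC"
)

toy
```

The now-playing row has no timestamp, so it stays NA here too, just as it does with anydate().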

So now we have a column that identifies a unique day. Next we need a column that identifies each song. You might think we could use the song_title column for this, but we’d quickly run into issues with duplicated song titles. For instance, take the song ‘Runaway’. From the title alone, we have no idea if this is the ‘Runaway’ sung by Bon Jovi, The Corrs, Avril Lavigne, and so on. The song_mbid (MusicBrainz ID) column is a partial solution to this, as it assigns a unique code to each song, but it is somewhat incomplete in the dataset, so it can’t be relied on.

The simplest way to solve this is to create a new column that concatenates song and artist, giving a combination that is unique for practical purposes. We can use tidyr’s unite function for this:

my_scrobbles <- unite(my_scrobbles, 
                      song_id, 
                      c("song_title", "artist"), 
                      sep = "-", 
                      remove = FALSE)
head(my_scrobbles)
##                              song_mbid                      song_id
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3      So Soon-Marianas Trench
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad       B Team-Marianas Trench
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers-Marianas Trench
## 4 0f361896-d6fc-4179-9987-47bf59437c83      Stutter-Marianas Trench
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7      Fallout-Marianas Trench
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693    Porcelain-Marianas Trench
##     song_title                          artist_mbid          artist
## 1      So Soon e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 2       B Team e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 3 Toy Soldiers e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 4      Stutter e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 5      Fallout e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 6    Porcelain e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
##   X.attr.nowplaying                           album_mbid      album  date_unix
## 1              true 180bb020-8349-4031-b8a3-bb544a396d84 Ever After         NA
## 2              <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945
## 3              <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694
## 4              <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493
## 5              <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238
## 6              <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918
##                 date date_parsed
## 1               <NA>        <NA>
## 2 16 Dec 2020, 20:52  2020-12-16
## 3 16 Dec 2020, 20:48  2020-12-16
## 4 16 Dec 2020, 20:44  2020-12-16
## 5 16 Dec 2020, 20:40  2020-12-16
## 6 16 Dec 2020, 20:35  2020-12-16
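To see why the combined key matters, here’s a small sketch on made-up rows (not my real data) where the same title appears under three artists. It uses the same unite() call as above.

```r
library(tidyr)

# Made-up rows: the same title recorded by three different artists
toy <- data.frame(
  song_title = c("Runaway", "Runaway", "Runaway", "Mustang"),
  artist = c("Bon Jovi", "The Corrs", "Avril Lavigne", "Abby Christo"),
  stringsAsFactors = FALSE
)

toy <- unite(toy, song_id, c("song_title", "artist"), sep = "-", remove = FALSE)

# Titles alone collapse the three versions of Runaway into one value
length(unique(toy$song_title))  # 2
# The combined key keeps them apart
length(unique(toy$song_id))     # 4
```

One caveat: "-" also appears inside real titles and artist names, so a key like this is best treated as a label for grouping rather than something to split apart again later. A rarer separator would be safer if you ever need to do that.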

So now all we need to do is group by day and song, and count the number of entries in each group. This is the sort of problem I love using dplyr for because it composes so beautifully.

my_scrobbles %>% 
  group_by(date_parsed, song_id) %>% 
  count(name = "times_played") %>% 
  arrange(desc(times_played)) %>% 
  head(8)
## # A tibble: 8 × 3
## # Groups:   date_parsed, song_id [8]
##   date_parsed song_id                                            times_played
##   <date>      <chr>                                                     <int>
## 1 2019-04-08  Break Free-Ariana Grande                                     51
## 2 2020-01-26  Oath  ft. Becky G-Cher Lloyd                                 49
## 3 2018-09-12  Nunca Me Olvides - Remix-Yandel                              33
## 4 2019-11-04  I Don't Want to Know - 2004 Remaster-Fleetwood Mac           33
## 5 2019-02-28  Before I Cry-Lady Gaga                                       32
## 6 2018-10-10  Halley Came To Jackson-Mary Chapin Carpenter                 29
## 7 2020-05-20  Inside Out-Camila Cabello                                    29
## 8 2020-10-22  Mustang-Abby Christo                                         29
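If you want to try this without downloading anything, here’s the same idea as a sketch on made-up rows. It differs from the pipeline above in two ways, both my own choices: it drops rows with no date (like the now-playing row we saw earlier), and it passes the grouping columns straight to count(), which returns an ungrouped result.

```r
library(dplyr)

# Made-up scrobbles; the first row mimics a "now playing" entry with no date
toy <- data.frame(
  date_parsed = as.Date(c(NA, "2020-10-22", "2020-10-22", "2020-10-22", "2020-10-23")),
  song_id = c("Mustang-Abby Christo", "Mustang-Abby Christo",
              "Mustang-Abby Christo", "Break Free-Ariana Grande",
              "Mustang-Abby Christo"),
  stringsAsFactors = FALSE
)

top_plays <- toy %>%
  filter(!is.na(date_parsed)) %>%   # drop rows that have no date
  count(date_parsed, song_id, name = "times_played", sort = TRUE)

top_plays
```

With sort = TRUE the most-played day and song pair comes first, so head() on the result gives the same kind of leaderboard as above.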

And there we go: on 8th April 2019, I played Break Free by Ariana Grande 51 times. Only slightly excessive :P.

While a simple analysis, this does show some of the cool things you can do with scrobbler. It also shows some more things we need to investigate. For instance, as I mentioned at the start, Spotify thinks I played Mustang over 50 times in one day, but my data here says Mustang was played at most 29 times in a day. Given Spotify’s enormous engineering team, I doubt the issue is on their side, so it’ll be an exercise for me to figure out whether the issue is in the Last.fm API, the scrobblers themselves, or my implementation of scrobbler.
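One thing I’d want to rule out first is where the day boundary falls. This is purely a hypothesis, and nothing above confirms it, but if the two services draw midnight in different time zones, a late-night listening session could be split across two dates in one dataset and not the other. A quick base R sketch with made-up timestamps, using "America/New_York" only as an example zone:

```r
# Made-up timestamps: three plays late on one evening, recorded in UTC
plays_utc <- as.POSIXct(
  c("2020-10-22 22:30:00", "2020-10-22 23:45:00", "2020-10-23 00:15:00"),
  tz = "UTC"
)

# Calendar day of each play in UTC versus in an example local zone
day_utc   <- as.Date(plays_utc, tz = "UTC")
day_local <- as.Date(plays_utc, tz = "America/New_York")

table(day_utc)    # plays split across 22 and 23 October
table(day_local)  # all three plays fall on 22 October
```

Re-running the count with dates derived in your local zone would show whether this accounts for any of the gap.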

Take care…