Ever since publishing my first post about the development of scrobbler, I've been meaning to spend much more time writing about and showcasing some of the analyses I wanted to do with it. As with many things, life gets in the way, but this is my effort to show something small and simple.
My idea for this post came from Spotify's '2020 Wrapped' video, which shows trends in your listening over the previous year. One thing that stood out to me was that I played Abby Christo's excellent song Mustang over 50 times in a single day! This got me thinking: what if we asked that question of my whole listening history? Essentially, the question becomes "What song did I play the most times on a single day?"
This is fairly easy to answer with scrobbler and dplyr, so let's dive in. If you haven't read the previous post about scrobbler, I recommend you do that now.
First, let's load the necessary packages:
library(scrobbler)
library(dplyr)
library(anytime)
library(tidyr)
Now let's use scrobbler to grab my entire listening history. Note that I'm reading my Last.fm username and API key from environment variables here. There will be a future post about this, but you can also just pass the parameters directly, as detailed in the scrobbler readme.
my_scrobbles <- download_scrobbles(
Sys.getenv('LASTFM_API_USERNAME'),
Sys.getenv('LASTFM_API_KEY')
)
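Because Sys.getenv() silently returns an empty string for a variable that isn't set, a missing credential would likely only show up later as an API error. Below is a minimal sketch of a guard you could run first; check_lastfm_env() is a hypothetical helper of my own, not part of scrobbler, and the variable names are just the ones used above.

```r
# Hypothetical helper (not part of scrobbler): stop early with a clear
# message if the Last.fm credentials aren't set as environment variables.
check_lastfm_env <- function(vars = c("LASTFM_API_USERNAME", "LASTFM_API_KEY")) {
  values <- Sys.getenv(vars)       # "" for any variable that isn't set
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variable(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# To set them for the current session only (placeholder values):
# Sys.setenv(LASTFM_API_USERNAME = "your_username", LASTFM_API_KEY = "your_key")
# Or add the same NAME=value lines to ~/.Renviron so R loads them at startup.

# check_lastfm_env()   # run this before download_scrobbles()
```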
Let's quickly check out what the data looks like:
head(my_scrobbles)
## song_mbid song_title
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3 So Soon
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad B Team
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers
## 4 0f361896-d6fc-4179-9987-47bf59437c83 Stutter
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7 Fallout
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693 Porcelain
## artist_mbid artist X.attr.nowplaying
## 1 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench true
## 2 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 3 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 4 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 5 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 6 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## album_mbid album date_unix date
## 1 180bb020-8349-4031-b8a3-bb544a396d84 Ever After NA <NA>
## 2 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945 16 Dec 2020, 20:52
## 3 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694 16 Dec 2020, 20:48
## 4 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493 16 Dec 2020, 20:44
## 5 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238 16 Dec 2020, 20:40
## 6 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918 16 Dec 2020, 20:35
One thing to notice is that the date column also contains time information. This is useful in general, but we want to group by day only, which won't be possible while the time is included in the field. This is something scrobbler really should handle better, through a) better naming (this is really a datetime, not a date), and b) also providing a date-only column.
Luckily, a column like this is easy to parse into just a date, and since I'm lazy, I'm using anydate() from Dirk Eddelbuettel's excellent anytime package, which is designed to parse anything into a date.
my_scrobbles$date_parsed <- anydate(my_scrobbles$date)
head(my_scrobbles)
## song_mbid song_title
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3 So Soon
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad B Team
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers
## 4 0f361896-d6fc-4179-9987-47bf59437c83 Stutter
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7 Fallout
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693 Porcelain
## artist_mbid artist X.attr.nowplaying
## 1 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench true
## 2 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 3 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 4 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 5 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## 6 e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench <NA>
## album_mbid album date_unix date
## 1 180bb020-8349-4031-b8a3-bb544a396d84 Ever After NA <NA>
## 2 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945 16 Dec 2020, 20:52
## 3 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694 16 Dec 2020, 20:48
## 4 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493 16 Dec 2020, 20:44
## 5 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238 16 Dec 2020, 20:40
## 6 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918 16 Dec 2020, 20:35
## date_parsed
## 1 <NA>
## 2 2020-12-16
## 3 2020-12-16
## 4 2020-12-16
## 5 2020-12-16
## 6 2020-12-16
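If you'd rather not depend on parsing the text column, the date_unix column carries the same information as seconds since the Unix epoch, and converting it yourself makes it explicit which time zone defines a "day". Here's a minimal base R sketch on a few made-up rows shaped like the output above; unix_to_date() is a hypothetical helper, and I'm assuming date_unix is numeric (as.numeric() is there as a safety net in case it isn't).

```r
# Made-up rows shaped like scrobbler's output: date_unix holds seconds since
# the Unix epoch and is NA for the "now playing" track.
toy <- data.frame(
  date_unix = c(NA, 1608151945, 1608151694),
  date      = c(NA, "16 Dec 2020, 20:52", "16 Dec 2020, 20:48")
)

# Hypothetical helper: epoch seconds -> calendar date in an explicit time zone,
# so the result doesn't depend on the machine's local settings.
unix_to_date <- function(secs, tz = "UTC") {
  as.Date(as.POSIXct(as.numeric(secs), origin = "1970-01-01", tz = tz), tz = tz)
}

toy$date_parsed <- unix_to_date(toy$date_unix)
toy

# The same instant can fall on a different calendar day elsewhere:
unix_to_date(1608151945, tz = "Pacific/Auckland")   # 2020-12-17 (UTC+13 in December)
```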
So now we have a column that identifies a unique day. Next, we need a column that identifies each song. You might think we could use the song_title column for this, but we'd quickly run into issues with duplicate song titles. For instance, take the song 'Runaway'. From the title alone, we have no idea whether this is the 'Runaway' by Bon Jovi, The Corrs, Avril Lavigne, etc. The song_mbid (MusicBrainz ID) column is a partial solution, as it assigns a unique code to each song; however, it isn't filled in for every scrobble in the dataset, so we can't rely on it alone.
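To make the collision concrete, here's a tiny dplyr example on made-up rows (the plays are invented purely for illustration): counting by title alone lumps two different songs together, while counting by title and artist keeps them apart.

```r
library(dplyr)

# Made-up plays: two different songs that happen to share a title.
plays <- tibble(
  song_title = c("Runaway", "Runaway", "Runaway"),
  artist     = c("Bon Jovi", "The Corrs", "The Corrs")
)

# Grouping on title alone merges both songs into a single row (n = 3)...
count(plays, song_title)

# ...while grouping on title and artist keeps them apart:
# Bon Jovi n = 1, The Corrs n = 2.
count(plays, song_title, artist)
```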
The simplest way to solve this is to create a new column that concatenates song and artist, giving a unique combination. We can use tidyr's unite() function for this:
my_scrobbles <- unite(my_scrobbles,
song_id,
c("song_title", "artist"),
sep = "-",
remove = FALSE)
head(my_scrobbles)
## song_mbid song_id
## 1 192915d7-c2df-44f6-9e08-b7d80745bdd3 So Soon-Marianas Trench
## 2 6dd374d1-e707-4de0-89b3-889fbb7d7bad B Team-Marianas Trench
## 3 45d25340-5791-4f93-a642-94494b057646 Toy Soldiers-Marianas Trench
## 4 0f361896-d6fc-4179-9987-47bf59437c83 Stutter-Marianas Trench
## 5 34d419dd-eaf7-48de-b4df-704c61463cd7 Fallout-Marianas Trench
## 6 3c8fe6d5-66ac-3b8b-a3f5-36f63fcff693 Porcelain-Marianas Trench
## song_title artist_mbid artist
## 1 So Soon e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 2 B Team e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 3 Toy Soldiers e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 4 Stutter e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 5 Fallout e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## 6 Porcelain e358276d-4377-4b9b-88dd-db0d17b0e3c6 Marianas Trench
## X.attr.nowplaying album_mbid album date_unix
## 1 true 180bb020-8349-4031-b8a3-bb544a396d84 Ever After NA
## 2 <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151945
## 3 <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151694
## 4 <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151493
## 5 <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608151238
## 6 <NA> 180bb020-8349-4031-b8a3-bb544a396d84 Ever After 1608150918
## date date_parsed
## 1 <NA> <NA>
## 2 16 Dec 2020, 20:52 2020-12-16
## 3 16 Dec 2020, 20:48 2020-12-16
## 4 16 Dec 2020, 20:44 2020-12-16
## 5 16 Dec 2020, 20:40 2020-12-16
## 6 16 Dec 2020, 20:35 2020-12-16
So now all we need to do is group by day and song, and count the number of entries in each group. This is the sort of problem I love using dplyr for, because it composes so beautifully.
my_scrobbles %>%
group_by(date_parsed, song_id) %>%
count(name = "times_played") %>%
arrange(desc(times_played)) %>%
head(8)
## # A tibble: 8 × 3
## # Groups: date_parsed, song_id [8]
## date_parsed song_id times_played
## <date> <chr> <int>
## 1 2019-04-08 Break Free-Ariana Grande 51
## 2 2020-01-26 Oath ft. Becky G-Cher Lloyd 49
## 3 2018-09-12 Nunca Me Olvides - Remix-Yandel 33
## 4 2019-11-04 I Don't Want to Know - 2004 Remaster-Fleetwood Mac 33
## 5 2019-02-28 Before I Cry-Lady Gaga 32
## 6 2018-10-10 Halley Came To Jackson-Mary Chapin Carpenter 29
## 7 2020-05-20 Inside Out-Camila Cabello 29
## 8 2020-10-22 Mustang-Abby Christo 29
And there we go, on 8th April, 2019, I played Break Free by Ariana Grande 51 times. Only slightly excessive :P.
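As an aside, count() can do the grouping itself when given several columns, so the unite() step is optional: counting on song_title and artist directly gives essentially the same grouping, without relying on a "-" separator that can also appear inside a title (see 'Nunca Me Olvides - Remix-Yandel' above). The sketch below, on made-up rows, also drops the now-playing row, whose NA date would otherwise form its own group, and returns an ungrouped tibble rather than the grouped result shown above. It's one possible variant, not a correction of the pipeline.

```r
library(dplyr)

# Made-up stand-in for my_scrobbles; the first row mimics the
# "now playing" track, which has no date yet.
toy_scrobbles <- tibble(
  date_parsed = as.Date(c(NA, "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02")),
  song_title  = c("Song A", "Song A", "Song A", "Song B", "Song A"),
  artist      = "Artist X"
)

top_days <- toy_scrobbles %>%
  filter(!is.na(date_parsed)) %>%            # drop the now-playing row
  count(date_parsed, song_title, artist,     # count() groups by these columns itself
        name = "times_played", sort = TRUE)  # sort = TRUE puts the largest counts first

top_days
```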
While a simple analysis, this does show some of the cool things you can do with scrobbler. It also raises some things we need to investigate. For instance, as I mentioned at the start, Spotify thinks I played Mustang over 50 times in one day, but my data here says it was only played 29 times. Given Spotify's enormous engineering team, I doubt the issue is on their side, so it'll be an exercise for me to figure out whether the issue is in the Last.fm API, the scrobblers themselves, or my implementation of scrobbler…
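One cheap thing to rule out first (an untested hypothesis, not a diagnosis) is the definition of "a day". I don't know which time zone Spotify uses for Wrapped, and the same plays can land on one day or be split across two depending on where midnight falls. Here's a base R sketch with made-up timestamps; day_counts() is a hypothetical helper.

```r
# Made-up timestamps (Unix seconds): four plays around midnight UTC
# on 21/22 October 2020. 1603324800 is 2020-10-22 00:00:00 UTC.
midnight_utc <- 1603324800
plays_unix <- midnight_utc + c(-3600, -60, 60, 3600)

# Hypothetical helper: number of plays per calendar day in time zone `tz`.
day_counts <- function(secs, tz) {
  days <- as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = tz), tz = tz)
  table(format(days))
}

day_counts(plays_unix, "UTC")               # 2 plays on 2020-10-21, 2 on 2020-10-22
day_counts(plays_unix, "America/New_York")  # all 4 plays on 2020-10-21 (UTC-4 in October)
```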
Take care…