Overview
Since 2008 I’ve been uploading (called “scrobbling”) my listens to my Last.fm profile, so I have quite the trove of data. I then sync my Last.fm listen history to ListenBrainz, mostly for data redundancy.
In this post I’ll use VisiData to explore my listen history. VisiData makes it really easy to wrangle tabular data, and is an indispensible tool in any engineer’s toolkit. My two main goals for this post are:
- Showcase how VisiData can be used to manipulate and explore tabular data.
- Gain more familiarity with the tool myself.
Setup
First I sync my listens from Last.fm to ListenBrainz. I do this mostly for data redundancy, but ListenBrainz also makes it convenient to export my listens. Last.fm doesn’t provide such an export service.
ListenBrainz exports into a JSON file. Once we have this file we can proceed to the next step, where we clean up the data.
Clean the data
Before we begin exploring our data, we should clean it up first. VisiData supports not just CSVs, but JSON as well, so we can open the JSON file directly. Open the file downloaded from ListenBrainz with the following command.
$ vd listens.json
Here’s what VisiData looks like.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu track_metadata | listened_at#| recording_msid | user_name | inserted_at#║ {4} | 1645359422 | 122938e1-5157-432b…| hbenjamin | 1645375360 ║ {4} | 1645359060 | 3e027b89-a08f-4664…| hbenjamin | 1645375360 ║ {4} | 1645358737 | 7f327fee-1d9e-460c…| hbenjamin | 1645375360 ║ {4} | 1645358456 | c2c55801-bb73-498f…| hbenjamin | 1645375360 ║ {4} | 1645358120 | b4eea3be-23dd-4982…| hbenjamin | 1645375360 ║ {4} | 1645357874 | 32b5380d-3745-45f2…| hbenjamin | 1645375360 ║ {5} | 1645357642 | 27cd64c9-25fb-45c6…| hbenjamin | 1645375360 ║ {5} | 1645357445 | 0e29105e-f268-4cbf…| hbenjamin | 1645375360 ║ {5} | 1645357131 | 602060cc-589e-4adb…| hbenjamin | 1645375360 ║ {5} | 1645356867 | 60d1968a-bda5-4749…| hbenjamin | 1645375360 ║ {5} | 1645356628 | c329612b-426c-4272…| hbenjamin | 1645375360 ║ {5} | 1645356287 | cb3c39b9-8d69-4b0a…| hbenjamin | 1645375360 ║ {5} | 1645356049 | 7b891d32-7de8-4d12…| hbenjamin | 1645375360 ║ {5} | 1645355785 | b4f66b79-d157-45d5…| hbenjamin | 1645375360 ║ {5} | 1645355474 | 1f81475f-0e31-4d7d…| hbenjamin | 1645375360 ║ {5} | 1645355171 | 3d3e1488-d6d7-4d25…| hbenjamin | 1645375360 ║ {5} | 1645355137 | 4bfafb45-7560-461f…| hbenjamin | 1645375360 ║ {5} | 1645286392 | 0511a065-fd94-4202…| hbenjamin | 1645375360 ║ {5} | 1645286079 | 5cff0e56-299a-4c85…| hbenjamin | 1645375360 ║ {5} | 1645174407 | aedf1436-d310-4934…| hbenjamin | 1645375360 ║ {5} | 1645165001 | b11944f4-9987-470b…| hbenjamin | 1645375360 ║ {5} | 1645164652 | dfa87939-e1d0-4713…| hbenjamin | 1645375360 ║ 1› listens| saul.pw/VisiData v2.8 | opening /tmp/listens.json as json | Someo 94081 rows •0
I immediately noticed that about 1500 records have a malformed listened_at
value.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu track_metadata | listened_at#| recording_msid | user_name | inserted_at#║ {5} | 12 | 7ee0fbe2-3e13-4dd2…| hbenjamin | 1595522511 ║ {5} | 11 | cec1b686-8a57-438f…| hbenjamin | 1595522511 ║ {5} | 10 | cec1b686-8a57-438f…| hbenjamin | 1595522511 ║ {5} | 9 | 24cd6cdb-a089-449f…| hbenjamin | 1595522511 ║ {5} | 8 | 93fbf716-2e82-48cf…| hbenjamin | 1595522511 ║ {5} | 7 | 9af6db9c-8ec8-49dd…| hbenjamin | 1595522511 ║ {5} | 6 | 9af6db9c-8ec8-49dd…| hbenjamin | 1595522511 ║ {5} | 5 | 1a8ee0ac-6bf6-4614…| hbenjamin | 1595522511 ║ {4} | 4 | 51e187d2-d463-44a6…| hbenjamin | 1595522511 ║ {5} | 3 | 8d8a4f12-d713-4fff…| hbenjamin | 1595522511 ║ {5} | 2 | 3b6a31bd-3c28-4757…| hbenjamin | 1595522511 ║ {5} | 1 | 3b6a31bd-3c28-4757…| hbenjamin | 1595522511 ║ 1› listens| j go-down 94081 rows •0
It’s supposed to be a unix timestamp, but these records have an autoincrementing integer from 1 to about 1500 instead. We’ll remove these records.
- Select valid unix timestamps using a Python expression with
z| listened_at > 100000
. - Open the selection in new sheet with
"
.
Next we’ll remove some uninteresting tracks. I often listen to an internet radio station called soma.fm, which has the occasional ad break. These ads end up being registered in my listening history.
- Expand the
track_metadata
column with(
. - Select the
track_metadata.artist_name
column. - Create a frequency table with
Shift+F
, this will open a new sheet with a histogram. - Select rows in the frequency table containing “soma” with
|soma
. - Open a new sheet with
"
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu track_metadata.artist_name ║ count♯| percent%| histogram ~║ Somafm Id ║ 116 | 0.12 | * ║ SomaFM ║ 97 | 0.10 | * ║ SomaFM Secret Agent ║ 67 | 0.07 | * ║ SomaFM DropIn ║ 34 | 0.04 | ║ Soma Fm Spy Numbers ║ 18 | 0.02 | ║ SomaFM Secret Agent Sfx ║ 14 | 0.01 | ║ SomaFM Secret Agent Dropin ║ 9 | 0.01 | ║ SomaFM Begathon ║ 9 | 0.01 | ║ New SomaFM ID ║ 2 | 0.00 | ║ Soma Sonic ║ 2 | 0.00 | ║ SomaFM Xmas in Frisko ║ 1 | 0.00 | ║ SomaFM Xmas Dropin ║ 1 | 0.00 | ║ 3› listens_track_metadata.artist_name_freq_selectedref| KEY_RESIZE redraw 12 bins •0
From this we can see that there are about 300 of these tracks.
- Select all rows in the frequency table with
gs
. - Go back to the original sheet with
q
. Notice that the rows in the original sheet that were in the histogram buckets are now also selected. - Invert the selection with
gt
. Now we are selecting every row except the soma.fm ads. - Create a new sheet with
"
.
Next we’ll rename some of the columns. Since we opened a JSON file with
nested objects, our sheet contains many columns prefixed with
track_metadata.
. Let’s start by flattening the sheet.
- Expand the nested columns using
(
. - Open the Columns sheet with
Shift+C
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu name ║ width#| height#| type | fmtstr | value > track_metadata ║ 0 | 1 | | | {4} track_metadata.artist_name ║ 63 | 1 | | | Rudimental/A… track_metadata.release_name ║ 29 | 1 | | | Home (Deluxe… track_metadata.additional_info ║ 0 | 1 | | | {5} track_metadata.additional_info.artist_msid ║ 44 | 1 | | | 1193820a-2fa… track_metadata.additional_info.release_msid ║ 45 | 1 | | | c2e8775c-722… track_metadata.additional_info.listening_from ║ 47 | 1 | | | lastfm track_metadata.additional_info.recording_mbid ║ 47 | 1 | | | track_metadata.additional_info.recording_msid ║ 47 | 1 | | | c2c55801-bb7… track_metadata.additional_info.lastfm_artist_mbid ║ 51 | 1 | | | track_metadata.additional_info.lastfm_release_mbid ║ 52 | 1 | | | track_metadata.additional_info.lastfm_track_mbid ║ 50 | 1 | | | 1fd04c2e-940… track_metadata.track_name ║ 69 | 1 | | | Hell Could F… track_metadata.mbid_mapping ║ 29 | 1 | | | listened_at ║ 13 | 1 | int | | 1645358456 recording_msid ║ 38 | 1 | | | c2c55801-bb7… user_name ║ 11 | 1 | | | hbenjamin inserted_at ║ 13 | 1 | int | | 1645375360 Shift+C› listens_selectedref_selectedref_columns| j go-down 18 columns •0
Let’s start by removing the track_metadata.
prefix from these columns. We can
do this one by one by using e
to edit each cell, but let’s do it in bulk with
a search-and-replace instead.
- Select columns containing
track_metadata.
with| track_metadata\.
. - Search and replace selected values with
g* track_metadata\./
(g*
accepts input in the form<regex>/<substitution>
).
File Edit View Column Row Data Plot System Help Ctrl+H for help menu name ║ width#| height#| type | fmtstr | value > track_metadata ║ 0 | 1 | | | {4} artist_name ║ 63 | 1 | | | Rudimental/A… release_name ║ 29 | 1 | | | Home (Deluxe… additional_info ║ 0 | 1 | | | {5} additional_info.artist_msid ║ 44 | 1 | | | 1193820a-2fa… additional_info.release_msid ║ 45 | 1 | | | c2e8775c-722… additional_info.listening_from ║ 47 | 1 | | | lastfm additional_info.recording_mbid ║ 47 | 1 | | | additional_info.recording_msid ║ 47 | 1 | | | c2c55801-bb7… additional_info.lastfm_artist_mbid ║ 51 | 1 | | | additional_info.lastfm_release_mbid ║ 52 | 1 | | | additional_info.lastfm_track_mbid ║ 50 | 1 | | | 1fd04c2e-940… track_name ║ 69 | 1 | | | Hell Could F… mbid_mapping ║ 29 | 1 | | | listened_at ║ 13 | 1 | int | | 1645358456 recording_msid ║ 38 | 1 | | | c2c55801-bb7… user_name ║ 11 | 1 | | | hbenjamin inserted_at ║ 13 | 1 | int | | 1645375360 Shift+C› listens_selectedref_selectedref_columns| j go-down 18 columns [M] •0
Previously we’ve been looking at the Columns sheet. Let’s take a look at the Statistics sheet now. It’s similar to the Columns sheet, but has some basic statistics about the values in each column.
- On the main sheet, go to the Statistics sheet with
Shift+I
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu column ║ errors♯| nulls♯| distinct♯| mode~| min~| max~| sum | median~> artist_name ║ 0 | 0 | 5940 | E.S.…| | | | | release_name ║ 0 | 12065 | 874 | Cart…| | | | | additional_info.artist_msid ║ 0 | 0 | 5971 | e665…| | | | | additional_info.release_msid ║ 0 | 12065 | 926 | 17b5…| | | | | additional_info.listening_from ║ 0 | 0 | 1 | last…| | | | | additional_info.recording_mbid ║ 0 | 24986 | 7658 | 00bd…| | | | | additional_info.recording_msid ║ 0 | 0 | 17970 | cd48…| | | | | additional_info.lastfm_artist_mbid ║ 0 | 23291 | 1794 | 391c…| | | | | additional_info.lastfm_release_mbid ║ 0 | 23241 | 739 | 977f…| | | | | additional_info.lastfm_track_mbid ║ 0 | 81746 | 4123 | 006c…| | | | | track_name ║ 0 | 0 | 14544 | Intro| | | | | mbid_mapping ║ 84524 | 8933 | 1 | !| | | | | listened_at ║ 0 | 0 | 93447 | 1624…| 1 | 164…| 12…#| 140665…| recording_msid ║ 0 | 0 | 17970 | cd48…| | | | | user_name ║ 0 | 0 | 1 | hben…| | | | | inserted_at ║ 0 | 0 | 1638 | 1642…| 159…| 164…| 14…#| 159552…| 11› listens_selectedref_selectedref_describe| j go-down 16 columns •0
Here we can see some interesting information, like how many distinct artists, albums and tracks I’ve listened to. We’ll do more analysis later, for now I’d like to draw your eye to columns with only 1 distinct value.
Columns with only 1 distinct value tend not to be interesting. The column
listened_from
that contains only the value lastfm
. The user_name
column
contains only my username. Let’s delete these columns.
- Return to the Columns sheet with
Shift+C
. - Delete the
listened_from
anduser_name
columns withd
.
Let’s expand all our remaining columns. You’ll notice that we have quite a few ID columns, either MSID or MBID. MSIDs are MessyBrainz IDs whereas MBIDs are MusicBrainz IDs. In short, MSIDs represent unclean data, and MBIDs represent clean data.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu name ║ width#| height#| type | fmtstr | value | expr > additional_info.artist_msid ║ 38 | 1 | | | 548b0bb6-80b6-43c5…| additi… additional_info.release_msid ║ 38 | 1 | | | c2e8775c-722f-4cff…| additi… additional_info.recording_msid ║ 38 | 1 | | | 27cd64c9-25fb-45c6…| additi… recording_msid ║ 38 | 1 | | | 27cd64c9-25fb-45c6…| record… additional_info.lastfm_track_mbid ║ 38 | 1 | | | 18b9ade6-6791-4b27…| additi… additional_info.lastfm_artist_mbid ║ 38 | 1 | | | b09ced80-aa65-43c1…| additi… mbid_mapping ║ 0 | 1 | | | {3} | mbid_m… mbid_mapping.recording_mbid ║ 38 | 1 | | | ddf2e8df-f671-4975…| mbid_mapping.release_mbid ║ 38 | 1 | | | ec6d908f-cf79-4225…| mbid_mapping.artist_mbids ║ 0 | 1 | | | [1] | mbid_mapping.artist_mbids[0] ║ 38 | 1 | | | b09ced80-aa65-43c1…| mbid_mapping.artist_mbids[1] ║ 38 | 1 | | | | mbid_mapping.artist_mbids[2] ║ 38 | 1 | | | | mbid_mapping.artist_mbids[3] ║ 38 | 1 | | | | mbid_mapping.artist_mbids[4] ║ 38 | 1 | | | | additional_info.lastfm_release_mbid ║ 38 | 1 | | | | additi… additional_info.recording_mbid ║ 32 | 1 | | | | additi… 4› listens-cleaned_columns_selectedref| KEY_RESIZE redraw 17 columns •0
Let’s get rid of the MSIDs since these represent unclean data. We’ll also get
rid of all the fields populated by Last.fm and only use those populated by
MusicBrainz. Basically, we’ll just keep the ID fields
mbid_mapping.artist_mbids[0]
, mbid_mapping.release_mbid
,
mbid_mapping.recording_mbid
. We’ll also remove the mbid_mapping.
prefix.
These are the columns we have now.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu name ║ width#| height#| type | fmtstr | value | expr | aggregators ║ artist_name ║ 20 | 1 | | | Rudimental/Becky H…| 0 #| ║ release_name ║ 20 | 1 | | | Home (Deluxe Editi…| 1 #| ║ track_name ║ 20 | 1 | | | Powerless | 2 #| ║ listened_at ║ 13 | 1 | | | 1645359422 | 3 #| ║ recording_mbid ║ 20 | 1 | | | | 4 #| ║ release_mbid ║ 20 | 1 | | | | 5 #| ║ artist_mbid ║ 38 | 1 | | | | 6 #| ║ Shift+C› listens-cleaned_columns| k go-up 7 columns •0
Looks good so far. Save it with Ctrl+S
. VisiData will prompt for the output
filepath. We’ll save it as a CSV file now instead of JSON. Call it
listens-cleaned.csv
.
Artists with the most listens
Let’s start with something simple: finding the artists with the most listens. If you followed the section where we cleaned up the data, you might already have an idea of how to accomplish this.
- Select the
artist_name
column - Open a frequency table with
Shift+F
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name ║↓count♯| percent%| histogram ~║ E.S. Posthumus ║ 2963 | 3.22 | ************************************************** ║ Queen ║ 2762 | 3.01 | ********************************************** ║ Black Sabbath ║ 2562 | 2.79 | ******************************************* ║ Nightwish ║ 1694 | 1.84 | **************************** ║ Within Temptation ║ 1674 | 1.82 | **************************** ║ Amon Amarth ║ 1631 | 1.77 | *************************** ║ Flobots ║ 1613 | 1.76 | *************************** ║ Opeth ║ 1594 | 1.73 | ************************** ║ Pendulum ║ 1556 | 1.69 | ************************** ║ Florence + the Mac…║ 1449 | 1.58 | ************************ ║ The xx ║ 1405 | 1.53 | *********************** ║ Delain ║ 1377 | 1.50 | *********************** ║ Pantera ║ 1286 | 1.40 | ********************* ║ Iron Maiden ║ 1276 | 1.39 | ********************* ║ Bon Jovi ║ 1224 | 1.33 | ******************** ║ Epica ║ 1158 | 1.26 | ******************* ║ Massive Attack ║ 1128 | 1.23 | ******************* ║ Rainbow ║ 1119 | 1.22 | ****************** ║ Dark Tranquillity ║ 1065 | 1.16 | ***************** ║ AC/DC ║ 1052 | 1.14 | ***************** ║ Metallica ║ 967 | 1.05 | **************** ║ Avenged Sevenfold ║ 955 | 1.04 | **************** ║ Krewella ║ 952 | 1.04 | **************** ║ Judas Priest ║ 904 | 0.98 | *************** ║ 2› listens-cleaned_artist_name_freq| Shift+F 5940 bins •0
It’s that simple! From the frequency table we can also hit Enter
on any row,
VisiData will show us all the listens from that artist.
The frequency table is also just a sheet like any other, so you can filter, search, and manipulate this sheet however you wish.
Albums with the most listens
Let’s find the top albums next. We could simply do the same as the top artists
and use Shift+F
on the release_name
field, but this doesn’t account for
different artists with the same album name. Instead we can use key columns to
tell VisiData what the unique key should be when generating the histogram.
- Select the
artist_name
column and use!
to mark it as a key column. The column will turn blue to indicate it is now a key column. - Select the
release_name
column and use!
to mark it as a key column. - Open a histogram with
gF
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name | release_name ║↓count♯| percent%| histogram E.S. Posthumus | Cartographer ║ 1603 | 1.74 | *************************************… The xx | xx ║ 1307 | 1.42 | *************************************… Bon Jovi | Crossroad ║ 1034 | 1.13 | ******************************** Florence + the Mac…| Lungs ║ 1025 | 1.12 | ******************************* Queen | Greatest Hits I ║ 978 | 1.06 | ****************************** Queen | Greatest Hits II ║ 916 | 1.00 | **************************** Amon Amarth | With Oden On Our S…║ 908 | 0.99 | **************************** Dio | Holy Diver ║ 895 | 0.97 | *************************** Queen | Greatest Hits III ║ 866 | 0.94 | *************************** E.S. Posthumus | Unearthed ║ 850 | 0.92 | ************************** Flobots | Fight With Tools ║ 846 | 0.92 | ************************** Pendulum | Hold Your Colour ║ 834 | 0.91 | ************************** Within Temptation | Silent Force ║ 775 | 0.84 | ************************ Flobots | Survival Story ║ 766 | 0.83 | *********************** Grimes | Visions ║ 644 | 0.70 | ******************** Pantera | Cowboys From Hell ║ 614 | 0.67 | ******************* Portishead | Dummy ║ 585 | 0.64 | ****************** Delain | Lucidity ║ 565 | 0.61 | ***************** Within Temptation | The Heart of Every…║ 538 | 0.59 | **************** Pendulum | In Sillico ║ 532 | 0.58 | **************** Opeth | Ghost Reveries ║ 519 | 0.56 | **************** E.S. Posthumus | Makara ║ 510 | 0.55 | *************** Delain | April Rain ║ 495 | 0.54 | *************** OceanLab | Sirens of the Sea ║ 487 | 0.53 | *************** 3› listens-cleaned_artist_name-release_name_freq| gShift+F 6608 bins •0
Tip: Many VisiData commands can be prefixed with
g
for a “global” version. For instance,s
selects a single row whilegs
selects all rows.Shift+F
opens a frequency table on the selected column, whilegF
opens one on the key columns.
Top artists for each year
Let’s find which were my most listened-to artists for each given year. To do so
we need to derive a year
column from our listened_at
column, which is
currently just a unix timestamp.
- Set the
listened_at
column’s data type to integer with#
. - Create a new column using a Python expression using
= datetime.datetime.fromtimestamp(listened_at).year
- Set the new column’s data type to integer with
#
- Rename the new column with
^
. Name ityear
.
We should have something that looks like this:
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name | release_name | track_name | listened_at #| year#| reco> Within Temptation | The Unforgiving | In the Mid…| 1645355474 | 2022 | 20fb… Within Temptation | The Unforgiving | Shot in th…| 1645355171 | 2022 | bbbe… Within Temptation | The Unforgiving | Why Not Me | 1645355137 | 2022 | e97d… AC/DC | Back in Black | Shoot to T…| 1645286392 | 2022 | 994b… AC/DC | Back in Black | Hells Bells| 1645286079 | 2022 | c7c2… 1› listens-cleaned| l go-right 91895 rows •0
To see the artists with the most listens per year, we can use a pivot table.
- Mark the
artist_name
column as a key column with!
. - Mark the
year
column as a key column with!
. - Select the
year
column and pivot the table around it withShift+W
.
We should now have something that looks like this:
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name ║↓2022#| 2021#| 2020#| 2019#| 2018#| 2017#| 2016#| 2015#| 2014#| 2013#|> Celestial Aeon Project ║ 126 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Black Sabbath ║ 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Purity Ring ║ 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Nightwish ║ 36 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Queen ║ 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Moderator ║ 29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Dio ║ 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Ludovico Einaudi ║ 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Iron Maiden ║ 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Blackmore's Night ║ 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Scar Symmetry ║ 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Within Temptation ║ 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 DragonForce ║ 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 Massive Attack ║ 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |0 28› listens-cleaned_pivot_year_count| KEY_RESIZE redraw 8264 grouped rows •0
To view the top artists for different years, simply select the year and use ]
to sort in descending order. Here’s my top artists five years ago in 2017.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name ║ 2022#| 2021#| 2020#| 2019#| 2018#|↓2017#| 2016#| 2015#| 2014#| 2013#|> E.S. Posthumus ║ 0 | 0 | 0 | 0 | 0 | 240 | 0 | 0 | 0 | 0 |0 Celestial Aeon Project ║ 0 | 0 | 0 | 0 | 0 | 200 | 0 | 0 | 0 | 0 |0 Queen ║ 0 | 0 | 0 | 0 | 0 | 188 | 0 | 0 | 0 | 0 |0 Amon Amarth ║ 0 | 0 | 0 | 0 | 0 | 186 | 0 | 0 | 0 | 0 |0 Amaranthe ║ 0 | 0 | 0 | 0 | 0 | 168 | 0 | 0 | 0 | 0 |0 Florence + the Machine ║ 0 | 0 | 0 | 0 | 0 | 158 | 0 | 0 | 0 | 0 |0 Within Temptation ║ 0 | 0 | 0 | 0 | 0 | 145 | 0 | 0 | 0 | 0 |0 Avenged Sevenfold ║ 0 | 0 | 0 | 0 | 0 | 143 | 0 | 0 | 0 | 0 |0 Black Sabbath ║ 0 | 0 | 0 | 0 | 0 | 133 | 0 | 0 | 0 | 0 |0 Carly Rae Jepsen ║ 0 | 0 | 0 | 0 | 0 | 127 | 0 | 0 | 0 | 0 |0 Nightwish ║ 0 | 0 | 0 | 0 | 0 | 102 | 0 | 0 | 0 | 0 |0 Iron Maiden ║ 0 | 0 | 0 | 0 | 0 | 101 | 0 | 0 | 0 | 0 |0 Arch Enemy ║ 0 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 |0 Epica ║ 0 | 0 | 0 | 0 | 0 | 93 | 0 | 0 | 0 | 0 |0 28› listens-cleaned_pivot_year_count| ] sort-desc 8264 grouped rows •0
The same pivot approach can be used to find the top albums for each year. Make both the
artist_name
and release_name
key columns, then pivot on the year
column again.
Top 20 artists for each year
Let’s do something similar to the previous section, but this time we select the
top 20 artists for each year. The goal is to have a sheet with 21 columns: 1
year
column and 20 topN
columns.
| year | top1 | top2 | top3 | ... | top20 |
We’ll start by generating a histogram.
- Mark the
year
andartist_name
columns as key columns with!
- Create a frequency table on our key columns with
gF
- Sort the
year
column in descending order with]
. - Sort the
count
column as well. This time we add it to existing sort criteria withz]
.
Now we are sorting the frequency table by year descending, then count descending.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu ↓year #| artist_name ║ count♯| percent%| histogram 2022 | Celestial Aeon Pro…║ 126 | 0.14 | ********* 2022 | Black Sabbath ║ 86 | 0.09 | ****** 2022 | Purity Ring ║ 40 | 0.04 | ** 2022 | Nightwish ║ 36 | 0.04 | ** 2022 | Queen ║ 33 | 0.04 | ** 2022 | Moderator ║ 29 | 0.03 | ** 2022 | Dio ║ 24 | 0.03 | * 2022 | Ludovico Einaudi ║ 22 | 0.02 | * 2022 | Iron Maiden ║ 22 | 0.02 | * 2› listens-cleaned_year-artist_name_freq| KEY_RESIZE redraw 8718 bins •0
Next we’ll flatten the sheet to get our desired column layout. To do so we’ll
utilize frequency tables again, but this time with a list
aggregator applied
to the artist_name
column.
- Apply an aggregator on the
artist_name
column with+
. Typelist
or pressCtrl+X
for an interactive menu. - Create a frequency table on the
year
column withShift+F
.
With the list
aggregator on the artist_name
column, when VisiData creates
the frequency table, it also aggregates the values of that column into a list.
The frequency table looks like this.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu year #║ count♯| artist_name_list ║ 2022 ║ 335 | [335] ║ 2021 ║ 1537 | [1537] ║ 2020 ║ 1129 | [1129] ║ 2019 ║ 24 | [24] ║ 6› listens-cleaned_year-artist_name_freq_year_freq| KEY_RESIZE redraw 15 bins •0
The artist_name_list
column is an array, so can be expanded with (
. The
sheet now looks like this.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu year#║ count♯| artist_name_list[0] | artist_name_list[1] | artist_name_list[2] | artist_nam> 2022 ║ 335 | Celestial Aeon Project | Black Sabbath | Purity Ring | Nightwish 2021 ║ 1537 | Nightwish | Epica | Within Temptation | ABBA 2020 ║ 1129 | Nightwish | Within Temptation | Black Sabbath | Moderator 2019 ║ 24 | Dance With The Dead | The xx | Bhelliom | Avenged Se… 12› listens-cleaned_year-artist_name_freq_year_freq| KEY_RESIZE redraw 15 bins •0
We only want the top 20, so let’s use a Python expression to extract just the first 20 elements from this array.
- Collapse the
artist_name_list
column again with)
. - Create a new column using a Python expression with
= artist_name_list[:20]
. - Expand the new column with
(
.
The new columns look like:
File Edit View Column Row Data Plot System Help Ctrl+H for help menu year#║ artist_name_list[:20][0] | artist_name_list[:20][1] | artist_name_list[:20][2] | artist_na> 2022 ║ Celestial Aeon Project | Black Sabbath | Purity Ring | Nightwish 2021 ║ Nightwish | Epica | Within Temptation | ABBA 2020 ║ Nightwish | Within Temptation | Black Sabbath | Moderator 2019 ║ Dance With The Dead | The xx | Bhelliom | Avenged S… 12› listens-cleaned_year-artist_name_freq_year_freq| k go-up 15 bins •0
They’re not very nicely named, so let’s clean it up. We can rename it one-by-one, but I’ll just hack out something in Python to do so.
- Open the Columns sheet with
Shift+C
. - Select the columns using a search with
| artist_name_list
. - Rename the columns using a Python expression with
g= 'top' + str(int(name.split('][')[-1][:-1]) + 1)
. Not the cleanest, but it’ll work.
Now we have the sheet in our desired format. We can see from a quick glance there are some artists that consistently appear in my top 20 over the years.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu year#║ top1 | top2 | top3 | top4 | top5 > 2022 ║ Celestial Aeon Project | Black Sabbath | Purity Ring | Nightwish | Queen 2021 ║ Nightwish | Epica | Within Temptation | ABBA | Pantera 2020 ║ Nightwish | Within Temptation | Black Sabbath | Moderator | ABBA 2019 ║ Dance With The Dead | The xx | Bhelliom | Avenged Sevenfold | Timecop1983 2018 ║ Dance With The Dead | Carly Rae Jepsen | Blackmore's Night | Slow Club | Queen 2017 ║ E.S. Posthumus | Celestial Aeon Project | Queen | Amon Amarth | Amaranthe 2016 ║ E.S. Posthumus | The xx | Black Sabbath | Grimes | Massive Atta… 2015 ║ Queen | The Glitch Mob | Ludovico Einaudi | E.S. Posthumus | Rainbow 2014 ║ Krewella | Queen | E.S. Posthumus | Grimes | Within Tempt… 2013 ║ Krewella | E.S. Posthumus | Queen | Purity Ring | Flobots 2012 ║ Queen | E.S. Posthumus | Dragonette | Flobots | Florence + t… 2011 ║ Flobots | Pendulum | Black Sabbath | The xx | Queen 2010 ║ Black Sabbath | Florence + the Machine | Flobots | E.S. Posthumus | Overkill 2009 ║ Black Sabbath | Opeth | E.S. Posthumus | AC/DC | Queen 2008 ║ Opeth | Amon Amarth | Dark Tranquillity | Iron Maiden | Judas Priest 12› listens-cleaned_year-artist_name_freq_year_freq| k go-up 15 bins •0
Artists with the most staying power over the years
In the previous section we found the top 20 artists for each year. Let’s extend from those results and find the artists that have appeared the most in my 14 years of listens.
- Melt the sheet with
Shift+M
. A melt is an inverse of a pivot.
The sheet now looks like this.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu year#║ Variable | Value ║ 2022 ║ top1 | Celestial Aeon Project ║ 2022 ║ top2 | Black Sabbath ║ 2022 ║ top3 | Purity Ring ║ 2022 ║ top4 | Nightwish ║ 2022 ║ top5 | Queen ║ 2022 ║ top6 | Moderator ║ 2022 ║ top7 | Dio ║ 2022 ║ top8 | Ludovico Einaudi ║ 2022 ║ top9 | Iron Maiden ║ 2022 ║ top10 | Blackmore's Night ║ 2022 ║ top11 | Scar Symmetry ║ 2022 ║ top12 | Within Temptation ║ 2022 ║ top13 | DragonForce ║ 2022 ║ top14 | Massive Attack ║ 2022 ║ top15 | Ensiferum ║ 2022 ║ top16 | Amon Amarth ║ 15› listens-cleaned_year-artist_name_freq_year_freq_melted| h go-left 300 melted values •0
By default when VisiData does a melt, it adds the max
aggregator to the
Value
column. Let’s first remove it by going into the Columns sheet with
Shift+C
and using e
to edit the aggregators cell for the Value
column.
Now we can simply create a frequency table to get our results.
- Select the
Value
column. - Create a frequency table with
Shift+F
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu Value ║↓count♯| percent%| histogram ~║ Black Sabbath ║ 13 | 4.33 | ************************************************** ║ Queen ║ 12 | 4.00 | ********************************************** ║ Amon Amarth ║ 11 | 3.67 | ****************************************** ║ Florence + the Machine ║ 10 | 3.33 | ************************************** ║ E.S. Posthumus ║ 10 | 3.33 | ************************************** ║ Within Temptation ║ 9 | 3.00 | ********************************** ║ Delain ║ 9 | 3.00 | ********************************** ║ Pendulum ║ 9 | 3.00 | ********************************** ║ Iron Maiden ║ 8 | 2.67 | ****************************** ║ The xx ║ 8 | 2.67 | ****************************** ║ Nightwish ║ 7 | 2.33 | ************************** ║ Massive Attack ║ 7 | 2.33 | ************************** ║ Slow Club ║ 6 | 2.00 | *********************** ║ Avenged Sevenfold ║ 6 | 2.00 | *********************** ║ Bon Jovi ║ 6 | 2.00 | *********************** ║ Rainbow ║ 6 | 2.00 | *********************** ║ Opeth ║ 6 | 2.00 | *********************** ║ Purity Ring ║ 5 | 1.67 | ******************* ║ Moderator ║ 5 | 1.67 | ******************* ║ Dio ║ 5 | 1.67 | ******************* ║ Dance With The Dead ║ 5 | 1.67 | ******************* ║ Amaranthe ║ 5 | 1.67 | ******************* ║ Blind Guardian ║ 5 | 1.67 | ******************* ║ Flobots ║ 5 | 1.67 | ******************* ║ Meiko ║ 5 | 1.67 | ******************* ║ Judas Priest ║ 5 | 1.67 | ******************* ║ 23› listens-cleaned_year-artist_name_freq_year_freq_melted_V KEY_RESIZE redraw 80 bins •0
Tip: Pressing
<Enter>
on any row in a frequency table will open a new sheet with the rows in the source sheet that comprise the histogram bucket. In this case<Enter>
will show you which years the selected artist was in the top-20 for.
As you might have noticed by now, the ability to quickly create
frequency tables is extremely convenient. In fact, during my day-to-day work,
whenever I have to explore some data, I can very quickly do so with a
combination to Shift+F
, filtering with <Enter>
and searching with |
. It’s
almost a superpower.
Day of week with the most listens on average
Let’s find out on which day of the week I have the most listens on. My intuition is that most listens will be clustered around weekdays, since I tend to leave my music on in the background while I work.
- Create a new column containing the day of week using a Python expression
with
= datetime.datetime.fromtimestamp(listened_at).strftime("%A")
. - Rename the new column with
^
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu artist_name | release_name | track_name | listened_at#| day_of_week |> Within Temptation | The Unforgiving | Fire and Ice | 1645356049 | Sunday |… Within Temptation | The Unforgiving | Faster | 1645355785 | Sunday |… Within Temptation | The Unforgiving | In the Middle of t…| 1645355474 | Sunday |… Within Temptation | The Unforgiving | Shot in the Dark | 1645355171 | Sunday |… Within Temptation | The Unforgiving | Why Not Me | 1645355137 | Sunday |… AC/DC | Back in Black | Shoot to Thrill | 1645286392 | Saturday |… AC/DC | Back in Black | Hells Bells | 1645286079 | Saturday |… The Glitch Mob | Drink the Sea | Animus Vox | 1645174407 | Friday |… Timecop1983 | Daydreaming | Hold Me (feat. Han…| 1645165001 | Friday |… Timecop1983 | Daydreaming | Dreaming (About Yo…| 1645164652 | Friday |… Timecop1983 | Childhood Memories | Burning Hearts (bo…| 1645164354 | Friday |… Timecop1983 | Childhood Memories | Space Jam | 1645164174 | Friday |… Timecop1983 | Childhood Memories | Into the Night | 1645163921 | Friday |… Timecop1983 | Childhood Memories | Summer Love | 1645163653 | Friday |… Timecop1983 | Childhood Memories | Renegade Dreams | 1645163353 | Friday |… Timecop1983 | Childhood Memories | To the Summit | 1645163074 | Friday |… 1› listens-cleaned| l go-right 91895 rows •0
- Open a frequency table on the
day_of_week
column withShift+F
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu day_of_week ║↓count♯| percent%| histogram ~║ Sunday ║ 15761 | 17.15 | ************************************************** ║ Saturday ║ 15034 | 16.36 | *********************************************** ║ Friday ║ 13535 | 14.73 | ****************************************** ║ Monday ║ 12022 | 13.08 | ************************************** ║ Wednesday ║ 11972 | 13.03 | ************************************* ║ Thursday ║ 11865 | 12.91 | ************************************* ║ Tuesday ║ 11706 | 12.74 | ************************************* ║ 4› listens-cleaned_day_of_week_freq| KEY_RESIZE redraw 7 bins •0
Seems like my intuition was wrong. I noticeably listen to more music over the weekend. At least, this is true across listens over the 14 year period. Let’s take a look at the distribution for the more recent years (2020-2022) instead.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu day_of_week ║↓count♯| percent%| histogram ~║ Wednesday ║ 3129 | 20.62 | ************************************************** ║ Thursday ║ 2552 | 16.82 | **************************************** ║ Tuesday ║ 2392 | 15.77 | ************************************** ║ Monday ║ 2375 | 15.65 | ************************************* ║ Friday ║ 2234 | 14.72 | *********************************** ║ Saturday ║ 1399 | 9.22 | ********************** ║ Sunday ║ 1091 | 7.19 | ***************** ║ 9› listens-cleaned_selectedref_day_of_week_freq| KEY_RESIZE redraw 7 bins •0
This is much more in line with what I expected. Weekdays tend to have significantly more listens. Wednesdays seem to be a strange anomaly, with considerably more listens compared to the other weekdays.
Time of day with the most listens on average
Similar to what we did for the day of week, let’s analyze during what time of day I have the most listens. In particular I’m interested to see how many of those listens happen during my work hours. I’ll only consider the years 2020-2022 for this analysis.
- Create a new column with the hour using a Python expression with
= datetime.datetime.fromtimestamp(listened_at).hour
. - Rename the new column with
^
. - Open a frequency table with
Shift+F
. - Sort the
hour
column in ascending order with[
.
File Edit View Column Row Data Plot System Help Ctrl+H for help menu ↑hour #║ count♯| percent%| histogram ~║ 0 ║ 1305 | 8.60 | ************************************* ║ 1 ║ 774 | 5.10 | ********************** ║ 2 ║ 307 | 2.02 | ******** ║ 3 ║ 81 | 0.53 | ** ║ 4 ║ 25 | 0.16 | ║ 8 ║ 5 | 0.03 | ║ 9 ║ 2 | 0.01 | ║ 10 ║ 68 | 0.45 | * ║ 11 ║ 330 | 2.18 | ********* ║ 12 ║ 458 | 3.02 | ************* ║ 13 ║ 556 | 3.66 | *************** ║ 14 ║ 1398 | 9.21 | **************************************** ║ 15 ║ 1746 | 11.51 | ************************************************** ║ 16 ║ 1615 | 10.64 | ********************************************** ║ 17 ║ 1181 | 7.78 | ********************************* ║ 18 ║ 875 | 5.77 | ************************* ║ 19 ║ 482 | 3.18 | ************* ║ 20 ║ 652 | 4.30 | ****************** ║ 21 ║ 973 | 6.41 | *************************** ║ 22 ║ 1098 | 7.24 | ******************************* ║ 23 ║ 1241 | 8.18 | *********************************** ║ 12› listens-cleaned_selectedref_hour_freq| KEY_RESIZE redraw 21 bins •0
Initially I didn’t think the frequency analysis looked that interesting, but there’s actually some insights we can draw from it.
- I typically head off to bed around 2 to 3am.
- My listens drop off during dinner time around 6 or 7pm, before ramping back up later in the night.
- Most listens begin around 2pm, after lunch time.
In closing
I hope this post showcased the usefulness of VisiData. It truly is a joy to use, being able to wrangle CSV data with a few keystrokes, as opposed to importing the data into a spreadsheet and mucking around with a GUI and formulas.
I use it quite often in my day-to-day work, for instance to analyze Sentry events. I also have a personal VisiData cheatsheet.