Exploring my listen history with VisiData

2022-03-10

Table of contents

Overview

Since 2008 I’ve been uploading (called “scrobbling”) my listens to my Last.fm profile, so I have quite the trove of data. I then sync my Last.fm listen history to ListenBrainz, mostly for data redundancy.

In this post I’ll use VisiData to explore my listen history. VisiData makes it really easy to wrangle tabular data, and is an indispensible tool in any engineer’s toolkit. My two main goals for this post are:

Showcase how VisiData can be used to manipulate and explore tabular data.
Gain more familiarity with the tool myself.

Setup

First I sync my listens from Last.fm to ListenBrainz. I do this mostly for data redundancy, but ListenBrainz also makes it convenient to export my listens. Last.fm doesn’t provide such an export service.

ListenBrainz exports into a JSON file. Once we have this file we can proceed to the next step, where we clean up the data.

Clean the data

Before we begin exploring our data, we should clean it up first. VisiData supports not just CSVs, but JSON as well, so we can open the JSON file directly. Open the file downloaded from ListenBrainz with the following command.

$ vd listens.json

Here’s what VisiData looks like.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata | listened_at#| recording_msid     | user_name | inserted_at#║
 {4}            |  1645359422 | 122938e1-5157-432b…| hbenjamin |  1645375360 ║
 {4}            |  1645359060 | 3e027b89-a08f-4664…| hbenjamin |  1645375360 ║
 {4}            |  1645358737 | 7f327fee-1d9e-460c…| hbenjamin |  1645375360 ║
 {4}            |  1645358456 | c2c55801-bb73-498f…| hbenjamin |  1645375360 ║
 {4}            |  1645358120 | b4eea3be-23dd-4982…| hbenjamin |  1645375360 ║
 {4}            |  1645357874 | 32b5380d-3745-45f2…| hbenjamin |  1645375360 ║
 {5}            |  1645357642 | 27cd64c9-25fb-45c6…| hbenjamin |  1645375360 ║
 {5}            |  1645357445 | 0e29105e-f268-4cbf…| hbenjamin |  1645375360 ║
 {5}            |  1645357131 | 602060cc-589e-4adb…| hbenjamin |  1645375360 ║
 {5}            |  1645356867 | 60d1968a-bda5-4749…| hbenjamin |  1645375360 ║
 {5}            |  1645356628 | c329612b-426c-4272…| hbenjamin |  1645375360 ║
 {5}            |  1645356287 | cb3c39b9-8d69-4b0a…| hbenjamin |  1645375360 ║
 {5}            |  1645356049 | 7b891d32-7de8-4d12…| hbenjamin |  1645375360 ║
 {5}            |  1645355785 | b4f66b79-d157-45d5…| hbenjamin |  1645375360 ║
 {5}            |  1645355474 | 1f81475f-0e31-4d7d…| hbenjamin |  1645375360 ║
 {5}            |  1645355171 | 3d3e1488-d6d7-4d25…| hbenjamin |  1645375360 ║
 {5}            |  1645355137 | 4bfafb45-7560-461f…| hbenjamin |  1645375360 ║
 {5}            |  1645286392 | 0511a065-fd94-4202…| hbenjamin |  1645375360 ║
 {5}            |  1645286079 | 5cff0e56-299a-4c85…| hbenjamin |  1645375360 ║
 {5}            |  1645174407 | aedf1436-d310-4934…| hbenjamin |  1645375360 ║
 {5}            |  1645165001 | b11944f4-9987-470b…| hbenjamin |  1645375360 ║
 {5}            |  1645164652 | dfa87939-e1d0-4713…| hbenjamin |  1645375360 ║
1› listens| saul.pw/VisiData v2.8 | opening /tmp/listens.json as json | Someo         94081 rows  •0

I immediately noticed that about 1500 records have a malformed listened_at value.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata | listened_at#| recording_msid     | user_name | inserted_at#║
 {5}            |          12 | 7ee0fbe2-3e13-4dd2…| hbenjamin |  1595522511 ║
 {5}            |          11 | cec1b686-8a57-438f…| hbenjamin |  1595522511 ║
 {5}            |          10 | cec1b686-8a57-438f…| hbenjamin |  1595522511 ║
 {5}            |           9 | 24cd6cdb-a089-449f…| hbenjamin |  1595522511 ║
 {5}            |           8 | 93fbf716-2e82-48cf…| hbenjamin |  1595522511 ║
 {5}            |           7 | 9af6db9c-8ec8-49dd…| hbenjamin |  1595522511 ║
 {5}            |           6 | 9af6db9c-8ec8-49dd…| hbenjamin |  1595522511 ║
 {5}            |           5 | 1a8ee0ac-6bf6-4614…| hbenjamin |  1595522511 ║
 {4}            |           4 | 51e187d2-d463-44a6…| hbenjamin |  1595522511 ║
 {5}            |           3 | 8d8a4f12-d713-4fff…| hbenjamin |  1595522511 ║
 {5}            |           2 | 3b6a31bd-3c28-4757…| hbenjamin |  1595522511 ║
 {5}            |           1 | 3b6a31bd-3c28-4757…| hbenjamin |  1595522511 ║
1› listens|                                                           j   go-down     94081 rows  •0

It’s supposed to be a unix timestamp, but these records have an autoincrementing integer from 1 to about 1500 instead. We’ll remove these records.

Select valid unix timestamps using a Python expression with z| listened_at > 100000.
Open the selection in new sheet with ".

Next we’ll remove some uninteresting tracks. I often listen to an internet radio station called soma.fm, which has the occasional ad break. These ads end up being registered in my listening history.

Expand the track_metadata column with (.
Select the track_metadata.artist_name column.
Create a frequency table with Shift+F, this will open a new sheet with a histogram.
Select rows in the frequency table containing “soma” with |soma.
Open a new sheet with ".

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata.artist_name ║ count♯| percent%| histogram                                         ~║
 Somafm Id                  ║   116 |    0.12 | *                                                  ║
 SomaFM                     ║    97 |    0.10 | *                                                  ║
 SomaFM Secret Agent        ║    67 |    0.07 | *                                                  ║
 SomaFM DropIn              ║    34 |    0.04 |                                                    ║
 Soma Fm Spy Numbers        ║    18 |    0.02 |                                                    ║
 SomaFM Secret Agent Sfx    ║    14 |    0.01 |                                                    ║
 SomaFM Secret Agent Dropin ║     9 |    0.01 |                                                    ║
 SomaFM Begathon            ║     9 |    0.01 |                                                    ║
 New SomaFM ID              ║     2 |    0.00 |                                                    ║
 Soma Sonic                 ║     2 |    0.00 |                                                    ║
 SomaFM Xmas in Frisko      ║     1 |    0.00 |                                                    ║
 SomaFM Xmas Dropin         ║     1 |    0.00 |                                                    ║
3› listens_track_metadata.artist_name_freq_selectedref|       KEY_RESIZE   redraw        12 bins  •0

From this we can see that there are about 300 of these tracks.

Select all rows in the frequency table with gs.
Go back to the original sheet with q. Notice that the rows in the original sheet that were in the histogram buckets are now also selected.
Invert the selection with gt. Now we are selecting every row except the soma.fm ads.
Create a new sheet with ".

Next we’ll rename some of the columns. Since we opened a JSON file with nested objects, our sheet contains many columns prefixed with track_metadata.. Let’s start by flattening the sheet.

Expand the nested columns using (.
Open the Columns sheet with Shift+C.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                               ║ width#| height#| type | fmtstr | value       >
 track_metadata                                     ║     0 |      1 |      |        | {4}          
 track_metadata.artist_name                         ║    63 |      1 |      |        | Rudimental/A…
 track_metadata.release_name                        ║    29 |      1 |      |        | Home (Deluxe…
 track_metadata.additional_info                     ║     0 |      1 |      |        | {5}          
 track_metadata.additional_info.artist_msid         ║    44 |      1 |      |        | 1193820a-2fa…
 track_metadata.additional_info.release_msid        ║    45 |      1 |      |        | c2e8775c-722…
 track_metadata.additional_info.listening_from      ║    47 |      1 |      |        | lastfm
 track_metadata.additional_info.recording_mbid      ║    47 |      1 |      |        |
 track_metadata.additional_info.recording_msid      ║    47 |      1 |      |        | c2c55801-bb7…
 track_metadata.additional_info.lastfm_artist_mbid  ║    51 |      1 |      |        |
 track_metadata.additional_info.lastfm_release_mbid ║    52 |      1 |      |        |
 track_metadata.additional_info.lastfm_track_mbid   ║    50 |      1 |      |        | 1fd04c2e-940…
 track_metadata.track_name                          ║    69 |      1 |      |        | Hell Could F…
 track_metadata.mbid_mapping                        ║    29 |      1 |      |        |
 listened_at                                        ║    13 |      1 | int  |        | 1645358456
 recording_msid                                     ║    38 |      1 |      |        | c2c55801-bb7…
 user_name                                          ║    11 |      1 |      |        | hbenjamin
 inserted_at                                        ║    13 |      1 | int  |        | 1645375360
Shift+C› listens_selectedref_selectedref_columns|                  j   go-down        18 columns  •0

Let’s start by removing the track_metadata. prefix from these columns. We can do this one by one by using e to edit each cell, but let’s do it in bulk with a search-and-replace instead.

Select columns containing track_metadata. with | track_metadata\..
Search and replace selected values with g* track_metadata\./ (g* accepts input in the form <regex>/<substitution>).

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                               ║ width#| height#| type | fmtstr | value       >
 track_metadata                                     ║     0 |      1 |      |        | {4}          
 artist_name                                        ║    63 |      1 |      |        | Rudimental/A…
 release_name                                       ║    29 |      1 |      |        | Home (Deluxe…
 additional_info                                    ║     0 |      1 |      |        | {5}          
 additional_info.artist_msid                        ║    44 |      1 |      |        | 1193820a-2fa…
 additional_info.release_msid                       ║    45 |      1 |      |        | c2e8775c-722…
 additional_info.listening_from                     ║    47 |      1 |      |        | lastfm       
 additional_info.recording_mbid                     ║    47 |      1 |      |        |
 additional_info.recording_msid                     ║    47 |      1 |      |        | c2c55801-bb7…
 additional_info.lastfm_artist_mbid                 ║    51 |      1 |      |        |
 additional_info.lastfm_release_mbid                ║    52 |      1 |      |        |
 additional_info.lastfm_track_mbid                  ║    50 |      1 |      |        | 1fd04c2e-940…
 track_name                                         ║    69 |      1 |      |        | Hell Could F…
 mbid_mapping                                       ║    29 |      1 |      |        |
 listened_at                                        ║    13 |      1 | int  |        | 1645358456
 recording_msid                                     ║    38 |      1 |      |        | c2c55801-bb7…
 user_name                                          ║    11 |      1 |      |        | hbenjamin
 inserted_at                                        ║    13 |      1 | int  |        | 1645375360
Shift+C› listens_selectedref_selectedref_columns|              j   go-down        18 columns  [M] •0

Previously we’ve been looking at the Columns sheet. Let’s take a look at the Statistics sheet now. It’s similar to the Columns sheet, but has some basic statistics about the values in each column.

On the main sheet, go to the Statistics sheet with Shift+I.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 column                              ║ errors♯| nulls♯| distinct♯| mode~| min~| max~| sum | median~>
 artist_name                         ║      0 |     0 |     5940 | E.S.…|     |     |     |        |
 release_name                        ║      0 | 12065 |      874 | Cart…|     |     |     |        |
 additional_info.artist_msid         ║      0 |     0 |     5971 | e665…|     |     |     |        |
 additional_info.release_msid        ║      0 | 12065 |      926 | 17b5…|     |     |     |        |
 additional_info.listening_from      ║      0 |     0 |        1 | last…|     |     |     |        |
 additional_info.recording_mbid      ║      0 | 24986 |     7658 | 00bd…|     |     |     |        |
 additional_info.recording_msid      ║      0 |     0 |    17970 | cd48…|     |     |     |        |
 additional_info.lastfm_artist_mbid  ║      0 | 23291 |     1794 | 391c…|     |     |     |        |
 additional_info.lastfm_release_mbid ║      0 | 23241 |      739 | 977f…|     |     |     |        |
 additional_info.lastfm_track_mbid   ║      0 | 81746 |     4123 | 006c…|     |     |     |        |
 track_name                          ║      0 |     0 |    14544 | Intro|     |     |     |        |
 mbid_mapping                        ║  84524 |  8933 |        1 |     !|     |     |     |        |
 listened_at                         ║      0 |     0 |    93447 | 1624…| 1   | 164…| 12…#| 140665…|
 recording_msid                      ║      0 |     0 |    17970 | cd48…|     |     |     |        |
 user_name                           ║      0 |     0 |        1 | hben…|     |     |     |        |
 inserted_at                         ║      0 |     0 |     1638 | 1642…| 159…| 164…| 14…#| 159552…|


11› listens_selectedref_selectedref_describe|                      j   go-down        16 columns  •0

Here we can see some interesting information, like how many distinct artists, albums and tracks I’ve listened to. We’ll do more analysis later, for now I’d like to draw your eye to columns with only 1 distinct value.

Columns with only 1 distinct value tend not to be interesting. The column listened_from that contains only the value lastfm. The user_name column contains only my username. Let’s delete these columns.

Return to the Columns sheet with Shift+C.
Delete the listened_from and user_name columns with d.

Let’s expand all our remaining columns. You’ll notice that we have quite a few ID columns, either MSID or MBID. MSIDs are MessyBrainz IDs whereas MBIDs are MusicBrainz IDs. In short, MSIDs represent unclean data, and MBIDs represent clean data.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                ║ width#| height#| type | fmtstr | value              | expr  >
 additional_info.artist_msid         ║    38 |      1 |      |        | 548b0bb6-80b6-43c5…| additi…
 additional_info.release_msid        ║    38 |      1 |      |        | c2e8775c-722f-4cff…| additi…
 additional_info.recording_msid      ║    38 |      1 |      |        | 27cd64c9-25fb-45c6…| additi…
 recording_msid                      ║    38 |      1 |      |        | 27cd64c9-25fb-45c6…| record…
 additional_info.lastfm_track_mbid   ║    38 |      1 |      |        | 18b9ade6-6791-4b27…| additi…
 additional_info.lastfm_artist_mbid  ║    38 |      1 |      |        | b09ced80-aa65-43c1…| additi…
 mbid_mapping                        ║     0 |      1 |      |        | {3}                | mbid_m…
 mbid_mapping.recording_mbid         ║    38 |      1 |      |        | ddf2e8df-f671-4975…|
 mbid_mapping.release_mbid           ║    38 |      1 |      |        | ec6d908f-cf79-4225…|
 mbid_mapping.artist_mbids           ║     0 |      1 |      |        | [1]                |        
 mbid_mapping.artist_mbids[0]        ║    38 |      1 |      |        | b09ced80-aa65-43c1…|
 mbid_mapping.artist_mbids[1]        ║    38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[2]        ║    38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[3]        ║    38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[4]        ║    38 |      1 |      |        |                    |
 additional_info.lastfm_release_mbid ║    38 |      1 |      |        |                    | additi…
 additional_info.recording_mbid      ║    32 |      1 |      |        |                    | additi…
4› listens-cleaned_columns_selectedref|                    KEY_RESIZE   redraw        17 columns  •0

Let’s get rid of the MSIDs since these represent unclean data. We’ll also get rid of all the fields populated by Last.fm and only use those populated by MusicBrainz. Basically, we’ll just keep the ID fields mbid_mapping.artist_mbids[0], mbid_mapping.release_mbid, mbid_mapping.recording_mbid. We’ll also remove the mbid_mapping. prefix.

These are the columns we have now.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name           ║ width#| height#| type | fmtstr | value              | expr | aggregators ║
 artist_name    ║    20 |      1 |      |        | Rudimental/Becky H…| 0   #|             ║
 release_name   ║    20 |      1 |      |        | Home (Deluxe Editi…| 1   #|             ║
 track_name     ║    20 |      1 |      |        | Powerless          | 2   #|             ║
 listened_at    ║    13 |      1 |      |        | 1645359422         | 3   #|             ║
 recording_mbid ║    20 |      1 |      |        |                    | 4   #|             ║
 release_mbid   ║    20 |      1 |      |        |                    | 5   #|             ║
 artist_mbid    ║    38 |      1 |      |        |                    | 6   #|             ║

Shift+C› listens-cleaned_columns|                                    k   go-up         7 columns  •0

Looks good so far. Save it with Ctrl+S. VisiData will prompt for the output filepath. We’ll save it as a CSV file now instead of JSON. Call it listens-cleaned.csv.

Artists with the most listens

Let’s start with something simple: finding the artists with the most listens. If you followed the section where we cleaned up the data, you might already have an idea of how to accomplish this.

Select the artist_name column
Open a frequency table with Shift+F

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        ║↓count♯| percent%| histogram                                         ~║
 E.S. Posthumus     ║  2963 |    3.22 | ************************************************** ║
 Queen              ║  2762 |    3.01 | **********************************************     ║
 Black Sabbath      ║  2562 |    2.79 | *******************************************        ║
 Nightwish          ║  1694 |    1.84 | ****************************                       ║
 Within Temptation  ║  1674 |    1.82 | ****************************                       ║
 Amon Amarth        ║  1631 |    1.77 | ***************************                        ║
 Flobots            ║  1613 |    1.76 | ***************************                        ║
 Opeth              ║  1594 |    1.73 | **************************                         ║
 Pendulum           ║  1556 |    1.69 | **************************                         ║
 Florence + the Mac…║  1449 |    1.58 | ************************                           ║
 The xx             ║  1405 |    1.53 | ***********************                            ║
 Delain             ║  1377 |    1.50 | ***********************                            ║
 Pantera            ║  1286 |    1.40 | *********************                              ║
 Iron Maiden        ║  1276 |    1.39 | *********************                              ║
 Bon Jovi           ║  1224 |    1.33 | ********************                               ║
 Epica              ║  1158 |    1.26 | *******************                                ║
 Massive Attack     ║  1128 |    1.23 | *******************                                ║
 Rainbow            ║  1119 |    1.22 | ******************                                 ║
 Dark Tranquillity  ║  1065 |    1.16 | *****************                                  ║
 AC/DC              ║  1052 |    1.14 | *****************                                  ║
 Metallica          ║   967 |    1.05 | ****************                                   ║
 Avenged Sevenfold  ║   955 |    1.04 | ****************                                   ║
 Krewella           ║   952 |    1.04 | ****************                                   ║
 Judas Priest       ║   904 |    0.98 | ***************                                    ║
2› listens-cleaned_artist_name_freq|                                  Shift+F         5940 bins  •0

It’s that simple! From the frequency table we can also hit Enter on any row, VisiData will show us all the listens from that artist.

The frequency table is also just a sheet like any other, so you can filter, search, and manipulate this sheet however you wish.

Albums with the most listens

Let’s find the top albums next. We could simply do the same as the top artists and use Shift+F on the release_name field, but this doesn’t account for different artists with the same album name. Instead we can use key columns to tell VisiData what the unique key should be when generating the histogram.

Select the artist_name column and use ! to mark it as a key column. The column will turn blue to indicate it is now a key column.
Select the release_name column and use ! to mark it as a key column.
Open a histogram with gF.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        | release_name       ║↓count♯| percent%| histogram                             
 E.S. Posthumus     | Cartographer       ║  1603 |    1.74 | *************************************…
 The xx             | xx                 ║  1307 |    1.42 | *************************************…
 Bon Jovi           | Crossroad          ║  1034 |    1.13 | ********************************
 Florence + the Mac…| Lungs              ║  1025 |    1.12 | *******************************
 Queen              | Greatest Hits I    ║   978 |    1.06 | ******************************
 Queen              | Greatest Hits II   ║   916 |    1.00 | ****************************
 Amon Amarth        | With Oden On Our S…║   908 |    0.99 | ****************************
 Dio                | Holy Diver         ║   895 |    0.97 | ***************************
 Queen              | Greatest Hits III  ║   866 |    0.94 | ***************************
 E.S. Posthumus     | Unearthed          ║   850 |    0.92 | **************************
 Flobots            | Fight With Tools   ║   846 |    0.92 | **************************
 Pendulum           | Hold Your Colour   ║   834 |    0.91 | **************************
 Within Temptation  | Silent Force       ║   775 |    0.84 | ************************
 Flobots            | Survival Story     ║   766 |    0.83 | ***********************
 Grimes             | Visions            ║   644 |    0.70 | ********************
 Pantera            | Cowboys From Hell  ║   614 |    0.67 | *******************
 Portishead         | Dummy              ║   585 |    0.64 | ******************
 Delain             | Lucidity           ║   565 |    0.61 | *****************
 Within Temptation  | The Heart of Every…║   538 |    0.59 | ****************
 Pendulum           | In Sillico         ║   532 |    0.58 | ****************
 Opeth              | Ghost Reveries     ║   519 |    0.56 | ****************
 E.S. Posthumus     | Makara             ║   510 |    0.55 | ***************
 Delain             | April Rain         ║   495 |    0.54 | ***************
 OceanLab           | Sirens of the Sea  ║   487 |    0.53 | ***************
3› listens-cleaned_artist_name-release_name_freq|                    gShift+F         6608 bins  •0

Tip: Many VisiData commands can be prefixed with g for a “global” version. For instance, s selects a single row while gs selects all rows. Shift+F opens a frequency table on the selected column, while gF opens one on the key columns.

Top artists for each year

Let’s find which were my most listened-to artists for each given year. To do so we need to derive a year column from our listened_at column, which is currently just a unix timestamp.

Set the listened_at column’s data type to integer with #.
Create a new column using a Python expression using = datetime.datetime.fromtimestamp(listened_at).year
Set the new column’s data type to integer with #
Rename the new column with ^. Name it year.

We should have something that looks like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name               | release_name          | track_name | listened_at       #| year#| reco>
 Within Temptation         | The Unforgiving       | In the Mid…|         1645355474 | 2022 | 20fb…
 Within Temptation         | The Unforgiving       | Shot in th…|         1645355171 | 2022 | bbbe…
 Within Temptation         | The Unforgiving       | Why Not Me |         1645355137 | 2022 | e97d…
 AC/DC                     | Back in Black         | Shoot to T…|         1645286392 | 2022 | 994b…
 AC/DC                     | Back in Black         | Hells Bells|         1645286079 | 2022 | c7c2…
1› listens-cleaned|                                                 l   go-right     91895 rows  •0

To see the artists with the most listens per year, we can use a pivot table.

Mark the artist_name column as a key column with !.
Mark the year column as a key column with !.
Select the year column and pivot the table around it with Shift+W.

We should now have something that looks like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name               ║↓2022#| 2021#| 2020#| 2019#| 2018#| 2017#| 2016#| 2015#| 2014#| 2013#|>
 Celestial Aeon Project    ║  126 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Black Sabbath             ║   86 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Purity Ring               ║   40 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Nightwish                 ║   36 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Queen                     ║   33 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Moderator                 ║   29 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Dio                       ║   24 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Ludovico Einaudi          ║   22 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Iron Maiden               ║   22 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Blackmore's Night         ║   21 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Scar Symmetry             ║   21 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Within Temptation         ║   20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 DragonForce               ║   20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Massive Attack            ║   20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
28› listens-cleaned_pivot_year_count|                KEY_RESIZE   redraw      8264 grouped rows  •0

To view the top artists for different years, simply select the year and use ] to sort in descending order. Here’s my top artists five years ago in 2017.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name               ║ 2022#| 2021#| 2020#| 2019#| 2018#|↓2017#| 2016#| 2015#| 2014#| 2013#|>
 E.S. Posthumus            ║    0 |    0 |    0 |    0 |    0 |  240 |    0 |    0 |    0 |    0 |0
 Celestial Aeon Project    ║    0 |    0 |    0 |    0 |    0 |  200 |    0 |    0 |    0 |    0 |0
 Queen                     ║    0 |    0 |    0 |    0 |    0 |  188 |    0 |    0 |    0 |    0 |0
 Amon Amarth               ║    0 |    0 |    0 |    0 |    0 |  186 |    0 |    0 |    0 |    0 |0
 Amaranthe                 ║    0 |    0 |    0 |    0 |    0 |  168 |    0 |    0 |    0 |    0 |0
 Florence + the Machine    ║    0 |    0 |    0 |    0 |    0 |  158 |    0 |    0 |    0 |    0 |0
 Within Temptation         ║    0 |    0 |    0 |    0 |    0 |  145 |    0 |    0 |    0 |    0 |0
 Avenged Sevenfold         ║    0 |    0 |    0 |    0 |    0 |  143 |    0 |    0 |    0 |    0 |0
 Black Sabbath             ║    0 |    0 |    0 |    0 |    0 |  133 |    0 |    0 |    0 |    0 |0
 Carly Rae Jepsen          ║    0 |    0 |    0 |    0 |    0 |  127 |    0 |    0 |    0 |    0 |0
 Nightwish                 ║    0 |    0 |    0 |    0 |    0 |  102 |    0 |    0 |    0 |    0 |0
 Iron Maiden               ║    0 |    0 |    0 |    0 |    0 |  101 |    0 |    0 |    0 |    0 |0
 Arch Enemy                ║    0 |    0 |    0 |    0 |    0 |   96 |    0 |    0 |    0 |    0 |0
 Epica                     ║    0 |    0 |    0 |    0 |    0 |   93 |    0 |    0 |    0 |    0 |0
28› listens-cleaned_pivot_year_count|                      ]   sort-desc      8264 grouped rows  •0

The same pivot approach can be used to find the top albums for each year. Make both the artist_name and release_name key columns, then pivot on the year column again.

Top 20 artists for each year

Let’s do something similar to the previous section, but this time we select the top 20 artists for each year. The goal is to have a sheet with 21 columns: 1 year column and 20 topN columns.

| year | top1 | top2 | top3 | ... | top20 |

We’ll start by generating a histogram.

Mark the year and artist_name columns as key columns with !
Create a frequency table on our key columns with gF
Sort the year column in descending order with ].
Sort the count column as well. This time we add it to existing sort criteria with z].

Now we are sorting the frequency table by year descending, then count descending.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
↓year              #| artist_name        ║ count♯| percent%| histogram                             
               2022 | Celestial Aeon Pro…║   126 |    0.14 | *********                             
               2022 | Black Sabbath      ║    86 |    0.09 | ******
               2022 | Purity Ring        ║    40 |    0.04 | **
               2022 | Nightwish          ║    36 |    0.04 | **
               2022 | Queen              ║    33 |    0.04 | **
               2022 | Moderator          ║    29 |    0.03 | **
               2022 | Dio                ║    24 |    0.03 | *
               2022 | Ludovico Einaudi   ║    22 |    0.02 | *
               2022 | Iron Maiden        ║    22 |    0.02 | *
2› listens-cleaned_year-artist_name_freq|                    KEY_RESIZE   redraw      8718 bins  •0

Next we’ll flatten the sheet to get our desired column layout. To do so we’ll utilize frequency tables again, but this time with a list aggregator applied to the artist_name column.

Apply an aggregator on the artist_name column with +. Type list or press Ctrl+X for an interactive menu.
Create a frequency table on the year column with Shift+F.

With the list aggregator on the artist_name column, when VisiData creates the frequency table, it also aggregates the values of that column into a list. The frequency table looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year              #║ count♯| artist_name_list ║
               2022 ║   335 | [335]            ║
               2021 ║  1537 | [1537]           ║
               2020 ║  1129 | [1129]           ║
               2019 ║    24 | [24]             ║
6› listens-cleaned_year-artist_name_freq_year_freq|          KEY_RESIZE   redraw        15 bins  •0

The artist_name_list column is an array, so can be expanded with (. The sheet now looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year#║ count♯| artist_name_list[0]    | artist_name_list[1]    | artist_name_list[2] | artist_nam>
 2022 ║   335 | Celestial Aeon Project | Black Sabbath          | Purity Ring         | Nightwish  
 2021 ║  1537 | Nightwish              | Epica                  | Within Temptation   | ABBA
 2020 ║  1129 | Nightwish              | Within Temptation      | Black Sabbath       | Moderator
 2019 ║    24 | Dance With The Dead    | The xx                 | Bhelliom            | Avenged Se…
12› listens-cleaned_year-artist_name_freq_year_freq|         KEY_RESIZE   redraw        15 bins  •0

We only want the top 20, so let’s use a Python expression to extract just the first 20 elements from this array.

Collapse the artist_name_list column again with ).
Create a new column using a Python expression with = artist_name_list[:20].
Expand the new column with (.

The new columns look like:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year#║ artist_name_list[:20][0] | artist_name_list[:20][1] | artist_name_list[:20][2] | artist_na>
 2022 ║ Celestial Aeon Project   | Black Sabbath            | Purity Ring              | Nightwish 
 2021 ║ Nightwish                | Epica                    | Within Temptation        | ABBA
 2020 ║ Nightwish                | Within Temptation        | Black Sabbath            | Moderator
 2019 ║ Dance With The Dead      | The xx                   | Bhelliom                 | Avenged S…
12› listens-cleaned_year-artist_name_freq_year_freq|                   k   go-up        15 bins  •0

They’re not very nicely named, so let’s clean it up. We can rename it one-by-one, but I’ll just hack out something in Python to do so.

Open the Columns sheet with Shift+C.
Select the columns using a search with | artist_name_list.
Rename the columns using a Python expression with g= 'top' + str(int(name.split('][')[-1][:-1]) + 1). Not the cleanest, but it’ll work.

Now we have the sheet in our desired format. We can see from a quick glance there are some artists that consistently appear in my top 20 over the years.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                                               
 year#║ top1                     | top2                     | top3                     | top4                     | top5        >
 2022 ║ Celestial Aeon Project   | Black Sabbath            | Purity Ring              | Nightwish                | Queen        
 2021 ║ Nightwish                | Epica                    | Within Temptation        | ABBA                     | Pantera
 2020 ║ Nightwish                | Within Temptation        | Black Sabbath            | Moderator                | ABBA
 2019 ║ Dance With The Dead      | The xx                   | Bhelliom                 | Avenged Sevenfold        | Timecop1983
 2018 ║ Dance With The Dead      | Carly Rae Jepsen         | Blackmore's Night        | Slow Club                | Queen
 2017 ║ E.S. Posthumus           | Celestial Aeon Project   | Queen                    | Amon Amarth              | Amaranthe
 2016 ║ E.S. Posthumus           | The xx                   | Black Sabbath            | Grimes                   | Massive Atta…
 2015 ║ Queen                    | The Glitch Mob           | Ludovico Einaudi         | E.S. Posthumus           | Rainbow
 2014 ║ Krewella                 | Queen                    | E.S. Posthumus           | Grimes                   | Within Tempt…
 2013 ║ Krewella                 | E.S. Posthumus           | Queen                    | Purity Ring              | Flobots
 2012 ║ Queen                    | E.S. Posthumus           | Dragonette               | Flobots                  | Florence + t…
 2011 ║ Flobots                  | Pendulum                 | Black Sabbath            | The xx                   | Queen
 2010 ║ Black Sabbath            | Florence + the Machine   | Flobots                  | E.S. Posthumus           | Overkill
 2009 ║ Black Sabbath            | Opeth                    | E.S. Posthumus           | AC/DC                    | Queen
 2008 ║ Opeth                    | Amon Amarth              | Dark Tranquillity        | Iron Maiden              | Judas Priest

12› listens-cleaned_year-artist_name_freq_year_freq|                                                 k   go-up        15 bins  •0

Artists with the most staying power over the years

In the previous section we found the top 20 artists for each year. Let’s extend from those results and find the artists that have appeared the most in my 14 years of listens.

Melt the sheet with Shift+M. A melt is an inverse of a pivot.

The sheet now looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year#║ Variable | Value                   ║
 2022 ║ top1     | Celestial Aeon Project  ║
 2022 ║ top2     | Black Sabbath           ║
 2022 ║ top3     | Purity Ring             ║
 2022 ║ top4     | Nightwish               ║
 2022 ║ top5     | Queen                   ║
 2022 ║ top6     | Moderator               ║
 2022 ║ top7     | Dio                     ║
 2022 ║ top8     | Ludovico Einaudi        ║
 2022 ║ top9     | Iron Maiden             ║
 2022 ║ top10    | Blackmore's Night       ║
 2022 ║ top11    | Scar Symmetry           ║
 2022 ║ top12    | Within Temptation       ║
 2022 ║ top13    | DragonForce             ║
 2022 ║ top14    | Massive Attack          ║
 2022 ║ top15    | Ensiferum               ║
 2022 ║ top16    | Amon Amarth             ║
15› listens-cleaned_year-artist_name_freq_year_freq_melted| h   go-left       300 melted values  •0

By default when VisiData does a melt, it adds the max aggregator to the Value column. Let’s first remove it by going into the Columns sheet with Shift+C and using e to edit the aggregators cell for the Value column.

Now we can simply create a frequency table to get our results.

Select the Value column.
Create a frequency table with Shift+F.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 Value                   ║↓count♯| percent%| histogram                                         ~║
 Black Sabbath           ║    13 |    4.33 | ************************************************** ║
 Queen                   ║    12 |    4.00 | **********************************************     ║
 Amon Amarth             ║    11 |    3.67 | ******************************************         ║
 Florence + the Machine  ║    10 |    3.33 | **************************************             ║
 E.S. Posthumus          ║    10 |    3.33 | **************************************             ║
 Within Temptation       ║     9 |    3.00 | **********************************                 ║
 Delain                  ║     9 |    3.00 | **********************************                 ║
 Pendulum                ║     9 |    3.00 | **********************************                 ║
 Iron Maiden             ║     8 |    2.67 | ******************************                     ║
 The xx                  ║     8 |    2.67 | ******************************                     ║
 Nightwish               ║     7 |    2.33 | **************************                         ║
 Massive Attack          ║     7 |    2.33 | **************************                         ║
 Slow Club               ║     6 |    2.00 | ***********************                            ║
 Avenged Sevenfold       ║     6 |    2.00 | ***********************                            ║
 Bon Jovi                ║     6 |    2.00 | ***********************                            ║
 Rainbow                 ║     6 |    2.00 | ***********************                            ║
 Opeth                   ║     6 |    2.00 | ***********************                            ║
 Purity Ring             ║     5 |    1.67 | *******************                                ║
 Moderator               ║     5 |    1.67 | *******************                                ║
 Dio                     ║     5 |    1.67 | *******************                                ║
 Dance With The Dead     ║     5 |    1.67 | *******************                                ║
 Amaranthe               ║     5 |    1.67 | *******************                                ║
 Blind Guardian          ║     5 |    1.67 | *******************                                ║
 Flobots                 ║     5 |    1.67 | *******************                                ║
 Meiko                   ║     5 |    1.67 | *******************                                ║
 Judas Priest            ║     5 |    1.67 | *******************                                ║
23› listens-cleaned_year-artist_name_freq_year_freq_melted_V KEY_RESIZE   redraw        80 bins  •0

Tip: Pressing <Enter> on any row in a frequency table will open a new sheet with the rows in the source sheet that comprise the histogram bucket. In this case <Enter> will show you which years the selected artist was in the top-20 for.

As you might have noticed by now, the ability to quickly create frequency tables is extremely convenient. In fact, during my day-to-day work, whenever I have to explore some data, I can very quickly do so with a combination to Shift+F, filtering with <Enter> and searching with |. It’s almost a superpower.

Day of week with the most listens on average

Let’s find out on which day of the week I have the most listens on. My intuition is that most listens will be clustered around weekdays, since I tend to leave my music on in the background while I work.

Create a new column containing the day of week using a Python expression with = datetime.datetime.fromtimestamp(listened_at).strftime("%A").
Rename the new column with ^.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        | release_name       | track_name         | listened_at#| day_of_week        |>
 Within Temptation  | The Unforgiving    | Fire and Ice       |  1645356049 | Sunday             |…
 Within Temptation  | The Unforgiving    | Faster             |  1645355785 | Sunday             |…
 Within Temptation  | The Unforgiving    | In the Middle of t…|  1645355474 | Sunday             |…
 Within Temptation  | The Unforgiving    | Shot in the Dark   |  1645355171 | Sunday             |…
 Within Temptation  | The Unforgiving    | Why Not Me         |  1645355137 | Sunday             |…
 AC/DC              | Back in Black      | Shoot to Thrill    |  1645286392 | Saturday           |…
 AC/DC              | Back in Black      | Hells Bells        |  1645286079 | Saturday           |…
 The Glitch Mob     | Drink the Sea      | Animus Vox         |  1645174407 | Friday             |…
 Timecop1983        | Daydreaming        | Hold Me (feat. Han…|  1645165001 | Friday             |…
 Timecop1983        | Daydreaming        | Dreaming (About Yo…|  1645164652 | Friday             |…
 Timecop1983        | Childhood Memories | Burning Hearts (bo…|  1645164354 | Friday             |…
 Timecop1983        | Childhood Memories | Space Jam          |  1645164174 | Friday             |…
 Timecop1983        | Childhood Memories | Into the Night     |  1645163921 | Friday             |…
 Timecop1983        | Childhood Memories | Summer Love        |  1645163653 | Friday             |…
 Timecop1983        | Childhood Memories | Renegade Dreams    |  1645163353 | Friday             |…
 Timecop1983        | Childhood Memories | To the Summit      |  1645163074 | Friday             |…
1› listens-cleaned|                                                 l   go-right     91895 rows  •0

Open a frequency table on the day_of_week column with Shift+F.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 day_of_week        ║↓count♯| percent%| histogram                                         ~║
 Sunday             ║ 15761 |   17.15 | ************************************************** ║
 Saturday           ║ 15034 |   16.36 | ***********************************************    ║
 Friday             ║ 13535 |   14.73 | ******************************************         ║
 Monday             ║ 12022 |   13.08 | **************************************             ║
 Wednesday          ║ 11972 |   13.03 | *************************************              ║
 Thursday           ║ 11865 |   12.91 | *************************************              ║
 Tuesday            ║ 11706 |   12.74 | *************************************              ║
4› listens-cleaned_day_of_week_freq|                         KEY_RESIZE   redraw         7 bins  •0

Seems like my intuition was wrong. I noticeably listen to more music over the weekend. At least, this is true across listens over the 14 year period. Let’s take a look at the distribution for the more recent years (2020-2022) instead.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 day_of_week        ║↓count♯| percent%| histogram                                         ~║
 Wednesday          ║  3129 |   20.62 | ************************************************** ║
 Thursday           ║  2552 |   16.82 | ****************************************           ║
 Tuesday            ║  2392 |   15.77 | **************************************             ║
 Monday             ║  2375 |   15.65 | *************************************              ║
 Friday             ║  2234 |   14.72 | ***********************************                ║
 Saturday           ║  1399 |    9.22 | **********************                             ║
 Sunday             ║  1091 |    7.19 | *****************                                  ║
9› listens-cleaned_selectedref_day_of_week_freq|             KEY_RESIZE   redraw         7 bins  •0

This is much more in line with what I expected. Weekdays tend to have significantly more listens. Wednesdays seem to be a strange anomaly, with considerably more listens compared to the other weekdays.

Time of day with the most listens on average

Similar to what we did for the day of week, let’s analyze during what time of day I have the most listens. In particular I’m interested to see how many of those listens happen during my work hours. I’ll only consider the years 2020-2022 for this analysis.

Create a new column with the hour using a Python expression with = datetime.datetime.fromtimestamp(listened_at).hour.
Rename the new column with ^.
Open a frequency table with Shift+F.
Sort the hour column in ascending order with [.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
↑hour    #║ count♯| percent%| histogram                                         ~║
        0 ║  1305 |    8.60 | *************************************              ║
        1 ║   774 |    5.10 | **********************                             ║
        2 ║   307 |    2.02 | ********                                           ║
        3 ║    81 |    0.53 | **                                                 ║
        4 ║    25 |    0.16 |                                                    ║
        8 ║     5 |    0.03 |                                                    ║
        9 ║     2 |    0.01 |                                                    ║
       10 ║    68 |    0.45 | *                                                  ║
       11 ║   330 |    2.18 | *********                                          ║
       12 ║   458 |    3.02 | *************                                      ║
       13 ║   556 |    3.66 | ***************                                    ║
       14 ║  1398 |    9.21 | ****************************************           ║
       15 ║  1746 |   11.51 | ************************************************** ║
       16 ║  1615 |   10.64 | **********************************************     ║
       17 ║  1181 |    7.78 | *********************************                  ║
       18 ║   875 |    5.77 | *************************                          ║
       19 ║   482 |    3.18 | *************                                      ║
       20 ║   652 |    4.30 | ******************                                 ║
       21 ║   973 |    6.41 | ***************************                        ║
       22 ║  1098 |    7.24 | *******************************                    ║
       23 ║  1241 |    8.18 | ***********************************                ║

12› listens-cleaned_selectedref_hour_freq|                   KEY_RESIZE   redraw        21 bins  •0

Initially I didn’t think the frequency analysis looked that interesting, but there’s actually some insights we can draw from it.

I typically head off to bed around 2 to 3am.
My listens drop off during dinner time around 6 or 7pm, before ramping back up later in the night.
Most listens begin around 2pm, after lunch time.

In closing

I hope this post showcased the usefulness of VisiData. It truly is a joy to use, being able to wrangle CSV data with a few keystrokes, as opposed to importing the data into a spreadsheet and mucking around with a GUI and formulas.

I use it quite often in my day-to-day work, for instance to analyze Sentry events. I also have a personal VisiData cheatsheet.