Exploring my listen history with VisiData

2022-03-10
Table of contents

Overview

Since 2008 I’ve been uploading (called “scrobbling”) my listens to my Last.fm profile, so I have quite the trove of data. I then sync my Last.fm listen history to ListenBrainz, mostly for data redundancy.

In this post I’ll use VisiData to explore my listen history. VisiData makes it really easy to wrangle tabular data, and is an indispensible tool in any engineer’s toolkit. My two main goals for this post are:

  1. Showcase how VisiData can be used to manipulate and explore tabular data.
  2. Gain more familiarity with the tool myself.

Setup

First I sync my listens from Last.fm to ListenBrainz. I do this mostly for data redundancy, but ListenBrainz also makes it convenient to export my listens. Last.fm doesn’t provide such an export service.

ListenBrainz exports into a JSON file. Once we have this file we can proceed to the next step, where we clean up the data.

Clean the data

Before we begin exploring our data, we should clean it up first. VisiData supports not just CSVs, but JSON as well, so we can open the JSON file directly. Open the file downloaded from ListenBrainz with the following command.

$ vd listens.json

Here’s what VisiData looks like.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata | listened_at#| recording_msid     | user_name | inserted_at#
 {4}            |  1645359422 | 122938e1-5157-432b…| hbenjamin |  1645375360 ║
 {4}            |  1645359060 | 3e027b89-a08f-4664…| hbenjamin |  1645375360 
 {4}            |  1645358737 | 7f327fee-1d9e-460c…| hbenjamin |  1645375360 
 {4}            |  1645358456 | c2c55801-bb73-498f…| hbenjamin |  1645375360 
 {4}            |  1645358120 | b4eea3be-23dd-4982…| hbenjamin |  1645375360 
 {4}            |  1645357874 | 32b5380d-3745-45f2…| hbenjamin |  1645375360 
 {5}            |  1645357642 | 27cd64c9-25fb-45c6…| hbenjamin |  1645375360 
 {5}            |  1645357445 | 0e29105e-f268-4cbf…| hbenjamin |  1645375360 
 {5}            |  1645357131 | 602060cc-589e-4adb…| hbenjamin |  1645375360 
 {5}            |  1645356867 | 60d1968a-bda5-4749…| hbenjamin |  1645375360 
 {5}            |  1645356628 | c329612b-426c-4272…| hbenjamin |  1645375360 
 {5}            |  1645356287 | cb3c39b9-8d69-4b0a…| hbenjamin |  1645375360 
 {5}            |  1645356049 | 7b891d32-7de8-4d12…| hbenjamin |  1645375360 
 {5}            |  1645355785 | b4f66b79-d157-45d5…| hbenjamin |  1645375360 
 {5}            |  1645355474 | 1f81475f-0e31-4d7d…| hbenjamin |  1645375360 
 {5}            |  1645355171 | 3d3e1488-d6d7-4d25…| hbenjamin |  1645375360 
 {5}            |  1645355137 | 4bfafb45-7560-461f…| hbenjamin |  1645375360 
 {5}            |  1645286392 | 0511a065-fd94-4202…| hbenjamin |  1645375360 
 {5}            |  1645286079 | 5cff0e56-299a-4c85…| hbenjamin |  1645375360 
 {5}            |  1645174407 | aedf1436-d310-4934…| hbenjamin |  1645375360 
 {5}            |  1645165001 | b11944f4-9987-470b…| hbenjamin |  1645375360 
 {5}            |  1645164652 | dfa87939-e1d0-4713…| hbenjamin |  1645375360 
1› listens| saul.pw/VisiData v2.8 | opening /tmp/listens.json as json | Someo         94081 rows  •0

I immediately noticed that about 1500 records have a malformed listened_at value.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata | listened_at#| recording_msid     | user_name | inserted_at#
 {5}            |          12 | 7ee0fbe2-3e13-4dd2…| hbenjamin |  1595522511 
 {5}            |          11 | cec1b686-8a57-438f…| hbenjamin |  1595522511 
 {5}            |          10 | cec1b686-8a57-438f…| hbenjamin |  1595522511 
 {5}            |           9 | 24cd6cdb-a089-449f…| hbenjamin |  1595522511 
 {5}            |           8 | 93fbf716-2e82-48cf…| hbenjamin |  1595522511 
 {5}            |           7 | 9af6db9c-8ec8-49dd…| hbenjamin |  1595522511 
 {5}            |           6 | 9af6db9c-8ec8-49dd…| hbenjamin |  1595522511 
 {5}            |           5 | 1a8ee0ac-6bf6-4614…| hbenjamin |  1595522511 
 {4}            |           4 | 51e187d2-d463-44a6…| hbenjamin |  1595522511 
 {5}            |           3 | 8d8a4f12-d713-4fff…| hbenjamin |  1595522511 
 {5}            |           2 | 3b6a31bd-3c28-4757…| hbenjamin |  1595522511 
 {5}            |           1 | 3b6a31bd-3c28-4757…| hbenjamin |  1595522511 ║
1› listens|                                                           j   go-down     94081 rows  •0

It’s supposed to be a unix timestamp, but these records have an autoincrementing integer from 1 to about 1500 instead. We’ll remove these records.

  1. Select valid unix timestamps using a Python expression with z| listened_at > 100000.
  2. Open the selection in new sheet with ".

Next we’ll remove some uninteresting tracks. I often listen to an internet radio station called soma.fm, which has the occasional ad break. These ads end up being registered in my listening history.

  1. Expand the track_metadata column with (.
  2. Select the track_metadata.artist_name column.
  3. Create a frequency table with Shift+F, this will open a new sheet with a histogram.
  4. Select rows in the frequency table containing “soma” with |soma.
  5. Open a new sheet with ".
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 track_metadata.artist_name  count♯| percent%| histogram                                         ~
 Somafm Id                  ║   116 |    0.12 | *                                                  ║
 SomaFM                         97 |    0.10 | *                                                  
 SomaFM Secret Agent            67 |    0.07 | *                                                  
 SomaFM DropIn                  34 |    0.04 |                                                    
 Soma Fm Spy Numbers            18 |    0.02 |                                                    
 SomaFM Secret Agent Sfx        14 |    0.01 |                                                    
 SomaFM Secret Agent Dropin      9 |    0.01 |                                                    
 SomaFM Begathon                 9 |    0.01 |                                                    
 New SomaFM ID                   2 |    0.00 |                                                    
 Soma Sonic                      2 |    0.00 |                                                    
 SomaFM Xmas in Frisko           1 |    0.00 |                                                    
 SomaFM Xmas Dropin              1 |    0.00 |                                                    
3› listens_track_metadata.artist_name_freq_selectedref|       KEY_RESIZE   redraw        12 bins  •0

From this we can see that there are about 300 of these tracks.

  1. Select all rows in the frequency table with gs.
  2. Go back to the original sheet with q. Notice that the rows in the original sheet that were in the histogram buckets are now also selected.
  3. Invert the selection with gt. Now we are selecting every row except the soma.fm ads.
  4. Create a new sheet with ".

Next we’ll rename some of the columns. Since we opened a JSON file with nested objects, our sheet contains many columns prefixed with track_metadata.. Let’s start by flattening the sheet.

  1. Expand the nested columns using (.
  2. Open the Columns sheet with Shift+C.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                                width#| height#| type | fmtstr | value       >
 track_metadata                                     ║     0 |      1 |      |        | {4}          
 track_metadata.artist_name                         ║    63 |      1 |      |        | Rudimental/A…
 track_metadata.release_name                            29 |      1 |      |        | Home (Deluxe…
 track_metadata.additional_info                     ║     0 |      1 |      |        | {5}          
 track_metadata.additional_info.artist_msid             44 |      1 |      |        | 1193820a-2fa…
 track_metadata.additional_info.release_msid            45 |      1 |      |        | c2e8775c-722…
 track_metadata.additional_info.listening_from          47 |      1 |      |        | lastfm
 track_metadata.additional_info.recording_mbid          47 |      1 |      |        |
 track_metadata.additional_info.recording_msid          47 |      1 |      |        | c2c55801-bb7…
 track_metadata.additional_info.lastfm_artist_mbid      51 |      1 |      |        |
 track_metadata.additional_info.lastfm_release_mbid     52 |      1 |      |        |
 track_metadata.additional_info.lastfm_track_mbid       50 |      1 |      |        | 1fd04c2e-940…
 track_metadata.track_name                              69 |      1 |      |        | Hell Could F…
 track_metadata.mbid_mapping                            29 |      1 |      |        |
 listened_at                                            13 |      1 | int  |        | 1645358456
 recording_msid                                         38 |      1 |      |        | c2c55801-bb7…
 user_name                                              11 |      1 |      |        | hbenjamin
 inserted_at                                            13 |      1 | int  |        | 1645375360
Shift+C› listens_selectedref_selectedref_columns|                  j   go-down        18 columns  •0

Let’s start by removing the track_metadata. prefix from these columns. We can do this one by one by using e to edit each cell, but let’s do it in bulk with a search-and-replace instead.

  1. Select columns containing track_metadata. with | track_metadata\..
  2. Search and replace selected values with g* track_metadata\./ (g* accepts input in the form <regex>/<substitution>).
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                                width#| height#| type | fmtstr | value       >
 track_metadata                                     ║     0 |      1 |      |        | {4}          
 artist_name                                            63 |      1 |      |        | Rudimental/A…
 release_name                                           29 |      1 |      |        | Home (Deluxe…
 additional_info                                    ║     0 |      1 |      |        | {5}          
 additional_info.artist_msid                            44 |      1 |      |        | 1193820a-2fa…
 additional_info.release_msid                           45 |      1 |      |        | c2e8775c-722…
 additional_info.listening_from                     ║    47 |      1 |      |        | lastfm       
 additional_info.recording_mbid                         47 |      1 |      |        |
 additional_info.recording_msid                         47 |      1 |      |        | c2c55801-bb7…
 additional_info.lastfm_artist_mbid                     51 |      1 |      |        |
 additional_info.lastfm_release_mbid                    52 |      1 |      |        |
 additional_info.lastfm_track_mbid                      50 |      1 |      |        | 1fd04c2e-940…
 track_name                                             69 |      1 |      |        | Hell Could F…
 mbid_mapping                                           29 |      1 |      |        |
 listened_at                                            13 |      1 | int  |        | 1645358456
 recording_msid                                         38 |      1 |      |        | c2c55801-bb7…
 user_name                                              11 |      1 |      |        | hbenjamin
 inserted_at                                            13 |      1 | int  |        | 1645375360
Shift+C› listens_selectedref_selectedref_columns|              j   go-down        18 columns  [M] •0

Previously we’ve been looking at the Columns sheet. Let’s take a look at the Statistics sheet now. It’s similar to the Columns sheet, but has some basic statistics about the values in each column.

  1. On the main sheet, go to the Statistics sheet with Shift+I.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 column                               errors♯| nulls♯| distinct♯| mode~| min~| max~| sum | median~>
 artist_name                               0 |     0 |     5940 | E.S.…|     |     |     |        |
 release_name                              0 | 12065 |      874 | Cart…|     |     |     |        |
 additional_info.artist_msid               0 |     0 |     5971 | e665…|     |     |     |        |
 additional_info.release_msid              0 | 12065 |      926 | 17b5…|     |     |     |        |
 additional_info.listening_from      ║      0 |     0 |        1 | last…|     |     |     |        |
 additional_info.recording_mbid            0 | 24986 |     7658 | 00bd…|     |     |     |        |
 additional_info.recording_msid            0 |     0 |    17970 | cd48…|     |     |     |        |
 additional_info.lastfm_artist_mbid        0 | 23291 |     1794 | 391c…|     |     |     |        |
 additional_info.lastfm_release_mbid       0 | 23241 |      739 | 977f…|     |     |     |        |
 additional_info.lastfm_track_mbid         0 | 81746 |     4123 | 006c…|     |     |     |        |
 track_name                                0 |     0 |    14544 | Intro|     |     |     |        |
 mbid_mapping                          84524 |  8933 |        1 |     !|     |     |     |        |
 listened_at                               0 |     0 |    93447 | 1624…| 1   | 164…| 12…#| 140665…|
 recording_msid                            0 |     0 |    17970 | cd48…|     |     |     |        |
 user_name                                 0 |     0 |        1 | hben…|     |     |     |        |
 inserted_at                               0 |     0 |     1638 | 1642…| 159…| 164…| 14…#| 159552…|


11› listens_selectedref_selectedref_describe|                      j   go-down        16 columns  •0

Here we can see some interesting information, like how many distinct artists, albums and tracks I’ve listened to. We’ll do more analysis later, for now I’d like to draw your eye to columns with only 1 distinct value.

Columns with only 1 distinct value tend not to be interesting. The column listened_from that contains only the value lastfm. The user_name column contains only my username. Let’s delete these columns.

  1. Return to the Columns sheet with Shift+C.
  2. Delete the listened_from and user_name columns with d.

Let’s expand all our remaining columns. You’ll notice that we have quite a few ID columns, either MSID or MBID. MSIDs are MessyBrainz IDs whereas MBIDs are MusicBrainz IDs. In short, MSIDs represent unclean data, and MBIDs represent clean data.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name                                 width#| height#| type | fmtstr | value              | expr  >
 additional_info.artist_msid         ║    38 |      1 |      |        | 548b0bb6-80b6-43c5…| additi…
 additional_info.release_msid            38 |      1 |      |        | c2e8775c-722f-4cff…| additi…
 additional_info.recording_msid          38 |      1 |      |        | 27cd64c9-25fb-45c6…| additi…
 recording_msid                          38 |      1 |      |        | 27cd64c9-25fb-45c6…| record…
 additional_info.lastfm_track_mbid       38 |      1 |      |        | 18b9ade6-6791-4b27…| additi…
 additional_info.lastfm_artist_mbid      38 |      1 |      |        | b09ced80-aa65-43c1…| additi…
 mbid_mapping                        ║     0 |      1 |      |        | {3}                | mbid_m…
 mbid_mapping.recording_mbid             38 |      1 |      |        | ddf2e8df-f671-4975…|
 mbid_mapping.release_mbid               38 |      1 |      |        | ec6d908f-cf79-4225…|
 mbid_mapping.artist_mbids           ║     0 |      1 |      |        | [1]                |        
 mbid_mapping.artist_mbids[0]            38 |      1 |      |        | b09ced80-aa65-43c1…|
 mbid_mapping.artist_mbids[1]            38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[2]            38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[3]            38 |      1 |      |        |                    |
 mbid_mapping.artist_mbids[4]            38 |      1 |      |        |                    |
 additional_info.lastfm_release_mbid     38 |      1 |      |        |                    | additi…
 additional_info.recording_mbid          32 |      1 |      |        |                    | additi…
4› listens-cleaned_columns_selectedref|                    KEY_RESIZE   redraw        17 columns  •0

Let’s get rid of the MSIDs since these represent unclean data. We’ll also get rid of all the fields populated by Last.fm and only use those populated by MusicBrainz. Basically, we’ll just keep the ID fields mbid_mapping.artist_mbids[0], mbid_mapping.release_mbid, mbid_mapping.recording_mbid. We’ll also remove the mbid_mapping. prefix.

These are the columns we have now.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                  
 name            width#| height#| type | fmtstr | value              | expr | aggregators 
 artist_name    ║    20 |      1 |      |        | Rudimental/Becky H…| 0   #|             ║
 release_name       20 |      1 |      |        | Home (Deluxe Editi…| 1   #|             
 track_name         20 |      1 |      |        | Powerless          | 2   #|             
 listened_at        13 |      1 |      |        | 1645359422         | 3   #|             
 recording_mbid     20 |      1 |      |        |                    | 4   #|             
 release_mbid       20 |      1 |      |        |                    | 5   #|             
 artist_mbid        38 |      1 |      |        |                    | 6   #|             

Shift+C› listens-cleaned_columns|                                    k   go-up         7 columns  •0

Looks good so far. Save it with Ctrl+S. VisiData will prompt for the output filepath. We’ll save it as a CSV file now instead of JSON. Call it listens-cleaned.csv.

Artists with the most listens

Let’s start with something simple: finding the artists with the most listens. If you followed the section where we cleaned up the data, you might already have an idea of how to accomplish this.

  1. Select the artist_name column
  2. Open a frequency table with Shift+F
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        ↓count♯| percent%| histogram                                         ~
 E.S. Posthumus     ║  2963 |    3.22 | ************************************************** ║
 Queen                2762 |    3.01 | **********************************************     
 Black Sabbath        2562 |    2.79 | *******************************************        
 Nightwish            1694 |    1.84 | ****************************                       
 Within Temptation    1674 |    1.82 | ****************************                       
 Amon Amarth          1631 |    1.77 | ***************************                        
 Flobots              1613 |    1.76 | ***************************                        
 Opeth                1594 |    1.73 | **************************                         
 Pendulum             1556 |    1.69 | **************************                         
 Florence + the Mac…  1449 |    1.58 | ************************                           
 The xx               1405 |    1.53 | ***********************                            
 Delain               1377 |    1.50 | ***********************                            
 Pantera              1286 |    1.40 | *********************                              
 Iron Maiden          1276 |    1.39 | *********************                              
 Bon Jovi             1224 |    1.33 | ********************                               
 Epica                1158 |    1.26 | *******************                                
 Massive Attack       1128 |    1.23 | *******************                                
 Rainbow              1119 |    1.22 | ******************                                 
 Dark Tranquillity    1065 |    1.16 | *****************                                  
 AC/DC                1052 |    1.14 | *****************                                  
 Metallica             967 |    1.05 | ****************                                   
 Avenged Sevenfold     955 |    1.04 | ****************                                   
 Krewella              952 |    1.04 | ****************                                   
 Judas Priest          904 |    0.98 | ***************                                    
2› listens-cleaned_artist_name_freq|                                  Shift+F         5940 bins  •0

It’s that simple! From the frequency table we can also hit Enter on any row, VisiData will show us all the listens from that artist.

The frequency table is also just a sheet like any other, so you can filter, search, and manipulate this sheet however you wish.

Albums with the most listens

Let’s find the top albums next. We could simply do the same as the top artists and use Shift+F on the release_name field, but this doesn’t account for different artists with the same album name. Instead we can use key columns to tell VisiData what the unique key should be when generating the histogram.

  1. Select the artist_name column and use ! to mark it as a key column. The column will turn blue to indicate it is now a key column.
  2. Select the release_name column and use ! to mark it as a key column.
  3. Open a histogram with gF.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        | release_name       ↓count♯| percent%| histogram                             
 E.S. Posthumus     | Cartographer       ║  1603 |    1.74 | *************************************…
 The xx             | xx                   1307 |    1.42 | *************************************…
 Bon Jovi           | Crossroad            1034 |    1.13 | ********************************
 Florence + the Mac…| Lungs                1025 |    1.12 | *******************************
 Queen              | Greatest Hits I       978 |    1.06 | ******************************
 Queen              | Greatest Hits II      916 |    1.00 | ****************************
 Amon Amarth        | With Oden On Our S…   908 |    0.99 | ****************************
 Dio                | Holy Diver            895 |    0.97 | ***************************
 Queen              | Greatest Hits III     866 |    0.94 | ***************************
 E.S. Posthumus     | Unearthed             850 |    0.92 | **************************
 Flobots            | Fight With Tools      846 |    0.92 | **************************
 Pendulum           | Hold Your Colour      834 |    0.91 | **************************
 Within Temptation  | Silent Force          775 |    0.84 | ************************
 Flobots            | Survival Story        766 |    0.83 | ***********************
 Grimes             | Visions               644 |    0.70 | ********************
 Pantera            | Cowboys From Hell     614 |    0.67 | *******************
 Portishead         | Dummy                 585 |    0.64 | ******************
 Delain             | Lucidity              565 |    0.61 | *****************
 Within Temptation  | The Heart of Every…   538 |    0.59 | ****************
 Pendulum           | In Sillico            532 |    0.58 | ****************
 Opeth              | Ghost Reveries        519 |    0.56 | ****************
 E.S. Posthumus     | Makara                510 |    0.55 | ***************
 Delain             | April Rain            495 |    0.54 | ***************
 OceanLab           | Sirens of the Sea     487 |    0.53 | ***************
3› listens-cleaned_artist_name-release_name_freq|                    gShift+F         6608 bins  •0

Tip: Many VisiData commands can be prefixed with g for a “global” version. For instance, s selects a single row while gs selects all rows. Shift+F opens a frequency table on the selected column, while gF opens one on the key columns.

Top artists for each year

Let’s find which were my most listened-to artists for each given year. To do so we need to derive a year column from our listened_at column, which is currently just a unix timestamp.

  1. Set the listened_at column’s data type to integer with #.
  2. Create a new column using a Python expression using = datetime.datetime.fromtimestamp(listened_at).year
  3. Set the new column’s data type to integer with #
  4. Rename the new column with ^. Name it year.

We should have something that looks like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name               | release_name          | track_name | listened_at       #| year#| reco>
 Within Temptation         | The Unforgiving       | In the Mid…|         1645355474 | 2022 | 20fb…
 Within Temptation         | The Unforgiving       | Shot in th…|         1645355171 | 2022 | bbbe…
 Within Temptation         | The Unforgiving       | Why Not Me |         1645355137 | 2022 | e97d…
 AC/DC                     | Back in Black         | Shoot to T…|         1645286392 | 2022 | 994b…
 AC/DC                     | Back in Black         | Hells Bells|         1645286079 | 2022 | c7c2…
1› listens-cleaned|                                                 l   go-right     91895 rows  •0

To see the artists with the most listens per year, we can use a pivot table.

  1. Mark the artist_name column as a key column with !.
  2. Mark the year column as a key column with !.
  3. Select the year column and pivot the table around it with Shift+W.

We should now have something that looks like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name               ↓2022#| 2021#| 2020#| 2019#| 2018#| 2017#| 2016#| 2015#| 2014#| 2013#|>
 Celestial Aeon Project    ║  126 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Black Sabbath                86 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Purity Ring                  40 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Nightwish                    36 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Queen                        33 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Moderator                    29 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Dio                          24 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Ludovico Einaudi             22 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Iron Maiden                  22 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Blackmore's Night            21 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Scar Symmetry                21 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Within Temptation            20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 DragonForce                  20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
 Massive Attack               20 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |0
28› listens-cleaned_pivot_year_count|                KEY_RESIZE   redraw      8264 grouped rows  •0

To view the top artists for different years, simply select the year and use ] to sort in descending order. Here’s my top artists five years ago in 2017.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name                2022#| 2021#| 2020#| 2019#| 2018#|↓2017#| 2016#| 2015#| 2014#| 2013#|>
 E.S. Posthumus            ║    0 |    0 |    0 |    0 |    0 |  240 |    0 |    0 |    0 |    0 |0
 Celestial Aeon Project        0 |    0 |    0 |    0 |    0 |  200 |    0 |    0 |    0 |    0 |0
 Queen                         0 |    0 |    0 |    0 |    0 |  188 |    0 |    0 |    0 |    0 |0
 Amon Amarth                   0 |    0 |    0 |    0 |    0 |  186 |    0 |    0 |    0 |    0 |0
 Amaranthe                     0 |    0 |    0 |    0 |    0 |  168 |    0 |    0 |    0 |    0 |0
 Florence + the Machine        0 |    0 |    0 |    0 |    0 |  158 |    0 |    0 |    0 |    0 |0
 Within Temptation             0 |    0 |    0 |    0 |    0 |  145 |    0 |    0 |    0 |    0 |0
 Avenged Sevenfold             0 |    0 |    0 |    0 |    0 |  143 |    0 |    0 |    0 |    0 |0
 Black Sabbath                 0 |    0 |    0 |    0 |    0 |  133 |    0 |    0 |    0 |    0 |0
 Carly Rae Jepsen              0 |    0 |    0 |    0 |    0 |  127 |    0 |    0 |    0 |    0 |0
 Nightwish                     0 |    0 |    0 |    0 |    0 |  102 |    0 |    0 |    0 |    0 |0
 Iron Maiden                   0 |    0 |    0 |    0 |    0 |  101 |    0 |    0 |    0 |    0 |0
 Arch Enemy                    0 |    0 |    0 |    0 |    0 |   96 |    0 |    0 |    0 |    0 |0
 Epica                         0 |    0 |    0 |    0 |    0 |   93 |    0 |    0 |    0 |    0 |0
28› listens-cleaned_pivot_year_count|                      ]   sort-desc      8264 grouped rows  •0

The same pivot approach can be used to find the top albums for each year. Make both the artist_name and release_name key columns, then pivot on the year column again.

Top 20 artists for each year

Let’s do something similar to the previous section, but this time we select the top 20 artists for each year. The goal is to have a sheet with 21 columns: 1 year column and 20 topN columns.

| year | top1 | top2 | top3 | ... | top20 |

We’ll start by generating a histogram.

  1. Mark the year and artist_name columns as key columns with !
  2. Create a frequency table on our key columns with gF
  3. Sort the year column in descending order with ].
  4. Sort the count column as well. This time we add it to existing sort criteria with z].

Now we are sorting the frequency table by year descending, then count descending.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
↓year              #| artist_name         count♯| percent%| histogram                             
               2022 | Celestial Aeon Pro…║   126 |    0.14 | *********                             
               2022 | Black Sabbath          86 |    0.09 | ******
               2022 | Purity Ring            40 |    0.04 | **
               2022 | Nightwish              36 |    0.04 | **
               2022 | Queen                  33 |    0.04 | **
               2022 | Moderator              29 |    0.03 | **
               2022 | Dio                    24 |    0.03 | *
               2022 | Ludovico Einaudi       22 |    0.02 | *
               2022 | Iron Maiden            22 |    0.02 | *
2› listens-cleaned_year-artist_name_freq|                    KEY_RESIZE   redraw      8718 bins  •0

Next we’ll flatten the sheet to get our desired column layout. To do so we’ll utilize frequency tables again, but this time with a list aggregator applied to the artist_name column.

  1. Apply an aggregator on the artist_name column with +. Type list or press Ctrl+X for an interactive menu.
  2. Create a frequency table on the year column with Shift+F.

With the list aggregator on the artist_name column, when VisiData creates the frequency table, it also aggregates the values of that column into a list. The frequency table looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year              # count♯| artist_name_list 
               2022 ║   335 | [335]            ║
               2021   1537 | [1537]           
               2020   1129 | [1129]           
               2019     24 | [24]             
6› listens-cleaned_year-artist_name_freq_year_freq|          KEY_RESIZE   redraw        15 bins  •0

The artist_name_list column is an array, so can be expanded with (. The sheet now looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year# count♯| artist_name_list[0]    | artist_name_list[1]    | artist_name_list[2] | artist_nam>
 2022 ║   335 | Celestial Aeon Project | Black Sabbath          | Purity Ring         | Nightwish  
 2021   1537 | Nightwish              | Epica                  | Within Temptation   | ABBA
 2020   1129 | Nightwish              | Within Temptation      | Black Sabbath       | Moderator
 2019     24 | Dance With The Dead    | The xx                 | Bhelliom            | Avenged Se…
12› listens-cleaned_year-artist_name_freq_year_freq|         KEY_RESIZE   redraw        15 bins  •0

We only want the top 20, so let’s use a Python expression to extract just the first 20 elements from this array.

  1. Collapse the artist_name_list column again with ).
  2. Create a new column using a Python expression with = artist_name_list[:20].
  3. Expand the new column with (.

The new columns look like:

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year# artist_name_list[:20][0] | artist_name_list[:20][1] | artist_name_list[:20][2] | artist_na>
 2022 ║ Celestial Aeon Project   | Black Sabbath            | Purity Ring              | Nightwish 
 2021  Nightwish                | Epica                    | Within Temptation        | ABBA
 2020  Nightwish                | Within Temptation        | Black Sabbath            | Moderator
 2019  Dance With The Dead      | The xx                   | Bhelliom                 | Avenged S…
12› listens-cleaned_year-artist_name_freq_year_freq|                   k   go-up        15 bins  •0

They’re not very nicely named, so let’s clean it up. We can rename it one-by-one, but I’ll just hack out something in Python to do so.

  1. Open the Columns sheet with Shift+C.
  2. Select the columns using a search with | artist_name_list.
  3. Rename the columns using a Python expression with g= 'top' + str(int(name.split('][')[-1][:-1]) + 1). Not the cleanest, but it’ll work.

Now we have the sheet in our desired format. We can see from a quick glance there are some artists that consistently appear in my top 20 over the years.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                                               
 year# top1                     | top2                     | top3                     | top4                     | top5        >
 2022 ║ Celestial Aeon Project   | Black Sabbath            | Purity Ring              | Nightwish                | Queen        
 2021  Nightwish                | Epica                    | Within Temptation        | ABBA                     | Pantera
 2020  Nightwish                | Within Temptation        | Black Sabbath            | Moderator                | ABBA
 2019  Dance With The Dead      | The xx                   | Bhelliom                 | Avenged Sevenfold        | Timecop1983
 2018  Dance With The Dead      | Carly Rae Jepsen         | Blackmore's Night        | Slow Club                | Queen
 2017  E.S. Posthumus           | Celestial Aeon Project   | Queen                    | Amon Amarth              | Amaranthe
 2016  E.S. Posthumus           | The xx                   | Black Sabbath            | Grimes                   | Massive Atta…
 2015  Queen                    | The Glitch Mob           | Ludovico Einaudi         | E.S. Posthumus           | Rainbow
 2014  Krewella                 | Queen                    | E.S. Posthumus           | Grimes                   | Within Tempt…
 2013  Krewella                 | E.S. Posthumus           | Queen                    | Purity Ring              | Flobots
 2012  Queen                    | E.S. Posthumus           | Dragonette               | Flobots                  | Florence + t…
 2011  Flobots                  | Pendulum                 | Black Sabbath            | The xx                   | Queen
 2010  Black Sabbath            | Florence + the Machine   | Flobots                  | E.S. Posthumus           | Overkill
 2009  Black Sabbath            | Opeth                    | E.S. Posthumus           | AC/DC                    | Queen
 2008  Opeth                    | Amon Amarth              | Dark Tranquillity        | Iron Maiden              | Judas Priest

12› listens-cleaned_year-artist_name_freq_year_freq|                                                 k   go-up        15 bins  •0

Artists with the most staying power over the years

In the previous section we found the top 20 artists for each year. Let’s extend from those results and find the artists that have appeared the most in my 14 years of listens.

  1. Melt the sheet with Shift+M. A melt is an inverse of a pivot.

The sheet now looks like this.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 year# Variable | Value                   
 2022  top1     | Celestial Aeon Project  
 2022  top2     | Black Sabbath           
 2022  top3     | Purity Ring             
 2022  top4     | Nightwish               
 2022  top5     | Queen                   
 2022  top6     | Moderator               
 2022  top7     | Dio                     
 2022  top8     | Ludovico Einaudi        
 2022  top9     | Iron Maiden             
 2022 ║ top10    | Blackmore's Night       ║
 2022  top11    | Scar Symmetry           
 2022  top12    | Within Temptation       
 2022  top13    | DragonForce             
 2022  top14    | Massive Attack          
 2022  top15    | Ensiferum               
 2022  top16    | Amon Amarth             
15› listens-cleaned_year-artist_name_freq_year_freq_melted| h   go-left       300 melted values  •0

By default when VisiData does a melt, it adds the max aggregator to the Value column. Let’s first remove it by going into the Columns sheet with Shift+C and using e to edit the aggregators cell for the Value column.

Now we can simply create a frequency table to get our results.

  1. Select the Value column.
  2. Create a frequency table with Shift+F.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 Value                   ↓count♯| percent%| histogram                                         ~
 Black Sabbath           ║    13 |    4.33 | ************************************************** ║
 Queen                       12 |    4.00 | **********************************************     
 Amon Amarth                 11 |    3.67 | ******************************************         
 Florence + the Machine      10 |    3.33 | **************************************             
 E.S. Posthumus              10 |    3.33 | **************************************             
 Within Temptation            9 |    3.00 | **********************************                 
 Delain                       9 |    3.00 | **********************************                 
 Pendulum                     9 |    3.00 | **********************************                 
 Iron Maiden                  8 |    2.67 | ******************************                     
 The xx                       8 |    2.67 | ******************************                     
 Nightwish                    7 |    2.33 | **************************                         
 Massive Attack               7 |    2.33 | **************************                         
 Slow Club                    6 |    2.00 | ***********************                            
 Avenged Sevenfold            6 |    2.00 | ***********************                            
 Bon Jovi                     6 |    2.00 | ***********************                            
 Rainbow                      6 |    2.00 | ***********************                            
 Opeth                        6 |    2.00 | ***********************                            
 Purity Ring                  5 |    1.67 | *******************                                
 Moderator                    5 |    1.67 | *******************                                
 Dio                          5 |    1.67 | *******************                                
 Dance With The Dead          5 |    1.67 | *******************                                
 Amaranthe                    5 |    1.67 | *******************                                
 Blind Guardian               5 |    1.67 | *******************                                
 Flobots                      5 |    1.67 | *******************                                
 Meiko                        5 |    1.67 | *******************                                
 Judas Priest                 5 |    1.67 | *******************                                
23› listens-cleaned_year-artist_name_freq_year_freq_melted_V KEY_RESIZE   redraw        80 bins  •0

Tip: Pressing <Enter> on any row in a frequency table will open a new sheet with the rows in the source sheet that comprise the histogram bucket. In this case <Enter> will show you which years the selected artist was in the top-20 for.

As you might have noticed by now, the ability to quickly create frequency tables is extremely convenient. In fact, during my day-to-day work, whenever I have to explore some data, I can very quickly do so with a combination to Shift+F, filtering with <Enter> and searching with |. It’s almost a superpower.

Day of week with the most listens on average

Let’s find out on which day of the week I have the most listens on. My intuition is that most listens will be clustered around weekdays, since I tend to leave my music on in the background while I work.

  1. Create a new column containing the day of week using a Python expression with = datetime.datetime.fromtimestamp(listened_at).strftime("%A").
  2. Rename the new column with ^.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 artist_name        | release_name       | track_name         | listened_at#| day_of_week        |>
 Within Temptation  | The Unforgiving    | Fire and Ice       |  1645356049 | Sunday             |…
 Within Temptation  | The Unforgiving    | Faster             |  1645355785 | Sunday             |…
 Within Temptation  | The Unforgiving    | In the Middle of t…|  1645355474 | Sunday             |…
 Within Temptation  | The Unforgiving    | Shot in the Dark   |  1645355171 | Sunday             |…
 Within Temptation  | The Unforgiving    | Why Not Me         |  1645355137 | Sunday             |…
 AC/DC              | Back in Black      | Shoot to Thrill    |  1645286392 | Saturday           |…
 AC/DC              | Back in Black      | Hells Bells        |  1645286079 | Saturday           |…
 The Glitch Mob     | Drink the Sea      | Animus Vox         |  1645174407 | Friday             |…
 Timecop1983        | Daydreaming        | Hold Me (feat. Han…|  1645165001 | Friday             |…
 Timecop1983        | Daydreaming        | Dreaming (About Yo…|  1645164652 | Friday             |…
 Timecop1983        | Childhood Memories | Burning Hearts (bo…|  1645164354 | Friday             |…
 Timecop1983        | Childhood Memories | Space Jam          |  1645164174 | Friday             |…
 Timecop1983        | Childhood Memories | Into the Night     |  1645163921 | Friday             |…
 Timecop1983        | Childhood Memories | Summer Love        |  1645163653 | Friday             |…
 Timecop1983        | Childhood Memories | Renegade Dreams    |  1645163353 | Friday             |…
 Timecop1983        | Childhood Memories | To the Summit      |  1645163074 | Friday             |1› listens-cleaned|                                                 l   go-right     91895 rows  •0

  1. Open a frequency table on the day_of_week column with Shift+F.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 day_of_week        ↓count♯| percent%| histogram                                         ~
 Sunday             ║ 15761 |   17.15 | ************************************************** ║
 Saturday            15034 |   16.36 | ***********************************************    
 Friday              13535 |   14.73 | ******************************************         
 Monday              12022 |   13.08 | **************************************             
 Wednesday           11972 |   13.03 | *************************************              
 Thursday            11865 |   12.91 | *************************************              
 Tuesday             11706 |   12.74 | *************************************              
4› listens-cleaned_day_of_week_freq|                         KEY_RESIZE   redraw         7 bins  •0

Seems like my intuition was wrong. I noticeably listen to more music over the weekend. At least, this is true across listens over the 14 year period. Let’s take a look at the distribution for the more recent years (2020-2022) instead.

  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
 day_of_week        ↓count♯| percent%| histogram                                         ~
 Wednesday          ║  3129 |   20.62 | ************************************************** ║
 Thursday             2552 |   16.82 | ****************************************           
 Tuesday              2392 |   15.77 | **************************************             
 Monday               2375 |   15.65 | *************************************              
 Friday               2234 |   14.72 | ***********************************                
 Saturday             1399 |    9.22 | **********************                             
 Sunday               1091 |    7.19 | *****************                                  
9› listens-cleaned_selectedref_day_of_week_freq|             KEY_RESIZE   redraw         7 bins  •0

This is much more in line with what I expected. Weekdays tend to have significantly more listens. Wednesdays seem to be a strange anomaly, with considerably more listens compared to the other weekdays.

Time of day with the most listens on average

Similar to what we did for the day of week, let’s analyze during what time of day I have the most listens. In particular I’m interested to see how many of those listens happen during my work hours. I’ll only consider the years 2020-2022 for this analysis.

  1. Create a new column with the hour using a Python expression with = datetime.datetime.fromtimestamp(listened_at).hour.
  2. Rename the new column with ^.
  3. Open a frequency table with Shift+F.
  4. Sort the hour column in ascending order with [.
  File  Edit  View  Column  Row  Data  Plot  System  Help     Ctrl+H for help menu                 
↑hour    # count♯| percent%| histogram                                         ~
        0 ║  1305 |    8.60 | *************************************              ║
        1    774 |    5.10 | **********************                             
        2    307 |    2.02 | ********                                           
        3     81 |    0.53 | **                                                 
        4     25 |    0.16 |                                                    
        8      5 |    0.03 |                                                    
        9      2 |    0.01 |                                                    
       10     68 |    0.45 | *                                                  
       11    330 |    2.18 | *********                                          
       12    458 |    3.02 | *************                                      
       13    556 |    3.66 | ***************                                    
       14   1398 |    9.21 | ****************************************           
       15   1746 |   11.51 | ************************************************** 
       16   1615 |   10.64 | **********************************************     
       17   1181 |    7.78 | *********************************                  
       18    875 |    5.77 | *************************                          
       19    482 |    3.18 | *************                                      
       20    652 |    4.30 | ******************                                 
       21    973 |    6.41 | ***************************                        
       22   1098 |    7.24 | *******************************                    
       23   1241 |    8.18 | ***********************************                

12› listens-cleaned_selectedref_hour_freq|                   KEY_RESIZE   redraw        21 bins  •0

Initially I didn’t think the frequency analysis looked that interesting, but there’s actually some insights we can draw from it.

  1. I typically head off to bed around 2 to 3am.
  2. My listens drop off during dinner time around 6 or 7pm, before ramping back up later in the night.
  3. Most listens begin around 2pm, after lunch time.

In closing

I hope this post showcased the usefulness of VisiData. It truly is a joy to use, being able to wrangle CSV data with a few keystrokes, as opposed to importing the data into a spreadsheet and mucking around with a GUI and formulas.

I use it quite often in my day-to-day work, for instance to analyze Sentry events. I also have a personal VisiData cheatsheet.