INTRO

Ever since coming across Matt Daniels's Rapper Vocabulary Chart, I've wondered where one of my favorite rappers, Buck 65, would place on it. To find out, I'll pull as many of his lyrics as I can from Genius (via the LyricsGenius library) and count the unique words in his first 35,000 words, in accordance with the original methodology:

35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don’t have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).

I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.

With those lyrics, I'll be cleaning the data to remove apostrophes and (possibly) other special characters, and then using NLTK to break the lyrics into tokens and count the number of individual words.
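
As a rough illustration of the counting rule quoted above, here is a minimal sketch in plain Python. It is not the notebook's pipeline (that uses pandas and NLTK further down). The count_unique_words helper and the sample line are made up for this example, and treating every non-alphanumeric character other than an apostrophe as a word separator is a simplifying assumption.

```python
import re

def count_unique_words(text, limit=35000):
    """Count distinct words among the first `limit` words of `text`."""
    # drop apostrophes (straight and curly) so "pimpin'" and "pimpin" match
    text = re.sub(r"['’]", "", text.lower())
    # assumption: any other non-alphanumeric character separates words
    words = re.sub(r"[^a-z0-9]+", " ", text).split()
    return len(set(words[:limit]))

sample = "Pimps, pimp, pimping, pimpin' and pimpin"
print(count_unique_words(sample))  # 5: pimps, pimp, pimping, pimpin, and
```

The different inflections each count as their own word, while the two spellings of "pimpin" collapse into one once the apostrophe is gone.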

In [1]:
import lyricsgenius
import json
import pandas as pd

# load the Genius API token; the context manager closes the file for us
with open('secrets.json') as secrets_file:
    secrets = json.load(secrets_file)

def make_initial_dataframe(json_file):
    # load the JSON file written by save_lyrics()
    with open(json_file) as f:
        buck_json = json.load(f)
    songs = pd.DataFrame(buck_json['songs'])

    # we only need these four columns, so we drop the rest
    needed_cols = ['title', 'release_date', 'album', 'lyrics']
    return songs[needed_cols].copy()

def find_album(album):
    return album['name']

def format_albums(songs):
    # each album is a dict; keep only its name (songs with no album are skipped)
    songs['album'] = songs['album'].map(find_album, na_action='ignore')
    return songs

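def clean_lyrics(songs):
    # remove remixes and acoustic versions; the group is non-capturing
    # so pandas doesn't warn about unused match groups
    remixes = songs['title'].str.contains(r'(?:[rR]emix\)|\[Acoustic Version\])')
    # copy so the assignments below don't write to a slice of the original
    songs = songs[~remixes].copy()

    # replace newlines and tabs with spaces
    songs['lyrics'] = songs['lyrics'].str.replace(r'[\n\t]', ' ', regex=True)
    # replace hyphens/dashes with spaces
    songs['lyrics'] = songs['lyrics'].str.replace('[-–—]', ' ', regex=True)
    # remove all other punctuation (including apostrophes)
    songs['lyrics'] = songs['lyrics'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)

    songs['lyrics'] = songs['lyrics'].str.lower()
    return songs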
In [2]:
# set up genius api access
genius = lyricsgenius.Genius(secrets['CLIENT_ACCESS_TOKEN'])
genius.remove_section_headers = True
In [3]:
# this only needs to be run if Lyrics_Buck65.json doesn't exist
# it will also take a while
# buck = genius.search_artist("Buck 65")
# buck.save_lyrics()
In [4]:
# turn JSON into pandas dataframe
songs = make_initial_dataframe("Lyrics_Buck65.json")
In [5]:
# get the album name from the JSON 
songs = format_albums(songs)
In [6]:
songs.sort_values(by='album', inplace=True)
songs.head(20)
Out[6]:
 | title | release_date | album | lyrics
81 | Who By Fire | None | 20 Odd Years | And who by fire?\nWho by water?\nWho in the su...
27 | Cold Steel Drum | 2011-01-01 | 20 Odd Years | I lay down for you, in black and blue\nLife he...
31 | Zombie Delight | 2011-02-07 | 20 Odd Years | Zombie Delight Zombie Delight\nZombies are com...
54 | She Said Yes | 2011-01-01 | 20 Odd Years | She wrote back, too alone\nA single pair of sh...
59 | BCC | 2011-01-01 | 20 Odd Years | BCC the ADD\nWe don't have much time you see\n...
62 | Tears Of Your Heart | 2011-02-07 | 20 Odd Years | (French singing)\n\nAge is beauty, bewildered ...
76 | Final Approach | 2011-01-01 | 20 Odd Years | The sun is always shining bright, at thirty th...
8 | Paper Airplane | 2011-01-01 | 20 Odd Years | Down by the lake you saw me\nAnd you knew I wa...
25 | Gee Whiz | None | 20 Odd Years | Tell me what is it is, Gee Whiz, I don't think...
5 | Whispers Of The Waves | 2011-01-01 | 20 Odd Years | I am the deck, you are the sea...*\nI am the l...
101 | Superstars Don’t Love | None | 20 Odd Years | Michael Jackson died today\nCycle's action, hi...
95 | Stop | 2011-01-01 | 20 Odd Years | Stop you need to listen\nJust let me show you ...
107 | Joey Bats | None | 20 Odd Years: Volume 4 - Ostranenie | Jose Baustia, also known as Joey Bats\nKnow th...
108 | Dolores | 2011-01-01 | 20 Odd Years: Volume 4 - Ostranenie | All the world was black and white\nJust a soun...
127 | Stupid | None | Boy-Girl Fight | Get stupid y'all\n\nLimited Scream\nMen and wo...
163 | Highway 101 | None | Boy-Girl Fight | Snakes in the outerfield, infield vipers\nI re...
145 | January | 2004-01-01 | Climbing Up a Mountain With a Basket Full of F... | Now I could hear the coyotes when I laid in a ...
9 | Blood, Pt. 2 | 2009-02-16 | Dark Was the Night | You're not bloody swab paradise\nYou're golden...
153 | Feels Like | None | Dirtbike | She found the lost boy, eyes that are crying c...
129 | Why So Sad? | 2008-01-01 | Dirtbike 3 | Under the bed, on the floor, through the roof\...
In [7]:
# clean data
songs = clean_lyrics(songs)


# save
songs.to_csv('buck65.tsv', sep="\t", index=False)

songs['lyrics'].str.len().sum()
Out[7]:
297778

Rundown

So far, we have gathered, sorted, and cleaned all of the lyrics from Buck 65's Genius entry. The cell above shows 297,778 characters of lyrics across 163 songs (str.len() counts characters, not words). The first step, then, is to narrow that down to the 35,000 words used in the original project. More specifically, we need the chronologically first 35,000 words. To do that, we'll need a list of his albums, which I've gotten from this Wikipedia article. The most relevant part of that article is pasted below:

Studio albums
Buck 65

Game Tight (1994)
Year Zero (1996)
Weirdo Magnet (1996)
Language Arts (1996)
Vertex (1998)
Man Overboard (Anticon, 2001)
Synesthesia (Endemik, 2001)
Square (WEA, 2002)
Talkin' Honky Blues (WEA, 2003)
Secret House Against the World (WEA, 2005)
Situation (Strange Famous, 2007)
20 Odd Years (WEA, 2011)
Laundromat Boogie (2014)
Neverlove (2014)
In [8]:
import nltk
nltk.download('punkt')

songs = pd.read_csv('buck65.tsv', sep="\t", keep_default_na=False)
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
In [9]:
# useful for coding next cell
songs['album'].value_counts()
Out[9]:
Neverlove (Deluxe Edition)                                          17
Talkin’ Honky Blues                                                 17
Vertex                                                              16
Situation                                                           16
Man Overboard                                                       13
Language Arts                                                       13
Secret House Against The World                                      12
20 Odd Years                                                        12
                                                                    11
Synesthesia                                                          9
This Right Here is Buck 65                                           5
Square                                                               4
Boy-Girl Fight                                                       2
20 Odd Years: Volume 4 - Ostranenie                                  2
Dirtbike 3                                                           2
Dirtywork E.P.                                                       2
I Dream Of Love: Live And In Private                                 2
Pole-Axed (More Rarities)                                            1
Weirdo Magnet                                                        1
Year Of The Carnivore Soundtrack                                     1
Climbing Up a Mountain With a Basket Full of Fruit                   1
Dark Was the Night                                                   1
Skratch Bastid Presents: Cretin Hip Hop Vol. 1 (Buck 65 Mixtape)     1
Dirtbike                                                             1
Giga Single                                                          1
Name: album, dtype: int64
In [10]:
# sort albums by album release
buck_albums = [
    'Game Tight',
    'Year Zero',
    'Weirdo Magnet',
    'Language Arts',
    'Vertex',
    'Man Overboard',
    'Synesthesia',
    'Square',
    'Talkin’ Honky Blues',
    'Secret House Against The World',
    'Situation ', # space is there on purpose
    '20 Odd Years',
    'Laundromat Boogie',
    'Neverlove (Deluxe Edition)'
]

albums_to_remove = [
    '20 Odd Years: Volume 4 - Ostranenie',
    'Dirtbike 3',
    'Dirtywork E.P.',
    'I Dream Of Love: Live And In Private',
    'Boy-Girl Fight',
    'Pole-Axed (More Rarities)',
    'Year Of The Carnivore Soundtrack',
    'Climbing Up a Mountain With a Basket Full of Fruit',
    'Dirtbike',
    'Skratch Bastid Presents: Cretin Hip Hop Vol. 1 (Buck 65 Mixtape)',
    'Weirdo Magnet', # yes this is a studio album, but there's only one song in it and it's full of notes
    'This Right Here is Buck 65', # greatest hits album
    'Giga Single',
    'Dark Was the Night',
    '', # removes songs with no album
]

def chron_order_albums(songs, albums):
    songs['ordered_album'] = pd.Categorical(
        songs['album'], 
        categories=albums, 
        ordered=True
    )

    return songs.sort_values(by='ordered_album')


songs = chron_order_albums(songs, buck_albums)

for alb in albums_to_remove:
    remove_bool = songs['album'] == alb
    songs = songs[~remove_bool]

songs.tail(30)
Out[10]:
 | title | release_date | album | lyrics | ordered_album
96 | The Outskirters |  | Situation | young and attractive quote unquote old soul do... | Situation
10 | Superstars Don’t Love |  | 20 Odd Years | michael jackson died today cycles action hidea... | 20 Odd Years
1 | Cold Steel Drum | 2011-01-01 | 20 Odd Years | i lay down for you in black and blue life here... | 20 Odd Years
2 | Zombie Delight | 2011-02-07 | 20 Odd Years | zombie delight zombie delight zombies are comi... | 20 Odd Years
3 | She Said Yes | 2011-01-01 | 20 Odd Years | she wrote back too alone a single pair of shoo... | 20 Odd Years
4 | BCC | 2011-01-01 | 20 Odd Years | bcc the add we dont have much time you see bcc... | 20 Odd Years
5 | Tears Of Your Heart | 2011-02-07 | 20 Odd Years | french singing age is beauty bewildered young... | 20 Odd Years
6 | Final Approach | 2011-01-01 | 20 Odd Years | the sun is always shining bright at thirty tho... | 20 Odd Years
7 | Paper Airplane | 2011-01-01 | 20 Odd Years | down by the lake you saw me and you knew i was... | 20 Odd Years
8 | Gee Whiz |  | 20 Odd Years | tell me what is it is gee whiz i dont think i ... | 20 Odd Years
9 | Whispers Of The Waves | 2011-01-01 | 20 Odd Years | i am the deck you are the sea i am the light o... | 20 Odd Years
11 | Stop | 2011-01-01 | 20 Odd Years | stop you need to listen just let me show you a... | 20 Odd Years
0 | Who By Fire |  | 20 Odd Years | and who by fire who by water who in the sunshi... | 20 Odd Years
55 | Danger And Play | 2014-09-23 | Neverlove (Deluxe Edition) | heres a man whos come apart pieces missing unc... | Neverlove (Deluxe Edition)
68 | That’s The Way Love Dies | 2014-09-30 | Neverlove (Deluxe Edition) | thats the way love dies it starts when youre a... | Neverlove (Deluxe Edition)
66 | Love Will Fuck You Up | 2014-09-30 | Neverlove (Deluxe Edition) | i live in an ugly city nowhere else id rather ... | Neverlove (Deluxe Edition)
65 | Ugly Bridge | 2014-09-30 | Neverlove (Deluxe Edition) | oh damn i wish that i were dead sleeping somew... | Neverlove (Deluxe Edition)
64 | Heart of Stone | 2014-08-21 | Neverlove (Deluxe Edition) | cant break a heart of stone no youre only goin... | Neverlove (Deluxe Edition)
63 | Gates Of Hell | 2014-09-30 | Neverlove (Deluxe Edition) | now im feeling devilish soon were going to set... | Neverlove (Deluxe Edition)
62 | Only War | 2014-06-25 | Neverlove (Deluxe Edition) | kiss it better stop the bleeding she attacks h... | Neverlove (Deluxe Edition)
56 | Fairytales | 2013-03-13 | Neverlove (Deluxe Edition) | i dont believe in fairytales in love and other... | Neverlove (Deluxe Edition)
61 | Neverlove | 2014-09-30 | Neverlove (Deluxe Edition) | back in the game on the prowl after noon looki... | Neverlove (Deluxe Edition)
59 | A Case For Us | 2014-09-30 | Neverlove (Deluxe Edition) | maybe theres a place for us to go somewhere we... | Neverlove (Deluxe Edition)
58 | Super Pretty Naughty | 2014-09-02 | Neverlove (Deluxe Edition) | fancy time naked saturday wild stylin now its ... | Neverlove (Deluxe Edition)
57 | Superhero In My Heart | 2014-09-30 | Neverlove (Deluxe Edition) | when my baby left me i cried for an entire yea... | Neverlove (Deluxe Edition)
52 | Je T’aime Mon Amour | 2014-09-29 | Neverlove (Deluxe Edition) | je taime mon amour je taime tant cest dur de n... | Neverlove (Deluxe Edition)
53 | Roses In The Rain | 2014-09-09 | Neverlove (Deluxe Edition) | this is where she finds herself alone and empt... | Neverlove (Deluxe Edition)
54 | Baby Blanket | 2014-09-30 | Neverlove (Deluxe Edition) | i dont want to be a bad person anymore a casua... | Neverlove (Deluxe Edition)
60 | NSFW Music Video | 2014-09-30 | Neverlove (Deluxe Edition) | nsfw music video music video music video nsfw ... | Neverlove (Deluxe Edition)
67 | She Fades | 2014-09-16 | Neverlove (Deluxe Edition) | dont look now its the bte noir playin flute so... | Neverlove (Deluxe Edition)

Data Cleaning

Earlier on, we did a first round of data cleaning: removing special characters, lowercasing all words, etc. However, a second round was needed to remove rows. The original methodology only included studio albums unless there weren't enough of them to reach 35,000 words, in which case other material was considered. Buck 65 has more than enough studio material, so I've had to remove some songs. I did so based on the album column, and the removals fell into roughly three categories:

  1. Mixtapes, singles and unreleased material: This is most of the removal list, and while it unfortunately removes some of my favorite material (RIP Dirtbike), it was important to take it out to be consistent with the original methodology.

  2. 'Problem Albums': There were two of these: Weirdo Magnet and This Right Here is Buck 65. Weirdo Magnet had to go because the lyrics were woefully incomplete (only one song was on Genius) and that song's lyrics had comments mixed in. This Right Here is Buck 65 is a best-of album, so it inflated the total word count without adding any unique words, and it made more sense to remove it.

  3. Songs with no album: A lot of these fall under mixtapes, singles, and unreleased material, but just weren't marked as such. There may have been some valuable songs in there (e.g. more of Weirdo Magnet), but going through and manually adding albums was going to be a pain, so I decided to exclude them.
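
To make the removal rule concrete, here is a small sketch in plain Python. The rows and titles are hypothetical stand-ins rather than the real Genius data, and the excluded set is only a sample of the full removal list. The idea is simply to drop every song whose album is in that set, with the empty string standing in for "no album".

```python
# hypothetical rows standing in for the real Genius data
songs = [
    {"title": "Track A", "album": "Vertex"},
    {"title": "Track B", "album": "Dirtbike"},
    {"title": "Track C", "album": ""},
    {"title": "Track D", "album": "Square"},
]

# a sample of albums to exclude; the empty string catches songs with no album
excluded = {"Dirtbike", "This Right Here is Buck 65", ""}

kept = [song for song in songs if song["album"] not in excluded]
print([song["title"] for song in kept])  # ['Track A', 'Track D']
```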

In [11]:
def get_unique_lyrics(tokens):
    return len(set(tokens))

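def tokenize_lyrics(songs):
    lyrics = songs['lyrics']
    # caveat: str.cat() joins songs with no separator, so the last word of one
    # song can fuse with the first word of the next; str.cat(sep=' ') avoids that
    lyric_string = lyrics.str.cat()
    return nltk.word_tokenize(lyric_string)

lyric_tokens = tokenize_lyrics(songs)
print('Total Lyrics:', len(lyric_tokens))
# get unique words in first 35,000 lyrics
# caveat: [:34999] keeps 34,999 tokens; [:35000] would keep exactly 35,000
limited_tokens = lyric_tokens[:34999]
print('First 35,000 Unique Lyrics:', get_unique_lyrics(limited_tokens))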
43052
Out[11]:
6557

First Conclusion

The above cell gives us Buck 65's vocabulary according to Daniels's first-35,000-words methodology: 6,557 unique words. This puts him in a solid 3rd place: ahead of Jedi Mind Tricks at 6,424, but still well behind Busdriver and Aesop Rock. While this gives us our answer, I wanted to see, just for fun, how sensitive that result is to changing the sample.

In [12]:
# all lyrics
print('Total Unique Lyrics:', get_unique_lyrics(lyric_tokens))
7521
In [13]:
# last words
last_tokens = lyric_tokens[-35000:]
print('Last 35,000 Unique Lyrics:', get_unique_lyrics(last_tokens))
6537
In [14]:
# random samplings
from random import sample
from statistics import mean

def sample_lyrics(songs):
    # note: `songs` is unused; samples are drawn from the global lyric_tokens
    results = []
    for _ in range(10):
        # draw 35,000 tokens without replacement, ignoring their order
        lyric_sample = sample(lyric_tokens, 35000)
        results.append(get_unique_lyrics(lyric_sample))
    return results

sample_results = sample_lyrics(songs)
print('Random 35,000 Unique Lyrics:')
print(sorted(sample_results))
print(mean(sample_results))
[6644, 6653, 6666, 6677, 6678, 6682, 6684, 6707, 6713, 6717]
6682.1

Second Conclusion

When using his whole corpus of 43,052 words, we find 7,521 unique ones. Using his last 35,000 words gets us 6,537 unique words, implying a slight decrease in vocabulary over time. I'd assume that a lot of that is due to the inclusion of Neverlove, which was a far poppier, less dense-sounding album than many of his early works. Finally, using ten random samplings of 35,000 words, we get results that average out in the high 6,600s (6,682.1), ranging from 6,644 up to 6,717.
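
The three ways of cutting the corpus can be sketched side by side. This is a toy version under stated assumptions: compare_windows and the tiny corpus are invented for illustration, and the random generator is seeded only so that a rerun gives the same draw. The notebook's own cells are unseeded.

```python
import random
from statistics import mean

def unique_count(tokens):
    return len(set(tokens))

def compare_windows(tokens, size, n_samples=10, seed=0):
    """Unique-word counts for the first, last, and random windows of `size`."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    randoms = [unique_count(rng.sample(tokens, size)) for _ in range(n_samples)]
    return {
        "first": unique_count(tokens[:size]),
        "last": unique_count(tokens[-size:]),
        "random_mean": mean(randoms),
    }

# toy corpus: repetitive early material, then more varied words later on
toy = ["la"] * 6 + ["one", "two", "three", "four", "five", "six"]
print(compare_windows(toy, size=6))
```

On this toy corpus the first window sees a single word and the last window sees six, while the random samples mix early and late material and land somewhere in between.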

While this is another good result, I have a hypothesis that these numbers will all go up noticeably if I include two (especially poetic, IMO) albums which he recorded as part of a collaboration with DJ Greetings from Tuskan.

In [15]:
# this only needs to be run if Lyrics_BikeForThree.json doesn't exist
# it will also take a while
# bike = genius.search_artist("Bike for Three!")
# bike.save_lyrics()
In [16]:
bike = make_initial_dataframe('Lyrics_BikeForThree.json')
bike.head()
Out[16]:
 | title | release_date | album | lyrics
0 | Lazarus Phenomenon | 2009-05-26 | {'api_path': '/albums/24905', 'cover_art_url':... | (Here, it's perfectly dark)\n(Here, it's perfe...
1 | Always I Will Miss You. Always You. | 2009-05-26 | {'api_path': '/albums/24905', 'cover_art_url':... | Always, I will miss you, always you\nAlways, I...
2 | There Is Only One Of Us | 2009-05-26 | {'api_path': '/albums/24905', 'cover_art_url':... | Whispering ghosts, seduction unlikely\nJust ou...
3 | All There Is to Say About Love | 2009-05-26 | {'api_path': '/albums/24905', 'cover_art_url':... | Dragonflies and the agonizing blast from a gun...
4 | Sublimation | 2014-02-11 | {'api_path': '/albums/133397', 'cover_art_url'... | I don't know how to love myself, I'm hoping yo...
In [17]:
bike = format_albums(bike)
bike.head()
Out[17]:
 | title | release_date | album | lyrics
0 | Lazarus Phenomenon | 2009-05-26 | More Heart than Brains | (Here, it's perfectly dark)\n(Here, it's perfe...
1 | Always I Will Miss You. Always You. | 2009-05-26 | More Heart than Brains | Always, I will miss you, always you\nAlways, I...
2 | There Is Only One Of Us | 2009-05-26 | More Heart than Brains | Whispering ghosts, seduction unlikely\nJust ou...
3 | All There Is to Say About Love | 2009-05-26 | More Heart than Brains | Dragonflies and the agonizing blast from a gun...
4 | Sublimation | 2014-02-11 | So Much Forever | I don't know how to love myself, I'm hoping yo...
In [18]:
bike.sort_values(by='album', inplace=True)
bike.head(20)
Out[18]:
 | title | release_date | album | lyrics
0 | Lazarus Phenomenon | 2009-05-26 | More Heart than Brains | (Here, it's perfectly dark)\n(Here, it's perfe...
23 | Ending | 2009-05-26 | More Heart than Brains | 
21 | First Embrace | 2009-05-26 | More Heart than Brains | (First embrace)\nPouring rain, roaring pain, g...
18 | One More Time Forever | 2009-05-26 | More Heart than Brains | Rioting quietly, we started fires and threw br...
16 | MC Space | 2009-05-26 | More Heart than Brains | Where I come from, we never heard of bite\nWe ...
15 | The Departure | 2009-05-26 | More Heart than Brains | Without everything else, heavily and only-less...
13 | More Heart Than Brains | 2009-05-26 | More Heart than Brains | Dark promises\nDark mysteries\nMemories, dista...
11 | Can Feel Love (anymore) | 2009-05-26 | More Heart than Brains | There's a baby girl on the way, blow your horn...
10 | Let’s Never Meet | 2009-05-26 | More Heart than Brains | Somewhere unseen and under the covers deep\nTh...
25 | Beginning | 2009-05-26 | More Heart than Brains | 
7 | Nightdriving | 2009-05-26 | More Heart than Brains | Night driving, faced my wheel\nBoth of my legs...
5 | No Idea How | 2009-05-26 | More Heart than Brains | Steady, unbreakable, consistent, fast\nDirt, e...
3 | All There Is to Say About Love | 2009-05-26 | More Heart than Brains | Dragonflies and the agonizing blast from a gun...
2 | There Is Only One Of Us | 2009-05-26 | More Heart than Brains | Whispering ghosts, seduction unlikely\nJust ou...
1 | Always I Will Miss You. Always You. | 2009-05-26 | More Heart than Brains | Always, I will miss you, always you\nAlways, I...
8 | Heart as Hell | 2014-02-11 | So Much Forever | I have two hearts, and one of them is hard as ...
24 | Intro | 2014-02-11 | So Much Forever | 
6 | You Can Be Everything | 2014-02-11 | So Much Forever | Sun and the rain, one and the same, builder\nD...
14 | Full Moon | 2014-02-11 | So Much Forever | How did it bleed? It bled like fire\nHow did i...
4 | Sublimation | 2014-02-11 | So Much Forever | I don't know how to love myself, I'm hoping yo...
In [19]:
# clean data
bike = clean_lyrics(bike)

# save
bike.to_csv('bike.tsv', sep="\t", index=False)
In [20]:
bike = pd.read_csv('bike.tsv', sep="\t", keep_default_na=False)
bike_tokens = tokenize_lyrics(bike)
lyric_tokens += bike_tokens
# get updated total unique lyrics
get_unique_lyrics(lyric_tokens)
Out[20]:
8195
In [21]:
# only run this once per kernel
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
songs = pd.concat([songs, bike])
In [22]:
# buck_albums from above with Bike For Three's albums inserted
all_albums = [
    'Game Tight',
    'Year Zero',
    'Weirdo Magnet',
    'Language Arts',
    'Vertex',
    'Man Overboard',
    'Synesthesia',
    'Square',
    'Talkin’ Honky Blues',
    'Secret House Against The World',
    'Situation ',
    'More Heart than Brains',
    '20 Odd Years',
    'So Much Forever ',
    'Laundromat Boogie',
    'Neverlove (Deluxe Edition)'
]

songs = chron_order_albums(songs, all_albums)

songs.tail(50)
Out[22]:
 | title | release_date | album | lyrics | ordered_album
2 | First Embrace | 2009-05-26 | More Heart than Brains | first embrace pouring rain roaring pain girl m... | More Heart than Brains
0 | Lazarus Phenomenon | 2009-05-26 | More Heart than Brains | here its perfectly dark here its perfectly dar... | More Heart than Brains
1 | Ending | 2009-05-26 | More Heart than Brains |  | More Heart than Brains
3 | One More Time Forever | 2009-05-26 | More Heart than Brains | rioting quietly we started fires and threw bri... | More Heart than Brains
4 | MC Space | 2009-05-26 | More Heart than Brains | where i come from we never heard of bite we ca... | More Heart than Brains
13 | There Is Only One Of Us | 2009-05-26 | More Heart than Brains | whispering ghosts seduction unlikely just out ... | More Heart than Brains
7 | Can Feel Love (anymore) | 2009-05-26 | More Heart than Brains | theres a baby girl on the way blow your horn k... | More Heart than Brains
14 | Always I Will Miss You. Always You. | 2009-05-26 | More Heart than Brains | always i will miss you always you always i wil... | More Heart than Brains
6 | More Heart Than Brains | 2009-05-26 | More Heart than Brains | dark promises dark mysteries memories distance... | More Heart than Brains
5 | The Departure | 2009-05-26 | More Heart than Brains | without everything else heavily and only less ... | More Heart than Brains
9 | Whispers Of The Waves | 2011-01-01 | 20 Odd Years | i am the deck you are the sea i am the light o... | 20 Odd Years
10 | Superstars Don’t Love |  | 20 Odd Years | michael jackson died today cycles action hidea... | 20 Odd Years
1 | Cold Steel Drum | 2011-01-01 | 20 Odd Years | i lay down for you in black and blue life here... | 20 Odd Years
2 | Zombie Delight | 2011-02-07 | 20 Odd Years | zombie delight zombie delight zombies are comi... | 20 Odd Years
3 | She Said Yes | 2011-01-01 | 20 Odd Years | she wrote back too alone a single pair of shoo... | 20 Odd Years
4 | BCC | 2011-01-01 | 20 Odd Years | bcc the add we dont have much time you see bcc... | 20 Odd Years
5 | Tears Of Your Heart | 2011-02-07 | 20 Odd Years | french singing age is beauty bewildered young... | 20 Odd Years
6 | Final Approach | 2011-01-01 | 20 Odd Years | the sun is always shining bright at thirty tho... | 20 Odd Years
8 | Gee Whiz |  | 20 Odd Years | tell me what is it is gee whiz i dont think i ... | 20 Odd Years
7 | Paper Airplane | 2011-01-01 | 20 Odd Years | down by the lake you saw me and you knew i was... | 20 Odd Years
11 | Stop | 2011-01-01 | 20 Odd Years | stop you need to listen just let me show you a... | 20 Odd Years
0 | Who By Fire |  | 20 Odd Years | and who by fire who by water who in the sunshi... | 20 Odd Years
23 | The Last Romance | 2014-02-11 | So Much Forever | nevermore its revolution a spiral a crack vita... | So Much Forever
15 | Heart as Hell | 2014-02-11 | So Much Forever | i have two hearts and one of them is hard as h... | So Much Forever
16 | Intro | 2014-02-11 | So Much Forever |  | So Much Forever
17 | You Can Be Everything | 2014-02-11 | So Much Forever | sun and the rain one and the same builder dest... | So Much Forever
18 | Full Moon | 2014-02-11 | So Much Forever | how did it bleed it bled like fire how did it ... | So Much Forever
19 | Sublimation | 2014-02-11 | So Much Forever | i dont know how to love myself im hoping you c... | So Much Forever
20 | Agony | 2014-02-11 | So Much Forever | dark genius heart genius star venus freely dar... | So Much Forever
22 | The Muse Inside Me | 2014-02-11 | So Much Forever | i dreamed i was beautiful a hunter and frighte... | So Much Forever
21 | Successful With Heavy Losses | 2014-02-11 | So Much Forever | all we ever needs a voice and a device to play... | So Much Forever
25 | Ethereal Love | 2014-02-11 | So Much Forever | window seat in dead of night awake i saw a fri... | So Much Forever
24 | Wolf Sister | 2014-02-11 | So Much Forever | wolf sister we have just begun when the branch... | So Much Forever
60 | NSFW Music Video | 2014-09-30 | Neverlove (Deluxe Edition) | nsfw music video music video music video nsfw ... | Neverlove (Deluxe Edition)
55 | Danger And Play | 2014-09-23 | Neverlove (Deluxe Edition) | heres a man whos come apart pieces missing unc... | Neverlove (Deluxe Edition)
68 | That’s The Way Love Dies | 2014-09-30 | Neverlove (Deluxe Edition) | thats the way love dies it starts when youre a... | Neverlove (Deluxe Edition)
66 | Love Will Fuck You Up | 2014-09-30 | Neverlove (Deluxe Edition) | i live in an ugly city nowhere else id rather ... | Neverlove (Deluxe Edition)
65 | Ugly Bridge | 2014-09-30 | Neverlove (Deluxe Edition) | oh damn i wish that i were dead sleeping somew... | Neverlove (Deluxe Edition)
63 | Gates Of Hell | 2014-09-30 | Neverlove (Deluxe Edition) | now im feeling devilish soon were going to set... | Neverlove (Deluxe Edition)
62 | Only War | 2014-06-25 | Neverlove (Deluxe Edition) | kiss it better stop the bleeding she attacks h... | Neverlove (Deluxe Edition)
56 | Fairytales | 2013-03-13 | Neverlove (Deluxe Edition) | i dont believe in fairytales in love and other... | Neverlove (Deluxe Edition)
61 | Neverlove | 2014-09-30 | Neverlove (Deluxe Edition) | back in the game on the prowl after noon looki... | Neverlove (Deluxe Edition)
59 | A Case For Us | 2014-09-30 | Neverlove (Deluxe Edition) | maybe theres a place for us to go somewhere we... | Neverlove (Deluxe Edition)
58 | Super Pretty Naughty | 2014-09-02 | Neverlove (Deluxe Edition) | fancy time naked saturday wild stylin now its ... | Neverlove (Deluxe Edition)
57 | Superhero In My Heart | 2014-09-30 | Neverlove (Deluxe Edition) | when my baby left me i cried for an entire yea... | Neverlove (Deluxe Edition)
52 | Je T’aime Mon Amour | 2014-09-29 | Neverlove (Deluxe Edition) | je taime mon amour je taime tant cest dur de n... | Neverlove (Deluxe Edition)
53 | Roses In The Rain | 2014-09-09 | Neverlove (Deluxe Edition) | this is where she finds herself alone and empt... | Neverlove (Deluxe Edition)
54 | Baby Blanket | 2014-09-30 | Neverlove (Deluxe Edition) | i dont want to be a bad person anymore a casua... | Neverlove (Deluxe Edition)
67 | She Fades | 2014-09-16 | Neverlove (Deluxe Edition) | dont look now its the bte noir playin flute so... | Neverlove (Deluxe Edition)
64 | Heart of Stone | 2014-08-21 | Neverlove (Deluxe Edition) | cant break a heart of stone no youre only goin... | Neverlove (Deluxe Edition)
In [35]:
lyric_tokens_all = tokenize_lyrics(songs)
print('All Lyrics:', len(lyric_tokens_all))
print('All Unique Lyrics:', get_unique_lyrics(lyric_tokens_all))
# get unique words in first 35,000 lyrics
# caveat: [:34999] keeps 34,999 tokens; [:35000] would keep exactly 35,000
limited_tokens_all = lyric_tokens_all[:34999]
print('First 35,000 Unique Lyrics:', get_unique_lyrics(limited_tokens_all))
All Lyrics: 51500
All Unique Lyrics: 8197
First 35,000 Unique Lyrics: 6554
In [37]:
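# caveat: this slices lyric_tokens (Bike for Three tokens appended at the end),
# not the chronologically ordered lyric_tokens_all
last_tokens = lyric_tokens[-35000:]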
print('Last 35,000 Unique Lyrics:', get_unique_lyrics(last_tokens))
Last 35,000 Unique Lyrics: 6412
In [27]:
sample_results_all = sample_lyrics(songs)
print('Random 35,000 Unique Lyrics:')
print(sorted(sample_results_all))
print(mean(sample_results_all))
[6537, 6541, 6553, 6581, 6590, 6591, 6604, 6606, 6628, 6634]
6586.5

Final Conclusion

The effect from adding the two Bike for Three albums was surprising to say the least. I was expecting a large increase in unique words, but other than in the total, adding these albums actually caused a slight decrease in all three samples. It's not enough of a decrease in the first 35,000 sample to knock Buck out of 3rd place, but still notable.
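
One possible mechanism, offered as a guess rather than something measured here: because the window is a fixed number of words, an album slotted into the middle of the chronology pushes later material out of the first-N window. The sketch below uses made-up albums and a hypothetical first_n_unique helper to show how that can lower the count when the inserted album repeats words already seen.

```python
def first_n_unique(albums, order, n):
    """Unique-word count over the first n tokens, reading albums in `order`."""
    tokens = [word for name in order for word in albums[name]]
    return len(set(tokens[:n]))

# made-up albums: each is just a short list of tokens
albums = {
    "early": ["a", "b", "c", "d"],
    "side_project": ["a", "a", "b", "b"],
    "late": ["e", "f", "g", "h"],
}

before = first_n_unique(albums, ["early", "late"], n=6)
after = first_n_unique(albums, ["early", "side_project", "late"], n=6)
print(before, after)  # 6 4
```

Whether this is what happened with the Bike for Three albums would need checking against the real token lists.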

The Takeaway

Using Matt Daniels's methodology, I've analyzed the number of unique words that Canadian rapper Buck 65 used in his first 35,000 words. It's 6,557 unique words, which would put him in 3rd place on the chart. While this number does change slightly depending on the sample and on the inclusion of other albums, those changes keep him comfortably in 3rd place.