This whole thing started after I read an excellent blog post by Julia Silge. You should definitely check it out, but the short version is that she used language processing tools within R to analyze the sentiment in Jane Austen’s novels. This led me to ask: can I do this with rap albums? I didn’t learn R coding just to write a dissertation, right?!
In the same way that Julia looked at Jane Austen’s entire body of work, I thought it would be interesting to analyze one musical artist’s whole commercial output. Within hip hop, Kanye seems like a natural choice. He’s got seven albums out, which I’d argue encompass greater musical diversity than the catalog of probably any other recent popular musician. Some of his later albums, particularly 808s & Heartbreak and Yeezus, have been polarizing, even for fans of his early stuff. In short, his catalog seems ripe for a sentiment analysis. Would such an analysis reveal major differences in feeling across his career arc? Is there really an “old Kanye” to miss, lyrically speaking?
What follows is my best shot at this analysis. Major credit to Julia Silge here. My work followed pretty easily from the code she had already written. In the parlance of hip hop, I sampled her code and remixed it a little. If you don’t care about how I actually did the analysis in R, you can skip The Geek Stuff section and head right along to The Rap Stuff.
In thinking about this sentiment analysis project, it seemed to me there were three major steps:
1. Get the lyrics for the albums that I want to analyze. Julia already had this solved for her work because she had the text of Austen’s novels packaged up nice and neat. I was starting from scratch.
2. Perform the sentiment analysis. Julia already solved this problem for me.
3. Visualize the results. Julia mostly solved this problem for me.
Here’s how I went about addressing these tasks:
To get the lyrics I wanted, the best source seemed pretty obvious to me: the website Genius (formerly Rap Genius). If you haven’t browsed Genius, you should. It’s essentially a wiki-type site dedicated to song lyrics. Contributors both transcribe the lyrics and annotate them for meaning. For my purposes, I just needed to get the lyrics out of the webpages and into R, preferably for whole albums at once. To do this, I used the R package rvest. It was my first time working with it, but it helped me scrape lyrics off the web quickly and easily. I wrote a few R functions that allowed me to simply input the Genius page of the album I was interested in and process the entire album’s worth of lyrics.
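To give a flavor of the scraping step, here is a minimal sketch of what one such function might look like. It is an illustration, not the code from the repository: the helper name get_lyric_lines and the ".lyrics" CSS selector are assumptions, and the demo page is a tiny inline stand-in so the example runs without touching the web. For a real song page you would pass its URL to read_html() and inspect the page source to find the right selector.

```r
library(rvest)

# Hypothetical helper: pull the lyric lines out of one parsed song page.
# The default selector is an assumption; check the real page's markup.
get_lyric_lines <- function(page, selector = ".lyrics") {
  # html_text2() turns <br> tags into newlines, which suits line-by-line lyrics
  raw <- html_text2(html_elements(page, selector))
  lines <- trimws(unlist(strsplit(raw, "\n", fixed = TRUE)))
  lines[nzchar(lines)]  # drop empty lines
}

# Offline stand-in for a song page (two lines from the "Momma" excerpt)
demo_page <- read_html(
  '<html><body><div class="lyrics">I know everything<br>I know everything, know myself</div></body></html>'
)

lyric_lines <- get_lyric_lines(demo_page)
```

Processing a whole album would then be a matter of collecting the song URLs from the album page and applying the helper to each one, for example with lapply().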
For the sentiment analysis, I just followed Julia’s lead. Major disclaimer: I know very little about sentiment analysis. My basic understanding is that it’s used to quantify the subjective feeling in a text. Julia has a nice comparison of metrics that might be used to compute sentiment scores. She settled on the bing method from the package syuzhet since it did not seem to be directionally biased or overly variable. Good enough for her, good enough for me.
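As a rough sketch of the scoring step, syuzhet’s get_sentiment() can score each lyric line with the bing method, and the line scores can then be summed over chunks of lines. The chunk size used here (2) and the variable names are assumptions for this toy example, not the settings behind the plots in this post.

```r
library(syuzhet)

# A few lines from the "Momma" excerpt shown later in the post
lyric_lines <- c(
  "I know everything",
  "I know morality, spirituality, good and bad health",
  "I know fatality might haunt you",
  "I know wisdom, I know bad religion, I know good karma"
)

# One score per line: words in the positive list add to the score,
# words in the negative list subtract from it
line_scores <- get_sentiment(lyric_lines, method = "bing")

# Sum the line scores over fixed-size chunks of lines.
# chunk_size is an assumed value for illustration only.
chunk_size <- 2
chunk_id <- ceiling(seq_along(line_scores) / chunk_size)
chunk_scores <- as.numeric(tapply(line_scores, chunk_id, sum))
```

Summing over chunks smooths out the line-to-line noise a bit, at the cost of hiding short swings in mood within a chunk.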
Again, I followed Julia’s example here. I thought her visuals using ggplot2 looked really nice, so I just modified her code slightly. Whereas Julia was analyzing novel-length texts, my texts had clear dividing points since an album is made up of discrete tracks. Thus, I added some visual elements to the plots to help distinguish between tracks within an album.
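Here is a minimal sketch of the track-divider idea in ggplot2. The data frame is made up for illustration (it is not output from any album), and the column names chunk, track, and score are assumptions. Bars are colored by the sign of the score, and a dotted vertical line marks each boundary between tracks.

```r
library(ggplot2)

# Made-up chunk scores for two hypothetical tracks; not results from the post
plot_df <- data.frame(
  chunk = 1:8,
  track = rep(c("Track A", "Track B"), each = 4),
  score = c(2, -1, 3, 0, -4, -2, 1, -3)
)
plot_df$direction <- ifelse(plot_df$score >= 0, "positive", "negative")

# A boundary sits halfway between the last chunk of one track and the
# first chunk of the next; the final track needs no line after it
last_chunk <- tapply(plot_df$chunk, plot_df$track, max)
boundaries <- head(sort(as.numeric(last_chunk)), -1) + 0.5

p <- ggplot(plot_df, aes(x = chunk, y = score, fill = direction)) +
  geom_col() +
  geom_vline(xintercept = boundaries, linetype = "dotted") +
  scale_fill_manual(
    values = c(positive = "darkgreen", negative = "red"),
    guide = "none"
  ) +
  labs(x = "Position in album", y = "Sentiment score")
```

Printing p draws the plot. With real data, the boundaries would come from wherever each track’s last chunk falls.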
This entire analysis can be done with surprisingly few R packages. To summarize, I’m using dplyr for general data manipulation, rvest for web scraping, stringr for manipulation of lyrics once I get them into R, syuzhet to conduct the sentiment analysis, and ggplot2 and png to create the plots. If you want to know more than that, check out all the code on the GitHub repository.
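As one example of the kind of cleanup stringr is handy for, the sketch below strips bracketed section labels such as “[Verse 1]” and tidies whitespace. Which cleanup steps the real lyrics needed is an assumption on my part here; the input lines are a made-up stand-in for freshly scraped text.

```r
library(stringr)

# Made-up stand-in for freshly scraped lines
raw_lines <- c(
  "[Verse 1]",
  "  I know everything,   know myself ",
  "",
  "[Hook]",
  "I know Compton"
)

# Remove bracketed labels, then trim and collapse runs of whitespace
clean_lines <- str_squish(str_remove_all(raw_lines, "\\[[^\\]]*\\]"))

# Discard lines that are empty after cleaning
clean_lines <- clean_lines[clean_lines != ""]
```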
Ok, let’s see some results! But first, one revelation: I lied. This project is not just about Kanye. I wanted to “ground truth” the sentiment analysis method with some other popular hip hop albums (and just because I was interested). The first album that came to mind was Kendrick Lamar’s To Pimp A Butterfly (TPAB), so that’s what I did first:
The resulting sentiment plot is very similar to what Julia produced in her analyses, except I’ve used vertical dotted lines to delineate tracks within an album. I also plotted positive sentiments in green and negative sentiments in red just to make the difference more obvious. I think this is also a good time to mention that I really have no idea how well this bing sentiment method may or may not be doing in analyzing a hip hop album. The Genius user community certainly does a very thorough job of transcribing lyrics accurately, but I would imagine that much of the content of these albums is quite unusual for such an analysis. To give you an idea, here are a few bars from my personal favorite on TPAB, “Momma”:
## [1] "I know everything"
## [2] "I know everything, know myself"
## [3] "I know morality, spirituality, good and bad health"
## [4] "I know fatality might haunt you"
## [5] "I know everything, I know Compton"
## [6] "I know street shit, I know shit that's conscious"
## [7] "I know everything, I know lawyers, advertisement and sponsors"
## [8] "I know wisdom, I know bad religion, I know good karma"
If you’ve heard any of the albums I’ll be analyzing, you know it gets a lot more colorful than that. So while these sentiment analysis methods are capable of giving us “answers”, I’d take everything with a grain of salt.
For this album, I think the analysis confirms what many listeners would intuit: the general mood is pretty negative. The tracks with the most negative scores include “King Kunta,” “u,” “Hood Politics,” and “The Blacker The Berry.” These tracks highlight one of the interesting things about sentiment analysis applied to music lyrics: sometimes tracks that are musically dark also have negative lyrics, but not always. For example, “u” is a pretty devastating reflection on (a lack of) self-worth. I would expect it to have a strongly negative sentiment, which it does. And the music matches this content: it sounds like a negative song. In contrast, “Hood Politics” is decidedly more upbeat musically yet has a negative lyrical sentiment according to the metrics. So keep that potential tension between music and lyrics in mind.
Since I did a Kendrick album, I felt obligated to give Drake some attention too. President Obama may have given Kendrick the nod, but what does the sentiment tell us? A lot of people would vote Take Care as Drake’s best album to this point, so let’s plot that one:
Given that Drake has self-identified as being all in his feelings, it was a bit of a surprise to me that Take Care was actually less negative than TPAB. TPAB has five songs that dip down below -5 on the sentiment scale. Take Care has only one: “Take Care.” While Take Care’s tracks don’t get as negative as Kendrick’s on TPAB, they’re still negative. In fact, the album starts with a run of seven pretty solidly negative songs: “Over My Dead Body” through “Buried Alive (Interlude).” In contrast, there are positive peaks in tracks like “Make Me Proud” and “The Real Her.”
At this point I was just having fun with it, so I decided I also had to do two inarguable classics of the genre: Nas’ Illmatic and Jay-Z’s The Blueprint.
Well, yeah, that makes sense. Illmatic is pretty unremittingly bleak. The album is an artistic masterpiece and arguably the best ever in hip hop. But it’s definitely not happy. Ironically, the song with the most negative section is “One Love.” Then again, the song is written to someone who’s locked up.
The Blueprint is a much more balanced album overall. Its contours are similar in a way to Take Care: never too negative and some noticeable peaks in positivity. The sentiment analysis largely gels with my expectations. I think songs with very negative sentiment scores like “Takeover,” “Song Cry” (Jay must really be feeling this one after LEMONADE), and the Eminem-produced “Renegade” would certainly sound that way to your average listener.
Ok, ok, onto the Kanye stuff. To prepare, let’s just take a moment to remember the ebullient soul this man gave us. You could have your first wedding dance to that. In fact, you probably should. Let’s also not forget when he helped Bob Simon understand a “dope ass beat” (also known as a “really good track”).
I’m going to arbitrarily present the Kanye catalog in three chunks: his first three albums (The College Dropout, Late Registration, and Graduation), his fourth and fifth (808s & Heartbreak and My Beautiful Dark Twisted Fantasy), and his last two (Yeezus and The Life of Pablo). Alright, here are the first three albums, the three that fans of the “old Kanye” probably like best:
I think the main takeaway from these plots is that early Kanye had pretty positive lyrics (and a ton of tracks on his albums). Of course this lyrical content was mirrored musically by Kanye’s soul-influenced production from this period. In particular, I think the first run of songs on Graduation, “Good Morning” through “Good Life”, is notable for having almost entirely positive sentiment. That stretch is pretty unique in all the albums we’ve looked at so far. If you’re a die-hard fan, you’ll notice there are bonus tracks included in my analysis. I didn’t make these decisions; I just took what Genius had for the complete album tracklists. These inclusions are not insignificant: “Bittersweet Poetry” for instance is a bit of a negative outlier on Graduation. But it also gave us this interaction, so let’s call it an even trade.
Let’s look at the albums where it really went off (and then back on) the rails for some folks, 808s & Heartbreak and My Beautiful Dark Twisted Fantasy (MBDTF):
Many people, myself included, were at least initially put off by 808s & Heartbreak because of its musical and lyrical stylings. It was a sad pop album by a once-in-a-generation hip hop artist. I think this shift is reflected in the sentiment analysis but in a different way than one might expect. 808s is not overwhelmingly negative, but its sentiment is much more muted than Kanye’s prior albums. There aren’t as many highs or lows. It’s an album of neutrals. There are also just a lot fewer words overall on this album. You can see that indicated by the fact that there simply aren’t as many lyrical chunks analyzed for sentiment in the 808s plot. If you’ve listened to the album, you know this. The last half of “Say You Will,” for instance, is just an instrumental beeping and blooping its way towards finality. There’s not a lot of levity on this album, but even so, a few tracks like “Love Lockdown,” “Street Lights,” and “Bad News” got largely positive sentiment scores. I don’t totally know what to make of that.
MBDTF was Kanye’s return to a more grandiose production style. Kanye let us know he was making Art here. Pitchfork gave it a straight 10. However, it’s also beloved by more casual fans. The biggest surprise to me is that MBDTF is in no way a return to a lighter sentiment for Kanye. In fact, it ranks as the most negative album in his entire discography. Without bonus track “See Me Now,” it would be even darker. In this case, critical acclaim and fan affection were earned despite some apparently depressing content.
Finally, here’s the sentiment analysis for Kanye’s two most recent albums, Yeezus and The Life of Pablo (TLOP):
For a lot of people, I imagine Yeezus was a shock equivalent to 808s, except that instead of being extremely sad, Kanye seemed extremely angry. This negative sentiment shows in the analysis. The first five songs of Graduation were almost entirely positive. Yeezus starts with the inverse of that: “On Sight” through “Hold My Liquor” are almost entirely negative. And even though album closer “Bound 2” has a distinct production style relative to the rest of Yeezus, its sentiment stays solidly in the red.
TLOP is an interesting one. The album release seemed very disorganized and rushed. Musically, it’s a bit disorienting, with rapid shifts in style and tone. But no doubt there’s something very Kanye about it. It’s an album that can’t stick to any one idea because it’s so excited about everything, and there are some genuinely stirring moments tucked in there (Chance’s verse on “Ultralight Beam”!). The overall impression from the sentiment analysis of TLOP is very similar to Kanye’s first three albums, which have highs and lows in roughly equal measure.
To help summarize sentiment across Kanye’s entire catalog, here’s a table showing the average sentiment for each album and the variance in sentiment within that album:
##   Album Title                        Sentiment Mean  Sentiment Variance
## 1 The College Dropout                          0.16                5.06
## 2 Late Registration                            0.25                3.76
## 3 Graduation                                   0.10                4.34
## 4 808s & Heartbreak                           -0.31                2.41
## 5 My Beautiful Dark Twisted Fantasy           -1.16                6.63
## 6 Yeezus                                      -0.79                3.74
## 7 The Life of Pablo                           -0.25                5.31
These numbers help confirm a few things you may have picked up from looking at the sentiment plots. Kanye’s first three albums were characterized by having a positive average sentiment, while every album from 808s onward has had a negative average sentiment. 808s was also notable for having the lowest variance in sentiment across Kanye’s discography; it was kind of negative and more or less stayed that way throughout the album’s running time. In contrast, MBDTF had the greatest variance in sentiment and was also the most negative of any album. Since then, however, Kanye’s albums have been trending upwards in sentiment.
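A summary table like this one can be sketched with dplyr by grouping sentiment scores by album and taking the mean and variance. The scores and album names below are made up for illustration, and the column names are assumptions. Note that var() computes the sample variance (dividing by n - 1), which may or may not match how the numbers above were calculated.

```r
library(dplyr)

# Made-up scores for two hypothetical albums; not data from the post
chunk_scores <- data.frame(
  album = rep(c("Album A", "Album B"), each = 4),
  score = c(1, 3, -2, 2, -1, -3, 0, -4)
)

# One row per album with the average score and its spread
album_summary <- chunk_scores %>%
  group_by(album) %>%
  summarise(
    sentiment_mean = mean(score),
    sentiment_variance = var(score)
  )
```

A lower variance means the scores stay close to the album’s average, which is the pattern described for 808s.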
In sum, I think this sentiment analysis helps us understand Kanye a little better. Some of his success is undoubtedly attributable to the fact that he’s had the self-confidence to plunge headfirst into new musical waters, exploring styles as inspiration hits rather than being bound by conventions of popularity. Similarly, I think his willingness to cover broad swaths of lyrical content, all of life from the highs to the lows, is reflected in this sentiment analysis. His albums vary in average sentiment, and for many of them, there’s also considerable variation within an album. Good news though for fans of the old Kanye: if the sentiment metrics are any indication, over the course of the last few albums he’s heading more and more back to where he started.