Class 21

Sentiment Analysis and Lyrics

Materials for class on

2024-11-07

Preparation Reading

We’re working with these readings/materials today:

Agenda

Today we’ll focus on:

continuation from last time
tidytext sentiment analysis

Using APIs with R (Optional)

What is an API, and how do you access one in R? You could start with this guide:

R API Tutorial

For Homework 3, you can optionally use the {geniusr} package to get lyrics. Note that APIs can be finicky, but let’s try it out!

Important

Awkward installation notes - there is a bug in the main package code, so you need to install a “forked” version from GitHub. You can do so as follows (remove/skip comment character from installation lines):

# install a package that installs remote packages
# install.packages("remotes")
# install the modified geniusr package itself
# remotes::install_github("giovanni-cutri/geniusr")

The documentation for {geniusr} explains how to create and use an access token.

library(tidytext)
library(geniusr)
#library(spotifyr)

Authenticate

genius_token()

NOTE: you shouldn’t generally put any passwords or access tokens into your documents directly! There are some ways of managing this so you can use them non-interactively, but we don’t have time to cover that now. Once you use the genius_token() function interactively in your project, you should be able to render the document without re-entering it on the same computer. I have set the chunk above to output: false so that it doesn’t show the key in the html output either.

You can get lyrics for one song for a specific artist like this:

hs_adoreyou <- get_lyrics_search(artist_name = "Harry Styles", song_title = "Adore You")
head(hs_adoreyou)

#> # A tibble: 6 × 5
#>   line                         section_name section_artist song_name artist_name
#>   <chr>                        <chr>        <chr>          <chr>     <chr>      
#> 1 Walk in your rainbow paradi… Verse 1      Harry Styles   Adore You <NA>       
#> 2 Strawberry lipstick state o… Verse 1      Harry Styles   Adore You <NA>       
#> 3 I get so lost inside your e… Verse 1      Harry Styles   Adore You <NA>       
#> 4 Would you believe it?        Verse 1      Harry Styles   Adore You <NA>       
#> 5 You don't have to say you l… Pre-Chorus   Harry Styles   Adore You <NA>       
#> 6 You don't have to say nothi… Pre-Chorus   Harry Styles   Adore You <NA>

To combine multiple dataframes with matching columns/variables, you can use bind_rows():

hs_matilda <- get_lyrics_search(artist_name = "Harry Styles", song_title = "Matilda")
df_hs_all <- bind_rows(hs_adoreyou, hs_matilda)
df_hs_all |> 
  group_by(song_name) |> 
  count()

#> # A tibble: 2 × 2
#> # Groups:   song_name [2]
#>   song_name     n
#>   <chr>     <int>
#> 1 Adore You    55
#> 2 Matilda      36

Sentiment Analysis

Getting sentiments for words:

get_sentiments("bing")

#> # A tibble: 6,786 × 2
#>    word        sentiment
#>    <chr>       <chr>    
#>  1 2-faces     negative 
#>  2 abnormal    negative 
#>  3 abolish     negative 
#>  4 abominable  negative 
#>  5 abominably  negative 
#>  6 abominate   negative 
#>  7 abomination negative 
#>  8 abort       negative 
#>  9 aborted     negative 
#> 10 aborts      negative 
#> # ℹ 6,776 more rows

Joining and keeping only words that have a sentiment:

df_adore_sent <- hs_adoreyou |> 
  unnest_tokens(word, line) |> 
  inner_join(get_sentiments("bing"))

#> Joining with `by = join_by(word)`

df_adore_sent

#> # A tibble: 22 × 6
#>    section_name section_artist song_name artist_name word     sentiment
#>    <chr>        <chr>          <chr>     <chr>       <chr>    <chr>    
#>  1 Verse 1      Harry Styles   Adore You <NA>        paradise positive 
#>  2 Verse 1      Harry Styles   Adore You <NA>        paradise positive 
#>  3 Verse 1      Harry Styles   Adore You <NA>        lost     negative 
#>  4 Pre-Chorus   Harry Styles   Adore You <NA>        love     positive 
#>  5 Chorus       Harry Styles   Adore You <NA>        adore    positive 
#>  6 Chorus       Harry Styles   Adore You <NA>        adore    positive 
#>  7 Chorus       Harry Styles   Adore You <NA>        like     positive 
#>  8 Chorus       Harry Styles   Adore You <NA>        like     positive 
#>  9 Verse 2      Harry Styles   Adore You <NA>        wonder   positive 
#> 10 Verse 2      Harry Styles   Adore You <NA>        lemon    negative 
#> # ℹ 12 more rows

NOTE: most stopwords don’t have sentiments anyway.

df_adore_sent |> 
  count(sentiment)

#> # A tibble: 2 × 2
#>   sentiment     n
#>   <chr>     <int>
#> 1 negative      2
#> 2 positive     20

df_mat_sent <- hs_matilda |> 
  unnest_tokens(word, line) |> 
  inner_join(get_sentiments("bing"))

#> Joining with `by = join_by(word)`

df_mat_sent |> 
  count(sentiment)

#> # A tibble: 2 × 2
#>   sentiment     n
#>   <chr>     <int>
#> 1 negative      9
#> 2 positive     10

Getting More Lyrics

Some other {geniusr} functions that work - some don’t seem to work with the current version of the API, even with the modified package.

search_artist("Taylor Swift")

#> # A tibble: 1 × 3
#>   artist_id artist_name  artist_url                             
#>       <int> <chr>        <chr>                                  
#> 1      1177 Taylor Swift https://genius.com/artists/Taylor-swift

search_song("Taylor Swift", n = 20)

#> # A tibble: 20 × 5
#>     song_id song_name                      song_lyrics_url artist_id artist_name
#>       <int> <chr>                          <chr>               <int> <chr>      
#>  1  7076626 All Too Well (10 Minute Versi… https://genius…      1177 Taylor Swi…
#>  2  7394358 All Too Well (10 Minute Versi… https://genius…      1177 Taylor Swi…
#>  3 10024009 Fortnight (Ft. Post Malone)    https://genius…      1177 Taylor Swi…
#>  4  5793984 cardigan                       https://genius…      1177 Taylor Swi…
#>  5  5793983 exile (Ft. Bon Iver)           https://genius…      1177 Taylor Swi…
#>  6  4508914 Lover                          https://genius…      1177 Taylor Swi…
#>  7 10024526 loml                           https://genius…      1177 Taylor Swi…
#>  8 10024578 The Tortured Poets Department  https://genius…      1177 Taylor Swi…
#>  9  5794073 the 1                          https://genius…      1177 Taylor Swi…
#> 10 10296677 thanK you aIMee                https://genius…      1177 Taylor Swi…
#> 11 10024535 Down Bad                       https://genius…      1177 Taylor Swi…
#> 12 10024520 But Daddy I Love Him           https://genius…      1177 Taylor Swi…
#> 13  9538404 Is It Over Now? (Taylor's Ver… https://genius…      1177 Taylor Swi…
#> 14  5793977 august                         https://genius…      1177 Taylor Swi…
#> 15  6260164 tolerate it                    https://genius…      1177 Taylor Swi…
#> 16 10024512 I Can Do It With a Broken Hea… https://genius…      1177 Taylor Swi…
#> 17  3210592 Look What You Made Me Do       https://genius…      1177 Taylor Swi…
#> 18  6260160 champagne problems             https://genius…      1177 Taylor Swi…
#> 19 10024519 The Smallest Man Who Ever Liv… https://genius…      1177 Taylor Swi…
#> 20  5793962 betty                          https://genius…      1177 Taylor Swi…

You can also use lyrics text that you’ve obtained in other ways to read in. Let’s say that I copy the lyrics for a song from the Genius website. I could read them in as an object by creating separating the lines using read_lines():

apple <- read_lines("[Verse 1]
I guess the apple don't fall far from the tree
'Cause I've been looking at you so long
Now I only see me
I wanna throw the apple into the sky
Feels like you never understand me
So I just wanna drive
To the airport, the airport
The airport, the airport

[Verse 2]
I guess the apple could turn yellow or green
I know there's lots of different nuances
To you and to me
I wanna grow the apple, keep all the seeds
But I can't help but get so angry
You don't listen, I leave
To the airport, the airport
The airport, the airport
The airport, the airport
The airport, the airport

[Interlude]
(Yeah, yeah)
I'm gonna drive, gonna drive all night
I'm gonna drive, gonna drive all night

[Verse 3]
I think the apple's rotten right to the core
From all the things passed down
From all the apples coming before
I split the apple down symmetrical lines
And what I find is kinda scary
Makes me just wanna drive
(Drive, drive, drive, dr-dr-dr-drive, drive, drive)
(I'm gonna drive, gonna drive all night)
(I'm gonna drive, gonna drive all night)
(Drive, drive, drive, dr-dr-dr-drive, drive)

[Outro]
I wanna know where you go
When you're feeling alone
When you're feeling alone, do you?
I wanna know where you go
When you're feeling alone
When you're feeling alone, do you?
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you, do you, do you)
(Do you, do you)")

Now it’s a character vector. We can turn it into a dataframe to get to a tidytext format, similar to workflow used for the Emily Dickinson poem at the beginning of the Text Mining in R book:

df_apple <- tibble(song = "apple", text = apple)
df_apple

#> # A tibble: 55 × 2
#>    song  text                                            
#>    <chr> <chr>                                           
#>  1 apple "[Verse 1]"                                     
#>  2 apple "I guess the apple don't fall far from the tree"
#>  3 apple "'Cause I've been looking at you so long"       
#>  4 apple "Now I only see me"                             
#>  5 apple "I wanna throw the apple into the sky"          
#>  6 apple "Feels like you never understand me"            
#>  7 apple "So I just wanna drive"                         
#>  8 apple "To the airport, the airport"                   
#>  9 apple "The airport, the airport"                      
#> 10 apple ""                                              
#> # ℹ 45 more rows

Let’s remove the lines with brackets, and then unnest:

df_apple_words <- df_apple |> 
  filter(!str_starts(text, "\\[")) |> 
  unnest_tokens(word, text)
df_apple_words

#> # A tibble: 298 × 2
#>    song  word 
#>    <chr> <chr>
#>  1 apple i    
#>  2 apple guess
#>  3 apple the  
#>  4 apple apple
#>  5 apple don't
#>  6 apple fall 
#>  7 apple far  
#>  8 apple from 
#>  9 apple the  
#> 10 apple tree 
#> # ℹ 288 more rows

Plotting Some Sentiments

Let’s plot some of these:

# install.packages("paletteer")
library(paletteer)
df_mat_sent |> 
  count(word,sentiment) %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = n, y = word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
# using a 'beyonce' palette from the colors
  scale_fill_paletteer_d("beyonce::X18") +
  labs(x = "Contribution to sentiment",
       y = NULL)

Let’s make this a function:

plot_sentiments <- function(sentiment_data){
sentiment_data |> 
  count(word,sentiment) %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>% 
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = n, y = word, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
# using a 'beyonce' palette from the colors
  scale_fill_paletteer_d("beyonce::X18") +
  labs(x = "Contribution to sentiment",
       y = NULL)
}

plot_sentiments(df_adore_sent)

df_apple_words |> 
  inner_join(get_sentiments("bing")) |> 
  plot_sentiments()

#> Joining with `by = join_by(word)`