Folk-Horror Faves Sentiment Analysis

   

One of my favorite film genres is folk horror. In recent years there has been a resurgence of films in this category. A few of my favorite recent picks are Hereditary, Midsommar, and Saint Maud, all from A24. I have talked to dozens of people about these films and have found that they are incredibly polarizing. I saw Hereditary in theaters when it came out, with some friends and my father. Most of my friends and I absolutely loved it. My dad hated it! The same goes for many other films in the folk horror genre; you are either a die-hard fan, or you hate it!


Critics loved the movies, but audiences rated them less favorably. I took to Twitter to gauge the overall sentiment toward these movies among its users.

Scraping Twitter turned out to be no easy feat. Many Python libraries dedicated to the task seem to be defunct, as Twitter routinely changes its APIs and frontend to limit scraping. I was able to find a workaround in a library called Scweet, which uses a Selenium backend to accomplish the task in a relatively efficient manner. I decided to limit the date range for each movie to about one year after its initial release, since some of the movies are a few years older than the others and I wanted sample sizes that were roughly the same for each film. Using Scweet, I collected only about a hundred tweets in total. I manually searched for tweets and came up with about the same number, so this may be due to the date restrictions I imposed, or perhaps these movies just aren’t that popular! In the future I will dig into this a bit more, and consider scraping sentiment from other social media sites like Instagram to add more data to the analysis.
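To make the date restriction concrete, here is a minimal sketch of how a one-year window after a release date could be computed. The function name `one_year_window` and the sample date are my own illustrative choices, not taken from the scraping code used for this post, and the way the resulting strings are handed to Scweet depends on which version of the library is installed.

```python
from datetime import date
from typing import Tuple


def one_year_window(release: date) -> Tuple[str, str]:
    """Return (since, until) ISO date strings spanning one year from release."""
    try:
        end = release.replace(year=release.year + 1)
    except ValueError:
        # A Feb 29 release has no same-day anniversary; fall back to Feb 28.
        end = release.replace(year=release.year + 1, day=28)
    return release.isoformat(), end.isoformat()


# Illustrative date only; substitute each film's actual release date.
since, until = one_year_window(date(2018, 6, 8))
print(since, until)
```

Using the same window length for every film is what keeps the samples comparable, at the cost of discarding later tweets about the older films.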


With many machine learning applications, great care must be taken to clean and pre-process the data, and those choices should be made with the overall architecture of your machine learning process in mind. For this analysis I decided to use the VADER sentiment analysis tool. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based tool designed to capture sentiment on social media. Because of this, it may not perform as well on long-form pieces, but it seems to be widely used for social media sentiment analysis. Since this tool uses both punctuation and capitalization as inputs to its sentiment scoring algorithm, I opted to leave these features in my data set.
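As a rough illustration of cleaning that leaves punctuation and capitalization alone, the sketch below strips only URLs and @mentions and tidies whitespace. Removing URLs and mentions is my assumption about what a light cleaning pass might look like, not a description of the exact steps used here, and `light_clean` is a hypothetical helper name.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def light_clean(tweet: str) -> str:
    """Strip URLs and @mentions but keep case and punctuation intact."""
    text = URL_RE.sub("", tweet)
    text = MENTION_RE.sub("", text)
    # Collapse the gaps left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


# Made-up tweet for illustration.
print(light_clean("@friend Hereditary was TERRIFYING!!!  https://example.com/x"))
```

The cleaned text could then be scored with the `vaderSentiment` package, whose `SentimentIntensityAnalyzer().polarity_scores(text)` returns a dictionary with `neg`, `neu`, `pos`, and `compound` values.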

Taking the sum of the area under each of these curves provided an overall sentiment score. As we can see, the overall sentiment for Saint Maud and Midsommar is positive, while Hereditary has a much lower overall score. That movie was quite terrifying, and I suspect many of the tweets about it score as very negative for that reason.
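The curves themselves are not reproduced here, so the following is only a guess at the shape of that calculation. It assumes each curve is a sequence of compound sentiment scores (each between -1 and 1) plotted at even spacing, and approximates the signed area with the trapezoidal rule. The name `signed_area` and the sample scores are hypothetical.

```python
from typing import Sequence


def signed_area(scores: Sequence[float], spacing: float = 1.0) -> float:
    """Trapezoidal-rule area under a curve sampled at evenly spaced points."""
    if len(scores) < 2:
        return 0.0
    return sum((a + b) / 2.0 * spacing for a, b in zip(scores, scores[1:]))


# Made-up compound scores for illustration, not data from this analysis.
example_scores = [0.5, -0.5, 0.0, 1.0]
print(signed_area(example_scores))
```

Because the area is signed, stretches of the curve below zero subtract from the total, so a film with many negative tweets ends up with a lower overall score.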


In the future, I will expand upon this analysis by acquiring more data. I would also like to better understand how the VADER model performs on niche topics, horror in particular. While a comment such as "I was terrified!" may score as negative, it could actually be meant as a compliment!