This is an abridged version of the original article. If you are interested in the technical details of the project, please read the full article here.
University students complaining about tough school life, relationship problems and the competitive job market surely sounds familiar. As university students ourselves, we can relate. A group of us wanted to discover how NUS students’ emotions vary over time. This article is an update on our progress.
Wolf Pack in NUS Business Analytics Society
As part of NUS BAS’s new initiative, Wolf Pack, we were challenged as a group to identify a problem related to student life and solve it with data analytics (DA). We intend to track how students’ emotions change over the course of a year by performing Sentiment Analysis on posts from NUSwhispers.
Sentiment Analysis is the process of computationally identifying and categorising the emotions expressed in a given text. Examples of such emotions include Anger, Fear, Joy, Sadness and Surprise.
NUSwhispers is a Facebook group with close to 23,000 followers. This is a platform where students can share their thoughts and feelings anonymously. We determined that data from this social media page would give us the widest coverage of opinions on student life.
A Summary of our Progress thus far
1. Scraping Data from NUSwhispers
Using Python, we scraped 2,557 posts along with the likes and reactions each received. The posts date from 2015 to the present.
2. Determining Emotions from Reactions to Posts
Before applying Sentiment Analysis to the posts, we considered categorising each post according to the reactions (Happy, Sad, Wow, etc.) that it received. Although this approach is feasible, posts with few reactions, or with likes only, would have to be handled by an alternative algorithm.
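To make the reaction-based idea concrete, here is a minimal Python sketch. It is not our actual code: the function name, the reaction keys and the threshold of five reactions are all assumptions chosen for illustration. It labels a post by its most common non-like reaction and returns None when there are too few reactions to trust, which is the case that would need the alternative algorithm.

```python
from typing import Dict, Optional

# Hypothetical threshold (an assumption, not a value from the project):
# the minimum number of non-"like" reactions before we trust the label.
MIN_REACTIONS = 5


def label_from_reactions(reactions: Dict[str, int],
                         min_reactions: int = MIN_REACTIONS) -> Optional[str]:
    """Return the dominant emotional reaction, or None if there is too little signal."""
    # Plain likes carry no specific emotion, so they are ignored.
    emotional = {name: count for name, count in reactions.items()
                 if name != "like" and count > 0}

    # Too few emotional reactions: signal that another method is needed.
    if sum(emotional.values()) < min_reactions:
        return None

    # Pick the reaction with the highest count (ties go to the first one seen).
    return max(emotional, key=emotional.get)


print(label_from_reactions({"like": 40, "sad": 6, "haha": 2}))
print(label_from_reactions({"like": 100}))
```

The first call has eight emotional reactions and is labelled by the most frequent one, while the second has likes only and falls through to None.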
3. Sentiment Analysis with RStudio
We carried out sentiment analysis on the dataset in R. Following an online tutorial, we analysed the words in each post to determine the emotions they express. Although this is more efficient than the reaction-based method, it has limitations, which we discuss below.
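The word-level approach can be illustrated with a short sketch. Our analysis was done in R, so the Python below is only an illustration of the general idea, and the six-word lexicon is invented for this example rather than taken from any real emotion lexicon. Each word in a post is looked up in the lexicon, and the emotions attached to it are tallied.

```python
import re
from collections import Counter

# Toy lexicon made up for this example (an assumption); a real analysis
# would rely on a published emotion lexicon with thousands of entries.
TOY_LEXICON = {
    "exam": ["fear"],
    "stress": ["fear", "sadness"],
    "friend": ["joy", "trust"],
    "happy": ["joy"],
    "lonely": ["sadness"],
    "hate": ["anger"],
}


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how many words in the text are associated with each emotion."""
    # Lowercase the text and keep only alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        # Words missing from the lexicon contribute nothing.
        for emotion in lexicon.get(word, []):
            counts[emotion] += 1
    return counts


print(emotion_counts("I hate exam stress but my friend makes me happy"))
```

Summing these per-post tallies over all posts in a month would give monthly emotion totals of the kind reported in the findings.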
4. Preliminary Findings
The results of our analysis of NUSwhispers posts in February 2018 can be seen below:
The high number of “Trust”- and “Fear”-related posts was consistent with our initial hypothesis. Contrary to our expectation that negative posts would outnumber positive ones, however, the two appear comparable in number.
Limitations & Challenges
We realised that the results of our sentiment analysis must be taken with a pinch of salt, as we have yet to verify how reliably the Syuzhet package analyses sentiments.
We believe that the “Singlish” students use in their posts can be a problem, as the package has no way of recognising these slang terms or assigning them to a sentiment.
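One way to gauge how serious this problem is would be to measure how many words in a post the lexicon recognises at all. The sketch below is hypothetical and is not something we have implemented: the small vocabulary stands in for the set of words a real lexicon knows, and the function names are our own.

```python
import re

# Stand-in for the words known to a sentiment lexicon (an assumption).
TOY_VOCABULARY = {"so", "tired", "of", "this", "module", "is", "very", "bad", "the"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unknown_words(text, vocabulary=TOY_VOCABULARY):
    """List the words in the text that the vocabulary does not contain."""
    return [word for word in tokenize(text) if word not in vocabulary]


def coverage(text, vocabulary=TOY_VOCABULARY):
    """Return the fraction of words in the text that the vocabulary contains."""
    words = tokenize(text)
    if not words:
        return 0.0
    known = sum(1 for word in words if word in vocabulary)
    return known / len(words)


post = "This module is very jialat lah"
print(unknown_words(post))
print(round(coverage(post), 2))
```

Posts with low coverage could be flagged for manual review, and the most frequent unknown words would be natural candidates for a custom Singlish word list.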
Sentence structure is another limitation of our analysis. The Syuzhet package only analyses keywords and hence cannot pick up on sarcastic comments, whose tone may contradict the sentiments that the individual words convey.
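A small example shows why keyword-only scoring misreads sarcasm. This is a simplified Python sketch with an invented five-word polarity list, not the package's actual scoring method. It sums +1 or -1 for each known word and ignores everything else, including word order and context.

```python
import re

# Invented polarity list for illustration only (an assumption).
TOY_POLARITY = {"love": 1, "great": 1, "wonderful": 1, "hate": -1, "terrible": -1}


def keyword_score(text, polarity=TOY_POLARITY):
    """Sum the polarity of each known word; unknown words score zero."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(polarity.get(word, 0) for word in words)


sarcastic = "Great, another 8am exam. I just love it."
sincere = "I hate this terrible exam."

print(keyword_score(sarcastic))
print(keyword_score(sincere))
```

The sarcastic post scores as positive because “great” and “love” are counted at face value, even though a human reader would take it as a complaint.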
This is just a glimpse of what we intend to do! We envisage our final prototype as an infographic comparing the sentiments of students in NUS and NTU over time.
Our team is working extremely hard on addressing the limitations above and turning the idea into reality. We are now in Week 3 of our 5-Week Challenge. If you would like to find out more about the technical workings of this project thus far, click here.
Do follow us on t.me/nusbasblog to receive updates on our project!
From left to right in the foreground: Kai Cong, Medric, Simon, Leon and Kelvin.