Ever wondered why it is so hard to get on a shuttle bus during lunch hours? As students, we constantly gripe about boarding the bus to get to class on time, and while NUS is trying its best to resolve this problem, it still persists till today.
(Image from the Straits Times)
Wolf Pack in NUS Business Analytics Society
As part of NUS BAS’s new initiative, Wolf Pack, our group decided to pick up a familiar student life-related problem and investigate its root cause. To tackle the insufficient supply of shuttle buses during peak hours, we planned to model travel demand and submit a report to the Office of Student Affairs (OSA) to see if they can adjust the bus schedule to meet the required demand.
Modelling Data Based on Modules
Our group decided to model the demand based on the number of classes occuring at once and class size. We can then estimate demand based on the number of students travelling to the main campus from University Town or Kent Ridge MRT.
Therefore, our first step to take was to scrape module-related data from the IVLE or CORS. We found out later that the data was easily available from the NUSMods API. We then converted it from JSON to CSV format (Excel-readable) using an online converter. Due to the sheer amount of data available and our inexperience, we decided to limit our scope to Business modules taken in BIZ 2 and COM 2.
Data Cleaning Using Pandas
As expected from online-sourced data, some data cleaning is required, and in our case our data had included duplicate modules and modules with two different module codes (e.g. ACC1002/1701). An experienced senior in Wolf Pack recommended us to learn Pandas, a useful data structures and data analysis tool in Python.
(An example of a duplicate. Class K9 for ACC2002 appears twice.)
To remove duplicates from our data set, we used the drop_duplicates function. However, inputting too few or too many parameters into the argument would result in duplicates still being present or non-duplicates being removed. After reading posts on Stack Overflow and some trial and error, we finally managed to get the code right.
Then, we filtered the data based on the module venues. The relevant venues would be close to BIZ 2 and COM 2 bus stops.
After exporting the cleaned dataset into Excel, the next step was to visualise the data.
Data Visualisation using Tableau
After learning Tableau from the BAS tutorial, we wanted to see the starting and ending times of all the modules across the week.
(Most of the Biz modules held in BIZ1 and BIZ2 ended around 11 am or 2 pm. This was especially so for Tuesdays and Thursdays)
Our initial plan was to scrape the class sizes for each module from IVLE. However, due to the class sizes being kept in different pages we were unable to do so. We tried doing it with CORS but because most of the foundation business modules were allocated, the bidding reports did not give reliable results.
Next, we wanted to create a heat map, similar to the one that we learnt in the Tableau tutorial.
(This heat map shows the profits for each state and is colour coded to show the difference between high and low profits, and negative profits)
Another Tableau guide gave us information about how we should start. We decided to do a scatterplot, using point coordinates to plot various locations.
A friend recommended the NUS ArcGIS Directory, which contained a folder containing data with lat long coordinates for each location in BIZ 2 and COM 2. This is exportable to JSON. Spatial data (shapefile or geoJSON) was required, so we used an online converter to do that. Following that, we realised the coordinates pointed us to the middle of the South Altantic Ocean, so it was necessary to transform the coordinates. There were online resources to help us do it, but we would need to input each pair of coordinates manually. Looking to Stack Overflow once again, we hoped a Python library could help us. The relevant articles we found were too technical and we did not have the experience or the time to understand them, so we decided to scrap that idea.
Nevertheless, we manually converted some coordinates so that we can input them into Tableau. Unfortunately, what we saw was a tiny dot in Singapore, as shown below.
Perhaps Singapore was too small compared to the United States? It was possible to build a custom map in Tableau, but due to the lack of time, we decided to wrap up our project.
Despite not accomplishing the goals we originally set for this project (i.e. submitting a report to OSA, scraping data from IVLE and creating a heat map using Tableau), the learning curve was indeed quite steep. Nevertheless, we actually enjoyed the experience because for most of us, it was our first time embarking on an independent project where we had to learn skills tailored to our tasks.
We hope that you have enjoyed reading about our project and learnt something from it!
The members of the team include:
Nelson Lim Jun Kang, Year 1, BAC
Jonathan Low Ka Heen, Year 2, BBA
Moh Song En Joseph, Year 2, BBA
Quek Yew Hong Ryan, Year 2, BBA + Econs