Package Installation
Packages are collections of R functions, data, and compiled code in a well-defined format, so that non-CS students like us can happily use them for data analysis without being overwhelmed by how the coding works.
A few functions let you check what is installed or loaded in your RStudio session, as shown below, but you can also browse the same information in the Packages tab of the files/directory pane. There are specialised libraries for financial analysis and so on, but the two most important packages for now are dplyr (for data manipulation) and ggplot2 (for plotting data), which you can install first from the Packages tab.
## [1] ".GlobalEnv" "package:knitr" "package:RWordPress"
## [4] "tools:rstudio" "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils" "package:datasets"
## [10] "package:methods" "Autoloads" "package:base"
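The listing above has the shape of what search() prints, which is the list of packages attached to the current session. As a minimal sketch of checking packages from the console instead of the Packages tab, the snippet below uses only base R functions; the variable names (attached, installed, wanted, not_installed) are made up for illustration, and the install step is left commented out because it needs an internet connection.

```r
# Packages (and environments) currently attached to this R session
attached <- search()
print(attached)

# Names of every package installed in your library folders
installed <- rownames(installed.packages())
head(installed)

# Check whether the two packages mentioned above are already installed
wanted <- c("dplyr", "ggplot2")
not_installed <- setdiff(wanted, installed)
print(not_installed)

# Install whatever is missing (needs internet, so it is commented out here)
# install.packages(not_installed)

# Once installed, a package still has to be loaded in each session, e.g.:
# library(dplyr)
```

Note the difference between installing (done once) and loading with library() (done in every new session).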
Setting a Working Directory
A working directory is the folder where R looks for data files and R scripts. We can find the directory that RStudio is currently using with the function getwd():
getwd()
## [1] "C:/Users/fan_t/Downloads/data"
From this we can see where my current directory is set. In a usual project, though, you will probably save all your files in another folder, so it’s best to set your working directory to that place.
To change the directory, pass the path of that folder to the function setwd(), as in the call further below.
P.S. When you copy the directory path from Windows Explorer, it comes with backslashes, but R expects forward slashes (or doubled backslashes). So try not to use a directory that is buried deep, deep in your drive if you don’t want to waste time changing lots of backslashes.
setwd("C:/Users/fan_t/Google Drive/data")
getwd()
## [1] "C:/Users/fan_t/Google Drive/data"
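If changing backslashes by hand gets tedious, R can do the replacement for you. The sketch below is one possible approach, not the only one: win_path and r_path are illustrative names, and the path is the same folder used above. Inside an ordinary R string a backslash has to be typed twice, which is why the pasted path looks doubled.

```r
# A path as copied from Windows Explorer, with each backslash doubled
# so that R accepts it as a normal string
win_path <- "C:\\Users\\fan_t\\Google Drive\\data"

# Replace every backslash with a forward slash
# (fixed = TRUE treats "\\" as a literal backslash, not a regular expression)
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(r_path)
# [1] "C:/Users/fan_t/Google Drive/data"

# setwd() stops with an error if the folder does not exist,
# so check first before switching to it
if (dir.exists(r_path)) {
  setwd(r_path)
}
getwd()
```

The dir.exists() check is there only so the snippet does not fail on a computer that lacks this folder.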
Data Importing
R can read Excel files such as .xls with the help of add-on packages (readxl, for example), but comma-separated files (.csv) are less likely to give you problems because they are essentially plain text, free from any formatting. You can also import a file manually in RStudio from the Environment pane, under “Import Dataset”.
You can import a CSV data file with the function read.csv():
setwd("C:/Users/fan_t/Google Drive/data")
# Reading in the data assigns it to the variable named "df" with the <- symbol
df <- read.csv("landed_all.csv")
# You can call the file by its name directly because the working directory is set correctly;
# otherwise, you have to give the full path:
# df <- read.csv("C:/Users/fan_t/Google Drive/data/landed_all.csv")
# The function head(variable_name) shows the first 6 rows of the data
head(df)
##               Project.Name                  Address No..of.Units
## 1 133 Lorong L Telok Kurau 133 Lorong L Telok Kurau            1
## 2         27 Lengkong Satu         27 Lengkong Satu            1
## 3   PLOT 3 Kheam Hock Road   PLOT 3 Kheam Hock Road            1
## 4          PEAKVIEW ESTATE      2 Jalan Anak Patong            1
## 5           SARACA GARDENS            8 Saraca Walk            1
## 6 124 Lorong K Telok Kurau 124 Lorong K Telok Kurau            1
##   Area..sqm. Type.of.Area Transacted.Price.... Nett.Price...
## 1       1346         Land              7270000             -
## 2        403         Land              2280000             -
## 3        288      Unknown              3020000             -
## 4        418         Land              1900000             -
## 5        201         Land              1590000             -
## 6        515         Land              2500000             -
##   Unit.Price....psm. Unit.Price....psf. Sale.Date       Property.Type
## 1               5400                502  1/1/1995      Detached House
## 2               5652                525  1/1/1995 Semi-Detached House
## 3              10504                976  1/1/1995 Semi-Detached House
## 4               4543                422  1/1/1995 Semi-Detached House
## 5               7910                735  1/1/1995       Terrace House
## 6               4854                451  1/1/1995 Semi-Detached House
##     Tenure Completion.Date Type.of.Sale Purchaser.Address.Indicator
## 1 Freehold            1950       Resale                     Private
## 2 Freehold            1980       Resale                     Private
## 3 Freehold            1995     New Sale                     Private
## 4 Freehold            1995       Resale                     Private
## 5 Freehold            1993       Resale                     Private
## 6 Freehold            1969       Resale                     Private
##   Postal.District Postal.Sector Postal.Code   Planning.Region
## 1              15            42      425570       East Region
## 2              14            41      417501       East Region
## 3              11            29          NA    Central Region
## 4              16            48      489318       East Region
## 5              28            80      807245 North East Region
## 6              15            42      425764       East Region
##   Planning.Area
## 1         Bedok
## 2         Bedok
## 3        Novena
## 4         Bedok
## 5     Serangoon
## 6         Bedok
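You may wonder why the column names above are full of dots, such as Area..sqm. By default read.csv() turns spaces and symbols in the header into dots so that every column name is a valid R name. The sketch below shows this on a tiny temporary file so it runs on any computer. The header text ("Project Name", "Area (sqm)", "Sale Date") is only my guess at what the original file might contain, and csv_path and df_small are illustrative names; the two data rows reuse values from the output above.

```r
# Write a tiny CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Project Name,Area (sqm),Sale Date",   # assumed header, for illustration only
    "SARACA GARDENS,201,1/1/1995",
    "PEAKVIEW ESTATE,418,1/1/1995"),
  csv_path
)

# Read it back in the same way as the real data file
df_small <- read.csv(csv_path)

# Spaces and brackets in the header have become dots
names(df_small)
# [1] "Project.Name" "Area..sqm."   "Sale.Date"

# str() gives a compact summary of each column and its type
str(df_small)
```

If you would rather keep the header exactly as written in the file, read.csv() accepts check.names = FALSE, at the cost of column names that are harder to type.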
Data Exporting
Assuming you have finished processing your data and want to save it to another CSV file to send to someone else, you can do so with the function write.csv(variable_with_data, "filename_you_want.csv"). As with read.csv(), you can skip most of the path if you have set the correct working directory.
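# airquality is a built-in dataset, used here as a stand-in for your processed data
processeddata <- airquality
# write.csv() only writes the file, so do not assign its result back to the variable
write.csv(processeddata, "Processed_data.csv")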
You can’t see it here, but you will see a file generated in your working directory. If a file with that name already exists, write.csv() overwrites it; otherwise a new file is created.
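To convince yourself that the export worked, you can write the file and read it straight back. This is a minimal sketch that saves to R’s temporary folder rather than the working directory, so it leaves your project folder untouched; out_path and check are illustrative names. The row.names = FALSE argument is optional and stops write.csv() from adding an extra column of row numbers.

```r
# Save to the session's temporary folder instead of the working directory
out_path <- file.path(tempdir(), "Processed_data.csv")

# Export the built-in airquality data without the row-number column
write.csv(airquality, out_path, row.names = FALSE)

# Confirm the file exists, then read it back in
file.exists(out_path)
check <- read.csv(out_path)

# The re-imported data has the same shape and column names as the original
dim(check)
# [1] 153   6
names(check)
```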
To play around with sample data, there are quality data sources such as data.gov.sg or Kaggle. Essentially, whenever you find a website that provides some form of data, look out for CSV file downloads. That’s the end of this section!