Take-home Exercise 5

Take-home Exercise

Putting Visual Analytics into Practical Use: VAST Challenge 2022, Challenge 2: Patterns of Life.

Leslie Long Nu https://www.linkedin.com/in/leslielongnu/ (SMU, MITB)https://scis.smu.edu.sg/master-it-business
2022-05-29

1. Overview

1.1 Introduction

With reference to Challenge 2 Question 3 of VAST Challenge 2022, this take-home exercise will reveal:

Challenge 2: Patterns of Life

Assuming the volunteers are representative of the city’s population, characterize the distinct areas of the city that you identify. For each area you identify, provide your rationale and supporting data.

Where are the busiest areas in Engagement? Are there traffic bottlenecks that should be addressed? Explain your rationale.

1.2 Methodology

This take-home exercise aims to reveal the social areas of the city from two aspects and cross-check:

The former reveals where we expect as the social areas of the city as participants can socialize at restaurants and pubs, and the latter shows the actual and the frequency of participants visiting the social areas in the city.

As for the bottlenecks of the city, this take-home exercise will reveal them by plotting residents transport patterns during morning and afternoon peak hours during weekdays, and analyzing the areas that most residents travel pass. In this exercise, the morning peak hours is defined as 7-9am and afternoon peak hours is defined as 5-7pm according to the example given in Cleveland, Ohio here.

For the purpose of this exercise, status log 1 will be analyzed, which contains data for the duration of 6 days and is sufficient to reveal the pattern at the start of the study.

2. Data Preparation

2.1 Install and Load Packages

The following code chunk installs the required R packages and loads them onto RStudio environment. sf, an R package specially designed to handle geospatial data in simple feature objects.

show
packages = c('tidyverse', 'sf', 'tmap', 'lubridate', 'clock',
             'sftime', 'rmarkdown')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

2.2 Import Datasets

Well-known text (WKT) is a human readable representation for spatial objects like points, lines, or enclosed areas on a map.

In the code chunk below, read_sf() of sf package is used to parse the location files, such as buildings, restaurants, pubs, schools, apartments and employers, into R as an sf data.frame.

show
buildings <- read_sf('data/Buildings.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
restaurants <- read_sf('data/Restaurants.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
pubs <- read_sf('data/Pubs.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
schools <- read_sf('data/Schools.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
apartments <- read_sf('data/Apartments.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
employers <- read_sf('data/Employers.csv',
                   options= 'GEOM_POSSIBLE_NAMES=location')
show
logs1 <- read_sf('data/rawdata/ParticipantStatusLogs1.csv',
                   options= 'GEOM_POSSIBLE_NAMES=currentLocation')

2.3 Data Wrangling

The following code chunk creates a new column Timestamp that converts the time of the record into POSIXct format. Using this column, day of the week and hour of the day columns are also created using wday() and hour(). filter() is used to extract social records and transport records during weekday peak hours (7-9am and 5-7pm).

show
logs1_selected <- logs1 %>%
  mutate(Timestamp = date_time_parse(timestamp,zone = '',
                                     format = '%Y-%m-%dT%H:%M:%S'),
         day = wday(Timestamp, label = TRUE),
         hour = hour(Timestamp))

# Social Records 
logs1_social <- logs1_selected %>%
  filter(currentMode == 'AtRecreation' | 
           currentMode == 'AtRestaurant')

# transport during weekday AM peak hours
logs1_tram <- logs1_selected %>%
  filter(currentMode == 'Transport' &
         (day == 'Mon'| day == 'Tue'| day=='Wed'| day=='Thu'| day=='Fri') &
         (hour ==7| hour ==8))

# transport during weekday PM peak hours
logs1_trpm <- logs1_selected %>%
  filter(currentMode == 'Transport' &
         (day == 'Mon'| day == 'Tue'| day=='Wed'| day=='Thu'| day=='Fri') &
         (hour ==17| hour ==18))

2.4 Save as and Read RDS

The extracted status log data is saved as and read in RDS format to avoid uploading large files onto Git.

show
write_rds(logs1_social, 'data/rds/logs1_social.rds')
write_rds(logs1_tram, 'data/rds/logs1_tram.rds')
write_rds(logs1_trpm, 'data/rds/logs1_trpm.rds')
show
logs1_social <- read_rds('data/rds/logs1_social.rds')
logs1_tram <- read_rds('data/rds/logs1_tram.rds')
logs1_trpm <- read_rds('data/rds/logs1_trpm.rds')

3. Visualizations and Insights

3.1 Tmap of the City

The code chunk below plots the building polygon features by using tm_polygon() well as restaurants and pubs using tm_shape(). To better reveal the locations of restaurants and pubs, the other location types are plotted first to avoid overlapping and in transparent colors by setting alpha= 0.6. In addition, restaurants and pubs are assigned brighter colors.

show
labs <- c('Restaurant', 'Pub', 'Employer', 'Apartment', 'School')
cols <- c('#ffff00', '#00ff00', "#003366", '#f08080', '#20b2aa')

map <- tm_shape(buildings) +
  tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1,
           border.alpha = 0.5) +
  tm_shape(employers) +
  tm_dots(col = "#003366", size = 0.3, alpha= 0.6) +
  tm_shape(apartments) +
  tm_dots(col = '#f08080', size = 0.3, alpha= 0.6 ) +
  tm_shape(schools) +
  tm_dots(col = '#20b2aa', size = 0.3, alpha= 0.6) +
  tm_shape(restaurants) +
  tm_dots(col = '#ffff00', size = 0.3) +
  tm_shape(pubs) +
  tm_dots(col = '#00ff00', size = 0.3) +
  tm_add_legend(title = 'Location Types',
                type = 'symbol',
                border.col = NA,
                labels = labs,
                col = cols) +
  tm_layout(main.title = 'Map of Engagemnt City, Ohio USA',
            frame = FALSE) +
  tm_compass(size = 2,
             position = c('right', 'top')) +
  tm_credits('Source: VAST Challenge 2022')

map

Insights

From the above map, we observe that the most active social areas (with most number of restaurants and pubs) are in the north-west region and central region of the city. While the rest of the areas have few restaurants and pubs.

Social Area: North-West Region
Social Area: Central Region

3.2 Hexagon Map for Social Areas

In the code chunk below, st_make_grid() of sf package is used to create hexagons.

show
hex <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)

In the code chunk below, st_join() of sf package is used to count the number of event points in the hexagons.

show
points_in_hex <- st_join(logs1_social, 
                         hex, 
                         join= st_within) %>%
  st_set_geometry(NULL) %>%
  count(name = 'pointCount', hex_id)
head(points_in_hex)
# A tibble: 6 × 2
  hex_id pointCount
   <int>      <int>
1    641        875
2    775       1274
3    815       2702
4    822         56
5    860         24
6    863      18164

In the code chunk below, left_join() of dplyr package is used to perform a left-join by using hex as the target table and points_in_hex as the join table. The join ID is hex_id.

show
hex_combined <- hex %>%
  left_join(points_in_hex, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)

In the code chunk below, tmap package is used to create the hexagon binning map for social locations.

show
social <- tm_shape(hex_combined %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          title = 'Customer Visit Records',
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = 'Social Areas of Engagemnt City, Ohio USA\nSocial Hexagon Map',
            frame = FALSE) +
  tm_compass(size = 2,
             position = c('right', 'top')) +
  tm_credits('Source: VAST Challenge 2022')

social

The following code chunk compares the hexagon map of social areas against the map of the city showing the location of the restaurants and pubs to check if they are consistent.

show
tmap_arrange(map, social)

Insights

3.3 Hexagon Map for Weekday Peak Hours

In the code chunk below, st_join() of sf package is used to count the number of event points in the hexagons for both AM and PM weekday peak hours.

show
points_in_hextram <- st_join(logs1_tram, 
                         hex, 
                         join= st_within) %>%
  st_set_geometry(NULL) %>%
  count(name = 'pointCount', hex_id)

points_in_hextrpm <- st_join(logs1_trpm, 
                         hex, 
                         join= st_within) %>%
  st_set_geometry(NULL) %>%
  count(name = 'pointCount', hex_id)

In the code chunk below, left_join() of dplyr package is used to perform a left-join by using hex as the target table and points_in_hex as the join table. The join ID is hex_id.

show
hex_combinedtram <- hex %>%
  left_join(points_in_hextram, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)

hex_combinedtrpm <- hex %>%
  left_join(points_in_hextrpm, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)

In the code chunk below, tmap package is used to create the hexagon binning map for both AM and PM weekday peak hours.

show
am <- tm_shape(hex_combinedtram %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          title = 'Traffic Density',
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = 'Traffic Bottleneck of Engagemnt City, Ohio USA\nWeekday AM Peak Hours',
            frame = FALSE)

pm <- tm_shape(hex_combinedtrpm %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          title = 'Traffic Density',
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1) +
  tm_layout(main.title = '\nWeekday PM Peak Hours',
            frame = FALSE) +
  tm_compass(size = 2,
             position = c('right', 'top')) +
  tm_credits('Source: VAST Challenge 2022')

tmap_arrange(am, pm)

Insights

Differences in AM PM Traffic Bottlenecks

4. Learning Points

This week we focused on geo-spatial analytics and visualizations, where we explored Well-known text (WKT) data and working with R packages such as sf and tmap. I find that the learning experience really opened my eyes on what data visualization can do outside of the usual statistical graphs.