Take-home Exercise 4

Take-home Exercise

Putting Visual Analytics into Practical Use: VAST Challenge 2022, Challenge 2: Patterns of Life.

Leslie Long Nu https://www.linkedin.com/in/leslielongnu/ (SMU, MITB)https://scis.smu.edu.sg/master-it-business
2022-05-22

1. Overview

1.1 Introduction

With reference to Challenge 2 Question 3 of VAST Challenge 2022, this take-home exercise will reveal the daily routines of two selected participants of the city of Engagement, Ohio USA. ViSiElSe and other appropriate visual analytics R packages will be used.

Challenge 2: Patterns of Life, Question 3

Participants have given permission to have their daily routines captured. Choose two different participants with different routines and describe their daily patterns, with supporting evidence.

1.2 Methodology

As VAST challenge datasets include participants’ daily routine logs spanning for 15 months, given the time limit (1 week) of this take-home exercise, the scope of this take-home exercise is scaled down to focus on specific areas of participant’s daily life routine:

2. Data Preparation

2.1 Install and Load Packages

The following code chunk installs the required R packages and loads them onto RStudio environment.

show
packages = c('scales', 'viridis', 'lubridate', 'ggthemes', 
             'gridExtra', 'tidyverse', 'readxl', 'knitr',
             'data.table', 'ViSiElse', 'patchwork')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

2.2 Import Raw Datasets

Relevant datasets are imported using read_csv() of readr package, which is useful for reading delimited files into tibbles.

show
participants <- read_csv('data/Participants.csv')
show
logs1 <- read_csv('rawdata/ParticipantStatusLogs1.csv')

2.3 Data Wrangling

Locate Target Participants

The following code chunk is used to extract participants with educations levels of low and graduate to locate our target participants.

show
low <- participants %>%
  filter(educationLevel == 'Low')
head(low, 5)
# A tibble: 5 × 7
  participantId householdSize haveKids   age educationLevel
          <dbl>         <dbl> <lgl>    <dbl> <chr>         
1            21             3 TRUE        35 Low           
2            23             3 TRUE        57 Low           
3            29             3 TRUE        18 Low           
4            34             3 TRUE        23 Low           
5            35             3 TRUE        20 Low           
# … with 2 more variables: interestGroup <chr>, joviality <dbl>
show
graduate <- participants %>%
  filter(educationLevel == 'Graduate')
head(graduate, 5)
# A tibble: 5 × 7
  participantId householdSize haveKids   age educationLevel
          <dbl>         <dbl> <lgl>    <dbl> <chr>         
1            14             3 TRUE        38 Graduate      
2            18             3 TRUE        20 Graduate      
3            31             3 TRUE        40 Graduate      
4            36             3 TRUE        41 Graduate      
5            45             3 TRUE        50 Graduate      
# … with 2 more variables: interestGroup <chr>, joviality <dbl>

Given the two data table of different educational levels, the following two participants are chosen:

This exercise is interested to find out how these two participants’ daily routine varies due to different education levels, as they have otherwise very similar demographic.

Extract Relevant Subsets

The following code chunk extracts participant id 21 & 14’s daily life routine on Mar 1, 2022.

show
# participant id 21:
logs_21 <- logs1 %>%
  mutate(date = date(timestamp),
         time = (hour(timestamp)*60+minute(timestamp))) %>%
  filter(participantId== 21,
         date == '2022-03-01') %>%
  select(participantId, date, time, currentMode:sleepStatus, timestamp)

# participant id 14:
logs_14 <- logs1 %>%
  mutate(date = date(timestamp),
         time = (hour(timestamp)*60+minute(timestamp))) %>%
  filter(participantId== 14,
         date == '2022-03-01') %>%
  select(participantId, date, time, currentMode:sleepStatus, timestamp)

The following code chunk saves the two logs into RDS format and read as RDS to avoid uploading large files onto GitHub.

show
saveRDS(logs_21, 'data/logs_21.rds')
saveRDS(logs_14, 'data/logs_14.rds')
show
logs_21 <- readRDS('data/logs_21.rds')
logs_14 <- readRDS('data/logs_14.rds')

Overview of Data

The following code chunk allows us to have an overview of the two status logs.

show
head(logs_21, 10)
# A tibble: 10 × 7
   participantId date        time currentMode hungerStatus sleepStatus
           <dbl> <date>     <dbl> <chr>       <chr>        <chr>      
 1            21 2022-03-01     0 AtHome      JustAte      Awake      
 2            21 2022-03-01     5 AtHome      JustAte      Awake      
 3            21 2022-03-01    10 AtHome      JustAte      Awake      
 4            21 2022-03-01    15 AtHome      JustAte      Awake      
 5            21 2022-03-01    20 AtHome      JustAte      Awake      
 6            21 2022-03-01    25 AtHome      JustAte      Awake      
 7            21 2022-03-01    30 AtHome      JustAte      Sleeping   
 8            21 2022-03-01    35 AtHome      JustAte      Sleeping   
 9            21 2022-03-01    40 AtHome      JustAte      Sleeping   
10            21 2022-03-01    45 AtHome      JustAte      Sleeping   
# … with 1 more variable: timestamp <dttm>
show
head(logs_14, 10)
# A tibble: 10 × 7
   participantId date        time currentMode hungerStatus sleepStatus
           <dbl> <date>     <dbl> <chr>       <chr>        <chr>      
 1            14 2022-03-01     0 AtHome      JustAte      Awake      
 2            14 2022-03-01     5 AtHome      JustAte      Awake      
 3            14 2022-03-01    10 AtHome      JustAte      Sleeping   
 4            14 2022-03-01    15 AtHome      JustAte      Sleeping   
 5            14 2022-03-01    20 AtHome      JustAte      Sleeping   
 6            14 2022-03-01    25 AtHome      JustAte      Sleeping   
 7            14 2022-03-01    30 AtHome      JustAte      Sleeping   
 8            14 2022-03-01    35 AtHome      JustAte      Sleeping   
 9            14 2022-03-01    40 AtHome      JustAte      Sleeping   
10            14 2022-03-01    45 AtHome      BecameFull   Sleeping   
# … with 1 more variable: timestamp <dttm>

From the overview we can tell that some of the values are not reliable as both participants are just ate while they are sleeping.

The following code chunk aims to examine the various activities recorded in the status log using unique().

show
unique(logs_21$currentMode)
[1] "AtHome"       "Transport"    "AtWork"       "AtRestaurant"
[5] "AtRecreation"
show
unique(logs_21$hungerStatus)
[1] "JustAte"        "BecameFull"     "BecomingHungry"
[4] "Hungry"        
show
unique(logs_21$sleepStatus)
[1] "Awake"    "Sleeping"

The following meaningful activities will be tracked based on the status log:

Hunger status column is not used for several reasons here. Firstly, the labels indicate participant’s hunger feelings instead of what they did at the time, therefore, it is challenging to determine their meal times based on this column. For example, both just ate and became full could be interpreted as during meal or just finished meal. Secondly, the values in this column are unreliable as it is found that both participants recorded just ate during their sleep. Therefore, this take-home exercise did not use hunger status to reflect participant’s daily life routine.

Deriving Activity Labels

The following code chunk is used to label the activity status with numbers and focus on the activities we intend to track. cumsum() is used to label only the tracked activities by giving them the same number label, while the other activities are labels as the accumulative sum. For example, to track sleeping, all other status in sleep mode are given incremental numbers while it stays the same for sleeping. The same is applied for all the tracked activities.

show
# mark number labels for each activity
# if there is no change in status: number label increase by 1
# if there is change in status: number label stays the same
logs_21num <- logs_21 %>%
  mutate(work= cumsum(currentMode != 'AtWork'),
         restaurant= cumsum(currentMode != 'AtRestaurant'),
         recreation= cumsum(currentMode != 'AtRecreation'),
         sleep = cumsum(sleepStatus != 'Sleeping'))

The following code chunk is used to match number labels to start … and stop … labels, for the first and last record of the tracked activities. For example, when the sleep status changed from awake to sleeping (when the number label stops increasing), the first sleeping status is marked as start sleep, and the last sleeping is marked as stop sleep.

show
# create 'start...' and 'stop...' labels based on the number labels
logs_21label <- logs_21num %>%
  mutate(wlabel=ifelse(work!=lag(work, default=0), 'NA',
                       (ifelse(work!=lead(work, default=0),
                               'Stop Work', 'Start Work'))),
         wlabelf=ifelse(wlabel==lag(wlabel, default='NA')|wlabel=='NA',
                                 NA, wlabel),
         rtlabel=ifelse(restaurant!=lag(restaurant, default=0), 'NA',
                       (ifelse(restaurant!=lead(restaurant, default=0),
                               'Leave Restaurant', 'At Restaurant'))),
         rtlabelf=ifelse(rtlabel==lag(rtlabel, default='NA')|rtlabel=='NA',
                                 NA, rtlabel),
         rnlabel=ifelse(recreation!=lag(recreation, default=0), 'NA',
                       (ifelse(recreation!=lead(recreation, default=0),
                               'Stop Recreation', 'Start Recreation'))),
         rnlabelf=ifelse(rnlabel==lag(rnlabel, default='NA')|rnlabel=='NA',
                                 NA, rnlabel),
         slabel=ifelse(sleep!=lag(sleep, default=0), 'NA',
                       (ifelse(sleep!=lead(sleep, default=0),
                               'Stop Sleep', 'Start Sleep'))),
         slabelf=ifelse(slabel==lag(slabel, default='NA')|slabel=='NA',
                                 NA, slabel))

Lastly, the derived dataset is cleaned and transposed to fit into VisiElSe format.

show
# extract relevant status 
logs_21f <- logs_21label %>%
  select(1:3, wlabelf, rtlabelf, rnlabelf, slabelf)

# derive final status 
logs_21ff <- logs_21f %>%
  unite("final", c(wlabelf, rtlabelf, rnlabelf, slabelf), sep="", 
        remove = TRUE, na.rm = TRUE) %>%
  select(4, 3) %>%
  mutate_all(na_if,"") %>%
  drop_na(final) 

# transpose logs_21f into ViSiElSe data template
log_21f <- transpose(logs_21ff[,-1])
colnames(log_21f) <- logs_21ff$final

# add id column 
log_21f <- rowid_to_column(log_21f, 'id' )

The following code chunk applies the same procedures to prepare the data for participant 14’s daily routine.

show
# mark number labels for each activity
# if there is no change in status: number label increase by 1
# if there is change in status: number label stays the same
logs_14num <- logs_14 %>%
  mutate(work= cumsum(currentMode != 'AtWork'),
         restaurant= cumsum(currentMode != 'AtRestaurant'),
         recreation= cumsum(currentMode != 'AtRecreation'),
         sleep = cumsum(sleepStatus != 'Sleeping'))

# create 'start...' and 'stop...' labels based on the number labels
logs_14label <- logs_14num %>%
  mutate(wlabel=ifelse(work!=lag(work, default=0), 'NA',
                       (ifelse(work!=lead(work, default=0),
                               'Stop Work', 'Start Work'))),
         wlabelf=ifelse(wlabel==lag(wlabel, default='NA')|wlabel=='NA',
                                 NA, wlabel),
         rtlabel=ifelse(restaurant!=lag(restaurant, default=0), 'NA',
                       (ifelse(restaurant!=lead(restaurant, default=0),
                               'Leave Restaurant', 'At Restaurant'))),
         rtlabelf=ifelse(rtlabel==lag(rtlabel, default='NA')|rtlabel=='NA',
                                 NA, rtlabel),
         rnlabel=ifelse(recreation!=lag(recreation, default=0), 'NA',
                       (ifelse(recreation!=lead(recreation, default=0),
                               'Stop Recreation', 'Start Recreation'))),
         rnlabelf=ifelse(rnlabel==lag(rnlabel, default='NA')|rnlabel=='NA',
                                 NA, rnlabel),
         slabel=ifelse(sleep!=lag(sleep, default=0), 'NA',
                       (ifelse(sleep!=lead(sleep, default=0),
                               'Stop Sleep', 'Start Sleep'))),
         slabelf=ifelse(slabel==lag(slabel, default='NA')|slabel=='NA',
                                 NA, slabel))

# extract relevant status 
logs_14f <- logs_14label %>%
  select(1:3, wlabelf, rtlabelf, rnlabelf, slabelf)

# derive final status 
logs_14ff <- logs_14f %>%
  unite("final", c(wlabelf, rtlabelf, rnlabelf, slabelf), sep="", 
        remove = TRUE, na.rm = TRUE) %>%
  select(4, 3) %>%
  mutate_all(na_if,"") %>%
  drop_na(final) 

# transpose logs_21f into ViSiElSe data template
log_14f <- transpose(logs_14ff[,-1])
colnames(log_14f) <- logs_14ff$final

# add id column 
log_14f <- rowid_to_column(log_14f, 'id' )

3. Visualizations and Insights

3.1 Ploting Default ViSiElSe

show
v1 <- visielse(log_21f, informer = NULL)
show
v2 <- visielse(log_14f, informer = NULL)

3.2 ViSibook

The following code chunk changes the labels and add long actions to the graph.

show
# for participant 21
b1 <- ConvertFromViSibook(v1@book)

b1 <- b1[order(as.numeric(b1$showorder)), ] 

# label punctual activities 
b1$label <- c("Start sleep", "Stop sleep","Start AM work", 
              "Stop AM work", "At restaurant", "Leave restaurant", 
              "Start PM work", "Stop PM work", "Start recreation1",
              "Stop recreation1", "Start recreation2","Stop recreation2", 
              "Start recreation3","Stop recreation3")

# label long activities 
b1[15,] <- c("sleep", "Sleep", "l", 1, "Start Sleep", "Stop Sleep")
b1[16,] <- c("work1", "AM Work", "l", 2, "Start Work", "Stop Work")
b1[17,] <- c("restaurant1", "Lunch at Restaurant", "l", 3, "At Restaurant", 
             "Leave Restaurant")
b1[18,] <- c("work2", "PM Work", "l", 4, "Start Work.1", "Stop Work.1")
b1[19,] <- c("recreation1", "First Recreation", "l", 5, "Start Recreation", 
             "Stop Recreation")
b1[20,] <- c("recreation2", "Second Recreation", "l", 7, "Start Recreation.1", 
             "Stop Recreation.1")
b1[21,] <- c("recreation3", "Third Recreation", "l", 8, "Start Recreation.2", 
             "Stop Recreation.2")
b1$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                  1, 2, 3, 4, 5, 6, 7) 
b1 <- b1[order(as.numeric(b1$showorder)), ]
show
# for participant 14
b2 <- ConvertFromViSibook(v2@book)

b2 <- b2[order(as.numeric(b2$showorder)), ] 

# label punctual activities 
b2$label <- c("Start sleep", "Stop sleep","Start AM work", 
              "Stop AM work", "At restaurant1", "Leave restaurant1", 
              "Start PM work", "Stop PM work", "Start recreation1",
              "Stop recreation1", "At restaurant2", "Leave restaurant2",
              "Start recreation2","Stop recreation2", "Start recreation3",
              "Stop recreation3")

# label long activities 
b2[17,] <- c("sleep", "Sleep", "l", 1, "Start Sleep", "Stop Sleep")
b2[18,] <- c("work1", "AM Work", "l", 2, "Start Work", "Stop Work")
b2[19,] <- c("restaurant", "Lunch at Restaurant", "l", 3, "At Restaurant", 
             "Leave Restaurant")
b2[20,] <- c("work2", "PM Work", "l", 4, "Start Work.1", "Stop Work.1")
b2[21,] <- c("recreation1", "First Recreation", "l", 5, "Start Recreation", 
             "Stop Recreation")
b2[22,] <- c("restaurant2", "Dinner at Restaurant", "l", 6, "At Restaurant.1", 
             "Leave Restaurant.1")
b2[23,] <- c("recreation2", "Second Recreation", "l", 7, "Start Recreation.1", 
             "Stop Recreation.1")
b2[24,] <- c("recreation3", "Third Recreation", "l", 8, "Start Recreation.2", 
             "Stop Recreation.2")
b2$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                  NA, 1, 2, 3, 4, 5, 6, 7, 8) 
b2 <- b2[order(as.numeric(b2$showorder)), ]

ViSiElSe with Long Activities

The following ViSiElSe is plotted taking into account of participants’ long actions:

show
v12 <- visielse(log_21f,
                book = b1, 
                doplot = F,
                informer = NULL,
                pixel = 60)

plot(v12,
     vp0w = 0.7, 
     unit.tps = "min", 
     scal.unit.tps = 60, 
     main = "Participant 21 on 1 Mar, 2022 (Education Low)")

show
v22 <- visielse(log_14f,
                book = b2, 
                doplot = F,
                informer = NULL,
                pixel = 60)
plot(v22,
     vp0w = 0.7, 
     unit.tps = "min", 
     scal.unit.tps = 60, 
     main = "Participant 14 on 1 Mar, 2022 (Education Graduate)")

From the above two plots, we can tell that one key difference of these two participants’ day is that participant 14 went to restaurants for both lunch and dinner while participant 21 only went to restaurant for lunch. We can’t tell for sure where participant 21 had dinner from the data.

3.3 Combine Both Routines

Given both participants’ daily activities are similar except dinner, we bear this in mind and take out dinner for participant 14 in order to plot both routines on one graph for better comparison.

The following code chunk take out dinner activity from visibook: Note: both participants’ daily routines are added in twice in log_both so that group function can be used when plotting ViSiElSe. It does not affect the accuracy of the plot as weights are the same for both.

show
# remove dinner activity from dataset
log_14f1 <- log_14f %>%
  select(-c(`At Restaurant.1`, `Leave Restaurant.1`)) 

# combine both daily routine into one dataset
# weight= 2 is added for both participants to allow for plotting visielse 
# group function 
log_both <- rbind(log_21f, log_14f1, log_21f, log_14f1) 

# rename id 
log_both[1, 1] = 'Participant 21'
log_both[2, 1] = 'Participant 14'
log_both[3, 1] = 'Participant 21'
log_both[4, 1] = 'Participant 14'

The following code chunk plots both participants’ routines on the same graph:

show
g <- c("Participant 21 (Low)", "Participant 14 (Graduate)", 
       "Participant 21 (Low)", "Participant 14 (Graduate)")
v3 <- visielse(log_both,
               group = g,
               book = b1, 
               doplot = F,
               informer = NULL,
               pixel = 60,
               method = 'cut')

plot(v3,
     vp0w = 0.7, 
     unit.tps = "min", 
     scal.unit.tps = 60, 
     main = "Participant 21 and 14's Routines on 1 Mar, 2022")

Insights

Recap on our target participants’ demographic:

From the above graph we are able to derive the following insights on their routines:

4. Learning Points

In this take-home exercise, I was able to apply a lot of data wrangling techniques in R while preparing the data to plot in ViSiElSe. In addition, we were able to explore plotting with ViSiElSe, especially by labeling and plotting long activities using ViSibook. Given the time constraint, I felt that the following could be explored further to better answer this question: