Putting Visual Analytics into Practical Use: VAST Challenge 2022, Challenge 2: Patterns of Life.
With reference to Challenge 2 Question 3 of VAST Challenge 2022,
this take-home exercise will reveal the daily routines of
two selected participants of the city of Engagement,
Ohio USA. ViSiElSe and
other appropriate visual analytics R packages will be used.
Challenge 2: Patterns of Life, Question 3
Participants have given permission to have their daily routines captured. Choose two different participants with different routines and describe their daily patterns, with supporting evidence.
As VAST challenge datasets include participants’ daily routine logs spanning for 15 months, given the time limit (1 week) of this take-home exercise, the scope of this take-home exercise is scaled down to focus on specific areas of participant’s daily life routine:
The following code chunk installs the required R packages and loads them onto RStudio environment.
packages = c('scales', 'viridis', 'lubridate', 'ggthemes',
'gridExtra', 'tidyverse', 'readxl', 'knitr',
'data.table', 'ViSiElse', 'patchwork')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
Relevant datasets are imported using read_csv() of readr
package, which is useful for reading delimited files into tibbles.
participants <- read_csv('data/Participants.csv')
logs1 <- read_csv('rawdata/ParticipantStatusLogs1.csv')
The following code chunk is used to extract participants with educations levels of low and graduate to locate our target participants.
# A tibble: 5 × 7
participantId householdSize haveKids age educationLevel
<dbl> <dbl> <lgl> <dbl> <chr>
1 21 3 TRUE 35 Low
2 23 3 TRUE 57 Low
3 29 3 TRUE 18 Low
4 34 3 TRUE 23 Low
5 35 3 TRUE 20 Low
# … with 2 more variables: interestGroup <chr>, joviality <dbl>
# A tibble: 5 × 7
participantId householdSize haveKids age educationLevel
<dbl> <dbl> <lgl> <dbl> <chr>
1 14 3 TRUE 38 Graduate
2 18 3 TRUE 20 Graduate
3 31 3 TRUE 40 Graduate
4 36 3 TRUE 41 Graduate
5 45 3 TRUE 50 Graduate
# … with 2 more variables: interestGroup <chr>, joviality <dbl>
Given the two data table of different educational levels, the following two participants are chosen:
This exercise is interested to find out how these two participants’ daily routine varies due to different education levels, as they have otherwise very similar demographic.
The following code chunk extracts participant id 21 & 14’s daily life routine on Mar 1, 2022.
# participant id 21:
logs_21 <- logs1 %>%
mutate(date = date(timestamp),
time = (hour(timestamp)*60+minute(timestamp))) %>%
filter(participantId== 21,
date == '2022-03-01') %>%
select(participantId, date, time, currentMode:sleepStatus, timestamp)
# participant id 14:
logs_14 <- logs1 %>%
mutate(date = date(timestamp),
time = (hour(timestamp)*60+minute(timestamp))) %>%
filter(participantId== 14,
date == '2022-03-01') %>%
select(participantId, date, time, currentMode:sleepStatus, timestamp)
The following code chunk saves the two logs into RDS format and read as RDS to avoid uploading large files onto GitHub.
The following code chunk allows us to have an overview of the two status logs.
head(logs_21, 10)
# A tibble: 10 × 7
participantId date time currentMode hungerStatus sleepStatus
<dbl> <date> <dbl> <chr> <chr> <chr>
1 21 2022-03-01 0 AtHome JustAte Awake
2 21 2022-03-01 5 AtHome JustAte Awake
3 21 2022-03-01 10 AtHome JustAte Awake
4 21 2022-03-01 15 AtHome JustAte Awake
5 21 2022-03-01 20 AtHome JustAte Awake
6 21 2022-03-01 25 AtHome JustAte Awake
7 21 2022-03-01 30 AtHome JustAte Sleeping
8 21 2022-03-01 35 AtHome JustAte Sleeping
9 21 2022-03-01 40 AtHome JustAte Sleeping
10 21 2022-03-01 45 AtHome JustAte Sleeping
# … with 1 more variable: timestamp <dttm>
head(logs_14, 10)
# A tibble: 10 × 7
participantId date time currentMode hungerStatus sleepStatus
<dbl> <date> <dbl> <chr> <chr> <chr>
1 14 2022-03-01 0 AtHome JustAte Awake
2 14 2022-03-01 5 AtHome JustAte Awake
3 14 2022-03-01 10 AtHome JustAte Sleeping
4 14 2022-03-01 15 AtHome JustAte Sleeping
5 14 2022-03-01 20 AtHome JustAte Sleeping
6 14 2022-03-01 25 AtHome JustAte Sleeping
7 14 2022-03-01 30 AtHome JustAte Sleeping
8 14 2022-03-01 35 AtHome JustAte Sleeping
9 14 2022-03-01 40 AtHome JustAte Sleeping
10 14 2022-03-01 45 AtHome BecameFull Sleeping
# … with 1 more variable: timestamp <dttm>
From the overview we can tell that some of the values are not reliable as both participants are just ate while they are sleeping.
The following code chunk aims to examine the various activities
recorded in the status log using unique().
unique(logs_21$currentMode)
[1] "AtHome" "Transport" "AtWork" "AtRestaurant"
[5] "AtRecreation"
unique(logs_21$hungerStatus)
[1] "JustAte" "BecameFull" "BecomingHungry"
[4] "Hungry"
unique(logs_21$sleepStatus)
[1] "Awake" "Sleeping"
The following meaningful activities will be tracked based on the status log:
Hunger status column is not used for several reasons here. Firstly, the labels indicate participant’s hunger feelings instead of what they did at the time, therefore, it is challenging to determine their meal times based on this column. For example, both just ate and became full could be interpreted as during meal or just finished meal. Secondly, the values in this column are unreliable as it is found that both participants recorded just ate during their sleep. Therefore, this take-home exercise did not use hunger status to reflect participant’s daily life routine.
The following code chunk is used to label the activity status with
numbers and focus on the activities we intend to track.
cumsum() is used to label only the tracked activities by
giving them the same number label, while the other activities are labels
as the accumulative sum. For example, to track sleeping, all
other status in sleep mode are given incremental numbers while
it stays the same for sleeping. The same is applied for all the
tracked activities.
# mark number labels for each activity
# if there is no change in status: number label increase by 1
# if there is change in status: number label stays the same
logs_21num <- logs_21 %>%
mutate(work= cumsum(currentMode != 'AtWork'),
restaurant= cumsum(currentMode != 'AtRestaurant'),
recreation= cumsum(currentMode != 'AtRecreation'),
sleep = cumsum(sleepStatus != 'Sleeping'))
The following code chunk is used to match number labels to start … and stop … labels, for the first and last record of the tracked activities. For example, when the sleep status changed from awake to sleeping (when the number label stops increasing), the first sleeping status is marked as start sleep, and the last sleeping is marked as stop sleep.
# create 'start...' and 'stop...' labels based on the number labels
logs_21label <- logs_21num %>%
mutate(wlabel=ifelse(work!=lag(work, default=0), 'NA',
(ifelse(work!=lead(work, default=0),
'Stop Work', 'Start Work'))),
wlabelf=ifelse(wlabel==lag(wlabel, default='NA')|wlabel=='NA',
NA, wlabel),
rtlabel=ifelse(restaurant!=lag(restaurant, default=0), 'NA',
(ifelse(restaurant!=lead(restaurant, default=0),
'Leave Restaurant', 'At Restaurant'))),
rtlabelf=ifelse(rtlabel==lag(rtlabel, default='NA')|rtlabel=='NA',
NA, rtlabel),
rnlabel=ifelse(recreation!=lag(recreation, default=0), 'NA',
(ifelse(recreation!=lead(recreation, default=0),
'Stop Recreation', 'Start Recreation'))),
rnlabelf=ifelse(rnlabel==lag(rnlabel, default='NA')|rnlabel=='NA',
NA, rnlabel),
slabel=ifelse(sleep!=lag(sleep, default=0), 'NA',
(ifelse(sleep!=lead(sleep, default=0),
'Stop Sleep', 'Start Sleep'))),
slabelf=ifelse(slabel==lag(slabel, default='NA')|slabel=='NA',
NA, slabel))
Lastly, the derived dataset is cleaned and transposed to fit into VisiElSe format.
# extract relevant status
logs_21f <- logs_21label %>%
select(1:3, wlabelf, rtlabelf, rnlabelf, slabelf)
# derive final status
logs_21ff <- logs_21f %>%
unite("final", c(wlabelf, rtlabelf, rnlabelf, slabelf), sep="",
remove = TRUE, na.rm = TRUE) %>%
select(4, 3) %>%
mutate_all(na_if,"") %>%
drop_na(final)
# transpose logs_21f into ViSiElSe data template
log_21f <- transpose(logs_21ff[,-1])
colnames(log_21f) <- logs_21ff$final
# add id column
log_21f <- rowid_to_column(log_21f, 'id' )
The following code chunk applies the same procedures to prepare the data for participant 14’s daily routine.
# mark number labels for each activity
# if there is no change in status: number label increase by 1
# if there is change in status: number label stays the same
logs_14num <- logs_14 %>%
mutate(work= cumsum(currentMode != 'AtWork'),
restaurant= cumsum(currentMode != 'AtRestaurant'),
recreation= cumsum(currentMode != 'AtRecreation'),
sleep = cumsum(sleepStatus != 'Sleeping'))
# create 'start...' and 'stop...' labels based on the number labels
logs_14label <- logs_14num %>%
mutate(wlabel=ifelse(work!=lag(work, default=0), 'NA',
(ifelse(work!=lead(work, default=0),
'Stop Work', 'Start Work'))),
wlabelf=ifelse(wlabel==lag(wlabel, default='NA')|wlabel=='NA',
NA, wlabel),
rtlabel=ifelse(restaurant!=lag(restaurant, default=0), 'NA',
(ifelse(restaurant!=lead(restaurant, default=0),
'Leave Restaurant', 'At Restaurant'))),
rtlabelf=ifelse(rtlabel==lag(rtlabel, default='NA')|rtlabel=='NA',
NA, rtlabel),
rnlabel=ifelse(recreation!=lag(recreation, default=0), 'NA',
(ifelse(recreation!=lead(recreation, default=0),
'Stop Recreation', 'Start Recreation'))),
rnlabelf=ifelse(rnlabel==lag(rnlabel, default='NA')|rnlabel=='NA',
NA, rnlabel),
slabel=ifelse(sleep!=lag(sleep, default=0), 'NA',
(ifelse(sleep!=lead(sleep, default=0),
'Stop Sleep', 'Start Sleep'))),
slabelf=ifelse(slabel==lag(slabel, default='NA')|slabel=='NA',
NA, slabel))
# extract relevant status
logs_14f <- logs_14label %>%
select(1:3, wlabelf, rtlabelf, rnlabelf, slabelf)
# derive final status
logs_14ff <- logs_14f %>%
unite("final", c(wlabelf, rtlabelf, rnlabelf, slabelf), sep="",
remove = TRUE, na.rm = TRUE) %>%
select(4, 3) %>%
mutate_all(na_if,"") %>%
drop_na(final)
# transpose logs_21f into ViSiElSe data template
log_14f <- transpose(logs_14ff[,-1])
colnames(log_14f) <- logs_14ff$final
# add id column
log_14f <- rowid_to_column(log_14f, 'id' )
v1 <- visielse(log_21f, informer = NULL)
v2 <- visielse(log_14f, informer = NULL)

The following code chunk changes the labels and add long actions to the graph.
# for participant 21
b1 <- ConvertFromViSibook(v1@book)
b1 <- b1[order(as.numeric(b1$showorder)), ]
# label punctual activities
b1$label <- c("Start sleep", "Stop sleep","Start AM work",
"Stop AM work", "At restaurant", "Leave restaurant",
"Start PM work", "Stop PM work", "Start recreation1",
"Stop recreation1", "Start recreation2","Stop recreation2",
"Start recreation3","Stop recreation3")
# label long activities
b1[15,] <- c("sleep", "Sleep", "l", 1, "Start Sleep", "Stop Sleep")
b1[16,] <- c("work1", "AM Work", "l", 2, "Start Work", "Stop Work")
b1[17,] <- c("restaurant1", "Lunch at Restaurant", "l", 3, "At Restaurant",
"Leave Restaurant")
b1[18,] <- c("work2", "PM Work", "l", 4, "Start Work.1", "Stop Work.1")
b1[19,] <- c("recreation1", "First Recreation", "l", 5, "Start Recreation",
"Stop Recreation")
b1[20,] <- c("recreation2", "Second Recreation", "l", 7, "Start Recreation.1",
"Stop Recreation.1")
b1[21,] <- c("recreation3", "Third Recreation", "l", 8, "Start Recreation.2",
"Stop Recreation.2")
b1$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 2, 3, 4, 5, 6, 7)
b1 <- b1[order(as.numeric(b1$showorder)), ]
# for participant 14
b2 <- ConvertFromViSibook(v2@book)
b2 <- b2[order(as.numeric(b2$showorder)), ]
# label punctual activities
b2$label <- c("Start sleep", "Stop sleep","Start AM work",
"Stop AM work", "At restaurant1", "Leave restaurant1",
"Start PM work", "Stop PM work", "Start recreation1",
"Stop recreation1", "At restaurant2", "Leave restaurant2",
"Start recreation2","Stop recreation2", "Start recreation3",
"Stop recreation3")
# label long activities
b2[17,] <- c("sleep", "Sleep", "l", 1, "Start Sleep", "Stop Sleep")
b2[18,] <- c("work1", "AM Work", "l", 2, "Start Work", "Stop Work")
b2[19,] <- c("restaurant", "Lunch at Restaurant", "l", 3, "At Restaurant",
"Leave Restaurant")
b2[20,] <- c("work2", "PM Work", "l", 4, "Start Work.1", "Stop Work.1")
b2[21,] <- c("recreation1", "First Recreation", "l", 5, "Start Recreation",
"Stop Recreation")
b2[22,] <- c("restaurant2", "Dinner at Restaurant", "l", 6, "At Restaurant.1",
"Leave Restaurant.1")
b2[23,] <- c("recreation2", "Second Recreation", "l", 7, "Start Recreation.1",
"Stop Recreation.1")
b2[24,] <- c("recreation3", "Third Recreation", "l", 8, "Start Recreation.2",
"Stop Recreation.2")
b2$showorder <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 1, 2, 3, 4, 5, 6, 7, 8)
b2 <- b2[order(as.numeric(b2$showorder)), ]
The following ViSiElSe is plotted taking into account of participants’ long actions:
v12 <- visielse(log_21f,
book = b1,
doplot = F,
informer = NULL,
pixel = 60)
plot(v12,
vp0w = 0.7,
unit.tps = "min",
scal.unit.tps = 60,
main = "Participant 21 on 1 Mar, 2022 (Education Low)")

v22 <- visielse(log_14f,
book = b2,
doplot = F,
informer = NULL,
pixel = 60)
plot(v22,
vp0w = 0.7,
unit.tps = "min",
scal.unit.tps = 60,
main = "Participant 14 on 1 Mar, 2022 (Education Graduate)")

From the above two plots, we can tell that one key difference of these two participants’ day is that participant 14 went to restaurants for both lunch and dinner while participant 21 only went to restaurant for lunch. We can’t tell for sure where participant 21 had dinner from the data.
Given both participants’ daily activities are similar except dinner, we bear this in mind and take out dinner for participant 14 in order to plot both routines on one graph for better comparison.
The following code chunk take out dinner activity from visibook:
Note: both participants’ daily routines are added in twice in
log_both so that group function can be used
when plotting ViSiElSe. It does not affect the accuracy of the plot as
weights are the same for both.
# remove dinner activity from dataset
log_14f1 <- log_14f %>%
select(-c(`At Restaurant.1`, `Leave Restaurant.1`))
# combine both daily routine into one dataset
# weight= 2 is added for both participants to allow for plotting visielse
# group function
log_both <- rbind(log_21f, log_14f1, log_21f, log_14f1)
# rename id
log_both[1, 1] = 'Participant 21'
log_both[2, 1] = 'Participant 14'
log_both[3, 1] = 'Participant 21'
log_both[4, 1] = 'Participant 14'
The following code chunk plots both participants’ routines on the same graph:
g <- c("Participant 21 (Low)", "Participant 14 (Graduate)",
"Participant 21 (Low)", "Participant 14 (Graduate)")
v3 <- visielse(log_both,
group = g,
book = b1,
doplot = F,
informer = NULL,
pixel = 60,
method = 'cut')
plot(v3,
vp0w = 0.7,
unit.tps = "min",
scal.unit.tps = 60,
main = "Participant 21 and 14's Routines on 1 Mar, 2022")

Recap on our target participants’ demographic:
From the above graph we are able to derive the following insights on their routines:
In this take-home exercise, I was able to apply a lot of data
wrangling techniques in R while preparing the data to plot in
ViSiElSe. In addition, we were able to explore plotting
with ViSiElSe, especially by labeling and plotting long
activities using ViSibook. Given the time constraint, I
felt that the following could be explored further to better answer this
question:
geom_segment() and even add animation to it. I
would be very interested to try it out as well when I have spare time.
:)