Creating data visualisation beyond default: peer critique and visualization remake of take-home exercise 1.
This take-home exercise is done based on a take-home exercise 1 submission prepared by a classmate. The peer submission will be critiqued in terms of clarity and aesthetics, and the original design will be remade using the data visualization principles and best practice learnt in Lesson 1 and 2.
The dataset used in take-home exercise 1 and 2 is downloaded from VAST Challenge 2022,
and processed by RStudio tidyverse family of packages and
visualized by ggplot2 and its extensions.
Data preparation steps taken by the original author of the critiqued graphs are listed here for easy reference. As this is not the focus of this exercise, I will not go into details about it.
Installing packages and loading libraries:
# install and load libraries
packages = c('tidyverse', 'knitr', 'ggdist', 'scales', 'grid', 'gridExtra',
'formattable', 'patchwork')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
Import data:
# read csv file
participants <- read_csv('data/Participants.csv')
Bin age variable:
# bin age variable into 5 different age groups
one <- c("18", "19", "20")
two <- c("21", "22", "23", "24", "25", "26", "27", "28", "29", "30" )
three <- c("31", "32", "33", "34", "35", "36", "37", "38", "39", "40")
four <- c( "41", "42", "43", "44", "45", "46", "47", "48", "49", "50")
five <- c( "51", "52", "53", "54", "55", "56", "57", "58", "59", "60")
participants <- participants %>%
mutate(age_modified = case_when(
age %in% one ~ "<=20",
age %in% two ~ "20's",
age %in% three ~ "30's",
age %in% four ~ "40's",
age %in% five ~ "50+")) %>%
select(-age)
There are four graphs in total in this take-home exercise 1 and they will be reviewed and remade in terms of clarity and aesthetics.
A histogram is used to reveal the distribution of residents’ joviality.
# current graph
ggplot(data=participants,
aes(x = joviality)) +
geom_histogram(bins=20,
boundary = 50,
color="black",
fill="light blue") +
coord_cartesian(xlim=c(0.01,1)) +
labs(title = "Distribution of Joviality",
caption = "demographic information, Ohio USA") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0))

Clarity
bins= 20 and boundary= 50.bins= 20 (original) and
bins= 10.# comparing shapes of histograms with different number of bins
p1 <-
ggplot(data=participants,
aes(x = joviality)) +
geom_histogram(bins=20,
boundary = 50,
color="black",
fill="light blue") +
coord_cartesian(xlim=c(0.01,1)) +
labs(title = "Distribution of Joviality (bins=20)",
caption = "demographic information, Ohio USA") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0))
p2 <-
ggplot(data=participants,
aes(x = joviality)) +
geom_histogram(bins=10,
boundary = 50,
color="black",
fill="light blue") +
#coord_cartesian(xlim=c(0.0, 1)) +
labs(title = "Distribution of Joviality (bins=10)",
caption = "demographic information, Ohio USA") +
theme(plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0))
p1|p2

Aesthetics
# remake
ggplot(participants,
aes(joviality)) +
# plot histogram
geom_histogram(bins = 20,
color="#666666",
fill="light blue") +
# plot density curve (y= ..density..*60 to match the second y-axis)
geom_density(aes(y= ..density..*60), color= '#ff7373') +
# add second y-axis, which is transformed from the primary y-axis (/60)
scale_y_continuous('No. of\nResidents',
expand = c(0.05, 0.2),
sec.axis = sec_axis(~. /60, name= 'Density',
breaks= seq(0, 1.2, 0.5) )) +
# add mean joviality line
geom_vline(aes(xintercept = mean(joviality)),
linetype= 'dashed',
size= .4,
color= '#0e2f44') +
# annotation for mean joviality line
geom_text(aes(mean(joviality), 65,
label= 'Mean Joviality Index',
hjust= -0.1),
check_overlap = T,
color= '#0e2f44') +
labs(y= 'No. of\nResidents', x= 'Joviality',
title = "Distribution of Residents' Joviality Index",
subtitle= 'Demographics in Engagement, Ohio',
caption = "Source: VAST Challenge 2022") +
theme(axis.title.y= element_text(angle=0),
axis.title.y.right = element_text(angle=0, vjust = 1),
panel.grid.major = element_line(color= 'grey', size = 0.1),
panel.background= element_blank(),
axis.line= element_line(color= 'grey'),
plot.caption = element_text(hjust = 0))

The makeover graph kept the good features of the original graph, such as the caption and the overall format, and improved on a few elements:
Clarity
Aesthetics
A stacked bar chart is chosen to reveal the distribution of residents’ age for different household sizes.
# current graph
ggplot(data=participants,
aes(x=age_modified, fill = educationLevel)) +
geom_bar()+
labs(
title = "Distribution of Age for different household types",
caption = "demographic information, Ohio USA"
) +
theme(
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0)
)

Clarity
Aesthetics
Further data preparation is first done before remaking graph:
# data preparation for remake
# rename '<=20' and '50+' for a more formal formatting
participants2 <- participants
participants2$age_modified <-
sub('<=20', '20 and below', participants$age_modified)
participants2$age_modified[participants2$age_modified == 'above 50+'] <- 'above 50'
# rename 'HighSchoolOrCollege' values in dataset for ease of reading
participants2$educationLevel <- sub('HighSchoolOrCollege',
'High School or College',
participants2$educationLevel)
# calculate %composition of age groups for different education levels
percent <-
participants2 %>%
group_by(age_modified, educationLevel) %>%
summarise(edu_size= n()) %>%
mutate(edu_pct= percent(edu_size/sum(edu_size)))
# plot remake graph
# plot p1: bar chart of age group distribution
p1 <-
ggplot(participants2, aes(x= age_modified)) +
geom_bar(fill= '#468499') +
scale_y_continuous(expand = c(0.2, 0.2)) +
labs(y= 'No. of\nResidents',
title = "Residents' Age Group Distribution",
subtitle= 'Demographics in Engagement, Ohio') +
theme(axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
panel.background= element_blank(), axis.line= element_line(color= 'grey'),
panel.grid.major.y = element_line(size= 0.2, color = "grey"),
axis.title.x = element_blank(),
plot.title = element_text(hjust=0.5),
plot.subtitle = element_text(hjust=0.5))
# prepare sorted order of education levels from low to high levels
edu_lvl <- c('Low', 'High School or College', 'Bachelors', 'Graduate')
# plot p2: composition of corresponding age group in terms of education levels
p2<-
ggplot(percent,
aes(x= age_modified, y= edu_pct,
group= educationLevel,
color= factor(educationLevel, levels = edu_lvl))) +
geom_line() +
scale_color_discrete(name= 'Education Level') +
scale_y_continuous(labels = percent_format(),
expand = c(0.2, 0.2)) +
labs(y= 'Percent', x= 'Age Group',
title = "Composition of Residents' Education Level in Different Age Groups",
caption = "Source: VAST Challenge 2022")+
theme(legend.position = 'top', legend.direction = 'horizontal',
axis.title.y= element_text(angle=0), axis.ticks.x= element_blank(),
panel.background= element_blank(),
axis.line= element_line(color= 'grey'),
panel.grid.major = element_line(size= 0.2, color = "grey"),
legend.key = element_rect(fill= NA),
legend.title = element_text(size = 8.5),
plot.title = element_text(hjust=0.5),
plot.caption = element_text(hjust = 0))
# use patchwork to stack 2 graphs
p1/p2

Clarity
Aesthetics
This graph is a horizontal stacked bar chart designed to reveal the joviality distribution in relation to residents’ education level, age group and whether having kids.
# current graph
ggplot(data=participants,
aes(x= joviality,
y= educationLevel, fill = haveKids)) +
geom_col() +
theme_classic()+
labs(
title = "Joviality Measure",
caption = "demographic information, Ohio USA"
) +
theme(
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0)
)+
facet_wrap(~age_modified)

Clarity
Aesthetics
In this remake, I plan to split the four variables included in the current graph into two graphs. I feel the author’s main intention is to explore how other variables affect joviality, therefore, the first remake will be on relationship between whether having kids and joviality while the second will be on joviality distribution in different educational and age groups.
# remake part 1
ggplot(participants2,
aes(x= joviality, color= haveKids)) +
geom_density() +
labs(y= 'Density', x= 'Joviality',
title = "Are Residents with Kids more Jovial?",
subtitle= 'Demographics in Engagement, Ohio',
caption = "Source: VAST Challenge 2022") +
scale_color_discrete(name= 'Have Kids?', labels = c('True', 'False')) +
theme(axis.title.y= element_text(angle=0),
panel.grid.major = element_line(color= 'grey', size = 0.1),
panel.background= element_blank(),
axis.line= element_line(color= 'grey'),
plot.caption = element_text(hjust = 0),
legend.key = element_rect(fill= NA))

This graph shows the difference in density curves of residents who have and do not have kids. It can be concluded that residents who have kids have a higher probability (area under the curve) of having lower joviality.
# remake part 2
ggplot(participants2,
aes(x= fct_rev(factor(educationLevel, levels= edu_lvl)), y= joviality)) +
geom_violin(fill= '#66cdaa',
color= NA,
bw= 0.4) +
geom_boxplot(width= 0.2,
color = '#065535',
alpha= 0.3) +
stat_summary(aes(color= 'Mean'),
fun= 'mean',
size= 0.2) +
geom_hline(aes(yintercept = 0.5),
color= 'black',
linetype= 'dashed',
size= .6) +
scale_color_manual(name= 'Statistics',
values = (Mean= '#f6546a')) +
facet_wrap(~ age_modified) +
labs(title= 'Joviality Distribution in Relation to Age and Interest Groups',
x= 'Interest Group', y= 'Joviality',
subtitle= 'Demographics in Engagement, Ohio',
caption = "Source: VAST Challenge 2022") +
theme(panel.background = element_blank(),
axis.title.y = element_text(angle=0),
axis.ticks.y = element_blank(),
axis.line= element_line(color= 'grey'),
panel.grid.major.y = element_line(color= 'grey', size = 0.1),
plot.caption = element_text(hjust=0),
legend.key = element_rect(fill= NA),
axis.text = element_text(size=10),
axis.title = element_text(size= 12),
plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 14),
legend.text = element_text(size= 12)) +
coord_flip()

The remade graph shows the distribution of joviality for different age groups and education levels. Additional information, such as mean joviality and joviality=0.5, is marked on the graph for audience’s easy reference. The overall format and theme are also improved for a cleaner look.
A faceted bar chart is used to reveal the distribution of joviality with respect to different age groups and interest groups.
# current graph
ggplot(data=participants,
aes(x= interestGroup,
y= joviality)) +
geom_col() +
theme_classic()+
labs(
title = "Joviality Measure",
caption = "demographic information, Ohio USA"
) +
theme(
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust = 1),
plot.caption = element_text(hjust = 0)
)+
facet_wrap(~age_modified)

Clarity
geom_col was used to plot
x= interestGroup, y= joviality, the y-axis plotted could be
the sum of joviality for all residents in the same interest group. If
this is the case, the faceted bar chart is no longer meaningful as the
joviality value (height of the bar) is affected by number of residents
of the same age group in the same interest group. For example, from the
group, residents in their 40s in interest group F has a high joviality
value, but this could be due to there are simple more residents in their
40s in group F and does not reflect individual resident’s joviality
index.Aesthetics
# remake graph
ggplot(participants2,
aes(x= interestGroup, y= joviality)) +
geom_violin(fill= '#66cdaa',
color= NA,
bw= 0.4) +
geom_boxplot(width= 0.2,
color = '#065535',
alpha= 0.3) +
stat_summary(aes(color= 'Mean'),
fun= 'mean',
size= 0.2) +
geom_hline(aes(yintercept = 0.5),
color= 'black',
linetype= 'dashed',
size= .6) +
scale_color_manual(name= 'Statistics',
values = (Mean= '#f6546a')) +
facet_wrap(~ age_modified) +
labs(title= 'Joviality Distribution in Relation to Age and Interest Groups',
x= 'Interest Group', y= 'Joviality',
subtitle= 'Demographics in Engagement, Ohio',
caption = "Source: VAST Challenge 2022") +
theme(panel.background = element_blank(),
axis.title.y = element_text(angle=0),
axis.ticks.x = element_blank(),
axis.line= element_line(color= 'grey'),
panel.grid.major.y = element_line(color= 'grey', size = 0.1),
plot.caption = element_text(hjust=0),
legend.key = element_rect(fill= NA))

Clarity
Aesthetics
Take-home exercise 2 serves as a mirror image to take-home exercise 1. It provided me with an opportunity to reflect on what I have done for take-home exercise 1 after seeing it from different perspectives of classmates’ works. My key takeaways are: