Inside the PhD Student Network: Econometrics and Business Statistics at Monash University

Author

Filip Reierson

Published

May 7, 2025

In this article, I explore an idea suggested by Floyd Everest: analysing the connections between PhD students through their shared supervisors using a graph structure. By treating both students and academic staff as nodes, and supervisory relationships as edges, we can visualise the structure of academic mentorship as a network. This may offer insight into the flow of knowledge, influence, and collaboration within the Department of Econometrics and Business Statistics (EBS) at Monash Business School. Krisanat Anukarnsakulchularp had already created an SVG version of this network and gave me some pointers on how to scrape the data. Here I share my attempts at visualising the network interactively and explore the data a little further.
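
To make the node and edge idea concrete, here is a minimal sketch on made-up data. The names `Student A`, `Supervisor X`, and so on are hypothetical placeholders, not people from the department, and the column names simply follow the `from`/`to`/`id` convention that visNetwork expects.

```r
library(dplyr)

# Hypothetical toy data: one row per student-supervisor pair (an edge)
edges <- tibble::tribble(
  ~from,       ~to,
  "Student A", "Supervisor X",
  "Student A", "Supervisor Y",
  "Student B", "Supervisor X",
  "Student C", "Supervisor Y"
)

# Every distinct name becomes a node; `group` records which side it is on
nodes <- bind_rows(
  edges |> distinct(id = from) |> mutate(group = "student"),
  edges |> distinct(id = to) |> mutate(group = "staff")
)
```

A student with two supervisors simply appears in two rows of `edges`, which is why the scraped data further down is kept in this long format.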

Code
library(rvest)  # html_elements(), html_attr() etc. are called unqualified below
library(purrr)
library(dplyr)
library(readr)
library(stringr)
library(visNetwork)
library(ggplot2)
library(tidyr)
library(patchwork)
Code
url_list <- paste0('https://www.monash.edu/business/research/our-researchers/graduate-research-students-and-supervisors?queries_degree_query=Econometrics+and+Business+Statistics&queries_degree_query_posted=1&result_1619149_result_page=', 1:4)
get_data <- function(url) {
  page <- rvest::read_html(url)
  image_links <- page |>
    html_elements(".box-listing-element__thumb-link") |>
    map(\(x)list(image = html_element(x, 'img') |> html_attr("src"), 
                     name = html_element(x, 'img') |> html_attr("alt"),
                 website = html_attr(x, 'href'))) |>
    bind_rows()
  page |>
    html_elements(".box-listing-element__blurb") |>
    map(\(x) {
      elements <- html_children(x)
      list(
        name = elements[[1]] |> 
          rvest::html_text(),
        description = elements[[2]] |> 
          rvest::html_text() |> 
          str_trim(),
        supervisors = elements[[3]] |> 
          rvest::html_text() |> 
          str_split('\r\n') |> 
          pluck(1, 1) |> 
          str_trim() |> 
          str_remove('Supervisors: ') |>
          str_split_1('(,| and )') |>
          str_trim() |>
          # Thank you Krisanat for the following line
          str_remove("Prof |A/Prof |Dr |Prof. |Dr. |Assoc Prof. |Professor ")
      )
    }) |>
    reduce(bind_rows) |>
    inner_join(image_links, by = join_by(name))
}
if(!file.exists('students.csv'))  {
  students <- url_list |>
    map(get_data) |>
    bind_rows() |>
    mutate(
      supervisors = case_when(
        supervisors %in% c("David Frazier", "David T. Frazier") ~ "David Frazier",
        supervisors %in% c("Di Cook", "Dianne Cook") ~ "Dianne Cook",
        supervisors %in% c("Farshid Vahid", "Farshid Vahid-Araghi") ~ "Farshid Vahid",
        supervisors %in% c("Gael M. Martin", "Gael Martin") ~ "Gael Martin",
        supervisors %in% c("Rob Hyndman", "Rob J Hyndman") ~ "Rob Hyndman",
        supervisors %in% c("Susan VanderPlas", "Susan Vanderplus") ~ "Susan VanderPlas",
        supervisors %in% c("Thiyanga S. Talagala", "Thiyanga Talagala") ~ "Thiyanga Talagala",
        TRUE ~ supervisors
      )
    ) |>
    rename(supervisor = supervisors)
  write_csv(students, 'students.csv')
}
students <- read_csv(
  'students.csv',
  col_types = cols(
    name = col_character(),
    description = col_character(),
    supervisor = col_character(),
    image = col_character(),
    website = col_character()
  )
)
if (!file.exists('staff.csv'))  {
  staff <- rvest::read_html('https://www.monash.edu/business/econometrics-and-business-statistics/our-people/staff-directory') |>
    rvest::html_elements('.group-list') |>
    map(\(tbl) {
      tbl |>
        rvest::html_elements('.row') |>
        map(\(x) {
          list(
            name = x |>
              rvest::html_element('strong') |>
              rvest::html_text() |>
              str_trim(),
            website = x |> 
              rvest::html_element('a') |> 
              rvest::html_attr('href'),
            image = x |> 
              rvest::html_element('img') |> 
              rvest::html_attr('src'),
            description = x |>
              rvest::html_element('ul') |> 
              rvest::html_elements('li') |> 
              rvest::html_text() |> 
              paste0(collapse = ', ')
          )
        })
    }) |>
    bind_rows() |>
    mutate(website = ifelse(str_detect(website, 'mailto'), '', website))
  write_csv(staff, 'staff.csv')
}
staff <- read_csv(
  'staff.csv',
  col_types = cols(
    name = col_character(),
    website = col_character(),
    image = col_character(),
    description = col_character()
  )
)
Code
# Make sure the output folder exists before downloading
dir.create('images', showWarnings = FALSE)
# Save each person's profile picture as images/<clean_name>.jpg
download_images <- function(people) {
  people |>
    distinct(name, image) |>
    pmap(\(...) list(...)) |>
    walk(\(x) download.file(
      x$image,
      destfile = paste0('images/', janitor::make_clean_names(x$name), '.jpg'),
      mode = 'wb'  # binary mode so images are not corrupted on Windows
    ))
}
download_images(students)
download_images(staff)
Code
student_info <- students |>
  select(name, description, website) |>
  distinct()
staff_info <- staff |>
  select(name, description, website) |>
  bind_rows(
    # Supervisors missing from the staff directory get placeholder entries
    students |>
      filter(!supervisor %in% staff$name) |>
      transmute(name = supervisor, description = NA_character_, website = NA_character_)
  ) |>
  distinct()
graph_edges <- students |>
  select(from=name,
         to=supervisor)
all_info <- bind_rows(
  student_info |>
    mutate(group = 'student'),
  staff_info |>
    mutate(group = 'staff')
) |>
  select(name, description, website, group) |>
  mutate(image = glue::glue('images/{janitor::make_clean_names(name, allow_dupes=TRUE)}.jpg'))|>
  mutate(image = ifelse(is.na(website), 'images/default.jpg', image)) |>
  mutate(
    id = name,
    label = name,
    title = case_when(
      is.na(website) ~ glue::glue('{name}<br>{description}', .na = ''),
      TRUE ~ glue::glue('<a target="_blank" href = "{website}">{name}</a><br>{description}', .na = ''))
  ) |>
  distinct() |>
  filter(id %in% c(graph_edges$from, graph_edges$to))
my_visnetwork <- function(nodes, edges, degrees=1) {
  visNetwork(nodes, edges, width = "100%") |>
    visLayout(randomSeed = 42) |>
    visNodes(
      shape = "circularImage",
      size = 20,
      borderWidth = 3,
      shapeProperties = list(useBorderWithImage = TRUE)
    ) |>
    visEdges(smooth = TRUE, labelHighlightBold = FALSE) |>
    visOptions(
      highlightNearest = list(
        enabled = TRUE,
        degree = degrees,
        labelOnly = TRUE,
        hover = FALSE,
        algorithm = "hierarchical"
      )
    ) |>
    visInteraction(
      tooltipStyle = 'position: fixed;visibility:hidden;padding: 5px;
                  font-family: verdana;font-size:14px;font-color:#000000;background-color: #f5f4ed;
                  -moz-border-radius: 3px;-webkit-border-radius: 3px;border-radius: 3px;
                   border: 1px solid #808074;box-shadow: 3px 3px 10px rgba(0, 0, 0, 0.2);
                   max-width:200px;word-break: normal;',
      selectConnectedEdges = FALSE
    )
}
only_students <- inner_join(
  graph_edges,
  graph_edges,
  by = join_by(to == to),
  relationship = "many-to-many"
) |>
  rename(title = to,
         from = from.x,
         to = from.y) |>
  filter(from<to)
only_staff <- inner_join(
  graph_edges,
  graph_edges,
  by = join_by(from == from),
  relationship = "many-to-many"
) |>
  rename(title = from,
         from = to.x,
         to = to.y) |>
  filter(from<to)
Code
p_students <- all_info |> 
  filter(group == 'student') |> 
  select(-group) |>
  my_visnetwork(edges=only_students, degrees = 1)
p_staff <- all_info |> 
  filter(group == 'staff') |> 
  select(-group) |>
  my_visnetwork(edges=only_staff, degrees = 1)
p_all <- my_visnetwork(all_info, graph_edges, degrees = 1)

There are some interesting properties of the student network graph.
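
One simple property to look at is how many other students each student is connected to through a shared supervisor. The sketch below works on hypothetical toy data rather than the scraped data, and mirrors the self-join used to build `only_students` above. The names `student_pairs`, `student_degree`, and `from_other` are my own for illustration.

```r
library(dplyr)

# Hypothetical student-supervisor edges
edges <- tibble::tribble(
  ~from,       ~to,
  "Student A", "Supervisor X",
  "Student A", "Supervisor Y",
  "Student B", "Supervisor X",
  "Student C", "Supervisor Y",
  "Student D", "Supervisor Z"
)

# Pair up students who share a supervisor by joining the edges to themselves
student_pairs <- edges |>
  inner_join(edges, by = "to", relationship = "many-to-many",
             suffix = c("", "_other")) |>
  filter(from < from_other) |>   # keep each unordered pair once, drop self-pairs
  distinct(from, from_other)

# Degree: the number of distinct students each student is linked to
student_degree <- tibble::tibble(
  name = c(student_pairs$from, student_pairs$from_other)
) |>
  count(name, name = "degree", sort = TRUE)
```

Students who share no supervisor with anyone, such as `Student D` here, do not appear in `student_degree` at all, so they would need to be added back with a degree of zero if isolated nodes matter.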

Visualising PhD student website domains

Code
get_domain <- Vectorize(function(url) {
  url <- str_remove(url, "^https?://")
  url <- str_remove(url, "^www\\.")
  domain <- str_split(url, "/", simplify = TRUE)[1]
  parts <- str_split(domain, "\\.", simplify = TRUE)
  if (ncol(parts) >= 2) {
    paste0(parts[, ncol(parts) - 1], ".", parts[, ncol(parts)])
  } else {
    domain
  }
})
Code
website_data <- students |>
  distinct(name, website) |>
  mutate(domain = get_domain(website)) |>
  drop_na() |>
  # Dots are escaped so that '.' matches a literal dot rather than any character
  mutate(personal_website = case_when(
    str_detect(domain, 'linkedin|\\.edu') ~ domain,
    TRUE ~ 'Personal website'
  )) |>
  mutate(custom_domain = case_when(
    str_detect(domain, 'google\\.com|linkedin\\.com|netlify\\.app|\\.edu|github\\.io') ~ FALSE,
    TRUE ~ TRUE
  )) |>
  mutate(suffix = str_extract(domain, '\\.[a-z]{2,3}'))
n_na <- students |> filter(is.na(website)) |> count(name) |> nrow()

There were 3 students with no website linked. The attributes of the websites linked from the remaining student profiles are shown in Figure 1. The majority of students linked to a LinkedIn profile, which is not surprising. Among personal websites, the “.com” suffix is the most popular, and “.app” comes second, reflecting Netlify's popularity as a free host. Netlify, Google Sites, and GitHub are the hosts used by students who did not have a custom domain. These may be a good place to start for someone needing to host a website. In fact, this website is hosted on Netlify at the time of writing.
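
To illustrate how a link ends up classified by host, here is a small vectorised sketch using stringr. The URLs are hypothetical examples, and this is a simplified alternative to the `get_domain()` helper above rather than the code that produced the figure.

```r
library(stringr)

# Hypothetical URLs, used only for illustration
urls <- c(
  "https://www.linkedin.com/in/example",
  "https://example.netlify.app/",
  "https://sites.google.com/view/example",
  "https://example.github.io"
)

# Strip the scheme and "www.", then drop everything after the host name
host <- urls |>
  str_remove("^https?://") |>
  str_remove("^www\\.") |>
  str_remove("/.*$")

# Keep the last two dot-separated labels, e.g. "sites.google.com" -> "google.com"
domain <- str_extract(host, "[^.]+\\.[^.]+$")

# Flag the free hosts, using the same host names as the article's pattern
free_host <- str_detect(domain, "google\\.com|netlify\\.app|github\\.io")
```

Keeping only the last two labels is a heuristic: a host ending in a two-part suffix such as ".edu.au" would be reduced to "edu.au", so a more careful analysis would probably need a list of public suffixes.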

Code
p1 <- website_data |>
  count(personal_website) |>
  ggplot(aes(forcats::fct_infreq(personal_website, n), n)) +
      geom_bar(stat='identity') +
      labs(x='', y='Count') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p2 <- website_data |>
  filter(personal_website == 'Personal website') |>
  mutate(custom_domain = ifelse(custom_domain, "Custom domain", "Free domain")) |>
  ggplot(aes(x = forcats::fct_infreq(suffix), fill = custom_domain)) +
  geom_bar() +
  labs(x = "", y='', fill = "") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position='top')
p3 <- website_data |>
  filter(personal_website == 'Personal website', !custom_domain) |>
  count(domain) |>
  ggplot(aes(forcats::fct_infreq(domain, n), n)) +
    geom_bar(stat='identity') +
    labs(x='', y='') +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
p1 + p2 + p3 + plot_layout(widths = c(2, 3, 2))
Figure 1: (left) The number of students linking their own website, a social media profile, or academic profile. (middle) The popularity of suffixes for personal websites. (right) The popularity of hosts among the free domains.

Conclusion

This article explored the connections between PhD students and supervisors at Monash University. The network of students and supervisors was visualised using the visNetwork package, revealing clustering based on campus and research interests. The analysis also highlighted particularly connected supervisors and students within the network. I also briefly explored the domains and hosts people used for their personal websites. I will leave more sophisticated analysis of these networks to others, but I hope someone finds this preliminary exploration interesting.