Shopping around for what single-cell conferences to attend in 2020?

The new year is around the corner, and you might be trying to figure out which single-cell conferences to attend in 2020. A list of some single-cell conferences in 2020 came across my Twitter feed the other day and I started to peruse it. In case you are interested in attending one of them, I thought I would write up a quick blog post to help compare the single-cell conferences.

Load packages

First we will load some packages.

suppressMessages({
  library(here)
  library(tidyverse)
  library(rvest)
  library(UpSetR)
  library(gender)
})

here()
## [1] "/Users/shicks/Documents/github/websites/website-hicks-source"

Load data

First, let’s create a data frame with the short name, long name, and URL for each conference. I only considered conferences that had a list of names for the organizing committee and confirmed speakers available as of Dec 2019.

url_wellcome2020 <- "https://coursesandconferences.wellcomegenomecampus.org/our-events/single-cell-biology-2020/"
url_cellsymp <- "http://www.cell-symposia.com/conceptual-single-cells-2020/"
url_keystone <- "https://www.keystonesymposia.org/ks/Online/Events/2020F1/Details.aspx?EventKey=2020F1&Tabs=2#Tabs"
url_grc_scgenomics <- "https://www.grc.org/single-cell-genomics-conference/2020/"
url_grc_sccancerbio <- "https://www.grc.org/single-cell-cancer-biology-conference/2020/"
url_emrg_tech <- "https://www.vibconferences.be/events/emerging-technologies-in-single-cell-research#speakers"

url_confs <- tibble(
  name_conf = c("wellcome", "cell_symp", "keystone", 
                "grc_scgenomics", "grc_sccancerbio", "emrg_tech"),
  name_long = c("Wellcome Genome Campus: Single Cell Biology", 
                "Cell Symposia: The Conceptual Power of Single Cell Biology", 
                "Keystone Symposia: Single Cell Biology", 
                "Gordon Research Conference: Single-Cell Genomics", 
                "Gordon Research Conference: Single-Cell Cancer Biology", 
                "Emerging Technologies in Single Cell Research"),
  url = c(url_wellcome2020, url_cellsymp, url_keystone, 
          url_grc_scgenomics, url_grc_sccancerbio, url_emrg_tech))

url_confs
## # A tibble: 6 x 3
##   name_conf    name_long                    url                            
##   <chr>        <chr>                        <chr>                          
## 1 wellcome     Wellcome Genome Campus: Sin… https://coursesandconferences.…
## 2 cell_symp    Cell Symposia: The Conceptu… http://www.cell-symposia.com/c…
## 3 keystone     Keystone Symposia: Single C… https://www.keystonesymposia.o…
## 4 grc_scgenom… Gordon Research Conference:… https://www.grc.org/single-cel…
## 5 grc_sccance… Gordon Research Conference:… https://www.grc.org/single-cel…
## 6 emrg_tech    Emerging Technologies in Si… https://www.vibconferences.be/…

Next, I gathered the names of the organizing committee members and speakers for each conference, either by scraping them with rvest or by adding them by hand (depending on my frustration level with rvest and/or XML/HTML).

Wellcome Genome Campus: Single Cell Biology

wellcome_committee <- c("Ellen Rothenberg", "Sarah Teichmann", 
                        "Fabian Theis", "Itai Yanai")
wellcome_speakers <- c("Kathy Cheah", "Polly Fordyce", "Eileen Furlong",
                       "Gillian Griffiths", "Guoji Guo", "Muzz Haniffa",
                       "Joakim Lundeberg", "Samantha Morris", "Mats Nilsson", 
                       "Rahul Satija", "Timm Schroeder", "Fabian Theis",
                       "Barbara Treutlein", "Ludovic Vallier", "Roser Vento",
                       "Itai Yanai")
wellcome_all <- unique(c(wellcome_committee, wellcome_speakers))

Cell Symposia: The Conceptual Power of Single Cell Biology

h <- read_html(url_confs[which(url_confs$name_conf == "cell_symp"),]$url)

conf_names <- h %>% 
  html_nodes(".blue .bold") %>% 
  html_text()

cell_symp_committee <- conf_names[c(22, 24,25)] 
cell_symp_committee[2:3] <- stringr::str_sub(string = cell_symp_committee[2:3], end = -3)
cell_symp_speakers <- conf_names[1:21]
cell_symp_all <- unique(c(cell_symp_committee, cell_symp_speakers))

Keystone Symposia: Single Cell Biology

keystone_committee <- c("Charles Ansong", "Nikolaus Rajewsky", 
                        "Massimiliano Pagani")
keystone_speakers <- c("Eileen E.M. Furlong", "Barbara Treutlein", "Ido Amit", 
                "Hans Clevers", "Charles Ansong", "Sarah Teichmann", 
                "Fabian Theis", "Matthias Mann", "Jeffrey A. Whitsett",
                "Nikolaus Rajewsky", "Alexander Schier", "Stefano Piccolo", 
                "Julia Laskin", "Bernd Bodenmiller", "Evan Macosko", 
                "Massimiliano Pagani", "Evan W. Newell", "Peter Lichter", 
                "Alexander van Oudenaarden", "Iannis Aifantis", "Aron Jaffe", 
                "Sten Linnarsson", "Ana Pombo", "Bosiljka Tasic", "Liqun Luo", 
                "David Van Valen", "Jörg Vogel", "Angela Ciuffi")

keystone_all <- unique(c(keystone_committee, keystone_speakers))

Gordon Research Conference: Single-Cell Genomics

h <- read_html(url_confs[which(url_confs$name_conf == "grc_scgenomics"),]$url)

conf_names <- h %>% 
  html_nodes(".name strong") %>% 
  html_text()

grc_scgenomics_committee <- c("Xiaoliang Sunney Xie", "Stephen R. Quake", 
                              "Xiaowei Zhuang", "Barbara Treutlein")
grc_scgenomics_speakers <- conf_names
grc_scgenomics_all <- unique(c(grc_scgenomics_committee, grc_scgenomics_speakers))

Gordon Research Conference: Single-Cell Cancer Biology

h <- read_html(url_confs[which(url_confs$name_conf == "grc_sccancerbio"),]$url)

conf_names <- h %>% 
  html_nodes(".name strong") %>% 
  html_text()

grc_sccancerbio_committee <- c("Kai Tan", "Nicholas Navin", "Mario Suva", 
                              "Sohrab Shah")
grc_sccancerbio_speakers <- conf_names
grc_sccancerbio_all <- unique(c(grc_sccancerbio_committee, grc_sccancerbio_speakers))

Emerging Technologies in Single Cell Research

emrg_tech_committee <- c("Jean-Christophe Marine", "Diether Lambrechts", 
                         "Stein Aerts", "Yvan Saeys", "Martin Guilliams", 
                         "Ana Pombo", "Helen Parkinson", "Amos Tanay", 
                         "Evy Vierstraete")

emrg_tech_speakers <- c("Leeat Keren", "Miao-Ping Chien", "Jop Kind", 
                         "Klass Mulder", "Nitzan Mor", "Celine Vallot", 
                         "Nikolaus Rajewsky", "Amos Tanay", "Ana Pombo", 
                         "Oana Ursu")
emrg_tech_all <- unique(c(emrg_tech_committee, emrg_tech_speakers))

Find overlaps

Next, I combined the organizing committee members and speakers from every conference into one big list and passed it to an UpSetR plot to find out how many individuals are confirmed for more than one conference. Note that upset() defaults to nsets = 5, where nsets is the number of sets to consider. Here we have 6 conferences, so 6 sets, and I bumped it up to nsets = 6 to show the overlap between all 6 conferences.

data_list <- list(wellcome = wellcome_all, cell_symp = cell_symp_all, 
                    keystone = keystone_all, 
                    grc_scgenomics = grc_scgenomics_all, 
                    grc_sccancerbio = grc_sccancerbio_all, 
                    emrg_tech = emrg_tech_all)

upset(fromList(data_list), nsets = 6, order.by = "freq")

So you can see there are quite a few individuals who are confirmed speakers or on the organizing committee for multiple single-cell conferences in 2020, which confirms my quick scan of the websites. There are at least two individuals who are scheduled to attend three out of the six single-cell conferences!
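The membership table behind a plot like this can be sketched in base R. This is only a minimal illustration with made-up names (toy_list, conf_a, and the people in it are hypothetical, not the real speaker data): each row is a person, each column is a conference, and the row sums count how many conferences a person appears at.

```r
# Toy stand-in for data_list; every name here is made up for illustration.
toy_list <- list(
  conf_a = c("Ann", "Bob", "Cy"),
  conf_b = c("Bob", "Cy", "Di"),
  conf_c = c("Cy", "Eve")
)

# One row per unique person, one 0/1 column per conference.
all_names <- unique(unlist(toy_list))
membership <- sapply(toy_list, function(x) as.integer(all_names %in% x))
rownames(membership) <- all_names

# Number of conferences each person appears at.
n_confs <- rowSums(membership)

# People listed at more than one conference.
n_confs[n_confs > 1]
```

The bars in an UpSet plot are essentially counts of rows in a table like this that share the same 0/1 pattern.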

Let’s do a bit of digging to see who these individuals are. Just think if you miss them at one conference, you might be able to catch them at another later in the year! :)

Data wrangling

Next, I converted the list of names for each conference into a data frame with two columns: the first is the conference name and the second is the name of the individual.

df <- tibble(name_conf = names(unlist(data_list)),
             name_ind = c(unlist(data_list)))
df
## # A tibble: 165 x 2
##    name_conf  name_ind         
##    <chr>      <chr>            
##  1 wellcome1  Ellen Rothenberg 
##  2 wellcome2  Sarah Teichmann  
##  3 wellcome3  Fabian Theis     
##  4 wellcome4  Itai Yanai       
##  5 wellcome5  Kathy Cheah      
##  6 wellcome6  Polly Fordyce    
##  7 wellcome7  Eileen Furlong   
##  8 wellcome8  Gillian Griffiths
##  9 wellcome9  Guoji Guo        
## 10 wellcome10 Muzz Haniffa     
## # … with 155 more rows

I removed the numbers that unlist() appended to the end of each conference name.

df$name_conf <- gsub('[[:digit:]]+$', '', df$name_conf)
head(df$name_conf)
## [1] "wellcome" "wellcome" "wellcome" "wellcome" "wellcome" "wellcome"
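If you would rather avoid the numeric suffixes altogether, one possible alternative is to repeat each list name once per element. This is a sketch on a made-up list (toy_list and its contents are hypothetical), not the code used for this post.

```r
# Toy stand-in for data_list; names are made up for illustration.
toy_list <- list(
  conf_a = c("Ann", "Bob"),
  conf_b = c("Bob", "Cy", "Di")
)

# Repeat each conference label once per person in that conference,
# and drop the auto-generated names from unlist().
toy_df <- data.frame(
  name_conf = rep(names(toy_list), lengths(toy_list)),
  name_ind = unlist(toy_list, use.names = FALSE),
  stringsAsFactors = FALSE
)
toy_df
```

Because the labels never pick up a suffix, no regular expression clean-up is needed afterwards.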

I also had to do some manual text wrangling: I removed the middle initials of a few individuals and changed the names of two others, either to use the individual’s full name or because their last name was written differently across conference websites.

df[match(c("Eileen E.M. Furlong", "Jeffrey A. Whitsett", "Evan W. Newell", 
           "Stephen R. Quake"), df$name_ind), ]$name_ind <- 
  c("Eileen Furlong", "Jeffrey Whitsett", "Evan Newell", "Stephen Quake")
df[grep("Muzz Haniffa", df$name_ind),]$name_ind <- "Muzlifah Haniffa"
df[grep("Vento", df$name_ind),]$name_ind <- "Roser Vento-Tormo"
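One thing to keep in mind is that match() only returns the first position of each name, so a spelling that shows up at two conferences would only be fixed once. A named lookup vector is one way to handle repeats. The sketch below uses a small made-up vector (toy_names and lookup are hypothetical helper names), not the real data frame.

```r
# Toy vector in which one spelling variant appears twice.
toy_names <- c("Eileen E.M. Furlong", "Ana Pombo",
               "Eileen E.M. Furlong", "Evan W. Newell")

# Named vector: names are the variants, values are the cleaned-up names.
lookup <- c("Eileen E.M. Furlong" = "Eileen Furlong",
            "Evan W. Newell" = "Evan Newell")

# Replace every occurrence of a variant, leaving other names untouched.
hit <- toy_names %in% names(lookup)
toy_names[hit] <- unname(lookup[toy_names[hit]])
toy_names
```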

Next, I split the full names of the individuals into first and last names and converted the data frame into a tibble.

df <- cbind(df, plyr::ldply(stringr::str_split(
                df$name_ind, pattern = " ", n = 2)))
colnames(df)[match(c("V1", "V2"), colnames(df))] <- 
  c("name_first", "name_last")
df <- as_tibble(df)
df
## # A tibble: 165 x 4
##    name_conf name_ind          name_first name_last 
##    <chr>     <chr>             <chr>      <chr>     
##  1 wellcome  Ellen Rothenberg  Ellen      Rothenberg
##  2 wellcome  Sarah Teichmann   Sarah      Teichmann 
##  3 wellcome  Fabian Theis      Fabian     Theis     
##  4 wellcome  Itai Yanai        Itai       Yanai     
##  5 wellcome  Kathy Cheah       Kathy      Cheah     
##  6 wellcome  Polly Fordyce     Polly      Fordyce   
##  7 wellcome  Eileen Furlong    Eileen     Furlong   
##  8 wellcome  Gillian Griffiths Gillian    Griffiths 
##  9 wellcome  Guoji Guo         Guoji      Guo       
## 10 wellcome  Muzlifah Haniffa  Muzlifah   Haniffa   
## # … with 155 more rows
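Splitting on only the first space matters for multi-word last names. Here is a small base R sketch of the same idea using sub() on a few names from the lists above; the variable names (full, name_first, name_last) are just for illustration.

```r
full <- c("Ellen Rothenberg", "Alexander van Oudenaarden", "Miao-Ping Chien")

# Everything before the first space is treated as the first name.
name_first <- sub(" .*$", "", full)

# Everything after the first space is treated as the last name.
name_last <- sub("^[^ ]+ ", "", full)

data.frame(name_first, name_last, stringsAsFactors = FALSE)
```

Hyphenated first names stay in one piece because only spaces are used as the separator.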

Exploratory data analysis

Let’s do some exploratory data analysis (EDA).

First, let’s see which individuals are attending multiple single-cell conferences in 2020 as confirmed speakers or organizing committee members.

df %>% 
  group_by(name_ind) %>% 
  summarize(tot = n()) %>% 
  filter(tot > 1) %>% 
  arrange(desc(tot)) %>% 
  as.data.frame()
##             name_ind tot
## 1  Barbara Treutlein   3
## 2     Bosiljka Tasic   3
## 3         Amos Tanay   2
## 4          Ana Pombo   2
## 5  Bernd Bodenmiller   2
## 6      Celine Vallot   2
## 7      Charles Gawad   2
## 8         Dana Pe'er   2
## 9     Eileen Furlong   2
## 10  Ellen Rothenberg   2
## 11      Fabian Theis   2
## 12   Iannis Aifantis   2
## 13          Ido Amit   2
## 14  Joakim Lundeberg   2
## 15    Nicholas Navin   2
## 16 Nikolaus Rajewsky   2
## 17     Polly Fordyce   2
## 18      Rahul Satija   2
## 19 Roser Vento-Tormo   2
## 20   Samantha Morris   2
## 21   Sarah Teichmann   2
## 22    Xiaowei Zhuang   2
## 23       Zemin Zhang   2

Lots of great speakers on this list!
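If you also want to know which conferences each repeat attendee is listed at, one option is to paste the conference labels together per person. This is a base R sketch on made-up data (toy, conf_a, and the names in it are hypothetical), not output from the real data frame.

```r
# Toy long-format table: one row per (conference, person) pair.
toy <- data.frame(
  name_conf = c("conf_a", "conf_b", "conf_a", "conf_c", "conf_b"),
  name_ind = c("Ann", "Ann", "Bob", "Cy", "Cy"),
  stringsAsFactors = FALSE
)

# Count appearances per person and keep those listed more than once.
n_by_person <- table(toy$name_ind)
repeat_names <- names(n_by_person)[as.vector(n_by_person) > 1]

# For each repeat attendee, collapse their conferences into one string.
confs_by_person <- sapply(repeat_names, function(p) {
  paste(toy$name_conf[toy$name_ind == p], collapse = ", ")
})
confs_by_person
```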

I also wanted to get a rough idea of the gender balance for each of the conferences. To do this, I used the gender R package, which infers state-recorded gender categories from first names using historical datasets.

Inferring the gender

Here I’m using the gender() function with method = "genderize", which uses the Genderize.io API. According to the documentation, this is based on “user profiles across major social networks.”

As there is an API limit on Genderize.io, I saved the results to a file and load them from there, so I do not accidentally hit my limit each time I knit this R Markdown document.

if(!file.exists(here("static", "data", "sc2020_genderize.csv"))){
  df_genderize <- gender(unique(df$name_first), method = "genderize")
  write_csv(df_genderize, here("static", "data", "sc2020_genderize.csv"))
} else { 
  df_genderize <- read_csv(here("static", "data", "sc2020_genderize.csv"))
}
## Parsed with column specification:
## cols(
##   name = col_character(),
##   gender = col_character(),
##   proportion_female = col_double(),
##   proportion_male = col_double()
## )
df_genderize
## # A tibble: 129 x 4
##    name     gender proportion_female proportion_male
##    <chr>    <chr>              <dbl>           <dbl>
##  1 Ellen    female            0.98            0.02  
##  2 Sarah    female            0.98            0.02  
##  3 Fabian   male              0.01            0.99  
##  4 Itai     male              0.0900          0.91  
##  5 Kathy    female            0.98            0.02  
##  6 Polly    female            0.9             0.100 
##  7 Eileen   female            0.98            0.02  
##  8 Gillian  female            0.92            0.0800
##  9 Guoji    male              0               1     
## 10 Muzlifah female            1               0     
## # … with 119 more rows

For each first name, we get back a predicted gender along with the proportion of records for that name that are female and the proportion that are male.
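The save-then-reload pattern used for the API results can be wrapped into a small helper. This is a rough sketch in base R: read_or_compute() and slow_lookup() are hypothetical names, and slow_lookup() is only a stand-in for a rate-limited call, not a real request to Genderize.io.

```r
# Return the cached CSV if it exists; otherwise compute, save, and return.
read_or_compute <- function(path, compute) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result <- compute()
  write.csv(result, path, row.names = FALSE)
  result
}

# Stand-in for an expensive or rate-limited call; counts how often it runs.
n_calls <- 0
slow_lookup <- function() {
  n_calls <<- n_calls + 1
  data.frame(name = c("Ellen", "Fabian"),
             gender = c("female", "male"),
             stringsAsFactors = FALSE)
}

cache_path <- tempfile(fileext = ".csv")
first <- read_or_compute(cache_path, slow_lookup)   # computes and writes the file
second <- read_or_compute(cache_path, slow_lookup)  # reads the file instead
n_calls
```

The second call never touches slow_lookup(), which is the whole point when each call counts against a quota.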

Then, I combined our data frame (df) from above with the df_genderize data frame using the left_join() function from dplyr.

colnames(df_genderize)[1] <- "name_first"
df <- dplyr::left_join(df, df_genderize, by = "name_first")

df %>% 
  select(name_conf, name_ind, gender, proportion_female, proportion_male)
## # A tibble: 165 x 5
##    name_conf name_ind          gender proportion_female proportion_male
##    <chr>     <chr>             <chr>              <dbl>           <dbl>
##  1 wellcome  Ellen Rothenberg  female            0.98            0.02  
##  2 wellcome  Sarah Teichmann   female            0.98            0.02  
##  3 wellcome  Fabian Theis      male              0.01            0.99  
##  4 wellcome  Itai Yanai        male              0.0900          0.91  
##  5 wellcome  Kathy Cheah       female            0.98            0.02  
##  6 wellcome  Polly Fordyce     female            0.9             0.100 
##  7 wellcome  Eileen Furlong    female            0.98            0.02  
##  8 wellcome  Gillian Griffiths female            0.92            0.0800
##  9 wellcome  Guoji Guo         male              0               1     
## 10 wellcome  Muzlifah Haniffa  female            1               0     
## # … with 155 more rows
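A left join keeps every row of the left table and fills in NA where the right table has no match, which is why some individuals end up without a prediction. The same behaviour can be sketched in base R with match(). The two small data frames below (people and lookup) are made up for illustration, including the gender labels.

```r
# Left table: one row per individual.
people <- data.frame(
  name_ind = c("Sarah Teichmann", "Fabian Theis", "Leeat Keren"),
  name_first = c("Sarah", "Fabian", "Leeat"),
  stringsAsFactors = FALSE
)

# Right table: predictions keyed by first name (no entry for "Leeat").
lookup <- data.frame(
  name_first = c("Sarah", "Fabian"),
  gender = c("female", "male"),
  stringsAsFactors = FALSE
)

# For each person, find the matching row in lookup; no match gives NA.
people$gender <- lookup$gender[match(people$name_first, lookup$name_first)]
people
```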

After doing some digging, I noticed two incorrectly predicted gender labels. First, Xiaowei Zhuang is referred to as “she” on this Wikipedia page, so I modified the prediction here.

df[grep("Xiaowei", df$name_ind),]$gender <- "female"
df[grep("Xiaowei", df$name_ind),]$proportion_female <- 
  1 - unique(df[grep("Xiaowei", df$name_ind),]$proportion_female)
df[grep("Xiaowei", df$name_ind),]$proportion_male <- 
  1 - unique(df[grep("Xiaowei", df$name_ind),]$proportion_male)
df[grep("Xiaowei", df$name_ind),]
## # A tibble: 2 x 7
##   name_conf name_ind name_first name_last gender proportion_fema…
##   <chr>     <chr>    <chr>      <chr>     <chr>             <dbl>
## 1 cell_symp Xiaowei… Xiaowei    Zhuang    female             0.54
## 2 grc_scge… Xiaowei… Xiaowei    Zhuang    female             0.54
## # … with 1 more variable: proportion_male <dbl>

I also noticed that Liqun Luo is referred to as “he” on the Wikipedia page, so I modified that label too.

df[grep("Liqun", df$name_ind),]$gender <- "male"
df[grep("Liqun", df$name_ind),]$proportion_female <- 
  1 - unique(df[grep("Liqun", df$name_ind),]$proportion_female)
df[grep("Liqun", df$name_ind),]$proportion_male <- 
  1 - unique(df[grep("Liqun", df$name_ind),]$proportion_male)
df[grep("Liqun", df$name_ind),]
## # A tibble: 1 x 7
##   name_conf name_ind name_first name_last gender proportion_fema…
##   <chr>     <chr>    <chr>      <chr>     <chr>             <dbl>
## 1 keystone  Liqun L… Liqun      Luo       male               0.38
## # … with 1 more variable: proportion_male <dbl>
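Since the same three steps are repeated for each corrected name, they could be folded into a helper. Below is a minimal sketch: flip_prediction() is a hypothetical function, and the toy data frame uses illustrative values rather than the real API results.

```r
# Flip the predicted label and swap the two proportions for one first name.
flip_prediction <- function(d, first_name) {
  idx <- d$name_first == first_name
  d$gender[idx] <- ifelse(d$gender[idx] == "male", "female", "male")
  old_female <- d$proportion_female[idx]
  d$proportion_female[idx] <- d$proportion_male[idx]
  d$proportion_male[idx] <- old_female
  d
}

# Toy data with illustrative values only.
toy <- data.frame(
  name_first = c("Xiaowei", "Fabian"),
  gender = c("male", "male"),
  proportion_female = c(0.46, 0.01),
  proportion_male = c(0.54, 0.99),
  stringsAsFactors = FALSE
)

flipped <- flip_prediction(toy, "Xiaowei")
flipped
```

Swapping the two columns has the same effect as taking 1 minus each value when the proportions sum to 1.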

There were also some names that had no gender predictions returned.

df[is.na(df$proportion_female),] 
## # A tibble: 5 x 7
##   name_conf name_ind name_first name_last gender proportion_fema…
##   <chr>     <chr>    <chr>      <chr>     <chr>             <dbl>
## 1 grc_scge… Chengha… Chenghang  Zong      <NA>                 NA
## 2 grc_scge… Fuchou … Fuchou     Tang      <NA>                 NA
## 3 grc_scca… Liynat … Liynat     Jerby-Ar… <NA>                 NA
## 4 emrg_tech Leeat K… Leeat      Keren     <NA>                 NA
## 5 emrg_tech Miao-Pi… Miao-Ping  Chien     <NA>                 NA
## # … with 1 more variable: proportion_male <dbl>

I used some of my google-fu and my best judgement to guess what each individual’s gender might be. However, it is worth noting that gender is not binary, and I am only performing this part of the analysis to get a better estimate of the gender balance for each conference as a whole.

# %in% updates every row for a listed name, not only the first match
df[df$name_ind %in% c("Leeat Keren", "Miao-Ping Chien", "Liynat Jerby-Arnon"), ]$gender <- "female"
df[df$name_ind %in% c("Chenghang Zong", "Fuchou Tang"), ]$gender <- "male"

Finally, I created a plot to show the gender balance of confirmed speakers and individuals on the organizing committees across the six conferences.

df %>% 
  left_join(url_confs, by = "name_conf") %>%
  group_by(name_long, gender) %>% 
  summarize(total = n()) %>% 
  ggplot(aes(x = name_long, y = total, fill = gender)) + 
  geom_bar(stat="identity", position = "fill") +  coord_flip() + 
  xlab("Conference") + 
  ylab("Proportion") + 
  ggtitle("Confirmed speakers and organizers at six single-cell conferences in 2020")

As you can see, there seems to be a difference in the (predicted) gender balance across the six conferences.
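If you want the numbers behind a filled bar chart like this, the proportions per conference can be tabulated directly. Here is a base R sketch on made-up data (toy, conf_a, and conf_b are hypothetical), not the real speaker lists.

```r
# Toy table: one row per individual with a (predicted) gender label.
toy <- data.frame(
  name_conf = c("conf_a", "conf_a", "conf_a", "conf_a", "conf_b", "conf_b"),
  gender = c("female", "female", "female", "male", "female", "male"),
  stringsAsFactors = FALSE
)

# Counts per conference and gender, then row-wise proportions.
counts <- table(toy$name_conf, toy$gender)
props <- prop.table(counts, margin = 1)
props
```

Each row of props sums to 1, which is what position = "fill" draws as a full-width bar.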

Anyways, I had fun exploring a bit of the landscape of the single-cell conferences coming up in 2020! Hopefully this was helpful for someone else too. :)

Happy holidays and Happy New Year!