Chien-Lan Hsueh 2022-06-29

Coolest Things in R - Pipes

Compared to other programming languages, R is quite unique in many aspects. Although it is not designed as a general purposed programming language, many of its special features have been developed to make processing data easy. To me, three of the coolest things when programming in R are pipes, non-standard evaluation (NSE) and list columns.

Pipes

(Source:
<https://www.r-bloggers.com/2022/03/how-to-use-pipes-to-clean-up-your-r-code/>)

Originally from the maggritr package, the pipe operator %>% has become one of the most popular features to R users today. It transforms composite function calls f(g(h(x))) into a chained expression x %>% h() %>% g() %>% f(). Instead of having a series of nesting function calls like this:

df_nest <- 
  summarise(
    group_by(
      mutate(
        drop_na(starwars, mass, height), 
        BMI = mass / height^2 * 100^2
      ),
      gender
    ),
    BMI_average = mean(BMI)
  )

df_nest
## # A tibble: 3 × 2
##   gender    BMI_average
##   <chr>           <dbl>
## 1 feminine         19.2
## 2 masculine        34.7
## 3 <NA>             15.1

we are able to write cleaner codes using the pipe operator:

df_pipe <- starwars %>% 
  drop_na(mass, height) %>% 
  mutate(BMI = mass / height^2 * 100^2) %>% 
  group_by(gender) %>% 
  summarise(BMI_average = mean(BMI))
df_pipe
## # A tibble: 3 × 2
##   gender    BMI_average
##   <chr>           <dbl>
## 1 feminine         19.2
## 2 masculine        34.7
## 3 <NA>             15.1

What makes things better is the other pipe operators provided in the maggritr package. The exposition pipe %$% can be handy when using functions that don’t have built-in data argument (aka “non-pipe-friendly” functions). For example, to make a scatter plot of mass and height variables from starwars data set using plot() function, the following code doesn’t work:

starwars %>% 
  plot(mass, height)

To remedy this, we can use the exposition pipe %$% to “expose” the data (the scope of the available variables) to the function on the right hand side.

starwars %$% 
  plot(mass, height)

Another useful pipe operator is the tee pipe %T>%. It is useful when you are interested in a side effect from a function and want to chain the data to more functions afterward. The tee pipe works in a way to branch out the data flows:

starwars %>%
  select(height, mass) %T>% 
  # 1st branch
  plot(main = "Mass") %>% 
  # 2nd branch
  mutate(BMI = mass / height^2 * 100^2) %$% 
  plot(height, BMI, main = "BMI")

Starting from v4.1, base R supports the native pipe \>. It’s not quite the same with %>% yet but I believe the native pipe will improve soon. Hopefully the variants (like tee and exposition pipes) will also be included. Both maggritr pipes and the native pipe are really cool. BAM!

NSE

Non-Standard Evaluation (NSE) is another interesting feature R has. It is a quite controversial topic in R communities because you rarely see NSE in other programming languages. Some people like it while some don’t. For people who don’t but enjoy using tidyvser, there is a big chance that they already use NSE a lot. In the code examples above, every times we call the functions group_by() and muate(), we use NSE for its magic in the background.

Let’s say we want to define a helper function to summarize a variable of a data frame by group. We want to make the function flexible so that we can specify the grouping variable as well as the variable to summaries. Furthermore, we want to use the variable names as the function argument instead of character variables that store the columns names of interest. Here is an example of how we can define the helper function with use of NSE:

summarise_groups_NSE <- function(df, group, var){
  df %>%
    group_by() %>%  
    summarise( := mean(, na.rm = TRUE))
}

Then we can just use the variable names for the arguments:

starwars %>% summarise_groups_NSE(gender, height)
## # A tibble: 3 × 2
##   gender    height
##   <chr>      <dbl>
## 1 feminine    165.
## 2 masculine   177.
## 3 <NA>        181.
starwars %>% summarise_groups_NSE(gender, mass)
## # A tibble: 3 × 2
##   gender     mass
##   <chr>     <dbl>
## 1 feminine   54.7
## 2 masculine 106. 
## 3 <NA>       48

NSE is extensively used in many functions from tidyverse and more and more new tools are developed with NSE built-in. This is very cool. Double BAM!!

List Columns

The last one I would like to include here is list columns. List columns make it possible to have a data frame object saved in a column of a data frame resulting in a nested data frame. This is really cool and fun to use in many applications. For example, read in all Excel spreadsheets at once:

# read excel with multiple sheets 
df_raw <- 
  data.frame(sheetname = excel_sheets(excel.file)) %>% 
  mutate(contents = map(sheetname, ~read_excel(excel.file, sheet = .)))

The returned data frame has a column sheetname of the sheet names and another column content to store the data from each spreadsheet. List columns are very convenient when used with Broom package to tidy up outputs of many modeling functions. We can easily compare the performance of each model side-by-side in one master data frame. This is super cool! Triple BAM!!!


<
Previous Post
Some Thoughts Aboout Using API
>
Next Post
Automation on Reproting