library(dplyr)
Tables - math
data("iris")
<- tibble(iris) x
Doing math with columns in a tibble
Sometimes you want to do some math using columns of a tibble, like finding a sum for example.
Let’s try to make a new column containing the sum of Sepal.Length and Sepal.Width
|>
x mutate(z = Sepal.Length+Sepal.Width, .after=Sepal.Width)
# A tibble: 150 × 6
Sepal.Length Sepal.Width z Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 8.6 1.4 0.2 setosa
2 4.9 3 7.9 1.4 0.2 setosa
3 4.7 3.2 7.9 1.3 0.2 setosa
4 4.6 3.1 7.7 1.5 0.2 setosa
5 5 3.6 8.6 1.4 0.2 setosa
6 5.4 3.9 9.3 1.7 0.4 setosa
7 4.6 3.4 8 1.4 0.3 setosa
8 5 3.4 8.4 1.5 0.2 setosa
9 4.4 2.9 7.3 1.4 0.2 setosa
10 4.9 3.1 8 1.5 0.1 setosa
# ℹ 140 more rows
No problem, but if we want to wrap this step into a function and pass a vector of column names, we’ll need to think of another way:
The pick()
verb from dplyr allows us to supply a subset of columns to work on.
<- c("Sepal.Length", "Sepal.Width")
vars
|>
x mutate(z = sum(pick(all_of(vars))), .after=Sepal.Width)
# A tibble: 150 × 6
Sepal.Length Sepal.Width z Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1335. 1.4 0.2 setosa
2 4.9 3 1335. 1.4 0.2 setosa
3 4.7 3.2 1335. 1.3 0.2 setosa
4 4.6 3.1 1335. 1.5 0.2 setosa
5 5 3.6 1335. 1.4 0.2 setosa
6 5.4 3.9 1335. 1.7 0.4 setosa
7 4.6 3.4 1335. 1.4 0.3 setosa
8 5 3.4 1335. 1.5 0.2 setosa
9 4.4 2.9 1335. 1.4 0.2 setosa
10 4.9 3.1 1335. 1.5 0.1 setosa
# ℹ 140 more rows
It looks like our new column z is a sum of both of the entire columns.
Using rowwise()
If we want a sum by row, we’ll need to use rowwise()
|>
x rowwise() |>
mutate(z = sum(pick(all_of(vars))), .after=Sepal.Width)
# A tibble: 150 × 6
# Rowwise:
Sepal.Length Sepal.Width z Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 8.6 1.4 0.2 setosa
2 4.9 3 7.9 1.4 0.2 setosa
3 4.7 3.2 7.9 1.3 0.2 setosa
4 4.6 3.1 7.7 1.5 0.2 setosa
5 5 3.6 8.6 1.4 0.2 setosa
6 5.4 3.9 9.3 1.7 0.4 setosa
7 4.6 3.4 8 1.4 0.3 setosa
8 5 3.4 8.4 1.5 0.2 setosa
9 4.4 2.9 7.3 1.4 0.2 setosa
10 4.9 3.1 8 1.5 0.1 setosa
# ℹ 140 more rows
It’s also now necessary to wrap your selection in all_of()
or tidyselect will complain with a message like this one:
Warning message: There was 1 warning in `mutate()`. ℹ In argument: `z = sum(pick(vars))`.
Caused by warning:
! Using an external vector in selections was deprecated in tidyselect 1.1.0.
ℹ Please use `all_of()` or `any_of()` instead.
\# Was: data %\>% select(vars)
\# Now: data %\>% select(all_of(vars))
See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
Using apply()
Another way of accomplishing the same task is using apply()
to call sum()
on each row.
|>
x mutate(z = apply(pick(all_of(vars)),1,sum),
.after=Sepal.Width)
# A tibble: 150 × 6
Sepal.Length Sepal.Width z Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 8.6 1.4 0.2 setosa
2 4.9 3 7.9 1.4 0.2 setosa
3 4.7 3.2 7.9 1.3 0.2 setosa
4 4.6 3.1 7.7 1.5 0.2 setosa
5 5 3.6 8.6 1.4 0.2 setosa
6 5.4 3.9 9.3 1.7 0.4 setosa
7 4.6 3.4 8 1.4 0.3 setosa
8 5 3.4 8.4 1.5 0.2 setosa
9 4.4 2.9 7.3 1.4 0.2 setosa
10 4.9 3.1 8 1.5 0.1 setosa
# ℹ 140 more rows