A very common task is to split a data frame into subgroups, operate somehow on each group, and then bind the results back into one data frame: split-do-bind.
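As a minimal sketch of the pattern in base R, the three steps map onto `split()`, `lapply()` and `do.call(rbind, ...)`. It uses the built-in `mtcars` data so it runs without extra packages. The helper name `split_do_bind` is hypothetical and does not come from any package.

```r
# Hypothetical helper: split a data frame by a grouping vector,
# apply `fun` to each piece, then row-bind the results.
split_do_bind <- function(df, group, fun) {
  pieces  <- split(df, group)        # split: named list of data frames
  results <- lapply(pieces, fun)     # do: one small data frame per group
  out     <- do.call(rbind, results) # bind: stack the pieces into one
  rownames(out) <- NULL
  out
}

# Mean miles per gallon for each number of cylinders in mtcars
res <- split_do_bind(mtcars, mtcars$cyl, function(d) {
  data.frame(cyl = d$cyl[1], mean_mpg = mean(d$mpg))
})
print(res)
```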
Let’s do something simple with the flight data: split it into groups by origin and carrier, and compute the mean dep_delay for each group.
1 data.frame
The classic way to do this uses the split(), lapply() and rbind() functions (with help from do.call()). To split on two grouping variables, we need to build a character (or factor) vector with a compound meaning, origin_carrier, and split on that.
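A sketch of the compound-key approach follows. It runs on a small hypothetical stand-in for the flight data (`flights_toy`, with made-up values) rather than the real flights table, so the numbers will not match the output below. The column name `V1` simply mirrors that output.

```r
# Toy stand-in for the flights data (hypothetical values, not real flights)
flights_toy <- data.frame(
  origin    = c("EWR", "EWR", "EWR", "EWR", "JFK", "JFK", "LGA"),
  carrier   = c("UA",  "UA",  "UA",  "AA",  "AA",  "B6",  "UA"),
  dep_delay = c(10,    20,    NA,    -2,    5,     12,    8)
)

# Build a compound key such as "EWR_UA" and split on it
key    <- paste(flights_toy$origin, flights_toy$carrier, sep = "_")
pieces <- split(flights_toy, key)

# Mean departure delay within each piece, ignoring missing delays
results <- lapply(pieces, function(d) {
  data.frame(origin  = d$origin[1],
             carrier = d$carrier[1],
             V1      = mean(d$dep_delay, na.rm = TRUE))
})

# Bind the per-group data frames back into one
mean_delay <- do.call(rbind, results)
rownames(mean_delay) <- NULL
print(mean_delay)
```

Base R's `split()` also accepts a list of grouping vectors, e.g. `split(flights_toy, list(flights_toy$origin, flights_toy$carrier), drop = TRUE)`, which forms the interaction internally; `drop = TRUE` discards origin/carrier combinations that never occur.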
origin carrier V1
<char> <char> <num>
1: EWR UA 12.522869
2: LGA UA 12.087916
3: JFK AA 10.302155
4: JFK B6 12.757453
5: LGA DL 9.572997
6: EWR B6 13.100262
7: LGA EV 19.125500
8: LGA AA 6.705769
9: JFK UA 7.900000
10: LGA B6 14.805738
11: LGA MQ 8.528569
12: EWR AA 10.035419
13: JFK DL 8.333188
14: EWR MQ 17.467268
15: EWR DL 12.084592
16: EWR US 3.735104
17: EWR EV 20.164931
18: JFK US 5.866959
19: LGA WN 17.557000
20: JFK VX 13.279441
21: LGA FL 18.726075
22: EWR AS 5.804775
23: LGA US 3.306505
24: JFK MQ 13.199971
25: JFK 9E 19.001517
26: LGA F9 20.215543
27: EWR WN 17.864376
28: JFK HA 4.900585
29: JFK EV 18.520362
30: EWR 9E 5.951667
31: LGA 9E 8.894182
32: LGA YV 18.996330
33: LGA OO 10.434783
34: EWR VX 11.927378
35: EWR OO 20.833333
origin carrier V1
<char> <char> <num>
4 tidytable
This starts out much like the tibble version, but note that there is no need to write an anonymous function for group_map(). Instead, the tidytable helper functions are “group-aware”.
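To illustrate what “group-aware” can look like, here is a hedged sketch using tidytable’s `summarize()` with its `.by` argument, which computes the summary within each origin/carrier group without an explicit split or anonymous function. It assumes the tidytable package is installed (the code is skipped otherwise) and uses hypothetical toy values; it is not necessarily the code this section originally used.

```r
# Sketch assuming the tidytable package is installed; toy values only.
if (requireNamespace("tidytable", quietly = TRUE)) {
  flights_toy <- tidytable::tidytable(
    origin    = c("EWR", "EWR", "JFK", "LGA"),
    carrier   = c("UA",  "UA",  "AA",  "UA"),
    dep_delay = c(10,    20,    5,     8)
  )
  # summarize() evaluates mean(dep_delay) within each origin/carrier group
  mean_delay <- tidytable::summarize(
    flights_toy,
    V1  = mean(dep_delay, na.rm = TRUE),
    .by = c(origin, carrier)
  )
  print(mean_delay)
}
```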