
R, dplyr: assign number of occurrences as value to columns at several group_by() levels

2025-04-02 · 62 · zlslch

library(plyr)
library(dplyr)

set.seed(8)
df <- data.frame(
  group = sample(c("A", "B"), 10, replace = TRUE),
  subgroup = sample(c("a", "b", "c"), 10, replace = TRUE),
  value = runif(10, -1, 1)
)
df %>% arrange(group, subgroup)

Gives:

   group subgroup      value
1      A        a -0.1841505 
2      A        a  0.3265360 
3      A        a -0.8045035 
4      A        b -0.5526222 
5      B        a  0.2238653 
6      B        a  0.0552373 
7      B        b  0.2297515 
8      B        b -0.5700525 
9      B        b  0.6347312 
10     B        c  0.9550054 
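The table above comes from set.seed(8) and sample(), but R 3.6.0 changed the default algorithm used by sample(), so running the same code on a current R may produce different rows. A minimal sketch in base R that rebuilds the printed data literally, so that the counts discussed below can be checked on any R version:

```r
# Rebuild the example data exactly as printed above (rows in arranged order),
# so the counts do not depend on which sample() algorithm the R version uses.
df <- data.frame(
  group    = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  subgroup = c("a", "a", "a", "b", "a", "a", "b", "b", "b", "c"),
  value    = c(-0.1841505, 0.3265360, -0.8045035, -0.5526222, 0.2238653,
               0.0552373, 0.2297515, -0.5700525, 0.6347312, 0.9550054)
)

# Rows per group/subgroup combination; A-c has no observations
table(df$group, df$subgroup)

# Number of values above 0 (the "high" values)
sum(df$value > 0)
```

The zero in the A-c cell of the table matters for the 6-row result asked for in the question.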

I can flag whether a value is high or low, for example:

df2 <- df %>% mutate(reg = ifelse(value > 0, "high", "low"))
df2

Gives:

   group subgroup      value  reg 
1      A        b -0.5526222  low 
2      A        a -0.1841505  low 
3      B        b  0.2297515 high 
4      B        b -0.5700525  low 
5      A        a  0.3265360 high 
6      B        c  0.9550054 high 
7      A        a -0.8045035  low 
8      B        a  0.2238653 high 
9      B        a  0.0552373 high 
10     B        b  0.6347312 high 

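Question: I would like to get the columns low.group, high.group, low.subgroup and high.subgroup, indicating how many high and low values are found in each group (I thought of dplyr's group_by(group) with n(), perhaps using summarise()) and at the group + subgroup level (group_by(group, subgroup)). This should produce a data frame of 6 rows x 6 columns (one row per combination of A/B and a/b/c, with the columns group, subgroup, low.group, high.group, low.subgroup and high.subgroup). The first row should be (A, a, 3, 1, 2, 1), the second (A, b, 3, 1, 1, 0), and so on. I can compute, for example: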

df2 %>%
  group_by(group, reg) %>%
  mutate(n.group = n())

But how do I split n.group into the two columns low.group and high.group? The same question applies at the subgroup level.

I'm sure that functions in plyr, dplyr or reshape2 can do this kind of combined counting and summarising, but how?
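For the split asked about above, one option is a sketch with count() and tidyr's pivot_wider(); this is a swapped-in approach rather than the mutate() call in the question, and it assumes a recent tidyr that provides pivot_wider() with the names_glue argument. count() collapses the data to one row per group and level (mutate(n()) keeps one row per observation), and pivot_wider() turns the reg levels into the columns high.group and low.group. The data are copied from the df2 printout, leaving out the value column, which the counts do not use:

```r
library(dplyr)
library(tidyr)

# df2 as printed in the question (value column omitted)
df2 <- data.frame(
  group    = c("A", "A", "B", "B", "A", "B", "A", "B", "B", "B"),
  subgroup = c("b", "a", "b", "b", "a", "c", "a", "a", "a", "b"),
  reg      = c("low", "low", "high", "low", "high",
               "high", "low", "high", "high", "high")
)

# Group level: one row per group, one column per level (high.group, low.group)
by_group <- df2 %>%
  count(group, reg) %>%
  pivot_wider(names_from = reg, values_from = n,
              names_glue = "{reg}.group", values_fill = 0L)

# Group + subgroup level: same pattern with one more grouping column;
# values_fill supplies 0 where a level is absent (e.g. no high values in A-b)
by_subgroup <- df2 %>%
  count(group, subgroup, reg) %>%
  pivot_wider(names_from = reg, values_from = n,
              names_glue = "{reg}.subgroup", values_fill = 0L)

by_group
by_subgroup
```

by_subgroup only contains combinations that occur in the data, so A-c is still missing at this stage.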

Update: here is the result I worked out by hand:

group   subgroup   low.group   high.group   low.subgroup   high.subgroup
A       a          3           1            2              1
A       b          3           1            1              0
A       c          3           1            0              0
B       a          1           5            0              2
B       b          1           5            1              2
B       c          1           5            0              1

Answer:

A bit long, but it seems to achieve the desired result:

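library(dplyr)
library(tidyr)
df %>%
  mutate(value = ifelse(value > 0, "high", "low")) %>%    # recode value as "high"/"low"
  group_by(group, subgroup, value) %>%
  mutate(sub = n()) %>%                                    # count per group + subgroup + level
  group_by(group, value) %>%
  mutate(grp = n()) %>%                                    # count per group + level
  ungroup() %>%
  distinct(group, subgroup, value, .keep_all = TRUE) %>%   # dplyr >= 0.5.0 drops sub/grp without .keep_all
  gather(key, val, sub:grp) %>%
  unite(x, value:key, sep = ".") %>%
  spread(x, val, fill = 0)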
 
#Source: local data frame [5 x 6] 
# 
#  group subgroup high.grp high.sub low.grp low.sub 
#1     A        a        1        1       3       2 
#2     A        b        0        0       3       1 
#3     B        a        5        2       0       0 
#4     B        b        5        2       1       1 
#5     B        c        5        1       0       0 

Note that the combination A-c does not occur in the sample data, so it does not appear in the output.
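The output above also sets a group-level count to 0 in rows where the subgroup lacks that level (for example, low.grp is 0 for B-a, although group B has one low value). A sketch of an alternative, not the answerer's method: compute the group-level and subgroup-level counts separately, use tidyr's complete() to add combinations that never occur (such as A-c) with a count of 0, and join the two tables by group. It assumes recent dplyr and tidyr (pivot_wider() with names_glue) and again copies df2 from the printout in the question:

```r
library(dplyr)
library(tidyr)

# df2 as printed in the question (value column omitted)
df2 <- data.frame(
  group    = c("A", "A", "B", "B", "A", "B", "A", "B", "B", "B"),
  subgroup = c("b", "a", "b", "b", "a", "c", "a", "a", "a", "b"),
  reg      = c("low", "low", "high", "low", "high",
               "high", "low", "high", "high", "high")
)

# Subgroup level: complete() adds every group x subgroup x level combination,
# so A-c gets rows with n = 0 before widening
sub_counts <- df2 %>%
  count(group, subgroup, reg) %>%
  complete(group, subgroup, reg, fill = list(n = 0L)) %>%
  pivot_wider(names_from = reg, values_from = n, names_glue = "{reg}.subgroup")

# Group level: one row per group, so every subgroup row gets the full group counts
grp_counts <- df2 %>%
  count(group, reg) %>%
  complete(group, reg, fill = list(n = 0L)) %>%
  pivot_wider(names_from = reg, values_from = n, names_glue = "{reg}.group")

# Each subgroup row matches exactly one group row
result <- sub_counts %>%
  left_join(grp_counts, by = "group") %>%
  select(group, subgroup, low.group, high.group, low.subgroup, high.subgroup) %>%
  arrange(group, subgroup)

result
```

With this data the result has 6 rows; the B-a row comes out as (B, a, 1, 5, 0, 2), because both B-a values are above 0.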