@RebieKong
Created March 21, 2018 02:44
Shared QQ group members problem
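// Each entry is (group id, member ids of that group).
val input: Seq[(String, Seq[String])] = List(
  ("A", List("1", "2", "3")),
  ("B", List("2", "3")),
  ("C", List("3", "4"))
)
val output = input
  // Expand into (uid, gid) pairs
  .flatMap { case (gid, uids) => uids.map(uid => (uid, gid)) }
  // Group by uid: uid => all (uid, gid) pairs for that member
  .groupBy(_._1)
  // Drop the Map wrapper first: flatMap on a Map of pairs would rebuild a Map
  // and silently collapse pairs that share the same first group id
  .values.toSeq
  // Emit one (gid-x, gid-y) pair for every two groups that share this member
  .flatMap { pairs =>
    val gids = pairs.map(_._2)
    for {
      x <- gids.indices
      y <- gids.indices
      if x < y
    } yield (gids(x), gids(y))
  }
  // Count occurrences: ((gid-x, gid-y), same_count)
  .groupBy(identity).map { case (pair, hits) => (pair, hits.size) }
// Contains (A,B) -> 2, (A,C) -> 1, (B,C) -> 1; pairs with no shared member are absent
println(output)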
A,'QQ group 1','QQ group 1 description',['member01','member02','member03']
B,'QQ group 2','QQ group 2 description',['member02','member03']
C,'QQ group 3','QQ group 3 description',['member03','member04']

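The table test has 10 million rows in the format shown above: qq_group_id, group_name, group_desc, group_member (array).
Compute the similarity between groups: for every pair of groups, for example qq_group_id = A and qq_group_id = B, count the members in their intersection, and write 0 when there are none. In other words, count how many members of group A are also in group B, in group C, and so on:
A,B,2
A,C,1
B,C,1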
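
The statement asks for a 0 when two groups share no member, which an approach that only pairs up groups through a common member never emits. A minimal sketch of one way to cover that case, assuming the group list is small enough to compare every pair directly: the object `SharedMembers`, the function `sharedCounts`, and the extra group `D` are hypothetical names and data introduced only for illustration.

```scala
// Hypothetical helper for illustration; not part of the original snippet.
object SharedMembers {
  // Returns the shared-member count for every unordered pair of groups,
  // including pairs whose intersection is empty (count 0).
  def sharedCounts(groups: Seq[(String, Seq[String])]): Seq[((String, String), Int)] = {
    // Sets make the intersection cheap and ignore duplicate members inside one group.
    val members = groups.map { case (gid, uids) => (gid, uids.toSet) }.toIndexedSeq
    for {
      i <- members.indices
      j <- (i + 1) until members.size
    } yield {
      val (gidX, setX) = members(i)
      val (gidY, setY) = members(j)
      ((gidX, gidY), (setX intersect setY).size)
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      ("A", Seq("1", "2", "3")),
      ("B", Seq("2", "3")),
      ("C", Seq("3", "4")),
      ("D", Seq("5")) // assumed extra group with no overlap, to show the 0 case
    )
    // Prints one "gid-x,gid-y,count" line per pair, e.g. A,B,2 and A,D,0
    sharedCounts(input).foreach { case ((x, y), n) => println(s"$x,$y,$n") }
  }
}
```

Comparing every pair is quadratic in the number of groups, so this sketch is not meant for the full 10-million-row table as-is. At that scale, one option is to count the overlapping pairs through the member-to-groups grouping and treat any pair missing from that result as 0.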
