@crhan
Created April 30, 2012 13:50
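Use Ruby to tally which companies the attendees of '第二届中国推荐系统大会-TopGeek和GTUG联合主办' (the 2nd China Recommender Systems Conference, co-hosted by TopGeek and GTUG) come from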
# -*- coding: utf-8 -*-
require 'nokogiri'
require 'open-uri'
require 'pry'
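# Fetch the attendee page. URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('http://resys.51qiangzuo.com/'))
# Keep only the table cells that mention a company (公司) and strip the "公司:" label.
a = doc.xpath('//table[@class="attendee_list"]/tr/td').grep(/公司/).map { |td| td.text.gsub(/公司:\s*/, "") }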
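# Canonical company name => number of attendees.
num = {}
# Known spelling variants => canonical company name.
list = {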
"土豆" => "土豆网",
"浙工大" => "浙江工业大学",
"盛大文学" => "盛大网络",
"盛大在线" => "盛大网络",
"盛大创新院" => "盛大网络",
"盛大" => "盛大网络",
"百度(中国)有限公司" => "百度",
"百度上海研发中心" => "百度",
"百度(中国)有限公司" => "百度",
"大众点评网" => "大众点评",
"阿里金融" => "阿里巴巴",
"阿里云" => "阿里巴巴",
"财华保网络科技有限公司" => "财华保",
"艾瑞咨询" => "艾瑞",
"淘宝网" => "淘宝",
"浙江大学计算机学院" => "浙江大学",
"江苏佰腾科技有限公司" => "江苏佰腾科技",
"星果游戏公司" => "星果网络",
"新蛋技术(中国)" => "新蛋",
"新蛋软件" => "新蛋",
"新蛋贸易" => "新蛋",
"新蛋信息技术(中国)有限公司" => "新蛋",
"新蛋信息技术有限公司" => "新蛋",
"新蛋信息技术" => "新蛋",
"新蛋(中国)技术有限公司" => "新蛋",
"新浪乐居" => "新浪",
"携程旅行网" => "携程",
"快乐淘宝" => "淘宝",
"微软公司" => "微软",
"成都电子科技大学—互联网科学中心" => "成都电子科技大学",
"常州速邦信息咨询有限公司" => "常州速帮",
"常州速帮信息咨询有限公司" => "常州速帮",
"常州佰腾科技科技有限公司" => "常州佰腾科技",
"好耶广告" => "好耶信息",
"奇虎360" => "奇虎",
"天猫" => "淘宝",
"大学" => "学生",
"复旦" => "复旦大学",
"北京大学网络与信息系统研究所" => "北京大学",
'分众' => '分众传媒',
'五分钟' => '上海五分钟',
'上海五分钟网络科技有限公司' => '上海五分钟',
'中原地产' => '中原集团',
'上海臻乐' => '上海臻乐网络科技有限公司',
'上海环彩网络有限公司' => '上海环彩网络',
'上海水渡石' => '上海水渡石信息技术有限公司',
'上海水渡石科技有限公司' => '上海水渡石信息技术有限公司',
'上海尚楚网络科技有线公司' => '上海尚源',
'上海交通大学' => '上海交大',
'1号店(1号商城)' => '1号店',
}
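# Normalize each name through the alias table (unknown names pass through) and count it.
a.each do |e|
  e = list.fetch(e, e)
  num[e] = num.fetch(e, 0) + 1
end
# Sort by attendee count, highest first.
sorted_num = num.sort_by { |_, count| -count }.to_h
# Drop into a pry session to inspect sorted_num interactively.
binding.pry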
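
The normalize-then-count step above can be tried without hitting the network. The following is a minimal sketch, not part of the original script: the `tally_companies` helper, the `ALIASES` subset, and the `sample` strings are hypothetical stand-ins for the scraped cell text, and only the Ruby standard library is used.

```ruby
# Hypothetical subset of the alias table from the script above.
ALIASES = {
  "淘宝网" => "淘宝",
  "天猫" => "淘宝",
  "阿里云" => "阿里巴巴",
}.freeze

# Map each raw name to its canonical form (unknown names pass through),
# count occurrences, and return a hash sorted by count, highest first.
def tally_companies(raw_names, aliases)
  counts = Hash.new(0)
  raw_names.each { |name| counts[aliases.fetch(name, name)] += 1 }
  counts.sort_by { |_, count| -count }.to_h
end

# Hypothetical cell texts, shaped like the "公司: ..." cells the script greps for.
sample = ["公司: 淘宝网", "公司: 天猫", "公司: 阿里云", "公司: 淘宝", "公司: 百度"]
names = sample.map { |text| text.gsub(/公司:\s*/, "") }
result = tally_companies(names, ALIASES)

result.each { |company, count| puts "#{company}: #{count}" }
```

Using `Hash.new(0)` and `fetch(name, name)` removes the explicit key checks; the order of companies with equal counts is not guaranteed, because `sort_by` is not a stable sort.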