@lwu
Created July 12, 2009 03:42
This Ruby script reads a TSV of type co-occurrence probabilities,
applies some simple filtering, and prints the result.
The filtered text file is suitable for node-link diagram visualization.
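As one way to get from the filtered edge list (lines such as `building -> structure`) to a node-link diagram, the sketch below turns such lines into Graphviz DOT text. The helper name `edges_to_dot` and the graph name `cotypes` are made up for illustration and are not part of the original script; Graphviz is only one possible renderer.

```ruby
# Convert "a -> b" edge lines into Graphviz DOT text.
# Hypothetical helper: edges_to_dot is not part of the original gist.
def edges_to_dot(lines, graph_name = 'cotypes')
  # Drop blank lines and repeated edges before formatting.
  edges = lines.map(&:strip).reject(&:empty?).uniq.map do |line|
    src, dst = line.split(' -> ', 2)
    # Quote node names: many contain dots, which are not valid in bare DOT IDs.
    %(  "#{src}" -> "#{dst}";)
  end
  (["digraph #{graph_name} {"] + edges + ['}']).join("\n")
end

sample = [
  'building -> structure',
  'house -> structure',
  'crime.lawyer -> person'
]
puts edges_to_dot(sample)
```

The resulting text can be saved to a `.dot` file and rendered with any tool that reads DOT.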
football_coach -> person
family_member -> person
building -> structure
house -> structure
house -> project_focus
truck_trim_level -> trim_level
usnris.nris_listing -> usnris.topic
usnris.nris_listing -> listed_site
notable_person_with_medical_condition -> person
uspolitician.u_s_congressperson -> person
adultentertainment.adult_media -> adultentertainment.topic
thoroughbredracing.thoroughbred_racehorse_trainer -> thoroughbredracing.topic
delete_task -> task
thoroughbredracing.thoroughbred_racehorse_trainer -> horsefacts.topic
merge_task -> task
simple_merge_task -> task
vancouver.city_street -> vancouver.location_in_neighborhood
vancouver.city_street -> vancouver.topic
vancouver.city_street -> location
vancouver.city_street -> road
thoroughbredracing.thoroughbred_racehorse -> thoroughbredracing.topic
us_county -> location
uk_civil_parish -> statistical_region
uk_civil_parish -> location
vancouver.location_in_neighborhood -> vancouver.topic
neighborhood -> location
jp_district -> statistical_region
jp_district -> dated_location
jp_city_town -> statistical_region
jp_city_town -> dated_location
in_district -> statistical_region
in_district -> location
in_district -> dated_location
barbie.barbie_doll -> barbie.topic
barbie.barbie_doll -> consumer_product
barbie.barbie_doll -> collectable_item
bioventurist.bv_therapeutic -> bioventurist.product
bioventurist.bv_venture_investor -> venture_investor
birdwatching.checklist_bird -> organism_classification
australian_suburb -> location
australian_local_government_area -> statistical_region
australian_local_government_area -> location
australian_local_government_area -> dated_location
braziliangovt.politician -> politician
braziliangovt.politician -> person
britishpubs.pub -> britishpubs.topic
britishpubs.pub -> business_location
britishpubs.pub -> employer
britishpubs.pub -> drinking_establishment
sports_team_location -> location
classiccars.classic_car -> model
classiccars.classic_car -> classiccars.topic
contractbridge.bridge_player -> contractbridge.topic
contractbridge.bridge_player -> person
crime.executed_person -> crime.convicted_criminal
crime.executed_person -> crime.topic
crime.lawyer -> crime.topic
9202a8c04000641f800000000ae936d4 -> moscratch.topic
engineering.engineering_person -> engineering.topic
engineering.engineering_person -> person
engineering.engineering_person -> project_participant
fashionmodels.fashion_model -> person
formula1.formula_1_driver -> formula1.topic
wfilmbase.film -> wfilmbase.topic
wfilmbase.topic -> wfilmbase.film
gayporn.topic -> gayporn.gay_porn
9202a8c04000641f8000000009e713a9 -> zxspectrum.zx_spectrum_program
9202a8c04000641f8000000009e713a9 -> zxspectrum.topic
golfcourses.golf_club -> golfcourses.topic
golfcourses.golf_course -> golfcourses.topic
horseracing.racehorse -> horseracing.topic
horseracing.racehorse -> thoroughbredracing.thoroughbred_racehorse
horseracing.racehorse -> thoroughbredracing.topic
horseracing.racehorse -> organism
indianelections2009.constituency -> indianelections.topic
indianelections2009.constituency -> indianelections2009.topic
indianelections2009.topic -> indianelections.topic
infrastructure.power_station -> structure
infrastructure.power_station -> infrastructure.topic
juiced.topic -> juiced.user_of_banned_substances
juiced.user_of_banned_substances -> juiced.topic
litcentral.focal_taxa -> litcentral.topic
losangelesbands.topic -> artist
9202a8c04000641f80000000086e6204 -> engineering.topic
marchmadness.ncaa_basketball_tournament_game -> marchmadness.topic
marchmadness.ncaa_basketball_tournament_game -> event
marchmadness.ncaa_basketball_tournament_stage -> marchmadness.topic
marchmadness.ncaa_basketball_tournament_stage -> event
theater_actor -> person
moscratch.shce021709 -> moscratch.topic
nobelprizes.nobel_prize_winner -> award_winner
nobelprizes.nobel_prize_winner -> nobelprizes.topic
yalebase.person -> yalebase.topic
passpm.project_management_concept -> passpm.topic
waterfall -> location
waterfall -> body_of_water
mountain_range -> location
written_by -> attribution
saturdaynightlive.snl_episode -> tv_series_episode
petbreeds.dog_breed -> animal_breed
football_player -> person
zxspectrum.zx_spectrum_program -> zxspectrum.topic
popstra.company -> popstra.sww_base
popstra.company -> popstra.topic
popstra.organization -> popstra.sww_base
popstra.organization -> popstra.topic
popstra.party -> popstra.sww_base
popstra.party -> popstra.topic
popstra.party_attendance_person -> popstra.sww_base
popstra.product -> popstra.topic
popstra.product_choice -> popstra.sww_base
popstra.restaurant -> popstra.sww_base
popstra.restaurant -> popstra.topic
popstra.restaurant_choice -> popstra.sww_base
popstra.support -> popstra.sww_base
provenance -> attribution
rugby.views.rugby_player -> rugby.rugby_player
golfer -> person
cricket_player -> person
baseball_player -> person
basketball_coach -> person
cricket_bowler -> person
cricket_bowler -> cricket_player
gene_group -> gene_ontology_group
gene_group_membership_evidence -> gene_ontology_group_membership_evidence
gene_ontology_group_membership_evidence -> gene_group_membership_evidence
cyclist -> person
company_advisor -> person
australian_rules_footballer -> person
tv_station -> broadcast
podcast_feed -> broadcast
internet_stream -> broadcast
boxer -> person
release_component -> creative_work
multipart_release -> creative_work
multipart_release -> release
gene_ontology_group -> gene_group
release_component -> release
release -> creative_work
football_team -> sports_team
chivalric_order_member -> person
pro_athlete -> person
pro_athlete -> measured_person
uk_civil_parish -> dated_location
academic -> person
user_profile -> user
measured_person -> person
user_profile -> namespace
soundtrack -> album
basketball_player -> person
songwriter -> composer
crime.lawyer -> person
user -> namespace
skyscraper -> structure
statistical_region -> dated_location
orbital_relationship -> celestial_object
lake -> body_of_water
football_player -> person
uk_civil_parish -> administrative_division
noble_person -> person
us_county -> dated_location
us_county -> statistical_region
skyscraper -> project_focus
domain_profile -> domain
lake -> location
9202a8c04000641f8000000008fe7278 -> tv_program
givennames.topic -> givennames.given_name
military_person -> person
vancouver.location_in_neighborhood -> location
birdconservation.bird_taxa -> birdconservation.topic
deceased_person -> person
politician -> person
gayporn.gay_porn -> gayporn.topic
us_county -> administrative_division
skyscraper -> building
place_of_interment -> location
user -> user_profile
tennis_player -> person
amusementparks.ride -> amusementparks.topic
songwriter -> lyricist
venture_funded_company -> employer
tv_actor -> person
prison.prisoner -> person
nobelprizes.topic -> nobelprizes.nobel_prize_winner
nobelprizes.topic -> award_winner
citytown -> location
bangladeshipeople.topic -> person
litcentral.focal_taxa -> book_subject
vineyard -> location
9202a8c04000641f800000000ae936d4 -> moscratch.shce021709
frameline.topic -> film
playboyplaymates.playmate -> playboyplaymates.topic
playboyplaymates.playmate -> person
venture_funded_company -> company
star -> celestial_object
in_district -> administrative_division
greatfilms.ranking -> greatfilms.topic
book -> written_work
geometry -> content
classiccars.topic -> model
popstra.fashion_choice -> popstra.sww_base
popstra.fashion_choice -> popstra.topic
popstra.topic -> popstra.sww_base
olympic_event_competition -> event
guitarist -> artist
bioventurist.science_or_technology_company -> company
bioventurist.science_or_technology_company -> employer
sports_championship_event -> event
computer_scientist -> person
political_party -> organization
writer -> person
australian_suburb -> dated_location
australian_suburb -> statistical_region
author -> person
visual_artist -> person
board_member -> person
hockey_player -> person
moscratch.shce021709 -> 9202a8c04000641f800000000ae936d4
school -> educational_institution
zxspectrum.zx_spectrum_program -> computer_videogame
birdconservation.bird_taxa -> organism_classification
adultentertainment.adult_entertainer -> person
director -> person
golfcourses.golf_club -> location
9202a8c04000641f8000000009e713a9 -> computer_videogame
dated_location -> location
actor -> person
amusementparks.roller_coaster -> amusementparks.ride
physician -> person
statistical_region -> location
tv_director -> person
editor -> person
cinematographer -> person
filmcameras.camera_lens -> filmcameras.topic
political_district -> location
wfilmbase.topic -> film
wfilmbase.film -> film
journal -> periodical
formula1.formula_1_grand_prix -> formula1.topic
monarch -> person
chess_player -> person
school_district -> location
tropical_cyclone -> disaster2.topic
astronaut -> person
9202a8c04000641f80000000086e612d -> engineering.topic
architect -> person
cemetery -> place_of_interment
cemetery -> dated_location
cemetery -> location
film_character -> fictional_character
fashionmodels.fashion_model -> fashionmodels.topic
guitarist -> person
amusementparks.roller_coaster -> amusementparks.topic
crime.convicted_criminal -> crime.topic
olympic_athlete -> person
musical_group -> artist
university -> educational_institution
litcentral.focal_taxa -> organism_classification
company_founder -> person
drinking_establishment -> business_location
drinking_establishment -> employer
yalebase.person -> person
activism.activist -> person
disaster2.death_causing_event -> event
mountain -> location
play -> written_work
apps.application -> domain
administrative_division -> location
9202a8c04000641f80000000086e6204 -> project_participant
popstra.product -> popstra.sww_base
governmental_jurisdiction -> location
moscratch.topic -> moscratch.shce021709
restaurant -> employer
cricket_bowler -> measured_person
cricket_bowler -> pro_athlete
producer -> person
sports_league_season -> event
horseracing.topic -> organism
horseracing.topic -> thoroughbredracing.topic
horseracing.topic -> thoroughbredracing.thoroughbred_racehorse
engineering.engineering_person -> 9202a8c04000641f80000000086e6204
americancomedy.comedian -> americancomedy.topic
nascar.nascar_driver -> person
horseracing.topic -> horseracing.racehorse
moscratch.topic -> 9202a8c04000641f800000000ae936d4
short_story -> written_work
apps.acre_app -> domain
poem -> written_work
inventor -> person
celebrity -> popstra.celebrity
rugby.views.rugby_player -> person
juiced.topic -> person
juiced.user_of_banned_substances -> person
usnris.topic -> listed_site
usnris.nris_listing -> location
adultentertainment.adult_entertainer -> adultentertainment.topic
tv_writer -> person
group_member -> artist
celebrity -> popstra.sww_base
celebrity -> popstra.topic
astronomer -> person
formula1.formula_1_driver -> person
prison.prisoner -> prison.topic
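#!/usr/bin/env ruby
# Read the TSV input, one tab-separated record per line.
rows = IO.readlines('cotype2009_07_10sans_user_types.txt')

# Sort by column 5 (parsed as a float), highest first, then keep rows where
# column 5 exceeds 0.99, columns 2 and 3 (parsed as integers) each exceed 500,
# and their sum exceeds 2000.
filtered = rows.sort_by { |row| row.split("\t")[5].to_f }.reverse.select do |row|
  cols = row.split("\t")
  (cols[5].to_f > 0.99) && (cols[2].to_i > 500) && (cols[3].to_i > 500) && (cols[2].to_i + cols[3].to_i > 2000)
end
puts filtered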
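
The script above prints the surviving TSV rows unchanged, while the edge list in this gist uses the form `a -> b`. A minimal sketch of that formatting step follows. It assumes the two type names sit in columns 0 and 1, which the gist does not state; the helper `row_to_edge` and the sample row are hypothetical.

```ruby
# Format one TSV row as an "a -> b" edge.
# ASSUMPTION: columns 0 and 1 hold the two type names. The gist does not
# document the column layout, so adjust the indices to match the real file.
def row_to_edge(row, src_col = 0, dst_col = 1)
  cols = row.chomp.split("\t")
  "#{cols[src_col]} -> #{cols[dst_col]}"
end

# Made-up sample row for illustration only (not taken from the real data).
sample = "lake\tbody_of_water\t1200\t1500\t0\t0.995\n"
puts row_to_edge(sample)
```

Applied to every filtered row, for example with `map`, this would yield one edge per line.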