lwu/145510
Created July 12, 2009 03:42
This Ruby script takes in a TSV of type co-occurrence probabilities,
does some simple filtering, and prints the output.
The filtered text file is suitable for node-link diagram visualization.
football_coach -> person
family_member -> person
building -> structure
house -> structure
house -> project_focus
truck_trim_level -> trim_level
usnris.nris_listing -> usnris.topic
usnris.nris_listing -> listed_site
notable_person_with_medical_condition -> person
uspolitician.u_s_congressperson -> person
adultentertainment.adult_media -> adultentertainment.topic
thoroughbredracing.thoroughbred_racehorse_trainer -> thoroughbredracing.topic
delete_task -> task
thoroughbredracing.thoroughbred_racehorse_trainer -> horsefacts.topic
merge_task -> task
simple_merge_task -> task
vancouver.city_street -> vancouver.location_in_neighborhood
vancouver.city_street -> vancouver.topic
vancouver.city_street -> location
vancouver.city_street -> road
thoroughbredracing.thoroughbred_racehorse -> thoroughbredracing.topic
us_county -> location
uk_civil_parish -> statistical_region
uk_civil_parish -> location
vancouver.location_in_neighborhood -> vancouver.topic
neighborhood -> location
jp_district -> statistical_region
jp_district -> dated_location
jp_city_town -> statistical_region
jp_city_town -> dated_location
in_district -> statistical_region
in_district -> location
in_district -> dated_location
barbie.barbie_doll -> barbie.topic
barbie.barbie_doll -> consumer_product
barbie.barbie_doll -> collectable_item
bioventurist.bv_therapeutic -> bioventurist.product
bioventurist.bv_venture_investor -> venture_investor
birdwatching.checklist_bird -> organism_classification
australian_suburb -> location
australian_local_government_area -> statistical_region
australian_local_government_area -> location
australian_local_government_area -> dated_location
braziliangovt.politician -> politician
braziliangovt.politician -> person
britishpubs.pub -> britishpubs.topic
britishpubs.pub -> business_location
britishpubs.pub -> employer
britishpubs.pub -> drinking_establishment
sports_team_location -> location
classiccars.classic_car -> model
classiccars.classic_car -> classiccars.topic
contractbridge.bridge_player -> contractbridge.topic
contractbridge.bridge_player -> person
crime.executed_person -> crime.convicted_criminal
crime.executed_person -> crime.topic
crime.lawyer -> crime.topic
9202a8c04000641f800000000ae936d4 -> moscratch.topic
engineering.engineering_person -> engineering.topic
engineering.engineering_person -> person
engineering.engineering_person -> project_participant
fashionmodels.fashion_model -> person
formula1.formula_1_driver -> formula1.topic
wfilmbase.film -> wfilmbase.topic
wfilmbase.topic -> wfilmbase.film
gayporn.topic -> gayporn.gay_porn
9202a8c04000641f8000000009e713a9 -> zxspectrum.zx_spectrum_program
9202a8c04000641f8000000009e713a9 -> zxspectrum.topic
golfcourses.golf_club -> golfcourses.topic
golfcourses.golf_course -> golfcourses.topic
horseracing.racehorse -> horseracing.topic
horseracing.racehorse -> thoroughbredracing.thoroughbred_racehorse
horseracing.racehorse -> thoroughbredracing.topic
horseracing.racehorse -> organism
indianelections2009.constituency -> indianelections.topic
indianelections2009.constituency -> indianelections2009.topic
indianelections2009.topic -> indianelections.topic
infrastructure.power_station -> structure
infrastructure.power_station -> infrastructure.topic
juiced.topic -> juiced.user_of_banned_substances
juiced.user_of_banned_substances -> juiced.topic
litcentral.focal_taxa -> litcentral.topic
losangelesbands.topic -> artist
9202a8c04000641f80000000086e6204 -> engineering.topic
marchmadness.ncaa_basketball_tournament_game -> marchmadness.topic
marchmadness.ncaa_basketball_tournament_game -> event
marchmadness.ncaa_basketball_tournament_stage -> marchmadness.topic
marchmadness.ncaa_basketball_tournament_stage -> event
theater_actor -> person
moscratch.shce021709 -> moscratch.topic
nobelprizes.nobel_prize_winner -> award_winner
nobelprizes.nobel_prize_winner -> nobelprizes.topic
yalebase.person -> yalebase.topic
passpm.project_management_concept -> passpm.topic
waterfall -> location
waterfall -> body_of_water
mountain_range -> location
written_by -> attribution
saturdaynightlive.snl_episode -> tv_series_episode
petbreeds.dog_breed -> animal_breed
football_player -> person
zxspectrum.zx_spectrum_program -> zxspectrum.topic
popstra.company -> popstra.sww_base
popstra.company -> popstra.topic
popstra.organization -> popstra.sww_base
popstra.organization -> popstra.topic
popstra.party -> popstra.sww_base
popstra.party -> popstra.topic
popstra.party_attendance_person -> popstra.sww_base
popstra.product -> popstra.topic
popstra.product_choice -> popstra.sww_base
popstra.restaurant -> popstra.sww_base
popstra.restaurant -> popstra.topic
popstra.restaurant_choice -> popstra.sww_base
popstra.support -> popstra.sww_base
provenance -> attribution
rugby.views.rugby_player -> rugby.rugby_player
golfer -> person
cricket_player -> person
baseball_player -> person
basketball_coach -> person
cricket_bowler -> person
cricket_bowler -> cricket_player
gene_group -> gene_ontology_group
gene_group_membership_evidence -> gene_ontology_group_membership_evidence
gene_ontology_group_membership_evidence -> gene_group_membership_evidence
cyclist -> person
company_advisor -> person
australian_rules_footballer -> person
tv_station -> broadcast
podcast_feed -> broadcast
internet_stream -> broadcast
boxer -> person
release_component -> creative_work
multipart_release -> creative_work
multipart_release -> release
gene_ontology_group -> gene_group
release_component -> release
release -> creative_work
football_team -> sports_team
chivalric_order_member -> person
pro_athlete -> person
pro_athlete -> measured_person
uk_civil_parish -> dated_location
academic -> person
user_profile -> user
measured_person -> person
user_profile -> namespace
soundtrack -> album
basketball_player -> person
songwriter -> composer
crime.lawyer -> person
user -> namespace
skyscraper -> structure
statistical_region -> dated_location
orbital_relationship -> celestial_object
lake -> body_of_water
football_player -> person
uk_civil_parish -> administrative_division
noble_person -> person
us_county -> dated_location
us_county -> statistical_region
skyscraper -> project_focus
domain_profile -> domain
lake -> location
9202a8c04000641f8000000008fe7278 -> tv_program
givennames.topic -> givennames.given_name
military_person -> person
vancouver.location_in_neighborhood -> location
birdconservation.bird_taxa -> birdconservation.topic
deceased_person -> person
politician -> person
gayporn.gay_porn -> gayporn.topic
us_county -> administrative_division
skyscraper -> building
place_of_interment -> location
user -> user_profile
tennis_player -> person
amusementparks.ride -> amusementparks.topic
songwriter -> lyricist
venture_funded_company -> employer
tv_actor -> person
prison.prisoner -> person
nobelprizes.topic -> nobelprizes.nobel_prize_winner
nobelprizes.topic -> award_winner
citytown -> location
bangladeshipeople.topic -> person
litcentral.focal_taxa -> book_subject
vineyard -> location
9202a8c04000641f800000000ae936d4 -> moscratch.shce021709
frameline.topic -> film
playboyplaymates.playmate -> playboyplaymates.topic
playboyplaymates.playmate -> person
venture_funded_company -> company
star -> celestial_object
in_district -> administrative_division
greatfilms.ranking -> greatfilms.topic
book -> written_work
geometry -> content
classiccars.topic -> model
popstra.fashion_choice -> popstra.sww_base
popstra.fashion_choice -> popstra.topic
popstra.topic -> popstra.sww_base
olympic_event_competition -> event
guitarist -> artist
bioventurist.science_or_technology_company -> company
bioventurist.science_or_technology_company -> employer
sports_championship_event -> event
computer_scientist -> person
political_party -> organization
writer -> person
australian_suburb -> dated_location
australian_suburb -> statistical_region
author -> person
visual_artist -> person
board_member -> person
hockey_player -> person
moscratch.shce021709 -> 9202a8c04000641f800000000ae936d4
school -> educational_institution
zxspectrum.zx_spectrum_program -> computer_videogame
birdconservation.bird_taxa -> organism_classification
adultentertainment.adult_entertainer -> person
director -> person
golfcourses.golf_club -> location
9202a8c04000641f8000000009e713a9 -> computer_videogame
dated_location -> location
actor -> person
amusementparks.roller_coaster -> amusementparks.ride
physician -> person
statistical_region -> location
tv_director -> person
editor -> person
cinematographer -> person
filmcameras.camera_lens -> filmcameras.topic
political_district -> location
wfilmbase.topic -> film
wfilmbase.film -> film
journal -> periodical
formula1.formula_1_grand_prix -> formula1.topic
monarch -> person
chess_player -> person
school_district -> location
tropical_cyclone -> disaster2.topic
astronaut -> person
9202a8c04000641f80000000086e612d -> engineering.topic
architect -> person
cemetery -> place_of_interment
cemetery -> dated_location
cemetery -> location
film_character -> fictional_character
fashionmodels.fashion_model -> fashionmodels.topic
guitarist -> person
amusementparks.roller_coaster -> amusementparks.topic
crime.convicted_criminal -> crime.topic
olympic_athlete -> person
musical_group -> artist
university -> educational_institution
litcentral.focal_taxa -> organism_classification
company_founder -> person
drinking_establishment -> business_location
drinking_establishment -> employer
yalebase.person -> person
activism.activist -> person
disaster2.death_causing_event -> event
mountain -> location
play -> written_work
apps.application -> domain
administrative_division -> location
9202a8c04000641f80000000086e6204 -> project_participant
popstra.product -> popstra.sww_base
governmental_jurisdiction -> location
moscratch.topic -> moscratch.shce021709
restaurant -> employer
cricket_bowler -> measured_person
cricket_bowler -> pro_athlete
producer -> person
sports_league_season -> event
horseracing.topic -> organism
horseracing.topic -> thoroughbredracing.topic
horseracing.topic -> thoroughbredracing.thoroughbred_racehorse
engineering.engineering_person -> 9202a8c04000641f80000000086e6204
americancomedy.comedian -> americancomedy.topic
nascar.nascar_driver -> person
horseracing.topic -> horseracing.racehorse
moscratch.topic -> 9202a8c04000641f800000000ae936d4
short_story -> written_work
apps.acre_app -> domain
poem -> written_work
inventor -> person
celebrity -> popstra.celebrity
rugby.views.rugby_player -> person
juiced.topic -> person
juiced.user_of_banned_substances -> person
usnris.topic -> listed_site
usnris.nris_listing -> location
adultentertainment.adult_entertainer -> adultentertainment.topic
tv_writer -> person
group_member -> artist
celebrity -> popstra.sww_base
celebrity -> popstra.topic
astronomer -> person
formula1.formula_1_driver -> person
prison.prisoner -> prison.topic
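
Since the filtered file is described as suitable for node-link diagram visualization, one way to draw it is to turn the "a -> b" lines into Graphviz DOT text. The following is only a minimal sketch and not part of the original gist: the helper name `edges_to_dot` and the graph name `cotypes` are hypothetical, and it assumes every edge line has the exact form `source -> target` with nothing after the target.

```ruby
# Convert "source -> target" lines into Graphviz DOT text.
# Assumption: one edge per line, separated by " -> "; anything else is skipped.
# edges_to_dot and the default graph name are hypothetical, not from the gist.
def edges_to_dot(lines, graph_name = 'cotypes')
  edges = lines.map do |line|
    source, target = line.strip.split(' -> ', 2)
    # Skip blank or malformed lines instead of emitting a broken edge.
    next if source.nil? || target.nil? || source.empty? || target.empty?
    # Quote node names because many contain dots (e.g. "crime.topic").
    %(  "#{source}" -> "#{target}";)
  end
  # compact drops the skipped lines; uniq drops repeated edges.
  (["digraph #{graph_name} {"] + edges.compact.uniq + ['}']).join("\n")
end

# A few edges copied from the listing above.
sample = [
  'football_coach -> person',
  'building -> structure',
  'house -> structure'
]
puts edges_to_dot(sample)
```

The resulting text can be saved to a file and rendered with any tool that reads DOT. Repeated edges, such as the two `football_player -> person` lines in the listing, collapse into one.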
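#!/usr/bin/env ruby
# Read the co-occurrence TSV, one row per line.
rows = IO.readlines('cotype2009_07_10sans_user_types.txt')
# Sort by column 5 (the co-occurrence probability), highest first.
sorted = rows.sort_by { |row| row.split("\t")[5].to_f }.reverse
filtered = sorted.select do |row|
  cols = row.split("\t")
  # Keep near-certain pairs whose columns 2 and 3 (presumably instance counts) each exceed 500 and together exceed 2000.
  (cols[5].to_f > 0.99) && (cols[2].to_i > 500) && (cols[3].to_i > 500) && (cols[2].to_i + cols[3].to_i > 2000)
end
puts filtered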