@avrilcoghlan
Created December 17, 2013 11:07
Use the Ensembl Compara Perl API to get the families predicted for the human gene ENSG00000139618.
#!/usr/bin/env perl
# Get the families predicted for the human gene ENSG00000139618. What do you notice?
# Note: Families include UniProt proteins and Ensembl genes/proteins.
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
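# Adaptors for gene members and families in the multi-species Compara database.
my $gma = $registry->get_adaptor('multi', 'compara', 'GeneMember');
my $fa  = $registry->get_adaptor('multi', 'compara', 'Family');
# Look up the gene member for the human gene, then every family it belongs to.
my $gene_member = $gma->fetch_by_source_stable_id('ENSEMBLGENE', 'ENSG00000139618');
die "Could not find gene member ENSG00000139618\n" unless defined $gene_member;
my @families = @{$fa->fetch_all_by_Member($gene_member)};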
foreach my $fam (@families) {
    # For every Bio::EnsEMBL::Compara::Family you find, use the description() and description_score()
    # methods to get an idea of the function of the genes in that family.
    my $desc  = $fam->description();       # a description of the family, based on a consensus of its members' descriptions
    my $score = $fam->description_score(); # a score for the description, saying how well it agrees between members
    print "family $desc (score $score)\n";
    # You can also use the get_all_Members() method to get the underlying member objects (both genes and peptides).
    # A gene might belong to several families through different transcripts (this is unusual though).
    my @members = @{$fam->get_all_Members()};
    foreach my $member (@members) {
        my $stable_id   = $member->stable_id();
        my $source_name = $member->source_name();   # e.g. ENSEMBLGENE, ENSEMBLPEP, Uniprot/SPTREMBL or Uniprot/SWISSPROT
        my $taxon_name  = $member->taxon()->name(); # e.g. Homo sapiens
        print "___ source=$source_name id=$stable_id taxon=$taxon_name\n"; # could be a gene member or a protein member
    }
}
# Here we find one large family:
# family BREAST CANCER TYPE 2 SUSCEPTIBILITY HOMOLOG FANCONI ANEMIA GROUP D1 HOMOLOG (score 75)
# We also find a tiny family, UNKNOWN (score 0), containing ENSG00000139618 and ENSP00000433168.
# This peptide of the gene was put into a family of its own, presumably because it doesn't match the other family well enough.
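Because families mix UniProt proteins with Ensembl genes and peptides, a per-source tally can make a large family easier to read than the full member listing. The following is only a minimal sketch: `count_by_source` is a hypothetical helper, not part of the Ensembl API, and the source names passed to it here are illustrative values chosen so the block runs without a database connection.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API):
# count how many times each source name occurs in a list.
sub count_by_source {
    my @source_names = @_;
    my %count;
    $count{$_}++ for @source_names;
    return \%count;
}

# Inside the family loop of the script above, it could be called as:
#   my $counts = count_by_source(map { $_->source_name() } @{$fam->get_all_Members()});
# Here it is run on illustrative source names instead of real family members.
my $counts = count_by_source('ENSEMBLGENE', 'ENSEMBLPEP', 'ENSEMBLPEP', 'Uniprot/SPTREMBL');
foreach my $source (sort keys %$counts) {
    print "$source\t$counts->{$source}\n";
}
```

Keeping the helper independent of the Compara objects means it can be checked on plain strings first and then fed the real `source_name()` values.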