@dbolser-ebi
dbolser-ebi / project_variation_to_v3.plx
Created January 22, 2014 09:37
Script for mapping features via the Ensembl API. Actually performing the database update via this script *is* possible, but it's painfully slow. Instead I dump the results and do the update in MySQL. To adapt this script to features not currently in the database, first build a (minimal) new feature from the input file and 'transform' that.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Bio::EnsEMBL::Registry;
my $verbose = 0;
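The adaptation mentioned in the description (build a minimal feature from the input file, then 'transform' it) might look roughly like the sketch below. It is not taken from the script itself. The connection details, the assembly version names `v2` and `v3`, the `chromosome` coordinate system, and the helper `project_record` are all assumptions for illustration. It also assumes the core database holds a mapping between the two assembly versions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Feature;

# Hypothetical connection details; replace with your own server.
my %db = ( -host => 'localhost', -port => 3306, -user => 'ensro' );

my $species = 'sorghum_bicolor';

# Assumed assembly version names; check the coord_system table for the real ones.
my ( $old_version, $new_version ) = ( 'v2', 'v3' );

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(%db);

my $slice_adaptor = $registry->get_adaptor( $species, 'core', 'Slice' );

# Hypothetical helper: project one input record onto the new assembly.
# Returns the transformed feature, or undef if it cannot be mapped.
sub project_record {
    my ( $seq_region, $start, $end, $strand ) = @_;

    # Whole-region slice on the OLD assembly version.
    my $slice = $slice_adaptor->fetch_by_region(
        'chromosome', $seq_region, undef, undef, undef, $old_version );
    return unless $slice;

    # Minimal feature: only a slice and coordinates, nothing stored in the database.
    my $feature = Bio::EnsEMBL::Feature->new(
        -slice  => $slice,
        -start  => $start,
        -end    => $end,
        -strand => $strand,
    );

    # transform() returns undef when the feature does not map to the target.
    return $feature->transform( 'chromosome', $new_version );
}

# Example record; in practice these values would come from the input file.
my $mapped = project_record( '1', 1000, 1000, 1 );
if ($mapped) {
    print join( "\t",
        $mapped->seq_region_name, $mapped->seq_region_start,
        $mapped->seq_region_end,  $mapped->strand ), "\n";
}
else {
    warn "Feature could not be mapped\n";
}
```

Printing the mapped coordinates as tab-separated text follows the approach in the description: dump the results and apply the update in MySQL, instead of updating through the API.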
#! perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;
my $species = 'sorghum_bicolor';
warn "Loading registry\n";
##gff-version 3
##sequence-region 1812.scaffold02418 1 1541
1812.scaffold02418 AUGUSTUS gene 1 1541 0.29 - . ID=Bv_56410_edoq;Name=Bv_56410_edoq
1812.scaffold02418 AUGUSTUS mRNA 1 1541 0.29 - . ID=Bv_56410_edoq.t1;Parent=Bv_56410_edoq;Name=Bv_56410_edoq.t1 88.9%;Note=cDNAcoverage_88.9%
1812.scaffold02418 AUGUSTUS intron 1 218 0.39 - . ID=Bv_56410_edoq.t1.intron;Parent=Bv_56410_edoq.t1
1812.scaffold02418 AUGUSTUS CDS 219 293 0.67 - 0 ID=Bv_56410_edoq.t1.CDS;Parent=Bv_56410_edoq.t1
1812.scaffold02418 AUGUSTUS intron 294 610 0.83 - . ID=Bv_56410_edoq.t1.intron;Parent=Bv_56410_edoq.t1
1812.scaffold02418 AUGUSTUS CDS 611 655 0.83 - 0 ID=Bv_56410_edoq.t1.CDS;Parent=Bv_56410_edoq.t1
1812.scaffold02418 AUGUSTUS start_codon 653 655 . - 0 ID=Bv_564
#!perl
use strict;
use warnings;
## Easy manipulation of sets of integers (arbitrary intervals)
use Set::IntRange;
## Files
22:24 < dbolser> hello, I'm calling 'the same' script in two different
contexts, on one it works, on one it doesn't:
22:24 < dbolser> http://www.ebi.ac.uk/~dbolser/Genoverse/?r=3:9874770-9902882
22:24 < dbolser> http://dev.transplantdb.eu/node/4328
22:27 < yansanmo> dbolser, you don't have jquery-ui on the second ?
22:27 < yansanmo> and jquery-ui contains .sortable() function
22:27 < dbolser> yansanmo: that's weird, I should have! Thanks for this
information!
DROP TABLE IF EXISTS temp_individual_genotype_a;
Query OK, 0 rows affected (0.02 sec)
CREATE TABLE temp_individual_genotype_a
-> (INDEX pk_ish (variation_id, population_id, allele_code_id))
-> SELECT
-> variation_id, population_id, allele_code_id_1 AS allele_code_id,
-> `COUNT(*)` AS N
-> FROM temp_individual_genotype_a1 LIMIT 6000000;
Query OK, 6000000 rows affected, 1 warning (6.08 sec)
-- Try adding an index...
#ALTER TABLE tmp_individual_genotype_single_bp
# ADD INDEX allele_1_idx (allele_1),
# ADD INDEX allele_2_idx (allele_2);
# Query OK, 277545309 rows affected (43 min 29.78 sec)
-- The above index seems to have no impact on query execution time
-- below, as the individual_id_idx is picked instead of either of
dbolser@dbolser-laptop ~ $ time curl http://$mydomain:9213
{
"status" : 200,
"name" : "urgi_node3",
"cluster_name" : "urgi_tribe_demo",
"version" : {
"number" : "1.4.3",
"build_hash" : "36a29a7144cfde87a960ba039091d40856fcb9af",
"build_timestamp" : "2015-02-11T14:23:15Z",
"build_snapshot" : false,
1424361431: Done 116000
1424361431: Done 117000
1424361431: Done 118000
1424361431: Done 119000
1424361431: Done 120000
1424361431: Done 121000
1424361431: Done 122000
1424361431: Done 123000
1424361431: Done 124000
1424361431: Done 125000