Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
bulk updating of feature locations
use strict;
use autodie qw/open close/;
use DBI;
use Getopt::Long;
use Bio::GFF3::LowLevel qw/gff3_parse_feature/;
my ( $dsn, $user, $pass );
'dsn=s' => \$dsn,
'u|user=s' => \$user,
'p|password=s' => \$pass
die "no input file given\n" if !$ARGV[0];
my $dbh = DBI->connect( $dsn, $user, $pass,
{ PrintError => 1, RaiseError => 1 } );
CREATE TEMP TABLE temp_feature_update (
id varchar(56) NOT NULL,
start integer NOT NULL
end integer NOT NULL
my $input = IO::File->new( $ARGV[0], 'r' );
$dbh->do("COPY temp_feature(id,start,end) FROM STDIN");
while ( my $line = $input->getline ) {
my $feature_hashref = gff3_parse_feature($line);
my $id = $feature_hashref->{attributes}->{ID}->[0];
UPDATE featureloc
SET featureloc.fmin = tmpf.start,
featureloc.fmax = tmpf.end
FROM featureloc,feature,temp_feature_update tmpf
feature.feature_id = featureloc.feature_id
feature.uniquename =
=head1 NAME - Update feature location of existing feature from gff3 file
perl (--dsn=<DBI dsn>) (-u|--user=<username>) (-p|--password=<password> ) (<gff3 file>)
=head1 OPTIONS
--dsn=DBI dsn Database dsn.
-u <username>, --user=<username> Database user name.
-p <password>, --password=<password> Database password.
The script reads a gff3 file and then updates the feature locations of existing features in chado
database. It is intended to run after doing an bulk gff3 upload or if only the features are expected
to be present in the database. It also expects the following behaviour ...
All the features in gff3 should have an ID attribute.
The value of ID should be stored in uniquename column of feature table.
=head2 Further improvement
It updates the feature locations of all matching features. It could be improved
by updating features with changes in feature locations.
Run the entire loading in single transaction.
For very large number of updates, the temp table could have an index on id column.
The foreign key constraints on feature and featureloc tables might be turned
off for faster update. Having an index on uniquename column might help because it
is not clear whether the default chado indices would be sufficient.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.