@almasaeed2010
Created January 30, 2019 19:07
Uses the [protein_id=*] tag in each FASTA header line to extract the protein name.
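<?php
// Rewrites FASTA headers: uses the [protein_id=...] value as the sequence name when
// present; otherwise keeps only the first space-delimited token of the header.
if ($argc <= 1) {
    fwrite(STDERR, "Please supply a file name\n");
    exit(1);
}

$input = $argv[1];
$file = fopen($input, 'r');
if ($file === false) {
    fwrite(STDERR, "Could not open file: $input\n");
    exit(1);
}

// Compare against false explicitly so a final line of "0" with no newline is not dropped.
while (($line = fgets($file)) !== false) {
    // Strip the line ending, including the "\r" of Windows-style files.
    $line = rtrim($line, "\r\n");
    if (begins_with($line, '>')) {
        $line = str_replace('lcl|', '', $line);
        if (preg_match('/\[protein_id=(.*?)\]/', $line, $matches) === 1) {
            $line = '>' . $matches[1];
        } else {
            $parts = explode(' ', $line);
            $line = $parts[0];
        }
    }
    echo $line . "\n";
}
fclose($file);

// Returns true if $string starts with $needle.
function begins_with($string, $needle)
{
    return substr($string, 0, strlen($needle)) === $needle;
}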
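
To illustrate the header rewrite performed by the script above, here is a minimal sketch that applies the same steps to two sample header lines. The function name `clean_header` and both sample headers are made up for illustration and are not part of the original gist.

```php
<?php
// Hypothetical helper (not in the original gist): applies the script's
// header logic to a single line that starts with '>'.
function clean_header($line)
{
    // Drop the "lcl|" prefix, as the script does.
    $line = str_replace('lcl|', '', $line);
    // Prefer the [protein_id=...] value when the tag is present.
    if (preg_match('/\[protein_id=(.*?)\]/', $line, $matches) === 1) {
        return '>' . $matches[1];
    }
    // Otherwise keep only the first space-delimited token.
    $parts = explode(' ', $line);
    return $parts[0];
}

// Made-up sample headers, for illustration only.
echo clean_header('>lcl|SEQ1_prot_XP_000001.1_1 [gene=abc] [protein_id=XP_000001.1]') . "\n";
// prints ">XP_000001.1"
echo clean_header('>lcl|SEQ2 hypothetical protein') . "\n";
// prints ">SEQ2"
```

Sequence lines (those not starting with `>`) are passed through unchanged by the script, so only the headers differ between input and output.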