@whacked
Created February 6, 2015 17:07
ncbi taxonomy in matlab (incomplete -- see DANGER)
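function traverse(node_id, iter)
% TRAVERSE  Walk up the NCBI taxonomy from node_id until a genus is found.
%   iter counts the recursion depth and defaults to 0.
if nargin < 2
    iter = 0;
end
% Input files, read from the current directory:
% > head names.dmp
% 1 | all | | synonym |
% 1 | root | | scientific name |
% 2 | Bacteria | Bacteria <prokaryote> | scientific name |
% > head nodes.dmp
% 1 | 1 | no rank | | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
% 2 | 131567 | superkingdom | | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | |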
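% Cache the lookup tables across calls so the files are parsed only once.
persistent parentmap;   % tax_id -> parent tax_id
persistent genusmap;    % tax_id -> rank (first word only, e.g. 'genus')
persistent namemap;     % tax_id -> name (see DANGER below)
if isempty(genusmap)
    disp('loading files...');
    t0 = tic;
    F = fopen('nodes.dmp');
    if F < 0
        error('could not open nodes.dmp');
    end
    % %s stops at the first whitespace, so a multi-word rank such as
    % 'no rank' is truncated to its first word; 'genus' is unaffected.
    rawnodes = textscan(F, '%d | %d | %s %*[^\n]');
    fclose(F);
    parentmap = containers.Map(rawnodes{1}, rawnodes{2});
    genusmap = containers.Map(rawnodes{1}, rawnodes{3});
    % FIXME
    % DANGER: id -> name is not 1:1 !!!
    % names.dmp has several rows per id (synonym, scientific name, ...),
    % and %s keeps only the first word of each name.
    F = fopen('names.dmp');
    if F < 0
        error('could not open names.dmp');
    end
    rawnames = textscan(F, '%d | %s %*[^\n]');
    fclose(F);
    namemap = containers.Map(rawnames{1}, rawnames{2});
    fprintf('files loaded in %.2f s\n', toc(t0));
end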
% Guard against runaway recursion.
if iter > 10
    disp('big loop! break');
    return;
end
if ~genusmap.isKey(node_id)
    disp('no such id');
    return;
end
category = genusmap(node_id);
if strcmp(category, 'genus')
    fprintf('found genus at %d\n', node_id);
elseif ~parentmap.isKey(node_id)
    disp('warning: no parent');
elseif node_id == parentmap(node_id)
    % The root node is its own parent.
    disp('end of tree!')
else
    traverse(parentmap(node_id), iter+1);
end
end
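
% ------------------------------------------------------------------------
% Sketch only (not part of the original gist): one possible way to address
% the "id -> name is not 1:1" DANGER noted in traverse. The helper name
% load_scientific_names and its path argument are hypothetical. It assumes
% every line of names.dmp has the layout shown in the header comment
% (tax_id | name_txt | unique name | name class |) and that no name
% contains a literal '|'. It keeps only rows whose name class is
% 'scientific name', which should leave at most one name per id, and it
% keeps the full multi-word name. Reading line by line is slow on the full
% dump, so treat this as a starting point rather than a tuned loader.
function namemap = load_scientific_names(path)
fid = fopen(path, 'r');
if fid < 0
    error('could not open %s', path);
end
closer = onCleanup(@() fclose(fid));   % close the file even on error
namemap = containers.Map('KeyType', 'double', 'ValueType', 'char');
line = fgetl(fid);
while ischar(line)
    % Split on '|' and strip the tabs/spaces around each field.
    fields = strtrim(strsplit(line, '|'));
    if numel(fields) >= 4 && strcmp(fields{4}, 'scientific name')
        namemap(str2double(fields{1})) = fields{2};
    end
    line = fgetl(fid);
end
end
% Possible usage inside traverse, in place of the textscan-based namemap:
%   namemap = load_scientific_names('names.dmp');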