@adyork
Last active October 20, 2022 15:26
match_names_gnrd.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "match_names_gnrd.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/adyork/5983a1ed9763ffb37fb7cd15df24e895/match_names_gnrd.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rit8245jUGye"
},
"source": [
"# Match taxonomic names using the global names resolver\n",
"\n",
"Takes a data file and matches a column of taxanomic names to data sources using the Global Names Resover. http://resolver.globalnames.org/api\n",
"\n",
"Outputs supplied name, and match results for each data source (e.g. EOL, COL, GBIF,ITIS, WoRMS)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "cJ2Q-B-rGpnv"
},
"source": [
"import pandas as pd\n",
"import json\n",
"import os,re\n",
"import requests\n",
"from requests.exceptions import HTTPError"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vD28JT9QHY3_",
"outputId": "61a9b94a-07f9-4c4d-be80-461aeebe5b4e"
},
"source": [
"# First, let's go through an example using two names and look at the \n",
"# data struccture returned.\n",
"\n",
"names=['Gadus morhua','Acomys cahirinus']\n",
"\n",
"# datasource ids\n",
"# http://resolver.globalnames.org/data_sources\n",
"# sources = {'EOL':12,'ITIS':3,'WoRMS':9,'OBIS':149,'NCBI':4,'GBIF':11}\n",
"\n",
"\n",
"response = requests.get(\n",
" 'http://resolver.globalnames.org/name_resolvers.json',\n",
" params={'names': '|'.join(names),\n",
" \"data_source_ids\": '9',\n",
" \"best_match_only\": \"true\",\n",
" \"with_context\": \"true\"\n",
" },\n",
" )\n",
"response.status_code"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"200"
]
},
"metadata": {},
"execution_count": 78
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "PAKCiDYKM_BP",
"outputId": "b2fcab67-5ac5-4640-d109-a0b30db2d006"
},
"source": [
"#Wrapping the request to catch errors\n",
"\n",
"try:\n",
" response = requests.get(\n",
" 'http://resolver.globalnames.org/name_resolvers.json',\n",
" params={'names': '|'.join(names),\n",
" \"data_source_ids\": '9',\n",
" \"best_match_only\": \"true\",\n",
" \"with_context\": \"true\"\n",
" },\n",
" )\n",
"\n",
" # If the response was successful, no Exception will be raised\n",
" response.raise_for_status()\n",
"except HTTPError as http_err:\n",
" (f'HTTP error occurred: {http_err}') \n",
"except Exception as err:\n",
" print(f'Other error occurred: {err}') \n",
"else:\n",
" print('200: Success!')\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"200: Success!\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Hr4zowhYH4xF",
"outputId": "91f5849b-f490-4a87-a0e7-426858566a4b"
},
"source": [
"results = response.json()\n",
"results"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'context': [{'context_clade': 'Gadus morhua', 'context_data_source_id': 9}],\n",
" 'data': [{'data_sources_number': 1,\n",
" 'in_curated_sources': False,\n",
" 'is_known_name': True,\n",
" 'results': [{'canonical_form': 'Gadus morhua',\n",
" 'classification_path': 'Biota|Animalia|Chordata|Vertebrata|Gnathostomata|Pisces|Actinopterygii|Gadiformes|Gadidae|Gadus|Gadus morhua',\n",
" 'classification_path_ids': 'urn:lsid:marinespecies.org:taxname:1|urn:lsid:marinespecies.org:taxname:2|urn:lsid:marinespecies.org:taxname:1821|urn:lsid:marinespecies.org:taxname:146419|urn:lsid:marinespecies.org:taxname:1828|urn:lsid:marinespecies.org:taxname:11676|urn:lsid:marinespecies.org:taxname:10194|urn:lsid:marinespecies.org:taxname:10313|urn:lsid:marinespecies.org:taxname:125469|urn:lsid:marinespecies.org:taxname:125732|urn:lsid:marinespecies.org:taxname:126436',\n",
" 'classification_path_ranks': '|Kingdom|Phylum|Subphylum|Infraphylum|Superclass|Class|Order|Family|Genus|Species',\n",
" 'data_source_id': 9,\n",
" 'data_source_title': 'World Register of Marine Species',\n",
" 'edit_distance': 0,\n",
" 'gni_uuid': '5128ca17-9b42-5b3b-a280-b6121ffff53b',\n",
" 'imported_at': '2021-06-23T14:03:10Z',\n",
" 'match_type': 2,\n",
" 'match_value': 'Exact match by canonical form',\n",
" 'name_string': 'Gadus morhua Linnaeus, 1758',\n",
" 'prescore': '3|0|0',\n",
" 'score': 0.988,\n",
" 'taxon_id': 'urn:lsid:marinespecies.org:taxname:126436'}],\n",
" 'supplied_name_string': 'Gadus morhua'},\n",
" {'is_known_name': False, 'supplied_name_string': 'Acomys cahirinus'}],\n",
" 'data_sources': [{'id': 9, 'title': 'World Register of Marine Species'}],\n",
" 'id': 'bolp00bdidle',\n",
" 'message': 'Success',\n",
" 'parameters': {'best_match_only': True,\n",
" 'data_sources': [9],\n",
" 'header_only': False,\n",
" 'preferred_data_sources': [],\n",
" 'resolve_once': False,\n",
" 'with_canonical_ranks': False,\n",
" 'with_context': True,\n",
" 'with_vernaculars': False},\n",
" 'status': 'success',\n",
" 'url': 'http://resolver.globalnames.org/name_resolvers/bolp00bdidle.json'}"
]
},
"metadata": {},
"execution_count": 80
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9qLkamPGOv5T",
"outputId": "6bb14bda-36c2-4099-ed49-191e2274f162"
},
"source": [
"#looking at the first data source results at index [0] since python starts at 0.\n",
"results['data'][0]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'data_sources_number': 1,\n",
" 'in_curated_sources': False,\n",
" 'is_known_name': True,\n",
" 'results': [{'canonical_form': 'Gadus morhua',\n",
" 'classification_path': 'Biota|Animalia|Chordata|Vertebrata|Gnathostomata|Pisces|Actinopterygii|Gadiformes|Gadidae|Gadus|Gadus morhua',\n",
" 'classification_path_ids': 'urn:lsid:marinespecies.org:taxname:1|urn:lsid:marinespecies.org:taxname:2|urn:lsid:marinespecies.org:taxname:1821|urn:lsid:marinespecies.org:taxname:146419|urn:lsid:marinespecies.org:taxname:1828|urn:lsid:marinespecies.org:taxname:11676|urn:lsid:marinespecies.org:taxname:10194|urn:lsid:marinespecies.org:taxname:10313|urn:lsid:marinespecies.org:taxname:125469|urn:lsid:marinespecies.org:taxname:125732|urn:lsid:marinespecies.org:taxname:126436',\n",
" 'classification_path_ranks': '|Kingdom|Phylum|Subphylum|Infraphylum|Superclass|Class|Order|Family|Genus|Species',\n",
" 'data_source_id': 9,\n",
" 'data_source_title': 'World Register of Marine Species',\n",
" 'edit_distance': 0,\n",
" 'gni_uuid': '5128ca17-9b42-5b3b-a280-b6121ffff53b',\n",
" 'imported_at': '2021-06-23T14:03:10Z',\n",
" 'match_type': 2,\n",
" 'match_value': 'Exact match by canonical form',\n",
" 'name_string': 'Gadus morhua Linnaeus, 1758',\n",
" 'prescore': '3|0|0',\n",
" 'score': 0.988,\n",
" 'taxon_id': 'urn:lsid:marinespecies.org:taxname:126436'}],\n",
" 'supplied_name_string': 'Gadus morhua'}"
]
},
"metadata": {},
"execution_count": 81
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UlvoBFL5PS-V"
},
"source": [
"## Match types\n",
"* take a closer look at any name that isn't an exact match. It could indicate formatting or spelling errors.\n",
"* from global names resolver documentation:\n",
" 1 - Exact match\n",
" 2 - Exact match by canonical form of a name\n",
" 3 - Fuzzy match by canonical form\n",
" 4 - Partial exact match by species part of canonical form\n",
" 5 - Partial fuzzy match by species part of canonical form\n",
" 6 - Exact match by genus part of a canonical form"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DhxRhmkYF9P1",
"outputId": "8e19f8e1-0dfe-4526-c78d-1af704bcb4e2"
},
"source": [
"# To get down to the level with the name match results you have to drill down into\n",
"# the json heirarchy.\n",
"# For example, to get the first name match result from the first data source:\n",
"results['data'][0]['results'][0]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'canonical_form': 'Gadus morhua',\n",
" 'classification_path': 'Biota|Animalia|Chordata|Vertebrata|Gnathostomata|Pisces|Actinopterygii|Gadiformes|Gadidae|Gadus|Gadus morhua',\n",
" 'classification_path_ids': 'urn:lsid:marinespecies.org:taxname:1|urn:lsid:marinespecies.org:taxname:2|urn:lsid:marinespecies.org:taxname:1821|urn:lsid:marinespecies.org:taxname:146419|urn:lsid:marinespecies.org:taxname:1828|urn:lsid:marinespecies.org:taxname:11676|urn:lsid:marinespecies.org:taxname:10194|urn:lsid:marinespecies.org:taxname:10313|urn:lsid:marinespecies.org:taxname:125469|urn:lsid:marinespecies.org:taxname:125732|urn:lsid:marinespecies.org:taxname:126436',\n",
" 'classification_path_ranks': '|Kingdom|Phylum|Subphylum|Infraphylum|Superclass|Class|Order|Family|Genus|Species',\n",
" 'data_source_id': 9,\n",
" 'data_source_title': 'World Register of Marine Species',\n",
" 'edit_distance': 0,\n",
" 'gni_uuid': '5128ca17-9b42-5b3b-a280-b6121ffff53b',\n",
" 'imported_at': '2021-06-23T14:03:10Z',\n",
" 'match_type': 2,\n",
" 'match_value': 'Exact match by canonical form',\n",
" 'name_string': 'Gadus morhua Linnaeus, 1758',\n",
" 'prescore': '3|0|0',\n",
" 'score': 0.988,\n",
" 'taxon_id': 'urn:lsid:marinespecies.org:taxname:126436'}"
]
},
"metadata": {},
"execution_count": 82
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "EObs6cuJP7SE",
"outputId": "091886fd-7a51-402a-f432-b5f91d897793",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
}
},
"source": [
"# And to get the taxon_id from that name match:\n",
"results['data'][0]['results'][0]['taxon_id']\n",
"\n",
"# * more about the taxon_id variability between data sources at the end of this notebook...."
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'urn:lsid:marinespecies.org:taxname:126436'"
]
},
"metadata": {},
"execution_count": 83
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "sMP73lO_hEqO"
},
"source": [
"#get_name_matches takes a list of species names and one data source and \n",
"# returns a data frame with match information for each name\n",
"def get_name_matches(names=['Gadus morhua','Acomys cahirinus'],datasource_id=3):\n",
"\n",
" #uses global name resolver, see http://resolver.globalnames.org/api\n",
" \n",
" # .e.g.\n",
" # http://resolver.globalnames.org/name_resolvers.xml?names=Plantago+major|Monohamus+galloprovincialis|Felis+concolor&data_source_ids=1|12\n",
"\n",
" try:\n",
" response = requests.get(\n",
" 'http://resolver.globalnames.org/name_resolvers.json',\n",
" params={'names': '|'.join(names),\n",
" \"data_source_ids\": datasource_id,\n",
" \"best_match_only\": \"true\",\n",
" \"with_context\": \"true\"\n",
" },\n",
" )\n",
"\n",
" # If the response was successful, no Exception will be raised\n",
" response.raise_for_status()\n",
" except HTTPError as http_err:\n",
" (f'HTTP error occurred: {http_err}') \n",
" except Exception as err:\n",
" print(f'Other error occurred: {err}') \n",
" else:\n",
" results = response.json()\n",
"\n",
"\n",
" #transform the results data from json to a dataframe. \n",
" # Also fill in dummy values when name not matched\n",
"\n",
" ''' \n",
" If not a known name won't have a \"results\" under data so this doesn't work\n",
" because ignore only allows keyerrors for meta not \"record_path\"\n",
" \n",
" '''\n",
"\n",
" #Before I json_normalize below, I need to add an empty results key in the dict so the normalize\n",
" # won't get a key error. It can handle the metas being missing but not the \"record paths.\"\n",
" for i, v in enumerate(results['data']):\n",
" if not 'results' in v:\n",
" print('\\t\\t No match for: ' +str(v))\n",
" #add results list with first item a dict\n",
" results['data'][i]['results'] = [{'match_value': 'No match found'}]\n",
"\n",
" df = pd.json_normalize(results['data'], errors='ignore',\n",
" meta=['is_known_name', 'supplied_name_string'],\n",
" record_path='results' # I want cols from all keys under this\n",
" )\n",
" return df"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
},
"id": "q8AjfGGeS0D4",
"outputId": "bfe0b946-b96d-406d-df21-6d231dc32f59"
},
"source": [
"get_name_matches(names=['Gadus morha','Acomys cahirinus'],datasource_id=3)\n",
"# the above name is spelled wrong, it should be Gadus morhua"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>data_source_id</th>\n",
" <th>data_source_title</th>\n",
" <th>gni_uuid</th>\n",
" <th>name_string</th>\n",
" <th>canonical_form</th>\n",
" <th>classification_path</th>\n",
" <th>classification_path_ranks</th>\n",
" <th>classification_path_ids</th>\n",
" <th>taxon_id</th>\n",
" <th>edit_distance</th>\n",
" <th>imported_at</th>\n",
" <th>match_type</th>\n",
" <th>match_value</th>\n",
" <th>prescore</th>\n",
" <th>score</th>\n",
" <th>current_taxon_id</th>\n",
" <th>current_name_string</th>\n",
" <th>is_known_name</th>\n",
" <th>supplied_name_string</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>5128ca17-9b42-5b3b-a280-b6121ffff53b</td>\n",
" <td>Gadus morhua Linnaeus, 1758</td>\n",
" <td>Gadus morhua</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|1610...</td>\n",
" <td>164712</td>\n",
" <td>1</td>\n",
" <td>2021-06-23T03:49:29Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Gadus morha</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>261f2ff8-be59-5a09-b7fe-77fd219a14c6</td>\n",
" <td>Acomys cahirinus (Desmarest, 1819)</td>\n",
" <td>Acomys cahirinus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>585099</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:53:12Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>970754</td>\n",
" <td>Acomys cahirinus (É. Geoffroy, 1803)</td>\n",
" <td>True</td>\n",
" <td>Acomys cahirinus</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" data_source_id ... supplied_name_string\n",
"0 3 ... Gadus morha\n",
"1 3 ... Acomys cahirinus\n",
"\n",
"[2 rows x 19 columns]"
]
},
"metadata": {},
"execution_count": 85
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 219
},
"id": "NURs3WXBS5jQ",
"outputId": "b08828b8-27c5-4f09-8b74-4c376d5c3674"
},
"source": [
"#note that it still returns the row for unmatched names\n",
"get_name_matches(names=['Gadus morha','Acomys cahirinus'],datasource_id=9)\n",
"# the above name is spelled wrong, it should be Gadus morhua\n",
"# And I changed the data source from ITIS to WoRMS"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\t\t No match for: {'supplied_name_string': 'Acomys cahirinus', 'is_known_name': False}\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>data_source_id</th>\n",
" <th>data_source_title</th>\n",
" <th>gni_uuid</th>\n",
" <th>name_string</th>\n",
" <th>canonical_form</th>\n",
" <th>classification_path</th>\n",
" <th>classification_path_ranks</th>\n",
" <th>classification_path_ids</th>\n",
" <th>taxon_id</th>\n",
" <th>edit_distance</th>\n",
" <th>imported_at</th>\n",
" <th>match_type</th>\n",
" <th>match_value</th>\n",
" <th>prescore</th>\n",
" <th>score</th>\n",
" <th>is_known_name</th>\n",
" <th>supplied_name_string</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>5128ca17-9b42-5b3b-a280-b6121ffff53b</td>\n",
" <td>Gadus morhua Linnaeus, 1758</td>\n",
" <td>Gadus morhua</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:126436</td>\n",
" <td>1.0</td>\n",
" <td>2021-06-23T14:03:10Z</td>\n",
" <td>3.0</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.75</td>\n",
" <td>False</td>\n",
" <td>Gadus morha</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Acomys cahirinus</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" data_source_id ... supplied_name_string\n",
"0 9.0 ... Gadus morha\n",
"1 NaN ... Acomys cahirinus\n",
"\n",
"[2 rows x 17 columns]"
]
},
"metadata": {},
"execution_count": 86
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "yR25EVeEHMgV"
},
"source": [
"'''\n",
"Takes a data frame (df_orig) and matches the column with taxonomic name (taxa_col) \n",
" then calls get_match_names() to add match information to the dataset data frame.\n",
"''' \n",
"\n",
"def match_df(df_orig=None,taxa_col = 'Species',sources = {'GBIF': 11, 'WoRMS': 9}):\n",
"\n",
" df_results = df_orig.copy()\n",
" df_results_all = df_orig.copy()\n",
"\n",
"\n",
" for source,source_id in sources.items():\n",
" print('Matching names with GNRD source '+str(source_id)+': ' + source)\n",
" df_matches = get_name_matches(names=df_orig[taxa_col],datasource_id=source_id)\n",
" df_matches = df_matches.add_prefix(source+'_') #add prefix EOL_ etc to each col\n",
"\n",
" #add columns for this data source to results df\n",
" df_results_all = pd.concat([df_results_all, df_matches], axis=1, sort=False)\n",
"\n",
" df_matches[source+'_matched_name'] = df_matches[source+'_canonical_form']\n",
"\n",
" #get just the critical columns\n",
" df_results = pd.concat(\n",
" [\n",
" df_results,\n",
" df_matches[[\n",
" source + '_match_value',\n",
" source+'_matched_name',\n",
" source+'_taxon_id',\n",
" #source + '_is_known_name'\n",
" ]]\n",
" ],\n",
" axis=1,\n",
" sort=False\n",
" )\n",
"\n",
" df_results.to_csv('match_results.csv',index=False)\n",
" df_results_all.to_csv('match_results_all.csv', index=False)\n",
"\n",
" print('done')\n",
" return df_results_all"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ouPNrpp-QpUY",
"outputId": "c7371e02-c674-4be3-a92a-4aa4e5a9e0be"
},
"source": [
"'''\n",
"Run it all! Specify sources, get a data table with species names, and return \n",
" added columns with match information to the data table from each source using \n",
" the above function we just defined.\n",
"\n",
"* Keep in mind which sources are domain-specific! Note that terrestrial species\n",
" like Chamaea fasciata (a bird) are not known names in \n",
" the World Register of Marine Species (WoRMS). \n",
"'''\n",
"\n",
"sources = {'EOL':12,'WoRMS':9,'NCBI':4,'GBIF':11,'ITIS':3}\n",
"\n",
"df_orig = pd.read_csv('https://datadocs.bco-dmo.org/data/305/CC_Fishery_Adaptations/752795/1/data/animal_mobility.csv')\n",
"\n",
"#limiting for demo purposes\n",
"df_orig = df_orig.head(100)\n",
"\n",
"print(df_orig)\n",
"\n",
"df = match_df(df_orig,'Species',sources)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" System Group Species BM HR Refs\n",
"0 M B Sterna forsteri 149 58.000 (150)\n",
"1 M B Ptychoramphus aleuticus 164 3008.000 (151|152)\n",
"2 M B Brachyramphus marmoratus 220 127.000 (153)\n",
"3 M B Calonectris diomedea 535 485776.000 (154)\n",
"4 M B Alca torda 600 2201.000 (155)\n",
".. ... ... ... ... ... ...\n",
"95 T B Habia rubica 33 0.049 (221)\n",
"96 T B Habia fuscicauda 38 0.061 (221)\n",
"97 T B Dasyornis brachypterus 42 0.100 (222);(223)\n",
"98 T B Lanius ludovicianus 48 0.073 (221|224–227) \n",
"99 T B Laniuis ludovicianus 48 0.076 (221)\n",
"\n",
"[100 rows x 6 columns]\n",
"Matching names with GNRD source 12: EOL\n",
"Matching names with GNRD source 9: WoRMS\n",
"\t\t No match for: {'supplied_name_string': 'Chamaea fasciata', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Laniuis collurio', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Habia rubica', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Habia fuscicauda', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dasyornis brachypterus', 'is_known_name': False}\n",
"Matching names with GNRD source 4: NCBI\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica magnolia', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica virens', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica pensylvanica', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica fusca', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica kirtlandi', 'is_known_name': False}\n",
"Matching names with GNRD source 11: GBIF\n",
"Matching names with GNRD source 3: ITIS\n",
"done\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 839
},
"id": "JwXSKS9-iWyQ",
"outputId": "d9186634-da4f-4976-c694-5f99dc42f443"
},
"source": [
"df"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>System</th>\n",
" <th>Group</th>\n",
" <th>Species</th>\n",
" <th>BM</th>\n",
" <th>HR</th>\n",
" <th>Refs</th>\n",
" <th>EOL_data_source_id</th>\n",
" <th>EOL_data_source_title</th>\n",
" <th>EOL_gni_uuid</th>\n",
" <th>EOL_name_string</th>\n",
" <th>EOL_canonical_form</th>\n",
" <th>EOL_classification_path</th>\n",
" <th>EOL_classification_path_ranks</th>\n",
" <th>EOL_classification_path_ids</th>\n",
" <th>EOL_taxon_id</th>\n",
" <th>EOL_local_id</th>\n",
" <th>EOL_edit_distance</th>\n",
" <th>EOL_imported_at</th>\n",
" <th>EOL_match_type</th>\n",
" <th>EOL_match_value</th>\n",
" <th>EOL_prescore</th>\n",
" <th>EOL_score</th>\n",
" <th>EOL_is_known_name</th>\n",
" <th>EOL_supplied_name_string</th>\n",
" <th>WoRMS_data_source_id</th>\n",
" <th>WoRMS_data_source_title</th>\n",
" <th>WoRMS_gni_uuid</th>\n",
" <th>WoRMS_name_string</th>\n",
" <th>WoRMS_canonical_form</th>\n",
" <th>WoRMS_classification_path</th>\n",
" <th>WoRMS_classification_path_ranks</th>\n",
" <th>WoRMS_classification_path_ids</th>\n",
" <th>WoRMS_taxon_id</th>\n",
" <th>WoRMS_edit_distance</th>\n",
" <th>WoRMS_imported_at</th>\n",
" <th>WoRMS_match_type</th>\n",
" <th>WoRMS_match_value</th>\n",
" <th>WoRMS_prescore</th>\n",
" <th>WoRMS_score</th>\n",
" <th>WoRMS_current_taxon_id</th>\n",
" <th>...</th>\n",
" <th>NCBI_is_known_name</th>\n",
" <th>NCBI_supplied_name_string</th>\n",
" <th>GBIF_data_source_id</th>\n",
" <th>GBIF_data_source_title</th>\n",
" <th>GBIF_gni_uuid</th>\n",
" <th>GBIF_name_string</th>\n",
" <th>GBIF_canonical_form</th>\n",
" <th>GBIF_classification_path</th>\n",
" <th>GBIF_classification_path_ranks</th>\n",
" <th>GBIF_classification_path_ids</th>\n",
" <th>GBIF_taxon_id</th>\n",
" <th>GBIF_edit_distance</th>\n",
" <th>GBIF_imported_at</th>\n",
" <th>GBIF_match_type</th>\n",
" <th>GBIF_match_value</th>\n",
" <th>GBIF_prescore</th>\n",
" <th>GBIF_score</th>\n",
" <th>GBIF_current_taxon_id</th>\n",
" <th>GBIF_current_name_string</th>\n",
" <th>GBIF_is_known_name</th>\n",
" <th>GBIF_supplied_name_string</th>\n",
" <th>ITIS_data_source_id</th>\n",
" <th>ITIS_data_source_title</th>\n",
" <th>ITIS_gni_uuid</th>\n",
" <th>ITIS_name_string</th>\n",
" <th>ITIS_canonical_form</th>\n",
" <th>ITIS_classification_path</th>\n",
" <th>ITIS_classification_path_ranks</th>\n",
" <th>ITIS_classification_path_ids</th>\n",
" <th>ITIS_taxon_id</th>\n",
" <th>ITIS_edit_distance</th>\n",
" <th>ITIS_imported_at</th>\n",
" <th>ITIS_match_type</th>\n",
" <th>ITIS_match_value</th>\n",
" <th>ITIS_prescore</th>\n",
" <th>ITIS_score</th>\n",
" <th>ITIS_current_taxon_id</th>\n",
" <th>ITIS_current_name_string</th>\n",
" <th>ITIS_is_known_name</th>\n",
" <th>ITIS_supplied_name_string</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>149</td>\n",
" <td>58.000</td>\n",
" <td>(150)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>75cd7ab7-b950-5609-85e6-352d56155f2b</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Sterna forsteri</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509329</td>\n",
" <td>45509329</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:159057</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:05:19Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Laridae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|9316|2481227|5229247</td>\n",
" <td>5229247</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:19:59Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176887</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>164</td>\n",
" <td>3008.000</td>\n",
" <td>(151|152)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>7884810e-fb20-5c52-a78f-c84ccaa8ad46</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509361</td>\n",
" <td>45509361</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:344115</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:11:37Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481301|2481302</td>\n",
" <td>2481302</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T23:59:22Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>177013</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>220</td>\n",
" <td>127.000</td>\n",
" <td>(153)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1a704327-c41c-5fa5-9a5a-4a745ca8c5a5</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509352</td>\n",
" <td>45509352</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:254308</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:08:53Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481326|5229281</td>\n",
" <td>5229281</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T01:34:45Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176996</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>535</td>\n",
" <td>485776.000</td>\n",
" <td>(154)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>d1c5f3ee-b763-5bf4-be74-de78125d2a43</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>51900544</td>\n",
" <td>51900544</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:22:56Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:137194</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:04:04Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Animalia|Chordata|Aves|Procellariiformes|Proce...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192755|9339|2481517|2481521</td>\n",
" <td>2481521</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T23:59:23Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>203446</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:49:58Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Alca torda</td>\n",
" <td>600</td>\n",
" <td>2201.000</td>\n",
" <td>(155)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>5d7c4673-7f5b-5b74-ac0c-cfeb677ce2b1</td>\n",
" <td>Alca torda</td>\n",
" <td>Alca torda</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509345</td>\n",
" <td>45509345</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:137128</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:04:03Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481308|8277073</td>\n",
" <td>8277073</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T03:34:18Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176971</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Habia rubica</td>\n",
" <td>33</td>\n",
" <td>0.049</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>734a7ff9-6ac2-54b4-a211-45481c1a084d</td>\n",
" <td>Habia rubica</td>\n",
" <td>Habia rubica</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>921778</td>\n",
" <td>921778</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:44Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4b5708dc-817b-53b9-8721-e2e949ccbf9d</td>\n",
" <td>Habia rubica (Vieillot, 1817)</td>\n",
" <td>Habia rubica</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Cardinali...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9285|5230696|5230705</td>\n",
" <td>5230705</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:25Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4b5708dc-817b-53b9-8721-e2e949ccbf9d</td>\n",
" <td>Habia rubica (Vieillot, 1817)</td>\n",
" <td>Habia rubica</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>560348</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:19Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>38</td>\n",
" <td>0.061</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>d01ecfc5-69be-5797-a414-8831f8a0cc69</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>921776</td>\n",
" <td>921776</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:44Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>f1dd20fe-6493-5129-a552-a423c50a1c66</td>\n",
" <td>Habia fuscicauda (Cabanis, 1861)</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Cardinali...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9285|5230696|5230697</td>\n",
" <td>5230697</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:25Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>f1dd20fe-6493-5129-a552-a423c50a1c66</td>\n",
" <td>Habia fuscicauda (Cabanis, 1861)</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>560346</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:19Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>42</td>\n",
" <td>0.100</td>\n",
" <td>(222);(223)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>2b2e6e76-5c5d-5d86-ad3a-5c2b74613c30</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45518088</td>\n",
" <td>45518088</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:12:38Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>46dc4c95-fff7-5463-89bd-8b19182f074f</td>\n",
" <td>Dasyornis brachypterus (Latham, 1802)</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Dasyornit...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|5788774|5230412|5230413</td>\n",
" <td>5230413</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:23Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>46dc4c95-fff7-5463-89bd-8b19182f074f</td>\n",
" <td>Dasyornis brachypterus (Latham, 1802)</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>559685</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:18Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>48</td>\n",
" <td>0.073</td>\n",
" <td>(221|224–227)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1332f6b2-0911-5802-b6d5-7a4aa2666dc1</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45510944</td>\n",
" <td>45510944</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:33Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1484469</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:39:12Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Laniidae|...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9315|2483315|2492870</td>\n",
" <td>2492870</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T00:02:08Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>178515</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:09Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>48</td>\n",
" <td>0.076</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1332f6b2-0911-5802-b6d5-7a4aa2666dc1</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45510944</td>\n",
" <td>45510944</td>\n",
" <td>1</td>\n",
" <td>2021-06-24T16:16:33Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1484469</td>\n",
" <td>1.0</td>\n",
" <td>2021-06-23T14:39:12Z</td>\n",
" <td>3.0</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Laniidae|...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9315|2483315|2492870</td>\n",
" <td>2492870</td>\n",
" <td>1</td>\n",
" <td>2021-06-24T00:02:08Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>178515</td>\n",
" <td>1</td>\n",
" <td>2021-06-23T03:50:09Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 98 columns</p>\n",
"</div>"
],
"text/plain": [
" System Group ... ITIS_is_known_name ITIS_supplied_name_string\n",
"0 M B ... True Sterna forsteri\n",
"1 M B ... True Ptychoramphus aleuticus\n",
"2 M B ... True Brachyramphus marmoratus\n",
"3 M B ... True Calonectris diomedea\n",
"4 M B ... True Alca torda\n",
".. ... ... ... ... ...\n",
"95 T B ... True Habia rubica\n",
"96 T B ... True Habia fuscicauda\n",
"97 T B ... True Dasyornis brachypterus\n",
"98 T B ... True Lanius ludovicianus\n",
"99 T B ... False Laniuis ludovicianus\n",
"\n",
"[100 rows x 98 columns]"
]
},
"metadata": {},
"execution_count": 89
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5n_36WvobjRN",
"outputId": "280ab8da-06e2-4758-c8be-69508823094a"
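},
"source": [
"# Let's do another match, this time pulling the data from BCO-DMO's ERDDAP endpoint.\n",
"# The .json suffix asks ERDDAP's tabledap service to return the table as JSON.\n",
"\n",
"url = \"https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_752795.json\"\n",
"\n",
"# Use a timeout so the cell fails instead of hanging if the server is unreachable\n",
"response = requests.get(url, timeout=60)\n",
"print(response.status_code)\n",
"\n",
"# Raise requests.exceptions.HTTPError on a 4xx/5xx response\n",
"response.raise_for_status()"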
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"200\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 675
},
"id": "o6Bf13M6bsfW",
"outputId": "74f80dc3-2391-4a15-9cb2-bc519846e274"
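},
"source": [
"# Parse the JSON body; ERDDAP wraps the data in a top-level 'table' object\n",
"j = response.json()\n",
"\n",
"# Build a data frame from the first 20 rows, using ERDDAP's column names\n",
"df = pd.DataFrame(data=j['table']['rows'][0:20], columns=j['table']['columnNames'])\n",
"\n",
"# Display the data frame\n",
"df"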
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>System</th>\n",
" <th>Group</th>\n",
" <th>Species</th>\n",
" <th>BM</th>\n",
" <th>HR</th>\n",
" <th>Refs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>149</td>\n",
" <td>5.800000e+01</td>\n",
" <td>(150)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>164</td>\n",
" <td>3.008000e+03</td>\n",
" <td>(151|152)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>220</td>\n",
" <td>1.270000e+02</td>\n",
" <td>(153)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>535</td>\n",
" <td>4.857760e+05</td>\n",
" <td>(154)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Alca torda</td>\n",
" <td>600</td>\n",
" <td>2.201000e+03</td>\n",
" <td>(155)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Uria aalge</td>\n",
" <td>907</td>\n",
" <td>8.150000e+02</td>\n",
" <td>(155|156)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Sula sula</td>\n",
" <td>956</td>\n",
" <td>5.454000e+03</td>\n",
" <td>(157)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Melanitta nigra</td>\n",
" <td>1052</td>\n",
" <td>1.298000e+03</td>\n",
" <td>(158)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Procellaria aequinoctialis</td>\n",
" <td>1213</td>\n",
" <td>6.830000e+05</td>\n",
" <td>(159)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Procellaria conspicillata</td>\n",
" <td>1278</td>\n",
" <td>5.965460e+05</td>\n",
" <td>(160)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Papasula abbotti</td>\n",
" <td>1572</td>\n",
" <td>1.085030e+05</td>\n",
" <td>(161)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Somateria mollissima</td>\n",
" <td>2067</td>\n",
" <td>6.800000e+01</td>\n",
" <td>(162)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Eudyptes filholi</td>\n",
" <td>2330</td>\n",
" <td>2.605360e+06</td>\n",
" <td>(163)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Thalassarche chrysostoma</td>\n",
" <td>3508</td>\n",
" <td>6.400000e+05</td>\n",
" <td>(164)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Thalassarche melanophrys</td>\n",
" <td>3564</td>\n",
" <td>4.440000e+05</td>\n",
" <td>(164)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Pygoscelis antarctica</td>\n",
" <td>3800</td>\n",
" <td>3.270000e+02</td>\n",
" <td>(165)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Gaviia immer</td>\n",
" <td>5186</td>\n",
" <td>5.470000e+02</td>\n",
" <td>(166)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Pygoscelis papua</td>\n",
" <td>5190</td>\n",
" <td>1.260000e+02</td>\n",
" <td>(165)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Eudyptes chrysolophus</td>\n",
" <td>21700</td>\n",
" <td>1.760000e+06</td>\n",
" <td>(167)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>M</td>\n",
" <td>F</td>\n",
" <td>Ctenochaetus striatus</td>\n",
" <td>57</td>\n",
" <td>1.200000e-05</td>\n",
" <td>(168);(169)</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" System Group Species BM HR Refs\n",
"0 M B Sterna forsteri 149 5.800000e+01 (150)\n",
"1 M B Ptychoramphus aleuticus 164 3.008000e+03 (151|152)\n",
"2 M B Brachyramphus marmoratus 220 1.270000e+02 (153)\n",
"3 M B Calonectris diomedea 535 4.857760e+05 (154)\n",
"4 M B Alca torda 600 2.201000e+03 (155)\n",
"5 M B Uria aalge 907 8.150000e+02 (155|156)\n",
"6 M B Sula sula 956 5.454000e+03 (157)\n",
"7 M B Melanitta nigra 1052 1.298000e+03 (158)\n",
"8 M B Procellaria aequinoctialis 1213 6.830000e+05 (159)\n",
"9 M B Procellaria conspicillata 1278 5.965460e+05 (160)\n",
"10 M B Papasula abbotti 1572 1.085030e+05 (161)\n",
"11 M B Somateria mollissima 2067 6.800000e+01 (162)\n",
"12 M B Eudyptes filholi 2330 2.605360e+06 (163)\n",
"13 M B Thalassarche chrysostoma 3508 6.400000e+05 (164)\n",
"14 M B Thalassarche melanophrys 3564 4.440000e+05 (164)\n",
"15 M B Pygoscelis antarctica 3800 3.270000e+02 (165)\n",
"16 M B Gaviia immer 5186 5.470000e+02 (166)\n",
"17 M B Pygoscelis papua 5190 1.260000e+02 (165)\n",
"18 M B Eudyptes chrysolophus 21700 1.760000e+06 (167)\n",
"19 M F Ctenochaetus striatus 57 1.200000e-05 (168);(169)"
]
},
"metadata": {},
"execution_count": 91
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "MdPW_TqKd08v",
"outputId": "0695befd-454f-4420-ec79-6a0c8521c35c"
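},
"source": [
"# Add the match info to the data frame.\n",
"# Note: this call passes df_orig (defined earlier in the notebook), not the\n",
"# 20-row df built from ERDDAP above; pass df instead to match only those rows.\n",
"df_matches = match_df(df_orig,'Species',sources)\n",
"df_matches\n"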
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Matching names with GNRD source 12: EOL\n",
"Matching names with GNRD source 9: WoRMS\n",
"\t\t No match for: {'supplied_name_string': 'Chamaea fasciata', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Laniuis collurio', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Habia rubica', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Habia fuscicauda', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dasyornis brachypterus', 'is_known_name': False}\n",
"Matching names with GNRD source 4: NCBI\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica magnolia', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica virens', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica pensylvanica', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica fusca', 'is_known_name': False}\n",
"\t\t No match for: {'supplied_name_string': 'Dendroica kirtlandi', 'is_known_name': False}\n",
"Matching names with GNRD source 11: GBIF\n",
"Matching names with GNRD source 3: ITIS\n",
"done\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>System</th>\n",
" <th>Group</th>\n",
" <th>Species</th>\n",
" <th>BM</th>\n",
" <th>HR</th>\n",
" <th>Refs</th>\n",
" <th>EOL_data_source_id</th>\n",
" <th>EOL_data_source_title</th>\n",
" <th>EOL_gni_uuid</th>\n",
" <th>EOL_name_string</th>\n",
" <th>EOL_canonical_form</th>\n",
" <th>EOL_classification_path</th>\n",
" <th>EOL_classification_path_ranks</th>\n",
" <th>EOL_classification_path_ids</th>\n",
" <th>EOL_taxon_id</th>\n",
" <th>EOL_local_id</th>\n",
" <th>EOL_edit_distance</th>\n",
" <th>EOL_imported_at</th>\n",
" <th>EOL_match_type</th>\n",
" <th>EOL_match_value</th>\n",
" <th>EOL_prescore</th>\n",
" <th>EOL_score</th>\n",
" <th>EOL_is_known_name</th>\n",
" <th>EOL_supplied_name_string</th>\n",
" <th>WoRMS_data_source_id</th>\n",
" <th>WoRMS_data_source_title</th>\n",
" <th>WoRMS_gni_uuid</th>\n",
" <th>WoRMS_name_string</th>\n",
" <th>WoRMS_canonical_form</th>\n",
" <th>WoRMS_classification_path</th>\n",
" <th>WoRMS_classification_path_ranks</th>\n",
" <th>WoRMS_classification_path_ids</th>\n",
" <th>WoRMS_taxon_id</th>\n",
" <th>WoRMS_edit_distance</th>\n",
" <th>WoRMS_imported_at</th>\n",
" <th>WoRMS_match_type</th>\n",
" <th>WoRMS_match_value</th>\n",
" <th>WoRMS_prescore</th>\n",
" <th>WoRMS_score</th>\n",
" <th>WoRMS_current_taxon_id</th>\n",
" <th>...</th>\n",
" <th>NCBI_is_known_name</th>\n",
" <th>NCBI_supplied_name_string</th>\n",
" <th>GBIF_data_source_id</th>\n",
" <th>GBIF_data_source_title</th>\n",
" <th>GBIF_gni_uuid</th>\n",
" <th>GBIF_name_string</th>\n",
" <th>GBIF_canonical_form</th>\n",
" <th>GBIF_classification_path</th>\n",
" <th>GBIF_classification_path_ranks</th>\n",
" <th>GBIF_classification_path_ids</th>\n",
" <th>GBIF_taxon_id</th>\n",
" <th>GBIF_edit_distance</th>\n",
" <th>GBIF_imported_at</th>\n",
" <th>GBIF_match_type</th>\n",
" <th>GBIF_match_value</th>\n",
" <th>GBIF_prescore</th>\n",
" <th>GBIF_score</th>\n",
" <th>GBIF_current_taxon_id</th>\n",
" <th>GBIF_current_name_string</th>\n",
" <th>GBIF_is_known_name</th>\n",
" <th>GBIF_supplied_name_string</th>\n",
" <th>ITIS_data_source_id</th>\n",
" <th>ITIS_data_source_title</th>\n",
" <th>ITIS_gni_uuid</th>\n",
" <th>ITIS_name_string</th>\n",
" <th>ITIS_canonical_form</th>\n",
" <th>ITIS_classification_path</th>\n",
" <th>ITIS_classification_path_ranks</th>\n",
" <th>ITIS_classification_path_ids</th>\n",
" <th>ITIS_taxon_id</th>\n",
" <th>ITIS_edit_distance</th>\n",
" <th>ITIS_imported_at</th>\n",
" <th>ITIS_match_type</th>\n",
" <th>ITIS_match_value</th>\n",
" <th>ITIS_prescore</th>\n",
" <th>ITIS_score</th>\n",
" <th>ITIS_current_taxon_id</th>\n",
" <th>ITIS_current_name_string</th>\n",
" <th>ITIS_is_known_name</th>\n",
" <th>ITIS_supplied_name_string</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>149</td>\n",
" <td>58.000</td>\n",
" <td>(150)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>75cd7ab7-b950-5609-85e6-352d56155f2b</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Sterna forsteri</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509329</td>\n",
" <td>45509329</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:159057</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:05:19Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Laridae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|9316|2481227|5229247</td>\n",
" <td>5229247</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:19:59Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>540b124d-66e3-5a21-8c74-d70906cb1a5a</td>\n",
" <td>Sterna forsteri Nuttall, 1834</td>\n",
" <td>Sterna forsteri</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176887</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Sterna forsteri</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>164</td>\n",
" <td>3008.000</td>\n",
" <td>(151|152)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>7884810e-fb20-5c52-a78f-c84ccaa8ad46</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509361</td>\n",
" <td>45509361</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:344115</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:11:37Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481301|2481302</td>\n",
" <td>2481302</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T23:59:22Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>449fb8f0-b255-5506-b7a9-f94fce8a0ff2</td>\n",
" <td>Ptychoramphus aleuticus (Pallas, 1811)</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>177013</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Ptychoramphus aleuticus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>220</td>\n",
" <td>127.000</td>\n",
" <td>(153)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1a704327-c41c-5fa5-9a5a-4a745ca8c5a5</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509352</td>\n",
" <td>45509352</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:254308</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:08:53Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481326|5229281</td>\n",
" <td>5229281</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T01:34:45Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>5c6c9cc6-e3ea-5654-97bc-f98a0ff18455</td>\n",
" <td>Brachyramphus marmoratus (Gmelin, 1789)</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176996</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Brachyramphus marmoratus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>535</td>\n",
" <td>485776.000</td>\n",
" <td>(154)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>d1c5f3ee-b763-5bf4-be74-de78125d2a43</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>51900544</td>\n",
" <td>51900544</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:22:56Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:137194</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:04:04Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Animalia|Chordata|Aves|Procellariiformes|Proce...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192755|9339|2481517|2481521</td>\n",
" <td>2481521</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T23:59:23Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>2c4fe23c-30fb-5e01-9894-b4f72766bc8f</td>\n",
" <td>Calonectris diomedea (Scopoli, 1769)</td>\n",
" <td>Calonectris diomedea</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>203446</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:49:58Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Calonectris diomedea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>M</td>\n",
" <td>B</td>\n",
" <td>Alca torda</td>\n",
" <td>600</td>\n",
" <td>2201.000</td>\n",
" <td>(155)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>5d7c4673-7f5b-5b74-ac0c-cfeb677ce2b1</td>\n",
" <td>Alca torda</td>\n",
" <td>Alca torda</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45509345</td>\n",
" <td>45509345</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T17:14:03Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:137128</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:04:03Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Animalia|Chordata|Aves|Charadriiformes|Alcidae...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|7192402|2985|2481308|8277073</td>\n",
" <td>8277073</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T03:34:18Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4ff6d04a-7e9a-5925-9ce3-3b6f11fa296a</td>\n",
" <td>Alca torda Linnaeus, 1758</td>\n",
" <td>Alca torda</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>176971</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:03Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Alca torda</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Habia rubica</td>\n",
" <td>33</td>\n",
" <td>0.049</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>734a7ff9-6ac2-54b4-a211-45481c1a084d</td>\n",
" <td>Habia rubica</td>\n",
" <td>Habia rubica</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>921778</td>\n",
" <td>921778</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:44Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4b5708dc-817b-53b9-8721-e2e949ccbf9d</td>\n",
" <td>Habia rubica (Vieillot, 1817)</td>\n",
" <td>Habia rubica</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Cardinali...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9285|5230696|5230705</td>\n",
" <td>5230705</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:25Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4b5708dc-817b-53b9-8721-e2e949ccbf9d</td>\n",
" <td>Habia rubica (Vieillot, 1817)</td>\n",
" <td>Habia rubica</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>560348</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:19Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia rubica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>38</td>\n",
" <td>0.061</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>d01ecfc5-69be-5797-a414-8831f8a0cc69</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>921776</td>\n",
" <td>921776</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:44Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>f1dd20fe-6493-5129-a552-a423c50a1c66</td>\n",
" <td>Habia fuscicauda (Cabanis, 1861)</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Cardinali...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9285|5230696|5230697</td>\n",
" <td>5230697</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:25Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>f1dd20fe-6493-5129-a552-a423c50a1c66</td>\n",
" <td>Habia fuscicauda (Cabanis, 1861)</td>\n",
" <td>Habia fuscicauda</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>560346</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:19Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Habia fuscicauda</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>42</td>\n",
" <td>0.100</td>\n",
" <td>(222);(223)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>2b2e6e76-5c5d-5d86-ad3a-5c2b74613c30</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45518088</td>\n",
" <td>45518088</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:12:38Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>No match found</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>46dc4c95-fff7-5463-89bd-8b19182f074f</td>\n",
" <td>Dasyornis brachypterus (Latham, 1802)</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Dasyornit...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|5788774|5230412|5230413</td>\n",
" <td>5230413</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T02:20:23Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>46dc4c95-fff7-5463-89bd-8b19182f074f</td>\n",
" <td>Dasyornis brachypterus (Latham, 1802)</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>559685</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:52:18Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Dasyornis brachypterus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>98</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>48</td>\n",
" <td>0.073</td>\n",
" <td>(221|224–227)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1332f6b2-0911-5802-b6d5-7a4aa2666dc1</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45510944</td>\n",
" <td>45510944</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T16:16:33Z</td>\n",
" <td>1</td>\n",
" <td>Exact string match</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1484469</td>\n",
" <td>0.0</td>\n",
" <td>2021-06-23T14:39:12Z</td>\n",
" <td>2.0</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Laniidae|...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9315|2483315|2492870</td>\n",
" <td>2492870</td>\n",
" <td>0</td>\n",
" <td>2021-06-24T00:02:08Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>178515</td>\n",
" <td>0</td>\n",
" <td>2021-06-23T03:50:09Z</td>\n",
" <td>2</td>\n",
" <td>Exact match by canonical form</td>\n",
" <td>3|0|0</td>\n",
" <td>0.988</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>True</td>\n",
" <td>Lanius ludovicianus</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>T</td>\n",
" <td>B</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>48</td>\n",
" <td>0.076</td>\n",
" <td>(221)</td>\n",
" <td>12</td>\n",
" <td>Encyclopedia of Life</td>\n",
" <td>1332f6b2-0911-5802-b6d5-7a4aa2666dc1</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td></td>\n",
" <td></td>\n",
" <td></td>\n",
" <td>45510944</td>\n",
" <td>45510944</td>\n",
" <td>1</td>\n",
" <td>2021-06-24T16:16:33Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>9.0</td>\n",
" <td>World Register of Marine Species</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Biota|Animalia|Chordata|Vertebrata|Gnathostoma...</td>\n",
" <td>|Kingdom|Phylum|Subphylum|Infraphylum|Supercla...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1|urn:lsid:...</td>\n",
" <td>urn:lsid:marinespecies.org:taxname:1484469</td>\n",
" <td>1.0</td>\n",
" <td>2021-06-23T14:39:12Z</td>\n",
" <td>3.0</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>11</td>\n",
" <td>GBIF Backbone Taxonomy</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Chordata|Aves|Passeriformes|Laniidae|...</td>\n",
" <td>kingdom|phylum|class|order|family|genus|species</td>\n",
" <td>1|44|212|729|9315|2483315|2492870</td>\n",
" <td>2492870</td>\n",
" <td>1</td>\n",
" <td>2021-06-24T00:02:08Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" <td>3</td>\n",
" <td>Integrated Taxonomic Information SystemITIS</td>\n",
" <td>4aa2abb8-7617-5996-bed3-ca66116ba2d4</td>\n",
" <td>Lanius ludovicianus Linnaeus, 1766</td>\n",
" <td>Lanius ludovicianus</td>\n",
" <td>Animalia|Bilateria|Deuterostomia|Chordata|Vert...</td>\n",
" <td>Kingdom|Subkingdom|Infrakingdom|Phylum|Subphyl...</td>\n",
" <td>202423|914154|914156|158852|331030|914179|9141...</td>\n",
" <td>178515</td>\n",
" <td>1</td>\n",
" <td>2021-06-23T03:50:09Z</td>\n",
" <td>3</td>\n",
" <td>Fuzzy match by canonical form</td>\n",
" <td>1|0|0</td>\n",
" <td>0.750</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>False</td>\n",
" <td>Laniuis ludovicianus</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 98 columns</p>\n",
"</div>"
],
"text/plain": [
" System Group ... ITIS_is_known_name ITIS_supplied_name_string\n",
"0 M B ... True Sterna forsteri\n",
"1 M B ... True Ptychoramphus aleuticus\n",
"2 M B ... True Brachyramphus marmoratus\n",
"3 M B ... True Calonectris diomedea\n",
"4 M B ... True Alca torda\n",
".. ... ... ... ... ...\n",
"95 T B ... True Habia rubica\n",
"96 T B ... True Habia fuscicauda\n",
"97 T B ... True Dasyornis brachypterus\n",
"98 T B ... True Lanius ludovicianus\n",
"99 T B ... False Laniuis ludovicianus\n",
"\n",
"[100 rows x 98 columns]"
]
},
"metadata": {},
"execution_count": 92
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vwfGe3d5OPU4"
},
"source": [
"# Looking at \"taxon_id\" variation\n",
"\n",
"Keep in mind that the type of ID returned in \"taxon_id\" varies among the results from different sources. Many IDs are specific to their source and are not globally unique. \n",
"\n",
"For some sources the taxon_id is a globally unique identifier, a [Life Science Identifier (LSID)](http://www.lsid.info/), but for others it is an identifier specific to that source. \n",
"\n",
"For example, the taxon_id in the WoRMS results (\"WoRMS_taxon_id\") is the global LSID (e.g. `urn:lsid:marinespecies.org:taxname:159057`). \n",
"\n",
"In the ITIS results, \"ITIS_taxon_id\" is the TSN identifier (e.g. `176887`), which is not globally unique on its own. It would need to be prepended with \"urn:lsid:itis.gov:itis_tsn:\" to become an LSID.\n",
"\n",
"---\n",
"\n",
"> If you use the [Darwin Core](https://dwc.tdwg.org/) standard, the LSID is what you would include in [scientificNameID](https://dwc.tdwg.org/terms/#dwc:scientificNameID). \n",
"> * e.g. WoRMS LSID `urn:lsid:marinespecies.org:taxname:159057`\n",
"\n"
]
}
]
}