@epifanio
Created November 14, 2019 12:45
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Custom names Part 2: Morphospecies\n",
"Our own working names for taxa that can be equated to species (e.g. \"Corymorpha sand stolon\")"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from fuzzyutil import *"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#from fuzzyutil import tidy"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('names_v1.csv', encoding='latin1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List of candidate names\n",
"All names not yet marked as OK (`Status == False`) are potential candidates"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"unique_allspecies = df['Taxonomy'][df['Status']==False]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8245"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(unique_allspecies)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remove uncertain identifications (names containing \"cf\", \"/\" or \"?\") from the list of candidate names"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# TODO: do this better: some Norwegian letters and symbols have turned into question marks, so those names are also caught by the '?' test\n",
"# parentheses make the Status test apply to all three patterns (& binds tighter than |)\n",
"unsure = df['Taxonomy'][(df['Taxonomy'].str.contains('cf') | df['Taxonomy'].str.contains('/') | df['Taxonomy'].str.contains('\\?')) & (df['Status']==False)]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"unique_allspecies = [x for x in unique_allspecies if x not in unsure.values]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5736"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(unique_allspecies)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"unique_allspecies = pd.DataFrame(unique_allspecies, columns=['Taxonomy'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List of available choices\n",
"The morphospecies names to match against"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"Morphospecies = ['Corymorpha sand stolon',\n",
"'Polychaeta question mark',\n",
"'Actiniaria yellow stolon',\n",
"'Amphipoda shiny eyes',\n",
"'Polychaeta fishingnet',\n",
"'Porifera bat',\n",
"'Porifera brown papillae',\n",
"'Porifera coral',\n",
"'Porifera cupcake',\n",
"'Porifera dirty yellow',\n",
"'Porifera egg',\n",
"'Porifera lily',\n",
"'Porifera parabol',\n",
"'Porifera string',\n",
"'Porifera urn',\n",
"'Porifera window',\n",
"'Tunicata trunk']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compare to provided list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------------------------------\n",
"\n",
"**Hack to deal with list instead of DF**"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"#tt = list(unique_allspecies['Taxonomy'].values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------------------------------------------\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['Actiniaria yellow stolon', 'Actiniaria yellow stolon', 100],\n",
" ['Amphipoda Red shiny eyes', 'Amphipoda shiny eyes', 91],\n",
" ['Amphipoda red shiny eys', 'Amphipoda shiny eyes', 88],\n",
" ['Amphipoda shiny eyes', 'Amphipoda shiny eyes', 100],\n",
" ['Corymorpha sand stolon', 'Corymorpha sand stolon', 100],\n",
" ['Polychaeta fishing net', 'Polychaeta fishingnet', 98],\n",
" ['Polychaeta fishingnet', 'Polychaeta fishingnet', 100],\n",
" ['Polychaeta fishingnet long', 'Polychaeta fishingnet', 89],\n",
" ['Polychaeta question mark', 'Polychaeta question mark', 100],\n",
" ['Polychaeta question mark', 'Polychaeta question mark', 100],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['Porifera (dirty yellow)', 'Porifera dirty yellow', 100],\n",
" ['Porifera (whiteyellow dirty)', 'Porifera dirty yellow', 89],\n",
" ['Porifera (yellow-brown dirty)', 'Porifera dirty yellow', 88],\n",
" ['Porifera (yellow dirty)', 'Porifera dirty yellow', 100],\n",
" ['Porifera (yellow small dirty)', 'Porifera dirty yellow', 88],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['porifera bat', 'Porifera bat', 100],\n",
" ['Porifera bat', 'Porifera bat', 100],\n",
" ['Porifera brown papillae', 'Porifera brown papillae', 100],\n",
" ['porifera coral', 'Porifera coral', 100],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['POrifera coral', 'Porifera coral', 100],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['Porifera cupcake', 'Porifera cupcake', 100],\n",
" ['Porifera dirty yellow', 'Porifera dirty yellow', 100],\n",
" ['Porifera dirty yellow round', 'Porifera dirty yellow', 88],\n",
" ['Porifera dirty yellow small', 'Porifera dirty yellow', 88],\n",
" ['Porifera dirty yellow urne', 'Porifera dirty yellow', 89],\n",
" ['porifera egg', 'Porifera egg', 100],\n",
" ['Porifera egg', 'Porifera egg', 100],\n",
" ['Porifera lilla', 'Porifera lily', 89],\n",
" ['Porifera lily', 'Porifera lily', 100],\n",
" ['Porifera parabol', 'Porifera parabol', 100],\n",
" ['porifera string', 'Porifera string', 100],\n",
" ['Porifera string', 'Porifera string', 100],\n",
" ['Porifera urn', 'Porifera urn', 100],\n",
" ['porifera urne', 'Porifera urn', 96],\n",
" ['Porifera urne', 'Porifera urn', 96],\n",
" ['Porifera whiteyellow dirty', 'Porifera dirty yellow', 89],\n",
" ['Porifera window', 'Porifera window', 100],\n",
" ['Porifera yellow dirty', 'Porifera dirty yellow', 100],\n",
" ['Porifera yellow dirty small', 'Porifera dirty yellow', 88],\n",
" ['Porifera, bate', 'Porifera bat', 96],\n",
" ['Porifera, Bate', 'Porifera bat', 96],\n",
" ['Porifera brown papillae', 'Porifera brown papillae', 100],\n",
" ['Porifera coral', 'Porifera coral', 100],\n",
" ['Porifera dirty yellow', 'Porifera dirty yellow', 100],\n",
" ['Porifera parabol', 'Porifera parabol', 100],\n",
" ['Porigera egg', 'Porifera egg', 92],\n",
" ['poychaet fishing net', 'Polychaeta fishingnet', 93],\n",
" ['Tunicata truk', 'Tunicata trunk', 96],\n",
" ['Tunicata trunck', 'Tunicata trunk', 97],\n",
" ['Tunicata trunk', 'Tunicata trunk', 100]]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corrector = matchinglist(tidy(unique_allspecies['Taxonomy']), Morphospecies, scorelimit=87, method='token_sort', perfectmatch=True)\n",
"corrector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Write the matched morphospecies name to `To_name` and set `Status` to `True` (OK)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# .loc selects rows and column in one step and writes into df itself (avoids SettingWithCopyWarning)\n",
"tidy_names = tidy(df['Taxonomy'])\n",
"for name, replacement, score in corrector:\n",
"    mask = tidy_names == name\n",
"    df.loc[mask, 'To_name'] = replacement\n",
"    df.loc[mask, 'Status'] = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Repeat until no more matches are found!"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"df.to_csv('names_v2.csv', index=False, encoding='latin1')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
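The notebook reads `names_v1.csv` with `encoding='latin1'`, and its TODO comment says some Norwegian letters turn into question marks. The sketch below assumes the raw bytes in the file are still intact. It tries UTF-8 first and falls back to latin-1. `decode_names` and the sample name are illustrative and do not come from the notebook. If the file already contains literal `?` characters, no choice of encoding brings the letters back.

```python
def decode_names(raw: bytes) -> str:
    """Decode raw CSV bytes, trying UTF-8 first and falling back to latin-1.

    latin-1 maps every byte to a character, so the fallback never raises;
    it can, however, silently produce mojibake if the bytes were really UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


name = "Blåskjell"  # illustrative Norwegian name containing 'å'

# UTF-8 bytes decoded as latin-1 turn 'å' into two characters.
print(name.encode("utf-8").decode("latin-1"))  # BlÃ¥skjell

# The fallback reads both encodings back correctly.
print(decode_names(name.encode("utf-8")))    # Blåskjell
print(decode_names(name.encode("latin-1")))  # Blåskjell
```

You can pass the decoded text to pandas as `pd.read_csv(io.StringIO(decode_names(raw)))`, where `raw` holds the file's bytes read in binary mode.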
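Python gives `&` higher precedence than `|`. A mask written as `A | B | C & D` is therefore read as `A | B | (C & D)`, and the `Status` test then applies only to the last pattern. The sketch below uses toy data to group the patterns explicitly. It passes `na=False` so that missing names count as "no match" rather than NaN. It also swaps in a regex that matches "cf" only as a separate word. This is an assumption on my part, since a plain `'cf'` test also flags any name with those letters inside a word. Whether the swap is wanted depends on the real names.

```python
import pandas as pd

# Toy rows standing in for names_v1.csv; the column names match the notebook.
df = pd.DataFrame({
    "Taxonomy": ["Porifera cf. egg", "Porifera/Tunicata", "Porifera ?",
                 "Porifera coral", None, "Polychaeta fishingnet"],
    "Status": [False, False, True, False, False, False],
})

# Plain bools show the precedence rule: & is evaluated before |.
print(True | True & False)    # True, read as True | (True & False)
print((True | True) & False)  # False

taxonomy = df["Taxonomy"]
not_ok = df["Status"] == False  # same test as the notebook

# Group the three patterns so the Status test applies to all of them;
# na=False treats missing names as "no match" instead of NaN.
is_unsure = (
    taxonomy.str.contains(r"\bcf\b", regex=True, na=False)  # 'cf' as a whole word
    | taxonomy.str.contains("/", regex=False, na=False)
    | taxonomy.str.contains("?", regex=False, na=False)
) & not_ok

candidates = taxonomy[not_ok & ~is_unsure & taxonomy.notna()]
print(candidates.tolist())  # ['Porifera coral', 'Polychaeta fishingnet']
```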
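`unique_allspecies` holds every row with `Status == False`, so despite its name it can repeat a name. `corrector` then lists the same pair several times, for example `'Porifera coral'`. Matching each distinct name once is enough, because the replacement step updates every row with that name. A minimal sketch of an order-preserving de-duplication follows. The helper name is illustrative.

```python
def unique_in_order(names):
    """Drop repeated names, keeping the order in which they first appear."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(names))


names = ["Porifera coral", "Porifera egg", "Porifera coral", "porifera coral"]
print(unique_in_order(names))  # ['Porifera coral', 'Porifera egg', 'porifera coral']
```

On a pandas Series, `Series.drop_duplicates()` keeps the first occurrence by default and gives the same result. Case variants such as `'porifera coral'` stay distinct, and the fuzzy matcher still maps them.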
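`method='token_sort'` suggests a token-sort ratio in the style of the fuzzywuzzy library. Both names are lower-cased, punctuation is replaced by spaces, the words are sorted, and then the two strings are compared. The sketch below uses the standard library's `difflib.SequenceMatcher` for the comparison. That choice is an assumption, because `fuzzyutil` is not shown and may use a different scorer. On the pairs below it gives the same scores as the notebook's `corrector` output (100, 96 and 89). This is why word order and brackets do not matter for names like `'Porifera (yellow dirty)'`.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and sort the words."""
    cleaned = re.sub(r"\W+", " ", name.lower())  # \W is Unicode-aware, so æ/ø/å survive
    return " ".join(sorted(cleaned.split()))


def token_sort_score(a: str, b: str) -> int:
    """Similarity from 0 to 100 between two names after word sorting."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return round(100 * ratio)


for pair in [("Porifera yellow dirty", "Porifera dirty yellow"),
             ("Porifera urne", "Porifera urn"),
             ("Porifera lilla", "Porifera lily")]:
    print(pair, token_sort_score(*pair))
# → 100, 96 and 89 respectively
```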
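`matchinglist` comes from the local `fuzzyutil` module, whose code is not part of the notebook. Judging from its output, it scores each candidate against every morphospecies, takes the best one, and keeps it only when the score reaches `scorelimit`. `match_names` below is a hypothetical re-creation of that behaviour, built on a difflib-based token-sort approximation. It does not model the `perfectmatch` flag or `fuzzyutil`'s tie handling.

```python
import re
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> int:
    """Token-sort similarity from 0 to 100 (difflib-based approximation)."""
    def norm(s: str) -> str:
        return " ".join(sorted(re.sub(r"\W+", " ", s.lower()).split()))
    return round(100 * SequenceMatcher(None, norm(a), norm(b)).ratio())


def match_names(names, choices, scorelimit=87):
    """Return [name, best_choice, score] for every name whose best score
    reaches scorelimit (hypothetical stand-in for fuzzyutil.matchinglist)."""
    matches = []
    for name in names:
        # Score the name against every choice and keep the first best one.
        best_score, best_choice = max(
            ((token_sort_score(name, choice), choice) for choice in choices),
            key=lambda pair: pair[0],
        )
        if best_score >= scorelimit:
            matches.append([name, best_choice, best_score])
    return matches


morphospecies = ["Porifera urn", "Tunicata trunk", "Porifera egg"]
print(match_names(["Porifera urne", "Tunicata truk", "Hydrozoa"], morphospecies))
# → [['Porifera urne', 'Porifera urn', 96], ['Tunicata truk', 'Tunicata trunk', 96]]
```

The output has the same `[name, morphospecies, score]` shape as `corrector`, so it can feed the replacement loop directly.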
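Chained indexing such as `df['To_name'][mask] = value` first selects a column and then assigns into it. pandas cannot always tell whether that intermediate object is a view or a copy, so it emits SettingWithCopyWarning. In pandas versions with Copy-on-Write enabled, the chained form no longer updates `df` at all. The toy sketch below writes through a single `.loc[mask, column]` call instead. The `tidy` function here is a hypothetical stand-in that only strips spaces, because `fuzzyutil.tidy` is not shown.

```python
import pandas as pd

# Toy stand-in for the notebook's df (only the columns the loop touches).
df = pd.DataFrame({
    "Taxonomy": [" Porifera urne", "Porifera coral", "Hydrozoa"],
    "To_name": [None, None, None],
    "Status": [False, False, False],
})


def tidy(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for fuzzyutil.tidy: strip surrounding spaces."""
    return series.str.strip()


# Same [name, replacement, score] shape as the notebook's corrector.
corrector = [["Porifera urne", "Porifera urn", 96],
             ["Porifera coral", "Porifera coral", 100]]

tidy_names = tidy(df["Taxonomy"])  # Taxonomy is not changed below, so tidy once
for name, replacement, _score in corrector:
    mask = tidy_names == name
    # One .loc call selects rows and column together and writes into df
    # itself, so pandas has no intermediate copy to warn about.
    df.loc[mask, "To_name"] = replacement
    df.loc[mask, "Status"] = True

print(df)
```

Rows that no `corrector` entry matches (here `'Hydrozoa'`) keep `Status == False`. A later pass picks them up again.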