epifanio/Join_dataframes.ipynb

## Join_dataframes.ipynb

      
## names_check.ipynb

      
## species clean 0 custom names.ipynb

      
## species clean 2 morphospecies.ipynb

      
## species clean 3 provisional names.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom names Part 3: Provisional names\n",
    "Names awaiting confirmation from an expert. These are usually coded with \"cf\", but they must be recoded so that they are not confused with uncertain identifications."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fuzzyutil import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fuzzyutil import tidy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('names_v2.csv', encoding='latin1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List of candidate names\n",
    "Names containing \"cf\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Unresolved names (Status is False) whose Taxonomy string contains 'cf'\n",
    "unconfirmed = df.loc[df['Taxonomy'].str.contains('cf') & (df['Status'] == False), 'Taxonomy']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1252"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(unconfirmed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List of provisional names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "await_confirm = [\"Grantia compressa TBC\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Grantia compressa cf', 'Grantia compressa TBC', 83]]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "corrector = matchinglist(unconfirmed, await_confirm, scorelimit=80, method='token_sort')\n",
    "corrector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Replace each matched name and set its status to OK"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in corrector:\n",
    "    # Assign through .loc with a boolean mask so the write lands on df itself\n",
    "    # (chained indexing such as df['To_name'][mask] raises SettingWithCopyWarning)\n",
    "    mask = df['Taxonomy'] == i[0]\n",
    "    df.loc[mask, 'To_name'] = i[1]\n",
    "    df.loc[mask, 'Status'] = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('names_v3.csv', index=False, encoding='latin1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Find Tethya citrina and replace it with Craniella cranium\n",
    "* \"Filograna implexa cf\" in R1017 and R1001 can be corrected to \"Filograna implexa\"\n",
    "* Sort out Ophiuroidea etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
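
The notebook above relies on `matchinglist` from the local `fuzzyutil` module, whose source is not shown here. As a rough illustration of what a token-sort fuzzy match could look like, here is a minimal sketch using only the standard library. The names `token_sort_score` and `matching_sketch` are hypothetical, and scoring with `difflib.SequenceMatcher` is an assumption; the real `matchinglist` may work differently and return different scores.

```python
from difflib import SequenceMatcher


def token_sort_score(a, b):
    """Similarity from 0 to 100 after lowercasing and sorting each name's words.

    Hypothetical stand-in for a 'token_sort' method; not fuzzyutil's code.
    """
    def normalise(s):
        return " ".join(sorted(s.lower().split()))

    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return round(100 * ratio)


def matching_sketch(candidates, targets, scorelimit=80):
    """Pair each candidate with its best-scoring target.

    Only pairs scoring at or above scorelimit are kept, as
    [candidate, target, score] lists.
    """
    pairs = []
    for name in candidates:
        best = max(targets, key=lambda t: token_sort_score(name, t))
        score = token_sort_score(name, best)
        if score >= scorelimit:
            pairs.append([name, best, score])
    return pairs


# Example names taken from the notebook and its TODO list
candidates = ["Grantia compressa cf", "Filograna implexa cf"]
targets = ["Grantia compressa TBC"]
pairs = matching_sketch(candidates, targets, scorelimit=80)
print(pairs)
```

Sorting the words before comparing makes the score insensitive to word order, so a qualifier such as "cf" can sit anywhere in the name. The threshold still has to be chosen by inspecting the matches, since a short suffix change in a short name moves the score a lot.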

## species clean 4 lazy.ipynb

      
## Species clean order by similarity.ipynb

      
## Species Clean Summary.ipynb

      
## species clean 5 worms-DEV-V2.ipynb

      