Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save armish/45ffc4b1b40c99a5acd7610329d008f8 to your computer and use it in GitHub Desktop.
Save armish/45ffc4b1b40c99a5acd7610329d008f8 to your computer and use it in GitHub Desktop.
Use ChatGPT's new function return functionality to enrich an answer with HGNC's most recent annotation for gene-related questions
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "6168a371",
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"import json\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e0d373c5",
"metadata": {},
"outputs": [],
"source": [
"openai.api_key = '...'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "436368d9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hgnc_id</th>\n",
" <th>symbol</th>\n",
" <th>name</th>\n",
" <th>locus_group</th>\n",
" <th>locus_type</th>\n",
" <th>status</th>\n",
" <th>location</th>\n",
" <th>location_sortable</th>\n",
" <th>alias_symbol</th>\n",
" <th>alias_name</th>\n",
" <th>...</th>\n",
" <th>cd</th>\n",
" <th>lncrnadb</th>\n",
" <th>enzyme_id</th>\n",
" <th>intermediate_filament_db</th>\n",
" <th>rna_central_ids</th>\n",
" <th>lncipedia</th>\n",
" <th>gtrnadb</th>\n",
" <th>agr</th>\n",
" <th>mane_select</th>\n",
" <th>gencc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>HGNC:5</td>\n",
" <td>A1BG</td>\n",
" <td>alpha-1-B glycoprotein</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>19q13.43</td>\n",
" <td>19q13.43</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:5</td>\n",
" <td>ENST00000263100.8|NM_130786.4</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>HGNC:37133</td>\n",
" <td>A1BG-AS1</td>\n",
" <td>A1BG antisense RNA 1</td>\n",
" <td>non-coding RNA</td>\n",
" <td>RNA, long non-coding</td>\n",
" <td>Approved</td>\n",
" <td>19q13.43</td>\n",
" <td>19q13.43</td>\n",
" <td>FLJ23569</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A1BG-AS1</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:37133</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>HGNC:24086</td>\n",
" <td>A1CF</td>\n",
" <td>APOBEC1 complementation factor</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>10q11.23</td>\n",
" <td>10q11.23</td>\n",
" <td>ACF|ASP|ACF64|ACF65|APOBEC1CF</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:24086</td>\n",
" <td>ENST00000373997.8|NM_014576.4</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>HGNC:7</td>\n",
" <td>A2M</td>\n",
" <td>alpha-2-macroglobulin</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>FWP007|S863-7|CPAMD5</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:7</td>\n",
" <td>ENST00000318602.12|NM_000014.6</td>\n",
" <td>HGNC:7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>HGNC:27057</td>\n",
" <td>A2M-AS1</td>\n",
" <td>A2M antisense RNA 1</td>\n",
" <td>non-coding RNA</td>\n",
" <td>RNA, long non-coding</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A2M-AS1</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:27057</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>HGNC:23336</td>\n",
" <td>A2ML1</td>\n",
" <td>alpha-2-macroglobulin like 1</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>FLJ25179|p170</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:23336</td>\n",
" <td>ENST00000299698.12|NM_144670.6</td>\n",
" <td>HGNC:23336</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>HGNC:41022</td>\n",
" <td>A2ML1-AS1</td>\n",
" <td>A2ML1 antisense RNA 1</td>\n",
" <td>non-coding RNA</td>\n",
" <td>RNA, long non-coding</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A2ML1-AS1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>HGNC:41523</td>\n",
" <td>A2ML1-AS2</td>\n",
" <td>A2ML1 antisense RNA 2</td>\n",
" <td>non-coding RNA</td>\n",
" <td>RNA, long non-coding</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A2ML1-AS2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>HGNC:8</td>\n",
" <td>A2MP1</td>\n",
" <td>alpha-2-macroglobulin pseudogene 1</td>\n",
" <td>pseudogene</td>\n",
" <td>pseudogene</td>\n",
" <td>Approved</td>\n",
" <td>12p13.31</td>\n",
" <td>12p13.31</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:8</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>HGNC:30005</td>\n",
" <td>A3GALT2</td>\n",
" <td>alpha 1,3-galactosyltransferase 2</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>1p35.1</td>\n",
" <td>01p35.1</td>\n",
" <td>IGBS3S|IGB3S</td>\n",
" <td>iGb3 synthase|isoglobotriaosylceramide synthase</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:30005</td>\n",
" <td>ENST00000442999.3|NM_001080438.1</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 54 columns</p>\n",
"</div>"
],
"text/plain": [
" hgnc_id symbol name \\\n",
"0 HGNC:5 A1BG alpha-1-B glycoprotein \n",
"1 HGNC:37133 A1BG-AS1 A1BG antisense RNA 1 \n",
"2 HGNC:24086 A1CF APOBEC1 complementation factor \n",
"3 HGNC:7 A2M alpha-2-macroglobulin \n",
"4 HGNC:27057 A2M-AS1 A2M antisense RNA 1 \n",
"5 HGNC:23336 A2ML1 alpha-2-macroglobulin like 1 \n",
"6 HGNC:41022 A2ML1-AS1 A2ML1 antisense RNA 1 \n",
"7 HGNC:41523 A2ML1-AS2 A2ML1 antisense RNA 2 \n",
"8 HGNC:8 A2MP1 alpha-2-macroglobulin pseudogene 1 \n",
"9 HGNC:30005 A3GALT2 alpha 1,3-galactosyltransferase 2 \n",
"\n",
" locus_group locus_type status location \\\n",
"0 protein-coding gene gene with protein product Approved 19q13.43 \n",
"1 non-coding RNA RNA, long non-coding Approved 19q13.43 \n",
"2 protein-coding gene gene with protein product Approved 10q11.23 \n",
"3 protein-coding gene gene with protein product Approved 12p13.31 \n",
"4 non-coding RNA RNA, long non-coding Approved 12p13.31 \n",
"5 protein-coding gene gene with protein product Approved 12p13.31 \n",
"6 non-coding RNA RNA, long non-coding Approved 12p13.31 \n",
"7 non-coding RNA RNA, long non-coding Approved 12p13.31 \n",
"8 pseudogene pseudogene Approved 12p13.31 \n",
"9 protein-coding gene gene with protein product Approved 1p35.1 \n",
"\n",
" location_sortable alias_symbol \\\n",
"0 19q13.43 NaN \n",
"1 19q13.43 FLJ23569 \n",
"2 10q11.23 ACF|ASP|ACF64|ACF65|APOBEC1CF \n",
"3 12p13.31 FWP007|S863-7|CPAMD5 \n",
"4 12p13.31 NaN \n",
"5 12p13.31 FLJ25179|p170 \n",
"6 12p13.31 NaN \n",
"7 12p13.31 NaN \n",
"8 12p13.31 NaN \n",
"9 01p35.1 IGBS3S|IGB3S \n",
"\n",
" alias_name ... cd lncrnadb \\\n",
"0 NaN ... NaN NaN \n",
"1 NaN ... NaN NaN \n",
"2 NaN ... NaN NaN \n",
"3 NaN ... NaN NaN \n",
"4 NaN ... NaN NaN \n",
"5 NaN ... NaN NaN \n",
"6 NaN ... NaN NaN \n",
"7 NaN ... NaN NaN \n",
"8 NaN ... NaN NaN \n",
"9 iGb3 synthase|isoglobotriaosylceramide synthase ... NaN NaN \n",
"\n",
" enzyme_id intermediate_filament_db rna_central_ids lncipedia gtrnadb \\\n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN A1BG-AS1 NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN \n",
"4 NaN NaN NaN A2M-AS1 NaN \n",
"5 NaN NaN NaN NaN NaN \n",
"6 NaN NaN NaN A2ML1-AS1 NaN \n",
"7 NaN NaN NaN A2ML1-AS2 NaN \n",
"8 NaN NaN NaN NaN NaN \n",
"9 NaN NaN NaN NaN NaN \n",
"\n",
" agr mane_select gencc \n",
"0 HGNC:5 ENST00000263100.8|NM_130786.4 NaN \n",
"1 HGNC:37133 NaN NaN \n",
"2 HGNC:24086 ENST00000373997.8|NM_014576.4 NaN \n",
"3 HGNC:7 ENST00000318602.12|NM_000014.6 HGNC:7 \n",
"4 HGNC:27057 NaN NaN \n",
"5 HGNC:23336 ENST00000299698.12|NM_144670.6 HGNC:23336 \n",
"6 NaN NaN NaN \n",
"7 NaN NaN NaN \n",
"8 HGNC:8 NaN NaN \n",
"9 HGNC:30005 ENST00000442999.3|NM_001080438.1 NaN \n",
"\n",
"[10 rows x 54 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Download/load from https://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/non_alt_loci_set.txt\n",
"df = pd.read_csv('./non_alt_loci_set.txt', delimiter='\\t', low_memory=False)\n",
"\n",
"df.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c548200d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"hgnc_id HGNC:132\n",
"symbol ACTB\n",
"name actin beta\n",
"locus_group protein-coding gene\n",
"locus_type gene with protein product\n",
"status Approved\n",
"location 7p22.1\n",
"location_sortable 07p22.1\n",
"alias_symbol NaN\n",
"alias_name β-actin\n",
"prev_symbol NaN\n",
"prev_name NaN\n",
"gene_group Actins|BAF complex|PBAF complex|GBAF complex\n",
"gene_group_id 929|1604|1605|1606\n",
"date_approved_reserved 1986-01-01\n",
"date_symbol_changed NaN\n",
"date_name_changed 2016-06-09\n",
"date_modified 2023-01-20\n",
"entrez_id 60.0\n",
"ensembl_gene_id ENSG00000075624\n",
"vega_id OTTHUMG00000023268\n",
"ucsc_id uc003sot.5\n",
"ena M28424\n",
"refseq_accession NM_001101\n",
"ccds_id CCDS5341\n",
"uniprot_ids P60709\n",
"pubmed_id 1505215\n",
"mgd_id MGI:87904\n",
"rgd_id RGD:628837\n",
"lsdb LRG_132|http://ftp.ebi.ac.uk/pub/databases/lrg...\n",
"cosmic NaN\n",
"omim_id 102630\n",
"mirbase NaN\n",
"homeodb NaN\n",
"snornabase NaN\n",
"bioparadigms_slc NaN\n",
"orphanet 159893.0\n",
"pseudogene.org NaN\n",
"horde_id NaN\n",
"merops NaN\n",
"imgt NaN\n",
"iuphar NaN\n",
"kznf_gene_catalog NaN\n",
"mamit-trnadb NaN\n",
"cd NaN\n",
"lncrnadb NaN\n",
"enzyme_id NaN\n",
"intermediate_filament_db NaN\n",
"rna_central_ids NaN\n",
"lncipedia NaN\n",
"gtrnadb NaN\n",
"agr HGNC:132\n",
"mane_select ENST00000646664.1|NM_001101.5\n",
"gencc HGNC:132\n",
"Name: 270, dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[270]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "49a75e45",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hgnc_id</th>\n",
" <th>symbol</th>\n",
" <th>name</th>\n",
" <th>locus_group</th>\n",
" <th>locus_type</th>\n",
" <th>status</th>\n",
" <th>location</th>\n",
" <th>location_sortable</th>\n",
" <th>alias_symbol</th>\n",
" <th>alias_name</th>\n",
" <th>...</th>\n",
" <th>cd</th>\n",
" <th>lncrnadb</th>\n",
" <th>enzyme_id</th>\n",
" <th>intermediate_filament_db</th>\n",
" <th>rna_central_ids</th>\n",
" <th>lncipedia</th>\n",
" <th>gtrnadb</th>\n",
" <th>agr</th>\n",
" <th>mane_select</th>\n",
" <th>gencc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>270</th>\n",
" <td>HGNC:132</td>\n",
" <td>ACTB</td>\n",
" <td>actin beta</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>7p22.1</td>\n",
" <td>07p22.1</td>\n",
" <td>NaN</td>\n",
" <td>β-actin</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:132</td>\n",
" <td>ENST00000646664.1|NM_001101.5</td>\n",
" <td>HGNC:132</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 54 columns</p>\n",
"</div>"
],
"text/plain": [
" hgnc_id symbol name locus_group \\\n",
"270 HGNC:132 ACTB actin beta protein-coding gene \n",
"\n",
" locus_type status location location_sortable \\\n",
"270 gene with protein product Approved 7p22.1 07p22.1 \n",
"\n",
" alias_symbol alias_name ... cd lncrnadb enzyme_id \\\n",
"270 NaN β-actin ... NaN NaN NaN \n",
"\n",
" intermediate_filament_db rna_central_ids lncipedia gtrnadb agr \\\n",
"270 NaN NaN NaN NaN HGNC:132 \n",
"\n",
" mane_select gencc \n",
"270 ENST00000646664.1|NM_001101.5 HGNC:132 \n",
"\n",
"[1 rows x 54 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def find_gene(gene_str):\n",
" gene_str_upper = gene_str.upper()\n",
" \n",
" df_res = df[df['symbol'] == gene_str_upper]\n",
" if df_res.shape[0] > 0:\n",
" return df_res\n",
" \n",
" df_res = df[df.apply(lambda row: row.astype(str).str.contains(gene_str, case=False).any(), axis=1)]\n",
" return df_res\n",
" \n",
"find_gene('ACTB')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b70ca9b0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hgnc_id</th>\n",
" <th>symbol</th>\n",
" <th>name</th>\n",
" <th>locus_group</th>\n",
" <th>locus_type</th>\n",
" <th>status</th>\n",
" <th>location</th>\n",
" <th>location_sortable</th>\n",
" <th>alias_symbol</th>\n",
" <th>alias_name</th>\n",
" <th>...</th>\n",
" <th>cd</th>\n",
" <th>lncrnadb</th>\n",
" <th>enzyme_id</th>\n",
" <th>intermediate_filament_db</th>\n",
" <th>rna_central_ids</th>\n",
" <th>lncipedia</th>\n",
" <th>gtrnadb</th>\n",
" <th>agr</th>\n",
" <th>mane_select</th>\n",
" <th>gencc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1000</th>\n",
" <td>HGNC:483</td>\n",
" <td>ANG</td>\n",
" <td>angiogenin</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>14q11.2</td>\n",
" <td>14q11.2</td>\n",
" <td>RNASE5|RAA1</td>\n",
" <td>ribonuclease A family member 5</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.1.27.-</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:483</td>\n",
" <td>ENST00000397990.5|NM_001097577.3</td>\n",
" <td>HGNC:483</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30017</th>\n",
" <td>HGNC:10074</td>\n",
" <td>RNH1</td>\n",
" <td>ribonuclease/angiogenin inhibitor 1</td>\n",
" <td>protein-coding gene</td>\n",
" <td>gene with protein product</td>\n",
" <td>Approved</td>\n",
" <td>11p15.5</td>\n",
" <td>11p15.5</td>\n",
" <td>RAI</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>HGNC:10074</td>\n",
" <td>ENST00000354420.7|NM_203387.3</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows × 54 columns</p>\n",
"</div>"
],
"text/plain": [
" hgnc_id symbol name \\\n",
"1000 HGNC:483 ANG angiogenin \n",
"30017 HGNC:10074 RNH1 ribonuclease/angiogenin inhibitor 1 \n",
"\n",
" locus_group locus_type status location \\\n",
"1000 protein-coding gene gene with protein product Approved 14q11.2 \n",
"30017 protein-coding gene gene with protein product Approved 11p15.5 \n",
"\n",
" location_sortable alias_symbol alias_name ... \\\n",
"1000 14q11.2 RNASE5|RAA1 ribonuclease A family member 5 ... \n",
"30017 11p15.5 RAI NaN ... \n",
"\n",
" cd lncrnadb enzyme_id intermediate_filament_db rna_central_ids \\\n",
"1000 NaN NaN 3.1.27.- NaN NaN \n",
"30017 NaN NaN NaN NaN NaN \n",
"\n",
" lncipedia gtrnadb agr mane_select \\\n",
"1000 NaN NaN HGNC:483 ENST00000397990.5|NM_001097577.3 \n",
"30017 NaN NaN HGNC:10074 ENST00000354420.7|NM_203387.3 \n",
"\n",
" gencc \n",
"1000 HGNC:483 \n",
"30017 NaN \n",
"\n",
"[2 rows x 54 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_gene('angiogenin')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "92c08d38",
"metadata": {},
"outputs": [],
"source": [
"def get_gene_info(gene_str, max_no_of_res=5):\n",
" \"\"\"Get details about a given gene\"\"\"\n",
"\n",
" cols_of_interest = ['symbol', 'name', 'alias_symbol', 'alias_name',\n",
" 'prev_symbol', 'prev_name', 'hgnc_id', 'gene_group',\n",
" 'location']\n",
" res_df = find_gene(gene_str)[cols_of_interest].head(max_no_of_res)\n",
" \n",
" gene_info = res_df.to_dict(orient='records')\n",
" return json.dumps(gene_info)\n",
"\n",
"function_descriptions = [\n",
" {\n",
" \"name\": get_gene_info.__name__,\n",
" \"description\": get_gene_info.__doc__,\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"gene_str\": {\n",
" \"type\": \"string\",\n",
" \"description\": \"a human gene symbol, name, or description; e.g. Actin\",\n",
" },\n",
" \"max_no_of_res\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"maximum number of results to be returned. 5 by default.\",\n",
" }\n",
"\n",
" },\n",
" \"required\": [\"gene_str\"],\n",
" }\n",
" }\n",
" ]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f341f5b1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The term PD-1 actually refers to two different genes. \n",
"\n",
"1. PDCD1 (Programmed Cell Death 1): It is also known as CD279 and it is associated with systemic lupus erythematosus susceptibility. This gene is located at 2q37.3. The PDCD1 gene is a part of the CD molecule group and V-set domain-containing group.\n",
"\n",
"2. VSIR (V-Set ImmunoRegulatory Receptor): This gene is also sometimes referred to as PD-1. This gene is located at 10q22.1. It's alternate names include SISP1, GI24, B7-H5, B7H5, VISTA, and PD-1H. This gene is a part of the V-Set domain-containing group and the B7 family. The VSIR gene is also known to be involved in the immunoregulatory process. \n",
"\n",
"Both these genes are involved in immune system regulation and can have implications for conditions like autoimmune disorders and cancer. The proteins they produce play a critical role in the body's immune response, and their dysfunction can lead to a variety of health problems.\n",
"\n"
]
}
],
"source": [
"def gene_chatgpt(user_query):\n",
" response = openai.ChatCompletion.create(\n",
" model=\"gpt-4-0613\",\n",
" messages=[{\"role\": \"user\", \"content\": user_query}],\n",
" functions=function_descriptions,\n",
" function_call=\"auto\"\n",
" )\n",
"\n",
" ai_response_message = response[\"choices\"][0][\"message\"]\n",
" gene_str = eval(ai_response_message['function_call']['arguments']).get(\"gene_str\")\n",
" max_no_of_res = eval(ai_response_message['function_call']['arguments']).get(\"max_no_of_res\")\n",
" \n",
" function_response = get_gene_info(gene_str, max_no_of_res=max_no_of_res)\n",
" \n",
" second_response = openai.ChatCompletion.create(\n",
" model=\"gpt-4-0613\",\n",
" messages=[\n",
" {\"role\": \"user\", \"content\": user_query},\n",
" ai_response_message,\n",
" {\n",
" \"role\": \"function\",\n",
" \"name\": get_gene_info.__name__,\n",
" \"content\": function_response\n",
" },\n",
" ],\n",
" )\n",
" \n",
" return second_response['choices'][0]['message']['content']\n",
"\n",
"print(gene_chatgpt('What do we know about the PD-1 gene?'))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "aa27a3ba",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The official gene symbol for Tcf-1 is TCF7. It is also known by the alias name T-cell-factor-7. The gene is located at 5q31.1. It does not have any previous names or symbol listed but it is part of the gene group \"TCF/LEF transcription factor family|Wnt enhanceosome complex\".\n"
]
}
],
"source": [
"print(gene_chatgpt('What is the official gene symbol for Tcf-1? Does it have any previous names or aliases?'))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "68306934",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The HGNC id for PD-L1 is HGNC:17635.\n"
]
}
],
"source": [
"print(gene_chatgpt('What is the HGNC id for PD-L1?'))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "8a9cab43",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1. **[ABITRAM](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:1364)**: Location 9q31.3 - This is an actin binding transcription modulator gene.\n",
"2. **[ABLIM1](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:78)**: Location 10q25.3 - It's an actin binding LIM protein 1 gene which belongs to the LIM domain containing group.\n",
"3. **[ABLIM2](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:19195)**: Location 4p16.1 - This actin binding LIM protein family member 2 belongs to the LIM domain containing and MicroRNA protein coding host genes groups.\n",
"4. **[ABLIM3](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:29132)**: Location 5q32 - This gene is for an actin binding LIM protein family member 3 and belongs to the LIM domain containing group.\n",
"5. **[ABRA](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:30655)**: Location 8q23.1 - It's known as the actin binding Rho activating protein.\n",
"6. **[ACD](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:25070)**: Location 16q22.1 - This gene is for the ACD shelterin complex subunit and telomerase recruitment factor and belongs to the Shelterin complex group.\n",
"7. **[ACTA1](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:129)**: Location 1q42.13 - This skeletal muscle actin alpha 1 gene belongs to the Actins group.\n",
"8. **[ACTA2](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:130)**: Location 10q23.31 - It's known as smooth muscle actin alpha 2, and it's part of the Actins group.\n",
"9. **[ACTB](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:132)**: Location 7p22.1 - This actin beta gene is part of the Actins, BAF complex, PBAF complex, and GBAF complex groups.\n",
"10. **[ACTBL2](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:17780)**: Location 5q11.2 - This gene is for an actin beta like 2 protein and belongs to the Actins group.\n"
]
}
],
"source": [
"print(gene_chatgpt('What do we know about actin genes? Show 10 relevant genes at most.'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment