@AstraBert
Last active May 10, 2024 23:51
filter_vcfs.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"mount_file_id": "1yyy1oYr8c70ws5KBPQLZ9jxs1Z-QR4aZ",
"authorship_tag": "ABX9TyMoXrKdXAiw9dzvHcxm2D21",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/AstraBert/2d5299582e2087b6de1883ab0cf3a255/filter_vcfs.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# ANALYZE AND FILTER YOUR VCF FILE *WITHOUT EXCEL*\n",
"\n",
"In this notebook, we'll learn how to thoroughly filter a [VCF file](https://en.wikipedia.org/wiki/Variant_Call_Format) (provided in `tsv` format, short for \"tab-separated values\") without fancy Excel functions, in fewer than 80 lines of pure Python code!\n",
"\n",
"You **won't actually be coding** (everything has already been done for you), but you'll need to know how to use a notebook. If you don't, no worries! It's really simple, and [this link](https://youtu.be/HcUZ5xbdvro?feature=shared) takes you to an awesome 2-minute YouTube tutorial by London Business Analytics Group.\n",
"\n",
"## How do I run a code block?\n",
"Hover your mouse over the square brackets (the ones with the numbers) on the left of any block with a gray background: the usual \"play\" button, the same one you click to resume a paused YouTube video, will appear. Just click on it!\n",
"\n",
"Explanatory image:\n",
"#![freccetta.jpg](data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAIBAQIBAQICAgICAgICAwUDAwMDAwYEBAMFBwYHBwcGBwcICQsJCAgKCAcHCg0KCgsMDAwMBwkODw0MDgsMDAz/2wBDAQICAgMDAwYDAwYMCAcIDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz/wAARCAC+AIYDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD98NE1B9SsnkcKCtxNENo4wkroPxwoq2TgVm+E/wDkFy/9fl1/6USVpHpQBVv797WMlQvA71heBPG114n8Qaxazx26R6esJjMakFt/mZzkn+4P1rW1j/UN9K5H4Q/8jn4m/wBy1/8Aa1AHoFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFAGZ4T/5Bcv/AF+XX/pRJWkelZvhP/kFy/8AX5df+lElaR6UAZ2sf6hvpXI/CH/kc/E3+5a/+1q7HVV3W7fSuS+FERj8Y+JPdLb+c1AHeUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAZnhP8A5Bcv/X5df+lEladZnhP/AJBcv/X5df8ApRJWnQBXu4fMjIryn4qeBdVvzM2lanqWlyTY3vZ3LwM+M4yVIzjJ/OvXWG4VXnsFm6jNAHydd/B/x60p2+OfGoHtrd1/8XUP/CnfH/8A0PXjf/weXX/xdfWDaHEf4ab/AGFF/dFAHyh/wp7x/wD9D143/wDB3df/ABdH/CnvH/8A0PXjf/wd3X/xdfV/9hRf3RR/YUX90UAfKH/CnvH/AP0PXjf/AMHd1/8AF0f8Ke8f/wDQ9eN//B3df/F19X/2FF/dFH9hRf3RQB8of8Ke8f8A/Q9eN/8Awd3X/wAXR/wp7x//AND143/8Hd1/8XX1f/YUX90Uf2FF/dFAHyh/wp7x/wD9D143/wDB3df/ABdH/CnvH/8A0PXjf/wd3X/xdfV/9hRf3RR/YUX90UAfKH/CnvH/AP0PXjf/AMHd1/8AF0f8Ke8f/wDQ9eN//B3df/F19X/2FF/dFH9hRf3RQB8of8Ke8f8A/Q9eN/8Awd3X/wAXRX1f/YUX90UUAJ4T/wCQXL/1+XX/AKUSVp1meE/+QXL/ANfl1/6USVp0AFZ+u/8ALL8f6VT0/wCJOg6t4+1HwtbatY3HiHSLWG8vdPjlDT2sMpYRs6j7u7acA84wehGbmu/8svx/pQmnsTGUZaxdzPoorifj7+0Z4K/Ze+H0/ijx54hsfDuiwt5YmuCS88m0sI4o1BeRyFJCoCTg8UFHbUV+b3if/g5i+EmmeI3t9M8E+PtU06Niv2xktbcy4bG5IzKTtI5G4qemVHb6r/
Y4/wCCjnwq/bns518D65J/a9pF59zouoxC31G3jyF3mPLBlyQCyMwGRkjIoA92ooooAKKKKACiiigAooooAKKKKALfhvf/AGJceVt8z7Xd7N33c/aJMZ9q+cbv9rr4k+MrqT4XaV4Kk0L40BzHqF7NC8/h3RrEnC6xHOQBPG4yIoPvmRXVwAhJ+idH1O30Twxe3l3NHbWlpcXs080jbUiRZ5SzMT0AAJJ9qzfhZ8b/AA38ZoLt9Au7uZrHyzNFeadc6fMqSAtHII7iONzG4BKyAFGwcE4NZ1ISls7HJisPUq2VOo4rrbqv0fZ9NdNms/8AZ8/Z80b9nbwZJpumyXWo6lqM7X2s6zfP5l/rl4/37id+7HsPuqoCqABXWa7/AMsvx/pWhWfrv/LL8f6VcYqK5Y7G9KlClBU6askZ9fzrf8Fiv2wdX/av/bQ8SwS3hk8MeBb640LQbVRiOJIn2TTY7tLJGWLHnaEXoor+imv5kf8AgoX8GtW+A37avxK8P6xavbSjXrq+tSWLCe1uJWmgkVjnIaN1PUkHIPINM0O3+HX/AASx8ffG79ie3+MngZT4nWG8urXUtAtrdjfxLC6qJYAM+fkMSyABl28b88eF/Bz4u+If2fPiro3i3w1fXWk674fuluIJYmKNkHDRt6qwyrKeCGIIwa+9P2NP+CwehfsF/wDBNqx8J+H7JfEXxPvNVvriG0nR1sdLidxsmnYY3k44iQ5I5ZkGN3wR8TviR4h+P/xV1TxLrkp1PxH4lvDPcNBbqnnyuQAqRoAB2AVR6CgD+nv9nX4v2/7QHwF8HeN7aNIYvFWj2upmJSSIHliVnjyQCdrFlzj+Guzry39iL4SXfwH/AGQPhr4Q1FPK1PQvDtnb30f/ADzufKVplHAyBIXGe+K9SoAwPih8RLL4U+BdQ17UItUmtbCMu6afplzqM59MQ26PIR6kLhRkkgAmvnb9mT/goVd/Gz4leCPDmoaNqkE2u+Dotb1KWLwfrNtFFfSyqsaxvLGUW12CX987GNmChZT0Psn7VvgjxF8Q/wBn3xVpfhPVNV0zxBPptwLIWAs999IYXVbZjdxvEqSMQGPyMOzr1ryH9lD9kLxj8Hvj1PrOr+LvFt1o2leEtG0GxjuY9HEN+sKXBlt3EFssixwPIhRgUdiTueVeni4ypi/rlOFFPk6u2nfe67duvmfPZhVx/wBfpU6Cfs9OZ20778y7W269bo+oq+PP2tvix8WfC/7QVjp/hOXxKdHe7nFm66BE5Eo0qR5Y7WM39uNSCD98EljAWUELJOy+QPsOiu7G4V4iCgpuOqd15HpZjgpYqkqcZuGqd15dP69T5a/bm+JXxA8E+FfDj+D7vxHcW9zDY/2hJPpS2kIdr61WJ5Lr7RayQTyMzI0CpIrKzB1gX95VzxH8ePiT4U+H/wAIdPtr20h8W/EHxLPo2p3firws8J07EV3OVWyt7tBhDCsaMLmVWQB/Ml3bz9M14b+2P8A9U+PHiv4Sx2ltqUuk6D4rOo6zPp+ryaXc2dt9huYxIk0UsUwPmPGP3TbsMe2a4cXhK0HOvSm23yq13/Mr97afdqzzcdga9NzxFGcm5cq5buy95Xa3tp1tpq0Vfhh+0b4xtvjv4t+GfjG38OahrPh2ws9Ztda0W2mtLS8tLkyoI5LWWWV4ZleF+k0iupz8uMUV6F4H/Z28MfDDSr238PWktpcapcrdahf3l3PqWoajIqlVM91cSPPLtU4Xe7bV4XA4orroU8RCHLUd3r16X0V9L2WlzuwtHFU6fLVd3d9b2V9Fd2bstLtXZ0XjzwbcfEX4GeLtAtHjju9atdUsYHkOEV5HmRS2O2SM1zfwA8PeJLz4ha34m1/w3d+FFuNF0zRYbK8u7a5mle1Ny8kwNvJIgjJnAXLBztJKrwK9J8J/8guX/r8uv/SiStOu89M+c/2Wvgbrnw2+PPiXVbjwrcaPYalFcm61C+nsprm/ne6EieXNasslzCF3kN
ewrPHlVVmBYV75rv8Ayy/H+laFZ+u/8svx/pQBn181/t/f8Ewvhv8At/xaNdeKPt+jeINFcJDq+lMkd5PbbsvbNvVldTyVLKTGxJHDOrfSlVtQ8v5N3+t58rr97+Xp1oA/m0/bF/YP1D9kr9tjX/htrOt6dpXhY6XBr+j+IryG9urLT7G5u9Rgtk1O4t7Vlt55PsSRqCiq0smAduSv68/sx/8ABDb4Mfsv/GPTvHOjy+MfFWp6HKLrSl1u9t7i1gkHKXKrDBFuZeCpJKgkMBkKR0/i/wDZ78F/tT/tv/HjwF4+0W38R+Fde+Fvw/8A7S0+ZmVJXi1zxbPGcgg5SWKNxg4yg7VwX/BJz9sL4n6p4m8X/A/46iy0X4q/DuGI28EkJEmo2M00rRv9qV3tLhhby6dlYH3IZdrKGDKoB90/bh9pxuTydv388bs9M9OlOs7kzxDfhZerJ3H4VT/0f/t0/H7/APPp+FT2G3zjv/4+sfP9O3t6UAW6KwPih8RLL4U+BdQ17UItUmtbCMu6afplzqM59MQ26PIR6kLhRkkgAmvmr4Cf8FKD8Q/FfhbT9f0vUbCC88DDxLrd2nhDWbaG1umcFdkksZRLQRrKfPdjExChZex4sRmGHoVI0qsrNnn4rNMLh6saNaaUpbffb+vRn1pRXhPwA+J3xX/aC8J6P4+hPgfw74Q8QOl7p3h680u7n1WXTmxsklvluUjimkXMgQWsiqCqlmOWGV+0npk3gv8Aaz+CmqaZrXiu0fxT4kuLDVLNfEN+dNu4E0y5dUNkZjbDDxo2VjBJGSSal49eyVaMXytx300k0k1v32dmZyzNewWIhF8rcVrpdSaSa377OzsfRdFeD/t++PYfB3wiC6nB8WNO0Jru1lvPEPgXUrGyutMP2mNEjdp50lKSM4VhFHJld2cV6T8Z/jRonwB8C/23rhv5o2nisbO0srZrq91O6kO2K3gjXl5XbgDgDksVUEjX63BTnGWiik2/W/8Alv8A5G7x1ONSpCeigk2353/DTf5dDr6K5X4Y/Ei+8fwXX9peDvFPgy6tSpFtrQtHaZGzh0ktLieE8ggr5gdcAlQGUkrohNTjzL/I6qdRTjzR2+78Hqdv4T/5Bcv/AF+XX/pRJWnWZ4T/AOQXL/1+XX/pRJWnVFhWfrv/ACy/H+laFZ+u/wDLL8f6UAZ9VdRlCvEmwFpCVVu6HjkVaqtqEjrsQL+7kyJGx90evt3oA8F+Fjg/8FPvjGMcp8L/AAMrH+8Rq3jHmuZ/4Ki/sX+Kf2jfDHhDx98MtVOj/Fz4NXVxrfh+MW0U/wDwkkOxJ5dDzcSpb232y5s9P/0yRXMHkcDa8gPS/DK5EX/BTj4yNJsjig+Fngb5zwNi6r4x+Ykn079K+Nv+Cp3/AAVr8c+Ov+CYnwU+L37Lnie88BX/AMWvivD4Ks77WtL0+7Z4A2r2b+YjrdxCJrmySRXj3P5YXpuZKAPvH9k742T/ABt+A2ia1q1otn4ls4k0vxLpu9ZBp2rxRp9rh8xUWOQpIWXfEPLbqpIr1CyfyrxoSNzouTKerdP8fXtX5if8Eaf2Jv20/wBnT4+aj45+Mnxg8D+Mvhx47s7rW7rw/pdxM8p1O8+zyLcJAbCCOBAEOUhdUBY4Q7mJ/TywYxzGJRmBRlX9T9enc0AcN+1b4I8RfEP9n3xVpfhPVNV0zxBPptwLIWAs999IYXVbZjdxvEqSMQGPyMOzr1rwv4GfsKeKvD3xB8Qx+IfGPjCbw5N4E03wlbLNHo4jvYxBdLNAwgtlkVIGmUowKOxJ3SSr0+uKK4K+XUq1ZVp3uul3br0+f4I8zE5TQxGIjiKjd49Lu3Xp89fRHz1+yzc/E34G/DLw18NfEHw61HV5/C6RaNB4p0/VdOXRryzjCrHcujzreRuI+GjW2f504Yq24Uv2p08ceJvj78Mr/QvhX4y13Svh7r82p3l9bahosUV9FJYTwAW6z38chYPMMiRI+FYgnjP0lR
UvLk6Koe0lZWt8N/d1S+HyW+vmS8qTw6w3tZcqcbfDf3Wml8Pkt9fM+dP2+rHxp8X/ANna48KeGPhr4r1jUtfjsrxnS+0iCHTWjuopngnMt6hMgWMjMQkTJHz96vftF+EfGXxs8AeCfFug+EL7TPFPgHxNF4gi8Ma7e2cc2pxxpNDJCJbeae3SR45WaNmkwGC7inJHvtFOpl8akpylN+8kumlm2mtN035ryHVyqNSc5znL30l9nTlbaa93dNt63Xkct8M/HOt+OorubVvBet+DIoSqQQ6veWU11cHks220nnjVB8oBMu4ndlFABYrqaK7oRcY2bv62/Sx6NOLjHlbv5u36JL8C74T/AOQXL/1+XX/pRJWnWZ4T/wCQXL/1+XX/AKUSVp1RYVn67/yy/H+laFZ+u/8ALL8f6UAZ9V9QVygIIEYBMg7sPb9asVBfW7yx+YC22EFnUDhx6H8qAPzF/wCCh6fFT4vf8FNfEnwN+GuvQeCZvjD8LvCf2rxZBrN3peq6BFpur+Jr1VtTbxsX89Emgk3MvySEfMGIq54n/wCCeF7p3xr/AGifgdBD4PHhn4q+BH8Q/APRNjDRPhbq+i6dFp1/qMNuIfL0q6l1PXoLpJtPjeRsXEjskgVX/Sc2ssjQj95m5GYjg/uQBnC/hxxivlH/AIKr6F4s8FeDfB/xq8E6tb6DL8BtcbxH43u0ci/1zwXBGbzW9JtRsZHmuvsFmVSRokZoFzNGASQD0D/gnlrep3n7KHhLw9q19c6l4q8BWEPhPW9QnlaZb68sYY4J5Ukb55EaRSVd1VyDllB4r3HTgzfMhxbEfIh6g5//AF96+Hv+CMOq+K/jZ4f+IXxQXWZT8OPif4q1DxR4Q0mbi+0jT7uOylhhnjUGNJFPnFlSWRQX4Y54+49PhZ1+0fMkTghY8fKpz2/I/nQBx/7S37Qeg/spfATxV8R/FIvj4d8G6e+p6j9ihE04gTG4ohK7iBzjPbjmuF+Ff7eGj/Er4oab4YvPBHxH8GnxNpM+t+F9Q8RaZBa23ie1g8symFEne4tpVSWN/IvobaYqxIjOx9uR/wAFap2tv+CbnxhkSGS4ePw9KyxRlQ8pDIQo3ELk9BkgepFZXhnw345/aj+P/wAPvF2ufDvxF8LPDvw107UTFD4jv9LudR1q9vbdLdfKTTru7iS3jjEjM0kqSF2QBNoLUAdxpv7c3hLVP2e/AHxJj0/xENC+I2qadpOmwNBD9rglvrgW8TTL5uxVDnLFXYgdAx4rmPHn/BSjQPCnjr4i+HtJ+HvxR8Z3vwmmiHiyTRdOs1ttKtpLVboXQlurqBJ18stmGAyXOYyfI2lGbwbQPgR8ck/Z7+DPwfj+FMlpZ/Cnxnot5rviW+8Q6cLTWLGz1AyfaNMiimed/lCu63aWrKuQiyvxXtXhn9nnxhp7/tZedpGz/hZl/JP4a/0qA/2kh0C1tAeH/d/v43T97t+7n7pBoA5n9r39vXxr4C+KXwJj+Hnw9+IHinwn4712JpNR0l/D623iizm0q8uUs4BfX8M8MoZIpS0iQLiJlEpJ2N7P4q/a1i+HXwZ03xR4p8CeOPDmua5fjSdK8Gz/ANnXmvapeuXENvF9ku5rPdIqFwzXKxogLSvEFYr5L8Sf2f8Ax7oX7Nv7N+o6P4Wm8R+K/grfaVqOp+GrfULSC6vo10yWwuooJppEtjMguGdd8yRt5ZG8ZFM/bR+B3in9sr4X/DLxingL4haDrPw78Tya3deC/wDhMIvD3iDUbY21zaSR2+o6VqBginKzLLGDeIjgeXI8W9sAHtnwA/aesfjprOv6HdeGvFHgPxj4WML6p4a8Rrafb7aCcMbe4V7O4uLaWGTZIFeKZ8NG6ttZStFeafsb/CXwz8Kdd8T+MJPhv8Yfhp4g1mK00q5vPiV48Piy71O3hM0kSQyDWdTESI0spxujJLnhuoKAPqXwn/yC5f8Ar8uv/SiStOszwn/yC5f+vy6/9KJK06
ACs/Xf+WX4/wBK0Kz9d/5Zfj/SgDPooooAKKKKACiiigDnvix8KtB+OHw51fwl4osP7T8P67AbW+tfPkh8+MkErvjZXXoOVYGt+GFbeFY0GFQBVHoBXH/tB/FwfAX4LeI/GLadJqy+HrRrs2ccvlvcAEDaGwcHnjiuX8K/tE68/wASdB0LxV4MTwzD4vsrm80SZNXW8nLQKrvb3cYiVYJvLbcBHJOnyuN/A3YTxNOE/ZyeunR9XZa7as0jSlKPMtj1qivn74a/th+MPib8P9J8Y2fwruv+EXvL9bC5MWsfadUGbk27XEFrHARLAhwzNJJE4VZCEIUF7/xxuYk/af8AhX/b/gvSNStzqtxbeHtbj8Q3Md5ptw1lLJKz2YgWJlKxMg3TN1BwKx+vU5QU4bO26a3fp/wPNF/V5KXLLz7dPme5UV5X8SP2gNc8O/HSx+H/AId8IR6/quo6I+tJeXWq/YbG1VJxEyzuIpXUHPylEkJYgFVGXHmP7Qf7SHjDx1+xH4l8T+F9ItdDvbJL3T9XkbxDNbXWj3NtceQ5tHit288b0fDM0B24OASQCrj6UFLduKb2fT5eaCGHnJrz/U+nri2juk2yxpIoOcOoIzRXMfC3V/GGq2Eh8WaJ4c0chIzbHStdn1Mz5B3F/MtLfYRxjG/OT0xyV1wkpR5l/kYtWdmdx4T/AOQXL/1+XX/pRJWnWZ4T/wCQXL/1+XX/AKUSVp1QgrP13/ll+P8AStCs/Xf+WX4/0oAz6KKKACiiigAooooA8d/4KAlh+xl8Q9oBb+yW2gnAJ3L1PNL4c+GXjD4ifE/w14q8ZW3hvRLfwjY3Eel6do+pzakbme5RY3nlmkt7fYFjXaI1RslyxbgCup/aC8X2HhH4fn+1NDs/EWn6lOtnPY3e0wyqVdvmVkZWHydCO/tXAL+2mEUAeGQABgAah0/8hVySwvNW9pJ6WWnmm3+psq1ocq31/Gx2X7KHwj1L4FfAbRPC+rzWNxqGmm4Mslm7PC3mTySDaWVT0cZyBzmuW+PXgD4l+Ofi14N1XQtD8DS6V4I1aTU4Gv8AxNdW1xqIe0ltyjRpp8ixYMxOQ8mQo4GeK/8Aw2t/1LX/AJUf/tVH/Da3/Utf+VH/AO1VUsLF0o0U2krfhtuJVnzub3f6nX2Xwu1q6/aRsPHN5/ZdvbL4TOjXFrDcSTSR3LXKzHYTGoaMAEBjtY8fIK5TTf2VNSvP2XPHXgDUdRsLe78V6hrF3b3duHmjtlu7uWeEsCEJK7l3AcZBAJHNR/8ADa3/AFLX/lR/+1Uf8Nrf9S1/5Uf/ALVQ8JTd79b/AI2v+Qe3mtvL8D0j4R3XjeTRvJ8a6R4W026toYo45dF1me/S6YLh2ZZbWAxDIBABk6nJ45K83/4bW/6lr/yo/wD2qit4RcY8rdzNu7ufQfhP/kFy/wDX5df+lEladZnhP/kFy/8AX5df+lEladUIKz9d/wCWX4/0rQooA5+iugooA5+iugooA5+iugooA8G/bG/5JlY/9hSP/wBFTV82V+hlFAH550V+hlFAH550V+hlFAH550V+hlFAGZ4T/wCQXL/1+XX/AKUSVp1T0SwfTbJ43KktcTSjb0w8ruPxwwq5QAUUUUAFFFfPP/BTr9kzxN+2r+y+fAnhXV7PRb6513TL26uLm/uLINZw3KSXEaywI8gdogyrgAZIyR1oA+hqK+FfiR/wb8fBLxB8O9esPDd58QfD3iK9064t9L1WXxlq14mmXTxMsNw0LXIEojcq5QkBtuCRmvr34BfDe5+DXwJ8FeELzVH1y88K6DY6PPqToUbUJLe3jhacqWYguULYLEjd1PWgDraKKKAOe+J/xc8KfBHwjLr/AI08T+HvCOhQSJFJqWtajDYWkbucIpllZUBY8AE8mud+GP7V/wANvjza6ufhz478GfEi70SAT3Vn4X8QWOpTx7g3lq2ybahcowUyMqkg8gAkegXEZlgdR1ZSBmvMP2K/gjqv7O
P7MPhXwVrdxp91qmhxTRzy2MjvbuXuJZRtLqjHhx1Uc5oAX9mv9qK3/aOvPGVifCPi3wTrPgTVk0fVtM8QGwadJntoblSj2V1cwshjnTkSZzkEcV6jXkv7PfwH1f4T/GP4y+IdRuNNmsviJ4lttZ01LaR2lghj020tGWYMihX8yByApYbSvOcgetUAc3q3xc8PaH8U9G8FXWoeV4m8QWF1qen2fkSN9ot7ZoVnfzApjXaZ4hhmBO/gHBx0lfLvxh/4Juaf8V/20fDfxGk1zxvb6La6PqtrqsFp8R/Een3QuriS0aH7ItvdKlvAFhl8yOFokYmMlHKgr6XcfBPxJofxs+Gl7oPiHUYvAnhDRtS07VdO1DXb68utUllW2FpLI0zSG5ePypcyTyFx5mQWLNQB6vRXzb4u+GP7RPhDSfG8vgPxF4H1LVvEPjd9V0xfFGqXz2umaKbKKMWqBbeQxyC4QvsQeWAzHdliKKAP/9k=)"
],
"metadata": {
"id": "dPZGEowzcHCL"
}
},
{
"cell_type": "markdown",
"source": [
"## 1. Collect data\n",
"\n",
"### 1a. Mount your Google Drive\n",
"By running the next code block, you'll link your Google Drive to this notebook, so that it can read the files you stored there."
],
"metadata": {
"id": "mbVvwZEIgiOl"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZzMz9Or8bKTo",
"outputId": "cf731489-c43c-4875-a65f-f2637270dfc8"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"source": [
"### 1b. Locate your VCF file on Drive\n",
"\n",
"By running the next block, you'll tell this notebook where your VCF file is stored in your Google Drive.\n",
"\n",
"**ATTENTION** ⚠! Your file may have a different path (the string with the \"/\"s, like `/content/drive/MyDrive/VCF/2827-22.tsv`) from the one reported here: edit it before running the block!"
],
"metadata": {
"id": "IJNvFwuxhGgM"
}
},
{
"cell_type": "code",
"source": [
"inf = \"/content/drive/MyDrive/VCF/2827-22.tsv\""
],
"metadata": {
"id": "qZSmCT37hEEJ"
},
"execution_count": 2,
"outputs": []
},
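{
"cell_type": "markdown",
"source": [
"*(Optional)* Before going on, you may want to check that the path you typed actually points to a file. The helper below is a minimal sketch added for convenience: `check_input_file` is a hypothetical name, not part of the original notebook. Run the block, then call `check_input_file(inf)`: it raises a readable error if the path is wrong and returns the path otherwise."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import os\n",
"\n",
"def check_input_file(path):\n",
"    # Hypothetical helper (not part of the original notebook):\n",
"    # fail early, with a clear message, if the path does not point to a file\n",
"    if not os.path.isfile(path):\n",
"        raise FileNotFoundError(f\"No file found at {path!r}: double-check the path stored in `inf`\")\n",
"    return path"
],
"metadata": {},
"execution_count": null,
"outputs": []
},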
{
"cell_type": "markdown",
"source": [
"### 1c. Download the panel file and upload it to Colab\n",
"\n",
"Now go to [PanelApp](https://panelapp.genomicsengland.co.uk/panels/) and download your panel of interest as a `tsv` file.\n",
"\n",
"Once you've downloaded it, run the following block: you'll be able to upload the file directly to Colab."
],
"metadata": {
"id": "9vXVylI7i77U"
}
},
{
"cell_type": "code",
"source": [
"from google.colab import files\n",
"uploaded = files.upload()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 73
},
"id": "COsJ0IWwjc55",
"outputId": "ddf59be4-1889-4c5f-d444-d2c259d55cd3"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Saving Congenital myopathy.tsv to Congenital myopathy.tsv\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Now run this block so that the notebook can actually *use* this file (change the name if your panel file is called differently):"
],
"metadata": {
"id": "pnY4yannjtKU"
}
},
{
"cell_type": "code",
"source": [
"panel = \"Congenital myopathy.tsv\""
],
"metadata": {
"id": "cH9CBTDwj26K"
},
"execution_count": 4,
"outputs": []
},
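{
"cell_type": "markdown",
"source": [
"*(Optional)* To see which genes your panel contains, you can read it with pandas. This is a minimal sketch, assuming (as `summarize_out` below does) that the PanelApp export has a `Gene Symbol` column; `load_panel_genes` and the toy panel are hypothetical, made up for illustration. With your real file, call `load_panel_genes(panel)`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"def load_panel_genes(panel_file):\n",
"    # Hypothetical helper (not part of the original notebook):\n",
"    # read the tab-separated panel and return its gene symbols as a set\n",
"    panel_df = pd.read_csv(panel_file, sep=\"\\t\")\n",
"    return set(panel_df[\"Gene Symbol\"].dropna())\n",
"\n",
"# Toy panel with made-up content, just to show what the function returns\n",
"toy_panel = io.StringIO(\"Gene Symbol\\tOther column\\nACTA1\\tx\\nRYR1\\ty\\n\")\n",
"print(sorted(load_panel_genes(toy_panel)))  # ['ACTA1', 'RYR1']"
],
"metadata": {},
"execution_count": null,
"outputs": []
},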
{
"cell_type": "markdown",
"source": [
"## 2. Preprocess the data\n",
"### 2a. Define constants\n",
"\n",
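"Here we define the constants used to filter our VCF file: the variant effects we want to keep (`EFFECTS`), the annotation columns we want to retain (`TO_RETAIN`) and the suffixes of the sample-specific columns to retain (`TO_RETAIN_SPEC`, e.g. `.COV` for coverage)."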
],
"metadata": {
"id": "5cw0u8Rsj-5h"
}
},
{
"cell_type": "code",
"source": [
"EFFECTS = {'start_lost', 'stop_lost', 'stop_gained', 'missense_variant', 'frameshift_variant&stop_lost', 'disruptive_inframe_deletion', 'initiator_codon_variant','frameshift_variant', 'disruptive_inframe_insertion', 'frameshift_variant&start_lost', 'frameshift_variant&stop_gained', 'stop_retained_variant'}\n",
"TO_RETAIN = ['CHR', 'START', 'END', 'REF', 'ALT', 'EFFECT', 'GENE', 'TRANSCRIPT_ID', 'SELECT_CANONICAL', 'EXON_INTRON_NUM', 'HGVS_C', 'HGVS_P', 'CDS_DISTANCE', 'AA_LEN', 'OTHER_TRANSCRIPTS', 'gnomAD_AF_ALL', 'gnomAD_Hom_ALL', 'gnomAD_AF_NFE', 'gnomAD_Hom_NFE', 'CADD_score', 'PolyPhen-2_pred', 'SIFT_pred', 'PseeAC-RF_pred', 'PseeAC-RF_score', 'ClinVar_hotSpot', 'ClinVar_RCV', 'ClinVar_clinical_significance', 'ClinVar_rev_status', 'ClinVar_traits', 'ClinVar_PMIDS', 'Diseases', 'Disease_IDs']\n",
"TO_RETAIN_SPEC = ['.GENO', '.GENO_QUAL', '.AF', '.AO', '.RO', '.COV']"
],
"metadata": {
"id": "91B_J-1QkWLh"
},
"execution_count": 6,
"outputs": []
},
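{
"cell_type": "markdown",
"source": [
"To see how the suffixes in `TO_RETAIN_SPEC` pick out the sample-specific columns, here is a toy example on made-up column names (modelled on those of sample `2827-22`). Matching on the suffix *with* its leading dot means that, for instance, `gnomAD_AF_ALL` is not mistaken for the sample's `.AF` column. The names starting with `toy_` are hypothetical and only used in this example."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Made-up column names, modelled on the sample-specific columns of \"2827-22\"\n",
"toy_columns = [\"CHR\", \"GENE\", \"gnomAD_AF_ALL\", \"2827-22.GENO\", \"2827-22.GENO_QUAL\", \"2827-22.AF\", \"2827-22.COV\"]\n",
"toy_suffixes = [\".GENO\", \".GENO_QUAL\", \".AF\", \".AO\", \".RO\", \".COV\"]  # same values as TO_RETAIN_SPEC\n",
"\n",
"# Same suffix matching as in do_preprocess: one pass over the columns per suffix\n",
"toy_sample_cols = [col for suffix in toy_suffixes for col in toy_columns if col.endswith(suffix)]\n",
"print(toy_sample_cols)  # ['2827-22.GENO', '2827-22.GENO_QUAL', '2827-22.AF', '2827-22.COV']"
],
"metadata": {},
"execution_count": null,
"outputs": []
},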
{
"cell_type": "markdown",
"source": [
"### 2b. Define functions\n",
"\n",
"**What the hell is a Python function??🤯**\n",
"\n",
"A Python function is like an oven: it takes some inputs (a raw cake) and gives back some outputs (a baked cake)🎂... That's all we need to know!\n",
"\n",
"**Why are functions useful for us?**\n",
"\n",
"We'll be using functions to:\n",
"\n",
"- preprocess our data (`do_preprocess`, which takes our raw VCF as input and returns only the columns we want)\n",
"- filter variants by effect, genotype quality, coverage, gnomAD allele frequency and gnomAD homozygote count (`do_filter`, which takes the selected columns as input and returns a table with the gene, effect, predicted pathogenicity (PolyPhen-2 and SIFT) and HGVS_C:HGVS_P notation of each retained variant)\n",
"- compare the filtered variants to the panel genes (`summarize_out`, which takes the table from `do_filter` and keeps only the variants in panel genes, giving a final, nicely summarized table)"
],
"metadata": {
"id": "TAOB3guPkdeG"
}
},
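{
"cell_type": "markdown",
"source": [
"To make the oven analogy concrete, here is a tiny function (a sketch, not part of the pipeline) that takes the values of a single variant as inputs and returns `True` if the variant would survive the filters applied by `do_filter`: an accepted effect, a gnomAD allele frequency of at most 1%, a genotype quality of 99, a coverage of at least 15 and at most 35 homozygotes in gnomAD. `passes_filters` and the example values are hypothetical."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"def passes_filters(effect, gnomad_af, geno_qual, coverage, gnomad_hom, effects):\n",
"    # Hypothetical helper (not part of the original notebook)\n",
"    # Inputs: the annotations of ONE variant (the raw cake)\n",
"    # Output: True if the variant is kept, False otherwise (the baked cake)\n",
"    return (\n",
"        effect in effects\n",
"        and gnomad_af <= 0.01\n",
"        and geno_qual == 99\n",
"        and coverage >= 15\n",
"        and gnomad_hom <= 35\n",
"    )\n",
"\n",
"toy_effects = {\"missense_variant\", \"stop_gained\"}  # small made-up subset of EFFECTS\n",
"print(passes_filters(\"missense_variant\", 0.0001, 99, 108, 0, toy_effects))    # True\n",
"print(passes_filters(\"synonymous_variant\", 0.0001, 99, 108, 0, toy_effects))  # False"
],
"metadata": {},
"execution_count": null,
"outputs": []
},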
{
"cell_type": "code",
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"def eliminate_first_line(infile):\n",
"    # Drop the comment lines starting with \"#\" (if the file begins with one) and\n",
"    # return an in-memory copy, so that the original file on Drive is NOT overwritten\n",
"    with open(infile, \"r\") as fp:\n",
"        lines = fp.readlines()\n",
"    if lines and lines[0].startswith(\"#\"):\n",
"        lines = [line for line in lines if not line.startswith(\"#\")]\n",
"    return io.StringIO(\"\".join(lines))\n",
"\n",
"\n",
"def do_preprocess(infile):\n",
"    # dtype={\"CHR\": str} avoids the mixed-type warning (chromosomes 1-22 vs X, Y)\n",
"    indf = pd.read_csv(eliminate_first_line(infile), delimiter=\"\\t\", dtype={\"CHR\": str})\n",
"    keys = list(indf.keys())\n",
"    # Copy the list: appending to TO_RETAIN itself would add duplicate columns on every re-run\n",
"    to_retain = list(TO_RETAIN)\n",
"    # Add the sample-specific columns (e.g. \"2827-22.COV\") by matching their suffix\n",
"    for suffix in TO_RETAIN_SPEC:\n",
"        for key in keys:\n",
"            if key.endswith(suffix):\n",
"                to_retain.append(key)\n",
"    newdf = indf[to_retain]\n",
"    return newdf\n",
"\n",
"def do_filter(df):\n",
"    newdict = {key: df[key] for key in list(df.keys())}\n",
"    # Sample-specific columns carry the sample ID as a prefix (e.g. \"2827-22.GENO_QUAL\"):\n",
"    # look them up by suffix, so that the function also works for other samples\n",
"    qual_col = next(key for key in newdict if key.endswith(\".GENO_QUAL\"))\n",
"    cov_col = next(key for key in newdict if key.endswith(\".COV\"))\n",
"    newdf = {\"GENE\": [], \"EFFECT\": [], \"Polyphen\": [], \"SIFT\": [], \"HGVS_CP\": []}\n",
"    print(\"Variants before filtering:\", len(df))\n",
"    for i in range(len(df)):\n",
"        # Skip variants with an unwanted effect, gnomAD AF > 1%, genotype quality != 99,\n",
"        # coverage < 15 or more than 35 homozygous individuals in gnomAD\n",
"        if (newdict[\"EFFECT\"][i] not in EFFECTS) or (float(newdict[\"gnomAD_AF_ALL\"][i]) > 0.01) or (int(newdict[qual_col][i]) != 99) or (int(newdict[cov_col][i]) < 15) or (int(newdict[\"gnomAD_Hom_ALL\"][i]) > 35):\n",
"            continue\n",
"        newdf[\"GENE\"].append(newdict[\"GENE\"][i])\n",
"        newdf[\"Polyphen\"].append(newdict[\"PolyPhen-2_pred\"][i])\n",
"        newdf[\"SIFT\"].append(newdict[\"SIFT_pred\"][i])\n",
"        newdf[\"HGVS_CP\"].append(\":\".join([newdict[\"HGVS_C\"][i], newdict[\"HGVS_P\"][i]]))\n",
"        newdf[\"EFFECT\"].append(newdict[\"EFFECT\"][i])\n",
"    print(\"Variants after filtering:\", len(newdf[\"GENE\"]))\n",
"    return pd.DataFrame.from_dict(newdf)\n",
"\n",
"def summarize_out(panel_df, target_df):\n",
"    # Keep only the filtered variants that fall in genes of the panel\n",
"    return target_df[target_df[\"GENE\"].isin(panel_df[\"Gene Symbol\"])]"
],
"metadata": {
"id": "_LATx913nIHL"
},
"execution_count": 13,
"outputs": []
},
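{
"cell_type": "markdown",
"source": [
"For the curious: the row-by-row loop in `do_filter` (plus the gene matching in `summarize_out`) can also be written with pandas boolean masks, which is usually much faster on large tables like this one. The sketch below is an alternative we added, not the notebook's own method: `filter_and_summarize` and the toy table are hypothetical, the sample-specific column names (e.g. `2827-22.GENO_QUAL` and `2827-22.COV`) are passed as arguments, and it assumes the numeric columns can be converted to `float`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"def filter_and_summarize(df, panel_genes, effects, qual_col, cov_col):\n",
"    # Hypothetical sketch (not the notebook's own method):\n",
"    # one boolean mask per criterion, combined with & (element-wise AND)\n",
"    keep = (\n",
"        df[\"EFFECT\"].isin(effects)\n",
"        & ~(df[\"gnomAD_AF_ALL\"].astype(float) > 0.01)  # as in the loop, NaN frequencies are not excluded\n",
"        & (df[qual_col].astype(float) == 99)\n",
"        & (df[cov_col].astype(float) >= 15)\n",
"        & (df[\"gnomAD_Hom_ALL\"].astype(float) <= 35)\n",
"    )\n",
"    kept = df[keep]\n",
"    out = pd.DataFrame({\n",
"        \"GENE\": kept[\"GENE\"],\n",
"        \"EFFECT\": kept[\"EFFECT\"],\n",
"        \"Polyphen\": kept[\"PolyPhen-2_pred\"],\n",
"        \"SIFT\": kept[\"SIFT_pred\"],\n",
"        \"HGVS_CP\": kept[\"HGVS_C\"] + \":\" + kept[\"HGVS_P\"],\n",
"    })\n",
"    # Keep only the variants falling in genes of the panel\n",
"    return out[out[\"GENE\"].isin(panel_genes)].reset_index(drop=True)\n",
"\n",
"# Made-up three-variant table, with a hypothetical sample ID \"S1\"\n",
"toy_df = pd.DataFrame({\n",
"    \"GENE\": [\"ACTA1\", \"RYR1\", \"OR4F5\"],\n",
"    \"EFFECT\": [\"missense_variant\", \"synonymous_variant\", \"missense_variant\"],\n",
"    \"gnomAD_AF_ALL\": [0.0001, 0.0002, 0.0001],\n",
"    \"gnomAD_Hom_ALL\": [0, 0, 0],\n",
"    \"S1.GENO_QUAL\": [99.0, 99.0, 99.0],\n",
"    \"S1.COV\": [40, 50, 108],\n",
"    \"PolyPhen-2_pred\": [\"D\", \"B\", \"B\"],\n",
"    \"SIFT_pred\": [\"D\", \"T\", \"T\"],\n",
"    \"HGVS_C\": [\"c.1A>G\", \"c.2C>T\", \"c.3A>G\"],\n",
"    \"HGVS_P\": [\"p.Met1?\", \"p.Ala1=\", \"p.Ile3Val\"],\n",
"})\n",
"toy_result = filter_and_summarize(toy_df, {\"ACTA1\", \"RYR1\"}, {\"missense_variant\"}, \"S1.GENO_QUAL\", \"S1.COV\")\n",
"print(toy_result)  # only ACTA1 is left: RYR1 is synonymous, OR4F5 is not in the panel"
],
"metadata": {},
"execution_count": null,
"outputs": []
},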
{
"cell_type": "markdown",
"source": [
"## 3. Finally, get your results\n",
"\n",
"Now that we have defined everything, it's time to get our results!\n",
"\n",
"Let's first **select our columns of interest**:😎"
],
"metadata": {
"id": "AJ00PQOQnaM8"
}
},
{
"cell_type": "code",
"source": [
"df = do_preprocess(inf)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IzOFbV2enZUS",
"outputId": "75b14500-b53a-421a-d156-c5d1e3057f30"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"<ipython-input-9-7f2991e6aecd>:19: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" indf = pd.read_csv(eliminate_first_line(infile), delimiter=\"\\t\")\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"(Let's see what the file looks like now🧐)"
],
"metadata": {
"id": "W8c8peT5oCph"
}
},
{
"cell_type": "code",
"source": [
"df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 461
},
"id": "4kR-pN0DoBt8",
"outputId": "14778977-b9bd-430c-835a-e1f1805a950d"
},
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" CHR START END REF ALT \\\n",
"0 1 69270 69270 A G \n",
"1 1 69511 69511 A G \n",
"2 1 69897 69897 T C \n",
"3 1 183358 183358 G C \n",
"4 1 183937 183937 G A \n",
"... .. ... ... ... .. \n",
"133833 Y 14748901 14748904 AAAG - \n",
"133834 Y 14840785 14840785 C T \n",
"133835 Y 19705901 19705901 A G \n",
"133836 Y 19732748 19732748 A - \n",
"133837 Y 21540939 21540939 A G \n",
"\n",
" EFFECT GENE TRANSCRIPT_ID \\\n",
"0 synonymous_variant OR4F5 NM_001005484.2 \n",
"1 missense_variant OR4F5 NM_001005484.2 \n",
"2 synonymous_variant OR4F5 NM_001005484.2 \n",
"3 intron_variant DDX11L17 NR_148357.1 \n",
"4 non_coding_transcript_exon_variant DDX11L17 NR_148357.1 \n",
"... ... ... ... \n",
"133833 intron_variant NLGN4Y NM_001365584.1 \n",
"133834 synonymous_variant NLGN4Y NM_001365584.1 \n",
"133835 3_prime_UTR_variant KDM5D NM_004653.5 \n",
"133836 splice_region_variant&intron_variant KDM5D NM_004653.5 \n",
"133837 intron_variant RBMY1A1 NM_005058.4 \n",
"\n",
" SELECT_CANONICAL EXON_INTRON_NUM ... ClinVar_traits ClinVar_PMIDS \\\n",
"0 True 3 ... . . \n",
"1 True 3 ... . . \n",
"2 True 3 ... . . \n",
"3 False 2 ... . . \n",
"4 False 3 ... . . \n",
"... ... ... ... ... ... \n",
"133833 True 4 ... . . \n",
"133834 True 7 ... not provided . \n",
"133835 True 27 ... . . \n",
"133836 True 8 ... . . \n",
"133837 True 4 ... . . \n",
"\n",
" Diseases Disease_IDs 2827-22.GENO 2827-22.GENO_QUAL 2827-22.AF \\\n",
"0 . . hom 42.0 1.000000 \n",
"1 . . hom 99.0 1.000000 \n",
"2 . . hom 9.0 1.000000 \n",
"3 . . het 60.0 0.117647 \n",
"4 . . het 49.0 0.123711 \n",
"... ... ... ... ... ... \n",
"133833 . . het 39.0 0.500000 \n",
"133834 . . het 99.0 0.117073 \n",
"133835 . . hom 99.0 1.000000 \n",
"133836 . . het 99.0 0.292308 \n",
"133837 . . het 50.0 0.152174 \n",
"\n",
" 2827-22.AO 2827-22.RO 2827-22.COV \n",
"0 14 0 14 \n",
"1 108 0 108 \n",
"2 3 0 3 \n",
"3 8 60 68 \n",
"4 12 85 97 \n",
"... ... ... ... \n",
"133833 1 1 2 \n",
"133834 24 181 205 \n",
"133835 33 0 33 \n",
"133836 19 35 65 \n",
"133837 7 39 46 \n",
"\n",
"[133838 rows x 38 columns]"
],
"text/html": [
"\n",
" <div id=\"df-0c7421d0-3dcd-45b3-b3e3-bec8f769d96b\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CHR</th>\n",
" <th>START</th>\n",
" <th>END</th>\n",
" <th>REF</th>\n",
" <th>ALT</th>\n",
" <th>EFFECT</th>\n",
" <th>GENE</th>\n",
" <th>TRANSCRIPT_ID</th>\n",
" <th>SELECT_CANONICAL</th>\n",
" <th>EXON_INTRON_NUM</th>\n",
" <th>...</th>\n",
" <th>ClinVar_traits</th>\n",
" <th>ClinVar_PMIDS</th>\n",
" <th>Diseases</th>\n",
" <th>Disease_IDs</th>\n",
" <th>2827-22.GENO</th>\n",
" <th>2827-22.GENO_QUAL</th>\n",
" <th>2827-22.AF</th>\n",
" <th>2827-22.AO</th>\n",
" <th>2827-22.RO</th>\n",
" <th>2827-22.COV</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>69270</td>\n",
" <td>69270</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>synonymous_variant</td>\n",
" <td>OR4F5</td>\n",
" <td>NM_001005484.2</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>hom</td>\n",
" <td>42.0</td>\n",
" <td>1.000000</td>\n",
" <td>14</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>69511</td>\n",
" <td>69511</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>missense_variant</td>\n",
" <td>OR4F5</td>\n",
" <td>NM_001005484.2</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>hom</td>\n",
" <td>99.0</td>\n",
" <td>1.000000</td>\n",
" <td>108</td>\n",
" <td>0</td>\n",
" <td>108</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>69897</td>\n",
" <td>69897</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>synonymous_variant</td>\n",
" <td>OR4F5</td>\n",
" <td>NM_001005484.2</td>\n",
" <td>True</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>hom</td>\n",
" <td>9.0</td>\n",
" <td>1.000000</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>183358</td>\n",
" <td>183358</td>\n",
" <td>G</td>\n",
" <td>C</td>\n",
" <td>intron_variant</td>\n",
" <td>DDX11L17</td>\n",
" <td>NR_148357.1</td>\n",
" <td>False</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>60.0</td>\n",
" <td>0.117647</td>\n",
" <td>8</td>\n",
" <td>60</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>183937</td>\n",
" <td>183937</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>non_coding_transcript_exon_variant</td>\n",
" <td>DDX11L17</td>\n",
" <td>NR_148357.1</td>\n",
" <td>False</td>\n",
" <td>3</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>49.0</td>\n",
" <td>0.123711</td>\n",
" <td>12</td>\n",
" <td>85</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133833</th>\n",
" <td>Y</td>\n",
" <td>14748901</td>\n",
" <td>14748904</td>\n",
" <td>AAAG</td>\n",
" <td>-</td>\n",
" <td>intron_variant</td>\n",
" <td>NLGN4Y</td>\n",
" <td>NM_001365584.1</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>39.0</td>\n",
" <td>0.500000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133834</th>\n",
" <td>Y</td>\n",
" <td>14840785</td>\n",
" <td>14840785</td>\n",
" <td>C</td>\n",
" <td>T</td>\n",
" <td>synonymous_variant</td>\n",
" <td>NLGN4Y</td>\n",
" <td>NM_001365584.1</td>\n",
" <td>True</td>\n",
" <td>7</td>\n",
" <td>...</td>\n",
" <td>not provided</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>99.0</td>\n",
" <td>0.117073</td>\n",
" <td>24</td>\n",
" <td>181</td>\n",
" <td>205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133835</th>\n",
" <td>Y</td>\n",
" <td>19705901</td>\n",
" <td>19705901</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>3_prime_UTR_variant</td>\n",
" <td>KDM5D</td>\n",
" <td>NM_004653.5</td>\n",
" <td>True</td>\n",
" <td>27</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>hom</td>\n",
" <td>99.0</td>\n",
" <td>1.000000</td>\n",
" <td>33</td>\n",
" <td>0</td>\n",
" <td>33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133836</th>\n",
" <td>Y</td>\n",
" <td>19732748</td>\n",
" <td>19732748</td>\n",
" <td>A</td>\n",
" <td>-</td>\n",
" <td>splice_region_variant&amp;intron_variant</td>\n",
" <td>KDM5D</td>\n",
" <td>NM_004653.5</td>\n",
" <td>True</td>\n",
" <td>8</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>99.0</td>\n",
" <td>0.292308</td>\n",
" <td>19</td>\n",
" <td>35</td>\n",
" <td>65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133837</th>\n",
" <td>Y</td>\n",
" <td>21540939</td>\n",
" <td>21540939</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>intron_variant</td>\n",
" <td>RBMY1A1</td>\n",
" <td>NM_005058.4</td>\n",
" <td>True</td>\n",
" <td>4</td>\n",
" <td>...</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>het</td>\n",
" <td>50.0</td>\n",
" <td>0.152174</td>\n",
" <td>7</td>\n",
" <td>39</td>\n",
" <td>46</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>133838 rows × 38 columns</p>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "df"
}
},
"metadata": {},
"execution_count": 11
}
]
},
{
"cell_type": "markdown",
"source": [
"Now we filter out all the variants that do not meet the following criteria:\n",
"\n",
"- Frequency below 1% in gnomAD\n",
"- Fewer than 35 registered homozygous individuals\n",
"- Quality score of 99.0\n",
"- Coverage of more than 15 reads\n",
"- Exonic variant"
],
"metadata": {
"id": "c7TWiYO0qFBV"
}
},
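{
"cell_type": "markdown",
"source": [
"The criteria above can be expressed as a single pandas boolean mask. The snippet below is a minimal sketch under stated assumptions, not the `do_filter` function used in this notebook: `gnomAD_AF`, `gnomAD_HOM` and `IS_EXONIC` are **hypothetical** column names (only `2827-22.GENO_QUAL` and `2827-22.COV` appear in the table above), and `.` entries are assumed to mean *absent from gnomAD*, i.e. a frequency of 0.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def filter_variants(df: pd.DataFrame) -> pd.DataFrame:\n",
"    # Hypothetical column names: rename them to match your TSV headers\n",
"    af = pd.to_numeric(df['gnomAD_AF'], errors='coerce').fillna(0)    # '.' -> NaN -> 0\n",
"    hom = pd.to_numeric(df['gnomAD_HOM'], errors='coerce').fillna(0)  # '.' -> NaN -> 0\n",
"    mask = (\n",
"        (af < 0.01)                            # frequency below 1% in gnomAD\n",
"        & (hom < 35)                           # fewer than 35 homozygous individuals\n",
"        & (df['2827-22.GENO_QUAL'] == 99.0)    # quality score of 99.0\n",
"        & (df['2827-22.COV'] > 15)             # more than 15 reads of coverage\n",
"        & df['IS_EXONIC'].astype(bool)         # exonic variants only\n",
"    )\n",
"    return df[mask]\n",
"\n",
"# Toy example: only the first row passes every filter\n",
"toy = pd.DataFrame({\n",
"    'gnomAD_AF': ['0.002', '.', '0.3'],\n",
"    'gnomAD_HOM': ['0', '.', '120'],\n",
"    '2827-22.GENO_QUAL': [99.0, 99.0, 99.0],\n",
"    '2827-22.COV': [40, 10, 50],\n",
"    'IS_EXONIC': [True, True, True],\n",
"})\n",
"print(len(filter_variants(toy)))  # 1\n",
"```\n",
"\n",
"Treating missing gnomAD values as 0 keeps variants never observed in the population, a common choice when looking for rare variants; remove the `fillna(0)` calls if you would rather discard them."
],
"metadata": {}
},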
{
"cell_type": "code",
"source": [
"newdf = do_filter(df)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "w8xx0uz8qfDg",
"outputId": "0c5de803-5485-4356-f1d7-30891b736df8"
},
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Variants before filtering: 133838\n",
"Variants after filtering: 1730\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"(Let's see what the file looks like now🧐)"
],
"metadata": {
"id": "0X2pbPo6q5Ti"
}
},
{
"cell_type": "code",
"source": [
"newdf"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 423
},
"id": "O_Vtih6bq2q4",
"outputId": "2bbc9125-260c-426f-c1c7-e48e6d357d18"
},
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" GENE EFFECT Polyphen SIFT \\\n",
"0 CDK11A disruptive_inframe_deletion . . \n",
"1 CCDC27 missense_variant D B \n",
"2 RPL22 missense_variant B B \n",
"3 ZBTB48 missense_variant B B \n",
"4 FBXO6 missense_variant B B \n",
"... ... ... ... ... \n",
"1725 CNGA2 missense_variant B B \n",
"1726 PNMA3 missense_variant . . \n",
"1727 RPL10 missense_variant . . \n",
"1728 RPL10 missense_variant B D \n",
"1729 PLXNA3 missense_variant . . \n",
"\n",
" HGVS_CP \n",
"0 c.933_941delGGAGGAGGA:p.Glu312_Glu314del \n",
"1 c.1513A>C:p.Asn505His \n",
"2 c.20T>C:p.Leu7Pro \n",
"3 c.679G>A:p.Gly227Ser \n",
"4 c.820C>A:p.Gln274Lys \n",
"... ... \n",
"1725 c.338G>T:p.Gly113Val \n",
"1726 c.1130C>T:p.Ala377Val \n",
"1727 c.605G>A:p.Ser202Asn \n",
"1728 c.628C>T:p.Arg210Trp \n",
"1729 c.2589G>C:p.Glu863Asp \n",
"\n",
"[1730 rows x 5 columns]"
],
"text/html": [
"\n",
" <div id=\"df-0a3fbd4d-d322-4c72-9c0b-25d1e4dee587\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>GENE</th>\n",
" <th>EFFECT</th>\n",
" <th>Polyphen</th>\n",
" <th>SIFT</th>\n",
" <th>HGVS_CP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CDK11A</td>\n",
" <td>disruptive_inframe_deletion</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>c.933_941delGGAGGAGGA:p.Glu312_Glu314del</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CCDC27</td>\n",
" <td>missense_variant</td>\n",
" <td>D</td>\n",
" <td>B</td>\n",
" <td>c.1513A&gt;C:p.Asn505His</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>RPL22</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.20T&gt;C:p.Leu7Pro</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>ZBTB48</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.679G&gt;A:p.Gly227Ser</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>FBXO6</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.820C&gt;A:p.Gln274Lys</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1725</th>\n",
" <td>CNGA2</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.338G&gt;T:p.Gly113Val</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1726</th>\n",
" <td>PNMA3</td>\n",
" <td>missense_variant</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>c.1130C&gt;T:p.Ala377Val</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1727</th>\n",
" <td>RPL10</td>\n",
" <td>missense_variant</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>c.605G&gt;A:p.Ser202Asn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1728</th>\n",
" <td>RPL10</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>D</td>\n",
" <td>c.628C&gt;T:p.Arg210Trp</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1729</th>\n",
" <td>PLXNA3</td>\n",
" <td>missense_variant</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>c.2589G&gt;C:p.Glu863Asp</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1730 rows × 5 columns</p>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "newdf",
"summary": "{\n \"name\": \"newdf\",\n \"rows\": 1730,\n \"fields\": [\n {\n \"column\": \"GENE\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 879,\n \"samples\": [\n \"TRBV13\",\n \"DSP\",\n \"LENG1\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"EFFECT\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 8,\n \"samples\": [\n \"missense_variant\",\n \"stop_lost\",\n \"disruptive_inframe_deletion\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Polyphen\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \".\",\n \"D\",\n \"B\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"SIFT\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \".\",\n \"B\",\n \"D\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HGVS_CP\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1701,\n \"samples\": [\n \"c.1141G>T:p.Glu381*\",\n \"c.646G>A:p.Glu216Lys\",\n \"c.1256C>A:p.Ala419Glu\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "markdown",
"source": [
"Finally, we compare the genes in the panel with the genes carrying our filtered variants and keep only the shared ones:🚀"
],
"metadata": {
"id": "pOIcExt8rGJP"
}
},
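{
"cell_type": "markdown",
"source": [
"Conceptually, this step is a membership test: a variant is kept only if its gene symbol also appears in the panel. A minimal sketch, assuming (hypothetically) that the panel TSV stores gene symbols in a column called `GENE`; the notebook's `summarize_out` may work differently:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def keep_panel_genes(panel: pd.DataFrame, variants: pd.DataFrame, panel_col: str = 'GENE') -> pd.DataFrame:\n",
"    # Normalise symbols (strip whitespace, upper-case) so 'ryr3 ' matches 'RYR3'\n",
"    panel_genes = set(panel[panel_col].astype(str).str.strip().str.upper())\n",
"    in_panel = variants['GENE'].astype(str).str.strip().str.upper().isin(panel_genes)\n",
"    # Boolean indexing keeps the original row index of each retained variant\n",
"    return variants[in_panel]\n",
"\n",
"# Toy data, for illustration only\n",
"toy_panel = pd.DataFrame({'GENE': ['VCP', 'ryr3 ', 'TTN']})\n",
"toy_variants = pd.DataFrame({'GENE': ['VCP', 'RYR3', 'BRCA2']})\n",
"print(keep_panel_genes(toy_panel, toy_variants)['GENE'].tolist())  # ['VCP', 'RYR3']\n",
"```"
],
"metadata": {}
},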
{
"cell_type": "code",
"source": [
"dff = pd.read_csv(panel, delimiter=\"\\t\")\n",
"newdff = summarize_out(dff, newdf)"
],
"metadata": {
"id": "5eKCIBW5rSaU"
},
"execution_count": 16,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's see the output! Are you thrilled?😁"
],
"metadata": {
"id": "zUhPlSUwrcrC"
}
},
{
"cell_type": "code",
"source": [
"newdff"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 237
},
"id": "QhNGmh9Auzmj",
"outputId": "254066bc-af4f-4c01-926a-b74ca745dd71"
},
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" GENE EFFECT Polyphen SIFT HGVS_CP\n",
"391 COL12A1 missense_variant B B c.6590C>T:p.Thr2197Ile\n",
"828 VCP missense_variant D B c.469G>A:p.Gly157Arg\n",
"1141 ISCU missense_variant . . c.19_20delTTinsGG:p.Phe7Gly\n",
"1279 RYR3 missense_variant D B c.11545A>C:p.Asn3849His\n",
"1412 PIEZO2 missense_variant . B c.147A>C:p.Lys49Asn\n",
"1419 EPG5 missense_variant B B c.1511A>C:p.His504Pro"
],
"text/html": [
"\n",
" <div id=\"df-85d1dd1b-fd77-419c-8a38-2f5a0f7bc2ee\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>GENE</th>\n",
" <th>EFFECT</th>\n",
" <th>Polyphen</th>\n",
" <th>SIFT</th>\n",
" <th>HGVS_CP</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>391</th>\n",
" <td>COL12A1</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.6590C&gt;T:p.Thr2197Ile</td>\n",
" </tr>\n",
" <tr>\n",
" <th>828</th>\n",
" <td>VCP</td>\n",
" <td>missense_variant</td>\n",
" <td>D</td>\n",
" <td>B</td>\n",
" <td>c.469G&gt;A:p.Gly157Arg</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1141</th>\n",
" <td>ISCU</td>\n",
" <td>missense_variant</td>\n",
" <td>.</td>\n",
" <td>.</td>\n",
" <td>c.19_20delTTinsGG:p.Phe7Gly</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1279</th>\n",
" <td>RYR3</td>\n",
" <td>missense_variant</td>\n",
" <td>D</td>\n",
" <td>B</td>\n",
" <td>c.11545A&gt;C:p.Asn3849His</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1412</th>\n",
" <td>PIEZO2</td>\n",
" <td>missense_variant</td>\n",
" <td>.</td>\n",
" <td>B</td>\n",
" <td>c.147A&gt;C:p.Lys49Asn</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1419</th>\n",
" <td>EPG5</td>\n",
" <td>missense_variant</td>\n",
" <td>B</td>\n",
" <td>B</td>\n",
" <td>c.1511A&gt;C:p.His504Pro</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "newdff",
"summary": "{\n \"name\": \"newdff\",\n \"rows\": 6,\n \"fields\": [\n {\n \"column\": \"GENE\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"COL12A1\",\n \"VCP\",\n \"EPG5\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"EFFECT\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"missense_variant\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Polyphen\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"B\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"SIFT\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \".\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"HGVS_CP\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"c.6590C>T:p.Thr2197Ile\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"source": [
"Let's generate ClinVar search links for these candidate variants: copy-paste each one into your browser!🤠"
],
"metadata": {
"id": "lc6U08TBu_t8"
}
},
{
"cell_type": "code",
"source": [
"# Print one ClinVar search URL per candidate variant (':' in the HGVS string is percent-encoded as %3A)\n",
"for gene, hgvs in zip(newdff['GENE'], newdff['HGVS_CP']):\n",
" print(f\"https://www.ncbi.nlm.nih.gov/clinvar/?term={gene}+{hgvs.replace(':', '%3A')}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WJ8umyHjrcPN",
"outputId": "598951c7-05a7-418c-a75a-7d8ccc7f5213"
},
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"https://www.ncbi.nlm.nih.gov/clinvar/?term=COL12A1+c.6590C>T%3Ap.Thr2197Ile\n",
"https://www.ncbi.nlm.nih.gov/clinvar/?term=VCP+c.469G>A%3Ap.Gly157Arg\n",
"https://www.ncbi.nlm.nih.gov/clinvar/?term=ISCU+c.19_20delTTinsGG%3Ap.Phe7Gly\n",
"https://www.ncbi.nlm.nih.gov/clinvar/?term=RYR3+c.11545A>C%3Ap.Asn3849His\n",
"https://www.ncbi.nlm.nih.gov/clinvar/?term=PIEZO2+c.147A>C%3Ap.Lys49Asn\n",
"https://www.ncbi.nlm.nih.gov/clinvar/?term=EPG5+c.1511A>C%3Ap.His504Pro\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"From here, you can go directly to OMIM and VarSome! If you also want to search [Franklin](https://franklin.genoox.com/clinical-db/home), run the block below and use the printed queries:😶‍🌫️"
],
"metadata": {
"id": "0NGoVY3v1CR9"
}
},
{
"cell_type": "code",
"source": [
"# Print Franklin-style queries (GENE:c.change), keeping only the coding part before ':'\n",
"for gene, hgvs in zip(newdff['GENE'], newdff['HGVS_CP']):\n",
" print(f\"{gene}:{hgvs.split(':')[0]}\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Tamt7XRA1PYw",
"outputId": "860e66c4-17f9-4f79-da53-098c307969b3"
},
"execution_count": 28,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"COL12A1:c.6590C>T\n",
"VCP:c.469G>A\n",
"ISCU:c.19_20delTTinsGG\n",
"RYR3:c.11545A>C\n",
"PIEZO2:c.147A>C\n",
"EPG5:c.1511A>C\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# HAVE FUN!❤️\n",
"\n",
"\n",
"And give a little star to this [Gist on GitHub](#)🥰"
],
"metadata": {
"id": "7kZtIHvr2Pp_"
}
}
]
}