There is a huge range of tutorials and books available for learning Python. There are a couple of things to watch out for when selecting learning material:
- Python 2 and 3 are very similar but subtly different languages. This can be a pain point when first learning Python, so be aware of which version a tutorial or book is demonstrating. I highly recommend sticking to Python 3, as Python 2 reached its end of life in January 2020 and is no longer maintained.
- Python is a very general-purpose language (compared to R, for example), so a lot of material focuses on web development, games and so on. Coding style can differ between these areas, so try to find learning material from an appropriate domain.
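To illustrate the first point, here is a minimal sketch of three behaviours that commonly trip people up when a tutorial targets the other major version. The code is written for Python 3, and the comments describe what Python 2 does instead. The variable names are only illustrative.

```python
# Written for Python 3; comments note the Python 2 behaviour.

# 1. print is a function in Python 3.
#    In Python 2 it is a statement, written as: print "hello"
print("hello")

# 2. `/` always performs true division in Python 3.
#    In Python 2, dividing two integers with `/` floors the result.
true_div = 7 / 2     # 3.5 in Python 3 (3 in Python 2)
floor_div = 7 // 2   # 3 in both versions

# 3. Text strings are Unicode in Python 3, and bytes are a separate type.
text = "café"
raw = text.encode("utf-8")   # explicit conversion from text to bytes
n_chars = len(text)          # counts characters
n_bytes = len(raw)           # counts bytes; "é" takes two bytes in UTF-8
```

If example code from a book fails with a syntax error on a `print` line, that is usually a sign the material was written for Python 2.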
Below is a list of resources that are generally more focussed on science and numerical programming than on other topics (e.g. web development). I have used parts of most of these or otherwise trust the source of the material.
- The Python Tutorial is the official tutorial provided by the Python Software Foundation.
- Programming With Python (a Software Carpentry course) is a good introductory lesson.
- A Byte of Python is an online textbook for complete beginners. It is available for Python 2 and 3. It is probably one of the best places to start for someone who is new to programming.
- Think Python is a textbook for learning Python (2 or 3). The book is freely available online and you can also purchase a hard copy. Green Tea Press offers a variety of free programming books, including some domain-specific Python books:
  - Think Stats
  - Think Bayes
  - Think DSP (Digital Signal Processing)
- Numpy provides fast arrays and matrices for numerical computation. It is a foundation that many other science libraries are built on.
- Pandas (which is built on top of Numpy) provides highly flexible dataframes for Python (similar to those in R). Dataframes are a common way to handle tabular data (think spreadsheets) in programming. If you have data in Excel, CSV or a similar format, then Pandas will likely be the best way of getting that data into Python.
- Scipy is built on top of Numpy and provides many options for numerical computation and statistics. It is a large library containing many smaller submodules, so be sure to read the documentation carefully.
- Biopython provides many ways to read and analyse biological data formats in Python. The Biopython Tutorial and Cookbook is a good place to get started. This library is a good starting point for biological data analysis, but there are other (often better) options available, some of which are in the advanced list.
- Matplotlib is the grandfather of plotting libraries in Python. It can be quite fiddly, so jump to its pyplot submodule for simpler plotting or else look at using Seaborn.
- Seaborn is a straightforward library for creating 'out of the box' plots based on matplotlib.
- Intermediate Python is a free online book that focusses on some less obvious (but highly useful) aspects of the language, including generators, decorators, comprehensions and lambda functions. It also covers some of the modules that ship with a standard Python installation but are often not touched on in other books, e.g. the collections module. The book is quite brief on most topics and is still being written.
- The Hitchhiker's Guide to Python is a free blog-cum-book that contains some real gems of knowledge that are often missing from textbooks, for example how to structure a project.
- From Python to Numpy is a free online book for more advanced Numpy usage. It provides many examples of taking slow Python code and rewriting it as fast vectorised Numpy code. It also has an extensive bibliography of links to other Numpy books and tutorials.
- Python Data Science Handbook is a free online book that focusses on getting analysis done quickly and effectively in Python. There is also a paid 540-page hard copy.
- Fluent Python is a paid 750-page textbook that gives a deep understanding of how the language works and covers many advanced topics. It is generally focussed on the language and object-oriented programming rather than scientific analysis.
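For readers unsure what the "less obvious" language features named in the Intermediate Python entry look like, the following is a small sketch using only the standard library. The function names (`count_calls`, `gc_fraction`, `kmers`) and the example sequences are made up for illustration and do not come from any of the books listed.

```python
from collections import Counter
from functools import wraps


def count_calls(func):
    """Decorator: record how many times `func` has been called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


def kmers(seq, k):
    """Generator: lazily yield each overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


reads = ["ATGC", "GGCC", "ATAT"]  # made-up example data

# List comprehension: apply a function to every item in a list.
gc_values = [gc_fraction(r) for r in reads]

# Lambda: a small anonymous function, used here as a sort key.
by_gc = sorted(reads, key=lambda r: gc_fraction(r))

# collections.Counter: tally the items produced by the generator.
kmer_counts = Counter(kmers("ATATAT", 2))
```

None of these features is required to get started, but they appear often in other people's code, so recognising them helps when reading library source or examples.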
- Biology:
  - Pysam, Pyvcf and Pybedtools for dealing with NGS data files.
  - Scikit-bio, a new library with a similar scope to Biopython, which also has a cookbook.
  - Scikit-allel for analysis of genetic variation.
  - Dendropy for phylogenetic analysis.
  - ETEToolkit for phylogenetic visualisation.
- Numerical Computing:
- Machine Learning and Statistics:
  - Scikit-learn is the go-to solution for training simple models in Python.
  - Theano is a numerical/ML library that other libraries often build on.
  - PyMC3 for probabilistic programming (Bayesian inference with Markov chain Monte Carlo sampling) in Python. This library has its own free textbook called Probabilistic Programming & Bayesian Methods for Hackers.
  - PyTorch for deep learning in Python.
  - TensorFlow, another deep learning library for Python.
- Visualisation:
  - Plotly for interactive web-based plots with Python.
  - Bokeh for interactive web-based plots with Python (similar to Plotly).
  - Datashader for creating attractive, legible scatter/line plots of millions of points in seconds. It can be integrated with Bokeh for some level of interactivity.
  - HoloViews is a high-level wrapper for Bokeh, Datashader and matplotlib. It is the easiest library for creating interactive plots based on Bokeh and Datashader.