@C-H-Simpson
Last active November 8, 2022 11:57
My opinionated recommendations of software for researchers

Software for researchers

by Charles Simpson, 2022-11-08

Python

  • When you use Python you are usually importing a lot of code from other packages.
  • Conda is a tool for managing your Python packages and environments.
  • When using conda, people often have problems with packages conflicting, or conda taking a very long time to solve dependencies. This can be avoided with some simple tricks.

mamba not conda

mamba is a faster re-implementation of conda. I recommend using it as much as possible. I recommend setting up your environment as follows:

  1. Start with a miniconda install, then open up a terminal.
  2. In the base environment (get into it by doing conda activate base), do conda install mamba -c conda-forge.
  3. Use mamba to create your working environment e.g. mamba create -p ./env (or mamba env create -f environment.yml -p ./env if you have an environment file). I tend to rely on prefixes rather than names, as it is easier if you have lots of projects. This means the environment is identified by its directory rather than a unique name. You can activate it like conda activate ./env.
  4. Always install as much as possible from conda-forge rather than the default Anaconda channel. Mixing channels is likely to lead to conflicts. You can do this by specifying -c conda-forge when you install a package.
  5. Use mamba from the base environment to install packages into your working environment e.g. for scipy and seaborn do mamba install scipy seaborn -c conda-forge -p ./env.

Following these instructions I have never had an issue with my environment.

  • You can still install pip packages if you need to, but it's better to use mamba when you can. Just activate the working environment and do pip install. You might need to install pip using mamba from the base environment first.
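If you are ever unsure which environment a script or kernel is actually running in, a quick check from inside Python can help. This is a minimal sketch using only the standard library; the function name describe_environment is my own, and CONDA_PREFIX is the variable that conda activate sets (it will be missing if no environment is active).

```python
import os
import sys
from pathlib import Path


def describe_environment() -> dict:
    """Report which Python environment the current interpreter belongs to."""
    return {
        # Full path of the running Python executable
        "interpreter": sys.executable,
        # Root directory of the environment this interpreter lives in
        "prefix": str(Path(sys.prefix).resolve()),
        # Set by `conda activate`; None if no conda environment is active
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If you activated ./env, the prefix should point at that directory.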

VSCode not Jupyterlab

  • I don't generally use Jupyter Notebooks, but I do use Jupyter kernels in VSCode.
  • I mainly do this because I find the Jupyter Notebook / Lab editor a bit inconvenient and annoying. Its autocomplete isn't very good, and I like being able to set up keyboard shortcuts and use vim keybindings easily.
  • Furthermore, there are problems with Notebooks.
    • You can run the cells out of order.
    • It is hard to version control them.
    • It's hard to copy and paste multiple cells of code from them.
  • VSCode is a proper IDE. But, you can execute your python code interactively in a jupyter kernel. This gives you a notebook-like experience without any of the downsides. You can also save the output as a notebook, which can be really handy!
  • Set it up like this:
    1. Install VSCode
    2. Install the ms-python extension.
    3. Install the jupyter extension
    4. Install ipykernel into your working conda environment (e.g. mamba install ipykernel -c conda-forge -p ./env) so that VSCode can start a kernel there. It is also possible to connect to a kernel on a remote server using port forwarding!
    5. In the python script you want to be interactive, put # %% everywhere you want a new cell.
    6. Press Ctrl+Enter to run a cell.
  • There is a conflict between the keybindings for ms-python and jupyter. Shift+Enter is supposed to run the current cell and move down to the next one in jupyter, but it is also bound by ms-python. Go into the keybindings settings, search for Shift+Enter, and disable the conflicting binding.
  • What about Spyder:
    • Spyder does have the # %% notation for code cells, which is nice.
    • In VSCode you can set your own keyboard shortcuts. I love keyboard shortcuts!
    • I don't think Spyder lets you export a notebook from your console output whereas VSCode+Jupyter does.
    • I think the autocomplete etc. is a bit better in VSCode.
    • VSCode has some great extensions.
    • VSCode can handle other programming and markup languages, it isn't just for python.
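As a rough illustration of the # %% notation, here is a small script split into three cells. The data and variable names are made up for the example; it uses only the standard library, so it runs as a normal script as well as cell by cell.

```python
# %% Imports and some made-up data
import statistics

temperatures_c = [14.2, 15.1, 13.8, 16.4, 15.0]

# %% Compute a summary (re-run just this cell after changing the data)
mean_temp = statistics.mean(temperatures_c)
spread = statistics.stdev(temperatures_c)

# %% Report
print(f"mean = {mean_temp:.2f} C, stdev = {spread:.2f} C")
```

Because the cell markers are ordinary comments, the file stays a plain .py file that is easy to version control.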

Github Desktop is really easy to use

  • Git is a tool for version control. This means it keeps track of your changes between versions of a document or code.
  • It works best with plaintext files, e.g. some code, as opposed to binary files, e.g. a PNG. Notebooks are technically text (JSON), but because the outputs are embedded in them, their diffs are hard to read.
  • If you're on GNU/Linux git is probably already installed, and you're likely to be familiar enough with the command line that you can use it there. Most people online, and the Software Carpentry course, will tell you to use the command line version.
  • But if you're on Windows: everyone hates the command line in Windows. I recommend using Github Desktop.
  • The way it works is you create a repository that is basically a project folder that tracks changes. Whenever you have made a change to your code, go to the Github window and commit the change, with a short note about what you did.
  • If you want to go back to an earlier version of your work, you can do so easily using History.
  • If you want to have different versions of your work, it's easy to keep track of the differences between them using Branches.
  • If you Push origin regularly, then your work is backed up remotely, and you don't have to worry about losing work!
  • If you want to collaborate on code or a document with someone, you can each make your own changes then merge them together.
  • If you want to work on the same code on multiple machines, you can keep them in sync easily.
  • The first thing I do when starting a new project is create a private Github repository for it. I start tracking changes from the very beginning. The sooner you get used to doing this the better!

Markdown not $\LaTeX$ or MS Word

You're writing a paper, what do you use?

  • Lots of people will tell you to use $\LaTeX$. $\LaTeX$ is great for typesetting, and gives you total control over how your document looks.
  • However, $\LaTeX$ is kind of hard:
    • Installing it and using it on Windows is annoyingly hard.
    • You get lots of confusing compilation errors.
  • Other people will tell you to just use MS Word and stop trying to be clever but:
    • All the reference manager software in MS Word is a bit annoying and inconvenient.
    • Adding formulae and symbols is hard.
    • It's hard to get your figures to go where you tell them.
    • The only way to get your tables how you want them is to do it in MS Excel then copy-paste it in.
    • It's hard to do version control on an MS Word document; your only real option is to save lots of dated versions and use the Review->Merge/Compare tool.

I would actually recommend writing as much as possible in Markdown, then using pandoc to compile it into other formats.

  • Markdown has a lot of the advantages of $\LaTeX$.
  • Markdown syntax is easier than $\LaTeX$.
  • You can embed symbols and formulae like in $\LaTeX$.
  • You can use a .bib file for reference management. It's easy to embed references in pandoc: you just type the key of the reference, e.g. [@key], a bit like in bibtex. (I really hate the reference management tools in MS Word.)
  • You can use pandoc to export the document to a .docx or .tex format when you need to anyway!
  • You can easily apply document styles from a MS Word template.
  • You can easily version control your markdown document.
  • You can automatically format your results in pandas (see below), although this is also true for $\LaTeX$. If you do need to manually edit a table, it is fairly intuitive.
  • If you are going to use $\LaTeX$, I recommend using Overleaf.com, as it avoids a lot of issues. It handles a lot of the package dependency issues I've encountered when installing $\LaTeX$ locally.
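The pandoc step can be scripted so that you rebuild the document the same way every time. The sketch below only assembles the command line; the file names paper.md, library.bib and template.docx are placeholders, and the helper pandoc_command is my own. The flags --citeproc, --bibliography and --reference-doc are standard pandoc options (--citeproc needs pandoc 2.11 or newer).

```python
import os
import shutil
import subprocess


def pandoc_command(source, output, bibliography=None, reference_doc=None):
    """Build a pandoc command line as a list of arguments."""
    cmd = ["pandoc", source, "-o", output]
    if bibliography:
        # --citeproc resolves [@key] citations against the .bib file
        cmd += ["--citeproc", f"--bibliography={bibliography}"]
    if reference_doc:
        # Takes styles from an existing Word document (docx output)
        cmd.append(f"--reference-doc={reference_doc}")
    return cmd


cmd = pandoc_command(
    "paper.md",
    "paper.docx",
    bibliography="library.bib",
    reference_doc="template.docx",
)
print(" ".join(cmd))

# Only run pandoc if it is installed and the placeholder input exists
if shutil.which("pandoc") and os.path.exists("paper.md"):
    subprocess.run(cmd, check=True)
```

Changing the output name to paper.tex (and dropping reference_doc) would give the $\LaTeX$ export instead.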

Comments on version control and collaboration:

  • The ideal situation is you do everything via git or GitHub. If you want to collaborate with someone they make changes directly to the markdown file and you can merge/compare etc.
  • If you're collaborating with someone less comfortable with git or GitHub, you can handle the different versions yourself, e.g. send them the markdown file, get them to edit it directly, then put it in a branch of the git repository yourself!
  • It's easier to get someone to modify a markdown file than a $\LaTeX$ file, because it looks a lot more like plain text.
  • If you're collaborating with someone who can't handle a markdown file for some reason, you could export to a .docx file and get them to track changes in MS Word.
    • Pandoc does have tools for converting MS Word to markdown, but it doesn't handle tracked changes that well. Accept all their changes in the .docx then export to markdown, then put it in a branch of your git repo yourself, and compare/merge as you like!
    • You could use the merge / compare tools in MS Word to compare the different versions.

Showing author's tracked changes for peer review

  • markdown
    • Just diff the markdown files (or "file compare" in Windows). You could also do this online using Github if you have been using it.
    • Or, if you want a nicely marked up manuscript: Build to whatever document is your second format, i.e. tex or docx, then follow instructions for that option below.
  • docx
    • Use MS Word's Merge/Compare tool.
  • LaTeX
    • The Track Changes option on Overleaf does not give you a marked up manuscript, it is just for collaboration purposes.
    • There is a tool called latexdiff which can be used to mark up the differences between two tex files. It works really well. However, you can't use it directly in Overleaf.
    • Instead, do a minimal install of TeXLive on your local machine, then use the package manager to install latexdiff. Use latexdiff locally, then upload the resulting file to Overleaf for building a PDF!
    • Only do the latexdiff locally, don't bother trying to build a PDF. Building everything locally is a pain. Doing the full install of TeXLive means downloading several gigabytes, which I feel like I shouldn't have to do. If you don't do the full install and try to build your document then you will inevitably run into cryptic errors about dependencies and fonts.
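For the markdown option, any diff tool will do. If you would rather stay in Python, the standard library's difflib gives you a unified diff. This is a minimal sketch with two made-up drafts held in strings; in practice you would read the two versions from files.

```python
import difflib

# Two made-up versions of the same markdown section
old = """# Results

The model explains 80% of the variance.
"""
new = """# Results

The model explains 90% of the variance.
We also report the RMSE.
"""

# unified_diff works on lists of lines; keepends=True preserves newlines
diff = list(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="draft_v1.md",
        tofile="draft_v2.md",
    )
)
print("".join(diff))
```

Removed lines are prefixed with - and added lines with +, the same as in a git diff.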

Zotero

  • Zotero is a reference manager. You use it to keep track of things you've read etc. and produce bibliographies for your papers.
  • I use Zotero Connector to get papers from the browser into my Zotero database.
  • I use a plugin called Better Bibtex in Zotero, which automatically exports my whole library to a bibtex file. I then just point to this file whenever doing referencing in a markdown document.

Obsidian

  • Obsidian is a free markdown note-taking app.
  • I use the Citations extension to connect it to my bibtex file, so I can really easily insert references!
  • This is really nice for making notes on papers I read. I just add the paper to Zotero, reference it in my diary, go to the note that Obsidian made automatically for the reference, and write my notes. The references also carry through if you use Pandoc to reformat a note.
  • Using lots of backlinks, especially to notes that don't exist, is a great way to keep track of topics and keywords.

Don't manually format results tables

  • Use the pandas DataFrame methods to_markdown, to_string or to_latex instead.
  • Save the CSV data and have a separate script that formats it nicely.
  • If you update the data while you are drafting a paper, it's a pain to manually format everything again. Much better to have it as code!
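A formatting script along these lines might look like the sketch below. The column names and numbers are invented, and the CSV is held in a string so the example is self-contained; normally you would pass the path of the CSV your analysis wrote. I use to_string here because it needs nothing beyond pandas, whereas to_markdown also requires the optional tabulate package.

```python
import io

import pandas as pd

# Invented results; in practice this CSV is written by your analysis script
csv_text = """model,rmse,r2
baseline,1.2345,0.8123
improved,0.9876,0.9012
"""


def format_results(csv_source) -> str:
    """Read results from CSV and return a text table rounded to 2 decimals."""
    df = pd.read_csv(csv_source)
    return df.to_string(index=False, float_format=lambda x: f"{x:.2f}")


table = format_results(io.StringIO(csv_text))
print(table)
```

Swapping the to_string call for df.to_markdown(index=False) or df.to_latex(index=False) gives a table you can paste straight into a markdown or $\LaTeX$ manuscript.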