
@janxkoci
Last active May 6, 2022 15:01
scripts for plink clustering (MDS and PCA) using either plink or VCF formats as input

janxkoci commented Feb 4, 2022

TODO

  • add README file (for git clone) with links and explanation of different options (e.g. distances)


janxkoci commented May 6, 2022

These scripts are probably fine for a one-off analysis of samples. But after working with them today and rerunning things a few times, I realized they contain several rather dumb decisions and can be improved. Things like:

  • Doing --pca and --mds-plot in the same command. They really don't need to be run separately.
  • Not using a text file set with --recode for the pruned data. This comes from a tutorial I found back in the day, and I guess it was there to avoid a name collision: reusing the same --bfile name would cause one, while switching to --file avoids it. Or, you know, just give the pruned dataset a new name, so you can keep using --bfile (i.e. the smaller and more efficient format).
  • Omitting --genome and --read-genome completely. Plink will do it automatically with --cluster (or use whatever other option it feels is better atm). This way it can also be removed from the pruning step, so all clustering can be done within one step. I may add the options again later (only to the last clustering steps) to make rerunning the scripts more efficient, but that will need some detection of existing files to work.
  • Some log files are getting overwritten by the following commands, because they use the same --out names. This should be easy to fix, but needs a little thought for rerunning (i.e. when some steps may get skipped).
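The points above could be combined roughly as follows. This is only a sketch, not the actual scripts from the gist: the dataset names, the `PLINK` variable, the function names, the `--indep-pairwise` window settings and the number of dimensions are all assumptions for illustration, and the flags assume PLINK 1.9. Each step gets its own `--out` name, so no log file is overwritten, and the pruned data stays in the binary format under a new name.

```shell
#!/bin/sh
# Sketch only: names and parameter values below are hypothetical.
PLINK=${PLINK:-plink}   # assumes a PLINK 1.9 binary on PATH

# prune_snps SRC DST: LD-prune binary fileset SRC into a new binary fileset DST.
# Step 1 writes DST.ld.prune.in, DST.ld.prune.out and DST.ld.log.
# Step 2 writes DST.bed/.bim/.fam and DST.log, so neither log is overwritten
# and --bfile can be used throughout without a name collision.
prune_snps() {
    src=$1
    dst=$2
    "$PLINK" --bfile "$src" --indep-pairwise 50 5 0.2 --out "$dst.ld" &&
    "$PLINK" --bfile "$src" --extract "$dst.ld.prune.in" --make-bed --out "$dst"
}

# cluster_samples PRUNED: MDS and PCA in a single run, without an explicit
# --genome / --read-genome step. Output goes to PRUNED.clust.*
cluster_samples() {
    "$PLINK" --bfile "$1" --cluster --mds-plot 10 --pca 10 --out "$1.clust"
}
```

With a fileset called `data`, this would be used as `prune_snps data data_pruned && cluster_samples data_pruned`.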

There may be some other improvements, probably more involved, like detecting whether pruning was already done and skipping that step if so, to make rerunning easier, but that will be for another time.
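One simple way to do such detection is to check for a step's output file before running it. The helper below is a minimal sketch under that assumption; `run_if_missing` is a made-up name, and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# run_if_missing TARGET CMD [ARGS...]
# Runs CMD only when TARGET does not exist yet; otherwise reports the skip
# on stderr and returns successfully.
run_if_missing() {
    target=$1
    shift
    if [ -e "$target" ]; then
        echo "skip: $target already exists" >&2
        return 0
    fi
    "$@"
}

# Hypothetical usage (file names are placeholders):
#   run_if_missing data_pruned.bed \
#       plink --bfile data --extract data_pruned.ld.prune.in \
#             --make-bed --out data_pruned
```

Note that this only tests whether the file exists, so a partial output left behind by a failed run would also cause the step to be skipped; deleting the target forces a rerun.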
