
@lh3 /00_label.txt
Last active Dec 1, 2015

Summary of Loman's survey (http://dx.doi.org/10.6084/m9.figshare.1572287)
===> Where do you most often perform bioinformatics analysis? <===
1 Personal computer (laptop/desktop)
2 Lab server
3 Departmental server
4 University server/cluster
5 Cloud
6 Other
===> Types of problems <===
1 Lack of biology knowledge
2 Lack of programming/informatics skills
3 Installation problems
4 Inaccessible data/software or lacking data
5 Difficult interop (including formats and reproducibility)
6 Bad documentation (of software or data)
7 Slow data transfer
8 Insufficient hardware (speed, memory, storage or not enough machines)
9 Incompetent software; lack of particular software
10 Difficulties in system admin
11 Bad services
===> Miscellaneous notes <===
* Too many choices (programs, parameters, etc.)
* SRA is inconvenient
* When respondents complain about RAM, it is unclear whether the cause is bad software or insufficient hardware
* Installation problems are easier to complain about
7 2 3,2 10/6/2015 16:23:21,Yes,7,My inability to script for parsing/parallelising (and the lack of time to devote to learning it properly) ,Yes,,"Best for job, Word of mouth recommendation, Used in similar analysis, scale to large genome sets",Lab server,I have to install all dependancies etc and update myself,lack of bioinformatics background in biology students
1 1 2,3 10/6/2015 16:23:40,No,1,"Time to learn how to program, while doing experiments",Yes,,"Best for job, Word of mouth recommendation, Easy to use (e.g ""just type this"")",Personal computer (laptop/desktop),Installing,Time
8 2 4,5,3 10/6/2015 16:24:15,No,8,"Access to data, and f**** having to write parsers",No,I work on providing this kind of data,"Already installed on server, Word of mouth recommendation, Good documentation",Lab server,compilations and lack of sudo,making them write parsers instead of making them understand the data and why it is the way it is
6 2 6,5,3 10/6/2015 16:24:37,Yes,6,Badly written manuals. People's positions on interleaved and/or non-interleaved FASTQ files.,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,Dependencies.,
10 4 3 10/6/2015 16:24:44,Is Scotland still in the UK?,10,"Often the University infrastructure is at least one major version out of date in terms of the OS. I often have to use Amazon EC2 to get around this, which costs money.",Yes,,"Quickest, Good documentation",University server/cluster,"Installing p̶o̶r̶e̶t̶o̶o̶l̶s software that relies on the latest C libraries, or with huge dependency lists which means that, as I often do not have root privileges and have to work with out-of-date OS, software simply won't install","Access to sufficient compute to carry out their studies - PhD budgets are already strained, there isn't enough to pay for compute, so they rely on (free/bad) departmental servers"
7 1 3,2 10/6/2015 16:25:01,No,7,"installation of different software, packages,... and getting them to work on different platforms (linux, mac, windows), both in doing this myself as well as helping students with installing software and getting it to work.",No,not applicable for my work (to little data for the organisms i work with),"Best for job, Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),,"basic computer knowledge and getting programs to work. Usually, they have a good background in statistics and biology, but the programming skills are the most crucial step and the skill they are often lacking."
9 3 7,3 10/6/2015 16:25:18,No,9,"Having to wait for data to be moved around",Yes,,"Best for job, Quickest, Good documentation",Departmental server,Installation of software when I don't have sudo permissions,
10/6/2015 16:25:27,No,9,"lack of annotation of data",Yes,,"Best for job, Quickest",Lab server,running out of memory,"Getting them to accept the messiness of the real world, compared to textbook problems that actually have solutions"
1 1 5 10/6/2015 16:26:00,Yes,1,Lack of standardised file types.,Yes,,"Already installed on server, Best for job, Good documentation",Personal computer (laptop/desktop),,
4 4 6,8 10/6/2015 16:26:52,No,4,"Recondite documentation for other peoples' tools. Pressure of time – I always have to stop poking at my code before it's perfect to talk to people/eat/fill out surveys.Poorly commented code written by me-from-the-past (ass).",No,I work on a non-model organism and data is scarce.,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,Hard to test long-running/big-memory jobs,It is hard to get across the extent to which bioinfo. == patient data munging
8 4 9,10 10/6/2015 16:27:29,No,8,That there's a lot of crappy software out there that people put out without any real benchmarking.,Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis, Has been benchmarked to give the right answer",University server/cluster,"Making permissions work well across members of the research group within the lager cluster that is used by many other people. Also, we don't do our own sysadmin and the sysadmin folks don't understand what we do.",Seems like there are a bazillion little things that someone needs to pick up and so I always feel like I'm correcting people or with holding information.
6 4 8 10/6/2015 16:28:17,No,6,Server waiting time,Yes,,"Best for job, Used in similar analysis",University server/cluster,,
2 4 2,8 10/6/2015 16:29:06,Yes,2,My general lack of programming expertise,Yes,,"Already installed on server, Best for job, Quickest, Good documentation, Used in similar analysis, Ease of use",University server/cluster,"Limited memory space on nodes, on occasion",
8 4 2,9 10/6/2015 16:29:55,Yes,8,"- Lack of personal training in good practices - Lack of good open-sourced pipelining tools - Publishing new and better implementations of existing methodologies should be valued considerably more. ",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,Some tools are clearly not designed to be parallelized. This probably due to the lack of good (read easy to use) parallelization APIs in common scripting languages. ,
9 2 2 10/6/2015 16:30:59,No,9,Lack of Underlings ,Yes,,Best for job,Lab server,,Proper understanding of underlying algorithms to know how to optimize precise and accuracy results quickly.
8 4 3,5,11 10/6/2015 16:31:10,No,8,"-Flat text files -Poor/Awful/Incomplete/custom/docker-only software installation packages -Centralized code development -NCBI/SRA",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"See above: Software installation, either due to poor understanding of install packages, or ignorance of things like easyinstall leads to awful install problems","As I describe it to students: ""You are facing an incredibly unfair challenge, you are being asked to learn (at the same time), 3 NEW disciplines: Unix/Linux command line, Software tools for bioinformatics, and scripting/programming languages (Bash/R/perl/python), to do a single analysis."""
8 4 8 10/6/2015 16:31:44,No,8,,Yes,,"Already installed on server, Best for job, Good documentation",University server/cluster,"Nodes failures, queue system bugs",
6 1 6,9 10/6/2015 16:33:18,No,6,"Lack of clear and adequate documentation in Software packages. For example, all jargon should be clearly defined.",No,Everything I have needed so far is in house,"Best for job, Good documentation",Personal computer (laptop/desktop),"Too slow, and so I am learning how to use databases and servers/clusters","That most have absolutely no computing knowledge, thus you have to train that as well as the domain specific aspects."
5 4 2,6,3,8 10/6/2015 16:33:28,No,5,First of all my own understanding of how stuff works. Secondly bad documentation. Thirdly: having to install c/perl libraries/modules. Fourthly: limited computing capabilities. ,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, Does it compile and/or crash?",University server/cluster,having to install arcane packages which i may not be able to do due to lack of permissions. ,having them learn how to google/ask for help online.
7 2 9,4,6,3 10/6/2015 16:33:35,No,7,"Lack of APIs for databases Closed source code in articles Unreproducible analysis in articles due to missing complete descriptions ",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,Lots of software depencies that sometimes get unsupported,Training them in proper and fast coding
5 2 6,3 10/6/2015 16:34:26,Yes,5,"poor documentation hard-to-install software no sudo rights ",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,"can't always install software myself (no sudo) poor documentation for software - both usage and interpretation of output","too many people want their area of expertise to be covered rather than focusing on teaching a few useful basic skills well not sure how to teach effectively"
6 2 6,3 10/6/2015 16:36:32,No,6,"Poor documentation, especially on the effect of different parameter settings.",Yes,,"Good documentation, Used in similar analysis",Lab server,"Compiling software, especially with dependencies and linking libraries. ",Convincing them that the command line is not as scary as they think.
3 1 8,5 10/6/2015 16:36:39,No,3,Lack of Support from Supervisors,Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Computer too slow, Software incompatible",
5 4 3 10/6/2015 16:37:01,No,5,,Yes,,"Already installed on server, Best for job, Word of mouth recommendation",University server/cluster,Permissions,Getting them to believe they can actually do it
0 2 2,9,3 10/6/2015 16:37:30,Yes,0,"- Lack of computational trainings for undergrads, and lack of appropriate lab environment i.e. computational labs designed for Bioinformatics students. - Lack of JavaScript tools (for Bioinformatics analysis)",Yes,,"Best for job, Quickest, Good documentation",Personal computer (laptop/desktop),Dependency Hell.,
6 4 7,3 10/6/2015 16:38:16,No,6,disk IO (file copying),Yes,,"Best for job, Good documentation",Lab server,dependency hell,command line basics
8 4 8,3 10/6/2015 16:38:48,No,8,"- Lack of storage space - Lack of computational power ",Yes,,"Word of mouth recommendation, Good documentation, Well supported and used",University server/cluster,Installing software is always a big pain due to out-of-date operating system (e.g. old gcc),Getting to think like a scientist and not as a technician.
4 6 6,9 10/6/2015 16:39:07,No,4,data set interoperability ,No,i did previously,"Graphical interface, Good documentation",Cloud,,
6 4 5,4 10/6/2015 16:40:53,No,6,"Stupid inconsistencies between projects, e.g Encode says ""chr1"" and 1K genomes says ""1""Lack of null datasets to use to build statistical models Insufficient replicates",Yes,,"Best for job, Graphical interface, Good documentation, Used in similar analysis, Validated",University server/cluster,None - I'm at the Broad and they have a department that supports the servers and installs all the versions of all the software (feel free to spit in jealousy),None. Our students are invariably brilliant and a pleasure to train.
4 2 8,4,3 10/6/2015 16:40:48,No,4,"limited amount of CPUs and memory on lab server for everyone in the lab, difficulty finding metadata associated with publically available sequence data, some software is difficult to install (requires root privileges, only available for ubuntu, etc)",Yes,,"Best for job, Good documentation, Used in similar analysis",Lab server,"Installing software is usually difficult, especially if the software has many dependencies. Software that rely on GUIs instead of command line are terrible to work with on the lab server.",
5 4 4,3 10/6/2015 16:41:12,Is Scotland still in the UK?,5,"Time - being pulled in too many directions on too many projects. Underspend on data acquisition. See too many RNA-Seq experiments without replicates, or sequencing efforts unwilling to spend more money to get a complete genome vs spending uncosted time trying to extract as much as possible from a poor dataset.",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, Good installation instructions without 101 dependencies",University server/cluster,"Cluster libraries tend to be old/stable, making installing some new bioinformatics tools extremely hard/impossible due to requiring very recent versions of their dependency chain.","Nowadays it is rare to have any experience of the command line at all, many older bioinformaticians grew up with DOS etc which provides a useful frame of reference."
3 4 8 10/6/2015 16:41:25,No,3,"Access to computing resources, clusters, cloud resources, etc.",No,Not relevant to my work,Best for job,University server/cluster,"the university server/cluster is not well managed, performance can vary widely depending on traffic",Finding the balance between showing them exactly what to do and letting them figure some things out on their own
6 6 4 10/6/2015 16:41:25,Yes,6,"1. Lack of time. 2. NGS machine errors.3. Investigators not sharing samples, sequences and metadata, especially during MERS and Ebola outbreaks.",Yes,,"Best for job, Quickest, Good documentation, Callable by command line ",Laptop 50% cluster 50%,,"1. Teaching bioinformatics to biologists who can't code.2. Teaching bioinformatics to coders who don't know biology."
8 1 8,3 10/6/2015 16:41:55,Is Scotland still in the UK?,8,"My failings.Funding ineligibility.Local computing infrastructure failings.Local bureaucracy failings.PI failings (when won't collaborate or share data, or provide funding for time/hardware).",Yes,,"Best for job, Quickest, Good documentation",Personal computer (laptop/desktop),"Very few. More problems on the server/cluster, with versioning etc. and oversubscription.","Encouraging a computational approach to thinking and managing/handling/analysing data (also sometimes convincing them that bioinformatics is science, and not just doing BLAST searches). "
5 2 7,8,11,6 10/6/2015 16:42:05,No,-5,"Large data files (file transfer times). Large storage space requirements for intermediate files. Extremely irritating and time-consuming requirements for submitting data for publication (NCBI)",Yes,,"Best for job, Used in similar analysis",Lab server,"Poor documentation from academic software. Need more vignettes, walkthroughs.","1) Many students dont understand why they have to leave the sanctuary of GUI based tools such as excel to move to R/UNIX/Python. They find the command line too daunting.2) Running a class on bioinformatics using real life examples for 10+ students takes too much time and computational resource. Need either access to cloud tools ($$) or to make the example so small it defeats the object."
8 4 8,5 10/6/2015 16:42:21,No,8,"Storage capacityComputational capacityInteroperability of tools",Yes,,"Already installed on server, Best for job, Good documentation, Used in similar analysis",University server/cluster,"Time-sharing, few options for extending window of use.","Depending on the background.Students with Computer Science background usually find it a challenge to understand the nuances in molecular biologyStudents with Biology background find it a challenge to consider command-line tools."
3 4 5,8 10/6/2015 16:44:08,No,3,"Different file formats. High CPU/RAM requirements. ",Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,"Google and Stackoverflow. Contact the author. (Response time is directly proportional to the time since the software was released).",One can never learn enough.
8 2 8,7,6,3 10/6/2015 16:44:33,Yes,8,"Data transfer speeds Poor documentation Tricky compiles with badly defined dependencies",Yes,,"Best for job, Word of mouth recommendation",Lab server,Generally CPU limited - but that's what the cluster is for.,Lack of basic computational/statistical literacy.
8 4 8 10/6/2015 16:44:41,No,8,"Too many tasks to do, more lab scientists than bioinformaticians",No,Lazy,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"Server downtime Incorrectly guessing resources requires",
7 4 8,3,6,10 10/6/2015 16:44:54,Yes,7,lack of competent collaborators at senior level; inadequate compute resources; ,Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,lack of server capacity; lack of documentation; lack of systems support; no root access to set up database servers or other infrastructure,
6 2 5,6,3 10/6/2015 16:45:33,Yes,6,Inconsistent data formatting. Poor/incomplete annotation. ,Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,Updating/dependency compatibility in R,
8 4 5,3,8,7 10/6/2015 16:48:01,No,8,People using xlsx files to send data. Or pdf. Or proprietary formats for statistical software. Anything else than text. ,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Stable, flexible ",University server/cluster,Installation problems that take to much of my time to solve. Ocasional limited computational resources (many users active at the time). Connection problems outside university network (vpn failings). ,"Hard to educate in the principle of : Think your question before you type. For bio/med background: don't be scared to make mistakes or look stupid. For comp / maths background : your question is more important that your method. "
2 4 3 10/6/2015 16:48:28,Yes,2,"Software installation. Lack of bioinformatics support (1 other person in the department that does sequence analysis from start to finish). ",No,I'd use a reference genome and reference databases like Greengenes and Silva but haven't needed anything else. Very much a biologist who uses genomics as and when. ,"Already installed on server, Used in similar analysis, Good tutorial available",University server/cluster,Difficult to download new programs onto the server. Probably based on my own lack of computing ability. ,N/A
4 4 2,3 10/6/2015 16:48:47,Yes,4,"Lack of knowledge, dizzying array of different solutions and no way to compare them or to judge which is best for a particular application other than word of mouth, after finally selecting one discovering that collaborators prefer another or you can't install it on a particular server, ennui, lack of time to perform the actual analysis because 5 years has been spent collecting samples and sequencing them and the project should have ended 2 years ago.",No,"Analytical tools there are poor, irrelevant or wrong. Do use them for storing data.","Best for job, Good documentation, Used in similar analysis",University server/cluster,Few problems running software as supported by a bioinformatics support service. However software that updates frequently can be an issue as there is always a delay in updating the remote install.,Installation of programs. Keeping materials and programs up to date and relevant. CLI is not an issue and is picked up relatively quickly by even complete novices.
8 1 6,5,3 10/6/2015 16:49:00,No,8,"Lack of proper comparisons / benchmarking of methodology, and thus standardisation Poorly implemented or documented code Lack of metadata annotation in published sequence data (environmental) Lack of time",Yes,,"Best for job, Good documentation",Personal computer (laptop/desktop),Difficulties to install dependencies (e.g. SciPy vX.x) or poorly documented programs,"Very limited knowledge of informatics and statistics, esp. among students primarily studying biology "
10 3 5 10/6/2015 16:49:49,Yes,10,"Experimenters approaching the core facility after data has been generated and not using our experimental design capability (data autopsy's) Experimenters unable to afford more replicates Experimenters not recording data correctly in the first instance (dodgy excel spreadsheets)",Yes,,Best for job,Departmental server,"Novice users running custom scripts that spawn too many (tens of thousands of) jobs on one node. ","Bioinformatics is more than one discipline: coding and analysis have different training goals A students base level of statistics knowledge is often poor Time between learning and using the knowledge is not always synced hence the trainees will often forget."
3 3 3 10/6/2015 16:51:04,No,3,"-time-successful wet lab work",Yes,,"Already installed on server, Good documentation",Departmental server,"Lacking certain software packages or modules, having trouble installing them.",
9 4 9,8,3 10/6/2015 16:51:11,No,9,Missing tools that I am forced to implement. The limited computer resources. The size of the human genome. Time in the day. Cluster stability.,No,Command line is more compatible with pipelines,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,Updates of software breaking things. ,N/A
9 2 5,8 10/6/2015 16:52:41,No,9,"Bioinformatics expertise is not what hampers me. It is more the frustrating things that cannot be fully automated, which slow me down: identifier mapping, writing parsers, and searching+reading the literature.",Yes,,"Best for job, Quickest, Word of mouth recommendation",Lab server,"When running on the lab server, the main problem is lack of RAM, followed by lack of computational power. However, I have access to large clusters and cloud computing facilities, so in those cases I simply run elsewhere.","Teaching them to always *look* at the data and think critically about their results, rather than presenting me with a p-value or ROC curve."
8 4 3,6,9 10/6/2015 16:54:58,No,8,"Dependency hell when installing tools, lack of method description, hardly neutral/unbiased tool benchmarks",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Easy to install",University server/cluster,"No root access, yet many tools require it",Lack of UNIX skills
5 2 3,5,2,10 10/6/2015 17:01:26,No,5,"Assumed that I am bioinformatics support for everything, which includes sysadmin for installing software. Having to constantly evaluate commercial products that are wrappers for open source software. Nobody understanding/caring why software A is more appropriate for a job than software b.",No,"more of a service facility. generate data, qc.",Best for job,Lab servers/cluster,"Not having a cluster management system deployed yet so we can't have multiple instances of software versions installed. Can get around this by having executables called across a nfs mount point, but sometimes that isn't really a feasible option.",
8 4 5 10/6/2015 16:59:14,Yes,8,"State of bioinformatics software, formats madness",Yes,,"Best for job, Good documentation",University server/cluster,,"Basic knowledge on algorithms, coding and testing their own code."
1 2 2 10/6/2015 16:59:34,No,1,"Confusing directions that go well above my level of understanding. As a noob who struggles with this kind of stuff, I find most new comer guides to assume the student knows way too much.",No,Not interested.,"Best for job, Good documentation, people to answer questions",Lab server,"Connection problems to server (my wifi, campus power outage, someone *turned the server off*, etc).",
4 3 6,3 10/6/2015 16:59:36,No,4,Poor documentation of tools and methods.,Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,"Poorly documented software, dependency conflicts, installation process not clear.","Bioinformatics is just such a wide field - students need to be confident programmers, but also able to teach themselves to use lots of tools and understand the biological background - so lots to learn and teach. Also, with all the methods out there, I feel like not enough of a focus is placed on good research methodologies (and not just use some tool you don't understand on your data) - so maybe that's the biggest challenge?"
8 2 5,8 10/6/2015 17:00:39,No,8,"- inconsistent use of data formats; - access to shared compute resources (in theory, I have access to lots of shared resources; in practice, queue times, disk allocations, etc are very limiting); - inability to distinguish at an early stage how much effort I need to put into software engineering for a particular program/script (i.e., I frequently over-optimize code that never gets used again, and I frequently continue to use kludgy, poorly-written code over and over again without putting in a small amount of effort to clean it up and document it, which would save me a lot of time)",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",Lab server,Constantly maxing out local disk capacity,Command-line bullshittery without a doubt. http://www.pgbovine.net/command-line-bullshittery.htm
4 1 2 10/6/2015 17:04:01,Is Scotland still in the UK?,4,Lack of a background in maths and computer science. ,Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Typically none, will make use of university cluster when memory/processing power required in abundance.",
9 4 5 10/6/2015 17:06:52,No,9,Lack of real standards,Yes,,Used in similar analysis,University server/cluster,,they're noobs
7 4 6,9 10/6/2015 17:07:39,No,7,"Good software (well documented, open, predictable, using a reasonable amount of ressources)",Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,Bad documentation,Having them think programmatically and learn general tools like linux bash R python.
7 2 6,5 10/6/2015 17:08:09,No,7,"Poor documentation Much overlap among tools (e.g., most can query NCBI), but few have more useful/specialised components",Yes,,"Quickest, Good documentation",Lab server,Remarkably few; occasionally would be nice to have more cores,"Too many programs leading to confusion as to how to start Hard for them to install large frameworks Poor programming skills starting out"
7 4 6,5,8 10/6/2015 17:08:16,Yes,7,"-Bad documentation. -Non-bioinformatically trained collaborators ignoring guidelines for optimal/standara data formats. -Expectation to carry out analysis i)in unreasonably short times (""press a button"" bioinformatics misunderstanding ii) free iii) uncredited . Resist and fight in all cases.",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"-New users crashing the server (but we have a stellar IT support who are quick to fix and re-train new users.-Potentially long queueing times.",
4 2 3,6 10/6/2015 17:08:29,No,4,Dependencies and poor documentation. My own lack of knowledge and time.,Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",Lab server,Trying to install things that are not self contained.,Finding people that are good that will accept non-competitive pay.
5 4 6,9,8,3 10/6/2015 17:11:48,Yes,5,"Documentation unclear/incomplete, inefficient code - not multithreaded/parallel, waiting in the batch queuing system, my lack of python, R just being R. Lack of time to learn Python or other skills, people just want to know 'the answer' so have to make do with the quick and easiest solution (not necessarily best).",Yes,,"Best for job, Quickest, Good documentation",University server/cluster,"Lack of dependencies, lack of permissions to install. Managing to work around this with a virtual env.","Command line not intuitive for some. So many ways to do things, difficult to decide what is best to teach. Broad audience."
2 5 2,3,6 10/6/2015 17:12:32,No,2,"-Poor familiarity with Perl a and coding in general -dependencies that crash or are outdated -uninformative error messages from open-source algorithms (ex: just ""core dump"" and nothing else)",Yes,,"Already installed on server, Best for job, Quickest, Good documentation",Cloud,Different Ruby or other background installation than the software needed,
2 2 6,3 10/6/2015 17:12:34,No,2,"There's never a guide that applies to all research, so developing a pipeline separately for each different project can be time consuming. As a beginner, I find it frustrating that there's often no answers to the errors I receive from my commands. If there were a comprehensive list of errors and explanations for each program I use, that would be ideal (but in reality, unlikely to ever happen).",Yes,,"Already installed on server, Best for job, Good documentation, Used in similar analysis",Lab server,I struggle installing programs i.e. setting the path and making it accessible to all users. ,I don't train students - I'm a graduate student.
4 2 2,8,3 10/6/2015 17:18:01,Is Scotland still in the UK?,4,Lacks of knowledge about already available tools.,Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, Is an R package",Lab server,"Software not installed. Out of disk space on my quota. X11 forwarding being irritably slow.",Not (yet) training students.
2 1 6,2,3 10/6/2015 17:18:32,Yes,2,"Lack of simple, documented command line tools or similar to manipulate/parse text files on windows desktop. Loading a linux VM is a PITA. Lack of GUI tools with good feedback on function for similar tasks as above - e.g. contrast with power and interface of CellProfiler for potentially very complex image analysis. Lack of time to develop my own skills to enable training of students and create/understand a set of consistent tools. Informatics is only a small part of most of our projects (e.g. gene selection for an expression screen), so activity tends to be sporadic and very task/job centric.",Yes,,"Graphical interface, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Dependencies/incompatibilities (on Linux) - really, didn't Windows solve this in '95 with the end of ""the version of run32.dll needed is XXX.XXX, click here to install"". Why is this still a problem? Documentation unsuitable for people of my own limited experience. Analysis or data aggregation is often needed very sporadically, meaning gaining genuine expertise is often not time-efficient or recall of how I got it to work last time is poor.","My biggest issue is lack of personal knowledge in the area - can't train people without the relevant expertise yourself. I therefore rely upon others with better experience (colleagues/collaborators) and the student's own ability. This approach is fraught with risks and pitfalls, many of which we have had to dig ourselves out of already - however in the absence of simpler/easier toolchains it has been the only way we have got things done."
5 4 8,4 10/6/2015 17:18:35,No,5,internal access to data and wait times.,Yes,,"Already installed on server, Best for job, Quickest, Good documentation",University server/cluster,issues with university department running the server's priorities.,
6 4 3 10/6/2015 17:19:12,No,6,"crappy runs only on my laptop only that doesn't take into account actually installing software for real on a production system outside of your home directory. Reliance on horrible dependency hell of dynamic libraries (python, R). super user requirements of docker",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis, open source",University server/cluster,,
8 4 5,3,8 10/6/2015 17:20:31,Yes,8,"Getting data from users in a usable or easily usable format. Ease of getting updates to software on the cluster",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation",University server/cluster,"Out of date libraries Occasional downtime of the cluster",Fear of command line!
8 2 5,6,9 10/6/2015 17:22:49,Yes,8,"Inconsistent naming Crap documentation Broken software Mick Watson ",No,I have a fucking big cluster.,"Best for job, Quickest, Mick Watson",Mick Watson,Cluster is solid. Storage is solid. Mick Watson,Mick Watson.
2 4 3,8,2 10/6/2015 17:27:30,Yes,2,"Computer power, size of dataset (metagenomics). Lack of knowledge re: computer languages.",Yes,,"Already installed on server, Best for job, Good documentation",University server/cluster,Long delays in waiting for things to be installed,
4 1 7 10/6/2015 17:29:28,No,4,"Poorly designed websites; Copying data from one website to another",Yes,,"Best for job, Graphical interface, Good documentation",Personal computer (laptop/desktop),,
7 4 8,6,3 10/6/2015 17:38:52,Yes,7,"Too many data, too little time (and memory) Not well documented software Lack of collaboration between (some) people Usage of office (seriously guys?)Too much perl ",No,Never needed to,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,"Software not installed on the server (dependencies and linking libraries). Long queuing time. Disk capacity (yeah, on the server as well)",
4 3 3,10,6 10/6/2015 17:33:54,Yes,4,"Installation and getting software to work! No admin privileges on dept. server- installation often impossible. Limited software preinstalled. Lazy and unhelpful server admins Poor software documentation, lack of clear examples/instructions",Yes,,"Already installed on server, Good documentation",Departmental server,,
2 1 5,3 10/6/2015 17:35:04,No,2,"Finding tools to easily convert data formats Things supposedly in a format that are really slightly out of spec and so won't work",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),"One of the biggest problems I ruin into is institutional , we don't have admin privileges on our own competes ( do on our servers though)",
4 4 2,3 10/6/2015 17:37:01,No,4,"Multitude of software/tools but not knowing which one is the ""best"" for my data. The uncertainty of genomic downstream analysis because of current limitations of sequencing technologies. Not having many people in the lab that can critically assess the way I analyse data.",Yes,,Best for job,University server/cluster,Not being able to quickly install/update software.,I am a PhD student myself.
2 4 5 10/6/2015 17:39:44,No,2,1000 different formats and having to translate between them,Yes,,"Word of mouth recommendation, Used in similar analysis",University server/cluster,hangups in my connection -> changes in script saved locally but not on cluster -> running same errors again,
5 1 9,3 10/6/2015 17:41:18,Yes,5,"Bad software Bad software written in Perl Bad software which masquerades as good software Lack of well-maintained, clear-winner tools like Samtools, Khmer. The NCBI and most of the software it maintains. The many painful steps of dealing with BLAST-like results for analysing metagenomes The suckiness of a large proportion of bioinformatics web services. Dependency management ",Yes,,"Best for job, Quickest, Good documentation",Personal computer (laptop/desktop),"Dependency management, version conflicts, issues with GCC/Clang. People not wriitng tools to be Mac compatible.",
5 2 3,6 10/6/2015 17:44:42,No,5,"Being a geneticist without CS background, I am annoyed that my uni doesn't teach python and R.",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation",Lab server,installing poorly documented software,
7 1 2,8 10/6/2015 17:52:06,Yes,7,"Using VMs, need more coding knowledge...",Yes,,"Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),Slow / needs more RAM,Microsoft Windows
7 5 5,3,2 10/6/2015 17:53:28,No,7,"Few agreed formats, tools difficult to install, often break. Self loathing. ",No,"Too difficult, not enough documentation.","Word of mouth recommendation, Good documentation, Used in similar analysis",Cloud,Managing a large ad hoc cluster. ,A large amount of skills and knowledge need to be learnt to be productive.
8 1 3,5,4 10/6/2015 17:54:44,Yes,8,Various kind of Standard but non-standard data types to deal with ,Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal Computer/cluster,dependencies and specific version,Training with sample data set and using tools in real life can be different and always enhancing tools library there is always something (not always the best but) new out there
8 2 6,3,9,5 10/6/2015 17:59:07,No,8,"* Lack of clear documentation and example use cases for tools, especially those with a large set of options * Lack of good, ""simple"" test cases (particularly annoying when building from source; would like to check build is correct in a few minutes, i.e., not having to run a full human genome through the tool). * Undocumented build and/or runtime dependencies; related: including large and/or duplicate dependency source trees (e.g., BOOST) to avoid such issues. * Lack of long-term stability and release/transition plans for many tools. Especially bad w.r.t. web-based services and data sources, which often disappear/move without notice. * Proliferation of new tools instead of contributing improvements to existing ones (see, for example, short read mappers). Similarly, would like to see a community-maintained, optimized (e.g., using SSE) library for key algorithms like Smith-Waterman to reduce duplicate efforts. * Bizarre software architectures, especially w.r.t. installation directory structures; please, please, please, stick with the ""bin""/""share""/""lib"" convention and have one main executable. * Hard-coded paths (particularly for tool locations) and similar assumptions about how our site's infrastructure is configured. * Almost all bioinformatics tools shouldn't need admin (root) privileges to install, let alone run. Don't assume install will be in /usr/local (or, even worse, /usr); should be able to build and run out of $HOME/.local (or equivalent) if I want to. * Proliferation of data formats, lack of documentation/standards for them, lack of compliance with standards (if they exist), and the insane number of parsers I have to write as a result. * Web services need to start supporting HTTPS (not just plain, un-secured HTTP). 
* Licensing: costs for ""non-commercial/research/academic use only"" software often crippling for smaller biotech startups and hospitals; custom, confusing, unclear, and/or incompatible licenses (e.g., GPL sources except for one key .c file that's marked ""academic use only). Talk to a lawyer before licensing your software!",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Used in similar analysis",Lab server,Lack of memory or storage; bursting to cloud sounds like a good idea until we run into bandwidth and data security issues,
8 3 7,8 10/6/2015 18:01:30,No,8,"Bandwidth, large datasets to transfer",Yes,,"Best for job, Quickest, Good documentation",Departmental server,Running out of memory and/or disk space ,
5 3 3,5 10/6/2015 18:02:05,Yes,5,"Other than not being Heng Li, things that frustrate me are: 1. installing and compiling bioinformatic software 2. Other people not being able to use good bioinformatics tools and therefore asking me to do it for them 3. Not having all the fastqs in SRA available in an easy to analyse format alongside my analysis pipelines on a large cloud infrastructure",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation",Departmental server,"None really, but we are lucky to have multiple sys admins and experienced bioinformaticians.","The lack of complete tutorials that, if they were given some new data, they could follow an analysis pipeline (e.g. SNP analysis) with minimal (<30 minute) hands on time from me."
8 2 6,9,5 10/6/2015 18:03:55,Yes,8,"Cruddy data formats, terrible software, awful documentation. All of it really!",No,,"Best for job, Good documentation",Lab server,None really.,
5 4 2,8 10/6/2015 18:05:00,No,5,My inadequate coding skills.,Yes,,"Best for job, Used in similar analysis",University server/cluster,"Incompatibility of software, resource limits - wall time, memory, etc.",
8 4 5,3 10/6/2015 18:07:20,No,8,"Change Legacy projects ",No,Needs to be downloaded,"Best for job, Good documentation, Used in similar analysis",University server/cluster,"Installation, though this is vcoming easier with Docker",
1 4 3,2 10/6/2015 18:08:28,No,1,"Installation of software programs. Software program updates not being compatible with each other. Overall lack of knowledge, and how confusing it can be to gain it (since I'm a beginner). ",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,We have excellent support staff who troubleshoot software packages so this tends not to be a huge problem now that we have switched over to this cluster. ,n/a
6 4 4,5,6,3 10/6/2015 18:09:06,No,6,"lack of metadata, lack of standard formats, lack of clear documentation, perl-python-R-C monsters with 1000 different dependencies",Yes,,"Best for job, Good documentation",University server/cluster,unmet dependencies,
8 2 9,3 10/6/2015 18:09:11,No,8,"Large resources failing to carefully consider the extensibility and ""consumability"" of the product they provide.",Yes,,"Best for job, Good documentation",Lab server,Package dependencies and data specific run-time errors.,Taking the time to demonstrate a balance between best practices and exploring the problem fully.
5 1 9 10/6/2015 18:09:30,No,5,Closed culture of bioinformatics crusaders who deny the existence of legitimate tools outside of Linus environment,Yes,,"Best for job, Good documentation, free, reputable, robust",Personal computer (laptop/desktop),,"Field is now too broad for a general education in ""bioinformatics""."
5 4 3,5 10/6/2015 18:13:22,No,5,"Software installation issues Lack of default file formats ",Yes,,"Already installed on server, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,No root access means waiting time for software installs or trying myself without having enough Sysadmin skills,You have to start from scratch as they have no formal training in anything combined with the fact that there is a lot to learn
8 1 4,3 10/6/2015 18:14:14,No,8,"Groups that don't release their data, release their data but make it difficult to reuse, or release their data with a license that prevents redistribution.",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Errors while installing,"It's very interdisciplinary, so students rarely are strong in all aspects."
6 4 3 10/6/2015 18:16:27,No,6,"At other jobs it has been IT and infrastructure, now it is metadata storage (and lack of well-defined solutions) and data integration across omics experiments.",No,The type of data I'm processing would not benefit from those possible comparisons,"Best for job, Quickest, Good documentation",University server/cluster,"Asking for new modules all the time and waiting for them to be installed is a hassle, so managing my own environment can be a problem especially when servers are dismantled, changed, upgraded, etc.",I try my best not to train anyone.
3 4 3,4 10/6/2015 18:18:21,Yes,3,"Python modules Non-open licenses",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,None,Sometimes they don't know much stuff
3 4 2,5,8 10/6/2015 18:20:45,Yes,3,"Need to learn how to use multiple programs Inconsistency in file formats",Yes,,"Best for job, Used in similar analysis",University server/cluster,"Server downtime Long waits for jobs to run sometimes",
7 2 2,3 10/6/2015 18:26:21,No,7,downloading a data and working and filtering the contaminated data sucks!!!! no enuf time to understand the biology and improve the coding skills better sucks !!!!,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,"python module cannot found and permission denied , most of the services blocked ",understand the biology behind running tools alone is not important churn out the needle from a haystack
5 4 5,3 10/6/2015 18:39:39,No,5,"- Lack of truly standard formats - Varying chromosome names in human reference genome (with or without ""chr"" prefix)",Yes,,"Best for job, Good documentation",University server/cluster,"Software doesn't compile on older OSes Bottleneck in installing libraries/headers since I do not have root access",They often struggle setting up their software environment
1 1 2 10/6/2015 18:46:29,No,1,I don't enjoy bioinformatics. I wanted to be a scientist and now this is apparently part of the gig. Coding is hard.,Yes,,"Word of mouth recommendation, Graphical interface, Used in similar analysis",Personal computer (laptop/desktop),None yet. I imagine wen I start doing more meta genomics work (coming soon) I will have issues with space on my personal lap top.,I am one to be trained :)
6 2 6,3 10/6/2015 18:47:37,No,6,Poor documentation,Yes,,"Best for job, Good documentation",Lab server,Dependencies,Where to start! Students can come with so many different backgrounds.
5 1 6,3,8 10/6/2015 18:52:24,No,5,"insufficient, inaccurate software documentation time spent trying to convert among file types/rearrange data",Yes,,Used in similar analysis,Personal computer (laptop/desktop),"Slow Installs on personal computers can sometimes be painful. Issues with dependencies, compilers, etc...",
4 4 2,3 10/6/2015 18:58:48,No,4,"My lack of ability, obvs Lack of good students/postdocs",Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,The mother fuckers changing to slurm and messing up a whole barrel load of lovingly maintained pipelines,"We should teach 3 things a) principles (inc stats, which most biologists are pathetically ignorant about), b) how to use the command line c) how to use google. Given the pace of change people have to be able to teach themselves from the basics"
4 3 5,2 10/6/2015 19:00:59,No,4,"File formats. Particularly GTF files. My own lack of formal computer science/software engineering training",No,I mostly used local instances of databases/genomes,"Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,User error!,"Finding a common vocabulary and being sure that your understanding of a concept is the same as theirs. I deal mostly with PhD students who (understandably) spend most of their time/focus on their project, and this often makes it difficult to establish a proper understanding of more general bioinformatics tools and analysis. "
7 3 2 10/6/2015 19:02:57,No,7,"The words ""pilot study"" doesn't excuse you from all aspects of good study design. Many biologists nearly complete lack of interest in learning how to analyze their own data or understanding the details of the analysis. ",Yes,,Best for job,Departmental server,,"Lack of computing/programming skills, resulting in a very step learning curve for bio grad students. Academic programs lead by faculty that they themselves have little to no bioinformatics research experiences. "
5 4 2,3,8 10/6/2015 19:06:05,No,5,"• Not a lot of institutional or other support for moving beyond 101 stuff. I know what the command line is. I know how to grep. I know how to install and run something like bwa. I don't know how to write decent classes. • Dependency hell. • Software or software advances that only apply to or work in the human genome. • Being self-taught in language of choice means I have to dig to learn best practices. • Continuing own education after learning the basics.",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Has to be installable without requiring hours to figure out how to edit the Cmake files.",University server/cluster,"• Long-running jobs (for example, bwa mem with no seed) exceed the allowed time limit and are killed. • Difficulty installing with limited permissions.",
6 4 3,2 10/6/2015 19:16:26,No,6,Time installing software,Yes,,"Best for job, Quickest, Word of mouth recommendation, Used in similar analysis",Xsede cluster,,"Most lack a basic understanding of how a computer works....how ram is utilized, data stuctures, file types, ect "
9 4 * 10/6/2015 19:20:18,Yes,9,,Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, command line interface",University server/cluster,,
7 2 3,6 10/6/2015 19:22:51,No,7,Lack of developers.,No,Never seen the benefit,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,"Installation, poor documentation and lack of author response to inquiries. ",Training good software development practices.
4 1 6,8 10/6/2015 19:29:36,No,4,The poor documentation of programs and the number of programs available making it impossible to figure out which ones are the best to use for your analysis. There is no best practice and it is changing all the time.,Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Lack of memory especially with java based programs.,"Teaching them to work in the terminal and that understanding what the program does is essential. Especially with troubleshooting programs, they need to understand the programs to keep moving as error messages often are not very informative."
8 4 7,9 10/6/2015 19:30:39,No,8,Having to copy files back and forth between different cluster file systems and I/O throughout limitations.,Yes,,"Best for job, Quickest, Word of mouth recommendation",University server/cluster,Bugs in unmaintained but useful bioinformatics software.,"The need for students to learn practical data wrangling, analysis and visualization skills in R, Python, etc that often aren't taught in course work."
5 5 2 10/6/2015 19:35:49,No,5,The need to learn so many different programs and lack of a computer science background. The best analysis methods change so fast it is hard to keep up.,Yes,,"Best for job, Word of mouth recommendation, Good documentation",Cloud,,Training them to actually understand what each script is doing to the data and why is it important to understand the process and not just the final output.
8 2 6 10/6/2015 19:37:02,No,8,Lack of qualified human resources at all levels,Yes,,Best for job,Lab server,Documentation problems,The difficulty of having both computer science and molecular biology mindsets
5 1 * 10/6/2015 19:45:51,No,5,,Yes,,"Best for job, Quickest, Graphical interface",Personal computer (laptop/desktop),,
8 4 8 10/6/2015 19:47:04,Yes,8,Having sufficient compute and storage for large-scale exome and genome analysis.,Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,Competing with other projects for compute capacity can slow analyses.,
4 1 * 10/6/2015 19:53:09,Yes,4,Nope,Yes,,That I know how it works and have it,Personal computer (laptop/desktop),"None, but some better software is available but too expensive.",They hate it :-)
7 4 7 10/6/2015 19:53:21,No,7,Network speed,Yes,,"Best for job, Word of mouth recommendation, Good documentation, Permissive licensing",Departmental server,,
9 2 4,5 10/6/2015 19:55:02,No,9,(US) Government network infrastructure and the arbitrary security requirements. Each lab must effectively create a separate internal network to do any meaningful analysis.,Yes,,"Best for job, Word of mouth recommendation, Am I allowed to use it, is it US-sourced, is the license compatible with my needs.",Lab server,"Data is not ubiquitous. Keeping data persistent across a laptop, a server, a SAN, and a cloud instance is most of the job. Gotta have the data in the right place and format to do the work.","They're afraid to break things. They have a powerful general-purpose computer in their pocket, but the app store approach to computing makes them think they can't tell it what to do."
3 4 3,2 10/6/2015 19:57:14,No,3,"Installation of tools not always straightforward. Mostly CLI, which is ok. Unclean data. ",Yes,,"Already installed on server, Good documentation, Used in similar analysis",University server/cluster,"As a part-time bifo person, I think that I struggle with selecting proper defaults that fit my experiments because I don't always have a good understanding of the stats behind them. Not for lack of trying either!",NA
9 3 6,4,3,5 10/6/2015 19:58:36,No,9,"- insufficiently described methods in papers - dbGaP - code from published papers that's not actually available - code that's not documented - code that won't install - code that doesn't work - too many formats - insufficient funding",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,"- no root access so can't install some binary-only software - dependency hell - lack of support for the previous major version of a Linux distribution","Getting them to avoid newbie errors (like using inconsistent assemblies or zero- and one-based coordinates together). No matter how many times I underline how important this is they keep doing it! Also the huge amount of time new students have to spend installing software and configuring their environment just to get started."
8 4 * 10/6/2015 20:00:59,No,8,"No one cares about doing things the right way, so when you do things the right way it appears you're slow.",Yes,,"Best for job, Good documentation, On Github (surprisingly good indicator)",University server/cluster,Development lag between my laptop and machine. I have a powerful laptop (16GB) memory which allows me to do as much as possible before running remotely.,"Narrowing down the audience so you're not boring some students while others struggle with ""if"" statements in simple programming."
6 3 5,8,3 10/6/2015 20:02:50,No,6,"- converting between file types - lack of common 'grammar' for bioinformatics analysis - lack of easy to access computational resources - lack of solid advice on which techniques to use and when (e.g. best practices)",Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",Departmental server,Not allowed to compile programs from source on user accounts,
6 4 5,3,9 10/6/2015 20:06:55,No,6,"Consistent need to reprocess useful public datasets to achieve uniformity Perceived lack of institutional/mentor support/appreciation for value and necessity of implementing valid, rigorous bioinformatics methods ",Yes,,"Best for job, Good documentation",University server/cluster,"1) environment incompatible with software (requires assistance from sys admins), 2) software immature (I'll correct a handful of coding errors, but more than a few obvious bugs cause me to distrust/abandon a new tool)",
2 1 2 10/6/2015 20:07:24,Yes,2,Opportunity/experience,Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),,
2 1 2,3 10/6/2015 20:08:20,No,2,"Available GUI tools, Canned solution like docker needed to remove need for installing additional software dependencies.",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),DEPENENCIES!! Pain in the ass.,
8 1 5,9 10/6/2015 20:27:35,No,8,"1 - data preparation taking more time than the actual analysis 2 - poor experimental design 3 - poor data quality (you find out after long data preparation) ",Yes,,"Best for job, Word of mouth recommendation, Used in similar analysis",Personal computer (laptop/desktop),"poor programming, lack of parallelisation",biological thinking
2 3 2,6,3 10/6/2015 20:28:11,Yes,2,"The complexity of the analyses relative to my limited knowledge; I'm picking up the background and terminology as I go along, but am painfully aware that a mistake through ignorance could invalidate the whole thing (particularly when there's no consensus on the best method anyway). A lot of the documentation assumes a certain level of computing/statistical knowledge that make it rather impenetrable to someone from a different background.",No,Haven't had the need to as yet,"Word of mouth recommendation, Used in similar analysis",Departmental server,Something breaks somewhere and I'm dependent on the server technician being available and able to fix the problem; different people want different versions of the same software; not easy to install new software; sometimes people jam up the server with huge jobs.,
2 3 2 10/6/2015 20:29:59,Yes,2,Not enough time to learn how to make neat code - forced to rush through ugly hacks.,No,Run local instances.,"Best for job, Word of mouth recommendation, Good documentation",Departmental server,,
3 1 8,6 10/6/2015 20:37:01,No,3,"(1) My computer gets angry (2) I need to do pesky human things like sleep and eat. (3) Bad metadata -- can we please get consistent standards that people actually follow. I spend most of my life dealing with metadata. (4) Backups in the wet lab. Our wetlab people are awesome, but sometimes, they get busy.",Yes,,"Already installed on server, Good documentation, Used in similar analysis, Code is tested and in a language I know",Personal computer (laptop/desktop),I cook logic boards.,"Figuring out how to balance the biology and computation... they're good at one or there other, but struggle to handle both."
9 2 4 10/6/2015 20:39:10,No,9,Availability of software and methods from published papers.,No,I use Qiita,"Best for job, Good documentation",Lab server,"None, really.",Not having an algorithmic thinking.
9 2 3,5,8 10/6/2015 20:39:50,No,9,"inflexible software software with too many dependencies/complicated configuration Software that use non-standard and incompatible formats",Yes,,"Best for job, Quickest, Good documentation",Lab server,lack of resources (when sharing with others),
6 4 6,5,8,3 10/6/2015 20:59:23,No,6,"multiple and/or badly documented data formats, bad software quality, unobtainable data from published studies, incomprehensible Perl (not only...) scripts. Matherial and methods: ""data was analyzed using custom Perl|Bash|awk scripts"" (not available from anywhere)",Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,"long waiting time. Manual partitioning of input data. Obsolete or incompatible with bioinf software versions of gcc, libraries etc. ","GUI oriented education. Finding the right proportion between teaching using existing programs as ""black-boxes"", vs ""see the guts of an algorithm"" but do not learn how to use it."
8 4 * 10/6/2015 20:40:28,No,8,,Yes,,"Already installed on server, Best for job, Quickest, Good documentation, Used in similar analysis",University server/cluster,,
8 4 6,3 10/6/2015 20:41:23,No,8,"Poor and/or incomplete documentation. Software that has too many dependencies, or only works on specific versions of Linux. ",Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,Have to get some tools installed by sys admin,"Making them skeptical about the tools that they use, and getting them to check all inputs and outputs "
7 4 4,2,3 10/6/2015 20:44:17,No,7,Access to data sets.,Yes,,"Best for job, Word of mouth recommendation, Used in similar analysis",University server/cluster,Dependencies.,Lack of unix skills
2 2 3,2 10/6/2015 20:45:41,Yes,2,Lack of time to learn how to do stuff properly,Yes,,Word of mouth recommendation,Lab server,It don't install right,They need I medics training and its often done peicemeal in short courses or workshops
7 2 8 10/6/2015 20:46:18,No,7,"the proxy deploying whatever I want in a server. storage cluster+quotas",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, I understand the code",Lab server,setting the limits : too many resources / not enough,"linux/command-line is your friend, really."
4 4 2,5 10/6/2015 20:48:43,No,4,So many tools...dissemination of current best practices and which tools compliment one another is a constant struggle.,No,Don't integrate particularly well with mainstream Linux/HPC resource at academic institution,"Best for job, Word of mouth recommendation, Good documentation, Noted track record",University server/cluster,No standardization in command line flags and options among tools (assemblers for instance). Poor scaling. memory utilization.,"Poor preparation for linux use and computational science in general. Conceptually unprepared for ""tools of the trade""."
2 4 4,6,8,3 10/6/2015 20:53:32,Yes,2,"Replicates. I.e good quality input data. Poorly documented code and scripts. Resources that assume I have time to work out their idiosyncratic usage methods. Lack of knowledge. Lack of time. ",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"Memory.Available server time. Suitable permissions to install software as needed. Needless restrictions on activity. Cost. ","A lack of well documented exemplars and illustrations which take into account the time and motivation levels of undergraduates. For post grads, a lack of suitable online resources which teach logical programming and unit testing. Can someone somewhere write an intelligible guide to GIT. "
7 3 9,5,8,3 10/6/2015 20:54:07,Yes,7,"Buggy software, software which takes non-standard input and output formats, software which has a lengthy set of dependencies, software which requires you to fax/email to get a license, software with no source code so I have to treat it as a black box Also I fill up my disk quota a lot.",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,"Has a non-standard setup with libraries in odd places, so lots of software is tricky to compile.","(I do only informal supervision of PhD students) Steep learning curve to get basic programming and command line familiarity down, which is necessary to use the computing resources available (very few GUI options here)"
8 4 * 10/6/2015 20:54:56,Yes,8,Misunderstandings with collaborators who hail from a purely biological\clinical background as opposed to a computational biology\genomics one.,Yes,,Best for job,University server/cluster,Bioinformaticians with a biology background unable to use the SGE properly and wasting everybody's time and patience,
6 1 3,5,2,8 10/6/2015 20:57:21,No,6,"Data homogeneisation - formating Programming skills Hardware capabilities",Yes,,"Best for job, Good documentation, API, librairies to treat it",Personal computer (laptop/desktop),"Compilation, dependencies, ","data is every where, don't only produce data, analyse them!"
4 4 10,3 10/6/2015 20:57:55,Yes,4,"IT departments that don't understand that they need to let people get on with things, and that some people need access to more than just Microsoft office and a few minor programs. ",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"Managed cluster with inability to update software. Leads to problems getting updates, as well as troubleshooting problems with install and dependencies. ",Getting them over the fear of UNIX and the command line. If they can get that they tend to do much better.
7 2 9,5 10/6/2015 20:59:02,No,7,"Code that doesn't work Data that's in the wrong format",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Used in similar analysis, Good interface (not necessarily GUI)",Lab server,,
5 1 4,8 10/6/2015 21:00:16,No,5,Access to data and HIPPAA protections,No,Lacking a use case,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),RAM and processor limitations,Breadth of the field as a whole. Very difficult to keep material relevant to a particular student and teach class as a whole.
8 4 8 10/6/2015 21:02:17,Yes,8,Resource limits (e.g. 3 day cap for jobs to be run on the cluster) or lack of necessary resources (e.g. not enough amounts of RAM available).,Yes,,"Best for job, Used in similar analysis",University server/cluster,,
7 2 2,6,3 10/6/2015 21:02:53,No,7,"- Not enough of statistics - Inability to read the mind of the developer who wrote bad documentations",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,Lack of packages available for distro,"Statistics, statistics, statistics"
6 2 5 10/6/2015 21:02:54,No,6,Standard formats,Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,,
8 4 3,6,9 10/6/2015 21:03:55,No,8,"- Tools that are difficult to install - Tools that are poorly documented - Tools where the documentation is well out of date - Tools that have implicit assumptions about versions of other tools that should be explicit - Tools that try to do too much, or have non-standard usage, that makes them hard to integrate into analysis pipelines",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,"- Lack of control over the operating system / installed libraries installed on the cluster. - Competition with other users for limited disk space / CPU time","- Understanding how to move for quickly profiling different potential solutions in a rapid way, and how to transition from that kind of analysis into building larger-scale analyses. - Training students to learn for themselves how to follow new developments in the field. "
7 4 6 10/6/2015 21:03:59,Yes,7,Tools with very complex/limited documentation ,Yes,,"Already installed on server, Best for job, Good documentation, Used in similar analysis",University server/cluster,,"They usually start learning bioinformatics once entered in a masters or phd, too late."
7 4 4,8,3,9 10/6/2015 21:05:34,No,7,"Access to data sets from other sites. Compute cluster bottlenecks in storage or compute nodes.",Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,"- Tools with complex dependencies that require root access to easily install - Tools that crash or just do not work",
7 2 * 10/6/2015 21:08:37,No,7,,Yes,,"Best for job, Quickest",Lab server,,
7 4 8 10/6/2015 21:09:17,No,7,Too many meetings!,Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,The pending times are too long,
6 4 7,8,3 10/6/2015 21:11:54,No,6,"File transfer Storage limits Controlled datasets (I appreciate privacy concerns but the turnaround for approvals can run upwards of 6 months)",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,Local install of packages/versions required for a program don't always work and getting administrative approval for a global solution is slow.,
2 1 2 10/6/2015 21:12:29,Yes,2,"1. Finding a good starting point. Generally I find on-line advice either too basic or too advanced. 2. Trusting that any analyses I do manage to conduct is correct. 3. Understanding what tools to use and what I need to learn before I attempt anything.",Yes,,"Best for job, Quickest",Personal computer (laptop/desktop),,"I don't train anyone but am sometimes asked for advice by newer students. Once someone has done a few unix/python/R tutorials, how do they then apply these new skills to their data?"
6 4 3 10/6/2015 21:14:26,No,6,ERS4456069_BKA1.sam.bam.aln.fullpipeline.sort.index.vcf.gz.bed.mask.txt,Yes,,"Best for job, Good documentation, Used in similar analysis, What my PI says to use",University server/cluster,Compiling (libraries don't exist or are wrong version and I don't have power to change),
7 3 3,5 10/6/2015 21:15:02,No,7,"Lack of time to really check my code as thoroughly as I would like. Lack of standardised pipelines or prevalence of bespoke analysis.",Yes,,"Word of mouth recommendation, Good documentation",Departmental server,"Lack of sudo access Poorly maintained software","Time constraints Differing backgrounds"
6 4 3 10/6/2015 21:16:08,No,6,Dealing with University IT policy bullshit (lack of admin privledges). My work is often halted by this if I want to test out a new package/tool on my local desktop.,Yes,,"Already installed on server, Best for job, Good documentation",University server/cluster,Cluster thread limitation. Cluster maintenance is poor. Takes a while to get programs installed/updated.,Getting students to care about learning the details. Most just want to get from data to interpreted results without having to grasp advanced statistics or programming.
5 4 3 10/6/2015 21:18:39,No,5,Not enough hours in the day. I still make stupid mistakes.,Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Easy to install",University server/cluster,"Dependencies unavailable. Only works with a particular version of Java/Python etc. Input has to be zipped. Input has to be unzipped. Output is unzipped ...",Getting them to listen to my advice. :-)
2 1 3 10/6/2015 21:25:17,No,2,"Constraints of having to run Linux on virtual box on a windows pc. Installation/configuration of tools (no doubt compounded by my inexperience and vbox as discussed above)",No,Difficulty getting and formating,"Graphical interface, Good documentation",Personal computer (laptop/desktop),See above. ,
7 1 5,6 10/6/2015 21:29:57,No,7,Poorly documented software that is tailored for a specific organism or group of organisms. Orphan software. Software that only runs on one OS. Not enough time to spend messing with analyses and the data.,Yes,,"Graphical interface, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Software requires very specific data formats but the format is not well documented. No matter how you format the input the response is ""data not formatted correctly"" On what line, and what character would really help!","I have to train each student, one at a time. I also have to remember how to do everything and for some analyses I do them once every few months, so I can lose brain cells between analyses."
8 1 3,6 10/6/2015 21:36:07,No,8,"Ability to compile/run the average software suite. Annotation. Lack of it, rather. Pages and pages of output that I'm never sure I should pay attention to or not. ",Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Dependency hell, failure to compile",Getting them comfortable on the command line.
7 2 * 10/6/2015 21:41:49,Yes,7,,Yes,,"Best for job, Quickest, Good documentation",Lab server,,View that bioinformatics is easy and quick to master
7 4 2,3,8 10/6/2015 21:42:53,Yes,7,"Programming ability Knowledge of cluster and parallel computing Inability to install software on department infrastructure ",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,Not able to install software. Complexity of cluster use. ,
5 1 6,8 10/6/2015 21:44:30,No,5,"documentation for some bioinformatic tools are pretty weak for non high level bioinformaticians there's quite a gap (at least I feel like there is) between a person that can use linux, command lines and do some little scripts to actually being able to do big scripts using several tools, etc. ",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),"low computing capacity running time",
4 4 6,3 10/6/2015 21:46:50,Yes,4,"Bad metadata Getting people to hand over data Not having a time machine or clones",Yes,,"Best for job, Word of mouth recommendation, Works",University server/cluster,Installation ,
4 4 3 10/6/2015 21:48:09,No,4,"Experts are so busy and the pipeline of bioinformaticians not big enough yet And my postdocs have variable issues with access to HPC",Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,Cluster staff used to maths not bioinformatics so we need to trouble shoot every new install ourselves,I don't.....lets train more of them!
6 4 8,9,3 10/6/2015 21:52:33,No,6,"- lack of high memory nodes on my school's cluster. I want ALL the memory. - things take too long. I want instant results. DIAMOND is a step in the right direction.",Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",University server/cluster,"- long queue times for high memory or GPU nodes - software not configured correctly - some software requires recompiling to change parameters (not ideal...), thus would have to ask support staff to do that for me. - downtime frequent","I don't train students (grad student), but I do believe there should be an emphasis on basic computer literacy in undergraduate education. We all take ""intro to biology."" Students should all take ""intro to computing"" and learn basic computer skills."
5 1 2 10/6/2015 21:55:26,No,5,Programming,Yes,,"Already installed on server, Quickest, Graphical interface, Good documentation",Personal computer (laptop/desktop),,
8 1 6,3,8 10/6/2015 21:57:28,No,8,1. Poor documentation; 2. inadequate description of a newly published bioinformatic method such that the results of a paper cannot be reproduced without back and forth emails with the author leading me to ask 'who reviewed this paper anyway?'; 3.outdated dependencies; 4. Lack of support from authors on a method they published. 5. Inadequate computational resources ,Yes,,"Best for job, Good documentation, Used in similar analysis, Easy to install",Personal computer (laptop/desktop),Running out of disk space. Lack of sys admin support. Our sys admin installs open suse on all servers but most software I use works smoothly for Ubuntu so lose lots of time troubleshooting software on open suse.,Keeping up to date with material
8 2 7 10/6/2015 21:59:30,Yes,8,"Lack of available time Lack of energy Poor internet connections everywhere that isn't my home or office",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis, Free (as in beer)",Lab server,Very rarely any,Lack of fundamental skills such as numeracy and problem solving
4 2 6 10/6/2015 22:00:52,No,4,Clear lists of available tools with documentation and advantages/disadvantages for processing data,Yes,,"Best for job, Quickest, Good documentation",Lab server,Lack of error documentation ,
3 4 8 10/6/2015 22:04:08,No,3,Devoid of a linux computer inspite if being a full time bioinformatics phd candidate. The hpc cluster is awesome!,Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,,"Real time data analysis Answers to which tool to use"
3 2 * 10/6/2015 22:04:29,No,3,,Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,,
9 1 6,5,8 10/6/2015 22:16:07,No,9,"*collaborators sending me poorly documented/named datasets *lack of database/archive curation (e.g. contaminators/mislabelling in GenBank)",Yes,,Best for job,Personal computer (laptop/desktop),"False positives, false negatives & scalability.",Being critical of predictions.
8 1 8 10/6/2015 22:18:52,No,8,Lack of compute,Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation",Personal computer (laptop/desktop),,
3 2 5 10/6/2015 22:19:28,No,3,Few publication authors shares reproducible analysis pipeline!,No,Not yet.,"Best for job, Good documentation",Lab server,,
4 4 3 10/6/2015 22:22:43,No,4,Lack of a good team. ,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",University server/cluster,Dependencies for complicated pipelines can be difficult to install on the cluster. ,Giving them enough confidence to feel comfortable.
7 5 9,4 10/6/2015 22:23:25,Yes,7,"People re-inventing tools that already exist, rather than enhancing existing tools Long-winded arguments about which programming language is best. Poorly maitained tools lacking supporting information. Politics around data localization. Dyed-in-the-wool industry haters who assume anything commercial can't be any good.",No,Too many tools/data available in other sites too,"Best for job, Good documentation, Reproducible results",Cloud,Licensing in tools/pipelies/etc,
9 2 6 10/6/2015 22:25:08,No,9,"Slow updates of genomic databases and tools, including Ensembl and UCSC browser",No,"Not convienient enough, not custom enough","Best for job, Word of mouth recommendation, Good documentation",Lab server,Bugs and poor documentation,Slow learning curve + difficulties in establishing baseline skills in Unix and basic programming + too many tools need to be mastered.
6 4 8,3 10/6/2015 22:33:14,Yes,6,"Not enough disk space on the cluster. Not enough free job slots on the cluster. Ridiculously complicated R-packages. Perl module and version incompatibilities. Not enough hours in the day.",Yes,,"Already installed on server, Best for job, Quickest",University server/cluster,No free job slots on the cluster,Lack of any previous experience in Linux platform or coding.
8 4 2,5 10/6/2015 22:38:12,No,8,"Diversity and complexity of software deployment Need to integrate disparate tools into coherent, robust workflows (weekly) Processing power",Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",University server/cluster,"Institutional bureaucracy Need for grants/cost of entry","Students come from many different backgrounds, so no one training solution is viable for all. Typically results in the students needing a large amount of self-guided learning, with mixed results. "
3 3 2 10/6/2015 22:39:27,Yes,3,"- Need to improve coding skills but it's hard to find time - Lack of support/mentoring/oversight - lack of other bioinformatics people to talk to - working with organisms that don't have well annotated genomes - trying to find a documentation strategy that works for me (constantly forgetting to git commit)",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,"Maintaining the server myself - steep learning curve to acquire the necessary sysadmin skills Compatibility issues",Have only done some informal teaching of phd students. It's hard to know where to start...
9 3 9,6,8 10/6/2015 22:44:15,No,9,"- pipelines composed of poorly crafted software - poor documentation, which transforms what should be simple software reuse or adaptation into a hard problem - not enough compute resources ",No,just beginning to do HiC analysis.,"Best for job, Good documentation",Departmental server,"Undocumented or under-documented errors, which don't suggest how to fix errors. Also, usually there are zero tests, meaning that even if the software seems to install properly, I have no way to trust that it has been installed properly, or if it works.",
4 3 8,3 10/6/2015 22:45:54,No,4,My Advisor and hard-drive space,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",Departmental server,Installing dependencies without root access,Getting them to sit at the command line long enough to start doing their own troubleshooting
7 2 * 10/6/2015 23:01:25,No,7,,Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",Lab server,,
2 5 * 10/6/2015 23:02:50,No,2,"Knowing there is a way to perform a task, but not being able to properly logic out the process in a timely manner.",Yes,,"Graphical interface, Good documentation, Used in similar analysis",Cloud,,
5 2 3,8 10/6/2015 23:10:23,No,5,"Computer capacity , immature software",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",Lab server,Pre req packages ... Conflicts with other installed software,Resources to run real analysis
10 4 * 10/6/2015 23:35:39,No,10,Told that only provide a command line interface rather than a programmatic API. This often results in unnecessary temp files wasting space and makes it difficult to write analysis as a single script that can be run reproducibly,Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",University server/cluster,,Convincing them to use version control
8 4 5,8 10/6/2015 23:43:51,Yes,8,"The persistent use of non-generic primary data formats. For instance, VCF instead of JSON, BAM instead of protobuf.",Yes,,"Best for job, Quickest, I wrote it",University server/cluster,"Low availability. Waiting for jobs to even start really impairs progress. And having data on some crazy custom filesystem is never a good start to interfacing with the outside world. Oh, and clusters whose sysadmins refuse to provide SSH connection to the system and instead require the use of VPNs that don't work on Linux. Wherever possible I optimize things to run on my laptop, even if it takes days or even weeks of development.","Helping them to avoid cargo cult bioinformatics (hello GATK) and develop their own solutions to answer problems they want to solve. Bioinformatics has tended to lack the small modular tools that enable this kind of approach, although I have to say this seems to be improving."
7 4 * 10/6/2015 23:46:08,No,7,So many ideas for different analyses - only one of me. ,Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,,
7 3 8 10/6/2015 23:46:45,No,7,"outsourcing (60% discount in bioinformatics service) ",No,low throughput,"Best for job, Word of mouth recommendation, Good documentation",Departmental server,Lack of RAM (1TB),
7 1 4,5,6,3,8,9 10/7/2015 1:01:41,No,7,"Certain types of data unlikely to be public (e.g. metabolome), not in standardized formats Non-standard file formats that are insufficiently documented to parse except by trial and error Dependency hell (e.g. different tools requiring incompatible versions of libraries) Memory and disk limitations Tools that are buggy, poorly documented, and/or that only run in very specific environments",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Firewall makes remote access more annoying; personal desktop is powerful and guaranteed to be available but not as powerful as shared servers,"Instilling reproducible research habits Providing adequate training in each of the different spheres bioinformatics touches (mol bio, stat, cs, evo, etc) Motivating students through tedious work (parsing, troubleshooting other people's tools, etc)"
5 1 3 10/7/2015 1:13:50,No,5,Installing programs with +++ dependancies. ,Yes,,"Best for job, Good documentation",Personal computer (laptop/desktop),,
4 4 6 10/7/2015 1:19:31,No,4,"Poorly documented open-source code and manuals, very little computational support (both in the lab and the faculty in general - UofC just hired its first Chair position for Bioinformatics), large amount of software options to choose for any one task - no gold standards",Yes,,"Best for job, Good documentation, Used in similar analysis, something i can actually install and run",University server/cluster,"Once i figured out the shell, got scolded a few times for not qsub'ing jobs in $WORK, filling up $SCRATCH and other n00b mistakes I haven't had any problems - I'm working with relatively small data sets (10GB)",
7 4 3,8 10/7/2015 1:28:08,No,7,"1a) Collaborators who want to do all analysis w/o prior experimental design. 1b) Collaborators who are incapable of providing experimental design. 2) Raw computing power. ",Yes,,"Already installed on server, Best for job, Quickest, Good documentation, Memory limitations.",University server/cluster,"Dependency limitations. Odd one off bugs that have never been documented and are difficult to create test cases to re-create. Memory. ","Two things: Thinking that bio-bioinformatics can be learned and internalized in a 2 day or even semester long course and not realizing that it is a scientific discipline. Raw fear of computing as a tool. (Similar to math adverse students.)"
7 4 8 10/7/2015 1:29:34,No,7,"Not enough resources for all users, often forces a move from cluster to the cloud.",Yes,,"Best for job, Quickest, Word of mouth recommendation",University server/cluster,"",Apprehension of large data and thinking they will not get it.
3 4 * 10/7/2015 2:03:44,No,3,,Yes,,"Good documentation, Used in similar analysis, forum support available",University server/cluster,Too much data is stored on the server. Too many users trying to be on the server at the same time. ,
6 5 5,8,3 10/7/2015 2:29:54,No,6,"- limited storage space - limits to sharing human (patient) biological data - ""chr1"" vs. ""1"" - incompatibility of tools that ought to work together - tools: lack of portability / poor packaging / bit rot",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis, permissive open-source license",Cloud,"Must completely automate installation of dependencies in order to run the software, can't just crash around until it works",Convincing them that it's worth taking the time to properly learn coding basics. It's not just a chore you can half-learn for a semester and get half-decent results on your projects; either you do it right or you get nothing.
5 1 8 10/7/2015 3:39:45,No,5,"(1) Difficult to reuse tools due to licenses or unclear whether tool is appropriate for bacteria. (2) No training infrastructure (self-taught). (3) Not supported by core facilities with superlative expertise in the same manner as traditional wet lab resources e.g., Sanger sequencing, flow cytometry, imaging.",Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Memory and storage limitations (then I switch to university cluster).,"Training on the prerequisite ""boring"" tools before you can conduct a meaningful biological analysis. These include: installing the needed software, command line interface, git (sometimes), statistical analyses, and including control/test data to differentiate meaningful output from garbage."
3 4 2,3 10/7/2015 4:39:44,No,3,Lack of practice and how you have to start from zero if you go a while without practicing it.,Yes,,"Best for job, Quickest, Graphical interface, Good documentation, Used in similar analysis",University server/cluster,Incompatibility between softwares and dependencies.,"I'm a student myself, and learning bioinformatics analyses is difficult even with internet and access to many forums as many people have their own way of writing codes, preference for programs used, etc."
2 1 * 10/7/2015 5:48:36,No,2,,Yes,,"Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),,
7 3 3,7,4,8 10/7/2015 6:19:50,No,7,"Non open-source tools and databases. Not enough CPUs on a node. Data transfer too slow from NCBI and ENA.",Yes,,"Best for job, Quickest",Departmental server,"Installing software. Poor software quality.","Unix command line."
4 * 3,2 10/7/2015 6:23:03,Yes,4,"Mathematical expertise Algorithm and model design Coding (fixing this) ",Yes,,"Best for job, Good documentation, Used in similar analysis",Iridis4,Program Dependency issues,
7 3 3 10/7/2015 6:36:32,No,7,Installation of open source packages and their dependencies is often a nightmare,Yes,,"Already installed on server, Good documentation, scripting to allow reproducibility",Departmental server,I do not have admin access and hence rely on others to install software packages,The steep learning curve. It is often too difficult to install and test new software tools due to the number of dependencies etc that are needed with the right version numbers. This results in the use of suboptimal solutions as people use what they have already installed.
3 2 8,5 10/7/2015 6:53:14,Yes,3,Fast and efficient data transfer; lack of suitable IT infrastructure; continually installing updates or new software. ,Yes,,"Best for job, Quickest",Lab server,Updates and installs; lack of standardised system; insufficient memory ,
4 1 * 10/7/2015 7:14:48,No,4,,Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),,
8 4 * 10/7/2015 7:31:15,No,8,,Yes,,"Best for job, Word of mouth recommendation, Good documentation",University server/cluster,,
4 2 5,8,3 10/7/2015 7:56:20,No,4,"huge amount of necessary working memory for some tools output/input format conversions dependencies of some programs (down to version level) ",Yes,,"Best for job, Quickest, Word of mouth recommendation, Used in similar analysis",Lab server,,
8 1 5 10/7/2015 8:53:53,No,8,"-No standard names. For example HGNC was not standard till pretty recently. So standard it is great if they are diff. genome severs (ie. USCS, ENSEMBL), but they should talk more each together. -Standard APIs for accessing data for instance it will be a must for USCS ",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, software should have a related publication",Personal computer (laptop/desktop),It is fine for me,difficulties to find good candidates in comparation to some other fields. Seems to me they do not choose academia but company money instead
4 4 7,3,5 10/7/2015 8:55:01,No,4,Software installation. Incompatible formats. Platforms competing to deliver the whole package but don't and make it difficult to export data.,Yes,,"Already installed on server, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,Slow connection. No root access.,Steep learning curve. Takes long time to dø even basic stuff.
9 4 6,3,5 10/7/2015 8:57:45,No,9,"Poorly documented software (inlc. web tools, databases, packages, modules, ...). Parsing XML files. Converting Input/Output files from non standard formats. Abandoned software. ",Yes,,"Already installed on server, Best for job, Used in similar analysis",University server/cluster,Hard to install software (especially if admin privileges needed).,
9 2 4 10/7/2015 9:02:58,No,9,"closed source / data not publicly available lack of support from IT-department",Yes,,"Word of mouth recommendation, Used in similar analysis, open source",Lab server,"none - using the lab server, we manage the software ourselves ","Students want to jump ahead, and are not interesting in the basics, like How does the data look like, How is the data stored, Where is the data coming from, etc ."
5 2 8,3 10/7/2015 9:27:19,No,5,"- HPC structure - storage capacity - time",Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",Lab server,"- compatibility - portability","- Focus in training goals - Highlight work"
4 1 6,3 10/7/2015 10:07:29,Yes,4,"Poor documentation on tools (either non-existent or the assumed knowledge level is to high). Hundreds of scripts/programs/pipelines that do the same thing, everyone has an opinion on what is best.",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Poor documentation. Specific out-of-date dependencies. Dependencies not installing correctly. Obtuse error messages. That horrible feeling when you want to do a new bit of analysis, you find what looks like the perfect tool already designed and it turns out to be the biggest festering shite you've ever compiled on your computer","Senior academics taking on students with absolutely no previous expertise in bioinformatics just because they are keen to ""have a go"". Usually medics."
5 3 8,6 10/7/2015 10:16:04,No,5,"Lack of good metadata on publicly available datasets, lack of / unclear documentation for new software leading to time wasted. ",Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Departmental server,Long running times for certain software,
6 4 5 10/7/2015 11:17:08,No,6,"Bioinformatics software that is windows only! Tools that wont work unless data is arranged in specific paths Unnecessary config files when a command-line option would do Inconsistent data formats",Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",University server/cluster,Very few. It works very well.,Lack of programming experience
5 2 3,5,8 10/7/2015 12:35:54,No,5,"Inconsistenties between formats! Biology is hard.",Yes,,"Best for job, Quickest, Good documentation",Lab server,"Software not installed. Competing for resources with coworkers. ",Aversion to the command line.
7 4 3,2 10/7/2015 13:01:59,Yes,7,"Published software that can only actually run on the authors laptop. Poor quality sequencing. EBI",Yes,,"Best for job, Good documentation",University server/cluster,Installing software and its dependancies.,A complete lack of computing or biology knowledge. Some just want to know where the magic button is to give them a tree.
5 4 3 10/7/2015 13:07:20,Yes,5,,Yes,,"Best for job, Quickest, Good documentation",University server/cluster,Incorrect' (for my needs) version installed (or not installed at all),
5 4 6 10/7/2015 13:13:57,Yes,5,"statistics knowledge, R uninformative error messages",Yes,,"Best for job, Good documentation, Used in similar analysis",University server/cluster,,
8 4 7,8,3 10/7/2015 14:03:09,Is Scotland still in the UK?,8,"Data transfer I/O bottlenecks within compute systems Inconsistent annotation",Yes,,"Best for job, Quickest, Good documentation",University server/cluster,"Resource allocation conflicts Job queuing inefficiencies Installation works on head node but not worker nodes","For students from a biological background, getting them to lose their fear of 'breaking' things on the compute servers/cluster. For students from math/stats/CS background, understanding that biology is messy and there are rarely if ever clean and clear answers."
4 1 6,8 10/7/2015 14:27:14,Yes,4,"- Difficult to navigate the infinite choices of software, algorithms, analyses. I want to move from 16S rRNA sequencing to metagenomics, but how do I even start if I'm doing it entirely on my own? - Difficult to find enough documentation for any tool/software for learning by myself from scratch; - People don't give enough details of chosen parameters for their analyses (e.g. in 16S rRNA gene pipelines), so it's difficult to properly compare across studies.",Yes,"At least I think I do, but maybe I'm not using enough. Mostly just BLAST, but I only do amplicon sequencing.","Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"Not enough memory, not enough disk space (although it's a decent Macbook Pro). There is a departmental server, but access is restricted and help is limited, as there isn't a dedicated bioinformatician.","Making sure they understand what exactly is being done to the dataset at each step when they use a semi-automated pipeline, what the limitations, biases and assumptions are, making sure they choose parameters and tools according to their research questions, instead of just repeating commands from workshop slides. Also difficult to teach them to go beyond the basics (stacked bar plots for 16S rRNA sequencing), use different visualisations, different analyses, more statistics, etc., ask new questions to explore the data, etc."
7 4 9,5,3,7 10/7/2015 14:48:17,No,7,"Software that doesn't work Data in too many different formats Little problems with the data that require you to spend lots of time debugging Too much variability in methods and parameters that people use in their papers. All the money goes to people at large institutions. ",Yes,,"Best for job, Quickest, Good documentation, Used in similar analysis",University server/cluster,"Cannot run Docker containers on the university cluster. Versions of software and parameters change quickly. Not transferable directly from one computer to another.","Teaching them to code in R when they have been trained in Python. As much as people might prefer using Python for bioinformatics, R is essential to know."
3 3 * 10/7/2015 14:58:15,Yes,3,not enough bioinformatics support and poor infrastructure,Yes,,"Already installed on server, Word of mouth recommendation, Used in similar analysis",Departmental server,,
7 2 4,3 10/7/2015 15:10:21,Yes,7,Difficult to find public samples based on metadata.,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation",Lab server,Make,
7 2 5,3 10/7/2015 15:10:29,No,7,"Politics Format changes",Yes,,"Already installed on server, Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, development by own lab",Lab server,Administration / Old OS versions,
4 1 9 10/7/2015 15:20:24,Yes,4,Multiple tools in multiple programming languages,Yes,,"Best for job, Quickest, Word of mouth recommendation, Graphical interface, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Ad hoc software design that is often incompatible with my application ,
7 2 3 10/7/2015 15:28:13,No,7,Tools that come with a million dependencies,Yes,,"Already installed on server, Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,Missing a package I want to use,Teaching them that just because the tool says something doesn't mean it's biologically correct
4 4 * 10/7/2015 15:49:49,Yes,4,Time,Yes,,"Good documentation, Used in similar analysis",University server/cluster,The need to explore a folder structure that was not set up by me.,NA
2 1 2,8,6 10/7/2015 16:11:32,No,2,"• my own lack of knowledge • lack of computing power",Yes,,"Best for job, Good documentation",Personal computer (laptop/desktop),"Often I have no idea how exactly the software works, as the documentation, although possibly at least existing, does not nearly provide enough actual practical examples.",
10 * 3,6 10/7/2015 16:28:02,Yes,10,"Access to diverse tools in sensible environment. Benchmarking and documentation in tools. Cloud aren't yet for ad hoc and easy-entry, try-and-see tasks. Funding typically underestimated cost of solid bioinformatics. ",No,Because I don't have my own data - I work for one of these resources. ,"Best for job, Good documentation, Open for others to understand and criticise",Dedicated institutional compute resource,Installation and configuration issues with tools that aren't quite mature. ,Developing an understanding of the diversity of resources available and the two-way nature of their use.
8 4 3 10/7/2015 16:40:05,Yes,8,"Incompatible software i.e. pipelines that require multiple software, but certain versions not being compatible",Yes,,Good documentation,University server/cluster,"Not necessarily the most up-to-date versions, as administrator for server likes to benchmark new version before installation and use.",They often have no training in *nix systems before coming to me - they have no programming/scripting experience from their undergraduate courses.
6 2 3,5 10/7/2015 16:48:44,No,6,"file formats, papers written like press releases with few details on key steps, lack of standard workflows even for routine bioinformatics tasks",Yes,,"Already installed on server, Best for job, Quickest, Good documentation, Used in similar analysis",Lab server,"installation, random poorly-documented bugs","diversity of backgrounds, persistent ""I'm not a mathematician/computer scientist"" attitudes, not much institution-wide support for training where I am"
2 1 * 10/7/2015 19:29:31,No,2,Lack of up-to-date info when not connected to the buzz around a state-of-the-art lab.,Yes,,Word of mouth recommendation,Personal computer (laptop/desktop),,
10 1 * 10/7/2015 20:13:46,No,10,"Lack of persistence of data and resources Java",Yes,,"Best for job, Word of mouth recommendation, Used in similar analysis, known/reliable authors",Personal computer (laptop/desktop),"Java is messed up Some resources are becoming slow or have large cumbersome outputs (e.g. sequence similarity searches)","Finding out that some of the tools and databases I used last year don't exist anymore (again, lack of persistence)."
5 1 3,6 10/7/2015 21:48:56,No,5,,Yes,,"Already installed on server, Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),Difficulties to install some softwares with too many dependencies and no tutorial.,
7 4 5,6 10/7/2015 22:34:45,Yes,7,Many reinventions of the wheel in terms of tools and file formats. Poor approach to defensive programming means error messages are often utterly meaningless. Failure to easily utilise parallel programming environments ,No,Too hard to join up disparate tools,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis, computationally feasible given meagre resources",University server/cluster,"Terrible error handling, poor documentation and occasionally program constructs which could be bettered by a GCSE student",Educating them the fundamentals of the command line
7 4 9,3 10/8/2015 1:25:37,No,7,"Poor software engineering The whole ""pet bioinformatician"" thing Ancient clusters and expensive clouds Lack of understanding of basic software engineering in colleagues",Yes,,"Best for job, Quickest, Good documentation, not written in Java",University server/cluster,"Obsolete versions of libraries/compilers on clusters Poor or expensive cloud compute capabilities.",The absence of basic computational and mathematic training in early undergraduate biology courses.
7 4 9 10/8/2015 1:39:06,No,7,No biologist can explain their experiment clearly,Yes,,"Best for job, Used in similar analysis, Publication consistent with source code",University server/cluster,"Nobody have good tests for their software, so you have to implement your own CI server to see that everything works as expected bedtools/MACS2/STAR/etc",
8 2 5,8 10/8/2015 4:39:13,No,8,Lack of standardized methods by those who generate data,Yes,,"Word of mouth recommendation, Good documentation",Lab server,People using all resources without adjusting job priorities!,"Statistics is the most important part, that computer science is not the end point."
3 4 2,8 10/8/2015 11:45:09,No,3,"I'm a Computer Scientist (just trying to design and develop genomic software), and I do struggle to keep on the path of learning so many genomic words when talking to my biology colleagues. Besides, there is a huge pletora of genomic apps that one cannot know whether is making the right decision or not.",No,"I don't need to use, but my colleagues do.","Best for job, Word of mouth recommendation, Graphical interface, Good documentation, Open source",University server/cluster,We depend on third parties when running out of HD space or RAM.,"On the side of training Computer Engineers, the challenges are: - Getting a deep intro to genomics. - Getting to know the specific problem to solve (i.e. DNA assembly, metagenomics, etc). - Selecting the right software to use."
5 4 2,5,8 10/8/2015 13:22:21,Yes,5,"Programming skills, to manipulate data into formats required by software. Not much time to sit down and learn perl/Python!",Yes,,Best for job,University server/cluster,Unpredictable queuing times. Getting new software installed can take a while.,
2 2 3,6,2 10/8/2015 16:52:25,No,2,"No root access. Poor documentation. Install dependencies in Python. Personally, knowledge of statistical modeling.",No,Do n ot know how.,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",Lab server,No root access means no free installation of many things (some error always comes up).,N/A
7 2 2,7 10/8/2015 18:59:13,Yes,7,I am a biologist who taught myself to code. My most common frustration is not being as versatile a programmer as I'd like to be.,Yes,,"Best for job, Quickest, Word of mouth recommendation, Good documentation, Used in similar analysis, fewest bugs",Lab server,"scaling up to really big datasets lack of parallelizability slow data transfers","for biology students = algorithmic thinking for computer science students = biological thinking"
6 2 * 10/8/2015 20:47:31,No,6,,Yes,,"Best for job, Used in similar analysis",Lab server,,
8 2 6,8 10/9/2015 1:20:27,No,8,undocumented tools/features/limitations. ,Yes,,"Best for job, Good documentation",Lab server,software uses too much RAM/time/cpu and limits throughput given finite compute. ,finding students interested in both biology and computer. Most have 1 skill but not the other.
6 1 6,5,3 10/9/2015 11:30:41,No,6,"- Software with unclear instructions / descriptions that forces you to look at the code to understand what it's actually doing - File formats; even if standards exist they leave too much room for interpretation -> diverging implementations - Bad experimental designs or limited sample availability - Clinical data in Excel tables and with bad format",Yes,,"Best for job, Good documentation, Used in similar analysis",Personal computer (laptop/desktop),"- hard to interpret runtime errors - software often doesn't compile on anything but Linux (or OS X if you're lucky)",- little or no programming literacy
7 2 3,5 10/9/2015 15:21:37,No,7,"- it takes a lot of time to find out which tool is best for the job - unavailable R packages after updating R - format hell ",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Lab server,"- running old OS version not fullfilling version requirements - installation on server without admin rights (no package manager)",Understanding the messiness and noisiness of biological data
6 4 3,6 10/9/2015 15:23:09,Yes,6,"software compilation incomplete metadata internet speed",Yes,,"Best for job, Word of mouth recommendation, Good documentation, Used in similar analysis",University server/cluster,,
6 1 * 10/9/2015 16:11:42,No,6,,Yes,,Good documentation,Personal computer (laptop/desktop),,"Basic computer literacy - how to install software, troubleshooting simple errors."
9 3 * 10/9/2015 16:49:41,No,9,Wetlab workers not being able to produce sensible sample sizes.,Yes,,"Best for job, Good documentation, Source available.",Departmental server,,Reproducibility and controls.
4 3 9 10/9/2015 19:53:38,No,4,"small samples, not enough data",Yes,,"Best for job, Word of mouth recommendation, Good documentation",Departmental server,"java memory leaks",
7 3 5,3 10/9/2015 20:50:12,No,7,Lack of standardization in almost every aspect,Yes,,"Best for job, Word of mouth recommendation, Good documentation",Departmental server,Most problems involve installation of software and are mostly due to the shortcomings of the documentation.,Training students to problem solve when they run into issues.
8 1 4,6 10/10/2015 23:46:45,No,8,"Failure to provide code, and merely providing terse verbal methods section descriptions in its place, makes many papers incomprehensible.",Yes,,"Best for job, Good documentation",Personal computer (laptop/desktop),trouble understanding convoluted R documentation.,"Most students do not have a solid background in math, probability, and statistics."
8 4 5 10/11/2015 19:46:48,No,8,Formats,Yes,,"Best for job, Quickest, Good documentation",University server/cluster,,
6 2 9,3 10/11/2015 23:03:27,No,6,"usability of bioinformatics software/tools out there",Yes,,"Best for job, Graphical interface",Lab server,incompatibilities,
8 4 5 10/12/2015 6:49:52,No,8,"1) Lack of standards: i) formats, ii) tools to use, iii) best coding practices and above all iv) data management and sharing 2) Lack of time versus amount of data to analyse and new knowledge to absorb, all things in the way of becoming a unicorn AKA the ultimate full-stack bioinformatician ",Yes,,"Best for job, Good documentation, available on git/bitbucket",University server/cluster,Decommissioning of HPC cluster combined with lack of support from institution for HPC infrastructure and eResearch,"Beside good project and data management skills, I would say critical thinking is the biggest challenge."