Skip to content

Instantly share code, notes, and snippets.

@peterk87
Last active December 16, 2015 15:09
Show Gist options
  • Save peterk87/5453340 to your computer and use it in GitHub Desktop.
Save peterk87/5453340 to your computer and use it in GitHub Desktop.
LaTex: CGF report creator manual
\documentclass{article}
\usepackage{lineno}
\usepackage{url}
\usepackage{hyperref}
\usepackage{fancyhdr} % Needed to define custom headers/footers
\usepackage{titlesec}
\usepackage{lastpage} % Used to determine the number of pages in the document (for "Page X of Total")
% \usepackage[hyphens]{url}
\usepackage{tocbibind}
% \usepackage{lineno}
\usepackage{subcaption}
\usepackage[section]{placeins}
\usepackage{booktabs}
\usepackage{microtype}
\hypersetup{
urlcolor=black,
colorlinks=true,
linkcolor=blue,
citecolor=black} % Colors hyperlinks in blue - change to black if annoying
\fancyhead{}
\cfoot{\tiny \copyright July 2009, Laboratory for Foodborne Zoonoses Lethbridge Unit, Public Health Agency of Canada.}
\lhead{\small Comparative Genomic Fingerprinting: Automated Scoring Module, Manual}
\rhead{\thepage}
\pagestyle{fancy}
% \linenumbers
\begin{document}
\title{\textbf{Comparative Genomic Fingerprinting: Automated Scoring Module, Manual}\\
with Campylobacter CGF40 mPCR Module\\
Version 1.0}
\author{Steven Mutschall}
\date{\today}
\maketitle
\thispagestyle{empty}
\newpage
\tableofcontents
\newpage
% \listoftables
% \newpage
% \listoffigures
% \newpage
\section{Introduction}
The CGF Automated Scoring Module is a software package to be used in concert with QIAxcel BioCalculator software and is designed to generate data reports within MS Excel.
Using PCR-specific modules, the software creates reports by integrating user-defined information with the jpg ``gel images'' and binary peak calling results from BioCalculator.
The software contains an Automated Exporter utility which allows for simple batch export of binary peak data from the BioCalculator software for peak calling at high and low sensitivities at user-defined thresholds . Within the MS Excel data report, the high and low threshold data are compared and differences are red flagged for the human user to easily discern and resolve any ambiguity. The built in clustering feature then allows for clustering of the binary strain fingerprints right in MS Excel.
\subsection{Campylobacter CGF40 mPCR Module}
This module has been specifically designed for the CGF40 multiplex PCR assay for Campylobacter spp. The CGF40 assay consists of eight unique 5-plex PCRs, which together generate a 40-gene binary fingerprint. This module contains all of the reaction conditions and reagent concentrations required for performing each of the CGF40 multiplex PCRs. Report templates containing the appropriate gene names and PCR product sizes for each multiplex, as well as the gene-specific .pid files with peak calling Rel Times and tolerances used by the QIAxcel software for binary peak calling are included within this module.
\subsection{Software Package Contents:}
\begin{enumerate}
\item \textbf{QIAxcelReportCreator.xla}
\begin{itemize}
\item Excel Add-In that pastes in gel images, and automatic binary scoring output from the QIAxcel BioCalculator software, and clustering capability. When running the report creator make sure to tell Excel to ``Enable Macros'' if prompted.
\end{itemize}
\item \textbf{BCAE\_GUI.exe (BioCalculator Automated Exporter with Graphical User Interface)}
\begin{itemize}
\item This script allows batch export of all needed exports (usually 16) from the QIAxcel BioCalculator software all once, greatly accelerating the process. Requires BioCalculator to be installed on the system.
\end{itemize}
\item \textbf{CGF\_AutoScoring\_Example.zip}
\begin{itemize}
\item A package of an example of the directory structure the CGF\_AutoScoring module expects in order to function.
\end{itemize}
\item \textbf{PIDs (folder)}
\begin{itemize}
\item This folder contains .pid files with peak calling Rel Times and tolerances used by the QIAxcel software to call peaks.
\end{itemize}
\item Images (folder)
\begin{itemize}
\item Contains images for this page
\end{itemize}
\end{enumerate}
{\large\textbf{Required: MS Excel 2003 or 2007 (2000 will not work) }}
% Table of Contents
% I. Overview Flowchart
% II. Installation
% III. Using the Module
% A. PCR and Plate Setup
% B. Running Plates on the QIAxcel
% C. Exporting Gel Images as JPGs
% D. Exporting Binary Peak Data (Automated)
% E. Creating QIAxcel Data Report with Excel
% F. Integrating Binary Data into Excel Sheet
% G. Generating a Final Data Table
% H. Compiling Final Data From Multiplex Reports
% I. Clustering
% IV. Appendix
% A. Exporting Binary Peak Data (Manual)
% B. Data Tables
% C. Known Issues
\section{Instructions}
\subsection{Overview Flowchart}
The diagram below is a flowchart showing an overview of using the Automated Scoring software, and when to use each component:
\subsection{Installation}
\begin{enumerate}
\item QIAxcelReportCreator.xla
\begin{enumerate}
\item Open MSExcel (Requires MSExcel 2003 or MSExcel 2007). Create a new blank workbook (File $\rightarrow{}$ New)
\item Select Tools $\rightarrow{}$ Add-Ins (Excel2007: Located in the ``Menus'' tab of the ribbon)
\item From the dialog that appears, click ``Browse''. Browse for QIAxcelReportCreator.xla
\item The Add-In should now appear in the list. Make sure that the checkbox is checked. Click ``OK''
\item The Add-In should now be installed. A new persistent menu item ``CGF'' will appear in the Excel menu bar.
\end{enumerate}
\item BCAE\_GUI.exe
\begin{itemize}
\item No installation required. Simply run BCAE\_GUI.exe
\end{itemize}
\end{enumerate}
\subsection{Using the Module}
\subsubsection{PCR and Plate Setup}
The QIAxcelReportCreator.xla file contains the CGF Analysis add-in menu located on the main toolbar of MS Excel.
Users will utilize this drop-down menu to navigate through the report creating process.
At each step of the process, only those options that can be performed will be selectable.
\begin{enumerate}
\item From the CGF drop-down menu, select ``New QIAxcel Data Report''. This will create the new Report workbook containing the Campylobacter CGF40 PCR template.
\item The SampleInfo worksheet in this new file contains the PCR reaction conditions for the Campylobacter CGF40 mPCRs.
Enter the sample names in the ``Strain Names'' column and the number of reactions in cell F3 to calculate the mastermix volumes.
At this stage, the file can be saved and closed until the PCRs have been completed and run on the QIAxcel.
\item 96-well PCR plates can be set up in a variety of ways depending on the number of samples; often more than one multiplex PCR can be run in a single plate.
For example, 24 samples would be run in four sets per plate, completing 4 of the 8 multiplex PCRs.
The software is designed to accept the various different plate configurations outlined in the figure below.
It is important to keep in mind that the QIAxcel must run full rows of a plate.
\end{enumerate}
\subsubsection{Running Plates on the QIAxcel}
\begin{enumerate}
\item Campylobacter CGF40 mPCRs are run on the QIAxcel DNA Screening Kit (Cat No. 929004) using the AM320 method with a 20 second injection time (see Section 5.6 of QIAxcel User Manual).
Band sizes are determined using the QX DNA Size Marker 100bp-3kb (Cat No. 929553) and are aligned using the QX Alignment Marker 15bp/3kb (Cat No. 929522).
\item Best Practices: Prior to running plates in the QIAxcel, it is helpful to include the sample information for each of the wells in a plate.
This is done by clicking the Sample Info button on the Instrument Control window of BioCalculator and either manually entering sample names or importing a previously created comma-separated-value (.csv) file with the sample names (see Section 5-17 of QIAxcel User Manual).
The sample names will then appear above each well on the gel image and can later be cross referenced with the sample names provided in the data tables within the integrated reports.
\end{enumerate}
\subsubsection{Exporting Gel Images as JPGs}
BioCalculator automatically exports a .jpg ``gel image'' for each row of a plate as samples are being run, however many times the alignment markers must be manually fixed and a new corrected jpg must be exported.
It is critical that both the upper and lower alignment markers are correctly aligned for all 12 lanes that make up the ``gel image''.
Perform the following steps below to manually add or delete alignment marker peaks in any electropherograms that have not properly aligned. (see also Section 5-23 of QIAxcel User Manual).
\begin{enumerate}
\item Using the QIAxcel BioCalculator software load each gel image (Open *.hff files).
Add or delete peaks by right clicking on the electropherogram and choose ``Add Peak'' or ``Delete Peak'' on the desired positions on the electropherogram.
Once peak alterations are complete, return to the whole row ``gel image'' interface and click the dark blue arrow button on the upper toolbar to re-set the new image.
\item Best Practices: Stretch the ``gel image'' to its maximum size by clicking on any two separate points on the gel image.
This will eliminate the white spaces outside of the alignment markers and produces an image that is more easily viewed by the user in later steps.
\item Go to Folder $\rightarrow{}$ Export $\rightarrow{}$ Current View to export the edited image as a jpg.
Make sure the file is saved as a jpg (other file extensions will not work) and make sure the resolution is set to x = 100, y = 1000.
This exports a jpeg with dimensions of 1600x1000 pixels, which is the size require by the report creator later on.
\item Create a sub folder for each multiplex and name it ``MultiplexX'' where X is an integer from 1--8 (e.g. start with ``Multiplex1'') and put the appropriate jpegs into each sub folder, as shown in the image below.
\end{enumerate}
\subsection{Exporting Binary Peak Data}
A script written in AutoItv3 a scripting language for controlling the Windows GUI can automate the export process (To see how to perform an export without the script, see Section~\ref{sec:exporting_binary_peak_data_manual}).
One run of the script can generate BIN data for all 8 multiplexes using both high and low thresholds (a total of 16 exports).
The files will automatically be named mpX\_BIN.csv and placed into ``\\High'' and ``\\Low'' folders created within the output directory.
While the automated process is running, the computer cannot be used for anything else, as the script waits for certain expected windows to become active/unactive for program control.
\begin{enumerate}
\item Ensure that the QIAxcel BioCalculator itself is completely closed.
Then open BCAE\_GUI.exe
\item The following dialog should pop up:
\end{enumerate}
\begin{itemize}
\item Plate Directories:
\begin{itemize}
\item Fill in the necessary multiplex (MP) fields by locating their directories using the Browse button. This is the .hda, hff directories not the exported jpg directories.
\end{itemize}
\item High and Low Threshold:
\begin{itemize}
\item These fields refer to the positive threshold from the electrophereogram peak.
- the height at which the peak must reach in order to be scored as positive.
A percentage scale is used, which is calculated in relation to the height of the tallest peak, in order to normalize the data across multiple runs/cartridges etc.
The default settings of 10.00\% and 5.00\% are recommended.
For any changes made, a decimal number is expected.
\end{itemize}
\item Peak Calling Tables Directory:
\begin{itemize}
\item These files are included in the PIDs folder of the zip file and can be placed in a convenient location for the user to select.
\end{itemize}
\item Output Directory:
\begin{itemize}
\item It is recommended that the Output Directory be set in the same root location as the MulitplexX folders containing the jpgs.
Once automated export is complete, the root folder should contain all of the MultiplexX folders, as well as the High and Low folders.
\end{itemize}
\item Click Run to perform the automated export. All warning messages generated by the QIAxcel software still need to be handled by the user. If there is ever an error that causes the script's dialog to become unresponsive, Exit the script by right clicking on the AutoIt icon in the Task Bar tray:
\end{itemize}
\subsection{Creating QIAxcel Data Report with Excel}
\begin{enumerate}
\item Make sure that you have installed QIAxcelReportCreator.xla in MSExcel
\item If you have already started a report previously select CGF $\rightarrow{}$ Open a QIAxcel Data Report. Browse for your previous file. Otherwise refer to PCR and Plate Setup.
\item Next, insert gel images by selecting CGF $\rightarrow{}$ Insert Gel Images
\item Following the onscreen prompts, select the directory for the data (the root folder that contains the ``MultiplexX'' and High, Low folders) and then enter the number of multiplexes (8 in most cases), and the number of multiplexes per plate. The gel images will be inserted into new worksheets based on multiplex, along with a data table with strain info for scoring the gel images. You will be asked if you would like to auto-integrate binary peak data. If you choose ``No'' you can still auto-integrate binary data at anytime through the CGF Analysis drop-down menu.
\end{enumerate}
\subsection{Integrating Binary Data into Excel Sheet}
\begin{enumerate}
\item Binary data can be integrated either by clicking ``Yes'' to the ``Would you like to automatically integrate...'' dialog or by going to the ``Autointegrate Binary Data'' submenu from the CGF Analysis drop-down menu.
\item You should now see a dialog like the below (or like this for one threshold at a time):
\end{enumerate}
If you had named the files as recommended before (ie high threshold in a ``High'' folder, and all binary data files named *X\_BIN.csv) or used the Automated Exporter the macro will have autofilled for you. Check that these are the correct files. If the files are wrong, or there was no autofill, browse for the files.
\begin{enumerate}
\item Click ``Run'' when all the files have been entered. The macro will extract binary data for each multiplex band and import it into the appropriate data tables below each gel image, providing a ``1'' for band presence, ``0'' for no band present, and ``no data'' if there is no data found for that band. For both thresholds at once, the addin first integrates all the high threshold data and then moves on to the low threshold.
\item If there is a difference in scoring between the high and the low thresholds, the compare table spot will be red with the words ``dif''. Resolve these ``dif'' flags directly in the compare table, by replacing it with a ``1'' or ``0'' as appropriate.
\end{enumerate}
\subsection{Generating a Final Data Table}
\begin{enumerate}
\item Once the gel images have been inserted, and the band patterns have been scored, select CGF $\rightarrow{}$ Final Data Table $\rightarrow{}$ Generate Final Data Table to transpose all of the report data into a single worksheet named ``Final''. The ``Comparison: Low vs. High'' scoring tables are the ones that are exported to the ``Final'' data table.
\item You will be prompted to select the name of the data table you wish to use (eg. DataTable\_CGF40). The data tables are available for previewing in Appendix B of this manual. If you make an incorrect selection, going to CGF $\rightarrow{}$ Final Data Table $\rightarrow{}$ Generate Final Data Table again will allow you to reselect a data table. (Note that the ``Final'' worksheet will be overwritten)
\item Now the Final worksheet will be filled in and colour coded
\item If you wish to go back to the Reports and change the scoring you can update the changes by going to CGF $\rightarrow{}$ Final Data Table $\rightarrow{}$ Refresh Final Data Table
\end{enumerate}
\subsection{Compiling Final Data From Multiplex Reports}
\begin{enumerate}
\item If you have multiple data reports you can compile their ``Final'' data sheets into one consolidated table. First, ensure that all the files you wish to compile together have their ``Final'' data table already generated.
\item Go to CGF $\rightarrow{}$ Compile Multiple Reports
\item Click ``Save As...'' to specify a name for the new compiled report file
\item Add/Remove as many reports as you wish. This is however, limited by the number of rows in Excel (65,535 rows for Excel 2003 and 1,048,576 rows for Excel 2007)
\item Click ``Compile'' when ready. This will consolidate all listed reports into one. Sample numbers will increment by the number of samples in the reports before it (to give each sample an unique number). The original filename will appear to the right of the binary data.
\end{enumerate}
\subsection{Clustering}
\begin{enumerate}
\item In order to cluster you must be in a workbook with a ``Final'' data table or a ``Final (Compiled)'' data table
\item To begin, go to CGF $\rightarrow{}$ Clustering $\rightarrow{}$ Cluster Strains. This brings up a sub menu with 3 options, Whole Sheet, Select Loci and Select a Range.
\item Clustering a Whole Sheet
\item The Add-In will cluster the ``Final'' data sheet if it exists. Else you will be prompted to enter the name of the sheet to cluster (eg ``Final (Compiled)'').
\item In the dialog that appears, the two algorithms to select from are both based on UPGMA but differ in clash-resolution:
\begin{itemize}
\item UPGMA: If >1 strains have the (same) highest similarity score to the psuedo-strain, a random strain is added to the psuedo-strain
\item yUPGMA: If >1 strains have the (same) highest similarity score to the psuedo-strain, all the strains are added to the psuedo-strain
\end{itemize}
\item Pick an algorithm, enter a numbering threshold (0 - 100\%). Click ``OK'' to cluster
\item Clustering by Loci
\begin{itemize}
\item Cluster by Loci allows you to pick certain loci out of the possible 40 to have the strains cluster according to. The rest of the loci will be outputted to the right of the sheet.
\end{itemize}
\item Clustering by Range
\begin{itemize}
\item This is a tentative feature.
It allows you to cluster by selecting certain rows, and certain loci.
The non-selected loci/strains will NOT be outputted to sheet.
When selecting, select binary data only.
Do not select Sample information, or multiplex information
\end{itemize}
\item \textbf{Note:} The clustering process can become quite slow with a large number of samples. A progress bar for clustering can be turned on via CGF $\rightarrow{}$ Preferences
\end{enumerate}
\addtocontents{toc}{\vspace{2em}} % Add a gap in the Contents, for aesthetics
\appendix % Cue to tell LaTeX that the following 'chapters' are Appendices
\section{Exporting Binary Peak Data (Manual)}
\label{sec:exporting_binary_peak_data_manual}
\begin{enumerate}
\item Close all files in the QIAxcel software first (Window $\rightarrow{}$ Close All to close all files at once. You will be prompted to save)
\item Go to the ``Plate Image \& Result File Creator'' window of the BioCalculator by going to File $\rightarrow{}$ Export. This option is only present if all files are closed
\item Select the plate directory with the *.hda files.
\item To set a threshold, check ``Use these Integration parameters'' and then click the ``Params'' button. In the Pos. Threshold field, enter a percentage. (eg. 5\% for a low threshold and 10\% for a high threshold)
\item Under the ``Binary Peak Calling'' area, click ``Enable''
\item A binary peak data *\_BIN.csv file must be generated for each multiplex. Repeat the next steps for each multiplex.
\item Select a different ``Image/Result File Name'' for each multiplex. The extension in the field will be .jpg. The software will also generate a .csv and a \_BIN.csv file (the one of interest) It is recommended that you name each *X.jpg (eg mp1.jpg) this will generate a *X\_BIN.csv file (eg mp1\_BIN.csv) automatically
\item Click the ``Table'' button. Load the appropriate .pid file (mp1.pid, mp2.pid...mp8.pid these files are included in the PIDs folder of the zip) by clicking ``Open''. Do not edit anything under the ``Names'' column, the macro will mess up later. It is ok to edit Reltime and Tol. (not recommended).
\item Click ok. You are now back at the previous dialog. Under ``Files to Process'' click ``Select All''.
\item Your screen should look something like the image below. Now click ``Process'' to generate the data files.
\item You should have generated a number of *\_BIN.csv files equal to the number of multiplexes. Change the threshold under ``Params'' to generate data for a different positive threshold. Placing all \_BIN.csv files for the High threshold in a sub folder ``High'' and all the \_BIN.csv files for the Low threshold in a sub folder ``Low'' in the root folder where the ``MultiplexX'' folders also are will have the macro autofill for you.
\end{enumerate}
\section{Data Tables}
\begin{itemize}
\item DataTable\_CGF35
\item DataTable\_CGF40
\item DataTable\_CGF40\_1stHalf
\item DataTable\_CGF40\_2ndHalf
\end{itemize}
\section{Known Issues}
\begin{enumerate}
\item There cannot be other folders in the path with ``Multiplex'' in the name beside the ones with the gel images
\item The macro does not remember user selected files in the ``Need Binary Data CSV Files - (High/Low) Threshold'' dialogs. Every time they show, the selected files will be overwritten by the auto specified files (./High/*X\_BIN.csv and ./Low/*X\_BIN.csv where X is an integer between 1-8)
\item In the *X\_BIN.csv files, the bands must be named MPXBY also the well numbers of the image, *\_BIN.csv file and scoring table must match. This should not be a problem if the .pid files are not edited.
\item Does not work with MSExcel 2000. Needs Microsoft Office 10.0 Object Library, MSExcel 2000 only comes with 9.0 Object library. If you have the required library and it is still generates compile errors you need to add the library to the compile. Go to Tools $\rightarrow{}$ Macro $\rightarrow{}$ Visual Basic Editor, then in the Visual Basic Editor go to Tools $\rightarrow{}$ References then check the checkbox for Microsoft Office 10.0 Object Library and click ok.
\end{enumerate}
\end{document}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment