@jguenther
Forked from alexpreynolds/eulergrid.pl
Last active August 29, 2015 14:07
#!/usr/bin/perl -w
# from https://www.biostars.org/p/77362/
#
# https://gist.github.com/alexpreynolds/5531166
use warnings;
use strict;
use Getopt::Long;
use Carp;
my ($plotTitle, $offCellColor, $onCellColor, $setNames, $setCardinalities,
$setTotal, $setTotalWithout, $outputFilename, $showWholeSets, $rGraphScript,
$ctsCounts);
GetOptions (
"plotTitle=s" => \$plotTitle,
"offCellColor=s" => \$offCellColor,
"onCellColor=s" => \$onCellColor,
"setNames=s" => \$setNames,
"setCardinalities=s" => \$setCardinalities,
"setTotal=s" => \$setTotal,
"setTotalWithout=s" => \$setTotalWithout,
"outputFilename=s" => \$outputFilename,
"showWholeSets=s" => \$showWholeSets,
"rGraphScript=s" => \$rGraphScript,
"ctsCounts=s" => \$ctsCounts,
) or die "error parsing command-line options\n";
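# optional args: fall back to defaults when not supplied
if (!$offCellColor) { $offCellColor = "red"; }
if (!$onCellColor) { $onCellColor = "green"; }
# -1 is passed through to the R script when the option is not set
if (!$setTotalWithout) { $setTotalWithout = -1; }
if (!$showWholeSets) { $showWholeSets = -1; } else { $showWholeSets = 1; }
# required args: exit with a usage hint when one is missing
if (!$plotTitle) {
die "specify --plotTitle=str\n";
}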
if (!$setNames) {
die "specify --setNames=a1,a2,a3,...,aN\n";
}
if (!$setCardinalities) {
die "specify --setCardinalities=c1,c2,c3,...,cN,c1^c2,c1^c3,...,c1^c2^c3^...^cN\n";
}
if (!$setTotal) {
die "specify --setTotal=n\n";
}
if (!$outputFilename) {
die "specify --outputFilename=str.png|ps\n";
}
if (!$ctsCounts) {
die "specify --ctsCounts=cts1,cts2,...,ctsN\n";
}
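# optional: path to the R plotting script; the default below is a path on the
# original author's machine, so pass --rGraphScript on any other system
if (!$rGraphScript) {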
$rGraphScript = "/home/areynolds/proj/eulergrid/src/plotEulergrid.R";
}
# key='value' pairs handed to the R script after --args
my $args = qq{
plotTitle='$plotTitle' offCellColor='$offCellColor' onCellColor='$onCellColor' setNames='$setNames' setCardinalities='$setCardinalities' setTotal='$setTotal' setTotalWithout='$setTotalWithout' outputFilename='$outputFilename' showWholeSets='$showWholeSets' ctsCounts='$ctsCounts'
};
# run R in batch mode; R's output is written to eulergrid.log
my $rScriptSys = qq{
R CMD BATCH --no-save --no-restore "--args $args" $rGraphScript eulergrid.log 2>&1
};
system ($rScriptSys) == 0 or die "R script failed (status $?); see eulergrid.log\n";
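#######################################
# example invocation of eulergrid.pl
#######################################
./eulergrid.pl \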
--setNames=GM06990,HepG2,K562,SKNSH,TH1 \
--plotTitle="Footprint__overlaps__for__multiple__cell__lines\n(FDR__0.001)" \
--setCardinalities=212350,233552,270586,287731,240701,93351,64049,89860,110579,62852,96806,89476,62075,64644,90129,30893,51178,53416,29083,32041,51033,28922,28279,48629,27407,22805,23548,39400,22418,21029,17172 \
--setTotal=689952 \
--outputFilename=example_figure.png \
--offCellColor="gray80" \
--onCellColor="springgreen4" \
--ctsCounts=65897,97624,173336,150753,91965
#######################################
# https://www.biostars.org/p/70963/
#######################################
Because it's not always possible to use a Venn diagram (a circular one that could be made in R) to show overlaps between three or four sets, I'll suggest something a little different.
I came up with something I call an "Eulergrid" which shows a bar graph, where each bar is an element in the power set of intersected sets, and a grid of overlap cases underneath (e.g., for three sets: A, B, C, A ∩ B, B ∩ C, A ∩ C, A ∩ B ∩ C).
The bar graph shows the cardinality of each set intersection in the power set. The grid shows which sets (one or more) take part in each intersection, and each grid column is aligned with its bar in the graph. The bars are sorted by cardinality, from least to greatest, left to right. (I leave out the empty set, although strictly speaking it is also a valid subset.)
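To make the layout concrete, here is a minimal Python sketch (not the R script's implementation) that enumerates the non-empty intersections of three toy sets, sorts them by cardinality, and prints one grid column per bar. The sets `A`, `B`, `C` and the function name `eulergrid_columns` are made up for illustration.

```python
from itertools import combinations

def eulergrid_columns(sets):
    """Return (set names, cardinality) for every non-empty combination of
    sets, sorted from least to greatest cardinality of the intersection."""
    names = sorted(sets)
    columns = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            members = set.intersection(*(sets[name] for name in combo))
            columns.append((combo, len(members)))
    # sorted() is stable, so ties keep their enumeration order
    return sorted(columns, key=lambda col: col[1])

# Hypothetical toy data, not taken from the figure
toy_sets = {
    "A": {1, 2, 3, 4, 5},
    "B": {4, 5, 6, 7},
    "C": {5, 7, 8},
}

columns = eulergrid_columns(toy_sets)
for name in sorted(toy_sets):
    # "#" marks a set that takes part in the intersection for that column
    row = " ".join("#" if name in combo else "." for combo, _ in columns)
    print(f"{name}  {row}")
print("n  " + " ".join(str(count) for _, count in columns))
```

Each printed column corresponds to one bar: the `#` marks play the role of the "on" cells in the grid, and the last row holds the bar heights.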
While an Eulergrid is admittedly less intuitive to read than a circular Venn diagram, it can always show all true overlaps between all the sets, and without adding distortion or visual errors from "impossible" Venn overlaps.
The R script used to make Eulergrids will scale up to however many sets you need to show intersections for, but the figure gets exponentially wider, because the number of combinations of sets grows as a power of 2 (three sets have eight power-set subsets, four sets have sixteen, five sets have thirty-two, etc.). With the empty set left out, that is 2^n - 1 bars for n sets: seven, fifteen and thirty-one bars, respectively.
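The growth in figure width can be checked with a few lines of Python. This is only a sketch of the counting argument; `bar_count` is a made-up helper name, and it counts every non-empty combination of sets (the empty set is not drawn).

```python
from math import comb

def bar_count(n_sets):
    """Number of Eulergrid bars: every non-empty combination of n_sets sets."""
    return sum(comb(n_sets, k) for k in range(1, n_sets + 1))

for n in (3, 4, 5):
    # The binomial coefficients for k = 1..n sum to 2**n - 1
    assert bar_count(n) == 2**n - 1
    print(n, "sets ->", bar_count(n), "bars")
```

For five sets this gives 31 bars, which matches the 31 values passed to --setCardinalities in the example command.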
To demonstrate, here's an example of what an Eulergrid figure looks like:
[example_figure.png]
The green denotes the count for that subset. Yellow coloring, in the context of this figure, represents cell-specific cardinality, i.e. the counts that are unique to a single cell type or dataset.
As a way to read this, for example, 42% of the total element overlaps over these five cell types involve SKNSH in some way. Of all those overlaps, roughly half can be assigned to SKNSH alone.
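The two percentages can be reproduced from the numbers passed to the wrapper. The sketch below assumes that the first five --setCardinalities values are the whole-set sizes in --setNames order, and that --ctsCounts follows the same order (this is what the wrapper's usage messages suggest); the function name `shares` is made up.

```python
set_names = ["GM06990", "HepG2", "K562", "SKNSH", "TH1"]
set_sizes = [212350, 233552, 270586, 287731, 240701]  # first five --setCardinalities values
cts_counts = [65897, 97624, 173336, 150753, 91965]    # --ctsCounts
set_total = 689952                                     # --setTotal

def shares(name):
    """Return (fraction of all elements in this set,
    fraction of this set that is specific to it)."""
    i = set_names.index(name)
    of_total = set_sizes[i] / set_total
    specific = cts_counts[i] / set_sizes[i]
    return of_total, specific

of_total, specific = shares("SKNSH")
print(f"SKNSH: {of_total:.0%} of all elements, {specific:.0%} of those are SKNSH-specific")
```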
Here's an example of calling the Perl wrapper on the command line, which was used to make the figure shown above:
==================================
$ ./eulergrid.pl \
--setNames=GM06990,HepG2,K562,SKNSH,TH1 \
--plotTitle="Footprint__overlaps__for__multiple__cell__lines\n(FDR__0.001)" \
--setCardinalities=212350,233552,270586,287731,240701,93351,64049,89860,110579,62852,96806,89476,62075,64644,90129,30893,51178,53416,29083,32041,51033,28922,28279,48629,27407,22805,23548,39400,22418,21029,17172 \
--setTotal=689952 \
--outputFilename=example_figure.png \
--offCellColor="gray80" \
--onCellColor="springgreen4" \
--ctsCounts=65897,97624,173336,150753,91965
==================================
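The --setCardinalities and --setTotal values in the command above are consistent with each other: applying inclusion-exclusion to the 31 intersection sizes gives the --setTotal value. The Python sketch below assumes the values are ordered as singles, then pairs, then triples, and so on, as in the wrapper's usage message; `union_size` is a made-up helper name.

```python
from math import comb

cardinalities = [
    212350, 233552, 270586, 287731, 240701,             # single sets
    93351, 64049, 89860, 110579, 62852,
    96806, 89476, 62075, 64644, 90129,                  # pairs
    30893, 51178, 53416, 29083, 32041,
    51033, 28922, 28279, 48629, 27407,                  # triples
    22805, 23548, 39400, 22418, 21029,                  # four-way
    17172,                                              # all five
]
n_sets = 5

def union_size(cards, n):
    """Inclusion-exclusion over values ordered singles, pairs, triples, ..."""
    total, start = 0, 0
    for k in range(1, n + 1):
        width = comb(n, k)            # number of k-way intersections
        sign = 1 if k % 2 else -1     # add odd orders, subtract even ones
        total += sign * sum(cards[start:start + width])
        start += width
    return total

print(union_size(cardinalities, n_sets))  # 689952, the --setTotal value
```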
The option --ctsCounts refers to the yellow coloring described above, which represents "cell-type-specific" counts.
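One way such counts could be derived, if you have the underlying sets available, is to count the elements of each set that appear in no other set. This is a sketch under that assumption, not necessarily how the counts in the example were produced; the toy sets and the name `specific_counts` are made up.

```python
def specific_counts(sets):
    """Count, for each named set, the elements found in no other set."""
    counts = {}
    for name, members in sets.items():
        others = set().union(*(m for other, m in sets.items() if other != name))
        counts[name] = len(members - others)
    return counts

# Hypothetical toy data
toy_sets = {
    "A": {1, 2, 3, 4, 5},
    "B": {4, 5, 6, 7},
    "C": {5, 7, 8},
}

counts = specific_counts(toy_sets)
# Comma-separated, in set-name order, like the --ctsCounts argument
print(",".join(str(counts[name]) for name in sorted(toy_sets)))
```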
Hopefully, this gives you some ideas or at least an understanding that Venn diagrams cannot always represent intersections between more than three sets (and sometimes not even between three sets).