To use GGity in Livebook, we need to install GGity along with Kino, which renders image output in a notebook.
```elixir
Mix.install([:ggity, :kino])
```
GGity ships with several sample datasets commonly used with the R language.
We will use the `mtcars` dataset, which includes fuel consumption and other data describing the design and performance of 32 automobiles (1973-74 models), taken from the 1974 Motor Trend U.S. magazine.
Let's take a quick look at the dataset first.
```elixir
data = GGity.Examples.mtcars()
Kino.DataTable.new(data)
```
Now let's explore the data with a scatterplot. We will structure it as a function so that we can choose which variable goes on the x axis, which goes on the y axis, and more. The function returns a `GGity.Plot` struct that we can render, or hold on to for further modification.
```elixir
alias GGity.Plot

scatterplot = fn x_variable, y_variable, color_variable, color_palette, plot_title ->
  data
  |> Plot.new(%{x: x_variable, y: y_variable})
  |> Plot.geom_point(%{color: color_variable})
  |> Plot.scale_color_viridis(option: color_palette)
  |> Plot.labs(title: plot_title)
end
```
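One detail worth knowing: an Elixir anonymous function captures the value that `data` is bound to when the function is defined. Later in this notebook we rebind `data` to other datasets, but `scatterplot` keeps plotting `mtcars`. The GGity-free sketch below uses placeholder atoms in place of the real datasets to show this behavior.

```elixir
# Placeholder value standing in for the mtcars rows.
data = :mtcars_rows

# The anonymous function captures the value of `data` at definition time.
current_data = fn -> data end

# Rebinding `data` creates a new binding; the closure is unaffected.
data = :mpg_rows

IO.inspect(current_data.(), label: "closure sees")
IO.inspect(data, label: "data is now")
```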
`GGity.Plot.plot/1` returns the SVG as an iolist, but `Kino.Image.new/2` expects a binary. Let's define a utility function that takes our `Plot` structs and renders them.
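```elixir
render = fn plot ->
  plot
  # Plot.plot/1 returns the SVG markup as an iolist...
  |> Plot.plot()
  # ...which we flatten into a single binary for Kino.
  |> IO.iodata_to_binary()
  |> Kino.Image.new(:svg)
end
```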
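Here is why the conversion is needed. An iolist is a possibly nested list of binaries (and bytes) that Elixir can write to a file or socket without concatenating first, but it is not itself a binary. `IO.iodata_to_binary/1` flattens it. The SVG fragment below is a made-up stand-in for GGity's output, not what `Plot.plot/1` actually produces.

```elixir
# A made-up iolist standing in for Plot.plot/1 output.
svg_iodata = ["<svg", [" width=\"", "100", "\""], ">", "</svg>"]

# An iolist is a list, not a binary...
IO.inspect(is_binary(svg_iodata), label: "binary?")

# ...so flatten it into one binary before handing it to Kino.Image.new/2.
svg_binary = IO.iodata_to_binary(svg_iodata)
IO.inspect(svg_binary)
```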
Next, let's create some inputs to control the plot.
```elixir
# Each select takes a keyword list of value: label options;
# Kino.Input.read/1 returns the currently selected value.
x =
  Kino.Input.select("X Variable",
    wt: "wt",
    mpg: "mpg",
    qsec: "qsec",
    disp: "disp"
  )
  |> Kino.render()
  |> Kino.Input.read()

y =
  Kino.Input.select("Y Variable",
    wt: "wt",
    mpg: "mpg",
    qsec: "qsec",
    disp: "disp"
  )
  |> Kino.render()
  |> Kino.Input.read()

color =
  Kino.Input.select("Color Variable",
    cyl: "cyl",
    am: "am",
    gear: "gear"
  )
  |> Kino.render()
  |> Kino.Input.read()

palette =
  Kino.Input.select("Color Palette",
    viridis: "viridis",
    plasma: "plasma",
    magma: "magma",
    inferno: "inferno",
    cividis: "cividis"
  )
  |> Kino.render()
  |> Kino.Input.read()

title =
  Kino.Input.text("Plot Title", default: "Motor Trend")
  |> Kino.render()
  |> Kino.Input.read()

# The inputs are already rendered; suppress the cell's return value.
Kino.nothing()
```
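The options passed to `Kino.Input.select` form a keyword list. Each entry's atom key is the value you get back from `Kino.Input.read/1`, and its string is the label shown in the dropdown. The sketch below assumes the `mtcars` rows are maps keyed by atoms, which passing the selected atom straight to `Plot.new/2` suggests. It uses a hypothetical row with placeholder numbers, not real `mtcars` data, to show how a selected atom picks out a column value.

```elixir
# The same options passed to the "X Variable" select above.
options = [wt: "wt", mpg: "mpg", qsec: "qsec", disp: "disp"]

# A keyword list is a list of {atom, label} tuples.
IO.inspect(hd(options))

# Suppose the user picks the second entry; its atom is the selected value.
{selected, _label} = Enum.at(options, 1)

# Hypothetical row keyed by atoms; the numbers are placeholders.
row = %{wt: 2.5, mpg: 21.0, qsec: 16.5, disp: 160.0}
IO.inspect(Map.fetch!(row, selected), label: "value for #{selected}")
```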
```elixir
mtcars_scatterplot = scatterplot.(x, y, color, palette, title)
render.(mtcars_scatterplot)
```
Now we will change the plot formatting to a lighter theme.
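```elixir
# Imports element_line/1 and element_rect/1
import GGity.Element.{Line, Rect}

theme = [
  axis_line: element_line(color: "gray", size: 0.25),
  legend_key: element_rect(fill: "white"),
  panel_background: element_rect(fill: "white"),
  panel_grid: element_line(color: "lightgray"),
  panel_grid_major: element_line(size: 0.5)
]

mtcars_scatterplot
|> Plot.theme(theme)
|> render.()
```

Because Elixir data is immutable, `Plot.theme/2` returns a new struct. The original `mtcars_scatterplot` still renders with the default theme:

```elixir
render.(mtcars_scatterplot)
```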
Continuing with the cars theme, we will now explore the `mpg` dataset with a bar chart. This dataset contains city and highway mileage data for 234 vehicles, covering 38 models from 15 manufacturers.
Let's look at the data.
```elixir
data = GGity.Examples.mpg()
Kino.DataTable.new(data)
```
Now we will create our bar chart. This one won't be configurable, but making it configurable would follow the same approach as the scatterplot above.
In this example, we will plot the number of models in each class, by manufacturer.
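```elixir
data
# Keep a handful of manufacturers so the chart stays readable
|> Enum.filter(fn record ->
  record["manufacturer"] in [
    "chevrolet",
    "audi",
    "ford",
    "nissan",
    "subaru",
    "toyota"
  ]
end)
# With no y mapping, geom_bar plots a count of records per manufacturer
|> Plot.new(%{x: "manufacturer"})
|> Plot.geom_bar(%{fill: "class"})
# Counts are whole numbers, so floor the axis labels
|> Plot.scale_y_continuous(labels: &floor/1)
|> Plot.labs(
  title: "Product Line Analysis",
  y: "Number of Models",
  x: "Manufacturer",
  fill: "Vehicle Class"
)
|> render.()
```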
Audi, why do you hate big cars so much?
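Without a `y` mapping, `geom_bar` plots a count of records for each manufacturer, with each bar split into segments by class. The sketch below reproduces that tally in plain Elixir with `Enum.frequencies_by/2`. The rows are made-up placeholders that share the string keys of the `mpg` data; they are not real records, and this is not GGity's internal code.

```elixir
# Hypothetical records using the same keys as GGity.Examples.mpg/0.
rows = [
  %{"manufacturer" => "audi", "class" => "compact"},
  %{"manufacturer" => "audi", "class" => "compact"},
  %{"manufacturer" => "audi", "class" => "midsize"},
  %{"manufacturer" => "ford", "class" => "suv"},
  %{"manufacturer" => "ford", "class" => "pickup"},
  %{"manufacturer" => "ford", "class" => "suv"}
]

# Count records for each {manufacturer, class} pair: one stacked segment each.
counts = Enum.frequencies_by(rows, &{&1["manufacturer"], &1["class"]})

# Total bar height per manufacturer.
totals = Enum.frequencies_by(rows, & &1["manufacturer"])

IO.inspect(counts, label: "segments")
IO.inspect(totals, label: "bar heights")
```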
Using the same data, let's draw some boxplots. This time we will set fixed colors for some of the elements instead of using the default black.
Here we will plot the distribution of highway mileage by vehicle class.
```elixir
data
|> Plot.new(%{x: "class", y: "hwy"})
|> Plot.geom_boxplot(fill: "white", color: "#3366FF")
|> render.()
```
While the relative medians across classes are no surprise, it is interesting to compare the median for subcompacts with those for compacts and midsize vehicles. They are not as different as one might assume, and highway mileage varies widely across the subcompact class.
Of course, the blue outline and white fill are great too.
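Each box summarizes a distribution: the middle line is the median, the box spans the first to third quartile, and points far outside the box are drawn as outliers. The sketch below computes those numbers for one list of highway mileages, using R's default quantile definition (type 7, linear interpolation) and Tukey's 1.5 × IQR rule for outliers. GGity may compute its hinges differently, and the mileage values in the usage example are made up. `BoxStats` is a hypothetical module name, and the sketch assumes a non-empty list.

```elixir
defmodule BoxStats do
  # Type 7 quantile: interpolate linearly between the order statistics
  # around position (n - 1) * p (0-indexed). Assumes a non-empty sorted list.
  def quantile(sorted, p) do
    h = (length(sorted) - 1) * p
    lo = floor(h)
    hi = ceil(h)
    a = Enum.at(sorted, lo)
    b = Enum.at(sorted, hi)
    a + (h - lo) * (b - a)
  end

  def summary(values) do
    sorted = Enum.sort(values)
    q1 = quantile(sorted, 0.25)
    q3 = quantile(sorted, 0.75)
    # Tukey's fences: points beyond 1.5 * IQR from the box are outliers.
    fence = 1.5 * (q3 - q1)
    lower = q1 - fence
    upper = q3 + fence

    %{
      q1: q1,
      median: quantile(sorted, 0.5),
      q3: q3,
      outliers: Enum.filter(sorted, &(&1 < lower or &1 > upper))
    }
  end
end

# Made-up highway mileages for one vehicle class.
IO.inspect(BoxStats.summary([26, 29, 27, 31, 24, 25, 44]))
```

For these values the box runs from 25.5 to 30.0 with a median of 27.0, and 44 lies beyond the upper fence (36.75), so it would be drawn as an outlier point.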
For our line chart example, we will use the `economics` dataset.
This data describes several U.S. economic indicators, recorded monthly over several decades.
```elixir
data = GGity.Examples.economics()
Kino.DataTable.new(data)
```
Plotting a line for a single variable, such as unemployment (`unemploy`), is straightforward.
```elixir
data
|> Plot.new(%{x: "date", y: "unemploy"})
|> Plot.geom_line()
|> render.()
```
A quick tangent: note the y-axis labels. Elixir's default string format for floats is rarely satisfying for large numbers. We can fix that, and make the line dotted and purple while we are at it.
```elixir
data
|> Plot.new(%{x: "date", y: "unemploy"})
|> Plot.geom_line(color: "purple", linetype: :dotted)
|> Plot.scale_y_continuous(labels: :commas)
|> render.()
```
Here we passed `:commas`, the name of a built-in labeling function, to `Plot.scale_y_continuous/2`, but any function that takes the label value as an argument and returns the desired label text will work.
```elixir
# Express values in thousands, e.g. 12000 becomes "12 thou"
to_thous = fn value -> "#{round(value / 1000)} thou" end

data
|> Plot.new(%{x: "date", y: "unemploy"})
|> Plot.geom_line(color: "limegreen", linetype: :dotted)
|> Plot.scale_y_continuous(labels: to_thous)
|> render.()
```
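For illustration, here is one way to write a comma-grouping label function by hand. It could be passed the same way, as `labels: with_commas`. This is a sketch, not GGity's `:commas` implementation: `with_commas` is a hypothetical name, and it rounds every value to a whole number before grouping the digits.

```elixir
# Hypothetical label function: rounds a number and groups digits in threes.
with_commas = fn value ->
  integer = round(value)
  sign = if integer < 0, do: "-", else: ""

  digits =
    integer
    |> abs()
    |> Integer.to_string()
    # Group from the right by reversing, chunking in threes, and reversing back.
    |> String.graphemes()
    |> Enum.reverse()
    |> Enum.chunk_every(3)
    |> Enum.map_join(",", &Enum.join/1)
    |> String.reverse()

  sign <> digits
end

IO.inspect(with_commas.(15352.4))
```

Reversing first lets `Enum.chunk_every/2` form groups of three starting from the ones digit, so any leftover short group ends up on the left after the final reverse.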
Back to real work: there are a bunch of variables in this time series data. What if we want to see how each of them moved over time on the same plot?
Enter the `economics_long` dataset, which normalizes each observation to a value between zero and one and stores that number in the `value01` variable.
The name of the variable is stored in the... `variable` variable.
Take a look:
```elixir
data = GGity.Examples.economics_long()
Kino.DataTable.new(data)
```
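A zero-to-one normalization like `value01` is typically min-max scaling: within each series, the smallest value maps to 0 and the largest to 1. The sketch below assumes that is how `value01` was derived, which the dataset preview does not state outright. It applies the transformation to one made-up series; `rescale01` is a hypothetical name.

```elixir
# Min-max scaling: (x - min) / (max - min) maps a series onto [0, 1].
rescale01 = fn values ->
  {min, max} = Enum.min_max(values)

  if max == min do
    # A constant series has no spread; map everything to 0.0.
    Enum.map(values, fn _ -> 0.0 end)
  else
    Enum.map(values, fn v -> (v - min) / (max - min) end)
  end
end

IO.inspect(rescale01.([2, 4, 6, 10]))
```

Because the scaling is applied per series, lines with very different units (dollars, percentages, counts of people) end up on a shared axis.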
This data shape allows us to map a chart aesthetic (in this example, the line color) to the `variable` variable, and GGity will group those observations into separate lines automatically.
```elixir
data
|> Plot.new(%{x: "date", y: "value01", color: "variable"})
|> Plot.geom_line()
|> render.()
```
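Conceptually, mapping `color` to `variable` splits the long-format rows into one series per distinct value, and each series becomes its own line. The sketch below shows that split with `Enum.group_by/3` on a few made-up rows (the values are placeholders, not real observations). It illustrates the idea and is not GGity's internal grouping code.

```elixir
# Hypothetical long-format rows: one observation per {date, variable}.
rows = [
  %{"date" => ~D[1967-07-01], "variable" => "pce", "value01" => 0.0},
  %{"date" => ~D[1967-08-01], "variable" => "pce", "value01" => 0.1},
  %{"date" => ~D[1967-07-01], "variable" => "unemploy", "value01" => 0.2},
  %{"date" => ~D[1967-08-01], "variable" => "unemploy", "value01" => 0.3}
]

# One series per distinct "variable" value, as {date, value01} points.
series = Enum.group_by(rows, & &1["variable"], &{&1["date"], &1["value01"]})

IO.inspect(series)
```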
We could map `variable` to linetype instead of color, although in this case the result is harder to read.
Since we are making the plot uglier in that regard, let's use custom date formatting to at least simplify the x axis.
```elixir
data
|> Plot.new(%{x: "date", y: "value01", linetype: "variable"})
|> Plot.geom_line()
|> Plot.scale_x_date(date_labels: "%Y")
|> render.()
```
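The `date_labels` string appears to use strftime-style directives, where `%Y` is the four-digit year. Elixir's standard library (1.11 and later) understands the same directives through `Calendar.strftime/2`, which makes it handy for previewing a format string. Whether GGity accepts every directive that `Calendar.strftime/2` does is an assumption not checked here.

```elixir
dates = [~D[1967-07-01], ~D[1990-01-01], ~D[2015-04-01]]

# "%Y" renders the full year; "%b %Y" renders an abbreviated month and year.
years = Enum.map(dates, &Calendar.strftime(&1, "%Y"))
months = Enum.map(dates, &Calendar.strftime(&1, "%b %Y"))

IO.inspect(years)
IO.inspect(months)
```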