@jeremy6462
Created February 16, 2018 21:28
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<script src="https://d3js.org/d3.v4.min.js" charset="utf-8"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3-legend/2.25.5/d3-legend.min.js"></script>
<script src="https://d3js.org/d3-scale-chromatic.v1.min.js"></script>
<link rel='stylesheet' href='style.css'>
</head>
<body>
<h1>Airbnb Rentals in Boston</h1>
<p>
1. Scatter plots are visualizations used to depict correlations between multiple variables. Each data point is drawn as a graphic (a mark) in the body of the visualization, and properties of that data point, usually columns in a table, are mapped to visual properties of its mark. The most basic mapping is position: a mark's x and y coordinates encode two properties of the data, and the same two properties are used for every point in the plot. Every additional visual property we vary can encode one more data property, so the number of visual properties in use determines how many variables a single scatter plot can show. There are many visual properties we can modify in a scatter plot:
</p>
<ol>
<li>The shape of a point can be modified to represent nominal values in the data. If a column had 2 possible values, 2 different shapes could be used to show which of the two values each point has.</li>
<li>The size of a point can be modified to represent ordinal values (e.g., size mapped to guest capacity in the example below).</li>
<li>The opacity of a point can be modified to represent ordinal values (e.g., opacity mapped to the rating of an item).</li>
<li>The border width of a point can be modified to represent ordinal values (e.g., border width mapped to population size).</li>
<li>The border color of a point can be mapped to a nominal value (e.g., border color mapped to the majority ethnicity of a population).</li>
<li>The fill color of a point can be mapped to a nominal value (e.g., the neighborhood of an apartment determines its initial fill color in the scatter plot below).</li>
<li>A fill image can be mapped to a nominal value (e.g., a point representing a country could be filled with that nation's flag).</li>
<li>Grid lines can help the user see important relationships between points and trace a point back to either axis of the scatter plot.</li>
<li>A trend line can be drawn to represent a relationship across all of the data points.</li>
<li>Trend lines can have different colors to show different relationships between points.</li>
<li>Lines can connect specific points to represent relationships among only those points.</li>
<li>Cluster shapes can be drawn around groups of points to visualize relationships among many points.</li>
<li>Connecting lines and trend lines can use different textures (e.g., dotted, solid, or jagged) to distinguish them from each other.</li>
<li>Multiple scatter plots can be shown side by side in one visualization to highlight different correlations in the data at the same time.</li>
<li>Points can be filled down to the x-axis or across to the y-axis to form a shape when interval values are important (e.g., a scatter plot of weather could show through this filled shape that one point is 3x warmer than another).</li>
<li>The axis scale can be changed (e.g., from linear to logarithmic) to show a relationship that would take up too much screen space on a linear axis.</li>
<li>An axis can include a break to bring outliers closer to the rest of the data.</li>
<li>The background of the entire scatter plot can be an image familiar to the user (e.g., a map) so that positions in the plot have a practical frame of reference.</li>
</ol>
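A minimal sketch of the mapping described above, assuming made-up listings and arbitrary colors: each data property is turned into one visual property of a point through a linear scale, much as `d3.scaleLinear()` does in script.js. `linearScale` and `encodePoint` are hypothetical helpers written for illustration, not part of D3.

```javascript
// Returns a function mapping the domain [d0, d1] linearly onto the range [r0, r1].
// Hypothetical helper that behaves like a basic d3.scaleLinear().
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical listings using the same column names as the Airbnb CSV.
const listings = [
  { distance: 0.5, price: 250, accommodates: 2, overall_satisfaction: 5, neighborhood: "Downtown" },
  { distance: 4.0, price: 90, accommodates: 6, overall_satisfaction: 4, neighborhood: "Roslindale" },
];

// One scale per encoded property; the size and opacity ranges mirror script.js.
const x = linearScale([0, 5], [0, 600]);          // distance -> horizontal position (px)
const y = linearScale([0, 300], [400, 0]);        // price -> vertical position (px, inverted)
const radius = linearScale([1, 8], [1, 12]);      // guest capacity -> radius (px)
const opacity = linearScale([0, 5], [0.2, 0.8]);  // rating -> fill opacity
// Nominal property -> categorical color (arbitrary example colors).
const colorByNeighborhood = { Downtown: "steelblue", Roslindale: "orange" };

// Encode one data point as the visual properties of its circle.
function encodePoint(d) {
  return {
    cx: x(d.distance),
    cy: y(d.price),
    r: radius(d.accommodates),
    fillOpacity: opacity(d.overall_satisfaction),
    fill: colorByNeighborhood[d.neighborhood],
  };
}

console.log(listings.map(encodePoint));
```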
<p>
2. I think it would be interesting to let the user progressively add more data to the visualization to see how individual data points change a trend line. This would help users understand how looking at only a small portion of the data can skew their understanding of the world as a whole. For example, if a scatter plot showed politicians and their approval ratings, the user could initially look only at data from the southern states to understand one perspective; as more data is added, they could see how the politician's overall rating changes. This change could be shown through the color of the whole graph or through a trend line, with either one describing the politician's overall approval rating across the country. Moreover, this tool could show how the data was collected over time, which reminds the user that the first data collected isn't always representative of the whole.
</p>
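One way to sketch this progressive-reveal idea, assuming an ordinary least-squares trend line and hypothetical batches of (x, y) points: refit the line every time a batch is added and keep each fit so the change can be animated or compared. The numbers are invented and `fitTrendLine` is a hypothetical helper.

```javascript
// Fit y = slope * x + intercept by ordinary least squares.
// Assumes at least two points with distinct x values.
function fitTrendLine(points) {
  const n = points.length;
  const meanX = points.reduce((s, p) => s + p.x, 0) / n;
  const meanY = points.reduce((s, p) => s + p.y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const p of points) {
    sxy += (p.x - meanX) * (p.y - meanY);
    sxx += (p.x - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Each batch stands in for data from another region being revealed to the user.
const batches = [
  [{ x: 1, y: 2 }, { x: 2, y: 4 }, { x: 3, y: 6 }],
  [{ x: 4, y: 3 }, { x: 5, y: 2 }],
];

const shown = []; // points currently visible
const fits = [];  // trend line after each reveal
for (const batch of batches) {
  shown.push(...batch);
  fits.push(fitTrendLine(shown));
}
console.log(fits);
```

In this made-up example the first batch alone suggests a steep positive trend (slope 2), while all five points together give a slightly negative slope.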
<p>
3. I chose a dataset that contains metadata and overall satisfaction ratings for Airbnb rentals in the Boston area (found <a href="http://tomslee.net/airbnb-data-collection-get-the-data">here</a>). The dataset includes information about each rental, including its price, latitude/longitude, guest capacity, and listing name. My chart starts by plotting each listing's distance (in miles) from the center of Boston (City Hall) against its price. The points are colored by neighborhood, sized by guest capacity, and given an opacity based on their overall rating. The x-axis, y-axis, size, and opacity variables can be changed using the drop-down menus below the graph, and clicking a neighborhood in the legend shows only that neighborhood's listings. You can also toggle a box selection tool to select a group of points and view their details, along with their averages, in a table.
</p>
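The distance on the default x-axis is a great-circle distance. A minimal sketch, assuming the haversine formula and Earth radius used by `getDistanceFromLatLonInKm` in script.js, with a kilometer-to-mile conversion added so the result is in miles as the axis label states; the listing coordinates in the last line are hypothetical.

```javascript
const EARTH_RADIUS_KM = 6371;  // mean Earth radius, as in script.js
const KM_PER_MILE = 1.609344;  // exact length of one international mile in km
const CITY_HALL = { lat: 42.3601, lon: -71.0589 }; // coordinates used in script.js

const toRadians = deg => deg * Math.PI / 180;

// Haversine great-circle distance between two lat/long points, in km.
function haversineKm(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Distance from Boston City Hall in miles.
function milesFromCityHall(lat, lon) {
  return haversineKm(CITY_HALL.lat, CITY_HALL.lon, lat, lon) / KM_PER_MILE;
}

// Hypothetical listing coordinates.
console.log(milesFromCityHall(42.3467, -71.0972).toFixed(2));
```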
<p>
7. I found it very interesting that in most neighborhoods, no matter how close they are to City Hall, it is easy to find both cheap and expensive apartments. I expected all apartments close to the center of Boston to be expensive and all apartments far from it to be cheap, but that is not the case. This is likely because an apartment's price depends on multiple factors (guest capacity, amenities, number of bathrooms), not just location. This relationship could also be represented with a radial or sun-style graph, where distance from the center of the graph represents distance from the center of Boston and price is encoded as the size of each point. Guest capacity also does not seem to have a strong positive or negative relationship with price, since apartments with the same price can accommodate 1 or 8 guests. Finally, I found it strange that no listing had a rating between 0 and 3: every rating was either 0 stars or between 3 and 5 stars.
</p>
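To put a number on the observation that guest capacity and price show no clear relationship, one option is Pearson's correlation coefficient r, which is near 0 when there is little linear relationship and near +1 or -1 for a strong one. A sketch with hypothetical values, not taken from the dataset; `pearson` is a hypothetical helper.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
// Assumes neither array is constant (otherwise the denominator is 0).
function pearson(xs, ys) {
  const n = xs.length;
  const mean = arr => arr.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  xs.forEach((x, i) => {
    const dx = x - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  });
  return sxy / Math.sqrt(sxx * syy);
}

// Hypothetical guest capacities and nightly prices (USD).
const guestCapacity = [1, 2, 4, 8, 1, 8];
const price = [150, 90, 200, 150, 60, 80];
console.log(pearson(guestCapacity, price).toFixed(3));
```

r only measures linear association, so it complements the scatter plot rather than replacing it.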
<!-- The body is populated in script.js -->
<!-- The original dataset can be found at: http://tomslee.net/airbnb-data-collection-get-the-data -->
<script type='text/javascript' src='script.js'></script>
<!--table for data of brushed elements-->
<div id="table">
<table>
<tr>
<th>Name</th>
<th>Price (in USD)</th>
<th>Accommodates</th>
<th>Rating</th>
</tr>
</table>
</div>
<div id="option">
<input name="boxLassoModeToggle"
type="button"
value="Toggle Box Selection"
onclick="toggleBoxSelection()" />
</div>
</body>
</html>
// script.js
// sizing of the graphic
const margin = { top: 30, right: 300, bottom: 50, left: 50 };
const outerHeight = getHeight();
const innerHeight = outerHeight - margin.top - margin.bottom;
const outerWidth = getWidth();
const innerWidth = outerWidth - margin.left - margin.right;
// the initial values used for each property of the scatter points
const xValue = d => d.longitude;
const yValue = d => d.price;
const sizeValue = d => d.accommodates;
const transparencyValue = d => d.overall_satisfaction;
const colorValue = d => d.neighborhood
// the lat long of Boston City Hall
const bostonCenterLat = 42.3601
const bostonCenterLong = -71.0589
// ordering of the neighborhoods used based on their general proximity to Boston City Hall
const neighborhoods = ["Downtown", "Beacon Hill", "Back Bay", "South Boston", "Fenway", "Jamaica Plain", "Roslindale", "West Roxbury"];
// the data attributes that can be modified for the points, ordered based on their initial values
var xAxisSelectables = [{"text": "Distance to Boston"},{"text": "Price"},
{"text": "Guest Capacity"}, {"text": "Rating"},
];
var yAxisSelectables = [{"text": "Price"}, {"text": "Distance to Boston"},
{"text": "Guest Capacity"}, {"text": "Rating"},
];
var sizeSelectables = [{"text": "Guest Capacity"},{"text": "Price"}, {"text": "Distance to Boston"},
{"text": "Rating"}
];
var transSelectables = [{"text": "Rating"}, {"text": "Guest Capacity"},{"text": "Price"}, {"text": "Distance to Boston"},];
var axisSelectableToDataPropMap = {
"Distance to Boston": "longitude",
"Price": "price",
"Guest Capacity": "accommodates",
"Rating": "overall_satisfaction"
}
// methods for finding the width and height of the page so that the graph is sized properly for the display
// found here: https://stackoverflow.com/a/1038781/4811913
function getWidth() {
return Math.max(
document.body.scrollWidth,
document.documentElement.scrollWidth,
document.body.offsetWidth,
document.documentElement.offsetWidth,
document.documentElement.clientWidth
);
}
function getHeight() {
return Math.max(
document.body.scrollHeight,
document.documentElement.scrollHeight,
document.body.offsetHeight,
document.documentElement.offsetHeight,
document.documentElement.clientHeight
);
}
// a function used to find the distance between two lat long points
// found here: https://stackoverflow.com/questions/27928/calculate-distance-between-two-latitude-longitude-points-haversine-formula
function getDistanceFromLatLonInKm(lat1,lon1,lat2,lon2) {
var R = 6371; // Radius of the earth in km
var dLat = deg2rad(lat2-lat1); // deg2rad below
var dLon = deg2rad(lon2-lon1);
var a =
Math.sin(dLat/2) * Math.sin(dLat/2) +
Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) *
Math.sin(dLon/2) * Math.sin(dLon/2)
;
var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
var d = R * c; // Distance in km
return d;
}
function deg2rad(deg) {
return deg * (Math.PI/180)
}
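// convert the numeric CSV columns and replace d.longitude with the listing's distance to Boston City Hall
const KM_PER_MILE = 1.609344; // exact length of one international mile in km
const row = d => {
d.price = +d.price;
// store the distance to Boston City Hall in d.longitude, converted to miles to match the x-axis label
d.longitude = getDistanceFromLatLonInKm(bostonCenterLat, bostonCenterLong, parseFloat(d.latitude), parseFloat(d.longitude)) / KM_PER_MILE;
d.accommodates = +d.accommodates;
d.overall_satisfaction = parseFloat(d.overall_satisfaction);
return d;
};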
var svg;
var brush;
var circles;
d3.csv('data.csv', row, data => {
var body = d3.select('body');
// set the dimensions of the visualization
svg = body.append('svg')
.attr('height', outerHeight)
.attr('width', outerWidth)
.append('g')
.attr('transform','translate(' + margin.left + ',' + margin.top + ')');
// Scales
var xScale = d3.scaleLinear()
.domain(d3.extent(data, xValue))
.range([0,innerWidth]);
var yScale = d3.scaleLinear()
.domain(d3.extent(data, yValue))
.range([innerHeight,0]);
var sizeScale = d3.scaleLinear()
.domain(d3.extent(data, sizeValue))
.range([1,12]);
var tranScale = d3.scaleLinear()
.domain(d3.extent(data, transparencyValue))
.range([0.2,0.8])
var colorScale = d3.scaleOrdinal()
.domain(neighborhoods)
.range(d3.schemeCategory10);
var xAxis = d3.axisBottom()
.scale(xScale);
var yAxis = d3.axisLeft()
.scale(yScale);
// axis modifiers
var span = body.append('span')
.text('Select X-Axis variable: ');
var xInput = body.append('select')
.attr('id','xSelect')
.on('change',xChange)
.selectAll('option')
.data(xAxisSelectables)
.enter()
.append('option')
.attr('value', function (d) { return d.text })
.text(function (d) { return d.text ;});
body.append('br');
var span = body.append('span')
.text('Select Y-Axis variable: ');
var yInput = body.append('select')
.attr('id','ySelect')
.on('change',yChange)
.selectAll('option')
.data(yAxisSelectables)
.enter()
.append('option')
.attr('value', function (d) { return d.text })
.text(function (d) { return d.text ;});
body.append('br');
// select the size variable
var span = body.append('span')
.text('Select size variable: ');
var sizeInput = body.append('select')
.attr('id','sizeSelect')
.on('change',sizeChange)
.selectAll('option')
.data(sizeSelectables)
.enter()
.append('option')
.attr('value', function (d) { return d.text })
.text(function (d) { return d.text ;});
body.append('br');
// select the opacity variable
var span = body.append('span')
.text('Select opacity variable: ');
var opacityInput = body.append('select')
.attr('id','opacity')
.on('change',opacityChange)
.selectAll('option')
.data(transSelectables)
.enter()
.append('option')
.attr('value', function (d) { return d.text })
.text(function (d) { return d.text ;});
// store the value of the currently selected neighborhood
var neighborhood_selected = "";
// Color legend - Neighborhoods
var colorLegend = d3.legendColor()
.scale(colorScale)
.shape('circle')
.cells(neighborhoods)
.shapePadding(4)
.on("cellclick", function(type) {
// selecting a new neighborhood
if (neighborhood_selected === "" || neighborhood_selected !== type) {
neighborhood_selected = type;
// dim all the neighborhoods in legend except for the selected neighborhood
d3.selectAll(".cell")
.style("opacity", 0.1);
d3.select(this)
.style("opacity", 1);
// hide the points that are not in the selected neighborhood
d3.selectAll(".non_brushed")
.style("opacity", 0)
.filter(function(d) {
return d["neighborhood"] == type;
})
.style("opacity", 1)
} else { // clicking the selected neighborhood again resets the legend
neighborhood_selected = ""
// restore full opacity for every icon in the legend
d3.selectAll(".cell")
.style("opacity", 1);
// remove the inline opacity so the circles return to their .non_brushed style
d3.selectAll(".non_brushed")
.style("opacity", null)
}
});
// X-axis setup
svg.append('g')
.attr('class','axis')
.attr('id','xAxis')
.attr('transform', `translate(0, ${innerHeight})`)
.call(xAxis)
.append('text') // X-axis Label
.attr('id','xAxisLabel')
.attr('class', 'axis-label')
.attr('y',30)
.attr('x',innerWidth/2)
.text('Distance from Boston City Hall (in miles)');
// Y-axis setup
svg.append('g')
.attr('class','axis')
.attr('id','yAxis')
.call(yAxis)
.append('text') // y-axis Label
.attr('id', 'yAxisLabel')
.attr('class', 'axis-label')
.attr('transform','rotate(-90)')
.attr('x',-innerHeight/2)
.attr('y',-30)
.style('text-anchor','middle')
.text('Price (USD)');
// Color legend set up
svg.append('g')
.attr('class', 'legend')
.attr('id', 'colorLegend')
.attr('transform', `translate(${innerWidth-300})`)
.call(colorLegend)
.append('text')
.attr('class', 'axis-label')
.attr('y', -20)
.text('Neighborhoods');
// Scatter points set up
circles = svg.selectAll('circle')
.data(data)
.enter()
.append('circle')
.attr('cx', d => xScale(xValue(d)))
.attr('cy', d => yScale(yValue(d)))
.attr('r', d => sizeScale(sizeValue(d)))
.attr('fill-opacity', d => tranScale(transparencyValue(d)))
.style('fill', d => colorScale(colorValue(d)))
.attr("class", "non_brushed")
.on("mouseover", mouseOver)
.on("mouseout", mouseOut);
// highlight brushing found using piazza: http://bl.ocks.org/feyderm/6bdbc74236c27a843db633981ad22c1b
function highlightBrushedCircles() {
if (d3.event.selection != null) {
// revert circles to initial style
circles.attr("class", "non_brushed");
var brush_coords = d3.brushSelection(this);
// style brushed circles
circles.filter(function (){
var cx = d3.select(this).attr("cx"),
cy = d3.select(this).attr("cy");
return isBrushed(brush_coords, cx, cy);
})
.attr("class", "brushed");
}
}
function displayTable() {
// disregard brushes w/o selections
// ref: http://bl.ocks.org/mbostock/6232537
if (!d3.event.selection) return;
// clearing of brush after mouse-up
// ref: https://github.com/d3/d3-brush/issues/10
d3.select(this).call(brush.move, null);
var d_brushed = d3.selectAll(".brushed").data();
// calculate the average of the selected data for display then display all selected data in the table
var price_sum = 0;
var accommodates_sum = 0;
var rating_sum = 0;
if (d_brushed.length > 0) {
clearTableRows();
d_brushed.forEach(function(d_row) {
price_sum += d_row.price;
accommodates_sum += d_row.accommodates;
rating_sum += d_row.overall_satisfaction;
})
var total_rows = d_brushed.length;
var price_average = (price_sum / total_rows);
var accommodates_average = (accommodates_sum / total_rows);
var rating_average = (rating_sum / total_rows);
var average_row = {
"name": "AVERAGE",
"price": price_average.toFixed(2),
"accommodates": accommodates_average.toFixed(2),
"overall_satisfaction": rating_average.toFixed(2)
}
populateTableRow(average_row);
d_brushed.forEach(function(d_row) {
populateTableRow(d_row);
})
} else {
clearTableRows();
}
}
brush = d3.brush()
.on("brush", highlightBrushedCircles)
.on("end", displayTable);
svg.append("g")
.call(brush);
// initially turn off the brush effect - allow for toggling
d3.selectAll(".overlay")
.style("display", "none")
// handle the axis, size, and opacity modifiers - found via piazza post by TAs
function xChange() {
var value = axisSelectableToDataPropMap[this.value]; // get the new x value
xScale
.domain([
d3.min([0,d3.min(data,function (d) { return d[value] })]),
d3.max([0,d3.max(data,function (d) { return d[value] })])
]);
xAxis.scale(xScale); // update with new xscale
d3.select('#xAxis') // redraw the xAxis
.call(xAxis);
d3.select('#xAxisLabel') // change the xAxisLabel
.text(this.value);
circles // update every circle, including any that are currently brushed
.attr('cx',function (d) { return xScale(d[value]) });
}
function yChange() {
var value = axisSelectableToDataPropMap[this.value]; // get the new y value
yScale // change the yScale
.domain([
d3.min([0,d3.min(data,function (d) { return d[value] })]),
d3.max([0,d3.max(data,function (d) { return d[value] })])
]);
yAxis.scale(yScale); // apply the new yScale to the axis
d3.select('#yAxis') // redraw the yAxis
.call(yAxis);
d3.select('#yAxisLabel') // change the yAxisLabel
.text(this.value);
circles // update every circle, including any that are currently brushed
.attr('cy',function (d) { return yScale(d[value]) });
}
function sizeChange() {
var value = axisSelectableToDataPropMap[this.value]; // get the new size value
sizeScale // change the size scale
.domain([
d3.min([0,d3.min(data,function (d) { return d[value] })]),
d3.max([0,d3.max(data,function (d) { return d[value] })])
]);
circles // update every circle, including any that are currently brushed
.attr('r',function (d) { return sizeScale(d[value]) });
}
function opacityChange() {
var value = axisSelectableToDataPropMap[this.value]; // get the new opacity value
tranScale // change the opacity scale
.domain([
d3.min([0,d3.min(data,function (d) { return d[value] })]),
d3.max([0,d3.max(data,function (d) { return d[value] })])
]);
circles // update every circle, including any that are currently brushed
.attr('fill-opacity',function (d) { return tranScale(d[value]) });
}
})
// handle mousing over the data
function mouseOver(data, i) {
d3.select(this)
.style("stroke-width", 5)
.style("stroke", "black");
svg.append("text")
.attr('y', -10) // location is always top left of the chart
.attr('x', 0)
.attr('id', "label" + "-" + i)
.text(function() {
return "Name: " + data.name + " Price: " + data.price + " Rating: " + data.overall_satisfaction + " Guest Capacity: " + data.accommodates; // Value of the text
});
}
function mouseOut(data, i) {
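d3.select(this)
.style("stroke-width", null) // drop the inline hover styles so CSS classes such as .brushed apply again
.style("stroke", null);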
d3.select("#label" + "-" + i).remove(); // remove the label that we added in mouse over
}
function clearTableRows() {
hideTableColNames();
d3.selectAll(".row_data").remove();
}
function isBrushed(brush_coords, cx, cy) {
var x0 = brush_coords[0][0],
x1 = brush_coords[1][0],
y0 = brush_coords[0][1],
y1 = brush_coords[1][1];
return x0 <= cx && cx <= x1 && y0 <= cy && cy <= y1;
}
function hideTableColNames() {
d3.select("table").style("visibility", "hidden");
}
function showTableColNames() {
d3.select("table")
.style("visibility", "visible");
}
// whether or not we are in box selection mode
var boxSelectionMode = false;
function toggleBoxSelection() {
if (!boxSelectionMode) {
boxSelectionMode = true;
// display the box selection tool
d3.selectAll(".overlay")
.style("display", "inline")
} else {
boxSelectionMode = false
// hide the box selection tool
d3.selectAll(".overlay")
.style("display", "none")
clearTableRows()
// reset all circles
circles.attr("class", "non_brushed");
}
}
function populateTableRow(d) {
showTableColNames();
var d_row_filter = [d.name,
d.price,
d.accommodates,
d.overall_satisfaction];
d3.select("table")
.append("tr")
.attr("class", "row_data")
.selectAll("td")
.data(d_row_filter)
.enter()
.append("td")
.attr("align", (d, i) => i == 0 ? "left" : "right")
.text(d => d);
}
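The box-selection logic in `isBrushed` and `displayTable` can be restated as two pure functions, which makes it easy to check without a browser. A sketch, assuming the selection has the `[[x0, y0], [x1, y1]]` shape that `d3.brushSelection` returns for a two-dimensional brush; `isInside`, `averageRow`, and the sample points are hypothetical.

```javascript
// True when the point (cx, cy) lies inside the brush rectangle [[x0, y0], [x1, y1]].
function isInside([[x0, y0], [x1, y1]], cx, cy) {
  return x0 <= cx && cx <= x1 && y0 <= cy && cy <= y1;
}

// Build the AVERAGE row shown at the top of the table (values formatted to 2 decimals).
function averageRow(rows) {
  const mean = key => rows.reduce((sum, r) => sum + r[key], 0) / rows.length;
  return {
    name: "AVERAGE",
    price: mean("price").toFixed(2),
    accommodates: mean("accommodates").toFixed(2),
    overall_satisfaction: mean("overall_satisfaction").toFixed(2),
  };
}

// Hypothetical circles (pixel positions) with their data.
const points = [
  { cx: 10, cy: 10, price: 100, accommodates: 2, overall_satisfaction: 4.5 },
  { cx: 50, cy: 20, price: 200, accommodates: 4, overall_satisfaction: 5 },
  { cx: 300, cy: 300, price: 80, accommodates: 1, overall_satisfaction: 0 },
];
const selection = [[0, 0], [100, 100]];
const brushed = points.filter(p => isInside(selection, p.cx, p.cy));
console.log(averageRow(brushed));
```

As in `displayTable`, callers should skip `averageRow` when nothing is brushed, since the mean of an empty list is `NaN`.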
/* style.css */
h1 {
text-align: center;
}
span {
font-size: 14px;
}
.axis-label {
fill: black;
font-family: sans-serif;
font-size: 12px;
}
.cell {
font-family: sans-serif;
font-size: 12px;
}
.legend .axis-label {
font-size: 14px;
}
table {
visibility: hidden;
position: absolute;
top: 30px;
right: 10px;
font-family: sans-serif;
font-size: 0.7em;
overflow: scroll;
background-color: white;
}
tr:nth-child(even) {
background-color: #d9d9d9;
}
.brushed {
stroke: #8e1b54;
opacity: 1.0;
}
.non_brushed {
opacity: 0.5;
}
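Each drop-down handler in script.js (`xChange`, `yChange`, `sizeChange`, `opacityChange`) recomputes its scale's domain as [min(0, smallest value), max(0, largest value)], so zero is always included; the initial scales use `d3.extent` instead, so the domain can shift after the first change. A minimal sketch of that domain step without D3: `labelToColumn` copies `axisSelectableToDataPropMap`, the `longitude` column holds the distance to City Hall after `row` has parsed it, and the rows and `zeroBasedDomain` helper are hypothetical.

```javascript
// Drop-down label -> data column, copied from axisSelectableToDataPropMap.
const labelToColumn = {
  "Distance to Boston": "longitude",
  "Price": "price",
  "Guest Capacity": "accommodates",
  "Rating": "overall_satisfaction",
};

// Domain that always contains zero, as built in the change handlers.
function zeroBasedDomain(data, label) {
  const column = labelToColumn[label];
  const values = data.map(d => d[column]);
  return [Math.min(0, ...values), Math.max(0, ...values)];
}

// Hypothetical parsed rows (longitude already replaced by distance in miles).
const rows = [
  { longitude: 1.2, price: 120, accommodates: 2, overall_satisfaction: 4.5 },
  { longitude: 5.8, price: 45, accommodates: 6, overall_satisfaction: 0 },
];
console.log(zeroBasedDomain(rows, "Price"));
```

Spreading values into `Math.min`/`Math.max` is fine for a few thousand rows but can exceed the engine's argument limit for very large arrays, which is one reason to prefer `d3.min`/`d3.max` as script.js does.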