@mhaligowski
Last active December 15, 2015 12:59
MSD song counts by year

The Million Song Dataset is probably one of the most popular datasets for those who want to start fiddling with Big Data analysis and Hadoop. In a nutshell, it is a set of a million songs, each described by a long list of characteristics: the year of release, where the artist comes from, and also audio features such as the shape of the waveform, segments, etc.

For any analysis over time (for example, how tempo changes over the years), it helps to know how the data is distributed across time. That is what the chart above shows: how many songs the dataset contains for each year.
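
The per-year count is a plain group-and-count. Here is a minimal in-memory sketch in JavaScript. It is not the Hadoop job that produced these numbers; the function name `countSongsByYear` and the record shape (objects with a numeric `year` field) are assumptions made for illustration.

```javascript
// Hypothetical input: an array of song records, each with a numeric `year`.
function countSongsByYear(songs) {
  const counts = new Map();
  for (const song of songs) {
    // Increment the running total for this song's year.
    counts.set(song.year, (counts.get(song.year) || 0) + 1);
  }
  // Sort by year and emit rows with the same columns as results.tsv.
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([year, count]) => ({ year, count }));
}

// Usage with three made-up records:
const rows = countSongsByYear([{ year: 1999 }, { year: 1922 }, { year: 1999 }]);
console.log(rows.map((r) => r.year + "\t" + r.count).join("\n"));
```

In a MapReduce setting the same idea would split into a mapper that emits `(year, 1)` per song and a reducer that sums the values for each year.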

The analysis ran on 10 small instances on Amazon Elastic MapReduce and took nearly 10 hours, so it cost 10 instances * 10 hours * $(0.015 + 0.06) per instance-hour = $7.50. Pretty cheap, isn't it?
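
The cost estimate can be reproduced with a few lines of JavaScript. This is only a sketch of the arithmetic in the formula above: `estimateCostUsd` is a hypothetical helper, and the reading of the two rate components (presumably a per-hour service fee plus the instance price) is an assumption, since the text does not say what each figure stands for.

```javascript
// Hypothetical helper: total cost = instances * hours * sum of hourly rates.
function estimateCostUsd(instances, hours, hourlyRatesUsd) {
  const ratePerInstanceHour = hourlyRatesUsd.reduce((sum, r) => sum + r, 0);
  return instances * hours * ratePerInstanceHour;
}

// Figures taken from the text: 10 instances, about 10 hours,
// and the two per-instance-hour rate components 0.015 and 0.06 USD.
const cost = estimateCostUsd(10, 10, [0.015, 0.06]);
console.log("$" + cost.toFixed(2)); // prints "$7.50"
```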

More to come!

<!DOCTYPE html>
<meta charset="utf-8">
<style>
body {
  font: 10px sans-serif;
}

.axis path,
.axis line {
  fill: none;
  stroke: #000;
  shape-rendering: crispEdges;
}

.bar {
  fill: steelblue;
}

.x.axis path {
  display: none;
}
</style>
<body>
<script src="https://d3js.org/d3.v3.min.js"></script>
<script>
var margin = {top: 20, right: 10, bottom: 30, left: 50},
    width = 960 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;

// Linear scales: year on the x axis, song count on the y axis.
var x = d3.scale.linear().range([0, width]);
var y = d3.scale.linear().range([height, 0]);

// Format ticks as plain integers so years are not shown with a thousands separator.
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom")
    .tickFormat(d3.format("d"));

var yAxis = d3.svg.axis()
    .scale(y)
    .orient("left");

var svg = d3.select("body").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

d3.tsv("results.tsv", function(error, data) {
  // Stop early if the data file could not be loaded.
  if (error) throw error;

  // d3.tsv yields strings; coerce both columns to numbers.
  data.forEach(function(d) {
    d.year = +d.year;
    d.count = +d.count;
  });

  x.domain(d3.extent(data, function(d) { return d.year; }));
  // Leave 5% headroom above the tallest bar.
  y.domain([0, d3.max(data, function(d) { return d.count * 1.05; })]);

  svg.append("g")
      .attr("class", "x axis")
      .attr("transform", "translate(0," + height + ")")
      .call(xAxis);

  svg.append("g")
      .attr("class", "y axis")
      .call(yAxis)
    .append("text")
      .attr("transform", "rotate(-90)")
      .attr("y", 6)
      .attr("dy", ".71em")
      .style("text-anchor", "end")
      .text("count");

  // One fixed-width bar per year, drawn from the bar's top down to the x axis.
  svg.selectAll(".bar")
      .data(data)
    .enter().append("rect")
      .attr("class", "bar")
      .attr("x", function(d) { return x(d.year); })
      .attr("y", function(d) { return y(d.count); })
      .attr("width", 7)
      .attr("height", function(d) { return height - y(d.count); });
});
</script>
year count
1922 6
1923 0
1924 5
1925 7
1926 19
1927 43
1928 52
1929 93
1930 40
1931 35
1932 11
1933 6
1934 29
1935 24
1936 25
1937 28
1938 19
1939 35
1940 52
1941 32
1942 24
1943 14
1944 15
1945 30
1946 29
1947 57
1948 43
1949 60
1950 84
1951 74
1952 77
1953 133
1954 123
1955 275
1956 565
1957 598
1958 583
1959 592
1960 424
1961 572
1962 605
1963 902
1964 945
1965 1120
1966 1377
1967 1718
1968 1867
1969 2211
1970 2350
1971 2131
1972 2288
1973 2596
1974 2186
1975 2482
1976 2179
1977 2502
1978 2926
1979 3108
1980 3101
1981 3167
1982 3597
1983 3386
1984 3368
1985 3578
1986 4220
1987 5125
1988 5613
1989 6672
1990 7258
1991 8650
1992 9547
1993 10529
1994 12127
1995 13260
1996 14135
1997 15182
1998 15858
1999 18262
2000 19293
2001 21604
2002 23472
2003 27389
2004 29618
2005 34960
2006 37546
2007 39414
2008 34770
2009 31051
2010 9397
2011 1