Skip to content

Instantly share code, notes, and snippets.

@doryokujin
Created June 28, 2012 08:07
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save doryokujin/3009823 to your computer and use it in GitHub Desktop.
Save doryokujin/3009823 to your computer and use it in GitHub Desktop.
Book-Crossing Dataset
-- 3.1.1 出版年別分布 --
td query -w -d book_crossing_dataset "
SELECT year, cnt
FROM
(
SELECT 1975 AS year, COUNT(year_of_publication) AS cnt
FROM books
WHERE year_of_publication < 1970
UNION ALL
SELECT year_of_publication AS year, COUNT(year_of_publication) AS cnt
FROM books
WHERE 1970 <= year_of_publication
AND year_of_publication < 2005
GROUP BY year_of_publication
UNION ALL
SELECT 2005 AS year, COUNT(year_of_publication) AS cnt
FROM books
WHERE 2005 <= year_of_publication
) t1
ORDER BY year
"
-- 3.1.2 出版年別分布 --
td query -w -d book_crossing_dataset -f csv -o book_dist_of_year.csv "
SELECT year_of_publication, cnt, ROUND(cnt/total_cnt*10000)/100 AS rate
FROM
(
SELECT year_of_publication, COUNT(year_of_publication) AS cnt, 1 AS one
FROM books
WHERE 1970 <= year_of_publication
AND year_of_publication < 2005
GROUP BY year_of_publication
) t1
JOIN
(
SELECT COUNT(year_of_publication) AS total_cnt, 1 AS one
FROM books
WHERE 1970 <= year_of_publication
AND year_of_publication < 2005
) t2
ON
(t1.one=t2.one)
ORDER BY year_of_publication
"
-- 3.2 出版社別分布 --
td query -w -d book_crossing_dataset -f csv -o book_dist_of_publisher.csv "
SELECT publisher, COUNT(publisher) AS cnt
FROM books
GROUP BY publisher
ORDER BY cnt DESC
LIMIT 30
"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment