ivanchou bigsquirrel

@bigsquirrel
bigsquirrel / genfiles.sh
Last active August 29, 2015 14:22
Randomly generate a large number of files
#!/bin/bash
for i in $(seq 1 5000)
do
    # bs is the block size and count is the number of blocks to copy,
    # so each output file holds at most 1k x 1024 = 1 MiB read from the file "input"
    dd if=input of="${i}.data" bs=1k count=1024
done
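The description says the files are generated randomly, while the script copies its bytes from a file named `input`. As a minimal sketch of the same loop with a random source, assuming `os.urandom` is an acceptable stand-in (the helper name `generate_files` and its parameters are hypothetical and not part of the gist):

```python
import os
import tempfile


def generate_files(directory, count, size_bytes):
    """Write files 1.data, 2.data, ... each holding size_bytes random bytes."""
    paths = []
    for i in range(1, count + 1):
        path = os.path.join(directory, f"{i}.data")
        with open(path, "wb") as handle:
            # Assumption: random bytes replace the gist's "input" file as the source.
            handle.write(os.urandom(size_bytes))
        paths.append(path)
    return paths


# Small demo in a temporary directory: 3 files of 1 KiB each,
# instead of the gist's 5000 files of 1 MiB each.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_paths = generate_files(demo_dir, count=3, size_bytes=1024)
    demo_names = [os.path.basename(p) for p in demo_paths]
    demo_sizes = [os.path.getsize(p) for p in demo_paths]
```

Calling `generate_files(".", 5000, 1024 * 1024)` would match the file count and size used in the shell loop.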
bigsquirrel / parse_dblp.py
Last active August 29, 2015 14:20
parse dblp
#!/usr/bin/python
# filename: parse_dblp.py
# author: ivanchou
import codecs, os
import xml.etree.ElementTree as ET
paper_tag = ('article','inproceedings','proceedings','book',
'incollection','phdthesis','mastersthesis','www')
class AllEntities:
bigsquirrel / dblp_part.xml
Last active August 29, 2015 14:20
Parse DBLP: a small piece of the DBLP data
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
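The preview of `parse_dblp.py` is cut off before its parsing logic, so the following is only a sketch of how a record like the one above could be read with `xml.etree.ElementTree`. It is not the gist's implementation. The helper `extract_records` is hypothetical, and the fragment is closed by hand (`</article>`, `</dblp>`) to make it well-formed.

```python
import xml.etree.ElementTree as ET

# Same record types as the paper_tag tuple in parse_dblp.py.
PAPER_TAGS = ('article', 'inproceedings', 'proceedings', 'book',
              'incollection', 'phdthesis', 'mastersthesis', 'www')

# The fragment from dblp_part.xml; the two closing tags are added by hand.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
</article>
</dblp>
"""


def extract_records(xml_bytes):
    """Return one dict per publication record (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    records = []
    for element in root:
        if element.tag not in PAPER_TAGS:
            continue
        record = {'type': element.tag, 'key': element.get('key'), 'authors': []}
        for child in element:
            if child.tag == 'author':
                # A record may list several authors, so collect them in order.
                record['authors'].append(child.text)
            else:
                record[child.tag] = child.text
        records.append(record)
    return records


records = extract_records(SAMPLE)
print(records[0]['key'], records[0]['year'], records[0]['authors'])
```

The standard-library parser does not load the external `dblp.dtd`, so records that use character entities defined there would need extra handling that this sketch leaves out.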
bigsquirrel / Hw1Grp2.java
Created April 15, 2015 13:41
big data hw1
package com.ivanchou;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;