Shengjia Yan yanshengjia

yanshengjia / mongo_backup_and_clear.sh
Created November 29, 2018 10:54
MongoDB backup and clear
#!/bin/bash
MONGO_DATABASE="annotation"
MONGO_HOST="10.0.5.40"
MONGO_PORT="27017"
TIMESTAMP=$(date +%Y-%m-%d-%H-%M-%S)
MONGODUMP_PATH=/usr/bin/mongodump
BACKUPS_DIR=~/sjyan/data/mongodb-backup/
BACKUP_NAME=$TIMESTAMP
SCRIPT_DIR=~/sjyan/scripts/
yanshengjia / gitpull.sh
Created August 1, 2018 03:02
Enter each folder and git pull
#!/bin/bash
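for f in /Users/yanshengjia/GitHub/*; do
    # Skip anything that is not a directory, so git pull never runs in the wrong place.
    [ -d "$f" ] || continue
    echo "Entering $f and running git pull"
    # Run in a subshell so the working directory is restored after each repository.
    (cd "$f" && git pull)
done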
yanshengjia / concat_files.py
Created April 14, 2018 04:09
Concat files
def concat_files():
    filenames = ['file1.txt', 'file2.txt']
    with open('path/to/output/file', 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                for line in infile:
                    outfile.write(line)

def concat_binary_files():
    with open('output_file.txt','wb') as wfd:
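The preview of `concat_binary_files` is cut off after its first line, so the rest of the gist is not shown here. As a minimal sketch of one way to concatenate binary files, assuming the standard-library `shutil.copyfileobj` is acceptable, the following could work. The parameters `sources`, `destination`, and `chunk_size` are hypothetical names chosen for illustration and are not taken from the gist.

```python
import shutil
from pathlib import Path


def concat_binary_files(sources, destination, chunk_size=1024 * 1024):
    """Append each source file to destination, copying in fixed-size chunks.

    Hypothetical helper: the signature is an assumption, not the gist's own.
    """
    with open(destination, "wb") as out_file:
        for source in sources:
            with open(source, "rb") as in_file:
                # copyfileobj streams the data, so a large file is never
                # held in memory all at once.
                shutil.copyfileobj(in_file, out_file, chunk_size)
    # Report the size of the combined file in bytes.
    return Path(destination).stat().st_size
```

Opening both sides in binary mode avoids newline translation and decoding errors, which is the main difference from the line-by-line text version above.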
yanshengjia / arpa_parser.py
Last active April 8, 2018 07:40
Convert ARPA format language model to JSON
#!/usr/bin/python
# -*- coding:utf-8 -*-
# @author: Shengjia Yan
# @date: 2017-11-29 Wednesday
# @email: i@yanshengjia.com
import json
import codecs
class ARPAParser:
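The body of `ARPAParser` is not visible in this preview. To illustrate the kind of conversion the description refers to, here is a minimal sketch that assumes the usual ARPA layout: a `\data\` header with `ngram N=count` lines, one `\N-grams:` section per order, entries of the form `logprob words [backoff]`, and a closing `\end\`. The function name `parse_arpa` and the output shape are assumptions made for this example, not the gist's actual design.

```python
import json


def parse_arpa(lines):
    """Parse ARPA-format lines into {order: {ngram: {"logprob", "backoff"}}}.

    Hypothetical helper: name and output layout are illustrative only.
    """
    model = {}
    order = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # blank lines and the count header carry no probabilities
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section marker such as "\2-grams:" sets the current n-gram order.
            order = int(line[1:line.index("-")])
            model[order] = {}
            continue
        if order is None:
            continue  # ignore anything before the first n-gram section
        fields = line.split()
        logprob = float(fields[0])
        words = fields[1:1 + order]
        # A field after the n-gram words is the optional backoff weight.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        model[order][" ".join(words)] = {"logprob": logprob, "backoff": backoff}
    return model


sample = [
    "\\data\\",
    "ngram 1=2",
    "ngram 2=1",
    "",
    "\\1-grams:",
    "-0.5 hello -0.3",
    "-0.7 world",
    "",
    "\\2-grams:",
    "-0.2 hello world",
    "",
    "\\end\\",
]
# json.dumps converts the integer order keys to strings in the output.
print(json.dumps(parse_arpa(sample), indent=2))
```

Keying the result by order first keeps unigrams and higher-order n-grams separate, which makes lookups during backoff straightforward.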
yanshengjia / update_fork.md
Last active April 8, 2018 03:10 — forked from CristinaSolana/gist:1885435
Keeping a fork up to date

1. Clone your fork:

git clone git@github.com:YOUR-USERNAME/YOUR-FORKED-REPO.git

2. Add the original repository as a remote named `upstream` in your cloned fork:

cd into/cloned/fork-repo
git remote add upstream https://github.com/ORIGINAL-DEV-USERNAME/REPO-YOU-FORKED-FROM.git
git fetch upstream
yanshengjia / mongo_backup.sh
Last active April 23, 2018 06:01 — forked from sheharyarn/mongo_backup.sh
Mongodump Shell Script for Cronjob
#!/bin/bash
MONGO_DATABASE="annotation"
MONGO_HOST="127.0.0.1"
MONGO_PORT="27017"
TIMESTAMP=$(date +%Y-%m-%d-%H-%M-%S)
MONGODUMP_PATH=/usr/bin/mongodump
BACKUPS_DIR=~/sjyan/data/mongodb-backup/
BACKUP_NAME=$TIMESTAMP
SCRIPT_DIR=~/sjyan/scripts/
yanshengjia / tornado_cookie_secret_generator.py
Created March 20, 2018 01:11 — forked from didip/tornado_cookie_secret_generator.py
Generates secure cookie secret for Tornado Web Framework
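No code from this gist appears in the listing. As a rough sketch of how a value for Tornado's `cookie_secret` application setting could be generated, assuming the standard-library `secrets` module is an acceptable source of randomness, one option is shown below. The function name `generate_cookie_secret` and the 32-byte default are assumptions for illustration and may differ from what the forked gist does.

```python
import base64
import secrets


def generate_cookie_secret(num_bytes=32):
    """Return a random, base64-encoded string usable as a cookie secret.

    Hypothetical helper: the name and default length are assumptions.
    """
    # token_bytes draws from the operating system's cryptographic RNG.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_cookie_secret())
```

The printed string can then be passed as `cookie_secret` when constructing the Tornado application, ideally loaded from configuration or the environment rather than committed to source control.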
yanshengjia / last_element.cpp
Created March 13, 2018 05:33
Find last element in the array
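#include <iostream>
using namespace std;

int main() {
    int a[] = {1, 2, 3, 4, 5};
    // &a has type int (*)[5], so &a + 1 points just past the whole array;
    // casting to int * and stepping back one element lands on the last element.
    int res = *((int *)(&a + 1) - 1);
    cout << res << endl;
    return 0;
}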
yanshengjia / line_numbers.py
Last active April 14, 2018 03:52
Count the number of lines in a file
# python2
def check_line_numbers():
    file_path = 'test.txt'
    # Use a context manager so the file handle is closed after counting.
    with open(file_path) as f:
        num_lines = sum(1 for _ in f)
    print(num_lines)
yanshengjia / clean.py
Created March 1, 2018 15:14
A python script which cleans the raw corpus
import re
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer

def clean(raw_str):
    en_stopwords = set(stopwords.words('english'))
    lemma = WordNetLemmatizer()
    lower_str = raw_str.lower()
    punc_free_str = ' '.join(re.findall(r'\w+', lower_str))
    stop_free_str = ' '.join([i for i in punc_free_str.split() if i not in en_stopwords])