Shengjia Yan yanshengjia

View mongo_backup_and_clear.sh
#!/bin/bash
MONGO_DATABASE="annotation"
MONGO_HOST="10.0.5.40"
MONGO_PORT="27017"
TIMESTAMP=$(date +%Y-%m-%d-%H-%M-%S)
MONGODUMP_PATH=/usr/bin/mongodump
BACKUPS_DIR=~/sjyan/data/mongodb-backup/
BACKUP_NAME=$TIMESTAMP
SCRIPT_DIR=~/sjyan/scripts/
yanshengjia / gitpull.sh
Created Aug 1, 2018
Enter each folder and git pull
View gitpull.sh
#!/bin/bash
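for f in /Users/yanshengjia/GitHub/*; do
    # Pull only inside directories; chaining with && keeps git pull from
    # running in the wrong place when the entry is a file or cd fails
    [ -d "$f" ] && cd "$f" && echo "Entering $f and running git pull" && git pull
done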
View concat_files.py
def concat_files():
    filenames = ['file1.txt', 'file2.txt']
    with open('path/to/output/file', 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                for line in infile:
                    outfile.write(line)

def concat_binary_files():
    with open('output_file.txt','wb') as wfd:
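The preview of concat_binary_files is cut off right after the output file is opened. A minimal sketch of one way to complete the idea follows; it is not necessarily the author's version. The parameters filenames and output_path are assumptions introduced here, and shutil.copyfileobj from the standard library does the chunked copy.

```python
import shutil

def concat_binary_files(filenames, output_path):
    """Append each input file to output_path, copying raw bytes."""
    # filenames and output_path are hypothetical parameters for this sketch
    with open(output_path, 'wb') as wfd:
        for fname in filenames:
            with open(fname, 'rb') as fd:
                # copyfileobj streams in fixed-size chunks, so a large
                # input is never read into memory all at once
                shutil.copyfileobj(fd, wfd)
```

Opening both sides in binary mode avoids newline translation and decoding errors, which is why this variant suits non-text inputs better than the line-by-line loop in concat_files.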
yanshengjia / arpa_parser.py
Last active Apr 8, 2018
Convert ARPA format language model to JSON
View arpa_parser.py
#!/usr/bin/python
# -*- coding:utf-8 -*-
# @author: Shengjia Yan
# @date: 2017-11-29 Wednesday
# @email: i@yanshengjia.com
import json
import codecs
class ARPAParser:
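The ARPAParser preview stops at the class line, so its real implementation is not visible here. As an illustration of what converting ARPA to JSON can involve, the sketch below parses the standard ARPA layout (a \data\ header with "ngram N=count" lines, \N-grams: sections with "logprob words [backoff]" entries, and a closing \end\). The function name parse_arpa, the output dictionary shape, and the sample model are all assumptions made for this example.

```python
import json

def parse_arpa(lines):
    """Parse ARPA-format lines into {'counts': {...}, 'ngrams': {...}}."""
    model = {'counts': {}, 'ngrams': {}}
    order = None  # n-gram order of the section currently being read
    for raw in lines:
        line = raw.strip()
        if not line or line == '\\data\\':
            continue
        if line == '\\end\\':
            break
        if line.startswith('ngram '):
            # Header entry such as "ngram 1=3"
            n, count = line[len('ngram '):].split('=')
            model['counts'][n] = int(count)
        elif line.startswith('\\') and line.endswith('-grams:'):
            # Section marker such as "\2-grams:"
            order = int(line[1:-len('-grams:')])
            model['ngrams'][str(order)] = {}
        elif order is not None:
            fields = line.split()
            words = fields[1:1 + order]
            # A field after the n words, if present, is the backoff weight
            has_backoff = len(fields) > 1 + order
            model['ngrams'][str(order)][' '.join(words)] = {
                'logprob': float(fields[0]),
                'backoff': float(fields[1 + order]) if has_backoff else None,
            }
    return model

# Tiny hand-written model, for illustration only
sample = """\\data\\
ngram 1=2
ngram 2=1

\\1-grams:
-1.0\thello\t-0.5
-1.2\tworld

\\2-grams:
-0.3\thello world

\\end\\
"""

if __name__ == '__main__':
    print(json.dumps(parse_arpa(sample.splitlines()), indent=2))
```

Keying every level by strings keeps the structure directly serializable with json.dumps, at the cost of converting the order back to an integer when reading it.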
yanshengjia / update_fork.md
Last active Apr 8, 2018 — forked from CristinaSolana/gist:1885435
Keeping a fork up to date
View update_fork.md

1. Clone your fork:

git clone git@github.com:YOUR-USERNAME/YOUR-FORKED-REPO.git

2. Add the original repository as a remote named upstream in your cloned fork:

cd into/cloned/fork-repo
git remote add upstream https://github.com/ORIGINAL-DEV-USERNAME/REPO-YOU-FORKED-FROM.git
git fetch upstream
yanshengjia / mongo_backup.sh
Last active Apr 23, 2018 — forked from sheharyarn/mongo_backup.sh
Mongodump Shell Script for Cronjob
View mongo_backup.sh
#!/bin/bash
MONGO_DATABASE="annotation"
MONGO_HOST="127.0.0.1"
MONGO_PORT="27017"
TIMESTAMP=$(date +%Y-%m-%d-%H-%M-%S)
MONGODUMP_PATH=/usr/bin/mongodump
BACKUPS_DIR=~/sjyan/data/mongodb-backup/
BACKUP_NAME=$TIMESTAMP
SCRIPT_DIR=~/sjyan/scripts/
yanshengjia / tornado_cookie_secret_generator.py
Created Mar 20, 2018 — forked from didip/tornado_cookie_secret_generator.py
Generates secure cookie secret for Tornado Web Framework
View tornado_cookie_secret_generator.py
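No code is shown for this gist in the listing. As a hedged illustration of the description, the sketch below produces a random base64 string with Python's standard secrets module; the function name generate_cookie_secret and the 32-byte default are assumptions for this example, and the forked gist may use a different approach.

```python
import base64
import secrets

def generate_cookie_secret(num_bytes=32):
    """Return a random, base64-encoded string usable as a signing secret."""
    # secrets.token_bytes draws from the operating system's CSPRNG
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode('ascii')

if __name__ == '__main__':
    print(generate_cookie_secret())
```

The printed value would typically be stored outside the source tree and passed to tornado.web.Application through its cookie_secret setting.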
yanshengjia / last_element.cpp
Created Mar 13, 2018
Find last element in the array
View last_element.cpp
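#include <iostream>
using namespace std;

int main() {
    int a[] = {1, 2, 3, 4, 5};
    // &a has type int (*)[5], so &a + 1 points one past the whole array;
    // casting to int * and stepping back one element lands on a[4].
    int res = *((int *)(&a + 1) - 1);
    cout << res << endl;
    return 0;
}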
yanshengjia / line_numbers.py
Last active Apr 14, 2018
Count the number of lines in a file
View line_numbers.py
# python2
def check_line_numbers():
    file_path = 'test.txt'
    # Count lines lazily; the with block closes the file afterwards
    with open(file_path) as f:
        num_lines = sum(1 for line in f)
    print(num_lines)
yanshengjia / clean.py
Created Mar 1, 2018
A Python script that cleans a raw corpus
View clean.py
import re
from nltk.corpus import stopwords
from nltk.stem.wordnet import WordNetLemmatizer

def clean(raw_str):
    en_stopwords = set(stopwords.words('english'))
    lemma = WordNetLemmatizer()
    lower_str = raw_str.lower()
    punc_free_str = ' '.join(re.findall(r'\w+', lower_str))
    stop_free_str = ' '.join([i for i in punc_free_str.split() if i not in en_stopwords])