Tuan Nguyen tuan3w

@tuan3w
tuan3w / Intro.md
Last active May 7, 2021 14:02
Pre-trained model for English -> Vietnamese NMT

Datasets

I first tried to build an English-Vietnamese parallel corpus from bilingual stories, but it went badly and wasted a lot of time. So I collected as many existing corpora as I could find on the internet instead. My final dataset consists of about 2.5M sentence pairs. You can find all corpora here: link

Model

I use OpenNMT to train my NMT model. Thanks to Systran and HarvardNLP for open-sourcing this project. It helps me and many others understand how an industrial translation system might work. The parameters of my model are as follows:

  • Preprocessing: using the aggressive tokenizer provided by OpenNMT
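As a rough illustration of what an aggressive tokenizer does, the sketch below keeps runs of word characters together and emits every other non-space character as its own token. It is an approximation built on Python's `re` module, not OpenNMT's tokenizer, and the name `aggressive_tokenize` is hypothetical.

```python
import re

# Hypothetical helper: approximates an "aggressive" tokenizer.
# Runs of word characters (letters, digits, underscore) stay together;
# every other non-space character becomes a separate token.
# This is NOT OpenNMT's implementation and may differ from it in details.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)


def aggressive_tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN_RE.findall(sentence)


if __name__ == "__main__":
    print(aggressive_tokenize("Hello, world!"))
```

Splitting punctuation away from words this way keeps the vocabulary smaller, at the cost of needing a detokenization step after translation.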
@tuan3w
tuan3w / ALS2.scala
Last active June 16, 2020 20:23
Implementation of Biased Matrix Factorization on Spark
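For orientation, biased matrix factorization usually predicts a rating as a global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. The sketch below shows only that standard prediction rule in plain Python; it is not taken from the Scala file, and all names in it are hypothetical.

```python
# Minimal sketch of the standard biased matrix factorization prediction:
#   r_hat(u, i) = mu + b_u + b_i + p_u . q_i
# All names are illustrative and do not come from ALS2.scala.

def predict(mu, b_user, b_item, p_user, q_item):
    """Return the predicted rating for one (user, item) pair."""
    dot = sum(p * q for p, q in zip(p_user, q_item))
    return mu + b_user + b_item + dot


if __name__ == "__main__":
    # Global mean 3.5, user rates slightly high, item rated slightly low.
    print(predict(3.5, 0.2, -0.1, [1.0, 2.0], [0.5, 0.25]))
```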
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
# eth0 is public network interface
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
# Allow RELATED,ESTABLISHED connections on eth0
iptables -t filter -A INPUT -i eth0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Bypass the DOCKER chain: return from nat PREROUTING for locally addressed traffic
iptables -t nat -I PREROUTING -m addrtype --dst-type LOCAL -j RETURN
@tuan3w
tuan3w / nginx.conf
Last active February 13, 2019 09:40
Experiment with nginx-http-flv-module
worker_processes auto; #should be 1 for Windows, for it doesn't support Unix domain socket
worker_rlimit_nofile 100000;
error_log logs/error.log error;
#if the module is compiled as a dynamic module and features relevant
#to RTMP are needed, the command below MUST be specified and MUST be
#located before events directive, otherwise the module won't be loaded
@tuan3w
tuan3w / peleenet.py
Created December 28, 2018 03:26
peleenet.py
import torch
import torch.nn as nn
from torch.nn import Parameter
import math
import torch.nn.functional as F
class Scale(nn.Module):
    def __init__(self, channels):
        super(Scale, self).__init__()
@tuan3w
tuan3w / example.sh
Created August 21, 2018 04:06
Adding elapsed time to video
ffmpeg -f video4linux2 -input_format mjpeg -s 1280x720 -i /dev/video0 \
-vf "drawtext=fontfile=/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf: \
text=\'%{pts\:gmtime\:0\:%M\\\:%S}\': fontcolor=white@0.8: x=7: y=7: fontsize=24" -vcodec libx264 \
-preset veryfast -f mp4 -pix_fmt yuv420p -y output.mp4
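The `drawtext` expression above expands the frame timestamp (`pts`) with the `gmtime` format, offset 0, and the strftime pattern `%M:%S`. A minimal Python sketch of the same formatting, assuming the timestamp is given in seconds (the function name `format_elapsed` is hypothetical):

```python
import time


def format_elapsed(pts_seconds):
    """Format a timestamp in seconds as MM:SS, like %M:%S on gmtime."""
    # gmtime treats the value as seconds since the epoch, so small values
    # land just after 00:00:00 and the minute/second fields read as
    # elapsed time. The fractional part of the value is ignored.
    return time.strftime("%M:%S", time.gmtime(pts_seconds))


if __name__ == "__main__":
    print(format_elapsed(75.4))
```

Because only minutes and seconds are printed, the overlay wraps back to 00:00 after one hour; adding `%H` to the pattern would avoid that for longer recordings.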
@tuan3w
tuan3w / swarm_register.py
Created October 7, 2017 15:56
swarm_register.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2017 zc <zc@www>
#
# Distributed under terms of the MIT license.
"""
Service registrator agent
@tuan3w
tuan3w / deepvoice_errorlog.txt
Created January 4, 2018 14:45
deepvoice_errorlog.txt
(dev3) ❯ python synthesis.py --hparams="builder=deepvoice3,preset=deepvoice3_ljspeech" checkpoints_deepvoice3/checkpoint_step000390000.pth test.txt samples
Command line args:
{'--checkpoint-postnet': None,
'--checkpoint-seq2seq': None,
'--file-name-suffix': '',
'--help': False,
'--hparams': 'builder=deepvoice3,preset=deepvoice3_ljspeech',
'--max-decoder-steps': '500',
'--output-html': False,
'--replace_pronunciation_prob': '0.0',
@tuan3w
tuan3w / setup.py
Created November 16, 2017 05:50
setup.py
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import os
from os.path import join as pjoin
import numpy as np
@tuan3w
tuan3w / VertxHystrixSample.java
Created July 28, 2017 04:45
sample circuit breaker with vertx
import com.soundcloud.prometheus.hystrix.HystrixPrometheusMetricsPublisher;
import com.vcc.bigdata.micro.cmd.FailCommand;
import io.vertx.circuitbreaker.CircuitBreaker;
import io.vertx.circuitbreaker.CircuitBreakerOptions;
import io.vertx.circuitbreaker.HystrixMetricHandler;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.http.HttpServer;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.ext.web.Router;
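The idea behind a circuit breaker can be sketched independently of Vert.x or Hystrix: after a number of consecutive failures the breaker opens and returns a fallback immediately, and after a timeout it lets one trial call through. The Python class below is a hypothetical minimal version of that pattern; its names and defaults are assumptions and do not mirror the Vert.x `CircuitBreaker` API.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; not the Vert.x or Hystrix API."""

    def __init__(self, max_failures=3, reset_timeout=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        """Run func, or return fallback() while the circuit is open."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # open: fail fast without calling func
            # Timeout elapsed: half-open, allow one trial call below.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call as `breaker.call(do_request, lambda: "fallback")`, so a failing dependency stops being hit until the timeout expires.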