
Docker

docker <management_command> <sub_command>

Docker Hub (https://hub.docker.com/) hosts many official and community Docker images.

Containers

Start a docker container

wael34218 / Vim.md
Last active December 16, 2021 16:23

Vim Basics

| Basic Command | Description |
|---|---|
| `h, j, k, l` (or arrow keys) | Move the cursor left, down, up, right |
| `H, M, L` | Go to the first, middle, and last line of the current screen |
| `ctrl+[f, b]` | Jump forward, backward one full screen |
| `ctrl+[d, u]` | Jump down, up half a screen |
| `w, b` | Move one word forward, backward |
| `0, $` | Jump to the start, end of the line |
wael34218 / Screen.md
Last active December 16, 2021 16:23

Screen Commands

| Command | Description |
|---|---|
| `screen -S NAME` | Create a new screen session named NAME |
| `screen -ls` | List all screen sessions of the current user on the machine |
| `screen -r NAME` | Reattach to a session (if it is detached) |
| `screen -x NAME` | Attach to a session even if it is already attached in another terminal |
| `[ctrl + a] [d]` | Detach from the current screen. Run from inside the screen |
| `[ctrl + d]` | Exit the shell in the current window; exiting the last window terminates the session. Run from inside the screen |

Git Commands

Initializing/Cloning a new project and status commands

| Command | Description |
|---|---|
| `git init .` | Start a new local Git repository in the current directory |
| `git clone <RURL>:<Repo> [--branch <BN> --single-branch]` | Copy a repository from a remote server (with the bracketed options, clone only branch `<BN>`) |
| `git status` | Show the working tree status |
| `git log` | Show the commit log |
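The four commands above can be tried end to end in a throwaway repository. A minimal sketch, assuming git is installed; the directory names, the `feature` branch, and the commit identity are made up for the demo, and a local `file://` URL stands in for the remote server:

```bash
set -eu
work=$(mktemp -d)   # throwaway directory, nothing real is touched

# git init: create a repository with one commit and a second branch.
git init -q "$work/origin-repo"
cd "$work/origin-repo"
git config user.name "Example User"        # hypothetical identity, stored only in this repo
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch feature                         # extra branch for --single-branch to skip
main=$(git rev-parse --abbrev-ref HEAD)    # "main" or "master", depending on git config

git status --short --branch                # working tree status: nothing to commit
git log --oneline                          # one line per commit

# git clone --branch B --single-branch: fetch only branch B.
# file:// forces the normal clone transport instead of the local-copy shortcut.
cd "$work"
git clone -q --branch "$main" --single-branch "file://$work/origin-repo" one-branch
git -C one-branch branch -r                # no origin/feature in this list
```

After a `--single-branch` clone, later `git fetch` calls also update only that one branch, because the clone records a fetch refspec for it alone.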

Bash in Data Preparation

Balancing dataset based on labels
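A minimal sketch of balancing by downsampling, assuming a tab-separated file with the label in the first column. `balance_by_label` is a hypothetical name, and capping every label at the same number of random lines is only one way to balance (oversampling the rare labels is another):

```bash
# balance_by_label MAX FILE  (hypothetical helper)
# Keeps at most MAX randomly chosen lines per label (label = first tab-separated column).
balance_by_label() {
  local max=$1 file=$2
  # shuf randomizes the line order, so the lines that survive are a random sample.
  # awk counts lines per label and prints a line only while its label is below MAX.
  shuf "$file" | awk -F'\t' -v max="$max" 'count[$1]++ < max'
}
```

Usage: `balance_by_label 1000 data.tsv > balanced.tsv`. Labels with fewer than MAX lines keep all of them, so the result is only fully balanced when every label has at least MAX examples.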

Clean Arabic dataset

Limit number of words per sentence
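A minimal sketch, assuming one sentence per line with whitespace-separated words. The helper names are hypothetical, and since the heading does not say whether long sentences should be dropped or shortened, both variants are shown:

```bash
# Drop sentences with more than MAX words (NF = number of fields, i.e. words, on the line).
max_words_filter() { awk -v max="$1" 'NF <= max'; }

# Keep every sentence but cut it down to its first MAX words (rejoined with single spaces).
max_words_truncate() {
  awk -v max="$1" '{
    out = ""
    for (i = 1; i <= NF && i <= max; i++) out = (i == 1) ? $i : out " " $i
    print out
  }'
}
```

Usage: `max_words_filter 50 < corpus.txt > short.txt`.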

Shuf clean and uniq
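A minimal sketch of a shuffle/clean/dedup pipeline with standard tools, assuming one sentence per line. Here "clean" is taken to mean whitespace normalization plus removal of empty lines (an assumption), and `clean_dedup_shuffle` is a hypothetical name:

```bash
# clean_dedup_shuffle  (hypothetical helper): stdin -> stdout, one sentence per line.
clean_dedup_shuffle() {
  # sed: trim leading/trailing whitespace, squeeze inner whitespace runs to one space,
  #      then delete lines that are now empty.
  # sort -u: remove duplicates (plain uniq only removes *adjacent* duplicates,
  #          so it needs sorted input; sort -u does both steps at once).
  # shuf: put the unique lines in random order.
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
      -e 's/[[:space:]][[:space:]]*/ /g' -e '/^$/d' |
    sort -u |
    shuf
}
```

Deduplicating after cleaning matters: lines that differ only in spacing become identical and are then removed by `sort -u`.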

wael34218 / SysAdminCommands.md
Last active December 16, 2021 16:22
System Admin commands I usually use

Mounting new disks permanently

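1- Identify the disk you want to mount (its device name, UUID, and filesystem type):

sudo blkid

2- Create the mount point and mount the disk (both need root):

sudo mkdir -p /path/to/mountpoint
sudo mount /dev/sda2 /path/to/mountpoint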
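A `mount` command on its own does not survive a reboot; making it permanent normally means adding an `/etc/fstab` entry keyed by the UUID that `blkid` reports. A minimal sketch, assuming an ext4 filesystem and default mount options (both assumptions; use the TYPE that `blkid` shows). `fstab_line` is a hypothetical helper, and the root-only steps are left as comments so the sketch is safe to run:

```bash
# fstab_line UUID MOUNTPOINT [FSTYPE]  (hypothetical helper)
# Prints one /etc/fstab entry: <device> <mount point> <type> <options> <dump> <pass>
#   defaults -> standard mount options
#   dump 0   -> not backed up by dump
#   pass 2   -> fsck checks it after the root filesystem (use 2 for non-root disks)
fstab_line() {
  local uuid=$1 mountpoint=$2 fstype=${3:-ext4}
  printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

# Typical use (needs root):
#   uuid=$(sudo blkid -s UUID -o value /dev/sda2)
#   fstab_line "$uuid" /path/to/mountpoint ext4 | sudo tee -a /etc/fstab
#   sudo mount -a   # mounts everything in fstab; an error here means the entry is wrong
```

Using the UUID instead of `/dev/sda2` keeps the entry valid even if the kernel assigns the disk a different device name on a later boot.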

TensorFlow

TFRecords

To create a TF Record:

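  1. Open a TFRecords file using tf.python_io.TFRecordWriter (tf.io.TFRecordWriter in TensorFlow 2)
  2. Convert each value into the matching list type for its feature: tf.train.Int64List, tf.train.BytesList, or tf.train.FloatList
  3. Wrap each converted list in a tf.train.Feature (e.g. tf.train.Feature(int64_list=...))
  4. Collect the features in a dict keyed by feature name, wrap the dict in tf.train.Features, and pass that to tf.train.Example to create the Example protocol buffer
  5. Serialize the Example with SerializeToString() and write it to the file with the writer's write() method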

05/05/2018

2018: Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

Projects audio segments that each contain one spoken word into a high-dimensional vector space, just like Word2Vec. Uses forced alignment to split the audio into words (which requires the transcript). Pads the audio segments with zeros, extracts MFCCs, and feeds them into an encoder-decoder that uses RMSE. Noise is also added to the input signal so that the network learns to denoise it. Data: 500 hours of LibriSpeech audio. Not sure how it could be incorporated into ASR or TTS systems: the audio has to be paired with text, otherwise Speech2Vec cannot split it into words using forced alignment. It is used to query whether a spoken word is similar to an existing word in the corpus.

2016: Neural Machine Translation of Rare Words with Subword Units (BPE)

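BPE is a data compression technique that repeatedly replaces the most frequent pair of bytes with a single unused byte. For NMT the same idea is applied to symbols: starting from characters, the most frequent adjacent pair of symbols is repeatedly merged into one new symbol. It works well with named entities, loanwords, and morphologically complex words, and handles OOV and rare words well. You can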
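The merge-learning loop can be sketched in awk, in keeping with the AWK notes in this collection. A minimal sketch, assuming input lines of the form `word frequency`; `learn_bpe` is a hypothetical name (not the reference implementation), and breaking ties by byte order is an arbitrary choice made here so that runs are repeatable:

```bash
# learn_bpe NUM_MERGES  (hypothetical helper)
# stdin:  one "word frequency" pair per line, e.g. "lower 2"
# stdout: the learned merges in order, one "left right" symbol pair per line
learn_bpe() {
  LC_ALL=C awk -v merges="$1" '
    {
      # Split the word into characters and append the end-of-word marker </w>.
      str = ""
      for (i = 1; i <= length($1); i++) str = str substr($1, i, 1) " "
      nw++; sym[nw] = str "</w>"; freq[nw] = $2
    }
    END {
      for (m = 1; m <= merges; m++) {
        # Count each adjacent symbol pair, weighted by the word frequency.
        split("", cnt)
        for (w = 1; w <= nw; w++) {
          k = split(sym[w], s, " ")
          for (j = 1; j < k; j++) cnt[s[j] " " s[j + 1]] += freq[w]
        }
        # Pick the most frequent pair; ties go to the smaller string.
        best = ""; bc = 0
        for (p in cnt)
          if (cnt[p] > bc || (cnt[p] == bc && p < best)) { best = p; bc = cnt[p] }
        if (bc == 0) break   # no pairs left: every word is a single symbol
        print best
        split(best, pair, " ")
        # Rewrite every word, merging each left-to-right occurrence of the pair.
        for (w = 1; w <= nw; w++) {
          k = split(sym[w], s, " "); out = ""; j = 1
          while (j <= k) {
            if (j < k && s[j] == pair[1] && s[j + 1] == pair[2]) { tok = pair[1] pair[2]; j += 2 }
            else { tok = s[j]; j++ }
            out = (out == "") ? tok : out " " tok
          }
          sym[w] = out
        }
      }
    }'
}
```

On the toy vocabulary `low 5`, `lower 2`, `newest 6`, `widest 3`, running `learn_bpe 5` prints the merges `e s`, `es t`, `est </w>`, `l o`, `lo w`: frequent endings like "est" and frequent stems like "low" become single symbols first.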

wael34218 / AWK.md
Last active December 16, 2021 16:22

AWK Basics

Introduction

Awk, named after its creators Alfred Aho, Peter Weinberger, and Brian Kernighan, is a utility that lets a programmer write tiny but effective programs as statements that define text patterns to search for in each line of a document and the action to take when a match is found in a line.

AWK Operations:

  • Scans a file line by line
  • Splits each input line into fields
  • Compares input line/fields to pattern
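
The scan/split/compare cycle above maps directly onto awk's `pattern { action }` form. A small sketch; the input records, the helper name `above`, and the threshold of 50 are made up for illustration:

```bash
# Hypothetical input: one "name score" record per line.
scores=$'alice 72\nbob 45\ncarol 90'

# awk reads the input line by line, splits each line on whitespace into fields
# ($1, $2, ...), checks the pattern ($2 > min) and runs the action only on matches.
above() { awk -v min="$1" '$2 > min { print $1 }'; }

printf '%s\n' "$scores" | above 50
# -> alice
#    carol

# -F sets the field separator; NR holds the current line (record) number.
printf 'a,1\nb,2\n' | awk -F, '{ print NR ": " $2 }'
# -> 1: 1
#    2: 2
```

A line with no pattern runs its action on every line; a pattern with no action prints every matching line.
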
wael34218 / Tmux.md
Last active December 16, 2021 16:22

Tmux Commands

| Command | Description |
|---|---|
| `tmux new -s NAME` | Create a new tmux session named NAME |
| `tmux ls` | List all sessions of the current user |
| `tmux a -t NAME` | Reattach to session NAME |
| `tmux kill-session -t NAME` | Terminate session NAME |