zhangk1551 / word2vec_and_glove.md
Created June 21, 2021 21:52
A note on Word2vec and GloVe

Word2vec vs GloVe

Paper

Word2vec

  • Efficient Estimation of Word Representations in Vector Space (2013 Sept) link
  • Distributed Representations of Words and Phrases and their Compositionality (2013 Oct) link

GloVe

GloVe: Global Vectors for Word Representation (2014 Jan) link

Approach

zhangk1551 / joint_embedding_exploration.md
Last active June 18, 2021 14:46
Exploration for Joint embeddings

Exploration for Joint embeddings

The Text2Shape task:

  • 3D-related.
  • Sentence descriptions instead of single word terms.
  • The cosine similarity between the text and shape embeddings needs to be meaningful (see the sketch after this list).
  • No need to handle general sentences for every possible 3D scene; just focus on a small subset of sentences describing furniture.
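A minimal sketch of what "meaningful cosine similarity" means here, with placeholder encoders standing in for learned models (all names and dimensions below are illustrative, not taken from the Text2Shape paper):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; only meaningful if both vectors live in one joint space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder encoders standing in for learned models: in a Text2Shape-style
# setup, a text encoder and a 3D shape encoder are trained jointly so that
# matching (description, shape) pairs end up close under cosine similarity.
rng = np.random.default_rng(0)

def text_encoder(description: str) -> np.ndarray:
    return rng.standard_normal(128)  # stand-in for a learned sentence embedding

def shape_encoder(voxels: np.ndarray) -> np.ndarray:
    return rng.standard_normal(128)  # stand-in for a learned 3D shape embedding

text_emb = text_encoder("a round wooden table with four legs")
shape_emb = shape_encoder(np.zeros((32, 32, 32)))  # dummy voxel grid
print(cosine_similarity(text_emb, shape_emb))      # near 0 for untrained encoders
```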

A note for sentence/doc embeddings: Summary for sentence/doc embeddings

Word & object shape

DeViSE (2D)

zhangk1551 / Statistical_learning_in_NLP.md
Last active June 17, 2021 19:59
Statistical learning in NLP

Statistical learning in NLP

This note briefly introduces three statistical learning methods used in NLP applications: Latent Dirichlet Allocation for topic modeling, Conditional Random Fields for named entity recognition, and matrix factorization for Latent Semantic Analysis.

Latent Dirichlet Allocation

Dirichlet distribution

The Dirichlet distribution is a multivariate generalization of the beta distribution.
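For reference, the density of the Dirichlet distribution with parameters \(\alpha_1, \ldots, \alpha_K\) over the probability simplex is the standard formula:

```latex
p(\theta_1, \ldots, \theta_K \mid \alpha_1, \ldots, \alpha_K)
  = \frac{\Gamma\!\bigl(\sum_{k=1}^{K} \alpha_k\bigr)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\,\alpha_k - 1},
\qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1.
```

For K = 2 this reduces to the beta distribution, which is exactly why the Dirichlet is its multivariate generalization.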

Approach

zhangk1551 / sentence_and_doc_embeddings.md
Last active June 1, 2021 22:31
Summary for sentence/doc embeddings

Summary for sentence/doc embeddings

Representation

A general definition of representation: transform the original data (word/sentence/doc, or even images) into vectors that carry information and are useful as input for models.

Similar words/sentences/docs tend to be close to each other as measured by cosine distance (i.e., the original data is mapped to a meaningful vector space). This feature is nice-to-have but not a must; it is useful for tasks like semantic similarity comparison, clustering, and information retrieval via semantic search. Examples of representations with and without the cosine-similarity feature: Word2vec provides it for word embeddings, and furthermore preserves linear regularities among words (vector("King") - vector("Man") + vector("Woman") ≈ vector("Queen")). But averaged/last-layer embeddings from BERT are not guaranteed to have the cosine-similarity feature, and thus perform badly when cosine distance is applied directly to measure sentence/doc similarity (even worse than sentence embeddings …)
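A minimal sketch of these properties using gensim's pretrained vectors (`word2vec-google-news-300` is one of gensim-downloader's stock models; any pretrained KeyedVectors model behaves the same way):

```python
import gensim.downloader as api

# Downloads the pretrained Google News word2vec vectors (~1.6 GB) on first
# use; any KeyedVectors model from gensim-data exposes the same API.
kv = api.load("word2vec-google-news-300")

# Cosine similarity: semantically similar words score high.
print(kv.similarity("sofa", "couch"))     # high, e.g. > 0.7
print(kv.similarity("sofa", "algebra"))   # low

# The linear regularity from the paper: King - Man + Woman ≈ Queen.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```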

zhangk1551 / transformer_and_bert_introduction.md
Last active May 24, 2021 19:40
Introduction to Transformer and BERT

Transformer

As with word embeddings, for sentences and articles there are sequence auto-encoder models, which turn text into a vector representation, and sequence auto-decoder models, which unfold a vector representation back into something meaningful like text, tags, or labels.

In the famous paper "Attention Is All You Need", published in 2017, researchers at Google proposed the Transformer, an encoder-decoder model built only on the attention mechanism. Before this paper there was already plenty of prior work on neural-network encoders and decoders; however, unlike the Transformer, which is based solely on attention, most of those earlier encoders/decoders relied on recurrent or convolutional structures. Compared with 1-dimensional CNNs, which can only focus on fixed-length parts of a sentence sequence due to the limited size of their convolution kernels, attention as a weighted average can handle the whole sentence sequence at once.
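For reference, here is a minimal numpy sketch of the scaled dot-product attention the paper introduces, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, with a single head and no masking or learned projections:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).

    Each output row is a weighted average over ALL value rows, which is
    why attention sees the whole sequence regardless of its length.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_q, seq_k) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (seq_q, d_v)

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 16))                # 7 tokens, 16-dim embeddings
out = scaled_dot_product_attention(x, x, x)     # self-attention over the sequence
print(out.shape)                                # (7, 16)
```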

zhangk1551 / grounding_paper_reading.md
Last active March 31, 2021 09:15
Grounding Paper Reading

Paper Critiques

  • DeViSE NIPS 2013
  • Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories ICRA 2017
  • Show, Attend and Tell: Neural Image Caption ICML 2015
  • MAttNet: Modular Attention Network for Referring Expression Comprehension CVPR 2018
  • Vilbert NeurIPS 2019
  • CLIP CVPR 2021
zhangk1551 / vim8_plugins_recommendation.md
Last active September 19, 2020 04:57
vim8 Plugins Recommendation

Vim8 Plugins Recommendation

Preface

Before starting the discussion, go through the following three questions:

With vim,

  1. How do you search for a string or a file in a project? Can you quickly switch between the search results?
  2. How do you compile a project? Can you do it with one shortcut and jump to the exact position where an error occurs?
  3. How do you rename a variable? What if there is another variable with the same name elsewhere in the project?