Cheng-Jun Wang chengjun

@zhicongchen
zhicongchen / gensim_word2vec_procrustes_align.py
Last active June 3, 2024 01:54 — forked from quadrismegistus/gensim_word2vec_procrustes_align.py
Code for aligning two gensim word2vec models using Procrustes matrix alignment (updated for compatibility with the Gensim 4.0 API). The code is modified from https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf, which was originally ported from HistWords by William Hamilton: https://github.com/williamleif/histwords
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """
    Original script: https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf
    Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
import numpy as np
import gensim
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
    (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
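The two steps named in the docstring above (intersect the vocabularies, then align the other model onto the base) can be sketched with NumPy alone. This is a minimal illustration, not the gist's implementation: it assumes each model is represented as a plain dict mapping words to vectors, and the names `procrustes_rotation` and `intersect_and_align` are hypothetical. It does not touch gensim objects or their internal matrices.

```python
import numpy as np


def procrustes_rotation(base_vecs, other_vecs):
    """Return the orthogonal matrix R that minimizes ||other_vecs @ R - base_vecs||_F.

    Row i of both matrices must hold the vector for the same word.
    """
    # Orthogonal Procrustes solution: SVD of other^T base, then R = U V^T.
    m = other_vecs.T @ base_vecs
    u, _, vt = np.linalg.svd(m)
    return u @ vt


def intersect_and_align(base, other):
    """Align `other` onto `base` over their shared vocabulary.

    `base` and `other` are hypothetical stand-ins for two embedding models:
    dicts mapping each word to a 1-D NumPy vector of equal dimension.
    """
    # Step 1: keep only words present in both models, in a fixed order.
    shared = sorted(set(base) & set(other))
    base_mat = np.array([base[w] for w in shared])
    other_mat = np.array([other[w] for w in shared])
    # Step 2: rotate the other model's vectors into the base model's space.
    rot = procrustes_rotation(base_mat, other_mat)
    return {w: other[w] @ rot for w in shared}
```

Because the rotation is orthogonal, distances and cosine similarities within the aligned model are unchanged; only its orientation relative to the base model moves.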
@dmasad
dmasad / Grey's Anatomy ERGM with Python.ipynb
Created June 29, 2015 14:08
Replicating the Grey's Anatomy Hookup ERGM with PyMC
@chengjun
chengjun / 53free.r
Created November 11, 2012 06:04 — forked from cdesante/53free.r
five.thirty.free
library(plyr)
library(ggplot2)
library(grid)
election.data <- read.csv("http://www.oberlin.edu/faculty/cdesante/assets/downloads/election2012.csv")
five.thirty.free <- function (SIMS) {
Mode <- function(X) {
@stevenworthington
stevenworthington / ipak.R
Created July 25, 2012 19:44
Install and load multiple R packages at once
# ipak function: install and load multiple R packages.
# Check whether the packages are installed. Install any that are missing, then load them all into the R session.
ipak <- function(pkg){
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg))
        install.packages(new.pkg, dependencies = TRUE)
    sapply(pkg, require, character.only = TRUE)
}
@halpo
halpo / 000-instructions.md
Created June 19, 2012 16:40
harvestr R users conference presentation.

Building a beamer presentation with knitr.

Introduction

The included documents are the input for knitr. You also need to have pandoc installed. I use a custom beamer template that adds the University of Utah \institute command and adjusts the indentation slightly.

Steps

  1. knit document with
# Import data
text <- readLines('d:\\honglou.txt',encoding='UTF-8')
library(ggplot2)
library(rmmseg4j)
library(tm)
library(MASS)
library(proxy)
# Remove blank lines
@chengjun
chengjun / caschools-analysis.rmd
Created May 20, 2012 01:15 — forked from jeromyanglim/caschools-analysis.rmd
California schools analysis demonstrating use of R Markdown
`r opts_chunk$set(cache=TRUE)`
This is a quick set of analyses of the California Test Score dataset. The post was produced using R Markdown in RStudio 0.96. The main purpose of this post is to provide a case study of using R Markdown to prepare a quick reproducible report. It provides examples of using plots, output, in-line R code, and markdown. The post is designed to be read alongside the R Markdown source code, which is available as a gist on GitHub.
<!-- more -->
### Preliminaries
* This post builds on my earlier post, which provided a guide for [Getting Started with R Markdown, knitr, and RStudio 0.96](http://jeromyanglim.blogspot.com/2012/05/getting-started-with-r-markdown-knitr.html)
* The dataset analysed comes from the `AER` package which is an accompaniment to the book [Applied Econometrics with R](http://www.amazon.com/Applied-Econometrics-R-Use/dp/0387773169) written by [Christian Kleiber](http://wwz.unibas.ch/personen/profil/person/kleiber/) and [Achim Zeileis](http://eeecon.uibk.ac.at/~zeileis/
library(ggplot2)
library(colorRamps)
TawiTawiPop <- c(17000, 45000, 46000, 59000, 79000, 110000, 143000, 195000, 228204,
250718, 322317, 450346, 366550)
YearNames <- c("1903", "1918", "1939", "1948", "1960", "1970", "1975", "1980", "1990",
"1995", "2000", "2007", "2010")
qplot(YearNames, TawiTawiPop,
xlab = expression(bold("Censal Year")),
@theconektd
theconektd / github.css
Created April 30, 2012 02:11
Github Markdown CSS - for Markdown Editor Preview
body {
font-family: Helvetica, arial, sans-serif;
font-size: 14px;
line-height: 1.6;
padding-top: 10px;
padding-bottom: 10px;
background-color: white;
padding: 30px; }
body > *:first-child {