Wenming Ye wenming

  • Amazon Web Services
  • Redmond, WA
/// The repo list is just a copy and paste from your repo dashboard. For example, the list could be:
WindowsAzure-TrainingKit/Tutorial-HPCSOAapps
WindowsAzure-TrainingKit/Tutorial-HPCPowershellDeployment
WindowsAzure-TrainingKit/Tutorial-HPCMPIIntro
WindowsAzure-TrainingKit/Tutorial-HPCBasicParametricSweepApps
WindowsAzure-TrainingKit/Tutorial-TPLAzureScaleOut
WindowsAzure-TrainingKit/Tutorial-HPCImageRendering
WindowsAzure-TrainingKit/Tutorial-HPCBLAST
WindowsAzure-TrainingKit/Tutorial-HPCDeployToExistingCluster
Swap notepad with curl.
PS C:\Users\wenmingy> while (1)
{ $filename = get-date -format 'yy-MM-dd-H-m-s'; $filename=$filename + ".log"; echo $filename; start-process -wait notepad $filename}
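For readers who want the same timestamped filename outside PowerShell, here is a minimal Python sketch. The helper name `log_filename` is hypothetical and not part of the original gist. It mirrors the `yy-MM-dd-H-m-s` pattern, where the hour, minute, and second are not zero-padded.

```python
from datetime import datetime


def log_filename(now=None):
    """Build a name like '12-05-20-9-3-7.log' (hypothetical helper).

    Mirrors the PowerShell format 'yy-MM-dd-H-m-s': two-digit year, month,
    and day, followed by unpadded hour, minute, and second.
    """
    if now is None:
        now = datetime.now()
    return f"{now:%y-%m-%d}-{now.hour}-{now.minute}-{now.second}.log"


if __name__ == "__main__":
    print(log_filename())
```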
@wenming
wenming / read index.py
Created May 20, 2012 23:53
opening 3 files to convert matrix indexes to int from string hash
#!/usr/bin/python
# http://labrosa.ee.columbia.edu/millionsong/sites/default/files/challenge/train_triplets.txt.zip
songs_file = open('songs.txt', 'r')
users_file = open('users.txt', 'r')
dataset_file = open('train_triplets.txt', 'r')
#dataset_file = open('a.txt', 'r')
songs_count = 0
users_count = 0
dataset_count = 0
users_dict = dict()
@wenming
wenming / MSD sparse matrix read.py
Created May 20, 2012 23:53
MSD sparse matrix read
#!/usr/bin/python
#opens the triplets, saves song & user keys, writes out a simpler matrix based on int user, int song, int rating.
dataset_file = open('train_triplets.txt', 'r') #http://labrosa.ee.columbia.edu/millionsong/sites/default/files/challenge/train_triplets.txt.zip
songs_count = 0
users_count = 0
dataset_count = 0
users_dict = dict()
songs_dict = dict()
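The preview above is cut off, so the following is only a sketch of the idea stated in the comment: map string user and song keys to integers and emit (int user, int song, int rating) rows. It assumes each input line holds three whitespace-separated fields (user key, song key, play count) and that indexes start at 1. Both are assumptions, and the function name `index_triplets` is hypothetical.

```python
def index_triplets(lines):
    """Convert 'user song count' lines into integer triplets.

    Assumptions (not confirmed by the original gist): fields are separated
    by whitespace, and integer indexes start at 1 in order of first
    appearance.
    """
    users_dict = {}
    songs_dict = {}
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        user, song, count = parts
        # setdefault returns the existing index, or stores the next free one
        user_index = users_dict.setdefault(user, len(users_dict) + 1)
        song_index = songs_dict.setdefault(song, len(songs_dict) + 1)
        rows.append((user_index, song_index, int(count)))
    return rows, users_dict, songs_dict


if __name__ == "__main__":
    sample = ["userA\tsongX\t3", "userA\tsongY\t1", "userB\tsongX\t5"]
    rows, users, songs = index_triplets(sample)
    for row in rows:
        print("%d %d %d" % row)
```

For the full file, passing the open file object (for example `index_triplets(open('train_triplets.txt'))`) streams it line by line instead of loading it all at once.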
@wenming
wenming / Configuring the IPython.parallel
Created May 21, 2012 04:48 — forked from ellisonbg/Configuring the IPython.parallel
Running IPython.parallel on Microsoft Azure
============================
Configuring IPython.parallel
============================
This guide describes the steps needed to run IPython engines on Microsoft Azure to perform a parallel
computation in the Microsoft cloud. We assume (see above guide) that Python, IPython and PyZMQ have
been installed on a set of Azure compute nodes.
1. Install Python, IPython and PyZMQ on a system having open access to the internet. We used an
Ubuntu 11.04 system for this purpose. This system will run the IPython controller and the IPython
program convertsongdat
c
integer nrows, ncols, tnnz
integer nrow(384546), matrix(384546)
integer rowe, matrixe, rowi, coli, ncol, nonzero
c
nrows=384546
ncols=1019318
tnnz=48373586
@wenming
wenming / Twitter (json format).js
Created June 8, 2012 02:00 — forked from gnip/Twitter (json format).js
Twitter Sample Payload, JSON format
{
"coordinates": null,
"created_at": "Thu Oct 21 16:02:46 +0000 2010",
"favorited": false,
"truncated": false,
"id_str": "28039652140",
"entities": {
"urls": [
{
"expanded_url": null,
@wenming
wenming / gist:2941492
Created June 16, 2012 14:32
Resources for Azure scheduler
Definition of HPC:
High Performance Computing (HPC) is the use of servers, clusters, and supercomputers – plus associated software, tools, components, storage, and services – for scientific, engineering, or analytical tasks that are particularly intensive in computation, memory usage, or data management. HPC is used by scientists and engineers both in research and in production across industry, government, and academia. Within industry, HPC can frequently be distinguished from general business computing in that companies generally will use HPC applications to gain advantage in their core endeavors – e.g., finding oil, designing automobile parts, or protecting clients’ investments – as opposed to non-core endeavors such as payroll management or resource planning.
The Azure HPC Scheduler is a great way to run batch workloads, including but not limited to HPC.
The Azure HPC Scheduler includes 3 programming models:
MPI, SOA, and Parametric sweep.
MPI is a traditional HPC programming model which you can look up on
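To make the parametric sweep model listed above concrete: a sweep runs the same independent task once per parameter value. The sketch below imitates that pattern on a single machine with a thread pool. It does not use the Azure HPC Scheduler API, and `simulate` and `parametric_sweep` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(step):
    """Hypothetical work item standing in for one independent sweep task."""
    return step * step


def parametric_sweep(func, start, end, increment=1, max_workers=4):
    """Run func once for each value in start..end (inclusive).

    Tasks are independent, so they can run in any order; results are
    returned as a dict keyed by parameter value.
    """
    params = list(range(start, end + 1, increment))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(func, params))  # map preserves input order
    return dict(zip(params, results))


if __name__ == "__main__":
    print(parametric_sweep(simulate, 1, 5))
```

On a real cluster the scheduler, rather than a local pool, hands each parameter value to a compute node; the independence of the tasks is what makes that distribution straightforward.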
@wenming
wenming / hadooponazure
Created June 16, 2012 14:41
Resources for Hadoop on Windows Azure
Hadooponazure.com is strictly a private CTP for Microsoft's Hadoop distro. It supports Hive, Pig, a JavaScript console, and a web portal. You can also connect to the actual clusters through Terminal Services as needed. The training kit includes a deck and a number of tutorials.
You should also be able to find content on windowsazure.com
http://www.windowsazure.com/en-us/develop/net/scenarios/big-data/
http://www.windowsazure.com/en-us/develop/net/how-to-guides/hadoop/
I recommend going through at least one of these tutorials:
http://www.windowsazure.com/en-us/develop/net/tutorials/hadoop-marketplace/
and perhaps look at this deck in addition to the one included in the training kit: http://view.officeapps.live.com/op/view.aspx?src=http%3a%2f%2fvideo.ch9.ms%2fteched%2f2012%2fna%2fAZR325.pptx
@wenming
wenming / parseJson.py
Created July 25, 2012 22:15
parse json
#!/usr/bin/python
import os
import sys
import json
import pprint
file = open("twitter_stream_seq2.txt", 'r')
lines = file.readlines()
i = 0
str = ""
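The preview stops before any parsing happens, so the following is a sketch rather than the gist's actual logic. It assumes the stream file holds one JSON object per line, which may not match the original input (the `str = ""` accumulator suggests objects could span several lines). The function names are hypothetical; the `id_str` and `created_at` fields come from the sample Twitter payload shown earlier on this page.

```python
import json


def parse_tweets(lines):
    """Parse an iterable of lines, keeping only valid JSON objects.

    Assumption: one JSON object per line. Blank lines, invalid JSON, and
    non-object values are skipped.
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        if isinstance(record, dict):
            tweets.append(record)
    return tweets


def summarize(tweet):
    """Return (id_str, created_at); either is None when the field is missing."""
    return tweet.get("id_str"), tweet.get("created_at")


if __name__ == "__main__":
    sample = ['{"id_str": "28039652140", "created_at": "Thu Oct 21 16:02:46 +0000 2010"}']
    for tweet in parse_tweets(sample):
        print(summarize(tweet))
```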