@Puriney
Created May 15, 2017 19:54
Working with sparse matrices in Python: create a Pandas sparse data frame from a Matrix Market file.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with sparse matrices in Python\n",
"\n",
"Sparse matrices are commonly stored in the Matrix Market format (<http://math.nist.gov/MatrixMarket/formats.html>).\n",
"This post shows how to read a Matrix Market file and create a Pandas sparse data frame from it.\n",
"\n",
"*Note: this notebook requires Pandas 0.20 or later. `pd.SparseDataFrame`, used in Step-3, was deprecated in Pandas 0.25 and removed in 1.0.*"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.20.1\n"
]
}
],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import scipy.io\n",
"import scipy.sparse\n",
"\n",
"print(pd.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"foo.mtx\n",
"%%MatrixMarket matrix coordinate real general\n",
"%=================================================================================\n",
"%\n",
"% This ASCII file represents a sparse MxN matrix with L \n",
"% nonzeros in the following Matrix Market format:\n",
"%\n",
"% +----------------------------------------------+\n",
"% |%%MatrixMarket matrix coordinate real general | <--- header line\n",
"% |% | <--+\n",
"% |% comments | |-- 0 or more comment lines\n",
"% |% | <--+ \n",
"% | M N L | <--- rows, columns, entries\n",
"% | I1 J1 A(I1, J1) | <--+\n",
"% | I2 J2 A(I2, J2) | |\n",
"% | I3 J3 A(I3, J3) | |-- L lines\n",
"% | . . . | |\n",
"% | IL JL A(IL, JL) | <--+\n",
"% +----------------------------------------------+ \n",
"%\n",
"% Indices are 1-based, i.e. A(1,1) is the first element.\n",
"%\n",
"%=================================================================================\n",
" 5 5 8\n",
" 1 1 1.000e+00\n",
" 2 2 1.050e+01\n",
" 3 3 1.500e-02\n",
" 1 4 6.000e+00\n",
" 4 2 2.505e+02\n",
" 4 4 -2.800e+02\n",
" 4 5 3.332e+01\n",
" 5 5 1.200e+01"
]
}
],
"source": [
"!ls *mtx\n",
"!cat foo.mtx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step-1: Read the mtx file with scipy\n",
"\n",
"`scipy.io.mmread` reads a Matrix Market file and returns a sparse matrix in `coo` (coordinate) format. The 1-based indices in the file become 0-based in the result."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" (0, 0)\t1.0\n",
" (1, 1)\t10.5\n",
" (2, 2)\t0.015\n",
" (0, 3)\t6.0\n",
" (3, 1)\t250.5\n",
" (3, 3)\t-280.0\n",
" (3, 4)\t33.32\n",
" (4, 4)\t12.0\n"
]
},
{
"data": {
"text/plain": [
"<5x5 sparse matrix of type '<class 'numpy.float64'>'\n",
"\twith 8 stored elements in COOrdinate format>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coo_mat = scipy.io.mmread('foo.mtx')\n",
"print(coo_mat)\n",
"coo_mat"
]
},
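{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration (this cell is an addition, not part of the original notebook), the same 5x5 matrix can be rebuilt by hand from the entries listed in `foo.mtx`. It sketches what reading the file amounts to: subtract 1 from every index and pass the triples to `scipy.sparse.coo_matrix`. The names `rows_1based`, `cols_1based`, `vals` and `manual_coo` are chosen here for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.sparse\n",
"\n",
"# Entries copied from foo.mtx; indices are 1-based, as in the file.\n",
"rows_1based = np.array([1, 2, 3, 1, 4, 4, 4, 5])\n",
"cols_1based = np.array([1, 2, 3, 4, 2, 4, 5, 5])\n",
"vals = np.array([1.0, 10.5, 0.015, 6.0, 250.5, -280.0, 33.32, 12.0])\n",
"\n",
"# Shift to 0-based indices and build the coordinate-format matrix.\n",
"manual_coo = scipy.sparse.coo_matrix(\n",
"    (vals, (rows_1based - 1, cols_1based - 1)), shape=(5, 5)\n",
")\n",
"\n",
"# A coo matrix keeps one (row, col, value) triple per stored entry.\n",
"print(manual_coo.row)\n",
"print(manual_coo.col)\n",
"print(manual_coo.data)"
]
},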
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step-2: `coo` to `csr`/`csc`\n",
"\n",
"A scipy matrix in `coo` format converts directly to the compressed formats `csc` (compressed sparse column) and `csr` (compressed sparse row) with `tocsc()` and `tocsr()`."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" (0, 0)\t1.0\n",
" (1, 1)\t10.5\n",
" (3, 1)\t250.5\n",
" (2, 2)\t0.015\n",
" (0, 3)\t6.0\n",
" (3, 3)\t-280.0\n",
" (3, 4)\t33.32\n",
" (4, 4)\t12.0\n"
]
},
{
"data": {
"text/plain": [
"<5x5 sparse matrix of type '<class 'numpy.float64'>'\n",
"\twith 8 stored elements in Compressed Sparse Column format>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csc_mat = coo_mat.tocsc()\n",
"print(csc_mat)\n",
"csc_mat"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" (0, 0)\t1.0\n",
" (0, 3)\t6.0\n",
" (1, 1)\t10.5\n",
" (2, 2)\t0.015\n",
" (3, 1)\t250.5\n",
" (3, 3)\t-280.0\n",
" (3, 4)\t33.32\n",
" (4, 4)\t12.0\n"
]
},
{
"data": {
"text/plain": [
"<5x5 sparse matrix of type '<class 'numpy.float64'>'\n",
"\twith 8 stored elements in Compressed Sparse Row format>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csr_mat = coo_mat.tocsr(copy=True)\n",
"print(csr_mat)\n",
"csr_mat"
]
},
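{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch (an addition, not part of the original notebook) of how `csr` differs from `coo`: instead of one row index per entry, `csr` keeps one pointer per row in `indptr`, so the entries of row `i` sit in `data[indptr[i]:indptr[i + 1]]`. The matrix is rebuilt from the `foo.mtx` entries so the cell runs on its own; `demo_csr` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.sparse\n",
"\n",
"# Same entries as foo.mtx, already shifted to 0-based indices.\n",
"rows = np.array([0, 1, 2, 0, 3, 3, 3, 4])\n",
"cols = np.array([0, 1, 2, 3, 1, 3, 4, 4])\n",
"vals = np.array([1.0, 10.5, 0.015, 6.0, 250.5, -280.0, 33.32, 12.0])\n",
"demo_csr = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(5, 5)).tocsr()\n",
"\n",
"print(demo_csr.indptr)   # one pointer per row, plus a final end marker\n",
"print(demo_csr.indices)  # column index of each stored entry\n",
"print(demo_csr.data)     # the stored values, grouped by row\n",
"\n",
"# Row slicing is supported on csr (coo matrices cannot be indexed).\n",
"print(demo_csr[3].toarray())"
]
},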
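{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step-3: `csr` to Pandas sparse data frame\n",
"\n",
"`pd.SparseDataFrame` accepts a scipy sparse matrix directly. Entries that are not stored in the matrix show up as `NaN`; calling `fillna(0)` replaces them with zeros."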
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0 1 2 3 4\n",
"0 1.0 NaN NaN 6.0 NaN\n",
"1 NaN 10.5 NaN NaN NaN\n",
"2 NaN NaN 0.015 NaN NaN\n",
"3 NaN 250.5 NaN -280.0 33.32\n",
"4 NaN NaN NaN NaN 12.00\n"
]
}
],
"source": [
"sp_df = pd.SparseDataFrame(csr_mat)\n",
"print(sp_df)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0 1 2 3 4\n",
"0 1.0 0.0 0.000 6.0 0.00\n",
"1 0.0 10.5 0.000 0.0 0.00\n",
"2 0.0 0.0 0.015 0.0 0.00\n",
"3 0.0 250.5 0.000 -280.0 33.32\n",
"4 0.0 0.0 0.000 0.0 12.00\n"
]
}
],
"source": [
"sp_df = pd.SparseDataFrame(csr_mat).fillna(0)\n",
"print(sp_df)"
]
},
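{
"cell_type": "markdown",
"metadata": {},
"source": [
"On Pandas versions without `pd.SparseDataFrame`, a comparable result can be sketched with `pd.DataFrame.sparse.from_spmatrix` (available since Pandas 0.25). This cell is an addition, not part of the original notebook. The matrix is rebuilt from the `foo.mtx` entries so the cell runs on its own, and `demo_csr` and `demo_df` are illustrative names. Unlike `pd.SparseDataFrame`, this constructor uses 0 as the fill value, so no `fillna(0)` is needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import scipy.sparse\n",
"\n",
"# Same entries as foo.mtx, already shifted to 0-based indices.\n",
"rows = np.array([0, 1, 2, 0, 3, 3, 3, 4])\n",
"cols = np.array([0, 1, 2, 3, 1, 3, 4, 4])\n",
"vals = np.array([1.0, 10.5, 0.015, 6.0, 250.5, -280.0, 33.32, 12.0])\n",
"demo_csr = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(5, 5)).tocsr()\n",
"\n",
"# Each column becomes a pandas SparseArray; unstored entries read as 0.\n",
"demo_df = pd.DataFrame.sparse.from_spmatrix(demo_csr)\n",
"print(demo_df)\n",
"print(demo_df.dtypes)\n",
"\n",
"# Fraction of entries that are actually stored.\n",
"print(demo_df.sparse.density)\n",
"\n",
"# Convert back to a scipy coo matrix when needed.\n",
"print(demo_df.sparse.to_coo())"
]
},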
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}