@yjzhang
Created June 16, 2022 23:57
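import subprocess

import numpy as np
from scipy import sparse, io

# Minimum total count a cell (column) must exceed to be kept.
threshold = 1000
# One folder per sample; each must contain a matrix.mtx.gz file.
folders = ['M7_5', 'M8_3', 'X5.3.4', 'X5_2', 'X6.1']

for f in folders:
    print(f)
    # Load as CSC so that column (cell) selection is efficient.
    data = sparse.csc_matrix(io.mmread(f + '/matrix.mtx.gz'))
    print('data loaded')
    # Sum over rows to get the total count of each column (cell).
    cell_sums = np.array(data.sum(0)).flatten()
    data_filtered = data[:, cell_sums > threshold]
    print('data filter calculated')
    # TODO: save indices?
    # Writes the keep-mask as one 0/1 flag per cell, in the original column order.
    np.savetxt(f + '/filtered_indices.txt', cell_sums > threshold, fmt='%d')
    data_path_new = f + '/matrix_filtered.mtx'
    io.mmwrite(data_path_new, data_filtered)
    print('filtered data saved')
    # gzip replaces matrix_filtered.mtx with matrix_filtered.mtx.gz.
    subprocess.call(['gzip', data_path_new])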
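
The filtering step can be tried on its own without any of the sample folders. The following is a minimal sketch on a made-up toy matrix; the helper name filter_cells and the toy values are assumptions for illustration and are not part of the original script. As in the script, columns are treated as cells, and the comparison is strict, so a cell whose total equals the threshold is dropped.

```python
import numpy as np
from scipy import sparse


def filter_cells(data, threshold=1000):
    """Keep columns (cells) whose total count exceeds `threshold`.

    Hypothetical helper: returns the filtered CSC matrix and the
    boolean keep-mask over the original columns.
    """
    data = sparse.csc_matrix(data)
    # Sum over rows: one total per column (cell).
    cell_sums = np.asarray(data.sum(axis=0)).ravel()
    keep = cell_sums > threshold
    return data[:, keep], keep


# Toy matrix (made-up values): 3 rows x 4 cells,
# with column totals 1500, 10, 2000 and 1000.
toy = sparse.csc_matrix(np.array([
    [1000, 5, 2000, 400],
    [500, 5, 0, 600],
    [0, 0, 0, 0],
]))

filtered, keep = filter_cells(toy)
print(filtered.shape)
print(keep.astype(int))
```

The mask printed at the end has the same meaning as the 0/1 flags the script writes to filtered_indices.txt: position i tells whether column i of the original matrix was kept.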