Qiuzhuang Lian Qiuzhuang

@Qiuzhuang
Qiuzhuang / repositories
Created Jul 8, 2021 — forked from alswl/repositories
sbt repositories in China (mirror)
View repositories
[repositories]
local
huaweicloud-ivy: https://mirrors.huaweicloud.com/repository/ivy/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
huaweicloud-maven: https://mirrors.huaweicloud.com/repository/maven/
bintray-typesafe-ivy: https://dl.bintray.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
bintray-sbt-plugins: https://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
# aliyun does not work for ivy
# aliyun-ivy: https://maven.aliyun.com/repository/public/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
# aliyun-public-mirror: https://maven.aliyun.com/repository/public/
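
A minimal sketch of how a file like the one above is typically put to use, assuming the standard sbt location `~/.sbt/repositories`. The helper name `install_sbt_repositories` and the `.bak` backup convention are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: copy a repositories file into an sbt config directory.
# $1 = path to the repositories file, $2 = sbt directory (assumed default: ~/.sbt)
install_sbt_repositories() {
  local src="$1" sbt_dir="${2:-$HOME/.sbt}"
  mkdir -p "$sbt_dir" || return 1
  # Keep a backup of any existing file before overwriting it.
  if [ -f "$sbt_dir/repositories" ]; then
    cp "$sbt_dir/repositories" "$sbt_dir/repositories.bak" || return 1
  fi
  cp "$src" "$sbt_dir/repositories"
}

# Example (commented out so that sourcing this file has no side effects):
#   install_sbt_repositories ./repositories
#   sbt -Dsbt.override.build.repos=true compile
```

The `-Dsbt.override.build.repos=true` system property asks sbt to resolve only from the repositories file and ignore resolvers declared in the build, which is usually the point of configuring a mirror.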
View gist:fdf55e99264cf0704eb08a3eeb2b9576

Install the prerequisites (the last one for numpy):

sudo apt install libopenblas-dev libblas-dev m4 cmake cython python3-yaml libatlas-base-dev

Increase the swap size:

  • Stop the swap: sudo dphys-swapfile swapoff
  • As root, edit /etc/dphys-swapfile and set the variable CONF_SWAPSIZE to 2048 (the value is in MB): CONF_SWAPSIZE=2048
  • Apply the change: sudo dphys-swapfile setup
  • Start the swap again: sudo dphys-swapfile swapon
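
The swap steps above can be sketched as a small script. The helper `set_swapsize` is hypothetical (not part of dphys-swapfile); it only rewrites the `CONF_SWAPSIZE` line of a config file, and the commands that touch the real system are left commented out.

```shell
#!/bin/bash
# Hypothetical helper: set CONF_SWAPSIZE in a dphys-swapfile style config file.
# $1 = config file path, $2 = swap size in MB
set_swapsize() {
  local conf_file="$1" size_mb="$2" tmp
  tmp="$(mktemp)" || return 1
  # Replace an existing (possibly commented-out) CONF_SWAPSIZE line.
  sed -E "s/^#?[[:space:]]*CONF_SWAPSIZE=.*/CONF_SWAPSIZE=${size_mb}/" \
    "$conf_file" > "$tmp" || return 1
  # Append the setting if the file had no such line.
  grep -q '^CONF_SWAPSIZE=' "$tmp" || echo "CONF_SWAPSIZE=${size_mb}" >> "$tmp"
  # Write back in place so the original file's permissions are preserved.
  cat "$tmp" > "$conf_file" && rm -f "$tmp"
}

# Full sequence on the real system (run as root; commented out on purpose):
#   dphys-swapfile swapoff
#   set_swapsize /etc/dphys-swapfile 2048
#   dphys-swapfile setup
#   dphys-swapfile swapon
```

Once the build that needed the extra memory is done, the same helper can restore the previous value.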
View WGET_Large_Files_GDrive.md

This document walks you through the steps to prepare a wget-compatible link to a file stored in your Google Drive.

Motivation: Deep learning work often runs on Google Colab, Kaggle Kernels, or cloud instances so that models can be trained on GPUs. The catch is that every file needed to get things up and running has to be uploaded first. This is particularly painful for a large dataset that cannot be uploaded or gathered directly (sometimes scp does not work either). If the dataset is already stored in Google Drive, a common workaround is to create a wget-compatible link to the file (typically the dataset) and download it from there. This document deals only with Google Drive.

Steps:

  • Right click on the file (located in Google Drive) and click on "Share".
  • In the "Link sharing on" section, change the permissions of your file to "Anyone with the link can view" and copy the link.
  • Now, the link should resemble `
View spark_flame_graphs.md

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

View getconf.sh
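#!/bin/bash
# Simple shmsetup script: prints kernel.shmall and kernel.shmmax settings
# sized to half of physical memory, in sysctl.conf format.
page_size=$(getconf PAGE_SIZE)      # bytes per page
phys_pages=$(getconf _PHYS_PAGES)   # number of physical memory pages
shmall=$((phys_pages / 2))          # half of RAM, in pages
shmmax=$((shmall * page_size))      # the same amount, in bytes
echo "kernel.shmmax = $shmmax"
echo "kernel.shmall = $shmall"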
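
As a worked example of the arithmetic in the script above, here is the same calculation as a function with explicit inputs. The name `shm_values` and the sample numbers (4096-byte pages, 2097152 physical pages, i.e. 8 GiB of RAM) are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's arithmetic.
# $1 = page size in bytes, $2 = number of physical pages
shm_values() {
  local page_size="$1" phys_pages="$2"
  local shmall=$((phys_pages / 2))        # half of RAM, in pages
  local shmmax=$((shmall * page_size))    # the same amount, in bytes
  echo "kernel.shmmax = $shmmax"
  echo "kernel.shmall = $shmall"
}

shm_values 4096 2097152
# kernel.shmmax = 4294967296
# kernel.shmall = 1048576
```

With these inputs, half of the 8 GiB machine (4 GiB) is allowed for shared memory. The printed lines are in the format used by /etc/sysctl.conf, so one plausible use is to append them there and reload the settings with `sysctl -p`.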