Mike Dusenberry dusenberrymw

## tmux_tips_and_tricks.md

      
dusenberrymw / tmux_tips_and_tricks.md (1 file, 4 forks, 1 comment, 19 stars; last active April 8, 2024 00:41)

Tmux Tips & Tricks

Quick cheat sheet of helpful tmux commands


- `tmux new` - Create and attach to a new session.
- `tmux new -s NAME_HERE` - Create and attach to a new session named `NAME_HERE`.
- `CTRL-b, d` - Detach (i.e. exit) from the currently-open tmux session, leaving it running in the background (alternatively, `tmux detach`). To enter the key sequence, press and hold CTRL, press b, release both, then press d.
- `tmux ls` - Show the list of tmux sessions.
- `tmux a` - Attach to the most recently used tmux session.
- `tmux a -t NAME_HERE` - Attach to the tmux session named `NAME_HERE`.
- `CTRL-d` - Exit the current shell; if it is the only pane in the session, this ends (i.e. kills) the session (alternatively, `tmux kill-session`).
- `CTRL-b, [` - Enter copy mode, which enables scrolling in the currently-open tmux session. Press q to exit.
- `CTRL-b, "` - Split the current pane with a horizontal divider (i.e. split and add a pane below).


## proxy.pac
// Proxy PAC File
// - Used to redirect certain addresses to the server through the SOCKS ssh port (1280 for this file), i.e.
//   tunnel traffic through server.
// - Useful for easily accessing webpages from services running on a server (Jupyter notebooks, TensorBoard, Spark UI, etc.)
//   that is otherwise locked down by a firewall.
// - To install on OS X/MacOS, go to "Settings->Network->Advanced->Proxies->Automatic Proxy Configuration"
//   and paste the local file url (`file:///absolute/path/to/proxy.pac`).
// - Alternatively, use `./reinstall_proxy.sh`.
// - SSH to the server with `ssh -D 1280 ....`.
function FindProxyForURL(url, host) {

## ml_dl_scenarios.md

      
dusenberrymw / ml_dl_scenarios.md (1 file, 4 forks, 7 comments, 13 stars; last active January 3, 2024 07:14)

Interesting Machine Learning / Deep Learning Scenarios

This gist aims to explore interesting scenarios that may be encountered while training machine learning models.

Increasing validation accuracy and loss

Let's imagine a scenario where the validation accuracy and loss both begin to increase.  Intuitively, it seems like this scenario should not happen, since loss and accuracy seem like they would have an inverse relationship.  Let's explore this a bit in the context of a binary classification problem in which a model parameterizes a Bernoulli distribution (i.e., it outputs the "probability" of the true class) and is trained with the associated negative log likelihood as the loss function (i.e., the "logistic loss" == "log loss" == "binary cross entropy").
Imagine that when the model is predicting a probability of 0.99 for a "true" class, the model is both correct (assuming a decision threshold of 0.5) and has a low loss since it can't do much better for that example. Now, imagine that the model
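As a hypothetical illustration of the binary classification setup described above, the sketch below computes the mean binary cross entropy and the accuracy (decision threshold of 0.5) for two made-up sets of predicted probabilities. The probabilities and the names `epoch_a` and `epoch_b` are assumptions chosen for illustration, not values from the gist. In the second set the model is less confident on examples it already classified correctly, but it nudges two borderline examples across the threshold, so accuracy and loss both go up.

```python
import math


def binary_cross_entropy(y_true, p_pred):
    """Mean negative log likelihood of Bernoulli labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return correct / len(y_true)


# Hypothetical validation set: every example belongs to the "true" class.
y_true = [1, 1, 1, 1]

# Hypothetical predictions at an earlier epoch: two confident hits, two near misses.
epoch_a = [0.99, 0.99, 0.45, 0.45]

# Hypothetical predictions at a later epoch: all correct, but none confident.
epoch_b = [0.60, 0.60, 0.55, 0.55]

for name, preds in [("epoch_a", epoch_a), ("epoch_b", epoch_b)]:
    loss = binary_cross_entropy(y_true, preds)
    acc = accuracy(y_true, preds)
    print(f"{name}: loss={loss:.3f} accuracy={acc:.2f}")
```

The point of the sketch is only that accuracy depends on which side of the threshold each prediction falls, while the loss depends on how far each probability is from the label, so the two metrics need not move in opposite directions.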

  
## mwd.sleepMac.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>mwd.sleepMac</string>
    <key>ProgramArguments</key>
    <array>
        <string>/path/to/sleepMac.sh</string>
    </array>

## spark_tips_and_tricks.md

      
dusenberrymw / spark_tips_and_tricks.md (1 file, 20 forks, 1 comment, 74 stars; last active February 8, 2023 05:11)

Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks


- If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8 relative to 8-byte (64-bit) values.
- Partition DataFrames to have evenly-distributed, ~128MB partitions (empirical finding). Always err on the higher side w.r.t. the number of partitions.
- Pay particular attention to the number of partitions when using `flatMap`, especially if the following operation will result in high memory usage. The `flatMap` op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of `flatMap` to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
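The partition-sizing tips above boil down to simple arithmetic, sketched below in plain Python (no Spark required). The helper names and the example sizes are hypothetical and not from the gist; the only input taken from the text is the ~128MB target. Rounding up follows the advice to err on the higher side for the number of partitions.

```python
import math

# ~128MB target partition size, taken from the tip above.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggested_num_partitions(total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Round up so that each partition is at or below the target size."""
    return max(1, math.ceil(total_bytes / target_bytes))


def partitions_after_expansion(num_rows, bytes_per_row_after,
                               target_bytes=TARGET_PARTITION_BYTES):
    """Estimate partitions needed once each row expands to `bytes_per_row_after` bytes."""
    return suggested_num_partitions(num_rows * bytes_per_row_after, target_bytes)


# Hypothetical example: a 10 GiB dataset.
print(suggested_num_partitions(10 * 1024**3))

# Hypothetical example: 1,000,000 index rows, each later converted to a
# dense vector of 4096 float64 values (4096 * 8 bytes per row).
print(partitions_after_expansion(1_000_000, 4096 * 8))
```

With an estimate like this in hand, the output of `flatMap` could be passed through `DataFrame.repartition(n)` before the memory-heavy op. The per-row size is something to measure or estimate for the actual data.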


## 1.rsync_tips_and_tricks.md

      
dusenberrymw / 1.rsync_tips_and_tricks.md (2 files, 2 forks, 0 comments, 5 stars; last active December 20, 2022 13:27)

Rsync Tips & Tricks


`rsync -auzPhv --delete --exclude-from=rsync_exclude.txt SOURCE/ DEST/ -n`

- `-a` -> `--archive`; recursively sync, preserving symbolic links and most file metadata (permissions, timestamps, owner, group)
- `-u` -> `--update`; skip files that are newer on the receiver; sometimes this is inaccurate (due to Git, I think...)
- `-z` -> `--compress`; compress file data during the transfer
- `-P` -> `--progress` + `--partial`; show progress during the transfer and keep partially transferred files so interrupted transfers can be resumed
- `-h` -> `--human-readable`; print numbers in a human-readable format
- `-v` -> `--verbose`; verbose output
- `-n` -> `--dry-run`; show what would be transferred without changing anything; use this to test, and then remove it to actually execute the sync
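The "dry run first, then sync for real" workflow above can be wrapped in a small helper so the dry-run flag is on by default and has to be switched off deliberately. This is a minimal sketch: the function name `build_rsync_command` and its parameters are hypothetical, and only the rsync flags themselves come from the command above.

```python
import shlex


def build_rsync_command(source, dest, exclude_from=None, dry_run=True):
    """Build the rsync argument list from the tip above; dry run is the default."""
    cmd = ["rsync", "-auzPhv", "--delete"]
    if exclude_from is not None:
        cmd.append(f"--exclude-from={exclude_from}")
    if dry_run:
        # Preview only: rsync reports what it would do without changing DEST.
        cmd.append("--dry-run")
    cmd += [source, dest]
    return cmd


# Preview command (safe to inspect or run first).
preview = build_rsync_command("SOURCE/", "DEST/", exclude_from="rsync_exclude.txt")
print(shlex.join(preview))

# Real command, produced only when dry_run is explicitly disabled.
real = build_rsync_command("SOURCE/", "DEST/", exclude_from="rsync_exclude.txt",
                           dry_run=False)
print(shlex.join(real))
```

Either list can be handed to `subprocess.run(cmd, check=True)` to execute it. Building the arguments as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.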


## notebook.json
{
  "CodeCell": {
    "cm_config": {
      "indentUnit": 2
    }
  }
}

## keras_tips_and_tricks.md

      
dusenberrymw / keras_tips_and_tricks.md (1 file, 0 forks, 2 comments, 4 stars; last active July 28, 2020 11:59)

Keras Tips & Tricks

fit_generator


- Can use either threading or multiprocessing for concurrent and parallel processing, respectively, of the data generator.
- In the threading approach (`model.fit_generator(..., pickle_safe=False)`), the generator can be run concurrently (but not in parallel) in multiple threads, with each thread pulling the next available batch based on the shared state of the generator and placing it in a shared queue. However, the generator must be thread-safe (i.e. use locks at synchronization points).
- Due to the Python global interpreter lock (GIL), the threading option generally does not benefit from >1 worker (i.e. `model.fit_generator(..., nb_worker=1)` is best). One possible use case in which >1 thread could be beneficial is the presence of exceptionally long IO times, during which the GIL will be released to enable concurrency. Note also that TensorFlow's `session.run(
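The thread-safety requirement above can be met by guarding the generator's `next()` call with a lock, since calling `next()` on a plain Python generator from several threads at once can raise `ValueError: generator already executing`. The sketch below is one common way to do this and is not taken from the gist or from Keras itself; `ThreadSafeIterator`, `batches`, and `drain` are hypothetical names, and the integer "batches" stand in for real data.

```python
import threading


class ThreadSafeIterator:
    """Wrap an iterator so that only one thread at a time can advance it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock is the synchronization point around the shared generator state.
        with self._lock:
            return next(self._it)


def batches(num_batches):
    """Stand-in data generator; a real one would yield (inputs, targets) batches."""
    for i in range(num_batches):
        yield i


def drain(iterator, out):
    """Worker: pull batches until the shared iterator is exhausted."""
    for item in iterator:
        out.append(item)  # list.append is atomic in CPython


safe = ThreadSafeIterator(batches(1000))
results = []
workers = [threading.Thread(target=drain, args=(safe, results)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(results))
```

Each batch is handed to exactly one worker, although the order in which batches land in `results` is not deterministic.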


## tensorflow_tips_and_tricks.md

      
dusenberrymw / tensorflow_tips_and_tricks.md (1 file, 1 fork, 0 comments, 9 stars; last active April 2, 2020 16:49)

Tips and tricks for TensorFlow, Keras, CUDA, etc.

TensorFlow Tips & Tricks

GPU Memory Issues


- `nvidia-smi` to check current memory usage.
- `watch -n 1 nvidia-smi` to monitor memory usage every second.
- Often, extra Python processes can stay running in the background, maintaining a hold on the GPU memory, even if `nvidia-smi` doesn't show it.
  - Probably due to running Keras in a notebook, and then running the cell that starts the processes again, since this will fork the current process, which has a hold on GPU memory. In the future, restart the kernel first, and stop all processes before exiting (even though they are daemons and should stop automatically when the parent process ends).


## git-cherry-pick-with-committer.sh
#!/bin/bash
#
# Copyright (c) 2013-2014 David Ingram
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions: