Petr Kuderov pkuderov

## ds-project-organization.md

      
ericmjl / ds-project-organization.md (last active July 1, 2024 08:49)
UPDATE: I have baked the ideas in this file into a Python CLI tool called pyds-cli. Please find it here: https://github.com/ericmjl/pyds-cli
How to organize your Python data science project

Having done a number of data projects over the years, and having seen a number of them up on GitHub, I've come to see that projects vary widely in how "readable" they are. I'd like to share some practices that I have adopted in my own projects, which I hope will bring some organization to yours.
Disclaimer: I hope nobody takes this to be "the definitive guide" to organizing a data project; rather, I hope you, the reader, find useful tips that you can adapt to your own projects.
Disclaimer 2: What I’m writing below is primarily geared towards Python users. Some ideas may transfer to other languages; others may not. Please feel free to remix whatever you see here!

  
## svg2pdf.bash
#!/bin/bash
#
# Convert an SVG file to a PDF file by using headless Chrome.
#

if [ $# -ne 2 ]; then
  echo "Usage: ./svg2pdf.bash input.svg output.pdf" 1>&2
  exit 1
fi

## lru.py
"""
Simplified Implementation of the Linear Recurrent Unit
------------------------------------------------------
We present here a simplified JAX implementation of the Linear Recurrent Unit (LRU).
The state of the LRU is driven by the input $(u_k)_{k=1}^L$ of sequence length $L$
according to the following formula (and efficiently parallelized using an associative scan):
$x_{k} = \Lambda x_{k-1} +\exp(\gamma^{\log})\odot (B u_{k})$,
and the output is computed at each timestep $k$ as follows: $y_k = C x_k + D u_k$.
In our code, $B$ and $C$ follow Glorot initialization, with $B$ scaled additionally by a factor of 2
to account for halving the state variance by taking the real part of the output projection.
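
The recurrence described above can be illustrated with a small sketch. This is not the code from `lru.py`: it is written in plain Python rather than JAX, it assumes a diagonal $\Lambda$ stored as a list of complex numbers and a zero initial state, and every helper name (`matvec`, `combine`, `lru_states_loop`, `lru_states_scan`, `lru_outputs`) and every parameter value is hypothetical. It shows why the recurrence can be expressed as a scan: each step is an affine map $x \mapsto a \odot x + b$, and composing such maps is associative. Here `itertools.accumulate` applies the operator sequentially; in JAX, an operator of this shape is what one would presumably hand to `jax.lax.associative_scan` to get the parallel version.

```python
import cmath
import math
from itertools import accumulate


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def combine(left, right):
    """Associative operator on (a, b) pairs, each standing for x -> a * x + b.

    Applying `left` first and `right` second gives
    x -> (a_r * a_l) * x + (a_r * b_l + b_r), elementwise.
    """
    a_l, b_l = left
    a_r, b_r = right
    return (
        [ar * al for ar, al in zip(a_r, a_l)],
        [ar * bl + br for ar, bl, br in zip(a_r, b_l, b_r)],
    )


def lru_states_loop(lam, gamma_log, B, inputs):
    """Reference: x_k = lam * x_{k-1} + exp(gamma_log) * (B u_k), with x_0 = 0 (assumed)."""
    x = [0j] * len(lam)
    states = []
    for u in inputs:
        Bu = matvec(B, u)
        x = [l * xi + math.exp(g) * bu for l, xi, g, bu in zip(lam, x, gamma_log, Bu)]
        states.append(x)
    return states


def lru_states_scan(lam, gamma_log, B, inputs):
    """Same states, written as an inclusive scan over (lam, driven input) pairs."""
    elems = [
        (lam, [math.exp(g) * bu for g, bu in zip(gamma_log, matvec(B, u))])
        for u in inputs
    ]
    return [b for _, b in accumulate(elems, combine)]


def lru_outputs(states, inputs, C, D):
    """y_k = Re(C x_k) + D u_k; where the real part is taken is an assumption here."""
    return [
        [cx.real + du for cx, du in zip(matvec(C, x), matvec(D, u))]
        for x, u in zip(states, inputs)
    ]


# Arbitrary illustrative values: state size 2, input size 2, sequence length 3.
lam = [cmath.rect(0.9, 0.3), cmath.rect(0.5, -1.0)]  # diagonal of Lambda, |lam| < 1
gamma_log = [-0.5, -0.2]
B = [[1 + 0.5j, -0.3j], [0.2 + 0j, 0.7 - 0.1j]]
C = [[0.4 - 0.2j, 1.0 + 0j]]
D = [[0.1, -0.3]]
inputs = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]

states = lru_states_scan(lam, gamma_log, B, inputs)
outputs = lru_outputs(states, inputs, C, D)
```

Both `lru_states_loop` and `lru_states_scan` should return the same list of states; the scan form only regroups the same arithmetic, and that regrouping freedom is what a parallel scan exploits.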