Eugene Yan eugeneyan

## gist:dd86fc1029ff31038aece03a8b6478cb
aiterate

## mandelbrot-mojo.md

      
Last active April 4, 2024 15:52

Benchmarking Mojo vs. Python on Mandelbrot sets

Mandelbrot in Mojo with Python plots

Mojo is not only great for writing high-performance code; it also lets us leverage the huge Python ecosystem of libraries and tools. With seamless Python interoperability, Mojo can use Python for what it's good at, especially GUIs, without sacrificing performance in critical code. Let's take the classic Mandelbrot set algorithm and implement it in Mojo.
We'll introduce a Complex type and use it in our implementation.
Mandelbrot in Python
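
The preview stops before any Python code appears. As a rough sketch of the escape-time kernel that a benchmark like this usually times, the block below uses Python's built-in `complex` type. The names `mandelbrot_kernel` and `compute_mandelbrot`, the cap `MAX_ITERS = 200`, and the grid bounds in the usage note are assumptions for illustration, not values taken from the gist.

```python
MAX_ITERS = 200  # assumed iteration cap, not taken from the gist


def mandelbrot_kernel(c: complex, max_iters: int = MAX_ITERS) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2."""
    z = c
    for i in range(max_iters):
        z = z * z + c
        # Compare the squared magnitude with 4 to avoid a square root.
        if z.real * z.real + z.imag * z.imag > 4.0:
            return i
    return max_iters


def compute_mandelbrot(width, height, min_x, max_x, min_y, max_y):
    """Evaluate the kernel on a width x height grid over the given bounds."""
    dx = (max_x - min_x) / width
    dy = (max_y - min_y) / height
    return [
        [mandelbrot_kernel(complex(min_x + col * dx, min_y + row * dy))
         for col in range(width)]
        for row in range(height)
    ]
```

Calling `compute_mandelbrot(960, 960, -2.0, 0.6, -1.5, 1.5)` would return a nested list of iteration counts that a plotting library could render as an image. Points that never escape take the value of `max_iters`.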


## convert_tag_format.py
"""
Fixes tags that were converted to links during Obsidian import.

Specifically, if a line is within the first 3 lines and contains "tags:", convert all [[tag name]] to #tag-name
"""
import os
import re
from pathlib import Path

DIR = '/Users/eugeneya/obsidian-vault/'
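
The rest of the script is not shown in this preview. The conversion the docstring describes could be sketched as follows. The helper names `fix_tag_line` and `fix_tags` and the `header_lines` parameter are hypothetical, and the sketch assumes spaces inside a tag become hyphens.

```python
import re

# Matches an Obsidian-style link such as [[tag name]] and captures its text.
TAG_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def fix_tag_line(line: str) -> str:
    """Rewrite every [[tag name]] in a line as #tag-name."""
    return TAG_LINK.sub(
        lambda m: "#" + m.group(1).strip().replace(" ", "-"), line
    )


def fix_tags(text: str, header_lines: int = 3) -> str:
    """Apply the fix only to 'tags:' lines within the first few lines."""
    lines = text.split("\n")
    for idx in range(min(header_lines, len(lines))):
        if "tags:" in lines[idx]:
            lines[idx] = fix_tag_line(lines[idx])
    return "\n".join(lines)
```

Limiting the rewrite to the first few lines leaves genuine `[[note links]]` in the body of a note untouched.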

## iterative_git.py
"""
Iteratively loop through all files in DIR and add-commit-push them to REPO.

This script should sit in your obsidian vault.
"""
from pathlib import Path
from git import Repo
import os

DIR = '/Users/eugene/obsidian-vault/assets'
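
The loop itself is cut off in this preview. A minimal sketch of an add-commit-push loop is shown below, assuming `repo` is a GitPython `Repo` object and that each file gets its own commit. The helper names `iter_files` and `add_commit_push`, the commit message format, and the default remote name `origin` are assumptions.

```python
from pathlib import Path


def iter_files(directory):
    """Yield every regular file under directory, in sorted order."""
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            yield path


def add_commit_push(repo, directory, remote_name="origin"):
    """Add, commit, and push each file under directory as its own commit.

    `repo` is expected to behave like a GitPython Repo: it must provide
    repo.index.add, repo.index.commit, and repo.remote(name=...).push.
    """
    pushed = []
    for path in iter_files(directory):
        repo.index.add([str(path)])
        repo.index.commit(f"Add {path.name}")  # assumed message format
        repo.remote(name=remote_name).push()
        pushed.append(path)
    return pushed
```

With GitPython installed, usage might look like `add_commit_push(Repo('/path/to/vault'), DIR)`, where the vault path is a placeholder. Pushing one file at a time keeps each push small, which may help when a repository host rejects large pushes.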

## setup_nvt_t4r_pytorch
# Start a SageMaker notebook instance (ml.p3.2xlarge) and open a terminal

# Upload the conda yml from here: https://gist.github.com/eugeneyan/3435e05dd675b9ee2af164214536752d

# Install NVTabular
conda env create -f=SageMaker/nvt_t4r.yml

# Activate conda env
source anaconda3/etc/profile.d/conda.sh
conda activate nvt_t4r

## nvt.yml
# Based on https://github.com/NVIDIA-Merlin/NVTabular/blob/main/conda/environments/nvtabular_dev_cuda11.0.yml
name: nvt_t4r
channels:
  - rapidsai
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - nvtabular
  - python>=3.7

## swing.py
import math

i2i = {}  # product_id -> accumulated score
for i in range(len(u2items)):
    # Down-weight users who clicked on many items
    wi = math.pow(len(u2items[i]) + 5, -0.35)
    for j in range(i + 1, len(u2items)):
        # Items clicked on by both user i and user j
        intersection = u2items[i] & u2items[j]
        wj = wi * math.pow(len(u2items[j]) + 5, -0.35)
        for product_id in intersection:
            i2i[product_id] = i2i.get(product_id, 0.0) + wj / (1 + len(intersection))

# u2items = array of users and their items (one set of items per user)
# u2items[i] = items user i clicked on
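
To see what this loop accumulates, the same weighting can be wrapped in a function and run on toy data. This is a minimal sketch: `swing_scores`, its `smoothing` and `power` parameters, and the three users below are hypothetical names introduced for illustration. The constants 5 and -0.35 come from the snippet above.

```python
import math


def swing_scores(u2items, smoothing=5, power=-0.35):
    """Accumulate a score per item over all pairs of users who share it."""
    i2i = {}
    for i in range(len(u2items)):
        wi = math.pow(len(u2items[i]) + smoothing, power)
        for j in range(i + 1, len(u2items)):
            intersection = u2items[i] & u2items[j]
            wj = wi * math.pow(len(u2items[j]) + smoothing, power)
            for product_id in intersection:
                i2i[product_id] = (
                    i2i.get(product_id, 0.0) + wj / (1 + len(intersection))
                )
    return i2i


# Three hypothetical users; each entry is the set of items that user clicked on.
u2items = [{"a", "b"}, {"a", "b", "c"}, {"c"}]
scores = swing_scores(u2items)
```

Here users 0 and 1 share items `a` and `b`, so both items receive the same contribution, divided by `1 + 2`. Users 1 and 2 share only `c`, so its contribution is divided by `1 + 1`. Users 0 and 2 share nothing and contribute nothing.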

## data-discovery-comparison.txt
|                             | Search | Recommendations | Schemas & Description | Data Preview | Column Statistics | Space/cost metrics | Ownership | Top Users | Lineage | Change Notification | Open Source | Documentation | Supported Sources                                     | Push or Pull |
|-----------------------------|--------|-----------------|-----------------------|--------------|-------------------|--------------------|-----------|-----------|---------|---------------------|-------------|---------------|-------------------------------------------------------|--------------|
| Amundsen (Lyft)             | ✔      | ✔               | ✔                     | ✔            | ✔                 |                    | ✔         | ✔         | Todo    |                     | ✔           | ✔             | Hive, Redshift, Druid, RDBMS, Presto, Snowflake, etc. | Pull         |
| Datahub (LinkedIn)          | ✔      |                 | ✔                     |              |                   |

## testing_ml_setup.sh
# Clone and setup environment
git clone https://github.com/eugeneyan/testing-ml.git
cd testing-ml
make setup

# Run test suite
make check

## test_rf_better_at_same_depth.py
def test_rf_better_than_dt(dummy_titanic):
    X_train, y_train, X_test, y_test = dummy_titanic

    dt = DecisionTree(depth_limit=10)
    dt.fit(X_train, y_train)

    rf = RandomForest(depth_limit=10, num_trees=7, col_subsampling=0.8, row_subsampling=0.8)
    rf.fit(X_train, y_train)

    pred_test_dt = dt.predict(X_test)