Ilya Baryshnikov lucidyan

## README.md

      
Last active: April 27, 2024 02:41

Process yt-dlp Extracted Subtitles Script

Description

This Python script processes .vtt subtitle files downloaded with yt-dlp from YouTube or similar platforms. It merges subtitle entries whose segments overlap and cleans the text by removing excess whitespace. The processed subtitles are written to a new text file with a timestamped filename.

Features

- Subtitle Merging: Combines multiple subtitle entries into a single entry, taking overlaps into account.
- Text Cleaning: Cleans subtitle text by replacing newline characters and reducing multiple spaces to a single space.
- Output: Generates a cleaned and merged text file for each .vtt file in the specified directory.
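
The merging and cleaning steps listed above could look roughly like the sketch below. It is not the script's actual code: the names `clean_text` and `merge_overlapping` are hypothetical, and it assumes that "overlap" means a cue repeating the trailing words of the previous cue, as often happens in auto-generated captions. Parsing the .vtt file and writing the timestamped output file are left out.

```python
import re
from typing import List


def clean_text(text: str) -> str:
    """Replace newlines with spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def merge_overlapping(cues: List[str]) -> str:
    """Join cue texts, dropping words that a cue repeats from the text so far.

    Assumption: overlap is detected word by word, as the longest suffix of
    the merged text that equals a prefix of the next cue.
    """
    merged: List[str] = []
    for cue in cues:
        words = clean_text(cue).split()
        overlap = 0
        # Try the longest possible overlap first, then shrink.
        for size in range(min(len(merged), len(words)), 0, -1):
            if merged[-size:] == words[:size]:
                overlap = size
                break
        merged.extend(words[overlap:])
    return " ".join(merged)
```

For example, `merge_overlapping(["hello  world", "world this\nis", "this is a test"])` returns `"hello world this is a test"`.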


## exact_bayesian_inference.md

      
Created: February 4, 2020 20:15

Exact Bayesian Inference for A/B testing (all three parts)

^{use "MathJax Plugin for Github" Chrome extension for Equation support}

Part I

^{author: Evan Haas}

^2009.12.09

^Source

In this three-part series I’m going to talk about statistics in the context of A/B testing. Part I discusses how to analyze experiments using traditional techniques from the frequentist school. Part II will discuss the Bayesian approach, and Part III will provide an implementation of the Bayesian method. Much of the information is adapted from the excellent Information Theory, Inference, and Learning Algorithms by Davi

  
## spark-on-k8s-operator.md

      
Last active: November 18, 2019 14:20

spark-on-k8s-operator

Example of running spark-on-k8s-operator locally on a minikube cluster

Install minikube

curl -Lo minikube https://github.com/kubernetes/minikube/releases/download/v1.5.2/minikube-linux-amd64 && chmod +x minikube
sudo mkdir -p /usr/local/bin/
sudo install minikube /usr/local/bin/

Install VirtualBox


## bash_oneliners.md

      
Last active: November 14, 2019 23:45

Grep files with selected extensions only

find ./ -type f \( -iname \*.md -o -iname \*.txt \) -exec grep -Hi 'word' {} +

Show files opened in Sublime

cat $HOME/.config/sublime-text-3/Local/Auto\ Save\ Session.sublime_session | grep "\"file\":" | sed 's/^[[:space:]]*//g' | sed 's/^\"file\"\: \"//g' | sort -u | sed 's/[\",]*//ig'

Convert all *.wav to *.mp3


## scipy_odr_test.py
from __future__ import print_function

import numpy as np
import scipy.linalg
from scipy.odr import *
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import pyplot as plt
import sys
import time

## spark_to_pandas.py
import pandas as pd
from pyspark.sql import DataFrame

# Wrapper for seamless Spark serialisation
def spark_to_pandas(spark_df: DataFrame) -> pd.DataFrame:
    """
    PySpark toPandas implementation using mapPartitions;
    much faster than the vanilla version.
    fork: https://gist.github.com/lucidyan/1e5d9e490a101cdc1c2ed901568e082b
    origin: https://gist.github.com/joshlk/871d58e01417478176e7
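
The preview above is cut off before the function body. As a rough illustration of the idea the docstring names, the sketch below converts each partition to a DataFrame and concatenates the results. The helper names `partition_to_pandas` and `combine_partitions` are hypothetical, and plain Python lists stand in for Spark partitions so the example runs without a cluster. With Spark, the frames would presumably come from something like `spark_df.rdd.mapPartitions(partition_to_pandas).collect()`.

```python
from typing import Iterable, Iterator, List, Sequence

import pandas as pd


def partition_to_pandas(rows: Iterable[Sequence]) -> Iterator[pd.DataFrame]:
    """Turn the rows of one partition into a single DataFrame."""
    yield pd.DataFrame(list(rows))


def combine_partitions(frames: List[pd.DataFrame], columns: List[str]) -> pd.DataFrame:
    """Concatenate per-partition frames and restore the column names."""
    frames = [frame for frame in frames if not frame.empty]
    if not frames:
        return pd.DataFrame(columns=columns)
    result = pd.concat(frames, ignore_index=True)
    result.columns = columns
    return result


# Assumption: lists of tuples stand in for the partitions of a Spark DataFrame.
partitions = [[(1, "a"), (2, "b")], [], [(3, "c")]]
frames = [frame for part in partitions for frame in partition_to_pandas(part)]
df = combine_partitions(frames, ["id", "label"])
```

Empty partitions are dropped before concatenation so they do not affect the column layout of the result.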

## gpu-control.md

      
Last active: March 19, 2023 09:37

Prevent NVIDIA GPUs' throttling on headless server


Unlock manual fan & overclock settings

sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration

Reboot system

Create script /usr/local/bin/gpu-fan-control.sh

#!/bin/bash

  
## flatten_dict.py
# Non-recursive flattening of nested dictionaries in Python 3
# Source: https://codereview.stackexchange.com/a/173483/173155

from itertools import chain, starmap


def flatten_dict(dictionary):
    """Flatten a nested dictionary structure"""

    def unpack(parent_key, parent_value):
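
The preview stops at the `unpack` helper. One way such a non-recursive flattener can be completed with `chain` and `starmap` is sketched below; it is an illustration, not necessarily the code from the linked answer. The name `flatten_dict_sketch`, the `sep` parameter, and the choice to join keys into dotted strings are assumptions.

```python
from itertools import chain, starmap


def flatten_dict_sketch(dictionary, sep="."):
    """Flatten nested dicts iteratively; nested keys are joined with `sep`."""

    def unpack(parent_key, parent_value):
        # Expand one level of nesting, or pass a leaf value through unchanged.
        if isinstance(parent_value, dict) and parent_value:
            for key, value in parent_value.items():
                yield f"{parent_key}{sep}{key}", value
        else:
            yield parent_key, parent_value

    flat = dict(dictionary)
    # Repeat the single-level expansion until no non-empty dict values remain.
    while any(isinstance(value, dict) and value for value in flat.values()):
        flat = dict(chain.from_iterable(starmap(unpack, flat.items())))
    return flat
```

For example, `flatten_dict_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}})` returns `{"a": 1, "b.c": 2, "b.d.e": 3}`. Empty nested dictionaries are kept as leaf values, and keys that collide after joining overwrite one another.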