Skip to content

Instantly share code, notes, and snippets.

View bepcyc's full-sized avatar
🙃
Sparkling

Viacheslav Rodionov bepcyc

🙃
Sparkling
  • Qualcomm
  • Germany
View GitHub Profile
@bepcyc
bepcyc / dpreview_forum_regex.txt
Created April 26, 2023 09:37
regex expressions used by me for dumping DPreview forum discussions and reducing their size
1. Removing quotes and nested quotes: ^ {4}(\w+ wrote:(?:\n+(?: {4,}.*))*)
This will match quotes likes this one:
```
user1 wrote:
user2 wrote:
I use Lens X on Camera Y. It's a super good combo. I have bought the lens station on Amazon. Upgraded to latest firmware and I have full ibis now.
I just got my 85mm and it does not work well on my Camera W. As others have noted, IBIS and BB+ shooting does not work.
{
"$schema": "https://raw.githubusercontent.com/jsonresume/resume-schema/v1.0.0/schema.json",
"basics": {
"name": "Viacheslav Rodionov",
"label": "BIG DATA architect",
"location": {
"city": "Munich",
"countryCode": "DE",
"region": "Bavaria"
},
@bepcyc
bepcyc / headless_firefox_crawl_js_site.py
Created October 19, 2021 11:17
Crawl a dynamic website in headless mode
### based on great SO answers: https://stackoverflow.com/a/50593885/918211 and https://stackoverflow.com/a/46768243/918211
## Debian/Ubuntu specific
# sudo apt install -y firefox-geckodriver
# python3 -m venv venv
# cd venv
# source bin/activate
# pip install selenium beautifulsoup4
@bepcyc
bepcyc / kill_all_yarn_apps.sh
Last active December 15, 2021 11:55
Kill all running YARN applications. Workaround for multi RM setups.
# In case you're getting an error like:
# This is standby RM. The redirect url is:
# or (for yarn util):
# INFO client.INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From xxx to yyy:pppp failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm2 after 1 failover attempts. Trying to failover after sleeping for x ms.ConfiguredRMFailoverProxyProvider: Failing over to rm2
#
# this workaround might be needed for multi-master setups (for exanple AWS EMR 5.x YARN has this issue)
ACTIVE_HOST=$(curl -s -i http://${HOSTNAME}:8088/ws/v1/cluster/metrics | grep "Location:" | grep http | cut -d' ' -f2 | cut -d'/' -f3 | cut -d':' -f1)
RM_HOSTNAME=${ACTIVE_HOST:-$HOSTNAME}
curl -s -L http://${RM_HOSTNAME}:8088/ws/v1/cluster/apps?state=RUNNING | jq -c ".apps.app[].id" | xargs yarn application --kill
@bepcyc
bepcyc / update_gcc.sh
Created June 10, 2021 08:25
Change default gcc /g++/ cpp version used in ubuntu
# for advanced users only
# sudo add-apt-repository ppa:ubuntu-toolchain-r/test
GCC_VERSION=11 # or whatever
sudo apt update
sudo apt install gcc-${GCC_VERSION} gcc-${GCC_VERSION}-locales gcc-${GCC_VERSION}-multilib g++-${GCC_VERSION} g++-${GCC_VERSION}-multilib cpp-${GCC_VERSION}
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-${GCC_VERSION} 10
# test with
g++-${GCC_VERSION} --version
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-${GCC_VERSION} 10
# test with
@bepcyc
bepcyc / add_secret_to_bitwarden.sh
Created February 19, 2021 12:42
How to add secret content (e.g. a private key) to a bitwarden and how to restore it back. Chezmoi template included.
SECRET_NAME=id_rsa
SECRET_PATH=~/.ssh/id_rsa
# store the secret content as an item in bitwarden
echo "{\"organizationId\":null,\"folderId\":null,\"type\":2,\"name\":\"${SECRET_NAME}\",\"notes\":\"$(base64 -w 0 ${SECRET_PATH})\",\"favorite\":false,\"fields\":[],\"login\":null,\"secureNote\":{\"type\":0},\"card\":null,\"identity\":null}" | bw encode | bw create item
bw sync # optional
# retrieve the secret
# assuming a single search result
bw list items --search id_rsa | jq -r '.[0].notes' | base64 -d > ${SECRET_PATH}
# in case you're using chezmoi here's a template that will retrieve that secret automatically
#$cat $(chezmoi source-path ${SECRET_PATH})
@bepcyc
bepcyc / kafka_offsets_by_timestamp.sh
Last active December 11, 2020 14:50
generate kafka topic offsets JSON for a given timestamp
#!/usr/bin/env bash
# This script generates a JSON structure that represents
# topic offsets per partition for a given moment of time
# this structure is very useful when used as a
# "startingOffsets" parameter in Spark Structured Streaming
shopt -s expand_aliases
#### YOU NEED TO SET UP THIS PART ####
@bepcyc
bepcyc / spark_extract_json_field_schema.scala
Created June 19, 2020 15:39
Get JSON field schema in Spark
val path = "s3://some/dir"
val df = spark.read.parquet(path)
val df2 = df.select($"value") // suppose value is a string with JSON
val ds = df2.as[String]
val dsj = spark.read.json(ds)
val schema = dsj.schema // here is your schema
println(schema.json)
println(schema.toDDL)
@bepcyc
bepcyc / S3_buckets_object_count.sh
Created June 19, 2020 12:40
S3 buckets file count
FILENAME=/tmp/buckets
for b in $(aws s3api list-buckets | jq -r ".Buckets[].Name"); do aws s3api list-objects --bucket $b --output json --query "[length(Contents[])]" | jq -c ".[0]" | xargs -I {} echo -e {}"\t$b" >>$FILENAME ; done
@bepcyc
bepcyc / ppas_new_release_check.sh
Created October 14, 2019 11:02
check PPAs availability for a new Ubuntu release
# DIST is a current release codename (e.g. bionic)
# NEXT_DIST is a next release codename (e.g. disco)
# error codes 301 and 200 mean there is a page, 404 - meansthere is not
DIST=$(. /etc/os-release; echo $VERSION_CODENAME); NEXT_DIST=disco; for url in $(grep -h -v "^#" /etc/apt/sources.list.d/*.list|grep "^deb" | sort -u | grep $DIST | awk -v n="$NEXT_DIST" '{print $2"/dists/"n}'); do echo $url $(curl -s -o /dev/null -w "%{http_code}" "$url"); done