Simon D'Morias (simondmorias)

GerryWilko / workitem-poster.yml
Last active May 19, 2022 13:31
Azure Pipelines Task to Post Comment to Linked Work Items (Bash)
- bash: | # Access Token needs Build (Read), Work Item (Read & Write) and Member Entitlement Management (Read) scopes
    curl -u test@test.com:access-token-xxxxxxxxxxx "https://dev.azure.com/{organisation}/{project}/_apis/build/builds/$(build.buildId)/workitems?api-version=6.0" | jq '.value[] | .id' |
    while IFS=$'\n' read -r c; do
      wid=$(echo "$c" | tr -dc '0-9')
      echo
      echo "Posting status to work item: $wid"
      echo
      curl -u test@test.com:access-token-xxxxxxxxxxx "https://dev.azure.com/{organisation}/{project}/_apis/wit/workItems/$wid/comments?api-version=6.0-preview.3" -X POST --data '{"text": "Build $(Build.BuildNumber) completed with status: $(Agent.JobStatus)"}' -H 'Content-Type: application/json'
      echo
    done
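
A minimal sketch of how this step might sit inside a pipeline, assuming the PAT lives in a secret pipeline variable (hypothetically named workitem-pat here) and is mapped in via env rather than hard-coded into the curl -u arguments; condition: always() keeps the comments posting even when earlier steps fail:

steps:
  - bash: |
      # ... script from the gist above, reading the PAT from $AZDO_PAT
      # instead of embedding it literally in the curl -u arguments ...
    displayName: Post build status to linked work items
    condition: always()          # run even when earlier steps fail, so failures get reported too
    env:
      AZDO_PAT: $(workitem-pat)  # secret pipeline variable (hypothetical name); secrets must be mapped explicitly into scripts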
dusenberrymw / spark_tips_and_tricks.md
Last active February 8, 2023 05:11
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8 relative to 8-byte integers.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the