Arseniy Zhizhelev Primetalk

@Primetalk
Primetalk / split-csv.sc
Created February 1, 2022 12:30
Split a CSV file based on a field name
#!/usr/bin/env scala-cli
// Usage:
// 1. Install scala-cli as described here: https://scala-cli.virtuslab.org/install/
// (For macos it's: `brew install Virtuslab/scala-cli/scala-cli`)
// 2. Run `scala-cli <this gist url> <original-file>.csv <count of same value chunks> <chunk id field>`
// 3. It'll produce many smaller files, each of which contains the requested number of
// chunks, where a chunk is a run of consecutive lines with the same identifier.
// For each line it reads the id field and compares it with the previous line's value. If the
// value is new, the chunk counter is incremented.
// Once the counter reaches the requested count, it starts a new file.
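
The splitting rule described in the usage comments can be sketched roughly as follows. This is not the gist's Scala implementation, only an illustration in Java of the counting logic under a few assumptions: the class name `CsvChunkSplitter`, the method `split`, and its parameters are hypothetical; rows are split on plain commas with no support for quoted fields; and the parts are returned as in-memory lists rather than written to files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the gist: groups CSV rows into parts.
public final class CsvChunkSplitter {

    /**
     * Returns parts of the input; each part holds at most chunksPerPart chunks,
     * where a chunk is a run of consecutive rows with the same value in idField.
     * Assumption: cells are separated by plain commas (no quoting or escaping).
     */
    public static List<List<String>> split(String header, List<String> rows,
                                           String idField, int chunksPerPart) {
        if (chunksPerPart < 1) {
            throw new IllegalArgumentException("chunksPerPart must be positive");
        }
        int idIndex = Arrays.asList(header.split(",", -1)).indexOf(idField);
        if (idIndex < 0) {
            throw new IllegalArgumentException("No such field: " + idField);
        }

        List<List<String>> parts = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String previousId = null;
        int chunksInCurrent = 0;

        for (String row : rows) {
            String[] cells = row.split(",", -1);
            String id = idIndex < cells.length ? cells[idIndex] : "";
            if (!id.equals(previousId)) {
                // The id changed, so a new chunk begins here.
                if (chunksInCurrent == chunksPerPart) {
                    // The current part is full: emit it and start a new one.
                    parts.add(current);
                    current = new ArrayList<>();
                    chunksInCurrent = 0;
                }
                chunksInCurrent++;
                previousId = id;
            }
            current.add(row);
        }
        if (!current.isEmpty()) {
            parts.add(current); // emit the last, possibly smaller, part
        }
        return parts;
    }
}
```

Because the id is only compared with the previous row, the input is expected to be grouped by that field already; a value that reappears later counts as a new chunk. A part boundary always falls between chunks, so rows sharing an id are never split across parts.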

Keybase proof

I hereby claim:

  • I am primetalk on github.
  • I am zhizhelev (https://keybase.io/zhizhelev) on keybase.
  • I have a public key ASDZcrUUlHnt8K5S-e3r-7ez1Iw8sYd7_N1geP9suA7Dqgo

To claim this, I am signing this object:

Primetalk / Sliced.java
Last active November 26, 2018 18:42
Slice a stream into batches (Java)
import java.util.*;
import java.util.stream.Stream;
public final class StreamUtils {
public static <A> Stream<List<A>> sliced(final Stream<A> stm, final int sliceSize){
final ArrayList<A> lst = new ArrayList<>(sliceSize);
final int[] cnt = {0};
final Stream<ArrayList<A>> sliced = stm.flatMap(el -> {
if (cnt[0] == sliceSize) {
cnt[0] = 0;