@olliefr
Last active November 12, 2022 18:29
Dataflow data generator

Google Cloud Blog post:

More detail on the Streaming Data Generator template:

The source code for the template:

The template appears to use the json-data-generator library. Its GitHub repo has docs on how to define the schema:

In streaming mode, it generates an unbounded input collection using Beam's GenerateSequence PTransform.

Interestingly enough, the documentation for that PTransform states:

A PTransform that produces longs starting from the given value, and either up to the given limit or until Long.MAX_VALUE / until the given time elapses.

My interpretation is that, when no limit or time bound is given, it will eventually run out of values to produce once it reaches Long.MAX_VALUE?

Oracle's documentation for Java's long type states:

The long data type is a 64-bit two's complement integer. The signed long has a minimum value of -2^63 and a maximum value of 2^63-1.

Which is a large number, but still finite. This is not an issue, more of a curiosity, so I'll park this study for now.
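
For a rough sense of scale, here is a back-of-the-envelope sketch in Python of how long a counter would take to reach 2^63-1. The emission rates are hypothetical, picked only for illustration; they are not taken from the template or from Beam, and `years_to_exhaust` is a helper name made up for this note.

```python
# Largest value a Java long can hold: 2^63 - 1.
LONG_MAX = 2**63 - 1

# Julian year (365.25 days), good enough for an order-of-magnitude estimate.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_exhaust(rate_per_second: int, start: int = 0) -> float:
    """Years until a counter starting at `start` reaches LONG_MAX.

    Assumes a constant emission rate, which is a simplification.
    """
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    remaining = LONG_MAX - start
    return remaining / rate_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical rates in elements per second.
    for rate in (1_000, 1_000_000, 1_000_000_000):
        print(f"{rate:>13,} elements/s -> {years_to_exhaust(rate):,.0f} years")
```

Even at a billion elements per second, the sequence would last on the order of a few hundred years, so in practice the bound is finite but not reachable.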
