Israel Herraiz (iht)

Migrating the world to the cloud
iht / chain_schema.py
Last active November 24, 2023 15:11
Schema for a StructuredOutputParser in langchain
# See previous gist https://gist.github.com/iht/d5bedff9ee86ed5223cfa2ad06080ca0
import json
# ...
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
def _dict_to_json(x: dict) -> str:
    return "```\n" + json.dumps(x) + "\n```"

response_schema = [
    ResponseSchema(name="query", description="SQL query to solve the user question."),
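Below is a minimal sketch of how a schema list like this is usually wired into a StructuredOutputParser. It is an assumption based on the langchain API, not part of the gist, and the second "explanation" field is a made-up placeholder.

# Hedged sketch: build the parser from the response schemas and use its format
# instructions in a prompt. The "explanation" field is an assumed placeholder.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

response_schema = [
    ResponseSchema(name="query", description="SQL query to solve the user question."),
    ResponseSchema(name="explanation", description="Short explanation of the query."),
]

parser = StructuredOutputParser.from_response_schemas(response_schema)
format_instructions = parser.get_format_instructions()  # paste into the prompt
# After the model answers, parser.parse(llm_output) returns a dict with the
# "query" and "explanation" keys.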
iht / two_chains_wrong.py
Last active November 24, 2023 15:02
Passing info from one chain to the next -- this does not work
from operator import itemgetter
from langchain.chat_models import ChatVertexAI
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.vectorstore import VectorStoreRetriever
## Chain1
# The input is {"input": "Some question written by a user"}
some_model1 = ChatVertexAI(model_name="codechat-bison", max_output_tokens=2048)
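Since this gist deliberately shows the non-working attempt, here is only a hedged illustration (not the author's fix) of one common LCEL pattern for threading the first chain's output into the second. The prompt texts and variable names below are assumptions.

# Hedged sketch, assuming two prompts where the second needs both the original
# user input and the first chain's output. Prompts and names are illustrative.
from operator import itemgetter

from langchain.chat_models import ChatVertexAI
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser

model = ChatVertexAI(model_name="codechat-bison", max_output_tokens=2048)

prompt1 = PromptTemplate.from_template("Write a SQL query for: {input}")
prompt2 = PromptTemplate.from_template("Explain this SQL query:\n{query}\nIt answers the question: {input}")

chain1 = prompt1 | model | StrOutputParser()

# The dict is coerced into a parallel runnable: it keeps the original "input"
# and adds chain1's result under "query", so prompt2 sees both variables.
chain2 = (
    {"query": chain1, "input": itemgetter("input")}
    | prompt2
    | model
    | StrOutputParser()
)

result = chain2.invoke({"input": "Some question written by a user"})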
iht / hop_web_gcloud.sh
Last active October 15, 2022 16:50
Commands to create a Google Cloud VM and disk for Hop web to run Dataflow jobs
# Google Cloud zone to use for the resources
ZONE=europe-west1-b
# If not running in Cloud Shell, set the variable GOOGLE_CLOUD_PROJECT to your project id
# GOOGLE_CLOUD_PROJECT=<PROJECT ID>
# Get the default Compute Engine service account email
# (value(email) prints one address per line, so grep can pick it out reliably)
SERVICE_ACCOUNT=$(gcloud iam service-accounts list --format="value(email)" | grep "compute@developer.gserviceaccount.com")
# Add permission to run Dataflow jobs
iht / JsonSchemaParser.java
Created August 21, 2022 20:14
BigQuery JSON file to Beam Schema
package dev.herraiz.beam.schemas;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.api.client.googleapis.util.Utils;
import com.google.api.client.json.JsonFactory;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
import org.apache.beam.sdk.schemas.Schema;
iht / RelayOptions.java
Created January 25, 2021 17:59
Relay your custom options to Dataflow, assuming you are using Flex Templates
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public static void main(String[] args) {
    // fromArgs returns a builder; create() materializes the PipelineOptions
    PipelineOptions opts =
        PipelineOptionsFactory.fromArgs(args).create();
    DataflowPipelineOptions dataflowPipelineOptions =
        opts.as(DataflowPipelineOptions.class);
iht / relay_options.py
Created January 25, 2021 17:57
Relay your custom runtime options to Dataflow, assuming you use Flex Templates
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

def run_pipeline(argv):
    opts: PipelineOptions = PipelineOptions(argv)
    gcloud_opts: GoogleCloudOptions = opts.view_as(GoogleCloudOptions)
    if opts.i_want_streaming_engine:
        gcloud_opts.enable_streaming_engine = True
    else:
        gcloud_opts.enable_streaming_engine = False
    ...
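As a hedged sketch (an assumption, not shown in this preview), the custom flag is usually declared on a PipelineOptions subclass so it can be read back with view_as; MyTemplateOptions is a hypothetical name.

# Hedged sketch: declare the custom flag on a hypothetical options subclass.
from apache_beam.options.pipeline_options import PipelineOptions

class MyTemplateOptions(PipelineOptions):  # hypothetical name
    @classmethod
    def _add_argparse_args(cls, parser):
        # The Flex Template launcher forwards parameters as string flags,
        # e.g. --i_want_streaming_engine=true, so convert to a bool here.
        parser.add_argument(
            "--i_want_streaming_engine",
            default=False,
            type=lambda v: str(v).lower() == "true",
            help="Whether to enable Streaming Engine for this job.",
        )

# opts.view_as(MyTemplateOptions).i_want_streaming_engine is then a bool.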
iht / main.tf
Created January 25, 2021 17:54
Pass custom options to your Flex template
resource "google_dataflow_flex_template_job" "big_data_job" {
provider = google-beta
name = "dataflow-flextemplates-job"
container_spec_gcs_path = "gs://my-bucket/templates/template.json"
parameters = {
i_want_streaming_engine = true
}
}
iht / MyMainFile.java
Last active January 25, 2021 17:46
Set Dataflow pipeline options from your Java code
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public static void main(String[] args) {
    // fromArgs returns a builder; create() materializes the PipelineOptions
    PipelineOptions opts =
        PipelineOptionsFactory.fromArgs(args).create();
    DataflowPipelineOptions dataflowPipelineOptions =
        opts.as(DataflowPipelineOptions.class);
iht / dataflow_options_with_terraform.py
Last active January 25, 2021 17:38
Set Dataflow options programmatically
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

def run_pipeline(argv):
    opts: PipelineOptions = PipelineOptions(argv)
    gcloud_opts: GoogleCloudOptions = opts.view_as(GoogleCloudOptions)
    gcloud_opts.enable_streaming_engine = True
    gcloud_opts.job_name = "Overwrite the name of your job"
    ...
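A hedged sketch of how the tweaked options are then handed to the pipeline; the transform below is a placeholder, not part of the gist.

# Hedged sketch: build the pipeline with the modified options.
import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

def run_pipeline(argv):
    opts = PipelineOptions(argv)
    opts.view_as(GoogleCloudOptions).enable_streaming_engine = True
    with beam.Pipeline(options=opts) as p:
        _ = p | "Create" >> beam.Create([1, 2, 3])  # placeholder transform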
iht / find_external_libs.py
Last active November 22, 2020 00:21
Find all the external functions called by my code
def find_external_functions(key, stats):
    """Return the profiled entries whose callers include `key`.

    `stats` is the dict behind pstats.Stats(...).stats; each value is a
    (cc, nc, tt, ct, callers) tuple, where callers maps the functions that
    called this entry to their call statistics.
    """
    output = {}
    for k, v in stats.items():
        _, _, _, _, callers = v
        if key in callers:
            output[k] = v
    return output
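A hedged usage sketch follows; the profiling setup and the my_code workload are assumptions for illustration, not part of the gist.

# Hedged usage sketch: profile some code, then ask which profiled functions
# were called from one of our own functions.
import cProfile
import pstats

def my_code():
    sorted([3, 1, 2])  # placeholder workload

profiler = cProfile.Profile()
profiler.enable()
my_code()
profiler.disable()

stats = pstats.Stats(profiler).stats  # func -> (cc, nc, tt, ct, callers)
key = next(k for k in stats if k[2] == "my_code")  # (file, line, name) tuple
print(find_external_functions(key, stats))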