Gists by David Allan (davidallan)
davidallan / Spark Notebook ScalaSQL.ipynb
Last active December 6, 2023 19:45
Spark Notebook ScalaSQL
davidallan / Spark Notebook PythonSQL.ipynb
Last active December 6, 2023 19:46
Spark Notebook Python and SQL
davidallan / Spark Notebook Python.ipynb
Last active December 6, 2023 19:48
PySpark in OCI Data Science notebook
davidallan / Spark Notebook Scala.ipynb
Last active December 6, 2023 19:47
Scala in a Data Science Notebook
davidallan / rerun_failed_tasks.py
Last active December 1, 2023 17:18
Rerun an OCI Data Integration task run
import oci
import json
import requests
import sys
import glob
import os
from oci.signer import Signer
workspaceID = sys.argv[1]
applicationKey = sys.argv[2]
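# The gist preview is truncated here; what follows is a minimal sketch of how a
# task run could be resubmitted using the imports above. The endpoint path and
# payload follow the OCI Data Integration REST API (CreateTaskRun); the region,
# the extra task-key argument, and the rerun handling are assumptions, not
# necessarily the gist's actual logic.
region = "us-ashburn-1"   # assumed; the gist may derive or accept this differently
taskKey = sys.argv[3]     # assumed: key of the published task to resubmit

config = oci.config.from_file()
auth = Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
    pass_phrase=config.get("pass_phrase"),
)

# Submit a new task run for the published task.
url = (f"https://dataintegration.{region}.oci.oraclecloud.com/20200430"
       f"/workspaces/{workspaceID}/applications/{applicationKey}/taskRuns")
body = {"registryMetadata": {"aggregatorKey": taskKey}}
response = requests.post(url, json=body, auth=auth)
print(json.dumps(response.json(), indent=2))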
davidallan / filecopy.py
Created November 29, 2023 16:36
File copy using Spark - parameterized with input and output files
from pyspark.sql import SparkSession
import os
import argparse

# Usage: spark-submit filecopy.py --inputfile <source path> --outputfile <target path>
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--inputfile')
    parser.add_argument('--outputfile')
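    # The gist preview is truncated here; below is a minimal sketch of how the
    # copy itself might be finished (reading and writing as plain text is an
    # assumption, not necessarily the gist's actual approach).
    args = parser.parse_args()

    spark = SparkSession.builder.appName("filecopy").getOrCreate()

    # Read the input file and write it back out unchanged.
    df = spark.read.text(args.inputfile)
    df.write.mode("overwrite").text(args.outputfile)

    spark.stop()


if __name__ == "__main__":
    main()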
davidallan / README.md
Last active August 25, 2023 21:12
Create Task Schedules using Metadata

Introduction

This script creates schedules or task schedules in OCI Data Integration from a tab-delimited file. The workspace OCID and application key are passed as parameters. The schedules and task schedules are created in that workspace/application, and the tasks and schedules they reference must already exist within it. The script depends on the OCI Python SDK, which comes preconfigured in the OCI Console Cloud Shell or can be installed locally (see here for details).

Save the Python file create_task_schedules.py onto your client (easiest is Cloud Shell), then save the TSV file(s) for your schedules or task schedules. If you have the OCI Python SDK installed (it is already installed in Cloud Shell), you can run the script directly.
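As a rough sketch of the kind of SDK call the script issues per row, assuming a minimal tab-delimited layout of name, identifier, and timezone (the gist's actual column layout, file names, and parameter handling are not shown here):

import csv
import oci
from oci.data_integration.data_integration_client import DataIntegrationClient
from oci.data_integration.models import CreateScheduleDetails

# Assumed inputs; the real script takes the workspace OCID and application key as parameters.
workspace_id = "ocid1.disworkspace.oc1..example"
application_key = "example-application-key"

config = oci.config.from_file()
client = DataIntegrationClient(config)

# Assumed TSV layout: name<TAB>identifier<TAB>timezone per row.
with open("schedules.tsv") as f:
    for name, identifier, timezone in csv.reader(f, delimiter="\t"):
        details = CreateScheduleDetails(
            name=name,
            identifier=identifier,  # OCI DI identifiers are typically upper case with underscores
            timezone=timezone,
        )
        client.create_schedule(
            workspace_id=workspace_id,
            application_key=application_key,
            create_schedule_details=details,
        )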

davidallan / container_instance_mlflow.json
Created August 8, 2023 23:04
Create Container Instance MLflow sample payload
{
  "containers": [
    {
      "imageUrl": "iad.ocir.io/namespace/oci-mlflow:latest",
      "displayName": "container-20230731-1429-1",
      "environmentVariables": {
        "MLFLOW_HOST": "0.0.0.0",
        "MLFLOW_GUNICORN_OPTS": "--log-level debug",
        "MLFLOW_PORT": "5000",
        "MLFLOW_DEFAULT_ARTIFACT_ROOT": "oci://bucket@namespace/folder/",
davidallan / di_terminate_task_run.sql
Last active June 12, 2023 05:06
Terminate an OCI Data Integration Task from PLSQL
--
-- Terminate an OCI Data Integration task run. Parameters:
-- workspace_ocid - workspace OCID
-- application_key - application key
-- task_run_key - task run key
-- region - region info
-- cred - OCI$RESOURCE_PRINCIPAL or user credential name
--
create or replace procedure di_terminate_task(workspace_ocid VARCHAR2, application_key VARCHAR2, task_run_key VARCHAR2, region VARCHAR2, cred VARCHAR2, status OUT VARCHAR2) as
result DBMS_CLOUD_OCI_DI_DATA_INTEGRATION_UPDATE_TASK_RUN_RESPONSE_T;
davidallan / create_schedule.py
Created April 27, 2023 17:45
Create a schedule using CRON
import oci
from oci.data_integration.data_integration_client import DataIntegrationClient
config = oci.config.from_file()
data_integration_client = DataIntegrationClient(config)
workspace_id = "ocid1.disworkspace.oc1.iad...."
application_key = "...."
create_schedule_response = data_integration_client.create_schedule(
    workspace_id=workspace_id,