Albin Kjellin albinkjellin

@albinkjellin
albinkjellin / sample_timeliness.py
Created November 22, 2023 16:21
Sample timeliness Soda Spark
from pyspark.sql import DataFrame, SparkSession, SQLContext, types
from pyspark.sql.functions import *
from soda.scan import Scan
import yaml
spark_session = SparkSession.builder.config("spark.jars", "postgresql-42.3.1.jar").getOrCreate()
file_path = 'stock-price-1.csv'
file_name = 'stockprice1'
albinkjellin / py
Created August 18, 2023 14:51
Generate SodaCL checks from a CSV file
import csv
import pandas as pd
import yaml
# Path to the CSV file that contains the rules.
file_name = 'rules.csv'
df = pd.read_csv(file_name)
df = df.reset_index() # make sure indexes pair with number of rows
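The preview above is cut off before the rules are turned into checks. As a rough illustration of the idea only, the following sketch groups rule rows by dataset and renders them as SodaCL-style `checks for <dataset>:` blocks. It is not the gist's actual code: the column names `dataset` and `check`, the sample rows, and the helper `rules_to_sodacl` are all hypothetical, and it uses only the standard library (`csv`, `io`) in place of pandas and PyYAML so that it stays self-contained.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rules file: one row per check. The column names
# "dataset" and "check" are assumptions, not taken from the gist.
RULES_CSV = """dataset,check
orders,row_count > 0
orders,missing_count(order_id) = 0
customers,duplicate_count(email) = 0
"""


def rules_to_sodacl(csv_text: str) -> str:
    """Group CSV rows by dataset and render them as SodaCL-style YAML text."""
    checks_by_dataset = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        checks_by_dataset[row["dataset"].strip()].append(row["check"].strip())

    lines = []
    for dataset, checks in checks_by_dataset.items():
        # One "checks for <dataset>:" block per dataset, one list item per rule.
        lines.append(f"checks for {dataset}:")
        lines.extend(f"  - {check}" for check in checks)
    return "\n".join(lines) + "\n"


sodacl_text = rules_to_sodacl(RULES_CSV)
print(sodacl_text)
```

In a real setup the generated text would be written to a checks file and passed to a Soda scan; how the original gist does this is not visible in the preview.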
| Starting new HTTPS connection (1): cloud.soda.io:443
| https://cloud.soda.io:443 "POST /api/command HTTP/1.1" 200 258
| Executing SQL query:
SELECT column_name, data_type, is_nullable FROM `shipping.INFORMATION_SCHEMA.COLUMNS` WHERE table_name = 'subscription';
| This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
| Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
| Making request: POST https://oauth2.googleapis.com/token
| Starting new HTTPS connection (1): oauth2.googleapis.com:443
| https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
| Starting new HTTPS connection (1): bigquery.googleapis.com:443
albinkjellin / scan-output.json
Created May 3, 2021 18:48
Sample Scan Output as JSON
{
"measurements":[
{
"metric":"schema",
"value":[
{
"name":"productid",
"type":"character varying"
},
{
table_name: product
samples:
table_limit: 50
failed_limit: 50
metrics:
- row_count
- missing_count
- missing_percentage
- values_count
- values_percentage
(2.1.0b2) $soda scan warehouse.yml tables/product.yml
| 2.1.0b2
| Scanning tables/product.yml ...
| Soda cloud: cloud.soda.io
| Soda Cloud scan start
| > /api/command (login with API key credentials)
| < 200 (login ok, token received)
| Executing SQL query:
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
albinkjellin / mule-postgresql-ssl.xml
Last active August 29, 2015 14:27
Sample config for setting up SSL JDBC with PostgreSQL using the MuleSoft Database connector
<db:generic-config name="Generic_Database_Configuration" doc:name="Generic Database Configuration" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://${db.host}:${db.port}/testssl?user=${db.user}&amp;password=${db.password}&amp;sslmode=verify-full&amp;sslrootcert=${app.home}/classes/rds-combined-ca-bundle.pem"/>
<http:listener-config name="HTTP_Listener_Configuration" host="0.0.0.0" port="8081" doc:name="HTTP Listener Configuration"/>
<flow name="postgress-ssl">
<http:listener config-ref="HTTP_Listener_Configuration" path="/trigger" doc:name="HTTP"/>
<db:select config-ref="Generic_Database_Configuration" doc:name="Database">
<db:parameterized-query><![CDATA[select * from test;]]></db:parameterized-query>
</db:select>
<object-to-string-transformer doc:name="Object to String"/>
</flow>
albinkjellin / file2soap.xml
Last active August 29, 2015 14:20
file2soap
<flow name="file2soap" doc:description="Reads a file and sends that as a SOAP attachment to a SOAP service.">
<file:inbound-endpoint path="src/test/resources/soap/attachment/in" responseTimeout="10000" doc:name="Read File" />
<processor-chain doc:name="Processor Chain">
<scripting:transformer doc:name="Create SOAP Attachment">
<scripting:script engine="Groovy"><![CDATA[def attachment = new org.apache.cxf.attachment.AttachmentImpl(originalFilename)
def source = new org.apache.axiom.attachments.ByteArrayDataSource(payload.getBytes(),'application/pdf');
attachment.setDataHandler(new org.apache.axiom.attachments.ConfigurableDataHandler(source));
message.setInvocationProperty('cxf_attachments',[attachment])
return payload
]]></scripting:script>
albinkjellin / gist:7f16948114f486ddb32e
Last active August 29, 2015 14:17
Sample Code for checking session token
<flow name="policy.stub">
<http:listener config-ref="HTTP_Listener_Configuration"
path="/inbound" doc:name="HTTP" />
<logger message="#[sessionVars.tokenName]" level="INFO"
doc:name="Logger" />
<enricher doc:name="Message Enricher" source="#[payload.isValid]" target="#[flowVars['isValid']]">
<processor-chain doc:name="Processor Chain">
<http:request config-ref="HTTP_Request_Configuration"
path="validate" method="GET" doc:name="HTTP" />
<json:json-to-object-transformer
albinkjellin / rest2file.xml
Last active August 29, 2015 14:16
rest2file
<flow name="main">
<http:inbound-endpoint address="http://localhost:8884/api" doc:name="HTTP" exchange-pattern="request-response" />
<apikit:router config-ref="apiConfig" doc:name="APIkit Router" />
</flow>
<flow name="post:/contact/{contactId}/datasheet:apiConfig">
<set-payload value="#[message.inboundAttachments]" doc:name="Retrieve Attachments"/>
<foreach doc:name="For Each">
<set-payload value="#[payload.getInputStream() ]" doc:name="Get Inputstream from Payload"/>
<file:outbound-endpoint path="src/test/resources/rest/attachment/out" responseTimeout="10000" doc:name="File" outputPattern="#[server.dateTime.toString()].pdf"/>