Sameh Dabbour sdabbour-stratio

-*-*-*-*-*-*- Request scale_down received at: 2020-Mar-12 09:22:15 -*-*-*-*-*-*-
Received Request of type: scale_down
DateTime: 2020-Mar-12 09:22:15
===============================================================
*** Request is valid, processing ***
['hdfs-default', 'zookeeper-default', 'postgres-default', 'pgbouncer-default', 'governance-agent-hdfs-internal-default', 'governance-agent-pg-internal-default', 'crossdata-default', 'discovery-default', 'rocket-default', 'intelligence-default']
DeploymentID: fincrime-fd-2
Tenant Name: fincrime-sandbox
*** Scaling cluster ***
Scaling Cluster; Attempt: 1
<15:53:11> [daedalus:0.5.0-PR343-SNAPSHOT@eyctpsandbox01] [/workspace] 130# daedalus play /stratio/ansible/playbooks/manage-nodes-down.yml -e 'nodes_to_operate=["10.45.172.20", "10.45.172.21", "10.45.173.90", "10.45.173.91"]' -v
Using /etc/ansible/ansible.cfg as config file
PLAY [master-3] *************************************************************************************
TASK [Gathering Facts] ******************************************************************************
ok: [master-3]
TASK [Check maintenance status] *********************************************************************
ok: [master-3] => {"changed": false, "content_length": "927", "content_type": "application/json", "cookies": {}, "cookies_string": "", "date": "Thu, 19 Mar 2020 15:53:27 GMT", "elapsed": 0, "json": {"down_machines": [{"hostname": "10.45.173.86", "ip": "10.45.173.86"}, {"hostname": "10.45.172.13", "ip": "10.45.172.13"}, {"hostname": "10.45.172.11", "ip": "10.45.172.11"}, {"hostname": "10.45.173.81", "ip": "10.45.173.81"},
{
  "_meta": {
    "hostnames": {
      "10.45.172.10": "agent-adv-bnk-poc-eeb4d41",
      "10.45.172.4": "master-3",
      "10.45.172.5": "public-agent-2",
      "10.45.172.6": "gosec-3",
      "10.45.172.7": "agent-bdl-demo-b80010a",
      "10.45.172.8": "agent-adv-hsb-poc-fd8761a",
      "10.45.172.9": "agent-adv-hsb-poc-dd30f90",
$$$ Creating Tenant $$$
CMD: ['/bin/bash', '-i', '-c', 'daedalus tenant adv-chu-poc.json --tags users,groups,gosec,tenant,calico,vault,permitted-roles']
Executing a command as a subprocess
***********************************
CMD: ['/bin/bash', '-i', '-c', 'daedalus tenant adv-chu-poc.json --tags users,groups,gosec,tenant,calico,vault,permitted-roles']
STDOUT:
bash: cannot set terminal process group (1): Not a tty
bash: no job control in this shell
Validating /workspace/adv-chu-poc.json
Validating tenant name - OK
10.45.172.4 master-3
10.45.173.71 master-1
10.45.173.70 master-2
10.45.172.6 gosec-3
10.45.173.73 gosec-1
10.45.173.72 gosec-2
10.45.172.10 agent-adv-bnk-poc-eeb4d41
10.45.172.7 agent-bdl-demo-b80010a
10.45.172.8 agent-adv-hsb-poc-fd8761a
10.45.172.9 agent-adv-hsb-poc-dd30f90
package com.stratio.governance.unstructured.rcn.inference
import java.io.File
import com.johnsnowlabs.nlp.annotator.WordEmbeddingsModel
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLApproach
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import com.johnsnowlabs.nlp.training.CoNLL
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.LocalFileSystem
from pyspark.sql import functions as F, types as T, DataFrame
def general_converter(spark, df_table_to_convert: DataFrame) -> DataFrame:
    """
    Given a dataframe (table_name) that has some columns to be auto-converted, applies the
    corresponding conversions specified in the master table_name of the general converter.
    :param df_table_to_convert: Spark Dataframe where a column named 'TABLA_ORIGEN' has been
                                added, which contains the name of the table, needed for later
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.stratio</groupId>
  <artifactId>custom-lite-input-ibm-mq</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <Data type="dict">
    <account_transactions type="dict">
      <avaloqTechnicalId type="str">78616497</avaloqTechnicalId>
      <avaloqInternalId type="str">B7OHVBA</avaloqInternalId>
      <society type="str">56</society>
      <number type="str">0</number>
      <accountId type="str">B7OHVBA</accountId>
      <shortName type="str">Society: BU.HSPB.UK , AccountID: B7OHVBA</shortName>
import os
import pandas as pd
files = os.listdir(".")
csv_files = [file for file in files if '.csv' in file]
for file in csv_files:
    table_name = file.lower().replace(".csv", "").replace(" ", "_")
    df = pd.read_csv(file)
    org_columns = df.columns.to_list()
    new_columns = [column.lower().replace(" ", "_") for column in org_columns]
    n = len(org_columns)