@edgarrmondragon
Last active February 16, 2024 05:58
Custom logging handlers in Meltano

Send Meltano logs to Cloud Logging on GCP Compute Engine

  1. Create the smallest instance available with the following settings:

    1. Firewall: no inbound traffic
    2. Allow access to all Cloud APIs: https://cloud.google.com/logging/docs/setup/python#run-gce
    3. Check Install Ops Agent for Monitoring and Logging in the Observability - Ops Agent section
  2. SSH into the instance

  3. Check system

$ python3 -V
Python 3.11.2

$ uname -a
Linux instance-20240213-230249 6.1.0-17-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
  4. Install pip and venv
sudo apt update
sudo apt install python3-pip
sudo apt install python3-venv
  5. Install pipx
sudo apt install pipx
pipx ensurepath
source ~/.bashrc
  6. Install Meltano
pipx install meltano
  7. Check versions
meltano --version
  8. Create project
meltano init --force meltano-project
cd meltano-project
  9. Create logging.yaml in the root of the Meltano project with the contents of logging.yaml from this Gist

  10. Edit /etc/google-cloud-ops-agent/config.yaml with the contents of ops_agent.yaml from this Gist (might require sudo)

  11. Restart the Ops Agent

sudo systemctl restart google-cloud-ops-agent
  12. Run a command
$ meltano test

=== Testing completed successfully. 0 test(s) successful. 0 test(s) failed. ===
  13. Check the Logs Explorer for a message like Environment 'dev' is active

References

  1. https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-agent-vm-creation
  2. https://cloud.google.com/logging/docs/agent/ops-agent/configuration
logging.yaml:

version: 1
disable_existing_loggers: false
formatters:
  json:
    (): meltano.core.logging.json_formatter
handlers:
  file:
    class: logging.FileHandler
    level: DEBUG
    filename: /tmp/meltano.log
    formatter: json
root:
  level: DEBUG
  handlers: [file]
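The `()` key points at Meltano's JSON formatter. As a sanity check outside of Meltano, the same dict-config shape can be exercised with a stand-in formatter. The `JsonFormatter` class below is an assumption about the real formatter's output (one JSON object per line with a lowercase `level` field), chosen only so the result matches what the Ops Agent config expects:

```python
import json
import logging
import logging.config
import tempfile

# Stand-in for meltano.core.logging.json_formatter (an assumption about the
# real formatter's output: one JSON object per line with a "level" field,
# which is what the Ops Agent's jsonPayload.level mapping relies on).
class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "name": record.name,
            "event": record.getMessage(),
        })

log_path = tempfile.NamedTemporaryFile(suffix=".log", delete=False).name

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"json": {"()": JsonFormatter}},
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "level": "DEBUG",
            "filename": log_path,  # logging.yaml uses /tmp/meltano.log
            "formatter": "json",
        },
    },
    "root": {"level": "DEBUG", "handlers": ["file"]},
})

logging.getLogger("meltano").info("Environment 'dev' is active")

# Each line in the file is now a self-contained JSON object.
with open(log_path) as f:
    entry = json.loads(f.readline())
print(entry)
```

Any line the root logger emits at DEBUG or above lands in the file as JSON, which is what makes the Ops Agent's `parse_json` processor applicable.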
ops_agent.yaml:

logging:
  receivers:
    meltano:
      type: files
      include_paths:
        - /tmp/meltano.log  # Path to the Meltano log file
  processors:
    json:
      type: parse_json
    map_severity:
      type: modify_fields
      fields:
        severity:
          move_from: jsonPayload.level
  service:
    pipelines:
      meltano:
        receivers: [meltano]
        processors: [json, map_severity]
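In effect, the two processors parse each log line into `jsonPayload` and then promote the `level` field to Cloud Logging's top-level `severity`. A rough Python model of that transformation (field names taken from the config above; the real agent does more, such as normalizing severity strings):

```python
import json

def parse_json(raw_line: str) -> dict:
    # Rough model of the `parse_json` processor: the whole log line
    # becomes the entry's jsonPayload.
    return {"jsonPayload": json.loads(raw_line)}

def map_severity(entry: dict) -> dict:
    # Rough model of the `modify_fields` processor: move jsonPayload.level
    # to the top-level severity field that the Logs Explorer filters on.
    entry["severity"] = entry["jsonPayload"].pop("level", None)
    return entry

line = '{"level": "info", "event": "Environment \'dev\' is active"}'
entry = map_severity(parse_json(line))
print(entry["severity"])              # info
print(entry["jsonPayload"]["event"])  # Environment 'dev' is active
```

This is why the Logs Explorer shows the entries with the correct severity instead of the default, which is what piping plain text through `logger` cannot do.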
@edgarrmondragon

@melgazar9 The community Slack is temporarily down 😢 (https://x.com/meltanodata/status/1734701043718496676), so I'm curious if you figured it out

@melgazar9

Hey, unfortunately no, I haven't. The closest I could get is sudo docker-compose up | logger 2>&1 (or meltano el <tap> <target> | logger 2>&1), but that doesn't capture the severity level. I'm going to try to use Cloud Run (or something similar). It seems like their serverless solutions gravitate towards best practices, but I'm not totally sure. Cloud Run has a 60-minute timeout, so that might be out of the question for long-running ELT processes.

@melgazar9

melgazar9 commented Dec 16, 2023

Hey @edgarrmondragon will the Meltano slack be up soon? I have a separate question regarding the message Target sink for <table> is full. Draining... It sounds like this has to do with the target, but running the same very large batch job with both target-snowflake and target-bigquery, I see that same message towards the end of the job.

Where's the best place to discuss at the moment?

@edgarrmondragon

> Hey @edgarrmondragon will the Meltano slack be up soon?

It seems like Slack won't make it easy to recover it 😞, but the plan is to start a new workspace and send invites in the first week of 2024. In the meantime, http://discuss.meltano.com/ is synced with that old workspace (internally we do have access) and has the full history.

EDIT: seems like you already tried it out, so I'll answer there!

@melgazar9

melgazar9 commented Dec 30, 2023

Hey @edgarrmondragon I figured out a workaround for this. I still haven't been able to get meltano el or meltano run to send logs to GCP Cloud Logging directly, but if it's running within a Docker container it works by adding the below block to /etc/docker/daemon.json

{
  "log-driver": "gcplogs",
  "log-opts": {
    "gcp-meta-name": "<my-image>"
  }
}

I find it useful to do this on a VM because running such a long job on GCP Cloud Run cost about 5x more compared to running it on a VM. Cloud Run also currently has a timeout after which the job is killed. Here are the steps I took:

  • create / start VM on GCP
    • before creating, under the 'metadata' section set enable-osconfig and enable-guest-attributes to TRUE
  • add the above block to /etc/docker/daemon.json
  • git clone <repo> and set .env variables / configurations / secrets
  • install docker / docker-compose
  • type screen
  • run sudo docker build -t <my-image> . && sudo docker run --rm -it --env-file=.env --log-driver=gcplogs <my-image>
  • logs should appear in GCP Cloud Logging, so it's safe to detach from the screen session

@edgarrmondragon

@melgazar9 nice, glad you figured it out!

TIL https://docs.docker.com/config/containers/logging/gcplogs/

Does that mean you don't need the whole gcp_logging.py logging module workaround?

@melgazar9

@edgarrmondragon Ah, sorry, I just realized the approach I mentioned logs everything at the default severity. Unfortunately I still haven't found a way to ship the Meltano logs along with their severity level from a local machine or VM to GCP Cloud Logging. This approach also only works when using Docker and Meltano together.

@edgarrmondragon

@melgazar9 I've been really busy as of late, but I'd like to come back to this and perhaps create a small Terraform recipe for deploying Meltano to GCP Compute Engine while sending logs to GCP Cloud Logging.
