Xingze Zhang diggzhang

#!/bin/bash
set -e # exit on error
#---------------------------------------------------------------------
# SCRIPT CONFIGURATION
#---------------------------------------------------------------------
SCRIPT_NAME=$(basename "$0")
VERSION=0.1
package com.onion.dataprocess.helpers
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{ KafkaProducer, ProducerRecord, RecordMetadata }
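class KafkaSink[K, V](createProducer: () => KafkaProducer[K, V]) extends Serializable {
  /* This is the key idea that allows us to work around running into
     NotSerializableExceptions. */
  lazy val producer = createProducer()

  // The body of send is cut off in the source; delegating to the producer
  // is an assumption based on the imports and the return type.
  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))
}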

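Installing the base dependencies

To avoid polluting the system-wide dependencies, everything is installed inside a virtualenv.

The base toolset is mainly pyenv + pyenv-virtualenv: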

git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshenv
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshenv
# -*- coding: utf-8 -*-
import time
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import datetime
caps = DesiredCapabilities.FIREFOX
# Tell the Python bindings to use Marionette.
2018-05-09T14:11:48.960+0800 local.oplog.rs 97447
2018-05-09T14:11:48.960+0800 done dumping local.oplog.rs (97447 documents)
2018-05-09T14:11:54.481+0800 Failed: error connecting to db server: no reachable servers
Traceback (most recent call last):
  File "mongodump_scroll.py", line 77, in <module>
    dump_queue()
  File "mongodump_scroll.py", line 61, in dump_queue
    mongodump(dump_item)
  File "mongodump_scroll.py", line 53, in mongodump
    subprocess.check_output(['sh', file_name])
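
The traceback above stops inside `subprocess.check_output`, which raises `subprocess.CalledProcessError` whenever the child process exits with a non-zero status. The following is a minimal sketch of catching that error so a single failed dump does not abort the whole queue. `run_dump_script` is a hypothetical name, and it is an assumption that skipping or retrying a failed item is acceptable for this job.

```python
import subprocess


def run_dump_script(file_name):
    """Hypothetical helper: run a shell script and report failure instead of raising.

    Returns a tuple (ok, output), where output is the captured stdout and stderr.
    """
    try:
        # Merge stderr into stdout so the script's error message is captured too.
        output = subprocess.check_output(["sh", file_name], stderr=subprocess.STDOUT)
        return True, output
    except subprocess.CalledProcessError as err:
        # err.returncode holds the exit status, err.output what the script printed.
        return False, err.output
```

A caller could log the output of a failed item and move on to the next one, or put the item back on the queue for a later retry.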

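Airflow pitfalls

1. Choosing between Python 2 and Python 3

The logs collected while debugging show a large number of bugs caused by switching between Python 2 and Python 3, so Python 2.7 is recommended as the development and test environment for Airflow.

Python 3 does run, but many Airflow plugins lag behind in their updates, and some of them throw errors. (The test environment was Python 3.4.0.)

  • Low-level error reference
  • The print("" % string) / print "" % string problem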
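
The print problem in the last bullet can be shown with a small sketch. It assumes the failing code used the Python 2 print statement together with `%` formatting. The `format_status` helper and its arguments are hypothetical and are not taken from Airflow or any of its plugins.

```python
from __future__ import print_function  # makes print a function on Python 2.7 as well


def format_status(task_id, state):
    """Hypothetical helper: build the message with % formatting."""
    return "task %s finished with state %s" % (task_id, state)


# Python 2 only:   print "task %s ..." % (task_id, state)   (a SyntaxError on Python 3)
# Python 2 and 3:  format the string first, then pass it to print().
message = format_status("daily_report", "success")
print(message)
```

Formatting the string before calling `print()` keeps the same line valid on both interpreters, which helps when a plugin has to run under either version.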
[2018-04-18 15:04:17,851] {jobs.py:368} INFO - Started process (PID=5940) to work on /home/master/airflow/dags/test_local_executor.py
[2018-04-18 15:04:17,853] {jobs.py:368} INFO - Started process (PID=5941) to work on /home/master/airflow/dags/daily_report_production_version_bash_op.py
[2018-04-18 15:04:17,862] {jobs.py:1546} INFO - Exited execute loop
[2018-04-18 15:04:17,914] {jobs.py:1560} INFO - Terminating child PID: 5940
[2018-04-18 15:04:17,915] {jobs.py:1560} INFO - Terminating child PID: 5941
[2018-04-18 15:04:17,915] {jobs.py:1564} INFO - Waiting up to 5 seconds for processes to exit...
[2018-04-18 15:04:17,963] {jobs.py:379} ERROR - Got an exception! Propagating...
# -*- coding: utf-8 -*-
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
Traceback (most recent call last):
  File "/Users/xingzezhang/.pyenv/versions/3.4.0/envs/apache-airflow-env-py34/lib/python3.4/site-packages/airflow/models.py", line 1618, in handle_failure
    self.email_alert(error, is_retry=True)
  File "/Users/xingzezhang/.pyenv/versions/3.4.0/envs/apache-airflow-env-py34/lib/python3.4/site-packages/airflow/models.py", line 1779, in email_alert
    send_email(task.email, title, body)
  File "/Users/xingzezhang/.pyenv/versions/3.4.0/envs/apache-airflow-env-py34/lib/python3.4/site-packages/airflow/utils/email.py", line 44, in send_email
    return backend(to, subject, html_content, files=files, dryrun=dryrun, cc=cc, bcc=bcc, mime_subtype=mime_subtype)
  File "/Users/xingzezhang/.pyenv/versions/3.4.0/envs/apache-airflow-env-py34/lib/python3.4/site-packages/airflow/utils/email.py", line 87, in send_email_smtp
    send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
  File "/Users/xingzezhang/.pyenv/versions/3.4.0/envs/apache-airflow-env-py34/lib/python3.4/site-packages/ai