(pipeline)hadoop@ip-172-30-0-176:~$ remote-task ImportCountryWorkflow --n-reduce-tasks 1 --host localhost --remote-name analyticstack --user hadoop --repo https://github.com/codetigerco/edx-analytics-pipeline --branch stage_edxapp --local-scheduler --verbose --wait --overwrite --interval-start 2017-01-01 --overwrite-n-days 16 --geolocation-data hdfs://localhost:9000/data/GeoIP.dat --override-config $HOME/edx-analytics-pipeline/config/devstack.cfg --skip-setup
Parsed arguments = Namespace(branch='stage_edxapp', extra_repo=None, host='localhost', job_flow_id=None, job_flow_name=None, launch_task_arguments=['ImportCountryWorkflow', '--n-reduce-tasks', '1', '--local-scheduler', '--overwrite', '--interval-start', '2017-01-01', '--overwrite-n-days', '16', '--geolocation-data', 'hdfs://localhost:9000/data/GeoIP.dat'], log_path=None, override_config='/edx/app/hadoop/edx-analytics-pipeline/config/devstack.cfg', package=None, private_key=None, remote_name='analyticstack', repo='https://github.com/codetigerco/edx-analytics-pipeline', secure_config=None, secure_config_branch=None, secure_config_repo=None, shell=None, skip_setup=True, sudo_user='hadoop', user='hadoop', vagrant_path=None, verbose=True, virtualenv_extra_args=None, wait=True, wheel_url=None, workflow_profiler=None)
Running commands from path = /edx/app/hadoop/pipeline/share/edx.analytics.tasks
Remote name = analyticstack
Running command = ['ssh', '-tt', '-o', 'ForwardAgent=yes', '-o', 'StrictHostKeyChecking=no', '-o', 'UserKnownHostsFile=/dev/null', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PasswordAuthentication=no', '-o', 'User=hadoop', '-o', 'ConnectTimeout=10', 'localhost', "sudo -Hu hadoop /bin/bash -c 'cd /var/lib/analytics-tasks/analyticstack/repo && . $HOME/.bashrc && /var/lib/analytics-tasks/analyticstack/venv/bin/launch-task ImportCountryWorkflow --n-reduce-tasks 1 --local-scheduler --overwrite --interval-start 2017-01-01 --overwrite-n-days 16 --geolocation-data hdfs://localhost:9000/data/GeoIP.dat'"]
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
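(Note: for reference, the quoted command that remote-task passes through ssh above can also be run directly on the box as the hadoop user when iterating on a failure; this is simply the command from the ssh line, unwrapped:)

    sudo -Hu hadoop /bin/bash -c 'cd /var/lib/analytics-tasks/analyticstack/repo && . $HOME/.bashrc && /var/lib/analytics-tasks/analyticstack/venv/bin/launch-task ImportCountryWorkflow --n-reduce-tasks 1 --local-scheduler --overwrite --interval-start 2017-01-01 --overwrite-n-days 16 --geolocation-data hdfs://localhost:9000/data/GeoIP.dat'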
DEBUG:stevedore.extension:found extension EntryPoint.parse('sqoop-import = edx.analytics.tasks.common.sqoop:SqoopImportFromMysql')
DEBUG:stevedore.extension:found extension EntryPoint.parse('run-vertica-sql-script = edx.analytics.tasks.warehouse.run_vertica_sql_script:RunVerticaSqlScriptTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('obfuscation = edx.analytics.tasks.export.obfuscation:ObfuscatedCourseTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('enrollment_validation = edx.analytics.tasks.monitor.enrollment_validation:CourseEnrollmentValidationTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('inc-enrollments-report = edx.analytics.tasks.warehouse.enrollment.incremental_enrollments:WeeklyIncrementalUsersAndEnrollments')
DEBUG:stevedore.extension:found extension EntryPoint.parse('total-enrollments-report = edx.analytics.tasks.warehouse.enrollment.total_enrollments:WeeklyAllUsersAndEnrollments')
DEBUG:stevedore.extension:found extension EntryPoint.parse('orders = edx.analytics.tasks.warehouse.financial.orders_import:OrderTableTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('problem_response = edx.analytics.tasks.insights.problem_response:LatestProblemResponseDataTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('push_to_vertica_lms_courseware_link_clicked = edx.analytics.tasks.warehouse.lms_courseware_link_clicked:PushToVerticaLMSCoursewareLinkClickedTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('video = edx.analytics.tasks.insights.video:InsertToMysqlAllVideoTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('ed_services_report = edx.analytics.tasks.warehouse.financial.ed_services_financial_report:BuildEdServicesReportTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-internal-database = edx.analytics.tasks.warehouse.load_internal_reporting_database:ImportMysqlToVerticaTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('export-student-module = edx.analytics.tasks.export.database_exports:StudentModulePerCourseAfterImportWorkflow')
DEBUG:stevedore.extension:found extension EntryPoint.parse('calendar = edx.analytics.tasks.insights.calendar_task:CalendarTableTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-warehouse = edx.analytics.tasks.warehouse.load_warehouse:LoadWarehouseWorkflow')
DEBUG:stevedore.extension:found extension EntryPoint.parse('cybersource = edx.analytics.tasks.warehouse.financial.cybersource:DailyPullFromCybersourceTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-d-user = edx.analytics.tasks.warehouse.load_internal_reporting_user:LoadInternalReportingUserToWarehouse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('location-per-course = edx.analytics.tasks.insights.location_per_course:LastCountryOfUser')
DEBUG:stevedore.extension:found extension EntryPoint.parse('payment_reconcile = edx.analytics.tasks.warehouse.financial.reconcile:ReconcileOrdersAndTransactionsTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('enrollments-report = edx.analytics.tasks.warehouse.enrollment.enrollments:EnrollmentsByWeek')
DEBUG:stevedore.extension:found extension EntryPoint.parse('financial_reports = edx.analytics.tasks.warehouse.financial.finance_reports:BuildFinancialReportsTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('engagement = edx.analytics.tasks.insights.module_engagement:ModuleEngagementDataTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('events_obfuscation = edx.analytics.tasks.export.events_obfuscation:ObfuscateCourseEventsTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('dump-student-module = edx.analytics.tasks.export.database_exports:StudentModulePerCourseTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('export-events-by-course = edx.analytics.tasks.export.event_exports_by_course:EventExportByCourseTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('noop = edx.analytics.tasks.monitor.performance:ParseEventLogPerformanceTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('course_blocks = edx.analytics.tasks.insights.course_blocks:CourseBlocksApiDataTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-d-certificates = edx.analytics.tasks.warehouse.load_internal_reporting_certificates:LoadInternalReportingCertificatesToWarehouse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('user-activity = edx.analytics.tasks.insights.user_activity:CourseActivityWeeklyTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('run-vertica-sql-scripts = edx.analytics.tasks.warehouse.run_vertica_sql_scripts:RunVerticaSqlScriptTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('paypal = edx.analytics.tasks.warehouse.financial.paypal:PaypalTransactionsByDayTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('grade-dist = edx.analytics.tasks.data_api.studentmodule_dist:GradeDistFromSqoopToMySQLWorkflow')
DEBUG:stevedore.extension:found extension EntryPoint.parse('enrollments_and_registrations_workflow-manifest = edx.analytics.tasks.warehouse.enrollment.enrollments_and_registrations_workflow_manifest:EnrollmentsandRegistrationsWorkflow')
DEBUG:stevedore.extension:found extension EntryPoint.parse('database-import = edx.analytics.tasks.insights.database_imports:ImportAllDatabaseTablesTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('catalog = edx.analytics.tasks.warehouse.course_catalog:CourseCatalogWorkflow')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-course-catalog = edx.analytics.tasks.warehouse.load_internal_reporting_course_catalog:PullCourseCatalogAPIData')
DEBUG:stevedore.extension:found extension EntryPoint.parse('enrollments = edx.analytics.tasks.insights.enrollments:ImportEnrollmentsIntoMysql')
DEBUG:stevedore.extension:found extension EntryPoint.parse('event-type-dist = edx.analytics.tasks.warehouse.event_type_dist:PushToVerticaEventTypeDistributionTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('tags-dist = edx.analytics.tasks.insights.tags_dist:TagsDistributionPerCourse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('export-events = edx.analytics.tasks.export.event_exports:EventExportTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('course-enroll = edx.analytics.tasks.warehouse.enrollment.course_enroll:CourseEnrollmentChangesPerDay')
DEBUG:stevedore.extension:found extension EntryPoint.parse('data_obfuscation = edx.analytics.tasks.export.data_obfuscation:ObfuscatedCourseDumpTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('course_list = edx.analytics.tasks.insights.course_list:CourseListApiDataTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-d-user-course = edx.analytics.tasks.warehouse.load_internal_reporting_user_course:LoadUserCourseSummary')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-d-country = edx.analytics.tasks.warehouse.load_internal_reporting_country:LoadInternalReportingCountryToWarehouse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('overall_events = edx.analytics.tasks.monitor.overall_events:TotalEventsDailyTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('load-f-user-activity = edx.analytics.tasks.warehouse.load_internal_reporting_user_activity:LoadInternalReportingUserActivityToWarehouse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('student_engagement = edx.analytics.tasks.data_api.student_engagement:StudentEngagementTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('answer-dist = edx.analytics.tasks.insights.answer_dist:AnswerDistributionPerCourse')
DEBUG:stevedore.extension:found extension EntryPoint.parse('insert-into-table = edx.analytics.tasks.common.mysql_load:MysqlInsertTask')
DEBUG:stevedore.extension:found extension EntryPoint.parse('all_events_report = edx.analytics.tasks.monitor.total_events_report:TotalEventsReportWorkflow')
DEBUG:edx.analytics.tasks.launchers.local:Using override.cfg
2017-01-19 18:01:24,426 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ImportCountryWorkflow(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=None) (PENDING)
2017-01-19 18:01:24,428 INFO 16985 [luigi-interface] worker.py:267 - Scheduled InsertToMysqlCourseEnrollByCountryWorkflow(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:24,430 INFO 16985 [luigi-interface] worker.py:267 - Scheduled InsertToMysqlLastCountryPerCourseTask(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, database=stage_reports, credentials=/edx/etc/edx-analytics-pipeline/output.json, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:24,445 INFO 16985 [luigi-interface] worker.py:267 - Scheduled QueryLastCountryPerCourseTask(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:24,446 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None) (PENDING)
2017-01-19 18:01:24,447 INFO 16985 [luigi-interface] worker.py:267 - Scheduled SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=auth_user, where=None, columns=('id', 'username', 'last_login', 'date_joined', 'is_active', 'is_superuser', 'is_staff', 'email'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False) (PENDING)
2017-01-19 18:01:24,448 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ExternalURL(url=/edx/etc/edx-analytics-pipeline/input.json) (DONE)
2017-01-19 18:01:24,449 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None) (PENDING)
2017-01-19 18:01:24,449 INFO 16985 [luigi-interface] worker.py:267 - Scheduled SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/student_courseenrollment/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=student_courseenrollment, where=None, columns=('id', 'user_id', 'course_id', 'created', 'is_active', 'mode'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False) (PENDING)
2017-01-19 18:01:24,451 INFO 16985 [luigi-interface] worker.py:267 - Scheduled LastCountryOfUserPartitionTask(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:38,736 INFO 16985 [luigi-interface] worker.py:267 - Scheduled LastCountryOfUserTableTask(warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/) (DONE)
2017-01-19 18:01:38,750 INFO 16985 [luigi-interface] worker.py:267 - Scheduled LastCountryOfUser(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:38,751 INFO 16985 [luigi-interface] worker.py:267 - Scheduled LastDailyIpAddressOfUserTask(source=('hdfs://localhost:9000/data/',), interval=2017-01-03-2017-01-19, expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/) (PENDING)
2017-01-19 18:01:47,033 INFO 16985 [luigi-interface] worker.py:267 - Scheduled PathSelectionByDateIntervalTask(source=('hdfs://localhost:9000/data/',), interval=2017-01-03-2017-01-19, expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d) (DONE)
2017-01-19 18:01:51,024 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ExternalURL(url=hdfs://localhost:9000/data/GeoIP.dat) (DONE)
2017-01-19 18:01:59,535 INFO 16985 [luigi-interface] worker.py:267 - Scheduled PathSelectionByDateIntervalTask(source=('hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user',), interval=2017-01-01-2017-01-03, expand_interval=0:00:00, pattern=('.*?last_ip_of_user_(?P<date>\\d{4}-\\d{2}-\\d{2})',), date_pattern=%Y-%m-%d) (DONE)
2017-01-19 18:01:59,536 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-18/last_ip_of_user_2017-01-18) (DONE)
2017-01-19 18:01:59,536 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-17/last_ip_of_user_2017-01-17) (DONE)
2017-01-19 18:01:59,537 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-16/last_ip_of_user_2017-01-16) (DONE)
2017-01-19 18:01:59,537 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-15/last_ip_of_user_2017-01-15) (DONE)
2017-01-19 18:01:59,537 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-14/last_ip_of_user_2017-01-14) (DONE)
2017-01-19 18:01:59,538 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-13/last_ip_of_user_2017-01-13) (DONE)
2017-01-19 18:01:59,538 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-12/last_ip_of_user_2017-01-12) (DONE)
2017-01-19 18:01:59,538 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-11/last_ip_of_user_2017-01-11) (DONE)
2017-01-19 18:01:59,539 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-10/last_ip_of_user_2017-01-10) (DONE)
2017-01-19 18:01:59,539 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-09/last_ip_of_user_2017-01-09) (DONE)
2017-01-19 18:01:59,539 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-08/last_ip_of_user_2017-01-08) (DONE)
2017-01-19 18:01:59,540 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-07/last_ip_of_user_2017-01-07) (DONE)
2017-01-19 18:01:59,540 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-06/last_ip_of_user_2017-01-06) (DONE)
2017-01-19 18:01:59,540 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-05/last_ip_of_user_2017-01-05) (DONE)
2017-01-19 18:01:59,540 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-04/last_ip_of_user_2017-01-04) (DONE)
2017-01-19 18:01:59,541 INFO 16985 [luigi-interface] worker.py:267 - Scheduled UncheckedExternalURL(url=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/last_ip_of_user/dt=2017-01-03/last_ip_of_user_2017-01-03) (DONE)
2017-01-19 18:01:59,541 INFO 16985 [luigi-interface] worker.py:267 - Scheduled ExternalURL(url=/edx/etc/edx-analytics-pipeline/output.json) (DONE)
2017-01-19 18:01:59,542 INFO 16985 [luigi-interface] worker.py:267 - Scheduled InsertToMysqlLastCountryOfUserTask(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, database=stage_reports, credentials=/edx/etc/edx-analytics-pipeline/output.json, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:59,544 INFO 16985 [luigi-interface] worker.py:267 - Scheduled LoadInternalReportingCountryTableHive(source=('hdfs://localhost:9000/data/',), expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, geolocation_data=hdfs://localhost:9000/data/GeoIP.dat, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, interval=2017-01-01-2017-01-19) (PENDING)
2017-01-19 18:01:59,544 INFO 16985 [luigi-interface] interface.py:193 - Done scheduling tasks
2017-01-19 18:01:59,544 INFO 16985 [luigi-interface] worker.py:282 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) running SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=auth_user, where=None, columns=('id', 'username', 'last_login', 'date_joined', 'is_active', 'is_superuser', 'is_staff', 'email'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False)
2017-01-19 18:02:25,432 INFO 16985 [luigi-interface] hadoop.py:242 - sqoop import --connect jdbc:mysql://stage-instituto-is.ct77wcfkafjj.us-east-1.rds.amazonaws.com/stage_edxapp --username stage_read_only --password-file /tmp/luigi/partial/luigitemp-591492406 --table auth_user --target-dir hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/ --columns id,username,last_login,date_joined,is_active,is_superuser,is_staff,email --null-string \\N --null-non-string \\N --fields-terminated-by  --hive-delims-replacement --direct
2017-01-19 18:02:27,324 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2017-01-19 18:02:29,780 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2017-01-19 18:02:29,780 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:29 INFO tool.CodeGenTool: Beginning code generation
2017-01-19 18:02:30,694 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `auth_user` AS t LIMIT 1
2017-01-19 18:02:30,761 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `auth_user` AS t LIMIT 1
2017-01-19 18:02:30,791 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /edx/app/hadoop/hadoop
2017-01-19 18:02:36,891 INFO 16985 [luigi-interface] hadoop.py:273 - Note: /tmp/sqoop-hadoop/compile/70fd7a72a651031bc6338d8402d7e09f/auth_user.java uses or overrides a deprecated API.
2017-01-19 18:02:36,891 INFO 16985 [luigi-interface] hadoop.py:273 - Note: Recompile with -Xlint:deprecation for details.
2017-01-19 18:02:36,907 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/70fd7a72a651031bc6338d8402d7e09f/auth_user.jar
2017-01-19 18:02:36,932 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:36 WARN manager.DirectMySQLManager: Direct-mode import from MySQL does not support column
2017-01-19 18:02:36,933 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:36 WARN manager.DirectMySQLManager: selection. Falling back to JDBC-based import.
2017-01-19 18:02:36,933 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2017-01-19 18:02:36,970 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:36 INFO mapreduce.ImportJobBase: Beginning import of auth_user
2017-01-19 18:02:37,019 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-01-19 18:02:37,085 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-01-19 18:02:37,366 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-01-19 18:02:44,269 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:44 INFO db.DBInputFormat: Using read commited transaction isolation
2017-01-19 18:02:44,270 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:44 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `auth_user`
2017-01-19 18:02:44,799 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:44 INFO mapreduce.JobSubmitter: number of splits:4
2017-01-19 18:02:45,144 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484841390427_0009
2017-01-19 18:02:45,820 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:45 INFO impl.YarnClientImpl: Submitted application application_1484841390427_0009
2017-01-19 18:02:45,892 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:45 INFO mapreduce.Job: The url to track the job: http://ip-172-30-0-176:8088/proxy/application_1484841390427_0009/
2017-01-19 18:02:45,893 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:02:45 INFO mapreduce.Job: Running job: job_1484841390427_0009
2017-01-19 18:03:00,016 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:00 INFO mapreduce.Job: Job job_1484841390427_0009 running in uber mode : false
2017-01-19 18:03:00,017 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:00 INFO mapreduce.Job: map 0% reduce 0%
2017-01-19 18:03:43,917 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:43 INFO mapreduce.Job: map 25% reduce 0%
2017-01-19 18:03:44,983 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:44 INFO mapreduce.Job: map 50% reduce 0%
2017-01-19 18:03:45,988 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:45 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:03:48,054 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:48 INFO mapreduce.Job: Job job_1484841390427_0009 completed successfully
2017-01-19 18:03:48,439 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:48 INFO mapreduce.Job: Counters: 30
2017-01-19 18:03:48,439 INFO 16985 [luigi-interface] hadoop.py:273 - File System Counters
2017-01-19 18:03:48,440 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes read=0
2017-01-19 18:03:48,440 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes written=417068
2017-01-19 18:03:48,440 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of read operations=0
2017-01-19 18:03:48,440 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of large read operations=0
2017-01-19 18:03:48,441 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of write operations=0
2017-01-19 18:03:48,441 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes read=394
2017-01-19 18:03:48,441 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes written=781
2017-01-19 18:03:48,441 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of read operations=16
2017-01-19 18:03:48,441 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of large read operations=0
2017-01-19 18:03:48,442 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of write operations=8
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Job Counters
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Launched map tasks=4
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Other local map tasks=4
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all maps in occupied slots (ms)=166540
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all reduces in occupied slots (ms)=0
2017-01-19 18:03:48,450 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all map tasks (ms)=166540
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Total vcore-seconds taken by all map tasks=166540
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Total megabyte-seconds taken by all map tasks=170536960
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Map-Reduce Framework
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Map input records=8
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Map output records=8
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Input split bytes=394
2017-01-19 18:03:48,451 INFO 16985 [luigi-interface] hadoop.py:273 - Spilled Records=0
2017-01-19 18:03:48,452 INFO 16985 [luigi-interface] hadoop.py:273 - Failed Shuffles=0
2017-01-19 18:03:48,453 INFO 16985 [luigi-interface] hadoop.py:273 - Merged Map outputs=0
2017-01-19 18:03:48,454 INFO 16985 [luigi-interface] hadoop.py:273 - GC time elapsed (ms)=2285
2017-01-19 18:03:48,454 INFO 16985 [luigi-interface] hadoop.py:273 - CPU time spent (ms)=6610
2017-01-19 18:03:48,470 INFO 16985 [luigi-interface] hadoop.py:273 - Physical memory (bytes) snapshot=440082432
2017-01-19 18:03:48,470 INFO 16985 [luigi-interface] hadoop.py:273 - Virtual memory (bytes) snapshot=8846278656
2017-01-19 18:03:48,470 INFO 16985 [luigi-interface] hadoop.py:273 - Total committed heap usage (bytes)=243531776
2017-01-19 18:03:48,470 INFO 16985 [luigi-interface] hadoop.py:273 - File Input Format Counters
2017-01-19 18:03:48,470 INFO 16985 [luigi-interface] hadoop.py:273 - Bytes Read=0
2017-01-19 18:03:48,471 INFO 16985 [luigi-interface] hadoop.py:273 - File Output Format Counters
2017-01-19 18:03:48,471 INFO 16985 [luigi-interface] hadoop.py:273 - Bytes Written=781
2017-01-19 18:03:48,505 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:48 INFO mapreduce.ImportJobBase: Transferred 781 bytes in 71.3525 seconds (10.9457 bytes/sec)
2017-01-19 18:03:48,554 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:03:48 INFO mapreduce.ImportJobBase: Retrieved 8 records.
2017-01-19 18:04:19,142 INFO 16985 [luigi-interface] worker.py:296 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) done SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=auth_user, where=None, columns=('id', 'username', 'last_login', 'date_joined', 'is_active', 'is_superuser', 'is_staff', 'email'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False)
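(Note: the Sqoop import above reports 8 records written to the auth_user dt=2017-01-19 partition. If a sanity check is wanted at this point, the part files can be inspected directly in HDFS; the path is the task's destination parameter from the log, and the part-m-* file names are the usual Hadoop defaults and may differ:)

    # list the imported partition (path taken from the SqoopImportFromMysql destination above)
    hdfs dfs -ls hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/
    # peek at the first few imported rows (adjust the part file name if it differs)
    hdfs dfs -text hdfs://localhost:9000/edx-analytics-pipeline/warehouse/auth_user/dt=2017-01-19/part-m-00000 | head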
2017-01-19 18:04:19,143 INFO 16985 [luigi-interface] worker.py:282 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) running ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:04:23,371 INFO 16985 [edx.analytics.tasks.insights.database_imports] database_imports.py:110 - Marking existing output as having been removed for task ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:04:23,371 INFO 16985 [luigi-interface] hive.py:358 - ['hive', '-f', '/tmp/tmpksUMTq', '--hiveconf', 'mapred.job.name=ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)']
2017-01-19 18:04:23,371 INFO 16985 [luigi-interface] hadoop.py:242 - hive -f /tmp/tmpksUMTq --hiveconf mapred.job.name=ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:04:26,796 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2017-01-19 18:04:26,797 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2017-01-19 18:04:26,799 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2017-01-19 18:04:26,799 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2017-01-19 18:04:26,799 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2017-01-19 18:04:26,800 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2017-01-19 18:04:26,872 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2017-01-19 18:04:27,530 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:27 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@76f2b07d:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2017-01-19 18:04:27,547 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:27 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@76f2b07d:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2017-01-19 18:04:27,553 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:04:27 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2017-01-19 18:04:27,987 INFO 16985 [luigi-interface] hadoop.py:273 - Logging initialized using configuration in jar:file:/edx/app/hadoop/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
2017-01-19 18:04:28,008 INFO 16985 [luigi-interface] hadoop.py:273 - Hive history file=/tmp/hadoop/hive_job_log_hadoop_18723@ip-172-30-0-176.ec2.internal_201701191804_1780438655.txt
2017-01-19 18:04:28,397 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Class path contains multiple SLF4J bindings.
2017-01-19 18:04:28,398 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Found binding in [jar:file:/edx/app/hadoop/hadoop-2.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-01-19 18:04:28,398 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Found binding in [jar:file:/edx/app/hadoop/hive-0.11.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-01-19 18:04:28,398 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-01-19 18:04:28,401 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-01-19 18:04:35,752 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:04:35,753 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 6.913 seconds
2017-01-19 18:04:39,964 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:04:39,976 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 4.222 seconds
2017-01-19 18:04:40,994 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:04:40,995 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 1.018 seconds
2017-01-19 18:04:41,786 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:04:41,787 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 0.792 seconds
2017-01-19 18:04:41,836 INFO 16985 [luigi-interface] worker.py:296 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) done ImportAuthUserTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
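(Note: the Hive step above runs its DDL from a temporary file (/tmp/tmpksUMTq), so the statements themselves are not echoed in this log. Assuming it registers the sqooped data as a dt=2017-01-19 partition of an auth_user table, which is what the task name and warehouse path suggest but is not shown here, the result could be checked with something like:)

    # table and partition names are assumptions based on the task name and warehouse path above
    hive -e 'SHOW PARTITIONS auth_user;'
    hive -e 'SELECT COUNT(*) FROM auth_user WHERE dt = "2017-01-19";'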
2017-01-19 18:04:41,836 INFO 16985 [luigi-interface] worker.py:282 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) running SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/student_courseenrollment/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=student_courseenrollment, where=None, columns=('id', 'user_id', 'course_id', 'created', 'is_active', 'mode'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False)
2017-01-19 18:05:05,636 INFO 16985 [luigi-interface] hadoop.py:242 - sqoop import --connect jdbc:mysql://stage-instituto-is.ct77wcfkafjj.us-east-1.rds.amazonaws.com/stage_edxapp --username stage_read_only --password-file /tmp/luigi/partial/luigitemp-730345178 --table student_courseenrollment --target-dir hdfs://localhost:9000/edx-analytics-pipeline/warehouse/student_courseenrollment/dt=2017-01-19/ --columns id,user_id,course_id,created,is_active,mode --null-string \\N --null-non-string \\N --fields-terminated-by  --hive-delims-replacement --direct
2017-01-19 18:05:09,463 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2017-01-19 18:05:12,963 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2017-01-19 18:05:12,963 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:12 INFO tool.CodeGenTool: Beginning code generation
2017-01-19 18:05:13,825 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student_courseenrollment` AS t LIMIT 1
2017-01-19 18:05:13,919 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student_courseenrollment` AS t LIMIT 1
2017-01-19 18:05:13,925 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /edx/app/hadoop/hadoop
2017-01-19 18:05:18,543 INFO 16985 [luigi-interface] hadoop.py:273 - Note: /tmp/sqoop-hadoop/compile/5d8c6d16892db7daefd7c7ce3c4d19e6/student_courseenrollment.java uses or overrides a deprecated API.
2017-01-19 18:05:18,547 INFO 16985 [luigi-interface] hadoop.py:273 - Note: Recompile with -Xlint:deprecation for details.
2017-01-19 18:05:18,559 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/5d8c6d16892db7daefd7c7ce3c4d19e6/student_courseenrollment.jar
2017-01-19 18:05:18,577 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 WARN manager.DirectMySQLManager: Direct-mode import from MySQL does not support column
2017-01-19 18:05:18,591 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 WARN manager.DirectMySQLManager: selection. Falling back to JDBC-based import.
2017-01-19 18:05:18,591 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2017-01-19 18:05:18,606 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 INFO mapreduce.ImportJobBase: Beginning import of student_courseenrollment
2017-01-19 18:05:18,629 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2017-01-19 18:05:18,699 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2017-01-19 18:05:19,046 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-01-19 18:05:25,730 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:25 INFO db.DBInputFormat: Using read commited transaction isolation
2017-01-19 18:05:25,731 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:25 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `student_courseenrollment`
2017-01-19 18:05:25,839 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:25 INFO mapreduce.JobSubmitter: number of splits:4
2017-01-19 18:05:26,264 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484841390427_0010
2017-01-19 18:05:27,181 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:27 INFO impl.YarnClientImpl: Submitted application application_1484841390427_0010
2017-01-19 18:05:27,345 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:27 INFO mapreduce.Job: The url to track the job: http://ip-172-30-0-176:8088/proxy/application_1484841390427_0010/
2017-01-19 18:05:27,351 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:27 INFO mapreduce.Job: Running job: job_1484841390427_0010
2017-01-19 18:05:43,842 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:43 INFO mapreduce.Job: Job job_1484841390427_0010 running in uber mode : false
2017-01-19 18:05:43,842 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:05:43 INFO mapreduce.Job: map 0% reduce 0%
2017-01-19 18:06:24,534 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:24 INFO mapreduce.Job: map 50% reduce 0%
2017-01-19 18:06:27,769 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:27 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:06:28,794 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:28 INFO mapreduce.Job: Job job_1484841390427_0010 completed successfully
2017-01-19 18:06:29,133 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:29 INFO mapreduce.Job: Counters: 30
2017-01-19 18:06:29,133 INFO 16985 [luigi-interface] hadoop.py:273 - File System Counters
2017-01-19 18:06:29,133 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes read=0
2017-01-19 18:06:29,134 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes written=417176
2017-01-19 18:06:29,182 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of read operations=0
2017-01-19 18:06:29,182 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of large read operations=0
2017-01-19 18:06:29,182 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of write operations=0
2017-01-19 18:06:29,197 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes read=400
2017-01-19 18:06:29,197 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes written=2311
2017-01-19 18:06:29,197 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of read operations=16
2017-01-19 18:06:29,198 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of large read operations=0
2017-01-19 18:06:29,198 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of write operations=8
2017-01-19 18:06:29,198 INFO 16985 [luigi-interface] hadoop.py:273 - Job Counters
2017-01-19 18:06:29,198 INFO 16985 [luigi-interface] hadoop.py:273 - Launched map tasks=4
2017-01-19 18:06:29,198 INFO 16985 [luigi-interface] hadoop.py:273 - Other local map tasks=4
2017-01-19 18:06:29,199 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all maps in occupied slots (ms)=158540
2017-01-19 18:06:29,199 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all reduces in occupied slots (ms)=0
2017-01-19 18:06:29,199 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all map tasks (ms)=158540
2017-01-19 18:06:29,199 INFO 16985 [luigi-interface] hadoop.py:273 - Total vcore-seconds taken by all map tasks=158540
2017-01-19 18:06:29,199 INFO 16985 [luigi-interface] hadoop.py:273 - Total megabyte-seconds taken by all map tasks=162344960
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Map-Reduce Framework
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Map input records=29
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Map output records=29
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Input split bytes=400
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Spilled Records=0
2017-01-19 18:06:29,200 INFO 16985 [luigi-interface] hadoop.py:273 - Failed Shuffles=0
2017-01-19 18:06:29,201 INFO 16985 [luigi-interface] hadoop.py:273 - Merged Map outputs=0
2017-01-19 18:06:29,201 INFO 16985 [luigi-interface] hadoop.py:273 - GC time elapsed (ms)=2360
2017-01-19 18:06:29,201 INFO 16985 [luigi-interface] hadoop.py:273 - CPU time spent (ms)=6580
2017-01-19 18:06:29,201 INFO 16985 [luigi-interface] hadoop.py:273 - Physical memory (bytes) snapshot=438423552
2017-01-19 18:06:29,201 INFO 16985 [luigi-interface] hadoop.py:273 - Virtual memory (bytes) snapshot=8846278656
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - Total committed heap usage (bytes)=243531776
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - File Input Format Counters
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - Bytes Read=0
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - File Output Format Counters
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - Bytes Written=2311
2017-01-19 18:06:29,202 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:29 INFO mapreduce.ImportJobBase: Transferred 2.2568 KB in 70.4338 seconds (32.811 bytes/sec)
2017-01-19 18:06:29,203 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:06:29 INFO mapreduce.ImportJobBase: Retrieved 29 records.
2017-01-19 18:06:54,740 INFO 16985 [luigi-interface] worker.py:296 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) done SqoopImportFromMysql(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/student_courseenrollment/dt=2017-01-19/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, num_mappers=None, verbose=False, table_name=student_courseenrollment, where=None, columns=('id', 'user_id', 'course_id', 'created', 'is_active', 'mode'), null_string=\\N, fields_terminated_by=, delimiter_replacement= , mysql_delimiters=False)
2017-01-19 18:06:54,750 INFO 16985 [luigi-interface] worker.py:282 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) running ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:06:58,935 INFO 16985 [edx.analytics.tasks.insights.database_imports] database_imports.py:110 - Marking existing output as having been removed for task ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:06:58,936 INFO 16985 [luigi-interface] hive.py:358 - ['hive', '-f', '/tmp/tmpDUrda6', '--hiveconf', 'mapred.job.name=ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)']
2017-01-19 18:06:58,936 INFO 16985 [luigi-interface] hadoop.py:242 - hive -f /tmp/tmpDUrda6 --hiveconf mapred.job.name=ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:07:03,080 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2017-01-19 18:07:03,083 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2017-01-19 18:07:03,083 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2017-01-19 18:07:03,092 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
2017-01-19 18:07:03,093 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
2017-01-19 18:07:03,093 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2017-01-19 18:07:03,095 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:03 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2017-01-19 18:07:04,530 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:04 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@76f2b07d:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2017-01-19 18:07:04,597 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:04 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@76f2b07d:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2017-01-19 18:07:04,612 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2017-01-19 18:07:05,948 INFO 16985 [luigi-interface] hadoop.py:273 - Logging initialized using configuration in jar:file:/edx/app/hadoop/hive-0.11.0-bin/lib/hive-common-0.11.0.jar!/hive-log4j.properties
2017-01-19 18:07:05,986 INFO 16985 [luigi-interface] hadoop.py:273 - Hive history file=/tmp/hadoop/hive_job_log_hadoop_20244@ip-172-30-0-176.ec2.internal_201701191807_449754460.txt
2017-01-19 18:07:06,775 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Class path contains multiple SLF4J bindings.
2017-01-19 18:07:06,775 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Found binding in [jar:file:/edx/app/hadoop/hadoop-2.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-01-19 18:07:06,795 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Found binding in [jar:file:/edx/app/hadoop/hive-0.11.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2017-01-19 18:07:06,795 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2017-01-19 18:07:06,883 INFO 16985 [luigi-interface] hadoop.py:273 - SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-01-19 18:07:15,587 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:07:15,588 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 8.147 seconds
2017-01-19 18:07:19,642 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:07:19,642 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 4.053 seconds
2017-01-19 18:07:20,483 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:07:20,484 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 0.841 seconds
2017-01-19 18:07:21,352 INFO 16985 [luigi-interface] hadoop.py:273 - OK
2017-01-19 18:07:21,371 INFO 16985 [luigi-interface] hadoop.py:273 - Time taken: 0.887 seconds
2017-01-19 18:07:21,464 INFO 16985 [luigi-interface] worker.py:296 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) done ImportStudentCourseEnrollmentTask(destination=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/, credentials=/edx/etc/edx-analytics-pipeline/input.json, database=stage_edxapp, import_date=None)
2017-01-19 18:07:21,464 INFO 16985 [luigi-interface] worker.py:282 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) running LastDailyIpAddressOfUserTask(source=('hdfs://localhost:9000/data/',), interval=2017-01-03-2017-01-19, expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/)
2017-01-19 18:07:27,912 INFO 16985 [luigi-interface] hadoop.py:242 - /edx/app/hadoop/hadoop/bin/hadoop jar /edx/app/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.3.0.jar -D mapred.job.name=LastDailyIpAddressOfUserTask(source=('hdfs://localhost:9000/data/',), interval=2017-01-03-2017-01-19, expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/) -D mapred.reduce.tasks=1 -mapper /usr/bin/python2.7 mrrunner.py map -reducer /usr/bin/python2.7 mrrunner.py reduce -file /var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/mrrunner.py -file /tmp/tmpFlhioS/packages.tar -file /tmp/tmpFlhioS/job-instance.pickle -input /data/tracking.log -output /edx-analytics-pipeline/marker/-969637439385692816-temp-2017-01-19T18-07-25.670339
2017-01-19 18:07:29,727 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:29 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
2017-01-19 18:07:32,974 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:32 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-01-19 18:07:33,404 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2017-01-19 18:07:36,031 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:36 INFO mapred.FileInputFormat: Total input paths to process : 1
2017-01-19 18:07:36,164 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:36 INFO mapreduce.JobSubmitter: number of splits:2
2017-01-19 18:07:36,171 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:36 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2017-01-19 18:07:36,173 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:36 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2017-01-19 18:07:36,567 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484841390427_0011
2017-01-19 18:07:37,167 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:37 INFO impl.YarnClientImpl: Submitted application application_1484841390427_0011
2017-01-19 18:07:37,314 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:37 INFO mapreduce.Job: The url to track the job: http://ip-172-30-0-176:8088/proxy/application_1484841390427_0011/
2017-01-19 18:07:37,316 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:37 INFO mapreduce.Job: Running job: job_1484841390427_0011
2017-01-19 18:07:50,192 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:50 INFO mapreduce.Job: Job job_1484841390427_0011 running in uber mode : false
2017-01-19 18:07:50,193 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:07:50 INFO mapreduce.Job: map 0% reduce 0%
2017-01-19 18:08:12,997 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:08:12 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:08:29,339 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:08:29 INFO mapreduce.Job: map 100% reduce 100%
2017-01-19 18:08:42,632 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:08:42 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_0, Status : FAILED
2017-01-19 18:08:42,659 INFO 16985 [luigi-interface] hadoop.py:273 - Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
2017-01-19 18:08:42,659 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
2017-01-19 18:08:42,659 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
2017-01-19 18:08:42,660 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
2017-01-19 18:08:42,660 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
2017-01-19 18:08:42,660 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
2017-01-19 18:08:42,660 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
2017-01-19 18:08:42,660 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
2017-01-19 18:08:42,661 INFO 16985 [luigi-interface] hadoop.py:273 - at java.security.AccessController.doPrivileged(Native Method)
2017-01-19 18:08:42,661 INFO 16985 [luigi-interface] hadoop.py:273 - at javax.security.auth.Subject.doAs(Subject.java:422)
2017-01-19 18:08:42,661 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
2017-01-19 18:08:42,661 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
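The Java stack trace above is Hadoop Streaming's generic wrapper around a non-zero exit from the Python reducer process (mrrunner.py reduce): "subprocess failed with code 1" does not include the underlying Python error, and YARN simply retries the reduce task, which is why the identical trace repeats below for attempts _1 and _2. The actual Python traceback ends up in the failed attempt's stderr. A minimal way to pull it out, assuming YARN log aggregation is enabled on this host (otherwise browse the tracking URL logged above):

    yarn logs -applicationId application_1484841390427_0011 | grep -B 2 -A 30 'Traceback'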
2017-01-19 18:08:43,680 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:08:43 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:08:59,085 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:08:59 INFO mapreduce.Job: map 100% reduce 100%
2017-01-19 18:09:17,410 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:09:17 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_1, Status : FAILED
2017-01-19 18:09:17,411 INFO 16985 [luigi-interface] hadoop.py:273 - Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
2017-01-19 18:09:17,411 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
2017-01-19 18:09:17,411 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
2017-01-19 18:09:17,412 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
2017-01-19 18:09:17,412 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
2017-01-19 18:09:17,412 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
2017-01-19 18:09:17,412 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
2017-01-19 18:09:17,412 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
2017-01-19 18:09:17,413 INFO 16985 [luigi-interface] hadoop.py:273 - at java.security.AccessController.doPrivileged(Native Method)
2017-01-19 18:09:17,414 INFO 16985 [luigi-interface] hadoop.py:273 - at javax.security.auth.Subject.doAs(Subject.java:422)
2017-01-19 18:09:17,414 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
2017-01-19 18:09:17,414 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2017-01-19 18:09:18,417 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:09:18 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:09:33,705 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:09:33 INFO mapreduce.Job: map 100% reduce 100%
2017-01-19 18:09:46,970 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:09:46 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_2, Status : FAILED
2017-01-19 18:09:46,983 INFO 16985 [luigi-interface] hadoop.py:273 - Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
2017-01-19 18:09:46,983 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
2017-01-19 18:09:46,984 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
2017-01-19 18:09:46,984 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
2017-01-19 18:09:46,984 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
2017-01-19 18:09:46,984 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
2017-01-19 18:09:46,984 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
2017-01-19 18:09:46,985 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
2017-01-19 18:09:46,985 INFO 16985 [luigi-interface] hadoop.py:273 - at java.security.AccessController.doPrivileged(Native Method)
2017-01-19 18:09:46,990 INFO 16985 [luigi-interface] hadoop.py:273 - at javax.security.auth.Subject.doAs(Subject.java:422)
2017-01-19 18:09:46,990 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
2017-01-19 18:09:46,991 INFO 16985 [luigi-interface] hadoop.py:273 - at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2017-01-19 18:09:47,983 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:09:47 INFO mapreduce.Job: map 100% reduce 0%
2017-01-19 18:10:04,428 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:10:04 INFO mapreduce.Job: map 100% reduce 100%
2017-01-19 18:10:23,820 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:10:23 INFO mapreduce.Job: Job job_1484841390427_0011 failed with state FAILED due to: Task failed task_1484841390427_0011_r_000000
2017-01-19 18:10:23,820 INFO 16985 [luigi-interface] hadoop.py:273 - Job failed as tasks failed. failedMaps:0 failedReduces:1
2017-01-19 18:10:24,151 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:10:24 INFO mapreduce.Job: Counters: 37
2017-01-19 18:10:24,152 INFO 16985 [luigi-interface] hadoop.py:273 - File System Counters
2017-01-19 18:10:24,187 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes read=0
2017-01-19 18:10:24,187 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of bytes written=195423
2017-01-19 18:10:24,187 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of read operations=0
2017-01-19 18:10:24,188 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of large read operations=0
2017-01-19 18:10:24,191 INFO 16985 [luigi-interface] hadoop.py:273 - FILE: Number of write operations=0
2017-01-19 18:10:24,191 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes read=686974
2017-01-19 18:10:24,191 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of bytes written=0
2017-01-19 18:10:24,192 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of read operations=6
2017-01-19 18:10:24,192 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of large read operations=0
2017-01-19 18:10:24,192 INFO 16985 [luigi-interface] hadoop.py:273 - HDFS: Number of write operations=0
2017-01-19 18:10:24,192 INFO 16985 [luigi-interface] hadoop.py:273 - Job Counters
2017-01-19 18:10:24,192 INFO 16985 [luigi-interface] hadoop.py:273 - Failed reduce tasks=4
2017-01-19 18:10:24,204 INFO 16985 [luigi-interface] hadoop.py:273 - Launched map tasks=2
2017-01-19 18:10:24,204 INFO 16985 [luigi-interface] hadoop.py:273 - Launched reduce tasks=4
2017-01-19 18:10:24,205 INFO 16985 [luigi-interface] hadoop.py:273 - Data-local map tasks=2
2017-01-19 18:10:24,205 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all maps in occupied slots (ms)=43964
2017-01-19 18:10:24,205 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all reduces in occupied slots (ms)=118112
2017-01-19 18:10:24,205 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all map tasks (ms)=43964
2017-01-19 18:10:24,205 INFO 16985 [luigi-interface] hadoop.py:273 - Total time spent by all reduce tasks (ms)=118112
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Total vcore-seconds taken by all map tasks=43964
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Total vcore-seconds taken by all reduce tasks=118112
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Total megabyte-seconds taken by all map tasks=45019136
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Total megabyte-seconds taken by all reduce tasks=120946688
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Map-Reduce Framework
2017-01-19 18:10:24,206 INFO 16985 [luigi-interface] hadoop.py:273 - Map input records=1742
2017-01-19 18:10:24,207 INFO 16985 [luigi-interface] hadoop.py:273 - Map output records=167
2017-01-19 18:10:24,207 INFO 16985 [luigi-interface] hadoop.py:273 - Map output bytes=15809
2017-01-19 18:10:24,207 INFO 16985 [luigi-interface] hadoop.py:273 - Map output materialized bytes=16155
2017-01-19 18:10:24,207 INFO 16985 [luigi-interface] hadoop.py:273 - Input split bytes=182
2017-01-19 18:10:24,207 INFO 16985 [luigi-interface] hadoop.py:273 - Combine input records=0
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - Spilled Records=167
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - Failed Shuffles=0
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - Merged Map outputs=0
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - GC time elapsed (ms)=651
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - CPU time spent (ms)=3880
2017-01-19 18:10:24,208 INFO 16985 [luigi-interface] hadoop.py:273 - Physical memory (bytes) snapshot=415764480
2017-01-19 18:10:24,209 INFO 16985 [luigi-interface] hadoop.py:273 - Virtual memory (bytes) snapshot=4414054400
2017-01-19 18:10:24,226 INFO 16985 [luigi-interface] hadoop.py:273 - Total committed heap usage (bytes)=317587456
2017-01-19 18:10:24,226 INFO 16985 [luigi-interface] hadoop.py:273 - File Input Format Counters
2017-01-19 18:10:24,226 INFO 16985 [luigi-interface] hadoop.py:273 - Bytes Read=686792
2017-01-19 18:10:24,226 INFO 16985 [luigi-interface] hadoop.py:273 - 17/01/19 18:10:24 ERROR streaming.StreamJob: Job not Successful!
2017-01-19 18:10:24,226 INFO 16985 [luigi-interface] hadoop.py:273 - Streaming Command Failed!
2017-01-19 18:10:24,237 ERROR 16985 [luigi-interface] worker.py:304 - [pid 16985] Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) failed LastDailyIpAddressOfUserTask(source=('hdfs://localhost:9000/data/',), interval=2017-01-03-2017-01-19, expand_interval=2 days, 0:00:00, pattern=('.*tracking.log.*',), date_pattern=%Y%m%d, warehouse_path=hdfs://localhost:9000/edx-analytics-pipeline/warehouse/)
Traceback (most recent call last):
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/worker.py", line 292, in _run_task
task.run()
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/edx/analytics/tasks/insights/location_per_course.py", line 163, in run
super(LastDailyIpAddressOfUserTask, self).run()
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/hadoop.py", line 612, in run
self.job_runner().run_job(self)
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/hadoop.py", line 482, in run_job
run_and_track_hadoop_job(arglist)
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/hadoop.py", line 318, in run_and_track_hadoop_job
return track_process(arglist, tracking_url_callback, env)
File "/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/hadoop.py", line 302, in track_process
raise HadoopJobError(message + 'Also, no tracking url found.', out, err)
HadoopJobError: ('Streaming job failed with exit code 1. Also, no tracking url found.', 'packageJobJar: [/var/lib/analytics-tasks/analyticstack/venv/local/lib/python2.7/site-packages/luigi/mrrunner.py, /tmp/tmpFlhioS/packages.tar, /tmp/tmpFlhioS/job-instance.pickle, /tmp/hadoop-hadoop/hadoop-unjar1279580823033983790/] [] /tmp/streamjob7319831079718813713.jar tmpDir=null\n', '17/01/19 18:07:29 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.\n17/01/19 18:07:32 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032\n17/01/19 18:07:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032\n17/01/19 18:07:36 INFO mapred.FileInputFormat: Total input paths to process : 1\n17/01/19 18:07:36 INFO mapreduce.JobSubmitter: number of splits:2\n17/01/19 18:07:36 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name\n17/01/19 18:07:36 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces\n17/01/19 18:07:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484841390427_0011\n17/01/19 18:07:37 INFO impl.YarnClientImpl: Submitted application application_1484841390427_0011\n17/01/19 18:07:37 INFO mapreduce.Job: The url to track the job: http://ip-172-30-0-176:8088/proxy/application_1484841390427_0011/\n17/01/19 18:07:37 INFO mapreduce.Job: Running job: job_1484841390427_0011\n17/01/19 18:07:50 INFO mapreduce.Job: Job job_1484841390427_0011 running in uber mode : false\n17/01/19 18:07:50 INFO mapreduce.Job: map 0% reduce 0%\n17/01/19 18:08:12 INFO mapreduce.Job: map 100% reduce 0%\n17/01/19 18:08:29 INFO mapreduce.Job: map 100% reduce 100%\n17/01/19 18:08:42 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_0, Status : FAILED\nError: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1\n\tat org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)\n\tat org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)\n\tat org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)\n\tat org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)\n\tat org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)\n\tat org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)\n\tat org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)\n\tat org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)\n\n17/01/19 18:08:43 INFO mapreduce.Job: map 100% reduce 0%\n17/01/19 18:08:59 INFO mapreduce.Job: map 100% reduce 100%\n17/01/19 18:09:17 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_1, Status : FAILED\nError: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1\n\tat org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)\n\tat org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)\n\tat org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)\n\tat org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)\n\tat org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)\n\tat org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)\n\tat org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)\n\tat 
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)\n\tat org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)\n\n17/01/19 18:09:18 INFO mapreduce.Job: map 100% reduce 0%\n17/01/19 18:09:33 INFO mapreduce.Job: map 100% reduce 100%\n17/01/19 18:09:46 INFO mapreduce.Job: Task Id : attempt_1484841390427_0011_r_000000_2, Status : FAILED\nError: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1\n\tat org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)\n\tat org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)\n\tat org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)\n\tat org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)\n\tat org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)\n\tat org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)\n\tat org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)\n\tat org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)\n\n17/01/19 18:09:47 INFO mapreduce.Job: map 100% reduce 0%\n17/01/19 18:10:04 INFO mapreduce.Job: map 100% reduce 100%\n17/01/19 18:10:23 INFO mapreduce.Job: Job job_1484841390427_0011 failed with state FAILED due to: Task failed task_1484841390427_0011_r_000000\nJob failed as tasks failed. failedMaps:0 failedReduces:1\n\n17/01/19 18:10:24 INFO mapreduce.Job: Counters: 37\n\tFile System Counters\n\t\tFILE: Number of bytes read=0\n\t\tFILE: Number of bytes written=195423\n\t\tFILE: Number of read operations=0\n\t\tFILE: Number of large read operations=0\n\t\tFILE: Number of write operations=0\n\t\tHDFS: Number of bytes read=686974\n\t\tHDFS: Number of bytes written=0\n\t\tHDFS: Number of read operations=6\n\t\tHDFS: Number of large read operations=0\n\t\tHDFS: Number of write operations=0\n\tJob Counters \n\t\tFailed reduce tasks=4\n\t\tLaunched map tasks=2\n\t\tLaunched reduce tasks=4\n\t\tData-local map tasks=2\n\t\tTotal time spent by all maps in occupied slots (ms)=43964\n\t\tTotal time spent by all reduces in occupied slots (ms)=118112\n\t\tTotal time spent by all map tasks (ms)=43964\n\t\tTotal time spent by all reduce tasks (ms)=118112\n\t\tTotal vcore-seconds taken by all map tasks=43964\n\t\tTotal vcore-seconds taken by all reduce tasks=118112\n\t\tTotal megabyte-seconds taken by all map tasks=45019136\n\t\tTotal megabyte-seconds taken by all reduce tasks=120946688\n\tMap-Reduce Framework\n\t\tMap input records=1742\n\t\tMap output records=167\n\t\tMap output bytes=15809\n\t\tMap output materialized bytes=16155\n\t\tInput split bytes=182\n\t\tCombine input records=0\n\t\tSpilled Records=167\n\t\tFailed Shuffles=0\n\t\tMerged Map outputs=0\n\t\tGC time elapsed (ms)=651\n\t\tCPU time spent (ms)=3880\n\t\tPhysical memory (bytes) snapshot=415764480\n\t\tVirtual memory (bytes) snapshot=4414054400\n\t\tTotal committed heap usage (bytes)=317587456\n\tFile Input Format Counters \n\t\tBytes Read=686792\n17/01/19 18:10:24 ERROR streaming.StreamJob: Job not Successful!\nStreaming Command Failed!\n')
2017-01-19 18:10:24,239 INFO 16985 [luigi-interface] notifications.py:96 - Skipping error email. Set `error-email` in the `core` section of the luigi config file to receive error emails.
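As the notification line above suggests, failure emails could be turned on by adding an error-email setting to the core section of the luigi config file used by the pipeline; a minimal sketch, with a placeholder address:

    [core]
    error-email = alerts@example.com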
2017-01-19 18:10:32,881 INFO 16985 [luigi-interface] worker.py:337 - Done
2017-01-19 18:10:32,881 INFO 16985 [luigi-interface] worker.py:338 - There are no more tasks to run at this time
2017-01-19 18:10:32,882 INFO 16985 [luigi-interface] worker.py:343 - There are 8 pending tasks possibly being run by other workers
2017-01-19 18:10:32,924 INFO 16985 [luigi-interface] worker.py:117 - Worker Worker(salt=062530328, host=ip-172-30-0-176, username=hadoop, pid=16985) was stopped. Shutting down Keep-Alive thread
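Because every reduce attempt failed, the streaming job wrote nothing to HDFS (see "HDFS: Number of bytes written=0" in the counters above), and luigi reports 8 tasks still pending. A quick way to confirm what the workflow has actually produced so far under the warehouse path used in this run:

    hdfs dfs -ls -R hdfs://localhost:9000/edx-analytics-pipeline/warehouse/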