Steve Loughran steveloughran

[*]
charset = utf-8
end_of_line = lf
indent_size = 2
indent_style = space
insert_final_newline = false
max_line_length = 80
tab_width = 2
ij_continuation_indent_size = 4
ij_formatter_off_tag = @formatter:off
@steveloughran
steveloughran / timings.md
Created January 21, 2022 17:40
manifest committer test runs on GCS vs ABFS

GCS TestCommitterLoadManifestsStage


220121 17:13:40.558:INFO [org.apache.hadoop.mapreduce.lib.output.committer.manifest.AbstractManifestCommitterTest] Aggregate FileSystem Statistics counters=((directories_created=34)
(files_created=20)
(files_deleted=42)
@steveloughran
steveloughran / log4j-audit.md
Last active January 8, 2022 20:44
Auditing hadoop for log4j 2.x on the command line.

ASF Hadoop distributions do not contain log4j 2.x, so they are not vulnerable to any of the recent log4j 2.x CVEs. However, third-party products may contain vulnerable libraries.

This is how to programmatically check, using the findclass command, whether a Hadoop distribution has a log4j 2.x artifact on its classpath and so is potentially at risk.
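The underlying idea, looking up a class file as a classpath resource and reporting where it came from, can be sketched in plain Java. This is not the findclass command itself. The class name `Log4j2Probe` and the helper `locate` are hypothetical names introduced for illustration, and the probe assumes that `org.apache.logging.log4j.core.Logger` is a reasonable marker for the log4j 2.x core JAR.

```java
import java.net.URL;

final class Log4j2Probe {
  // Resource path of a class shipped in log4j-core 2.x.
  // Assumption: the presence of this class is a reasonable marker for the library.
  public static final String LOG4J2_CORE_RESOURCE =
      "org/apache/logging/log4j/core/Logger.class";

  /** Returns the URL the resource would be loaded from, or null if it is absent. */
  public static URL locate(String resource) {
    return ClassLoader.getSystemResource(resource);
  }

  public static void main(String[] args) {
    URL source = locate(LOG4J2_CORE_RESOURCE);
    if (source == null) {
      System.out.println("log4j 2.x core not found on the classpath");
    } else {
      // The URL names the JAR that supplied the class.
      System.out.println("log4j 2.x core found at " + source);
    }
  }
}
```

Run it with the classpath under audit. A null result only shows that the class is not visible to this JVM's system class loader. It says nothing about JARs bundled elsewhere in a product.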

Introducing the findclass command

~/P/R/hadoop-3.3.1 time bin/hadoop jar $CLOUDSTORE dux -threads 64 -limit 1000 -verbose $BUCKET
Listing files under s3a://stevel-london/ with thread count 64
=============================================================
2021-06-03 20:10:37,994 [main] INFO commands.ExtendedDu (StoreDurationInfo.java:<init>(53)) - Starting: List files under s3a://stevel-london/
2021-06-03 20:10:39,932 [main] INFO commands.ExtendedDu (StoreDurationInfo.java:<init>(53)) - Starting: Initial list of path s3a://stevel-london/
2021-06-03 20:10:40,288 [main] INFO commands.ExtendedDu (StoreDurationInfo.java:close(100)) - Initial list of path s3a://stevel-london/: duration 0:00:356
2021-06-03 20:10:40,290 [pool-3-thread-1] INFO commands.ExtendedDu (StoreDurationInfo.java:<init>(53)) - Starting: List s3a://stevel-london/cloud-integration
@steveloughran
steveloughran / AWS S3 Log.txt
Created May 17, 2021 13:27
AWS S3 log with S3A auditing enabled during a terasort test. Note the job ID propagation: it appears only in store actions during task/job commit, not in worker task reads and writes.
183c9826b45486e485693808f38e2c4071004bf5dfd4c3ab210f0a21a4235ef8 stevel-london [13/May/2021:13:17:51 +0000] 109.157.171.170 arn:aws:iam::152813717728:user/stevel-dev VFFQ3CDMPHBBB42W REST.GET.BUCKET - "GET /?list-type=2&delimiter=%2F&max-keys=2&prefix=terasort-magic%2Fvalidate%2F__magic%2Fjob-job_1620911577786_0006%2Ftasks%2Fattempt_1620911577786_0006_m_000000_0%2F__base%2F&fetch-owner=false HTTP/1.1" 200 - 375 - 28 28 "https://audit.example.org/op_mkdirs/fc311a70-b166-423a-9345-ccd17f38ff5c-00000006/?op=op_mkdirs&p1=terasort-magic/validate/__magic/job-job_1620911577786_0006/tasks/attempt_1620911577786_0006_m_000000_0/__base&pr=stevel&ps=c765f30e-9f74-488b-b536-dc1714a90245&id=fc311a70-b166-423a-9345-ccd17f38ff5c-00000006&t0=1&fs=fc311a70-b166-423a-9345-ccd17f38ff5c&t1=1&ji=job_1620911577786_0006&ts=1620911871265" "Hadoop 3.4.0-SNAPSHOT, aws-sdk-java/1.11.901 Mac_OS_X/10.16 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/AdoptOpenJDK" - 0X+S2Iv0LI8xRYuM11e3aiPEIFkg2t2d4xLMZzLAf+p4gJGDifQGItg4qvPZLgG
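The audit information in the log entry above travels in the query string of the referrer URL, so it can be pulled out with ordinary URL parsing. The following is a minimal sketch. `AuditReferrer` and `queryParams` are hypothetical names, the sample referrer is shortened from the entry above, and values are returned without URL decoding.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class AuditReferrer {
  /** Splits the query string of a referrer URL into key/value pairs (not URL-decoded). */
  public static Map<String, String> queryParams(String referrer) {
    Map<String, String> params = new LinkedHashMap<>();
    String query = URI.create(referrer).getRawQuery();
    if (query == null) {
      return params;
    }
    for (String pair : query.split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        params.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return params;
  }

  public static void main(String[] args) {
    // Shortened from the log entry above; the path and most fields are omitted.
    String referrer = "https://audit.example.org/op_mkdirs/"
        + "?op=op_mkdirs&pr=stevel&ji=job_1620911577786_0006&ts=1620911871265";
    Map<String, String> params = queryParams(referrer);
    // In the entry above, the ji field carries the job ID.
    System.out.println(params.get("op") + " " + params.get("ji"));
  }
}
```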
@steveloughran
steveloughran / ITestS3AFileOperationCost.java
Last active December 16, 2020 18:20
S3A IOStatistics Dump from an ITestS3AOperationCost run (remote)
Aggregate FileSystem Statistics
counters=((action_executor_acquired=68)
(action_executor_acquired.failures=0)
(action_http_get_request=4)
(action_http_get_request.failures=0)
(action_http_head_request=284)
(action_http_head_request.failures=0)
(committer_bytes_committed=0)
(committer_bytes_uploaded=0)
(committer_commit_job=0)
@Test
public void test_140_teracomplete() throws Throwable {
terasortDuration.get().close();
final StringBuilder results = new StringBuilder();
results.append("\"Operation\"\t\"Duration\"\n");
// This is how you dynamically create a function inside a method
// for use afterwards.
// It works because no IOEs are raised in this sequence.
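The preview is cut off before the function itself appears. A self-contained sketch of the pattern the comment describes might look like the following. `StageReport`, `report`, and the map of stage names to durations are hypothetical stand-ins, not the test's actual fields. The lambda is assigned to a `java.util.function.Consumer`, whose `accept` method declares no checked exceptions, which is why the body must not throw `IOException`.

```java
import java.util.Map;
import java.util.function.Consumer;

final class StageReport {
  /** Builds a tab-separated report of stage durations; missing stages get an empty cell. */
  public static String report(Map<String, String> completedStages, String... stages) {
    final StringBuilder results = new StringBuilder();
    results.append("\"Operation\"\t\"Duration\"\n");
    // Local function defined inside the method; it captures results and completedStages.
    // Consumer.accept declares no checked exceptions, so the body cannot throw IOException.
    Consumer<String> stage = name -> {
      String duration = completedStages.get(name);
      results.append(String.format("\"%s\"\t\"%s\"\n",
          name, duration == null ? "" : duration));
    };
    for (String name : stages) {
      stage.accept(name);
    }
    return results.toString();
  }
}
```

If the body did need to raise an `IOException`, a custom functional interface that declares the exception would be needed in place of `Consumer`.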
@steveloughran
steveloughran / pannisac.gpx
Created January 30, 2018 06:40
Pannisac GPX trace
<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="StravaGPX iPhone" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<metadata>
<time>2017-07-17T17:42:03Z</time>
</metadata>
<trk>
<name>Panissac</name>
<trkseg>
<trkpt lat="46.0870610" lon="1.0791790">
<ele>275.9</ele>
@steveloughran
steveloughran / banyoles.gpx
Last active January 30, 2018 06:24
Full GPX trace of a ride around Banyoles, Catalonia, [Spain].
<?xml version="1.0" encoding="UTF-8"?>
<gpx creator="StravaGPX" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
<metadata>
<time>2017-07-21T14:12:25Z</time>
</metadata>
<trk>
<name>Farm Trails to town and a cafe largo</name>
<trkseg>
<trkpt lat="42.1775050" lon="2.7566040">
<ele>183.0</ele>
<?xml version="1.0" ?>
<gpx xmlns="http://www.topografix.com/GPX/1/1">
<trk>
<name>
faslane synchrotron
</name>
<trkseg>
<trkpt lat="56.0666066942" lon="-4.8184543848">
<time>2018-01-28T07:46:58Z</time>
</trkpt>