@steveloughran
Last active January 8, 2022 20:44
Auditing hadoop for log4j 2.x on the command line.

ASF hadoop distributions do not contain log4j 2.x, so they are not vulnerable to any of the recent log4j 2.x CVEs. However, third-party products may contain vulnerable log4j 2.x libraries.

This is how to use the findclass command to check programmatically whether a hadoop distribution has a log4j 2.x artifact on its classpath, and so is potentially at risk.

Introducing the findclass command

The findclass command can be used for manual and scripted probes for classes on the classpath. It can be used to locate, load and instantiate classes, returning different error codes depending on the outcome of the operation.

To invoke it, pass the full class name of the tool, org.apache.hadoop.util.FindClass, to the hadoop command or one of its variants.

> hadoop org.apache.hadoop.util.FindClass
Usage : [load | create] <classname>
        [locate | print] <resourcename>]
The return codes are:
  0 -- The operation was successful 
  1 -- Something went wrong 
  2 -- This usage message was printed 
  3 -- The class or resource was not found 
  4 -- The class was found but could not be loaded 
  5 -- The class was loaded, but an instance of it could not be created 
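
For scripted probes it can help to turn the exit code back into readable text. The following is a minimal sketch: the function name describe_findclass_exit is hypothetical, and the messages are paraphrased from the usage text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a FindClass exit code into the
# meaning listed in the tool's usage message.
describe_findclass_exit() {
  case "$1" in
    0) echo "operation was successful" ;;
    1) echo "something went wrong" ;;
    2) echo "usage message was printed" ;;
    3) echo "class or resource was not found" ;;
    4) echo "class was found but could not be loaded" ;;
    5) echo "class was loaded, but an instance could not be created" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Example use (assumes the hadoop command is on the PATH):
#   hadoop org.apache.hadoop.util.FindClass load org.apache.logging.log4j.core.Core
#   describe_findclass_exit $?
```

Any code outside 0 to 5 is reported as unexpected; it probably comes from the shell or the launcher script rather than from FindClass itself.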

Auditing hadoop/common/lib.

First look for a class found in all log4j 2.x releases, org.apache.logging.log4j.core.Core:

> hadoop org.apache.hadoop.util.FindClass load org.apache.logging.log4j.core.Core
Loaded org.apache.logging.log4j.core.Core as class org.apache.logging.log4j.core.Core
org.apache.logging.log4j.core.Core: file:/Users/stevel/Projects/Releases/CDH-7.2.9.2/share/hadoop/common/lib/log4j-core-2.14.1.jar

> echo $?
0

If this class is found, the distribution is possibly vulnerable.

Now look for a class which is new in 2.17 and so absent from all prior releases, that is, from every release vulnerable to the current set of CVEs:

> hadoop org.apache.hadoop.util.FindClass load org.apache.logging.log4j.core.lookup.RuntimeStrSubstitutor

Class not found org.apache.logging.log4j.core.lookup.RuntimeStrSubstitutor
java.lang.ClassNotFoundException: Class org.apache.logging.log4j.core.lookup.RuntimeStrSubstitutor not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2565)
        at org.apache.hadoop.util.FindClass.getClass(FindClass.java:156)
        at org.apache.hadoop.util.FindClass.loadClass(FindClass.java:247)
        at org.apache.hadoop.util.FindClass.run(FindClass.java:320)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.util.FindClass.main(FindClass.java:381)
  
> echo $?
3

If this class is not found, the log4j artifact is older than 2.17 and so is vulnerable.

found Core   found RuntimeStrSubstitutor   Status
no           no                            log4j 2.x not found
yes          no                            vulnerable artifact found
yes          yes                           fixed artifact found
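
The two probes and the table above can be combined into one scripted check. This is a sketch, not part of hadoop: the function names probe and log4j2_status are hypothetical, and any pair of exit codes that is not in the table is reported as inconclusive rather than guessed at.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "FindClass load" for a class through the
# given launcher and return the tool's exit code. Output is discarded.
probe() {
  "$1" org.apache.hadoop.util.FindClass load "$2" >/dev/null 2>&1
}

# Hypothetical helper: classify the classpath of a launcher
# (hadoop by default) using the two classes probed above.
log4j2_status() {
  local cmd="${1:-hadoop}"
  local core=0
  local fixed=0
  probe "$cmd" org.apache.logging.log4j.core.Core || core=$?
  probe "$cmd" org.apache.logging.log4j.core.lookup.RuntimeStrSubstitutor || fixed=$?
  case "$core:$fixed" in
    3:3) echo "log4j 2.x not found" ;;
    0:3) echo "vulnerable artifact found" ;;
    0:0) echo "fixed artifact found" ;;
    *)   echo "inconclusive (exit codes $core and $fixed)" ;;
  esac
}

# Example use (assumes the hadoop command is on the PATH):
#   log4j2_status hadoop
```

Only exit code 3 is treated as "not found"; other failures, such as a missing launcher or a class that cannot be loaded, fall through to the inconclusive branch so that they are not mistaken for a clean result.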

HDFS services

To audit the share/hadoop/hdfs/lib directory, use the hdfs command:

~/P/R/hadoop-3.4.0-SNAPSHOT> bin/hdfs org.apache.hadoop.util.FindClass load org.apache.logging.log4j.core.Core

Class not found org.apache.logging.log4j.core.Core
java.lang.ClassNotFoundException: Class org.apache.logging.log4j.core.Core not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2596)
        at org.apache.hadoop.util.FindClass.getClass(FindClass.java:156)
        at org.apache.hadoop.util.FindClass.loadClass(FindClass.java:247)
        at org.apache.hadoop.util.FindClass.run(FindClass.java:320)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:95)
        at org.apache.hadoop.util.FindClass.main(FindClass.java:381)

If a vulnerable log4j 2.x JAR is found then all deployed HDFS services will be vulnerable.

YARN services

Invoke the FindClass tool through the yarn command to audit share/hadoop/yarn/lib:

~/P/R/hadoop-3.4.0-SNAPSHOT> bin/yarn org.apache.hadoop.util.FindClass load org.apache.logging.log4j.core.Core
Class not found org.apache.logging.log4j.core.Core
java.lang.ClassNotFoundException: Class org.apache.logging.log4j.core.Core not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2596)
        at org.apache.hadoop.util.FindClass.getClass(FindClass.java:156)
        at org.apache.hadoop.util.FindClass.loadClass(FindClass.java:247)
        at org.apache.hadoop.util.FindClass.run(FindClass.java:320)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:95)
        at org.apache.hadoop.util.FindClass.main(FindClass.java:381)

If a vulnerable log4j 2.x JAR is found then all deployed YARN services will be vulnerable.
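
The same probe can be run through all three launchers in one pass. A minimal sketch, assuming the commands are on the PATH or are given as paths such as bin/hdfs; the function name audit_launchers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe for the log4j 2.x Core class through each
# launcher named on the command line and print the FindClass exit code.
audit_launchers() {
  local class="org.apache.logging.log4j.core.Core"
  local cmd rc
  for cmd in "$@"; do
    rc=0
    "$cmd" org.apache.hadoop.util.FindClass load "$class" >/dev/null 2>&1 || rc=$?
    echo "$cmd: exit code $rc"
  done
}

# Example use:
#   audit_launchers hadoop hdfs yarn
```

An exit code of 0 from any launcher means the class was loaded from that launcher's classpath; 3 means it was not found there.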

Notes

  1. There is no equivalent probe for share/hadoop/tools/lib.
  2. If there is a version of the library in share/hadoop/common/lib/ then that will be found by the hdfs and yarn commands, possibly hiding any duplicate copy in the other lib directories.
  3. Applications deployed in the cluster may have their own copies of the JAR.
  4. Bundles of application artifacts stored in shared tar.gz files on HDFS may contain copies too.
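
For the locations which the classpath probes cannot reach, a file name search is a crude fallback. The sketch below is not part of hadoop: both function names are hypothetical, it assumes the JAR keeps its standard log4j-core-2*.jar name, and it assumes any tar.gz bundle has already been copied from HDFS to the local filesystem. Renamed or shaded copies will not be found this way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log4j-core 2.x JARs, matched by file name
# only, anywhere under a local directory such as share/hadoop/tools/lib.
find_log4j_core_jars() {
  find "$1" -type f -name 'log4j-core-2*.jar'
}

# Hypothetical helper: list log4j-core 2.x JAR entries, matched by file
# name only, inside a local tar.gz bundle.
list_log4j_core_in_tarball() {
  tar -tzf "$1" | grep -E '(^|/)log4j-core-2[^/]*\.jar$'
}

# Example use:
#   find_log4j_core_jars share/hadoop/tools/lib
#   list_log4j_core_in_tarball /tmp/bundle.tar.gz
```

The version number in any file name found still has to be compared against 2.17 by hand.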

What does that mean? It means that if a vulnerable log4j 2.x artifact is found, the distribution is at risk. If the probes do not find one, that does not guarantee that the cluster is safe, only that no such artifact was found in the classpaths loaded by the hadoop, hdfs and yarn applications and the services they deploy.

Also, it means that patching a distribution is fairly hard. Based on personal experience trying to update other binaries, it is the tarballs on HDFS which complicate the process.