Audit API Payloads for Data Privacy

Advanced 30 min 5 steps

The problem

Before a security review, pentest, or compliance audit, you need to verify that your API isn't exposing PII, credentials, or sensitive JWT claims in its request and response payloads. This workflow audits all layers systematically — from raw payload structure down to JWT claim content — and replaces any real data with safe synthetic equivalents.

What you'll accomplish

All API payloads formatted and PII fields inventoried by field name and location
Embedded credentials, tokens, and API keys identified and documented for removal
All PII fields replaced with format-valid synthetic equivalents in both request and response
JWT claims audited for sensitive data, weak algorithms, excessive scope, and long expiry
All real data in docs, fixtures, and Postman collections replaced with synthetic records

Step-by-step

1

Format your API request and response payloads for visual inspection

Before any automated scanning, view your payloads in a formatted, human-readable form so you can spot structural patterns and obvious problems yourself. Paste your raw API request body and response body into the JSON Formatter & Validator separately. The prettified output makes four things easy to see: deeply nested objects that may contain PII at unexpected depths (e.g., user.profile.contact.emergency_contact.phone); large arrays that may contain repeated personal data records; metadata fields that carry internal system information not intended for external clients (internal IDs, server-side tracking fields, implementation details); and response fields that expose more data than the client actually uses (excessive data exposure, sometimes called over-fetching, such as sending a full user object when only name is needed). Document the fields you see; you'll use this inventory in Steps 2–4.

Tip: Use the JSON Formatter's search or key-highlighting feature to locate all fields with names like 'email', 'phone', 'name', 'address', 'ssn', 'pan', 'aadhaar', 'dob' — these are your PII inventory.
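
The field-name search described above can also be scripted. The following is a minimal Python sketch, not part of the JSON Formatter itself: the function name inventory_pii_fields, the hint list, and the sample payload are all hypothetical. It walks a parsed payload and reports the dotted path of every key whose name contains a PII hint. Substring matching over-matches on purpose (for example, "company" contains "pan"), so treat the output as a list to review by hand rather than a verdict.

```python
import json

# Hypothetical key-name fragments that suggest PII; extend for your own schema.
PII_KEY_HINTS = ("email", "phone", "name", "address", "ssn", "pan", "aadhaar", "dob")


def inventory_pii_fields(node, path=""):
    """Return the dotted paths of all keys whose names contain a PII hint."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if any(hint in key.lower() for hint in PII_KEY_HINTS):
                found.append(child)
            # Keep descending: PII often hides several levels down.
            found.extend(inventory_pii_fields(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(inventory_pii_fields(item, f"{path}[{index}]"))
    return found


# Made-up payload for illustration only.
sample = json.loads("""
{
  "user": {
    "id": 42,
    "profile": {
      "contact": {
        "email": "someone@example.com",
        "emergency_contact": {"phone": "+1-202-555-0100"}
      }
    }
  },
  "orders": [{"shipping_address": "1 Example Street"}]
}
""")

for field_path in inventory_pii_fields(sample):
    print(field_path)
# user.profile.contact.email
# user.profile.contact.emergency_contact.phone
# orders[0].shipping_address
```

Recording the path as well as the field name gives you the "field name and location" inventory that the later steps rely on.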

2

Scan for embedded credentials, tokens, and API keys in the payload

API responses and logs frequently contain secrets that shouldn't be there: an OAuth token returned in a response body instead of just in headers, a webhook payload that includes the signing secret, an error response that leaks an internal connection string, a debug field that shows a private key in development mode, or an internal API key embedded in a third-party integration response. Use the Secret Scanner to scan both your request and response payloads for: API key patterns (AWS AKIA*, Stripe sk_live_*, Google AIza*, GitHub ghp_*), JWT-shaped strings in unexpected fields, RSA/EC private key headers (-----BEGIN PRIVATE KEY-----), database connection strings containing passwords, .env variable patterns (KEY=value), and high-entropy strings that may be unrecognised credential formats. Investigate each match: not all high-entropy strings are secrets, and not all secrets are high-entropy (a weak password or a short token can slip under an entropy threshold), so also review suspiciously named fields by hand.

Tip: Pay special attention to error responses — APIs often include internal debugging information in error payloads that should never reach production clients. Run the scanner on your 400/401/500 error response samples too.
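
To make the two detection ideas in this step concrete (known prefixes and high entropy), here is a minimal Python sketch. It is not how the Secret Scanner is implemented: the regular expressions are simplified approximations of the key formats named above, and scan_text, the entropy threshold of 4.0 bits per character, and the 20-character minimum are assumptions chosen for illustration. The sample value is the example access key ID that AWS uses in its own documentation.

```python
import math
import re
from collections import Counter

# Simplified, illustrative patterns; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}"),
    "github_token": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_text(text, entropy_threshold=4.0, min_length=20):
    """Return (rule_name, matched_text) pairs for anything that looks like a secret."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    # Fallback: long runs of key-like characters whose entropy is unusually high.
    for token in re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_length, text):
        if shannon_entropy(token) >= entropy_threshold:
            findings.append(("high_entropy", token))
    return findings


# Made-up error payload containing AWS's documented example key ID.
sample_error = '{"error": "upstream failed", "debug": {"aws_key": "AKIAIOSFODNN7EXAMPLE"}}'
findings = scan_text(sample_error)
for rule, value in findings:
    print(rule, value)
```

Pattern rules and the entropy fallback cover different cases: the example key above is caught by its prefix even though its entropy falls below the threshold, which is why a scanner that relied on entropy alone would miss it.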

3

Anonymise all PII fields found in the payload inventory

Using the field inventory from Step 1, pass the payloads through the Data Anonymizer to replace all PII with format-valid synthetic equivalents. This step is distinct from the secret scanning in Step 2: secrets are credentials and cryptographic material; PII is personal information about individuals. The anonymizer handles: names (replaced with synthetic names of similar length and format), email addresses (replaced with @example.com equivalents that pass validation), phone numbers (replaced with numbers in the same format and country code), physical addresses (replaced with addresses in the same region that don't map to real locations), national ID numbers (Aadhaar, PAN, SSN: replaced with format-valid synthetic equivalents, checksum-valid where the scheme defines a checksum, as Aadhaar does), dates of birth (replaced with nearby dates that preserve age bracket), and IP addresses (replaced with addresses in the same /24 range). After anonymisation, re-run the JSON Formatter to verify the payload structure is still valid.

Tip: For API responses that include user IDs or account numbers, consider whether these should be anonymised or pseudonymised. Pseudonymisation (replacing with a consistent fake ID) allows you to trace requests across logs; full anonymisation breaks that traceability.
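
One common way to get the "consistent fake ID" mentioned in the tip is a keyed hash. The Python sketch below is one possible approach, not a description of what the Data Anonymizer does: pseudonymise_id, the user_ prefix, the 12-character truncation, and the key value are all hypothetical. The same real ID always maps to the same pseudonym, so requests can still be correlated across log lines, while anyone without the key cannot recompute the mapping.

```python
import hashlib
import hmac


def pseudonymise_id(real_id, key, prefix="user_"):
    """Map a real identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, str(real_id).encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep more characters if you have many IDs.
    return prefix + digest[:12]


# Hypothetical key: in practice, load it from a secret store, never from source code.
PSEUDONYM_KEY = b"replace-with-a-randomly-generated-key"

# Made-up log records for illustration.
log_lines = [
    {"user_id": 10482, "event": "login"},
    {"user_id": 99310, "event": "login"},
    {"user_id": 10482, "event": "purchase"},
]

safe_lines = [
    {**line, "user_id": pseudonymise_id(line["user_id"], PSEUDONYM_KEY)}
    for line in log_lines
]

for line in safe_lines:
    print(line)
```

The first and third records end up with the same pseudonym, which preserves traceability. Because whoever holds the key can re-link pseudonyms to real IDs, the output should still be treated as personal data.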

4

Decode and audit JWT claims for sensitive data exposure

JWTs that appear in your API request or response payloads deserve a dedicated audit step. Use the JWT Decoder to inspect both the header and payload claims of any tokens found. Check for:
(a) Sensitive data in claims. JWTs are base64url encoded, not encrypted. Any claim in the payload (sub, email, name, role, custom claims) is readable by anyone who obtains the token unless it's additionally encrypted (JWE). Never put sensitive PII in unencrypted JWT claims, even over HTTPS: TLS protects the token in transit, but the client, browser storage, proxies, and logs still see it in full.
(b) Weak algorithms. alg: 'none' in the header means the token is unsigned, so a verifier that accepts it performs no signature check at all; alg: 'HS256' with a guessable secret is brute-forceable offline. RS256 or ES256 (asymmetric) are preferred for production.
(c) Excessive scope. A token that grants admin access when only read access is needed violates the principle of least privilege.
(d) Long-lived tokens. exp values set years in the future mean a stolen token stays usable for years. Issue short-lived access tokens (15 min–1 hour) and renew them with a long-lived refresh token held in an HttpOnly cookie.

Tip: If JWTs appear in your API response body (rather than only in Set-Cookie headers), that's an architectural flag: tokens delivered in response bodies usually end up in JavaScript-accessible storage, where an XSS payload can read them. Prefer delivering tokens via HttpOnly Secure SameSite=Strict cookies.
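
Because a JWT's header and payload are only base64url encoded, several of the checks in this step can be scripted with the standard library alone. The Python sketch below is an illustration under assumptions, not the JWT Decoder's implementation: audit_jwt, the SENSITIVE_CLAIMS set, and the one-hour lifetime limit are hypothetical choices, and the demo token is built in the script from made-up claims. It decodes the token without verifying the signature, which is acceptable for an audit but never for authentication, and it expects the three-segment signed form (a five-segment JWE will raise an error).

```python
import base64
import json
import time

# Hypothetical claim names to flag; adjust to your own token schema.
SENSITIVE_CLAIMS = {"ssn", "aadhaar", "pan", "dob", "password", "address", "card_number"}
MAX_LIFETIME_SECONDS = 3600  # assumed policy: access tokens live at most one hour


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def audit_jwt(token, now=None):
    """Return a list of findings for one JWT. Does NOT verify the signature."""
    now = time.time() if now is None else now
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))

    findings = []
    if str(header.get("alg", "")).lower() == "none":
        findings.append("alg is 'none': token is unsigned")
    for name in sorted(SENSITIVE_CLAIMS & set(claims)):
        findings.append(f"sensitive claim present: {name}")
    exp = claims.get("exp")
    if exp is None:
        findings.append("no exp claim: token never expires")
    elif exp - claims.get("iat", now) > MAX_LIFETIME_SECONDS:
        findings.append("lifetime exceeds one hour")
    return findings


def make_demo_token(header, claims):
    """Build an unsigned demo token so the example needs no real credentials."""
    parts = [b64url_encode(json.dumps(part).encode("utf-8")) for part in (header, claims)]
    return ".".join(parts) + "."


demo_token = make_demo_token(
    {"alg": "none", "typ": "JWT"},
    {"sub": "123", "ssn": "000-00-0000", "iat": 0, "exp": 365 * 24 * 3600},
)
for finding in audit_jwt(demo_token):
    print(finding)
# alg is 'none': token is unsigned
# sensitive claim present: ssn
# lifetime exceeds one hour
```

Scope checks are left out because scope claim names and formats vary between identity providers; add a rule for your own scope or role claim if you need one.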

5

Replace any remaining real data with synthetic equivalents for safe storage

After cleaning secrets (Step 2) and anonymising PII (Step 3), the final step ensures that any real records used as examples, documentation samples, or test fixtures in your codebase are replaced with fully synthetic equivalents generated from scratch. Use the Fake Person Generator to create a set of synthetic user records that match your schema — same fields, same formats, but entirely fictional from the start. Replace any example payloads in your API documentation, Postman collections, README files, fixture JSON files, and seed scripts with these synthetic records. This is particularly important for: API documentation published publicly (never use real customer data as examples), Postman/Insomnia collections shared with contractors, test fixtures committed to version control, and error response examples that may contain real request data from a debugging session.

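Tip: After replacing all example data, run a final Secret Scanner pass on your entire codebase, not just the payloads, including configuration files and, where your tooling supports it, the repository history, to confirm no real credentials remain. Secret scanning does not flag names or email addresses, so repeat the Step 1 field-name search to confirm no real PII remains either.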
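
If you want to script fixture generation rather than copy records by hand, the idea of "entirely fictional from the start" can be sketched with the Python standard library. This is a stand-in for illustration, not the Fake Person Generator: the schema (id, name, email, phone), the name lists, and the function names are hypothetical. It uses the example.com domain and the 555-01xx phone range, both reserved for documentation and fiction, and a fixed seed so the fixtures are reproducible across runs.

```python
import json
import random

# Obviously fictional building blocks; no real customer data is involved.
FIRST_NAMES = ["Alex", "Sam", "Priya", "Jordan", "Mei", "Omar"]
LAST_NAMES = ["Example", "Sample", "Testcase", "Placeholder"]


def fake_person(rng, user_id):
    """Build one synthetic record matching a hypothetical user schema."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "id": user_id,
        "name": f"{first} {last}",
        # example.com is reserved for documentation use.
        "email": f"{first}.{last}{user_id}@example.com".lower(),
        # 555-0100 through 555-0199 are reserved for fictional use.
        "phone": f"+1-202-555-01{rng.randrange(100):02d}",
    }


def fake_people(count, seed=0):
    """Return `count` synthetic records; the same seed gives the same records."""
    rng = random.Random(seed)
    return [fake_person(rng, user_id) for user_id in range(1, count + 1)]


fixtures = fake_people(3)
print(json.dumps(fixtures, indent=2))
```

Seeding the generator keeps committed fixtures stable, so regenerating them does not produce noisy diffs in version control.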

Why this workflow works

Privacy audits fail when they address only one category of sensitive data. Credentials and PII are distinct: the Secret Scanner (Step 2) catches API keys and tokens but doesn't flag a name or email address. The Data Anonymizer (Step 3) removes PII but doesn't detect an embedded JWT. The JWT Decoder (Step 4) inspects token claims that automated scanners miss: a scanner may flag a JWT-shaped string, but it does not decode it, so sensitive PII inside the unencrypted payload claims goes unnoticed. The final step (Step 5) ensures that clean, synthetic data replaces any real records in documentation and fixtures, the place most teams overlook during security reviews. The workflow is labelled Advanced because it requires understanding the difference between these categories and why each step is necessary even if previous steps seemed to find nothing.

Frequently asked questions

What counts as PII in an API response?

GDPR and India's DPDP Act use the term "personal data" rather than PII (Personally Identifiable Information), but for an API audit the scope is similar. Direct identifiers include name, email, phone, national ID (Aadhaar, PAN, SSN, passport), date of birth, physical address, IP address, and biometric data. Indirect identifiers identify a person when combined with other data: device ID, cookie ID, account number, transaction ID linked to a person, geolocation coordinates, browser fingerprint. In API contexts, user_id fields that map to real individuals count as personal data even if they look like opaque UUIDs, because they enable identification through a lookup. The threshold question is: can this data, alone or in combination with other available data, identify a living individual?

How do I identify secrets in JSON API payloads?

Look for: high-entropy strings (random-looking base64 or hex strings of 20+ characters), strings matching known credential formats (AWS keys start with 'AKIA', Stripe live keys start with 'sk_live_', Google API keys start with 'AIza'), JWTs (three base64url segments separated by dots), URLs containing credentials in the query string (?api_key=...), and fields named 'token', 'secret', 'key', 'password', 'credential', 'auth', 'authorization'. Also look in: response headers that are logged and stored (Set-Cookie headers with tokens, Authorization headers echoed back in error responses), error messages that include query parameters or request bodies, and webhook payloads that include signing secrets alongside the event data.

What JWT claims are considered sensitive and should not be in the token payload?

JWT payloads are base64url encoded — not encrypted. Anyone who intercepts the token can read all claims. Sensitive claims to avoid unless the token is encrypted (JWE): passwords or password hashes (never), SSN, Aadhaar, or other national IDs, credit card numbers or payment info, full date of birth, full physical addresses, medical records, financial account numbers. Acceptable in JWTs: user ID (sub), email (for identity, not for sensitive lookup), roles and permissions, issued-at (iat), expiry (exp), issuer (iss), audience (aud), and non-sensitive profile data (display name, timezone preference). When in doubt: if you wouldn't put it in a URL query parameter, don't put it in an unencrypted JWT.

What is the difference between masking, anonymising, and pseudonymising API data?

Masking replaces data with asterisks or placeholder characters (john@example.com → ****@*******.***) — the structure is preserved but no value is present. Anonymisation transforms real data into realistic synthetic equivalents that cannot be reversed to identify the original person (john@example.com → sarah.chen@example.net) — format-valid but fictional. Pseudonymisation replaces real identifiers with consistent tokens (john@example.com → user_7f3a9b2c) — the mapping is stored separately, allowing re-identification with the key. For API privacy auditing: full anonymisation is the gold standard for documentation and test fixtures. Pseudonymisation is acceptable for logging and analytics where you need to trace user journeys without exposing actual identities. Masking is a last resort when structure must be preserved but value cannot be.
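
The three techniques differ mainly in what can be recovered afterwards, which a short Python sketch can make concrete. The function and class names below are hypothetical, and the pseudonymiser keeps its mapping in memory only for illustration; in practice the mapping would be stored separately under access control.

```python
import secrets


def mask_email(email):
    """Masking: keep '@' and '.', replace every other character with '*'."""
    return "".join(ch if ch in "@." else "*" for ch in email)


def anonymise_email(_email):
    """Anonymisation: return a fictional address with no link to the input."""
    return f"user{secrets.randbelow(10**6):06d}@example.net"


class Pseudonymiser:
    """Pseudonymisation: consistent tokens plus a mapping that allows re-identification."""

    def __init__(self):
        self._forward = {}  # real value -> token
        self._reverse = {}  # token -> real value (the part to store separately)

    def tokenise(self, value):
        if value not in self._forward:
            token = "user_" + secrets.token_hex(4)
            while token in self._reverse:  # regenerate on the rare collision
                token = "user_" + secrets.token_hex(4)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token):
        return self._reverse[token]


print(mask_email("john@example.com"))  # ****@*******.***

pseudonymiser = Pseudonymiser()
token = pseudonymiser.tokenise("john@example.com")
print(token == pseudonymiser.tokenise("john@example.com"))  # True: same input, same token
print(pseudonymiser.reidentify(token))                      # john@example.com
```

Only the pseudonymiser can get back to the original value, and only through its stored mapping. The masked and anonymised outputs keep no such link, which is why they suit documentation and test fixtures.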

What regulations require API payload auditing?

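No regulation names API payload auditing explicitly, but several impose obligations that it helps you meet. GDPR (EU): Article 5(1)(c) (data minimisation: process only the personal data that is necessary), Article 25 (data protection by design and by default), Article 32 (security of processing: appropriate technical and organisational measures, with pseudonymisation and encryption named as examples), Article 83 (fines of up to €20 million or 4% of global annual turnover, whichever is higher). India DPDP Act 2023: Section 8 (general obligations of data fiduciaries, including reasonable security safeguards); Section 6 limits consent-based processing to the personal data necessary for the specified purpose; under Section 3 the Act covers digital personal data processed in India, and processing outside India when it is connected with offering goods or services to individuals in India. PCI DSS: Requirement 3 (protect stored cardholder data), Requirement 6 (secure systems and applications); applies if your API touches payment card data. HIPAA (US healthcare): the Security Rule requires audit controls on systems that contain or use electronic protected health information. SOC 2 Type II: not a regulation but an attestation that customers often require; an independent auditor assesses whether your access controls and data handling procedures operated effectively over a review period.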
