Audit API Payloads for Data Privacy
The problem
Before a security review, pentest, or compliance audit, you need to verify that your API isn't exposing PII, credentials, or sensitive JWT claims in its request and response payloads. This workflow audits all layers systematically — from raw payload structure down to JWT claim content — and replaces any real data with safe synthetic equivalents.
What you'll accomplish
Step-by-step
Why this workflow works
Privacy audits fail when they address only one category of sensitive data. Credentials and PII are distinct: the Secret Scanner (Step 2) catches API keys and tokens but does not flag a name or email address. The Data Anonymizer (Step 3) removes PII but does not detect an embedded JWT. The JWT Decoder (Step 4) inspects token claims that automated scanners miss: a JWT may pass secret scanning, or be flagged only as a token, yet still carry sensitive PII in its unencrypted payload claims. The final step (Step 5) replaces any real records in documentation and fixtures with clean, synthetic data; this is the place most teams overlook during security reviews. The workflow is labelled Advanced because it requires understanding how these categories differ and why each step is necessary even when earlier steps appear to find nothing.
Frequently asked questions
What counts as PII in an API response?
GDPR and India's DPDP Act both use the term "personal data"; this guide uses PII (Personally Identifiable Information) as shorthand for it. Direct identifiers include name, email, phone, national ID (Aadhaar, PAN, SSN, passport), date of birth, physical address, IP address, and biometric data. Indirect identifiers identify a person when combined with other data: device ID, cookie ID, account number, transaction ID linked to a person, geolocation coordinates, and browser fingerprint. In API contexts, user_id fields that map to real individuals count as personal data even if they look like opaque UUIDs, because they enable identification through a lookup. The threshold question is: can this data, alone or in combination with other available data, identify a living individual?
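The identifier categories above can be turned into a first-pass check. The following is a minimal sketch, not a complete PII detector: the key-name list, the two regular expressions, and the function name find_pii are all illustrative assumptions, and a real audit would need a much broader rule set plus manual review.

```python
import re
from typing import Any, Iterator, Tuple

# Hypothetical, non-exhaustive key names that commonly hold personal data.
PII_KEY_HINTS = {
    "name", "email", "phone", "dob", "date_of_birth", "address",
    "ip", "ip_address", "ssn", "aadhaar", "pan", "passport",
    "user_id", "device_id", "cookie_id", "account_number",
}

# Deliberately loose value patterns; they favour recall over precision.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_pii(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, reason) for each field that looks like personal data."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in PII_KEY_HINTS:
                yield child, "key name"
            yield from find_pii(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_pii(item, f"{path}[{index}]")
    elif isinstance(node, str):
        if EMAIL_RE.search(node):
            yield path, "email pattern"
        elif IPV4_RE.search(node):
            yield path, "IPv4 pattern"
```

Calling list(find_pii(payload)) on a parsed JSON response returns the paths worth reviewing. Key-name matching is what catches opaque identifiers such as user_id, which no value pattern would flag.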
How do I identify secrets in JSON API payloads?
Look for high-entropy strings (random-looking base64 or hex strings of 20+ characters); strings matching known credential formats (AWS access key IDs start with 'AKIA', Stripe live secret keys start with 'sk_live_', Google API keys start with 'AIza'); JWTs (three base64url segments separated by dots); URLs containing credentials in the query string (?api_key=...); and fields named 'token', 'secret', 'key', 'password', 'credential', 'auth', or 'authorization'. Also check places where secrets leak indirectly: response headers that are logged and stored (Set-Cookie headers with tokens, Authorization headers echoed back in error responses), error messages that include query parameters or request bodies, and webhook payloads that include signing secrets alongside the event data.
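As a rough illustration of these heuristics, the sketch below walks a parsed JSON payload and reports suspicious fields. It is not how any particular scanner works: the exact key lengths in the patterns, the 'eyJ' prefix test for JWTs, the 3.5 bits-per-character entropy threshold, and the names scan and shannon_entropy are all assumptions chosen for the example.

```python
import math
import re
from typing import Any, Iterator, Tuple

# Prefixes come from the text above; the lengths are assumptions about common key layouts.
KNOWN_FORMATS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Stripe live key": re.compile(r"sk_live_[0-9A-Za-z]+"),
    "Google API key": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
    # Heuristic: base64url of '{"' starts with "eyJ" in both header and payload.
    "JWT": re.compile(r"eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "credential in URL query": re.compile(r"[?&](?:api_key|token|secret)=[^&\s]+"),
}
SENSITIVE_KEY_HINTS = ("token", "secret", "key", "password", "credential", "auth")
TOKEN_CHARS_RE = re.compile(r"[A-Za-z0-9+/=_-]+")
ENTROPY_THRESHOLD = 3.5  # bits per character; an assumed cut-off, tune for your data


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    counts = {char: text.count(char) for char in set(text)}
    return -sum(n / len(text) * math.log2(n / len(text)) for n in counts.values())


def scan(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, reason) for fields that look like secrets."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if any(hint in key.lower() for hint in SENSITIVE_KEY_HINTS):
                yield child, "sensitive field name"
            yield from scan(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from scan(item, f"{path}[{index}]")
    elif isinstance(node, str):
        for label, pattern in KNOWN_FORMATS.items():
            if pattern.search(node):
                yield path, label
                return
        if (len(node) >= 20 and TOKEN_CHARS_RE.fullmatch(node)
                and shannon_entropy(node) >= ENTROPY_THRESHOLD):
            yield path, "high-entropy string"
```

Substring matching on field names is intentionally broad (it also matches 'authorization' and 'api_key'), so expect false positives such as 'monkey'. For an audit, a false positive costs a few seconds of review, whereas a missed secret is a finding in the pentest report.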
What JWT claims are considered sensitive and should not be in the token payload?
The payload of a standard signed JWT is base64url-encoded, not encrypted, so anyone who obtains the token can read every claim. Unless the token is encrypted (JWE), keep these out of it: passwords or password hashes (never include these), SSN, Aadhaar, or other national IDs, credit card numbers or payment info, full date of birth, full physical addresses, medical records, and financial account numbers. Acceptable in JWTs: user ID (sub), email (for identity, not for sensitive lookup), roles and permissions, issued-at (iat), expiry (exp), issuer (iss), audience (aud), and non-sensitive profile data (display name, timezone preference). When in doubt: if you wouldn't put it in a URL query parameter, don't put it in an unencrypted JWT.
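To see how little protects these claims, the sketch below decodes a token's payload with only the Python standard library and sorts the claims against the lists above. It is an audit aid under stated assumptions: it does not verify the signature, the claim names "roles", "permissions", "name", and "zoneinfo" are assumed spellings for the acceptable items, and SENSITIVE_HINTS is a hypothetical substring list.

```python
import base64
import json
from typing import Dict

# Claims the answer above treats as acceptable; some names are assumed spellings.
ALLOWED_CLAIMS = {"sub", "email", "roles", "permissions", "iat", "exp", "iss", "aud",
                  "name", "zoneinfo"}
# Hypothetical substrings suggesting a claim holds data that should not be in a JWT.
SENSITIVE_HINTS = ("password", "ssn", "aadhaar", "card", "birth", "street",
                   "medical", "account")


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment WITHOUT verifying the signature (audit use only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    segment = parts[1]
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def audit_claims(claims: dict) -> Dict[str, str]:
    """Map each questionable claim name to 'sensitive' or 'review'."""
    findings = {}
    for name in claims:
        if any(hint in name.lower() for hint in SENSITIVE_HINTS):
            findings[name] = "sensitive"
        elif name not in ALLOWED_CLAIMS:
            findings[name] = "review"  # unknown claim: needs a human decision
    return findings


def make_unsigned_token(claims: dict) -> str:
    """Build a demo token with a placeholder signature, for testing the audit only."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return ".".join([encode({"alg": "none"}), encode(claims), "placeholder"])
```

The allowlist approach is a deliberate choice for this sketch: any claim that is neither known-safe nor obviously sensitive is surfaced for review instead of being silently accepted.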
What is the difference between masking, anonymising, and pseudonymising API data?
Masking replaces data with asterisks or placeholder characters (john@example.com → ****@*******.***): the structure is preserved but no value is present. Anonymisation transforms real data into realistic synthetic equivalents that cannot be reversed to identify the original person (john@example.com → sarah.chen@example.net): the result is format-valid but fictional. Pseudonymisation replaces real identifiers with consistent tokens (john@example.com → user_7f3a9b2c): the mapping is stored separately, allowing re-identification with the key. Because re-identification remains possible, pseudonymised data is still personal data under GDPR. For API privacy auditing, full anonymisation is the gold standard for documentation and test fixtures. Pseudonymisation is acceptable for logging and analytics where you need to trace user journeys without exposing actual identities. Masking is a last resort when structure must be preserved but value cannot be.
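The three techniques differ mainly in what survives the transformation, which a few lines of code make concrete. This is a minimal sketch using only the standard library. The function names, the HMAC-based token scheme, the in-memory vault dictionary, and the sequential synthetic addresses are all assumptions for illustration, not a description of any particular anonymisation tool.

```python
import hashlib
import hmac
import itertools
import re
from typing import Dict

_synthetic_ids = itertools.count(1)


def mask(value: str) -> str:
    """Replace every letter and digit with '*', keeping punctuation so the shape survives."""
    return re.sub(r"[A-Za-z0-9]", "*", value)


def anonymise_email(_original: str) -> str:
    """Return a synthetic address that is not derived from the input in any way."""
    return f"person{next(_synthetic_ids)}@example.net"


def pseudonymise(value: str, key: bytes, vault: Dict[str, str]) -> str:
    """Return a consistent token for value and record the mapping in vault.

    The keyed hash makes tokens stable for the same input and key; the vault,
    which should be stored separately from the data, is what allows re-identification.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    token = f"user_{digest[:8]}"
    vault[token] = value
    return token


if __name__ == "__main__":
    vault: Dict[str, str] = {}
    print(mask("john@example.com"))
    print(anonymise_email("john@example.com"))
    print(pseudonymise("john@example.com", b"demo-key", vault))
```

Note that anonymise_email ignores its input entirely, which is what makes the result irreversible. The trade-off is that repeated occurrences of the same address get different replacements, so use pseudonymisation where consistency across records matters.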
What regulations require API payload auditing?
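No regulation names API payload auditing explicitly, but several impose obligations that an audit helps you evidence. GDPR (EU): Article 5(1)(c) (data minimisation: process only what is necessary), Article 25 (data protection by design and by default), Article 32 (security of processing, including encryption and pseudonymisation where appropriate), and Article 83 (fines of up to EUR 20 million or 4% of global annual turnover, whichever is higher). India DPDP Act 2023: Section 8 (data fiduciaries must implement reasonable security safeguards) and Section 6 (consent is limited to the personal data necessary for the specified purpose); under Section 3 the Act covers processing of digital personal data within India, and processing outside India when it relates to offering goods or services to individuals in India. PCI DSS: Requirement 3 (protect stored account data) and Requirement 6 (develop and maintain secure systems and software), which apply if your API touches payment card data. HIPAA (US healthcare): the Security Rule requires audit controls on systems that contain or use electronic protected health information. SOC 2 Type II: requires demonstrating, through an independent audit, that access controls and data handling procedures operated effectively over a period of time.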