Real-Time Redaction identifies and masks sensitive information in transcriptions to protect privacy and help you comply with data protection regulations. The Lightning STT API supports two types of redaction: PII (Personally Identifiable Information) and PCI (Payment Card Information).

Enabling Redaction

Add the redact_pii and/or redact_pci query parameters to your WebSocket connection URL. Each parameter accepts true or false and defaults to false.

Real-Time WebSocket API

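// The headers option below is accepted by the Node.js "ws" package;
// the browser WebSocket constructor does not take custom headers.
import WebSocket from "ws";

// Assumption: the API key is read from an environment variable of your choosing.
const API_KEY = process.env.SMALLEST_API_KEY;

const url = new URL("wss://waves-api.smallest.ai/api/v1/lightning/get_text");
url.searchParams.append("language", "en");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");
// Enable PII and PCI redaction (both default to false).
url.searchParams.append("redact_pii", "true");
url.searchParams.append("redact_pci", "true");

const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});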

Redaction Types

PII Redaction (redact_pii)

When redact_pii=true is set, the following types of personally identifiable information are automatically identified and redacted:
  • Names: First names and surnames
  • Addresses: Street addresses and locations
  • Phone numbers: Various phone number formats
Redacted PII items are replaced with placeholder tokens like [FIRSTNAME_1], [FIRSTNAME_2], [PHONENUMBER_1], etc.

PCI Redaction (redact_pci)

When redact_pci=true is set, the following types of payment card information are automatically identified and redacted:
  • Credit card numbers: 16-digit credit/debit card numbers
  • CVV codes: Card verification values
  • ZIP codes: Postal/ZIP codes
  • Account numbers: Bank account numbers
Redacted PCI items are replaced with placeholder tokens like [CREDITCARDCVV_1], [ZIPCODE_1], [ACCOUNTNUMBER_1], etc.

Output Format

When redaction is enabled, the transcription text contains placeholder tokens instead of the original sensitive information. The response also includes a redacted_entities array listing all the redacted entity placeholders.

Sample Response with Redaction

{
  "session_id": "sess_12345abcde",
  "transcript": "[CREDITCARDCVV_1] and expiry [TIME_2] slash 34.",
  "is_final": true,
  "is_last": true,
  "full_transcript": "Hi, my name is [FIRSTNAME_1] [FIRSTNAME_2] You can reach me at [PHONENUMBER_1] and I paid using my Visa card [ZIPCODE_1] [ACCOUNTNUMBER_1] with [CREDITCARDCVV_1] and expiry [TIME_1].",
  "language": "en",
  "languages": ["en"],
  "redacted_entities": [
    "[CREDITCARDCVV_1]",
    "[TIME_2]"
  ]
}

Response Fields

| Field | Type | When Included | Description |
| --- | --- | --- | --- |
| redacted_entities | array | redact_pii=true or redact_pci=true | List of redacted entity placeholders (e.g., [FIRSTNAME_1], [CREDITCARDCVV_1]) |
| transcript | string | Always | Transcription text with redacted entities replaced by placeholder tokens |
| full_transcript | string | full_transcript=true AND is_final=true | Cumulative transcript with redacted entities (when full_transcript=true is enabled) |
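
A minimal sketch of reading these fields from an incoming message, assuming each WebSocket message is a JSON string shaped like the sample response. parseTranscriptMessage is a hypothetical helper name, not part of the API. It treats redacted_entities and full_transcript as optional because they are only included under the conditions listed above.

```javascript
// Hypothetical helper: normalizes one message received from the socket.
// Assumes the message is a JSON string shaped like the sample response.
function parseTranscriptMessage(raw) {
  const msg = JSON.parse(raw);
  return {
    transcript: msg.transcript ?? "",
    isFinal: msg.is_final === true,
    // Present only when redact_pii=true or redact_pci=true.
    redactedEntities: Array.isArray(msg.redacted_entities)
      ? msg.redacted_entities
      : [],
    // Present only when full_transcript=true and is_final=true.
    fullTranscript:
      typeof msg.full_transcript === "string" ? msg.full_transcript : null,
  };
}

const sample = JSON.stringify({
  transcript: "[CREDITCARDCVV_1] and expiry [TIME_2] slash 34.",
  is_final: true,
  redacted_entities: ["[CREDITCARDCVV_1]", "[TIME_2]"],
});
const parsed = parseTranscriptMessage(sample);
console.log(parsed.redactedEntities); // the two placeholders from the sample
console.log(parsed.fullTranscript); // null, since the sample omits full_transcript
```

Defaulting the optional fields to an empty array and null lets calling code handle redacted and non-redacted sessions the same way.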

Redaction Placeholder Format

Redacted entities are replaced with placeholder tokens following the pattern:
  • [ENTITYTYPE_N] where ENTITYTYPE indicates the type of information (e.g., FIRSTNAME, PHONENUMBER, CREDITCARDCVV, ZIPCODE, ACCOUNTNUMBER)
  • N is a sequential number starting from 1 to uniquely identify each instance
Examples:
  • [FIRSTNAME_1], [FIRSTNAME_2] - First names
  • [PHONENUMBER_1] - Phone numbers
  • [CREDITCARDCVV_1] - Credit card CVV codes
  • [ZIPCODE_1] - ZIP/Postal codes
  • [ACCOUNTNUMBER_1] - Account numbers
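
The pattern above can be matched with a regular expression. The sketch below assumes entity type names consist of uppercase letters only, which holds for the examples listed here but is not stated as a guarantee. parsePlaceholder and findPlaceholders are hypothetical helper names.

```javascript
// Assumption: ENTITYTYPE is uppercase A-Z only; N is a run of digits.
const PLACEHOLDER = /\[([A-Z]+)_(\d+)\]/g;

// Splits one token such as "[FIRSTNAME_2]" into its type and index.
// Returns null when the token does not follow the pattern.
function parsePlaceholder(token) {
  const match = /^\[([A-Z]+)_(\d+)\]$/.exec(token);
  return match ? { type: match[1], index: Number(match[2]) } : null;
}

// Returns every placeholder found in a transcript, in order of appearance.
function findPlaceholders(text) {
  return Array.from(text.matchAll(PLACEHOLDER), (m) => ({
    token: m[0],
    type: m[1],
    index: Number(m[2]),
  }));
}

console.log(parsePlaceholder("[FIRSTNAME_2]"));
console.log(findPlaceholders("Call [FIRSTNAME_1] at [PHONENUMBER_1]."));
```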
For the highest level of protection and effective compliance auditing, enable both redact_pii=true and redact_pci=true in your request. Additionally, use the redacted_entities array in the response as an audit trail to track what data has been redacted from each transcript.
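
One way to keep such an audit trail is to record the redacted_entities of each message together with a timestamp and the session ID. The sketch below is an assumption about how an application might do this, not a feature of the API. createAuditTrail, record, countsByType, and entries are hypothetical names, and entries are held in memory only.

```javascript
// Hypothetical in-memory audit trail; a real system would persist entries.
function createAuditTrail() {
  const entries = [];
  return {
    // Records which placeholders were redacted in one response message.
    // Messages without any redacted entities are skipped.
    record(msg, now = new Date()) {
      const entities = Array.isArray(msg.redacted_entities)
        ? msg.redacted_entities
        : [];
      if (entities.length === 0) return;
      entries.push({
        sessionId: msg.session_id ?? null,
        at: now.toISOString(),
        entities: [...entities],
      });
    },
    // Counts recorded redactions per entity type.
    countsByType() {
      const counts = {};
      for (const { entities } of entries) {
        for (const token of entities) {
          // "[CREDITCARDCVV_1]" -> "CREDITCARDCVV"
          const type = token.replace(/^\[|_\d+\]$/g, "");
          counts[type] = (counts[type] ?? 0) + 1;
        }
      }
      return counts;
    },
    // Returns a copy so callers cannot mutate the trail.
    entries: () => entries.slice(),
  };
}

const trail = createAuditTrail();
trail.record({
  session_id: "sess_12345abcde",
  redacted_entities: ["[CREDITCARDCVV_1]", "[TIME_2]"],
});
console.log(trail.countsByType());
```

The trail stores only placeholder tokens, never the original values, so the log itself contains no sensitive data.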

Compliance and Privacy

Redaction helps with compliance requirements for:
  • HIPAA: Health Insurance Portability and Accountability Act (healthcare data)
  • GDPR: General Data Protection Regulation (EU data protection)
  • CCPA: California Consumer Privacy Act (California data protection)
  • PCI DSS: Payment Card Industry Data Security Standard (payment card data)
  • SOC 2: System and Organization Controls (security and privacy)
Note: Redaction is a tool to help protect sensitive information, but it should be used as part of a comprehensive data protection strategy. Always consult with legal and compliance teams to ensure your implementation meets regulatory requirements.