Pronunciation dictionaries let you customize how specific words are pronounced in text-to-speech (TTS) synthesis. They are particularly useful for:
  • Brand names, product names, or proper nouns
  • Technical terms or acronyms
  • Words that should be pronounced differently from their standard pronunciation
  • Non-English words in English text (or vice versa)

How Pronunciation Dictionaries Work

A pronunciation dictionary is a collection of word-pronunciation pairs that you create and manage through the Waves API. Each dictionary has a unique ID that you can reference in your TTS requests to ensure consistent pronunciation across your applications.

Key Concepts

  • Word: The text that appears in your input
  • Pronunciation: The phonetic representation using International Phonetic Alphabet (IPA) notation
  • Dictionary ID: A unique identifier for your pronunciation dictionary that you use in TTS requests
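
As a rough illustration of how these concepts fit together, the sketch below models a single entry and wraps a list of entries in the {"items": [...]} body used by the requests in this guide. The PronunciationEntry class and build_payload function are hypothetical helpers for illustration only; they are not part of the Waves API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PronunciationEntry:
    """One word-pronunciation pair (hypothetical helper, not part of the Waves API)."""

    word: str           # the text as it appears in your input
    pronunciation: str  # IPA string, e.g. "weɪvz"

    def to_item(self) -> dict:
        # Same {"word": ..., "pronunciation": ...} shape as the request bodies.
        return {"word": self.word, "pronunciation": self.pronunciation}


def build_payload(entries: list) -> dict:
    """Wrap entries in the {"items": [...]} body used to create a dictionary."""
    return {"items": [entry.to_item() for entry in entries]}


payload = build_payload([
    PronunciationEntry("Waves", "weɪvz"),
    PronunciationEntry("API", "eɪ piː aɪ"),
])
print(payload)
```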

Creating a Pronunciation Dictionary

Step 1: Create Your Dictionary

First, create a pronunciation dictionary with your custom word-pronunciation pairs:
curl -X POST "https://api.waves.com/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "word": "Waves",
        "pronunciation": "weɪvz"
      },
      {
        "word": "API",
        "pronunciation": "eɪ piː aɪ"
      },
      {
        "word": "GitHub",
        "pronunciation": "ɡɪt hʌb"
      }
    ]
  }'
Response:
{
  "id": "64f1234567890abcdef12345",
  "items": [
    {
      "word": "Waves",
      "pronunciation": "weɪvz"
    },
    {
      "word": "API", 
      "pronunciation": "eɪ piː aɪ"
    },
    {
      "word": "GitHub",
      "pronunciation": "ɡɪt hʌb"
    }
  ],
  "createdAt": "2023-09-01T12:00:00.000Z"
}

Step 2: Save the Dictionary ID

Important: Save the returned id from the response. You’ll need this ID to reference your pronunciation dictionary in TTS requests and for future updates or deletions.
const dictionaryId = "64f1234567890abcdef12345"; // Save this!
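
In Python, the same step could look like the minimal sketch below, which reads the id field out of a response body and fails loudly if it is missing. The extract_dictionary_id function is a hypothetical helper, and the sample body is a shortened copy of the response shown in Step 1.

```python
import json


def extract_dictionary_id(response_body: str) -> str:
    """Return the "id" field from a create-dictionary response body."""
    data = json.loads(response_body)
    dictionary_id = data.get("id") if isinstance(data, dict) else None
    if not isinstance(dictionary_id, str) or not dictionary_id:
        raise ValueError("response does not contain a dictionary id")
    return dictionary_id


# Shortened copy of the sample response from Step 1.
sample_response = (
    '{"id": "64f1234567890abcdef12345", "items": [], '
    '"createdAt": "2023-09-01T12:00:00.000Z"}'
)
dictionary_id = extract_dictionary_id(sample_response)
print(dictionary_id)
```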

Managing Your Pronunciation Dictionaries

List All Dictionaries

Retrieve all your pronunciation dictionaries:
curl -X GET "https://api.waves.com/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY"

Update a Dictionary

Modify an existing pronunciation dictionary:
curl -X PUT "https://api.waves.com/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345",
    "items": [
      {
        "word": "Waves",
        "pronunciation": "weɪvz"
      },
      {
        "word": "OpenAI",
        "pronunciation": "oʊpən eɪ aɪ"
      },
      {
        "word": "TTS",
        "pronunciation": "tiː tiː ɛs"
      }
    ]
  }'
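
The update example resends the unchanged "Waves" entry alongside the new ones, which suggests (but does not confirm) that an update replaces the whole item list. Under that assumption, a small client-side merge can help avoid dropping existing entries by accident. The merge_items function below is a hypothetical helper, not an API feature.

```python
def merge_items(existing: list, changes: list) -> list:
    """Merge entries by exact word; entries in `changes` override `existing`."""
    merged = {item["word"]: item["pronunciation"] for item in existing}
    for item in changes:
        merged[item["word"]] = item["pronunciation"]
    return [{"word": w, "pronunciation": p} for w, p in merged.items()]


current = [
    {"word": "Waves", "pronunciation": "weɪvz"},
    {"word": "API", "pronunciation": "eɪ piː aɪ"},
]
changes = [
    {"word": "API", "pronunciation": "eɪ ˈpiː aɪ"},  # revised entry
    {"word": "TTS", "pronunciation": "tiː tiː ɛs"},  # new entry
]

# Body for the PUT request: the dictionary ID plus the full merged list.
update_body = {
    "id": "64f1234567890abcdef12345",
    "items": merge_items(current, changes),
}
print(update_body)
```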

Delete a Dictionary

Remove a pronunciation dictionary:
curl -X DELETE "https://api.waves.com/api/v1/pronunciation-dicts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "64f1234567890abcdef12345"
  }'
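
The list, update, and delete calls above all target the same endpoint and differ only in HTTP method and body. A minimal Python sketch of that shared shape, using only the standard library, might look like this. The build_dict_request function is a hypothetical helper; it builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.waves.com/api/v1"  # base URL taken from the curl examples


def build_dict_request(
    method: str, api_key: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) a request to the pronunciation-dicts endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/pronunciation-dicts", data=data, headers=headers, method=method
    )


list_req = build_dict_request("GET", "YOUR_API_KEY")
delete_req = build_dict_request(
    "DELETE", "YOUR_API_KEY", {"id": "64f1234567890abcdef12345"}
)
print(list_req.get_method(), list_req.full_url)
print(delete_req.get_method(), delete_req.data)
```

To actually send a request, pass it to urllib.request.urlopen and read the response body.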

Using Pronunciation Dictionaries in TTS Requests

Once you have created a pronunciation dictionary and obtained its ID, you can use it in your TTS requests by including the pronunciation_dicts parameter. This parameter accepts an array of dictionary IDs, so a single request can reference more than one dictionary.

Lightning Model Example

curl -X POST "https://api.waves.com/api/v1/lightning" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Waves API! Our TTS service integrates with GitHub.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": ["64f1234567890abcdef12345"],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
  }'

Lightning Large Model Example

curl -X POST "https://api.waves.com/api/v1/lightning-large" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The Waves API makes TTS integration simple.",
    "voice_id": "your_voice_id", 
    "pronunciation_dicts": ["64f1234567890abcdef12345"],
    "sample_rate": 24000,
    "speed": 1.0,
    "consistency": 0.5,
    "similarity": 0.0,
    "enhancement": 1,
    "language": "en",
    "output_format": "wav"
  }'

Using Multiple Dictionaries

To apply several pronunciation dictionaries in a single request, list all of their IDs in the pronunciation_dicts array:
curl -X POST "https://api.waves.com/api/v1/lightning" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [
      "64f1234567890abcdef12345",
      "64f9876543210fedcba09876"
    ],
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
  }'
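
When the list of dictionary IDs is assembled in code, it can be convenient to build the request body with a small helper. The sketch below is a hypothetical build_tts_payload function that removes duplicate IDs and reuses the field values from the curl example. How the service resolves a word that appears in more than one dictionary is not specified here, so the sketch makes no assumption about precedence.

```python
def build_tts_payload(text: str, voice_id: str, dictionary_ids: list) -> dict:
    """Build a Lightning request body that references one or more dictionaries."""
    # dict.fromkeys drops duplicate IDs while keeping first-seen order.
    unique_ids = list(dict.fromkeys(dictionary_ids))
    return {
        "text": text,
        "voice_id": voice_id,
        "pronunciation_dicts": unique_ids,
        "sample_rate": 24000,
        "speed": 1.0,
        "language": "en",
        "output_format": "wav",
    }


tts_payload = build_tts_payload(
    "Our API uses PostgreSQL and integrates with GitHub for CI/CD.",
    "your_voice_id",
    [
        "64f1234567890abcdef12345",
        "64f9876543210fedcba09876",
        "64f1234567890abcdef12345",  # duplicate, removed by the helper
    ],
)
print(tts_payload["pronunciation_dicts"])
```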

Complete Workflow Example

Here’s a complete example showing the full workflow from creating a dictionary to using it in synthesis:
import requests

# Your API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://api.waves.com/api/v1"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Step 1: Create pronunciation dictionary
pronunciation_data = {
    "items": [
        {"word": "PostgreSQL", "pronunciation": "poʊstɡrɛs kjuː ɛl"},
        {"word": "Redis", "pronunciation": "rɛdɪs"},
        {"word": "Kubernetes", "pronunciation": "kuːbərˈnɛtɪs"},
        {"word": "nginx", "pronunciation": "ɛndʒɪnɛks"}
    ]
}

# Create the dictionary
response = requests.post(
    f"{BASE_URL}/pronunciation-dicts",
    headers=headers,
    json=pronunciation_data
)

# Stop early if the dictionary could not be created
response.raise_for_status()
dict_data = response.json()
dictionary_id = dict_data["id"]
print(f"Created pronunciation dictionary with ID: {dictionary_id}")

# Step 2: Use the dictionary in TTS synthesis
tts_request = {
    "text": "Our infrastructure uses PostgreSQL, Redis, Kubernetes, and nginx.",
    "voice_id": "your_voice_id",
    "pronunciation_dicts": [dictionary_id],  # Use the dictionary ID here
    "sample_rate": 24000,
    "speed": 1.0,
    "language": "en",
    "output_format": "wav"
}

# Generate speech with custom pronunciations
audio_response = requests.post(
    f"{BASE_URL}/lightning",
    headers=headers,
    json=tts_request
)

# Save the audio file only if synthesis succeeded
audio_response.raise_for_status()
with open("speech_with_custom_pronunciations.wav", "wb") as f:
    f.write(audio_response.content)

print("Speech generated with custom pronunciations!")

Phonetic Notation Guidelines

When creating pronunciation entries, use International Phonetic Alphabet (IPA) notation:

Common IPA Symbols for English

Symbol   Sound               Example
aɪ       “eye” sound         Ice
eɪ       “ay” sound          Day
oʊ       “oh” sound          Go
aʊ       “ow” sound          How
ɔɪ       “oy” sound          Boy
θ        “th” (voiceless)    Think
ð        “th” (voiced)       This
ʃ        “sh” sound          Ship
ʒ        “zh” sound          Pleasure
tʃ       “ch” sound          Chair
dʒ       “j” sound           Jump
ŋ        “ng” sound          Sing

Tips for Creating Pronunciations

  1. Break down complex words: For multi-syllable words, separate syllables with spaces
    • “Kubernetes” → “kuː bər ˈnɛt ɪs”
  2. Mark stress: Use ˈ before the primary stressed syllable
    • “API” → “eɪ ˈpiː aɪ”
  3. Use online IPA tools: Tools like IPA Phonetic Transcription can help generate accurate pronunciations
  4. Test and iterate: Create a small dictionary first, test the pronunciations, and adjust as needed
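
Some of these tips can be checked before an entry is uploaded. The sketch below joins syllables with spaces and flags a few common typing slips, such as an ASCII apostrophe used in place of the IPA stress mark ˈ (U+02C8) or an ASCII colon in place of the length mark ː (U+02D0). Both functions are hypothetical helpers; whether the API itself rejects such input is not stated in this guide, so treat the checks as a local sanity test only.

```python
PRIMARY_STRESS = "\u02c8"  # ˈ, IPA primary stress mark
LENGTH_MARK = "\u02d0"     # ː, IPA length mark


def join_syllables(syllables: list) -> str:
    """Join syllables with single spaces, skipping empty pieces."""
    return " ".join(s.strip() for s in syllables if s.strip())


def lint_pronunciation(pronunciation: str) -> list:
    """Return human-readable warnings for common IPA typing slips."""
    warnings = []
    if not pronunciation.strip():
        warnings.append("pronunciation is empty")
    if pronunciation.count(PRIMARY_STRESS) > 1:
        warnings.append("more than one primary stress mark")
    if "'" in pronunciation:
        warnings.append("ASCII apostrophe found; the IPA stress mark is U+02C8")
    if ":" in pronunciation:
        warnings.append("ASCII colon found; the IPA length mark is U+02D0")
    return warnings


kubernetes = join_syllables(["kuː", "bər", "ˈnɛt", "ɪs"])
print(kubernetes)
print(lint_pronunciation(kubernetes))
print(lint_pronunciation("ku: ber 'net is"))
```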

Best Practices

Dictionary Management

  • Track what each dictionary is for: The API doesn’t support custom names, so maintain your own mapping of dictionary IDs to purposes
  • Keep dictionaries focused: Create separate dictionaries for different domains (e.g., one for brand names, another for technical terms)
  • Combine multiple dictionaries: Use the array format to apply multiple pronunciation dictionaries to a single TTS request
  • Regular updates: Update dictionaries as your vocabulary needs change
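
One lightweight way to keep such a mapping is a plain lookup table in your application. The sketch below is a hypothetical example: the purpose labels are invented for illustration and the IDs are the sample IDs used elsewhere in this guide.

```python
# Hypothetical registry: purpose label -> dictionary ID (sample IDs from this guide).
DICTIONARY_REGISTRY = {
    "brand-names": "64f1234567890abcdef12345",
    "tech-terms": "64f9876543210fedcba09876",
}


def ids_for(purposes: list) -> list:
    """Resolve purpose labels to dictionary IDs, failing loudly on unknown labels."""
    missing = [p for p in purposes if p not in DICTIONARY_REGISTRY]
    if missing:
        raise KeyError(f"no dictionary registered for: {', '.join(missing)}")
    return [DICTIONARY_REGISTRY[p] for p in purposes]


# The result can be used as the pronunciation_dicts value of a TTS request.
selected_ids = ids_for(["brand-names", "tech-terms"])
print(selected_ids)
```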

Pronunciation Quality

  • Verify pronunciations: Test your custom pronunciations to ensure they sound natural
  • Consider context: Some words may need different pronunciations in different contexts
  • Language consistency: Ensure pronunciations match the language setting of your TTS requests

Performance Considerations

  • Cache dictionary IDs: Store dictionary IDs in your application to avoid repeated API calls
  • Batch updates: When possible, update multiple pronunciations in a single API call
  • Monitor usage: Keep track of which dictionaries are actively used
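
As a sketch of the caching idea, the class below remembers which dictionary ID was returned for a given set of entries, so the same dictionary is not created twice. The DictionaryIdCache class and the create function passed to it are hypothetical; in a real application the create function would send the POST request shown earlier and return the id from the response.

```python
import hashlib
import json
from typing import Callable


def items_fingerprint(items: list) -> str:
    """Stable hash of the entries, independent of their order."""
    canonical = json.dumps(
        sorted(items, key=lambda item: item["word"]),
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DictionaryIdCache:
    """In-memory cache mapping a set of entries to its dictionary ID."""

    def __init__(self, create_fn: Callable[[list], str]):
        self._create_fn = create_fn
        self._ids = {}

    def get_or_create(self, items: list) -> str:
        key = items_fingerprint(items)
        if key not in self._ids:
            # Only call the (potentially slow) create function on a cache miss.
            self._ids[key] = self._create_fn(items)
        return self._ids[key]


calls = []


def fake_create(items: list) -> str:
    # Stand-in for the real POST request; returns a made-up ID.
    calls.append(items)
    return f"dict-{len(calls)}"


cache = DictionaryIdCache(fake_create)
redis = {"word": "Redis", "pronunciation": "rɛdɪs"}
nginx = {"word": "nginx", "pronunciation": "ɛndʒɪnɛks"}
first = cache.get_or_create([redis, nginx])
second = cache.get_or_create([nginx, redis])  # same entries, different order
print(first, second, len(calls))
```

This cache lives only in memory; persisting the mapping (for example in a file or database) would be needed to survive restarts.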

Troubleshooting

Common Issues

Dictionary not found: Ensure you’re using the correct dictionary ID and that the dictionary hasn’t been deleted.

Pronunciations not applied: Verify that:
  • The dictionary ID is included in your TTS request
  • The words in your text exactly match the words in your dictionary (matching is case-sensitive)
  • The pronunciation is in valid IPA format
Unexpected pronunciations: Double-check your IPA notation and test with simpler pronunciations first.
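
Because matching is case-sensitive, one quick check is to look for words in the input text that differ from a dictionary entry only by letter case. The find_case_mismatches function below is a hypothetical helper, and it splits the text into runs of letters, which is a simplification; how the service itself tokenizes text is not described in this guide.

```python
import re


def find_case_mismatches(text: str, dictionary_words: list) -> list:
    """Return (word_in_text, dictionary_word) pairs that differ only by case."""
    by_lower = {word.lower(): word for word in dictionary_words}
    mismatches = []
    # [^\W\d_]+ matches runs of letters only (no digits or underscores).
    for token in re.findall(r"[^\W\d_]+", text):
        entry = by_lower.get(token.lower())
        if entry is not None and token != entry and (token, entry) not in mismatches:
            mismatches.append((token, entry))
    return mismatches


problems = find_case_mismatches(
    "We host on Github and call the api.",
    ["GitHub", "API", "Waves"],
)
print(problems)
```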

Error Responses

The API returns a structured error message for common issues. For example, an entry that is missing its pronunciation field produces:
{
  "error": "Invalid request body",
  "details": [
    {
      "code": "invalid_type",
      "expected": "string",
      "received": "undefined",
      "path": ["items", 0, "pronunciation"],
      "message": "Required"
    }
  ]
}
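
An error body of this shape can be turned into short, readable messages by joining each path into a location string. The sketch below assumes that other errors follow the same structure as the sample above (a details list whose items carry path and message), which this guide does not guarantee. The format_error_details function is a hypothetical helper.

```python
import json


def format_error_details(error_body: str) -> list:
    """Turn each entry in "details" into a "location: message" string."""
    data = json.loads(error_body)
    lines = []
    for detail in data.get("details", []):
        location = ""
        for part in detail.get("path", []):
            if isinstance(part, int):
                location += f"[{part}]"  # list index, e.g. items[0]
            else:
                location += f".{part}" if location else str(part)
        lines.append(f"{location}: {detail.get('message', 'unknown error')}")
    return lines


sample_error = """
{
  "error": "Invalid request body",
  "details": [
    {
      "code": "invalid_type",
      "expected": "string",
      "received": "undefined",
      "path": ["items", 0, "pronunciation"],
      "message": "Required"
    }
  ]
}
"""
messages = format_error_details(sample_error)
print(messages)
```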

Next Steps