Text-to-Speech and SSML Support

Text-to-Speech (TTS)

The <Say> verb converts text into human-like speech in real time. Provide the text in Visual Designer's Say element, and Restcomm synthesizes the speech and plays back the audio. The default TTS provider is Amazon Polly, and the default voice is male with a US English dialect.

With <Say>, you can choose a male or female voice from either Google or Amazon Polly.

Speech Synthesis Markup Language (SSML)

You can send Speech Synthesis Markup Language (SSML) in your Text-to-Speech request to customize the audio response. SSML lets you specify pauses and control how acronyms, dates, times, abbreviations, and text that should be censored are rendered.

Supported Voices and Languages

For detailed information about the languages and voices supported by Amazon Polly and Google, see the following resources:

  • Amazon Polly supported voices and languages

  • Google supported voices and languages

Examples

SSML Markup and the Resulting Synthesized Speech

<speak>
  This is a <say-as interpret-as="characters">SSML</say-as> example.
  I can pause <break time="3s"/>.
  I can play a sound
  <audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
  I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
  Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
  Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
  I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
  Finally, I can speak a paragraph with two sentences.
  <p><s>This is sentence one.</s><s>This is sentence two.</s></p>
</speak>

Below is the synthesized text for the example SSML document:

This is a S S M L example. I can pause [3 second pause]. I can play a sound [audio file plays].
I can speak in cardinals. Your number is ten.
Or I can speak in ordinals. You are tenth in line.
Or I can even speak in digits. The digits for ten are one oh.
I can also substitute phrases, like the World Wide Web Consortium.
Finally, I can speak a paragraph with two sentences. This is sentence one. This is sentence two.

Google Cloud Text-to-Speech supports a subset of the available SSML tags.

For more information about how to create audio data from SSML input with Google Cloud Text-to-Speech, see Creating Voice Audio Files.

Google Cloud Support for SSML Elements

You can use various SSML elements and options for your actions. For more information, see Google Cloud Support for SSML Elements.

Amazon Polly Support for SSML Elements

For more information about the SSML tags that Amazon Polly supports, see Amazon Polly Supported SSML Tags.

Using Speech Synthesis Markup Language (SSML) in Visual Designer

You can use SSML within a <Say> verb in Visual Designer as shown below.

  • Click on the gear icon to expand the <Say> verb settings. You will notice a Language drop-down field. Select the desired language.

  • Select the male or female icon next to the Language field to set a voice variation.

  • Save your application.

Using SSML in Visual Designer

Using Speech Synthesis Markup Language (SSML) in RCML

You can use SSML in your RCML applications to create pauses and to control how acronyms, dates, times, abbreviations, and text that should be censored are spoken.

The <emphasis> element can be used to add or remove emphasis from text contained by the element as follows.

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
   <emphasis level="moderate">This is an important announcement</emphasis>
</speak>
</Say>
</Response>
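If you generate RCML from application code, it can help to confirm that the SSML fragment is well-formed before returning it. The following is a minimal Python sketch, not part of Restcomm: the helper name `say_rcml` and its parameters are hypothetical, and it only checks XML well-formedness, not whether the TTS provider supports a given tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def say_rcml(ssml_body: str, voice: str = "woman",
             language: str = "en", loop: int = 1) -> str:
    """Wrap an SSML fragment in <Response><Say><speak>...</speak></Say></Response>.

    Hypothetical helper for illustration; the attribute names mirror the
    examples in this document.
    """
    rcml = (
        "<Response>\n"
        f"<Say voice={quoteattr(voice)} language={quoteattr(language)} "
        f"loop={quoteattr(str(loop))}>\n"
        "<speak>\n"
        f"   {ssml_body}\n"
        "</speak>\n"
        "</Say>\n"
        "</Response>"
    )
    # Parsing raises xml.etree.ElementTree.ParseError if the result
    # is not well-formed XML (for example, an unclosed tag).
    ET.fromstring(rcml)
    return rcml


rcml = say_rcml(
    '<emphasis level="moderate">This is an important announcement</emphasis>',
    loop=3,
)
print(rcml)
```

Building the document as a string keeps the output identical in shape to the hand-written example above, while the parse step catches mistakes such as a missing closing tag early.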

The <break> element lets you control pausing or other prosodic boundaries between words. Using <break> between any pair of tokens is optional. If this element is not present between words, the break is automatically determined based on the linguistic context.

This element accepts two optional attributes:

  • time: Sets the length of the break in seconds or milliseconds (e.g. "3s" or "250ms").

  • strength: Sets the strength of the output's prosodic break in relative terms. Valid values are "none", "x-weak", "weak", "medium", "strong", and "x-strong". The value "none" indicates that no prosodic break boundary should be output, which can be used to prevent a prosodic break that the processor would otherwise produce. The other values indicate monotonically non-decreasing (conceptually increasing) break strength between tokens. Stronger boundaries are typically accompanied by pauses.

The following example shows how to use the <break> element to pause between steps:

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  Step 1, take a deep breath. <break time="200ms"/>
  Step 2, exhale.
  Step 3, take a deep breath again. <break strength="weak"/>
  Step 4, exhale.
</speak>
</Say>
</Response>
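When <break> elements are produced by code, the two attributes described above can be checked before the markup is sent. This is a small Python sketch under stated assumptions: `break_tag` is a hypothetical helper, and the time pattern accepts only whole numbers followed by "s" or "ms", matching the forms shown in this document rather than everything a TTS provider may accept.

```python
import re
from typing import Optional

# Strength values named in the attribute description above.
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}

# Assumption: whole numbers followed by "s" or "ms", e.g. "3s" or "250ms".
BREAK_TIME = re.compile(r"\d+(ms|s)")


def break_tag(time: Optional[str] = None, strength: Optional[str] = None) -> str:
    """Return a <break/> element, validating whichever attributes are given."""
    attrs = []
    if time is not None:
        if not BREAK_TIME.fullmatch(time):
            raise ValueError(f"invalid break time: {time!r}")
        attrs.append(f'time="{time}"')
    if strength is not None:
        if strength not in BREAK_STRENGTHS:
            raise ValueError(f"invalid break strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    # Both attributes are optional, so a bare <break/> is also valid.
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"


print(break_tag(time="200ms"))
print(break_tag(strength="weak"))
```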

The <say-as> element lets you indicate the type of text construct contained within the element. It also helps specify the level of detail for rendering the contained text.

The <say-as> element has one required attribute, interpret-as, which determines how the value is spoken. The optional attributes format and detail may be used depending on the particular interpret-as value. The interpret-as attribute supports the following values:

cardinal

The following example is spoken as "Twelve thousand three hundred forty five" (for US English) or "Twelve thousand three hundred and forty five" (for UK English):

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="cardinal">12345</say-as>
</speak>
</Say>
</Response>

ordinal

The following example is spoken as "First":

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="ordinal">1</say-as>
</speak>
</Say>
</Response>

characters

The following example is spoken as "C A N":

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="characters">can</say-as>
  </speak>
</Say>
</Response>

expletive or bleep

The following example comes out as a beep, as though it has been censored:

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="expletive">censor this</say-as>
  </speak>
</Say>
</Response>

verbatim or spell-out

The following example is spelled out letter by letter:

<Response>
<Say voice="woman" language="en" loop="3">
  <speak>
    <say-as interpret-as="verbatim">abcdefg</say-as>
  </speak>
</Say>
</Response>

date

The format attribute is a sequence of date field character codes. Supported field character codes in format are {y, m, d} for year, month, and day (of the month) respectively. If the field code appears once for year, month, or day, then the number of digits expected is 4, 2, and 2 respectively. If the field code is repeated, then the number of expected digits is the number of times the code is repeated. Fields in the date text may be separated by punctuation and/or spaces.

The detail attribute controls the spoken form of the date. For detail="1", only the day field and one of the month or year fields are required, although both may be supplied. This is the default when fewer than all three fields are given. The spoken form is "The {ordinal day} of {month}, {year}".

The following example is spoken as "The thirtieth of September, two thousand and nineteen":

<Response>
<Say voice="woman" language="en" loop="3">
<speak>
  <say-as interpret-as="date" format="yyyymmdd" detail="1">
    2019-09-30
  </say-as>
</speak>
</Say>
</Response>

The following example is spoken as "The thirtieth of September":

<speak>
  <say-as interpret-as="date" format="dm">30-9</say-as>
</speak>
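To make the format rules concrete, the following Python sketch shows one possible reading of how a format string lines up with the date text. It is an illustration only, not how Restcomm, Google, or Amazon Polly parse dates: the function names are hypothetical, and the choice to match separated fields in order (so that "30-9" works with format "dm") is an assumption.

```python
import re

# Digits expected when a field code appears once: year 4, month 2, day 2.
DEFAULT_WIDTH = {"y": 4, "m": 2, "d": 2}


def parse_format(fmt: str):
    """Split a format such as "yyyymmdd" into [(code, expected_digits), ...]."""
    fields = []
    for match in re.finditer(r"y+|m+|d+", fmt):
        run = match.group()
        # A single code uses the default width; a repeated code uses its length.
        width = DEFAULT_WIDTH[run[0]] if len(run) == 1 else len(run)
        fields.append((run[0], width))
    return fields


def split_date(fmt: str, text: str):
    """Map the digits in `text` onto the fields named by `fmt`."""
    fields = parse_format(fmt)
    chunks = re.findall(r"\d+", text)
    if len(chunks) == len(fields):
        # Fields are separated by punctuation or spaces: assign them in order.
        return {code: chunk for (code, _), chunk in zip(fields, chunks)}
    # Otherwise treat the text as one run of digits and slice by expected width.
    digits = "".join(chunks)
    if len(digits) != sum(width for _, width in fields):
        raise ValueError(f"date text {text!r} does not match format {fmt!r}")
    result, pos = {}, 0
    for code, width in fields:
        result[code] = digits[pos:pos + width]
        pos += width
    return result


print(split_date("yyyymmdd", "2019-09-30"))
print(split_date("dm", "30-9"))
```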

To build more complex SSML scenarios, see the Google Cloud and Amazon Polly documentation pages.

Testing your SSML settings

You can test your SSML settings by placing a call to your application. Before you call, make sure the application is bound to a phone number or SIP client.

© 2020, All rights reserved.