Programming Standards for Speech Cheat Sheet by



Explore and learn about voice and multimodal standards without making a huge investment in time and money. Even if you find a standard doesn’t fit the bill, what you learn might be relevant for the next application. And if a standard does meet your requirements, then your project will have a big leg up over applications where the components are proprietary.

The following are some of the more prominent standards for speech, natural language, and multimodal systems.

Speech Synthesis Markup Language (SSML)

A W3C standard that puts the finishing touches on how speech is pronounced at a detailed level. SSML instructs the text-to-speech (TTS) synthesizer about specifics of pronunciation, such as which words should be emphasized and where pauses should fall. These adjustments can make a huge difference in how the synthesized speech affects listeners, much like the difference between the line readings of a skilled actor and those of an ordinary person. A moving speech recited by James Earl Jones will likely sound flat and unconvincing in the voice of your neighbor’s 15-year-old son. With SSML, developers aren’t limited to a TTS system’s default pronunciations; they can shape the output to sound the way they want.

SSML is widely supported by TTS systems. The TTS engines used in the Amazon Alexa Skills Kit, Microsoft Cognitive Services, IBM Watson, and Nuance cloud services all accept SSML commands as an option, and most of these products offer online demos. An open-source TTS platform, the Mary system (http://mary.dfki.de/), allows you to experiment with SSML. Authoring SSML directly can be difficult, but authoring tools like the Chant VoiceMarkup Kit or the open-source SSML builder on GitHub can help.
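The emphasis and pause adjustments described in this section can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the helper name `build_ssml` and the sample sentence are made up for the example, and individual TTS engines may support only a subset of SSML or expect slightly different attributes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace defined by the W3C SSML recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before: str, emphasized: str, after: str, pause_ms: int = 500) -> str:
    """Return SSML that stresses one phrase and pauses right after it.

    build_ssml is a hypothetical helper written for this sketch.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
        f"{escape(before)} "
        f'<emphasis level="strong">{escape(emphasized)}</emphasis>'
        f'<break time="{pause_ms}ms"/> '
        f"{escape(after)}"
        "</speak>"
    )


ssml = build_ssml("I have", "never", "seen anything like it.")

# Parsing the result is a cheap well-formedness check before sending it to a TTS engine.
root = ET.fromstring(ssml)
emphasis = root.find(f"{{{SSML_NS}}}emphasis")
pause = root.find(f"{{{SSML_NS}}}break")
print(ssml)
```

The resulting string could be pasted into one of the online demos mentioned above to hear how the emphasis and the pause change the reading.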

State Chart Extensible Markup Language (SCXML)

A popular standard with open-source support. SCXML is a powerful tool for defining state-based speech and multimodal dialogues. When a user says something or interacts with the screen, an SCXML-based system can react and move to a new state, triggering a display change or a spoken prompt. The state-based approach is helpful for defining how users progress through an app.

SCXML resources are available for many platforms, including server, desktop, and mobile apps. On a server or desktop, Apache Commons SCXML, a Java-based interpreter, and PySCXML (for Python) are options. JavaScript-based SCXML interpreters like SCION can be used directly in browsers.

To make authoring SCXML easier, several editors and visualizers have been developed, such as SCXMLGUI and VisualSC.
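To make the state-based idea concrete, here is a toy sketch in Python that reads a small SCXML document and follows its transitions as events arrive. It is not a conforming SCXML interpreter: it assumes flat states and exact event-name matches only, and the state names, event names, and helper functions are invented for the example. A real project would use one of the interpreters named above.

```python
import xml.etree.ElementTree as ET

NS = {"sc": "http://www.w3.org/2005/07/scxml"}

# Hypothetical three-state dialogue; state and event names are made up.
DIALOGUE = """
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="welcome">
  <state id="welcome">
    <transition event="user.spoke" target="listening"/>
  </state>
  <state id="listening">
    <transition event="user.spoke" target="listening"/>
    <transition event="user.tapped" target="showing_results"/>
  </state>
  <final id="showing_results"/>
</scxml>
"""


def load_transitions(scxml_text):
    """Return (initial_state, {(state, event): target}) for flat states only."""
    root = ET.fromstring(scxml_text.strip())
    table = {}
    for state in root.findall("sc:state", NS):
        for transition in state.findall("sc:transition", NS):
            table[(state.get("id"), transition.get("event"))] = transition.get("target")
    return root.get("initial"), table


def run(scxml_text, events):
    """Feed events in order and return the state the dialogue ends in."""
    state, table = load_transitions(scxml_text)
    for event in events:
        # Events with no matching transition leave the state unchanged.
        state = table.get((state, event), state)
    return state


final_state = run(DIALOGUE, ["user.spoke", "user.tapped"])
print(final_state)
```

In an actual application, each state change would be the point at which the system updates the display or plays a spoken prompt.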


VoiceXML

VoiceXML, the standard for defining voice dialogues, is widely implemented and needs no introduction. An open-source implementation, JVoiceXML, is available and would be a good way to start experimenting with VoiceXML.
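For readers who have not seen a VoiceXML document, the following Python sketch generates about the smallest useful one: a form with a single spoken prompt. The helper name `build_greeting_vxml` and the prompt text are assumptions made for illustration, and the output has only been checked for well-formedness here, not run against JVoiceXML or any other voice browser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting_vxml(prompt_text: str) -> str:
    """Return a minimal VoiceXML document that speaks one prompt.

    build_greeting_vxml is a hypothetical helper written for this sketch.
    """
    return (
        f'<vxml version="2.1" xmlns="{VXML_NS}">\n'
        '  <form id="greeting">\n'
        "    <block>\n"
        f"      <prompt>{escape(prompt_text)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


doc = build_greeting_vxml("Welcome. How can I help you?")

# Round-trip through a parser to confirm the document is well-formed.
prompt = ET.fromstring(doc).find(f".//{{{VXML_NS}}}prompt")
print(doc)
```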

While SSML, SCXML, and VoiceXML are probably the most widely implemented speech standards, implementations of some of the newer standards can also be found.

The Multimodal Architecture

An approach to integrating multimodal functions like speech recognition, emotion recognition, and face recognition into interactive systems. Java, JavaScript, and ActionScript libraries for this standard are available at https://github.com/w3c/mmi.

Emotion Markup Language (EmotionML)

EmotionML, a language for representing emotion, has also been implemented in the Mary TTS system. EmotionML can change pronunciation at a higher level than SSML; rather than emphasizing a word, EmotionML can make a voice sound angry or happy. The Mary website has an online demo for using EmotionML to tweak the emotions expressed by a synthetic voice.
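As a rough illustration of tagging text with an emotion rather than marking up individual words, the Python sketch below wraps a sentence in an EmotionML `emotion` element. The helper name `build_emotionml` and the sample sentence are invented for the example, and the choice of the W3C "everyday categories" vocabulary (with names such as `happy` and `angry`) is an assumption; check which vocabularies a given TTS system accepts before relying on it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
# Assumed vocabulary: the W3C "everyday categories" emotion set.
EVERYDAY_CATEGORIES = "http://www.w3.org/TR/emotion-voc/xml#everyday-categories"


def build_emotionml(text: str, category: str) -> str:
    """Return an EmotionML document that labels `text` with one emotion category.

    build_emotionml is a hypothetical helper written for this sketch.
    """
    return (
        f'<emotionml version="1.0" xmlns="{EMOTIONML_NS}" '
        f'category-set="{EVERYDAY_CATEGORIES}">'
        f"<emotion><category name={quoteattr(category)}/>{escape(text)}</emotion>"
        "</emotionml>"
    )


doc = build_emotionml("Nice to see you again.", "happy")

# Parse the result to confirm it is well-formed and inspect the category.
category = ET.fromstring(doc).find(f".//{{{EMOTIONML_NS}}}category")
print(doc)
```

Swapping `"happy"` for `"angry"` changes only the label; how the voice actually renders that emotion is up to the synthesizer.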


          More Cheat Sheets by Davidpol