VOOZH about

URL: https://www.w3.org/TR/voicexml30/

⇱ Voice Extensible Markup Language (VoiceXML) 3.0


👁 W3C

http://www.w3.org/TR/2010/WD-voicexml30-20101216/
Latest version:
http://www.w3.org/TR/voicexml30/
Previous version:
http://www.w3.org/TR/2010/WD-voicexml30-20100831/
Editors:
Scott McGlashan, Hewlett-Packard (co-Editor-in-Chief)
Daniel C. Burnett, Voxeo (co-Editor-in-Chief)
Rahul Akolkar, IBM
RJ Auburn, Voxeo
Paolo Baggia, Loquendo
Jim Barnett, Genesys Telecommunications Laboratories
Michael Bodell, Microsoft
Jerry Carter, Nuance
Mangesh Deshmukh, Alcatel-Lucent
Matt Oshry, Microsoft
Kenneth Rehor, Cisco
Xu Yang, Aspect
Milan Young, Nuance
Rafah Hosn (until 2008, when at IBM)

Copyright © 2010 ® (, , Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.


W3C technical reports index at http://www.w3.org/TR/.

This is the 16 December 2010 Eighth Public Working Draft of "Voice Extensible Markup Language (VoiceXML) 3.0". The main differences from the previous draft are described in Appendix F Major changes since the last Working Draft. A diff-marked version of this document is also available for comparison purposes.

This document is very much a work in progress. Many sections are incomplete, only stubbed out, or missing entirely. To get early feedback, the group focused on defining enough functionality, modules, and profiles to demonstrate the general framework. To complete the specification, the group expects to introduce additional functionality (for example speaker identification and verification, external eventing) and describe the existing functionality at the level of detail given for the Prompt and Field modules. We explicitly request feedback on the framework, particularly any concerns about its implementability or suitability for expected applications. By early 2011 the group expects all key capabilities to be present in the specification, with details worked out by late 2011.

Applications written as 2.1 documents can be used under a 3.0 processor using the 2.1 profile. As an example, the Implementation Report tests for 2.1 (which includes the IR tests for 2.0) will be supported on a 3.0 processor. Exceptions will be clarifications and changes needed to improve interoperability.

This document is a W3C Working Draft. It has been produced as part of the Voice Browser Activity. The authors of this document are participants in the Voice Browser Working Group. For more information see the Voice Browser FAQ. The Working Group expects to advance this Working Draft to Recommendation status.

Comments are welcome on www-voice@w3.org (archive). See W3C mailing list and archive usage guidelines.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Terminology
2 Overview
    2.1 Structure of VoiceXML 3.0
    2.2 Structure of this document
    2.3 How to read this document
3 Data Flow Presentation (DFP) Framework
    3.1 Data
    3.2 Flow
    3.3 Presentation
4 Core Concepts
    4.1 Syntactic and Semantic descriptions
    4.2 Resources, Resource Controllers, and Events
        4.2.1 Top Level Controller
    4.3 Syntax
    4.4 Event Model
        4.4.1 Internal Events
            4.4.1.1 Event Interfaces
                4.4.1.1.1 Event
                4.4.1.1.2 EventTarget
                4.4.1.1.3 EventListener
            4.4.1.2 Event Flow
                4.4.1.2.1 Event Listener Registration
                4.4.1.2.2 Event Listener Activation
        4.4.2 External Events
    4.5 Document Initialization and Execution
        4.5.1 Initialization
            4.5.1.1 DOM Processing
            4.5.1.2 Preparation for Execution
        4.5.2 Execution
            4.5.2.1 Subdialogs
            4.5.2.2 Application Root
            4.5.2.3 Summary of Syntax/Semantics Interaction
        4.5.3 Transition Controllers
5 Resources
    5.1 Datamodel Resource
        5.1.1 Data Model Resource API
    5.2 Prompt Queue Resource
        5.2.1 State Chart Representation
        5.2.2 SCXML Representation
        5.2.3 Defined Events
        5.2.4 Device Events
        5.2.5 Open Issue
    5.3 Recognition Resources
        5.3.1 Definition
        5.3.2 Defined Events
        5.3.3 Device Events
        5.3.4 State Chart Representation
        5.3.5 SCXML Representation
    5.4 Connection Resource
        5.4.1 Definition
        5.4.2 Final Processing State
        5.4.3 Defined Events
        5.4.4 State Chart Representation
        5.4.5 SCXML Representation
    5.5 Timer Resource
        5.5.1 Definition
        5.5.2 Defined Events
        5.5.3 Device Events
        5.5.4 State Chart Representation
6 Modules
    6.1 Grammar Module
        6.1.1 Syntax
            6.1.1.1 Attributes
            6.1.1.2 Content Model
        6.1.2 Semantics
            6.1.2.1 Definition
            6.1.2.2 Defined Events
            6.1.2.3 External Events
            6.1.2.4 State Chart Representation
            6.1.2.5 SCXML Representation
        6.1.3 Events
        6.1.4 Examples
    6.2 Inline SRGS Grammar Module
        6.2.1 Syntax
        6.2.2 Semantics
            6.2.2.1 Definition
            6.2.2.2 Defined Events
            6.2.2.3 External Events
            6.2.2.4 State Chart Representation
            6.2.2.5 SCXML Representation
        6.2.3 Events
        6.2.4 Examples
    6.3 External Grammar Module
        6.3.1 Syntax
            6.3.1.1 Attributes
            6.3.1.2 Content Model
        6.3.2 Semantics
            6.3.2.1 Definition
            6.3.2.2 Defined Events
            6.3.2.3 External Events
            6.3.2.4 State Chart Representation
            6.3.2.5 SCXML Representation
        6.3.3 Events
        6.3.4 Examples
    6.4 Prompt Module
        6.4.1 Syntax
            6.4.1.1 Attributes
            6.4.1.2 Content Model
        6.4.2 Semantics
            6.4.2.1 Definition
            6.4.2.2 Defined Events
            6.4.2.3 External Events
            6.4.2.4 State Chart Representation
            6.4.2.5 SCXML Representation
        6.4.3 Events
        6.4.4 Examples
    6.5 Builtin SSML Module
        6.5.1 Syntax
        6.5.2 Semantics
        6.5.3 Examples
    6.6 Media Module
        6.6.1 Syntax
            6.6.1.1 Attributes
            6.6.1.2 Content Model
                6.6.1.2.1 Tips (informative)
        6.6.2 Semantics
        6.6.3 Examples
    6.7 Parseq Module
        6.7.1 Syntax
        6.7.2 Semantics
        6.7.3 Examples
    6.8 Foreach Module
        6.8.1 Syntax
            6.8.1.1 Attributes
            6.8.1.2 Content Model
        6.8.2 Semantics
        6.8.3 Examples
    6.9 Form Module
        6.9.1 Syntax
        6.9.2 Semantics
            6.9.2.1 Form RC
                6.9.2.1.1 Definition
                6.9.2.1.2 Defined Events
                6.9.2.1.3 External Events
                6.9.2.1.4 State Chart Representation
                6.9.2.1.5 SCXML Representation
    6.10 Field Module
        6.10.1 Syntax
        6.10.2 Semantics
            6.10.2.1 Field RC
                6.10.2.1.1 Definition
                6.10.2.1.2 Defined Events
                6.10.2.1.3 External Events
                6.10.2.1.4 State Chart Representation
                6.10.2.1.5 SCXML Representation
            6.10.2.2 PlayandRecognize RC
                6.10.2.2.1 Definition
                6.10.2.2.2 Defined Events
                6.10.2.2.3 External Events
                6.10.2.2.4 State Chart Representation
                6.10.2.2.5 SCXML Representation
    6.11 Builtin Grammar Module
        6.11.1 Usage of Platform Grammars
        6.11.2 Platform Requirements
        6.11.3 Syntax and Semantics
        6.11.4 Examples
    6.12 Data Access and Manipulation Module
        6.12.1 Overview
        6.12.2 Semantics
            6.12.2.1 The scope stack
            6.12.2.2 Relevance of scope stack to properties
            6.12.2.3 Implicit variables
            6.12.2.4 Variable resolution
            6.12.2.5 Standard session variables
            6.12.2.6 Standard application variables
            6.12.2.7 Legal variable values and expressions
        6.12.3 Syntax
            6.12.3.1 Creating variables: the <var> element
            6.12.3.2 Reading variables: "expr" and "cond" attributes and the <value> element
                6.12.3.2.1 Inserting variable values in prompts: The <value> element
            6.12.3.3 Updating variables: the <assign> and <data> elements
                6.12.3.3.1 The <assign> element
                6.12.3.3.2 The <data> element
            6.12.3.4 Deleting variables: the <clear> element
            6.12.3.5 Relevance for properties
        6.12.4 Backward compatibility with VoiceXML 2.1
        6.12.5 Implicit functions using XPath
    6.13 External Communication Module
        6.13.1 Receiving external messages within a voice application
            6.13.1.1 External Message Reflection
            6.13.1.2 Receiving External Messages Asynchronously
            6.13.1.3 Receiving External Messages Synchronously
                6.13.1.3.1 <receive>
        6.13.2 Sending messages from a voice application
            6.13.2.1 sendtimeout
    6.14 Session Root Module
        6.14.1 Syntax
        6.14.2 Semantics
        6.14.3 Examples
    6.15 Run Time Control Module
        6.15.1 <rtc>
            6.15.1.1 Syntax
        6.15.2 <cancelrtc>
            6.15.2.1 Syntax
        6.15.3 Semantics
        6.15.4 Examples
    6.16 SIV Module
        6.16.1 SIV Core Functions
        6.16.2 Syntax
        6.16.3 Semantics
            6.16.3.1 Definition
            6.16.3.2 Defined Events
            6.16.3.3 External Events
            6.16.3.4 State Chart Representation
        6.16.4 Events
        6.16.5 Examples
    6.17 Subdialog Module
        6.17.1 Syntax
        6.17.2 Semantics
        6.17.3 Examples
    6.18 Disconnect Module
        6.18.1 Syntax
            6.18.1.1 Attributes
            6.18.1.2 Content Model
        6.18.2 Semantics
            6.18.2.1 Definition
            6.18.2.2 Defined Events
            6.18.2.3 External Events
            6.18.2.4 State Chart Representation
            6.18.2.5 SCXML Representation
        6.18.3 Example
    6.19 Play Module
        6.19.1 Semantics
            6.19.1.1 Definition
            6.19.1.2 Defined Events
            6.19.1.3 External Events
            6.19.1.4 State Chart Representation
            6.19.1.5 SCXML Representation
    6.20 Record Module
        6.20.1 Syntax
            6.20.1.1 Attributes
            6.20.1.2 Content Model
            6.20.1.3 Data Model Variables
        6.20.2 Semantics
            6.20.2.1 RecordInputItem RC
                6.20.2.1.1 Definition
                6.20.2.1.2 Defined Events
                6.20.2.1.3 External Events
                6.20.2.1.4 State Chart Representation
                6.20.2.1.5 SCXML Representation
            6.20.2.2 Record RC
                6.20.2.2.1 Definition
                6.20.2.2.2 Defined Events
                6.20.2.2.3 External Events
                6.20.2.2.4 State Chart Representation
                6.20.2.2.5 SCXML Representation
    6.21 Property Module
        6.21.1 Syntax
            6.21.1.1 Attributes
            6.21.1.2 Content Model
        6.21.2 Semantics
            6.21.2.1 Definition
            6.21.2.2 Defined Events
            6.21.2.3 External Events
            6.21.2.4 State Chart Representation
            6.21.2.5 SCXML Representation
        6.21.3 Events
        6.21.4 Examples
    6.22 Transition Controller Module
        6.22.1 Syntax
            6.22.1.1 Attributes
            6.22.1.2 Content Model
        6.22.2 Semantics
            6.22.2.1 Definition
            6.22.2.2 Defined Events
            6.22.2.3 External Events
            6.22.2.4 State Chart Representation
            6.22.2.5 SCXML Representation
        6.22.3 Events
        6.22.4 Examples
7 Profiles
    7.1 Legacy Profile
        7.1.1 Conformance
            7.1.1.1 Vxml Root Module Requirements
            7.1.1.2 Form Module Requirements
            7.1.1.3 Field Module Requirements
            7.1.1.4 Prompt Module Requirements
            7.1.1.5 Grammar Module Requirements
            7.1.1.6 Data Access and Manipulation Module Requirements
        7.1.2 Convenience Syntax
        7.1.3 Default Handlers and Transition Controllers
    7.2 Basic Profile
        7.2.1 Introduction
        7.2.2 What the Basic Profile includes
            7.2.2.1 SIV functions
            7.2.2.2 Presentation functions
            7.2.2.3 Capture functions
            7.2.2.4 Other modules
        7.2.3 Returned results
        7.2.4 What the Basic Profile does not include
        7.2.5 Examples
    7.3 Maximal Profile
    7.4 Enhanced Profile
    7.5 Convenience Syntax (Syntactic Sugar)
8 Environment
    8.1 Resource Fetching
        8.1.1 Fetching
        8.1.2 Caching
            8.1.2.1 Controlling the Caching Policy
        8.1.3 Prefetching
        8.1.4 Protocols
    8.2 Properties
        8.2.1 Speech Recognition Properties
        8.2.2 DTMF Recognition Properties
        8.2.3 Prompt and Collect Properties
        8.2.4 Media Properties
        8.2.5 Fetch Properties
        8.2.6 Miscellaneous Properties
    8.3 Speech and DTMF Input Timing Properties
        8.3.1 DTMF Grammars
            8.3.1.1 timeout, No Input Provided
            8.3.1.2 interdigittimeout, Grammar is Not Ready to Terminate
            8.3.1.3 interdigittimeout, Grammar is Ready to Terminate
            8.3.1.4 termchar and interdigittimeout, Grammar Can Terminate
            8.3.1.5 termchar Empty When Grammar Must Terminate
            8.3.1.6 termchar Non-Empty and termtimeout When Grammar Must Terminate
            8.3.1.7 termchar Non-Empty and termtimeout When Grammar Must Terminate
            8.3.1.8 Invalid DTMF Input
        8.3.2 Speech Grammars
            8.3.2.1 timeout When No Speech Provided
            8.3.2.2 completetimeout With Speech Grammar Recognized
            8.3.2.3 incompletetimeout with Speech Grammar Unrecognized
    8.4 Value Designations
        8.4.1 Integers
        8.4.2 Real Numbers
        8.4.3 Times
9 Integration with Other Markup Languages
    9.1 Embedding of VoiceXML within SCXML
    9.2 Integrating Flow Control Languages into VoiceXML
        9.2.1 SCXML for Dialog Management
            9.2.1.1 System-driven Dialog
            9.2.1.2 User-driven Dialog
        9.2.2 Graceful Degradation
        9.2.3 SCXML as Basis for Recursive MVC

Acknowledgements
B References
    B.1 Normative References
    B.2 Informative References
C Glossary of Terms
D VoiceXML 3.0 XML Schema
    D.1 Schema for VXML Root Module
    D.2 Schema for Form Module
    D.3 Schema for Field Module
    D.4 Schema for Prompt Module
    D.5 Schema for Builtin SSML Module
    D.6 Schema for Foreach Module
    D.7 Schema for Data Access and Manipulation Module
    D.8 Schema for Legacy Profile
E Convenience Syntax in VoiceXML 2.x
    E.1 Simplified Dialog Structure
    E.2 Examples
        E.2.1 <menu> with <choice>
        E.2.2 Equivalent <form>, <field>, <option>
        E.2.3 Equivalent <form>, <field>, <grammar>
F Major changes since the last Working Draft


[RFC2119] and indicate required levels for compliant VoiceXML 3.0 implementations.

Terms used in this specification are defined in Appendix C Glossary of Terms.

3 Data Flow Presentation (DFP) Framework presents the Data-Flow-Presentation Framework, its importance for the development of VoiceXML 3.0 and how VoiceXML 3.0 fits into the model.

4 Core Concepts explains the core concepts underlying the new structure for VoiceXML, including resources, resource controllers, the relationship between syntax and semantics, DOM eventing, modules and profiles.

5 Resources presents the resources defined for the language. These provide the key presentation-related functionality in the language.

6 Modules presents the modules defined for the language. Each module consists of a syntax piece (with its user-visible events), a semantics piece (with its behind-the-scenes events) and a description of how the two are connected.

7 Profiles presents two profiles. The first, the VoiceXML 2.1 profile, shows how a language similar to VoiceXML 2.1 can be created using the structure and functionality of VoiceXML 3.0. The second, the Basic profile, leaves out higher-level flow control constructs such as <form> and the associated Form Interpretation Algorithm.

The Appendices provide useful references and a glossary of terms used in the specification.

3 Data Flow Presentation (DFP) Framework. The data-flow- presentation distinction applies not only to VoiceXML 3.0, but to many of W3C's specifications. Understanding VoiceXML's role as a presentation language is crucial context for understanding the rest of the specification.

For application authors: we recommend that you begin with syntax and only gradually explore details of the semantics as you need to understand behavioral specifics.

  1. If you are familiar with VoiceXML 2 you might want to begin with the Legacy profile in 7.1 Legacy Profile to see an example of all the syntactic pieces in the finished profile.
  2. You should then review the syntax sections of each of the modules in 6 Modules, along with the Basic profile in 7.2 Basic Profile. When you need to understand how a bit of syntax is implemented, read the semantics section corresponding to that syntax.
  3. Along the way you will definitely want to review the parts of 4 Core Concepts that are relevant to your other reading (profiles, modules, syntax, semantics, and DOM eventing).

For VoiceXML platform developers: we recommend that you begin with the functionality and framework and only focus on syntax later.

  1. If you are familiar with VoiceXML 2 you might want to begin with the Legacy profile in 7.1 Legacy Profile to see the user-visible differences between the original VoiceXML 2.1 language and the new Legacy profile. A brief review of the Basic profile in 7.2 Basic Profile would be good as well.
  2. Next you should review 4 Core Concepts in detail, since the rest of the language is built upon the framework described there.
  3. 5 Resources and 6 Modules (the semantics part) should be the bulk of your focus. Remember that they are semantic descriptions only and that you can implement the functionality any way you wish as long as the semantics remain the same.
  4. One significant difference from VoiceXML 2.1 is support for DOM 4.4 Event Model .

[DFP]) Framework.

Although VoiceXML 3.0 is a presentation language, it also contains within it all 3 levels of the DFP framework ( Figure 6).

👁 DFP Architecture

Figure 6: DFP Architecture

The Data Flow Presentation (DFP) Framework is an instance of the Model-View-Controller paradigm, where computation and control flow are kept distinct from application data and from the way in which the application communicates with the outside world. This partitioning of an application allows for any one layer to be replaced independently of the other two. In addition, it is possible to simultaneously make use of more than one Data (Model) language, Flow (Controller), and/or Presentation (View) language.

  1. D VoiceXML 3.0 XML Schema. The events are DOM level 3 events. This document provides a textual description of each element, attribute, and event.
  2. Semantics level -- The semantics of each module is described in terms of resources, resource controllers, and semantic events that the resource controllers may generate and consume. Semantics is described by both UML state chart visual diagrams and SCXML representations.

The visual UML state chart diagrams are informative. They are included for ease of reading and quick understanding. The more detailed textual SCXML representations are normative.

It is important to note that this model places no burden or requirements that a VoiceXML interpreter must implement behavior as described in the model. Rather, the requirement is that the behavior must be the same as if it were implemented as described, but it is permitted to have optimizations or different architecture behind the implementation of the markup interpretation.

The semantic descriptions are important for reasons including the following:

  • Enable alternative implementations of a module. The modular description in this document describes the semantics — "what" a module does, but not "how" a module is implemented. A module may be implemented differently on different platforms.
  • Enable alternative syntax for the semantic description code. For example, one platform developer might use a particular syntax for server-based modules, while another platform developer might use an alternative syntax for an embedded mobile platform.
  • The same semantic descriptions can be reused in multiple modules. For example, the semantic description of the 6.4 Prompt Module can be reused (referenced) from within the 5.2 Prompt Queue Resource.

4.4 Event Model . The interaction between actual DOM events and logical SCXML events is described in 4.5 Document Initialization and Execution, below.

Each VoiceXML 3.0 module is described using SCXML notation and optionally a UML state chart representation of its underlying behavior expressed in terms of resources and resource controllers. While the resources and resource controllers are not exposed directly in the markup, they are used to define the semantics of VoiceXML 3.0 markup elements. Figure 7 illustrates the relationship among resource controllers, resources, and media devices. The arrows represent events exchanged among components. A more concrete example is represented in Figure 8 which illustrates the Prompt Resource controller (further defined in 6.4.2 Semantics), the PromptQueue Resource, and the SSML Media Player.

👁 Semantic model overview

Figure 7: Semantic model with Resources and Resource Controllers

👁 Semantic model details

Figure 8: Semantic model with Specific Examples

[DOM3Events] specification. DOM Level 3 Events offer a robust set of interfaces for managing the listener registration, dispatching, propagation, and handling of events, as well as a description of how events flow through an XML tree.

The DOM 3.0 event model offers VoiceXML developers a rich set of interfaces that allow them to easily add behavior to their applications. In addition, conforming to the standard DOM event model enables authors to integrate their Voice applications in next generation multimodal or multi-namespaced frameworks such as MMI and CDF with minimal efforts. Note that the VXML 2.0 style events are supported through a new DOM event named 'vxmlevent', and if this vxmlevent is uncanceled then the default action is to run the VXML 2.0 event handling.

Within the VoiceXML 3.0 semantic model, the DOM Level 3 Events APIs are available to all Resource Controllers that have markup elements associated with them. Indeed, this section covers the eventing APIs as available to VoiceXML 3.0 markup elements. The following section describes how the semantic model ties in with the DOM eventing model.

Event interface to support voice specific event information. In particular, the VoiceXML 3.0 Event interface supports a count integer that stores the number of times a resources emits a particular event type. The semantic model manages the count field by incrementing its value and resetting it as described in the section that follows.

Editorial note
Open Issue: Because we now are using the 'vxmlevent' DOM event, we don't need to add a count to the generic DOM events (and thus change the generic DOM events). Instead, we need to specify the count as one of the properties of the vxmlevent event.

EventTarget interface.This interface allows registration and removal of event listeners as well as dispatching of events.

EventListener interface. This interface allows the activation of handlers associated with a particular event. When a listener is activated, the event handler execution is done in the semantic model as described in the section that follows.

event listener group; all events within a group are ordered. As such, in VoiceXML 3.0, event listeners are registered as they are encountered in the document. Furthermore, all event listeners registered on an element belong to the same default group. Both of these provisions ensure that event handlers will execute in document order.

[MMI]. These life cycle events allow the flow component of the DFP architecture to control the presentation layer by starting and stopping the processing of markup. By handling these events, the VoiceXML interpreter acts as a 'modality component' in the multimodal architecture, while the flow component acts as an 'interaction manager'. As a result, VoiceXML 3 applications can be easily extended into multimodal applications. However it is important to note that support for the life cycle events is required by the DFP framework in all applications, whether uni- or multimodal.

The interpreter must handle the following life cycle events automatically:

  • PrepareRequest. This event instructs the interpreter to prepare to run a VoiceXML script. The event contains: a) either the URI of the markup to run or the actual markup itself, b) a context ID which will be used in subsequent messages referring to the same markup, and c) a specification of media channel to use. Note that the interpreter does not actually start running the markup in question when it receives this message. Once the interpreter has finished its preparation, it sends a PrepareResponse event in reply. The PrepareResponse event contains the context ID that was sent in the PrepareRequest plus a status field containing 'success' or 'failure'. This message with a status of 'success' thus indicates that the interpreter is now ready to run the specified markup.
  • StartRequest. This event instructs the interpreter to run the specified script. It will contain the same context ID as the preceding PrepareRequest, and may optionally contain a new specification of the markup to run and the media channel to use, overriding those contained in the PrepareRequest. This event may also be sent without a preceding PrepareRequest. When the interpreter receives this event, it must start running the specified markup using the specified media channel. It will then send a StartResponse event in reply.
  • CancelRequest with 'immediate' flag set to true. This event instructs the interpreter to disconnect from the media and to stop processing the markup. This event contains the context ID. The interpreter may continue processing clean-up handlers etc. after it receives this event, but it should not end any events back to the sender other than the CancelResponse, which acknowledges the receipt of this command. If the 'immediate' flag is set to false, the interpreter passes the event up to be handled by author code, as described below.
  • ClearContextRequest???

All other life cycle events and all other external events are ignored unless the External Communications Module 6.13 External Communication Module is included in the profile. If the External Communications Module is present, all other external events are passed up to the application, placed in the application event queue and then handled as specified by the developer using the functionality defined in that module.

Editorial note
Open Issue: Should ClearContextRequest be handled automatically? Should Done be sent automatically when the document is finished? Where do these response events get sent?

3.2 Flow). Furthermore, these resource events are conceptual, not DOM events: they are used to define relationship with other conceptual entities and are not exposed at the markup level.

The following resources are defined: data model (5.1 Datamodel Resource), prompt queue (5.2 Prompt Queue Resource), recognition -- DTMF, ASR, and SIV (5.3 Recognition Resources), connection (5.4 Connection Resource), and timer (5.5 Timer Resource).

Details (members only).

Resolution:

None recorded.

6.1.1 Syntax. Its semantics are specified in 6.1.2 Semantics.

Editorial note

Issue: Grammar processing will need to know the Base URI to resolve relative references.

D VoiceXML 3.0 XML Schema for schema definitions].

6.2.1 Syntax. Its semantics are specified in 6.2.2 Semantics.

[SRGS]), minus the XML Prolog. Note that both elements and attributes must be in the SRGS namespace (http://www.w3.org/2001/06/grammar).

6.3.1 Syntax. Its semantics are specified in 6.3.2 Semantics.

D VoiceXML 3.0 XML Schema for schema definitions].

6.3.1.2 Content Model for restrictions on occurrence of src and srcexpr attributes.

The value of the src attribute is a URI specifying the location of the grammar with an optional fragment for the rulename. Section 2.2 of the Speech Recognition Grammar Specification [SRGS] defines several forms of rule reference. The following are the forms that are permitted on a grammar element in VoiceXML:

  • Reference to a named rule in an external grammar: src attribute is an absolute or relative URI reference to a grammar which includes a fragment with a rulename. This form of rule reference to an external grammar follows the behavior defined in Section 2.2.2 of [SRGS]. If the URI cannot be fetched or if the rulename is not defined in the grammar or is not a public (activatable) rule of that grammar then an error.badfetch is thrown.
  • Reference to the root rule of an external grammar: src attribute is an absolute or relative URI reference to a grammar but does not include a fragment identifying a rulename. This form implicitly references the root rule of the grammar as defined in Section 2.2.2 of [SRGS]. If the URI cannot be fetched or if the grammar cannot be referenced by its root (see Section 4.7 of [SRGS]) then an error.badfetch is thrown.

The following are the forms of rule reference defined by [SRGS] that are not supported in VoiceXML 3.

  • Local rule reference: a fragment-only URI is not permitted. (See definition in Section 2.2.1 of [SRGS]). A fragment-only URI value for the src attribute causes an error.semantic event.
  • Reference to special rules: there is no support for special rule references (NULL, VOID, GARBAGE) on the <grammar> element itself. In the XML form of the SRGS specification, the only way to include NULL, VOID, and GARBAGE is via the use of the "special" attribute on the <ruleref> element. Thus, it is not possible to reference individual uses of NULL, VOID, and GARBAGE rules in a separate SRGS document, since that would require a fragment identifier to place on the end of the URI referencing the document, which in turn would require an id within that document for the given use of NULL, VOID, or GARBAGE. Note that the external grammar referenced from the <grammar> element may itself be an SRGS grammar that contains a <ruleref> element with a special attribute to reference NULL, VOID, or GARBAGE.

6.5 Builtin SSML Module, 6.6 Media Module and 6.7 Parseq Module).

The attributes and content model of <prompt> are specified in 6.4.1 Syntax. Its semantics are specified in 6.4.2 Semantics, including how the final prompt content is determined and how the prompt is queued for playback using the PromptQueue Resource (5.2 Prompt Queue Resource).

D VoiceXML 3.0 XML Schema for schema definitions].

6.4 Prompt Module.

The attributes and content model of SSML elements are specified in 6.5.1 Syntax. Its semantics are specified in 6.5.2 Semantics, including how elements are evaluated to yield final content for playback.

D VoiceXML 3.0 XML Schema for schema definitions].

This module defines an SSML ([SSML]) Conforming Speech Synthesis Markup Language Fragment where:

  • there is no explicit <speak> element.
  • these elements are part of the VoiceXML namespace rather than the SSML namespace
  • the <foreach> element may appear inside these elements prior to evaluation
  • the <value> element may appear inside these elements prior to evaluation
  • The <audio> element is extended with the attributes specified in Table 33, in addition to the fetchtimeout, fetchhint, maxage, and maxstale attributes as specified in 8.1.1 Fetching.

Conforming Speech Synthesis Markup Language Processor.

Evaluation consits of the following:

  • data model expressions are evaluated against the data model.
  • If a <foreach> element is present, it is evaluated so as to yield content for each defined item in the array.
  • If an <audio> element has a specified expr attribute, then the attribute value is evaluated to provide a URI value for the src attribute. If the expr evaluation results in anything other than a valid URI string, then the expr attribute and its content are removed. ignored.
  • If a <value> element is present, its expr attribute is evaluated to return a CDATA value.
  • construction of a <speak> element with appropriate version, namespace, and xml:lang attributes. The xml:lang attribute value is inherited from the <prompt> element (see 6.4 Prompt Module).
  • if an unsupported language is encountered, the platform throws an error.unsupported.language event which specifies the language in its message variable
Editorial note

Need to specify further error cases

Need to clarify unsupported languages and external (e.g. MRCP) SSML processors.

6.4 Prompt Module).

The <media> element can be seen as an enhanced and generalized version of the VoiceXML <audio> element. It is enhanced in that it provides additional attributes describing the type of media, conditional selection, as well as control over playback . It is a generalization of the <audio> element in that it permits media other than audio to be played; for example, media formats which contains audio and video tracks.

D VoiceXML 3.0 XML Schema for schema definitions].

8.1.1 Fetching.

occurrence constraints for restrictions on occurrence of src and srcexpr attributes.

Calculations of rendered durations and interaction with other timing properties follow SMIL 2.1 Computing the active duration where

  • <media> is a time container
  • Time Designation values for clipBegin, clipEnd, and repeatDur are a subset of SMIL Clock-value
  • If the length of an media clip is not known in advance then it is treated as indefinite. Consequently repeatCount will have no effect.
  • If clipEnd is after the end of the media, then rendering ends at the media end.
  • If clipBegin is after clipEnd, no audio will be produced.
  • If clipBegin equals clipEnd, the behavior depends upon the kind of media. For media with a concept of frames, such as video, the single frame at the beginning of clipBegin is rendered for repeatDur.
  • repeatDur takes precedence over repeatCount in determining the maximum duration of the media.

Note that not all SMIL 2.1 Timing features are supported.

Editorial note

Use SMIL 3.0 or SMIL 2.1 reference?

6.4 Prompt Module).

This module is dependent upon the media module (6.6 Media Module).

With connections which support multiple media streams, it is possible to simultaneously playback multiple media types. For media container formats like 3GPP, audio and video media can be generated simultaneously from the same media resource.

There are established use cases for simultaneous playback of multiple media which are specified in separate resources:

  • Video mail: an audio message has been left using a conventional audio only system. For playback on a system with video support, a video resource can be played simultaneously with an image of the person, or an avatar.
  • Enterprise: a video stream resource from a security camera with TTS voiceover providing additional information.
  • Education: a video resource showing medical procedure with commentary provided by lecturer in student's language.
  • Talking heads: an animated avatar together with audio or TTS voiceover.

The intention is provide support for basic use cases where audio or TTS output from one resource can be complemented with output from another resource as permitted by the connection and platform capabilities.

6.5 Builtin SSML Module, the <prompt> element defined in 6.4 Prompt Module, etc.

The attributes and content model of the element are specified in 6.8.1 Syntax. Its semantics are specified in 6.8.2 Semantics.

D VoiceXML 3.0 XML Schema for schema definitions].

6.10.2.1 Field RC) of the Field Module.

The behavior of the Form RC follows the VoiceXML FIA, although some aspects of this are not modeled directly in this RC: external transition handling is not part of the form RC; input items used separate RCs to manage coordination between media resources, while recognition results can be received directly by form, field or other RCs.

[This initial version does not address all aspects of FIA behavior; for example, event handling, error handling and external transitions are not covered.]

Editorial note
This description of the form needs to be updated with all the new functionality we have given the form via our new eventing approach.

6.10.2.1 Field RC), PlayandRecognize (6.10.2.2 PlayandRecognize RC), ...

6.10.2.2 PlayandRecognize RC), causing any queued prompts to be played and recognition to be initiated. In the event data, the controller is set to this RC, and other data is derived from data model properties. The RC transitions to the Executing state.

In the Executing state, the PlayAndRecognize RC must send recoResults (or error events: noinput, nomatch, error.semantic) to the field RC.

If the field RC receives the recoResults, then it updates its name variable in the Datamodel Resource. The field RC then sends a 'fieldResult' event to its controller indicating that a field result has been received and processed.

If the recoResult is received by the field RC's controller, then the field receives an 'evaluate' event which causes it to transition to the Evaluating state.

In the Evaluating state, the field RC iterates through its children executing each filled RC: this is modeled by a separate RC (see XXX). When evaluation is complete, the RC sends a 'evaluated' event to its controller and transitions to the Ready state.

5.1.1 Data Model Resource API.

At any given point in time, based on the VoiceXML document structure and the execution state, the stack may contain the following scopes whose semantics are described in VoiceXML 3.0 as follows (bottom to top):

8.2 Properties. Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Thus, access to properties is also controlled by means of the same scope stack that is used by named variables.

VoiceXML 3.0 provides a consistent mechanism to unambiguously read these properties in any scope using the data access and manipulation language in a manner similar to accessing and manipulating named variables. This is described in the two sections below.

5.1.1 Data Model Resource API.

The above examples result in the following Data Model Resource API calls, in order:

  1. At the time of <var> execution, first the value is obtained for the new variable with name "phone" by evaluating the expression "'6305551212'", and subsequently the variable is created in the scope on top of the stack:
    1. Obtain variable value by calling EvaluateExpression("'6305551212'")
    2. Create variable by calling CreateVariable("name", value) where value is the result obtained in a. above. The optional scope parameter is not specified since the scope on the top of the stack is chosen by default.
  2. At the time of <var> execution, first the value is obtained for the new variable with name "y" by evaluating the expression "document.z+1", and subsequently the variable is created in the scope on top of the stack:
    1. Obtain variable value by calling EvaluateExpression("document.z+1")
    2. Create variable by calling CreateVariable("y", value) where value is the result obtained in a. above.
  3. At the time of <var> execution, first the value is obtained for the new variable with name "foo" by evaluating the expression "dialog.bar * 2", and subsequently the variable is created in application scope:
    1. Obtain variable value by calling EvaluateExpression("dialog.bar * 2")
    2. Create variable by calling CreateVariable("foo", value, "application") where value is the result obtained in a. above.
  4. At the time of <var> execution, the new variable with name "itinerary" is created in the scope on top of the stack using the in-line specification for the value in the body of the <var> element:
    1. Process the in-line specification below into an internal representation for the data model. For example, the assumed XML data model in this example may choose to internally represent this in-line specification as a DOM node.
       <root xmlns="">
       <flight>SW123</flight>
       <origin>JFK</origin>
       <depart>2009-01-01T14:32:00</depart>
       <destination>SFO</destination>
       <arrive>2009-01-01T18:14:00</arrive>
       </root>
       
      
    2. Create variable by calling CreateVariable("itinerary", node) where node is the DOM node from a. above.

6.4 Prompt Module specifies prompts in detail.

Attributes of <value>

5.1.1 Data Model Resource API.

The above examples result in the following Data Model Resource API calls:

    1. Evaluate the expression "application.duration + dialog.duration" in the closest enclosing scope by calling EvaluateExpression("application.duration + dialog.duration")
    2. The expression evaluator in turn resolves the scope-qualified variables in the expression by calling ReadVariable("duration", "application") ReadVariable("duration", "dialog")
    1. Evaluate the expression "foo * bar" in the closest enclosing scope by calling EvaluateExpression("foo * bar")
    2. The expression evaluator in turn resolves the scope-unqualified variables in the expression by calling ReadVariable("foo") ReadVariable("bar")
    1. Evaluate the expression "foo + bar + application.baz" in the document scope in the expression by calling EvaluateExpression("foo + bar + application.baz", "document")
    2. The expression evaluator in turn resolves the variables by calling ReadVariable("foo", "document") ReadVariable("bar", "document") ReadVariable("baz", "application")

5.1.1 Data Model Resource API.

The above examples result in the following Data Model Resource API calls, in order:

  1. At the time of <assign> execution, first the new value is obtained for the variable with name "phone" by evaluating the expression "'6305551212'", and subsequently the variable is updated in the scope on top of the stack:
    1. Obtain new variable value by calling EvaluateExpression("'6305551212'")
    2. Update variable value by calling UpdateVariable("name", value) where value is the result obtained in a. above. The optional scope parameter is not specified since the scope on the top of the stack is chosen by default.
  2. At the time of <assign> execution, first the new value is obtained for the variable with name "y" by evaluating the expression "document.z+1", and subsequently the variable is updated in the scope on top of the stack:
    1. Obtain new variable value by calling EvaluateExpression("document.z+1")
    2. Update variable value by calling UpdateVariable("y", value) where value is the result obtained in a. above.
  3. At the time of <assign> execution, first the new value is obtained for the variable with name "foo" by evaluating the expression "dialog.bar * 2", and subsequently the variable is updated in application scope:
    1. Obtain new variable value by calling EvaluateExpression("dialog.bar * 2")
    2. Update variable value by calling UpdateVariable("foo", value, "application") where value is the result obtained in a. above.
  4. At the time of <assign> execution, the variable with name "itinerary" is updated in the scope on top of the stack using the in-line specification for the new value specified in the body of the <assign> element:
    1. Process the in-line specification below into an internal representation for the data model. For example, the assumed XML data model in this example may choose to internally represent this in-line specification as a DOM node.
       <root xmlns="">
       <flight>SW123</flight>
       <origin>JFK</origin>
       <depart>2009-01-01T14:32:00</depart>
       <destination>SFO</destination>
       <arrive>2009-01-01T18:14:00</arrive>
       </root>
       
      
    2. Update variable value by calling UpdateVariable("itinerary", node) where node is the DOM node from a. above.

8.1.1 Fetching.

5.1.1 Data Model Resource API.

The single <data> usage in the above example results in the following behavior and Data Model Resource API calls:

At the time of <data> execution, the variable with name "quote" is updated in the document scope using the in-line specification for the new value retrieved from the URI expression 'http://www.example.org/getquote?ticker=' + document('tickers')/ford which evaluates to http://www.example.org/getquote?ticker=f

  1. Obtain the URI to request the in-line specification of the new value from, by evaluating the "srcexpr" attribute value EvaluateExpression("'http://www.example.org/getquote?ticker=' + document('tickers')/ford")
  2. Request the in-line specification from the URI resulting from a. which happens to be http://www.example.org/getquote?ticker=f in this example.
  3. Process the response received (the in-line data specification below) into an internal representation for the data model. The XML data model internally represents this in-line specification as a DOM node (the document element).
     <quote xmlns="http://www.example.org">
     <ticker>F</ticker>
     <name>Ford Motor Company</name>
     <change>0.10</change>
     <last>3.00</last>
     </quote>
     
    
  4. Update the "quote" variable value in document scope by calling UpdateVariable("quote", node, "document") where node is the DOM node in c. above.

<data> Fetching Properties

These properties pertain to documents fetched by the <data> element.

5.1.1 Data Model Resource API.

The above examples result in the following Data Model Resource API calls, in order:

  1. The namelist "city state zip" is tokenized into variable names and each variable is reset in the scope on top of the stack (the optional scope parameter is not specified in the calls since the scope on the top of the stack is chosen by default):
    1. Tokenize namelist "city state zip" into tokens "city", "state" and "zip"
    2. Reset variable "city" by calling DeleteVariable("city")
    3. Reset variable "state" by calling DeleteVariable("state")
    4. Reset variable "zip" by calling DeleteVariable("zip")
  2. The namelist "application.foo dialog.bar baz" is tokenized into variable names and each scope-qualified variable is reset in the mentioned scope and scope-unqualified variables are reset in the scope on top of the stack:
    1. Tokenize namelist "application.foo dialog.bar baz" into tokens "application.foo", "dialog.bar" and "baz"
    2. Reset variable "foo" in application scope by calling DeleteVariable("foo", "application")
    3. Reset variable "bar" in dialog scope by calling DeleteVariable("bar", "dialog")
    4. Reset variable "baz" in current scope by calling DeleteVariable("baz")
  3. The namelist "alpha beta application.gamma" is tokenized into variable names and each scope-qualified variable is reset in the mentioned scope and scope-unqualified variables are reset in the document scope:
    1. Tokenize namelist "alpha beta application.gamma" into tokens "alpha", "beta" and "application.gamma"
    2. Reset variable "alpha" by calling DeleteVariable("alpha", "document")
    3. Reset variable "beta" by calling DeleteVariable("beta", "document")
    4. Reset variable "gamma" by calling DeleteVariable("gamma", "application")
  4. In the absence of a namelist, each variable in the scope on top of the stack is reset. For each variable var in the closest enclosing scope:
    1. Reset variable DeleteVariable(var)
    2. If var is a form item, reset prompt and event counters
  5. In the absence of a namelist, each variable in the dialog scope is reset. For each variable var in dialog scope:
    1. Reset variable DeleteVariable(var, "dialog")
    2. If var is a form item, reset prompt and event counters

5.1.1 Data Model Resource API. This will allow resetting variable values to the initial in-line specification when such is present, for instance.

Resolution:

None recorded.

8.2 Properties. VoiceXML 3.0 provides a consistent mechanism to unambiguously read these properties in any scope using the data access and manipulation language in a manner similar to accessing and manipulating named variables as illustrated in section 2.3.2. However, properties cannot be created, updated or deleted using any of the syntax described in this module. The <property> element syntax must be used for such operations.

VXML2.

Events are dispatched to the application serially. Since the interpreter only reflects the data associated with a single external message at a time, it is the application's responsibility to manage the data associated with each external message once that message has been delivered.

The following example demonstrates asynchronous receipt of an external message. The catch handler copies the reflected external message into an array at application scope.

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
 <property name="externalevents.enable" value="true"/>
 <var name="myMessages" expr="new Array()"/>

 <catch event="externalmessage">
 <var name="lm" expr="application.lastmessage$"/>
 <if cond="lm.contenttype == 'text/xml' || lm.contenttype == 'application/xml'">
 <log>received XML with root document element
 <value expr="lm.content.documentElement.nodeName"/>
 </log>
 <elseif cond="typeof lm.content == 'string'"/>
 <log>received <value expr="lm.content"/></log>
 <else/>
 <log>received unknown external message type
 <value expr="typeof lm.content"/>
 </log>
 </if>
 <script>
 myMessages.push({'content' : lm.content, 'ctype' : lm.contenttype});
 </script>
 </catch>

 <form>
 <field name="num" type="digits">
 <prompt>pick a number any number</prompt>
 <catch event="noinput nomatch">
 sorry. didn't get that.
 <reprompt/> 
 </catch>
 <filled>
 you said <value expr="num"/>
 <clear/>
 </filled>
 </field>
 </form>
</vxml>

Section 6.5 of [VXML2]. If not specified, the value is derived from the innermost sendtimeout property.

VXML 2.0 section 5.1.2 when talking about the variable scopes the text for application in table 40 is also appropriate for session (new text "These are declared with <var> and <script> elements that are children of the session root document's <vxml> element. They are initialized when the session root document is loaded. They exist while the session document is loaded, and are visible to the session root document, the application root document, and any other loaded application leaf document.").

This session document then is loaded and active in the hierarchy of documents that follows the javascript scope chaining (that is a document is below an application root is below a session root). This means that if a variable is declared in the session root and then in some local form in the leaf document the variable would be shadowed (just like how the shadowing from the application root).

This also implies that the catch selection algorithm as described in VXML 2.0 section 5.2.4 would have to change to include the session root document as a potential source of catch handlers (new text "Form an ordered list of catches consisting of all catches in the current scope and all enclosing scopes (form item, form, document, application root document, session root document, interpreter context), ordered first by scope (starting with the current scope), and then within each scope by document order."). Then all catch handling would remain the same, in particular the as-if-by copy semantics are retained so if an event from a leaf document was handled by a catch handler from the session root the catch handler wouldn't execute within the context of the session root document but would instead execute as if by copy into the local leaf document context.

This also implies that property lookup from section 6.3 of VXML 2.0 would have to change to say that property value lookup can also go to the session root, if a more local value for the property isn't found (new text "Properties may be defined for the whole session, for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item.). This doesn't change the usual way properties work where a property at a lower level override one at a higher level.

This also implies that the behavior for link's that are document-level link of session roots are active which would be a change to section 2.5 of VXML 2.0 (new text "If an application root document has a document-level link, its grammars are active no matter what document of the application is being executed. If an session root document has a document-level link, its grammars are active no matter what document of the session is being executed. If execution is in a modal form item, then link grammars at the form, document, application or session level are not active.").

Similar to for links, the scope of grammars from section 3.1.3 of VXML 2.0 would be changed to specify what happens when a grammar from a session root has document scope (new text "Form grammars are by default given dialog scope, so that they are active only when the user is in the form. If they are given scope document, they are active whenever the user is in the document. If they are given scope document and the document is the application root document, then they are also active whenever the user is in another loaded document in the same application. If they are given scope document and the document is the session root document, then they are also active throughout the session.". Note that this active throughout the session can still be trumped by modal listen states (just like the application root can). Section 3.1.4 of VXML 2.0 also changes the activation of grammars bulleted list to include the session root (new text: "grammars contained in links in its application root document or session root document, and grammars for menus and forms in its application root document or session root document which are given document scope.").

D VoiceXML 3.0 XML Schema for schema definitions].

The <voicemodel> element has the attributes specified in Table 72, in addition to the fetchtimeout, fetchhint, maxage, and maxstale attributes as specified in 8.1.1 Fetching.

6.18.1 Syntax. Its semantics are specified in 6.18.2 Semantics.

D VoiceXML 3.0 XML Schema for schema definitions].

6.19 Play Module), causing any queued prompts to be played. Note that the event data passed to Play RC must have:

  • the controller set to this RC
  • bargein = false

This RC transitions to the Executing state after sending the event request to the Play RC.

In the Executing state, when the disconnect RC receives the "playDone" event, it instructs the connection resource to disconnect the interpreter context from the user and enters into the "Disconnecting" state.

In the "Disconnecting" state, when the disconnect RC receives "userDisconnected" event,

  • The namelist variables are tokenized into individual variable names and their values are obtained using ReadVariable. The way, values of these individual variables is made available to interpreter context is platform specific
  • It throws the "connection.disconnect.hangup" event to the parent element
  • It transitions to the Ready state
Editorial note

Play RC is not yet defined.

6.18 Disconnect Module).

5.2 Prompt Queue Resource).

6.20.1 Syntax. Its semantics are specified in 6.20.2 Semantics.

D VoiceXML 3.0 XML Schema for schema definitions].

6.20.2.1 RecordInputItem RC), PlayandRecognize (6.10.2.2 PlayandRecognize RC), Record (6.20.2.2 Record RC).

8.2 Properties.

Properties may be defined for the session, for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Properties apply to their parent element and all the descendants of the parent. A property at a lower level overrides a property at a higher level. When different values for a property are specified at the same level, the last one in document order applies. Properties specified in the session root document provide default values for properties throughout the session; properties specified in the application root document provide default values for properties in every document in the application; properties specified in an individual document override property values specified in the application root document.

D VoiceXML 3.0 XML Schema for schema definitions].

6.22.1 Syntax. Its semantics are specified in 6.22.2 Semantics.

D VoiceXML 3.0 XML Schema for schema definitions].

9.2 Integrating Flow Control Languages into VoiceXML.

[VOICEXML21] specification is helpful as the semantics of these elements are already well defined and well understood. Thus changes in how they are presented are a result of the module and profile style of VoiceXML 3.0 and of making more explicit and formal the precise detailed semantics.

The Legacy profile also plays a transitional role as VoiceXML 3.0 as a whole is built on top of VoiceXML 2.1. VoiceXML 3.0 is a superset of VoiceXML 2.1 and includes the traditional 2.1 functionality plus some new modules. The Legacy profile is the set of modules that were always present in VoiceXML 2.1 but that weren't expressed in the specification as individual modules. This also allows a clear path for the VoiceXML application developer as the application developer will not need to learn substantial new syntax or semantics when they develop in the Legacy profile of VoiceXML 3.0.

The Legacy profile also represents a proof of concept to ensure that the new modular profile method of describing the specification is in no way limited. VoiceXML 3.0 in its entirety will be in no way limited or constrained because of the use of profiles and modules and formalized semantic models. Anything that was standardized in VoiceXML 2.1 can be standardized in this new format and the Legacy profile reveals that.

This profile can be best described in the following 3 sections:

  • Module Conformance
  • Convenience Syntax
  • Default Handlers and Transition Controllers
The schema for the Legacy Profile is given in D.8 Schema for Legacy Profile.
Editorial note

The following content is missing from the Vxml 3.0 specification and needs to be defined:

  • Vxml Root Module
  • Event Handling and throwing (event handlers like <catch>, <noinput>,<nomatch>, etc.)
  • FIA? (although talked about in the Form module, it is not addressed completely)
  • Executable Content
  • <reprompt>
  • <disconnect>
  • <exit>
  • <if> <elseif> <else>
  • <goto>
  • <submit>
  • <log>
  • <return>
  • <script>
  • <throw>
  • All of the VoiceXML 2.0/2.1 properties (Should go in 6.12)
  • <script> Script Module
  • <transfer>
  • <initial>
  • <link>
  • <record>
  • <object>
  • Transitions: <goto>, <submit>, <subdialog>
  • Specify transitions amongst inter-module clearly
  • author controlled transitions within form
  • author controlled transitions outside form
  • automatic transition behavior outside form
  • <filled>

Supported media types must be defined.

6.16 SIV Module) includes verification, identification, and enrollment functions.

6.5 Builtin SSML Module), the Media Module (6.6 Media Module), and the Parseq Module (6.7 Parseq Module) provide functions for presenting information to the user.

6.12 Data Access and Manipulation Module) for accessing local variables, parameters, returned values, etc. This module is not intended to access external databases.

E Convenience Syntax in VoiceXML 2.x shows how the VoiceXML 2.1 <menu> and pre-defined catch handlers could be coded using other V2 notation (i.e., as convenience syntax).

[Examples TBD.]

4.5.2.2 Application Root and [RFC2396]). However, if the URI reference to the root document contains a query string or a namelist attribute, the root document is fetched.

Elements that fetch VoiceXML documents also support the following additional attribute:

Table 97: Additional Fetch Attribute
fetchaudio The URI of the audio clip to play while the fetch is being done. If not specified, the fetchaudio property is used, and if that property is not set, no audio is played during the fetch. The fetching of the audio clip is governed by the audiofetchhint, audiomaxage, audiomaxstale, and fetchtimeout properties in effect at the time of the fetch. The playing of the audio clip is governed by the fetchaudiodelay, and fetchaudiominimum properties in effect at the time of the fetch.

The fetchaudio attribute is useful for enhancing a user experience when there may be noticeable delays while the next document is retrieved. This can be used to play background music, or a series of announcements. When the document is retrieved, the audio file is interrupted if it is still playing. If an error occurs retrieving fetchaudio from its URI, no badfetch event is thrown and no audio is played during the fetch.

[HTML] visual browsers, can use caching to improve performance in fetching documents and other resources; audio recordings (which can be quite large) are as common to VoiceXML documents as images are to HTML pages. In a visual browser it is common to include end user controls to update or refresh content that is perceived to be stale. This is not the case for the VoiceXML interpreter context, since it lacks equivalent end user controls. Thus enforcement of cache refresh is at the discretion of the document through appropriate use of the maxage, and maxstale attributes.

The caching policy used by the VoiceXML interpreter context must adhere to the cache correctness rules of HTTP 1.1 ([RFC2616]). In particular, the Expires and Cache-Control headers must be honored. The following algorithm summarizes these rules and represents the interpreter context behavior when requesting a resource:

  • If the resource is not present in the cache, fetch it from the server using get.
  • If the resource is in the cache,
    • If a maxage value is provided,
      • If age of the cached resource <= maxage,
        • If the resource has expired,
          • Perform maxstale check.
        • Otherwise, use the cached copy.
      • Otherwise, fetch it from the server using get.
    • Otherwise,
      • If the resource has expired,
        • Perform maxstale check.
      • Otherwise, use the cached copy.

The "maxstale check" is:

  • If maxstale is provided,
    • If cached copy has exceeded its expiration time by no more than maxstale seconds, then use the cached copy.
    • Otherwise, fetch it from the server using get.
  • Otherwise, fetch it from the server using get.

Note: it is an optimization to perform a "get if modified" on a document still present in the cache when the policy requires a fetch from the server.

The maxage and maxstale properties are allowed to have no default value whatsoever. If the value is not provided by the document author, and the platform does not provide a default value, then the value is undefined and the 'Otherwise' clause of the algorithm applies. All other properties must provide a default value (either as given by the specification or by the platform).

While the maxage and maxstale attributes are drawn from and directly supported by HTTP 1.1, some resources may be addressed by URIs that name protocols other than HTTP. If the protocol does not support the notion of resource age, the interpreter context shall compute a resource's age from the time it was received. If the protocol does not support the notion of resource staleness, the interpreter context shall consider the resource to have expired immediately upon receipt.

8.2.1 Speech Recognition Properties), DTMF recognition (8.2.2 DTMF Recognition Properties), prompt and collect (8.2.3 Prompt and Collect Properties), media (8.2.4 Media Properties), fetching (8.2.5 Fetch Properties) and miscellaneous (8.2.6 Miscellaneous Properties) properties.

Editorial note

Open issue: should the specification provide specific default values rather than platform-specific?

Open issue: Should we add a 'type' column for all properties?

8.1.2 Caching.

Table 102: Fetch Properties
Name Description Default
audiofetchhint This tells the platform whether or not it can attempt to optimize dialog interpretation by pre-fetching audio. The value is either safe to say that audio is only fetched when it is needed, never before; or prefetch to permit, but not require the platform to pre-fetch the audio. prefetch
audiofetchtimeout The timeout for audio fetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
audiomaxage Tells the platform the maximum acceptable age, in seconds, of cached audio resources. platform-specific
audiomaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached audio resources. platform-specific
documentfetchhint Tells the platform whether or not documents may be pre-fetched. The value is either safe (the default), or prefetch. safe
documentfetchtimeout The timeout for document fetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
documentmaxage Tells the platform the maximum acceptable age, in seconds, of cached documents. platform-specific
documentmaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached documents. platform-specific
grammarfetchhint Tells the platform whether or not grammars may be pre-fetched. The value is either prefetch (the default), or safe. prefetch
grammarfetchtimeout The timeout for grammar fetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
grammarmaxage Tells the platform the maximum acceptable age, in seconds, of cached grammars. platform-specific
grammarmaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached grammars. platform-specific.
mediafetchhint Tells the platform whether or not media files may be pre-fetched. The value is either prefetch (the default), or safe. prefetch
mediafetchtimeout The timeout for media fetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
mediamaxage Tells the platform the maximum acceptable age, in seconds, of cached media. platform-specific
mediamaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached media. platform-specific.
objectfetchhint Tells the platform whether the URI contents for <object> may be pre-fetched or not. The values are prefetch, or safe. prefetch
objectfetchtimeout The timeout for objectfetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
objectmaxage Tells the platform the maximum acceptable age, in seconds, of cached objects. platform-specific
objectmaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached objects. platform-specific
scriptfetchhint Tells whether scripts may be pre-fetched or not. The values are prefetch (the default), or safe. prefetch
scriptfetchtimeout The timeout for script fetches. The value is a Time Designation (see 8.4 Value Designations). platform-specific
scriptmaxage Tells the platform the maximum acceptable age, in seconds, of cached scripts. platform-specific
scriptmaxstale Tells the platform the maximum acceptable staleness, in seconds, of expired cached scripts. platform-specific.
fetchaudio The URI of the audio to play while waiting for a document to be fetched. The default is not to play any audio during fetch delays. There are no fetchaudio properties for audio, grammars, objects, and scripts. The fetching of the audio clip is governed by the audiofetchhint, audiomaxage, audiomaxstale, and fetchtimeout properties in effect at the time of the fetch. The playing of the audio clip is governed by the fetchaudiodelay, and fetchaudiominimum properties in effect at the time of the fetch. undefined
fetchaudiodelay The time interval to wait at the start of a fetch delay before playing the fetchaudio source. The value is a Time Designation (see 8.4 Value Designations). The default interval is platform-dependent, e.g. "2s".  The idea is that when a fetch delay is short, it may be better to have a few seconds of silence instead of a bit of fetchaudio that is immediately cut off. platform-specific
fetchaudiominimum The minimum time interval to play a fetchaudio source, once started, even if the fetch result arrives in the meantime. The value is a Time Designation (see 8.4 Value Designations). The default is platform-dependent, e.g., "5s".  The idea is that once the user does begin to hear fetchaudio, it should not be stopped too quickly. platform-specific

8.2.2 DTMF Recognition Properties to tailor the user experience. The effects of these are shown in the following timing diagrams.

8.2.3 Prompt and Collect Properties and 8.2.1 Speech Recognition Properties to tailor the user experience. The effects of these are shown in the following timing diagrams.

[CSS2].

The Voice Browser DFP Framework W3C Informative Note, February 2006. (See http://www.w3.org/Voice/2006/DFP.)
Multimodal Architecture and Interfaces W3C Working Draft, December 2009. (See http://www.w3.org/TR/2009/WD-mmi-arch-20091201/.)
Document Object Model (DOM) Level 3 Events Specification Schepers, Höhrmann, Le Hégaret and Pixley. W3C Working Draft, September 2009. (See http://www.w3.org/TR/2009/WD-DOM-Level-3-Events-20090908/.)
Key words for use in RFCs to Indicate Requirement Levels IETF RFC 2119, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
Standard ECMA-262 ECMAScript Language Specification, Standard ECMA-262, December 1999. (See http://www.ecma-international.org/publications/standards/Ecma-262.htm.)
Voice Extensible Markup Language (VoiceXML) Version 2.0 McGlashan et al. W3C Recommendation, March 2004. (See http://www.w3.org/TR/voicexml20/.)
Voice Extensible Markup Language (VoiceXML) Version 2.1 Oshry et al. W3C Recommendation, May 2007. (See http://www.w3.org/TR/voicexml21/.)
Speech Synthesis Markup Language Version 1.1 Burnett and Shuang. W3C Recommendation, September 2010. (See http://www.w3.org/TR/speech-synthesis11/.)
Speech Recognition Grammar Specification Version 1.0 Hunt and McGlashan. W3C Recommendation, March 2004. (See http://www.w3.org/TR/speech-grammar/.)
Hypertext Transfer Protocol -- HTTP/1.1 IETF RFC 2616, 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
Uniform Resource Identifiers (URI): Generic Syntax IETF RFC 2396, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
Tags for Identifying Languages and Matching of Language Tags A. Phillips and M. Davis, Editors. IETF, September 2009. (See http://www.rfc-editor.org/bcp/bcp47.txt.)
[CSS2]
dialog
An interaction with the user specified in a VoiceXML document. Types of dialogs include forms and menus.
DTMF (Dual Tone Multi-Frequency)
Touch-tone or push-button dialing. Pushing a button on a telephone keypad generates a sound that is a combination of two tones, one high frequency and the other low frequency.
ECMAScript
A standard version of JavaScript backed by the European Computer Manufacturer's Association. See [ECMASCRIPT]
event
A notification "thrown" by the implementation platform, VoiceXML interpreter context, VoiceXML interpreter, or VoiceXML code. Events include exceptional conditions (semantic errors), normal errors (user did not say something recognizable), normal events (user wants to exit), and user defined events.
executable content
Procedural logic that occurs in <block>, <filled>, and event handlers.
form
A dialog that interacts with the user in a highly flexible fashion with the computer and the user sharing the initiative.
FIA (Form Interpretation Algorithm)
An algorithm implemented in a VoiceXML interpreter which drives the interaction between the user and a VoiceXML form or menu. See vxml2: Section 2.1.6, vxml2: Appendix C.
form item
An element of <form> that can be visited during form execution: <initial>, <block>, <field>, <record>, <object>, <subdialog>, and <transfer>.
form item variable
A variable, either implicitly or explicitly defined, associated with each form item in a form. If the form item variable is undefined, the form interpretation algorithm will visit the form item and use it to interact with the user.
implementation platform
A computer with the requisite software and/or hardware to support the types of interaction defined by VoiceXML.
input item
A form item whose purpose is to input a input item variable. Input items include <field>, <record>, <object>, <subdialog>, and <transfer>.
[: language identifier]
A language identifier labels information content as being of a particular human language variant. Following the XML specification for language identification [XML], a legal language identifier is identified by BCP 47 [BCP47].
link
A set of grammars that when matched by something the user says or keys in, either transitions to a new dialog or document or throws an event in the current form item.
menu
A dialog presenting the user with a set of choices and takes action on the selected one.
mixed initiative
A computer-human interaction in which either the computer or the human can take initiative and decide what to do next.
JSGF
Java API Speech Grammar Format. A proposed standard for representing speech grammars. See [JSGF]
object
A platform-specific capability with an interface available via VoiceXML.
request
A collection of data including: a URI specifying a document server for the data, a set of name-value pairs of data to be processed (optional), and a method of submission for processing (optional).
script
A fragment of logic written in a client-side scripting language, especially ECMAScript, which is a scripting language that must be supported by any VoiceXML interpreter.
session
A connection between a user and an implementation platform, e.g. a telephone call to a voice response system. One session may involve the interpretation of more than one VoiceXML document.
SRGS (Speech Recognition Grammar Specification)
A standard format for context-free speech recognition grammars being developed by the W3C Voice Browser group. Both ABNF and XML formats are defined [SRGS].
SSML (Speech Synthesis Markup Language)
A standard format for speech synthesis being developed by the W3C Voice Browser group [SSML].
subdialog
A VoiceXML dialog (or document) invoked from the current dialog in a manner analogous to function calls.
tapered prompts
A set of prompts used to vary a message given to the human. Prompts may be tapered to be more terse with use (field prompting), or more explicit (help prompts).
throw
An element that fires an event.
TTS
text-to-speech; speech synthesis.
user
A person whose interaction with an implementation platform is controlled by a VoiceXML interpreter.
URI
Uniform Resource Indicator.
URL
Uniform Resource Locator.
VoiceXML document
An XML document conforming to the VoiceXML specification.
VoiceXML interpreter
A computer program that interprets a VoiceXML document to control an implementation platform for the purpose of conducting an interaction with a user.
VoiceXML interpreter context
A computer program that uses a VoiceXML interpreter to interpret a VoiceXML Document and that may also interact with the implementation platform independently of the VoiceXML interpreter.
W3C
World Wide Web Consortium http://www.w3.org/