Copyright © 2010 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
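A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.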
This is the 16 December 2010 Eighth Public Working Draft of "Voice Extensible Markup Language (VoiceXML) 3.0". The main differences from the previous draft are described in Appendix F Major changes since the last Working Draft. A diff-marked version of this document is also available for comparison purposes.
This document is very much a work in progress. Many sections are incomplete, only stubbed out, or missing entirely. To get early feedback, the group focused on defining enough functionality, modules, and profiles to demonstrate the general framework. To complete the specification, the group expects to introduce additional functionality (for example speaker identification and verification, external eventing) and describe the existing functionality at the level of detail given for the Prompt and Field modules. We explicitly request feedback on the framework, particularly any concerns about its implementability or suitability for expected applications. By early 2011 the group expects all key capabilities to be present in the specification, with details worked out by late 2011.
Applications written as 2.1 documents can be used under a 3.0 processor using the 2.1 profile. As an example, the Implementation Report tests for 2.1 (which include the IR tests for 2.0) will be supported on a 3.0 processor. Exceptions will be clarifications and changes needed to improve interoperability.
This document is a W3C Working Draft. It has been produced as part of the Voice Browser Activity. The authors of this document are participants in the Voice Browser Working Group. For more information see the Voice Browser FAQ. The Working Group expects to advance this Working Draft to Recommendation status.
Comments are welcome on www-voice@w3.org (archive). See W3C mailing list and archive usage guidelines.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Terminology
2 Overview
2.1 Structure of VoiceXML 3.0
2.2 Structure of this document
2.3 How to read this document
3 Data Flow Presentation (DFP) Framework
3.1 Data
3.2 Flow
3.3 Presentation
4 Core Concepts
4.1 Syntactic and Semantic descriptions
4.2 Resources, Resource Controllers, and Events
4.2.1 Top Level Controller
4.3 Syntax
4.4 Event Model
4.4.1 Internal Events
4.4.1.1 Event Interfaces
4.4.1.1.1 Event
4.4.1.1.2 EventTarget
4.4.1.1.3 EventListener
4.4.1.2 Event Flow
4.4.1.2.1 Event Listener Registration
4.4.1.2.2 Event Listener Activation
4.4.2 External Events
4.5 Document Initialization and Execution
4.5.1 Initialization
4.5.1.1 DOM Processing
4.5.1.2 Preparation for Execution
4.5.2 Execution
4.5.2.1 Subdialogs
4.5.2.2 Application Root
4.5.2.3 Summary of Syntax/Semantics Interaction
4.5.3 Transition Controllers
5 Resources
5.1 Datamodel Resource
5.1.1 Data Model Resource API
5.2 Prompt Queue Resource
5.2.1 State Chart Representation
5.2.2 SCXML Representation
5.2.3 Defined Events
5.2.4 Device Events
5.2.5 Open Issue
5.3 Recognition Resources
5.3.1 Definition
5.3.2 Defined Events
5.3.3 Device Events
5.3.4 State Chart Representation
5.3.5 SCXML Representation
5.4 Connection Resource
5.4.1 Definition
5.4.2 Final Processing State
5.4.3 Defined Events
5.4.4 State Chart Representation
5.4.5 SCXML Representation
5.5 Timer Resource
5.5.1 Definition
5.5.2 Defined Events
5.5.3 Device Events
5.5.4 State Chart Representation
6 Modules
6.1 Grammar Module
6.1.1 Syntax
6.1.1.1 Attributes
6.1.1.2 Content Model
6.1.2 Semantics
6.1.2.1 Definition
6.1.2.2 Defined Events
6.1.2.3 External Events
6.1.2.4 State Chart Representation
6.1.2.5 SCXML Representation
6.1.3 Events
6.1.4 Examples
6.2 Inline SRGS Grammar Module
6.2.1 Syntax
6.2.2 Semantics
6.2.2.1 Definition
6.2.2.2 Defined Events
6.2.2.3 External Events
6.2.2.4 State Chart Representation
6.2.2.5 SCXML Representation
6.2.3 Events
6.2.4 Examples
6.3 External Grammar Module
6.3.1 Syntax
6.3.1.1 Attributes
6.3.1.2 Content Model
6.3.2 Semantics
6.3.2.1 Definition
6.3.2.2 Defined Events
6.3.2.3 External Events
6.3.2.4 State Chart Representation
6.3.2.5 SCXML Representation
6.3.3 Events
6.3.4 Examples
6.4 Prompt Module
6.4.1 Syntax
6.4.1.1 Attributes
6.4.1.2 Content Model
6.4.2 Semantics
6.4.2.1 Definition
6.4.2.2 Defined Events
6.4.2.3 External Events
6.4.2.4 State Chart Representation
6.4.2.5 SCXML Representation
6.4.3 Events
6.4.4 Examples
6.5 Builtin SSML Module
6.5.1 Syntax
6.5.2 Semantics
6.5.3 Examples
6.6 Media Module
6.6.1 Syntax
6.6.1.1 Attributes
6.6.1.2 Content Model
6.6.1.2.1 Tips (informative)
6.6.2 Semantics
6.6.3 Examples
6.7 Parseq Module
6.7.1 Syntax
6.7.2 Semantics
6.7.3 Examples
6.8 Foreach Module
6.8.1 Syntax
6.8.1.1 Attributes
6.8.1.2 Content Model
6.8.2 Semantics
6.8.3 Examples
6.9 Form Module
6.9.1 Syntax
6.9.2 Semantics
6.9.2.1 Form RC
6.9.2.1.1 Definition
6.9.2.1.2 Defined Events
6.9.2.1.3 External Events
6.9.2.1.4 State Chart Representation
6.9.2.1.5 SCXML Representation
6.10 Field Module
6.10.1 Syntax
6.10.2 Semantics
6.10.2.1 Field RC
6.10.2.1.1 Definition
6.10.2.1.2 Defined Events
6.10.2.1.3 External Events
6.10.2.1.4 State Chart Representation
6.10.2.1.5 SCXML Representation
6.10.2.2 PlayandRecognize RC
6.10.2.2.1 Definition
6.10.2.2.2 Defined Events
6.10.2.2.3 External Events
6.10.2.2.4 State Chart Representation
6.10.2.2.5 SCXML Representation
6.11 Builtin Grammar Module
6.11.1 Usage of Platform Grammars
6.11.2 Platform Requirements
6.11.3 Syntax and Semantics
6.11.4 Examples
6.12 Data Access and Manipulation Module
6.12.1 Overview
6.12.2 Semantics
6.12.2.1 The scope stack
6.12.2.2 Relevance of scope stack to properties
6.12.2.3 Implicit variables
6.12.2.4 Variable resolution
6.12.2.5 Standard session variables
6.12.2.6 Standard application variables
6.12.2.7 Legal variable values and expressions
6.12.3 Syntax
6.12.3.1 Creating variables: the <var> element
6.12.3.2 Reading variables: "expr" and "cond" attributes and the <value> element
6.12.3.2.1 Inserting variable values in prompts: The <value> element
6.12.3.3 Updating variables: the <assign> and <data> elements
6.12.3.3.1 The <assign> element
6.12.3.3.2 The <data> element
6.12.3.4 Deleting variables: the <clear> element
6.12.3.5 Relevance for properties
6.12.4 Backward compatibility with VoiceXML 2.1
6.12.5 Implicit functions using XPath
6.13 External Communication Module
6.13.1 Receiving external messages within a voice application
6.13.1.1 External Message Reflection
6.13.1.2 Receiving External Messages Asynchronously
6.13.1.3 Receiving External Messages Synchronously
6.13.1.3.1 <receive>
6.13.2 Sending messages from a voice application
6.13.2.1 sendtimeout
6.14 Session Root Module
6.14.1 Syntax
6.14.2 Semantics
6.14.3 Examples
6.15 Run Time Control Module
6.15.1 <rtc>
6.15.1.1 Syntax
6.15.2 <cancelrtc>
6.15.2.1 Syntax
6.15.3 Semantics
6.15.4 Examples
6.16 SIV Module
6.16.1 SIV Core Functions
6.16.2 Syntax
6.16.3 Semantics
6.16.3.1 Definition
6.16.3.2 Defined Events
6.16.3.3 External Events
6.16.3.4 State Chart Representation
6.16.4 Events
6.16.5 Examples
6.17 Subdialog Module
6.17.1 Syntax
6.17.2 Semantics
6.17.3 Examples
6.18 Disconnect Module
6.18.1 Syntax
6.18.1.1 Attributes
6.18.1.2 Content Model
6.18.2 Semantics
6.18.2.1 Definition
6.18.2.2 Defined Events
6.18.2.3 External Events
6.18.2.4 State Chart Representation
6.18.2.5 SCXML Representation
6.18.3 Example
6.19 Play Module
6.19.1 Semantics
6.19.1.1 Definition
6.19.1.2 Defined Events
6.19.1.3 External Events
6.19.1.4 State Chart Representation
6.19.1.5 SCXML Representation
6.20 Record Module
6.20.1 Syntax
6.20.1.1 Attributes
6.20.1.2 Content Model
6.20.1.3 Data Model Variables
6.20.2 Semantics
6.20.2.1 RecordInputItem RC
6.20.2.1.1 Definition
6.20.2.1.2 Defined Events
6.20.2.1.3 External Events
6.20.2.1.4 State Chart Representation
6.20.2.1.5 SCXML Representation
6.20.2.2 Record RC
6.20.2.2.1 Definition
6.20.2.2.2 Defined Events
6.20.2.2.3 External Events
6.20.2.2.4 State Chart Representation
6.20.2.2.5 SCXML Representation
6.21 Property Module
6.21.1 Syntax
6.21.1.1 Attributes
6.21.1.2 Content Model
6.21.2 Semantics
6.21.2.1 Definition
6.21.2.2 Defined Events
6.21.2.3 External Events
6.21.2.4 State Chart Representation
6.21.2.5 SCXML Representation
6.21.3 Events
6.21.4 Examples
6.22 Transition Controller Module
6.22.1 Syntax
6.22.1.1 Attributes
6.22.1.2 Content Model
6.22.2 Semantics
6.22.2.1 Definition
6.22.2.2 Defined Events
6.22.2.3 External Events
6.22.2.4 State Chart Representation
6.22.2.5 SCXML Representation
6.22.3 Events
6.22.4 Examples
7 Profiles
7.1 Legacy Profile
7.1.1 Conformance
7.1.1.1 Vxml Root Module Requirements
7.1.1.2 Form Module Requirements
7.1.1.3 Field Module Requirements
7.1.1.4 Prompt Module Requirements
7.1.1.5 Grammar Module Requirements
7.1.1.6 Data Access and Manipulation Module Requirements
7.1.2 Convenience Syntax
7.1.3 Default Handlers and Transition Controllers
7.2 Basic Profile
7.2.1 Introduction
7.2.2 What the Basic Profile includes
7.2.2.1 SIV functions
7.2.2.2 Presentation functions
7.2.2.3 Capture functions
7.2.2.4 Other modules
7.2.3 Returned results
7.2.4 What the Basic Profile does not include
7.2.5 Examples
7.3 Maximal Profile
7.4 Enhanced Profile
7.5 Convenience Syntax (Syntactic Sugar)
8 Environment
8.1 Resource Fetching
8.1.1 Fetching
8.1.2 Caching
8.1.2.1 Controlling the Caching Policy
8.1.3 Prefetching
8.1.4 Protocols
8.2 Properties
8.2.1 Speech Recognition Properties
8.2.2 DTMF Recognition Properties
8.2.3 Prompt and Collect Properties
8.2.4 Media Properties
8.2.5 Fetch Properties
8.2.6 Miscellaneous Properties
8.3 Speech and DTMF Input Timing Properties
8.3.1 DTMF Grammars
8.3.1.1 timeout, No Input Provided
8.3.1.2 interdigittimeout, Grammar is Not Ready to Terminate
8.3.1.3 interdigittimeout, Grammar is Ready to Terminate
8.3.1.4 termchar and interdigittimeout, Grammar Can Terminate
8.3.1.5 termchar Empty When Grammar Must Terminate
8.3.1.6 termchar Non-Empty and termtimeout When Grammar Must Terminate
8.3.1.7 termchar Non-Empty and termtimeout When Grammar Must Terminate
8.3.1.8 Invalid DTMF Input
8.3.2 Speech Grammars
8.3.2.1 timeout When No Speech Provided
8.3.2.2 completetimeout With Speech Grammar Recognized
8.3.2.3 incompletetimeout with Speech Grammar Unrecognized
8.4 Value Designations
8.4.1 Integers
8.4.2 Real Numbers
8.4.3 Times
9 Integration with Other Markup Languages
9.1 Embedding of VoiceXML within SCXML
9.2 Integrating Flow Control Languages into VoiceXML
9.2.1 SCXML for Dialog Management
9.2.1.1 System-driven Dialog
9.2.1.2 User-driven Dialog
9.2.2 Graceful Degradation
9.2.3 SCXML as Basis for Recursive MVC
Acknowledgements
B References
B.1 Normative References
B.2 Informative References
C Glossary of Terms
D VoiceXML 3.0 XML Schema
D.1 Schema for VXML Root Module
D.2 Schema for Form Module
D.3 Schema for Field Module
D.4 Schema for Prompt Module
D.5 Schema for Builtin SSML Module
D.6 Schema for Foreach Module
D.7 Schema for Data Access and Manipulation Module
D.8 Schema for Legacy Profile
E Convenience Syntax in VoiceXML 2.x
E.1 Simplified Dialog Structure
E.2 Examples
E.2.1 <menu> with <choice>
E.2.2 Equivalent <form>, <field>, <option>
E.2.3 Equivalent <form>, <field>, <grammar>
F Major changes since the last Working Draft
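In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119] and indicate required levels for compliant VoiceXML 3.0 implementations.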
Terms used in this specification are defined in Appendix C Glossary of Terms.
3 Data Flow Presentation (DFP) Framework presents the Data-Flow-Presentation Framework, its importance for the development of VoiceXML 3.0 and how VoiceXML 3.0 fits into the model.
4 Core Concepts explains the core concepts underlying the new structure for VoiceXML, including resources, resource controllers, the relationship between syntax and semantics, DOM eventing, modules and profiles.
5 Resources presents the resources defined for the language. These provide the key presentation-related functionality in the language.
6 Modules presents the modules defined for the language. Each module consists of a syntax piece (with its user-visible events), a semantics piece (with its behind-the-scenes events) and a description of how the two are connected.
7 Profiles presents two profiles. The first, the VoiceXML 2.1 profile, shows how a language similar to VoiceXML 2.1 can be created using the structure and functionality of VoiceXML 3.0. The second, the Basic profile, leaves out higher-level flow control constructs such as <form> and the associated Form Interpretation Algorithm.
The Appendices provide useful references and a glossary of terms used in the specification.
3 Data Flow Presentation (DFP) Framework. The data-flow-presentation distinction applies not only to VoiceXML 3.0, but to many of W3C's specifications. Understanding VoiceXML's role as a presentation language is crucial context for understanding the rest of the specification.
For application authors: we recommend that you begin with syntax and only gradually explore details of the semantics as you need to understand behavioral specifics.
For VoiceXML platform developers: we recommend that you begin with the functionality and framework and only focus on syntax later.
[DFP]) Framework.
Although VoiceXML 3.0 is a presentation language, it also contains within it all 3 levels of the DFP framework (Figure 6).
Figure 6: DFP Architecture
The Data Flow Presentation (DFP) Framework is an instance of the Model-View-Controller paradigm, where computation and control flow are kept distinct from application data and from the way in which the application communicates with the outside world. This partitioning of an application allows for any one layer to be replaced independently of the other two. In addition, it is possible to simultaneously make use of more than one Data (Model) language, Flow (Controller), and/or Presentation (View) language.
The visual UML state chart diagrams are informative. They are included for ease of reading and quick understanding. The more detailed textual SCXML representations are normative.
It is important to note that this model does not require a VoiceXML interpreter to implement behavior exactly as the model describes it. Rather, the requirement is that the behavior must be the same as if it were implemented as described; optimizations or a different architecture behind the implementation of the markup interpretation are permitted.
The semantic descriptions are important for reasons including the following:
4.4 Event Model. The interaction between actual DOM events and logical SCXML events is described in 4.5 Document Initialization and Execution, below.
Each VoiceXML 3.0 module is described using SCXML notation and optionally a UML state chart representation of its underlying behavior expressed in terms of resources and resource controllers. While the resources and resource controllers are not exposed directly in the markup, they are used to define the semantics of VoiceXML 3.0 markup elements. Figure 7 illustrates the relationship among resource controllers, resources, and media devices. The arrows represent events exchanged among components. A more concrete example is represented in Figure 8 which illustrates the Prompt Resource controller (further defined in 6.4.2 Semantics), the PromptQueue Resource, and the SSML Media Player.
Figure 7: Semantic model with Resources and Resource Controllers
Figure 8: Semantic model with Specific Examples
[DOM3Events] specification. DOM Level 3 Events offer a robust set of interfaces for managing the listener registration, dispatching, propagation, and handling of events, as well as a description of how events flow through an XML tree.
The DOM 3.0 event model offers VoiceXML developers a rich set of interfaces that allow them to easily add behavior to their applications. In addition, conforming to the standard DOM event model enables authors to integrate their voice applications into next-generation multimodal or multi-namespaced frameworks such as MMI and CDF with minimal effort. Note that VXML 2.0 style events are supported through a new DOM event named 'vxmlevent'; if this vxmlevent is not canceled, the default action is to run the VXML 2.0 event handling.
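The "default action runs unless the event is canceled" rule for 'vxmlevent' can be pictured with a small ECMAScript sketch. This is an illustration under stated assumptions, not interpreter code: `dispatchVxmlEvent`, the plain-object event and the `runLegacyHandling` callback are hypothetical names, and no real DOM implementation is involved.

```javascript
// Sketch only: a plain object stands in for the DOM event; nothing here is a
// real VoiceXML or DOM API.
function dispatchVxmlEvent(listeners, runLegacyHandling) {
  const event = {
    type: "vxmlevent",
    canceled: false,
    preventDefault() {
      this.canceled = true;
    },
  };
  // Run every registered listener; any of them may cancel the event.
  for (const listener of listeners) {
    listener(event);
  }
  // Default action: VXML 2.0 style handling runs only if nobody canceled.
  if (!event.canceled) {
    runLegacyHandling(event);
  }
  return !event.canceled;
}

// Usage: the first dispatch falls through to legacy handling, the second
// is canceled by its listener and skips it.
const handled = [];
dispatchVxmlEvent([() => handled.push("listener")], () => handled.push("legacy"));
dispatchVxmlEvent([(e) => e.preventDefault()], () => handled.push("skipped"));
```

After both calls, `handled` holds the listener entry followed by the legacy entry; the canceled dispatch contributes nothing.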
Within the VoiceXML 3.0 semantic model, the DOM Level 3 Events APIs are available to all Resource Controllers that have markup elements associated with them. Indeed, this section covers the eventing APIs as available to VoiceXML 3.0 markup elements. The following section describes how the semantic model ties in with the DOM eventing model.
Event interface to support voice specific event information. In particular, the VoiceXML 3.0 Event interface supports a count integer that stores the number of times a resource emits a particular event type. The semantic model manages the count field by incrementing its value and resetting it as described in the section that follows.
Editorial note: Open Issue: Because we are now using the 'vxmlevent' DOM event, we don't need to add a count to the generic DOM events (and thus change the generic DOM events). Instead, we need to specify the count as one of the properties of the vxmlevent event.
EventTarget interface. This interface allows registration and removal of event listeners as well as dispatching of events.
EventListener interface. This interface allows the activation of handlers associated with a particular event. When a listener is activated, the event handler execution is done in the semantic model as described in the section that follows.
event listener group; all events within a group are ordered. As such, in VoiceXML 3.0, event listeners are registered as they are encountered in the document. Furthermore, all event listeners registered on an element belong to the same default group. Both of these provisions ensure that event handlers will execute in document order.
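The document-order guarantee can be illustrated with a toy event target. The sketch below is a simplified stand-in and not the DOM Level 3 Events API: `SketchTarget`, `registerInDocumentOrder` and `dispatch` are hypothetical names, and the single array per event type plays the role of the one default listener group.

```javascript
// Hypothetical miniature event target: all listeners for a type live in one
// default group and are stored in registration order.
class SketchTarget {
  constructor() {
    this.listenersByType = new Map();
  }

  addEventListener(type, handler) {
    if (!this.listenersByType.has(type)) {
      this.listenersByType.set(type, []);
    }
    this.listenersByType.get(type).push(handler);
  }

  // Activates the listeners for a type in the order they were registered
  // and returns what each handler produced.
  dispatch(type) {
    const results = [];
    for (const handler of this.listenersByType.get(type) || []) {
      results.push(handler());
    }
    return results;
  }
}

// Registers handlers as they are "encountered in the document".
function registerInDocumentOrder(target, type, handlers) {
  for (const handler of handlers) {
    target.addEventListener(type, handler);
  }
}

const fieldTarget = new SketchTarget();
registerInDocumentOrder(fieldTarget, "nomatch", [
  () => "first handler in the document",
  () => "second handler in the document",
]);
const activationOrder = fieldTarget.dispatch("nomatch");
```

Because registration follows document order and activation follows registration order, `activationOrder` lists the handlers in the order they appear in the document.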
[MMI]. These life cycle events allow the flow component of the DFP architecture to control the presentation layer by starting and stopping the processing of markup. By handling these events, the VoiceXML interpreter acts as a 'modality component' in the multimodal architecture, while the flow component acts as an 'interaction manager'. As a result, VoiceXML 3 applications can be easily extended into multimodal applications. However, it is important to note that support for the life cycle events is required by the DFP framework in all applications, whether uni- or multimodal.
The interpreter must handle the following life cycle events automatically:
All other life cycle events and all other external events are ignored unless the External Communication Module (6.13 External Communication Module) is included in the profile. If that module is present, all other external events are passed up to the application, placed in the application event queue, and then handled as specified by the developer using the functionality defined in that module.
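The routing rule for external events can be sketched as a three-way decision. This is an illustrative ECMAScript sketch under assumptions: `routeExternalEvent` and its return strings are hypothetical, the set of automatically handled life cycle event types is passed in rather than hard-coded, and the event type names in the usage lines are placeholders.

```javascript
// Sketch only: decides what happens to one incoming external event.
function routeExternalEvent(event, automaticTypes, externalCommModulePresent, applicationQueue) {
  // Life cycle events the interpreter must handle automatically.
  if (automaticTypes.has(event.type)) {
    return "handled automatically";
  }
  // Without the External Communication Module, everything else is dropped.
  if (!externalCommModulePresent) {
    return "ignored";
  }
  // With the module, the event is queued for developer-defined handling.
  applicationQueue.push(event);
  return "queued for application";
}

// Placeholder type names; they are not taken from the specification.
const automatic = new Set(["placeholder.lifecycle.start"]);
const queue = [];
routeExternalEvent({ type: "placeholder.lifecycle.start" }, automatic, true, queue);
routeExternalEvent({ type: "placeholder.other" }, automatic, true, queue);
```

After the two calls, only the second event sits in `queue`; with the module absent it would have been ignored instead.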
Editorial note: Open Issue: Should ClearContextRequest be handled automatically? Should Done be sent automatically when the document is finished? Where do these response events get sent?
3.2 Flow). Furthermore, these resource events are conceptual, not DOM events: they are used to define relationship with other conceptual entities and are not exposed at the markup level.
The following resources are defined: data model (5.1 Datamodel Resource), prompt queue (5.2 Prompt Queue Resource), recognition -- DTMF, ASR, and SIV (5.3 Recognition Resources), connection (5.4 Connection Resource), and timer (5.5 Timer Resource).
6.1.1 Syntax. Its semantics are specified in 6.1.2 Semantics.
Editorial note: Issue: Grammar processing will need to know the Base URI to resolve relative references.
[See D VoiceXML 3.0 XML Schema for schema definitions].
6.2.1 Syntax. Its semantics are specified in 6.2.2 Semantics.
[SRGS]), minus the XML Prolog. Note that both elements and attributes must be in the SRGS namespace (http://www.w3.org/2001/06/grammar).
6.3.1 Syntax. Its semantics are specified in 6.3.2 Semantics.
[See D VoiceXML 3.0 XML Schema for schema definitions].
6.3.1.2 Content Model for restrictions on occurrence of src and srcexpr attributes.
The value of the src attribute is a URI specifying the location of the grammar with an optional fragment for the rulename. Section 2.2 of the Speech Recognition Grammar Specification [SRGS] defines several forms of rule reference. The following are the forms that are permitted on a grammar element in VoiceXML:
The following are the forms of rule reference defined by [SRGS] that are not supported in VoiceXML 3.
6.5 Builtin SSML Module, 6.6 Media Module and 6.7 Parseq Module).
The attributes and content model of <prompt> are specified in 6.4.1 Syntax. Its semantics are specified in 6.4.2 Semantics, including how the final prompt content is determined and how the prompt is queued for playback using the PromptQueue Resource (5.2 Prompt Queue Resource).
[See D VoiceXML 3.0 XML Schema for schema definitions].
The attributes and content model of SSML elements are specified in 6.5.1 Syntax. Its semantics are specified in 6.5.2 Semantics, including how elements are evaluated to yield final content for playback.
[See D VoiceXML 3.0 XML Schema for schema definitions].
This module defines an SSML ([SSML]) Conforming Speech Synthesis Markup Language Fragment where:
Conforming Speech Synthesis Markup Language Processor.
Evaluation consists of the following:
Editorial note: Need to specify further error cases. Need to clarify unsupported languages and external (e.g. MRCP) SSML processors.
The <media> element can be seen as an enhanced and generalized version of the VoiceXML <audio> element. It is enhanced in that it provides additional attributes describing the type of media, conditional selection, as well as control over playback. It is a generalization of the <audio> element in that it permits media other than audio to be played; for example, media formats which contain audio and video tracks.
[See D VoiceXML 3.0 XML Schema for schema definitions].
occurrence constraints for restrictions on occurrence of src and srcexpr attributes.
Calculations of rendered durations and interaction with other timing properties follow SMIL 2.1 Computing the active duration where
Note that not all SMIL 2.1 Timing features are supported.
Editorial note: Use SMIL 3.0 or SMIL 2.1 reference?
This module is dependent upon the media module (6.6 Media Module).
With connections which support multiple media streams, it is possible to simultaneously playback multiple media types. For media container formats like 3GPP, audio and video media can be generated simultaneously from the same media resource.
There are established use cases for simultaneous playback of multiple media which are specified in separate resources:
The intention is to provide support for basic use cases where audio or TTS output from one resource can be complemented with output from another resource as permitted by the connection and platform capabilities.
6.5 Builtin SSML Module, the <prompt> element defined in 6.4 Prompt Module, etc.
The attributes and content model of the element are specified in 6.8.1 Syntax. Its semantics are specified in 6.8.2 Semantics.
[See D VoiceXML 3.0 XML Schema for schema definitions].
6.10.2.1 Field RC) of the Field Module.
The behavior of the Form RC follows the VoiceXML FIA, although some aspects are not modeled directly in this RC: external transition handling is not part of the Form RC; input items use separate RCs to manage coordination between media resources, while recognition results can be received directly by form, field or other RCs.
[This initial version does not address all aspects of FIA behavior; for example, event handling, error handling and external transitions are not covered.]
Editorial note: This description of the form needs to be updated with all the new functionality we have given the form via our new eventing approach.
6.10.2.1 Field RC), PlayandRecognize (6.10.2.2 PlayandRecognize RC), ...
6.10.2.2 PlayandRecognize RC), causing any queued prompts to be played and recognition to be initiated. In the event data, the controller is set to this RC, and other data is derived from data model properties. The RC transitions to the Executing state.
In the Executing state, the PlayAndRecognize RC must send recoResults (or error events: noinput, nomatch, error.semantic) to the field RC.
If the field RC receives the recoResults, then it updates its name variable in the Datamodel Resource. The field RC then sends a 'fieldResult' event to its controller indicating that a field result has been received and processed.
If the recoResult is received by the field RC's controller, then the field receives an 'evaluate' event which causes it to transition to the Evaluating state.
In the Evaluating state, the field RC iterates through its children, executing each filled RC; this is modeled by a separate RC (see XXX). When evaluation is complete, the RC sends an 'evaluated' event to its controller and transitions to the Ready state.
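The field RC transitions described above can be sketched as a small state machine. This is a rough ECMAScript illustration and not the normative SCXML: `FieldRcSketch` is a hypothetical class, the initial state "Ready" and the trigger name 'execute' are assumptions, the Datamodel Resource is reduced to a plain object, and the controller is reduced to an array that collects the events sent to it. Error events (noinput, nomatch, error.semantic) are not modeled.

```javascript
// Sketch of the Ready -> Executing -> Evaluating -> Ready cycle.
class FieldRcSketch {
  constructor(name, filledActions, datamodel, controllerInbox) {
    this.name = name;                       // field name variable
    this.filledActions = filledActions;     // stand-ins for filled RCs
    this.datamodel = datamodel;             // stand-in for the Datamodel Resource
    this.controllerInbox = controllerInbox; // events sent to the controller
    this.state = "Ready";                   // assumed initial state
  }

  receive(eventName, data) {
    if (this.state === "Ready" && eventName === "execute") {
      // 'execute' is an assumed trigger; PlayandRecognize would start here.
      this.state = "Executing";
    } else if (this.state === "Executing" && eventName === "recoResults") {
      // Store the result under the field name and notify the controller.
      this.datamodel[this.name] = data;
      this.controllerInbox.push("fieldResult");
    } else if (this.state === "Executing" && eventName === "evaluate") {
      this.state = "Evaluating";
      for (const action of this.filledActions) {
        action(this.datamodel);
      }
      this.controllerInbox.push("evaluated");
      this.state = "Ready";
    }
    // Events that do not apply in the current state are ignored here.
    return this.state;
  }
}

const sketchDatamodel = {};
const sketchInbox = [];
const numField = new FieldRcSketch("num", [], sketchDatamodel, sketchInbox);
numField.receive("execute");
numField.receive("recoResults", "42");
numField.receive("evaluate");
```

At the end of this run the field is back in the Ready state, the data model holds the recognized value under `num`, and the controller has received 'fieldResult' followed by 'evaluated'.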
5.1.1 Data Model Resource API.
At any given point in time, based on the VoiceXML document structure and the execution state, the stack may contain the following scopes whose semantics are described in VoiceXML 3.0 as follows (bottom to top):
8.2 Properties. Properties may be defined for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Thus, access to properties is also controlled by means of the same scope stack that is used by named variables.
VoiceXML 3.0 provides a consistent mechanism to unambiguously read these properties in any scope using the data access and manipulation language in a manner similar to accessing and manipulating named variables. This is described in the two sections below.
5.1.1 Data Model Resource API.
The above examples result in the following Data Model Resource API calls, in order:
<root xmlns="">
  <flight>SW123</flight>
  <origin>JFK</origin>
  <depart>2009-01-01T14:32:00</depart>
  <destination>SFO</destination>
  <arrive>2009-01-01T18:14:00</arrive>
</root>
6.4 Prompt Module specifies prompts in detail.
Attributes of <value>
5.1.1 Data Model Resource API.
The above examples result in the following Data Model Resource API calls:
5.1.1 Data Model Resource API.
The above examples result in the following Data Model Resource API calls, in order:
<root xmlns="">
  <flight>SW123</flight>
  <origin>JFK</origin>
  <depart>2009-01-01T14:32:00</depart>
  <destination>SFO</destination>
  <arrive>2009-01-01T18:14:00</arrive>
</root>
5.1.1 Data Model Resource API.
The single <data> usage in the above example results in the following behavior and Data Model Resource API calls:
At the time of <data> execution, the variable with name "quote" is updated in the document scope using the in-line specification for the new value retrieved from the URI expression 'http://www.example.org/getquote?ticker=' + document('tickers')/ford which evaluates to http://www.example.org/getquote?ticker=f
<quote xmlns="http://www.example.org">
  <ticker>F</ticker>
  <name>Ford Motor Company</name>
  <change>0.10</change>
  <last>3.00</last>
</quote>
<data> Fetching Properties
These properties pertain to documents fetched by the <data> element.
5.1.1 Data Model Resource API.
The above examples result in the following Data Model Resource API calls, in order:
5.1.1 Data Model Resource API. This will allow resetting variable values to the initial in-line specification when such is present, for instance.
Resolution:
None recorded.
8.2 Properties. VoiceXML 3.0 provides a consistent mechanism to unambiguously read these properties in any scope using the data access and manipulation language in a manner similar to accessing and manipulating named variables as illustrated in section 2.3.2. However, properties cannot be created, updated or deleted using any of the syntax described in this module. The <property> element syntax must be used for such operations.
Events are dispatched to the application serially. Since the interpreter only reflects the data associated with a single external message at a time, it is the application's responsibility to manage the data associated with each external message once that message has been delivered.
The following example demonstrates asynchronous receipt of an external message. The catch handler copies the reflected external message into an array at application scope.
<vxml version="2.1"
      xmlns="http://www.w3.org/2001/vxml">
  <property name="externalevents.enable" value="true"/>
  <!-- Array at application scope that accumulates received messages. -->
  <var name="myMessages" expr="new Array()"/>
  <catch event="externalmessage">
    <!-- Copy the reflected message; only one message is reflected at a time. -->
    <var name="lm" expr="application.lastmessage$"/>
    <if cond="lm.contenttype == 'text/xml' || lm.contenttype == 'application/xml'">
      <log>received XML with root document element
        <value expr="lm.content.documentElement.nodeName"/>
      </log>
    <elseif cond="typeof lm.content == 'string'"/>
      <log>received <value expr="lm.content"/></log>
    <else/>
      <log>received unknown external message type
        <value expr="typeof lm.content"/>
      </log>
    </if>
    <script>
      myMessages.push({'content' : lm.content, 'ctype' : lm.contenttype});
    </script>
  </catch>
  <form>
    <field name="num" type="digits">
      <prompt>pick a number any number</prompt>
      <catch event="noinput nomatch">
        sorry. didn't get that.
        <reprompt/>
      </catch>
      <filled>
        you said <value expr="num"/>
        <clear/>
      </filled>
    </field>
  </form>
</vxml>
Section 6.5 of [VXML2]. If not specified, the value is derived from the innermost sendtimeout property.
In the discussion of variable scopes in VXML 2.0 section 5.1.2, the text for application scope in table 40 is also appropriate for session scope (new text "These are declared with <var> and <script> elements that are children of the session root document's <vxml> element. They are initialized when the session root document is loaded. They exist while the session document is loaded, and are visible to the session root document, the application root document, and any other loaded application leaf document.").
The session document is then loaded and active in the hierarchy of documents, which follows JavaScript scope chaining (that is, a leaf document is below an application root, which is below a session root). This means that if a variable is declared in the session root and then again in some local form in the leaf document, the session root variable is shadowed (just as a variable in the application root would be).
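The shadowing behaviour can be mimicked in ECMAScript with a prototype chain, which resolves names from the innermost object outward. This is only an analogy offered as a sketch: `buildScopeChain` is a hypothetical helper, the four levels and their variable names are invented for illustration, and a real interpreter is not required to use prototypes for its scopes.

```javascript
// Builds one object per scope, outermost first; each inner scope has the
// next outer scope as its prototype, so lookups walk outward.
function buildScopeChain(levelsOuterToInner) {
  let current = null;
  for (const variables of levelsOuterToInner) {
    current = Object.assign(Object.create(current), variables);
  }
  return current; // innermost scope
}

const localFormScope = buildScopeChain([
  { greeting: "declared in session root", locale: "en-US" }, // session root
  { greeting: "declared in application root" },              // application root
  {},                                                        // leaf document
  { greeting: "declared in local form" },                    // form in the leaf
]);

// The local declaration shadows the outer ones; 'locale' is only declared
// in the session root, so it is found there.
const resolvedGreeting = localFormScope.greeting;
const resolvedLocale = localFormScope.locale;
```

Here `resolvedGreeting` is the local form's value and `resolvedLocale` comes from the session root, which is the same outcome the scope hierarchy above describes.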
This also implies that the catch selection algorithm as described in VXML 2.0 section 5.2.4 would have to change to include the session root document as a potential source of catch handlers (new text "Form an ordered list of catches consisting of all catches in the current scope and all enclosing scopes (form item, form, document, application root document, session root document, interpreter context), ordered first by scope (starting with the current scope), and then within each scope by document order."). All other catch handling would remain the same. In particular, the as-if-by-copy semantics are retained: if an event from a leaf document is handled by a catch handler from the session root, the handler does not execute within the context of the session root document but instead executes as if copied into the local leaf document context.
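The ordering step of the revised catch selection can be sketched as follows. This ECMAScript sketch covers only the quoted ordering rule plus a simplified match: `orderCatches`, `selectCatch` and the catch record fields are hypothetical, event names are compared by exact equality, and the count and cond checks of the full VXML 2.0 algorithm are left out.

```javascript
// Scopes from innermost to outermost, as listed in the proposed new text.
const SCOPES_INNER_TO_OUTER = [
  "form item",
  "form",
  "document",
  "application root document",
  "session root document",
  "interpreter context",
];

// Keeps catches in the current scope and all enclosing scopes, ordered
// first by scope (current scope first) and then by document order.
function orderCatches(catches, currentScope) {
  const rank = (scope) => SCOPES_INNER_TO_OUTER.indexOf(scope);
  const start = rank(currentScope);
  return catches
    .filter((c) => rank(c.scope) >= start)
    .sort((a, b) => rank(a.scope) - rank(b.scope) || a.documentOrder - b.documentOrder);
}

// Simplification: first catch in the ordered list with the same event name.
function selectCatch(catches, currentScope, eventName) {
  return orderCatches(catches, currentScope).find((c) => c.event === eventName);
}

const exampleCatches = [
  { id: "session-badfetch", scope: "session root document", documentOrder: 1, event: "error.badfetch" },
  { id: "document-badfetch", scope: "document", documentOrder: 1, event: "error.badfetch" },
];
const chosen = selectCatch(exampleCatches, "form item", "error.badfetch");
```

In this usage the document-level handler is chosen because it is closer to the current scope; the session root handler would be selected only if no nearer scope supplied a matching catch.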
This also implies that property lookup from section 6.3 of VXML 2.0 would have to change to say that property value lookup can also go to the session root if a more local value for the property isn't found (new text "Properties may be defined for the whole session, for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item."). This doesn't change the usual way properties work, where a property at a lower level overrides one at a higher level.
This also implies that document-level links in the session root document are active throughout the session, which would be a change to section 2.5 of VXML 2.0 (new text "If an application root document has a document-level link, its grammars are active no matter what document of the application is being executed. If a session root document has a document-level link, its grammars are active no matter what document of the session is being executed. If execution is in a modal form item, then link grammars at the form, document, application or session level are not active.").
As with links, the scope of grammars from section 3.1.3 of VXML 2.0 would be changed to specify what happens when a grammar from a session root has document scope (new text "Form grammars are by default given dialog scope, so that they are active only when the user is in the form. If they are given scope document, they are active whenever the user is in the document. If they are given scope document and the document is the application root document, then they are also active whenever the user is in another loaded document in the same application. If they are given scope document and the document is the session root document, then they are also active throughout the session."). Note that this activation throughout the session can still be overridden by modal listen states, just as for the application root. The bulleted list on grammar activation in section 3.1.4 of VXML 2.0 also changes to include the session root (new text: "grammars contained in links in its application root document or session root document, and grammars for menus and forms in its application root document or session root document which are given document scope.").
D VoiceXML 3.0 XML Schema for schema definitions].
The <voicemodel> element has the attributes specified in Table 72, in addition to the fetchtimeout, fetchhint, maxage, and maxstale attributes as specified in 8.1.1 Fetching.
6.18.1 Syntax. Its semantics are specified in 6.18.2 Semantics.
D VoiceXML 3.0 XML Schema for schema definitions].
6.19 Play Module), causing any queued prompts to be played. Note that the event data passed to Play RC must have:
This RC transitions to the Executing state after sending the event request to the Play RC.
In the Executing state, when the disconnect RC receives the "playDone" event, it instructs the connection resource to disconnect the interpreter context from the user and enters into the "Disconnecting" state.
In the "Disconnecting" state, when the disconnect RC receives "userDisconnected" event,
| Editorial note |
|---|
| Play RC is not yet defined. |
6.20.1 Syntax. Its semantics are specified in 6.20.2 Semantics.
D VoiceXML 3.0 XML Schema for schema definitions].
6.20.2.1 RecordInputItem RC), PlayandRecognize (6.10.2.2 PlayandRecognize RC), Record (6.20.2.2 Record RC).
Properties may be defined for the session, for the whole application, for the whole document at the <vxml> level, for a particular dialog at the <form> or <menu> level, or for a particular form item. Properties apply to their parent element and all the descendants of the parent. A property at a lower level overrides a property at a higher level. When different values for a property are specified at the same level, the last one in document order applies. Properties specified in the session root document provide default values for properties throughout the session; properties specified in the application root document provide default values for properties in every document in the application; properties specified in an individual document override property values specified in the application root document.
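The lookup rules in this paragraph can be sketched as below. This is a non-normative illustration: the level names follow the text, but the `declared` data structure and the `platform_default` parameter are assumptions made for the example.

```python
# Levels from lowest (most specific) to highest, as listed in the text.
# "dialog" stands for a <form> or <menu>.
LEVELS = ["form item", "dialog", "document", "application", "session"]

def resolve_property(name, declared, platform_default=None):
    """Resolve a property value.

    declared maps a level name to a list of (name, value) pairs in
    document order, as seen from the element being executed.
    """
    for level in LEVELS:
        matches = [value for n, value in declared.get(level, []) if n == name]
        if matches:
            # Same level, several values: the last one in document order applies.
            return matches[-1]
    return platform_default
```

For example, a `timeout` declared twice at document level and once at session level resolves to the second document-level value, while a property declared only in the session root document falls through to that session-wide default.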
D VoiceXML 3.0 XML Schema for schema definitions].
6.22.1 Syntax. Its semantics are specified in 6.22.2 Semantics.
D VoiceXML 3.0 XML Schema for schema definitions].
[VOICEXML21] specification is helpful as the semantics of these elements are already well defined and well understood. Thus changes in how they are presented are a result of the module and profile style of VoiceXML 3.0 and of making more explicit and formal the precise detailed semantics.
The Legacy profile also plays a transitional role as VoiceXML 3.0 as a whole is built on top of VoiceXML 2.1. VoiceXML 3.0 is a superset of VoiceXML 2.1 and includes the traditional 2.1 functionality plus some new modules. The Legacy profile is the set of modules that were always present in VoiceXML 2.1 but that weren't expressed in the specification as individual modules. This also allows a clear path for the VoiceXML application developer as the application developer will not need to learn substantial new syntax or semantics when they develop in the Legacy profile of VoiceXML 3.0.
The Legacy profile also represents a proof of concept to ensure that the new modular profile method of describing the specification is in no way limited. VoiceXML 3.0 in its entirety will be in no way limited or constrained because of the use of profiles and modules and formalized semantic models. Anything that was standardized in VoiceXML 2.1 can be standardized in this new format and the Legacy profile reveals that.
This profile can be best described in the following 3 sections:
| Editorial note |
|---|
| The following content is missing from the VoiceXML 3.0 specification and needs to be defined: supported media types must be defined. |
6.16 SIV Module) includes verification, identification, and enrollment functions.
6.5 Builtin SSML Module), the Media Module (6.6 Media Module), and the Parseq Module (6.7 Parseq Module) provide functions for presenting information to the user.
6.12 Data Access and Manipulation Module) for accessing local variables, parameters, returned values, etc. This module is not intended to access external databases.
E Convenience Syntax in VoiceXML 2.x shows how the VoiceXML 2.1 <menu> and pre-defined catch handlers could be coded using other V2 notation (i.e., as convenience syntax).
[Examples TBD.]
4.5.2.2 Application Root and [RFC2396]). However, if the URI reference to the root document contains a query string or a namelist attribute, the root document is fetched.
Elements that fetch VoiceXML documents also support the following additional attribute:
| Attribute | Description |
|---|---|
| fetchaudio | The URI of the audio clip to play while the fetch is being done. If not specified, the fetchaudio property is used, and if that property is not set, no audio is played during the fetch. The fetching of the audio clip is governed by the audiofetchhint, audiomaxage, audiomaxstale, and fetchtimeout properties in effect at the time of the fetch. The playing of the audio clip is governed by the fetchaudiodelay and fetchaudiominimum properties in effect at the time of the fetch. |
The fetchaudio attribute is useful for enhancing a user experience when there may be noticeable delays while the next document is retrieved. This can be used to play background music, or a series of announcements. When the document is retrieved, the audio file is interrupted if it is still playing. If an error occurs retrieving fetchaudio from its URI, no badfetch event is thrown and no audio is played during the fetch.
[HTML] visual browsers, can use caching to improve performance in fetching documents and other resources; audio recordings (which can be quite large) are as common to VoiceXML documents as images are to HTML pages. In a visual browser it is common to include end user controls to update or refresh content that is perceived to be stale. This is not the case for the VoiceXML interpreter context, since it lacks equivalent end user controls. Thus enforcement of cache refresh is at the discretion of the document through appropriate use of the maxage and maxstale attributes.
The caching policy used by the VoiceXML interpreter context must adhere to the cache correctness rules of HTTP 1.1 ([RFC2616]). In particular, the Expires and Cache-Control headers must be honored. The following algorithm summarizes these rules and represents the interpreter context behavior when requesting a resource:
The "maxstale check" is:
Note: it is an optimization to perform a "get if modified" on a document still present in the cache when the policy requires a fetch from the server.
The maxage and maxstale properties are allowed to have no default value whatsoever. If the value is not provided by the document author, and the platform does not provide a default value, then the value is undefined and the 'Otherwise' clause of the algorithm applies. All other properties must provide a default value (either as given by the specification or by the platform).
While the maxage and maxstale attributes are drawn from and directly supported by HTTP 1.1, some resources may be addressed by URIs that name protocols other than HTTP. If the protocol does not support the notion of resource age, the interpreter context shall compute a resource's age from the time it was received. If the protocol does not support the notion of resource staleness, the interpreter context shall consider the resource to have expired immediately upon receipt.
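A minimal, non-normative sketch of the cache decision follows. It assumes the step structure given for the same algorithm in VoiceXML 2.0 section 6.1.2; the `CachedResource` record, the function names, and the string results are hypothetical. An undefined maxage or maxstale is modeled as `None`, which selects the 'Otherwise' branch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedResource:
    age: float                    # seconds since the resource was obtained
    expired_for: Optional[float]  # seconds past expiration, None if not expired

def maxstale_check(res, maxstale):
    """Use an expired copy only if it is stale by no more than maxstale seconds."""
    if maxstale is not None and res.expired_for <= maxstale:
        return "use-cache"
    return "fetch"

def cache_decision(res, maxage=None, maxstale=None):
    if res is None:                              # not present in the cache
        return "fetch"
    if maxage is not None and res.age > maxage:  # older than the author allows
        return "fetch"
    if res.expired_for is not None:              # expired: apply the maxstale check
        return maxstale_check(res, maxstale)
    return "use-cache"

def non_http_resource(seconds_since_receipt):
    """Protocol with no notion of age or staleness: age is counted from
    receipt and the resource is treated as expired upon receipt."""
    return CachedResource(age=seconds_since_receipt,
                          expired_for=seconds_since_receipt)
```

Under these assumptions, a resource fetched over a protocol without staleness support is reused only when the author supplies a maxstale value at least as large as the time since receipt.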
8.2.1 Speech Recognition Properties), DTMF recognition (8.2.2 DTMF Recognition Properties), prompt and collect (8.2.3 Prompt and Collect Properties), media (8.2.4 Media Properties), fetching (8.2.5 Fetch Properties) and miscellaneous (8.2.6 Miscellaneous Properties) properties.
| Editorial note |
|---|
| Open issue: should the specification provide specific default values rather than platform-specific ones? Open issue: should we add a 'type' column for all properties? |
| Name | Description | Default |
|---|---|---|
| audiofetchhint | This tells the platform whether or not it can attempt to optimize dialog interpretation by pre-fetching audio. The value is either safe to say that audio is only fetched when it is needed, never before; or prefetch to permit, but not require the platform to pre-fetch the audio. | prefetch |
| audiofetchtimeout | The timeout for audio fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| audiomaxage | Tells the platform the maximum acceptable age, in seconds, of cached audio resources. | platform-specific |
| audiomaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached audio resources. | platform-specific |
| documentfetchhint | Tells the platform whether or not documents may be pre-fetched. The value is either safe (the default), or prefetch. | safe |
| documentfetchtimeout | The timeout for document fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| documentmaxage | Tells the platform the maximum acceptable age, in seconds, of cached documents. | platform-specific |
| documentmaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached documents. | platform-specific |
| grammarfetchhint | Tells the platform whether or not grammars may be pre-fetched. The value is either prefetch (the default), or safe. | prefetch |
| grammarfetchtimeout | The timeout for grammar fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| grammarmaxage | Tells the platform the maximum acceptable age, in seconds, of cached grammars. | platform-specific |
| grammarmaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached grammars. | platform-specific |
| mediafetchhint | Tells the platform whether or not media files may be pre-fetched. The value is either prefetch (the default), or safe. | prefetch |
| mediafetchtimeout | The timeout for media fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| mediamaxage | Tells the platform the maximum acceptable age, in seconds, of cached media. | platform-specific |
| mediamaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached media. | platform-specific |
| objectfetchhint | Tells the platform whether the URI contents for <object> may be pre-fetched or not. The values are prefetch, or safe. | prefetch |
| objectfetchtimeout | The timeout for object fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| objectmaxage | Tells the platform the maximum acceptable age, in seconds, of cached objects. | platform-specific |
| objectmaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached objects. | platform-specific |
| scriptfetchhint | Tells whether scripts may be pre-fetched or not. The values are prefetch (the default), or safe. | prefetch |
| scriptfetchtimeout | The timeout for script fetches. The value is a Time Designation (see 8.4 Value Designations). | platform-specific |
| scriptmaxage | Tells the platform the maximum acceptable age, in seconds, of cached scripts. | platform-specific |
| scriptmaxstale | Tells the platform the maximum acceptable staleness, in seconds, of expired cached scripts. | platform-specific |
| fetchaudio | The URI of the audio to play while waiting for a document to be fetched. The default is not to play any audio during fetch delays. There are no fetchaudio properties for audio, grammars, objects, and scripts. The fetching of the audio clip is governed by the audiofetchhint, audiomaxage, audiomaxstale, and fetchtimeout properties in effect at the time of the fetch. The playing of the audio clip is governed by the fetchaudiodelay, and fetchaudiominimum properties in effect at the time of the fetch. | undefined |
| fetchaudiodelay | The time interval to wait at the start of a fetch delay before playing the fetchaudio source. The value is a Time Designation (see 8.4 Value Designations). The default interval is platform-dependent, e.g. "2s". The idea is that when a fetch delay is short, it may be better to have a few seconds of silence instead of a bit of fetchaudio that is immediately cut off. | platform-specific |
| fetchaudiominimum | The minimum time interval to play a fetchaudio source, once started, even if the fetch result arrives in the meantime. The value is a Time Designation (see 8.4 Value Designations). The default is platform-dependent, e.g., "5s". The idea is that once the user does begin to hear fetchaudio, it should not be stopped too quickly. | platform-specific |
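
The interaction of fetchaudiodelay and fetchaudiominimum can be illustrated with a small, non-normative sketch. The function name is hypothetical, all times are plain seconds rather than Time Designations, and the sketch assumes that the fetchaudio source is long enough (or repeats) to cover the whole window and that a fetch finishing exactly at the delay plays no audio.

```python
def fetchaudio_window(fetch_duration, delay, minimum):
    """Return (start, stop) of fetchaudio playback, in seconds from the
    start of the fetch, or None if no fetchaudio is played."""
    if fetch_duration <= delay:
        # The fetch finished during the initial silence: play nothing.
        return None
    start = delay
    # Once started, playback lasts at least `minimum`, even if the fetch
    # result arrives earlier; otherwise it stops when the fetch completes.
    stop = max(fetch_duration, start + minimum)
    return (start, stop)
```

With the example values from the table ("2s" and "5s"), a 3-second fetch yields playback from second 2 to second 7, while a 1-second fetch yields silence only.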
8.2.2 DTMF Recognition Properties to tailor the user experience. The effects of these are shown in the following timing diagrams.
8.2.3 Prompt and Collect Properties and 8.2.1 Speech Recognition Properties to tailor the user experience. The effects of these are shown in the following timing diagrams.