1. Introduction
The Web Neural Network API defines a web-friendly hardware-agnostic abstraction layer that makes use of Machine Learning capabilities of operating systems and underlying hardware platforms without being tied to platform-specific capabilities. The abstraction layer addresses the requirements of key Machine Learning JavaScript frameworks and also allows web developers familiar with the ML domain to write custom code without the help of libraries.
For an illustrated introduction, please see the explainer.
2. Use cases
2.1. Application Use Cases
This section illustrates application-level use cases for neural network inference hardware acceleration. All applications in those use cases can be built on top of pre-trained deep neural network (DNN) [models].
Note: Please be aware that some of the use cases described here are, by their very nature, privacy-invasive. Developers who are planning to use the API for such use cases should ensure that the API is being used to benefit users, for purposes that users understand and approve. They should apply the Ethical Principles for Web Machine Learning [webmachinelearning-ethics] and implement appropriate privacy risk mitigations such as transparency, data minimisation, and user controls.
Note: § 3 Accessibility Considerations provides guidance on how to improve accessibility of these use cases.
2.1.1. Person Detection
A user opens a web-based video conferencing application, but she temporarily leaves her room. The application checks whether she is in front of her PC by using object detection (for example, approaches such as [SSD] or [YOLO] that use a single DNN) to detect regions in a camera input frame that include persons.
When she comes back, the application automatically detects her and notifies other online users that she is active now.
2.1.2. Semantic Segmentation
A user joins a teleconference via a web-based video conferencing application at her desk since no meeting room in her office is available. During the teleconference, she does not want her room and the people in the background to be visible. To protect the privacy of the other people and the surroundings, the application runs a machine learning model such as [DeepLabv3+], [MaskR-CNN] or [SegAny] to semantically split an image into segments and replaces segments that represent other people and background with another picture.
2.1.3. Skeleton Detection
A web-based video conferencing application tracks the pose of a user’s skeleton by running a machine learning model that allows for real-time human pose estimation, such as [PoseNet], to recognize her gestures and body language. When she raises her hand, her microphone is automatically unmuted and she can start speaking on the teleconference.
2.1.4. Face Recognition
There are multiple people in the conference room and they join an online meeting using a web-based video conferencing application. The application detects the faces of participants by using object detection (for example, approaches such as [SSD]) and checks whether each face was present at the previous meeting by running a machine learning model such as [FaceNet], which verifies whether two faces are identical.
2.1.5. Facial Landmark Detection
A user wants to find new glasses that fit her beautifully in an online glasses store. The online store offers a web-based try-on simulator that runs a machine learning model such as Face Alignment Network [FAN] to detect facial landmarks such as the eyes, nose, and mouth. When she chooses a pair of glasses, the simulator renders the selected glasses at the detected position of the eyes on her facial image.
2.1.6. Style Transfer
A user is looking for cosmetics in an online store and wondering which color may suit her face. The online store shows sample facial makeup images of cosmetics, and offers a makeup simulator that runs a machine learning model like [ContextualLoss] or [PairedCycleGAN] to transfer the makeup style of the sample makeup image to her facial image. She can use the simulator to check how the selected makeup looks on her face.
2.1.7. Super Resolution
A web-based video conferencing application is receiving a video stream from its peer, but the resolution of the video drops due to network congestion. To prevent degradation of the perceived video quality, the application runs a machine learning model for super-resolution such as [SRGAN] to generate higher-resolution video frames.
2.1.8. Image Captioning
For better accessibility, a web-based presentation application provides automatic image captioning by running a machine learning model such as [im2txt] which predicts explanatory words of the presentation slides.
2.1.9. Text-to-image
Images are a core part of modern web experiences. An ability to generate images based on text input in a privacy-preserving manner enables visual personalization and adaptation of web applications and content. For example, a web application can use as an input a natural language description on the web page or a description provided by the user within a text prompt to produce an image matching the text description. This text-to-image use case enabled by latent diffusion model architecture [LDM] forms the basis for additional text-to-image use cases, for example inpainting, where a portion of an existing image on the web page is selectively modified using the newly generated content, or the converse, outpainting, where an original image is extended beyond its original dimensions, filling the empty space with generated content.
2.1.10. Machine Translation
Multiple people from various countries are talking via a web-based real-time text chat application. The application translates their conversation by using a machine learning model such as [GNMT] or [OpenNMT], which translates each text into a different language.
2.1.11. Emotion Analysis
A user is talking to her friend via a web-based real-time text chat application, and she is wondering how the friend feels because she cannot see the friend’s face. The application analyses the friend’s emotion by using a machine learning model such as [DeepMoji], which infers emotion from input texts, and displays an emoji that represents the estimated emotion.
2.1.12. Video Summarization
A web-based video conferencing application records received video streams, and it needs to reduce the amount of recorded video data to be stored. The application generates a short version of the recorded video by using a machine learning model for video summarization such as [Video-Summarization-with-LSTM].
2.1.13. Noise Suppression
A web-based video conferencing application records received audio streams, but background noise is often present. The application applies real-time noise suppression using a recurrent neural network such as [RNNoise] to suppress dynamic background noise, like a baby crying or a dog barking, to improve the audio experience in video conferences.
2.1.14. Speech Recognition
Speech recognition, also known as speech to text, enables recognition and translation of spoken language into text. Example applications of speech recognition include transcription, automatic translation, multimodal interaction, real-time captioning and virtual assistants. Speech recognition improves accessibility of auditory content and makes it possible to interact with such content in a privacy-preserving manner in a textual form. Examples of common use cases include watching videos or participating in online meetings using real-time captioning. Models such as [Whisper] approach humans in their accuracy and robustness and are well positioned to improve accessibility of such use cases.
2.1.15. Text Generation
Various text generation use cases are enabled by large language models (LLM) that are able to perform tasks where a general ability to predict the next item in a text sequence is required. This class of models can translate texts, answer questions based on a text input, summarize a larger body of text, or generate text output based on a textual input. LLMs enable better performance compared to older models based on RNN, CNN, or LSTM architectures and further improve the performance of many other use cases discussed in this section. Examples of LLMs include [t5-small], [m2m100_418M], [gpt2], and [llama-2-7b].
2.1.16. Detecting fake video
A user is exposed to realistic fake videos generated by ‘deepfake’ techniques on the web. A fake video can swap the speaker’s face for the president’s face to incite a user politically or to manipulate the user’s opinion. Deepfake detection applications such as [FaceForensics++] analyze videos and protect a user against fake videos or images. When she watches a fake video on the web, the detection application alerts her to the fraudulent video in real time.
2.2. Framework Use Cases
This section collects framework-level use cases for a dedicated low-level API for neural network inference hardware acceleration. It is expected that Machine Learning frameworks will be key consumers of the Web Neural Network API (WebNN API) and the low-level details exposed through the WebNN API are abstracted out from typical web developers. However, it is also expected that web developers with specific interest and competence in Machine Learning will want to interface with the WebNN API directly instead of a higher-level ML framework.
2.2.1. Custom Layer
A web application developer wants to run a DNN model on the WebNN API. However, she has found that some activation functions, such as [LeakyReLU] and [ELU], are not included in the WebNN API. To address this issue, she constructs custom layers of the additional activation functions on top of the WebNN API. Note that the scope of custom layers may include convolution, normalization, etc. as well as activation.
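This example is non-normative. As a minimal sketch of the custom-layer idea, the function below composes a leaky ReLU from two element-wise primitives, using the identity leakyRelu(x) = max(x, alpha * x), which holds for 0 < alpha < 1. The helper name customLeakyRelu is hypothetical. The builder argument is assumed to be an MLGraphBuilder exposing element-wise mul() and max(), and alpha is assumed to be an MLOperand holding the slope constant.

```javascript
// Hypothetical helper: builds leakyRelu(x) = max(x, alpha * x) from primitives.
// Assumes 0 < alpha < 1, so alpha * x exceeds x only when x is negative.
function customLeakyRelu(builder, x, alpha) {
  const scaled = builder.mul(x, alpha); // element-wise alpha * x
  return builder.max(x, scaled);        // x for x >= 0, alpha * x otherwise
}
```

Because the function only calls builder methods, it adds nodes to the graph being constructed and performs no computation itself. The same pattern could be applied to other activations that can be written in terms of existing operators.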
2.2.2. Network Concatenation
A web application uses a DNN model, and its model data of upper convolutional layers and lower fully-connected layers are stored in separate files, since model data of the fully-connected layers are periodically updated due to fine tuning at the server side.
Therefore, the application first downloads both partial model files and concatenates them into a single model. When the model is updated, the application downloads the fine-tuned part of the model and replaces only the fully-connected layers with it.
2.2.3. Performance Adaptation
A web application developer is concerned about the performance of her DNN model on mobile devices. She has confirmed that it may run too slowly on mobile devices that do not have GPU acceleration. To address this issue, her web application queries the WebNN API to confirm whether acceleration is available, so that the application can display a warning on devices without acceleration.
After several weeks, she has developed a tiny DNN model that can even run on CPU. In order to accommodate CPU execution, she modifies the application so that the application loads the tiny model in the case of CPU-only devices.
2.2.4. Operation Level Execution
A JavaScript ML framework is responsible for loading, interpreting and executing an ML model. During the model execution phase, the framework iterates through the operations of the model and executes each operation on a hardware device such as a CPU, GPU or ML accelerator. To avoid unnecessary data copying across devices, the framework selects the same device to execute the operations. For a compute-intensive operation, such as 2D convolution or matrix multiplication, the framework uses the WebNN API to execute it with the ML-specific acceleration available on that selected device.
2.2.5. Integration with real-time video processing
The user experience of WebRTC-based video conferencing is enhanced using real-time video processing. For example, background blur implemented using a § 2.1.2 Semantic Segmentation model blurs the background in the user’s live camera feed. To satisfy the performance requirements of this use case, the WebNN API integrates with primitives from other Web APIs that make up the media pipeline to allow WebNN API-based transformation of real-time video streams.
3. Accessibility Considerations
This section provides guidance to web authors on how to improve accessibility of § 2.1 Application Use Cases enabled by neural network inference hardware acceleration. This guidance generalizes beyond the specific use cases outlined in this specification, and web authors are encouraged to consult [wcag] for further accessibility guidance and § 6 Ethical Considerations for digital accessibility in context of ethical principles.
§ 2.1.8 Image Captioning can be improved by ensuring the captions are surfaced to screen-reader and other Assistive Technology (AT) users. Web authors are encouraged to ensure the generated image captions are semantically linked to their respective images, either via the standard alt attribute, or other means which may depend on whether the descriptions are updated on initial page load, or later, as the result of user action.
§ 2.1.11 Emotion Analysis can mis-label and thus mis-classify users, leading to discriminatory experiences. Web authors are encouraged to expose confidence scores and give users an option to turn the feature off.
§ 2.1.13 Noise Suppression with aggressive filters can wipe out the speech of users with dysarthria, making captions and recognition fail. Web authors are encouraged to expose a bypass or sensitivity control, and not hard-wire noise suppression when live captions are active.
§ 2.2.5 Integration with real-time video processing with segmentation-powered background blur helps remove distractions, but can add enough delay to break lip-reading and live captions. Web authors are encouraged to provide a user-facing, keyboard- and screen-reader-operable “Background blur on/off” control, surfaced next to other accessibility/media settings.
§ 7.2 Device Selection allows web authors to indicate preferences for execution speed and power consumption. Implementers are encouraged to allow users to override the web author hint in browser UI to ensure that people on low-end or battery-sensitive devices can keep captions and other critical accessibility features responsive, especially on portable AAC or eye-gaze setups.
4. Security Considerations
This specification defines a low-level API for neural network inference hardware acceleration. This API is considered a powerful feature [POWERFUL-FEATURES] because it grants low-level access to a user’s computer. To meet the authentication and confidentiality expectations of a powerful feature and to prevent man-in-the-middle attacks, all interfaces defined by this specification are only available in a secure context. This API is disabled by default in all cross-origin frames using the § 7.5 Permissions Policy Integration. This prevents third-party content from using this API unless the embedding page explicitly sets a policy that grants permission.
This API allows creation of an MLContext from a GPUDevice defined by WebGPU specification. See WebGPU Security Considerations for more information regarding security characteristics of this context.
This API provides an abstraction across GPU, CPU, and dedicated ML accelerator hardware. When using a GPU, denial of service considerations similar to WebGPU apply. When using a CPU or a dedicated ML accelerator, the types of potential resource contention are different and mitigations will be implementation and configuration dependent. Implementations should use whatever mechanisms are available from the platform to prevent sites from using an unfair amount of system resources. These compute units are shared resources, and the use of any compute API will affect overall performance on a fully-loaded system.
Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. The bounds checking occurs when the compute method is invoked that executes the graph against the actual data. No actual data is bound to the compiled graph before this stage. It is the implementation’s responsibility to make sure proper bounds checking occurs against the shapes of the data already inferred by that time.
Issue: Document operations susceptible to out-of-bounds access as a guidance to implementers.
Implementations must defend against control-flow attacks based on changes to data considered to be constant. For example, optimizations in the underlying platform may assume that a weight remains unchanged throughout a computation. If the API allowed the contents of buffers holding weights to change during a computation then those optimization assumptions would be invalidated, causing undefined behavior in the underlying platform. The API mitigates this category of attacks from script by always copying or transferring buffers, but implementations should consider additional defenses such as process isolation of data assumed to be constant.
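This example is non-normative. The effect of copying can be illustrated in plain JavaScript. The sketch below is not WebNN code. The hypothetical snapshotWeights() helper only shows why a private copy of a weight buffer is unaffected by later writes from script, which is the property the paragraph above relies on.

```javascript
// Hypothetical helper: returns a private copy of the caller's weight data.
// TypedArray.prototype.slice() allocates a new underlying buffer.
function snapshotWeights(weights) {
  return weights.slice();
}

const scriptWeights = new Float32Array([0.5, -1.0, 2.0]);
const frozen = snapshotWeights(scriptWeights);

// A later write from script changes only the script-side array.
scriptWeights[0] = 99;
console.log(frozen[0]); // 0.5
```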
As a future-proofing measure, the API design allows certain operations that can be generically emulated to be deprecated for security, performance, or other reasons without breaking compatibility. This is made possible by high-level functions that are defined in terms of smaller primitive operations defined in this specification. This enables a native implementation of a high-level function to be replaced with a polyfill implementation.
Issue: Investigate side channel attack feasibility considering the current state where CPU is shared between processes running renderers.
In order to not allow an attacker to target a specific implementation that may contain a flaw, the § 7.2 Device Selection mechanism is a hint only, and the concrete device selection is left to the implementation; a user agent could, for instance, choose never to run a model on a device with known vulnerabilities. As a further mitigation, no device enumeration mechanism is defined.
Issue: Hinting partially mitigates the concern. Investigate additional mitigations.
The API design minimizes the attack surface for the compiled computational graph. The MLGraphBuilder interface that hosts the various operations is a data definition API and as such doesn’t execute anything, only constructs data. What follows, is that the potential for an attack is limited to when binding the data to the graph before executing it by invoking the MLContext.dispatch() method. This enables implementers to focus on hardening the MLContext.dispatch() method. For example, by making sure it honors the boundary of data and fails appropriately when the bounds are not respected.
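This example is non-normative. One check of this kind is sketched below, assuming a hypothetical helper checkTensorBytes() that an implementation or a framework might run before binding data. It compares a buffer's byte length with the size implied by a tensor's data type and shape. The byte widths are those of the corresponding typed arrays, and only a subset of data types is listed.

```javascript
// Bytes per element for a few data types (illustrative subset).
const BYTES_PER_ELEMENT = new Map([
  ["float32", 4], ["int32", 4], ["uint32", 4], ["int8", 1], ["uint8", 1],
]);

// Hypothetical helper: throws unless `byteLength` matches `shape` exactly.
function checkTensorBytes(dataType, shape, byteLength) {
  const width = BYTES_PER_ELEMENT.get(dataType);
  if (width === undefined) {
    throw new TypeError(`Unsupported data type: ${dataType}`);
  }
  // The element count is the product of all dimensions (1 for a scalar).
  const elements = shape.reduce((count, dim) => count * dim, 1);
  const expected = elements * width;
  if (byteLength !== expected) {
    throw new RangeError(`Expected ${expected} bytes, got ${byteLength}`);
  }
  return expected;
}
```

Failing with an exception before any data is bound keeps a mismatched buffer from ever reaching the platform backend.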
Purpose-built Web APIs for measuring high-resolution time mitigate against timing attacks using techniques such as resolution reduction, adding jitter, detection of abuse and API call throttling [hr-time-3]. The practical deployment of WebNN implementations is likely to bring enough jitter to make timing attacks impractical (e.g. because they would use IPC), but implementers are advised to consider and test their implementations against timing attacks.
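This example is non-normative. Resolution reduction, the first technique named above, can be sketched in a few lines. The helper coarsenTimestamp() is hypothetical and does not describe how any particular user agent implements its clocks. It only shows that two nearby readings collapse to the same value once the clock is coarsened.

```javascript
// Hypothetical helper: rounds a timestamp down to a multiple of `resolutionMs`.
function coarsenTimestamp(timestampMs, resolutionMs) {
  return Math.floor(timestampMs / resolutionMs) * resolutionMs;
}

// Two readings 0.3 ms apart become indistinguishable at 1 ms resolution.
console.log(coarsenTimestamp(1234.2, 1)); // 1234
console.log(coarsenTimestamp(1234.5, 1)); // 1234
```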
Note: Security risks related to Unicode sequences are discussed in context of the label USVString definition.
4.1. Guidelines for new operations
This section is non-normative.
To ensure operations defined in this specification are shaped in a way they can be implemented securely, this section includes guidelines on how operations are expected to be defined to reduce potential for implementation problems. These guidelines are expected to evolve over time to align with industry best practices:
- Prefer simplicity of arguments
- Don’t use parsers for complex data formats
- If an operation can be decomposed to low level primitives:
  - Add an informative emulation path
  - Prefer primitives over new high level operations but consider performance consequences
- Follow a consistent style for operation inputs and attributes
- Share API shape and options for operation families such as pooling and reduction
- Formalize failure cases into test cases whenever possible
- When in doubt, leave it out: keep the API surface as small as possible to satisfy the use cases, but no smaller
- Try to keep the API free of implementation details that might inhibit future evolution, do not overspecify
- Fail fast: the sooner the web developer is informed of an issue, the better
In general, always consider the security and privacy implications as documented in [security-privacy-questionnaire] by the Technical Architecture Group and the Privacy Interest Group when adding new features.
5. Privacy Considerations
This API provides a privacy improvement over cloud-based inference alternatives by keeping sensitive user data within the browser’s sandbox. Input data such as images, audio, video streams, and other personal information never leave the user’s device, eliminating risks associated with data transmission to remote servers and third-party data processing.
However, as a powerful local compute API that interacts closely with hardware acceleration capabilities, the WebNN API has to balance performance optimization with privacy protection. The API includes multiple privacy-preserving measures to mitigate against fingerprinting while still enabling effective machine learning inference capabilities.
5.1. Fingerprinting
By design, this API aims to expose the minimum amount of information necessary to address the identified § 2 Use cases with the best performance and reliability of results. First, the API mitigates against fingerprinting through standardization: by defining consistent behavior across diverse platform APIs and by minimizing information leakage about the underlying hardware variation across conformant implementations. This is achieved through:
- § 7.3 Operators that are hardware-agnostic and minimize the exposure of low-level details of the underlying platform, in line with the principle of data minimization.
- § 8.2.1 MLContextOptions API that allows a web developer to indicate preference for execution speed and power consumption, but does not expose the actual device selected for execution, nor does it allow a web developer to enumerate or select specific devices. This hinting mechanism does not add to the entropy.
- § 8.3.7 opSupportLimits() API that allows a web developer to query support for specific operators using an explicit query API instead of inferring this information using a side channel. This API can contribute to fingerprintability, but its entropy can be reduced by limiting the number of distinguishable configurations exposed through this API using buckets as appropriate.
- Standardized data types and tensor operations that work consistently across platforms.
- Consistent error handling across different backend implementations.
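This example is non-normative. The bucketing mentioned for opSupportLimits() can be sketched as follows. The helper bucketLimit() and the bucket boundaries are hypothetical and are not taken from this specification. The point is only that rounding a reported limit down to a small set of values makes many distinct configurations report the same result.

```javascript
// Hypothetical helper: reports the largest bucket that does not exceed `actual`.
// `buckets` must be sorted in ascending order.
function bucketLimit(actual, buckets) {
  let reported = 0; // reported when `actual` is below the smallest bucket
  for (const bucket of buckets) {
    if (bucket <= actual) reported = bucket;
  }
  return reported;
}

const RANK_BUCKETS = [4, 5, 8]; // illustrative values only
console.log(bucketLimit(6, RANK_BUCKETS)); // 5
console.log(bucketLimit(7, RANK_BUCKETS)); // 5
```

Rounding down rather than up means a reported limit never exceeds what the underlying platform supports.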
The overall design ensures that implementations maintain a consistent interface across different platforms while providing the necessary functionality. By abstracting platform-specific details, the API can provide privacy-preserving predictable behavior regardless of whether the underlying acceleration is provided by CPU, GPU, or dedicated ML hardware.
Note: MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community.
MLGraph.devices API extension has been proposed to expose the actual devices selected for execution after the graph is fully constructed and compiled. Privacy implications of this API extension are under investigation. [Issue #836]
5.2. Execution Time Analysis
The timing characteristics of operations can provide some indirect information about the underlying hardware performance, a feature inherent to any compute API. In certain circumstances an execution time analysis can reveal indirectly the performance of the underlying platform’s neural network hardware acceleration capabilities relative to another underlying platform. See also § 4 Security Considerations for further discussion on timing attacks.
Note: The group welcomes further input on the proposed execution time analysis fingerprinting vector and mitigations.
5.3. WebGPU Comparison
Unlike WebGPU, this API does not intrinsically support custom shader authoring; and as a result is not prone to timing attacks that rely on shader caches, or other persistent data. The API builds upon pre-existing shaders and lower level primitives of the browser or the underlying OS. Web developers who interface with GPUDevice are expected to be aware of WebGPU compilation cache considerations.
The WebGPU API identifies machine-specific artifacts as a privacy consideration. Similarly, the WebNN API’s compute unit scheduling may under certain circumstances introduce a fingerprint. However, similarly to WebGPU, such fingerprints are identical across most or all of the devices of each vendor, mitigating the concern. Furthermore, software implementations can be used to further eliminate such artifacts.
In general, implementers of this API are expected to apply WebGPU Privacy Considerations to their implementations where applicable.
6. Ethical Considerations
The Working Group has started documenting ethical issues associated with using Machine Learning on the Web, to help identify what mitigations its normative specifications should take into account. The Working Group publishes and maintains an Ethical Principles for Web Machine Learning document [webmachinelearning-ethics] open to contributions from the wider community via a dedicated GitHub repository.
7. Programming Model
7.1. Overview
At the heart of neural networks is a computational graph of mathematical operations. These operations are the building blocks of modern machine learning technologies in computer vision, natural language processing, and robotics. The WebNN API is a specification for constructing, compiling, and executing computational graphs of neural networks.
The MLGraph interface represents a compiled computational graph that is immutable (that is, a model).
The MLGraphBuilder interface serves as a builder (factory) to construct a computational graph (its graph) that is then compiled to create an MLGraph.
In WebNN, a computational graph is composed of operators which act on data, and are the nodes of the graph. MLOperands are a representation of data that flows within the computational graph, and are the edges of the graph. MLOperands include a computational graph’s input values for inference, constants (including trained weights) used for inference, intermediate values (often referred to as activations) computed during inference, as well as the output values of inference. An operator’s input is one or more MLOperands. An operator’s output is one or more MLOperands. Operators have operator-specific parameters that control their behavior, which can include zero or more activation functions.
A key part of the MLGraphBuilder interface is its methods, such as gemm() and relu(), each of which creates an operator representing the actual operation to perform on the input data when the computation is run, and returns a new MLOperand holding the operator. Methods that create an MLOperand connect any inputs and activations to the operator. Each method invocation returns a distinct new value, without changing the value of any other MLOperand.
An operator has a label, a string which may be included in diagnostics such as exception messages. When an operator is created its label is initialized in an implementation-defined manner and may include the passed label.
Issue: Consider adding a mechanism for reporting errors during dispatch(). [Issue #778]
At inference time, every MLOperand will be bound to a tensor (the actual data). Tensors are essentially multidimensional arrays. The representation of a tensor is implementation dependent, but it typically includes the array data stored in some buffer (memory) and some metadata describing the array data (such as its shape).
Operations within the computational graph have functional semantics. This allows the implementation to potentially share the array data between multiple tensors. For example, the implementation of operations such as reshape, or slice may return a view of its input tensor that shares the same buffer as the input tensor. (In the case of reshape, the entire data is shared, while in the case of slice, a part of the input data is shared.) The implementation may use views, as above, for intermediate values.
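This example is non-normative. The buffer sharing described above can be illustrated with ordinary typed arrays. The tensor representation is implementation dependent, so the plain objects below (a shape plus a values view) are an illustrative assumption and not a WebNN data structure.

```javascript
// One underlying buffer holding six float32 values.
const data = new Float32Array([1, 2, 3, 4, 5, 6]);

// "reshape": same buffer, same elements, only the shape metadata differs.
const reshaped = { shape: [3, 2], values: data };

// "slice" as a view: subarray() shares the buffer and covers elements 2..3.
const sliced = { shape: [2], values: data.subarray(2, 4) };

// Both tensors alias the same memory, so no element was copied.
console.log(sliced.values.buffer === data.buffer); // true
console.log(sliced.values[0]);                     // 3
```

Sharing of this kind is safe because operations have functional semantics: no operation writes into its inputs, so a view can never observe a change made through another tensor.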
Before the execution, the computation graph that is used to compute one or more specified outputs needs to be converted, compiled, and optimized. The key purpose of the compilation step is to enable optimizations that span two or more operations, such as operation or loop fusion. The user agent may also perform these optimizations during graph conversion.
The MLGraphBuilder.build() method compiles the graph in the background without blocking the calling thread, and returns a Promise that resolves to an MLGraph. Each MLGraphBuilder can build at most one MLGraph.
The MLGraph underlying implementation will be composed of platform-specific representations of operators and operands which correspond to the MLGraphBuilder’s operators and MLOperands, but which are not script-visible and may be compositions or decompositions of the graph as constructed by script.
Once the MLGraph is constructed, the MLContext.dispatch() method performs the execution of the graph asynchronously either on a parallel timeline in a separate worker thread for the CPU execution or on a GPU timeline in a GPU command queue. This method returns immediately without blocking the calling thread while the actual execution is offloaded to a different timeline. The caller supplies the input values using MLNamedTensors, binding the input MLOperands to their values. The caller also supplies MLNamedTensors for output MLOperands which will contain the result of graph execution, if successful, which may be read back to script using the MLContext.readTensor(tensor) method. This type of execution supports CPU, GPU, and NPU devices.
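This example is non-normative. The steps in this overview (construct, compile, dispatch, read back) can be sketched as a single function that adds two 2x2 tensors. The function name addWithWebNN is hypothetical. Method names that do not appear in this section, namely createContext(), input(), add(), createTensor() and writeTensor(), together with the descriptor members dataType, shape, readable and writable, are used on the assumption that they match the API section of this specification. The function returns null where WebNN is not available.

```javascript
// Sketch: computes c = a + b for two 2x2 float32 tensors.
// `ml` is expected to be navigator.ml; returns null where WebNN is unavailable.
async function addWithWebNN(ml, aData, bData) {
  if (!ml || typeof MLGraphBuilder === "undefined") return null;

  const context = await ml.createContext();
  const desc = { dataType: "float32", shape: [2, 2] };

  // Describe the graph: operands are edges, add() creates an operator node.
  const builder = new MLGraphBuilder(context);
  const a = builder.input("a", desc);
  const b = builder.input("b", desc);
  const c = builder.add(a, b);

  // Compile. Each MLGraphBuilder can build at most one MLGraph.
  const graph = await builder.build({ c });

  // Allocate tensors and upload the input values.
  const [aTensor, bTensor, cTensor] = await Promise.all([
    context.createTensor({ ...desc, writable: true }),
    context.createTensor({ ...desc, writable: true }),
    context.createTensor({ ...desc, readable: true }),
  ]);
  context.writeTensor(aTensor, aData);
  context.writeTensor(bTensor, bData);

  // Execute without blocking, then read the result back to script.
  context.dispatch(graph, { a: aTensor, b: bTensor }, { c: cTensor });
  return new Float32Array(await context.readTensor(cTensor));
}
```

A call site in a page might look like addWithWebNN(navigator.ml, new Float32Array([1, 2, 3, 4]), new Float32Array([10, 20, 30, 40])). Only the final readTensor() call waits for the execution to finish.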
7.2. Device Selection
An MLContext interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. In addition to the default method of creation with MLContextOptions, an MLContext could also be created from a specific GPUDevice that is already in use by the application.
In a situation when a GPU context executes a graph with a constant or an input in the system memory as an ArrayBufferView, the input content is automatically uploaded from the system memory to the GPU memory, and the result is downloaded back to the system memory of an ArrayBufferView output buffer at the end of the graph execution. These data upload and download cycles occur only when the execution device requires the data to be copied out of and back into the system memory, such as in the case of the GPU. They do not occur when the device is a CPU device. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller’s perspective.
When an MLContext is created with MLContextOptions, the user agent selects and creates the underlying execution device by taking into account these options.
Depending on the underlying platform, the user agent may select different combinations of CPU, NPU and GPU devices.
For a history and rationale of this design, please see the device selection explainer.
7.3. Operators
This section is non-normative.
The WebNN API defines a set of operators required by well-known CNN, RNN, transformer, and generative models that address key § 2.1 Application Use Cases. The details of each operator are defined in the normative sections of this specification, in alphabetical order by the operator name. These operators are grouped into categories based on their functionality in the following non-normative table to give a functional overview of the API surface.
Note: Some operators belong to multiple categories. For example, clamp() is both a math function and also used as an activation.
7.4. Task Source
The ML task source is a task source to be used for all tasks related to asynchronous compilation and execution of MLGraphs and creation of MLContexts.
To queue an ML task given a global object global and a series of steps steps, queue a global task on the ML task source with global and steps.
7.5. Permissions Policy Integration
This specification defines a policy-controlled feature identified by the string "webnn".
Its default allowlist is 'self'.
8. API
8.1. The navigator.ml interface
An ML object is available in the Window and WorkerGlobalScope contexts through the Navigator and WorkerNavigator interfaces respectively and is exposed via navigator.ml.
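interface mixin NavigatorML {
  [SecureContext, SameObject] readonly attribute ML ml;
};
Navigator includes NavigatorML;
WorkerNavigator includes NavigatorML;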
8.2. ML interface
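enum MLPowerPreference {
  "default",
  "high-performance",
  "low-power"
};
dictionary MLContextOptions {
  MLPowerPreference powerPreference = "default";
  boolean accelerated = true;
};
[SecureContext, Exposed=(Window, Worker)]
interface ML {
  Promise<MLContext> createContext(optional MLContextOptions options = {});
  Promise<MLContext> createContext(GPUDevice gpuDevice);
};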
8.2.1. MLContextOptions
Note: MLContextOptions is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. The Working Group is considering additional API controls to allow the definition of a fallback device, multiple devices in a preferred order, or an exclusion of a specific device. Other considerations under discussion include error handling, ultimate fallback, and quantized operators. Feedback is welcome on any of these design considerations from web developers, library authors, OS and hardware vendors, and other stakeholders via GitHub. See § 5 Privacy Considerations for additional discussion of fingerprinting considerations.
The powerPreference option is an MLPowerPreference and indicates the application’s preference as related to power consumption. It is one of the following:
- "default" - Let the user agent select the most suitable behavior.
- "high-performance" - Prioritizes execution speed over power consumption.
- "low-power" - Prioritizes power consumption over other considerations such as execution speed.
The accelerated option indicates the application’s preference as related to massively parallel acceleration. This option has less priority than powerPreference. When set to true (by default), the underlying platform will attempt to use the available massively parallel accelerators, such as a GPU or NPU, also depending on the powerPreference. When set to false, the application indicates it prefers CPU inference. If there is contradictory input, for instance when powerPreference is "high-performance" and accelerated is false, then the implementation will choose the best available match in the underlying platform (for instance a high performance CPU mode, or will ignore accelerated as it has less priority than powerPreference).
8.2.2. createContext()
Arguments:
- options: an MLContextOptions. Provides the application’s preferences for the context.
- gpuDevice: a GPUDevice. A specific device to use with the context.
Returns: an MLContext.
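As a minimal sketch of how an application might request a context, the following assumes a user agent that exposes navigator.ml. The helper name createPreferredContext, the default options, and the fallback to null are illustrative choices and not part of the API.

```javascript
// Hypothetical helper: requests an MLContext, or returns null when WebNN
// is unavailable or context creation fails.
async function createPreferredContext(options = { powerPreference: "low-power" }) {
  // Feature-detect: navigator.ml only exists where WebNN is implemented.
  if (typeof navigator === "undefined" || !("ml" in navigator)) {
    return null;
  }
  try {
    // The user agent selects the execution device based on the options.
    return await navigator.ml.createContext(options);
  } catch (error) {
    // Creation may be rejected, for example when no suitable device exists.
    console.warn("WebNN context creation failed:", error);
    return null;
  }
}
```

A caller would typically fall back to another inference path (such as a WebAssembly backend) when the helper returns null.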
8.3. MLContext interface
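The MLContext interface represents a global state of neural network compute workload and execution processes. Each MLContext object has an associated context type and MLPowerPreference.
typedef record<USVString, MLTensor> MLNamedTensors;
dictionary MLContextLostInfo {
  DOMString message;
};
[SecureContext, Exposed=(Window, Worker)]
interface MLContext {
  undefined dispatch(MLGraph graph, MLNamedTensors inputs, MLNamedTensors outputs);
  Promise<MLTensor> createTensor(MLTensorDescriptor descriptor);
  Promise<MLTensor> createConstantTensor(MLOperandDescriptor descriptor, AllowSharedBufferSource inputData);
  Promise<ArrayBuffer> readTensor(MLTensor tensor);
  Promise<undefined> readTensor(MLTensor tensor, AllowSharedBufferSource outputData);
  undefined writeTensor(MLTensor tensor, AllowSharedBufferSource inputData);
  MLOpSupportLimits opSupportLimits();
  undefined destroy();
  readonly attribute boolean accelerated;
  readonly attribute Promise<MLContextLostInfo> lost;
};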
MLContext has the following internal slots:
- [[contextType]] of type context type: The MLContext’s context type.
- [[powerPreference]] of type MLPowerPreference: The MLContext’s MLPowerPreference.
- [[accelerated]] of type boolean: The MLContext’s processing type (CPU or massively parallel processing).
- [[lost]] of type Promise<MLContextLostInfo>: A Promise that is resolved when the MLContext’s underlying execution device is no longer available.
- [[timeline]]: A timeline associated with the execution of operations on the compute units of the MLContext. These operations include inferencing on computational graphs and modifying the [[data]] of MLTensors.
More rigorously define this timeline. [Issue #529]
The context type is the type of the execution context that manages the resources and facilitates the compilation and execution of the neural network graph:
- "default"
- Context created per user preference options.
- "webgpu"
- Context created from WebGPU device.
8.3.1. dispatch()
Schedules the computational workload of a compiled MLGraph on the MLContext’s [[timeline]].
Arguments:
- graph: an MLGraph. The computational graph to be executed.
- inputs: an MLNamedTensors. The inputs to the computational graph.
- outputs: an MLNamedTensors. The outputs of the computational graph.
Returns: undefined.
Note: dispatch() itself provides no signal that graph execution has completed. Rather, callers can await the results of reading back the output tensors. See § 8.3.1.1 Examples below.
When a constant operand is created using a tensor, it is legal for that tensor to be destroyed after build completes. Implementations are expected to ensure that the compiled graph remains valid and unaffected by such destruction.
8.3.1.1. Examples
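The following is a hedged sketch of a complete dispatch round trip, assuming an MLContext has already been created. The function name runAddExample and the tensor names "A", "B" and "C" are illustrative. The add() builder method is WebNN’s element-wise addition operator, which is defined outside this section.

```javascript
// Sketch: compute C = A + B on 2x2 float32 tensors and read the result back.
async function runAddExample(context) {
  const descriptor = { dataType: "float32", shape: [2, 2] };

  // Build and compile a graph with two inputs and one output.
  const builder = new MLGraphBuilder(context);
  const a = builder.input("A", descriptor);
  const b = builder.input("B", descriptor);
  const graph = await builder.build({ C: builder.add(a, b) });

  // Inputs are written from script, so they are writable; the output is readable.
  const tensorA = await context.createTensor({ ...descriptor, writable: true });
  const tensorB = await context.createTensor({ ...descriptor, writable: true });
  const tensorC = await context.createTensor({ ...descriptor, readable: true });

  context.writeTensor(tensorA, new Float32Array([1, 2, 3, 4]));
  context.writeTensor(tensorB, new Float32Array([10, 20, 30, 40]));

  // dispatch() returns undefined; completion is observed through readTensor().
  context.dispatch(graph, { A: tensorA, B: tensorB }, { C: tensorC });
  const result = new Float32Array(await context.readTensor(tensorC));

  // Release the tensors once the result has been copied out.
  for (const tensor of [tensorA, tensorB, tensorC]) tensor.destroy();
  return result;
}
```

Because writes, dispatches and reads are all ordered on the context’s timeline, the sketch does not need to wait between writeTensor() and dispatch().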
8.3.2. createTensor()
Creates an MLTensor associated with this MLContext.
Arguments:
- descriptor: an MLTensorDescriptor.
8.3.3. createConstantTensor()
Creates a constant MLTensor associated with this MLContext.
Arguments:
- descriptor: an MLOperandDescriptor.
- inputData: an AllowSharedBufferSource. The buffer whose bytes will be written into the tensor.
8.3.4. readTensor(tensor)
Reads back the [[data]] of an MLTensor from the MLContext.[[timeline]] to script.
Arguments:
- tensor: an MLTensor. The tensor to be read.
Returns: Promise<ArrayBuffer>. A buffer containing the result of the read.
8.3.5. readTensor(tensor, outputData)
Bring-your-own-buffer variant of readTensor(tensor). Reads back the [[data]] of an MLTensor into the provided buffer.
Arguments:
- tensor: an MLTensor. The tensor to be read.
- outputData: an AllowSharedBufferSource. The buffer to read the result into.
8.3.6. writeTensor()
Writes data to the [[data]] of an MLTensor on the MLContext’s [[timeline]].
Arguments:
- tensor: an MLTensor. The tensor to be written to.
- inputData: an AllowSharedBufferSource. The buffer whose bytes will be written into the tensor.
Returns: undefined.
Note: Similar to dispatch(), writeTensor() itself provides no signal that the write has completed. To inspect the contents of a tensor, callers can await the results of reading back the tensor.
8.3.7. opSupportLimits()
The opSupportLimits() method exposes the level of support for each operator, which differs across implementations. Consumers of the WebNN API are encouraged to probe the feature support level by using opSupportLimits() to determine the optimal model architecture to be deployed for each target platform.
Note: The opSupportLimits() API is not intended to provide additional entropy for browser fingerprinting. In current implementations this feature support information can be inferred from the OS and browser version alone. If the diversity of future implementations warrants it, this API allows future implementations to add new privacy mitigations e.g. to bucket capabilities similar to WebGPU to reduce entropy.
See § 5 Privacy Considerations for additional discussion of fingerprinting considerations.
8.3.7.1. MLOpSupportLimits dictionary
MLOpSupportLimits has the following top-level members. Aside from these, each operator has a corresponding member defined in its builder method.
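dictionary MLOpSupportLimits {
  MLInputOperandLayout preferredInputLayout;
  [EnforceRange] unsigned long long maxTensorByteLength;
  MLTensorLimits input;
  MLTensorLimits constant;
  MLTensorLimits output;
};
- preferredInputLayout, of type MLInputOperandLayout: Preferred input layout for layout dependent operators like conv2d().
- maxTensorByteLength, of type unsigned long long: The maximum supported length of tensors, in bytes.
- input, of type MLTensorLimits
- constant, of type MLTensorLimits
- output, of type MLTensorLimits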
8.3.7.2. MLRankRange dictionary
dictionary MLRankRange {
  unsigned long min;
  unsigned long max;
};
- min, of type unsigned long: Minimum supported rank.
- max, of type unsigned long: Maximum supported rank.
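8.3.7.3. MLTensorLimits dictionary
typedef sequence<MLOperandDataType> MLDataTypeList;
dictionary MLTensorLimits {
  MLDataTypeList dataTypes;
  MLRankRange rankRange;
};
- dataTypes, of type MLDataTypeList: Supported data types.
- rankRange, of type MLRankRange: Minimum and maximum supported ranks.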
8.3.7.4. MLBinarySupportLimits dictionary
dictionary MLBinarySupportLimits {
  MLTensorLimits a;
  MLTensorLimits b;
  MLTensorLimits output;
};
- a, of type MLTensorLimits: MLTensorLimits for a operand.
- b, of type MLTensorLimits: MLTensorLimits for b operand.
- output, of type MLTensorLimits: MLTensorLimits for output operand.
8.3.7.5. MLSingleInputSupportLimits dictionary
dictionary MLSingleInputSupportLimits {
  MLTensorLimits input;
  MLTensorLimits output;
};
- input, of type MLTensorLimits: MLTensorLimits for input operand.
- output, of type MLTensorLimits: MLTensorLimits for output operand.
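To illustrate how the dictionaries above might be consumed, here is a small sketch that checks one operand of one operator against an MLOpSupportLimits value. The helper name operandSupports is hypothetical, and treating a missing rankRange as "no rank restriction reported" is an assumption of this sketch.

```javascript
// Hypothetical helper: returns true when `limits` reports that the named
// operand of the named operator accepts the given data type and rank.
function operandSupports(limits, opName, operandName, dataType, rank) {
  const operandLimits = limits?.[opName]?.[operandName];
  if (!operandLimits) return false; // Operator or operand is not reported.
  const { dataTypes = [], rankRange } = operandLimits;
  if (!dataTypes.includes(dataType)) return false;
  if (!rankRange) return true; // Assumption: no rank information means no limit.
  return rank >= rankRange.min && rank <= rankRange.max;
}
```

With a real context, a call could look like operandSupports(context.opSupportLimits(), "clamp", "input", "float16", 4), and the application could select a float32 variant of its model when the result is false.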
8.3.8. destroy()
The destroy() method can be called to release all resources associated with the context. Any outstanding compute requests and MLTensor creation/read/write requests will fail.
8.3.9. Errors
When a user agent determines that an MLContext is no longer available to fulfill requests, it must run the context lost steps for it.
MLContextLostInfo has the following member:
- message, of type DOMString: An implementation-defined message providing information about the error that occurred.
An MLContext is lost if its [[lost]] Promise is settled.
8.4. MLGraph interface
The MLGraph interface represents a compiled computational graph. A compiled graph once constructed is immutable and cannot be subsequently changed.
[SecureContext, Exposed=(Window, Worker)]
interface MLGraph {
  undefined destroy();
};
MLGraph has the following internal slots:
- [[context]] of type MLContext
- [[inputDescriptors]] of type record<USVString, MLOperandDescriptor>: Maps the name of an input MLOperand to its MLOperandDescriptor for all input MLOperands of this MLGraph.
- [[outputDescriptors]] of type record<USVString, MLOperandDescriptor>: Maps the name of an output MLOperand to its MLOperandDescriptor for all output MLOperands of this MLGraph.
- [[implementation]]: The underlying implementation provided by the User Agent.
- [[isDestroyed]] of type boolean: Whether the MLGraph.destroy() method steps have been run. Once destroyed, the MLGraph can no longer be used.
8.4.1. destroy()
The destroy() method can be called to release all resources associated with the graph.
Note: Since no further workloads can be enqueued using this graph, implementations can free any additional resource allocations associated with this graph once all previously submitted workloads using it are complete.
8.5. MLOperandDescriptor dictionary
An MLOperandDescriptor describes the shape (dimensions) and data type of an operand. Descriptors are used to describe the inputs and constants for an MLGraph, and every MLOperand has an internal MLOperandDescriptor.
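enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};
enum MLOperandDataType {
  "float32",
  "float16",
  "int32",
  "uint32",
  "int64",
  "uint64",
  "int8",
  "uint8"
};
dictionary MLOperandDescriptor {
  required MLOperandDataType dataType;
  required sequence<[EnforceRange] unsigned long> shape;
};
- dataType, of type MLOperandDataType: The operand data type.
- shape, of type sequence<[EnforceRange] unsigned long>: The list of dimensions of the operand. It is empty for scalar operands.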
MLOperandDescriptor A is equal to an MLOperandDescriptor B if A.dataType equals B.dataType and A.shape equals B.shape.
A valid dimension is an integer greater than zero and in the range of long. Implementations may impose a smaller upper bound.
A valid tensor count is an integer greater than zero and less than or equal to 8192. Implementations may impose a smaller upper bound.
Should 0-size dimensions be supported? [Issue #391]
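The definitions in this section can be sketched as two small predicates. The function names are illustrative, and the upper bound assumes the Web IDL long type, whose maximum value is 2^31 - 1.

```javascript
// Upper bound of the Web IDL `long` type (assumption stated in the lead-in).
const LONG_MAX = 2 ** 31 - 1;

// A valid dimension is an integer greater than zero and in the range of long.
function isValidDimension(value) {
  return Number.isInteger(value) && value > 0 && value <= LONG_MAX;
}

// Two descriptors are equal when their data types and shapes are equal.
function descriptorsEqual(a, b) {
  return (
    a.dataType === b.dataType &&
    a.shape.length === b.shape.length &&
    a.shape.every((dimension, index) => dimension === b.shape[index])
  );
}
```

Note that a scalar descriptor has an empty shape, so two scalar descriptors compare equal whenever their data types match.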
8.6. MLOperand interface
An MLOperand represents an intermediary graph being constructed as a result of compositing parts of an operation into a fully composed operation.
For instance, an MLOperand can represent a constant feeding to an operation or the result from combining multiple constants together into an operation. See also § 7 Programming Model.
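[SecureContext, Exposed=(Window, Worker)]
interface MLOperand {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
};
dictionary MLOperatorOptions {
  USVString label = "";
};
typedef (bigint or unrestricted double) MLNumber;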
MLOperand has the following internal slots:
- [[builder]] of type MLGraphBuilder: The MLOperand’s associated builder object.
- [[descriptor]] of type MLOperandDescriptor: The MLOperand’s descriptor.
- [[name]] of type string: The MLOperand’s name (only for input operands).
- [[operator]] of type operator
- [[constantTensor]] of type MLTensor: The MLOperand’s tensor (only for constant operands).
An MLOperand’s dataType is its [[descriptor]].dataType.
An MLOperand’s shape is its [[descriptor]].shape.
An MLOperand’s rank is its shape’s size.
The dataType getter steps are to return this’s dataType.
The shape getter steps are to return this’s shape.
Since the [[builder]] object is bound by the MLGraphBuilder() constructor to an MLContext object, an MLOperand is also always bound to the same MLContext object.
If an operation supports only a subset of MLOperandDataTypes, the allowed data types for each of the operation’s input operands, including both positional arguments and options, are given as either an explicit list of MLOperandDataTypes, or a constraint that the operand’s dataType must be the same as the dataType of another input operand, or any to allow any MLOperandDataType.
Implementations may support fewer data types for operands than specified. This can be queried for each operation using the opSupportLimits() method on MLContext and inspecting the dataTypes value of the corresponding member for the operation.
Should we specify the subset of data types that must be supported for each operator?
If an operation requires input operands with a particular rank, the allowed ranks for each of the operation’s input operands, including both positional arguments and options, are given as an explicit rank (e.g. 1), or N to allow any dimensionality, or the same as another operand. More specific constraints are common, such as when an input operand’s shape must be unidirectionally broadcastable to or bidirectionally broadcastable with another input operand; in these cases, the allowed ranks are listed as a range, with specific validation given as steps in the operation.
Implementations may impose a more restricted lower bound and/or upper bound on the rank of operands than specified. This can be queried for each operation using the opSupportLimits() method on MLContext and inspecting the rankRange.min and rankRange.max values of the corresponding member for the operation.
MLOperatorOptions has the following members:
- label, of type USVString, defaulting to "": Optionally provided when an operator is created using MLGraphBuilder methods that create MLOperands. The implementation may use this value to initialize the operator’s label.
Note: The label is not intended to be a natural language string. It is a language-independent identifier, analogous to a variable name or error code, like "mul#1234".
Note: Implementations are encouraged to use the label provided by developers to enhance error messages and improve debuggability, both for synchronous errors during graph construction and for errors that occur during the asynchronous build() method.
When displaying labels provided by developers via label in debugging tools, logs, or error messages, implementations should sanitize the output to prevent security risks, such as injection of malicious Unicode sequences (e.g. Bidirectional Text Spoofing [UTR36], Source Code Spoofing [UTS55] and other concerns). For example, implementations should escape or filter control characters (e.g., U+202A to U+202E, U+2066 to U+2069) or use a safe rendering mechanism to neutralize potential spoofing.
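One possible way to neutralize such characters is to escape them before display. The sketch below is only an illustration: the function name sanitizeLabel is hypothetical, and the choice to also escape C0 and C1 control characters goes beyond the ranges named above.

```javascript
// Escapes bidirectional control characters (U+202A to U+202E, U+2066 to
// U+2069) and C0/C1 control characters as visible \uXXXX sequences.
function sanitizeLabel(label) {
  return label.replace(
    /[\u0000-\u001F\u007F-\u009F\u202A-\u202E\u2066-\u2069]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}
```

Ordinary labels such as "mul#1234" pass through unchanged, while a right-to-left override embedded in a label is rendered as the literal text \u202E instead of reordering the surrounding output.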
8.6.1. Creating an MLOperand
The MLOperand objects are created by the methods of MLGraphBuilder, internally using the following algorithms.
To validate operand given MLGraphBuilder builder and MLOperand operand, return true if operand.[[builder]] is builder, and false otherwise.
8.6.1.1. MLNumber
MLNumber is used when specifying the type of a numeric option for an MLOperand which can be of any MLOperandDataType, including both 64-bit integer types ("uint64" and "int64") and 32-bit floating point ("float32"). Implementations process the value according to the corresponding MLOperandDataType. For example, if clamp(input, options) is called with an MLOperand with dataType "uint32", the MLNumber parameters are explicitly cast to unsigned long.
double would lose accuracy when passing values over 2^53, and specifying long long would disallow values over 2^63.
Support for unions of bigint and numeric types is new in [WEBIDL], and implementation support is also limited. Prototype implementations are encouraged to provide feedback for this approach. [whatwg/webidl Issue #1388]
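The precision concern behind the bigint-or-double union can be seen directly in JavaScript. The variable names below are illustrative only.

```javascript
// A double cannot represent every integer above 2^53: adding 1 is lost.
const asDouble = 2 ** 53 + 1;
const losesPrecision = asDouble === 2 ** 53;

// A bigint keeps the exact value.
const asBigInt = 2n ** 53n + 1n;

// The largest "uint64" value does not fit in a signed 64-bit long long.
const maxUint64 = 2n ** 64n - 1n;
const exceedsLongLong = maxUint64 > 2n ** 63n - 1n;
```

This is why large "int64" and "uint64" option values are best passed as bigint, while fractional values are passed as ordinary numbers.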
8.7. MLTensorDescriptor dictionary
An MLTensorDescriptor describes the characteristics and capabilities of an MLTensor.
dictionary MLTensorDescriptor : MLOperandDescriptor {
  boolean readable = false;
  boolean writable = false;
};
- readable, of type boolean, defaulting to false: Whether the tensor’s contents can be read via readTensor(tensor) or readTensor(tensor, outputData).
- writable, of type boolean, defaulting to false: Whether the tensor’s contents can be written to via writeTensor().
8.8. MLTensor interface
The MLTensor interface represents a tensor which may be used as an input or output to an MLGraph. The memory backing an MLTensor should be allocated in an implementation-defined fashion according to the requirements of the MLContext and the MLTensorDescriptor used to create it. Operations involving the [[data]] of an MLTensor occur on the [[timeline]] of its associated MLContext.
The implementation-defined requirements of how an MLTensor is allocated may include constraints such as that the memory is allocated with a particular byte alignment or in a particular memory pool.
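[SecureContext, Exposed=(Window, Worker)]
interface MLTensor {
  readonly attribute MLOperandDataType dataType;
  readonly attribute FrozenArray<unsigned long> shape;
  readonly attribute boolean readable;
  readonly attribute boolean writable;
  readonly attribute boolean constant;
  undefined destroy();
};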
MLTensor has the following internal slots:
- [[context]] of type MLContext: The MLTensor’s associated context.
- [[descriptor]] of type MLTensorDescriptor: The MLTensor’s descriptor.
- [[pendingPromises]] of type set of Promises: Promises corresponding to MLContext.readTensor(tensor) method calls which are in-progress and have yet to resolve. All pending promises will be rejected when the MLTensor is destroyed.
- [[isDestroyed]] of type boolean: Whether the MLTensor.destroy() steps have been run. Once destroyed, the MLTensor can no longer be used.
- [[data]] of an implementation-defined type: The bytes backing the MLTensor. This data may only be accessed or modified from the [[context]].[[timeline]].
- [[isConstant]] of type boolean: Whether the MLTensor was created by the create a constant MLTensor steps.
An MLTensor’s dataType is its [[descriptor]]’s dataType.
An MLTensor’s shape is its [[descriptor]]’s shape.
The dataType getter steps are to return this’s dataType.
The shape getter steps are to return this’s shape.
The readable getter steps are to return this.[[descriptor]].readable.
The writable getter steps are to return this.[[descriptor]].writable.
The constant getter steps are to return this’s [[isConstant]].
8.8.1. Creating an MLTensor
An MLTensor is created by its associated MLContext.
8.8.2. destroy()
Releases the resources associated with the MLTensor. This method is idempotent.
Returns: undefined.
Note: Since no further operations can be enqueued using this tensor, implementations can free any additional resource allocations associated with this tensor once all previously submitted operations using it are complete.
8.8.3. Creating a constant MLTensor
A constant MLTensor is created by its associated MLContext.
8.9. MLGraphBuilder interface
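The MLGraphBuilder interface defines a set of operations as identified by the § 2 Use cases that can be composed into a computational graph. It also represents the intermediate state of a graph building session.
typedef record<USVString, MLOperand> MLNamedOperands;
[SecureContext, Exposed=(Window, Worker)]
interface MLGraphBuilder {
  // Construct the graph builder from the context.
  constructor(MLContext context);
  // Create an operand for a graph input.
  MLOperand input(USVString name, MLOperandDescriptor descriptor);
  // Create an operand for a graph constant.
  MLOperand constant(MLOperandDescriptor descriptor, AllowSharedBufferSource buffer);
  // Create a scalar operand from the specified number of the specified type.
  MLOperand constant(MLOperandDataType dataType, MLNumber value);
  // Create an operand from a specified constant tensor.
  MLOperand constant(MLTensor tensor);
  // Compile the graph up to the specified output operands asynchronously.
  Promise<MLGraph> build(MLNamedOperands outputs);
};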
The MLGraphBuilder.build() method compiles the graph builder state up to the specified output operands into a compiled graph according to the type of MLContext that creates it. When the [[contextType]] of the MLContext is set to "default", the compiled graph is initialized right before the MLGraph is returned. This graph initialization stage is important for optimal performance of the subsequent graph executions. It typically involves a process known as "weight preprocessing" where all the constant inputs to the graph are preprocessed and cached at the operating system level for subsequent graph execution calls. The initializing inputs are typically the constant weight data specified through the constant() method as constant operands during graph construction time.
MLGraphBuilder has the following internal slots:
- [[context]] of type MLContext: The context of type MLContext associated with this MLGraphBuilder.
- [[hasBuilt]] of type boolean: Whether MLGraphBuilder.build() has been called. Once built, the MLGraphBuilder can no longer create operators or compile MLGraphs.
8.9.1. MLGraphBuilder constructor
Arguments:
- context: an MLContext. The context to associate with the MLGraphBuilder.
8.9.2. input operands
Create a named MLOperand based on a descriptor, that can be used as an input.
Arguments:
- name: a string name of the input.
- descriptor: an MLOperandDescriptor object.
Returns: an MLOperand.
The MLGraphBuilder API allows creating an MLGraph without input operands. If the underlying platform doesn’t support that, implementations can add a stub input, or pass constants as inputs to the graph.
8.9.3. constant operands
Create a constant MLOperand that can be used in MLGraphBuilder methods.
8.9.3.1. constant(descriptor, buffer)
Create a constant MLOperand of the specified data type and shape that contains the initializing data.
Arguments:
- descriptor: an MLOperandDescriptor. The descriptor of the output tensor.
- buffer: an AllowSharedBufferSource. The buffer containing the initializing data.
Returns: an MLOperand. The constant output tensor.
8.9.3.2. constant(tensor)
Create a constant MLOperand of the specified data type and shape that contains the initialized data.
Arguments:
- tensor: an MLTensor. The constant tensor containing the initialized data.
Returns: an MLOperand. The constant output tensor.
8.9.3.3. constant(dataType, value)
Create a scalar constant MLOperand of the specified value and data type.
"int8" data type, etc.
Arguments:
- dataType: an MLOperandDataType.
- value: an MLNumber. The value of the constant.
Returns: an MLOperand. The constant output.
8.9.4. build method
Build a composed graph up to a given output operand into a computational graph asynchronously.
Arguments:
- outputs: an MLNamedOperands. Identifies the MLOperands that will be the outputs of the graph.
Returns: a Promise<MLGraph>.
NOTE: Specifying an input operand or constant operand as a graph output results in an error, as this is usually an incorrect usage of the API. Callers can work around this by introducing an identity() operator.
8.9.5. argMin/argMax operations
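Return the index location of the minimum or maximum values of all the input values along the axis. In case of ties, the identity of the return value is implementation dependent.
dictionary MLArgMinMaxOptions : MLOperatorOptions {
  boolean keepDimensions = false;
  MLOperandDataType outputDataType = "int32";
};
partial interface MLGraphBuilder {
  MLOperand argMin(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
  MLOperand argMax(MLOperand input, [EnforceRange] unsigned long axis,
                   optional MLArgMinMaxOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits argMin;
  MLSingleInputSupportLimits argMax;
};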
MLArgMinMaxOptions has the following members:
- keepDimensions, of type boolean, defaulting to false: If true, retains reduced dimensions with size 1.
- outputDataType, of type MLOperandDataType, defaulting to "int32": An MLOperandDataType. The output data type.
Arguments:
- input: an MLOperand. The input N-D tensor.
- axis: The dimension to reduce. The value must be in the range [0, N-1] where N is the rank of the input tensor.
- options: an optional MLArgMinMaxOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to input’s rank if keepDimensions is true or the input’s rank - 1 if keepDimensions is false. The values must be of type outputDataType in the range [0, N-1] where N is the size of the input dimension specified by axis.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| output | "int32", "int64" | N |
MLOpSupportLimits has the following members for argMin() and argMax():
- argMin, of type MLSingleInputSupportLimits: Support limits for operator argMin().
- argMax, of type MLSingleInputSupportLimits: Support limits for operator argMax().
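The behavior described for argMin() and argMax() can be sketched in plain JavaScript over a flat, row-major array. This is a reference-style illustration and not how an implementation is expected to work. The function name argMinMax and the pickMax flag are hypothetical, and returning the first index on ties is an assumption, since tie-breaking is implementation dependent.

```javascript
// Sketch: index of the minimum (or maximum) along `axis` of a row-major tensor.
function argMinMax(data, shape, axis, { keepDimensions = false, pickMax = false } = {}) {
  const axisSize = shape[axis];
  // Number of elements spanned by the dimensions after `axis`.
  const inner = shape.slice(axis + 1).reduce((p, d) => p * d, 1);
  // Number of slices formed by the dimensions before `axis`.
  const outer = shape.slice(0, axis).reduce((p, d) => p * d, 1);

  const indices = [];
  for (let o = 0; o < outer; o++) {
    for (let i = 0; i < inner; i++) {
      let best = 0;
      for (let k = 1; k < axisSize; k++) {
        const candidate = data[(o * axisSize + k) * inner + i];
        const current = data[(o * axisSize + best) * inner + i];
        // Strict comparison keeps the first index on ties (an assumption).
        if (pickMax ? candidate > current : candidate < current) best = k;
      }
      indices.push(best);
    }
  }
  // The reduced dimension is kept with size 1 or dropped entirely.
  const outputShape = keepDimensions
    ? shape.map((d, n) => (n === axis ? 1 : d))
    : shape.filter((_, n) => n !== axis);
  return { indices, shape: outputShape };
}
```

For a [2, 3] input, reducing along axis 1 yields one index per row, and each index lies in the range [0, 2] as required for the output values.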
8.9.6. batchNormalization
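Normalize the values of the input tensor using [Batch-Normalization]. For each input feature, the mean and variance values of that feature are computed across all the samples in the batch dimension while the model is trained. These mean and variance values are then subsequently given to this operation during model inference.
dictionary MLBatchNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  [EnforceRange] unsigned long axis = 1;
  double epsilon = 1e-5;
};
partial interface MLGraphBuilder {
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance,
                               optional MLBatchNormalizationOptions options = {});
};
dictionary MLBatchNormalizationSupportLimits {
  MLTensorLimits input;
  MLTensorLimits mean;
  MLTensorLimits variance;
  MLTensorLimits scale;
  MLTensorLimits bias;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLBatchNormalizationSupportLimits batchNormalization;
};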
MLBatchNormalizationOptions has the following members:
- scale, of type MLOperand: The 1-D tensor of the scaling values whose size is equal to the size of the input dimension denoted by axis.
- bias, of type MLOperand: The 1-D tensor of the bias values whose size is equal to the size of the input dimension denoted by axis.
- axis, of type unsigned long, defaulting to 1: The index to the feature count dimension of the input shape for which the mean and variance values are computed. Its value must be in the range [0, N-1] where N is the rank of the input tensor. The default value is 1, corresponding to the channel ("c") dimension in the "nchw" data layout.
- epsilon, of type double, defaulting to 1e-5: A small value to prevent computational error due to divide-by-zero.
Arguments:
- input: an MLOperand. The input N-D tensor.
- mean: an MLOperand. Specifies the 1-D tensor of the mean values of the input features across the batch. Its size is equal to the size of the input dimension denoted by axis.
- variance: an MLOperand. The 1-D tensor of the variance values of the input features across the batch whose size is equal to the size of the input dimension denoted by axis.
- options: an optional MLBatchNormalizationOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The batch-normalized N-D tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 1 to N |
| mean | same as input | 1 |
| variance | same as input | 1 |
| scale | same as input | 1 |
| bias | same as input | 1 |
| output | same as input | same as input |
MLBatchNormalizationSupportLimits has the following members:
- input, of type MLTensorLimits: MLTensorLimits for input operand.
- mean, of type MLTensorLimits: MLTensorLimits for mean operand.
- variance, of type MLTensorLimits: MLTensorLimits for variance operand.
- scale, of type MLTensorLimits: MLTensorLimits for scale operand.
- bias, of type MLTensorLimits: MLTensorLimits for bias operand.
- output, of type MLTensorLimits: MLTensorLimits for output operand.
MLOpSupportLimits has the following members for batchNormalization():
- batchNormalization, of type MLBatchNormalizationSupportLimits: Support limits for operator batchNormalization().
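As a rough illustration of the arithmetic, the sketch below applies the conventional inference-time batch normalization formula, output = (input - mean) / sqrt(variance + epsilon) * scale + bias, to a flat row-major array. The formula is assumed here as the usual definition from [Batch-Normalization]; the function signature is hypothetical and plain arrays stand in for MLOperands.

```javascript
// Sketch: per-feature normalization along `axis` of a row-major tensor.
function batchNormalization(data, shape, mean, variance,
                            { scale, bias, axis = 1, epsilon = 1e-5 } = {}) {
  // Number of elements spanned by the dimensions after `axis`.
  const inner = shape.slice(axis + 1).reduce((p, d) => p * d, 1);
  const features = shape[axis];
  return data.map((value, index) => {
    // Recover the coordinate along `axis` from the flat index.
    const c = Math.floor(index / inner) % features;
    const normalized = (value - mean[c]) / Math.sqrt(variance[c] + epsilon);
    // scale and bias are optional and default to 1 and 0 respectively.
    return normalized * (scale ? scale[c] : 1) + (bias ? bias[c] : 0);
  });
}
```

The mean, variance, scale and bias arrays each have one entry per feature, matching the 1-D operand requirement in the table above.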
8.9.7. cast
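Cast each element in the input tensor to the target data type.
partial interface MLGraphBuilder {
  MLOperand cast(MLOperand input,
                 MLOperandDataType dataType,
                 optional MLOperatorOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits cast;
};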
Arguments:
- input: an MLOperand. The input N-D tensor.
- dataType: an MLOperandDataType. The target data type.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The N-D tensor of the same shape as input with each element cast to the target data type.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | any | same as input |
MLOpSupportLimits has the following members for cast():
- cast, of type MLSingleInputSupportLimits: Support limits for operator cast().
Casting between MLOperandDataTypes is specified for some cases and implementation-defined in other cases, according to the following table:
| Input type \ Target type | "float32", "float16" | "int32", "uint32", "int64", "uint64", "int8", "uint8" |
|---|---|---|
| "float32", "float16" | If in range, nearest representable value. If out of range, +/-Infinity. | If in range, truncated. If out of range, implementation-defined. |
| "int32", "uint32", "int64", "uint64", "int8", "uint8" | If in range, nearest representable value. If out of range, +/-Infinity. | If in range, same value. If out of range, lowest N bits reinterpreted as target type, assuming two’s complement for signed types. |
NOTE: For example, casting -1 from "int8" to "uint8" is specified to yield 255. But casting -1 from "float32" to "uint8" is implementation-defined.
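The specified cases in the table can be mimicked in JavaScript as a sketch. The helper names are hypothetical; BigInt.asIntN and BigInt.asUintN are used because they keep the lowest N bits and reinterpret them, which is the behavior the table describes for integer-to-integer casts.

```javascript
// Integer-to-integer cast: keep the lowest `bits` bits and reinterpret them
// as the target type (two's complement for signed targets).
function castInteger(value, bits, signed) {
  const wide = BigInt(value); // `value` must be an integer Number or a bigint.
  return signed ? BigInt.asIntN(bits, wide) : BigInt.asUintN(bits, wide);
}

// Float-to-integer cast for in-range values: the fractional part is dropped.
// Out-of-range inputs are implementation-defined and are not modelled here.
function castInRangeFloat(value) {
  return Math.trunc(value);
}
```

For instance, castInteger(-1, 8, false) reproduces the "int8" to "uint8" case from the note above and yields 255n.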
8.9.8. clamp
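Clamp the input tensor element-wise within a range specified by the minimum and maximum values.
dictionary MLClampOptions : MLOperatorOptions {
  MLNumber minValue;
  MLNumber maxValue;
};
partial interface MLGraphBuilder {
  MLOperand clamp(MLOperand input, optional MLClampOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits clamp;
};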
MLClampOptions has the following members:
- minValue, of type MLNumber: The minimum value of the range. When it is not specified, the clamping is not performed on the lower limit of the range.
- maxValue, of type MLNumber: The maximum value of the range. When it is not specified, the clamping is not performed on the upper limit of the range.
Arguments:
- input: an MLOperand. The input tensor.
- options: an optional MLClampOptions. The optional parameters of the operation.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for clamp():
- clamp, of type MLSingleInputSupportLimits: Support limits for operator clamp().
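The element-wise behavior, including the case where only one bound is given, can be sketched as follows. A plain array stands in for the input MLOperand, and the function is an illustration of the semantics rather than the builder method itself.

```javascript
// Sketch: clamp each element to [minValue, maxValue]; a missing bound
// means no clamping is performed on that side of the range.
function clamp(values, { minValue, maxValue } = {}) {
  return values.map((value) => {
    if (minValue !== undefined && value < minValue) return minValue;
    if (maxValue !== undefined && value > maxValue) return maxValue;
    return value;
  });
}
```

Calling the sketch with only maxValue set leaves arbitrarily small inputs untouched, which mirrors the description of an unspecified minValue.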
8.9.9. concat
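Concatenates the input tensors along a given axis.
partial interface MLGraphBuilder {
  MLOperand concat(sequence<MLOperand> inputs,
                   [EnforceRange] unsigned long axis,
                   optional MLOperatorOptions options = {});
};
dictionary MLConcatSupportLimits {
  MLTensorLimits inputs;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLConcatSupportLimits concat;
};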
Arguments:
- inputs: a sequence<MLOperand>. All input tensors must have the same shape, except for the size of the dimension to concatenate on.
- axis: an unsigned long scalar. The axis that the inputs concatenate along. Its value must be in the range [0, N-1] where N is the rank of the input tensors.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The concatenated tensor of all the inputs along the axis. The output tensor has the same shape as the inputs except on the dimension along which the inputs are concatenated. The size of that dimension is computed as the sum of all the input sizes of the same dimension.
| operand | allowed data types | allowed ranks |
|---|---|---|
| inputs’s items | any | 1 to N |
| output | same as inputs’s items | same as inputs’s items |
MLConcatSupportLimits has the following members:
- inputs, of type MLTensorLimits: MLTensorLimits for all input operands.
- output, of type MLTensorLimits: MLTensorLimits for output operand.
MLOpSupportLimits has the following member for concat():
- concat, of type MLConcatSupportLimits: Support limits for operator concat().
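The output shape rule for concat() can be sketched as a small shape calculation. The function name concatOutputShape and the specific error types are illustrative choices, not taken from the specification.

```javascript
// Sketch: output shape of concatenating tensors with `shapes` along `axis`.
function concatOutputShape(shapes, axis) {
  const [first, ...rest] = shapes;
  if (axis < 0 || axis >= first.length) {
    throw new RangeError("axis must be in the range [0, N-1]");
  }
  const output = [...first];
  for (const shape of rest) {
    if (shape.length !== first.length) throw new TypeError("rank mismatch");
    shape.forEach((size, dimension) => {
      // Every dimension except the concatenation axis must match.
      if (dimension !== axis && size !== first[dimension]) {
        throw new TypeError("shape mismatch outside the concatenation axis");
      }
    });
    // Sizes along the concatenation axis are summed.
    output[axis] += shape[axis];
  }
  return output;
}
```

For example, concatenating shapes [2, 3] and [2, 5] along axis 1 gives [2, 8], while the same inputs along axis 0 are rejected because their second dimensions differ.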
8.9.10. conv2d
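Compute a 2-D convolution given 4-D input and filter tensors.
enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};
dictionary MLConv2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  [EnforceRange] unsigned long groups = 1;
  MLInputOperandLayout inputLayout = "nchw";
  MLConv2dFilterOperandLayout filterLayout = "oihw";
  MLOperand bias;
};
partial interface MLGraphBuilder {
  MLOperand conv2d(MLOperand input, MLOperand filter,
                   optional MLConv2dOptions options = {});
};
dictionary MLConv2dSupportLimits {
  MLTensorLimits input;
  MLTensorLimits filter;
  MLTensorLimits bias;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLConv2dSupportLimits conv2d;
};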
MLConv2dOptions has the following members:
- padding, of type sequence<[EnforceRange] unsigned long>: A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].
- strides, of type sequence<[EnforceRange] unsigned long>: A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].
- dilations, of type sequence<[EnforceRange] unsigned long>: A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].
- groups, of type unsigned long, defaulting to 1: The number of groups that input channels and output channels are divided into.
- inputLayout, of type MLInputOperandLayout, defaulting to "nchw": Specifies the layout format of the input and output tensor as follows:
- filterLayout, of type MLConv2dFilterOperandLayout, defaulting to "oihw": Specifies the layout format of the filter tensor as follows:
- bias, of type MLOperand: An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.
Arguments:
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of filterLayout and groups.
- options: an MLConv2dOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 4-D tensor that contains the convolution result. The output shape is interpreted according to inputLayout. More specifically, the spatial dimensions or the sizes of the last two dimensions of the output tensor for the "nchw" input layout can be calculated as follows:
outputSize = 1 + (inputSize - (filterSize - 1) * dilation - 1 + beginningPadding + endingPadding) / stride
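The formula above can be evaluated per spatial dimension as in the sketch below. The function name and option names are illustrative, and rounding the division down with Math.floor is an assumption of this sketch for cases where the stride does not divide evenly.

```javascript
// Sketch: output size of one spatial dimension of a 2-D convolution.
function conv2dOutputSize(inputSize, filterSize,
    { stride = 1, dilation = 1, beginningPadding = 0, endingPadding = 0 } = {}) {
  // A dilated filter covers (filterSize - 1) * dilation + 1 input elements.
  const effectiveFilterSize = (filterSize - 1) * dilation + 1;
  const paddedSize = inputSize + beginningPadding + endingPadding;
  // Assumption: the division is rounded down.
  return 1 + Math.floor((paddedSize - effectiveFilterSize) / stride);
}
```

For a 224-wide input with a 7-wide filter, stride 2 and padding of 3 on each side, the sketch evaluates 1 + floor((224 - 6 - 1 + 6) / 2), which is 112.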
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 4 |
| filter | same as input | 4 |
| bias | same as input | 1 |
| output | same as input | 4 |
MLConv2dSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
filter, of type MLTensorLimits
MLTensorLimits for filter operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for conv2d():
conv2d, of type MLConv2dSupportLimits
Support limits for operator conv2d().
A depthwise conv2d operation is a variant of grouped convolution in which options.groups = inputChannels = outputChannels, and the shape of the filter tensor is [options.groups, 1, height, width] for "oihw" layout, [height, width, 1, options.groups] for "hwio" layout, [options.groups, height, width, 1] for "ohwi" layout and [1, height, width, options.groups] for "ihwo" layout.
8.9.11. convTranspose2d
Compute a 2-D transposed convolution given 4-D input and filter tensors.
MLConvTranspose2dOptions has the following members:
padding, of type sequence<[EnforceRange] unsigned long>
A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the convolution input. The default value is [0, 0, 0, 0].
strides, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the convolution input. The default value is [1, 1].
dilations, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied on the convolution filter (kernel). The default value is [1, 1].
outputPadding, of type sequence<[EnforceRange] unsigned long>
A list of length 2. Specifies the padding values applied to each spatial dimension of the output tensor. The explicit padding values are needed to disambiguate the output tensor shape for transposed convolution when the value of strides is greater than 1. Note that these values are only used to disambiguate the output shape when needed; they do not necessarily cause any padding value to be written to the output tensor. The default value is [0, 0].
outputSizes, of type sequence<[EnforceRange] unsigned long>
A list of length 2. Specifies the sizes of the last two dimensions of the output tensor. When the output sizes are explicitly specified, the output padding values in outputPadding are ignored. If not specified, the output sizes are automatically computed.
groups, of type unsigned long, defaulting to 1
The number of groups that input channels and output channels are divided into.
inputLayout, of type MLInputOperandLayout, defaulting to "nchw"
Specifies the layout format of the input and output tensor.
filterLayout, of type MLConvTranspose2dFilterOperandLayout, defaulting to "iohw"
Specifies the layout format of the filter tensor.
bias, of type MLOperand
An additional 1-D tensor with the shape of [outputChannels] whose values are to be added to the convolution result.
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of inputLayout.
- filter: an MLOperand. The filter 4-D tensor. The logical shape is interpreted according to the value of filterLayout and groups.
- options: an optional MLConvTranspose2dOptions.
Returns: an MLOperand. The output 4-D tensor that contains the transposed convolution result. The output shape is interpreted according to inputLayout. More specifically, unless outputSizes is explicitly specified, outputPadding is needed to compute the spatial dimension values of the output tensor as follows:
outputSize = (inputSize - 1) * stride + (filterSize - 1) * dilation + 1 - beginningPadding - endingPadding + outputPadding
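A minimal JavaScript sketch of the formula above follows. The helper name `convTranspose2dOutputSize` is hypothetical and the function only mirrors the arithmetic; it does not validate its arguments.

```javascript
// Hypothetical helper: spatial output size of a transposed convolution
// along one dimension, following the formula above term by term.
function convTranspose2dOutputSize(inputSize, filterSize, beginningPadding,
                                   endingPadding, stride, dilation,
                                   outputPadding) {
  return (inputSize - 1) * stride + (filterSize - 1) * dilation + 1 -
      beginningPadding - endingPadding + outputPadding;
}
```

With a filter size of 3, a stride of 2, and no padding, an input of size 3 yields 7 when outputPadding is 0 and 8 when outputPadding is 1. Both 7 and 8 are sizes that a stride-2 convolution with the same filter would reduce to 3, which is the ambiguity that outputPadding resolves.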
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 4 |
| filter | same as input | 4 |
| bias | same as input | 1 |
| output | same as input | 4 |
MLOpSupportLimits has the following member for convTranspose2d():
convTranspose2d, of type MLConv2dSupportLimits
Support limits for operator convTranspose2d().
8.9.12. cumulativeSum
Compute the accumulated sum of a series of values along the given axis, either including or excluding the current value.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16", "int32", "uint32", "int64", "uint64" | 1 to N |
| output | same as input | same as input |
MLCumulativeSumOptions has the following members:
exclusive, of type boolean, defaulting to false
Whether to include or exclude the current value in the output, meaning inclusive prefix sum or exclusive prefix sum [Prefix-sum]. Given input [1,2,3,4], inclusive summation would yield an output of [1,3,6,10] whereas exclusive would yield [0,1,3,6]. The default is inclusive.
reversed, of type boolean, defaulting to false
Whether to reverse the summation direction along the active axis to instead start from the high coordinate to low coordinate. Given input [1,2,3,4], inclusive forward summation would yield an output of [1,3,6,10] whereas inclusive backward summation would yield [10,9,7,4]. The default is forward.
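The effect of the two options can be sketched for a 1-D input as follows. The function name `cumulativeSum1d` is hypothetical, and the sketch uses plain arrays rather than operands.

```javascript
// Hypothetical 1-D reference for the exclusive and reversed options.
function cumulativeSum1d(values, {exclusive = false, reversed = false} = {}) {
  const n = values.length;
  const output = new Array(n);
  let running = 0;
  for (let step = 0; step < n; ++step) {
    // Walk from the high coordinate to the low one when reversed.
    const i = reversed ? n - 1 - step : step;
    if (exclusive) {
      output[i] = running;   // sum of the values visited before this one
      running += values[i];
    } else {
      running += values[i];  // sum including the current value
      output[i] = running;
    }
  }
  return output;
}
```

This reproduces the examples above: [1,2,3,4] gives [1,3,6,10] by default, [0,1,3,6] with exclusive set, and [10,9,7,4] with reversed set.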
- input: an MLOperand. The input tensor.
- axis: an unsigned long scalar. The axis the summation will be performed on. Its value must be in the range [0, N-1] where N is input’s rank.
- options: an MLCumulativeSumOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
MLOpSupportLimits has the following member for cumulativeSum():
cumulativeSum, of type MLSingleInputSupportLimits
Support limits for operator cumulativeSum().
8.9.13. Element-wise binary operations
Compute the element-wise binary addition, subtraction, multiplication, division, power, maximum and minimum of the two input tensors.
The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
- a: an MLOperand. The first input tensor.
- b: an MLOperand. The second input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the result of element-wise binary operation of the two input tensors.
- add: Add the values of the two input tensors, element-wise.
- sub: Subtract the values of the second input tensor from the values of the first input tensor, element-wise.
- mul: Multiply the values of the two input tensors, element-wise.
- div: Divide the values of the first input tensor by the values of the second input tensor, element-wise. Integer types are truncated toward zero.
- max: Select the greater values of the two input tensors, element-wise.
- min: Select the lesser values of the two input tensors, element-wise.
- pow: Raise the values of the first input tensor to the power of the values of the second input tensor, element-wise.
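The output shape rule for these operations (bidirectional broadcasting) can be sketched as follows. The function name `broadcastShapes` is hypothetical, and returning null for incompatible shapes is a choice made for this sketch rather than behavior defined here.

```javascript
// Hypothetical helper: output shape of a binary operation, or null when
// the two shapes are not bidirectionally broadcastable.
function broadcastShapes(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const result = new Array(rank);
  for (let i = 0; i < rank; ++i) {
    // Align the shapes from the right; missing leading dimensions act as 1.
    const indexA = i - (rank - shapeA.length);
    const indexB = i - (rank - shapeB.length);
    const dimA = indexA >= 0 ? shapeA[indexA] : 1;
    const dimB = indexB >= 0 ? shapeB[indexB] : 1;
    if (dimA !== dimB && dimA !== 1 && dimB !== 1) return null;
    result[i] = dimA === 1 ? dimB : dimA;
  }
  return result;
}
```

For example, shapes [8, 1, 6, 1] and [7, 1, 5] broadcast to [8, 7, 6, 5], while [2, 3] and [4, 3] are incompatible.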
| operand | allowed data types | allowed ranks |
|---|---|---|
| a | any | N |
| b | same as a | N |
| output | same as a | N |
MLOpSupportLimits has the following members for element-wise binary operations:
add, of type MLBinarySupportLimits
Support limits for operator add().
sub, of type MLBinarySupportLimits
Support limits for operator sub().
mul, of type MLBinarySupportLimits
Support limits for operator mul().
div, of type MLBinarySupportLimits
Support limits for operator div().
max, of type MLBinarySupportLimits
Support limits for operator max().
min, of type MLBinarySupportLimits
Support limits for operator min().
pow, of type MLBinarySupportLimits
Support limits for operator pow().
8.9.14. Element-wise logical operations
Compare input tensors element-wise and return a "uint8" tensor of values 0 (false) or 1 (true) for the comparisons. For single-operand operations, return the logical results of the operation.
For multiple-operand operations, the operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
- a: an MLOperand. The first input tensor.
- b: an MLOperand. The second input tensor when specified.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the result of element-wise comparison of the two input tensors.
| operand | allowed data types | allowed ranks |
|---|---|---|
| a | specified as part of operation steps | N |
| b | same as a | N |
| output | "uint8" | N |
MLLogicalNotSupportLimits has the following members:
a, of type MLTensorLimits
MLTensorLimits for a operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following members for element-wise logical operations:
equal, of type MLBinarySupportLimits
Support limits for operator equal().
notEqual, of type MLBinarySupportLimits
Support limits for operator notEqual().
greater, of type MLBinarySupportLimits
Support limits for operator greater().
greaterOrEqual, of type MLBinarySupportLimits
Support limits for operator greaterOrEqual().
lesser, of type MLBinarySupportLimits
Support limits for operator lesser().
lesserOrEqual, of type MLBinarySupportLimits
Support limits for operator lesserOrEqual().
logicalNot, of type MLLogicalNotSupportLimits
Support limits for operator logicalNot().
logicalAnd, of type MLBinarySupportLimits
Support limits for operator logicalAnd().
logicalOr, of type MLBinarySupportLimits
Support limits for operator logicalOr().
logicalXor, of type MLBinarySupportLimits
Support limits for operator logicalXor().
isNaN, of type MLLogicalNotSupportLimits
Support limits for operator isNaN().
isInfinite, of type MLLogicalNotSupportLimits
Support limits for operator isInfinite().
- equal: Compare if the values of the two input tensors are equal, element-wise.
- notEqual: Compare if the values of the two input tensors are not equal, element-wise.
- greater: Compare if the values of the first input tensor are greater than the values of the second input tensor, element-wise.
- greaterOrEqual: Compare if the values of the first input tensor are greater than or equal to the values of the second input tensor, element-wise.
- lesser: Compare if the values of the first input tensor are lesser than the values of the second input tensor, element-wise.
- lesserOrEqual: Compare if the values of the first input tensor are lesser than or equal to the values of the second input tensor, element-wise.
- logicalNot: Invert the values of the input tensor to values 0 or 1, element-wise. Specifically, when the input value is non-zero, invert it to 0. Conversely, for a zero input value, invert it to 1.
- logicalAnd: Compute the logical and of the two input tensors, element-wise, treating any non-zero value as true and returning elements of 0 or 1.
- logicalOr: Compute the logical or of the two input tensors, element-wise, treating any non-zero value as true and returning elements of 0 or 1.
- logicalXor: Compute the logical xor of the two input tensors, element-wise, treating any non-zero value as true and returning elements of 0 or 1.
- isNaN: Check if the values of the input tensor are invalid numeric representations (NaNs), element-wise, returning 1 for NaN and 0 otherwise.
- isInfinite: Check if the values of the input tensor are infinite, element-wise, returning 1 for positive or negative infinity and 0 otherwise.
Although greaterOrEqual() and lesserOrEqual() can each be implemented in terms of the operations logicalNot(), lesser(), and greater() (in other words, builder.greaterOrEqual(a, b) is builder.logicalNot(builder.lesser(a, b))), they are specifically defined to handle NaN cases and, for performance reasons, to avoid double comparisons.
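The NaN case mentioned above can be illustrated with scalar stand-ins. The three arrow functions below are hypothetical helpers built on JavaScript number comparisons, which follow IEEE 754; it is an assumption of this sketch that the tensor operations compare elements the same way.

```javascript
// Scalar stand-ins for the element-wise operations (hypothetical helpers).
const lesser = (a, b) => (a < b ? 1 : 0);
const logicalNot = (a) => (a !== 0 ? 0 : 1);
const greaterOrEqual = (a, b) => (a >= b ? 1 : 0);

// For ordinary numbers the composed form and the direct form agree.
const composedOrdinary = logicalNot(lesser(2, 1));  // 1
const directOrdinary = greaterOrEqual(2, 1);        // 1

// Every ordered comparison involving NaN is false, so the forms differ:
// the composed form reports 1 while the direct comparison reports 0.
const composedNaN = logicalNot(lesser(NaN, 1));
const directNaN = greaterOrEqual(NaN, 1);
```

Under these assumptions the composed form would report NaN as greater than or equal to any value, which is why a dedicated operation is useful.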
8.9.15. Element-wise unary operations
Compute the element-wise unary operation for the input tensor.
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the result of element-wise unary operation of the input tensor. The shape of the output tensor is the same as the shape of the input tensor.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | specified as part of operation steps | N |
| output | same as input | same as input |
MLOpSupportLimits has the following members for element-wise unary operations:
abs, of type MLSingleInputSupportLimits
Support limits for operator abs().
ceil, of type MLSingleInputSupportLimits
Support limits for operator ceil().
cos, of type MLSingleInputSupportLimits
Support limits for operator cos().
erf, of type MLSingleInputSupportLimits
Support limits for operator erf().
exp, of type MLSingleInputSupportLimits
Support limits for operator exp().
floor, of type MLSingleInputSupportLimits
Support limits for operator floor().
identity, of type MLSingleInputSupportLimits
Support limits for operator identity().
log, of type MLSingleInputSupportLimits
Support limits for operator log().
neg, of type MLSingleInputSupportLimits
Support limits for operator neg().
reciprocal, of type MLSingleInputSupportLimits
Support limits for operator reciprocal().
roundEven, of type MLSingleInputSupportLimits
Support limits for operator roundEven().
sin, of type MLSingleInputSupportLimits
Support limits for operator sin().
sign, of type MLSingleInputSupportLimits
Support limits for operator sign().
sqrt, of type MLSingleInputSupportLimits
Support limits for operator sqrt().
tan, of type MLSingleInputSupportLimits
Support limits for operator tan().
- abs: Compute the absolute value of the input tensor, element-wise.
- ceil: Compute the ceiling of the input tensor, element-wise.
- cos: Compute the cosine of the input tensor, element-wise.
- erf: Compute the error function [Error-Function] of the input tensor, element-wise.
- exp: Compute the exponential of the input tensor, element-wise.
- floor: Compute the floor of the input tensor, element-wise.
- identity: Copy the value of the input tensor to the output tensor, element-wise.
- log: Compute the natural logarithm of the input tensor, element-wise.
- neg: Compute the numerical negative value of the input tensor, element-wise.
- reciprocal: Compute the reciprocal of the input tensor, element-wise.
- roundEven: Round the input tensor with halves to the nearest even value, element-wise (e.g. [0.1, 0.9, 1.1, 1.9, -3.5, -2.5, -1.5, 1.5, 2.5, 3.5] yields [0.0, 1.0, 1.0, 2.0, -4.0, -2.0, -2.0, 2.0, 2.0, 4.0]).
- sign: Compute the sign (-1, 0, 1) of the input tensor, element-wise, returning 1 if > 0, -1 if < 0, and 0 otherwise.
- sin: Compute the sine of the input tensor, element-wise.
- sqrt: Compute the square root of the input tensor, element-wise.
- tan: Compute the tangent of the input tensor, element-wise.
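JavaScript's Math.round rounds halves toward positive infinity, so it does not match roundEven. A scalar sketch of round-half-to-even follows; the function name `roundEvenScalar` is hypothetical.

```javascript
// Hypothetical scalar reference for roundEven (round half to even).
function roundEvenScalar(x) {
  const floor = Math.floor(x);
  const fraction = x - floor;
  if (fraction < 0.5) return floor;
  if (fraction > 0.5) return floor + 1;
  // Exactly halfway: choose whichever neighbor is even.
  return floor % 2 === 0 ? floor : floor + 1;
}
```

Applied to the roundEven example input in the list above, this yields the listed output; for instance -2.5 rounds to -2 and 3.5 rounds to 4.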
8.9.16. dequantizeLinear
Dequantizes an integer tensor to a floating point tensor using the scale and zero-point bias, where output = (input - zeroPoint) * scale. The scale and zeroPoint tensors can be smaller than the input tensor as they are blockwise broadcastable.
- input: an MLOperand. The input tensor.
- scale: an MLOperand. The scale tensor to multiply each input value by after adjusting by the zero point. It must be blockwise broadcastable with the input. Values must be positive and nonzero, or else the behavior is implementation-defined (e.g. correct results, incorrect results, or compilation failure).
- zeroPoint: an MLOperand. The zero point tensor to subtract from each input value. It has the same shape as the scale.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the dequantized values.
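The expression output = (input - zeroPoint) * scale can be sketched for 1-D data as follows. The function name `dequantizeLinear1d` is hypothetical. The sketch assumes that blockwise broadcasting in one dimension means each scale and zeroPoint element applies to a contiguous block of input.length / scale.length input elements.

```javascript
// Hypothetical 1-D reference for dequantizeLinear.
// Assumption: scale[b] and zeroPoint[b] apply to the b-th contiguous block
// of input.length / scale.length input elements.
function dequantizeLinear1d(input, scale, zeroPoint) {
  if (scale.length === 0 || scale.length !== zeroPoint.length ||
      input.length % scale.length !== 0) {
    throw new RangeError("scale and zeroPoint must evenly divide the input");
  }
  const blockSize = input.length / scale.length;
  return Array.from(input, (value, i) => {
    const block = Math.floor(i / blockSize);
    return (value - zeroPoint[block]) * scale[block];
  });
}
```

For example, input [0, 128, 255, 10] with scale [0.5, 2] and zeroPoint [128, 0] is split into two blocks of two elements and dequantizes to [-64, 0, 510, 20].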
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "uint8", "int8", "uint32", "int32" | N |
| scale | "float32", "float16" | same as input |
| zeroPoint | same as input | same as input |
| output | same as scale | same as input |
MLQuantizeDequantizeLinearSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
scale, of type MLTensorLimits
MLTensorLimits for scale operand.
zeroPoint, of type MLTensorLimits
MLTensorLimits for zeroPoint operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for dequantizeLinear():
dequantizeLinear, of type MLQuantizeDequantizeLinearSupportLimits
Support limits for operator dequantizeLinear().
8.9.17. quantizeLinear
Quantizes a floating point tensor to an integer tensor using the scale and zero-point bias (e.g. output = clamp(roundEven(input / scale) + zeroPoint, 0, 255) for "uint8"). The scale and zeroPoint tensors can be smaller than the input tensor as they are blockwise broadcastable.
- input: an MLOperand. The input tensor.
- scale: an MLOperand. The scale tensor to divide each input value by before adjusting by the zero point. It must be blockwise broadcastable with the input. Values must be positive and nonzero, or else the behavior is implementation-defined (e.g. correct results, incorrect results, or compilation failure).
- zeroPoint: an MLOperand. The zero point tensor to add to each rescaled input value. It has the same shape as the scale.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the quantized values.
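The "uint8" expression clamp(roundEven(input / scale) + zeroPoint, 0, 255) can be sketched for 1-D data as follows. The names `roundHalfToEven` and `quantizeLinearUint8` are hypothetical, and the block layout (each scale element covering a contiguous run of input elements) is an assumption of this sketch.

```javascript
// Hypothetical helper: round half to even, as roundEven does.
function roundHalfToEven(x) {
  const floor = Math.floor(x);
  const fraction = x - floor;
  if (fraction < 0.5) return floor;
  if (fraction > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1;
}

// Hypothetical 1-D reference for quantizeLinear with a "uint8" output.
// Assumption: scale[b] and zeroPoint[b] apply to the b-th contiguous block.
function quantizeLinearUint8(input, scale, zeroPoint) {
  if (scale.length === 0 || scale.length !== zeroPoint.length ||
      input.length % scale.length !== 0) {
    throw new RangeError("scale and zeroPoint must evenly divide the input");
  }
  const blockSize = input.length / scale.length;
  return Array.from(input, (value, i) => {
    const block = Math.floor(i / blockSize);
    const quantized = roundHalfToEven(value / scale[block]) + zeroPoint[block];
    return Math.min(Math.max(quantized, 0), 255);  // clamp to the uint8 range
  });
}
```

For example, input [-64, 0, 510, 20] with scale [0.5, 2] and zeroPoint [128, 0] quantizes to [0, 128, 255, 10], and values that fall outside the range, such as 1000 with scale 1, saturate at 255.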
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| scale | same as input | same as input |
| zeroPoint | "uint8", "int8", "uint32", "int32" | same as input |
| output | same as zeroPoint | same as input |
MLOpSupportLimits has the following member for quantizeLinear():
quantizeLinear, of type MLQuantizeDequantizeLinearSupportLimits
Support limits for operator quantizeLinear().
8.9.18. elu
Calculate the exponential linear unit function (ELU) on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * (exp(min(0, x)) - 1).
MLEluOptions has the following members:
alpha, of type double, defaulting to 1
A scalar multiplier.
- input: an MLOperand. The input tensor.
- options: an optional MLEluOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
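The ELU expression max(0, x) + alpha * (exp(min(0, x)) - 1) can be written as a scalar function. The name `eluScalar` is hypothetical; the tensor operation applies the same expression to every element.

```javascript
// Hypothetical scalar reference for elu.
function eluScalar(x, alpha = 1) {
  return Math.max(0, x) + alpha * (Math.exp(Math.min(0, x)) - 1);
}
```

Non-negative inputs pass through unchanged because exp(0) - 1 is 0, while negative inputs approach -alpha as x decreases.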
MLOpSupportLimits has the following member for elu():
elu, of type MLSingleInputSupportLimits
Support limits for operator elu().
8.9.19. expand
Expand any dimension of size 1 of the input tensor to a larger size according to the new shape. The expansion is consistent with [numpy-broadcasting-rule]. The input tensor must be unidirectionally broadcastable to the new shape; each dimension must be of size 1 or match the sizes of the corresponding output dimensions according to the new shape.
- input: an MLOperand. An input tensor.
- newShape: sequence<unsigned long>. The new shape the input tensor is expanded to.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The tensor with the expanded shape.
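The unidirectional broadcasting requirement can be sketched as a shape check. The function name `isUnidirectionallyBroadcastable` is hypothetical, and the sketch assumes shapes are aligned from the rightmost dimension, as in the NumPy rule.

```javascript
// Hypothetical helper: can fromShape be expanded to toShape?
// Shapes are aligned from the right; each input dimension must be 1 or
// equal to the corresponding dimension of the new shape.
function isUnidirectionallyBroadcastable(fromShape, toShape) {
  if (fromShape.length > toShape.length) return false;
  const offset = toShape.length - fromShape.length;
  return fromShape.every(
      (dim, i) => dim === 1 || dim === toShape[i + offset]);
}
```

Under this check a [3, 1] input can be expanded to [2, 3, 4], but a [3, 2] input cannot be expanded to [3, 4] because the dimension of size 2 is neither 1 nor 4.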
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | N |
MLOpSupportLimits has the following member for expand():
expand, of type MLSingleInputSupportLimits
Support limits for operator expand().
8.9.20. gather
Gather values of the input tensor along an axis according to the indices.
MLGatherOptions has the following members:
axis, of type unsigned long, defaulting to 0
The axis along which the gathered values are obtained. Its value must be in the range [0, N-1] where N is the rank of the input tensor.
- input: an MLOperand. The input N-D tensor from which the values are gathered.
- indices: an MLOperand. The indices N-D tensor of the input values to gather. The values must be of type "int32", "uint32", or "int64", and must be in the range -N (inclusive) to N (exclusive) where N is the size of the input dimension indexed by axis, and a negative index means indexing from the end of the dimension.
- options: an optional MLGatherOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to the rank of input + the rank of indices - 1.
The indices parameter to gather() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
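The rank rule and the index handling described above can be sketched as follows. All three function names are hypothetical. The output shape construction (the indices shape replaces the gathered axis) is an assumption consistent with the stated output rank, and out-of-range indices are assumed to be clamped to the allowed range.

```javascript
// Hypothetical helper: output shape of gather. The indices shape replaces
// the input dimension selected by axis, so the output rank is
// input rank + indices rank - 1.
function gatherOutputShape(inputShape, indicesShape, axis = 0) {
  if (axis < 0 || axis >= inputShape.length) {
    throw new RangeError("axis must be in the range [0, N-1]");
  }
  return [
    ...inputShape.slice(0, axis),
    ...indicesShape,
    ...inputShape.slice(axis + 1),
  ];
}

// Hypothetical helper: clamp an index to [-size, size - 1], then map a
// negative index to its position counted from the end of the dimension.
function normalizeGatherIndex(index, dimensionSize) {
  const clamped = Math.min(Math.max(index, -dimensionSize), dimensionSize - 1);
  return clamped < 0 ? clamped + dimensionSize : clamped;
}

// Hypothetical helper: gather along axis 0 of an array of rows.
function gatherRows(rows, indices) {
  return indices.map((index) => rows[normalizeGatherIndex(index, rows.length)]);
}
```

For example, gathering from a [4, 3] input with [2, 2] indices along axis 0 gives an output of shape [2, 2, 3], and an index of -1 selects the last row.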
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| indices | "int32", "uint32", "int64" | N |
| output | same as input | N |
MLGatherSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
indices, of type MLTensorLimits
MLTensorLimits for indices operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for gather():
gather, of type MLGatherSupportLimits
Support limits for operator gather().
8.9.21. gatherElements
Gather values of the input tensor along an axis according to the indices.
- input: an MLOperand. The input N-D tensor from which the values are gathered.
- indices: an MLOperand. The indices N-D tensor of the input values to gather. The values must be of type "int32", "uint32", or "int64", and must be in the range -N (inclusive) to N (exclusive) where N is the size of the input dimension indexed by options.axis, and a negative index means indexing from the end of the dimension.
- options: an optional MLGatherOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to input’s rank.
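A 2-D sketch of gatherElements follows. The function name `gatherElements2d` is hypothetical. The sketch assumes the output has the shape of indices and that each output element is read from the input at the same coordinates, except along the chosen axis where the index value is used instead.

```javascript
// Hypothetical 2-D reference for gatherElements.
// Assumption: output[i][j] = input[indices[i][j]][j] for axis 0 and
// output[i][j] = input[i][indices[i][j]] for axis 1.
function gatherElements2d(input, indices, axis = 0) {
  if (axis !== 0 && axis !== 1) {
    throw new RangeError("axis must be 0 or 1 for a 2-D input");
  }
  const axisSize = axis === 0 ? input.length : input[0].length;
  return indices.map((row, i) => row.map((index, j) => {
    // A negative index counts from the end of the dimension.
    const k = index < 0 ? index + axisSize : index;
    return axis === 0 ? input[k][j] : input[i][k];
  }));
}
```

For example, with input [[1, 2], [3, 4]] and indices [[0, 0], [1, 0]], gathering along axis 1 yields [[1, 1], [4, 3]].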
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| indices | "int32", "uint32", "int64" | same as input |
| output | same as input | same as input |
MLOpSupportLimits has the following member for gatherElements():
gatherElements, of type MLGatherSupportLimits
Support limits for operator gatherElements().
The indices parameter to gatherElements() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
8.9.22. gatherND
Gather slices of the input tensor according to the indices.
- input: an MLOperand. The input N-D tensor from which the values are gathered.
- indices: an MLOperand. The indices array contains entire coordinates into the input tensor, with the rightmost dimension holding the number of dimensions per coordinate. So an indices tensor of shape [10,1] holds 10 single-axis indices, and a shape of [4,3] holds 4 indices of 3D coordinates. The values must be of type "int32", "uint32", or "int64", and each must be in the range -N (inclusive) to N (exclusive) where N is the size of the corresponding input dimension, and a negative index means indexing from the end of the corresponding dimension.
- options: an optional MLOperatorOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to the input’s rank + indices’s rank - indices’s shape[-1] - 1.
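The output rank rule can be sketched as a shape computation. The function name `gatherNDOutputShape` is hypothetical. The sketch assumes that the leading dimensions of indices (all but the last) are followed by the input dimensions that the coordinates do not address, which is consistent with the stated rank.

```javascript
// Hypothetical helper: output shape of gatherND.
// Each coordinate addresses the first indicesShape[-1] input dimensions;
// the remaining input dimensions are copied as whole slices.
function gatherNDOutputShape(inputShape, indicesShape) {
  if (indicesShape.length < 1) {
    throw new RangeError("indices must have rank of at least 1");
  }
  const coordinateSize = indicesShape[indicesShape.length - 1];
  if (coordinateSize > inputShape.length) {
    throw new RangeError("coordinates address more dimensions than input has");
  }
  return [...indicesShape.slice(0, -1), ...inputShape.slice(coordinateSize)];
}
```

For a [2, 3, 4] input, indices of shape [10, 1] produce a [10, 3, 4] output (rank 3 + 2 - 1 - 1 = 3), and indices of shape [4, 3] produce a [4] output (rank 3 + 2 - 3 - 1 = 1).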
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| indices | "int32", "uint32", "int64" | 1 to N |
| output | same as input | N |
MLOpSupportLimits has the following member for gatherND():
gatherND, of type MLGatherSupportLimits
Support limits for operator gatherND().
The indices parameter to gatherND() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
8.9.23. gelu
Compute the gaussian error linear unit function (GELU) of the input tensor. The calculation follows the expression 0.5 * x * (1 + erf(x / sqrt(2))).
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
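The GELU expression 0.5 * x * (1 + erf(x / sqrt(2))) can be sketched as a scalar function. JavaScript has no built-in error function, so this sketch substitutes a polynomial approximation (Abramowitz and Stegun formula 7.1.26, accurate to roughly 1.5e-7). The names `erfApprox` and `geluScalar` are hypothetical, and an implementation is free to compute erf differently.

```javascript
// Hypothetical helper: approximate erf(x) using Abramowitz and Stegun
// formula 7.1.26. This is a swapped-in approximation, not an exact erf.
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical scalar reference for gelu.
function geluScalar(x) {
  return 0.5 * x * (1 + erfApprox(x / Math.SQRT2));
}
```

With this approximation geluScalar(1) is close to 0.8413 and geluScalar(-1) is close to -0.1587, while large positive inputs pass through almost unchanged.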
MLOpSupportLimits has the following member for gelu():
gelu, of type MLSingleInputSupportLimits
Support limits for operator gelu().
8.9.24. gemm
Calculate the general matrix multiplication of the Basic Linear Algebra Subprograms. The calculation follows the expression alpha * A * B + beta * C, where A is a 2-D tensor with shape [M, K] or [K, M], B is a 2-D tensor with shape [K, N] or [N, K], and C is unidirectionally broadcastable to the shape [M, N]. A and B can optionally be transposed prior to the calculation.
MLGemmOptions has the following members:
c, of type MLOperand
The third input tensor. It is either a scalar, or of the shape that is unidirectionally broadcastable to the shape [M, N]. When it is not specified, the computation is done as if c is a scalar 0.0.
alpha, of type double, defaulting to 1.0
A multiplier for the first input.
beta, of type double, defaulting to 1.0
A multiplier for the third input c.
aTranspose, of type boolean, defaulting to false
Indicates if the first input is transposed prior to calculating the output.
bTranspose, of type boolean, defaulting to false
Indicates if the second input is transposed prior to calculating the output.
- a: an MLOperand. The first input 2-D tensor with shape [M, K] if aTranspose is false, or [K, M] if aTranspose is true.
- b: an MLOperand. The second input 2-D tensor with shape [K, N] if bTranspose is false, or [N, K] if bTranspose is true.
- options: an optional MLGemmOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 2-D tensor of shape [M, N] that contains the calculated product of all the inputs.
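The expression alpha * A * B + beta * C can be sketched on nested arrays as follows. The function names `transpose2d` and `gemmReference` are hypothetical, matrices are assumed to be non-empty arrays of rows, and c is accepted as a number, a 1-D array, or a 2-D array to illustrate unidirectional broadcasting to [M, N].

```javascript
// Hypothetical helper: transpose a non-empty 2-D array.
function transpose2d(matrix) {
  return matrix[0].map((_, j) => matrix.map((row) => row[j]));
}

// Hypothetical reference for gemm on nested arrays.
function gemmReference(a, b, options = {}) {
  const {c = 0, alpha = 1.0, beta = 1.0, aTranspose = false,
         bTranspose = false} = options;
  const A = aTranspose ? transpose2d(a) : a;  // shape [M, K]
  const B = bTranspose ? transpose2d(b) : b;  // shape [K, N]
  const M = A.length, K = A[0].length, N = B[0].length;
  if (B.length !== K) {
    throw new RangeError("inner dimensions of a and b must match");
  }
  // Read c at (i, j) with unidirectional broadcasting to [M, N].
  const cAt = (i, j) => {
    if (typeof c === "number") return c;  // scalar
    if (typeof c[0] === "number") return c[c.length === 1 ? 0 : j];  // 1-D
    const row = c[c.length === 1 ? 0 : i];  // 2-D: [M or 1, N or 1]
    return row[row.length === 1 ? 0 : j];
  };
  const output = [];
  for (let i = 0; i < M; ++i) {
    const outputRow = [];
    for (let j = 0; j < N; ++j) {
      let sum = 0;
      for (let k = 0; k < K; ++k) sum += A[i][k] * B[k][j];
      outputRow.push(alpha * sum + beta * cAt(i, j));
    }
    output.push(outputRow);
  }
  return output;
}
```

For example, with a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]] the default options give [[19, 22], [43, 50]], and adding c = [1, 1] with alpha 2 and beta 3 gives [[41, 47], [89, 103]].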
| operand | allowed data types | allowed ranks |
|---|---|---|
| a | "float32", "float16" | 2 |
| b | same as a | 2 |
| c | same as a | 0 to 2 |
| output | same as a | 2 |
MLGemmSupportLimits has the following members:
a, of type MLTensorLimits
MLTensorLimits for a operand.
b, of type MLTensorLimits
MLTensorLimits for b operand.
c, of type MLTensorLimits
MLTensorLimits for c operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for gemm():
gemm, of type MLGemmSupportLimits
Support limits for operator gemm().
8.9.25. gru
Gated Recurrent Unit [GRU] recurrent network uses an update, reset, and new gate to compute the output state that rolls into the output across the temporal sequence of the network.
MLGruOptions has the following members:
bias, of type MLOperand
The 2-D input bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
recurrentBias, of type MLOperand
The 2-D recurrent bias tensor of shape [numDirections, 3 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
initialHiddenState, of type MLOperand
The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zero.
resetAfter, of type boolean, defaulting to true
Indicates whether to apply the reset gate after or before matrix multiplication.
returnSequence, of type boolean, defaulting to false
Indicates whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step.
direction, of type MLRecurrentNetworkDirection, defaulting to "forward"
The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
layout, of type MLGruWeightLayout, defaulting to "zrn"
The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the second dimension of the weight and bias tensor shape.
activations, of type sequence<MLRecurrentNetworkActivation>
Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, defaults to the "sigmoid" and "tanh" functions, respectively.
- input: an MLOperand. The input 3-D tensor of shape [steps, batchSize, inputSize].
- weight: an MLOperand. The 3-D input weight tensor of shape [numDirections, 3 * hiddenSize, inputSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to layout.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [numDirections, 3 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to layout.
- steps: an unsigned long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: an unsigned long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruOptions. The optional parameters of the operation.
Returns: sequence<MLOperand>. The first element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the cell output from the last time step of the network. Additionally, if returnSequence is set to true, the second element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing every cell output from each time step in the temporal sequence.
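The shapes involved can be sketched as follows. The function names `gruOutputShapes` and `gruWeightShapes` are hypothetical. The text above states that numDirections is 2 when direction is "both"; the sketch assumes it is 1 for any other direction.

```javascript
// Hypothetical helper: shapes of the operands returned by gru().
// Assumption: numDirections is 1 unless direction is "both".
function gruOutputShapes(inputShape, hiddenSize, options = {}) {
  const {direction = "forward", returnSequence = false} = options;
  const [steps, batchSize] = inputShape;  // input is [steps, batchSize, inputSize]
  const numDirections = direction === "both" ? 2 : 1;
  const shapes = [[numDirections, batchSize, hiddenSize]];
  if (returnSequence) {
    shapes.push([steps, numDirections, batchSize, hiddenSize]);
  }
  return shapes;
}

// Hypothetical helper: expected shapes of the weight and bias operands.
function gruWeightShapes(inputSize, hiddenSize, numDirections) {
  return {
    weight: [numDirections, 3 * hiddenSize, inputSize],
    recurrentWeight: [numDirections, 3 * hiddenSize, hiddenSize],
    bias: [numDirections, 3 * hiddenSize],
    recurrentBias: [numDirections, 3 * hiddenSize],
  };
}
```

For an input of shape [5, 2, 8] and a hiddenSize of 16, the default options give a single output of shape [1, 2, 16]; with direction "both" and returnSequence true the outputs are [2, 2, 16] and [5, 2, 2, 16].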
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 3 |
| weight | same as input | 3 |
| recurrentWeight | same as input | 3 |
| bias | same as input | 2 |
| recurrentBias | same as input | 2 |
| initialHiddenState | same as input | 3 |
| outputs[0] | same as input | 3 |
| outputs[1] if returnSequence is true | same as input | 4 |
MLGruSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
weight, of type MLTensorLimits
MLTensorLimits for weight operand.
recurrentWeight, of type MLTensorLimits
MLTensorLimits for recurrentWeight operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
recurrentBias, of type MLTensorLimits
MLTensorLimits for recurrentBias operand.
initialHiddenState, of type MLTensorLimits
MLTensorLimits for initialHiddenState operand.
output0, of type MLTensorLimits
MLTensorLimits for output operand outputs[0].
output1, of type MLTensorLimits
MLTensorLimits for output operand outputs[1].
MLOpSupportLimits has the following member for gru():
gru, of type MLGruSupportLimits
Support limits for operator gru().
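The output shapes described under Returns can be sketched in plain JavaScript. This is only an illustration: gruOutputShapes is a hypothetical helper, not part of the API, and the defaults it uses for direction ("forward") and returnSequence (false) are assumptions.

```javascript
// Hypothetical helper (not a WebNN API): computes the shapes of the operands
// returned by gru(), following the "Returns" paragraph above.
function gruOutputShapes(steps, batchSize, hiddenSize, options = {}) {
  // Assumed defaults: direction "forward", returnSequence false.
  const { direction = "forward", returnSequence = false } = options;
  // "both" processes the input in two directions, so numDirections is 2.
  const numDirections = direction === "both" ? 2 : 1;
  const shapes = [[numDirections, batchSize, hiddenSize]];
  if (returnSequence) {
    // The optional second output holds the cell output of every time step.
    shapes.push([steps, numDirections, batchSize, hiddenSize]);
  }
  return shapes;
}

console.log(gruOutputShapes(5, 2, 8));
console.log(gruOutputShapes(5, 2, 8, { direction: "both", returnSequence: true }));
```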
8.9.26. gruCell
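A single time step of the Gated Recurrent Unit [GRU] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of a recurrent network.
dictionary MLGruCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  boolean resetAfter = true;
  MLGruWeightLayout layout = "zrn";
  sequence<MLRecurrentNetworkActivation> activations;
};
partial interface MLGraphBuilder {
  MLOperand gruCell(MLOperand input,
                    MLOperand weight,
                    MLOperand recurrentWeight,
                    MLOperand hiddenState,
                    [EnforceRange] unsigned long hiddenSize,
                    optional MLGruCellOptions options = {});
};
dictionary MLGruCellSupportLimits {
  MLTensorLimits input;
  MLTensorLimits weight;
  MLTensorLimits recurrentWeight;
  MLTensorLimits hiddenState;
  MLTensorLimits bias;
  MLTensorLimits recurrentBias;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLGruCellSupportLimits gruCell;
};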
MLGruCellOptions has the following members:
bias, of type MLOperand
The 1-D input bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to layout.
recurrentBias, of type MLOperand
The 1-D recurrent bias tensor of shape [3 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to layout.
resetAfter, of type boolean, defaulting to true
Indicates whether to apply the reset gate after or before matrix multiplication.
layout, of type MLGruWeightLayout, defaulting to "zrn"
The ordering of the weight and bias vectors for the internal gates of GRU, specifically the update (z), reset (r), and new (n) gate, as indicated in the first dimension of the weight and bias tensor shape.
activations, of type sequence<MLRecurrentNetworkActivation>
Specifies a pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, defaults to the "sigmoid" and "tanh" functions, respectively.
- input: an MLOperand. The input 2-D tensor of shape [batchSize, inputSize].
- weight: an MLOperand. The 2-D input weight tensor of shape [3 * hiddenSize, inputSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to layout.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [3 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to layout.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batchSize, hiddenSize].
- hiddenSize: an unsigned long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLGruCellOptions. The optional parameters of the operation.
Returns: an MLOperand. The 2-D tensor of shape [batchSize, hiddenSize], the cell output hidden state of a single time step of the recurrent network.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 2 |
| weight | same as input | 2 |
| recurrentWeight | same as input | 2 |
| hiddenState | same as input | 2 |
| bias | same as input | 1 |
| recurrentBias | same as input | 1 |
| output | same as input | 2 |
MLGruCellSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
weight, of type MLTensorLimits
MLTensorLimits for weight operand.
recurrentWeight, of type MLTensorLimits
MLTensorLimits for recurrentWeight operand.
hiddenState, of type MLTensorLimits
MLTensorLimits for hiddenState operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
recurrentBias, of type MLTensorLimits
MLTensorLimits for recurrentBias operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for gruCell():
gruCell, of type MLGruCellSupportLimits
Support limits for operator gruCell().
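The shape constraints on the gruCell() operands can be restated as a small check in plain JavaScript. The helper names gruCellShapes and mismatchedOperands are hypothetical and exist only for this sketch; they do not call any WebNN API.

```javascript
// Hypothetical helper (not a WebNN API): the operand shapes for one gruCell()
// step, as listed in the member and argument descriptions above.
function gruCellShapes(batchSize, inputSize, hiddenSize) {
  return {
    input: [batchSize, inputSize],
    weight: [3 * hiddenSize, inputSize],
    recurrentWeight: [3 * hiddenSize, hiddenSize],
    hiddenState: [batchSize, hiddenSize],
    bias: [3 * hiddenSize],          // optional member of MLGruCellOptions
    recurrentBias: [3 * hiddenSize], // optional member of MLGruCellOptions
    output: [batchSize, hiddenSize],
  };
}

// Returns the names of the supplied operands whose shape differs from the
// expected one. Operand names unknown to expectedShapes are reported too.
function mismatchedOperands(actualShapes, expectedShapes) {
  return Object.keys(actualShapes).filter((name) => {
    const actual = actualShapes[name];
    const expected = expectedShapes[name];
    return (
      expected === undefined ||
      actual.length !== expected.length ||
      actual.some((size, i) => size !== expected[i])
    );
  });
}

const gruCellExpected = gruCellShapes(2, 4, 8);
console.log(gruCellExpected.weight);
console.log(mismatchedOperands({ weight: [24, 4], bias: [8] }, gruCellExpected));
```

In this example the weight shape [24, 4] matches 3 * hiddenSize by inputSize, while a bias of shape [8] is reported because [24] is expected.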
8.9.27. hardSigmoid
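Calculate the non-smooth hard sigmoid function on the input tensor, used instead of the sigmoid function for faster computation.
dictionary MLHardSigmoidOptions : MLOperatorOptions {
  double alpha = 0.2;
  double beta = 0.5;
};
partial interface MLGraphBuilder {
  MLOperand hardSigmoid(MLOperand input, optional MLHardSigmoidOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSigmoid;
};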
MLHardSigmoidOptions has the following members:
alpha, of type double, defaulting to 0.2
A scalar multiplier.
beta, of type double, defaulting to 0.5
A scalar addition.
- input: an MLOperand. The input tensor.
- options: an optional MLHardSigmoidOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for hardSigmoid():
hardSigmoid, of type MLSingleInputSupportLimits
Support limits for operator hardSigmoid().
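As a rough reference for the element-wise behavior, the sketch below assumes the conventional hard sigmoid definition max(0, min(1, alpha * x + beta)); that formula is not stated in this section and is an assumption here. The function operates on a plain array rather than an MLOperand.

```javascript
// Plain-array sketch of hard sigmoid, assuming the conventional definition
// y = max(0, min(1, alpha * x + beta)). Not a WebNN API.
function hardSigmoid(values, { alpha = 0.2, beta = 0.5 } = {}) {
  return values.map((x) => Math.max(0, Math.min(1, alpha * x + beta)));
}

console.log(hardSigmoid([-10, 0, 10])); // [ 0, 0.5, 1 ]
```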
8.9.28. hardSwish
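Computes the nonlinear function y = x * max(0, min(6, (x + 3))) / 6 that is introduced by [MobileNetV3] on the input tensor element-wise.
partial interface MLGraphBuilder {
  MLOperand hardSwish(MLOperand input, optional MLOperatorOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits hardSwish;
};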
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for hardSwish():
hardSwish, of type MLSingleInputSupportLimits
Support limits for operator hardSwish().
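The expression y = x * max(0, min(6, (x + 3))) / 6 can be checked with a plain-array sketch. The function below is illustrative only and does not use the graph builder.

```javascript
// Plain-array sketch of y = x * max(0, min(6, x + 3)) / 6. Not a WebNN API.
function hardSwish(values) {
  return values.map((x) => (x * Math.max(0, Math.min(6, x + 3))) / 6);
}

console.log(hardSwish([0, 3, 6])); // [ 0, 3, 6 ]
```

For inputs at or below -3 the result is zero, and for inputs at or above 3 the function is the identity.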
8.9.29. instanceNormalization
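Normalize the input using [Instance-Normalization]. Unlike batchNormalization() where the mean and variance values used in the normalization are computed across all the samples in the batch dimension while the model is trained, the mean and variance values used in the instance normalization are computed on the fly for each input feature of each individual sample in the batch.
dictionary MLInstanceNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  double epsilon = 1e-5;
  MLInputOperandLayout layout = "nchw";
};
partial interface MLGraphBuilder {
  MLOperand instanceNormalization(MLOperand input,
                                  optional MLInstanceNormalizationOptions options = {});
};
dictionary MLNormalizationSupportLimits {
  MLTensorLimits input;
  MLTensorLimits scale;
  MLTensorLimits bias;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits instanceNormalization;
};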
MLInstanceNormalizationOptions has the following members:
scale, of type MLOperand
The 1-D tensor of the scaling values whose size is equal to the number of channels, i.e. the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input's shape[1].
bias, of type MLOperand
The 1-D tensor of the bias values whose size is equal to the size of the feature dimension of the input. For example, for an input tensor with "nchw" layout, the size is equal to input's shape[1].
epsilon, of type double, defaulting to 1e-5
A small value to prevent computational error due to divide-by-zero.
layout, of type MLInputOperandLayout, defaulting to "nchw"
The layout format of the input.
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLInstanceNormalizationOptions. The optional parameters of the operation.
Returns: an MLOperand. The instance-normalized 4-D tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 4 |
| scale | same as input | 1 |
| bias | same as input | 1 |
| output | same as input | 4 |
MLNormalizationSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
scale, of type MLTensorLimits
MLTensorLimits for scale operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for instanceNormalization():
instanceNormalization, of type MLNormalizationSupportLimits
Support limits for operator instanceNormalization().
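The per-sample, per-channel statistics can be illustrated on nested arrays in "nchw" layout. This is a sketch under stated assumptions: it uses the usual formula scale * (x - mean) / sqrt(variance + epsilon) + bias with the population variance, and it assumes a missing scale is treated as 1 and a missing bias as 0. The function name is hypothetical.

```javascript
// Plain-array sketch of instance normalization for input[n][c][h][w].
// Assumptions: population variance; scale defaults to 1 and bias to 0.
function instanceNormalizationNchw(input, { scale, bias, epsilon = 1e-5 } = {}) {
  return input.map((sample) =>
    sample.map((plane, c) => {
      // Statistics are computed over the spatial values of one channel
      // of one sample, not across the batch.
      const flat = plane.flat();
      const mean = flat.reduce((sum, v) => sum + v, 0) / flat.length;
      const variance =
        flat.reduce((sum, v) => sum + (v - mean) ** 2, 0) / flat.length;
      const denominator = Math.sqrt(variance + epsilon);
      const channelScale = scale ? scale[c] : 1;
      const channelBias = bias ? bias[c] : 0;
      return plane.map((row) =>
        row.map((v) => (channelScale * (v - mean)) / denominator + channelBias)
      );
    })
  );
}

// One sample, two channels, each of spatial size 1 x 2.
console.log(instanceNormalizationNchw([[[[1, 3]], [[2, 2]]]]));
```

The first channel normalizes to values close to -1 and 1; the second channel is constant, so every value maps to 0.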
8.9.30. layerNormalization
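Normalize the input using [Layer-Normalization]. Unlike batchNormalization() where the mean and variance values are computed across all the samples in the batch dimension while the model is trained, and unlike instanceNormalization() where the mean and variance values are computed on the fly for each input feature of each individual sample in the batch, the mean and variance values of the layer normalization are computed on the fly across all the input features of each individual sample in the batch.
dictionary MLLayerNormalizationOptions : MLOperatorOptions {
  MLOperand scale;
  MLOperand bias;
  sequence<[EnforceRange] unsigned long> axes;
  double epsilon = 1e-5;
};
partial interface MLGraphBuilder {
  MLOperand layerNormalization(MLOperand input,
                               optional MLLayerNormalizationOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLNormalizationSupportLimits layerNormalization;
};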
MLLayerNormalizationOptions has the following members:
scale, of type MLOperand
The N-D tensor of the scaling values whose shape is determined by the axes member in that each value in axes indicates the dimension of the input tensor with scaling values. For example, for an axes value of [1,2,3], the shape of this tensor is the list of the corresponding sizes of the input dimensions 1, 2 and 3. When this member is not present, the scaling value is assumed to be 1.
bias, of type MLOperand
The N-D tensor of the bias values whose shape is determined by the axes member in that each value in axes indicates the dimension of the input tensor with bias values. For example, for an axes value of [1,2,3], the shape of this tensor is the list of the corresponding sizes of the input dimensions 1, 2 and 3. When this member is not present, the bias value is assumed to be 0.
axes, of type sequence<[EnforceRange] unsigned long>
The indices to the input dimensions to reduce. When this member is not present, it is treated as if all dimensions except the first were given (e.g. for a 4-D input tensor, axes = [1,2,3]). That is, the reduction for the mean and variance values is calculated across all the input features for each independent batch. If empty, no dimensions are reduced.
epsilon, of type double, defaulting to 1e-5
A small value to prevent computational error due to divide-by-zero.
- input: an MLOperand. The input N-D tensor.
- options: an optional MLLayerNormalizationOptions. The optional parameters of the operation.
Returns: an MLOperand. The layer-normalized N-D tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| scale | same as input | N |
| bias | same as input | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for layerNormalization():
layerNormalization, of type MLNormalizationSupportLimits
Support limits for operator layerNormalization().
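The rules for the default axes and for the shape of scale and bias can be sketched as follows. Both helper names are hypothetical and operate on plain shape arrays rather than on MLOperand objects.

```javascript
// Hypothetical helper: resolves the axes used by layer normalization.
// When axes is not present, all dimensions except the first are used.
function layerNormAxes(inputShape, axes) {
  if (axes !== undefined) return axes;
  return inputShape.slice(1).map((_, i) => i + 1);
}

// Hypothetical helper: the expected shape of the scale and bias tensors,
// i.e. the sizes of the input dimensions selected by axes.
function layerNormScaleShape(inputShape, axes) {
  return layerNormAxes(inputShape, axes).map((axis) => inputShape[axis]);
}

console.log(layerNormAxes([2, 3, 4, 5]));            // [ 1, 2, 3 ]
console.log(layerNormScaleShape([2, 3, 4, 5]));      // [ 3, 4, 5 ]
console.log(layerNormScaleShape([2, 3, 4, 5], [3])); // [ 5 ]
```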
8.9.31. leakyRelu
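Calculate the leaky version of rectified linear function on the input tensor element-wise. The calculation follows the expression max(0, x) + alpha * min(0, x).
dictionary MLLeakyReluOptions : MLOperatorOptions {
  double alpha = 0.01;
};
partial interface MLGraphBuilder {
  MLOperand leakyRelu(MLOperand input, optional MLLeakyReluOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits leakyRelu;
};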
MLLeakyReluOptions has the following members:
alpha, of type double, defaulting to 0.01
A scalar multiplier.
- input: an MLOperand. The input tensor.
- options: an optional MLLeakyReluOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for leakyRelu():
leakyRelu, of type MLSingleInputSupportLimits
Support limits for operator leakyRelu().
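The expression max(0, x) + alpha * min(0, x) can be tried on a plain array. This sketch is illustrative and independent of the graph builder.

```javascript
// Plain-array sketch of max(0, x) + alpha * min(0, x). Not a WebNN API.
function leakyRelu(values, { alpha = 0.01 } = {}) {
  return values.map((x) => Math.max(0, x) + alpha * Math.min(0, x));
}

console.log(leakyRelu([-2, 0, 3]));                 // [ -0.02, 0, 3 ]
console.log(leakyRelu([-2, 0, 3], { alpha: 0.5 })); // [ -1, 0, 3 ]
```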
8.9.32. linear
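Calculate a linear function y = alpha * x + beta on the input tensor.
dictionary MLLinearOptions : MLOperatorOptions {
  double alpha = 1;
  double beta = 0;
};
partial interface MLGraphBuilder {
  MLOperand linear(MLOperand input, optional MLLinearOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits linear;
};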
MLLinearOptions has the following members:
alpha, of type double, defaulting to 1
A scalar multiplier.
beta, of type double, defaulting to 0
A scalar addition.
- input: an MLOperand. The input tensor.
- options: an optional MLLinearOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for linear():
linear, of type MLSingleInputSupportLimits
Support limits for operator linear().
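A plain-array sketch of y = alpha * x + beta shows that the default options leave the input unchanged. The function is illustrative only.

```javascript
// Plain-array sketch of y = alpha * x + beta. Not a WebNN API.
function linear(values, { alpha = 1, beta = 0 } = {}) {
  return values.map((x) => alpha * x + beta);
}

console.log(linear([1, 2]));                       // [ 1, 2 ]
console.log(linear([1, 2], { alpha: 2, beta: 1 })); // [ 3, 5 ]
```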
8.9.33. lstm
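Long Short-Term Memory [LSTM] recurrent network uses an input, output, forget, and cell gate to compute the output state that rolls into the output across the temporal sequence of the network.
enum MLLstmWeightLayout {
  "iofg", // input-output-forget-cell gate ordering
  "ifgo"  // input-forget-cell-output gate ordering
};
dictionary MLLstmOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLOperand initialHiddenState;
  MLOperand initialCellState;
  boolean returnSequence = false;
  MLRecurrentNetworkDirection direction = "forward";
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};
partial interface MLGraphBuilder {
  sequence<MLOperand> lstm(MLOperand input,
                           MLOperand weight,
                           MLOperand recurrentWeight,
                           [EnforceRange] unsigned long steps,
                           [EnforceRange] unsigned long hiddenSize,
                           optional MLLstmOptions options = {});
};
dictionary MLLstmSupportLimits {
  MLTensorLimits input;
  MLTensorLimits weight;
  MLTensorLimits recurrentWeight;
  MLTensorLimits bias;
  MLTensorLimits recurrentBias;
  MLTensorLimits peepholeWeight;
  MLTensorLimits initialHiddenState;
  MLTensorLimits initialCellState;
  MLTensorLimits output0;
  MLTensorLimits output1;
  MLTensorLimits output2;
};
partial dictionary MLOpSupportLimits {
  MLLstmSupportLimits lstm;
};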
MLLstmOptions has the following members:
bias, of type MLOperand
The 2-D input bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
recurrentBias, of type MLOperand
The 2-D recurrent bias tensor of shape [numDirections, 4 * hiddenSize]. The ordering of the bias vectors in the second dimension of the tensor shape is specified according to layout.
peepholeWeight, of type MLOperand
The 2-D weight tensor for peepholes of shape [numDirections, 3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.
initialHiddenState, of type MLOperand
The 3-D initial hidden state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zero.
initialCellState, of type MLOperand
The 3-D initial cell state tensor of shape [numDirections, batchSize, hiddenSize]. When not specified, implementations must use a tensor filled with zero.
returnSequence, of type boolean, defaulting to false
Indicates whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step.
direction, of type MLRecurrentNetworkDirection, defaulting to "forward"
The processing direction of the input sequence. When set to "both", the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
layout, of type MLLstmWeightLayout, defaulting to "iofg"
The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the second dimension of the weight and bias tensor shapes.
activations, of type sequence<MLRecurrentNetworkActivation>
A list of three activation functions: the first is used for the input (i), forget (f), and output (o) gate, the second is used for the cell (g) gate, and the last is used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, defaults to a sequence of the "sigmoid", "tanh", and "tanh" functions, respectively.
- input: an MLOperand. The input 3-D tensor of shape [steps, batchSize, inputSize].
- weight: an MLOperand. The 3-D input weight tensor of shape [numDirections, 4 * hiddenSize, inputSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to layout.
- recurrentWeight: an MLOperand. The 3-D recurrent weight tensor of shape [numDirections, 4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to layout.
- steps: an unsigned long scalar. The number of time steps in the recurrent network. The value must be greater than 0.
- hiddenSize: an unsigned long scalar. The value of the third dimension of the cell output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLLstmOptions. The optional parameters of the operation.
Returns: sequence<MLOperand>. The first element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output hidden state from the last time step of the network. The second element is a 3-D tensor of shape [numDirections, batchSize, hiddenSize], the output cell state from the last time step of the network. Additionally, if returnSequence is set to true, the third element is the 4-D output tensor of shape [steps, numDirections, batchSize, hiddenSize] containing every output from each time step in the temporal sequence.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 3 |
| weight | same as input | 3 |
| recurrentWeight | same as input | 3 |
| bias | same as input | 2 |
| recurrentBias | same as input | 2 |
| peepholeWeight | same as input | 2 |
| initialHiddenState | same as input | 3 |
| initialCellState | same as input | 3 |
| outputs[0] | same as input | 3 |
| outputs[1] | same as input | 3 |
| outputs[2] if returnSequence is true | same as input | 4 |
MLLstmSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
weight, of type MLTensorLimits
MLTensorLimits for weight operand.
recurrentWeight, of type MLTensorLimits
MLTensorLimits for recurrentWeight operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
recurrentBias, of type MLTensorLimits
MLTensorLimits for recurrentBias operand.
peepholeWeight, of type MLTensorLimits
MLTensorLimits for peepholeWeight operand.
initialHiddenState, of type MLTensorLimits
MLTensorLimits for initialHiddenState operand.
initialCellState, of type MLTensorLimits
MLTensorLimits for initialCellState operand.
output0, of type MLTensorLimits
MLTensorLimits for output operand outputs[0].
output1, of type MLTensorLimits
MLTensorLimits for output operand outputs[1].
output2, of type MLTensorLimits
MLTensorLimits for output operand outputs[2].
MLOpSupportLimits has the following member for lstm():
lstm, of type MLLstmSupportLimits
Support limits for operator lstm().
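The effect of layout on the packed 4 * hiddenSize dimension can be sketched as a mapping from gate letters to index ranges. This assumes each gate occupies one contiguous block of hiddenSize entries in the order the layout string spells out; lstmGateRanges is a hypothetical helper, not part of the API.

```javascript
// Hypothetical helper: maps each gate letter of an LSTM weight layout to the
// half-open index range [start, end) it occupies along the 4 * hiddenSize
// dimension. Assumes one contiguous block of hiddenSize entries per gate.
function lstmGateRanges(layout, hiddenSize) {
  const ranges = {};
  [...layout].forEach((gate, index) => {
    ranges[gate] = [index * hiddenSize, (index + 1) * hiddenSize];
  });
  return ranges;
}

console.log(lstmGateRanges("iofg", 4)); // input, output, forget, cell
console.log(lstmGateRanges("ifgo", 4)); // input, forget, cell, output
```

With hiddenSize 4, the forget gate occupies indices 8 to 11 under the input-output-forget-cell ordering and indices 4 to 7 under the input-forget-cell-output ordering.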
8.9.34. lstmCell
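A single time step of the Long Short-Term Memory [LSTM] recurrent network using a cell state, an input, output, and forget gate to compute the cell state and the hidden state of the next time step that rolls into the output across the temporal sequence of the network.
dictionary MLLstmCellOptions : MLOperatorOptions {
  MLOperand bias;
  MLOperand recurrentBias;
  MLOperand peepholeWeight;
  MLLstmWeightLayout layout = "iofg";
  sequence<MLRecurrentNetworkActivation> activations;
};
partial interface MLGraphBuilder {
  sequence<MLOperand> lstmCell(MLOperand input,
                               MLOperand weight,
                               MLOperand recurrentWeight,
                               MLOperand hiddenState,
                               MLOperand cellState,
                               [EnforceRange] unsigned long hiddenSize,
                               optional MLLstmCellOptions options = {});
};
dictionary MLLstmCellSupportLimits {
  MLTensorLimits input;
  MLTensorLimits weight;
  MLTensorLimits recurrentWeight;
  MLTensorLimits hiddenState;
  MLTensorLimits cellState;
  MLTensorLimits bias;
  MLTensorLimits recurrentBias;
  MLTensorLimits peepholeWeight;
  MLTensorLimits output0;
  MLTensorLimits output1;
};
partial dictionary MLOpSupportLimits {
  MLLstmCellSupportLimits lstmCell;
};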
MLLstmCellOptions has the following members:
bias, of type MLOperand
The 1-D input bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to layout.
recurrentBias, of type MLOperand
The 1-D recurrent bias tensor of shape [4 * hiddenSize]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to layout.
peepholeWeight, of type MLOperand
The 1-D weight tensor for peepholes of shape [3 * hiddenSize]. The pack ordering of the weight vectors is for the input (i), output (o), and forget (f) gate, respectively.
layout, of type MLLstmWeightLayout, defaulting to "iofg"
The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the input (i), output (o), forget (f), and cell (g) gate, as indicated in the first dimension of the weight and bias tensor shapes.
activations, of type sequence<MLRecurrentNetworkActivation>
A list of three activation functions: the first is used for the input (i), forget (f), and output (o) gate, the second is used for the cell (g) gate, and the last is used for filtering the output cell state before combining it with the result of the output gate to form the output hidden state. When not specified, defaults to a sequence of the "sigmoid", "tanh", and "tanh" functions, respectively.
- input: an MLOperand. The input 2-D tensor of shape [batchSize, inputSize].
- weight: an MLOperand. The 2-D input weight tensor of shape [4 * hiddenSize, inputSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to layout.
- recurrentWeight: an MLOperand. The 2-D recurrent weight tensor of shape [4 * hiddenSize, hiddenSize]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to layout.
- hiddenState: an MLOperand. The 2-D input hidden state tensor of shape [batchSize, hiddenSize].
- cellState: an MLOperand. The 2-D input cell state tensor of shape [batchSize, hiddenSize].
- hiddenSize: an unsigned long scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
- options: an optional MLLstmCellOptions. The optional parameters of the operation.
Returns: sequence<MLOperand>. The first element is the output hidden state of the current time step of the recurrent network. The following element is the output cell state. Both elements are 2-D tensors of shape [batchSize, hiddenSize].
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 2 |
| weight | same as input | 2 |
| recurrentWeight | same as input | 2 |
| hiddenState | same as input | 2 |
| cellState | same as input | 2 |
| bias | same as input | 1 |
| recurrentBias | same as input | 1 |
| peepholeWeight | same as input | 1 |
| outputs[0] | same as input | 2 |
| outputs[1] | same as input | 2 |
MLLstmCellSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
weight, of type MLTensorLimits
MLTensorLimits for weight operand.
recurrentWeight, of type MLTensorLimits
MLTensorLimits for recurrentWeight operand.
hiddenState, of type MLTensorLimits
MLTensorLimits for hiddenState operand.
cellState, of type MLTensorLimits
MLTensorLimits for cellState operand.
bias, of type MLTensorLimits
MLTensorLimits for bias operand.
recurrentBias, of type MLTensorLimits
MLTensorLimits for recurrentBias operand.
peepholeWeight, of type MLTensorLimits
MLTensorLimits for peepholeWeight operand.
output0, of type MLTensorLimits
MLTensorLimits for output operand outputs[0].
output1, of type MLTensorLimits
MLTensorLimits for output operand outputs[1].
MLOpSupportLimits has the following member for lstmCell():
lstmCell, of type MLLstmCellSupportLimits
Support limits for operator lstmCell().
8.9.35. matmul
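Compute the matrix product of two input tensors.
partial interface MLGraphBuilder {
  MLOperand matmul(MLOperand a, MLOperand b, optional MLOperatorOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLBinarySupportLimits matmul;
};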
- a: an MLOperand. The first input tensor which is at least 2-D.
- b: an MLOperand. The second input tensor which is at least 2-D.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor that contains the matrix product of two input tensors.
- If both a and b are 2-dimensional, they are multiplied like conventional matrices and produce a 2-dimensional tensor as the output.
- If either a or b is N-dimensional where N > 2, it is treated as a stack of matrices with dimensions corresponding to the last two indices. The matrix multiplication will be broadcast according to [numpy-broadcasting-rule]. The shapes of a and b, except the last two dimensions, must be bidirectionally broadcastable. The output is an N-dimensional tensor whose rank is the maximum rank of the input tensors. For each dimension, except the last two, of the output tensor, its size is the maximum size along that dimension of the input tensors.
| operand | allowed data types | allowed ranks |
|---|---|---|
| a | "float32", "float16" | 2 to N |
| b | same as a | 2 to N |
| output | same as a | 2 to N |
MLOpSupportLimits has the following member for matmul():
matmul, of type MLBinarySupportLimits
Support limits for operator matmul().
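The broadcasting rule for the leading dimensions can be sketched as a shape computation on plain arrays. The helper matmulOutputShape is hypothetical; it also checks that the inner dimensions agree, which is the usual matrix product requirement and is an assumption here since it is not spelled out above.

```javascript
// Hypothetical helper: computes the output shape of a matrix product with
// bidirectional broadcasting over all but the last two dimensions.
function matmulOutputShape(shapeA, shapeB) {
  if (shapeA.length < 2 || shapeB.length < 2) {
    throw new TypeError("both inputs must be at least 2-D");
  }
  const [rows, innerA] = shapeA.slice(-2);
  const [innerB, columns] = shapeB.slice(-2);
  if (innerA !== innerB) {
    throw new TypeError("inner dimensions must match");
  }
  const batchA = shapeA.slice(0, -2);
  const batchB = shapeB.slice(0, -2);
  const batchRank = Math.max(batchA.length, batchB.length);
  const batch = [];
  for (let i = 0; i < batchRank; i++) {
    // Align the leading dimensions from the right; a missing dimension
    // behaves like size 1.
    const sizeA = batchA[batchA.length - batchRank + i] ?? 1;
    const sizeB = batchB[batchB.length - batchRank + i] ?? 1;
    if (sizeA !== sizeB && sizeA !== 1 && sizeB !== 1) {
      throw new TypeError("leading dimensions are not broadcastable");
    }
    batch.push(Math.max(sizeA, sizeB));
  }
  return [...batch, rows, columns];
}

console.log(matmulOutputShape([2, 3], [3, 4]));          // [ 2, 4 ]
console.log(matmulOutputShape([5, 1, 2, 3], [4, 3, 6])); // [ 5, 4, 2, 6 ]
```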
8.9.36. pad
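Inflate the tensor with constant or mirrored values on the edges.
enum MLPaddingMode {
  "constant",
  "edge",
  "reflection"
};
dictionary MLPadOptions : MLOperatorOptions {
  MLPaddingMode mode = "constant";
  MLNumber value = 0;
};
partial interface MLGraphBuilder {
  MLOperand pad(MLOperand input,
                sequence<[EnforceRange] unsigned long> beginningPadding,
                sequence<[EnforceRange] unsigned long> endingPadding,
                optional MLPadOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits pad;
};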
MLPadOptions has the following members:
mode, of type MLPaddingMode, defaulting to "constant"
The different ways to pad the tensor.
value, of type MLNumber, defaulting to 0
The padding value when mode is set to "constant".
- input: an MLOperand. The input tensor.
- beginningPadding: sequence<unsigned long>. The number of padding values to add at the beginning of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, beginningPadding[d] indicates how many values to add before the content in that dimension.
- endingPadding: sequence<unsigned long>. The number of padding values to add at the ending of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, endingPadding[d] indicates how many values to add after the content in that dimension.
- options: an optional MLPadOptions. The optional parameters of the operation.
Returns: an MLOperand. The padded output tensor. Each dimension of the output tensor can be calculated as follows:
output size = beginning padding + input size + ending padding
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for pad():
pad, of type MLSingleInputSupportLimits
Support limits for operator pad().
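The output size rule, output size = beginning padding + input size + ending padding, can be sketched per dimension, together with a 1-D example of "constant" mode. Both helpers are hypothetical and work on plain arrays.

```javascript
// Hypothetical helper: output shape of pad(), one entry per input dimension.
function padOutputShape(inputShape, beginningPadding, endingPadding) {
  if (
    beginningPadding.length !== inputShape.length ||
    endingPadding.length !== inputShape.length
  ) {
    throw new TypeError("padding lengths must equal the input rank");
  }
  return inputShape.map(
    (size, d) => beginningPadding[d] + size + endingPadding[d]
  );
}

// Hypothetical helper: "constant" mode padding of a 1-D array.
function padConstant1d(values, beginning, ending, value = 0) {
  return [
    ...new Array(beginning).fill(value),
    ...values,
    ...new Array(ending).fill(value),
  ];
}

console.log(padOutputShape([2, 3], [1, 2], [1, 2])); // [ 4, 7 ]
console.log(padConstant1d([1, 2], 1, 2));            // [ 0, 1, 2, 0, 0 ]
```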
8.9.37. Pooling operations
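Compute a pooling operation across all the elements within the moving window over the input tensor.
enum MLRoundingType {
  "floor",
  "ceil"
};
dictionary MLPool2dOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> windowDimensions;
  sequence<[EnforceRange] unsigned long> padding;
  sequence<[EnforceRange] unsigned long> strides;
  sequence<[EnforceRange] unsigned long> dilations;
  MLInputOperandLayout layout = "nchw";
  MLRoundingType outputShapeRounding = "floor";
  sequence<[EnforceRange] unsigned long> outputSizes;
};
partial interface MLGraphBuilder {
  MLOperand averagePool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand l2Pool2d(MLOperand input, optional MLPool2dOptions options = {});
  MLOperand maxPool2d(MLOperand input, optional MLPool2dOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits averagePool2d;
  MLSingleInputSupportLimits l2Pool2d;
  MLSingleInputSupportLimits maxPool2d;
};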
MLPool2dOptions has the following members:
windowDimensions, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [windowHeight, windowWidth]. Specifies the dimensions of the sliding window. The default value for the window dimensions is the height and width dimensions of the input shape.
padding, of type sequence<[EnforceRange] unsigned long>
A list of length 4: [beginningHeight, endingHeight, beginningWidth, endingWidth]. Specifies the additional rows and columns added to the beginning and ending of each spatial dimension of the input. The default value is [0,0,0,0].
strides, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [strideHeight, strideWidth]. Specifies the stride of the sliding window for each spatial dimension of the input. The default value is [1,1].
dilations, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [dilationHeight, dilationWidth]. Specifies the dilation factor for each spatial dimension applied to the sliding window. The default value is [1,1].
layout, of type MLInputOperandLayout, defaulting to "nchw"
Specifies the layout format of the input and output tensor as follows:
"nchw": [batches, channels, height, width]
"nhwc": [batches, height, width, channels]
outputShapeRounding, of type MLRoundingType, defaulting to "floor"
The rounding function used to compute the output shape, depending on whether full or partial window results are desired.
outputSizes, of type sequence<[EnforceRange] unsigned long>
A list of length 2: [outputHeight, outputWidth]. Specifies the sizes of the two spatial dimensions of the output tensor. When the output sizes are explicitly specified, the outputShapeRounding is ignored. If not specified, the output sizes are automatically computed.
- input: an MLOperand. The input 4-D tensor. The logical shape is interpreted according to the value of layout.
- options: an optional MLPool2dOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 4-D tensor that contains the result of the reduction. The logical shape is interpreted according to the value of layout. More specifically, if the outputShapeRounding is "floor", the size of each spatial dimension of the output tensor can be calculated as follows:
output size = floor(1 + (input size - filter size + beginning padding + ending padding) / stride)
or if outputShapeRounding is "ceil":
output size = ceil(1 + (input size - filter size + beginning padding + ending padding) / stride)
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | specified as part of operation steps | 4 |
| output | same as input | 4 |
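The two output size formulas above differ only in the rounding function, which matters when the window does not tile the padded input exactly. The sketch below evaluates them for one spatial dimension; pool2dOutputSize is a hypothetical helper, and its windowSize parameter corresponds to "filter size" in the formulas.

```javascript
// Hypothetical helper: evaluates the output size formula for one spatial
// dimension, with "floor" or "ceil" rounding.
function pool2dOutputSize(
  inputSize,
  windowSize,
  beginningPadding,
  endingPadding,
  stride,
  rounding = "floor"
) {
  const unrounded =
    1 + (inputSize - windowSize + beginningPadding + endingPadding) / stride;
  return rounding === "ceil" ? Math.ceil(unrounded) : Math.floor(unrounded);
}

console.log(pool2dOutputSize(7, 3, 0, 0, 2));         // 3
console.log(pool2dOutputSize(8, 3, 0, 0, 2));         // 3 (partial window dropped)
console.log(pool2dOutputSize(8, 3, 0, 0, 2, "ceil")); // 4 (partial window kept)
```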
MLOpSupportLimits has the following members for pooling operations:
averagePool2d, of type MLSingleInputSupportLimits
Support limits for operator averagePool2d().
l2Pool2d, of type MLSingleInputSupportLimits
Support limits for operator l2Pool2d().
maxPool2d, of type MLSingleInputSupportLimits
Support limits for operator maxPool2d().
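For example, builder.maxPool2d(input), called without windowDimensions, uses a window covering the full height and width of input and therefore performs global max pooling.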
8.9.37.1. averagePool2d
Calculate the average value for patches of a feature map, and use it to create a pooled feature map. See § 8.9.37 Pooling operations for more detail.
8.9.37.2. l2Pool2d
Apply the L2 norm function to a region of the input feature map. The L2 norm is the square root of the sum of the squares of its elements. See § 8.9.37 Pooling operations for more detail.
8.9.37.3. maxPool2d
Calculate the maximum value for patches of a feature map, and use it to create a pooled feature map. See § 8.9.37 Pooling operations for more detail.
8.9.38. prelu
Calculate the parametric version of rectified linear function (Parametric ReLU) on the input tensor element-wise. Parametric ReLU is a type of leaky ReLU that, instead of using a fixed scalar slope such as 0.01, makes the slope (coefficient of leakage) a parameter that is learned during model training. The calculation follows the expression max(0, x) + slope * min(0, x).
The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
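partial interface MLGraphBuilder {
  MLOperand prelu(MLOperand input,
                  MLOperand slope,
                  optional MLOperatorOptions options = {});
};
dictionary MLPreluSupportLimits {
  MLTensorLimits input;
  MLTensorLimits slope;
  MLTensorLimits output;
};
partial dictionary MLOpSupportLimits {
  MLPreluSupportLimits prelu;
};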
- input: an MLOperand. The input tensor.
- slope: an MLOperand. The slope tensor. Its shape must be bidirectionally broadcastable to the shape of input.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
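Returns: an MLOperand. The output tensor, whose shape results from broadcasting input and slope as described above.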
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16", "int64", "int32", "int8" | N |
| slope | same as input | N |
| output | same as input | N |
MLPreluSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
slope, of type MLTensorLimits
MLTensorLimits for slope operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for prelu():
prelu, of type MLPreluSupportLimits
Support limits for operator prelu().
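A common use of prelu() is a per-column (or per-channel) slope that is broadcast across the other dimension. The sketch below covers only that case, a 2-D input of shape [rows, columns] with a 1-D slope of length columns, and does not implement general broadcasting. The function name prelu2d is hypothetical.

```javascript
// Plain-array sketch of max(0, x) + slope * min(0, x) for a 2-D input with a
// 1-D slope broadcast along the rows. Not a WebNN API.
function prelu2d(input, slope) {
  return input.map((row) =>
    row.map((x, column) => Math.max(0, x) + slope[column] * Math.min(0, x))
  );
}

console.log(prelu2d([[-1, 2], [-3, -4]], [0.5, 0.25]));
// [ [ -0.5, 2 ], [ -1.5, -1 ] ]
```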
8.9.39. Reduction operations
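Reduce the input tensor along all dimensions, or along the axes specified in the axes array parameter. For each specified axis, the dimension with that index is reduced, i.e. the resulting tensor will not contain it, unless keepDimensions is specified. The values of the resulting tensor are calculated using the specified reduction function that takes as parameters all the input values across the reduced dimensions.
dictionary MLReduceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
  boolean keepDimensions = false;
};
partial interface MLGraphBuilder {
  MLOperand reduceL1(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceL2(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceLogSumExp(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMax(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMean(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceMin(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceProduct(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSum(MLOperand input, optional MLReduceOptions options = {});
  MLOperand reduceSumSquare(MLOperand input, optional MLReduceOptions options = {});
};
partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reduceL1;
  MLSingleInputSupportLimits reduceL2;
  MLSingleInputSupportLimits reduceLogSum;
  MLSingleInputSupportLimits reduceLogSumExp;
  MLSingleInputSupportLimits reduceMax;
  MLSingleInputSupportLimits reduceMean;
  MLSingleInputSupportLimits reduceMin;
  MLSingleInputSupportLimits reduceProduct;
  MLSingleInputSupportLimits reduceSum;
  MLSingleInputSupportLimits reduceSumSquare;
};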
MLReduceOptions has the following members:
axes, of type sequence<[EnforceRange] unsigned long>
The dimensions to reduce, which also specifies which of the values in the input tensor are used with the reduction function. The axes in the list must be in the range [0, N-1] where N is the rank of the input tensor.
If not present, all dimensions are reduced. The input values for the reduction function are all of the values in the input tensor.
If present and not empty, the input values for the reduction function are all the values for the specified dimensions of the input tensor.
If present and empty, no dimensions are reduced, and the shape of the output tensor is the same as the shape of the input tensor; the reduction function is applied to each value in the tensor individually.
keepDimensions, of type boolean, defaulting to false
If true, the output has the same rank as the input, setting any reduced dimensions to size 1.
- input: an MLOperand. The input tensor.
- options: an optional MLReduceOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank in the range 0 to input’s rank, inclusive, depending on axes and keepDimensions. If the input operand is a scalar, the reduction function is applied to the scalar value, and the output is also a scalar.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | specified as part of operation steps | N |
| output | same as input | N |
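How axes and keepDimensions determine the output shape can be sketched on plain shape arrays. The helper reduceOutputShape is hypothetical and only models the shape rules described above, not the reduction itself.

```javascript
// Hypothetical helper: output shape of a reduction, given the input shape
// and the axes / keepDimensions options described above.
function reduceOutputShape(inputShape, { axes, keepDimensions = false } = {}) {
  const rank = inputShape.length;
  // When axes is not present, every dimension is reduced.
  const reduced = axes === undefined ? inputShape.map((_, i) => i) : axes;
  for (const axis of reduced) {
    if (!Number.isInteger(axis) || axis < 0 || axis >= rank) {
      throw new RangeError("axis " + axis + " is outside [0, " + (rank - 1) + "]");
    }
  }
  if (keepDimensions) {
    // Reduced dimensions stay in place with size 1.
    return inputShape.map((size, i) => (reduced.includes(i) ? 1 : size));
  }
  // Reduced dimensions are removed from the shape.
  return inputShape.filter((_, i) => !reduced.includes(i));
}

console.log(reduceOutputShape([2, 3, 4], { axes: [1], keepDimensions: true })); // [ 2, 1, 4 ]
console.log(reduceOutputShape([2, 3, 4], { axes: [1] }));                       // [ 2, 4 ]
console.log(reduceOutputShape([2, 3, 4]));                                      // []
console.log(reduceOutputShape([2, 3, 4], { axes: [] }));                        // [ 2, 3, 4 ]
```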
MLOpSupportLimits has the following members for reduction operations:
reduceL1, of type MLSingleInputSupportLimits
Support limits for operator reduceL1().
reduceL2, of type MLSingleInputSupportLimits
Support limits for operator reduceL2().
reduceLogSum, of type MLSingleInputSupportLimits
Support limits for operator reduceLogSum().
reduceLogSumExp, of type MLSingleInputSupportLimits
Support limits for operator reduceLogSumExp().
reduceMax, of type MLSingleInputSupportLimits
Support limits for operator reduceMax().
reduceMean, of type MLSingleInputSupportLimits
Support limits for operator reduceMean().
reduceMin, of type MLSingleInputSupportLimits
Support limits for operator reduceMin().
reduceProduct, of type MLSingleInputSupportLimits
Support limits for operator reduceProduct().
reduceSum, of type MLSingleInputSupportLimits
Support limits for operator reduceSum().
reduceSumSquare, of type MLSingleInputSupportLimits
Support limits for operator reduceSumSquare().
- L1: Compute the L1 norm, the sum of the absolute values of the input values.
- L2: Compute the L2 norm, the square root of the sum of the squares of the input values.
- LogSum: Compute the log value of the sum of the input values.
- LogSumExp: Compute the log value of the sum of the exponentials of the input values.
- Max: Compute the maximum value of the input values.
- Mean: Compute the average value of the input values.
- Min: Compute the minimum value of the input values.
- Product: Compute the product of the input values.
- Sum: Compute the sum of the input values.
- SumSquare: Compute the sum of the squares of the input values.
Some underlying platforms may not support keepDimensions directly. This does not affect the underlying tensor data, only the shape. For example, if the input shape is [2, 3, 4], the axis is 1, and keepDimensions is true, the expected output shape is [2, 1, 4]. If the underlying platform never keeps reduced dimensions, it will produce an output shape of [2, 4]. The implementation can introduce a no-op reshape to [2, 1, 4]. A similar no-op reshape can be introduced if keepDimensions is false but the underlying platform always keeps reduced dimensions.
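The shape rules and reduction functions for this family of operations can be sketched in plain JavaScript. This is an illustrative sketch only, not part of the specification: reducedShape and reductionFunctions are hypothetical names, and the values being reduced are modelled as a flat array of numbers rather than an MLOperand.

```javascript
// Hypothetical helper (not a WebNN API): shape of a reduction's output.
function reducedShape(inputShape, axes, keepDimensions = false) {
  // An absent `axes` reduces every dimension; an empty list reduces none.
  const reduced = new Set(axes === undefined ? inputShape.keys() : axes);
  const outputShape = [];
  inputShape.forEach((size, index) => {
    if (!reduced.has(index)) {
      outputShape.push(size); // untouched dimension
    } else if (keepDimensions) {
      outputShape.push(1); // reduced dimension kept with size 1
    } // otherwise the reduced dimension is dropped
  });
  return outputShape;
}

// Hypothetical table of the reduction functions, each applied to the flat
// list of input values that fall into one reduced group.
const reductionFunctions = {
  l1: (v) => v.reduce((acc, x) => acc + Math.abs(x), 0),
  l2: (v) => Math.sqrt(v.reduce((acc, x) => acc + x * x, 0)),
  logSum: (v) => Math.log(v.reduce((acc, x) => acc + x, 0)),
  logSumExp: (v) => Math.log(v.reduce((acc, x) => acc + Math.exp(x), 0)),
  max: (v) => Math.max(...v),
  mean: (v) => v.reduce((acc, x) => acc + x, 0) / v.length,
  min: (v) => Math.min(...v),
  product: (v) => v.reduce((acc, x) => acc * x, 1),
  sum: (v) => v.reduce((acc, x) => acc + x, 0),
  sumSquare: (v) => v.reduce((acc, x) => acc + x * x, 0),
};
```

With these helpers, reducedShape([2, 3, 4], [1], true) gives [2, 1, 4], matching the example in the note, and reducedShape([2, 3, 4], [1]) gives [2, 4].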
8.9.40. relu
Compute the rectified linear function of the input tensor.
partial interface MLGraphBuilder {
  MLOperand relu(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits relu;
};
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16", "int64", "int32", "int8" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for relu():
relu, of type MLSingleInputSupportLimits
Support limits for operator relu().
8.9.41. resample2d
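Resample the tensor values from the source to the destination dimensions according to the axes and scaling factors.
enum MLInterpolationMode {
  "nearest-neighbor",
  "linear"
};

dictionary MLResample2dOptions : MLOperatorOptions {
  MLInterpolationMode mode = "nearest-neighbor";
  sequence<float> scales;
  sequence<[EnforceRange] unsigned long> sizes;
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand resample2d(MLOperand input, optional MLResample2dOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits resample2d;
};
- input: an MLOperand. The input 4-D tensor.
- options: an optional MLResample2dOptions. The optional parameters of the operation.
Returns: an MLOperand. The output 4-D tensor.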
MLResample2dOptions has the following members:
mode, of type MLInterpolationMode, defaulting to "nearest-neighbor"
The interpolation algorithm used to fill the output tensor values.
Both algorithms start with these inputs, computed for each spatial axis (based on axes), where inputSize is given by the input tensor’s shape, outputSize is given by sizes or scales, and outputCoordinate identifies the element in the output tensor being computed.
    scale = outputSize / inputSize
    unclampedCoordinate = (outputCoordinate + 0.5) / scale - 0.5
    inputCoordinate = clamp(unclampedCoordinate, 0, inputSize - 1)
For a given outputCoordinate.x and outputCoordinate.y location in the output tensor, the above equations give a rational inputCoordinate.x and inputCoordinate.y.
nearest-neighbor
The inputCoordinate.x and inputCoordinate.y computed above are used as inputs to a nearest-neighbor sampling algorithm to compute the output tensor value as follows:
    x = ceil(inputCoordinate.x - 0.5)
    y = ceil(inputCoordinate.y - 0.5)
    output tensor value = input tensor value at (x, y)
linear
The inputCoordinate.x and inputCoordinate.y computed above are used as inputs to a bilinear sampling algorithm to compute the output tensor value as follows:
    x0 = floor(inputCoordinate.x)
    x1 = ceil(inputCoordinate.x)
    y0 = floor(inputCoordinate.y)
    y1 = ceil(inputCoordinate.y)
    vx0y0 = input tensor value at (x0, y0)
    vx1y0 = input tensor value at (x1, y0)
    vx0y1 = input tensor value at (x0, y1)
    vx1y1 = input tensor value at (x1, y1)
    tx = inputCoordinate.x - x0
    ty = inputCoordinate.y - y0
    vy0 = vx0y0 * (1 - tx) + vx1y0 * tx
    vy1 = vx0y1 * (1 - tx) + vx1y1 * tx
    output tensor value = vy0 * (1 - ty) + vy1 * ty
scales, of type sequence<float>
A list of length 2. Specifies the scaling factor for each input dimension from axes: [scaleForFirstAxis, scaleForSecondAxis]. The default value is [1.0, 1.0].
sizes, of type sequence<[EnforceRange] unsigned long>
A list of length 2. Specifies the target sizes for each input dimension from axes: [sizeForFirstAxis, sizeForSecondAxis]. When sizes is specified, scales is ignored, since the scaling factor values are derived from the target sizes of the input.
axes, of type sequence<[EnforceRange] unsigned long>
A list of length 2. Specifies the two dimensions of the input tensor to which the interpolation algorithm applies. The default value is [2, 3].
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16", "uint8", "int8" | 4 |
| output | same as input | 4 |
MLOpSupportLimits has the following member for resample2d():
resample2d, of type MLSingleInputSupportLimits
Support limits for operator resample2d().
Consider linear resampling from the following [4, 4] input tensor (considering only spatial dimensions):
    [  0  1  2  3 ]
    [  0  1  2  3 ]
    [ 12 13 14 15 ]
    [ 12 13 14 15 ]
For an [8, 8] output tensor, the expected values are:
    [  0   0.25   0.75   1.25   1.75   2.25   2.75   3 ]
    [  0   0.25   0.75   1.25   1.75   2.25   2.75   3 ]
    [  0   0.25   0.75   1.25   1.75   2.25   2.75   3 ]
    [  3   3.25   3.75   4.25   4.75   5.25   5.75   6 ]
    [  9   9.25   9.75  10.25  10.75  11.25  11.75  12 ]
    [ 12  12.25  12.75  13.25  13.75  14.25  14.75  15 ]
    [ 12  12.25  12.75  13.25  13.75  14.25  14.75  15 ]
    [ 12  12.25  12.75  13.25  13.75  14.25  14.75  15 ]
This has the convenient properties that the sampling is evenly distributed, symmetric, robust to image mirroring, and the corner values are aligned.
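The expected values above can be reproduced by applying the coordinate mapping and the bilinear equations from this section directly. The following is a minimal sketch in plain JavaScript, assuming a 2-D array holding only the two spatial dimensions; inputCoordinate and resample2dLinear are hypothetical names, not WebNN APIs.

```javascript
// Map an output coordinate to a clamped, possibly fractional input coordinate.
function inputCoordinate(outputCoordinate, inputSize, outputSize) {
  const scale = outputSize / inputSize;
  const unclamped = (outputCoordinate + 0.5) / scale - 0.5;
  return Math.min(Math.max(unclamped, 0), inputSize - 1);
}

// Bilinear ("linear" mode) resampling of a 2-D array of numbers.
function resample2dLinear(input, outputHeight, outputWidth) {
  const inputHeight = input.length;
  const inputWidth = input[0].length;
  const output = [];
  for (let oy = 0; oy < outputHeight; oy++) {
    const y = inputCoordinate(oy, inputHeight, outputHeight);
    const y0 = Math.floor(y);
    const y1 = Math.ceil(y);
    const ty = y - y0;
    const row = [];
    for (let ox = 0; ox < outputWidth; ox++) {
      const x = inputCoordinate(ox, inputWidth, outputWidth);
      const x0 = Math.floor(x);
      const x1 = Math.ceil(x);
      const tx = x - x0;
      // Interpolate along x on the two neighbouring rows, then along y.
      const vy0 = input[y0][x0] * (1 - tx) + input[y0][x1] * tx;
      const vy1 = input[y1][x0] * (1 - tx) + input[y1][x1] * tx;
      row.push(vy0 * (1 - ty) + vy1 * ty);
    }
    output.push(row);
  }
  return output;
}

const resampled = resample2dLinear(
  [
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [12, 13, 14, 15],
    [12, 13, 14, 15],
  ],
  8,
  8
);
```

Here resampled[0] is [0, 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3] and resampled[3] is [3, 3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6], in agreement with the table.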
8.9.42. reshape
Alter the shape of a tensor to a new shape. Reshape does not copy or change the content of the tensor. It just changes the tensor’s logical shape for the subsequent operations.
partial interface MLGraphBuilder {
  MLOperand reshape(MLOperand input,
                    sequence<[EnforceRange] unsigned long> newShape,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reshape;
};
- input: an MLOperand. The input tensor.
- newShape: sequence<unsigned long>. The shape of the output tensor. The number of elements implied by newShape must be the same as the number of elements in the input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor. The values of the output tensor are the same as values of the input tensor. The shape of the output tensor is specified by newShape.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | N |
MLOpSupportLimits has the following member for reshape():
reshape, of type MLSingleInputSupportLimits
Support limits for operator reshape().
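The element-count constraint on newShape, and the fact that reshaping leaves the data untouched, can be illustrated with a short sketch. It assumes a flat array in row-major order, which is an assumption of this example rather than something stated here; reshapedShape and valueAt are hypothetical helper names.

```javascript
// Number of elements implied by a shape (1 for a scalar, i.e. []).
function elementCount(shape) {
  return shape.reduce((count, size) => count * size, 1);
}

// Validate a reshape request and return the resulting shape.
function reshapedShape(inputShape, newShape) {
  if (elementCount(inputShape) !== elementCount(newShape)) {
    throw new TypeError("newShape must describe the same number of elements");
  }
  return [...newShape];
}

// Read one element of a flat, row-major buffer through a logical shape.
function valueAt(flatData, shape, coordinate) {
  let offset = 0;
  for (let d = 0; d < shape.length; d++) {
    offset = offset * shape[d] + coordinate[d];
  }
  return flatData[offset];
}

// The same six values viewed as [2, 3] and then as [3, 2].
const flatData = [0, 1, 2, 3, 4, 5];
const viewA = valueAt(flatData, reshapedShape([6], [2, 3]), [1, 0]);
const viewB = valueAt(flatData, reshapedShape([2, 3], [3, 2]), [1, 1]);
```

Both viewA and viewB read the value 3 from the same unchanged buffer; only the logical shape used to address it differs.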
8.9.43. reverse
Reverse a tensor along the given axes.
dictionary MLReverseOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> axes;
};

partial interface MLGraphBuilder {
  MLOperand reverse(MLOperand input, optional MLReverseOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits reverse;
};
MLReverseOptions has the following members:
axes, of type sequence<[EnforceRange] unsigned long>
The indices to the input dimensions to reverse. When this member is not present, it is treated as if all dimensions are reversed. If explicitly passed as empty, no dimensions are reversed.
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for reverse():
reverse, of type MLSingleInputSupportLimits
Support limits for operator reverse().
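The three cases for axes (absent, non-empty, empty) can be sketched on nested JavaScript arrays. This is an illustration only; reverseTensor is a hypothetical name and nested arrays stand in for an MLOperand.

```javascript
// Reverse a nested-array tensor along the given axes.
// `axes` undefined means every dimension is reversed; [] means none.
function reverseTensor(tensor, axes, axis = 0) {
  if (!Array.isArray(tensor)) {
    return tensor; // reached a scalar element
  }
  // Recurse first so inner dimensions are handled, producing a fresh array.
  const items = tensor.map((item) => reverseTensor(item, axes, axis + 1));
  const reverseThisAxis = axes === undefined || axes.includes(axis);
  return reverseThisAxis ? items.reverse() : items;
}

const matrix = [
  [1, 2, 3],
  [4, 5, 6],
];
const allReversed = reverseTensor(matrix); // [[6, 5, 4], [3, 2, 1]]
const columnsReversed = reverseTensor(matrix, [1]); // [[3, 2, 1], [6, 5, 4]]
const untouched = reverseTensor(matrix, []); // [[1, 2, 3], [4, 5, 6]]
```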
8.9.44. scatterElements
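Scatter values from the updates tensor atop a copy of the input tensor along an axis according to the indices.
dictionary MLScatterOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  MLOperand scatterElements(MLOperand input,
                            MLOperand indices,
                            MLOperand updates,
                            optional MLScatterOptions options = {});
};

dictionary MLScatterSupportLimits {
  MLTensorLimits input;
  MLTensorLimits indices;
  MLTensorLimits updates;
  MLTensorLimits output;
};

partial dictionary MLOpSupportLimits {
  MLScatterSupportLimits scatterElements;
};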
MLScatterOptions has the following members:
axis, of type unsigned long, defaulting to 0
The axis along which the scattered values are obtained. Its value must be in the range [0, N-1] where N is the rank of the input tensor.
- input: an MLOperand. The input N-D tensor from which the output is initialized.
- indices: an MLOperand. The indices N-D tensor of the input values to scatter over. The values must be of type "int32", "uint32", or "int64", and must be in the range -N (inclusive) to N (exclusive) where N is the size of the input dimension indexed by options.axis, and a negative index means indexing from the end of the dimension.
- updates: an MLOperand. New values to replace atop the input, with the same shape as the indices.
- options: an optional MLScatterOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to input’s rank.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| indices | "int32", "uint32", "int64" | same as input |
| updates | same as input | same as input |
| output | same as input | same as input |
MLScatterSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
indices, of type MLTensorLimits
MLTensorLimits for indices operand.
updates, of type MLTensorLimits
MLTensorLimits for updates operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following members for scatterElements():
scatterElements, of type MLScatterSupportLimits
Support limits for operator scatterElements().
The indices parameter to scatterElements() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
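As an illustration of the scattering rule, the sketch below handles the 2-D case on nested arrays. It assumes every index is already within the allowed range and only normalizes negative indices; scatterElements2d is a hypothetical name, not the WebNN method.

```javascript
// 2-D sketch: for every position (i, j) of `indices`, write updates[i][j]
// into a copy of `input`, replacing the coordinate on `axis` by the index.
function scatterElements2d(input, indices, updates, axis = 0) {
  const output = input.map((row) => row.slice()); // copy, input is unchanged
  const axisSize = axis === 0 ? input.length : input[0].length;
  for (let i = 0; i < indices.length; i++) {
    for (let j = 0; j < indices[i].length; j++) {
      let index = indices[i][j];
      if (index < 0) {
        index += axisSize; // negative index counts from the end
      }
      if (axis === 0) {
        output[index][j] = updates[i][j];
      } else {
        output[i][index] = updates[i][j];
      }
    }
  }
  return output;
}

const scatteredRows = scatterElements2d(
  [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
  [[1, 0, 2], [0, 2, 1]],
  [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]],
  0
);
// scatteredRows: [[2.0, 1.1, 0], [1.0, 0, 2.2], [0, 2.1, 1.2]]

const scatteredColumns = scatterElements2d([[1, 2, 3, 4, 5]], [[1, -1]], [[1.5, 9]], 1);
// scatteredColumns: [[1, 1.5, 3, 4, 9]]
```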
8.9.45. scatterND
Scatter slices of values from the update tensor atop a copy of the input tensor according to the indices.
partial interface MLGraphBuilder {
  MLOperand scatterND(MLOperand input,
                      MLOperand indices,
                      MLOperand updates,
                      optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLScatterSupportLimits scatterND;
};
- input: an MLOperand. The input N-D tensor from which the output is initialized.
- indices: an MLOperand. The indices array contains entire coordinates into the output tensor, with the rightmost dimension holding the number of dimensions per coordinate. So an indices tensor of shape [10,1] holds 10 single-axis indices, and a shape of [4,3] holds 4 indices of 3D coordinates. The values must be of type "int32", "uint32", or "int64", and each must be in the range -N (inclusive) to N (exclusive) where N is the size of the corresponding output dimension, and a negative index means indexing from the end of the corresponding dimension.
- updates: an MLOperand. New values to replace atop the input.
- options: an optional MLScatterOptions. The optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor of rank equal to input’s rank + indices’s rank - indices’s shape[-1] - 1.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| indices | "int32", "uint32", "int64" | 1 to N |
| updates | same as input | N |
| output | same as input | 1 to N |
MLOpSupportLimits has the following members for scatterND():
scatterND, of type MLScatterSupportLimits
Support limits for operator scatterND().
The indices parameter to scatterND() cannot be clamped to the allowed range when the graph is built because the inputs are not known until execution. Implementations can introduce clamp() in the compiled graph if the specified clamping behavior is not provided by the underlying platform. Similarly, if the underlying platform does not support negative indices, the implementation can introduce operations in the compiled graph to transform a negative index from the end of the dimension into a positive index.
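The coordinate-based addressing can be sketched on nested arrays. The example below is a simplified illustration that assumes indices has shape [K, M], that is, a list of K coordinates of M components each, and that all indices are in range; scatterNDTensor, cloneTensor and normalizeIndex are hypothetical names.

```javascript
// Deep-copy a nested-array tensor.
const cloneTensor = (t) => (Array.isArray(t) ? t.map(cloneTensor) : t);

// Turn a negative index into the equivalent index from the start.
const normalizeIndex = (index, size) => (index < 0 ? index + size : index);

// For each coordinate k, replace the element or slice of the copied input
// addressed by indices[k] with updates[k].
function scatterNDTensor(input, indices, updates) {
  const output = cloneTensor(input);
  indices.forEach((coordinate, k) => {
    let parent = output;
    // Walk down to the array that holds the addressed element or slice.
    for (let d = 0; d < coordinate.length - 1; d++) {
      parent = parent[normalizeIndex(coordinate[d], parent.length)];
    }
    const last = normalizeIndex(coordinate[coordinate.length - 1], parent.length);
    parent[last] = cloneTensor(updates[k]);
  });
  return output;
}

// Four single-axis indices (shape [4, 1]) into a 1-D tensor.
const scattered1d = scatterNDTensor(
  [1, 2, 3, 4, 5, 6, 7, 8],
  [[4], [3], [1], [7]],
  [9, 10, 11, 12]
);
// scattered1d: [1, 11, 3, 10, 9, 6, 7, 12]

// One single-axis index into a 2-D tensor replaces a whole row (a slice).
const scatteredSlice = scatterNDTensor([[1, 2], [3, 4]], [[-1]], [[7, 7]]);
// scatteredSlice: [[1, 2], [7, 7]]
```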
8.9.46. sigmoid
Compute the sigmoid function of the input tensor. The calculation follows the expression 1 / (exp(-x) + 1).
partial interface MLGraphBuilder {
  MLOperand sigmoid(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits sigmoid;
};
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for sigmoid():
sigmoid, of type MLSingleInputSupportLimits
Support limits for operator sigmoid().
8.9.47. slice
Produce a slice of the input tensor.
dictionary MLSliceOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> strides;
};

partial interface MLGraphBuilder {
  MLOperand slice(MLOperand input,
                  sequence<[EnforceRange] unsigned long> starts,
                  sequence<[EnforceRange] unsigned long> sizes,
                  optional MLSliceOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits slice;
};
MLSliceOptions has the following members:
strides, of type sequence<[EnforceRange] unsigned long>
The stride to step over each input along each axis. The length of the strides array must equal the rank of the input tensor. The default is an array of length rank consisting of all 1’s, e.g. [1,1,1] for a 3-D tensor. Strides must be greater than zero.
- input: an MLOperand. The input tensor.
- starts: a sequence<unsigned long>. The starting index to slice of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, starts[d] indicates the starting index to slice in that dimension. The starting index must be in the range [0, input size - 1] in that dimension.
- sizes: a sequence<unsigned long>. The number of elements to slice of each input dimension, of length N where N is the rank of the input tensor. For each dimension d of input, sizes[d] indicates the number of elements to slice in that dimension. The size must not be 0 and must satisfy the constraint starting index + size <= input size in that dimension.
- options: an MLSliceOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same rank as the input tensor with tensor values stripped to the specified starting and ending indices in each dimension.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for slice():
slice, of type MLSingleInputSupportLimits
Support limits for operator slice().
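The roles of starts, sizes and strides can be illustrated on nested arrays. This sketch assumes that a stride steps through the window [start, start + size) in each dimension, so that a dimension contributes ceil(size / stride) elements; that interpretation, and the name sliceTensor, are assumptions of this example.

```javascript
// Slice a nested-array tensor: in each dimension take `sizes[d]` elements
// starting at `starts[d]`, stepping by `strides[d]` (1 when omitted).
function sliceTensor(tensor, starts, sizes, strides, axis = 0) {
  if (!Array.isArray(tensor)) {
    return tensor; // reached a scalar element
  }
  const stride = strides ? strides[axis] : 1;
  const end = starts[axis] + sizes[axis];
  const result = [];
  for (let i = starts[axis]; i < end; i += stride) {
    result.push(sliceTensor(tensor[i], starts, sizes, strides, axis + 1));
  }
  return result;
}

const source = [
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12],
];
const window = sliceTensor(source, [1, 0], [2, 3]); // [[5, 6, 7], [9, 10, 11]]
const strided = sliceTensor(source, [0, 0], [3, 4], [1, 2]); // [[1, 3], [5, 7], [9, 11]]
```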
8.9.48. softmax
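Compute the softmax values of the N-D input tensor along the given axis.
partial interface MLGraphBuilder {
  MLOperand softmax(MLOperand input,
                    [EnforceRange] unsigned long axis,
                    optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softmax;
};
- input: an MLOperand. The input N-D tensor.
- axis: an unsigned long scalar. The dimension the reduction will be performed on.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output N-D tensor that contains the softmax results, of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | 1 to N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for softmax():
softmax, of type MLSingleInputSupportLimits
Support limits for operator softmax().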
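To illustrate what "along the given axis" means, the sketch below normalizes either the rows or the columns of a 2-D array. Subtracting the maximum before exponentiating is a common numerical-stability choice assumed here; softmax1d and softmax2d are hypothetical names.

```javascript
// Softmax of a flat list of values: exp(x - max) / sum(exp(x - max)).
function softmax1d(values) {
  const max = Math.max(...values);
  const exps = values.map((x) => Math.exp(x - max));
  const sum = exps.reduce((acc, e) => acc + e, 0);
  return exps.map((e) => e / sum);
}

// Softmax of a 2-D array along axis 0 (columns) or axis 1 (rows).
function softmax2d(input, axis) {
  if (axis === 1) {
    return input.map(softmax1d);
  }
  // axis 0: normalize each column, then reassemble the rows.
  const columns = input[0].map((_, j) => softmax1d(input.map((row) => row[j])));
  return input.map((_, i) => columns.map((column) => column[i]));
}

const alongRows = softmax2d([[0, 0], [1000, 1000]], 1); // [[0.5, 0.5], [0.5, 0.5]]
const alongColumns = softmax2d([[0, 5], [0, 5]], 0); // [[0.5, 0.5], [0.5, 0.5]]
```

Along whichever axis is chosen, the output values are non-negative and sum to 1 (up to floating-point rounding).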
8.9.49. softplus
Compute the softplus function of the input tensor. The calculation follows the expression ln(1 + exp(x)).
partial interface MLGraphBuilder {
  MLOperand softplus(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softplus;
};
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for softplus():
softplus, of type MLSingleInputSupportLimits
Support limits for operator softplus().
8.9.50. softsign
Compute the softsign function of the input tensor. The calculation follows the expression x / (1 + |x|).
partial interface MLGraphBuilder {
  MLOperand softsign(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits softsign;
};
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for softsign():
softsign, of type MLSingleInputSupportLimits
Support limits for operator softsign().
8.9.51. split
Split the input tensor into a number of sub tensors along the given axis.
dictionary MLSplitOptions : MLOperatorOptions {
  [EnforceRange] unsigned long axis = 0;
};

partial interface MLGraphBuilder {
  sequence<MLOperand> split(
      MLOperand input,
      ([EnforceRange] unsigned long or sequence<[EnforceRange] unsigned long>) splits,
      optional MLSplitOptions options = {});
};

dictionary MLSplitSupportLimits {
  MLTensorLimits input;
  MLTensorLimits outputs;
};

partial dictionary MLOpSupportLimits {
  MLSplitSupportLimits split;
};
- input: an MLOperand. The input tensor.
- splits: an unsigned long or sequence<unsigned long>. If an unsigned long, it specifies the number of output tensors along the axis. The number must evenly divide the dimension size of input along axis. If a sequence<unsigned long>, it specifies the sizes of each output tensor along the axis. The sum of sizes must equal the dimension size of input along axis.
- options: an optional MLSplitOptions. The optional parameters of the operation.
Returns: sequence<MLOperand>. The split output tensors. If splits is an unsigned long, the size of the output is equal to splits. The shape of each output tensor is the same as input except that the dimension size of axis equals the quotient of dividing the dimension size of input along axis by splits. If splits is a sequence<unsigned long>, the size of the output equals the size of splits. The shape of the i-th output tensor is the same as input except along axis where the dimension size is splits[i].
MLSplitOptions has the following members:
axis, of type unsigned long, defaulting to 0
The dimension along which to split. Its value must be in the range [0, N-1] where N is the rank of the input tensor.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 1 to N |
| outputs | same as input | same as input |
MLSplitSupportLimits has the following members:
input, of type MLTensorLimits
MLTensorLimits for input operand.
outputs, of type MLTensorLimits
MLTensorLimits for all the output operands.
MLOpSupportLimits has the following member for split():
split, of type MLSplitSupportLimits
Support limits for operator split().
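The two forms of splits, a count or a list of sizes, lead to output shapes that can be computed as in the following sketch. It only models shapes, not tensor data, and splitShapes is a hypothetical helper name.

```javascript
// Compute the shapes of the tensors produced by a split.
// `splits` is either a count (number) or a list of sizes along `axis`.
function splitShapes(inputShape, splits, axis = 0) {
  const axisSize = inputShape[axis];
  let sizes;
  if (typeof splits === "number") {
    if (splits === 0 || axisSize % splits !== 0) {
      throw new TypeError("splits must evenly divide the size along axis");
    }
    sizes = Array(splits).fill(axisSize / splits);
  } else {
    const total = splits.reduce((acc, size) => acc + size, 0);
    if (total !== axisSize) {
      throw new TypeError("split sizes must sum to the size along axis");
    }
    sizes = splits;
  }
  // Every output keeps the input shape except along `axis`.
  return sizes.map((size) => inputShape.map((d, i) => (i === axis ? size : d)));
}

const evenShapes = splitShapes([6, 4], 3); // [[2, 4], [2, 4], [2, 4]]
const unevenShapes = splitShapes([6, 4], [1, 3], 1); // [[6, 1], [6, 3]]
```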
8.9.52. tanh
Compute the hyperbolic tangent function of the input tensor. The calculation follows the expression (exp(2 * x) - 1) / (exp(2 * x) + 1).
partial interface MLGraphBuilder {
  MLOperand tanh(MLOperand input, optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits tanh;
};
- input: an MLOperand. The input tensor.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.
Returns: an MLOperand. The output tensor of the same shape as input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | "float32", "float16" | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for tanh():
tanh, of type MLSingleInputSupportLimits
Support limits for operator tanh().
8.9.53. tile
Repeat a tensor the given number of times along each dimension.
partial interface MLGraphBuilder {
  MLOperand tile(MLOperand input,
                 sequence<unsigned long> repetitions,
                 optional MLOperatorOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits tile;
};
- input: an MLOperand. The input N-D tensor.
- repetitions: A count per dimension of how many times to repeat that dimension. The size must match the input’s rank, using 1’s for any axis that should retain the same size.
- options: an optional MLOperatorOptions. The optional parameters of the operation.
Returns: an MLOperand. The tiled N-D tensor.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for tile():
tile, of type MLSingleInputSupportLimits
Support limits for operator tile().
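The effect of repetitions can be sketched on nested arrays, where each dimension's content is repeated the requested number of times. tileTensor is a hypothetical name used only for this illustration.

```javascript
// Repeat a nested-array tensor `repetitions[d]` times along each dimension d.
function tileTensor(tensor, repetitions, axis = 0) {
  // Tile the inner dimensions first (if any), then repeat along this axis.
  const items = Array.isArray(tensor[0])
    ? tensor.map((item) => tileTensor(item, repetitions, axis + 1))
    : tensor;
  let result = [];
  for (let r = 0; r < repetitions[axis]; r++) {
    result = result.concat(items);
  }
  return result;
}

const tiledVector = tileTensor([1, 2], [2]); // [1, 2, 1, 2]
const tiledRows = tileTensor([[1, 2], [3, 4]], [2, 1]); // [[1, 2], [3, 4], [1, 2], [3, 4]]
const tiledColumns = tileTensor([[1, 2], [3, 4]], [1, 2]); // [[1, 2, 1, 2], [3, 4, 3, 4]]
```

Each output dimension has size input size times the repetition count for that dimension, so a repetition of 1 leaves a dimension unchanged.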
8.9.54. transpose
Permute the dimensions of the input tensor according to permutation.
dictionary MLTransposeOptions : MLOperatorOptions {
  sequence<[EnforceRange] unsigned long> permutation;
};

partial interface MLGraphBuilder {
  MLOperand transpose(MLOperand input, optional MLTransposeOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits transpose;
};
MLTransposeOptions has the following members:
permutation, of type sequence<[EnforceRange] unsigned long>
The values used to permute the output shape. The default is [N-1, ..., 0], where N is the rank of the input tensor, e.g. [2,1,0] for a 3-D tensor. These default values cause the output to become a transposed tensor of the input. When specified, the number of values must be the same as the rank of the input tensor, and the values must be within the range from 0 to N-1 with no duplicates.
- input: an MLOperand. The input N-D tensor.
- options: an optional MLTransposeOptions. The optional parameters of the operation.
Returns: an MLOperand. The permuted or transposed N-D tensor.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for transpose():
transpose, of type MLSingleInputSupportLimits
Support limits for operator transpose().
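The default permutation and the validity rules can be sketched as a shape computation, assuming the usual convention that output dimension i takes the size of input dimension permutation[i]. transposedShape and transposeMatrix are hypothetical names.

```javascript
// Shape of the transposed tensor; validates the permutation when given.
function transposedShape(inputShape, permutation) {
  const rank = inputShape.length;
  // Default: reverse the order of the dimensions, i.e. [N-1, ..., 0].
  const perm = permutation ?? inputShape.map((_, i) => rank - 1 - i);
  const valid =
    perm.length === rank &&
    new Set(perm).size === rank && // no duplicates
    perm.every((axis) => Number.isInteger(axis) && axis >= 0 && axis < rank);
  if (!valid) {
    throw new TypeError("permutation must contain each axis 0..N-1 exactly once");
  }
  return perm.map((axis) => inputShape[axis]);
}

// The 2-D special case on actual data: rows become columns.
function transposeMatrix(matrix) {
  return matrix[0].map((_, j) => matrix.map((row) => row[j]));
}

const defaultShape = transposedShape([2, 3, 4]); // [4, 3, 2]
const swappedShape = transposedShape([2, 3, 4], [0, 2, 1]); // [2, 4, 3]
const transposed = transposeMatrix([[1, 2, 3], [4, 5, 6]]); // [[1, 4], [2, 5], [3, 6]]
```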
8.9.55. triangular
Given a 2-D tensor (matrix), return a 2-D tensor containing either the upper or lower triangular part of the input tensor. If the input tensor has more than 2 dimensions, it is treated as a batch of matrices and the result has the same shape.
dictionary MLTriangularOptions : MLOperatorOptions {
  boolean upper = true;
  [EnforceRange] long diagonal = 0;
};

partial interface MLGraphBuilder {
  MLOperand triangular(MLOperand input, optional MLTriangularOptions options = {});
};

partial dictionary MLOpSupportLimits {
  MLSingleInputSupportLimits triangular;
};
MLTriangularOptions has the following members:
upper, of type boolean, defaulting to true
Indicates whether the upper or the lower part of the input matrix is retained. True indicates that the upper part is retained.
diagonal, of type long, defaulting to 0
Specifies how many diagonals above or below the main diagonals of the input matrix are retained or excluded. A value of 0 means no diagonals other than the main diagonals are affected.
- input: an MLOperand. The input tensor which is at least 2-D.
- options: an optional MLTriangularOptions. The optional parameters of the operation.
Returns: an MLOperand. The output tensor representing a triangular matrix, or batch of matrices which is the same shape as the input.
| operand | allowed data types | allowed ranks |
|---|---|---|
| input | any | 2 to N |
| output | same as input | same as input |
MLOpSupportLimits has the following member for triangular():
triangular, of type MLSingleInputSupportLimits
Support limits for operator triangular().
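The interaction of upper and diagonal can be sketched for a single matrix. The sketch assumes the convention used by NumPy's triu and tril, where an element at row i and column j is retained in the upper part when j >= i + diagonal and in the lower part when j <= i + diagonal, and that excluded elements become zero; triangularMatrix is a hypothetical name.

```javascript
// Keep the upper or lower triangular part of a 2-D array, zeroing the rest.
function triangularMatrix(matrix, { upper = true, diagonal = 0 } = {}) {
  return matrix.map((row, i) =>
    row.map((value, j) => {
      const keep = upper ? j >= i + diagonal : j <= i + diagonal;
      return keep ? value : 0;
    })
  );
}

const square = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
];
const upperPart = triangularMatrix(square); // [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
const lowerPart = triangularMatrix(square, { upper: false }); // [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
const strictUpper = triangularMatrix(square, { diagonal: 1 }); // [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
```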
8.9.56. where
Select the values from the trueValue or the falseValue tensor depending on the corresponding values of the condition tensor, where non-zero is true and zero is false. The condition tensor is often the output of one of the element-wise logical operations.
The operation will be broadcast according to [numpy-broadcasting-rule]. The input tensors must be bidirectionally broadcastable. The rank of the output tensor is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
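partial interface MLGraphBuilder {
  MLOperand where(MLOperand condition,
                  MLOperand trueValue,
                  MLOperand falseValue,
                  optional MLOperatorOptions options = {});
};

dictionary MLWhereSupportLimits {
  MLTensorLimits condition;
  MLTensorLimits trueValue;
  MLTensorLimits falseValue;
  MLTensorLimits output;
};

partial dictionary MLOpSupportLimits {
  MLWhereSupportLimits where;
};
- condition: an MLOperand. The condition tensor.
- trueValue: an MLOperand. The tensor from which the value is selected when the condition of the corresponding element is set to true.
- falseValue: an MLOperand. The tensor from which the value is selected when the condition of the corresponding element is set to false.
- options: an MLOperatorOptions. Specifies the optional parameters of the operation.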
Returns: an MLOperand. The output tensor that contains the values selected element-wise from either the trueValue or the falseValue tensor.
| operand | allowed data types | allowed ranks |
|---|---|---|
| condition | "uint8" | N |
| trueValue | any | N |
| falseValue | same as trueValue | N |
| output | same as trueValue | N |
MLWhereSupportLimits has the following members:
condition, of type MLTensorLimits
MLTensorLimits for condition operand.
trueValue, of type MLTensorLimits
MLTensorLimits for trueValue operand.
falseValue, of type MLTensorLimits
MLTensorLimits for falseValue operand.
output, of type MLTensorLimits
MLTensorLimits for output operand.
MLOpSupportLimits has the following member for where():
where, of type MLWhereSupportLimits
Support limits for operator where().
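The element-wise selection can be sketched on flat arrays. To keep the example short it only handles the simplest broadcasting case, where a plain number stands for a scalar operand that is broadcast to every element; whereSelect is a hypothetical name.

```javascript
// Select trueValue[i] where condition[i] is non-zero, otherwise falseValue[i].
// A plain number passed for either value is treated as a broadcast scalar.
function whereSelect(condition, trueValue, falseValue) {
  const pick = (operand, i) => (Array.isArray(operand) ? operand[i] : operand);
  return condition.map((c, i) => (c !== 0 ? pick(trueValue, i) : pick(falseValue, i)));
}

const selected = whereSelect([1, 0, 1, 0], [1, 2, 3, 4], [10, 20, 30, 40]); // [1, 20, 3, 40]
const masked = whereSelect([0, 2, 0], [7, 8, 9], 0); // [0, 8, 0]
```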
9. Algorithms
9.1. Broadcasting
Broadcasting describes how WebNN treats tensors with different shapes during graph construction and computation. It is heavily influenced by [NumPy] and follows the [numpy-broadcasting-rule]. Loosely speaking, it allows an operation on a smaller tensor to be "broadcast" across the shape of a larger tensor, so that the same data can be applied repeatedly without making copies.
The simplest example is the application of a scalar constant to an N-dimension tensor with element-wise binary operations such as add() or mul(). Rather than needing to allocate and populate a matching N-dimensional tensor containing multiple copies of the scalar constant, these element-wise operations allow the scalar constant to be used directly, and broadcast the scalar value across the N-dimensional tensor. With the following considerations, the same logic applies to tensors of other dimensions.
The shapes of the input tensors must be compatible. A tensor is unidirectionally broadcastable to another tensor if the first tensor can be "stretched" by repeating the first tensor along an axis with size 1 or repeating across new dimensions, starting from the last (rightmost) dimension. For example, a [4] tensor can be broadcast to a [5, 4] tensor by repeating it 5 times. A [1] tensor can be broadcast to a [5,4] tensor by repeating it 4 times on the last dimension and 5 times on the preceding dimension. Unidirectional broadcasting is important for operations such as expand() where the target tensor shape is explicitly given.
Two tensors are bidirectionally broadcastable if they can be mutually "stretched" (repeated) across their various dimensions, starting from the last dimension. For example, a [5,1] tensor can be bidirectionally broadcast with a [1,6] tensor by repeating the first tensor 6 times in the last dimension and the second tensor 5 times in the preceding dimension. The result of the operation will be a [5,6] tensor. Bidirectional broadcasting is convenient for element-wise operations.
A tensor is blockwise broadcastable if all dimensions can be upsampled by integer multiples to the target tensor’s shape. For example, a [4,5] tensor can be blockwise broadcast up to a [16,10] tensor as it is an exact multiple (16 % 4 = 0, 10 % 5 = 0) by repeating every element 4 times in the first dimension and every element 2 times in the last dimension (e.g. values [1,2,3,4,5] in the last dimension would be repeated to [1,1,2,2,3,3,4,4,5,5]). However, a [4,5] tensor would be incompatible with a [9,3] tensor since both dimensions have a nonzero remainder (9 % 4 = 1, 3 % 5 = 3). Blockwise broadcasting is useful for sharing common values in larger blocks to save memory. Both tensors are expected to have the same rank, and the output shape is simply the target tensor’s shape to which the smaller one is being upsampled.
Some operations allow broadcasting with special semantics. For example, matmul() treats the last two dimensions of the input tensors as the rows and columns of the matrices, and the number of columns in the first matrix must be equal to the number of rows in the second matrix. The matrix multiplication is bidirectionally broadcast across any additional dimensions, treating the input tensors as stacks of matrices to multiply.
shapeFrom is unidirectionally broadcastable to shapeTo if unidirectionally broadcasting shapeFrom and shapeTo does not result in failure.
shapeA is bidirectionally broadcastable to shapeB if bidirectionally broadcasting shapeA and shapeB does not result in failure.
shapeFrom is blockwise broadcastable to shapeTo if blockwise broadcasting shapeFrom and shapeTo returns true.
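The three kinds of broadcasting described in this section can be sketched as shape checks. This is an informal illustration based on the prose above, not the normative algorithms; the function names are hypothetical, and the first two return null where the normative steps would return failure.

```javascript
// Left-pad a shape with 1s so that it has the given rank.
function padShape(shape, rank) {
  return Array(rank - shape.length).fill(1).concat(shape);
}

// Returns the broadcast shape (shapeTo), or null if not broadcastable.
function unidirectionallyBroadcast(shapeFrom, shapeTo) {
  if (shapeFrom.length > shapeTo.length) {
    return null;
  }
  const padded = padShape(shapeFrom, shapeTo.length);
  const compatible = padded.every((size, i) => size === shapeTo[i] || size === 1);
  return compatible ? [...shapeTo] : null;
}

// Returns the common output shape, or null if the shapes are incompatible.
function bidirectionallyBroadcast(shapeA, shapeB) {
  const rank = Math.max(shapeA.length, shapeB.length);
  const a = padShape(shapeA, rank);
  const b = padShape(shapeB, rank);
  const output = [];
  for (let i = 0; i < rank; i++) {
    if (a[i] !== b[i] && a[i] !== 1 && b[i] !== 1) {
      return null;
    }
    output.push(a[i] === 1 ? b[i] : a[i]);
  }
  return output;
}

// True if every target dimension is an integer multiple of the source one.
function isBlockwiseBroadcastable(shapeFrom, shapeTo) {
  return (
    shapeFrom.length === shapeTo.length &&
    shapeFrom.every((size, i) => size > 0 && shapeTo[i] % size === 0)
  );
}
```

Using the examples from the text: unidirectionallyBroadcast([4], [5, 4]) and unidirectionallyBroadcast([1], [5, 4]) both give [5, 4], bidirectionallyBroadcast([5, 1], [1, 6]) gives [5, 6], and isBlockwiseBroadcastable([4, 5], [16, 10]) is true while isBlockwiseBroadcastable([4, 5], [9, 3]) is false.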
9.2. Casting
Explicit numeric casting is used in algorithms where parameters passed as MLNumber or double need to be converted to match the MLOperandDataType of input or output MLOperands.
9.3. Miscellaneous
Remove this when a definition in [INFRA] is available. [whatwg/infra Issue #664]
10. Examples
constant1 ---+
             +--- Add ---> intermediateOutput1 ---+
input1    ---+                                    |
                                                  +--- Mul---> output
constant2 ---+                                    |
             +--- Add ---> intermediateOutput2 ---+
input2    ---+
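The data flow of the graph above can be traced with a small plain-JavaScript sketch. It deliberately does not use the WebNN API; computeExampleGraph is a hypothetical function operating on flat arrays of equal length, and the constant and input values are chosen only for illustration.

```javascript
// Evaluate output = (constant1 + input1) * (constant2 + input2) element-wise.
function computeExampleGraph(constant1, input1, constant2, input2) {
  const add = (a, b) => a.map((value, i) => value + b[i]);
  const mul = (a, b) => a.map((value, i) => value * b[i]);
  const intermediateOutput1 = add(constant1, input1);
  const intermediateOutput2 = add(constant2, input2);
  return mul(intermediateOutput1, intermediateOutput2);
}

// Illustrative values: constants filled with 0.5 and inputs filled with 1.
const filled = (value) => Array(8).fill(value);
const graphOutput = computeExampleGraph(filled(0.5), filled(1), filled(0.5), filled(1));
// Every element of graphOutput is (0.5 + 1) * (0.5 + 1) = 2.25.
```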
11. Operator Emulation
This section is non-normative.
Operations present in other neural network inference APIs can often be emulated using operations present in WebNN.
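For the shape-only operations listed in the following subsections, emulation typically reduces to computing a new shape and passing it to reshape(). The sketch below computes those shapes in plain JavaScript. It assumes ONNX-style semantics for squeeze, unsqueeze and flatten, which is an assumption of this example; the function names are hypothetical.

```javascript
// squeeze: drop the given size-1 dimensions (or all of them when omitted).
function squeezeShape(shape, axes) {
  const toDrop =
    axes ?? shape.map((size, i) => (size === 1 ? i : -1)).filter((i) => i >= 0);
  for (const axis of toDrop) {
    if (shape[axis] !== 1) {
      throw new TypeError("only dimensions of size 1 can be squeezed");
    }
  }
  return shape.filter((_, i) => !toDrop.includes(i));
}

// unsqueeze: insert size-1 dimensions at the given output positions.
function unsqueezeShape(shape, axes) {
  const output = [...shape];
  for (const axis of [...axes].sort((a, b) => a - b)) {
    output.splice(axis, 0, 1);
  }
  return output;
}

// flatten: collapse to 2-D, splitting the dimensions at `axis`.
function flattenShape(shape, axis = 1) {
  const product = (sizes) => sizes.reduce((acc, size) => acc * size, 1);
  return [product(shape.slice(0, axis)), product(shape.slice(axis))];
}

const squeezed = squeezeShape([1, 3, 1, 4]); // [3, 4]
const unsqueezed = unsqueezeShape([3, 4], [0, 2]); // [1, 3, 1, 4]
const flattened = flattenShape([2, 3, 4]); // [2, 12]
```

Because each helper preserves the total number of elements, the resulting shape is always a valid newShape argument for reshape().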
11.1. squeeze
11.2. unsqueeze
11.3. flatten
12. Appendices
12.1. MLOperandDataType and ArrayBufferView compatibility
| MLOperandDataType | ArrayBufferView |
|---|---|
| float32 | Float32Array |
| float16 | Float16Array |
| int64 | BigInt64Array |
| uint64 | BigUint64Array |
| int32 | Int32Array |
| uint32 | Uint32Array |
| int8 | Int8Array |
| uint8 | Uint8Array |
Float16Array is at ECMA Stage 3, signaling that its design is finished. Implementers wanting to enable this type ahead of native implementations can emulate the type by passing raw bits via Uint16Array. [Issue webnn#373]
13. Acknowledgements
This specification follows the concepts of the Android Neural Networks API C API.
Thanks to Tomoyuki Shimizu, Ningxin Hu, Zhiqiang Yu and Belem Zhang for the use cases.
Thanks to Nikhil Thorat, Daniel Smilkov, Ganesan Ramalingam, Rafael Cintron and Benjamin Poulain for their contributions to the API specification.
Thanks to Sangwhan Moon and the W3C Technical Architecture Group for review of this specification for web architecture fit, design consistency and developer ergonomics.
Thanks to Zoltan Kis for adding algorithms and making navigating this specification a delightful experience. Thanks to Joshua Bell for aligning the specification with modern editorial conventions. Thanks to Ningxin Hu, Lisha Guo, Shiyi Zou, Mingming Xu, Junwei Fu, Bruce Dai and Bin Miao for careful review and comments.
Thanks to W3C Privacy Interest Group for privacy and security review and feedback.
Thanks to Alex Gough and the Chrome Security team for security review and questions.
Thanks to Michal Karzynski for sharing practical guidelines and learnings from ONNX.
Thanks to Kaustubha Govind and Chrome privacy reviewers for feedback and privacy considerations.
Thanks to Jiewei Qian for Chromium implementation review and feedback.
Thanks to Dwayne Robinson, Joshua Lochner and Wanming Lin for their work investigating and providing recommendation for transformer support. Additional thanks to Dwayne and Wanming for providing reviews of operator conformance and web-platform-tests implementation.
Thanks to Feng Dai for his continuous contributions that keep web-platform-tests evolving alongside the specification.
Thanks to Fuqiao Xue and the W3C Internationalization Activity for reviews and suggestions.
14. Changes
This section is non-normative.
This section documents changes made to this specification since its previous major publication in terms of Classes of Changes.
