How to implement AEC on iOS using 'MicrosoftCognitiveServicesSpeechEmbedded-iOS' (1.49.1)

Yin Chenqiao 20 Reputation points

How can acoustic echo cancellation (AEC) be implemented on iOS? The SDK used is 'MicrosoftCognitiveServicesSpeechEmbedded-iOS' (1.49.1).

The specific requirement is to keep continuous STT running at all times while TTS audio plays through the speaker. In that situation, how can STT be made to ignore the sound from the speaker and recognize only the sound coming into the microphone?

  1. Karnam Venkata Rajeswari 3,835 Reputation points Microsoft External Staff Moderator

    Hello @Yin Chenqiao, following up to see if the answer below was helpful. If it answers your query, could you please take a moment to mark it as Accepted with an upvote? This helps others in the community with the same question find the solution more easily.

    Thank you

  2. Karnam Venkata Rajeswari 3,835 Reputation points Microsoft External Staff Moderator

    Hello @Yin Chenqiao ,

    Just checking in to see whether you have had a chance to review my response.

    Looking forward to your response and appreciate your time on this.

    If the query has been resolved, please accept the answer by clicking the "Upvote" and "Accept Answer" on the post.

    Thank you!



Answer accepted by question author

Karnam Venkata Rajeswari 3,835 Reputation points Microsoft External Staff Moderator

Hello @Yin Chenqiao ,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

The Embedded Speech SDK for iOS does not provide a built‑in Acoustic Echo Cancellation (AEC) configuration. In addition, the advanced echo‑cancellation pipelines provided by the Microsoft Audio Stack (MAS), including model‑based AEC, are supported only on desktop platforms (Windows and Linux, per the MAS documentation) and are not available on iOS.

On iOS, echo cancellation is performed entirely by the operating system’s audio framework, before audio is delivered to the Speech SDK. As a result, recognition accuracy depends on correct iOS audio session configuration rather than SDK‑level settings. 

Please check whether the following helps:

  1. Enabling iOS voice processing in the audio session. Echo cancellation should be enabled using native iOS audio capabilities:
    • Set AVAudioSession category to playAndRecord
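    • Set the audio session mode to voiceChat. AVAudioSession has no mode named voiceProcessing; voice processing itself is enabled through the Voice‑Processing I/O audio unit or through AVAudioEngine's input node (setVoiceProcessingEnabled, iOS 13 and later)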
    • Activate and reuse the same audio session for both STT and TTS
    This configuration enables system‑level processing, including:
    • Acoustic Echo Cancellation (AEC)
    • Noise suppression
    • Automatic gain control
    This ensures microphone audio is echo‑reduced before being consumed by the Speech SDK.
  2. Speech SDK audio input handling. In standard iOS implementations, once voice processing is correctly enabled:
    • The default microphone input already contains echo‑reduced audio
    • The Speech SDK can consume this input directly
    • No SDK‑level AEC configuration is required
    In non‑standard or complex audio routing scenarios (for example, external audio devices, custom playback pipelines, or persistent residual echo):
    • Capturing post‑processed PCM audio from the iOS audio pipeline may be required
    • Feeding this audio into the Speech SDK using SPXPushAudioInputStream can help ensure only processed audio is recognized
    This stream‑based approach should be treated as a fallback option, not a default requirement.
  3. Unified audio routing for STT and TTS. For stable echo cancellation behavior:
    • STT and TTS should share the same AVAudioSession
    • The audio session should not be reconfigured during playback
    • TTS output should remain within the configured audio route
    This allows iOS to correctly subtract playback audio from microphone input.
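
The session setup described in steps 1 and 3 can be sketched in Swift as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name configureSharedVoiceSession is hypothetical, the .defaultToSpeaker option is an assumption about the desired output route, and .voiceChat is used because AVAudioSession.Mode does not define a voiceProcessing case.

```swift
import AVFoundation

/// Hypothetical helper: configures the app-wide audio session once so that
/// STT capture and TTS playback share the same route.
func configureSharedVoiceSession() throws {
    let session = AVAudioSession.sharedInstance()

    // playAndRecord allows simultaneous capture and playback.
    // voiceChat asks iOS to apply its voice processing to this session.
    // defaultToSpeaker (assumption) routes playback to the loudspeaker
    // instead of the receiver when no headset is connected.
    try session.setCategory(.playAndRecord,
                            mode: .voiceChat,
                            options: [.defaultToSpeaker])

    // Activate once and leave the session untouched during playback.
    try session.setActive(true)
}
```

Calling this once at startup, before the recognizer and the synthesizer are created, avoids reconfiguring the session while TTS is playing.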

Practical limitations and expectations

Even with correct configuration:

  • iOS AEC significantly reduces echo but may not eliminate it completely
  • Residual echo can occur due to:
    • High speaker volume
    • Close proximity between speaker and microphone
    • Device hardware characteristics
    • Environmental acoustics

This behavior is expected in real‑time voice applications.

To further improve recognition accuracy:

  • Keep TTS playback volume at moderate levels
  • Avoid maximum speaker amplification
  • Maintain reasonable distance between speaker and microphone
  • Validate behavior in real acoustic environments
  • Use headphones or external audio routing when feasible

Please note that:

  1. No AEC toggle exists within the Speech SDK for iOS
  2. MAS‑based echo cancellation is not supported on iOS (see https://learn.microsoft.com/azure/ai-services/speech-service/audio-processing-model-based-echo-cancellation)
  3. Echo cancellation effectiveness depends fully on iOS system audio processing and device hardware

To summarise:

  1. Root cause: Echo occurs due to simultaneous playback and microphone capture in full‑duplex audio scenarios
  2. Primary solution: Use the voiceChat audio session mode, or enable voice processing on the capture path, so that iOS applies system‑level AEC
  3. SDK role: Consumes already processed microphone audio without performing echo cancellation
  4. Fallback option: Use stream‑based audio input if residual echo persists despite correct configuration
  5. Expected outcome: Continuous STT with significantly reduced TTS interference, within documented platform limitations


Please let us know if the response was helpful.

Thank you

Please 'Upvote' (thumbs-up) and 'Accept' this as the answer if the response was helpful. This will benefit other community members who face the same issue.


2 additional answers

  1. Amanda Zhu 80 Reputation points

    There isn’t a Speech SDK switch that I can find for enabling AEC in MicrosoftCognitiveServicesSpeechEmbedded-iOS. The practical fix is to do echo cancellation before audio reaches the recognizer.

    On iOS, configure the shared AVAudioSession for voice capture/playback and use Apple’s echo-canceled input path, then pass that processed mic audio into Speech using a custom stream. 

    You can do this by:

    That keeps AEC ownership in iOS, while Speech only receives the cleaned microphone stream.
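
    The approach described above can be sketched in Swift as follows. This is a sketch under assumptions, not the SDK's documented method: the class name EchoCancelledCapture and the sink closure are hypothetical, the 16 kHz, 16‑bit, mono target format is assumed to match the push stream's format, and the audio session is assumed to be configured already. In an app, the sink would typically forward each chunk to the write method of an SPXPushAudioInputStream. Whether playback produced outside this engine is used as the echo reference should be verified on a real device.

    ```swift
    import AVFoundation

    enum CaptureError: Error { case formatUnavailable }

    /// Hypothetical helper: captures voice-processed microphone audio and
    /// hands it to `sink` as 16 kHz, 16-bit, mono PCM chunks.
    final class EchoCancelledCapture {
        private let engine = AVAudioEngine()
        private let sink: (Data) -> Void

        init(sink: @escaping (Data) -> Void) { self.sink = sink }

        func start() throws {
            let input = engine.inputNode
            // Enables Apple's voice processing (including echo cancellation)
            // on the input node. Requires iOS 13+ and a stopped engine.
            try input.setVoiceProcessingEnabled(true)

            let inputFormat = input.outputFormat(forBus: 0)
            guard let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                                   sampleRate: 16_000,
                                                   channels: 1,
                                                   interleaved: true),
                  let converter = AVAudioConverter(from: inputFormat, to: targetFormat)
            else { throw CaptureError.formatUnavailable }

            let sink = self.sink
            input.installTap(onBus: 0, bufferSize: 1024, format: inputFormat) { buffer, _ in
                // Size the output buffer for the resampled frame count.
                let ratio = targetFormat.sampleRate / inputFormat.sampleRate
                let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
                guard let out = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                                 frameCapacity: capacity) else { return }

                // Supply the tapped buffer to the converter exactly once.
                var consumed = false
                var error: NSError?
                let status = converter.convert(to: out, error: &error) { _, outStatus in
                    if consumed {
                        outStatus.pointee = .noDataNow
                        return nil
                    }
                    consumed = true
                    outStatus.pointee = .haveData
                    return buffer
                }
                guard status != .error, out.frameLength > 0,
                      let samples = out.int16ChannelData else { return }

                let byteCount = Int(out.frameLength) * MemoryLayout<Int16>.size
                sink(Data(bytes: samples[0], count: byteCount))
            }

            engine.prepare()
            try engine.start()
        }

        func stop() {
            engine.inputNode.removeTap(onBus: 0)
            engine.stop()
        }
    }
    ```

    Passing the audio through a closure keeps the capture code independent of the Speech SDK, so the conversion step can be checked on its own before it is wired to the recognizer.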

  2. kagiyama yutaka 3,685 Reputation points

    I think the SDK does not provide built‑in echo control, and the supported approach is to adjust input gain and TTS volume so the recognizer does not pick up speaker output.
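
    The input gain adjustment mentioned above can be sketched in Swift as follows. This is only an illustration: the function name reduceInputGain is hypothetical, lowering gain is not echo cancellation, and some input routes may report that gain is not settable, in which case the function does nothing. TTS volume is controlled by whichever player the app uses and is not shown here.

    ```swift
    import AVFoundation

    /// Hypothetical helper: lowers microphone input gain when the current
    /// route allows it. Returns true if the gain was applied.
    @discardableResult
    func reduceInputGain(to gain: Float) throws -> Bool {
        let session = AVAudioSession.sharedInstance()

        // Not every input route exposes an adjustable gain.
        guard session.isInputGainSettable else { return false }

        // setInputGain expects a value between 0.0 and 1.0.
        try session.setInputGain(min(max(gain, 0.0), 1.0))
        return true
    }
    ```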
