Azure Speech Service: ConversationTranscriber via Private Endpoint returns 0 segments with 140s session_stopped delay - canadacentral


Amandeep Sadioura 0 Reputation points

Service: Azure Cognitive Services, Speech Service (azure-cognitiveservices-speech==1.46.0, Python), AKS in canadacentral, Private Endpoint.

Scenario: Using ConversationTranscriber with the universal/v2 real-time endpoint, accessed via a Cognitive Services Private Endpoint from an AKS cluster. The session establishes successfully but no transcription results are returned.

Result: session_started fires in under 1s, confirming that the WebSocket connection is established. All audio is streamed successfully. However, session_stopped fires after ~140s with 0 segments, and no CANCELED event or error details are reported, regardless of audio length.

Environment

Speech resource region: canadacentral
SDK: azure-cognitiveservices-speech==1.46.0 (Python)
Endpoint: wss://<resource>.cognitiveservices.azure.com/stt/speech/universal/v2
Private Endpoint sub-resource: account
Private DNS zone: privatelink.cognitiveservices.azure.com with an A record pointing to the private IP
AKS to Private Endpoint: TCP 443 reachable, NSG rules allow traffic in both directions

Troubleshooting steps taken

| Check | Status |
| --- | --- |
| DNS resolution | Resolves to private IP via private DNS zone |
| Private endpoint sub-resource | `account` |
| TCP 443 to private endpoint | Reachable from AKS pod |
| NSG rules | Bidirectional TCP 443 allowed between AKS and PE subnet |
| `session_started` | Fires in <1s |

All infrastructure verified on our end. No firewall between AKS and private endpoint — only NSGs.
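For anyone who wants to repeat the DNS check from inside a pod, here is a minimal sketch using only the Python standard library. The helper names `resolved_addresses` and `all_private` are illustrative and are not part of any Azure tooling; the `resolver` parameter exists only so the lookup can be swapped out when testing.

```python
import ipaddress
import socket


def resolved_addresses(host, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True only if there is at least one address and every one is private.

    Note: ipaddress treats loopback addresses as private too.
    """
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


# Example (replace <resource> with the real resource name before running):
# addrs = resolved_addresses("<resource>.cognitiveservices.azure.com")
# print(addrs, "private" if all_private(addrs) else "NOT all private")
```

If any returned address is public, the pod is not resolving the name through the private DNS zone.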

Minimal reproducible steps (run inside AKS pod, 3s of silence)

import os
import time

import azure.cognitiveservices.speech as s

# Speech config pointing at the resource's universal/v2 real-time endpoint
cfg = s.SpeechConfig(
    subscription=os.environ['AZURE_SPEECH_KEY'],
    endpoint='wss://<resource>.cognitiveservices.azure.com/stt/speech/universal/v2'
)
# 16 kHz, 16-bit, mono PCM push stream
fmt = s.audio.AudioStreamFormat(16000, 16, 1)
ps = s.audio.PushAudioInputStream(stream_format=fmt)
r = s.transcription.ConversationTranscriber(cfg, s.audio.AudioConfig(stream=ps))

# Print each event with the time elapsed since t
t = time.time()
r.session_started.connect(lambda e: print(f'session_started T+{time.time()-t:.1f}s'))
r.session_stopped.connect(lambda e: print(f'session_stopped T+{time.time()-t:.1f}s'))
r.transcribed.connect(lambda e: print(f'transcribed: {e.result.text}'))

r.start_transcribing_async().get()
time.sleep(3)    # keep the stream open for 3 s (no audio bytes are written here)
ps.close()       # signal end of audio
time.sleep(180)  # wait long enough to observe session_stopped

Output:

session_started T+0.4s
session_stopped T+140.2s
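Note that the repro above does not connect a handler to the `canceled` event, so a cancellation would not show up in this output even if the SDK raised one. A minimal sketch of such a handler follows. It assumes the event argument exposes `cancellation_details` with `reason`, `code`, and `error_details` attributes, as the Python Speech SDK's cancellation types are understood to do; the helper names and the `sink` parameter are illustrative only.

```python
def describe_cancellation(evt):
    """Build a one-line summary from a canceled event's cancellation details."""
    details = getattr(evt, "cancellation_details", None)
    if details is None:
        return "canceled: no cancellation details on event"
    # getattr with a default keeps this tolerant if an attribute is missing.
    return (
        f"canceled: reason={getattr(details, 'reason', None)} "
        f"code={getattr(details, 'code', None)} "
        f"details={getattr(details, 'error_details', None)}"
    )


def attach_canceled_handler(transcriber, sink=print):
    """Connect a handler so cancellations are reported instead of silently dropped."""
    transcriber.canceled.connect(lambda evt: sink(describe_cancellation(evt)))
```

In the repro this would be called as `attach_canceled_handler(r)` before `r.start_transcribing_async()`.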

Question: Does ConversationTranscriber via universal/v2 fully support Cognitive Services Private Endpoints in canadacentral? Specifically, does the private link `account` sub-resource cover the complete real-time diarization result delivery path, or are there additional endpoints required that are not covered by the private endpoint configuration?


2 answers


Amandeep Sadioura 0 Reputation points

Hi Harshitha,

First of all, thank you for your helpful response. By following it, we were able to resolve the issue. Have a nice day.

Manas Mohanty 17,185 Reputation points • Microsoft External Staff • Moderator

Hey Amandeep Sadioura

Thank you for your inputs here on the forum.

I was analysing the code and architecture on my side.

Based on my previous experience:

The endpoint syntax in a VNET scenario differs from the public endpoint, and we have to allow traffic to AKS in the outbound rules.

The pointer about the custom domain was mentioned earlier.

We also need the Cognitive Services service tags in the outbound rules.

cfg = s.SpeechConfig(subscription=KEY, region="canadacentral")

Reference - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-service-vnet-service-endpoint

Feel free to share any observations to help others in the same context.

I have converted Harshitha's comment to an answer in case you found our inputs useful.

Thank you

Harshitha Eligeti 10 Reputation points • Microsoft External Staff • Moderator
Hello @Amandeep Sadioura
Based on what you described, this looks like a real-time ConversationTranscriber session that connects successfully (WebSocket up) but then stops after ~140s with no transcribed segments and no canceled error details.

Does ConversationTranscriber (universal/v2) fully support Speech Private Endpoint?

From the provided documentation, we can say the following:
- For Speech Private Endpoint scenarios, the service expects the client to use a custom domain for the Speech resource (required for private endpoints) and then replace the host name in request URLs with that custom domain. The private-link article explains that the URL construction changes, while the rest of the path stays the same.
- The doc also explicitly calls out that there are different endpoint sets for different Speech APIs (e.g., REST APIs vs SDK/other operations) and that you must use the correct endpoint URL pattern in private-link scenarios.
- However, the provided documentation does not explicitly confirm whether ConversationTranscriber (universal/v2) diarization result delivery is covered entirely by the Private Endpoint sub-resource account for Speech in canadacentral, nor does it enumerate any additional sub-resources/endpoints specific to ConversationTranscriber beyond the general private-link guidance.

So, based only on the information available here, we can’t definitively say whether `account` covers the entire ConversationTranscriber result path or whether additional private endpoints or endpoint transformations are required.
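The host-name replacement described in the first bullet can be sketched as below. This is only an illustration of "swap the host, keep the rest of the URL"; the function name `with_custom_domain` and the example custom name `my-speech` are hypothetical, and the exact URL pattern for a given API should be taken from the private-link documentation.

```python
from urllib.parse import urlsplit, urlunsplit


def with_custom_domain(url, custom_name, suffix="cognitiveservices.azure.com"):
    """Swap only the host of `url` for `<custom_name>.<suffix>`; keep the rest."""
    parts = urlsplit(url)
    host = f"{custom_name}.{suffix}"
    if parts.port is not None:
        # Preserve an explicit port if the original URL carried one.
        host = f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=host))


# Example with a placeholder source host (not a real service URL):
# with_custom_domain("wss://old-host.example/stt/speech/universal/v2", "my-speech")
```

Scheme, path, and query string are left untouched, which matches the guidance that only the host portion changes.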

What you can validate with the available guidance
1. Custom domain / endpoint URL transformation
  
  The private-link doc emphasizes that after you enable private endpoints (and thus a custom domain), you typically need to replace the host name in your SDK endpoint URLs with the custom domain host name.
  
  If you’re currently using https://<resource>.cognitiveservices.azure.com/... rather than https://<custom-name>.cognitiveservices.azure.com/... (or the documented equivalent transformation for SDK), the connection may still establish, but other parts of the session/result flow can behave unexpectedly.
2. Use SDK logging to capture the real cause
  
  The “client issues” doc recommends enabling Speech SDK logging (by setting Speech_LogFilename) because it provides diagnostics and includes the session id—useful when diagnosing slow responses or cancellations.
  
  Since you’re not seeing a CANCELED event, logging is especially important to determine what the service/runtime did during the session_stopped at ~140s.
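As a rough sketch of step 2 in Python: the SDK's `SpeechConfig.set_property` can be used with `PropertyId.Speech_LogFilename` to direct the SDK log to a file. The helper name `enable_sdk_logging`, the `property_id` override, and the log path are assumptions made for illustration; the override exists only so the helper can be exercised without the SDK installed.

```python
def enable_sdk_logging(speech_config, log_path, property_id=None):
    """Ask the Speech SDK to write its diagnostic log to `log_path`."""
    if property_id is None:
        # Imported lazily; requires azure-cognitiveservices-speech to be installed.
        import azure.cognitiveservices.speech as speechsdk
        property_id = speechsdk.PropertyId.Speech_LogFilename
    speech_config.set_property(property_id, log_path)
    return log_path


# Example, using the `cfg` object from the question (path is hypothetical):
# enable_sdk_logging(cfg, "/tmp/speech-sdk.log")
```

Call it before the ConversationTranscriber is constructed, so that the logging property is already set on the config when the transcriber is created.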


URL: https://learn.microsoft.com/en-us/answers/questions/5913884/azure-speech-service-conversationtranscriber-via-p
