Azure custom translator training is taking more than 48 hours

Igarashi Tomomi 25 Reputation points

I’ve been training two models for over 48 hours, and neither has finished.

I want to cancel or delete them and retry one at a time, but I cannot cancel or delete them because their status is still "Training running".

Is there any way to stop training?

  1. Karnam Venkata Rajeswari 3,830 Reputation points Microsoft External Staff Moderator

    Hello Igarashi Tomomi,

    Did you get any chance to review the above response?

    Do let me know if you have any further queries.

    Thank you!

  2. Igarashi Tomomi 25 Reputation points

    I understand that no cancel or stop option is available, so I contacted the support team.

    Thank you for your reply.



Answer accepted by question author

Karnam Venkata Rajeswari 3,830 Reputation points Microsoft External Staff Moderator

Hello Igarashi Tomomi,

Welcome to Microsoft Q&A. Thank you for reaching out.

I understand that you cannot cancel or delete the Custom Translator model because it has been stuck in the training phase for more than 48 hours.

As of now, there is no cancel or stop option for a Custom Translator training job once it reaches the “Training running” state, whether in the portal, through the API, or via the CLI. Additionally, deleting the model, project, or workspace is blocked while training is running.

A model stuck in the training phase for 48+ hours usually indicates a backend stall. If the model eventually moves to the Failed state on its own, deletion becomes possible.
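Since the only change that unblocks deletion is the model leaving "Training running", one option is to check the status periodically instead of watching the portal. Below is a minimal Python sketch of such a polling loop. The `get_status` callable is hypothetical: you would supply your own function that returns the model's current status string (for example by reading it from the portal or from whatever API call you already use). Nothing here is an official Custom Translator client, and it does not cancel anything.

```python
import time
from typing import Callable


def wait_until_not_running(
    get_status: Callable[[], str],      # hypothetical: returns the model's current status text
    running_status: str = "Training running",
    poll_seconds: float = 300.0,        # assumed interval; adjust as needed
    max_polls: int = 12,                # total number of status checks before giving up
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Check the status up to max_polls times and return the last status seen.

    Returns early as soon as the status is anything other than running_status
    (for example "Failed", after which deletion should be allowed).
    """
    status = get_status()
    for _ in range(max_polls - 1):
        if status != running_status:
            break
        sleep(poll_seconds)
        status = get_status()
    return status
```

If the function returns "Training running" after all checks, the model is still stuck and contacting support remains the way forward.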

Below are resources for further reference and best practices:

· Quickstart: Build, deploy, and use a custom model - Custom Translator - Foundry Tools | Microsoft Learn
· What are training and modeling? - Custom Translator - Foundry Tools | Microsoft Learn
· Train a Custom Translator model - Foundry Tools | Microsoft Learn
· Custom Translator for beginners - Foundry Tools | Microsoft Learn
· What is Custom Translator? - Foundry Tools | Microsoft Learn
· Frequently asked questions - Custom Translator - Foundry Tools | Microsoft Learn
· Azure Translator documentation - quickstarts, tutorials, API reference - Foundry Tools | Microsoft Learn

Please let me know if you have any questions.

Thank you!

  1. Karnam Venkata Rajeswari 3,830 Reputation points Microsoft External Staff Moderator

    Hello Igarashi Tomomi,

    Glad to hear that my response was helpful.

    Since I’ve converted my earlier comment into an answer, could you please take a moment to mark it as Accepted? This helps others in the community with the same question find the solution more easily.

    Thank you!



1 additional answer

  1. AI answer

    To cancel a training job in Azure, you can send a POST request to the job's cancel URL. The URL format below targets a Language authoring (analyze-conversations) project, not a Custom Translator model. Here’s a general approach:

    1. Request URL: Use the following format for your API request. Replace the placeholder values with your own values:
       {ENDPOINT}/language/authoring/analyze-conversations/projects/{PROJECT-NAME}/train/jobs/{JOB-ID}/:cancel?api-version={API-VERSION}
      
    2. Headers and JSON Body: Ensure you include the necessary headers and body as required by the API.
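As a rough illustration of the two steps above, here is a Python sketch that fills in the URL template and prepares the POST request without sending it. It only applies to the Language authoring endpoint shown in the template; per the accepted answer, Custom Translator has no equivalent cancel call. The function name `build_cancel_request` is made up for this example, and the `Ocp-Apim-Subscription-Key` header is an assumption based on the usual key authentication for Azure AI services, since the answer does not list the required headers.

```python
import urllib.parse
import urllib.request


def build_cancel_request(endpoint: str, project_name: str, job_id: str,
                         api_version: str, key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the cancel URL template."""
    url = (
        f"{endpoint.rstrip('/')}/language/authoring/analyze-conversations/projects/"
        f"{urllib.parse.quote(project_name, safe='')}/train/jobs/"
        f"{urllib.parse.quote(job_id, safe='')}/:cancel"
        f"?api-version={urllib.parse.quote(api_version, safe='')}"
    )
    # Assumed header: resource key authentication for Azure AI services.
    headers = {"Ocp-Apim-Subscription-Key": key}
    return urllib.request.Request(url, method="POST", headers=headers)
```

To actually send it you would pass the returned object to `urllib.request.urlopen`, using your own endpoint, project name, job ID, API version, and key.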

    If you are using the Azure SDK for .NET, you can also use the CancelTrainingJob method from the TextAuthoringProject class. This method allows you to trigger a cancellation for a running training job. You can call it as follows:

    public virtual Azure.Operation<BinaryData> CancelTrainingJob(Azure.WaitUntil waitUntil, string jobId, Azure.RequestContext context);
    

    Make sure to replace jobId with the actual ID of your training job. If you are facing issues with the cancellation, ensure that you have the correct permissions and that the job ID is valid.

    After cancelling the jobs, you can attempt to train them one by one as you planned.

    AI-generated content may be incorrect. Read our transparency notes for more information.