TextCatalog.LatentDirichletAllocation Method

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.Transforms.dll

Package: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4

Source: TextCatalog.cs

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Create a LatentDirichletAllocationEstimator, which uses LightLDA to transform text (represented as a vector of floats) into a vector of Single indicating the similarity of the text with each topic identified.

public static Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator LatentDirichletAllocation(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfTopics = 100, float alphaSum = 100, float beta = 0.01, int samplingStepCount = 4, int maximumNumberOfIterations = 200, int likelihoodInterval = 5, int numberOfThreads = 0, int maximumTokenCountPerDocument = 512, int numberOfSummaryTermsPerTopic = 10, int numberOfBurninIterations = 10, bool resetRandomGenerator = false);

static member LatentDirichletAllocation : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * single * single * int * int * int * int * int * int * int * bool -> Microsoft.ML.Transforms.Text.LatentDirichletAllocationEstimator

<Extension()>
Public Function LatentDirichletAllocation (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfTopics As Integer = 100, Optional alphaSum As Single = 100, Optional beta As Single = 0.01, Optional samplingStepCount As Integer = 4, Optional maximumNumberOfIterations As Integer = 200, Optional likelihoodInterval As Integer = 5, Optional numberOfThreads As Integer = 0, Optional maximumTokenCountPerDocument As Integer = 512, Optional numberOfSummaryTermsPerTopic As Integer = 10, Optional numberOfBurninIterations As Integer = 10, Optional resetRandomGenerator As Boolean = false) As LatentDirichletAllocationEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This estimator outputs a vector of Single.

inputColumnName: String

Name of the column to transform. If set to null, the value of the outputColumnName will be used as source. This estimator operates over a vector of Single.

numberOfTopics: Int32

The number of topics.

alphaSum: Single

Dirichlet prior on document-topic vectors.

beta: Single

Dirichlet prior on vocab-topic vectors.

samplingStepCount: Int32

Number of Metropolis-Hastings steps.

maximumNumberOfIterations: Int32

Number of iterations.

likelihoodInterval: Int32

Interval, in iterations, at which the log likelihood is computed over the local dataset.

numberOfThreads: Int32

The number of training threads. With the default of 0, the value depends on the number of logical processors.

maximumTokenCountPerDocument: Int32

The maximum number of tokens per document.

numberOfSummaryTermsPerTopic: Int32

The number of words to summarize the topic.

numberOfBurninIterations: Int32

The number of burn-in iterations.

resetRandomGenerator: Boolean

Reset the random number generator for each document.

Returns

LatentDirichletAllocationEstimator
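Because every parameter after outputColumnName is optional, callers can override only the ones they need by name. The following is a minimal sketch based on the signature above, not an official sample; the class name LdaEstimatorSketch, the column names "Features" and "Tokens", and the chosen values are illustrative assumptions.

```csharp
using System;
using Microsoft.ML;

// Hypothetical class name, used only for this sketch.
public static class LdaEstimatorSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // "Features" and "Tokens" are assumed column names. The input column
        // must already hold a vector of Single, as described under Parameters.
        var estimator = mlContext.Transforms.Text.LatentDirichletAllocation(
            outputColumnName: "Features",
            inputColumnName: "Tokens",
            numberOfTopics: 3,               // default is 100
            maximumNumberOfIterations: 100,  // default is 200
            resetRandomGenerator: true);     // default is false

        // The estimator only describes the transform; nothing is trained
        // until Fit is called on a pipeline that contains it.
        Console.WriteLine(estimator.GetType().Name);
    }
}
```

Parameters that are not named keep the defaults shown in the method signature.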

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class LatentDirichletAllocation
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<TextData>()
            {
                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                    "computes topic models." },

                new TextData(){ Text = "ML.NET's LatentDirichletAllocation API " +
                    "is the best for topic models." },

                new TextData(){ Text = "I like to eat broccoli and bananas." },
                new TextData(){ Text = "I eat bananas for breakfast." },
                new TextData(){ Text = "This car is expensive compared to last " +
                    "week's price." },

                new TextData(){ Text = "This car was $X last week." },
            };

            // Convert training data to IDataView.
            var dataview = mlContext.Data.LoadFromEnumerable(samples);

            // A pipeline for featurizing the text/string using the
            // LatentDirichletAllocation API. To be more accurate in computing the
            // LDA features, the pipeline first normalizes text and removes stop
            // words before passing tokens (the individual words, lower cased, with
            // common words removed) to LatentDirichletAllocation.
            var pipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText",
                "Text")
                .Append(mlContext.Transforms.Text.TokenizeIntoWords("Tokens",
                    "NormalizedText"))
                .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens"))
                .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
                .Append(mlContext.Transforms.Text.ProduceNgrams("Tokens"))
                .Append(mlContext.Transforms.Text.LatentDirichletAllocation(
                    "Features", "Tokens", numberOfTopics: 3));

            // Fit to data.
            var transformer = pipeline.Fit(dataview);

            // Create the prediction engine to get the LDA features extracted from
            // the text.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData,
                TransformedTextData>(transformer);

            // Convert the sample text into LDA features and print it.
            PrintLdaFeatures(predictionEngine.Predict(samples[0]));
            PrintLdaFeatures(predictionEngine.Predict(samples[1]));

            // Features obtained post-transformation.
            // For LatentDirichletAllocation, we specified numberOfTopics: 3. Hence
            // each prediction has been featurized as a vector of floats with length
            // 3.

            //  Topic1  Topic2  Topic3
            //  0.6364  0.2727  0.0909
            //  0.5455  0.1818  0.2727
        }

        private static void PrintLdaFeatures(TransformedTextData prediction)
        {
            for (int i = 0; i < prediction.Features.Length; i++)
                Console.Write($"{prediction.Features[i]:F4} ");
            Console.WriteLine();
        }

        private class TextData
        {
            public string Text { get; set; }
        }

        private class TransformedTextData : TextData
        {
            public float[] Features { get; set; }
        }
    }
}
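Each row of the sample output holds one weight per topic, and in the values printed above each row sums to 1 at four decimal places. A common follow-up is to pick the topic with the largest weight. The following is a minimal sketch of that step, not part of the ML.NET API; the TopicHelper class and DominantTopic method are hypothetical names introduced for illustration, and the input arrays are copied from the sample output rather than produced by a model.

```csharp
using System;

// Hypothetical helper, not part of ML.NET.
public static class TopicHelper
{
    // Returns the zero-based index of the largest weight in an LDA feature
    // vector. If several weights tie, the first one wins.
    public static int DominantTopic(float[] features)
    {
        if (features == null || features.Length == 0)
            throw new ArgumentException(
                "Feature vector must be non-empty.", nameof(features));

        int best = 0;
        for (int i = 1; i < features.Length; i++)
        {
            if (features[i] > features[best])
                best = i;
        }
        return best;
    }

    public static void Main()
    {
        // Values copied from the sample output of the example above.
        var first = new float[] { 0.6364f, 0.2727f, 0.0909f };
        var second = new float[] { 0.5455f, 0.1818f, 0.2727f };

        Console.WriteLine(DominantTopic(first));   // prints 0 (Topic1)
        Console.WriteLine(DominantTopic(second));  // prints 0 (Topic1)
    }
}
```

Both sample sentences are about the LatentDirichletAllocation API, so it is consistent that they share the same dominant topic in this output.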

URL: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.textcatalog.latentdirichletallocation?view=ml-dotnet-preview