TextCatalog.ProduceWordBags Method
Definition
- Namespace: Microsoft.ML
- Assembly: Microsoft.ML.Transforms.dll
- Package: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| Overload | Description |
|---|---|
| ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32) | Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName. |
| ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria) | Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName. |
| ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria) | Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName. |
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)
- Source:
- TextCatalog.cs
Create a WordBagEstimator, which maps the column specified in inputColumnName
to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, char termSeparator, char freqSeparator, string inputColumnName = default, int maximumNgramsCount = 10000000);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * char * char * string * int -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, termSeparator As Char, freqSeparator As Char, Optional inputColumnName As String = Nothing, Optional maximumNgramsCount As Integer = 10000000) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
This column's data type is a known-size vector of Single.
- termSeparator
- Char
Separator used to separate term/frequency pairs.
- freqSeparator
- Char
Separator used to separate each term from its frequency.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over a vector of text.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
Returns
- WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
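The sketch below shows one way this overload might be called. It assumes a project that references the Microsoft.ML package at a version that includes this overload, and it assumes each input string holds pre-counted terms such as "apple:2,banana:5", with ',' as termSeparator and ':' as freqSeparator. The exact parsing rules are not spelled out on this page; the class names, column names, and sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema: each row holds pre-counted terms, e.g. "apple:2,banana:5".
public class TermFrequencyInput
{
    public string Terms { get; set; }
}

public static class TermFrequencyBagExample
{
    // Fits the estimator on two hypothetical rows and returns the output vector length.
    public static int Run()
    {
        var mlContext = new MLContext(seed: 0);

        var samples = new List<TermFrequencyInput>
        {
            new TermFrequencyInput { Terms = "apple:2,banana:5" },
            new TermFrequencyInput { Terms = "banana:1,cherry:3" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Assumed format: ',' separates term/frequency pairs,
        // ':' separates each term from its frequency.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "Features",
            termSeparator: ',',
            freqSeparator: ':',
            inputColumnName: "Terms");

        IDataView transformed = estimator.Fit(data).Transform(data);

        int vectorLength = 0;
        // Each row of "Features" is a known-size vector of Single (float).
        foreach (VBuffer<float> row in transformed.GetColumn<VBuffer<float>>("Features"))
        {
            vectorLength = row.Length;
            Console.WriteLine(string.Join(" ", row.DenseValues()));
        }
        return vectorLength;
    }
}
```

Call TermFrequencyBagExample.Run() from your program's entry point; it prints each row as a dense vector and returns the vector length.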
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
- Source:
- TextCatalog.cs
Create a WordBagEstimator, which maps the column specified in inputColumnName
to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
This column's data type is a known-size vector of Single.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over a vector of text.
- ngramLength
- Int32
Length of the n-grams to extract. When useAllLengths is true, this is the maximum n-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus. The default is Tf.
Returns
- WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
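A minimal C# sketch of the single-column overload, assuming the Microsoft.ML package is referenced. It extracts unigrams and bigrams, then reads the n-gram dictionary back from the output column's slot names. SentenceInput, WordBagExample, the column names, and the sample sentences are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;

// Hypothetical input schema with a single free-text column.
public class SentenceInput
{
    public string Text { get; set; }
}

public static class WordBagExample
{
    // Returns the slot names (the n-gram dictionary) of the output column.
    public static string[] Run()
    {
        var mlContext = new MLContext(seed: 0);

        var samples = new List<SentenceInput>
        {
            new SentenceInput { Text = "the cat sat on the mat" },
            new SentenceInput { Text = "the dog sat on the log" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Unigrams and bigrams (useAllLengths: true), contiguous only (skipLength: 0),
        // weighted by term frequency.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnName: "Text",
            ngramLength: 2,
            skipLength: 0,
            useAllLengths: true,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        IDataView transformed = estimator.Fit(data).Transform(data);

        // The slot names of the vector column hold the n-gram for each position.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformed.Schema["BagOfWords"].GetSlotNames(ref slotNames);
        string[] names = slotNames.DenseValues().Select(n => n.ToString()).ToArray();

        // Print only the non-zero entries of each row as "ngram=value".
        foreach (VBuffer<float> row in transformed.GetColumn<VBuffer<float>>("BagOfWords"))
        {
            float[] values = row.DenseValues().ToArray();
            var nonZero = Enumerable.Range(0, values.Length)
                .Where(i => values[i] != 0)
                .Select(i => $"{names[i]}={values[i]}");
            Console.WriteLine(string.Join(", ", nonZero));
        }
        return names;
    }
}
```

Call WordBagExample.Run() from your entry point. Raising skipLength above 0 also admits n-grams that skip up to that many tokens, and passing WeightingCriteria.Idf or WeightingCriteria.TfIdf changes the weighting.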
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
- Source:
- TextCatalog.cs
Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames
to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnNames.
This column's data type is a known-size vector of Single.
- inputColumnNames
- String[]
Names of the columns to take the data from. This estimator operates over a vector of text.
- ngramLength
- Int32
Length of the n-grams to extract. When useAllLengths is true, this is the maximum n-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus. The default is Tf.
Returns
- WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
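For the multi-column overload, a similar sketch, again assuming a Microsoft.ML package reference; ReviewInput, MultiColumnWordBagExample, the column names, and the sample rows are hypothetical. Both text columns feed one BagOfWords output column. How the inputs are combined internally is not described on this page, so inspect the output's slot names if that matters for your model.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;

// Hypothetical input schema with two text columns.
public class ReviewInput
{
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class MultiColumnWordBagExample
{
    // Fits the estimator on two hypothetical rows and returns the output vector length.
    public static int Run()
    {
        var mlContext = new MLContext(seed: 0);

        var samples = new List<ReviewInput>
        {
            new ReviewInput { Title = "great phone", Body = "battery lasts all day" },
            new ReviewInput { Title = "poor phone", Body = "battery died fast" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Title and Body both feed one output column; single words only
        // (ngramLength: 1, useAllLengths: false), weighted by TF-IDF.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" },
            ngramLength: 1,
            useAllLengths: false,
            weighting: NgramExtractingEstimator.WeightingCriteria.TfIdf);

        IDataView transformed = estimator.Fit(data).Transform(data);

        int vectorLength = 0;
        foreach (VBuffer<float> row in transformed.GetColumn<VBuffer<float>>("BagOfWords"))
        {
            vectorLength = row.Length;
            Console.WriteLine(string.Join(" ", row.DenseValues()));
        }
        return vectorLength;
    }
}
```

Call MultiColumnWordBagExample.Run() from your entry point; each printed line is one row's weighted vector.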
