👁 Image
Tiktoken.Encodings.p50k 3.1.5

.NET 8.0 .NET Standard 2.0 .NET Framework 4.6.2

dotnet add package Tiktoken.Encodings.p50k --version 3.1.5

NuGet\Install-Package Tiktoken.Encodings.p50k -Version 3.1.5

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Tiktoken.Encodings.p50k" Version="3.1.5" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="Tiktoken.Encodings.p50k" Version="3.1.5" />
 

 Directory.Packages.props

<PackageReference Include="Tiktoken.Encodings.p50k" />
 

 Project file

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add Tiktoken.Encodings.p50k --version 3.1.5

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Tiktoken.Encodings.p50k, 3.1.5"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Tiktoken.Encodings.p50k@3.1.5

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=Tiktoken.Encodings.p50k&version=3.1.5
 

 Install as a Cake Addin

#tool nuget:?package=Tiktoken.Encodings.p50k&version=3.1.5
 

 Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Tiktoken

👁 Nuget package
👁 dotnet
👁 License: MIT
👁 Discord
👁 Throughput

One of the fastest BPE tokenizers in any language — the fastest in .NET, competitive with pure Rust implementations. Zero-allocation token counting, built-in multilingual cache, and up to 42x faster than other .NET tokenizers. We will be happy to accept any PR.

Implemented encodings

o200k_base
cl100k_base
r50k_base
p50k_base
p50k_edit

Usage

using Tiktoken;

var encoder = TikTokenEncoder.CreateForModel(Models.Gpt4o);
var tokens = encoder.Encode("hello world"); // [15339, 1917]
var text = encoder.Decode(tokens); // hello world
var numberOfTokens = encoder.CountTokens(text); // 2
var stringTokens = encoder.Explore(text); // ["hello", " world"]

// Alternative APIs:
var encoder = ModelToEncoder.For("gpt-4o");
var encoder = new Encoder(new O200KBase());

Load from HuggingFace tokenizer.json

The Tiktoken.Encodings.Tokenizer package enables loading any HuggingFace-format tokenizer.json file — supporting GPT-2, Llama 3, Qwen2, DeepSeek, and other BPE-based models.

using Tiktoken;
using Tiktoken.Encodings;

// From a local file
var encoding = TokenizerJsonLoader.FromFile("path/to/tokenizer.json");
var encoder = new Encoder(encoding);

// From a stream (HTTP responses, embedded resources)
using var stream = File.OpenRead("tokenizer.json");
var encoding = TokenizerJsonLoader.FromStream(stream);

// From a URL (e.g., HuggingFace Hub)
using var httpClient = new HttpClient();
var encoding = await TokenizerJsonLoader.FromUrlAsync(
 new Uri("https://huggingface.co/openai-community/gpt2/raw/main/tokenizer.json"),
 httpClient,
 name: "gpt2");

// Custom regex patterns (optional — auto-detected by default)
var encoding = TokenizerJsonLoader.FromFile("tokenizer.json", patterns: myPatterns);

Supported pre-tokenizer types:

ByteLevel — GPT-2 and similar models
Split with regex pattern — direct regex-based splitting
Sequence[Split, ByteLevel] — Llama 3, Qwen2, DeepSeek, and other modern models

Count message tokens (OpenAI chat format)

Count tokens for chat messages using OpenAI's official formula, including support for function/tool definitions:

using Tiktoken;

// Simple message counting
var messages = new List<ChatMessage>
{
 new("system", "You are a helpful assistant."),
 new("user", "What is the weather in Paris?"),
};
int count = TikTokenEncoder.CountMessageTokens("gpt-4o", messages);

// With tool/function definitions
var tools = new List<ChatFunction>
{
 new("get_weather", "Get the current weather", new List<FunctionParameter>
 {
 new("location", "string", "The city name", isRequired: true),
 new("unit", "string", "Temperature unit",
 enumValues: new[] { "celsius", "fahrenheit" }),
 }),
};
int countWithTools = TikTokenEncoder.CountMessageTokens("gpt-4o", messages, tools);

// Or use the Encoder instance directly
var encoder = ModelToEncoder.For("gpt-4o");
int toolTokens = encoder.CountToolTokens(tools);

Nested object parameters and array types are also supported:

new FunctionParameter("address", "object", "Mailing address", properties: new List<FunctionParameter>
{
 new("street", "string", "Street address", isRequired: true),
 new("city", "string", "City name", isRequired: true),
});

Custom encodings

Load encoding data from .tiktoken text files or .ttkb binary files:

using Tiktoken.Encodings;

// Load from file (auto-detects format by extension)
var ranks = EncodingLoader.LoadEncodingFromFile("my_encoding.ttkb");
var ranks = EncodingLoader.LoadEncodingFromFile("my_encoding.tiktoken");

// Load from binary byte array (e.g., from embedded resource or network)
byte[] binaryData = File.ReadAllBytes("my_encoding.ttkb");
var ranks = EncodingLoader.LoadEncodingFromBinaryData(binaryData);

// Convert text format to binary for faster loading
var textRanks = EncodingLoader.LoadEncodingFromFile("my_encoding.tiktoken");
using var output = File.Create("my_encoding.ttkb");
EncodingLoader.WriteEncodingToBinaryStream(output, textRanks);

The .ttkb binary format loads ~30% faster than .tiktoken text (no base64 decoding) and is 34% smaller. See for the format specification and conversion tools.

Benchmarks

Benchmarked on Apple M4 Max, .NET 10.0, o200k_base encoding. Tested with diverse inputs: short ASCII, multilingual (12 scripts + emoji), CJK-heavy, Python code, and long documents.

CountTokens — zero allocation, fastest in class

Input	SharpToken	TiktokenSharp	Microsoft.ML	Tiktoken	Throughput	Speedup
Hello, World! (13 B)	217 ns	164 ns	319 ns	88 ns	141 MiB/s	1.9-3.6x
Multilingual (382 B, 12 scripts)	14.7 us	9.5 us	5.1 us	1.1 us	339 MiB/s	4.7-13.6x
CJK-heavy (1,676 B, 6 scripts)	109.4 us	65.6 us	37.0 us	2.6 us	618 MiB/s	14.3-42.3x
Python code (879 B)	13.1 us	9.7 us	21.6 us	5.5 us	153 MiB/s	1.8-4.0x
Multilingual long (4,312 B)	283.1 us	155.7 us	71.0 us	9.0 us	458 MiB/s	7.9-31.6x
Bitcoin whitepaper (19,884 B)	400.3 us	255.4 us	321.3 us	105.1 us	180 MiB/s	2.4-3.8x

Zero allocation across all inputs (0 B). Tiktoken's advantage is most pronounced on multilingual/CJK text — up to 42x faster than competitors. Throughput on cached multilingual text reaches 618 MiB/s, competitive with the fastest Rust tokenizers.

Cache effect on CountTokens

Built-in token cache dramatically accelerates repeated non-ASCII patterns:

Input	No cache	Cached	Cache speedup
Hello, World! (13 B)	88 ns	86 ns	—
Multilingual (382 B)	5.4 us	1.1 us	4.9x
CJK-heavy (1,676 B)	33.7 us	2.6 us	13.1x
Python code (879 B)	5.6 us	5.5 us	—
Multilingual long (4,312 B)	78.0 us	9.0 us	8.6x
Bitcoin whitepaper (19,884 B)	122.7 us	104.9 us	1.2x

Cache has no effect on ASCII-dominant inputs (already on fast path). On multilingual/CJK text, cache provides 5-13x speedup by skipping UTF-8 conversion and BPE on subsequent calls. Cold-path performance was significantly improved by the O(n log n) min-heap BPE merge optimization.

Encode — returns token IDs

Input	SharpToken	TiktokenSharp	Microsoft.ML	Tiktoken	Throughput	Speedup
Hello, World! (13 B)	214 ns	163 ns	316 ns	109 ns	114 MiB/s	1.5-2.9x
Multilingual (382 B, 12 scripts)	14.5 us	9.4 us	5.2 us	1.3 us	287 MiB/s	4.1-11.4x
CJK-heavy (1,676 B, 6 scripts)	107.9 us	64.7 us	37.0 us	3.3 us	484 MiB/s	11.2-32.7x
Python code (879 B)	13.1 us	9.7 us	23.6 us	5.8 us	145 MiB/s	1.7-4.1x
Multilingual long (4,312 B)	276.4 us	151.3 us	70.7 us	10.9 us	376 MiB/s	6.5-25.2x
Bitcoin whitepaper (19,884 B)	366.1 us	245.5 us	317.7 us	111.8 us	170 MiB/s	2.2-3.3x

Same performance characteristics as CountTokens, with additional allocation for the output int[] array.

Construction — encoder initialization

Encoding	Time	Description
o200k_base	0.78 ms	GPT-4o (200K vocab, pre-computed hash table, lazy FastEncoder)
cl100k_base	0.46 ms	GPT-3.5/4 (100K vocab)

Encoder construction includes loading embedded binary data, building hash tables, and compiling regex. FastEncoder and Decoder dictionaries are lazy-initialized on first use only. Reuse Encoder instances across calls for best performance.

Cross-language context

All numbers below measured on Apple M4 Max with identical inputs and o200k_base encoding. See for full reports.

Implementation	Language	Encode Throughput	CountTokens Throughput	Notes
Tiktoken (cached)	.NET/C#	114-484 MiB/s	141-618 MiB/s	Zero-alloc counting; cache gives 5-13x on multilingual
Tiktoken (no cache)	.NET/C#	44-145 MiB/s	47-155 MiB/s	Cold/first-call with O(n log n) min-heap BPE merge
`tiktoken` v3	Rust	34-88 MiB/s	—	Pure Rust, arena-based
GitHub `bpe` v0.3	Rust	33-64 MiB/s	29-66 MiB/s	Aho-Corasick, O(n) worst case
OpenAI tiktoken 0.12	Python	7-20 MiB/s	—	Rust core, but Python FFI overhead

.NET Tiktoken's token cache makes it dramatically faster than native Rust on repeated/multilingual text — up to 7x faster than the fastest Rust tokenizer on CJK text. Even without the cache, .NET is competitive with or faster than both Rust crates on most inputs thanks to the O(n log n) min-heap BPE merge optimization.

You can view the full raw BenchmarkDotNet reports for each version .

CLI Tool

ttok is a standalone CLI for counting, encoding, decoding, and exploring tokens — powered by this library. NativeAOT-compiled for instant startup.

# Install (macOS)
brew install tryAGI/tap/ttok

# Install (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/tryAGI/Tiktoken/main/install.sh | sh

# Install (.NET global tool)
dotnet tool install -g Tiktoken.Cli

# Usage
echo "Hello world" | ttok # 3
ttok src/ --include "*.cs" # count tokens in files

See the full for all options and install methods.

Support

Priority place for bugs: https://github.com/tryAGI/LangChain/issues
Priority place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions
Discord: https://discord.gg/Ca2xhfBf3v

Product	Versions Compatible and additional computed target framework versions.
.NET	net5.0 net5.0 was computed. net5.0-windows net5.0-windows was computed. net6.0 net6.0 was computed. net6.0-android net6.0-android was computed. net6.0-ios net6.0-ios was computed. net6.0-maccatalyst net6.0-maccatalyst was computed. net6.0-macos net6.0-macos was computed. net6.0-tvos net6.0-tvos was computed. net6.0-windows net6.0-windows was computed. net7.0 net7.0 was computed. net7.0-android net7.0-android was computed. net7.0-ios net7.0-ios was computed. net7.0-maccatalyst net7.0-maccatalyst was computed. net7.0-macos net7.0-macos was computed. net7.0-tvos net7.0-tvos was computed. net7.0-windows net7.0-windows was computed. net8.0 net8.0 is compatible. net8.0-android net8.0-android was computed. net8.0-browser net8.0-browser was computed. net8.0-ios net8.0-ios was computed. net8.0-maccatalyst net8.0-maccatalyst was computed. net8.0-macos net8.0-macos was computed. net8.0-tvos net8.0-tvos was computed. net8.0-windows net8.0-windows was computed. net9.0 net9.0 is compatible. net9.0-android net9.0-android was computed. net9.0-browser net9.0-browser was computed. net9.0-ios net9.0-ios was computed. net9.0-maccatalyst net9.0-maccatalyst was computed. net9.0-macos net9.0-macos was computed. net9.0-tvos net9.0-tvos was computed. net9.0-windows net9.0-windows was computed. net10.0 net10.0 is compatible. net10.0-android net10.0-android was computed. net10.0-browser net10.0-browser was computed. net10.0-ios net10.0-ios was computed. net10.0-maccatalyst net10.0-maccatalyst was computed. net10.0-macos net10.0-macos was computed. net10.0-tvos net10.0-tvos was computed. net10.0-windows net10.0-windows was computed.
.NET Core	netcoreapp2.0 netcoreapp2.0 was computed. netcoreapp2.1 netcoreapp2.1 was computed. netcoreapp2.2 netcoreapp2.2 was computed. netcoreapp3.0 netcoreapp3.0 was computed. netcoreapp3.1 netcoreapp3.1 was computed.
.NET Standard	netstandard2.0 netstandard2.0 is compatible. netstandard2.1 netstandard2.1 is compatible.
.NET Framework	net461 net461 was computed. net462 net462 is compatible. net463 net463 was computed. net47 net47 was computed. net471 net471 was computed. net472 net472 was computed. net48 net48 was computed. net481 net481 was computed.
MonoAndroid	monoandroid monoandroid was computed.
MonoMac	monomac monomac was computed.
MonoTouch	monotouch monotouch was computed.
Tizen	tizen40 tizen40 was computed. tizen60 tizen60 was computed.
Xamarin.iOS	xamarinios xamarinios was computed.
Xamarin.Mac	xamarinmac xamarinmac was computed.
Xamarin.TVOS	xamarintvos xamarintvos was computed.
Xamarin.WatchOS	xamarinwatchos xamarinwatchos was computed.

Compatible target framework(s)

Included target framework(s) (in package)

Learn more about Target Frameworks and .NET Standard.

.NETFramework 4.6.2
- Tiktoken.Encodings.Abstractions (>= 3.1.5)
.NETStandard 2.0
- Tiktoken.Encodings.Abstractions (>= 3.1.5)
.NETStandard 2.1
- Tiktoken.Encodings.Abstractions (>= 3.1.5)
net10.0
- Tiktoken.Encodings.Abstractions (>= 3.1.5)
net8.0
- Tiktoken.Encodings.Abstractions (>= 3.1.5)
net9.0
- Tiktoken.Encodings.Abstractions (>= 3.1.5)

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Tiktoken.Encodings.p50k:

Package	Downloads
Tiktoken The fastest tokenizer for GPT-3.5 and GPT-4 inspired by Tiktoken.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
3.1.5	20,400	4/30/2026
3.1.5-alpha.0.6	118	4/8/2026
3.1.4	24,703	3/24/2026
3.1.3	475	3/24/2026
3.1.2	159	3/24/2026
3.1.2-alpha.0.1	76	3/24/2026
3.1.1	157	3/24/2026
3.1.1-alpha.0.1	78	3/24/2026
3.1.0	160	3/23/2026
3.1.0-rc.1.4	74	3/23/2026
3.1.0-rc.1.3	72	3/23/2026
3.1.0-rc.1.2	76	3/23/2026
3.1.0-rc.1.1	75	3/23/2026
3.1.0-rc.1	81	3/23/2026
3.0.1-alpha.0.4	69	3/23/2026
3.0.0	802	3/22/2026
2.2.0	2,594	11/15/2024
2.1.1	206	11/9/2024
2.1.0	234	11/9/2024
2.0.3	255	7/10/2024

Loading failed

URL: https://www.nuget.org/packages/Tiktoken.Encodings.p50k/

⇱ NuGet Gallery | Tiktoken.Encodings.p50k 3.1.5

👁 Image
Tiktoken.Encodings.p50k 3.1.5

Tiktoken

Implemented encodings

Usage

Load from HuggingFace tokenizer.json

Count message tokens (OpenAI chat format)

Custom encodings

Benchmarks

CountTokens — zero allocation, fastest in class

Cache effect on CountTokens

Encode — returns token IDs

Construction — encoder initialization

Cross-language context

CLI Tool

Support

.NETFramework 4.6.2

.NETStandard 2.0

.NETStandard 2.1

net10.0

net8.0

net9.0

NuGet packages (1)

GitHub repositories

URL: https://www.nuget.org/packages/Tiktoken.Encodings.p50k/

⇱ NuGet Gallery | Tiktoken.Encodings.p50k 3.1.5

👁 Image Tiktoken.Encodings.p50k 3.1.5

Tiktoken

Implemented encodings

Usage

Load from HuggingFace tokenizer.json

Count message tokens (OpenAI chat format)

Custom encodings

Benchmarks

CountTokens — zero allocation, fastest in class

Cache effect on CountTokens

Encode — returns token IDs

Construction — encoder initialization

Cross-language context

CLI Tool

Support

.NETFramework 4.6.2

.NETStandard 2.0

.NETStandard 2.1

net10.0

net8.0

net9.0

NuGet packages (1)

GitHub repositories

👁 Image
Tiktoken.Encodings.p50k 3.1.5