dotnet add package Pidgin --version 3.5.1
NuGet\Install-Package Pidgin -Version 3.5.1
<PackageReference Include="Pidgin" Version="3.5.1" />
Directory.Packages.props:
<PackageVersion Include="Pidgin" Version="3.5.1" />
Project file:
<PackageReference Include="Pidgin" />
paket add Pidgin --version 3.5.1
#r "nuget: Pidgin, 3.5.1"
#:package Pidgin@3.5.1
Install as a Cake Addin:
#addin nuget:?package=Pidgin&version=3.5.1
Install as a Cake Tool:
#tool nuget:?package=Pidgin&version=3.5.1
A lightweight, fast, and flexible parsing library for C#.
Pidgin is available on NuGet. API docs are hosted on my website.
There's a tutorial on using Pidgin to parse a subset of Prolog on my website.
Pidgin is a parser combinator library, a lightweight, high-level, declarative tool for constructing parsers. Parsers written with parser combinators look like a high-level specification of a language's grammar, but they're expressed within a general-purpose programming language and require no special tools to produce executable code. Parser combinators are more powerful than regular expressions - they can parse a larger class of languages - but simpler and easier to use than parser generators like ANTLR.
Pidgin's core type, Parser<TToken, T>, represents a procedure which consumes an input stream of TTokens, and may either fail with a parsing error or produce a T as output. You can think of it as:
delegate T? Parser<TToken, T>(IEnumerator<TToken> input);
In order to start building parsers we need to import two classes which contain factory methods: Parser and Parser<TToken>.
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>; // we'll be parsing strings - sequences of characters. For other applications (eg parsing binary file formats) TToken may be some other type (eg byte).
Now we can create some simple parsers. Any represents a parser which consumes a single character and returns that character.
Assert.AreEqual('a', Any.ParseOrThrow("a"));
Assert.AreEqual('b', Any.ParseOrThrow("b"));
Char, an alias for Token, consumes a particular character and returns that character. If it encounters some other character then it fails.
Parser<char, char> parser = Char('a');
Assert.AreEqual('a', parser.ParseOrThrow("a"));
Assert.Throws<ParseException>(() => parser.ParseOrThrow("b"));
Digit parses and returns a single digit character.
Assert.AreEqual('3', Digit.ParseOrThrow("3"));
Assert.Throws<ParseException>(() => Digit.ParseOrThrow("a"));
String parses and returns a particular string. If you give it input other than the string it was expecting it fails.
Parser<char, string> parser = String("foo");
Assert.AreEqual("foo", parser.ParseOrThrow("foo"));
Assert.Throws<ParseException>(() => parser.ParseOrThrow("bar"));
Return (and its synonym FromResult) never consumes any input, and just returns the given value. Likewise, Fail always fails without consuming any input.
Parser<char, int> parser = Return(3);
Assert.AreEqual(3, parser.ParseOrThrow("foo"));
Parser<char, int> parser2 = Fail<int>();
Assert.Throws<ParseException>(() => parser2.ParseOrThrow("bar"));
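The examples so far use ParseOrThrow, which raises a ParseException on failure. When failure is an expected outcome, the non-throwing Parse method may be more convenient. The following is a minimal sketch, assuming the result object returned by Parse exposes Success and Value properties (check the API docs for the version you are using):

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Parse returns a result object instead of throwing on failure.
var ok = Digit.Parse("7");
var bad = Digit.Parse("x");

Console.WriteLine(ok.Success);   // True
Console.WriteLine(ok.Value);     // 7
Console.WriteLine(bad.Success);  // False

// Value is only read on the successful result; reading it after a failure is not meaningful.
```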
The power of parser combinators is that you can build big parsers out of little ones. The simplest way to do this is using Then, which builds a new parser representing two parsers applied sequentially (discarding the result of the first).
Parser<char, string> parser1 = String("foo");
Parser<char, string> parser2 = String("bar");
Parser<char, string> sequencedParser = parser1.Then(parser2);
Assert.AreEqual("bar", sequencedParser.ParseOrThrow("foobar")); // "foo" got thrown away
Assert.Throws<ParseException>(() => sequencedParser.ParseOrThrow("food"));
Before throws away the second result, not the first.
Parser<char, string> parser1 = String("foo");
Parser<char, string> parser2 = String("bar");
Parser<char, string> sequencedParser = parser1.Before(parser2);
Assert.AreEqual("foo", sequencedParser.ParseOrThrow("foobar")); // "bar" got thrown away
Assert.Throws<ParseException>(() => sequencedParser.ParseOrThrow("food"));
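Then and Before can be chained to keep only the middle of three parsers. This is a small sketch; the "id=" prefix and ";" terminator are made up for illustration:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Then discards the "id=" prefix, Before discards the trailing ';',
// so only the digit in the middle is returned.
Parser<char, char> idField = String("id=").Then(Digit).Before(Char(';'));

char id = idField.ParseOrThrow("id=7;");
Console.WriteLine(id); // 7
```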
Map does a similar job, except it keeps both results and applies a transformation function to them. This is especially useful when you want your parser to return a custom data structure. (Map has overloads which operate on between one and eight parsers; the one-parser version also has a postfix synonym Select.)
Parser<char, string> parser1 = String("foo");
Parser<char, string> parser2 = String("bar");
Parser<char, string> sequencedParser = Map((foo, bar) => bar + foo, parser1, parser2);
Assert.AreEqual("barfoo", sequencedParser.ParseOrThrow("foobar"));
Assert.Throws<ParseException>(() => sequencedParser.ParseOrThrow("food"));
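To illustrate returning a structured value from Map, here is a sketch that parses a two-part version such as "3.5" into a System.Version. The single-digit number parser is an assumption made to keep the example short:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Convert a single digit character into its integer value.
Parser<char, int> number = Digit.Select(d => d - '0');

// Map runs the three parsers in sequence and combines their results.
// The '.' separator is parsed but ignored by the combining function.
Parser<char, Version> version = Map(
    (major, _, minor) => new Version(major, minor),
    number,
    Char('.'),
    number
);

Version v = version.ParseOrThrow("3.5");
Console.WriteLine(v); // 3.5
```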
Bind uses the result of a parser to choose the next parser. This enables parsing of context-sensitive languages. For example, here's a parser which parses any character repeated twice.
// parse any character, then parse a character matching the first character
Parser<char, char> parser = Any.Bind(c => Char(c));
Assert.AreEqual('a', parser.ParseOrThrow("aa"));
Assert.AreEqual('b', parser.ParseOrThrow("bb"));
Assert.Throws<ParseException>(() => parser.ParseOrThrow("ab"));
Pidgin parsers support LINQ query syntax. It may be easier to see what the above example does when it's written out using LINQ:
Parser<char, char> parser =
from c in Any
from c2 in Char(c)
select c2;
Parsers written like this look like a simple imperative script. "Run the Any parser and name its result c, then run Char(c) and name its result c2, then return c2."
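A slightly more practical context-sensitive case is a string literal that must be closed by the same quote character that opened it. The sketch below assumes the AnyCharExcept and ManyString helpers from Pidgin's Parser class; the quoted-string format itself is invented for illustration and has no escape sequences:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Parse an opening quote (either kind), then use Bind to build a parser
// that reads everything up to the matching closing quote.
Parser<char, string> quoted =
    Char('"').Or(Char('\''))
        .Bind(quote => AnyCharExcept(quote).ManyString().Before(Char(quote)));

string text = quoted.ParseOrThrow("'it is \"fine\"'");
Console.WriteLine(text); // it is "fine"
```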
Or represents a parser which can parse one of two alternatives. It runs the left parser first, and if it fails it tries the right parser.
Parser<char, string> parser = String("foo").Or(String("bar"));
Assert.AreEqual("foo", parser.ParseOrThrow("foo"));
Assert.AreEqual("bar", parser.ParseOrThrow("bar"));
Assert.Throws<ParseException>(() => parser.ParseOrThrow("baz"));
OneOf is equivalent to Or, except it takes a variable number of arguments. Here's a parser which is equivalent to the one using Or above:
Parser<char, string> parser = OneOf(String("foo"), String("bar"));
If one of Or or OneOf's component parsers fails after consuming input, the whole parser will fail.
Parser<char, string> parser = String("food").Or(String("foul"));
Assert.Throws<ParseException>(() => parser.ParseOrThrow("foul")); // why didn't it choose the second option?
What happened here? When a parser successfully parses a character from the input stream, it advances the input stream to the next character. Or only chooses the next alternative if the given parser fails without consuming any input; Pidgin does not perform any lookahead or backtracking by default. Backtracking is enabled using the Try function.
// apply Try to the first option, so we can return to the beginning if it fails
Parser<char, string> parser = Try(String("food")).Or(String("foul"));
Assert.AreEqual("foul", parser.ParseOrThrow("foul"));
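Backtracking is not the only option here. When two alternatives share a prefix, the prefix can be parsed once and the choice made afterwards, so that each alternative fails (if it fails) before consuming anything. A sketch of that approach, using only Map, String and Or:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Parse the shared "fo" prefix first, then choose between the two suffixes.
// "od" and "ul" differ in their first character, so the losing branch
// fails without consuming input and Or can move on to the next one.
Parser<char, string> parser = Map(
    (prefix, rest) => prefix + rest,
    String("fo"),
    String("od").Or(String("ul"))
);

string food = parser.ParseOrThrow("food");
string foul = parser.ParseOrThrow("foul");
Console.WriteLine(food); // food
Console.WriteLine(foul); // foul
```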
Almost any non-trivial programming language, markup language, or data interchange language will feature some sort of recursive structure. C# doesn't support recursive values: a recursive reference to a variable currently being initialised will return null. So we need some sort of deferred execution of recursive parsers, which Pidgin enables using the Rec combinator. Here's a simple parser which parses arbitrarily nested parentheses with a single digit inside them.
Parser<char, char> expr = null;
Parser<char, char> parenthesised = Char('(')
.Then(Rec(() => expr)) // using a lambda to (mutually) recursively refer to expr
.Before(Char(')'));
expr = Digit.Or(parenthesised);
Assert.AreEqual('1', expr.ParseOrThrow("1"));
Assert.AreEqual('1', expr.ParseOrThrow("(1)"));
Assert.AreEqual('1', expr.ParseOrThrow("(((1)))"));
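Rec also works when a parser refers directly to itself and computes a value on the way back out. As a sketch (the depth parser is a made-up example, not part of Pidgin), this counts how deeply a run of parentheses is nested:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Declared first so that the lambda passed to Rec can capture it.
Parser<char, int> depth = null!;

// Either "(" inner ")" with depth inner + 1, or nothing at all with depth 0.
depth = Char('(')
    .Then(Rec(() => depth))
    .Before(Char(')'))
    .Select(inner => inner + 1)
    .Or(Return(0));

int empty = depth.ParseOrThrow("");
int three = depth.ParseOrThrow("((()))");
Console.WriteLine(empty); // 0
Console.WriteLine(three); // 3
```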
However, Pidgin does not support left recursion. A parser must consume some input before making a recursive call. The following example will produce a stack overflow because a recursive call to arithmetic occurs before any input can be consumed by Digit or Char('+'):
Parser<char, int> arithmetic = null;
Parser<char, int> addExpr = Map(
(x, _, y) => x + y,
Rec(() => arithmetic),
Char('+'),
Rec(() => arithmetic)
);
arithmetic = addExpr.Or(Digit.Select(d => (int)char.GetNumericValue(d)));
arithmetic.Parse("2+2"); // stack overflow!
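A common way around this is to rewrite the left-recursive rule as repetition: parse one operand, then zero or more "+ operand" pairs. The sketch below assumes Pidgin's Many combinator, which collects repeated results into a sequence:

```csharp
using System;
using System.Linq;
using Pidgin;
using static Pidgin.Parser;

Parser<char, int> digit = Digit.Select(d => (int)char.GetNumericValue(d));

// sum := digit ('+' digit)*
// A digit is always consumed before anything repeats, so there is no left recursion.
Parser<char, int> sum = Map(
    (first, rest) => first + rest.Sum(),
    digit,
    Char('+').Then(digit).Many()
);

int four = sum.ParseOrThrow("2+2");
int six = sum.ParseOrThrow("1+2+3");
Console.WriteLine(four); // 4
Console.WriteLine(six);  // 6
```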
Another powerful element of this programming model is that you can write your own functions to compose parsers. Pidgin contains a large number of higher-level combinators, built from the primitives outlined above. For example, Between runs a parser surrounded by two others, keeping only the result of the central parser.
// An extension method must be declared static, inside a static class.
public static Parser<TToken, T> InBraces<TToken, T, U, V>(this Parser<TToken, T> parser, Parser<TToken, U> before, Parser<TToken, V> after)
=> before.Then(parser).Before(after);
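For comparison, the built-in Between combinator can be called directly on a parser. A brief sketch, assuming Between takes the opening parser first and the closing parser second:

```csharp
using System;
using Pidgin;
using static Pidgin.Parser;

// Keep the digit, discard the surrounding braces.
Parser<char, char> bracedDigit = Digit.Between(Char('{'), Char('}'));

char inner = bracedDigit.ParseOrThrow("{5}");
Console.WriteLine(inner); // 5
```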
Pidgin features operator-precedence parsing tools, for parsing expression grammars with associative infix operators. The ExpressionParser class builds a parser from a parser to parse a single expression term and a table of operators with rules to combine expressions.
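As a rough sketch of what this looks like, the following builds a parser for single-digit arithmetic with "*" binding tighter than "+". It assumes the Pidgin.Expression namespace, an ExpressionParser.Build overload taking a term parser and a table of operator rows in descending order of precedence, and the Operator.InfixL helper; consult the API docs to confirm the exact overloads in your version:

```csharp
using System;
using Pidgin;
using Pidgin.Expression;
using static Pidgin.Parser;

// A term is a single digit, converted to an int.
Parser<char, int> term = Digit.Select(d => (int)char.GetNumericValue(d));

// Each operator parser returns the function used to combine its two operands.
Parser<char, Func<int, int, int>> times =
    Char('*').Select<Func<int, int, int>>(_ => (x, y) => x * y);
Parser<char, Func<int, int, int>> plus =
    Char('+').Select<Func<int, int, int>>(_ => (x, y) => x + y);

// Rows are listed from highest to lowest precedence (assumption, see lead-in).
Parser<char, int> expr = ExpressionParser.Build(
    term,
    new[]
    {
        Operator.InfixL(times),
        Operator.InfixL(plus)
    }
);

int value = expr.ParseOrThrow("2+3*4");
Console.WriteLine(value); // 14
```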
Examples, such as parsing (a subset of) JSON and XML into document structures, can be found in the Pidgin.Examples project.
Why doesn't this code compile?
class Base {}
class Derived : Base {}
Parser<char, Base> p = Return(new Derived()); // Cannot implicitly convert type 'Pidgin.Parser<char, Derived>' to 'Pidgin.Parser<char, Base>'
This would be possible if Parser were defined as covariant in its second type parameter (ie interface Parser<TToken, out T>). For the purposes of efficiency, Pidgin parsers return a struct. Structs and classes aren't allowed to have variant type parameters (only interfaces and delegates); since a Pidgin parser's return value isn't variant, nor can the parser itself be.
In my experience, this crops up most frequently when returning a node of a syntax tree from a parser using Select. The least verbose way of rectifying this is to explicitly set Select's type parameter to the supertype:
Parser<char, Base> p = Any.Select<Base>(_ => new Derived()); // Select's function receives the parsed value, ignored here
Pidgin is designed to be fast and produce a minimum of garbage. A carefully written Pidgin parser can be competitive with a hand-written recursive descent parser. If you find that parsing is a bottleneck in your code, here are some tips for minimising the runtime of your parser.
LINQ query syntax is translated by the compiler into calls to SelectMany; however, for long queries the translation can allocate a large number of anonymous objects. This generates a lot of garbage; while those objects often won't survive the nursery, it's still preferable to avoid allocating them!
When consuming a streaming input such as a TextReader or an IEnumerable, Try buffers its input to enable backtracking, which can be expensive.
The Skip* parsers can be used when the result of parsing is not required. They typically run faster than their counterparts because they don't need to save the values generated.
Avoid Bind and SelectMany where possible. Many practical grammars are context-free and can therefore be written purely with Map. If you do have a context-sensitive grammar, it may make sense to parse it in a context-free fashion and then run a semantic checker over the result.
Sprache is another parser combinator library for C# and served as one of the sources of inspiration for Pidgin. Pidgin's API is somewhat similar to that of Sprache, but Pidgin aims to improve on Sprache in a number of ways.
FParsec is a parser combinator library for F# based on Parsec.
This is how Pidgin compares to other tools in terms of performance. The benches can be found in the Pidgin.Bench project.
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.14393.3384 (1607/AnniversaryUpdate/Redstone1)
Intel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
Frequency=3125000 Hz, Resolution=320.0000 ns, Timer=TSC
.NET Core SDK=3.1.100
[Host] : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), 64bit RyuJIT
ExpressionBench
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| LongInfixL_Pidgin | 625,148.8 ns | 3,015.040 ns | 2,672.7541 ns | 2.25 | 0.01 | - | - | - | 128 B |
| LongInfixR_Pidgin | 625,530.1 ns | 4,104.833 ns | 3,839.6633 ns | 2.25 | 0.02 | - | - | - | 128 B |
| LongInfixL_FParsec | 278,035.1 ns | 1,231.538 ns | 1,151.9816 ns | 1.00 | 0.00 | - | - | - | 200 B |
| LongInfixR_FParsec | 326,047.3 ns | 931.485 ns | 871.3119 ns | 1.17 | 0.01 | - | - | - | 200 B |
| ShortInfixL_Pidgin | 1,506.5 ns | 5.515 ns | 5.1590 ns | 2.67 | 0.01 | 0.0401 | - | - | 128 B |
| ShortInfixR_Pidgin | 1,636.6 ns | 6.882 ns | 5.7467 ns | 2.90 | 0.02 | 0.0401 | - | - | 128 B |
| ShortInfixL_FParsec | 564.1 ns | 1.894 ns | 1.6788 ns | 1.00 | 0.00 | 0.0629 | - | - | 200 B |
| ShortInfixR_FParsec | 567.7 ns | 1.200 ns | 0.9373 ns | 1.01 | 0.00 | 0.0629 | - | - | 200 B |
JsonBench
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| BigJson_Pidgin | 684.6 us | 2.888 us | 2.701 us | 1.00 | 0.00 | 33.2031 | - | - | 101.7 KB |
| BigJson_Sprache | 3,597.5 us | 17.595 us | 16.458 us | 5.25 | 0.03 | 1726.5625 | - | - | 5291.81 KB |
| BigJson_Superpower | 2,884.4 us | 6.504 us | 5.766 us | 4.21 | 0.02 | 296.8750 | - | - | 913.43 KB |
| BigJson_FParsec | 750.1 us | 3.516 us | 3.289 us | 1.10 | 0.01 | 111.3281 | - | - | 343.43 KB |
| LongJson_Pidgin | 517.5 us | 2.418 us | 2.261 us | 1.00 | 0.00 | 33.2031 | - | - | 104.25 KB |
| LongJson_Sprache | 2,858.5 us | 10.491 us | 9.300 us | 5.53 | 0.03 | 1390.6250 | - | - | 4269.33 KB |
| LongJson_Superpower | 2,348.1 us | 14.194 us | 13.277 us | 4.54 | 0.03 | 230.4688 | - | - | 706.79 KB |
| LongJson_FParsec | 642.5 us | 2.708 us | 2.533 us | 1.24 | 0.01 | 125.0000 | - | - | 384.3 KB |
| DeepJson_Pidgin | 399.3 us | 1.784 us | 1.582 us | 1.00 | 0.00 | 26.3672 | - | - | 82.24 KB |
| DeepJson_Sprache | 2,983.0 us | 42.512 us | 39.765 us | 7.46 | 0.09 | 761.7188 | 191.4063 | - | 2922.46 KB |
| DeepJson_FParsec | 701.8 us | 1.665 us | 1.557 us | 1.76 | 0.01 | 112.3047 | - | - | 344.43 KB |
| WideJson_Pidgin | 427.8 us | 1.619 us | 1.515 us | 1.00 | 0.00 | 15.6250 | - | - | 48.42 KB |
| WideJson_Sprache | 1,704.2 us | 9.246 us | 8.196 us | 3.98 | 0.02 | 900.3906 | - | - | 2763.22 KB |
| WideJson_Superpower | 1,494.6 us | 9.581 us | 8.962 us | 3.49 | 0.02 | 148.4375 | - | - | 459.74 KB |
| WideJson_FParsec | 379.5 us | 1.597 us | 1.494 us | 0.89 | 0.00 | 41.9922 | - | - | 129.02 KB |
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net7.0 is compatible. Computed: net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Showing the top 5 NuGet packages that depend on Pidgin:
| Package | Description |
|---|---|
| ActualChat.Api | Package Description |
| Blazor.ExtraDry.Core | Extensions to Blazor and DRY libraries to create Blazor client applications with a lower-code footprint. Use this Core library for server side and shared library common elements. |
| ZingzeuData.Shared | ZingzeuData.Shared |
| War3Net.CodeAnalysis | Helper methods for Pidgin parsers. |
| ODataQuery | Enables server-side filtering, sorting and pagination of any IQueryable<T> using OData syntax and without needing an EDM model. |
Showing the top 14 popular GitHub repositories that depend on Pidgin:
| Repository | Description |
|---|---|
| space-wizards/space-station-14 | A multiplayer game about paranoia and chaos on a space station. Remake of the cult-classic Space Station 13. |
| unitystation/unitystation | The original unitystation |
| space-wizards/RobustToolbox | Robust multiplayer game engine, used by Space Station 14 |
| sebastienros/parlot | Fast and lightweight parser creation tools |
| Jcparkyn/nodexr | Graphical regular expression editor |
| pdfforge/PDFCreator | PDFCreator - The free PDF Converter |
| SnowflakePowered/snowflake | :snowflake: :video_game: Emulator Frontend and SDK |
| space-syndicate/space-station-14 | 🚀 Build of the first Russian-language Space Station 14 server |
| Drake53/War3Net | The complete .NET toolkit for Warcraft III modding. |
| OpenNefia/OpenNefia | Moddable engine reimplementation of the Japanese roguelike Elona. |
| DeltaV-Station/Delta-v | A fork of Space Station 14, embracing a mixture of classic SS13 chaos and experimentation only possible with the new engine |
| surrealdb/surrealdb.net | SurrealDB SDK for .NET |
| teo-tsirpanis/Farkle | LALR parser combinators for C# and F#. |
| ss14Starlight/space-station-14 | An open source project aimed at creating unique mechanics and a pleasant game atmosphere in the game Space Station 14 |