VOOZH about

URL: https://man7.org/linux/man-pages/man3/pcre.3.html

⇱ pcre(3) - Linux manual page


man7.org > Linux > man-pages

Linux/UNIX system programming training


pcre(3) — Linux manual page

NAME | PLEASE TAKE NOTE | INTRODUCTION | SECURITY CONSIDERATIONS | USER DOCUMENTATION | AUTHOR | REVISION | COLOPHON

PCRE(3) Library Functions Manual PCRE(3)

NAME         top

 PCRE - Perl-compatible regular expressions (original API)

PLEASE TAKE NOTE         top

 This document relates to PCRE releases that use the original API,
 with library names libpcre, libpcre16, and libpcre32. January
 2015 saw the first release of a new API, known as PCRE2, with
 release numbers starting at 10.00 and library names libpcre2-8,
 libpcre2-16, and libpcre2-32. The old libraries (now called
 PCRE1) are now at end of life, and 8.45 is the final release. New
 projects are advised to use the new PCRE2 libraries.

INTRODUCTION         top

 The PCRE library is a set of functions that implement regular
 expression pattern matching using the same syntax and semantics
 as Perl, with just a few differences. Some features that appeared
 in Python and PCRE before they appeared in Perl are also
 available using the Python syntax, there is some support for one
 or two .NET and Oniguruma syntax items, and there is an option
 for requesting some minor changes that give better JavaScript
 compatibility.

 Starting with release 8.30, it is possible to compile two
 separate PCRE libraries: the original, which supports 8-bit
 character strings (including UTF-8 strings), and a second library
 that supports 16-bit character strings (including UTF-16
 strings). The build process allows either one or both to be
 built. The majority of the work to make this possible was done by
 Zoltan Herczeg.

 Starting with release 8.32 it is possible to compile a third
 separate PCRE library that supports 32-bit character strings
 (including UTF-32 strings). The build process allows any
 combination of the 8-, 16- and 32-bit libraries. The work to make
 this possible was done by Christian Persch.

 The three libraries contain identical sets of functions, except
 that the names in the 16-bit library start with pcre16_ instead
 of pcre_, and the names in the 32-bit library start with pcre32_
 instead of pcre_. To avoid over-complication and reduce the
 documentation maintenance load, most of the documentation
 describes the 8-bit library, with the differences for the 16-bit
 and 32-bit libraries described separately in the pcre16 and
 pcre32 pages. References to functions or structures of the form
 pcre[16|32]_xxx should be read as meaning "pcre_xxx when using
 the 8-bit library, pcre16_xxx when using the 16-bit library, or
 pcre32_xxx when using the 32-bit library".

 The current implementation of PCRE corresponds approximately with
 Perl 5.12, including support for UTF-8/16/32 encoded strings and
 Unicode general category properties. However, UTF-8/16/32 and
 Unicode support has to be explicitly enabled; it is not the
 default. The Unicode tables correspond to Unicode release 6.3.0.

 In addition to the Perl-compatible matching function, PCRE
 contains an alternative function that matches the same compiled
 patterns in a different way. In certain circumstances, the
 alternative function has some advantages. For a discussion of
 the two matching algorithms, see the pcrematching page.

 PCRE is written in C and released as a C library. A number of
 people have written wrappers and interfaces of various kinds. In
 particular, Google Inc. have provided a comprehensive C++
 wrapper for the 8-bit library. This is now included as part of
 the PCRE distribution. The pcrecpp page has details of this
 interface. Other people's contributions can be found in the
 Contrib directory at the primary FTP site, which is:

 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

 Details of exactly which Perl regular expression features are and
 are not supported by PCRE are given in separate documents. See
 the pcrepattern and pcrecompat pages. There is a syntax summary
 in the pcresyntax page.

 Some features of PCRE can be included, excluded, or changed when
 the library is built. The pcre_config() function makes it
 possible for a client to discover which features are available.
 The features themselves are described in the pcrebuild page.
 Documentation about building PCRE for various operating systems
 can be found in the README and NON-AUTOTOOLS_BUILD files in the
 source distribution.

 The libraries contains a number of undocumented internal
 functions and data tables that are used by more than one of the
 exported external functions, but which are not intended for use
 by external callers. Their names all begin with "_pcre_" or
 "_pcre16_" or "_pcre32_", which hopefully will not provoke any
 name clashes. In some environments, it is possible to control
 which external symbols are exported when a shared library is
 built, and in these cases the undocumented symbols are not
 exported.

SECURITY CONSIDERATIONS         top

 If you are using PCRE in a non-UTF application that permits users
 to supply arbitrary patterns for compilation, you should be aware
 of a feature that allows users to turn on UTF support from within
 a pattern, provided that PCRE was built with UTF support. For
 example, an 8-bit pattern that begins with "(*UTF8)" or "(*UTF)"
 turns on UTF-8 mode, which interprets patterns and subjects as
 strings of UTF-8 characters instead of individual 8-bit
 characters. This causes both the pattern and any data against
 which it is matched to be checked for UTF-8 validity. If the data
 string is very long, such a check might use sufficiently many
 resources as to cause your application to lose performance.

 One way of guarding against this possibility is to use the
 pcre_fullinfo() function to check the compiled pattern's options
 for UTF. Alternatively, from release 8.33, you can set the
 PCRE_NEVER_UTF option at compile time. This causes a compile time
 error if a pattern contains a UTF-setting sequence.

 If your application is one that supports UTF, be aware that
 validity checking can take time. If the same data string is to be
 matched many times, you can use the PCRE_NO_UTF[8|16|32]_CHECK
 option for the second and subsequent matches to save redundant
 checks.

 Another way that performance can be hit is by running a pattern
 that has a very large search tree against a string that will
 never match. Nested unlimited repeats in a pattern are a common
 example. PCRE provides some protection against this: see the
 PCRE_EXTRA_MATCH_LIMIT feature in the pcreapi page.

USER DOCUMENTATION         top

 The user documentation for PCRE comprises a number of different
 sections. In the "man" format, each of these is a separate "man
 page". In the HTML format, each is a separate page, linked from
 the index page. In the plain text format, the descriptions of the
 pcregrep and pcretest programs are in files called pcregrep.txt
 and pcretest.txt, respectively. The remaining sections, except
 for the pcredemo section (which is a program listing), are
 concatenated in pcre.txt, for ease of searching. The sections are
 as follows:

 pcre this document
 pcre-config show PCRE installation configuration
 information
 pcre16 details of the 16-bit library
 pcre32 details of the 32-bit library
 pcreapi details of PCRE's native C API
 pcrebuild building PCRE
 pcrecallout details of the callout feature
 pcrecompat discussion of Perl compatibility
 pcrecpp details of the C++ wrapper for the 8-bit
 library
 pcredemo a demonstration C program that uses PCRE
 pcregrep description of the pcregrep command (8-bit
 only)
 pcrejit discussion of the just-in-time optimization
 support
 pcrelimits details of size and other limits
 pcrematching discussion of the two matching algorithms
 pcrepartial details of the partial matching facility
 pcrepattern syntax and semantics of supported
 regular expressions
 pcreperform discussion of performance issues
 pcreposix the POSIX-compatible C API for the 8-bit
 library
 pcreprecompile details of saving and re-using precompiled
 patterns
 pcresample discussion of the pcredemo program
 pcrestack discussion of stack usage
 pcresyntax quick syntax reference
 pcretest description of the pcretest testing command
 pcreunicode discussion of Unicode and UTF-8/16/32 support

 In the "man" and HTML formats, there is also a short page for
 each C library function, listing its arguments and results.

AUTHOR         top

 Philip Hazel
 University Computing Service
 Cambridge CB2 3QH, England.

 Putting an actual email address here seems to have been a spam
 magnet, so I've taken it away. If you want to email me, use my
 two initials, followed by the two digits 10, at the domain
 cam.ac.uk.

REVISION         top

 Last updated: 14 June 2021
 Copyright (c) 1997-2021 University of Cambridge.

COLOPHON         top

 This page is part of the PCRE (Perl Compatible Regular
 Expressions) project. Information about the project can be found
 at ⟨http://www.pcre.org/⟩. If you have a bug report for this
 manual page, see
 ⟨http://bugs.exim.org/enter_bug.cgi?product=PCRE⟩. This page was
 obtained from the tarball pcre-8.45.tar.gz fetched from
 ⟨ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/⟩ on
 2021-08-27. If you discover any rendering problems in this HTML
 version of the page, or you believe there is a better or more up-
 to-date source for the page, or you have corrections or
 improvements to the information in this COLOPHON (which is not
 part of the original manual page), send a mail to
 man-pages@man7.org

PCRE 8.45 14 June 2021 PCRE(3)

Pages that refer to this page: grep(1)pcre-config(1)pcretest(1)pcrepattern(3)pcresyntax(3)sefcontext_compile(8)



HTML rendering created 2021-08-27 by Michael Kerrisk, author of The Linux Programming Interface, maintainer of the Linux man-pages project.

For details of in-depth Linux/UNIX system programming training courses that I teach, look here.

Hosting by jambit GmbH.

👁 Cover of TLPI