[ANNOUNCE] haproxy-2.8.0

Willy Tarreau Wed, 31 May 2023 08:15:45 -0700

Hi,

HAProxy 2.8.0 was released on 2023/05/31. It added 27 new commits
after version 2.8-dev13.
Only a few minor issues were addressed this time, the rest was
mostly doc polishing and cleanups. 2.8 is entering LTS status and will
be supported till 2028-Q2, and 2.9-dev0 was just created to pursue the
development, with an expected release around end of November this year.

Let's try to summarize the changes from 37 participants in the 1382
commits that were merged since 2.7.0 from a high level perspective:

- Lua/Mailers: there's now a full-Lua implementation of the mailers
 subsystem. It's provided as a Lua script (examples/lua/mailers.lua)
 which relies on the new internal event notification API. As such the
 script subscribes to server state change events and emits mails when
 the defined criteria are matched. It continues to rely on the
 "mailers" section, but being a Lua script, it's totally customizable.
 You can imagine changing the contents, changing the notification
 conditions, sending to multiple destinations etc. With this change, the
 internal Lua view of the servers was made fully dynamic so that added
 or removed servers are always seen in their current state. In fact the
 new event notification API goes way beyond this but better read the Lua
 API documentation to know more. The next step will be to completely
 deprecate the old Mailers subsystem in 2.9 and 3.0 and to remove it in
 3.1.
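
As a rough sketch of how the pieces above fit together, the fragment
below loads the Lua script and keeps relying on a regular "mailers"
section. The section name, addresses, server and the installation path
of the script are placeholders picked for illustration, not values
taken from this release.

```haproxy
global
    # path is an assumption: point it to wherever examples/lua/mailers.lua
    # from the source tree was installed on your system
    lua-load /usr/share/haproxy/examples/lua/mailers.lua

mailers alert_mailers
    mailer smtp1 192.0.2.10:25

backend app
    # the usual email-alert settings keep designating the mailers section
    email-alert mailers alert_mailers
    email-alert from haproxy@example.com
    email-alert to ops@example.com
    email-alert level notice
    server app1 192.0.2.20:8080 check
```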

- HTTP/2 is advertised by default in ALPN on TLS listeners. It was about
 time, 5 years have passed since it was introduced, it's been enabled by
 default in clear text as an HTTP/1 upgrade for 4 years, yet some users
 do not know how to enable it. From now on, ALPN defaults to "h2,http/1.1"
 on TCP and "h3" on QUIC so that these protocol versions work by default.
 It's still possible to set/reset the ALPN to disable them of course. The
 old concern some users were having about window sizes was addressed by
 having a setting for each side (front vs back).
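
To illustrate the new defaults, here is a minimal sketch with one
listener relying on the default ALPN and another one forcing HTTP/1.1
only, plus the per-side HTTP/2 window settings. Certificate paths,
ports and window sizes are example values only.

```haproxy
global
    # per-side HTTP/2 initial window sizes, in bytes (example values)
    tune.h2.fe.initial-window-size 65536
    tune.h2.be.initial-window-size 1048576

frontend www
    mode http
    # no "alpn" keyword here: 2.8 advertises "h2,http/1.1" by default
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    # explicit ALPN to keep this listener on HTTP/1.1 only
    bind :8443 ssl crt /etc/haproxy/certs/site.pem alpn http/1.1
```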

- Threading: thread groups are now usable by default by "bind" lines
 without requiring to replicate these lines once per thread group. This
 means that by default a bind line is bound to all threads, regardless
 of the number of groups (up to 64 groups of 64 threads or 4096 threads
 total). As such it becomes possible to enable multiple groups on a large
 system to benefit from all the processing power available if you're
 running heavy rules, Lua, compression, SSL or whatever. We still default
 to a single NUMA node because the cases where it brings solid benefits
 are not frequent enough, compared to the cost of having more listening
 sockets. Note that on systems with non-uniform L3 caches like AMD EPYC,
 this can bring important performance gains with only one setting in the
 config. We noticed a doubling of the request rate on a 24-core EPYC 74F3
 by enabling 8 groups instead of the default 1, to map to the L3 cache
 topology. The maximum tested so far was 224 threads with 4 & 8 groups on
 a dual-socket intel Sapphire Rapids system. That was blazingly fast :-)
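
A minimal sketch of what enabling several groups can look like is shown
below. The thread and group counts are assumptions for a 48-thread
machine and must be adapted to the real topology. Pinning each group to
a given cache domain would additionally require "cpu-map", which is not
shown here.

```haproxy
global
    # 48 threads spread over 8 thread groups; both values are assumptions
    # to adapt to the machine's actual core and L3 cache layout
    nbthread 48
    thread-groups 8

frontend www
    mode http
    # a single bind line is now served by the threads of all groups
    bind :80
```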

- SSL: there are quite a bunch of updates on the SSL front in this release:
 - it's possible to adjust the signature algorithms to improve
 interoperability with some other TLSv1.2/1.3 clients. These
 algorithms are used to sign the ephemeral keys used during the
 handshake. Changing these algorithms is useful for buggy clients
 that negotiate algorithms they don't support, though the usage is
 very specific. It's also possible to adjust this parameter for
 Client Authentication.

 - SSL handshake failure logs now dump the OpenSSL error string by
 default. No need to configure an error-log-format anymore to show
 details on the handshake error. It can be helpful to debug SSL
 problems (e.g. you'll now see "tlsv1 alert unknown ca" instead
 of just "SSL handshake failure").

 - OCSP: in 2.8 the OCSP responses for certificates can be automatically
 updated by a background task (by default every 5 minutes) so that it is
 no longer necessary to feed them over the CLI from an external script.
 Of course, this requires that your load balancers have outgoing HTTP
 access. This is enabled in crt-list files by adding "ocsp-update on" on
 the certificate's line. All this is observable on the CLI via
 "show ssl ocsp-updates" and "show ssl ocsp-response".
 
 - LetsEncrypt: there's an acme.sh script in admin/acme.sh that can be used
 with your existing deployments (pull request for upstream still pending).
 It will permit to handle the renewal of LE certificates in stateless mode
 with no hassle (no need to proxy to a local port anymore).

 - OpenSSL: version 3.1 is now supported. It's less slow than 3.0 but still
 significantly slower than 1.1.1, though it might be usable for most users
 with low enough traffic.

 - wolfSSL: we've worked quite a bit with the wolfSSL team to make sure
 their latest version works well with HAProxy. As expected with such
 type of integration, there have been some rough edges at the beginning
 but we've now reached a point where their current release (5.6.0) works
 for simple setups, and their latest development branch (some PRs still
 under review) covers most of HAProxy's features. We're sufficiently
 confident in the fact that the last adjustments to be made will be in
 the lib (we're still working hand-in-hand with them to polish everything)
 and that the HAProxy side will not change for this. That's particularly
 important because it means that as new wolfSSL releases appear
 in the next few weeks/months, stable HAProxy 2.8 releases will continue
 to work with them, or maybe even work better. From our testing, there are
 two nice aspects of this lib compared to OpenSSL:
 - it's fast and scales really well on multi-processor machines
 (2.5 times OpenSSL 3.1's performance on a 24-core machine)
 - it natively supports QUIC

 For these two reasons alone we do expect to encounter it increasingly
 frequently as users start to migrate from distros based on OpenSSL
 1.1.1 to distros based on 3.0 with no option to rollback to 1.1.1
 after they discover they need to multiply the number of LBs by 4 just
 to compensate for design flaws in a security library.
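
For the signature algorithms and OCSP auto-update items of the SSL list
above, a hedged sketch could look like the following. The algorithm
lists, file names and host name are illustrative assumptions. The
crt-list content is shown as comments since it lives in a separate
file.

```haproxy
frontend www
    mode http
    # "sigalgs" restricts the signature algorithms used for the handshake,
    # "client-sigalgs" does the same for client certificate authentication;
    # both take a list in the OpenSSL signature algorithms syntax
    bind :443 ssl crt-list /etc/haproxy/crt-list.txt sigalgs ECDSA+SHA256:RSA+SHA256 client-sigalgs ECDSA+SHA256:RSA+SHA256

# Contents of the hypothetical /etc/haproxy/crt-list.txt, one certificate
# per line, with automatic OCSP updates enabled on this one:
#
#   /etc/haproxy/certs/site.pem [ocsp-update on] www.example.com
```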

- QUIC: it has been running almost flawlessly for a year on haproxy.org,
 and totally flawlessly over the last 6 months. We also owe @Tristan971
 a huge kudos for deploying it live on significantly more traffic, and
 reporting countless issues. The internal architecture experienced the
 last few changes that we estimated were necessary, and we're confident
 that it's in a totally maintainable form now. Does it mean it's totally
 free of bugs ? Of course not, but in my opinion it reached the same
 level of stability as H2 had in 2.0 or 2.2, which is already pretty
 good. At this point we're only aware of a case which affects a small
 but non-negligible percentage of users' response time for Tristan,
 without being able to reproduce it out of his infrastructure. We're
 still on it of course, but despite this minor glitch we now consider
 it production-ready, which means that we're not seeing a good reason
 to stay away from it now if it brings benefits to your web site (e.g.
 visitors over lossy networks etc). For sure the SSL dependencies are
 still a constraint for the vast majority of those relying on OpenSSL,
 but with 3.0's performance ruined, even non-QUIC users have to rebuild
 anyway, so OpenSSL is no longer a QUIC-only problem nowadays. What 2.8
 brings to QUIC is a lot of stuff (mostly backported to 2.7), the
 support for reloads by default, and a global kill-switch to disable
 it entirely in case of doubt or trouble, or just to confirm whether
 an observed issue comes from it.
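
As an illustration, the fragment below sketches a frontend listening on
both TCP and QUIC, with the global kill-switch left commented out. It
assumes a build with QUIC support and a compatible SSL library. The
certificate path and the Alt-Svc lifetime are example values.

```haproxy
global
    # uncomment to disable QUIC everywhere without touching bind lines
    # no-quic

frontend www
    mode http
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    # QUIC listener on UDP port 443; ALPN defaults to "h3" in 2.8
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem
    # advertise HTTP/3 to clients that arrived over TCP
    http-after-response add-header alt-svc 'h3=":443"; ma=900'
```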

- Stick-tables: the maximum number of parallel stick-counters used to be
 set at build time (default 3). Now it can be changed in the configuration
 using tune.stick-counters in the global section.
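
For instance, assuming a setup that needs six counters at once (the
value is arbitrary), the setting goes in the global section:

```haproxy
global
    # allow up to 6 stick-counters to be tracked at the same time by a
    # connection or a request (the build-time default is 3)
    tune.stick-counters 6
```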

- HTTP compression: now HTTP request body can be compressed. This is
 useful when you deal with many POSTs and your origin servers are on
 a different hosting area that makes your traffic pass over paid links!
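
A minimal sketch of request body compression towards a remote origin
might look like this. The backend name, server address, algorithm and
content types are assumptions, and the build must include a compression
library for the algorithm to be available.

```haproxy
backend remote_origin
    mode http
    # compress in both directions: request bodies and responses
    compression direction both
    # request side: algorithm and eligible content types
    compression algo-req gzip
    compression type-req application/json text/plain
    # response side
    compression algo-res gzip
    compression type-res text/html text/plain application/json
    server origin1 198.51.100.10:80
```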

- HTTP "Forwarded" header field (RFC7239): this header that aims at
 replacing X-forwarded-for and friends is now supported, in input and
 output. It means we can complement it with certain parts (host, by,
 by_port, for, for_port). The benefit of using this one instead of the
 other is not always obvious, until you start to mix different products
 in your edge access and figure that they don't all add the same set of
 headers, and that for the application to figure which instance goes
 with which one, it's a nightmare. "Forwarded" conveys an ordered list
 of items so the ordering becomes as easy as it was when dealing with
 X-forwarded-for alone.
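
To give an idea of both directions, the sketch below validates an
incoming header with one of the rfc7239 converters and emits a new one
with a chosen set of parameters. The backend name, server address and
the choice to reject malformed headers are illustrative assumptions.

```haproxy
backend app
    mode http
    # input side: reject requests carrying a malformed Forwarded header
    acl has_fwd   req.hdr(forwarded) -m found
    acl valid_fwd req.hdr(forwarded),rfc7239_is_valid
    http-request deny deny_status 400 if has_fwd !valid_fwd
    # output side: add a Forwarded header with these parameters
    option forwarded proto host by by_port for
    server app1 192.0.2.20:8080
```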

- JWT now supports the RSA-PSS algorithm
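
As a sketch of how this could be used, the rules below only accept
bearer tokens signed with PS256 (RSA-PSS with SHA-256) and verify them
against a public key. The frontend name, port and key path are
placeholders.

```haproxy
frontend api
    mode http
    bind :8080
    http-request set-var(txn.bearer) http_auth_bearer
    http-request set-var(txn.jwt_alg) var(txn.bearer),jwt_header_query('$.alg')
    # only accept tokens announcing the RSA-PSS + SHA-256 algorithm
    http-request deny unless { var(txn.jwt_alg) -m str "PS256" }
    # jwt_verify returns 1 when the signature is valid
    http-request deny unless { var(txn.bearer),jwt_verify(txn.jwt_alg,"/etc/haproxy/jwt/pubkey.pem") 1 }
```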

- There are a few reliability improvements:
 - Lua now has a burst-timeout setting which controls how long it can
 run a loop in non-yieldable context (e.g. converter function) and it
 will abort past this delay

 - binding errors faced during a reload could sometimes fail to resume
 on the old process (e.g. UNIX sockets). Now the mechanism was made
 more reliable, with the new process taking more care of old sockets
 until it manages to bind everything, and being able to roll them back
 entirely on error.

 - new metrics in show info to report the number of config warnings,
 the boot time and the number of times the global maxconn was reached.

 - the internal clock now wraps 20s after the boot, and not just every
 49.7 days. This makes sure that developers have a better chance of
 facing clock-wrapping related bugs before they hit your production.
 And it worked, we found something like 8 of them, most likely all
 in fact.

 - the internal connection handling was revisited so that low-level
 errors are more accurately reported through the layers. There should
 be fewer cases where some termination codes will be reported for a
 different condition when errors arrive together.
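
For the Lua burst timeout mentioned in the list above, a minimal
example, with an arbitrarily chosen value, would be:

```haproxy
global
    # abort Lua code that runs for more than 500 ms without finishing
    # or yielding (value in milliseconds, chosen as an example)
    tune.lua.burst-timeout 500
```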

- There were some performance improvements as well:
 - those mixing short and long connections might end up with unequal thread
 loads because incoming connections assigned to the least loaded thread
 could be off after short connections are gone and long ones are left on
 only some threads. A new queue load balancing algorithm "fair" resolves
 this by applying a round-robin to the threads.

 - rings used by traces are being used increasingly as a debugging aid by
 both users and developers. They're now much faster (2-3x). The support
 for the "trace" keyword in the global section is still marked
 experimental because some forthcoming changes are envisioned for 2.9
 to almost completely remove the locking, and it may slightly affect the
 on-disk format for file-backed rings.

 - sometimes an old stopping process making heavy use of stick-tables
 could consume insane amounts of CPU almost entirely spent in the libc's
 malloc_trim() function (or in free/malloc due to locking contention).
 This was addressed and stick-table memory releasing on stopping will
 now happen in small, almost unnoticeable batches.
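
The "fair" algorithm from the first item above is selected in the
global section. A minimal sketch:

```haproxy
global
    # distribute incoming connections over the threads in round-robin
    # instead of picking the least loaded thread
    tune.listener.multi-queue fair
```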

- We know that users love troubleshooting tools (developers do as well),
 so here's some new stuff to play with:
 - "show quic" is to QUIC what "netstat" or "ss" are to TCP. It also
 supports a detailed format.

 - "show fd" can now filter on certain types (e.g. dump front sockets
 only, or UNIX sockets only)

 - H2 traces can at last show the received HTTP headers!

 - the CLI supports the process' uptime in the prompt. There's little
 use for this except for those who want to instantly spot when their
 LBs have rebooted (or failed to).

 - thread dumps in the panic output and "show activity" are now unlimited
 in length. That was becoming critical with buffers filling around 60
 threads...

 - crashes when facing a bogus condition ("BUG_ON") will now produce an
 "illegal instruction" instead of "segmentation fault" on architectures
 supporting this (i386, x86_64, arm64 for now). This will improve the
 ability to diagnose what happened and the quality of bug reports.

- There were a few updates to the configuration (cpu-map now supports
 commas, http-after-response supports more actions, sc-add-gpc() to
 increment a GPC by a fixed value, ability to ignore case when fetching
 a request parameter, httpclient supports disabling resolvers, enabled()
 preprocessor macro to enable config blocks only when features are
 supported)
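
As an example of the enabled() macro, the block below only emits
different startup messages depending on whether SO_REUSEPORT is usable
at run time. The messages are placeholders, and any configuration
statements could be placed in the branches instead.

```haproxy
.if enabled(REUSEPORT)
    .notice "SO_REUSEPORT is enabled at run time"
.else
    .warning "SO_REUSEPORT is not enabled at run time"
.endif
```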

- There are also a few unlikely but possibly breaking changes:
 - option httpclose in the frontend no longer triggers a close in the
 backend and conversely.

 - fixed typo in "show info" ("TotalSplicdedBytesOut" is now properly
 spelled "TotalSplicedBytesOut"). Only affects the CLI, not Prometheus.

 - ALPN as mentioned above is now presented by default in HTTP to enable
 HTTP/2 over TCP+SSL and HTTP/3 over QUIC.

- For packagers, the build system is more flexible now with every single
 build option supporting its own CFLAGS and LDFLAGS (e.g. convenient when
 trying to force to use a static version of a lib).

And as usual, this summary doesn't do justice to all those having worked
hard on invisible things to make all this possible, nor those who spend a
lot of time helping users who report issues and ask for help, and those
who take the time to report cleanly documented issues as well! Thanks to
them for their efforts!

Please find the usual URLs below :
 Site index : https://www.haproxy.org/
 Documentation : https://docs.haproxy.org/
 Wiki : https://github.com/haproxy/wiki/wiki
 Discourse : https://discourse.haproxy.org/
 Slack channel : https://slack.haproxy.org/
 Issue tracker : https://github.com/haproxy/haproxy/issues
 Sources : https://www.haproxy.org/download/2.8/src/
 Git repository : https://git.haproxy.org/git/haproxy-2.8.git/
 Git Web browsing : https://git.haproxy.org/?p=haproxy-2.8.git
 Changelog : https://www.haproxy.org/download/2.8/src/CHANGELOG
 Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
 Pending bugs : https://www.haproxy.org/l/pending-bugs
 Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
 Code reports : https://www.haproxy.org/l/code-reports
 Latest builds : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog since 2.8-dev13:
Amaury Denoyelle (7):
 CLEANUP: mux-quic: remove unneeded fields in qcc
 MINOR: mux-quic: remove nb_streams from qcc
 MINOR: quic: fix stats naming for flow control BLOCKED frames
 BUG/MEDIUM: mux-quic: only set EOI on FIN
 DOC: quic: remove experimental status for QUIC
 CLEANUP: mux-quic: rename functions for mux_ops
 CLEANUP: mux-quic: rename internal functions

Aurelien DARRAGON (2):
 BUILD: init: print rlim_cur as regular integer
 DOC: config: fix rfc7239 converter examples

Christopher Faulet (2):
 MINOR: compression: Improve the way Vary header is added
 DOC: config: Fix bind/server/peer documentation in the peers section

Frédéric Lécaille (1):
 MINOR: quic: Add QUIC connection statistical counters values to "show quic"

Patrick Hemmer (1):
 MINOR: init: pre-allocate kernel data structures on init

William Lallemand (2):
 DOC: install: add details about WolfSSL
 DOC: install: specify the minimum openssl version recommended

Willy Tarreau (10):
 BUILD: makefile: search for SSL_INC/wolfssl before SSL_INC
 BUG/MEDIUM: threads: fix a tiny race in thread_isolate()
 BUG/MINOR: mux-h2: refresh the idle_timer when the mux is empty
 BUILD: Makefile: use -pthread not -lpthread when threads are enabled
 CLEANUP: doc: remove 21 totally obsolete docs
 DOC: install: mention the common strict-aliasing warning on older compilers
 DOC: install: clarify a few points on the wolfSSL build method
 EXAMPLES: update the basic-config-edge file for 2.8
 MINOR: quic/cli: clarify the "show quic" help message
 MINOR: version: mention that it's LTS now.

eaglegai (2):
 BUG/MINOR: ssl_sock: add check for ha_meth
 BUG/MINOR: thread: add a check for pthread_create
