![]() |
VOOZH | about |
...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
boost::hash is an enhanced implementation of the
hash function object specified by
C++11 as std::hash. It is the default hash function for
Boost.Unordered,
Boost.Intrusive's unordered associative
containers, Boost.MultiIndex's hash
indices, and Boost.Bimap's unordered_set_of.
Out of the box, boost::hash supports
standard integral types (integers, character types, and bool);
standard floating point types (float, double, long double);
pointers (to objects and to functions, but not pointers to members)
and nullptr;
enumeration types;
C arrays;
std::complex;
tuple-like types, such as std::pair, std::tuple, and user-defined
types that specialize std::tuple_size and provide get<I>;
sequence-like types, both standard and user-defined (sequence-like types
have begin() and end() member functions returning iterators);
unordered sequences, standard or user-defined (sequences for which the hash
value does not depend on the element order, such as std::unordered_set and
std::unordered_map);
described structs and classes — ones that have been annotated with the
BOOST_DESCRIBE_STRUCT or BOOST_DESCRIBE_CLASS macros from
Boost.Describe;
std::unique_ptr, std::shared_ptr;
std::type_index;
std::error_code, std::error_condition;
std::optional;
std::variant, std::monostate.
boost::hash is extensible; it’s possible for a user-defined type X to make
iself hashable via boost::hash<X> by defining an appropriate overload of the
function hash_value. Many, if not most, Boost types already contain the
necessary support.
boost::hash meets the requirements for std::hash specified in the C++11
standard, namely, that for two different input values their corresponding hash
values are either guaranteed to be distinct, or the probability of their being
the same (a hash collision) is small. Standard unordered containers, and the
hash-based Boost containers, are designed to work well with such hash functions.
boost::hash does not meet the stronger requirements often placed on hash
functions in a more general context. In particular, the hash function is not
cryptographic, is not collision-resistant against a determined adversary, and
does not necessarily possess good "avalanche" properties; that is, small
(single bit) perturbations in the input do not necessarily result in large
(half bits changing) perturbations in the output.
In particular, boost::hash has traditionally been the identity function for
all integral types that fit into std::size_t, because this guarantees lack of
collisions and is as fast as possible.
Added the hash_is_avalanching trait class.
C++03 is no longer supported.
Added an overload of hash_value for std::nullptr_t.
Added is_tuple_like and an overload of hash_value for
tuple-like types.
Changed string hashing to use
mulxp1_hash,
improving both quality and speed. This changes the hash values
for string-like types (ranges of char, signed char,
unsigned char, std::byte, char8_t).
Major update.
The specializations of boost::hash have been removed; it now
always calls hash_value.
Support for BOOST_HASH_NO_EXTENSIONS has been removed. The
extensions are always enabled.
All standard containers are now supported. This includes
std::forward_list and the unordered associative containers.
User-defined containers (types that have begin() and end()
member functions that return iterators) are now supported out
of the box.
Described structs and classes (those annotated with
BOOST_DESCRIBE_STRUCT or BOOST_DESCRIBE_CLASS) are now
supported out of the box.
hash_combine has been improved. This changes the hash values
of composite (container and tuple) types and of scalar types
bigger than size_t.
The performance (and quality, as a result of the above change)
of string hashing has been improved. boost::hash for strings
now passes SMHasher in 64 bit mode.
The documentation has been substantially revised to reflect the changes.
Fixed hash_combine so that its behavior no longer depends
on whether size_t is the exact same type as boost::uint64_t
(which wasn’t the case on macOS). This changes the hash values
of composite (container and tuple) types on macOS.
When using a Boost container such as
Boost.Unordered, you don’t need to do
anything to use boost::hash as it’s the default. To find out how to use
a user-defined type, read the section on extending boost::hash
for user types.
If you wish to use boost::hash with the standard unordered associative
containers, pass it as a template parameter:
std::unordered_multiset<int, boost::hash<int> >
set_of_ints;
std::unordered_set<std::pair<int, int>, boost::hash<std::pair<int, int> > >
set_of_pairs;
std::unordered_map<int, std::string, boost::hash<int> > map_int_to_string;
To use boost::hash directly, create an instance and call it as a function:
#include <boost/container_hash/hash.hpp>
int main()
{
boost::hash<std::string> string_hash;
std::size_t h = string_hash("Hash me");
}
or alternatively:
#include <boost/container_hash/hash.hpp>
int main()
{
std::size_t h = boost::hash<std::string>()("Hash me");
}
For an example of generic use, here is a function to generate a vector containing the hashes of the elements of a container:
template <class Container>
std::vector<std::size_t> get_hashes(Container const& x)
{
std::vector<std::size_t> hashes;
std::transform(x.begin(), x.end(), std::back_inserter(hashes),
boost::hash<typename Container::value_type>());
return hashes;
}
boost::hash is implemented by calling the function hash_value. The
namespace isn’t specified so that it can detect overloads via argument
dependant lookup. So if there is a free function hash_value in the same
namespace as a user type, it will get called.
If you have a structure library::book, where each book is uniquely defined
by its member id:
namespace library
{
struct book
{
int id;
std::string author;
std::string title;
// ....
};
bool operator==(book const& a, book const& b)
{
return a.id == b.id;
}
}
Then all you would need to do is write the function library::hash_value:
namespace library
{
std::size_t hash_value(book const& b)
{
boost::hash<int> hasher;
return hasher(b.id);
}
}
And you can now use boost::hash with book:
library::book knife(3458, "Zane Grey", "The Hash Knife Outfit");
library::book dandelion(1354, "Paul J. Shanley",
"Hash & Dandelion Greens");
boost::hash<library::book> book_hasher;
std::size_t knife_hash_value = book_hasher(knife);
// If std::unordered_set is available:
std::unordered_set<library::book, boost::hash<library::book> > books;
books.insert(knife);
books.insert(library::book(2443, "Lindgren, Torgny", "Hash"));
books.insert(library::book(1953, "Snyder, Bernadette M.",
"Heavenly Hash: A Tasty Mix of a Mother's Meditations"));
assert(books.find(knife) != books.end());
assert(books.find(dandelion) == books.end());
The full example can be found in examples/books.hpp and examples/books.cpp.
|
Tip
|
When writing a hash function, first look at how the equality function
works. Objects that are equal must generate the same hash value. When objects
are not equal they should generate different hash values. In this object
equality was based just on id so the hash function only hashes id. If it
was based on the object’s name and author then the hash function should take
them into account (how to do this is discussed in the next section).
|
Say you have a point class, representing a two dimensional location:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
bool operator==(point const& other) const
{
return x == other.x && y == other.y;
}
};
and you wish to use it as the key for an unordered_map. You need to
customise the hash for this structure. To do this we need to combine the
hash values for x and y. The function boost::hash_combine is supplied
for this purpose:
class point
{
...
friend std::size_t hash_value(point const& p)
{
std::size_t seed = 0;
boost::hash_combine(seed, p.x);
boost::hash_combine(seed, p.y);
return seed;
}
...
};
Calls to hash_combine incrementally build the hash from the different
members of point, it can be repeatedly called for any number of elements.
It calls hash_value on the supplied element, and combines it with the seed.
Full code for this example is at examples/point.cpp.
Note that when using boost::hash_combine the order of the calls matters.
std::size_t seed = 0;
boost::hash_combine(seed, 1);
boost::hash_combine(seed, 2);
and
std::size_t seed = 0;
boost::hash_combine(seed, 2);
boost::hash_combine(seed, 1);
result in a different values in seed.
To calculate the hash of an iterator range you can use boost::hash_range:
std::vector<std::string> some_strings;
std::size_t hash = boost::hash_range(some_strings.begin(), some_strings.end());
Since hash_range works by repeatedly invoking hash_combine on the elements
of the range, the hash value will also be dependent on the element order.
If you are calculating a hash value for a range where the order of the data
doesn’t matter, such as unordered_set, you can use
boost::hash_unordered_range instead.
std::unordered_set<std::string> set;
std::size_t hash = boost::hash_unordered_range(set.begin(), set.end());
When writing template classes, you might not want to include the main
hash.hpp header as it’s quite an expensive include that brings in a lot of
other headers, so instead you can include the
<boost/container_hash/hash_fwd.hpp> header which forward declares
boost::hash, boost::hash_combine, boost::hash_range, and
boost::hash_unordered_range. You’ll need to include the main header before
instantiating boost::hash. When using a container that uses boost::hash it
should do that for you, so your type will work fine with the Boost hash
containers. There’s an example of this in
examples/template.hpp and
examples/template.cpp.
To avoid including even hash_fwd.hpp - which still requires the contents
of Boost.ContainerHash to be physically present - you are allowed to copy the
declarations from hash_fwd.hpp (and only those) directly into your own
header. This is a special exception guaranteed by the library; in general,
you can’t declare library functions, Boost or otherwise, without risk of
breakage in a subsequent release.
Let’s look at our point class again:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
};
If you’re using C++14 or above, a much easier way to add
support for boost::hash to point is by using
Boost.Describe (and
get an automatic definition of operator== for free):
#include <boost/describe/class.hpp>
#include <boost/describe/operators.hpp>
class point
{
int x;
int y;
BOOST_DESCRIBE_CLASS(point, (), (), (), (x, y))
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
};
using boost::describe::operators::operator==;
using boost::describe::operators::operator!=;
(Full code for this example is at examples/point2.cpp.)
Since the point class has been annotated with BOOST_DESCRIBE_CLASS,
the library can enumerate its members (and base classes) and automatically
synthesize the appropriate hash_value overload for it, without us needing
to do so.
This header contains forward declarations for the library primitives. These declarations are guaranteed to be relatively stable, that is, best effort will be expended on their not changing from release to release, allowing their verbatim copy into user headers that do not wish to physically depend on Boost.ContainerHash.
namespace boost
{
namespace container_hash
{
template<class T> struct is_range;
template<class T> struct is_contiguous_range;
template<class T> struct is_unordered_range;
template<class T> struct is_described_class;
template<class T> struct is_tuple_like;
} // namespace container_hash
template<class T> struct hash;
template<class T> void hash_combine( std::size_t& seed, T const& v );
template<class It> void hash_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_range( It first, It last );
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_unordered_range( It first, It last );
template<class Hash> struct hash_is_avalanching;
} // namespace boost
Defines boost::hash, and helper functions.
namespace boost
{
template<class T> struct hash;
template<class T> void hash_combine( std::size_t& seed, T const& v );
template<class It> void hash_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_range( It first, It last );
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
template<class It> std::size_t hash_unordered_range( It first, It last );
// Enabled only when T is an integral type
template<class T>
std::size_t hash_value( T v );
// Enabled only when T is an enumeration type
template<class T>
std::size_t hash_value( T v );
// Enabled only when T is a floating point type
template<class T>
std::size_t hash_value( T v );
template<class T>
std::size_t hash_value( T* const& v );
template<class T, std::size_t N>
std::size_t hash_value( T const (&v)[N] );
template<class T>
std::size_t hash_value( std::complex<T> const& v );
template<class A, class B>
std::size_t hash_value( std::pair<A, B> const& v );
// Enabled only when container_hash::is_tuple_like<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_contiguous_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_unordered_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
// Enabled only when container_hash::is_described_class<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
template<class T>
std::size_t hash_value( std::shared_ptr<T> const& v );
template<class T, class D>
std::size_t hash_value( std::unique_ptr<T, D> const& v );
std::size_t hash_value( std::type_index const& v );
std::size_t hash_value( std::error_code const& v );
std::size_t hash_value( std::error_condition const& v );
template<class T>
std::size_t hash_value( std::optional<T> const& v );
std::size_t hash_value( std::monostate v );
template<class... T>
std::size_t hash_value( std::variant<T...> const& v );
} // namespace boost
template<class T> struct hash
{
std::size_t operator()( T const& v ) const;
};
std::size_t operator()( T const& v ) const;
hash_value(v).
Only throws if hash_value(v) throws.
The call to hash_value is unqualified, so that user-supplied
overloads will be found via argument dependent lookup.
template<class T> void hash_combine( std::size_t& seed, T const& v );
Called repeatedly to incrementally create a hash value from several variables.
Updates seed with a new hash value generated by
deterministically combining it with the result of boost::hash<T>()(v).
Only throws if boost::hash<T>()(v) throws. On exception,
seed is not updated.
Equivalent to seed = combine(seed, boost::hash<T>()(v)),
where combine(s, v) is a mixing function that takes two arguments of
type std::size_t and returns std::size_t, with the following desirable
properties:
For a constant s, when v takes all possible size_t values,
combine(s, v) should also take all possible size_t values, producing
a sequence that is close to random; that is, it should be a random
permutation.
This guarantees that for a given seed, combine does not introduce
hash collisions when none were produced by boost::hash<T>(v); that is,
it does not lose information from the input. It also implies that
combine(s, v), as a function of v, has good avalanche properties;
that is, small (e.g. single bit) perturbations in the input v lead to
large perturbations in the return value (half of the output bits changing,
on average).
For two different seeds s1 and s2, combine(s1, v) and
combine(s2, v), treated as functions of v, should produce two
different random permutations.
combine(0, 0) should not be 0. Since a common initial value of seed
is zero, combine(0, 0) == 0 would imply that applying hash_combine on
any sequence of zeroes, regardless of length, will produce zero. This is
undesirable, as it would lead to e.g. std::vector<int>() and
std::vector<int>(4) to have the same hash value.
The current implementation uses the function mix(s + 0x9e3779b9 + v) as
combine(s, v), where mix(x) is a high quality mixing function that is a
bijection over the std::size_t values, of the form
x ^= x >> k1;
x *= m1;
x ^= x >> k2;
x *= m2;
x ^= x >> k3;
where the constants k1, k2, k3, m1, m2 are suitably chosen.
Note that mix(0) is 0. This is why we add the arbitrary constant
0x9e3779b9 to meet the third requirement above.
template<class It> void hash_range( std::size_t& seed, It first, It last );
When typename std::iterator_traits<It>::value_type is not char, signed char,
unsigned char, std::byte, or char8_t,
for( ; first != last; ++first )
{
boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first );
}
Otherwise, bytes from [first, last) are coalesced and hashed in an
unspecified manner. This is done in order to improve performance when hashing
strings.
For chars, the current implementation uses
mulxp1_hash when std::size_t is
64 bit, and mulxp1_hash32 when it’s 32 bit.
template<class It> std::size_t hash_range( It first, It last );
size_t seed = 0;
boost::hash_range( seed, first, last );
return seed;
template<class It> void hash_unordered_range( std::size_t& seed, It first, It last );
Updates seed with the values of
boost::hash<typename std::iterator_traits<It>::value_type>()(*i)
for each i in [first, last), such that the order of elements does
not affect the final result.
template<class It> std::size_t hash_unordered_range( It first, It last );
size_t seed = 0;
boost::hash_unordered_range( seed, first, last );
return seed;
// Enabled only when T is an integral type
template<class T>
std::size_t hash_value( T v );
When the value of v fits into std::size_t, when T is an unsigned type,
or into ssize_t, when T is a signed type, static_cast<std::size_t>(v).
Otherwise, an unspecified value obtained by mixing the value bits of v.
// Enabled only when T is an enumeration type
template<class T>
std::size_t hash_value( T v );
static_cast<std::size_t>(v).
hash_value(std::to_underlying(v)) would be better, but C++03
compatibility mandates the current implementation.
// Enabled only when T is a floating point type
template<class T>
std::size_t hash_value( T v );
An unspecified value obtained by mixing the value bits of v.
When sizeof(v) <= sizeof(std::size_t), the bits of v are returned
as-is (except in the case of -0.0, which is treated as +0.0).
template<class T>
std::size_t hash_value( T* const& v );
An unspecified value derived from reinterpret_cast<std::uintptr_t>(v).
template<class T, std::size_t N>
std::size_t hash_value( T const (&v)[N] );
boost::hash_range( v, v + N ).
template<class T>
std::size_t hash_value( std::complex<T> const& v );
An unspecified value derived from boost::hash<T>()(v.real()) and
boost::hash<T>()(v.imag()) such that, if v.imag() == 0, the value
is equal to boost::hash<T>()(v.real()).
A more straightforward implementation would just have used hash_combine
on v.real() and v.imag(), but the historical guarantee that real-valued
complex numbers should match the hash value of their real part precludes it.
This guarantee may be dropped in a future release, as it’s of questionable utility.
template<class A, class B>
std::size_t hash_value( std::pair<A, B> const& v );
std::size_t seed = 0;
boost::hash_combine( seed, v.first );
boost::hash_combine( seed, v.second );
return seed;
// Enabled only when container_hash::is_tuple_like<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
std::size_t seed = 0;
using std::get;
boost::hash_combine( seed, get<0>(v) );
boost::hash_combine( seed, get<1>(v) );
// ...
boost::hash_combine( seed, get<N-1>(v) );
return seed;
where N is std::tuple_size<T>::value.
This overload is only enabled when
container_hash::is_range<T>::value is false.
// Enabled only when container_hash::is_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
boost::hash_range( v.begin(), v.end() ).
This overload is only enabled when
container_hash::is_contiguous_range<T>::value and
container_hash::is_unordered_range<T>::value are both false.
It handles all standard containers that aren’t contiguous or unordered, such
as std::deque, std::list, std::set, std::map.
// Enabled only when container_hash::is_contiguous_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
boost::hash_range( v.data(), v.data() + v.size() ).
This overload handles all standard contiguous containers, such as
std::string, std::vector, std::array, std::string_view.
// Enabled only when container_hash::is_unordered_range<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
boost::hash_unordered_range( v.begin(), v.end() ).
This overload handles the standard unordered containers, such as
std::unordered_set and std::unordered_map.
// Enabled only when container_hash::is_described_class<T>::value is true
template<class T>
std::size_t hash_value( T const& v );
std::size_t seed = 0;
boost::hash_combine( seed, b1 );
boost::hash_combine( seed, b2 );
// ...
boost::hash_combine( seed, bM );
boost::hash_combine( seed, m1 );
boost::hash_combine( seed, m2 );
// ...
boost::hash_combine( seed, mN );
return seed;
where bi are the bases of v and mi are its members.
template<class T>
std::size_t hash_value( std::shared_ptr<T> const& v );
template<class T, class D>
std::size_t hash_value( std::unique_ptr<T, D> const& v );
boost::hash<T*>( v.get() ).
std::size_t hash_value( std::type_index const& v );
v.hash_code().
std::size_t hash_value( std::error_code const& v );
std::size_t hash_value( std::error_condition const& v );
std::size_t seed = 0;
boost::hash_combine( seed, v.value() );
boost::hash_combine( seed, &v.category() );
return seed;
template<class T>
std::size_t hash_value( std::optional<T> const& v );
For a disengaged v, an unspecified constant value; otherwise,
boost::hash<T>()( *v ).
std::size_t hash_value( std::monostate v );
An unspecified constant value.
template<class... T>
std::size_t hash_value( std::variant<T...> const& v );
std::size_t seed = 0;
boost::hash_combine( seed, v.index() );
boost::hash_combine( seed, x );
return seed;
where x is the currently contained value in v.
std::bad_variant_access when v.valueless_by_exception() is true.
Defines the trait boost::hash_is_avalanching.
namespace boost
{
template<class Hash> struct hash_is_avalanching;
} // namespace boost
template<class Hash> struct hash_is_avalanching
{
static constexpr bool value = /* see below */;
};
hash_is_avalanching<Hash>::value is:
false if Hash::is_avalanching is not present,
Hash::is_avalanching::value if this is present and convertible at compile time to a bool,
true if Hash::is_avalanching is void (this usage is deprecated),
ill-formed otherwise.
A hash function is said to have the avalanching property if small changes
in the input translate to large changes in the returned hash code
—ideally, flipping one bit in the representation of the input value results
in each bit of the hash code flipping with probability 50%. Libraries
such as Boost.Unordered consult this trait
to determine if the supplied hash function is of high quality.
boost::hash for std::basic_string<Ch> and std::basic_string_view<Ch>
has this trait set to true when Ch is an integral type (this includes
std::string and std::string_view, among others).
Users can set this trait for a particular Hash type by:
Inserting the nested is_avalanching typedef in the class definition
if they have access to its source code.
Writing a specialization of boost::hash_is_avalanching
for Hash.
Note that usage of this trait is not restricted to hash functions produced with Boost.ContainerHash.
Defines the trait boost::container_hash::is_range.
namespace boost
{
namespace container_hash
{
template<class T> struct is_range;
} // namespace container_hash
} // namespace boost
template<class T> struct is_range
{
static constexpr bool value = /* see below */;
};
is_range<T>::value is true when, for a const value x of type
T, x.begin() and x.end() return iterators of the same type
It (such that std::iterator_traits<It> is a valid specialization.)
Users are allowed to specialize is_range for their types if the
default behavior does not deduce the correct value.
Defines the trait boost::container_hash::is_contiguous_range.
namespace boost
{
namespace container_hash
{
template<class T> struct is_contiguous_range;
} // namespace container_hash
} // namespace boost
template<class T> struct is_contiguous_range
{
static constexpr bool value = /* see below */;
};
is_contiguous_range<T>::value is true when is_range<T>::value is
true and when, for a const value x of type T, x.data() returns
a pointer to a type that matches the value_type of the iterator returned
by x.begin() and x.end(), and x.size() returns a value of an integral
type.
Users are allowed to specialize is_contiguous_range for their types
if the default behavior does not deduce the correct value.
Defines the trait boost::container_hash::is_unordered_range.
namespace boost
{
namespace container_hash
{
template<class T> struct is_unordered_range;
} // namespace container_hash
} // namespace boost
template<class T> struct is_unordered_range
{
static constexpr bool value = /* see below */;
};
is_unordered_range<T>::value is true when is_range<T>::value is
true and when T::hasher is a valid type.
Users are allowed to specialize is_unordered_range for their types
if the default behavior does not deduce the correct value.
Defines the trait boost::container_hash::is_described_class.
namespace boost
{
namespace container_hash
{
template<class T> struct is_described_class;
} // namespace container_hash
} // namespace boost
template<class T> struct is_described_class
{
static constexpr bool value = /* see below */;
};
is_described_class<T>::value is true when
boost::describe::has_describe_bases<T>::value is true,
boost::describe::has_describe_members<T>::value is true, and
T is not a union.
Users are allowed to specialize is_described_class for their types
if the default behavior does not deduce the correct value.
Defines the trait boost::container_hash::is_tuple_like.
namespace boost
{
namespace container_hash
{
template<class T> struct is_tuple_like;
} // namespace container_hash
} // namespace boost
template<class T> struct is_tuple_like
{
static constexpr bool value = /* see below */;
};
is_tuple_like<T>::value is true when std::tuple_size<T>::value
is valid.
Users are allowed to specialize is_tuple_like for their types
if the default behavior does not deduce the correct value.
Many hash functions strive to have little correlation between the input and output values. They attempt to uniformally distribute the output values for very similar inputs. This hash function makes no such attempt. In fact, for integers, the result of the hash function is often just the input value. So similar but different input values will often result in similar but different output values. This means that it is not appropriate as a general hash function. For example, a hash table may discard bits from the hash function resulting in likely collisions, or might have poor collision resolution when hash values are clustered together. In such cases this hash function will perform poorly.
But the standard has no such requirement for the hash function, it just requires that the hashes of two different values are unlikely to collide. Containers or algorithms designed to work with the standard hash function will have to be implemented to work well when the hash function’s output is correlated to its input. Since they are paying that cost a higher quality hash function would be wasteful.
The way one customizes the standard std::hash function object for user
types is via a specialization. boost::hash chooses a different mechanism — an overload of a free function hash_value in the user namespace that is
found via argument-dependent lookup.
Both approaches have their pros and cons. Specializing the function object
is stricter in that it only applies to the exact type, and not to derived
or convertible types. Defining a function, on the other hand, is easier
and more convenient, as it can be done directly in the type definition as
an inline friend.
The fact that overloads can be invoked via conversions did cause issues in
an earlier iteration of the library that defined hash_value for all
integral types separately, including bool. Especially under C++03,
which doesn’t have explicit conversion operators, some types were
convertible to bool to allow their being tested in e.g. if statements,
which caused them to hash to 0 or 1, rarely what one expects or wants.
This, however, was fixed by declaring the built-in hash_value overloads
to be templates constrained on e.g. std::is_integral or its moral
equivalent. This causes types convertible to an integral to no longer
match, avoiding the problem.
In general, the library does not promise that the hash values will stay
the same from release to release (otherwise improvements would be
impossible). However, historically values have been quite stable. Before
release 1.81, the previous changes have been in 1.56 (a better
hash_combine) and 1.78 (macOS-specific change to hash_combine.)
Code should generally not depend on specific hash values, but for those
willing to take the risk of occasional breaks due to hash value changes,
the library now has a test that checks hash values for a number of types
against reference values (test/hash_reference_values.cpp),
whose version history
can be used as a rough guide to when hash values have changed, and for what
types.
The initial implementation of the library was based on Issue 6.18 of the
Library Extension Technical Report Issues List
(pages 63-67) which proposed the following implementation of hash_combine:
template<class T>
void hash_combine(size_t & seed, T const & v)
{
seed ^= hash_value(v) + (seed << 6) + (seed >> 2);
}
taken from the paper "Methods for Identifying Versioned and Plagiarised Documents" by Timothy C. Hoad and Justin Zobel.
During the Boost formal review, Dave Harris pointed out that this suffers
from the so-called "zero trap"; if seed is initially 0, and all the
inputs are 0 (or hash to 0), seed remains 0 no matter how many input
values are combined.
This is an undesirable property, because it causes containers of zeroes to have a zero hash value regardless of their sizes.
To fix this, the arbitrary constant 0x9e3779b9 (the golden ratio in a
32 bit fixed point representation) was added to the computation, yielding
template<class T>
void hash_combine(size_t & seed, T const & v)
{
seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
This is what shipped in Boost 1.33, the first release containing the library.
This function was a reasonable compromise between quality and speed for its
time, when the input consisted of chars, but it’s less suitable for
combining arbitrary size_t inputs.
In Boost 1.56, it was replaced by functions derived from Austin Appleby’s MurmurHash2 hash function round.
In Boost 1.81, it was changed again — to the equivalent of
mix(seed + 0x9e3779b9 + hash_value(v)), where mix(x) is a high quality
mixing function that is a bijection over the size_t values, of the form
x ^= x >> k1;
x *= m1;
x ^= x >> k2;
x *= m2;
x ^= x >> k3;
This type of mixing function was originally devised by Austin Appleby as the "final mix" part of his MurmurHash3 hash function. He used
x ^= x >> 33;
x *= 0xff51afd7ed558ccd;
x ^= x >> 33;
x *= 0xc4ceb9fe1a85ec53;
x ^= x >> 33;
as the 64 bit function fmix64 and
x ^= x >> 16;
x *= 0x85ebca6b;
x ^= x >> 13;
x *= 0xc2b2ae35;
x ^= x >> 16;
as the 32 bit function fmix32.
Several improvements of the 64 bit function have been subsequently proposed, by David Stafford, Pelle Evensen, and Jon Maiga. We currently use Jon Maiga’s function
x ^= x >> 32;
x *= 0xe9846af9b1a615d;
x ^= x >> 32;
x *= 0xe9846af9b1a615d;
x ^= x >> 28;
Under 32 bit, we use a mixing function proposed by "TheIronBorn" in a Github issue in the repository of Hash Prospector by Chris Wellons:
x ^= x >> 16;
x *= 0x21f0aaad;
x ^= x >> 15;
x *= 0x735a2d97;
x ^= x >> 15;
With this improved hash_combine, boost::hash for strings now passes the
SMHasher test suite by Austin Appleby
(for a 64 bit size_t).
The traditional implementation of hash_range(seed, first, last) has been
for( ; first != last; ++first )
{
boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first );
}
(the explicit template parameter is needed to support iterators with proxy
return types such as std::vector<bool>::iterator.)
This is logical, consistent and straightforward. In the common case where
typename std::iterator_traits<It>::value_type is char — which it is
in the common case of boost::hash<std::string> — this however leaves a
lot of performance on the table, because processing each char individually
is much less efficient than processing several in bulk.
In Boost 1.81, hash_range was changed to process elements of type char,
signed char, unsigned char, std::byte, or char8_t, four of a time.
A uint32_t is composed from first[0] to first[3], and that uint32_t
is fed to hash_combine.
In Boost 1.82, hash_range for these types was changed to use
mulxp1_hash. This improves both
quality and speed of string hashing.
Note that hash_range has also traditionally guaranteed that the same element
sequence yields the same hash value regardless of the iterator type. This
property remains valid after the changes to char range hashing. hash_range,
applied to the char sequence { 'a', 'b', 'c' }, results in the same value
whether the sequence comes from char[3], std::string, std::deque<char>,
or std::list<char>.
A Proposal to Add Hash Tables to the Standard Library
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2003/n1456.html
The hash table proposal explains much of the design. The hash function object is discussed in Section D.
The C++ Standard Library Technical Report
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
Contains the hash function specification in section 6.3.2.
Library Extension Technical Report Issues List
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
The library implements the extension described in Issue 6.18, pages 63-67.
Methods for Identifying Versioned and Plagiarised Documents
Timothy C. Hoad, Justin Zobel
https://people.eng.unimelb.edu.au/jzobel/fulltext/jasist03thz.pdf
Contains the hash function that the initial implementation of boost::hash_combine was based on.
Performance in Practice of String Hashing Functions
M.V. Ramakrishna, J. Zobel
In Proc. Int. Conf. on Database Systems for Advanced Applications, pages 215-223, Melbourne, Australia, April 1997.
https://www.comp.nus.edu.sg/~lingtw/dasfaa_proceedings/DASFAA97/P215.pdf
Referenced in the above paper as the source of the hash function.
MurmurHash3 hash function source
Austin Appleby
https://github.com/aappleby/smhasher/blob/61a0530f28277f2e850bfc39600ce61d02b518de/src/MurmurHash3.cpp#L65-L90
Austin Appleby’s 32 and 64 bit finalization mixing functions that introduced the "xmxmx" general form of a high quality bijective transformation that approximates a random permutation.
SMHasher hash function test suite
Austin Appleby
https://github.com/aappleby/smhasher
Contains a battery of tests for evaluating hash functions. The current
64 bit implementation of boost::hash for strings passes SMHasher.
Previous iterations did not.
Better Bit Mixing - Improving on MurmurHash3’s 64-bit Finalizer
David Stafford
https://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html
Describes the so-called "variant 13" mixing function, an improvement
over fmix64 from MurmurHash3, made famous by its adoption by the
splitmix64 random number generator.
Stronger, better, morer, Moremur; a better Murmur3-type mixer
Pelle Evensen
https://mostlymangling.blogspot.com/2019/12/stronger-better-morer-moremur-better.html
Describes Moremur, an improvement over MurmurHash3 fmix64 and Stafford
"variant 13".
Improved mx3 and the RRC test
Jon Maiga
http://jonkagstrom.com/mx3/mx3_rev2.html
Contains another improvement over MurmurHash3 fmix64 and "variant 13". This
is what the current implementation of boost::hash_combine uses when
std::size_t is 64 bits.
Prospecting for Hash Functions
Chris Wellons
https://nullprogram.com/blog/2018/07/31/
Describes Hash Prospector, a utility for discovering and evaluating mixing functions.
New best known functions
"TheIronBorn"
https://github.com/skeeto/hash-prospector/issues/19
Describes a good 32 bit mixing function, used by the current implementation
of boost::hash_combine when std::size_t is 32 bits.
This library is based on the design by Peter Dimov. During the initial development Joaquín M López Muñoz made many useful suggestions and contributed fixes.
The formal review was managed by Thorsten Ottosen, and the library reviewed by: David Abrahams, Alberto Barbati, Topher Cooper, Caleb Epstein, Dave Harris, Chris Jefferson, Bronek Kozicki, John Maddock, Tobias Swinger, Jaap Suter, Rob Stewart and Pavel Vozenilek. Since then, further constructive criticism has been made by Daniel Krügler, Alexander Nasonov and 沈慧峰.
The implementation of the hash function for pointers is based on suggestions made by Alberto Barbati and Dave Harris. Dave Harris also suggested an important improvement to boost::hash_combine that was taken up.
Some useful improvements to the floating point hash algorithm were suggested by Daniel Krügler.
The original implementation came from Jeremy B. Maitin-Shepard’s hash table library, although this is a complete rewrite.
The documentation was converted from Quickbook to AsciiDoc by Christian Mazakas.
Moved library into its own module, container_hash.
Moved headers for new module name, now at: <boost/container_hash/hash.hpp>, <boost/container_hash/hash_fwd.hpp>, <boost/container_hash/extensions.hpp>.
Added forwarding headers to support the old headers locations.
Support std::string_view, std::error_code, std::error_condition, std::optional, std::variant, std::monostate where available.
Update include paths from other Boost libraries.
Manually write out tuple overloads, rather than using the preprocessor to generate them. Should improve usability, due to better error messages, and easier debugging.
Fix tutorial example (#11017).
Quick fix for hashing vector<bool> when using libc++. Will try to introduce a more general fix in the next release.
Avoid float comparison warning when using Clang - this workaround was already in place for GCC, and was used when Clang pretends to be GCC, but the warning was appearing when running Clang in other contexts.
Support for char16_t, char32_t, u16string, u32string
Fix for recent versions of Visual C++ which have removed std::unary_function and std::binary_function (#12353).
Fixed some warnings.
Only define hash for std::wstring when we know we have a wchar_t. Otherwise there’s a compile error as there’s no overload for hashing the characters in wide strings (#8552).
Fixed strict aliasing violation (GitHub #3).
Removed some Visual C++ 6 workarounds.
Ongoing work on improving hash_combine. This changes the combine function which was previously defined in the reference documentation.
Ticket 7957: Fixed a typo.
Add support for boost::int128_type and boost::uint128_type where available - currently only __int128 and unsigned __int128 on some versions of gcc.
On platforms that are known to have the standard floating point functions, don’t use automatic detection - which can break if there are ambiguous overloads.
Fix undefined behaviour when using the binary float hash (Thomas Heller).
Restore enum support, which was accidentally removed in the last version.
New floating point hasher - will hash the binary representation on more platforms, which should be faster.
Support the standard smart pointers.
hash_value now implemented using SFINAE to avoid implicit casts to built in types when calling it.
Updated to use the new config macros.
Ticket 6771: Avoid gcc’s -Wfloat-equal warning.
Ticket 6806: Support std::array and std::tuple when available.
Add deprecation warning to the long deprecated boost/container_hash/detail/container_fwd.hpp.
Avoid warning due with gcc’s -Wconversion flag.
Add option to prevent implicit conversions when calling hash_value by defining BOOST_HASH_NO_IMPLICIT_CASTS. When using boost::hash for a type that does not have hash_value declared but does have an implicit conversion to a type that does, it would use that implicit conversion to hash it. Which can sometimes go very wrong, e.g. using a conversion to bool and only hashing to 2 possible values. Since fixing this is a breaking change and was only approached quite late in the release cycle with little discussion it’s opt-in for now. This, or something like it, will become the default in a future version.
Ticket 3866: Don’t foward declare containers when using gcc’s parallel library, allow user to stop forward declaration by defining the BOOST_DETAIL_NO_CONTAINER_FWD macro.
Ticket 4038: Avoid hashing 0.5 and 0 to the same number.
Stop using deprecated BOOST_HAS_* macros.
Reduce the number of warnings for Visual C++ warning level 4.
Some code formatting changes to fit lines into 80 characters.
Rename an internal namespace.
Automatically configure the float functions using template metaprogramming instead of trying to configure every possibility manually.
Workaround for when STLport doesn’t support long double.
Move the hash_fwd.hpp implementation into the hash subdirectory, leaving a forwarding header in the old location. You should still use the old location, the new location is mainly for implementation and possible modularization.
Ticket 2412: Removed deprecated headers.
Ticket 2957: Fix configuration for vxworks.
Changed the warnings in the deprecated headers from 1.34.0 to errors. These will be removed in a future version of Boost.
Moved detail headers out of boost/container_hash/detail, since they are part of functional/hash, not container_hash. boost/container_hash/detail/container_fwd.hpp has been moved to boost/detail/container_fwd.hpp as it’s used outside of this library, the others have been moved to boost/functional/hash/detail.
Ticket 2264: In Visual C++, always use C99 float functions for long double and float as the C++ overloads aren’t always availables.
Stop using OpenBSD’s dodgy std::numeric_limits.
Using the boost typedefs for long long and unsigned long long.
Move the extensions into their own header.
Support for long long, std::complex.
Improved algorithm for hashing floating point numbers:
Improved portablity, as described by Daniel Krügler in a post to the boost users list.
Fits more information into each combine loop, which can reduce the the number of times combine is called and hopefully give a better quality hash function.
Improved the algorithm for hashing floating point numbers.
On Cygwin use a binary hash function for floating point numbers, as Cygwin doesn’t have decent floating point functions for long double.
Never uses fpclass which doesn’t support long double.
Ticket 1064: Removed unnecessary use of errno.
Explicitly overload for more built in types.
Minor improvements to the documentation.
A few bug and warning fixes:
Ticket 1509: Suppress another Visual C++ warning.
Some workarounds for the Sun compilers.
Ticket 952: Suppress incorrect 64-bit warning on Visual C++.
Use declarations for standard classes, so that the library doesn’t need to include all of their headers
Deprecated the <boost/functional/hash/*.hpp> headers. Now a single header, <boost/functional/hash.hpp> is used.
Add support for the BOOST_HASH_NO_EXTENSIONS macro, which disables the extensions to TR1.
Minor improvements to the hash functions for floating point numbers.
Update the portable example to hopefully be more generally portable.
Fixed the points example, as pointed out by 沈慧峰.
Initial Release
This documentation is
Copyright 2005-2008 Daniel James
Copyright 2022 Peter Dimov
and is distributed under the Boost Software License, Version 1.0.