New unique package

The Go Blog

New unique package

Michael Knyszek
27 August 2024

The standard library of Go 1.23 now includes the new unique package.
The purpose behind this package is to enable the canonicalization of
comparable values.
In other words, this package lets you deduplicate values so that they point
to a single, canonical, unique copy, while efficiently managing the canonical
copies under the hood.
You might be familiar with this concept already, called
“interning”.
Let’s dive in to see how it works, and why it’s useful.

A simple implementation of interning

At a high level, interning is very simple.
Take the code sample below, which deduplicates strings using just a regular
map.

var internPool map[string]string

// Intern returns a string that is equal to s but that may share storage with
// a string previously passed to Intern.
func Intern(s string) string {
    pooled, ok := internPool[s]
    if !ok {
        // Clone the string in case it's part of some much bigger string.
        // This should be rare, if interning is being used well.
        pooled = strings.Clone(s)
        internPool[pooled] = pooled
    }
    return pooled
}

This is useful for when you’re constructing a lot of strings that are likely to
be duplicates, like when parsing a text format.

This implementation is super simple and works well enough for some cases, but it
has a few problems:

  • It never removes strings from the pool.
  • It cannot be safely used by multiple goroutines concurrently.
  • It only works with strings, even though the idea is quite general.

There’s also a missed opportunity in this implementation, and it’s subtle.
Under the hood, strings are immutable structures consisting of a pointer
and a length
.
When comparing two strings, if the pointers are not equal, then we must
compare their contents to determine equality.
But if we know that two strings are canonicalized, then it is sufficient
to just check their pointers.

Enter the unique package

The new unique package introduces a function similar to Intern called
Make.

It works about the same way as Intern.
Internally there’s also a global map (a fast generic concurrent
map
) and Make looks up the
provided value in that map.
But it also differs from Intern in two important ways.
Firstly, it accepts values of any comparable type.
And secondly, it returns a wrapper value, a
Handle[T], from which the canonical value
can be retrieved.

This Handle[T] is key to the design.
A Handle[T] has the property that two Handle[T] values are equal if and
only if the values used to create them are equal.
What’s more, the comparison of two Handle[T] values is cheap: it comes down
to a pointer comparison.
Compared to comparing two long strings, that’s an order of magnitude cheaper!

So far, this is nothing you can’t do in ordinary Go code.

But Handle[T] also has a second purpose: as long as a Handle[T] exists for
a value, the map will retain the canonical copy of the value.
Once all Handle[T] values that map to a specific value are gone, the
package marks that internal map entry as deletable, to be reclaimed in the near
future.
This sets a clear policy for when to remove entries from the map: when the
canonical entries are no longer being used, then the garbage collector is free
to clean them up.

If you’ve used Lisp before, this may all sound quite familiar to you.
Lisp symbols are interned
strings, but not strings themselves, and all symbols’ string values are
guaranteed to be in the same pool.
This relationship between symbols and strings parallels the relationship
between Handle[string] and string.

A real-world example

So, how might one use unique.Make?
Look no further than the net/netip package in the standard library, which
interns values of type addrDetail, part of the
netip.Addr structure.

Below is an abridged version of the actual code from net/netip that uses
unique.

// Addr represents an IPv4 or IPv6 address (with or without a scoped
// addressing zone), similar to net.IP or net.IPAddr.
type Addr struct {
    // Other irrelevant unexported fields...

    // Details about the address, wrapped up together and canonicalized.
    z unique.Handle[addrDetail]
}

// addrDetail indicates whether the address is IPv4 or IPv6, and if IPv6,
// specifies the zone name for the address.
type addrDetail struct {
    isV6   bool   // IPv4 is false, IPv6 is true.
    zoneV6 string // May be != "" if IsV6 is true.
}

var z6noz = unique.Make(addrDetail{isV6: true})

// WithZone returns an IP that's the same as ip but with the provided
// zone. If zone is empty, the zone is removed. If ip is an IPv4
// address, WithZone is a no-op and returns ip unchanged.
func (ip Addr) WithZone(zone string) Addr {
    if !ip.Is6() {
        return ip
    }
    if zone == "" {
        ip.z = z6noz
        return ip
    }
    ip.z = unique.Make(addrDetail{isV6: true, zoneV6: zone})
    return ip
}

Since many IP addresses are likely to use the same zone and this zone is part
of their identity, it makes a lot of sense to canonicalize them.
The deduplication of zones reduces the average memory footprint of each
netip.Addr, while the fact that they’re canonicalized means netip.Addr
values are more efficient to compare, since comparing zone names becomes a
simple pointer comparison.

A footnote about interning strings

While the unique package is useful, Make is admittedly not quite like
Intern for strings, since the Handle[T] is required to keep a string from
being deleted from the internal map.
This means you need to modify your code to retain handles as well as strings.

But strings are special in that, although they behave like values, they
actually contain pointers under the hood, as we mentioned earlier.
This means that we could potentially canonicalize just the underlying storage
of the string, hiding the details of a Handle[T] inside the string itself.
So, there is still a place in the future for what I’ll call transparent string
interning
, in which strings can be interned without the Handle[T] type,
similar to the Intern function but with semantics more closely resembling
Make.

In the meantime, unique.Make("my string").Value() is one possible workaround.
Even though failing to retain the handle will allow the string to be deleted
from unique’s internal map, map entries are not deleted immediately.
In practice, entries will not be deleted until at least the next garbage
collection completes, so this workaround still allows for some degree of
deduplication in the periods between collections.

Some history, and looking toward the future

The truth is that the net/netip package actually interned zone strings since
it was first introduced.
The interning package it used was an internal copy of the
go4.org/intern package.
Like the unique package, it has a Value type (which looks a lot like a
Handle[T], pre-generics), has the notable property that entries in the
internal map are removed once their handles are no longer referenced.

But to achieve this behavior, it has to do some unsafe things.
In particular, it makes some assumptions about the garbage collector’s behavior
to implement weak pointers
outside the runtime.
A weak pointer is a pointer that doesn’t prevent the garbage collector from
reclaiming a variable; when this happens, the pointer automatically becomes
nil.
As it happens, weak pointers are also the core abstraction underlying the
unique package.

That’s right: while implementing the unique package, we added proper weak
pointer support to the garbage collector.
And after stepping through the minefield of regrettable design decisions that
accompany weak pointers (like, should weak pointers track object
resurrection
? No!), we were
astonished by how simple and straightforward all of it turned out to be.
Astonished enough that weak pointers are now a public
proposal
.

This work also led us to reexamine finalizers, resulting in another proposal
for an easier-to-use and more efficient replacement for
finalizers
.
With a hash function for comparable values on the way as well,
the future of building memory-efficient
caches
in Go is bright!

Previous article: Range Over Function Types

Blog Index