More powerful Go execution traces

Michael Knyszek
14 March 2024

The runtime/trace package contains a powerful tool for understanding and
troubleshooting Go programs.
The functionality within allows one to produce a trace of each goroutine’s execution over some
time period.
With the go tool trace command (or the excellent open source
gotraceui tool), one may then visualize and explore the data within these
traces.
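
As a quick refresher, capturing a trace takes only a few lines of code. The sketch below is a
minimal example (the trace.out file name is arbitrary) that records the entire run of a program;
the result can then be explored with go tool trace trace.out.

    package main

    import (
        "log"
        "os"
        "runtime/trace"
    )

    func main() {
        // Create a file to hold the trace data.
        f, err := os.Create("trace.out")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Start tracing, and stop when the program exits.
        if err := trace.Start(f); err != nil {
            log.Fatal(err)
        }
        defer trace.Stop()

        // ... the rest of the program runs as usual ...
    }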

The magic of a trace is that it can easily reveal things about a program that are hard to see in
other ways.
For example, a concurrency bottleneck where lots of goroutines block on the same channel might be
quite difficult to see in a CPU profile, because there’s no execution to sample.
But in an execution trace, the lack of execution will show up with amazing clarity, and the stack
traces of blocked goroutines will quickly point at the culprit.

Go developers are even able to instrument their own programs with tasks,
regions, and logs that
they can use to correlate their higher-level concerns with lower-level execution details.
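
For example, a request handler might group its work under a task, attach a log message, and mark a
sub-step as a region. The sketch below is illustrative only: the function, task, and region names
are made up, and the elided work stands in for application code.

    import (
        "context"
        "runtime/trace"
    )

    func handleRequest(ctx context.Context, id string) {
        // Group everything done on behalf of this request under one task.
        ctx, task := trace.NewTask(ctx, "handleRequest")
        defer task.End()

        // Attach free-form metadata that shows up alongside the task.
        trace.Log(ctx, "requestID", id)

        // Mark an interesting sub-step as a region so it stands out in the trace.
        trace.WithRegion(ctx, "decode", func() {
            // ... decode the request ...
        })

        // ... the rest of the handler ...
    }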

Issues

Unfortunately, the wealth of information in execution traces can often be out of reach.
Four big issues with traces have historically gotten in the way.

  • Traces had high overheads.
  • Traces didn’t scale well, and could become too big to analyze.
  • It was often unclear when to start tracing to capture a specific bad behavior.
  • Only the most adventurous gophers could programmatically analyze traces, given the lack of a
    public package for parsing and interpreting execution traces.

If you’ve used traces in the last few years, you’ve likely been frustrated by one or more of these
problems.
But we’re excited to share that over the last two Go releases we’ve made big progress in all four
of these areas.

Low-overhead tracing

Prior to Go 1.21, the run-time overhead of tracing was somewhere between 10–20% CPU for many
applications, which limited tracing to situational usage rather than continuous usage like CPU
profiling.
It turned out that much of the cost of tracing came down to tracebacks.
Many events produced by the runtime have stack traces attached, which are invaluable to actually
identifying what goroutines were doing at key moments in their execution.

Thanks to work by Felix Geisendörfer and Nick Ripley on optimizing the efficiency of tracebacks,
the run-time CPU overhead of execution traces has been cut dramatically, down to 1–2% for many
applications.
You can read more about the work done here in Felix’s great blog post on the topic.

Scalable traces

The trace format and its events were designed for relatively efficient emission, but they required
tooling to parse and hold the state of the entire trace in memory.
A trace of a few hundred MiB could require several GiB of RAM to analyze!

This issue is unfortunately fundamental to how traces are generated.
To keep run-time overheads low, all events are written to the equivalent of thread-local buffers.
But this means events appear out of their true order, and the burden is placed on the trace
tooling to figure out what really happened.

The key insight to making traces scale while keeping overheads low was to occasionally split the
trace being generated.
Each split point would behave a bit like simultaneously disabling and reenabling tracing in one
go.
All the trace data so far would represent a complete and self-contained trace, while the new trace
data would seamlessly pick up from where it left off.

As you might imagine, fixing this required rethinking and rewriting a lot of the foundation of
the trace implementation
in the runtime.
We’re happy to say that the work landed in Go 1.22 and is now generally available.
A lot of nice improvements came with the rewrite, including to the go tool trace command.
The gritty details are all in the design document, if you’re curious.

(Note: go tool trace still loads the full trace into memory, but removing this limitation for
traces produced by Go 1.22+ programs is now feasible.)

Flight recording

Suppose you work on a web service and an RPC took a very long time.
You couldn’t start tracing at the point you knew the RPC was already taking a while, because the
root cause of the slow request already happened and wasn’t recorded.

There’s a technique that can help with this called flight recording, which you may already be
familiar with from other programming environments.
The insight with flight recording is to have tracing on continuously and always keep the most
recent trace data around, just in case.
Then, once something interesting happens, the program can just write out whatever it has!

Before traces could be split, this was pretty much a non-starter.
But because continuous tracing is now viable thanks to low overheads, and because the runtime can
now split traces any time it needs to, it turned out to be straightforward to implement flight
recording.

As a result, we’re happy to announce a flight recorder experiment, available in the
golang.org/x/exp/trace package.

Please try it out!
Below is an example that sets up flight recording to capture a long HTTP request to get you started.

    // Set up the flight recorder.
    fr := trace.NewFlightRecorder()
    fr.Start()

    // Set up and run an HTTP server.
    var once sync.Once
    http.HandleFunc("/my-endpoint", func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()

        // Do the work...
        doWork(w, r)

        // We saw a long request. Take a snapshot!
        if time.Since(start) > 300*time.Millisecond {
            // Do it only once for simplicity, but you can take more than one.
            once.Do(func() {
                // Grab the snapshot.
                var b bytes.Buffer
                _, err := fr.WriteTo(&b)
                if err != nil {
                    log.Print(err)
                    return
                }
                // Write it to a file.
                if err := os.WriteFile("trace.out", b.Bytes(), 0o755); err != nil {
                    log.Print(err)
                    return
                }
            })
        }
    })
    log.Fatal(http.ListenAndServe(":8080", nil))

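Once a snapshot has been written out, you can explore it like any other execution trace, for
example with go tool trace trace.out.
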
If you have any feedback, positive or negative, please share it on the proposal issue!

Trace reader API

Along with the trace implementation rewrite came an effort to clean up the other trace internals,
like go tool trace.
This spawned an attempt to create a trace reader API that was good enough to share and that could
make traces more accessible.

Just like the flight recorder, we’re happy to announce that we also have an experimental trace reader
API that we’d like to share.
It’s available in the same package as the flight recorder, golang.org/x/exp/trace.

We think it’s good enough to start building things on top of, so please try it out!
Below is an example that measures the proportion of goroutine block events that blocked to wait on
the network.

    // Start reading from STDIN.
    r, err := trace.NewReader(os.Stdin)
    if err != nil {
        log.Fatal(err)
    }

    var blocked int
    var blockedOnNetwork int
    for {
        // Read the event.
        ev, err := r.ReadEvent()
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatal(err)
        }

        // Process it.
        if ev.Kind() == trace.EventStateTransition {
            st := ev.StateTransition()
            if st.Resource.Kind == trace.ResourceGoroutine {
                from, to := st.Goroutine()

                // Look for goroutines blocking, and count them.
                if from.Executing() && to == trace.GoWaiting {
                    blocked++
                    if strings.Contains(st.Reason, "network") {
                        blockedOnNetwork++
                    }
                }
            }
        }
    }
    // Print what we found.
    p := 100 * float64(blockedOnNetwork) / float64(blocked)
    fmt.Printf("%2.3f%% instances of goroutines blocking were to block on the network\n", p)

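To try this out on the snapshot captured by the flight recorder example above, build the program
and feed it the trace file on standard input, with something like go run . < trace.out.
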
And just like the flight recorder, there’s a proposal issue that would
be a great place to leave feedback!

We’d like to quickly call out Dominik Honnef as someone who tried it out early, provided great
feedback, and has contributed support for older trace versions to the API.

Thank you!

This work was completed, in no small part, thanks to the help of those in the diagnostics working
group, started over a year ago as a collaboration between stakeholders from across the Go
community, and open to the public.

We’d like to take a moment to thank those community members who have attended the diagnostic
meetings regularly over the last year: Felix Geisendörfer, Nick Ripley, Rhys Hiltner, Dominik
Honnef, Bryan Boreham, and thepudds.

The discussions, feedback, and work you all put in have been instrumental to getting us to where we
are today.
Thank you!
