Goroutine leak in driver? Ever-increasing connection and goroutine count

I have a Go program with a thousand goroutines that each run once per hour, reading and writing small amounts of data in MongoDB.

I noticed the goroutine count was increasing by about 150 per day, and the MongoDB connection count was also steadily increasing over time.

Using the pprof tools I got a summary of the goroutines:
http://<host>/debug/pprof/goroutine?debug=1
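
For context, the pprof endpoints are exposed in the binary roughly like this (the listen address and port are just illustrative):

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func init() {
	go func() {
		// Serve the pprof endpoints on a side port so a goroutine summary can be
		// fetched from the running process at /debug/pprof/goroutine?debug=1.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}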

There are six groups of goroutines that keep increasing in number, all related to the MongoDB driver:

36 @ 0x43b1d6 0x4685dd 0x4685bd 0x473a4c 0xb2814b 0xb27ddb 0x46c4a1
#	0x4685bc	sync.runtime_notifyListWait+0x11c								/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/runtime/sema.go:513
#	0x473a4b	sync.(*Cond).Wait+0x8b										/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/sync/cond.go:56
#	0xb2814a	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections.func2+0xea	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:616
#	0xb27dda	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections+0x13a		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:643

18 @ 0x43b1d6 0x433b93 0x466c09 0x4d0772 0x4d1ada 0x4d1ac8 0x52b2a9 0x53c085 0x61833d 0x4f78d8 0x618525 0x615b92 0x61b42f 0x61b430 0x49819a 0xb1cfe5 0xb1cfb0 0xb1c9a7 0xb1d9a8 0xa9357e 0xa99ce5 0xac2785 0xb2ec51 0xb2d6e5 0x46c4a1
#	0x466c08	internal/poll.runtime_pollWait+0x88							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/runtime/netpoll.go:234
#	0x4d0771	internal/poll.(*pollDesc).wait+0x31							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/internal/poll/fd_poll_runtime.go:84
#	0x4d1ad9	internal/poll.(*pollDesc).waitRead+0x259						/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/internal/poll/fd_poll_runtime.go:89
#	0x4d1ac7	internal/poll.(*FD).Read+0x247								/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/internal/poll/fd_unix.go:167
#	0x52b2a8	net.(*netFD).Read+0x28									/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/net/fd_posix.go:56
#	0x53c084	net.(*conn).Read+0x44									/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/net/net.go:183
#	0x61833c	crypto/tls.(*atLeastReader).Read+0x3c							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/crypto/tls/conn.go:777
#	0x4f78d7	bytes.(*Buffer).ReadFrom+0x97								/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/bytes/buffer.go:204
#	0x618524	crypto/tls.(*Conn).readFromUntil+0xe4							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/crypto/tls/conn.go:799
#	0x615b91	crypto/tls.(*Conn).readRecordOrCCS+0x111						/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/crypto/tls/conn.go:606
#	0x61b42e	crypto/tls.(*Conn).readRecord+0x16e							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/crypto/tls/conn.go:574
#	0x61b42f	crypto/tls.(*Conn).Read+0x16f								/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/crypto/tls/conn.go:1277
#	0x498199	io.ReadAtLeast+0x99									/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/io/io.go:328
#	0xb1cfe4	io.ReadFull+0x224									/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/io/io.go:347
#	0xb1cfaf	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).read+0x1ef		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:456
#	0xb1c9a6	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).readWireMessage+0x206	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:417
#	0xb1d9a7	go.mongodb.org/mongo-driver/x/mongo/driver/topology.initConnection.ReadWireMessage+0x27	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:584
#	0xa9357d	go.mongodb.org/mongo-driver/x/mongo/driver.Operation.readWireMessage+0x7d		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation.go:693
#	0xa99ce4	go.mongodb.org/mongo-driver/x/mongo/driver.Operation.ExecuteExhaust+0xe4		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation_exhaust.go:22
#	0xac2784	go.mongodb.org/mongo-driver/x/mongo/driver/operation.(*Hello).StreamResponse+0x184	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation/hello.go:200
#	0xb2ec50	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).check+0x4d0		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/server.go:709
#	0xb2d6e4	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update+0x344		/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/server.go:522

18 @ 0x43b1d6 0x44af72 0xa372fe 0x46c4a1
#	0xa372fd	go.mongodb.org/mongo-driver/internal.(*CancellationListener).Listen+0x7d	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/internal/cancellation_listener.go:30

18 @ 0x43b1d6 0x44af72 0xb288a6 0x46c4a1
#	0xb288a5	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).maintain+0x285	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:750

18 @ 0x43b1d6 0x44af72 0xb2a5f5 0x46c4a1
#	0xb2a5f4	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*rttMonitor).start+0x174	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/rtt_monitor.go:95

6 @ 0x43b1d6 0x40808c 0x407ab8 0xb34950 0x46c4a1
#	0xb3494f	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).pollSRVRecords+0x60f	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/topology.go:555

I tried mongo-driver versions 1.8.3 and 1.8.4, and both show this behavior.

Given that all these stack traces are in driver packages, I’m not sure how to debug this further.
Could this indicate a bug in the connection pooling?

Maybe it’s your code, @Jim_Slattery?

I’m not an expert in the Go driver (though I have used it), but looking at the connection example, I see that the client won’t disconnect until main() exits. Does your main() exit and leave hung connections behind, or are the connections that were opened being kept alive by the program itself?

Most of the code in the stack traces you posted is expected to run one or more goroutines per connected server, so seeing many of those is normal (depending on how many servers are in your MongoDB deployment). However, having 6 goroutines for pollSRVRecords seems unusual:

6 @ 0x43b1d6 0x40808c 0x407ab8 0xb34950 0x46c4a1
#	0xb3494f	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).pollSRVRecords+0x60f	/actions-runner-1467fb14/_work/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/topology.go:555

Is it possible that your code is creating or connecting new clients per run and/or not disconnecting old clients?

Ok, our MongoDB deployment only has 3 servers, so those goroutine counts do seem high.

Our code only connects to MongoDB once during startup:

client, err := mongo.Connect(ctx, clientOpts)
...
db := client.Database(dbName)

And then, all of the goroutines reuse that db to make their requests, like:

db.Collection(c).FindOne(ctx, criteria, ...)

So I’m not sure whether we’re doing something wrong, or whether there’s a debugging tool that could identify the cause.
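
For reference, the whole setup is roughly the following (the URI, database, and collection names are placeholders, not our real ones):

package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Connect exactly once at startup; every worker goroutine shares this client.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb+srv://example.mongodb.net"))
	if err != nil {
		log.Fatal(err)
	}
	db := client.Database("mydb")

	// Workers reuse db for their queries, e.g.:
	var doc bson.M
	if err := db.Collection("mycoll").FindOne(context.Background(), bson.M{"_id": "some-id"}).Decode(&doc); err != nil {
		log.Println("find:", err)
	}
}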

BTW, we weren’t explicitly setting any pool sizes, but I did notice that if I set them in a conflicting way, &minPoolSize=100&maxPoolSize=1, the goroutine count exploded (the equivalent programmatic options are sketched after these traces):

594 @ 0x43b1d6 0x40808c 0x407ab8 0xb28af0 0x46c4a1
#	0xb28aef	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).maintain.func4+0x2f	/actions-runner-2de7bb32/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:741

12 @ 0x43b1d6 0x4685dd 0x4685bd 0x473a4c 0xb2814b 0xb27ddb 0x46c4a1
#	0x4685bc	sync.runtime_notifyListWait+0x11c								/actions-runner-2de7bb32/_work/_tool/go/1.17.8/x64/src/runtime/sema.go:513
#	0x473a4b	sync.(*Cond).Wait+0x8b										/actions-runner-2de7bb32/_work/_tool/go/1.17.8/x64/src/sync/cond.go:56
#	0xb2814a	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections.func2+0xea	/actions-runner-2de7bb32/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:616
#	0xb27dda	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections+0x13a		/actions-runner-2de7bb32/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:643
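
Those URI parameters map onto the client options roughly as follows (a minimal sketch; the URI is a placeholder):

import "go.mongodb.org/mongo-driver/mongo/options"

// conflictingPoolOpts builds the deliberately conflicting pool sizes from the
// experiment above as client options instead of URI parameters.
func conflictingPoolOpts() *options.ClientOptions {
	return options.Client().
		ApplyURI("mongodb+srv://example.mongodb.net"). // placeholder URI
		SetMinPoolSize(100).                           // minPoolSize=100
		SetMaxPoolSize(1)                              // maxPoolSize=1
}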

Agreed. It seems like each of the 6 topology instances is connecting to all 3 servers, resulting in 18 goroutines each for rtt_monitor.go, pool.go, and cancellation_listener.go. For 3 servers there should only be 3 goroutines for each of those functions, not 18.

I don’t see any immediately obvious problems in those code snippets. You’re only connecting a Client once at startup, which seems correct. Using db.Collection(c).FindOne(...) also seems correct because all Database and Collection instances should use the same Client resources.

I noticed the directories in your stack traces frequently start with /actions-runner; are you running these tasks with GitHub Actions or some other system with “variable persistence” (i.e. it may create a new runtime or reuse an existing one), like AWS Lambda? I’m not deeply familiar with GitHub Actions or other “variable persistence” environments, but I wonder whether some Go driver resources are sometimes being unexpectedly re-initialized.

If you dump a goroutine summary using a pre-v1.8.0 version of the Go driver, do you also see multiple goroutines calling topology.(*Topology).pollSRVRecords, or only one?

Concerning the exploding goroutine count with the conflicting &minPoolSize=100&maxPoolSize=1 settings: that actually sounds like a bug. I’ve filed GODRIVER-2366 to investigate and fix it.

Thanks Matt, I’ll have to try out version 1.7.5 of the driver for comparison.

Regarding /actions-runner in the stack traces, you are correct that GitHub Actions was involved, but only to compile the binary 🙂.
The binary is later deployed on a standard, long-running server.

BTW, I had tried setting maxPoolSize=5, but it didn’t seem to help; the goroutine count has still been increasing by about 150 per day.
Mongo goroutine summary:

84 @ 0x43b1d6 0x4685dd 0x4685bd 0x473a4c 0xb2814b 0xb27ddb 0x46c4a1
#	0x4685bc	sync.runtime_notifyListWait+0x11c								/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/runtime/sema.go:513
#	0x473a4b	sync.(*Cond).Wait+0x8b										/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/sync/cond.go:56
#	0xb2814a	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections.func2+0xea	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:616
#	0xb27dda	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).createConnections+0x13a		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:643

42 @ 0x43b1d6 0x433b93 0x466c09 0x4d0772 0x4d1ada 0x4d1ac8 0x52b2a9 0x53c085 0x61833d 0x4f78d8 0x618525 0x615b92 0x61b42f 0x61b430 0x49819a 0xb1cfe5 0xb1cfb0 0xb1c9a7 0xb1d9a8 0xa9357e 0xa99ce5 0xac2785 0xb2ec51 0xb2d6e5 0x46c4a1
#	0x466c08	internal/poll.runtime_pollWait+0x88							/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/runtime/netpoll.go:234
#	...
#	0xb1cfe4	io.ReadFull+0x224									/actions-runner-1467fb14/_work/_tool/go/1.17.8/x64/src/io/io.go:347
#	0xb1cfaf	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).read+0x1ef		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:456
#	0xb1c9a6	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).readWireMessage+0x206	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:417
#	0xb1d9a7	go.mongodb.org/mongo-driver/x/mongo/driver/topology.initConnection.ReadWireMessage+0x27	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/connection.go:584
#	0xa9357d	go.mongodb.org/mongo-driver/x/mongo/driver.Operation.readWireMessage+0x7d		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation.go:693
#	0xa99ce4	go.mongodb.org/mongo-driver/x/mongo/driver.Operation.ExecuteExhaust+0xe4		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation_exhaust.go:22
#	0xac2784	go.mongodb.org/mongo-driver/x/mongo/driver/operation.(*Hello).StreamResponse+0x184	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/operation/hello.go:200
#	0xb2ec50	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).check+0x4d0		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/server.go:709
#	0xb2d6e4	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update+0x344		/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/server.go:522

42 @ 0x43b1d6 0x44af72 0xa372fe 0x46c4a1
#	0xa372fd	go.mongodb.org/mongo-driver/internal.(*CancellationListener).Listen+0x7d	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/internal/cancellation_listener.go:30

42 @ 0x43b1d6 0x44af72 0xb288a6 0x46c4a1
#	0xb288a5	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).maintain+0x285	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/pool.go:750

42 @ 0x43b1d6 0x44af72 0xb2a5f5 0x46c4a1
#	0xb2a5f4	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*rttMonitor).start+0x174	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/rtt_monitor.go:95

14 @ 0x43b1d6 0x40808c 0x407ab8 0xb34950 0x46c4a1
#	0xb3494f	go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).pollSRVRecords+0x60f	/actions-runner-1467fb14/_work/optima-recommendations/optima-recommendations/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/x/mongo/driver/topology/topology.go:555

I’m embarrassed to say I found another component in our codebase that was creating Mongo connections.

It was doing a health check of Mongo by periodically calling mongo.Connect, doing a ping, and then returning. Maybe it should have called client.Disconnect before returning, although the ctx it had used for the connection did get cancelled.

I just rewrote that health check to call mongo.Connect only once at startup and reuse that client for the life of the process, roughly like the sketch below.
So far, it’s been 14 hours and I haven’t seen any increase in goroutines!
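
The rewritten health check now looks roughly like this (the function name and timeout are just illustrative):

import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)

// healthCheck pings MongoDB using the single long-lived client created at
// startup, instead of dialing a new client on every check.
func healthCheck(ctx context.Context, client *mongo.Client) error {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	return client.Ping(ctx, readpref.Primary())
}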

Thanks @Matt_Dale and @Jack_Woehr for your help!

Maybe there’s a Go debugging tool that could have traced these goroutine stacks back to where they were spawned; that would have directly identified the source of the problem.

@Jim_Slattery glad to hear you found the source of the extra goroutines! You’re correct that you have to call client.Disconnect to close all of the resources created by mongo.Connect (including the background goroutines).
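
For anyone else who ends up with a per-call client like that, here’s a minimal sketch of pairing each Connect with a Disconnect (the function name is illustrative):

import (
	"context"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
	"go.mongodb.org/mongo-driver/mongo/readpref"
)

// checkOnce creates a short-lived client and cleans up after itself.
func checkOnce(ctx context.Context, clientOpts *options.ClientOptions) error {
	client, err := mongo.Connect(ctx, clientOpts)
	if err != nil {
		return err
	}
	// Disconnect closes the connection pool and stops this client's background
	// monitoring goroutines; skipping it leaks those goroutines on every call.
	defer client.Disconnect(context.Background())
	return client.Ping(ctx, readpref.Primary())
}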
