Hi, everyone!
We also ran into this problem while upgrading our MongoDB sharded cluster from 4.0.12 to 4.2.17.
During the upgrade, we saw error messages like the following in our logs:
Code: 11600;
CodeName: InterruptedAtShutdown;
Command: { "getMore" : NumberLong("9001353061637322596"), "collection" : "some.collection" };
ErrorMessage: interrupted at shutdown;
Result: { "ok" : 0.0, "errmsg" : "interrupted at shutdown", "code" : 11600, "codeName" : "InterruptedAtShutdown" };
ConnectionId: { ServerId : { ClusterId : 1, EndPoint : "Unspecified/some.router.name:27017" }, LocalValue : 290 };
ErrorLabels: System.Collections.Generic.List`1[System.String]
MongoDB.Driver.MongoNodeIsRecoveringException: Server returned node is recovering error (code = 11600, codeName = "InterruptedAtShutdown").
at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.Execute(IConnection connection, CancellationToken cancellationToken)
at MongoDB.Driver.Core.WireProtocol.CommandWireProtocol`1.Execute(IConnection connection, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.Server.ServerChannel.ExecuteProtocol[TResult](IWireProtocol`1 protocol, ICoreSession session, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.Server.ServerChannel.Command[TResult](ICoreSession session, ReadPreference readPreference, DatabaseNamespace databaseNamespace, BsonDocument command, IEnumerable`1 commandPayloads, IElementNameValidator commandValidator, BsonDocument additionalOptions, Action`1 postWriteAction, CommandResponseHandling responseHandling, IBsonSerializer`1 resultSerializer, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.AsyncCursor`1.ExecuteGetMoreCommand(IChannelHandle channel, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.AsyncCursor`1.GetNextBatch(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.AsyncCursor`1.MoveNext(CancellationToken cancellationToken)
at MongoDB.Driver.IAsyncCursorExtensions.ToList[TDocument](IAsyncCursor`1 source, CancellationToken cancellationToken)
at MongoDB.Driver.IAsyncCursorSourceExtensions.ToList[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
Code: 11600;
CodeName: InterruptedAtShutdown;
Command: { "find" : "some.collection", "filter" : {somefilter} };
ErrorMessage: Encountered non-retryable error during query :: caused by :: interrupted at shutdown;
Result: { "ok" : 0.0, "errmsg" : "Encountered non-retryable error during query :: caused by :: interrupted at shutdown", "code" : 11600, "codeName" : "InterruptedAtShutdown", "operationTime" : Timestamp(1642587846, 583), "$clusterTime" : { "clusterTime" : Timestamp(1642587850, 1385), "signature" : { "some signature" } } };
ConnectionId: { ServerId : { ClusterId : 2, EndPoint : "Unspecified/some.router.name:27017" }, LocalValue : 4228 };
ErrorLabels: System.Collections.Generic.List`1[System.String]
MongoDB.Driver.MongoNodeIsRecoveringException: Server returned node is recovering error (code = 11600, codeName = "InterruptedAtShutdown").
at MongoDB.Driver.Core.Operations.RetryableReadOperationExecutor.ExecuteAsync[TResult](IRetryableReadOperation`1 operation, RetryableReadContext context, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.ReadCommandOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.FindCommandOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.FindOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Operations.FindOperation`1.ExecuteAsync(IReadBinding binding, CancellationToken cancellationToken)
at MongoDB.Driver.OperationExecutor.ExecuteReadOperationAsync[TResult](IReadBinding binding, IReadOperation`1 operation, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperationAsync[TResult](IClientSessionHandle session, IReadOperation`1 operation, ReadPreference readPreference, CancellationToken cancellationToken)
at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
(I edited out the collection names, filters, signature, and router names.)
The upgrade process was as follows:
- Stop one of the secondary replica set members by running systemctl stop mongod (or systemctl stop mongos).
- Upgrade the MongoDB packages to 4.2.17.
- Start the mongod or mongos processes.
- After upgrading all the secondaries, change the primary and repeat the three steps above on the former primary.
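For illustration, the three per-member steps above could be scripted roughly as follows. This is only a dry-run sketch: the run helper, the upgrade_member function, and the PKG_UPGRADE_CMD placeholder are hypothetical names, and the real package command depends on the distribution and package manager in use.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the per-member upgrade steps listed above.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical placeholder, NOT a real command: replace it with the
# package manager invocation that installs the 4.2.17 packages.
PKG_UPGRADE_CMD="${PKG_UPGRADE_CMD:-upgrade-mongodb-packages-to-4.2.17}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upgrade_member SERVICE, where SERVICE is mongod or mongos.
upgrade_member() {
  local service="$1"
  run systemctl stop "$service"    # step 1: stop the process
  run $PKG_UPGRADE_CMD             # step 2: upgrade packages (unquoted so it splits into words)
  run systemctl start "$service"   # step 3: start the process again
}
```

Calling upgrade_member mongod in dry-run mode prints the three commands in order, which makes it easy to review the sequence before running it for real on each member.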
The errors above showed up right after we stopped a process. We saw them with both ReadPreference = Primary and ReadPreference = SecondaryPreferred.
To answer @kevinadi's questions:
- 4.0.12 before the upgrade.
- 2.11.4 (we’re using the C# MongoDB driver).
- Here are our C# driver settings:
var settings = new MongoClientSettings
{
    Servers = "{router names}",
    ConnectionMode = ConnectionMode.Automatic,
    MaxConnectionIdleTime = TimeSpan.FromMinutes(10),
    MaxConnectionLifeTime = TimeSpan.FromMinutes(30),
    MaxConnectionPoolSize = 100,
    MinConnectionPoolSize = 1,
    ReadPreference = ReadPreference.Primary, // and could be SecondaryPreferred
    SocketTimeout = TimeSpan.Zero,
    WaitQueueTimeout = TimeSpan.FromMinutes(2),
    WriteConcern = WriteConcern.W1,
    ConnectTimeout = TimeSpan.FromSeconds(15),
    ReadConcern = ReadConcern.Default,
    ServerSelectionTimeout = TimeSpan.FromSeconds(15),
};
- It happened while we were shutting down the servers during the upgrade, as described above.
- We didn’t come across any errors like those while using the mongo shell.
What we'd like to know is how to upgrade a MongoDB sharded cluster without these errors reaching our clients. Any advice would be much appreciated.