Problem
I have 3 instances running MongoDB as a replica set: mongo-1
, mongo-2
and mongo-3
, One is PRIMARY member and other 2 are SECONDARY. I adjusted the machine types of these three instances so they restarts at the same time. However, my main Java application keeps restarting after around 40 seconds.
When I looked into the docker logs of the Java app, two entries drew my attention:
Unauthorized: not authorized on admin to execute command
{ replSetHeartbeat: "rs",
...
No server chosen by WritableServerSelector
...
Waiting for 30000 ms before timing out
Then I ssh in to the MongoDB instances and went into mongo
shell, mongo-1
has the prompt rs:SECONDARY>
and the other two are rs:OTHER
. And inside the mongo shell, any rs.xxx()
command gives me Unauthorized
error.
So what breaks down is clear:
- The authorized user doesn’t have enough permission to do replica set operations like sending a heartbeat any more.
- When shutting down all SECONDARY and PRIMARY member instances, there are no PRIMARY member in the replica set. The replica set becomes unavailable because there is no writable member, while my Java app needs one.
Solution
- Stop the
mongod
you want to set to PRIMARY. Stop the container if usingdocker run
. Remove the container if usingdocker-compose
. -
Remove all authorization and replica set settings in:
- arguments (if starting using command):
--auth
,--keyFile
.--replSet
-
or settings (if starting use .conf file):
security: keyFile: <string> authorization: <string> oplogSizeMB: <int> replication: replSetName: <string>
- arguments (if starting using command):
- Start the
mongod
directly or by docker. You should see the empty prompt>
and warnings of not in replica set and using root user. -
Grant root role to the current user on
admin
database:db.grantRolesToUser( "<username>", [ { role: "root", db: "admin"} ] )
- Exit the mongo shell and add authorization and replica set settings back. Log into the mongo shell again using the user you want to use.
-
The prompt is still
rs:SECONDARY>
now. Make it PRIMARY by these two commands:rs:SECONDARY> cfg = { _id: "rs", members: [ { _id: 0, host: "123.23.111.3:27017" } ] } rs:SECONDARY> rs.reconfig( cfg, { force:true } )
- Now check
rs.status()
, it should be fine now. My Java app is running again immediately after this step. - (Optional) Remove the root access you gave to the user by
db.revokeRolesFromUser()
References
Set to PRIMARY part of the solution: http://doc.okbase.net/2426299/archive/180088.html