Flite Culture

The Other Reason You Need Unique Server_ids in MySQL Replication

I recently encountered an error when two slave databases were using the same server_id value. I know having unique server_ids across all MySQL databases in a given replication topology is a best practice, but I’m not sure I understand exactly why MySQL is trying to enforce uniqueness in this specific case.

The most obvious reason to use unique server_id values for every database in a replication setup is to avoid having a master and slave with the same server_id. If that happens, the slave will skip any events coming from that master.
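For reference, the server_id is normally set in each server's configuration file. A minimal sketch (the file path and the specific id values here are illustrative; they just need to be unique within the topology):

```ini
# /etc/my.cnf on the master -- the value 1 is an arbitrary example
[mysqld]
server_id = 1

# Each slave's my.cnf would use a different value, e.g.
# server_id = 2 on one slave, server_id = 3 on the next
```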

But if I have two slaves with the same server_id, that should be safe unless I promote one of those slaves to be the master of the other. Is that the case that MySQL is trying to protect me from, or is there some other case I am missing?

My specific problem started when I cloned a slave database. This is something I do fairly often, but most of the time I am creating a new read slave, and I change its server_id to something unique. This time I was replacing an existing database, so I left the server_id the same. When I started the slave on the new copy of the database, I now had two slaves with the same server_id. It didn’t seem like a big deal at the time, because I knew I would be shutting down the old copy in the near future.

The first symptom I noticed was an explosion of relay logs on the new database. On a healthy slave database there should usually only be one or two relay logs, but this database had tens of thousands of them, all very small, and all very recent.
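A quick way to spot this symptom is to count the relay log files in the data directory. This is a sketch: the default path and the relay log basename ("relay-bin") are assumptions, so check @@datadir and @@relay_log on your own server first.

```shell
# Count relay log files; path and basename are assumptions --
# substitute your server's datadir and relay log basename.
datadir="${1:-/var/lib/mysql}"
ls "$datadir"/relay-bin.[0-9]* 2>/dev/null | wc -l
```

On a healthy slave this should print a small number; tens of thousands is a red flag.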

I looked into the MySQL error log to see what was going on, and saw the same error over and over again:

130122 20:28:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:08 [Note] Slave: received end packet from server, apparent master shutdown: 
130122 20:28:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:08 [Note] Slave: received end packet from server, apparent master shutdown: 
130122 20:28:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:08 [Note] Slave: received end packet from server, apparent master shutdown: 
130122 20:28:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:09 [Note] Slave: received end packet from server, apparent master shutdown: 
130122 20:28:09 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:09 [Note] Slave: received end packet from server, apparent master shutdown: 
130122 20:28:09 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'bin-log.000049' at position 340959933
130122 20:28:09 [Note] Slave: received end packet from server, apparent master shutdown: 

I did a search for that error message and found Peter’s post from a few years ago which pointed me in the right direction. When a slave connects using a server_id that is already registered, the master kills the existing connection for that server_id. With two slaves sharing an id, each reconnect from one slave kicks the other offline, and every reconnect of the I/O thread rotates the relay log — which explains both the endless error loop and the relay log explosion.

So I changed the server_id on the old slave database, restarted it, and the errors went away.
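The check and the fix can both be done from the mysql client. A sketch — the new id value of 12 is an arbitrary example, and note that a dynamic change must also be written to my.cnf or it will be lost on the next restart:

```sql
-- Check the current server_id on each slave
SHOW VARIABLES LIKE 'server_id';

-- server_id is a dynamic variable, so instead of a full server restart
-- it can be changed on the fly; the slave threads must be restarted
-- to pick up the new value (12 is an arbitrary unique example)
STOP SLAVE;
SET GLOBAL server_id = 12;
START SLAVE;
```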

In the future I’ll be more careful about this when cloning databases, but I have to say this seems like an odd way to enforce the restriction, and the error message is not very descriptive. Even though the master was constantly sending end packets to the slave and trying to make it go away, the slave continued to reconnect and process updates from the master, although it certainly did so inefficiently.

I’ve learned my lesson for the future, but I’m still left with two questions:

  1. Why is MySQL enforcing unique server_ids for slave databases?
  2. Can we get a more descriptive error message when this situation occurs?