We have identified the problem that caused the outages over the last two days.
The replication of our storage arrays between our two datacenters was
causing excessive I/O latency. The two sites are configured for
synchronous replication, meaning a delay at the backup site was causing
delays for production.
For now, we have suspended replication, and will determine the best way
forward over the coming days.
Thanks for your patience, and sorry for the trouble.
On Tue, 10 Jul 2012, Mike Austin wrote:
> On Tue, 10 Jul 2012, Mike Austin wrote:
>> On Tue, 10 Jul 2012, Monica Devino wrote:
>>> Happening again perhaps?
>> Yes, DB server crashed again. DB is back, jabber will be a big longer.
> "a bit longer" - and it is back.
>> be migrating the DB over to its backup partner this evening most likely.