We recently had an incident in which a couple of switches were rebooted unintentionally, which caused a backup process (Veeam) that uses VM snapshots (on VMware) to go haywire. I thought only the backup process was corrupted, but after the last snapshot was removed I started getting errors every 5 minutes. They primarily log event ID 233, but also 234, 228, and 530. Here is what they state:
228: At '5/15/2015 11:42:26 AM', the copy of database 'DB' on this server encountered an error that couldn't be automatically repaired without running as a passive copy and failover was attempted. The error returned was "There is only one copy of this mailbox database (DB). Automatic recovery is not available.". For more information about the failure, consult the Event log on the server for "ExchangeStoreDb" events.
233: At '5/15/2015 11:46:03 AM', database copy 'DB' on this server encountered an error. For more information, consult the Event log for "ExchangeStoreDb" or "MSExchangeRepl" events.
234: At '5/15/2015 11:42:26 AM', the copy of database 'DB' on this server encountered a serious I/O error that may have affected all copies of the database. For information about the failure, consult the Event log on the server for "ExchangeStoreDb" or "MSExchangeRepl" events. All data should be immediately moved out of this database into a new database.
530: Information Store (3468) DB: The database page read from the file "F:\DB\Database\DB.edb" at offset 238081900544 (0x000000376ec98000) (database page 7265682 (0x6EDD92)) for 32768 (0x00008000) bytes failed verification due to a lost flush detection timestamp mismatch. The read operation will fail with error -1119 (0xfffffba1). If this condition persists, restore the database from a previous backup. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
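As a sanity check that the numbers in event 530 are at least internally consistent, here is a minimal Python sketch that recomputes the page number from the byte offset and the hex form of the error code. All values are copied from the event text above. The function names are my own, and the "page N starts at (N + 1) * page size" relation is only an assumption inferred from these particular numbers, not something I have confirmed in documentation.

```python
# Values copied verbatim from the event 530 text.
OFFSET = 238081900544      # byte offset of the failed read
PAGE_SIZE = 32768          # number of bytes read (one database page)
REPORTED_PAGE = 7265682    # "database page" reported in the event
ERROR_CODE = -1119         # error the read operation fails with


def page_from_offset(offset: int, page_size: int) -> int:
    """Return the page number for a byte offset.

    Assumption (inferred from the event's own numbers): page N starts
    at byte (N + 1) * page_size in the .edb file.
    """
    if offset % page_size != 0:
        raise ValueError("offset is not aligned to a page boundary")
    return offset // page_size - 1


def as_unsigned32_hex(code: int) -> str:
    """Render a signed 32-bit error code the way the event log shows it."""
    return f"0x{code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    page = page_from_offset(OFFSET, PAGE_SIZE)
    print("computed page:", page, hex(page))
    print("matches event:", page == REPORTED_PAGE)
    print("error code   :", as_unsigned32_hex(ERROR_CODE))
```

If the same page number keeps showing up in repeated 530 events, that points at one bad spot in the file; many different page numbers would make me more worried about the underlying storage.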
So I figured I could just create a new database and migrate the mailboxes over, but many of them fail to migrate. For those mailboxes, I tried running the PowerShell cmdlet to repair them (New-MailboxRepairRequest), but that fails too with the following error:
10049: Online integrity check for request 0ab17d2b-bd15-4161-b4df-0dfcfd16c4d6 failed with error -1119.
Exporting to a PST file fails as well, and users report that archiving through Outlook fails once it reaches a corrupted folder.
I thought this was only happening for one of the databases, so we figured we'd migrate as many mailboxes as we could to a new drive and then announce data loss for the rest. Right now, we're copying the last good backup of the EDB and the log files to the drive to mount in the old one's place, in hopes that we can get away from the errors. Unfortunately, due to drive constraints, we were forced to enable circular logging on this database, but we're okay with one to two days of data loss for that particular database. The disturbing part is that once we dismounted the corrupted database, we started receiving the same errors for two other databases. Fortunately, those aren't nearly as big and they do not have circular logging enabled, so we might be able to do a full restore, assuming that the log files are not corrupted. However, I am worried that there is a bigger problem, such as drive failure.
I am wondering if anyone can offer some advice for this scenario. I also want to make sure that I am going down the right path by simply running the restore process for each DB that gets this error until we can move everything to new storage. I am on Exchange 2010 SP1, and we have been working hard over the last few months to get our environment ready for 2013 (we purchased new storage for that deployment). Sorry for the lengthy post, and please let me know if you need any further info from me.
Thank you in advance for your time!