The production database just went down! What do I do?
A few of my colleagues can relate with the title of this article, they might have experienced an issue with the database in production. Dear colleague, you are not alone. Panicking during a disaster in production has happened to the best of us.
I can recall asking a friend a question about his experience during a disaster in production. He encountered a very serious disaster and he did not know what to do, felt like taking his bag and running away from the office. This colleague of mine is very good at what he does, he decided to relax and take a second look at the issue. A few moments later, he was able to find the solution.
As an ISO 22301 lead implementer, my major responsibility is to ensure there is business continuity in place in an organization. There must be a business continuity plan in place to guard against any disaster whenever it strikes and to ensure that the business continues running accordingly.
Production environments are very sensitive, especially database environments. It is very important for engineers managing these environments to undergo training concerning disaster recovery, business continuity, and disaster recovery plan. Whenever a disaster strikes, everybody knows what to do.
Database administrators need to ensure the following are executed regularly to prevent any surprises before they occur.
Regular backup and restore test
Some database administrators rely majorly on restoring a backup in case of disaster but do not test them beforehand to know whether they are valid or not. The rule of the thumb is to create a process of automating backup restore periodically and confirm that the recent restore is valid.
This restore should be automated periodically, at least twice a month is fine. Once the restore is done, scripts to check the database integrity will be executed against the database. If there are issues, they should be addressed immediately and the restored database dropped afterward.
Database Backup
Automated database backup in form of jobs needs to be set up and configured on the database for full backup, differential, and transactional log backup; with email notification to database administrators for backup status (failure or success)
Database High Availability
It is very essentials for every database in production to be on high availability, this will grantee an up-time or availability of 99.99999% equivalent to 100%.
Configuring database high availability simply means there will be more than one (1) instance of a database server running; once an instance fails, there will be an automatic failover to any of other instances. However, if the database is on high availability we can easily failover to the other instances in cases of any disaster.
During any disaster or emergency, I will recommend the following
- Stay positive and extremely calm to prevent more future damage. This also prevents mistakes.
- Raise an incident. It is very important to inform our customers or colleagues about the current situations. This is very good because it will indicate that we are aware of the current situation and also working on a resolution.
- Check if there is a recent backup which must include at least a recent full backup (and recent differential backup and a transactional log backup where applicable). If there is a recent backup and you can accommodate data loss, you can restore this database.
- Check if the affected database can be made available by running different scripts such as dbcc check DB or any other related scripts.
- Failover the database to the DR instance (alternative instance), if the database is on high availability