Database Forum / Oracle / Oracle Server / January 2006
Oracle: how to demonstrate successful restore?
Stefan - 25 Jan 2006 14:48 GMT What are the various techniques to demonstrate a successful restore in Oracle?
For instance: what kind of formal confirmation does Oracle provide? Are there any sort of restore reports? What kind of information do they report? Are there any additional manual checks that a DBA can do, maybe looking at system change numbers (SCNs) and their times, or transaction times, etc.?
thanks
Daniel Fink - 25 Jan 2006 19:19 GMT There are many ways to do this, it depends on what you are trying to test. Just being able to open the database is one measure!
Here is an example of a very simple restore test for an incomplete recovery. The database is running in ARCHIVELOG mode.
1) Perform a backup of the database.
2) Update a test table with sysdate (make sure to record the exact date, including minutes and seconds) and commit.
3) Wait a certain number of minutes.
4) Record sysdate.
5) Update the test table with the new sysdate and commit.
6) Shut down the database.
7) Restore the backups and roll forward to the sysdate recorded in step 4.
8) Query the test table and make sure the last updated date is the value from step 2 and not the value from step 5.
For a complete recovery, verify that the test table has a last updated date found in step 5.
Regards, Daniel Fink
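The pass/fail logic of the test above can be sketched in a few lines. This is a minimal illustration, not Oracle tooling: `expected_last_update` and `restore_passed` are hypothetical helpers, and the sketch assumes the sysdate value written to the test table is a close enough stand-in for its commit time (the wait in step 3 leaves plenty of margin). Pass `None` as the target for a complete recovery.

```python
from datetime import datetime
from typing import List, Optional


def expected_last_update(commits: List[datetime], target: Optional[datetime]) -> datetime:
    """Value the test table should hold after recovery.

    commits: sysdate values written and committed in steps 2 and 5.
    target:  time recorded in step 4 (incomplete recovery), or None for
             a complete recovery.
    """
    if target is None:
        # Complete recovery applies all redo, so the newest commit survives.
        return max(commits)
    # Recovery stops at the target time: only commits made before it survive.
    surviving = [c for c in commits if c <= target]
    if not surviving:
        raise ValueError("no test-table commit precedes the recovery target")
    return max(surviving)


def restore_passed(queried: datetime, commits: List[datetime],
                   target: Optional[datetime]) -> bool:
    """Step 8: compare what the restored table holds with what it should hold."""
    return queried == expected_last_update(commits, target)


# Hypothetical timestamps recorded during one run of the test.
step2 = datetime(2006, 1, 25, 14, 0, 5)
step4 = datetime(2006, 1, 25, 14, 10, 0)
step5 = datetime(2006, 1, 25, 14, 10, 30)

print(restore_passed(step2, [step2, step5], step4))  # incomplete recovery
print(restore_passed(step5, [step2, step5], None))   # complete recovery
```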
Tiff - 26 Jan 2006 00:02 GMT Daniel,
What a great idea! I have been on a DBA team for several years and we perform backups of all our databases (primary and secondary) daily. In this time, we have yet to need a recovery (thankfully), but I told my team lead that I won't feel confident in our recovery plan until I see it in action.
What a great way to test a recovery. Do you think it would be a good test to create this test table, delete its datafile and try a recovery to get back just this missing table? Will deleting this one datafile affect the rest of the database, or is this a test best reserved for a development environment?
You can see I have no experience in disaster recovery.
Thanks,
Tiffany
Joel Garry - 26 Jan 2006 01:28 GMT Tiffany:
You definitely need to have a test environment set up separate from your production environment. (Even then I've seen people screw up and get fired because the front ends can look so similar).
If you have metalink access, type in "recovery scenarios" in the knowledge browser. Plenty of ideas scattered about there, especially if you run across the scenarios they use in the classes (and note: 62385.1 is a pretty decent list of basic things to try). Recovery is _the_ most significant dba skill, and you are right to recognize that you don't have valid backup procedures unless they've been tested. You need to know it cold for when you need it for real.
jg
 Signature @home.com is bogus. http://www.rathergood.com/independent_woman/
Daniel Fink - 26 Jan 2006 04:42 GMT Always, always, always test recoveries in a non-production environment. It is a great idea to use a production backup to do so (kills two birds with one stone). Need to refresh a development/test environment? How about using the latest production backup?
Think about different recovery scenarios. Loss of a single table, loss of a datafile, loss of a device, controller, system, data center, etc. Work out how you would recover from each, how long it would take, how much data would be lost, etc.
Presented for your consideration "The responsibility of a DBA is not to back up the database...the responsibility of the DBA is to recover the database!" (paraphrase of Tim Gorman).
I recall a discussion at a user group meeting where a dba was telling the story of a new tape drive in their backup system. Seems that there was a slight miscalibration and the head would move a fraction of a millimeter each time it wrote a new tape. Tapes would write successfully, would be verified successfully...and could not ever be read again!
I myself went through a situation where a bug caused the database to be unrecoverable. Not fun!
Regards, Daniel Fink
Tiff - 26 Jan 2006 16:20 GMT Guys, thanks so much!
I will take the advice both of you have offered and will work on a "Disaster Recovery" document and then take it to my lead to ask for the chance to test out some of the scenarios.
I will make the Gorman quote my new mantra!
My goal is to finally get certified this year and I am certain these hands-on exercises will prove invaluable.
Thanks again!
Tiff
P.S. I will try using a backup from Production to load our Dev environment. I always just load using export or sqlldr. This will be a "fun" experiment.
Mladen Gogala - 27 Jan 2006 04:38 GMT
> You can see I have no experience in disaster recovery.
Companies normally have DR tests, just like fire drills. Typically, those tests range from primary and standby databases switching places all the way to going to a remote location and restoring a 1.1TB database at a spare site, which can be provided by companies like IBM or SunGard. It is important to understand that a DR document should include the possibility of failure of any component, such as a router, name server, firewall, application server, web server, VPN server or LDAP server. Without the name server or router, your clients will not be able to find the database server, even if the latter is perfectly operational.
The first thing to decide when writing a DR document is how far you want to go and what you want to protect the company from. Are we talking about a major malfunction (like the big 2003 power outage) or the total loss of a location, as after Katrina or 9/11? What is the cost of downtime and what are the time constraints? If there is an outage, how long can the company afford to stay down? If the allowed downtime is long enough, you can get away with restoring from backup when the problem occurs, or simply activating a standby, which can be located at the other end of the same building.
The second thing to decide is how much data you can afford to lose. If it is a dating database that keeps "compatibility points" and 10 million addresses of ineligible and undesired bachelors, you can afford to lose more data than if you are operating an online banking database.
Last but not least is how much the company wants to spend on data protection. DR plans are frequently the first casualty of cost cutting. Suddenly, non-technical people (the CFO, for instance) bring in a wonderful sales guy from EMC telling you that RAID-5 is just fine and just as fast as RAID 1+0, or that those nasty old PA-RISC boxes can easily be replaced by nice new thingys with quad-Opteron motherboards running RH. When you find unfamiliar coffee bags in the kitchen where the Maxwell House coffee bags used to be, it's time to put your resume on Dice. Maxwell House coffee is critical for DR plans, as it enables your DBA to perform database recovery at 03:30 AM. It's good to the last drop. The CIO is usually extremely touchy when it comes to signing checks, especially for disaster recovery. Cheap solutions are abundant and normally do not work without extreme effort and many hours of pointless toil for which nobody will thank you. DR is primarily a business decision, for which you will need the support of the company's senior management.
Why am I telling you all this?
1) You are a junior DBA and are obviously in charge of the company's DR plan. That is usually a grave mistake and a sign of the company trying to save money on the wrong thing. Nothing personal, but as a long-time DBA, I don't think that a DR plan should be assigned to a junior person. That is the job for your team lead.
2) You don't have a DR drill. Fire drills are required by law, but DR drills are not. To be effective, a DR drill must be performed at least once a year. A&E, the company I had a consulting gig with for a few months, performed DR drills monthly. They have two locations (NYC and Stamford, CT), they switch the roles of the primary and standby databases, name servers and routers, and they go the whole nine yards. Now that's a company conscious of data security. You don't even know whether your backup can be restored. That leads me to conclude that RESTORE DATABASE VALIDATE is not part of your weekly backup routine. (This RMAN command reads the latest backup and checks whether it can be restored, without actually restoring it.)
If your company is trying to save some money and you, as a junior DBA, are in charge of the DR plan, then you are what is commonly known as a "scapegoat". If anything happens, it's your job on the line. For a DBA, an item like "I was fired when I lost the company's production database" doesn't look too nice on a resume and somewhat diminishes the chances of getting hired again as a DBA. Try clarifying the points mentioned above with management, and if you don't get clear and satisfactory answers, run like hell.
I humbly apologize if this sounds harsh and cynical, but I assure you, it's a cold world out there. You are free to hate me if you so desire.
 Signature http://www.mgogala.com
Frank van Bortel - 27 Jan 2006 20:38 GMT Hear! Hear!
But you would be surprised how few companies actually test their DR plan...
Reminds me of a former employer that put great effort into designing a monkey-proof DRP but never tested it, only to find out that no tapes could be read when all hell broke loose.
Took us (5 programmers) a week to get some 98% of the data back. Go figure: marketing, planning, sales, billing and MRP not available for a week.
 Signature Regards, Frank van Bortel
Top-posting is one way to shut me up...
Mladen Gogala - 28 Jan 2006 05:05 GMT
> But you will be surprised how few companies actually
> test their DR plan...
It's a business decision. What surprises me is how few people actually verify their backups. Here is all it takes:
RMAN> restore database validate;
Starting restore at 27-JAN-06
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=64 devtype=DISK
channel ORA_DISK_1: starting validation of datafile backupset
channel ORA_DISK_1: reading from backup piece /oradata/back/back_01h9uj59_1_1.bkp
channel ORA_DISK_1: restored backup piece 1
piece handle=/oradata/back/back_01h9uj59_1_1.bkp tag=TAG20060127T232632
channel ORA_DISK_1: validation complete, elapsed time: 00:01:06
Finished restore at 27-JAN-06
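If `restore database validate` runs from a scheduled script, something still has to read its output. A minimal sketch, assuming the output has been captured as text: it flags any `RMAN-nnnnn` or `ORA-nnnnn` error code and otherwise requires the "validation complete" and "Finished restore" messages seen in the output above. The function name and the string checks are illustrative heuristics, not an Oracle-provided interface.

```python
import re
from typing import List, Tuple

# RMAN and Oracle errors are reported with RMAN-nnnnn / ORA-nnnnn codes.
ERROR_RE = re.compile(r"\b(?:RMAN|ORA)-\d{5}\b")
PIECE_RE = re.compile(r"reading from backup piece (\S+)")


def check_validate_log(log_text: str) -> Tuple[bool, List[str]]:
    """Heuristically judge captured 'restore database validate' output.

    Returns (ok, pieces): ok is True only if no error codes appear and the
    run reports a completed validation, a finished restore and at least
    one backup piece read.
    """
    pieces = PIECE_RE.findall(log_text)
    if ERROR_RE.search(log_text):
        return False, pieces
    ok = ("validation complete" in log_text
          and "Finished restore" in log_text
          and bool(pieces))
    return ok, pieces


# Output as posted in this thread.
SAMPLE = """\
Starting restore at 27-JAN-06
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=64 devtype=DISK
channel ORA_DISK_1: starting validation of datafile backupset
channel ORA_DISK_1: reading from backup piece /oradata/back/back_01h9uj59_1_1.bkp
channel ORA_DISK_1: restored backup piece 1
piece handle=/oradata/back/back_01h9uj59_1_1.bkp tag=TAG20060127T232632
channel ORA_DISK_1: validation complete, elapsed time: 00:01:06
Finished restore at 27-JAN-06
"""

ok, pieces = check_validate_log(SAMPLE)
print(ok, pieces)
```

Note that the channel name `ORA_DISK_1` does not trip the error pattern, because the pattern requires a hyphen followed by five digits.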
My database wasn't actually restored, only the backup was validated. Here is the problem, visible in sar output while the validation is running:
$ sar -u 5 10
Linux 2.6.12-1.1381_FC3 (medo.noip.com) 01/27/2006
11:51:53 PM     CPU     %user     %nice   %system   %iowait     %idle
11:51:58 PM     all     99.00      0.00      1.00      0.00      0.00
11:52:03 PM     all     99.00      0.00      1.00      0.00      0.00
11:52:08 PM     all     99.00      0.00      1.00      0.00      0.00
11:52:13 PM     all     99.00      0.00      1.00      0.00      0.00
11:52:18 PM     all     16.03      0.00      1.40     82.57      0.00
11:52:23 PM     all      2.59      0.00      1.60     95.81      0.00
11:52:28 PM     all      1.61      0.00      0.60     97.79      0.00
11:52:33 PM     all      0.40      0.00      0.80     98.80      0.00
11:52:38 PM     all      0.40      0.00      0.20     99.40      0.00
11:52:43 PM     all      2.20      0.00      1.40     96.40      0.00
Average:        all     41.94      0.00      1.00     57.06      0.00
Validation was active during the first few snapshots. It's an extremely expensive operation, especially if the backupset is compressed. Without compression it takes approximately 40% of the CPU, but it is still an expensive operation, even on much bigger machines than my measly PC. Perhaps people think that not having a backup, or having a bad backup, is cheaper than verifying it? You are a DBA in the Big Easy: no snow, no ice storms, what could ever happen to your database? It's unlikely that it will ever be sleeping with the fishes, to use the phrase from "The Godfather", right? Why verify?
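Picking the validation phase out of the sar samples above takes only a small parser. This sketch assumes the 12-hour-clock, eight-column layout shown in the post (other sar versions and locales format the output differently) and uses arbitrary thresholds of 90% user and 80% iowait. On this data it reports the first four samples as CPU-bound and the remaining six as waiting on I/O; a plain mean of %user over the ten rows comes to about 41.92, close to, but not exactly, the 41.94 that sar prints.

```python
from typing import Dict, List

# Rows copied from the `sar -u 5 10` output in the post above.
SAR_OUTPUT = """\
11:51:53 PM CPU %user %nice %system %iowait %idle
11:51:58 PM all 99.00 0.00 1.00 0.00 0.00
11:52:03 PM all 99.00 0.00 1.00 0.00 0.00
11:52:08 PM all 99.00 0.00 1.00 0.00 0.00
11:52:13 PM all 99.00 0.00 1.00 0.00 0.00
11:52:18 PM all 16.03 0.00 1.40 82.57 0.00
11:52:23 PM all 2.59 0.00 1.60 95.81 0.00
11:52:28 PM all 1.61 0.00 0.60 97.79 0.00
11:52:33 PM all 0.40 0.00 0.80 98.80 0.00
11:52:38 PM all 0.40 0.00 0.20 99.40 0.00
11:52:43 PM all 2.20 0.00 1.40 96.40 0.00
Average: all 41.94 0.00 1.00 57.06 0.00
"""


def parse_sar_u(text: str) -> List[Dict[str, object]]:
    """Parse the per-interval 'all' rows of `sar -u` (12-hour clock, 8 fields)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skips the header (CPU in field 3), the Average line (7 fields) and blanks.
        if len(fields) != 8 or fields[2] != "all":
            continue
        user, nice, system, iowait, idle = map(float, fields[3:])
        rows.append({"time": f"{fields[0]} {fields[1]}", "user": user,
                     "nice": nice, "system": system,
                     "iowait": iowait, "idle": idle})
    return rows


rows = parse_sar_u(SAR_OUTPUT)
cpu_bound = [r["time"] for r in rows if r["user"] >= 90.0]
io_bound = [r["time"] for r in rows if r["iowait"] >= 80.0]
mean_user = sum(r["user"] for r in rows) / len(rows)

print("CPU-bound samples:", cpu_bound)
print("I/O-wait samples:", io_bound)
print(f"mean %user: {mean_user:.2f}")
```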
JEDIDIAH - 30 Jan 2006 17:08 GMT
>> But you will be surprised how few companies actually
>> test their DR plan...
[quoted text clipped - 15 lines]
>
> My database wasn't actually restored, only the backup was validated.
You trust Oracle far too much. The backup isn't validated until that database is running again and the users' applications are successfully using it. There's nothing like actually doing something to prove that it can be done.
[deletia]
Signature If you think that an 80G disk can hold HUNDREDS of hours of DV video, then you obviously haven't used iMovie either.
Joel Garry - 30 Jan 2006 22:02 GMT
> >> But you will be surprised how few companies actually
> >> test their DR plan...
[quoted text clipped - 19 lines]
> database is running again and the users applications are successfuly using it.
> There's nothing like actually doing something to prove that it can be done.
The backup is validated. What is needed is for the _restore_ to be validated, along with clarity on whether we are talking about validating procedures or actual restores.
Validating the backup shows that you have something that could be used in a restore. Even better is using the backup to restore to an off-production environment. There's always going to be some difference between what you can test and reality; the amount of difference is directly related to cost, and that's where service level agreements and management decision-making come into play.
I think we've all seen questionable management decisions. Technical advisors need to be sure they have input to make these decisions more reasonable.
jg -- @home.com is bogus. Game Over. http://www.signonsandiego.com/uniontrib/20060130/news_lz1b30ac2.html
rcyoung - 28 Jan 2006 18:04 GMT I find that most companies "fail" to do "real life" DR scenarios. Oh, they do what looks fine "on paper" to fulfill some mandated requirement, but nothing like a real-life recovery. You really need to run through the whole process, including recalls of media that may be stored "off site", using alternate tape drives, etc.
Vince Laurent - 27 Jan 2006 16:47 GMT The Oracle class on Backup and Recovery was one of the best I have taken. Not only do you learn how to deal with nearly 20 different scenarios, but the lab actually tests this knowledge. Our instructor would cause a failure and you would have to figure out which of the 20 it was. Good labs.
>Daniel,
> [quoted text clipped - 15 lines]
> >Tiffany
-----------------------------------------------------
Come race with us! http://www.mgpmrc.org
Mladen Gogala - 27 Jan 2006 01:12 GMT
> What are various techniques to demonstrate successful restore in
> Oracle?
>
> for instance: what kind of formal confirmation does oracle provide?
You can call Oracle support and ask them to come and certify your database. If your database is well protected, they'll give you Backup Secured Enterprise, or BSE, certification. Call Oracle support and ask them about BSE certification of your database.
dominica@gmail.com - 27 Jan 2006 02:46 GMT Actually, I agree with Joel Garry and Daniel, and so on. We could always back up the database to tape, but we might NEVER get it back. Sometimes a tape can go bad after sitting there for a while. I usually recommend that every company do at least one recovery test every 5 months, even on a small DB. And normally, a DBA should test recovery for different recovery scenarios (complete and incomplete recovery, full-DB recovery or partial recovery).
Guess what? I just had to do disaster recovery on 3 of my production databases last week (the largest is 450 GB; the smallest is still 300 GB). Actually, I did a partial recovery: I only recovered the tablespace that I wanted. But that tablespace is still 100+ GB. Basically, one of the applications had a BUG that nullified one important column, and people didn't realize the column was GONE until a month later.
So I had to get last month's Oracle hot backup and archivelogs back from tape.
I was very stressed about it and stayed up late 3 nights in a row, recovering all 3 databases into a TEMP environment so that the developers could update that column in those 13-million-row tables from the RECOVERED DBs.
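The repair step described above, refilling a nulled column from a copy recovered into a separate environment, comes down to a correlated update that touches only the rows the bug damaged. In Oracle that would typically run against the recovered copy over a database link; this sketch uses Python's built-in `sqlite3` as a stand-in, and the `accounts`/`accounts_restored` tables and `status` column are hypothetical. Restricting the update to `status IS NULL` keeps values that changed legitimately after the month-old backup was taken (row 3 here), assuming no row was legitimately set to NULL in the meantime.

```python
import sqlite3
from typing import List, Tuple


def build_demo() -> sqlite3.Connection:
    """Hypothetical 'accounts' table: production copy plus restored copy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE accounts_restored (id INTEGER PRIMARY KEY, status TEXT);
    """)
    # Production: the bug nulled rows 1 and 2; row 3 was legitimately
    # changed after the backup was taken.
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [(1, None), (2, None), (3, "closed")])
    # Rows copied out of the database recovered from the old backup.
    conn.executemany("INSERT INTO accounts_restored VALUES (?, ?)",
                     [(1, "active"), (2, "suspended"), (3, "active")])
    conn.commit()
    return conn


def repair_nulled_column(conn: sqlite3.Connection) -> int:
    """Refill only the NULLed values from the restored copy; return rows changed."""
    cur = conn.execute("""
        UPDATE accounts
           SET status = (SELECT r.status FROM accounts_restored AS r
                          WHERE r.id = accounts.id)
         WHERE status IS NULL
           AND id IN (SELECT id FROM accounts_restored)
    """)
    conn.commit()
    return cur.rowcount


conn = build_demo()
repaired = repair_nulled_column(conn)
rows: List[Tuple[int, str]] = conn.execute(
    "SELECT id, status FROM accounts ORDER BY id").fetchall()
print(repaired, rows)
```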
The funny thing is, I don't do recovery that often, but lately my current job has needed a lot of recovery work, even LogMiner sessions to mine the deleted rows. Still, it has been a very good experience.
Dominica
Mladen Gogala - 27 Jan 2006 06:45 GMT
> Actually, I agree with Joel Garry and Daniel..and so on...
> We could always backup the database in tape, but we might NEVER get it
> back.
That's called a black hole backup. It's essentially equivalent to doing a backup to /dev/null. On the plus side, you will never run out of space, but you might have a problem restoring it.
> Sometimes, tape could be bad after leaving there for a while.
> And I usually recommend every 5 months, every company should do at
> least one
> recovery test , even on a small DB .
5 MONTHS? Are you sure that the tape is ripe enough for a recovery test after only 5 months? You must let solar flares and electromagnetic storms take their natural course, as Simon Travaglia would say on certain occasions. Restoring a 5-month-old backup makes a lot of business sense; I'm sure that numerous business analysts would be grateful to you for providing them with a less than half-year-old database, but I'd rather let that tape mature for a few more months before attempting to verify it.
> And normally, an DBA should do/test recovery for different recovery
> scenario.
What is a "recovery scenario"? A DBA has to be able to restore the database. The DBA backs up the database, backs up the archivelogs and afterwards does a full export of the RMAN catalog database, preferably by using scheduled scripts. The tapes then go to the tape duplicator, where they are duplicated and stored in two different locations. If and when the need arises, the DBA restores the database from a backup that is less than 24 hours old if possible and, if not, from the newest available backup. Testing recovery is done from the complete set of tapes, and that's it. In case of a failure, the DBA has to know what to restore and where.
If you have a RAC with one or two standby databases, then you don't have a failure: you lose one, you continue with another, and the same database is still available. Emergency restore to another server is done only when everything is lost. A DR test is not a database recovery course for practicing the recovery of the tablespace containing the precious EMP and DEPT tables. A DR test is a simulation of a disaster in which, typically, the whole data center is lost. In other words, someone tells you that all your base are belong to us and that you are on the way to destruction.
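The restore-source rule stated above (a backup less than 24 hours old if possible, otherwise the newest one available) can be written down as a small selection function. This is a sketch under assumptions: `BackupSet`, its `readable` flag (standing in for, say, the result of a validation run) and `pick_backup` are hypothetical names, and a real list of backups would come from the RMAN catalog rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class BackupSet:
    completed: datetime  # when the backup finished
    readable: bool       # hypothetical flag, e.g. it passed a validation run


def pick_backup(backups: List[BackupSet], now: datetime,
                max_age: timedelta = timedelta(hours=24)
                ) -> Tuple[Optional[BackupSet], bool]:
    """Return the newest readable backup and whether it is under max_age old."""
    usable = [b for b in backups if b.readable and b.completed <= now]
    if not usable:
        return None, False
    newest = max(usable, key=lambda b: b.completed)
    return newest, now - newest.completed < max_age


now = datetime(2006, 1, 28, 3, 30)
catalog = [
    BackupSet(now - timedelta(hours=6), readable=False),   # bad tape
    BackupSet(now - timedelta(hours=30), readable=True),
    BackupSet(now - timedelta(hours=54), readable=True),
]
chosen, within_target = pick_backup(catalog, now)
print(chosen.completed, within_target)
```

Here the 6-hour-old backup is unreadable, so the 30-hour-old one is chosen and the 24-hour target is reported as missed.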
> (complete and in-complete recovery and full-db recovery or partial
> recovery).
I believe that management may have something to say about the partial recovery. They might not be thrilled by missing data.
In addition, "BSE certification" was meant to bring a smile to the faces of my Commonwealth friends, like Jonathan, Nuno and Niall. BSE certification of a database would be an especially popular type of request in the UK. A DBA asking for such certification would have to be an Oracle Certifiable Person, OCP for short.
dominica@gmail.com - 27 Jan 2006 20:34 GMT Hi Mladen,
1) Don't worry, I can do a full recovery (if I want to). I have my own reason for doing a partial recovery of only one tablespace. I did not lose any data at all.
It is just because the developers wanted to see something in ONLY ONE SINGLE TABLE, in which one column had been NULLIFIED. (There is a long story and explanation for why it is a partial recovery.)
2) There is another reason why I restored a one-month-old backup (a special requirement, long story again). I recovered to another TEMP ENVIRONMENT, not the production one.
As for the "5 months" thing, I am not restoring a 5-month-old backup. I am just saying that testing recovery once every few months (like every 5 months) is good. Of course, if you run hot standby, you don't have that recovery problem. (I used to run hot standby at another workplace, but not at my current one.)
Dominica
Joel Garry - 27 Jan 2006 21:41 GMT Dominica wrote:
>Of course, if you run hot-standby, you don't have that recovery problem.
Yes, you have _other_ recovery problems. You need to test the standby too; some people who have been burned switch it into readable mode every day (or hour). Beyond that, it needs to be fully tested periodically with a full application switchover.
I found one place that was faithfully moving changed code to the standby - but not changed shell scripts!
Another place I discovered was using a combination of backup software and tape compression that finished the backup within the window at the cost of making restores wayyyyy too slow: the software would make the tape hunt all over for each [I don't know what, smaller than a file], so the restore went on for days rather than just blasting it all back onto the DR machine. Which is why, to this day, I'm kinda weird about wanting the occasional cold dumb backup.
Mladen, we hate you _because_ we love you! :-) (You ought to take that write-up, generalize it, and add it to a faq or the dizwell wiki). Good point making the distinction between doing a DR plan right and learning recovery techniques. I have fallen into the "trying to answer their question and oversimplifying the answer until it is wrong" trap in this thread.
jg -- @home.com is bogus. "People are always going to find a way around us." http://www.signonsandiego.com/uniontrib/20060127/news_1n27tunnel.html
Mladen Gogala - 28 Jan 2006 03:53 GMT
> Mladen, we hate you _because_ we love you! :-) (You ought to take
> that write-up, generalize it, and add it to a faq or the dizwell wiki).
This is actually a very good idea and I will try to write something down. Thanks. I've also been experimenting with HTMLDB and found it ideal for creating a capacity plan. I will write down some facts about DR plans and capacity plans. I recently signed up for Howard's site and am delighted by its quality. It's a shame that Howard is no longer active on this forum.