How I crashed Solaris!

Well, I’ve been working on the Linux platform for quite some time, and I’ve been using Solaris for the past few months. By accident I found a major flaw that can be dangerous to the system and its data. So I started researching in detail how I could crash the system and to what extent it could be recovered.

Hardware:
Processor: AMD Athlon(tm) 64 X2 Dual Core Processor 4400+

Hard disk: 40 GB IDE

RAM: 4 GB, plus the other usual components.

Distributions used for testing:

OpenSolaris 2008

Belenix 0.7.1

Problems faced:

GRUB error

Boot archive error

ZFS pool degraded

Methodology:

Simply switch off the system (please note that this means switching off or cutting the power, not a proper shutdown).

The mistake that taught me the lesson:

I had installed Belenix on my system and had been doing R&D with it for quite some time. One day the power went out while the Solaris system was running, and my UPS ran dry, so the system simply switched off. The next time I started the system, I saw just the word “GRUB” on the screen and nothing else. There was no GRUB menu. So I realized that something was wrong and started investigating the problem.

Kindly note that these tests were performed many times. Sometimes GRUB is lost on the first power-off, and sometimes it takes more than two power-offs to lose GRUB. ;) Well, let’s get started.

Test 1: Steps performed.

  1. I reinstalled Belenix on my system and confirmed that everything was working fine.
  2. Then I just switched off the power.
  3. I restarted the system and GRUB was lost (very strange). Sometimes it takes 2-3 repetitions of step 2 to get to this stage.

I had read and heard about the stability and security of Solaris, which is why I thought of using it in the first place. But what is this flaw that can compromise the entire system? At first I thought it was specific to Belenix, so I decided to give OpenSolaris a try.

Test 2: Steps

  1. Installed OpenSolaris on my system.
  2. The 40 GB disk was divided into 2 primary partitions:
  • /dev/rdsk/c4d0s1 as swap (1.2 GB)
  • /dev/rdsk/c4d0s0 as the ZFS root filesystem on which OpenSolaris 2008 was installed (18.5 GB)
  • /dev/rdsk/c4d0s2 was left for creating a ZFS pool (20.5 GB)

After checking that everything was working fine, I switched off the power again, and found that GRUB was lost again. Now this is a serious issue. Anyway, I did a lot of googling on recovering GRUB on OpenSolaris and found a good solution.

To learn more about how to recover GRUB on OpenSolaris, read my other post:
http://www.linuxguy.in/?p=3

I recovered GRUB using the methods provided at the link, and things were normal again. Up to this point, if you happen to lose GRUB, you can recover it. But I was not satisfied, so I decided to perform some more stunts.
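For readers who want a rough idea of what such a recovery looks like: on Solaris-family x86 systems GRUB is usually written back to the boot slice with installgrub, run from a live CD or failsafe session. The sketch below is not necessarily the procedure from the linked post. It assumes the root slice is c4d0s0 (as in Test 2) and that the installed root filesystem is already mounted at /a, which is an assumed mount point. It only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

ROOT_MNT=/a        # assumption: installed root filesystem is mounted here
ROOT_SLICE=c4d0s0  # root slice from Test 2; adjust for your disk

# Write GRUB stage1 and stage2 back to the raw device of the boot slice.
run installgrub "$ROOT_MNT/boot/grub/stage1" "$ROOT_MNT/boot/grub/stage2" "/dev/rdsk/$ROOT_SLICE"
```

Check the printed command against your own disk layout first, then rerun with RUN=1 as root.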

Test 3: Steps

  • I created a ZFS pool on the second hard disk.
  • This time I was copying a large amount of data to the new pool over SSH when I switched off the computer. As expected, GRUB was lost, so I recovered it. But this time, after recovering GRUB, three different scenarios occurred:
  • Scenario 1: A boot archive issue (check the last entry in the mailing list thread to know more). I tried to recover the boot archive, and even deleted and recreated it, but the system still would not start. I had to reinstall the system.
  • Scenario 2: The system booted. I could see the other pool, and when I imported it, its status showed as degraded. I cleared the pool with the command “zpool create poolname “. I restarted the system and lost GRUB again. I restored GRUB again, but the system would not boot even after that.
  • Scenario 3: I restored GRUB and started the system. The GRUB menu showed up, and then the system rebooted in a continuous loop. The only solution was to reinstall the entire system.
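The boot archive and pool checks involved in these scenarios can be sketched roughly as follows. This is a hedged outline and not a guaranteed fix (rebuilding the boot archive did not save the system in Scenario 1). It assumes a failsafe or live CD session with the installed root filesystem mounted at /a, and datapool is a hypothetical pool name. Note that zpool clear only resets a pool's error counters; it does not recreate the pool. Commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

ROOT_MNT=/a     # assumption: installed root filesystem is mounted here
POOL=datapool   # hypothetical name of the second (data) pool

# Force a rebuild of the boot archive on the mounted root.
run bootadm update-archive -f -R "$ROOT_MNT"

# Import the data pool and report only if it has problems.
run zpool import "$POOL"
run zpool status -x "$POOL"

# Reset error counters, then verify the data with a scrub.
run zpool clear "$POOL"
run zpool scrub "$POOL"
```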

I found this thread on the mailing list, in which some people had earlier faced a similar issue.

http://opensolaris.org/jive/thread.jspa?messageID=268456

Result: The methods suggested in the thread were not successful. I was able to recover the system when only GRUB was lost. But if you end up in the situation that the last poster in the thread describes, as I did, I still don’t have a solution for recovering the system from that stage.

Outcome: Please be very careful when switching off your Solaris system. Do a proper shutdown, or you may end up losing your valuable data.
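For completeness, a proper shutdown from a root shell on Solaris is normally done with the shutdown command. A minimal sketch: -y skips the confirmation prompt, -g0 sets a grace period of zero seconds, and -i5 requests the power-off init state. The commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Flush pending writes, then shut down cleanly and power off.
run sync
run shutdown -y -g0 -i5
```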

Update on 2.11.2008

Test 4: Steps

  1. Installed OpenSolaris 2008 on the entire disk (chose the whole disk during installation) rather than on a partition.
  2. Confirmed that the OS was working perfectly and switched off the machine.
  3. Restarted the system, and the system works. (Great news.)

Result: If you install OpenSolaris 2008 on the entire disk, then switch the power off and start again, GRUB is not lost.



4 Responses to “How i crashed solaris !”

  • pinky214 Says:

    OpenSolaris and Belenix are not what I would call “ready for production”. That said, I run OpenSolaris on my laptop, and throughout the earlier releases I have done unclean shutdowns and never seen what you went through. If you are looking for stability, give Solaris x86 (download from Sun) a try; since this is the commercial version, it ensures stability in its releases.

  • Moinak Ghosh Says:

    This is very weird. I have had occasions in the past, due to some issue or other, where I had to switch off the system without shutting down OpenSolaris (mostly due to hangs with earlier Nvidia drivers). I have yet to face this problem.

  • Uwe Dippel Says:

    True. Fully true.
    Living in a tropical area with a lot of power outages from thunderstorms, I can partially confirm your observation. I never lost GRUB, but I get fairly regular corruption of the boot archive, which requires a failsafe reboot.
    What amazes me most is that atomic (reads and) writes are the sales argument. Somewhere I read that corruption didn’t happen in a simulated 1 million power cut-offs. In a real-world situation, it looks more like 1 in 5 ends up with data corruption. I have been running various versions (usually close to the most recent) of Nevada.

  • Kebabbert Says:

    This is strange. I have shut off the power several times on my Solaris Nevada computer while trying different device drivers, etc. I never once had a problem, neither with ZFS nor with anything else.
