Archive for zfs

Root zfs mirror boot

Posted in software on 2009.03.13 by ipv5

Status of booting OpenSolaris from a redundant zfs array:

  • the installation does not provide the option to do it
  • you cannot boot from a raidz (i.e.: raid5) zpool
  • you can boot from a zpool mirror, but you must create the mirror after the single-disk zpool installation

To get the / zpool up, here’s a very nice console howto, based on this other posting.
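The gist of it, condensed from the transcripts below (treat this as a sketch rather than gospel; c4d0s0 is the disk the installer put rpool on, c4d1s0 the disk being added): give the new disk a whole-disk Solaris fdisk partition, copy the VTOC over, attach the disk to rpool, and put grub on it. I believe fdisk -B is the non-interactive equivalent of the format/fdisk dance I go through below, and note that pfexec has to be on both sides of the pipe, or you hit the same Permission denied I do further down.

pfexec fdisk -B /dev/rdsk/c4d1p0
pfexec prtvtoc /dev/rdsk/c4d0s2 | pfexec fmthard -s- /dev/rdsk/c4d1s2
pfexec zpool attach -f rpool c4d0s0 c4d1s0
pfexec installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4d1s0

Then wait for the resilver to finish (zpool status shows the progress) before yanking any disks.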

Breaking and fixing it:

To test the resilience of the system I created the mirror, booted from a livecd partitioner, erased the second disk, and finally rebooted from the installed system.

The mirror was recognized as degraded, and I was able to recover it without a hitch using the aforementioned howto.

admin@opensolaris:~$ zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH    ALTROOT
rpool   232G  4.49G   228G     1%  DEGRADED  -
admin@opensolaris:~$ zpool status
pool: rpool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool       DEGRADED     0     0     0
  mirror    DEGRADED     0     0     0
    c4d0s0  ONLINE       0     0     0
    c4d1s0  UNAVAIL      9 1.91K     0  cannot open

errors: No known data errors
admin@opensolaris:~$

admin@opensolaris:~$ pfexec zpool detach rpool c4d1s0
admin@opensolaris:~$ zpool status
pool: rpool
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  c4d0s0    ONLINE       0     0     0

errors: No known data errors

admin@opensolaris:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c4d0 <drive type unknown>
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
1. c4d1 <DEFAULT cyl 30390 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0
2. c5d0 <ST325062-         5QE3FBW-0001-232.83GB>
/pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
3. c5d1 <ST325062-         5QE3FAA-0001-232.83GB>
/pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0
Specify disk (enter its number): 0

AVAILABLE DRIVE TYPES:
0. DEFAULT
1. other
Specify disk type (enter its number): 0
selecting c4d0
No current partition list
No defect list found
[disk formatted, no defect list found]

No fdisk solaris partition found.

FORMAT MENU:
disk       - select a disk
type       - select (define) a disk type
…..
volname    - set 8-character volume name
!<cmd>     - execute <cmd>, then return
quit
format> fdisk
No fdisk table exists. The default partition for the disk is:

a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
partition table.
y
format> quit
admin@opensolaris:~#

admin@opensolaris:~$ pfexec prtvtoc /dev/rdsk/c4d0s2 | fmthard -s- /dev/rdsk/c4d1s2
fmthard: Cannot open device /dev/rdsk/c4d1s2 - Permission denied
admin@opensolaris:~$ su
Password:
admin@opensolaris:~# prtvtoc /dev/rdsk/c4d0s2 | fmthard -s- /dev/rdsk/c4d1s2
fmthard: New volume table of contents now in place.

admin@opensolaris:~# zpool attach -f rpool c4d0s0 c4d1s0
Please be sure to invoke installgrub(1M) to make 'c4d1s0' bootable.
admin@opensolaris:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4d1s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector

admin@opensolaris:~# zpool status
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h2m, 50.72% done, 0h2m to go
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4d0s0  ONLINE       0     0     0  3.00M resilvered
    c4d1s0  ONLINE       0     0     0  2.28G resilvered

errors: No known data errors

Unfortunately, while zapping the second disk, rebooting, and reconstructing all worked perfectly, when I zapped the first disk the machine hung at boot, showing only:

GRUB

However, booting from the livecd, importing the zpool mirror, and fixing it (with the same method) worked just fine.

jack@opensolaris:~# zpool import
pool: rpool
id: 15289673696250060511
state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
see: http://www.sun.com/msg/ZFS-8000-EY
config:

rpool       DEGRADED
  mirror    DEGRADED
    c4d0s0  UNAVAIL  cannot open
    c4d1s0  ONLINE
jack@opensolaris:~# zpool list
no pools available
jack@opensolaris:~# zpool import rpool
cannot import 'rpool': pool may be in use from other system, it was last accessed by opensolaris (hostid: 0xc75b92) on Fri Mar 13 02:59:45 2009
use '-f' to import anyway
jack@opensolaris:~# zpool import -f rpool
jack@opensolaris:~# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool   232G  4,49G   228G     1%  DEGRADED  -

jack@opensolaris:~# zpool detach rpool c4d0s0
jack@opensolaris:~# zpool status
pool: rpool
state: ONLINE
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  c4d1s0    ONLINE       0     0     0

errors: No known data errors

jack@opensolaris:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
0. c4d0 <drive type unknown>
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
1. c4d1 <DEFAULT cyl 30390 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0
2. c5d0 <ST325062-         5QE3FBW-0001-232.83GB>
/pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
3. c5d1 <ST325062-         5QE3FAA-0001-232.83GB>
/pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0
Specify disk (enter its number): 0

AVAILABLE DRIVE TYPES:
0. DEFAULT
1. other
Specify disk type (enter its number): 0
selecting c4d0
No current partition list
No defect list found
[disk formatted, no defect list found]

No fdisk solaris partition found.

FORMAT MENU:
disk       - select a disk
type       - select (define) a disk type
…..
volname    - set 8-character volume name
!<cmd>     - execute <cmd>, then return
quit
format> fdisk
No fdisk table exists. The default partition for the disk is:

a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
partition table.
y
format> quit
jack@opensolaris:~#

jack@opensolaris:~# prtvtoc /dev/rdsk/c4d1s2 | fmthard -s- /dev/rdsk/c4d0s2
fmthard:  New volume table of contents now in place.

jack@opensolaris:~# zpool attach -f rpool c4d1s0 c4d0s0
Please be sure to invoke installgrub(1M) to make 'c4d0s0' bootable.

jack@opensolaris:~# zpool status
pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h0m, 0,86% done, 0h44m to go
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    c4d1s0  ONLINE       0     0     0  1,57M resilvered
    c4d0s0  ONLINE       0     0     0  39,4M resilvered

errors: No known data errors

jack@opensolaris:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector

Zfs versions feature madness

Posted in software on 2008.04.22 by ipv5

Feature creep? This is feature madness! This is SpartaZFS!
Just go to http://opensolaris.org/os/community/zfs/version/1/ and increment the last number (or read on: I might as well write it down as I read) for all the goodies that keep getting added to zfs. I’ll sketch a few example commands right after the list.

  1. This is the initial ZFS on-disk format as integrated on 10/31/05
  2. Support for “Ditto Blocks”, or replicated metadata. Metadata can be replicated up to 3 times for each block, independently of the underlying redundancy (i.e.: if you have a raid1 on two disks, you get 6 copies of the blocks you deem important). So even if your user data gets corrupted, everything (fingers crossed) will still be discoverable and the pool will be usable.
  3. Hot spares, improved RAID-Z accounting (it doesn’t mention how it gets improved, however), and support for double-parity RAID-Z (aka raidz2, aka suspiciously-looks-a-lot-like-raid6).
  4. zpool history: a log of whatever happens to your pools.
  5. gzip compression for zfs datasets. Your /usr/ports is now very happy (remember to mount /usr/ports/distfiles elsewhere, however).
  6. ‘bootfs’ pool property. (yes, it does what it looks like it does)
  7. With the ZFS Intent Log (ZIL), an application (usually a database) knows that whatever it just wrote to disk will stay written even if a power failure occurs. Instead of waiting a second or two for zfs to do all its magic, there’s a transaction log in which fsync(fd)s are stored, so the database can churn away happily without having to wait. If a power failure occurs between zfs disk commits, this log is replayed and committed to disk as well.
  8. Administrative tasks (such as creation of descendent datasets) can be delegated to non-administrative users. While this is a bit scary, remember we can assign quotas to the parent dataset.
  9. Dataset quotas and reservations can be configured not to include descendent datasets (such as snapshots/clones) in the space consumption cap. And there’s support for the Sun CIFS server as well.
  10. You can specify a device in the zfs pool to act as cache. “These devices provide an additional layer of caching between main memory and disk. Using cache devices provides the greatest performance improvement for random read-workloads of mostly static content.” You know what a cache is, and there’s way too much math for me to go look at the detailed performance improvement.
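For quick reference, here is roughly what driving these features looks like from the shell. This is a sketch from memory rather than a transcript: tank, alice, tank/ports, tank/home/alice, rpool/ROOT/opensolaris and the cXdY devices are all placeholders, so double-check the zpool(1M) and zfs(1M) man pages before copy-pasting.

# v3: hot spares plus double-parity raidz
zpool create tank raidz2 c1d0 c2d0 c3d0 c4d0 spare c5d0
# v4: everything that ever happened to the pool
zpool history tank
# v5: gzip compression on a dataset
zfs set compression=gzip tank/ports
# v6: tell the root pool which filesystem to boot from
zpool set bootfs=rpool/ROOT/opensolaris rpool
# v7: put the intent log on a separate (fast) device
zpool add tank log c6d0
# v8: delegate some administration to a plain user
zfs allow alice create,mount,snapshot tank/home/alice
# v9: cap a dataset without counting its snapshots and clones
zfs set refquota=10G tank/home/alice
# v10: add a cache device, the extra layer between RAM and disk
zpool add tank cache c7d0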

interesting links:
nice recap of available solaris filesystems

easy introduction to zfs, and way-too-much-math introduction to zfs2

zfs cheat sheet

introduction to ZIL and more in-depth stuff as well

List of zfs administrative tasks which can be delegated, along with a nice primer

configuring the cifs server to use zfs datasets, for workgroups and with active directory

edit: a very interesting blog about building a home fileserver using ZFS, and the ZFS Evil Tuning Guide.