The MacView

Virtual Instrumentation from a Mac perspective

Location: Pflugerville, TX, United States

Tuesday, June 26, 2007

What ZFS support means

There have been a lot of rumblings about ZFS support in Leopard. Many people have wondered what the big deal is, and how it would help the average Mac user. Personally, I was just waiting for Steve Jobs to say the words "boot ZFS" and my life would have been complete (well, not really, but it would have been pretty cool).

First, a little history (as I remember it, so it may not be completely accurate, but it's close enough). The first filesystem (the way files are stored on disk) on the Macintosh was MFS (Macintosh File System). MFS was very short lived (I have never actually used it, just heard about it). It was not a hierarchical filesystem (no real folders on disk). HFS (Hierarchical File System) replaced MFS pretty quickly. HFS was written when most people still booted off of 3.5" floppy disks (1.44 MB max) and computer RAM was measured in kB, not GB. Then Apple upgraded HFS to HFSPlus. It handled much bigger drives, supported Unicode better and was just overall a better, more modern filesystem. The last little tweak Apple made to HFSPlus was to add journaling support (so the filesystem better handles unexpected power outages).

Then Sun developed an incredible filesystem, ZFS (Zettabyte File System). Here is what it brings:


  1. Adding more disk space is easy

    Right now, there are one or more logical disks for every physical disk. Most people are familiar with Macintosh HD, which is both a logical disk (Macintosh HD on the desktop) and a physical disk (say, a Western Digital inside their Mac). You can partition a physical disk into multiple logical disks. I have done this with a 250 GB external FireWire drive. I have a different Mac OS version on each of its (now 5) partitions, or logical disks.

    ZFS takes that trend in the reverse direction. You can have multiple physical disks "pooled" together into one logical disk. Imagine the following: you are running out of disk space, so you buy a new, much bigger hard drive. You install it and format it, and now you have two options: (1) migrate everything to the new drive, or (2) use some UNIX command lines to cobble the new disk into the filesystem on the old disk (symlinks, moving the home directory, etc). Neither of these is very clean, and both make the system feel fragile.

    If your main hard drive had been formatted ZFS, you could do the following instead: Tell the OS to add the new drive to the "pool" of drive space available. That's it. The logical disk, Macintosh HD on your desktop, would now have the full storage capacity of both physical disks in your system. You can add as many drives as you can connect to your machine.
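To make the pooling idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept, not how ZFS is implemented; the `StoragePool` class, the disk names and the sizes are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Toy model of a pool: one logical disk backed by many physical disks."""
    name: str
    disks_gb: dict = field(default_factory=dict)  # physical disk name -> size in GB

    def add_disk(self, disk_name: str, size_gb: int) -> None:
        # Adding a disk just grows the pool; nothing is migrated or reformatted.
        self.disks_gb[disk_name] = size_gb

    @property
    def capacity_gb(self) -> int:
        # The logical disk sees the sum of every physical disk in the pool.
        return sum(self.disks_gb.values())


pool = StoragePool("Macintosh HD")
pool.add_disk("internal", 250)    # hypothetical original drive
pool.add_disk("new_drive", 750)   # hypothetical bigger drive added later
print(pool.name, pool.capacity_gb, "GB")
```

The point of the sketch is that the logical disk keeps its name and simply reports a larger capacity after the second drive is added.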

  2. Speed benefits of RAID, but in a simpler package


    The idea of "pooling" physical disks together for the logical disk has another benefit that is similar to the speed concepts in RAID. When you write data, you split the data between multiple disks. Then when you read it, both disks work as fast as they can to get their piece of the data requested. This is similar in concept to multiple CPUs (cores) in Macs today. If you can have two or more things sharing the load, you can make it faster.
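Splitting data between disks can be sketched in a few lines of Python. This is a simplified round-robin scheme I made up for illustration (the `stripe` and `unstripe` helpers and the tiny chunk size are assumptions); real ZFS decides where blocks go in a more sophisticated way.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list:
    """Split data into chunks and deal them out round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(data), chunk_size):
        disks[(i // chunk_size) % num_disks].append(data[i:i + chunk_size])
    return disks


def unstripe(disks: list) -> bytes:
    """Reassemble the original data by visiting the disks in the same order."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


data = b"The quick brown fox jumps over the lazy dog"
disks = stripe(data, num_disks=2)
assert unstripe(disks) == data
print([len(d) for d in disks], "chunks per disk")
```

Each disk ends up holding about half of the chunks, so in principle each one only has to read or write half of the data.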

  3. Failing disks can be detected sooner


    The problem with using multiple disks is that your risk of disk failure goes up dramatically. Instead of a 5% chance that one disk will fail, you have close to a 10% chance that at least one of the two disks will fail (ok, I don't remember my statistics, or at least didn't want to think too hard about it, but you get the idea). Some RAID schemes solve this problem by storing just enough redundant data on the disks that if one drive goes down, the array can continue to give you correct data (although slower) until you replace the drive.
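For anyone who does want to think about the statistics, the arithmetic is short. The sketch below assumes each disk fails independently with the same chance, which is a simplification; the 5% figure is just the example number from above, not a measured failure rate.

```python
def chance_any_disk_fails(per_disk_chance: float, num_disks: int) -> float:
    """Chance that at least one of num_disks fails, assuming independent failures."""
    # All disks survive with probability (1 - p) ** n; anything else is a failure.
    return 1 - (1 - per_disk_chance) ** num_disks


for n in (1, 2, 4):
    print(n, "disk(s):", round(chance_any_disk_fails(0.05, n) * 100, 2), "%")
```

With two disks the result is a little under 10% (about 9.75%), so the back-of-the-envelope doubling is close enough for small numbers of disks.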

    ZFS has a RAID mode, but it works best when all the disks are the same size (the extra space on larger disks goes unused). Standard ZFS has a cool feature though. Every block of data that is written has a checksum written with it. Every time a block is read, ZFS checks to make sure that checksum is correct. As soon as it finds a bad block, you can tell it to remove the bad disk from the "pool" (which will copy the data onto the remaining disks) and then replace the bad disk. You get very early detection of a failing disk, and should lose less data.
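The checksum-on-every-block idea can be sketched like this. SHA-256 is used here only because it ships with Python; I am not claiming it is the checksum ZFS uses by default, and the `write_block` / `read_block` helpers are hypothetical names for illustration.

```python
import hashlib


def write_block(data: bytes) -> tuple:
    """Store a block together with a checksum of its contents."""
    return data, hashlib.sha256(data).hexdigest()


def read_block(stored: tuple) -> bytes:
    """Return the block's data, or raise if it no longer matches its checksum."""
    data, checksum = stored
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data


good = write_block(b"important data")
assert read_block(good) == b"important data"

# Simulate a failing disk silently changing a byte after the block was written.
corrupt = (b"important dbta", good[1])
try:
    read_block(corrupt)
    detected = False
except IOError:
    detected = True
print("corruption detected:", detected)
```

Because the check happens on every read, a disk that starts returning bad data is noticed the first time one of its damaged blocks is touched, rather than whenever the drive finally dies.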

  4. Compression is built into the filesystem


    Back in the dark days when I used a PC and DOS, there was a cool program called Stacker (later Microsoft had a very similar feature). It allowed you to reformat a disk and use compression to get more disk space. ZFS brings this back.

    With ZFS, you can turn compression on or off at any moment. While it is on, any data written to disk will be compressed. When off, data is written in raw, uncompressed format. When reading, it will read whatever format was written, compressed or uncompressed.

    You may think that adding compression would slow down the filesystem, but it can actually speed it up. Processor and memory speeds have been growing at a much faster pace than disk speeds, so the little bit of time it takes to compress/uncompress the data is small compared to the time the disk saves by reading or writing fewer bytes.
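A quick way to see why compression can pay for itself is to count how many bytes would actually have to hit the disk. The sketch below uses Python's `zlib` as a stand-in for whatever algorithm the filesystem uses, and the sample text is deliberately repetitive, so the savings are far better than you would see on typical files.

```python
import zlib

# Highly repetitive sample data; real files will compress much less dramatically.
text = b"All work and no play makes Jack a dull boy.\n" * 1000

compressed = zlib.compress(text)

# Reading is transparent: decompressing gives back exactly what was written.
assert zlib.decompress(compressed) == text

print(len(text), "bytes uncompressed ->", len(compressed), "bytes on disk")
```

The fewer bytes the (slow) disk has to move, the more time is left over to spend on the (fast) CPU doing the compressing.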

  5. Entire "Disk" revision history available


    ZFS has a feature called snapshots. This allows you to create a special "directory" (really a file) that is a snapshot of the filesystem at that moment in time. Think of it as a live, whole system Time Machine, without the external disk.

    This snapshot "directory" takes next to no time to create (there's nothing to copy), and takes up very minimal space. Basically, whenever you modify a file, the old one is kept in the snapshot. Any files that have not been changed since the snapshot are shared between the two. It's kind of like a whole filesystem diff.
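Here is a toy model of why a snapshot is cheap to take. It works at the level of whole files, while ZFS does this with blocks on disk; the `ToyFilesystem` class and the file names are invented for the example.

```python
class ToyFilesystem:
    """Toy model: file contents are immutable, so snapshots can share them."""

    def __init__(self):
        self.files = {}       # path -> contents of the live filesystem
        self.snapshots = {}   # snapshot name -> {path -> contents}

    def write(self, path: str, contents: bytes) -> None:
        # Rebinding the path leaves any snapshot's reference to the old contents intact.
        self.files[path] = contents

    def snapshot(self, name: str) -> None:
        # Copies only the table of references, never the file data itself.
        self.snapshots[name] = dict(self.files)

    def rollback(self, name: str) -> None:
        # Restore the live table of references from the snapshot.
        self.files = dict(self.snapshots[name])


fs = ToyFilesystem()
fs.write("main.c", b"version 1")
fs.write("notes.txt", b"unchanged")
fs.snapshot("before-experiment")
fs.write("main.c", b"version 2")

# The unchanged file is the very same object in the live view and the snapshot.
assert fs.files["notes.txt"] is fs.snapshots["before-experiment"]["notes.txt"]

fs.rollback("before-experiment")
assert fs.files["main.c"] == b"version 1"
```

Only the file that changed after the snapshot costs any extra space; everything else is shared.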

    Imagine working on a project, and you get to a crossroads. A decision on direction is needed. You choose what you think is the best direction, but want to be able to get back to the point you're at in your project just in case. The code may not be at a point where you can put it in source code control, so you right-click on the folder and select Create an Archive and make a zip file of the "snapshot". If it is a large project, you go to lunch and come back just as it finishes archiving it. You continue on and then realize that it is not the direction you really wanted to go. You unzip your snapshot zip file (again, going to lunch), and only then realize that you missed some files in another folder.

    With ZFS, just take regular snapshots of the entire filesystem. They are quick, small and capture the state of the entire filesystem.

    NOTE: Snapshots are not a replacement for backups. If your system gets fried, you lose your snapshots and your data. Backups, like with Time Machine, are extremely valuable and everyone should have a backup strategy.



So what is Apple's plan with ZFS? Nobody but Apple (maybe only Steve Jobs) really knows, but here is what we do know. Leopard has had some limited support for ZFS. Apple has stated that Leopard has "read only" support for ZFS (at least the beta they just handed out at WWDC has read-only ZFS). MacRumors has posted that Apple is giving developers a beta of read-write ZFS as a separate download.

What I am looking forward to is when Apple replaces Journaled HFSPlus with ZFS as the default filesystem. That means they still need a non-beta read-write filesystem that you can boot Mac OS X off of (booting off of ZFS is a fairly new feature on any OS).

So I hope the clamor for ZFS grows and Apple listens. I love HFSPlus, but I have a feeling I would love ZFS a whole lot more.


The views expressed on this website/weblog are mine alone and do not necessarily reflect the views of my employer.


Friday, June 15, 2007

The Importance of Documentation

Sorry for the long delay in posting. Between my three-year-old twin boys, three-month-old twin girls and working hard on getting the next version of LabVIEW for the Mac in great shape, I've been a bit busy.

I've noticed recently what a difference it makes having polished VIs. By polished, I mean well documented, with consistent connector panes and a general overall consistency. With all the dialogs you have to visit to get at the various information and set documentation on controls/indicators and VIs, it can be a bit time consuming and difficult to make sure everything is consistent, but it is worth it.

In 8.2.1, you no longer use Command-H to show/hide the contextual help window (Command-H is now the system Hide LabVIEW command); you use the Help key (or, if you have a MacBook, Command-Shift-H). The contextual help window is your guide to understanding code on the block diagram.

So the places to document your VI are:

1. File -> VI Properties -> Documentation

Set the VI description to some helpful text on what the VI does and how to use it. Also create an HTML file for further documentation. You can have just one HTML file and use anchors <a name="test"> for each VI. The Help tag is the name of the anchor. (NOTE: as of 8.2.1, you must manually escape any spaces in the anchors/Help tags, for instance <a name="test me"> would have a help tag of "test%20me").

While you are in VI Properties, visit the following pages also:

- Protection: make sure the password protection/locking is how you want it.
- Execution: make sure Allow debugging and Enable automatic error handling are turned off (if desired).
- Window Appearance: Make sure the window title is what you want it to be
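
The note in step 1 about manually escaping spaces in Help tags is easy to get wrong by hand, so here is a small sketch that does the escaping with Python's standard library. The `help_tag_for` helper is a name I made up; it is not part of LabVIEW, and it only covers standard URL percent-encoding of the anchor name.

```python
from urllib.parse import quote


def help_tag_for(anchor_name: str) -> str:
    """Percent-encode an HTML anchor name for use as a Help tag."""
    return quote(anchor_name)


print(help_tag_for("test me"))  # prints test%20me
```

Anchor names without spaces or other special characters come back unchanged, so it is safe to run every anchor through the same helper.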

2. Right-click (control-click if you don't have a Mighty Mouse) on each control/indicator that is in the connector pane (and possibly others) and select Properties -> Documentation. Make sure the description is meaningful, and add a short description in the tip strip. Also right-click on each control/indicator in the connector pane and make sure that its Required/Recommended/Optional setting is correct. Finally, make sure that all controls and indicators that you meant to have on the connector pane are there.

You probably want to visit similar controls/indicators on each VI, instead of visiting each control/indicator on a single VI. For instance, visit all the "error in (no error)" controls on all your VIs, to make sure they are named the same and have the same (or consistent) descriptions, then go on to "error out" and any other common data types. The idea is to make sure you are consistent.

3. Right-click on the following elements of your Project window and select Properties:

- Project Node (My Project.lvproj)
- Library Nodes (MyLib.lvlib)
- Class Nodes (MyClass.lvclass)
etc.

Go to the Documentation page (in the project case, it's the Project Description) and fill in the information in a similar way.

Doing this has three benefits:

1. The next person who needs to modify the project (or you in 6 months) will have plenty of documentation on how it works and how to use it.

2. If your product is meant to be used by other LabVIEW developers (a library or class meant for others to use), they will have an easier time learning how to use your code correctly.

3. While you document your code, you find things you missed. It forces you to think through how things actually work, and helps you find problems or missing elements.


To simplify the documentation of VIs (not other project items like classes and libraries), I have written a tool that walks through all the VIs in a project and allows you to see in one place all of the VI information and allows you to update it in one place.

As you can see by clicking on the image to the right, you can easily navigate through all VIs in a project, and all terminals in a VI. You can also connect/disconnect terminals, rename terminals, add documentation to terminals and VIs, set the window title, set debugging and auto error handling and see how it all fits together in one screen.

If you are interested in trying out this tool, you can email me at Marc dot Page at NI dot com, and I will reply with a zip file containing this tool.
