scsibug.com

November 13, 2007

Inside Leopard FSEvents

Filed under: OS X, Programming, Technology — Tags: , , , , — greg @ 11:08 pm

I pieced together some of Apple’s sample code for working with Filesystem Events, the API that powers TimeMachine. Might be useful to see what exactly gets modified between backups, or what directories are being modified on a running system. You can drop this source file into an XCode project with CoreFoundation and CoreServices frameworks, or download a universal binary. Should go without saying, but this requires OS X 10.5.

#include <CoreFoundation/CoreFoundation.h>
#include <CoreServices/CoreServices.h>

/*
 *  Watch for filesystem changes using FSEvents API.
 *  11/14/2007, Greg Heartsfield
 *
 *  No copyright claimed by me, this is mostly Apple
 *  example code, with a run loop added.
 */

void printDirectories
(
 ConstFSEventStreamRef streamRef,
 void *clientCallBackInfo,
 size_t numEvents,
 void *eventPaths,
 const FSEventStreamEventFlags eventFlags[],
 const FSEventStreamEventId eventIds[])
{
  int i;
  char **paths = eventPaths;
  for (i=0; i<numEvents; i++) {
    printf("%s\n", paths[i],
           eventFlags[i]);
  }
}

int main (int argc, const char * argv[]) {
  fprintf(stderr, "Watching for filesystem changes to /\n");
  CFStringRef mypath = CFSTR("/");
  CFArrayRef pathsToWatch =
    CFArrayCreate(NULL,
                  (const void **)&mypath,
                  1, NULL);
  void *callbackInfo = NULL;
  FSEventStreamRef stream;
  CFAbsoluteTime latency = 2.0;
  stream =
    FSEventStreamCreate(NULL,
                        printDirectories,
                        callbackInfo,
                        pathsToWatch,
                        kFSEventStreamEventIdSinceNow,
                        latency,
                        kFSEventStreamCreateFlagNone
                        );
  CFRunLoopRef mainLoop = CFRunLoopGetCurrent();
  FSEventStreamScheduleWithRunLoop(stream,
                                   mainLoop,
                                   kCFRunLoopDefaultMode);
  FSEventStreamStart(stream);
  CFRunLoopRun();
  return 0;
}

November 3, 2007

GHC 6.8.1 OS X Builds

Filed under: Haskell, Programming, Technology — Tags: , , , , , — greg @ 9:25 pm

Builds of the newly released GHC 6.8.1 for OS X PPC and Intel.

ghc-6.8.1-powerpc-apple-darwin.tar.bz2 (70 MB)
ghc-6.8.1-i386-apple-darwin.tar.bz2 (61 MB)
Requires readline/GMP, as usual.

To install:

./configure
sudo make install

Note: The PowerPC build does not work on 10.5 Leopard. See this GHC Trac bug.

October 27, 2007

Leopard Breaks /usr/local/lib

Filed under: OS X, Technology — Tags: , , , , , — greg @ 11:17 am

After updating to Leopard last night, I noticed that certain command line apps had broken, with various different errors:

$ ghci-6.6.1
/usr/local/bin/ghci-6.6.1: line 12: /usr/local/lib/ghc-6.6.1/ghc-6.6.1: Too many levels of symbolic links
/usr/local/bin/ghci-6.6.1: line 12: exec: /usr/local/lib/ghc-6.6.1/ghc-6.6.1: cannot execute: Too many levels of symbolic links
$ mutt
dyld: Library not loaded: /usr/local/lib/libintl.3.dylib
  Referenced from: /usr/local/bin/mutt
  Reason: no suitable image found.  Did find:
	/usr/local/lib/libintl.3.dylib: stat() failed with errno=62
	/usr/local/lib/libintl.3.dylib: stat() failed with errno=62
Trace/BPT trap

Traced it back relatively quickly to this travesty:

$ ls -altr /usr/local/lib
lrwxr-xr-x  1 root  wheel  14 Oct 26 20:40 /usr/local/lib -> /usr/local/lib

Yep, /usr/local/lib is a symlink to /usr/local/lib. Something in the install process moved my original /usr/local/lib to “/usr/local/lib 1″, so the fix is simple:

$ sudo mv /usr/local/lib /usr/local/lib_broken
$ sudo mv /usr/local/lib\ 1/ /usr/local/lib

Besides that, Leopard is brilliant :)

Update: Coqide (theorem prover GUI) crashes on startup :(

October 22, 2007

QuickCheck Love

Filed under: Haskell, Programming, Technology — Tags: , , , — greg @ 9:49 pm

If you actually want to get started with QuickCheck, check out the HaskellWiki. The following is just meant to give a simple motivating example for using QuickCheck.

We’ve implemented some simple symmetric block ciphers in Haskell, all of which have encryption/decryption functions with types like the following:

tea_enc :: Int                              -- Cycle count
        -> (Word32, Word32, Word32, Word32) -- 128-bit key
        -> (Word32, Word32)                 -- Input block
        -> (Word32, Word32)                 -- Output block

For these ciphers, we want to verify the very simple property that when using the same key, encrypting and then decrypting a block will yield that same block. That property is represented with the code below (which takes the encryption/decryption ciphers, a key, and an input block as arguments):

test_identity enc dec key input =
    input == dec key (enc key input)

Now, we want to use QuickCheck to randomly generate keys and input blocks, and check the identity property. To do that, we’ll change the function to just take the cipher algorithms as arguments, and use QuickCheck to generate the rest.

test_identity enc dec =
    quickCheck (\key input ->
                input == dec key (enc key input))

Notice how it’s almost identical to our original function that tests a single instance of the property. The only change was moving the arguments that we want to vary into a lambda function, and handing it off to the quickCheck function. QuickCheck figures out how to generate the required arguments, pumps out arbitrary values, and tests them against the given property.

Finally, we can run some tests on our cipher (using 32 cycles):

TEA> test_identity (xtea_encode 32) (xtea_decode 32)
OK, passed 100 tests.

Let’s deliberately cause the property to be invalidated by changing just the encoding function and re-running the test:

TEA> test_identity (xtea_encode 32) (xtea_decode 32)
Falsifiable, after 1 tests:
(3,4294967295)
(2,1,0,4294967295)

Which tells us after the very first test, it was able to find an input block and key that make the property false. Notice the numbers QuickCheck selected for its first test, they do not appear random, and in fact, QuickCheck doesn’t necessarily generate random values for types. In this case, it is starting with what look like typical corner cases (very large numbers, zero, small numbers).

Note: QuickCheck isn’t natively able to generate arbitrary values for the type Word32, so we had to tell it how to do so, with the following code, which maps arbitrary integers onto Word32 values (courtesy of a haskell-cafe post by Sebastian Sylvan):

instance Arbitrary Word32 where
    arbitrary = do c <- arbitrary :: Gen Integer
		   return (fromIntegral c)

By making any user-defined type an instance of Arbitrary, you can have QuickCheck automatically generate tests for properties that take that type as an argument.

xUnit tests definitely have their place. Some ciphers have sets of test vectors for instance, which exploit very specific corner cases in the algorithm. But for testing general properties of code, the quickcheck style can be very intuitive and… quick.

September 30, 2007

Pre-signing URIs for expiration with S3

Filed under: Haskell, Programming, Technology, Uncategorized, planethaskell — greg @ 10:34 pm

This is a neat feature of S3 that I wasn’t aware of until after digging through the API docs. You can generate URIs to S3 resources, which expire after a given time. It’s in the latest release of hS3, which includes an example for generating a URI valid for a certain number of seconds in the future.

Instead of making an object public via an ACL, you create a signature of the resource and an expiration date. This signature is added as a query element to the URI, and then given out to users. The end result looks like the following:

http://s3.amazonaws.com:80/hS3/LICENSE
?AWSAccessKeyId=09MD8BAR1GEXCERHT1R2
&Expires=1191208823
&Signature=bY6Luynk8mxbzaO8yv2Pcd3kd1U%3d

Which is generated by code like:

do uri <- publicUriForSeconds connection object exp_seconds
   putStrLn uri

September 23, 2007

Haskell S3 Library

Filed under: Haskell, Programming, Technology, planethaskell — greg @ 10:28 pm

Amazon says:
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

After hacking away for most of my evenings this week, and the weekend, I’ve got something that I don’t feel terribly bad about releasing. Still needs a good deal of work, but the fundamental operations work. I’m sure there will be some API changes in the future to accomodate some of the missing features.

What works:

Issues:

  • It’s slooow. The HTTP library doesn’t use ByteStrings, so sending multi-megabyte files is painful.
  • Requires the absolute latest Crypto and HTTP libraries, since I submitted patches for HMAC-SHA1, and HTTP DELETE during the course of writing the library.
  • To use the more advanced features, such as ACLs, metadata, hierarchical listings, you’ll probably have to spend some time with the Amazon docs.

Coming Soon:

  • Better documentation, examples, and test cases.
  • Other Amazon Web Services, in the order that they interest me (SQS, EC2, FPS…)

Here’s how you get started. The latest version is available via darcs:

darcs get http://scsibug.com/hS3

Or, get the official release from Hackage

Documentation here, the AWSConnection, S3Bucket, and S3Object modules are where you’ll want to begin.

September 16, 2007

Haskell HMAC

Filed under: Haskell, Programming, Technology, planethaskell — greg @ 12:01 am

A necessary part of the Amazon S3 library I’m planning on putting some effort into is HMAC-SHA1, for authentication. It is missing from any public Haskell libraries I could find, so I put one together during my weekend in Austin. Hopefully it will make a nice addition (after some cleanup) to the excellent Crypto library, on which it depends for message digest functions.

-----------------------------------------------------------------------------
-- |
-- Module      :  Data.HMAC
-- Copyright   :  (c) Greg Heartsfield 2007
-- License     :  BSD-style (see the file ReadMe.tex)
--
-- Implements HMAC (hashed message authentication code) as defined in FIPS 198
-- .
–
—————————————————————————–

module Data.HMAC(
   — * Function Types
   hmac, hmac_sha1, hmac_md5,    sha1_hm,key_from_user,
   — * Data Types
   HashMethod(HashMethod, digest, input_blocksize),
   ) where

import Data.Digest.SHA1 as SHA1
import Data.Digest.MD5 as MD5
import Data.Word(Word32)
import Data.Bits (shiftR, xor, bitSize, Bits)
import Codec.Utils (Octet)
import Debug.Trace

– | HMAC works over any hash function, which is represented by
–   HashMethod.  A hash function, and input block size must
–   be specified.

data HashMethod =
    HashMethod { — | An arbitrary hash function
                 digest :: [Octet] -> [Octet],
                — | Bit size of an input block to the hash function
                 input_blocksize :: Int}

–Some useful digest functions for use with HMAC.

sha1_hm = HashMethod (w160_to_w8s . SHA1.hash) 512
md5_hm = HashMethod MD5.hash 512

– | Compute an HMAC using SHA-1 as the underlying hash function.

hmac_sha1 :: [Octet] — ^ Secret key
          -> [Octet] — ^ Message text
          -> [Octet] — ^ Resulting HMAC-SHA1 value
hmac_sha1 = hmac sha1_hm

– | Compute an HMAC using MD5 as the underlying hash function.

hmac_md5 :: [Octet] — ^ Secret key
         -> [Octet] — ^ Message text
         -> [Octet] — ^ Resulting HMAC-SHA1 value
hmac_md5 = hmac md5_hm

w160_to_w8s :: Word160 -> [Octet]
w160_to_w8s w = concat $ map w32_to_w8s (w160_to_w32s w)

w160_to_w32s :: Word160 -> [Word32]
w160_to_w32s (Word160 a b c d e) = a : b : c : d : e : []

w32_to_w8s :: Word32 -> [Octet]
w32_to_w8s a = (fromIntegral (shiftR a 24)) :
               (fromIntegral (shiftR a 16)) :
               (fromIntegral (shiftR a 8)) :
               (fromIntegral a) : []

– | Generalized function for creating HMACs on a specified
–   hash function.

hmac :: HashMethod — ^ Hash function and associated block size
        -> [Octet] — ^ Secret key
        -> [Octet] — ^ Message text
        -> [Octet] — ^ Resulting HMAC value
hmac h uk m = hash (opad ++ (hash (ipad ++ m)))
    where hash = digest h
          (opad, ipad) = process_pads key
                           (make_start_pad bs opad_pattern)
                           (make_start_pad bs ipad_pattern)
          bs = input_blocksize h
          key = key_from_user h uk

– Create a key of the proper size from the user-supplied key.
– Keys greater than blocksize get hashed and returned.
– Keys same as blocksize are used as is.
– Keys shorter than blocksize are padding with zeros.

key_from_user :: HashMethod -> [Octet] -> [Octet]
key_from_user h uk =
    case (compare (bitcount uk) (input_blocksize h)) of
      GT -> fill_key ((digest h) uk)
      LT -> fill_key uk
      EQ -> uk
    where fill_key kd =
              kd ++ (take (((input_blocksize h) - (bitcount kd)) `div` 8)
                     (repeat 0×0))

– Create the inner/outer pad values by XOR’ing with the key.

process_pads :: [Octet] — Key
             -> [Octet] — opad
             -> [Octet] — ipad
             -> ([Octet], [Octet]) — new opad, new ipad
process_pads ks os is =
    unzip $ zipWith3 (\k o i -> (k `xor` o, k `xor` i)) ks os is

– Create padding values for a hash of a given bit size.

make_start_pad :: Int -> Octet -> [Octet]
make_start_pad size pad = take (size `div` (bitSize pad)) $ repeat pad

– Padding constants, per the spec.

opad_pattern = 0×5c :: Octet
ipad_pattern = 0×36 :: Octet

– Bit count of byte array.

bitcount :: [Octet] -> Int
bitcount k = (length k) * (bitSize (head k))

edited to fix bugs and incorporate suggestions from comments
And… test cases
This is now incorporated into the latest version of the Haskell crypto library (later than 4.0.3)

September 7, 2007

Leaf Counting

Filed under: Haskell, Programming, Technology — greg @ 10:48 pm

What follows is a quick implementation for a “how would you do that
in Haskell” question. The underlying question for me was “Is the naive
implementation good enough” (I think the answer is yes). This post is
literate haskell, so you can load it as is into a haskell compiler.

Given a tree, we define the quantity to be the count of leaf nodes
under that tree. For some arbitrary tree, we want to know how many
possible subtrees have a quantity of N or more.

First, define our datatype. A tree is either a list of children
trees, or a terminal leaf. We make this an instance of the Show
typeclass so it can be printed.
 

> data PTree = PTree [PTree] | Leaf deriving (Show)

 
This is the problem we want to solve, how many possible subtrees exist
with a quantity at or above the given number. We’ll define the
quantity and all_trees functions next.
 

> min_quantity :: Int -> PTree -> Int
> min_quantity n t = length $ filter (\x -> (quantity x) >= n) (all_trees t)

 
Find the number of leaves on a tree. If a tree is a leaf, define the
quantity to be one. If it is a tree, find the sum of all the
quantities of its children.
 

> quantity :: PTree -> Int
> quantity (PTree x) = foldr ((+) . quantity) 0 x
> quantity Leaf = 1

 
Find all possible subtrees.
 

> all_trees :: PTree -> [PTree]
> all_trees t@(PTree x) = t : (concat $ map all_trees x)
> all_trees Leaf = [Leaf]

 
Now, some quick functions to give us a nice big tree, for testing.

Take a tree, and add a given number of levels of depth. The leaf
count is going to be 2^(n-1), so you won’t want to run this with more
than a depth of about 25 (16M leaves).
 

> tree_of_depth :: Int -> PTree
> tree_of_depth x = tree_of_depth_acc x Leaf

> tree_of_depth_acc :: Int -> PTree -> PTree
> tree_of_depth_acc 0 pt = pt
> tree_of_depth_acc x pt = tree_of_depth_acc (x-1) (add_depth pt)

> add_depth :: PTree -> PTree
> add_depth (PTree x) = PTree [(PTree x), (PTree x)]
> add_depth Leaf = PTree [Leaf]

 
So, now that all works, and we can answer our question with some test
data. The following, when run, will evaluate how many possible trees
and subtrees exist in a tree with depth 20, that have a quantity of
50 or greater:
 

> main = (putStrLn . show) $ min_quantity 50 (tree_of_depth 20)

 
which should return “16383″

Running on my macbook pro, compiled, this takes under 1.5 seconds real time.
Searching for trees of quantity 50 or more, with a depth of 25 (16M+ leaves)
finishes in under a minute.

Compiling with profiling enabled, running with heap profiling, and examining
the memory usage reveals that this is fairly well-behaved.

Of course, we have the benefit in this case of being able to lazily evaluate the
tree, but at least our main algorithms aren’t showing any space leaks.

The profiling chart linked above was created by compiling and running the
program as follows:
 

ghc -prof -auto-all post.lhs -o tsp
./tsp +RTS -hc
hp2ps -c tsp.hp
convert -rotate 270 -scale 800 tsp.ps tsp.png

August 2, 2007

Show and Destroy Trailing Whitespace

Filed under: Haskell, Programming — greg @ 6:36 pm

Unnecessary whitespace at the end of a line is my pet peeve, especially since darcs makes it so obvious at record time. This is a pretty straightforward way to highlight these characters in any emacs buffer containing haskell code (use a hook like first-change-hook if you want it to show up everywhere). Add the following to .emacs and enjoy:

(add-hook 'haskell-mode-hook
          '(lambda ()
             (setq show-trailing-whitespace t)))

And don’t forget M-x delete-trailing-whitespace for automatic cleanup.

July 5, 2007

uncurry

Filed under: Haskell, Programming, Technology — greg @ 2:29 pm

My last post had me thinking of some of the ways the function uncurry is useful in Haskell. Let’s first look at the type:

uncurry :: (a -> b -> c) -> (a, b) -> c

Usually the results are obvious, transforming a given function by simply taking a couple of its arguments and replacing them with a single tuple argument. But sometimes we can get something a bit more interesting:

  • uncurry (flip (,)) :: (a, b) -> (b, a) — swap tuple elements around.
    • uncurry (flip (,)) (”hello”, “world”) ==> (”world”, “hello”)
  • uncurry ($) :: (a -> b, a) -> b — function application within a tuple; second element applied to the first.
    • uncurry ($) ((+3), 4) ==> 7
  • uncurry const :: (a, b) -> a — oops! we’ve un-optimized ourselves back to the equivalent of fst.
  • uncurry (flip const) :: (a, b) -> b — and we can duplicate snd by adding a flip.
  • map . uncurry :: (a -> b -> c) -> [(a, b)] -> [c] — apply function over a list, getting arguments from within tuples in that list.
    • (map . uncurry) (+) [(1,1), (2,3), (5,10)] ==> [2,5,15]

So basically, uncurry is a heavy-duty higher-order tool for working with tuples.

Newer Posts »

Powered by WordPress