Saturday, October 24, 2009

Computer Forensics Paper

Forensic Focus, the leading computer forensics community site, has posted one of my papers, titled "Simple Steganography on NTFS when using the NSRL." It's a relatively simple idea, but an important one for computer forensics investigators who use the NSRL (National Software Reference Library from NIST). For those of you who aren't familiar with it, the NSRL contains the hashes of millions of files from operating systems and applications. Investigators use those hashes to identify known files and filter them out, so they only have to examine what's left. This is standard practice in computer forensics, and the paper describes some steganography you have to look out for when you rely on it.
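
To make the filtering step concrete, here's a minimal Python sketch of hash-based known-file filtering. This is just the idea, not code from the paper: the evidence path and the hash set are placeholders, and a real tool would load the full NSRL Reference Data Set rather than a hard-coded set.

```python
import hashlib
import os

def sha1_of_file(path, chunk_size=65536):
    """Compute the SHA-1 digest of a file, reading in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

def unknown_files(root, known_hashes):
    """Yield paths under `root` whose hashes are NOT in the known set.
    These are the files an investigator would still need to examine."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if sha1_of_file(path) not in known_hashes:
                yield path

# In practice `known_hashes` would be loaded from the NSRL data set;
# this single entry (the SHA-1 of an empty file) is only a placeholder.
known_hashes = {"DA39A3EE5E6B4B0D3255BFEF95601890AFD80709"}

for suspect in unknown_files("/evidence/mounted_image", known_hashes):
    print(suspect)
```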

Tuesday, October 13, 2009

Using Logic Puzzles in Interviews

I have never been a fan of using logic puzzles in the hiring process. The practice apparently originated at Microsoft. The rationale is that technology is always changing, so you shouldn't test candidates on any current technology, but rather on their general logic abilities, which tell you whether they will still be useful to the company when it has to adopt new technologies in the future. This line of reasoning is compelling, and a lot of tech companies seem to have accepted it.

Personally, I don't buy the argument. For one thing, I don't see why a puzzle necessarily tests someone's logic abilities better than a programming test, which gives the candidate a formal way to reason - the programming language. It also does not follow that someone who solves logic puzzles well will be any good at programming, which operates under all kinds of constraints and doesn't rely on clever tricks. I think it's far better to give someone a real programming problem from work and see how they go about solving it. Because it's a real problem, you are already familiar with the details and some of the approaches to solving it, and from that familiarity you can tell more about someone's ability to reason and think logically than from a puzzle with no context.

As for whether a potential programmer will be able to move on to other technologies or languages, it makes more sense to me to see how well they understand the concepts behind what they are doing, and to get a sense of what kind of person they are. When it comes to learning a new technology, what you need more than general logical ability is curiosity and motivation.

Sunday, October 11, 2009

Distributed Computing Models

There are a number of general models for distributed computing. The terms are often used interchangeably, and plenty of systems fall in between these models or combine them. Nevertheless, I think it is useful to draw distinctions and define the models this way:

Client-Server, 3-tier, N-tier - The processing is distributed through the use of layers. There are layers for UI, for business logic, for data storage, etc. These are generally data-driven applications.

Clustered - A set of machines acts as one. There are usually shared data stores, and the fact that there are multiple machines is transparent to users. This is used for things like load balancing and fault tolerance.

Peer-to-Peer - These systems are decentralized and used for applications like file sharing or instant messaging. In practice these systems need some centralization at least for user management.

Grid - These are systems where the processing is split up so that many machines can work in parallel (there's a small sketch of the idea after this list). These are becoming the most popular because they are necessary for big data systems.

So when you think of distributed systems, there really seem to be 4 concepts: layers, unified, decentralized, and parallel.
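
To illustrate the parallel model, here's a minimal Python sketch of the split-and-combine pattern. Local worker processes stand in for separate machines, and the input is a toy example:

```python
from multiprocessing import Pool

def count_words(chunk_of_lines):
    """The 'split' part: each worker counts words in its own slice."""
    counts = {}
    for line in chunk_of_lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge(partials):
    """The 'combine' part: fold the per-worker results together."""
    total = {}
    for counts in partials:
        for word, n in counts.items():
            total[word] = total.get(word, 0) + n
    return total

if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the fox again"]
    n_workers = 2
    # Deal the input out into one chunk per worker.
    chunks = [lines[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(count_words, chunks)
    print(merge(partials))
```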

Let me know if you think I'm missing something.

Saturday, October 3, 2009

Hadoop World 2009

I went to Hadoop World: NYC 2009 on Friday, October 2. It was organized by Cloudera, the company that provides professional support and training for Hadoop. (Amr Awadallah, their CTO, sent me a discount code - Thanks Amr!)

The first time that I really took notice of Hadoop was early last year. It's amazing to see how much ground it's covered since then. At the conference there was a whole track devoted to applications. There was the usual crop of niche companies using it, but also presentations by VISA, JP Morgan Chase, eBay, and other big names. A lot of people are using it in conjunction with Lucene.

What's becoming clear to me is that Hadoop is turning into THE platform for data analysis and processing. There are other systems out there for handling large data sets - most of them based in some way on a relational database, incorporating MapReduce and a distributed architecture - but none of them seem to have the flexibility of Hadoop. There is a range of useful applications, for example, that can be built on just HDFS (the Hadoop Distributed File System).
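
To give a flavor of that flexibility, here's what word count looks like with Hadoop Streaming, which lets you write the mapper and reducer as plain scripts that read stdin and write stdout. This is a minimal sketch; the file names are my own, and the exact streaming jar path depends on your Hadoop version.

```python
#!/usr/bin/env python
# mapper.py: reads raw text lines on stdin, emits "word<TAB>1" per word.
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)
```

```python
#!/usr/bin/env python
# reducer.py: Hadoop sorts mapper output by key before the reduce step,
# so equal words arrive on consecutive lines and can be summed per run.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

You'd run this with something like hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py, and Hadoop takes care of splitting the input across machines and sorting between the two phases.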