Thursday, December 17, 2009
The Daily Scrum or Morning Meeting
During the meeting, each member of the team goes over what they have accomplished since the day before and what they are working on. According to strict principles, the meeting is not supposed to become a status meeting run by the project manager (technically the role is supposed to be called a scrum master, but I don't like that name). Personally, I like having the project manager run the meeting; I prefer the direction that it gives. And you still end up with the peer pressure, because everyone is listening.
Sunday, December 13, 2009
Using Humans in Your System Design
Saturday, October 24, 2009
Computer Forensics Paper
Tuesday, October 13, 2009
Using Logic Puzzles in Interviews
Personally, I don't buy the argument. For one thing, I don't see why the puzzle necessarily tests someone's logic abilities better than a programming test, which gives someone a formal way to reason: the programming language. It also does not necessarily follow that someone who can solve logic puzzles well is going to be any good at programming, which has all kinds of constraints and doesn't rely on clever tricks. I think it's far better to give someone a real programming problem that you have at work and see how they go about solving it. As it's a real work problem, you should be familiar with the details and some of the approaches to solving it. From this familiarity with the problem, you should be able to tell more about someone's ability to reason and think logically than from a puzzle which has no context.
As for whether a potential programmer will be able to move on to other technologies or languages, it makes more sense to me to see how they understand the concepts behind what they are doing and also to get a sense of what kind of person they are. When it comes to learning a new technology, what you need more than some general logical abilities is good curiosity and motivation.
Sunday, October 11, 2009
Distributed Computing Models
Client-Server, 3-tier, N-tier - The processing is distributed through the use of layers. There are layers for UI, for business logic, for data storage, etc. These are generally data-driven applications.
Clustered - A set of machines act as one. There are usually shared data stores and the multiple machines are effectively transparent. This is used for things like load-balancing or fault tolerance.
Peer-to-Peer - These systems are decentralized and used for applications like file sharing or instant messaging. In practice these systems need some centralization at least for user management.
Grid - These are systems where the processing is split up so that many machines can work in parallel. These are becoming the most popular because they are necessary for big data systems.
So when you think of distributed systems, there really seem to be 4 concepts: layers, unified, decentralized, and parallel.
Let me know if you think I'm missing something.
Saturday, October 3, 2009
Hadoop World 2009
The first time that I really took notice of Hadoop was early last year. It's amazing to see how much ground it's covered since then. At the conference there was a whole track devoted to applications. There was your usual bunch of niche companies using it, but also presentations by VISA, JP Morgan Chase, eBay, and other big names. A lot of people are using it in conjunction with Lucene.
What's becoming clear to me is that Hadoop is becoming THE platform for data analysis and processing. There are other systems out there to handle large data sets. Most of them are based in some way on a relational database and incorporate MapReduce and a distributed architecture, but none of them seem to have the flexibility of Hadoop. There is a range of useful applications, for example, that can be built using just HDFS (the Hadoop Distributed File System).
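For anyone who hasn't run into MapReduce yet, here is a minimal single-machine sketch of the idea in Python. It is not Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = []
    for doc in documents:
        # Each document could be mapped on a different machine.
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    # Each key could be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

The point of the split is that neither map_phase nor reduce_phase needs to know about any other document or key, which is what lets the work be spread out.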
Monday, August 24, 2009
Finance company uses daily scrum
Tuesday, July 7, 2009
Speaking engagement at ITARC New York
Monday, June 15, 2009
Tech Hiring Process
Here is a basic structure that you can use to make your own process:
1. Initial screen email
2. Phone screen
3. Written test
4. Interviews
The initial screen email is just to cover real basic issues like whether the person can legally work in the US, whether they are really looking for full-time work, whether they need to relocate, and other such issues. You'd be surprised how many people send out their resumes without really reading certain details about the position. And be sure to ask about salary expectations or last salary earned, if you don't put a range for the position on the initial ad. I think you need to know where someone is with salary before even talking to them because you really do not want to be surprised later on. It can also be a bad sign if someone does not want to answer a question about salary or just says that it is negotiable. In this situation, you are more than likely dealing with someone who is not that serious. Good people know what they want and aren't bashful about saying where they are with pay.
The phone screen should be short, about 20 minutes. There should be 4-5 technical questions that cover basic knowledge required for the position. You should ask the same questions of everyone so you can hear the differences. You should also ask a question about what they're working on to get a sense of how they communicate. This should weed out a lot of candidates.
The written test should make the candidate do something that they would be faced with on the job. There are some differences in opinion here, but I believe that trying to simulate real work problems that come up in the position is the way to go, even to the extent of giving them a recent problem or issue faced by your team. When the candidate finishes the test, you can go over it with them and get an understanding of how they tackle problems. I think this gives you the clearest picture of what the person would be like if they came to work because, really, you gave them a little work to do.
The final interviews are with team members and I think it's a good idea to prep them with questions. They can ask what they want to, but make sure they have access to questions so they don't have to worry about it. Personally, I don't like the brain teasers.
Thursday, April 30, 2009
Non-Relational Databases in the Enterprise
It's hard to tell at this point which ones will still be actively developed and used a few years from now. I would assume that the Apache projects have as good a chance as any of them.
I'm interested generally in how these systems can be used inside the enterprise or for non-web applications. Now these systems are built for semi-structured data (key-value) and there is plenty of this kind of data in enterprise systems. Often this data seems somehow extra or may have a variable nature. A good example of this is the properties of a file (author, subject, date created, etc.). This kind of data can be found in lots of existing relational databases in tables that have a foreign key and, not surprisingly, columns usually called “key” and “value.” I've seen these kinds of tables in lots of systems. The important thing to realize is that the data does not need to be used in a query – it does not need to appear in a SQL where-clause. So really there is no need to keep it in the relational database, except for the fact that you want to persist the data in a secure way.
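To make the pattern concrete, here is a small sketch of that kind of table using Python's built-in sqlite3 module. The table and column names (files, file_properties, file_id) are hypothetical, chosen only to match the file-properties example above.

```python
import sqlite3


def create_schema(conn):
    """Create a parent table and the typical key/value side table."""
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE file_properties (
               file_id INTEGER NOT NULL REFERENCES files(id),
               "key"   TEXT NOT NULL,
               "value" TEXT,
               PRIMARY KEY (file_id, "key")
           )"""
    )


def set_property(conn, file_id, key, value):
    """Insert a property, or overwrite it if the key already exists."""
    conn.execute(
        'INSERT OR REPLACE INTO file_properties (file_id, "key", "value") '
        "VALUES (?, ?, ?)",
        (file_id, key, value),
    )


def get_properties(conn, file_id):
    """Fetch every property for one file, looked up only by the foreign key."""
    rows = conn.execute(
        'SELECT "key", "value" FROM file_properties WHERE file_id = ?',
        (file_id,),
    )
    return dict(rows.fetchall())
```

Notice that the only lookup is by file_id. Nothing ever filters on the property values themselves, which is why the relational engine is not buying you much here.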
Another option for this data, of course, has been to use XML files. In this kind of solution you would probably have to rely on organizing the information using certain directory and file names. The file would most likely be named with the foreign key. Then you would have to write the code to manage those files, which at the least means a component to read / write the XML files.
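A read / write component along those lines might look something like the sketch below, using Python's standard xml.etree.ElementTree. The file layout (one file per foreign key, a properties root with property children) and the function names are assumptions for illustration, not a description of any particular system.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def properties_path(base_dir, file_id):
    """Name the XML file after the foreign key, e.g. <base_dir>/42.xml."""
    return os.path.join(base_dir, f"{file_id}.xml")


def write_properties(base_dir, file_id, properties):
    """Write a dict of string properties as <property name="...">value</property>."""
    root = ET.Element("properties")
    for key, value in properties.items():
        prop = ET.SubElement(root, "property", name=key)
        prop.text = value
    ET.ElementTree(root).write(
        properties_path(base_dir, file_id), encoding="utf-8", xml_declaration=True
    )


def read_properties(base_dir, file_id):
    """Read the properties back into a dict."""
    root = ET.parse(properties_path(base_dir, file_id)).getroot()
    return {prop.get("name"): prop.text for prop in root.findall("property")}


def demo():
    """Round-trip one set of properties through a temporary directory."""
    with tempfile.TemporaryDirectory() as base_dir:
        write_properties(base_dir, 42, {"author": "Someone", "subject": "Budget"})
        return read_properties(base_dir, 42)
```

Even this small version leaves out locking, error handling, and back-ups, which is where the real cost of the approach shows up.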
But the cost of keeping this data either in a relational database or in XML files ends up being high because you have to consider availability and integrity. For both of these solutions, that usually ends up meaning a cluster set-up at the “front” with a RAID array for storage and a somewhat complicated back-up process.
Cost seems to make the non-relational database systems particularly attractive for the enterprise. The non-relational databases have been specifically developed with the idea that you can use cheap hardware to scale them out. They are distributed systems and rely on different replication schemes to keep copies of the data on a certain minimum number of machines at all times to ensure that the data is always available. People generally seem to feel comfortable with the same data existing on at least 3 machines. These machines theoretically do not need to be much more powerful than a regular desktop machine. Start adding a few more machines and your capacity and savings should really start to add up.
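As a rough illustration of keeping each piece of data on 3 machines, here is one very simple placement scheme sketched in Python: hash the key to pick a starting machine, then take the next machines in order. Real systems each use their own replication schemes; pick_replicas and the machine names are hypothetical.

```python
import hashlib


def pick_replicas(key, machines, copies=3):
    """Choose `copies` distinct machines to hold the data stored under `key`."""
    if copies > len(machines):
        raise ValueError("not enough machines for the requested number of copies")
    ordered = sorted(machines)
    # Hash the key so that placement is stable across calls and processes.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(ordered)
    # Walk forward from the starting machine, wrapping around at the end.
    return [ordered[(start + i) % len(ordered)] for i in range(copies)]
```

With five machines, pick_replicas("file-42", ["node-a", "node-b", "node-c", "node-d", "node-e"]) returns three distinct node names, and it returns the same three every time for that key.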
Of course there would be training and switch-over costs, but your programmers will be happy to work on the new technology. For a large company that has many internal, proprietary systems, there is probably a lot of money to save by creating one of these clusters and consolidating all that semi-structured data into it. Save the expensive storage for the highly structured, transactional data.
Ordering for Trees in SQL
Depth-wise Ordering for Trees in SQL