For every question there is an answer that is obvious, simple…

… and wrong. Never have I seen such a cluster of these answers as around the questions “How long will <X> take?” and “Are we there yet?”. Today I’m going to describe a bit about how to answer them, and be a lot less wrong in the process.

How long will <X> take?
This is the classic estimation problem. A wiser man than myself said “Normally longer than you think”, but while accurate, it’s neither very fulfilling nor informative. My experience is that most people are pretty damn good at estimating how long “their bit” takes, and what’s more – how long “their bit” takes when they actually get to work on it. However, nearly everybody is terrible at understanding the effects of external influences on the time. This is the most significant source of error in estimation. Not only that – this is why great caution needs to be applied to “well, it took Z days last time” when you don’t understand the influences on that outcome.
At this point we get to segue into a real world example of exactly this. I ride to work every day. I’ve ridden to work every day for over 18 months: ~6km each way for 12 months to ANZ (~400 samples), and ~10km each way for 6 months to REA (~200 samples). I travel on the same route every day and at approximately the same time every day. I have a Garmin 500 GPS unit that tracks all my travels – so I have a long historical record of doing exactly the same thing every day. With all this wonderful data, you would think I’d be able to accurately predict how long it takes me to get to and/or from work. Here’s the news: for what is on average a 30 minute journey, I cannot predict my journey time to within 10%. How the fuck is that possible? My fastest time home is 25 minutes, and my slowest is nearly 35 minutes.
So, you’re an astute reader (well, you’re reading my blog – so you must be), and you’re scratching your head trying to work out how I’m getting nearly a 30% variation in the time. Time of day? (no) Weather? (no) Fitness? (no) Bike chosen to ride on? (no) Traffic? (no)
Here’s the crucial piece of information that my awesome Garmin unit has. It has my average speed, and my average _moving_ speed. Turns out that my average moving speed is very stable (pretty low variation) – a fairly comfortable 26km/hr. So, if my moving speed is constant at 26km/hr – how on earth is there a 30% variation?
Externalities. In this case – traffic lights. There are 30 traffic and pedestrian lights on my trip into work. I’ve not done the analysis on all of them – but I know that 2 of the traffic lights have a cycle of 2 minutes. So – from a best case of 0, to a worst case of 4 minutes – that’s more than a 10% variation on a 30 minute trip just from those two. Wow. On the upside, I can say that my _expected_ time is 30 minutes, but it could be anywhere from 25 to 35 minutes.
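To make the traffic light effect concrete, here is a rough simulation sketch. The distance, moving speed and number of lights come from the figures above. Everything about how the lights behave is an assumption made up for illustration (I haven't measured them): each light is red half the time, and a red costs a uniformly random wait of up to one minute.

```python
import random

DISTANCE_KM = 10.0        # from the post: ~10km each way
MOVING_SPEED_KMH = 26.0   # from the post: stable average moving speed
NUM_LIGHTS = 30           # from the post: lights on the route

# Assumptions for illustration only (not measured):
P_RED = 0.5               # chance that any given light is red on arrival
MAX_WAIT_MIN = 1.0        # longest wait at a single red light, in minutes


def moving_minutes():
    """Time spent actually riding, which barely changes from day to day."""
    return DISTANCE_KM / MOVING_SPEED_KMH * 60.0


def simulate_trip(rng):
    """One commute: constant moving time plus a random wait at each red light."""
    waiting = sum(
        rng.uniform(0.0, MAX_WAIT_MIN)
        for _ in range(NUM_LIGHTS)
        if rng.random() < P_RED
    )
    return moving_minutes() + waiting


def summarise(trips):
    """Return (fastest, average, slowest) trip time in minutes."""
    return min(trips), sum(trips) / len(trips), max(trips)


rng = random.Random(42)   # fixed seed so the run is repeatable
trips = [simulate_trip(rng) for _ in range(600)]
lo, mean, hi = summarise(trips)
print(f"moving only: {moving_minutes():.1f} min")
print(f"fastest {lo:.1f}, average {mean:.1f}, slowest {hi:.1f} min")
```

The moving time is a single fixed number. All of the spread in the simulated trips comes from the lights, which is exactly the part of the trip I have no control over.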
So, here’s a tip. When estimating – even for things you do all the time – look at the external influences on the task at hand. Count them – that should give you a good idea of the level of variation that may occur. The more external influences you have no control over, the lower the confidence that should be placed in the estimate, and the greater the need to have a conversation about “minimum, expected and maximum”.
This is also a very good reason to use synthetic values for estimation (function points, story points) and, instead of predicting, to use tracking to determine task and project length.
Are we there yet?
Not only is this the dreaded question for parents of children, it’s also a bleeding sore for most software developers. Provided you’ve already moved away from aggregating single estimates in hours or days and have decided that a synthetic proxy is the way to go (great first step), you still need some coherent way of determining when you’re likely to be finished.
We’ve all read that “past performance does not guarantee future performance”, yet relying on past performance is exactly what we’re doing when we take a time-slice of the project and extrapolate from the work already completed to determine the end-point. The good (and bad) news is that while we should keep the quote in the back of our heads, there’s not a better way to determine the end-point.
However, the big kicker is this: the value of “past performance” depends on the volume of activity performed, and on the amount of variance external influences have caused in that activity _relative_ to the possible impacts remaining. The first ~200m of my journey has no traffic lights, so it should come as no surprise that there’s low variation, but it should also come as no surprise that the predictive value of that first 200m of the journey is low. Yet, I see people doing this every day in projects. “At the end of the first iteration we did 40 units of work, excellent – we’re going to finish in <X>” and then getting frustrated, angry or disappointed that in the next iteration only 20 units of work were completed.
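Here is a small sketch of why one iteration of tracking tells you so little. The 40 and 20 units are the figures from the example above. The 200 unit project size and the `project_iterations` helper are hypothetical, invented purely for illustration.

```python
import math


def project_iterations(velocities, remaining_units):
    """Project iterations left using the fastest, average and slowest
    velocity seen so far (best / expected / worst)."""
    if not velocities or min(velocities) <= 0:
        raise ValueError("need at least one positive velocity sample")
    mean = sum(velocities) / len(velocities)
    return {
        "best": math.ceil(remaining_units / max(velocities)),
        "expected": math.ceil(remaining_units / mean),
        "worst": math.ceil(remaining_units / min(velocities)),
    }


TOTAL_UNITS = 200  # hypothetical project size

# After one iteration there is a single sample, so the "range" collapses
# to one number and looks far more certain than it is.
after_one = project_iterations([40], TOTAL_UNITS - 40)

# After the second iteration the spread becomes visible.
after_two = project_iterations([40, 20], TOTAL_UNITS - 40 - 20)

print(after_one)  # {'best': 4, 'expected': 4, 'worst': 4}
print(after_two)  # {'best': 4, 'expected': 5, 'worst': 7}
```

With one sample, best, expected and worst are all the same number, which is the trap. The range only starts to mean something once the tracked iterations have been exposed to the same kinds of external influences as the work still to come.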
At what point can we have a discussion about the end-point? You probably could at the very start, but it’s hardly a valuable discussion. The very end is probably too late – so it’s somewhere in between. Sure, but where? This is the hard part, and it’s a function of the number of external influences remaining. As we’ve seen from my cycling story above, when there’s a large number of external influences (approximately 1 every minute) – we’re looking at a 30% variation, regardless of where we choose to make a projection.  Clearly as we get closer to the end, there’s less total impact, but the ratio of impact remains (mostly) constant.
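The “less total impact, same ratio” point can be sketched with some back-of-envelope arithmetic. Distance, speed and light count are from my commute. The worst-case wait per light and the idea that the lights are evenly spaced along the route are assumptions for illustration only (the real route, as noted above, starts with a light-free stretch).

```python
DISTANCE_KM = 10.0        # from the post
MOVING_SPEED_KMH = 26.0   # from the post
NUM_LIGHTS = 30           # from the post
WORST_WAIT_PER_LIGHT_MIN = 0.4  # assumption: worst-case wait at one light


def remaining_range(km_left):
    """Best and worst case minutes still to go with km_left remaining."""
    best = km_left / MOVING_SPEED_KMH * 60.0
    # Assumption: lights are evenly spaced, so lights left scale with distance.
    lights_left = NUM_LIGHTS * km_left / DISTANCE_KM
    worst = best + lights_left * WORST_WAIT_PER_LIGHT_MIN
    return best, worst


def relative_spread(km_left):
    """Width of the range as a fraction of its midpoint."""
    best, worst = remaining_range(km_left)
    return (worst - best) / ((best + worst) / 2.0)


for km_left in (10.0, 5.0, 1.0):
    best, worst = remaining_range(km_left)
    print(
        f"{km_left:4.1f} km left: {best:5.1f} to {worst:5.1f} min "
        f"(spread {worst - best:4.1f} min, {relative_spread(km_left):.0%})"
    )
```

The absolute spread in minutes shrinks as the remaining distance shrinks, but the spread as a percentage of the time left stays the same, because the density of external influences hasn't changed.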
Sadly for this story, there’s not an easy answer to the question of “Are we done yet?”.  The best advice I can give is to reduce the external impacts or at the very least be able to quantify them and reduce the problem to understanding your average moving speed.