The Honest Estimation Problem: Why Software Forecasting Fails and What Actually Helps
The real enemy isn’t estimation itself. It’s false precision and the organisational dysfunction that demands it.
TL;DR
Software estimation has a terrible track record. Average cost overruns of 189% and time overruns of 222% aren’t anomalies; they’re the norm. But the problem isn’t that the future is unknowable. The problem is that organisations punish honesty about uncertainty.
The core claim: Estimation fails not because prediction is impossible, but because most organisations can’t handle honest answers. They demand single numbers, treat those numbers as commitments, and blame people when reality diverges from the guess. No methodology fixes that.
What doesn’t work: Traditional point estimates pretend certainty where none exists. Story points get gamed the moment they become performance metrics. Even probabilistic forecasting fails if the organisation punishes you for missing the 75th percentile instead of the point estimate.
What actually helps: Track what happens (cycle time, throughput). Present ranges with confidence levels. Keep work items small, genuinely small, using deliberate slicing techniques. Treat estimates as decision-support, not promises. Most importantly, build organisational tolerance for uncertainty. If your culture punishes honest answers, no methodology will save you.
The bottom line: The question isn’t “how do we estimate better?” The question is “how do we build organisations mature enough to handle honest answers about uncertainty?” That’s harder than switching methodologies, but it’s the only thing that actually works.
The terminology trap
In brief: Organisations routinely conflate estimates, targets, and commitments. This vocabulary confusion is where dysfunction begins.
Before examining why estimation fails, we need to untangle a vocabulary problem that poisons most estimation conversations.
Steve McConnell, in Software Estimation: Demystifying the Black Art, identifies three concepts that organisations routinely conflate:
An estimate is an analytical prediction of how long something will take, based on available information. It’s a forecast, not a promise. It should include uncertainty ranges and will change as information improves.
A target is a business goal or desired deadline. It represents what the organisation wants to achieve, independent of whether it’s realistic.
A commitment is a promise to deliver by a specific date. It carries accountability and should only be made when uncertainty has narrowed enough that the risk is acceptable.
The dysfunction begins when these get confused. A developer saying “this looks like maybe six weeks” intends an estimate, a preliminary forecast based on incomplete information. Management hears a commitment. The roadmap records a target. When reality diverges, everyone feels betrayed despite never having agreed on what the original number meant.
This isn’t mere pedantry. McConnell argues that consciously separating these concepts is a prerequisite for healthy estimation conversations. An estimate can inform whether a target is realistic. A commitment should only follow once estimates have been refined sufficiently that the uncertainty band becomes acceptable for the business risk involved.
Most organisational dysfunction around estimation traces back to this confusion. When executives treat preliminary estimates as commitments, or when targets are disguised as “just estimates,” the game is rigged before it starts.
The uncomfortable track record
In brief: The statistics are bad, but often overstated. The real problem isn’t prediction itself; it’s what organisations do with predictions.
Let’s be honest about the numbers, and careful about what they actually mean.
Bent Flyvbjerg’s research on over 16,000 projects found that only 0.5% deliver their promised benefits within budget and timeframe. The Standish Group’s CHAOS Report shows roughly 16% of projects succeed on time, on budget, with all features. About 53% are challenged, meaning over budget, over time, or with fewer features. The remaining 31% are cancelled outright.
These numbers are often cited to damn estimation itself, but we should be careful. Flyvbjerg’s research focuses heavily on large infrastructure and megaprojects: airports, railways, Olympic venues. These domains have specific pathologies, including strategic misrepresentation (the polite term for lying to get projects approved), that don’t translate cleanly to typical software teams. A SaaS product built by a startup is not an airport.
More importantly, Flyvbjerg’s 0.5% figure measures whether projects deliver their promised benefits, not whether they hit their estimates. A project can hit its estimates perfectly and still fail to deliver value because the wrong thing was built. Conversely, a project can overrun significantly and still be wildly successful. Amazon Web Services famously launched years late and over budget, but it would be absurd to call it a failure.
That said, the track record for estimation accuracy in software specifically is still poor. Average cost overruns of 189% and time overruns of 222% aren’t anomalies; they’re common. Something is systematically wrong. But importing the worst-case statistics from megaprojects and presenting them as universal software truth overstates the case.
McConnell adds another sobering observation: the assumptions underpinning an estimate rarely stay stable. People leave. Priorities shift. Budgets shrink. Markets change. The estimate made at the start described a project that no longer exists by the middle.
Why your brain lies to you
In brief: Cognitive biases make accurate estimation genuinely hard. The planning fallacy is robust across cultures, personality types, and experience levels. Even knowing about it doesn’t prevent it.
In 1979, Daniel Kahneman and Amos Tversky identified what they called the planning fallacy: the tendency to underestimate time, costs, and risks while overestimating benefits. This isn’t a bug in human cognition; it’s a feature. And it’s remarkably robust. It appears for small tasks and massive infrastructure projects. It generalises across personality types and cultures. It affects individuals and groups equally. Most troublingly, even knowing about the bias doesn’t prevent it.
When you estimate a software project, your brain deploys an arsenal of cognitive shortcuts. Optimism bias makes you believe you’re less likely to experience problems than others. Anchoring bias means the first number mentioned warps all subsequent thinking. The availability heuristic causes you to weight recent, memorable experiences more heavily than historical data. Self-serving bias leads you to take credit when things go well but blame external factors when they don’t, preventing you from learning.
Kahneman distinguished between the “inside view” and the “outside view”. The inside view focuses on a project’s specific characteristics: the features, the technology, the team. This feels like the right approach, but it systematically produces overconfident estimates. The outside view instead treats your project as one instance in a class of similar efforts and asks how those similar efforts actually performed. Kahneman called the outside view “the single most important piece of advice regarding how to increase accuracy in forecasting”.
This insight was formalised as Reference Class Forecasting: rather than estimating your project based on its unique characteristics, identify a reference class of similar past projects and use their actual outcomes as your baseline. The technique builds directly on the work on judgment under uncertainty that earned Kahneman the Nobel Memorial Prize in Economic Sciences, and it has been mandated by the UK Treasury for large public projects. Yet software teams rarely implement it systematically, instead relying on expert judgment that demonstrably produces optimistic bias.
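To make the outside view concrete, here is a minimal sketch of reference class forecasting in Python: take the actual-versus-estimated ratios from comparable past projects and apply their distribution to your own inside-view estimate. All the numbers are invented for illustration, and the percentile helper is a crude stand-in for a proper quantile function.

```python
# Reference class forecasting, minimal sketch (illustrative data only).
reference_class = [
    # (estimated_weeks, actual_weeks) for similar past projects
    (6, 9), (4, 4), (8, 14), (5, 7), (10, 13), (3, 5),
]

# Distribution of overrun ratios in the reference class.
ratios = sorted(actual / estimated for estimated, actual in reference_class)

inside_view_estimate = 6  # weeks: the team's own "inside view" guess

def percentile(sorted_values, p):
    """Crude percentile lookup on a small sorted list."""
    idx = min(len(sorted_values) - 1, int(len(sorted_values) * p / 100))
    return sorted_values[idx]

for p in (50, 75, 90):
    adjusted = inside_view_estimate * percentile(ratios, p)
    print(f"P{p} outside-view forecast: ~{adjusted:.1f} weeks")
```

The point isn't the arithmetic; it's that the adjustment comes from how similar projects actually went, not from how confident anyone feels about this one.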
Ron Jeffries makes a related point: teams typically estimate “at the moment of maximum ignorance,” when understanding of what needs to be built is lowest. The estimates that get locked in as commitments are precisely the ones made when the team knew least about the work.
The Cone of Uncertainty: theory meets reality
In brief: McConnell’s Cone describes how uncertainty narrows on well-managed projects. The caveat is important: it only works when teams already practice healthy project management.
Steve McConnell popularised the Cone of Uncertainty as a framework for understanding estimation accuracy. The concept is elegant: early in a project, uncertainty spans a wide range (estimates may be off by 4x in either direction). As work progresses and unknowns resolve, the cone narrows. Variability decreases from 4x to 2x to 1.25x as projects move through requirements, design, and implementation phases.
The Cone offers useful insights. It legitimises early imprecision: demanding precise estimates during initial scoping ignores the fundamental reality that key decisions haven’t yet been made. It justifies iterative replanning: as information accumulates, estimates can legitimately narrow. Organisations that lock in early estimates and refuse updates are fighting mathematical reality.
But the critical caveat undermines the Cone as prescription. McConnell himself emphasises that the Cone describes what happens on well-managed projects. He writes: “It’s easily possible to do worse.” Many teams don’t systematically attack their highest sources of variability early. They postpone difficult problems, creating false certainty that shatters late in the project.
If the framework only works when teams already practice healthy project management, it describes a symptom of good practice rather than providing a path toward it. Teams struggling with estimation dysfunction can’t simply “follow the Cone.” They must first solve the organisational problems that prevent uncertainty from narrowing naturally.
The story points trap
In brief: Story points fail not because relative estimation is flawed, but because organisations misuse them. Even the practice’s inventor now expresses regret.
When Agile emerged, story points and velocity were supposed to solve the estimation problem. Abstract away from time. Focus on relative sizing. Let the data tell you how much you can deliver.
In theory, elegant. In practice, often dysfunctional.
The moment velocity becomes a performance metric, teams start gaming it. Story point inflation becomes rampant. Different teams have different reference scales, making cross-team comparisons meaningless, yet organisations constantly try to compare them. When hitting velocity targets matters more than sustainable delivery, teams cut corners. Technical debt accumulates. And despite story points being designed to avoid time estimation, stakeholders want dates, so organisations create conversion factors that destroy whatever value story points might have provided.
The inventor’s second thoughts
Perhaps the most striking critique of story points comes from Ron Jeffries himself, one of the original Extreme Programming founders often credited with inventing the practice. In 2012, Jeffries wrote:
“A team that is focusing on velocity is not focusing on value. I wish I had never invented velocity, if in fact I did.”
This isn’t nostalgic regret about misuse. Jeffries argues that velocity focuses on “the short end of the lever,” optimising the cost side of the equation when what matters in Agile is steering by selecting what to do and what to defer. He observed teams being measured on estimate accuracy, with product owners reduced to “just following the plan” rather than genuinely prioritising value.
The creator’s disillusionment doesn’t invalidate story points for all contexts. But it suggests that even under ideal conditions, the practice may misdirect attention. When your estimation system’s architect concludes it optimises the wrong variable, perhaps the problem runs deeper than organisational dysfunction.
The broader pattern
This matters because the same critique applies to any metric, including the flow metrics and throughput measures that forecasting advocates prefer. Throughput can be gamed too. Cycle time can be manipulated by how you define “started” and “done”. If leaders misuse metrics, changing metrics won’t fix leadership.
The Agile Manifesto’s seventh principle states: “Working software is the primary measure of progress.” Not story points. Not velocity. Not throughput. Working software. Any metric is a proxy, and any proxy can be gamed.
Forecasting: changing the conversation
In brief: Forecasting is still estimation. The value isn’t ontological (it doesn’t make prediction possible). The value is behavioural (it changes how estimates are discussed).
Around 2011, practitioners like Vasco Duarte, Neil Killick, and Woody Zuill started questioning the estimation orthodoxy under the hashtag #NoEstimates. They drew a distinction between estimating (giving a general idea based on opinion) and forecasting (calculating predictions based on historical data).
This distinction is useful, but we should be clear about what it actually changes. Forecasting is still estimation. When you run a Monte Carlo simulation and say “75% chance of completion by October 22nd”, you’re still making a prediction about the future. You’ve wrapped that prediction in probability language, making uncertainty explicit rather than hidden. That’s genuinely valuable, but it’s not magic.
The real value of probabilistic forecasting isn’t that it makes prediction possible where it wasn’t before. It’s that it changes the conversation. It’s harder to treat a probability distribution as a commitment. It forces discussions about risk tolerance. It makes uncertainty visible and therefore discussable.
Monte Carlo simulation, developed by Stanislaw Ulam and John von Neumann for nuclear weapons research at Los Alamos in the 1940s, uses repeated random sampling to model uncertain outcomes. For software delivery, you gather historical throughput data (how many items your team completes per week) and cycle time data (how long items take from start to finish). You then run thousands of simulations, randomly sampling from past performance to project possible futures. Instead of a single date, you get a probability distribution: 50% chance by October 15, 75% by October 22, 90% by November 1.
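As a sketch of what that looks like in code, here is a minimal Monte Carlo throughput simulation in Python. The throughput history, backlog size, and iteration count are invented for illustration; real tooling layers refinements on top, but the core loop is no more complicated than this.

```python
import random

# Items completed in each of the last 12 weeks (illustrative data only).
weekly_throughput = [3, 5, 2, 4, 6, 3, 4, 5, 2, 4, 3, 5]

backlog = 40          # items remaining to deliver
simulations = 10_000  # number of simulated futures

weeks_needed = []
for _ in range(simulations):
    remaining, weeks = backlog, 0
    while remaining > 0:
        # Sample one week of future delivery from past performance.
        remaining -= random.choice(weekly_throughput)
        weeks += 1
    weeks_needed.append(weeks)

weeks_needed.sort()
for p in (50, 75, 90):
    idx = int(len(weeks_needed) * p / 100) - 1
    print(f"P{p}: done within {weeks_needed[idx]} weeks")
```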
Daniel Vacanti puts it well: “If you want to get stuff done by August 31st, sure, we can get stuff done by August 31st, but there’s about a 40% chance of that happening. Are you okay taking that 60% risk? Now we can have a smarter, more adult economic conversation about what risk we’re willing to take.”
What honest estimation looks like
In brief: A concrete example of the difference between dysfunctional and healthy estimation conversations.
The difference between bad and good estimation isn’t the technique. It’s the conversation. Here’s what the same situation looks like handled badly versus handled well.
The dysfunctional version:
Product Manager: “When will the new payment integration be done?”
Tech Lead: “Hard to say exactly. There’s a lot of uncertainty around the third-party API.”
Product Manager: “I need a date for the roadmap.”
Tech Lead: “Um... maybe six weeks?”
Product Manager: “Great, I’ll put it down for March 15th.”
Four weeks later:
Product Manager: “We’re halfway through and you’re saying it might slip? You committed to March 15th.”
Tech Lead: “I said ‘maybe six weeks.’ And we’ve discovered the API doesn’t support batch operations, so we need to redesign.”
Product Manager: “This is going to be a difficult conversation with leadership. They’re expecting March 15th.”
The healthy version:
Product Manager: “When will the new payment integration be done?”
Tech Lead: “Based on similar integrations we’ve done, I’d say 50% chance we’re done in four weeks, 75% chance by six weeks, 90% by eight weeks. The big unknown is the third-party API. If it’s well-documented and supports our use cases, we’re at the fast end. If we hit surprises, we’re at the slow end.”
Product Manager: “Leadership wants it for the March launch. That’s six weeks away.”
Tech Lead: “So we’re looking at roughly 75% confidence for that date. What’s the cost if we miss it? Is there a fallback plan?”
Product Manager: “The launch can proceed without it, but it’s a headline feature. Missing it would be disappointing, not catastrophic.”
Tech Lead: “Then let’s proceed, but I’ll flag it early if we hit the API problems. If we’re in trouble by week three, we’ll know, and you’ll have time to adjust messaging.”
Four weeks later:
Tech Lead: “The API doesn’t support batch operations. We’re now tracking toward the six-to-eight week range. I wanted to let you know with two weeks still to go.”
Product Manager: “Thanks for the early warning. I’ll talk to leadership about plan B for the launch.”
The technique (Monte Carlo, throughput tracking, whatever) matters less than what’s happening in that conversation: ranges instead of points, explicit confidence levels, discussion of risk tolerance, early warning when forecasts change, and no blame when reality differs from prediction.
The assumptions behind forecasting
In brief: Forecasting requires stable throughput, comparable work items, and sufficient history. When these don’t hold, forecasts are unreliable, and you should say so.
Monte Carlo forecasting isn’t magic. It rests on assumptions that often go unstated.
It assumes reasonably stable throughput, meaning your team’s capacity tomorrow will resemble its capacity yesterday. It assumes comparable work items, meaning the things you’re forecasting are similar in nature to the things in your historical data. It assumes sufficient historical data to sample from. And it assumes no major structural changes to team composition, domain, or architecture.
These conditions often don’t hold. Greenfield products have no history. Early startups pivot constantly. Platform rewrites involve genuinely novel technical challenges. Regulated environments have external constraints that historical data can’t capture. Teams undergoing rapid scaling have unstable throughput by definition.
The situations where estimation is hardest are precisely the situations where you have the least relevant historical data to sample from. A team that’s been delivering similar features for two years can forecast with reasonable confidence. A team building something genuinely new, in a domain they don’t know well, with technology they haven’t used before, has no meaningful reference class.
For teams without historical data, the honest answer is: your forecasts will be unreliable, and you should say so. Start collecting data immediately, make small commitments, and let your forecasting improve as your history grows. Pretending you can forecast accurately without data isn’t forecasting; it’s just the old estimation game with fancier vocabulary.
Cognitive biases don’t disappear with data
In brief: Forecasting moves bias into different places (how work is sliced, which history is included) rather than eliminating it. Discipline still required.
The argument that estimation suffers from cognitive biases while forecasting doesn’t is too clean. The same biases that corrupt our estimates also affect how we collect and interpret historical data.
Forecasting moves bias into different places rather than eliminating it. Bias affects how work is sliced: teams often break down work in ways that make historical comparisons favourable. Bias affects which history is included: it’s tempting to exclude that disaster project as an “outlier” that doesn’t represent normal performance. Bias affects how outliers are treated: do you cap them, exclude them, or let them skew your distribution?
Teams tend to remember successes more vividly than failures. They categorise work items in ways that flatter past performance. The person choosing which historical items to include in the reference class brings all their biases with them.
This doesn’t mean data-driven forecasting is worthless. It just means it’s not the silver bullet it’s sometimes presented as. Good forecasting requires discipline: consistent categorisation, honest inclusion of failures, resistance to cherry-picking. The methodology helps, but it doesn’t eliminate the need for intellectual honesty.
The small stories alternative
In brief: If work items are small enough, estimation becomes almost unnecessary. You count items and measure throughput. The challenge lies in achieving that granularity.
Ron Jeffries argues that if work items are small enough, estimation becomes almost unnecessary. When everything takes roughly a day, you don’t need sophisticated forecasting. You count items and measure throughput. The challenge lies in achieving that granularity.
Technique 1: Single Acceptance Test Method
Neil Killick, cited by Jeffries, proposes examining acceptance criteria and implementing them one at a time, starting with the simplest. Rather than estimating “User can pay for order” as a single story, break it into separate work items:
Display payment button (simplest). Accept credit card number. Validate card format. Process test transaction. Handle payment failure gracefully. Store transaction record. Send receipt email.
Each acceptance test becomes a separate work item. Most will be small, genuinely completable in a day. The few that aren’t get broken down further.
Technique 2: The “One Dumb Idea” approach
Jeffries describes a psychological technique for unlocking small stories. When teams face seemingly monolithic features, propose a deliberately inadequate but technically possible first step.
His cable TV example: Instead of building full pay-per-view functionality, propose “Play one specific movie on a secret channel.” This exists almost entirely with current infrastructure. No user selection, no payment, no scheduling. Just hard-code a movie playing on channel 999.
The power lies in team psychology. When someone proposes an obviously insufficient solution, others instinctively respond with “What we could do instead is...” Suddenly the conversation shifts from “this is impossible” to discussing achievable increments. As Jeffries notes: “We’ve gone, in one step, from ‘impossible’ to knowing a stupid, but possible, thing to do.”
Technique 3: Vertical slicing with minimal viability
Each small story should deliver something end-to-end, however minimal.
Horizontal slicing (avoid): Build database schema. Create API endpoints. Implement frontend components. Wire everything together.
Vertical slicing (prefer): User can see one hardcoded product (end-to-end). User can see one product from database. User can see list of products. User can filter products by category.
The vertical approach produces shippable increments and reveals integration problems immediately rather than concentrating them at the end.
Why small stories help estimation
Even if you don’t eliminate estimation entirely, small stories transform the accuracy problem.
Reduced variability: A 10-day estimate might be off by 5 days. Ten 1-day estimates won’t all be wrong in the same direction.
Faster feedback: When items complete daily, you discover problems within days rather than weeks.
Throughput becomes measurable: With sufficient small items, historical throughput data emerges quickly, enabling forecasting without per-item estimation.
Cognitive load decreases: Estimating “this takes about a day” requires less analysis than forecasting two-week epics.
This approach requires investment in slicing skills and may initially feel slower than rougher-grained planning. But Jeffries argues the payoff is faster delivery with less estimation overhead, and crucially, less opportunity for estimates to become weaponised commitments.
The NoEstimates philosophy
In brief: The actual argument isn’t “don’t estimate” but “continuously question whether estimation is earning its keep.”
The NoEstimates movement, associated with Woody Zuill, Vasco Duarte, and Ron Jeffries, often gets reduced to “don’t estimate.” The actual argument is more nuanced: continuously question whether estimation is earning its keep.
Estimation as expense
Jeffries frames it starkly: “Estimates are always waste; they are not our product.” From a lean perspective, any activity that doesn’t directly produce customer value is expense. Estimation doesn’t ship features. The question becomes: does this expense generate sufficient return in decision quality to justify its cost?
On the C3 payroll project (Extreme Programming’s flagship case study), Jeffries’ team initially used story estimates for planning. They later realised they could have achieved similar outcomes by breaking work into single acceptance tests and counting completions. Mechanical measurement replacing estimation entirely.
When estimation provides value
Jeffries acknowledges legitimate use cases.
Sales and contracts: Pricing decisions require some basis. Though he critiques how organisations weaponise estimates in negotiations, he concedes that customers reasonably want cost projections before committing.
Understanding: Estimation discussions surface differing interpretations of requirements. Team members discover they imagined different solutions. However, Jeffries suggests the conversation provides this value. Written estimates aren’t strictly necessary.
Learning: Comparing estimates to actuals reveals systematic biases and process problems. Yet alternative monitoring methods exist; you can track cycle time without estimating individual items.
The pragmatic position
Jeffries’ conclusion is measured: “We always could stop estimating, but it’s not always the right thing to do. It’s always legitimate to think about it.”
This isn’t dogma. It’s a heuristic for continuous improvement. Each time estimation seems mandatory, ask: Is there a way to make decisions without this expense? What would we lose? What might we gain? Sometimes the answer favours estimation. Often, teams discover they estimate from habit rather than necessity.
When estimation is unavoidable
In brief: Fixed-price contracts, regulatory deadlines, capital budgeting, and external coordination all require estimates. The answer is to estimate honestly, not to pretend estimation is unnecessary.
Some contexts don’t allow the luxury of “we’ll deliver what we can when we can”. In these situations, estimation isn’t optional, and the question becomes how to do it less badly.
Fixed-price contracts require estimates. A client asking for a quote needs a number, and “it depends” doesn’t win business. You can build contingency into the price, you can structure contracts with change mechanisms, but you can’t avoid making a forward-looking commitment.
Regulatory commitments have hard deadlines. If compliance with a new regulation is required by a specific date, missing it has consequences that probabilistic language doesn’t soften. You need to know whether you’re likely to make it, and if not, what to do about it.
Capital budgeting requires forecasts. Organisations allocate resources annually or quarterly. Someone deciding whether to fund your initiative versus a competing one needs to understand what they’re getting for their investment. “Trust us” isn’t a capital allocation strategy.
External stakeholder negotiations depend on estimates. If you’re coordinating with partners, aligning marketing campaigns, or scheduling dependent work streams, those stakeholders need something to plan against.
In these situations, the answer isn’t to pretend estimation is unnecessary. It’s to estimate honestly: provide ranges rather than points, communicate confidence levels, update forecasts as you learn, and build relationships where changing estimates isn’t treated as failure.
Estimating less badly
In brief: Ranges, confidence levels, rolling-wave planning, Bayesian updating, and treating estimates as decision-support rather than promises.
When estimation is required, several practices help reduce the damage.
Use ranges instead of points. “Two to four weeks” is more honest than “three weeks” and gives stakeholders useful information about uncertainty. If they need the optimistic end, they know it’s a stretch. If they need certainty, they can plan for the pessimistic end.
Express confidence levels explicitly. P50, P75, and P90 estimates communicate that different levels of certainty come with different timelines. A P50 estimate means there’s a 50% chance of missing it. If that risk is unacceptable, plan for P90.
Practice rolling-wave planning. Estimate near-term work in detail, further-out work in ranges, and distant work as rough orders of magnitude. Don’t pretend you know what you’ll discover.
Update estimates as you learn. Bayesian updating means revising your forecasts as new information emerges. An estimate made at the start of a project should evolve. Treating the original estimate as a commitment regardless of what you’ve learned is organisational dysfunction, not estimation failure.
Treat estimates as decision-support, not promises. The purpose of an estimate is to help someone make a decision: should we fund this, should we commit to this date, should we staff this team. Once the decision is made, the estimate has served its purpose. Holding people to it regardless of changed circumstances misunderstands what estimates are for.
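To make "update estimates as you learn" concrete, here is a minimal sketch of Bayesian updating applied to weekly throughput. The gamma-Poisson model, the prior, and the observations are illustrative assumptions, not a model any of the authors cited here prescribe.

```python
# Bayesian updating of a throughput forecast, minimal sketch.
# Assumed model: items completed per week ~ Poisson(rate),
# with a Gamma(alpha, beta) prior over the rate.
alpha, beta = 4.0, 1.0  # prior belief: roughly 4 items/week, weakly held

observed_weeks = [2, 3, 6, 4]  # items actually completed each week (invented)

for completed in observed_weeks:
    # Gamma-Poisson conjugacy: each observed week updates the posterior.
    alpha += completed
    beta += 1

print(f"Updated expected throughput: {alpha / beta:.1f} items/week")
```

Early observations move the forecast a lot; later ones move it less. That is the behaviour you want from an estimate that evolves with evidence rather than being frozen at the start.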
McConnell uses a helpful metaphor: estimates need not be perfect, just close enough that minor adjustments (equivalent to “sitting on the suitcase”) achieve reasonable success. Obsessing over estimation precision often misses the point.
Hybrid approaches
In brief: Use rough estimates early and forecasts later. Combine discovery and delivery. Layer probabilistic methods on top of expert judgment.
The debate is often framed as estimation versus forecasting, as if you must choose one camp. In practice, hybrid approaches often work best.
Use rough estimates early, forecasts later. At the inception of a project, you lack historical data for the specific work. Rough expert estimates, honestly communicated as guesses, help with initial go/no-go decisions. As work progresses and you accumulate data, shift to probabilistic forecasting based on actual throughput.
Combine discovery and delivery tracks. Run a time-boxed discovery phase to reduce uncertainty before committing to estimates. The goal of discovery is to learn enough that your subsequent estimates have a meaningful basis. Don’t estimate what you haven’t explored.
Use scenario-based planning. Instead of a single estimate, develop scenarios: “If the integration goes smoothly, four weeks. If we hit the authentication complexity we suspect, eight weeks. If we need to rebuild the data layer, three months.” This surfaces the key risks and lets stakeholders understand what drives the uncertainty.
Layer probabilistic forecasts on top of rough scoping. Use expert judgment to identify the likely scope, then apply Monte Carlo to the execution. The forecast doesn’t replace judgment about what needs to be built; it provides rigour around how long building takes.
McConnell advocates something similar: define requirements upfront with enough detail for story point estimation, then track velocity to calibrate forecasts. This offers a middle path between pure NoEstimates and traditional detailed estimation.
The economics of estimation
In brief: Cost of delay, risk-adjusted ROI, opportunity cost, and option value. If we want “adult economic conversations”, we should actually talk economics.
One thing largely missing from the estimation debate is economic decision theory. If we want “adult economic conversations”, we should actually talk economics.
Cost of delay matters enormously. A feature delivered in January might be worth twice what it’s worth in June. If you’re choosing between a certain six-month delivery and a risky four-month delivery, the right choice depends on how value decays over time. Probabilistic forecasting is most useful when connected to explicit cost-of-delay analysis.
Risk-adjusted return on investment changes decisions. A project with an expected value of £1 million but high variance might be less attractive than one with an expected value of £800,000 and low variance. Portfolio thinking requires understanding not just expected outcomes but distributions of outcomes.
Opportunity cost is invisible but real. While your team spends six months on Project A, they’re not working on Projects B, C, and D. The value of better estimation isn’t just delivering A faster; it’s making better choices about whether to do A at all.
Option value exists in uncertainty. Sometimes the right response to uncertainty isn’t better estimation; it’s structuring work to preserve options. Small investments that let you learn before committing are often worth more than precise forecasts that lock you in.
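As a rough illustration of the cost-of-delay point above, here is a minimal sketch with invented figures. The linear value decay, the planning horizon, and the probabilities are assumptions made purely for the example.

```python
# Cost-of-delay comparison, minimal sketch (invented numbers).
# Assumption: the feature earns 100k/month if launched in month 0 and loses
# 10k of monthly value for every month of delay, over a 12-month horizon.

def value_if_launched(month, horizon=12):
    monthly_value = max(0, 100_000 - 10_000 * month)
    return monthly_value * (horizon - month)

# Option A: certain delivery in month 6.
value_a = value_if_launched(6)

# Option B: risky delivery, 60% chance of month 4, 40% chance of month 8.
value_b = 0.6 * value_if_launched(4) + 0.4 * value_if_launched(8)

print(f"Certain 6-month delivery: £{value_a:,.0f}")
print(f"Risky 4-or-8-month bet:   £{value_b:,.0f}")
```

With these particular numbers the risky option has the higher expected value; change the decay curve or the probabilities and the answer flips. That is exactly why the conversation needs the economics made explicit rather than implied.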
Fixed time, variable scope
In brief: Shape Up’s approach works well for product-led organisations with autonomy. It doesn’t work for regulatory deadlines or fixed external commitments.
Basecamp’s Shape Up methodology offers an interesting flip. Instead of fixing scope and letting time vary, you fix time and let scope vary. You decide that something is worth six weeks of effort, then build the best version you can in six weeks.
This is liberating in some contexts. Instead of expanding timelines to fit scope (inviting Parkinson's Law, where work expands to fill the time available), you ruthlessly trim scope to fit timelines. Six weeks turns out to be a sweet spot: long enough to finish something meaningful, short enough to see the end from the beginning.
But Shape Up is context-specific, not universally applicable. It works well for product-led organisations with strong product management, high team autonomy, and low external deadline pressure. It works poorly for contract-based delivery, regulatory milestones, hardware dependencies, and systems with heavy multi-team coordination.
If the business requirement is “we need these specific regulatory features by this compliance deadline”, you can’t negotiate scope. If you’re coordinating with external partners who expect specific functionality, you can’t just deliver “whatever fits”. Shape Up is a valuable tool where it applies, not a universal solution.
The selection bias in success stories
In brief: Teams that work without traditional estimates have usually earned that trust through years of reliable delivery. Many teams don’t have that luxury.
Teams that successfully operate without traditional estimates share characteristics that often go unmentioned. They typically have high trust with stakeholders built over years of reliable delivery. They have mature development practices: continuous integration, automated testing, small incremental releases. They have stable funding that doesn’t require competitive justification.
In other words, they’ve earned the right to say “trust us”. They can operate with probabilistic forecasts because their stakeholders have seen enough delivery to believe the probabilities are meaningful.
Many teams operate in environments where that trust hasn’t been established. Funding is competitive. External dependencies require coordination. Stakeholders have been burned before and want commitments. Telling those teams to “just stop estimating” isn’t practical advice. They need to build trust first, which often means delivering reliably against stated expectations, which requires some form of estimation.
Flow metrics: the foundation
In brief: Cycle time, throughput, WIP, and work item age. These measure what actually happened rather than what someone guessed. But they can be gamed too.
Whatever approach you take, tracking the right metrics matters. The four essential flow metrics are cycle time (how long work takes from start to finish), throughput (how many items you complete per time period), work in progress (how many items are currently in flight), and work item age (how long an item has been in progress).
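These metrics require nothing more exotic than work item start and finish dates. Here is a minimal sketch of computing them in Python; the records, dates, and "today" are invented for illustration.

```python
from datetime import date

# Work item records as (started, finished); finished=None means still in flight.
items = [
    (date(2024, 3, 1), date(2024, 3, 5)),
    (date(2024, 3, 2), date(2024, 3, 4)),
    (date(2024, 3, 3), None),
    (date(2024, 3, 6), date(2024, 3, 12)),
    (date(2024, 3, 10), None),
]
today = date(2024, 3, 14)

done = [(s, f) for s, f in items if f is not None]
in_progress = [s for s, f in items if f is None]

# Cycle time: start to finish, for completed items.
cycle_times = [(f - s).days for s, f in done]
avg_cycle_time = sum(cycle_times) / len(cycle_times)

# Throughput: completed items per week over the observed window.
window_weeks = (today - min(s for s, _ in items)).days / 7
throughput = len(done) / window_weeks

# WIP and work item age for unfinished items.
wip = len(in_progress)
ages = [(today - s).days for s in in_progress]

print(f"Average cycle time: {avg_cycle_time:.1f} days")
print(f"Throughput:         {throughput:.1f} items/week")
print(f"WIP:                {wip}, item ages: {ages} days")
```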
These connect through Little’s Law: average cycle time equals average work in progress divided by average throughput. Limiting WIP decreases cycle time. Decreased cycle time increases predictability. Increased predictability makes forecasting more accurate.
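To see the arithmetic with invented numbers: a team carrying 12 items in progress with an average throughput of 3 items per week has an average cycle time of 12 / 3 = 4 weeks. Cut WIP to 6 with throughput unchanged and average cycle time falls to 2 weeks.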
Teams that shift from story points and velocity to flow metrics often report reduced cycle times and better predictability. The mechanism is straightforward: flow metrics measure what actually happened, while story points measure what someone guessed would happen.
But remember: these metrics can be gamed too. The value comes from honest measurement and continuous improvement, not from the metrics themselves. Any metric that becomes a target ceases to be a good metric. The solution is cultural, not methodological.
The real enemy: organisational dysfunction
In brief: The problem isn’t estimation. It’s what organisations do with estimates. False precision is the enemy, not prediction itself.
Here’s the deeper issue the estimation debate often misses: the problem isn’t estimation itself. It’s the organisational dysfunction that surrounds it.
A team saying “two to four weeks, depending on what we discover” is estimating, and that’s fine. The problem is when that becomes “we committed to two weeks” in a status report, which becomes “why did you miss your commitment?” in a performance review. The dysfunction isn’t the estimate; it’s what the organisation does with it.
False precision is the enemy, not estimation. When someone asks “how long will this take?” and the honest answer is “probably 2-4 weeks, but it could be longer if we hit complications”, the organisation needs to be able to hear that. If it can’t, if it demands a single number and then treats that number as a commitment, the problem is cultural, not methodological.
The responsibility question
Ron Jeffries raises a pointed question about accountability. Developers cannot reasonably be held responsible for meeting deadlines without corresponding authority. Unless developers can hire or reassign staff, acquire additional resources, decide which features ship versus defer, or adjust scope unilaterally, they cannot control the variables that determine delivery dates.
Jeffries compares development to a machine with fixed capacity. Pushing harder doesn’t increase throughput; it risks breakdown. The product owner must select work batches that fit the timeline, not demand the machine work faster.
This has uncomfortable implications. When projects fail to meet deadlines, the conventional response is blaming developers for poor estimates. Jeffries suggests the failure often lies in management’s scope decisions, resource allocation, or unrealistic targets, factors developers cannot control.
What developers can commit to: delivering working, tested features regularly; keeping the codebase shippable at all times; working on whatever sequence management prioritises; surfacing impediments early rather than hiding them. This is narrower accountability than “hit the date,” but it’s accountability developers can actually fulfil without authority over scope and resources.
Dysfunction arises when organisations hold people accountable for outcomes they cannot control. Estimates become the instrument for manufacturing this false accountability.
The cultural barrier
Monte Carlo simulations and probabilistic forecasting help with the honesty part. They make it harder to pretend certainty where none exists. But they don’t solve the cultural part. An organisation that punishes missed estimates won’t suddenly become healthy because you started expressing estimates as probability distributions. They’ll just punish you for missing the 75th percentile instead of the point estimate.
What actually helps
In brief: Track reality, embrace probability, keep things small, focus on flow, estimate honestly when you must, and build a culture that can handle uncertainty.
Based on all the evidence, here’s what actually improves outcomes.
Track what actually happens. Historical throughput beats expert guesses. Collect cycle time and throughput data consistently, even if you’re not sure how you’ll use it yet.
Embrace probabilistic thinking. Present ranges with confidence levels rather than single dates. Have risk conversations rather than commitment ceremonies. Acknowledge that you’re uncertain and explain what drives the uncertainty.
Keep work items genuinely small. Target 1-day items where possible, certainly no more than 3 days. Use the slicing techniques: single acceptance tests, vertical slicing, the “one dumb idea” approach. Smaller items mean tighter distributions and better forecasts.
Focus on flow. Measure cycle time and throughput. Limit work in progress. The maths is clear: lower WIP means faster cycle times and more predictable delivery.
Where possible, fix time and vary scope. If you can negotiate what gets built, time-boxing forces real prioritisation and reduces the scope creep that derails traditional projects.
When estimation is required, do it honestly. Use ranges, express confidence levels, update as you learn, and treat estimates as decision-support rather than commitments. Separate estimates from targets from commitments in your vocabulary and your conversations.
Build organisational tolerance for uncertainty. This is the hardest part and the most important. If your organisation punishes honest uncertainty, no methodology will save you. Work on the culture alongside the practices.
What about AI?
In brief: AI agents change the nature of the work being estimated. Historical data becomes less relevant. Variance increases. The estimation problem gets harder before it gets easier.
AI coding agents are already changing how software gets built. Tasks that took a day now take an hour. But not all tasks, and not predictably. This creates a new estimation problem: how do you forecast when the work itself is transforming?
Historical throughput data assumes some stability in how work gets done. If your team’s cycle time for “build a new API endpoint” was consistently 2-3 days, you could forecast based on that. But now one developer with an AI agent finishes it in two hours, while another developer working on a different endpoint hits edge cases the AI can’t handle and takes three days anyway. Your historical distribution no longer describes your current capability.
The variance problem gets worse, not better. AI agents are fast when they work and useless when they don’t, and predicting which situation you’ll hit is difficult. A task might complete in minutes if the AI handles it cleanly, or take longer than the pre-AI baseline if you spend hours debugging AI-generated code that almost works. The distribution of outcomes becomes bimodal or worse, which breaks the assumptions behind Monte Carlo forecasting.
There’s also a decomposition problem. Traditional estimation assumes humans do the work and you’re estimating human effort. When AI agents do significant portions of the work, what exactly are you estimating? The human time spent prompting, reviewing, and correcting? The wall-clock time including AI processing? The cognitive load on the human, which might be higher when supervising AI than when doing the work directly? None of our existing frameworks handle this cleanly.
The honest answer is that we don’t yet know how to estimate human-AI collaborative work well. Teams adopting AI agents should expect their forecasting accuracy to degrade temporarily. Historical data becomes less useful. New patterns haven’t stabilised enough to replace the old ones. The best approach is probably radical incrementalism: even smaller batches, even shorter feedback loops, even more willingness to update forecasts as you learn. Treat every AI-assisted task as an experiment until you’ve built enough new history to see patterns.
This might ultimately be good news for the estimation debate. If AI makes historical data unreliable and variance unpredictable, organisations will be forced to accept uncertainty whether they like it or not. You can’t demand false precision when everyone can see the ground shifting. But in the short term, expect estimation to get harder, not easier.
Conclusion: three voices, one uncomfortable truth
We’ve heard three distinct perspectives on software estimation.
Steve McConnell argues that estimation is a craft that can be practiced skilfully. Distinguish estimates from targets from commitments. Understand the Cone of Uncertainty. Use historical data and reference classes. Track actuals. The problem isn’t estimation itself but doing it carelessly.
Ron Jeffries questions whether estimation deserves its central role. Every estimate is expense, not product. When work is sliced small enough and delivered continuously, forecasting emerges from counting rather than guessing. At minimum, keep asking: can we make this decision without estimating?
The organisational dysfunction thesis identifies cultural problems as the root cause: not estimation technique or philosophy, but what organisations do with estimates, demanding false precision, punishing honest uncertainty, treating preliminary forecasts as ironclad commitments.
These perspectives aren’t as contradictory as they first appear.
McConnell would agree that estimation without distinguishing it from commitment is organisational malpractice. Jeffries would agree that when estimation is necessary, doing it well beats doing it badly. All parties agree that probabilistic language beats point estimates, that smaller work items improve predictability, and that organisational culture determines whether any technique succeeds.
McConnell offers: “The primary purpose of software estimation is not to predict a project’s outcome; it is to determine whether a project’s targets are realistic enough to allow the project to be controlled to meet them.”
Jeffries offers: “Stop estimating. Start shipping.”
Perhaps both are right. When an organisation has the maturity to use estimates as McConnell envisions, for project control rather than prophecy, estimation becomes a valuable tool. When an organisation lacks that maturity, Jeffries’ provocative advice may be the safest path: stop the estimation theatre entirely, focus on small deliverables, and let observable throughput speak for itself.
The question isn’t “Should we estimate?” but “Has our organisation earned the right to estimate responsibly?”
That requires cultural change, which is harder than switching methodologies, but it's the only approach that genuinely works.
References
Steve McConnell
McConnell, S. (2006). Software Estimation: Demystifying the Black Art. Microsoft Press.
McConnell, S. “Software Engineering Radio Episode 273: Steve McConnell on Software Estimation.” Software Engineering Radio.
McConnell, S. “The Cone of Uncertainty.” Construx. https://www.construx.com/books/the-cone-of-uncertainty/
Ron Jeffries
Jeffries, R. “The NoEstimates Movement.” ronjeffries.com. https://ronjeffries.com/xprog/articles/the-noestimates-movement/
Jeffries, R. “Estimation is Evil.” ronjeffries.com. https://ronjeffries.com/articles/019-01ff/estimation-is-evil/
Jeffries, R. “Getting Small Stories.” ronjeffries.com. https://ronjeffries.com/articles/015-10/small-stories/
Jeffries, R. “Making the Date.” ronjeffries.com. https://ronjeffries.com/articles/making-the-date/
Jeffries, R. “Story Points Revisited.” ronjeffries.com. https://ronjeffries.com/articles/019-01ff/story-points/Index.html
Jeffries, R. (2012). Comment on Scrum Alliance discussion regarding velocity. Referenced in InfoQ: “Should we stop using Story Points and Velocity?”
Daniel Kahneman and Amos Tversky
Kahneman, D. & Tversky, A. (1979). “Intuitive Prediction: Biases and Corrective Procedures.” TIMS Studies in Management Science, 12, 313-327.
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Reference Class Forecasting
Flyvbjerg, B. (2006). “From Nobel Prize to Project Management: Getting Risks Right.” Project Management Journal, 37(3), 5-15.
Wikipedia. “Reference Class Forecasting.” https://en.wikipedia.org/wiki/Reference_class_forecasting
Project Statistics
Flyvbjerg, B. & Gardner, D. (2023). How Big Things Get Done. Currency.
The Standish Group. “CHAOS Report.” https://www.standishgroup.com/
NoEstimates Movement
Duarte, V. NoEstimates: How to Measure Project Progress Without Estimating. Leanpub. https://leanpub.com/noestimates
Zuill, W. “NoEstimates.” https://zuill.us/WosenseeBlog/tag/noestimates/
NoEstimates.org. Links and Resources. https://www.noestimates.org/
Flow Metrics and Forecasting
Vacanti, D. When Will It Be Done? Lean-Agile Forecasting to Answer Your Customers’ Most Important Question. Leanpub. https://leanpub.com/whenwillitbedone
Vacanti, D. Actionable Agile Metrics for Predictability. Leanpub. https://leanpub.com/actionableagilemetrics
Kersten, M. (2018). Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework. IT Revolution Press.
DORA Research
Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
DORA. “DORA Metrics: The Four Keys.” https://dora.dev/guides/dora-metrics-four-keys/
Shape Up
Singer, R. Shape Up: Stop Running in Circles and Ship Work that Matters. Basecamp. https://basecamp.com/shapeup
Additional Sources
Killick, N. “Slicing Heuristics.” Referenced in Jeffries’ work on small stories.
Beck, K. et al. “Manifesto for Agile Software Development.” https://agilemanifesto.org/

