From Story Points to Reliable Planning: A Practical Guide for Teams Ready to Stop Guessing
Story points have become so ubiquitous in software development that many teams assume they’re a fundamental part of agile. They’re not. Story points were invented as a tool to help teams have conversations about work, but somewhere along the way they became a ritual that consumes time, creates false confidence, and rarely delivers the predictability teams actually need.
If you’ve ever sat through a planning session where the team spent twenty minutes debating whether something is a 5 or an 8, you’ve experienced the dysfunction firsthand. That time could have been spent understanding the work, identifying risks, or actually delivering something.
This article is a practical guide for teams who want to move beyond story points toward planning approaches that are simpler, faster, and grounded in reality rather than collective guessing.
Why story points fail
The theory behind story points sounds reasonable: estimate relative complexity rather than time, use historical velocity to forecast, and avoid the trap of conflating estimates with commitments. In practice, this rarely works as intended.
The first problem is that story points mean different things to different people, even within the same team. One developer thinks about complexity. Another thinks about effort. A third is secretly converting to hours in their head and then picking a Fibonacci number. When you average these different mental models together, you don’t get wisdom of crowds. You get noise.
The second problem is that velocity, the sum of story points completed per sprint, is treated as if it were a stable measure when it isn’t. Velocity fluctuates based on who’s on holiday, how many meetings interrupt the sprint, whether the work was estimated by the same people who did it, and countless other factors. Teams often respond to this instability by adding more process: re-estimation, calibration sessions, reference stories. None of this makes the underlying measure more meaningful.
The third problem is that story points create perverse incentives. When velocity becomes a performance metric, teams unconsciously inflate their estimates. A 3 becomes a 5. An 8 becomes a 13. Velocity goes up, but throughput stays the same. Everyone pretends not to notice.
The final problem is opportunity cost. The time spent estimating is time not spent understanding, slicing, or delivering. With two-week sprints, a six-person team that spends two hours per sprint on estimation and calibration burns more than three hundred person-hours per year on an activity that doesn’t move the product forward.
What actually predicts delivery
If story points don’t work, what does? The answer is surprisingly simple: count things.
Throughput, the number of items a team completes per unit of time, is a far more stable and useful measure than velocity. This works because throughput is based on observation rather than prediction. You’re not asking “how hard do we think this will be?” You’re asking “how many things did we actually finish?”
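You can see the difference in stability directly from a team’s own history. This sketch uses made-up numbers for a hypothetical team and compares the relative spread (coefficient of variation) of item counts against point totals; real data will differ, but counts tend to wobble far less than points:

```python
import statistics

# Hypothetical history for one team. Both lists are illustrative,
# not measurements from a real project.
throughput = [8, 7, 9, 8, 6, 8, 10, 8]        # items finished per sprint
velocity   = [34, 21, 41, 29, 18, 38, 52, 31] # story points per sprint

def cv(xs):
    """Coefficient of variation: standard deviation relative to the mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)

print(f"throughput spread: {cv(throughput):.0%}")
print(f"velocity spread:   {cv(velocity):.0%}")
```

On this illustrative data the item counts vary roughly half as much as the point totals, which is what makes them a usable forecasting baseline.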
For throughput to be a reliable predictor, one condition must be met: the items being counted need to be roughly similar in size. This doesn’t mean identical, just within the same order of magnitude. If most of your work takes one to three days to complete, counting items gives you a useful forecast. If your backlog contains a mix of half-day tasks and three-week epics, counting won’t tell you much.
This is where the real skill comes in. The shift from story points to throughput-based planning is really a shift from estimating to slicing. Instead of asking “how big is this?” you ask “how do we make this small enough that size doesn’t matter?”
The art of slicing work small
Slicing is the most valuable planning skill a team can develop, and it’s almost entirely ignored in mainstream agile training. The ability to take a large, ambiguous piece of work and break it into thin vertical slices that each deliver value is what separates high-performing teams from the rest.
A good slice has three properties. First, it delivers something a user or stakeholder can see, use, or give feedback on. Second, it can be completed independently, without waiting for other slices. Third, it’s small enough to finish in a day or two of focused work.
The third property is the critical one for planning purposes. When everything in your backlog is one to three days of work, the difference between items becomes negligible. A two-day item and a three-day item are close enough that counting them as equivalent introduces less error than trying to estimate them separately.
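The claim that small size differences wash out can be sanity-checked with a tiny simulation. Assuming (purely for illustration) that every item takes between one and three days, the total duration of a sprint’s worth of items varies only modestly around its mean, so treating the items as interchangeable introduces little forecast error:

```python
import random
import statistics

random.seed(42)

# Simulate 10,000 sprints of 8 items, each taking 1-3 days.
# The uniform size range is a modelling assumption, not team data.
totals = [sum(random.uniform(1, 3) for _ in range(8))
          for _ in range(10_000)]

mean = statistics.mean(totals)
spread = statistics.pstdev(totals) / mean
print(f"mean sprint total: {mean:.1f} days, relative spread: {spread:.0%}")
```

The individual items vary by a factor of three, but the sprint totals cluster within roughly ten percent of the mean, because the variation averages out across items.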
Teams often resist slicing this small because it feels unnatural. “We can’t deliver anything useful in two days,” they say. This is almost never true. What’s actually happening is that the team has become accustomed to thinking in terms of technical tasks rather than user value. A “user story” that says “implement the payment gateway” isn’t a story at all. It’s a technical component. A real slice might be “a customer can pay for a single item using a saved card” or even “a customer sees a payment button that shows a coming soon message.” Both of these deliver something observable. Both can be done in a day or two. Both give you something to learn from.
Learning to slice well takes practice, and the best way to practice is to do it together as a team. Every time someone brings a large item to planning, treat it as an opportunity to slice collaboratively. Ask: what’s the smallest thing we could deliver that would let us learn something? What could we ship that a user would actually notice? What’s the riskiest part of this, and how could we test that assumption with a thin slice?
How to run planning without estimates
Once your team has embraced small slices and started tracking throughput, planning becomes remarkably simple.
Before the session, gather your historical data. How many items has the team completed in each of the last eight to ten sprints? Calculate the average and note the range. If you’ve been completing between six and ten items per sprint, with an average of eight, that’s your forecast baseline.
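Computing the baseline is deliberately trivial. A sketch, using the hypothetical history from the paragraph above:

```python
import statistics

# Items completed in each of the last eight sprints (illustrative numbers).
history = [6, 8, 9, 7, 10, 8, 6, 9]

average = statistics.mean(history)
low, high = min(history), max(history)
print(f"baseline: ~{average:.0f} items per sprint (range {low}-{high})")
```

If your tracking tool can export completed-item counts per sprint, this is the entire data pipeline; no calibration sessions required.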
Start the planning session by reviewing the team’s throughput data together. This isn’t about judgement or performance management. It’s about grounding the conversation in reality. “Based on our history, we typically complete around eight items per sprint. Sometimes it’s six, sometimes it’s ten. Let’s plan accordingly.”
Then work through the backlog in priority order. For each item, ask three questions. Do we understand what done looks like for this? Is this small enough to complete in a couple of days? Are there any blockers, dependencies, or risks we need to address?
If the answer to the second question is no, stop and slice. This is the most important part of the session. Don’t let large items into the sprint. Every large item is a forecast risk and a flow impediment.
Keep pulling items until you’ve reached your typical throughput number. If your average is eight, stop at eight or perhaps nine. Resist the temptation to overcommit because this sprint “feels different.” It doesn’t. The whole point of using historical data is to protect you from optimism bias.
That’s it. No poker. No Fibonacci. No debates about whether complexity and effort should be weighted differently. Just a focused conversation about understanding the work and making sure it’s small enough to flow.
Answering the objections
When you propose this approach, you’ll face pushback. Here’s how to address the most common objections.
“How will we know if we’ve planned the right amount of work?” You’ll know the same way you know now: by comparing what you planned to what you delivered. The difference is that throughput-based planning gives you an honest forecast based on measurement rather than a confident-sounding number based on guessing. If you consistently complete eight items, planning for eight items is a reasonable bet. If you consistently complete somewhere between six and ten, acknowledge that range rather than pretending you can predict exactly.
“What about items that are genuinely complex and can’t be sliced smaller?” This is almost always a failure of imagination rather than a hard constraint. I’ve worked with teams building safety-critical systems, complex financial products, and intricate distributed architectures. In every case, we found ways to slice work small. The technique varies depending on context, but the principle holds: there’s always a thinner slice that still delivers something real. If you truly cannot slice something smaller, that’s a signal that you don’t yet understand the work well enough. Do a spike. Build a prototype. Run an experiment. Don’t commit to delivering something you can’t decompose.
“Stakeholders expect velocity reports and burn-down charts.” This is a change management challenge, not a technical one. Most stakeholders don’t actually care about velocity. They care about predictability: when will this be done, and can we count on that date? You can answer those questions more honestly with throughput data and cycle time distributions than with velocity. Have a conversation with your stakeholders about what information they actually need and why. Often they’re relieved to stop pretending the current reports mean something.
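A cycle time distribution answers the stakeholder question “when will it be done?” as a percentile rather than a single date. A minimal sketch, assuming you can export per-item cycle times (the sample data here is invented):

```python
import statistics

# Days from start to finish for recently completed items (illustrative).
cycle_times = [1.5, 2.0, 1.0, 3.0, 2.5, 2.0, 4.0, 1.5, 2.0, 3.5,
               2.5, 1.0, 2.0, 3.0, 2.0, 1.5, 5.0, 2.5, 2.0, 3.0]

# quantiles(n=20) returns the 5th, 10th, ..., 95th percentile cut points.
q = statistics.quantiles(cycle_times, n=20)
p50, p85 = q[9], q[16]
print(f"half of items finish within {p50:.1f} days; "
      f"85% finish within {p85:.1f} days")
```

“85% of items finish within four days of starting” is a more honest service-level statement than any velocity chart, and stakeholders can hold you to it.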
“Different team members work at different speeds, so counting items doesn’t account for who picks up what.” Neither do story points. You’re measuring team throughput, not individual productivity. Over time, the mix of who does what averages out. If it doesn’t, you have a team design problem that no estimation method will solve.
“We need estimates for roadmap planning and budgeting.” Throughput data supports roadmap planning better than story points. If you have fifty items in the backlog and you complete eight per sprint, you can forecast six to seven sprints of work with appropriate confidence intervals. This is more honest than converting everything to story points and dividing by velocity, which gives you a single number that implies false precision.
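Those confidence intervals can come from a simple Monte Carlo simulation: replay randomly sampled historical sprints until the backlog is exhausted, many times over, and read off the percentiles. A sketch with illustrative history:

```python
import random

random.seed(7)

history = [6, 8, 9, 7, 10, 8, 6, 9]  # items per sprint, illustrative
backlog = 50                          # items remaining

def sprints_needed(backlog, history):
    """Sample one possible future by replaying random historical sprints."""
    done, sprints = 0, 0
    while done < backlog:
        done += random.choice(history)
        sprints += 1
    return sprints

runs = sorted(sprints_needed(backlog, history) for _ in range(10_000))
p50, p85 = runs[len(runs) // 2], runs[int(len(runs) * 0.85)]
print(f"50% chance of finishing within {p50} sprints, "
      f"85% chance within {p85} sprints")
```

Reporting “85% likely within N sprints” instead of a single date makes the uncertainty explicit instead of hiding it inside a velocity division.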
Making the transition
If you’re convinced this approach is worth trying, here’s how to introduce it without causing chaos.
Start by gathering data. Even if you’re still using story points, begin tracking throughput alongside velocity. After a few sprints, you’ll be able to show the team how throughput compares in terms of stability and predictability.
Next, invest in slicing skills. Run workshops on vertical slicing. Practice breaking down real backlog items together. Make slicing a core part of your refinement sessions. This is the foundation that makes everything else work.
Then propose an experiment. Suggest trying throughput-based planning for three sprints. Frame it as a learning exercise rather than a permanent change. This reduces resistance and gives sceptics a face-saving way to engage.
During the experiment, facilitate planning sessions that focus on understanding and slicing rather than estimating. Keep the conversations grounded in “is this small enough?” rather than “how big is this?”
After three sprints, retrospect together. Was planning faster? Did the team deliver roughly what they forecast? Did the conversations during planning feel more useful? In my experience, teams rarely want to go back once they’ve experienced the simplicity of this approach.
The deeper shift
Moving away from story points isn’t just a process change. It’s a shift in mindset from prediction to measurement, from estimation to understanding, from big batches to small slices.
This shift has benefits beyond planning. Small slices improve flow, reduce risk, accelerate feedback, and make continuous integration actually possible. Teams that slice well deliver more frequently and learn faster. They spend less time in meetings and more time shipping.
Story points were never the point. The point was always to deliver valuable software sustainably. If your current approach to planning isn’t helping you do that, it might be time to try something simpler.