
Extreme Programming for ML Teams: Faster Delivery, Reliable Results

Despite the overlap between the two fields, software engineering best practices often falter when applied directly to data science. Agile rituals that work for feature-driven software projects can feel clumsy when applied to ML. In software, progress can be sliced into neat features shipped every one or two weeks. In data science, progress is murkier: projects take longer, discovery work is essential, and failed experiments aren’t setbacks — they’re the process itself.

Agile frameworks were developed for software because traditional “waterfall” workflows from engineering could not keep pace with the uncertainty of software projects. Data science teams face the same problem, but with different constraints. The lesson is the same: collaboration and knowledge sharing are the countermeasures to uncertainty — enabling teams to learn quickly, embrace creativity, and move toward faster, more reliable progress.

That’s why a recent post by Jacob Clark from Hyperact (Should we revisit XP in the age of AI?) caught my attention. Jacob makes a compelling argument for bringing “extreme programming” (XP) back to the forefront for agile software development today, and I noticed that the tenets of XP are exactly the same as what I’ve observed makes ML teams successful.

What is XP?

XP is an agile software development methodology focused on improving software quality and responsiveness to changing requirements. Introduced by Kent Beck in 1996, the core idea is to take good development practices to the “extreme”. It’s defined by five core values:

  1. Communication: constant, clear communication between team members and stakeholders.

  2. Simplicity: build the simplest solution that works; avoid over-engineering.

  3. Feedback: use frequent testing and customer involvement to get rapid feedback.

  4. Courage: be willing to change direction, refactor, or throw away code when needed.

  5. Respect: everyone’s contributions are valued; sustainable teamwork.

XP prescribes a set of engineering practices, many of which are now mainstream agile techniques:

  1. Collaboration & Team Practices

    • Pair Programming – two developers work together at one workstation.
    • Collective Code Ownership – anyone can change any part of the code.
    • Coding Standards – shared style to keep codebase consistent.
  2. Code Quality & Engineering Discipline

    • Test-Driven Development (TDD) – write automated tests before writing code.
    • Refactoring – improve code structure continuously without changing behavior.
    • Simple Design – design for current needs, not speculative future requirements.
  3. Process & Delivery Practices

    • Continuous Integration (CI) – integrate and test code frequently (multiple times a day).
    • Small Releases – deliver working software frequently (days to weeks).
    • Sustainable Pace (“40-hour week”) – avoid burnout; maintain long-term productivity.
    • On-Site Customer – customer representative works closely with the team to clarify requirements.
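To make the TDD practice concrete in an ML setting, here is a minimal sketch of writing the test first for a small preprocessing step. The function name and behaviour are illustrative assumptions, not something from the post:

```python
# Hypothetical TDD-style example: the test below was written first and pins
# down the behaviour expected of a preprocessing step before implementing it.

def scale_to_unit_range(values):
    """Min-max scale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant input: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_scale_to_unit_range():
    scaled = scale_to_unit_range([2.0, 4.0, 6.0])
    assert scaled == [0.0, 0.5, 1.0]   # endpoints map to 0 and 1
    assert scale_to_unit_range([3.0, 3.0]) == [0.0, 0.0]  # degenerate case

test_scale_to_unit_range()
```

The same test-first habit transfers to feature engineering and data cleaning code, where silent numerical bugs are otherwise easy to ship.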

In practice, development is done in short 1–2 week iterations, and tasks are derived from user stories — which keep the requirements and needs in full focus. Importantly, planning for each task is adaptive: the scope can change, but time boxes are fixed.

This workflow is very responsive to changing requirements, increases collaboration and knowledge sharing, and encourages high code quality through TDD, CI, and refactoring.

However, it’s not a practice suited to large distributed teams. It makes a strong assumption about the ease of getting regular time with a customer representative, and it appears to carry a high upfront cost to productivity. These are valid concerns, but I strongly argue against the perceived buy-in cost. The heart of this workflow is not only sufficient for fast and reliable development — it’s absolutely necessary!

Adapting XP for Data Science

Some of these principles work well in data science without modification, but others need re-interpretation to be applied effectively in ML teams.

Where XP Fits Well:

Where XP Needs Adaptation:

Where XP Breaks Down:

An XP Workflow for data science

Adapting XP to data science means treating experiments, pipelines, and models as the deliverables of your workflow, rather than traditional software features. A typical workflow might look like this:

Iteration Cycle (1–2 weeks)

Daily Rhythm

Testing and Validation
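One way to picture evaluation-as-testing is to express a model quality bar as an ordinary assertion that runs in CI. The metric, threshold, and data here are illustrative assumptions, not prescriptions from the post:

```python
# Sketch: a model-quality gate written as a plain test. In practice the
# predictions would come from a trained model scored on a held-out set.

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def test_model_meets_baseline():
    # Stand-ins for model output and ground truth on a validation split.
    predictions = [1, 0, 1, 1, 0, 1]
    labels      = [1, 0, 1, 0, 0, 1]
    # 5 of 6 correct; the gate fails the build if quality regresses.
    assert accuracy(predictions, labels) >= 0.8, "model below baseline"

test_model_meets_baseline()
```

Run under CI, a gate like this turns “the model got worse” from a post-hoc discovery into a failed build.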

By combining these practices, the workflow maintains XP’s core values: frequent feedback, simplicity, collaboration, and courage to pivot. Experiments are treated as releases, knowledge is shared continuously, and data quality and reproducibility are built into the process. The result is a team that can explore boldly while still delivering reliable, high-quality outcomes.
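As a small sketch of building reproducibility in, every source of randomness can be seeded so that an experiment (here, a random train/test split) is repeatable run-to-run. The function and defaults are illustrative assumptions:

```python
import random

def split_dataset(rows, test_fraction=0.25, seed=42):
    """Deterministically shuffle and split rows into (train, test)."""
    rng = random.Random(seed)   # local seeded RNG, not global state
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Same seed, same split — the experiment is reproducible by construction.
train_a, test_a = split_dataset(list(range(8)))
train_b, test_b = split_dataset(list(range(8)))
assert (train_a, test_a) == (train_b, test_b)
```

Using a local `random.Random(seed)` rather than the module-level RNG keeps the determinism self-contained, so other code can’t perturb it.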

Key takeaways

XP brings robust development practices to ML teams by taking the elements of good collaboration to the extreme, resulting in:

Something I think is worth highlighting: frequent pairing may feel like a bottleneck, but I’ve experienced the complete opposite effect whenever I engage with it. Simply put, it’s more fun. Some of the most insightful tips I’ve learned didn’t come from textbooks; they came from people I paired with over the years. The shared enjoyment of learning motivates sustainable progress and keeps momentum consistently high over a long stretch of time.

XP wasn’t written with ML in mind, but its core philosophy — rapid feedback, simplicity, discipline — fits the challenges of ML/AI engineering. By treating experiments as releases, data as code, and evaluation as testing, XP gives ML teams a methodology that supports both exploration and reliable delivery.
