Data Science and Coaching: The Yin/Yang of Better Interventions

If you’ve ever wondered whether Agile coaching is effective or worth the money, you’re not alone. Your customers probably wonder, too. Let’s look at how one manager who grappled with these same questions came to measure the value of coaching, using Data Science methods to discover which interventions work and which don’t.

Thirty-year-old Kylie manages a department of eighty people, divided across several teams. During the company’s Agile transformation, she chose an evolutionary approach to change and hired external coaches to help every employee realize their full potential. Six months later, she feels a new vibe in the department. People smile more often and seem to be having more fun. Kylie believes the coaches are doing a fantastic job, but how can she know for sure?

This article is published in Agile NXT, the magazine full of inspiration for professionals on the emerging Agile journey. Theme of #2: New Insights for Agile Performance Management.

Customer Language
Despite Kylie’s gut feeling about the new vibe at the office, another part of her mind keeps pressing her with questions: How do I know coaching is effective? How can I tell if it isn’t? How long does it take for interventions to change results? How can we tell that the change in the results is significant?

All day long, week after week, at the water cooler, over lunch, and during the 30-minute bicycle ride to and from the office, her mind turns these questions over and over. She still believes the coaches are helping the teams, but how can she know for sure? How can she prove that the interventions are effective?

One night, while catching up on some reading for work before going to bed, she comes across an article on performance management. One line in particular jumps out at her: “Intent leads to behavior which leads to actions which lead to business results.” Suddenly, it dawns on her: if coaching focuses on behavior, its effect should be measurable in terms of the results.

But what results should she consider? For Kylie’s department, the number of items completed every two weeks is what matters to the customers. So, focusing on these results would also support customer collaboration.

Business Results = Number of Completed Customer Requests (per 2 weeks)

Data Science
Back at work and excited, Kylie seeks out the expertise of one of her colleagues, John, a data scientist. Her teams have been using Jira for over a year to track issues and manage their Agile projects. Would it contain the information she needs to see if the coaching is working?

When Kylie poses her question to John, he’s intrigued. “Let’s take a look,” he says, opening his laptop. He pulls up a work record for Team Saturn and Team Jupiter. The numbers in Table 1 correspond to the number of customer requests completed in a two-week period.

Team \ Iteration |1 |2 |3 |4 |5 |6 |7|8 |9 |10|11|12|13|14|15|16|17|18|19|20|21|22|23
-----------------|--|--|--|--|--|--|-|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--
Saturn           |3 |3 |4 |4 |7 |5 |1|11|5 |6 |3 |6 |6 |5 |4 |10|4 |5 |8 |2 |4 |12|5
Jupiter          |51|49|51|49|67|37|9|18|24|15|12|34|62|79|20|15|24|29|39|14

Table 1. Number of bi-weekly completed customer requests.

He explains, “The teams are like a highway. Counting the number of completed requests is similar to counting the number of cars passing a point in a certain period. Because the drivers are all different people with varying driving styles, the number of cars varies per period. The average number of cars passing and the variation around it are described by what mathematicians call a Poisson distribution.”
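
To make John’s highway analogy concrete, here is a minimal sketch that estimates Team Saturn’s average delivery rate from Table 1 and asks how likely a given number of completed requests would be under a Poisson distribution with that rate. The function name `poisson_pmf` and the choice of values to print are illustrative assumptions, not something John or the article prescribes.

```python
import math

# Completed requests per two-week iteration for Team Saturn (Table 1).
saturn = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events in one period when the average is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# The sample mean is the natural estimate of the Poisson rate.
rate = sum(saturn) / len(saturn)

# How plausible are a slow, a typical, and a busy iteration at this rate?
for k in (1, 5, 12):
    print(f"P({k} completed requests) = {poisson_pmf(k, rate):.3f}")
```

A Poisson distribution has a variance equal to its mean, so a team that delivers about five requests per iteration on average can be expected to swing by a few requests either way without anything having changed.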

Kylie looks closer at the data. “I see small variations in the data, but also some large changes in the numbers: small numbers followed by relatively larger ones. But that doesn’t necessarily signify an important event, right?”

“That’s right,” says John. “The team could have changed its behavior around that point, or it could be a statistical variation that has no special meaning.”

“How can we tell the difference?” Kylie asks.

John mumbles something about using the Poisson distribution to model the team and goodness of fit criteria to filter out the jumps in the data that are significant. He then mumbles for another thirty minutes.

When he finishes, Kylie summarizes what she made of it, “So, we understand the results by using a Poisson distribution to model the output of the team, and other well-known statistical methods to tell whether a team changed its behavior enough to be noticeable in the data?”

John nods.
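
Kylie’s summary can be sketched in a few lines. The example below is an assumption-laden simplification, not John’s actual analysis: it fits a single constant Poisson rate to each team’s numbers from Table 1 and computes a Pearson chi-squared value per degree of freedom. The function name `reduced_chi_squared` is hypothetical. For Poisson data with a constant rate this value should be close to 1; a value far above 1 suggests the rate did not stay constant.

```python
# Completed requests per two-week iteration, copied from Table 1.
saturn = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]
jupiter = [51, 49, 51, 49, 67, 37, 9, 18, 24, 15, 12, 34, 62, 79, 20, 15, 24, 29, 39, 14]


def reduced_chi_squared(counts):
    """Pearson chi-squared per degree of freedom against one constant Poisson rate."""
    mean = sum(counts) / len(counts)  # fitted rate
    # For a Poisson process the expected variance of each count equals the mean.
    chi2 = sum((c - mean) ** 2 / mean for c in counts)
    return chi2 / (len(counts) - 1)  # one degree of freedom is used by the fit


for name, counts in (("Saturn", saturn), ("Jupiter", jupiter)):
    print(f"{name}: reduced chi-squared = {reduced_chi_squared(counts):.1f}")
```

On these numbers Team Saturn comes out close to 1 and Team Jupiter far above it, which fits the picture that Saturn’s output can be explained by one steady rate while Jupiter’s cannot.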


Seeing Team Behavioral Change in the Data
“That’s great!” Kylie exclaims, looking at the charts in Figure 1, which show the goodness-of-fit value on the vertical axis and the 2-week periods on the horizontal axis. “So the data shows no significant change for Team Saturn, while the chart for Team Jupiter clearly shows four significant changes.”

John explains, “The chart for Team Saturn shows a ‘Goodness of Fit’ (blue markers) well below 5 for the entire range, which is very good. Team Jupiter has a very different chart. First, the scale is much larger, as can be seen from the vertical axis. The ‘jumps’ in Chi-squared values are larger, and they are reflected in jumps in the observed Delivery Rate. In the figure, this is indicated by the arrows. Each arrow corresponds to a period in which the team shows constant behavior, at least as seen by the customer. Around the ‘jumps’, the team apparently changed its behavior or way of working, resulting in a different Delivery Rate. In the case of Team Saturn, this rate is constant over the whole range.”

Figure 1: The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question.
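
The idea of periods with a constant delivery rate separated by jumps can be illustrated with a deliberately simple sketch. It is a greedy stand-in and not necessarily what Chi2fit or John does: a period keeps growing as long as one constant Poisson rate still fits it, and a new period starts when the fit degrades. The function names and the threshold of 3.0 are arbitrary assumptions chosen for illustration.

```python
# Completed requests per two-week iteration, copied from Table 1.
saturn = [3, 3, 4, 4, 7, 5, 1, 11, 5, 6, 3, 6, 6, 5, 4, 10, 4, 5, 8, 2, 4, 12, 5]
jupiter = [51, 49, 51, 49, 67, 37, 9, 18, 24, 15, 12, 34, 62, 79, 20, 15, 24, 29, 39, 14]


def reduced_chi_squared(counts):
    """Pearson chi-squared per degree of freedom against one constant Poisson rate."""
    mean = sum(counts) / len(counts)
    if len(counts) < 2 or mean == 0:
        return 0.0  # too little data to judge the fit
    return sum((c - mean) ** 2 / mean for c in counts) / (len(counts) - 1)


def split_into_periods(counts, threshold=3.0):
    """Greedily grow a period until a constant rate no longer fits, then start a new one."""
    periods, current = [], []
    for c in counts:
        if current and reduced_chi_squared(current + [c]) > threshold:
            periods.append(current)  # close the period before the jump
            current = [c]
        else:
            current.append(c)
    if current:
        periods.append(current)
    return periods


for name, counts in (("Saturn", saturn), ("Jupiter", jupiter)):
    periods = split_into_periods(counts)
    rates = [round(sum(p) / len(p), 1) for p in periods]
    print(f"{name}: {len(periods)} period(s), average rates {rates}")
```

With this threshold, Team Saturn stays in a single period while Team Jupiter is split into several. The exact number of periods depends on the threshold and on the greedy strategy, so it should not be read as a reproduction of Figure 1.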

The next day, Kylie gathers all the coaches and shows them the charts from Figure 1, highlighting the main points from John’s analysis. “What can we make of all of this?” she asks, prompting a lively conversation about interventions. She and the coaches discuss what makes interventions effective and explore ways to measure that effectiveness. She then asks, “How can we improve our interventions in a way that results in a noticeable impact on our customer?”

Several of the coaches chime in with ideas:

“Visualize the interventions on a board.”
“Formulate them as a hypothesis.”
“Keep tracking them for observable effects.”
“Regularly review the result to check on which work and which don’t.”

With these ideas and a new tool in hand, Kylie can now measure the effectiveness of coaching and help the coaches become even better. She has measurable feedback on what’s important to the customer. She can also look deeper into the teams’ work and working agreements to discover possible interventions. She can rest easy knowing that the results of coaching go beyond a positive vibe in her gut: coaching is making an actual, measurable impact on the customers.

Get Started
Want to know what John did to perform the analysis? John used the tool Chi2fit. To get started immediately, go to the repository: piisgaaf/chi2fit/master. Or install from source: github.com/piisgaaf/chi2fit

