The Art and Science of Measuring Data Teams' Value

“Data is a precious thing and will last longer than the systems themselves” – Tim Berners-Lee.

Data holds enormous value in today's world, where businesses of all sizes rely on it to make decisions, spot new opportunities, and improve performance. However, the outcomes of data projects are often perceived as intangible, which makes measuring their value and return on investment (ROI) challenging.

Analyzing the analysis: The challenge of measuring data teams' value

Tweet by Sean J. Taylor

Do you have high expectations for your data projects, only to find it hard to measure their success? Have you struggled to make an impact and prove your worth as a data professional? This irony is all too common in data analytics and data science. Determining the value of data projects and teams can be complicated for several reasons.

Pedram Navid's recent tweet asking, “Have we figured out how to measure the value of a data team yet?” caught my attention. As a data engineer, I have always focused on serving stakeholders to the best of my ability, but I had never considered specific success metrics. The most surprising part was the lack of concrete answers in the replies.

The main problem with measuring a data team's value is that our impact is often indirect and not easily quantifiable. The data team usually supports other functions within the company – like marketing, sales, or operations – to help drive company performance, but we don't necessarily generate revenue alone.

For example, if we improve a company's data infrastructure by designing a data lake or more reliable pipelines, it's hard to put a dollar value on that. And even with more tangible outcomes like dashboards or models, measuring the impact in terms of ROI can be daunting.

Of course, there are exceptions. If a company sells data as its product, then the data team has a more obvious connection to value creation, and it's easier to measure our impact. But for most data teams, it's a bit more complicated.

Generally, a data project becomes tangible or realized when stakeholders – the individuals who use the data – put data assets to use. In this sense, the value of data and its derivatives is subjective as it depends on the user and what they do with such assets. Nevertheless, data projects hold value, either potential or realized. 

Tristan Handy, the founder of dbt Labs, believes that data teams need to be evaluated more in terms of their value. He argues that while other functional areas of the business have established metrics to measure efficiency and ROI, data teams have yet to put a similar process in place. Moreover, Tristan insists that data teams should start applying more scrutiny to their costs as macroeconomic conditions shift, allowing them to make stronger arguments for their budgets.

With this blog post, I don't pretend to have all the answers to valuing data teams – we already established that it’s a complicated endeavor. I aim to share what I've learned through my research and bring attention to some perspectives I believe are helpful and can guide us in the right direction.

Understanding data value creation: The data hierarchy of needs

When talking about a "data team," we often refer to a group of skilled people with unique areas of expertise who collaborate to harness the power of various technologies to drive value through data.

Data teams are made up of professionals in various roles, such as data engineers, analysts, and scientists, and their responsibilities can vary from company to company. For instance, data engineers might focus solely on ensuring the reliability and robustness of the data platform. In contrast, data scientists might have a closer relationship with the business to provide data-driven recommendations.

As a result, each of these data professionals has different stakeholders. Data engineers support data analysts and scientists, who in turn support decision-makers in the functional areas, the people with the most in-depth understanding of the business.

To fully understand data teams' value creation, breaking down these teams into smaller components is essential. One helpful way to understand a data team's components – or rather, layers – and the value they deliver is by examining the "data science hierarchy of needs." This concept draws inspiration from Maslow's hierarchy of needs and asserts that reaching full potential requires first meeting basic needs.

A quick Google search will show you a few different hierarchy versions. Some focus more on technology, while others focus on roles, but they all try to convey the same underlying message.

Data Hierarchy of Needs

I particularly like the hierarchy originally proposed by Shopify because each level represents the outcomes that data teams deliver. Each layer provides information, from raw data to knowledge to wisdom.

Over the years, several approaches to calculating the value of information have been proposed. One of the most prominent examples is the book Infonomics, where Doug Laney presents several models to estimate the value of information assets. 

Laney’s models are divided into “foundational” and “financial.” The foundational models evaluate the qualitative aspects of information and its effects on key performance indicators. Meanwhile, the financial models take methods used to estimate the monetary value of traditional assets and apply them to information.

Laney then explains how the models can be further broken down into “leading” and “lagging,” with leading indicators highlighting the potential value and lagging indicators the realized value of information. The spectrum of potential-to-realized value can be linked to the data science hierarchy of needs, which usually has potential value (raw data) at the bottom and realized value (applied insights) at the top.

The Data Hierarchy of Needs from the perspective of Laney's Information Asset Valuation Models. The bottom of the hierarchy represents potential value, and the top realized value.

Calculating the value of data teams is a broad challenge, so I propose breaking it into smaller pieces, inspired mainly by Laney’s work. Let's explore each level of the data hierarchy of needs to identify how value is created there and which metrics and methods could be used to calculate it.

Collect and model: Building a solid foundation

The base of the hierarchy involves data collection and modeling. With a robust data platform holding clean and organized data, analysts and engineers can eventually create fancy machine-learning models and make impactful analyses. It's all about laying the groundwork. And just because it comes first doesn't mean it's less critical. Taking care of the basics gives us peace of mind knowing that we can trust the insights from our analysis.

How to measure value at this level?

At this level, it’s all about establishing a robust data platform. Carefully designing data warehouses and data lakes, building robust data pipelines, ensuring data quality, and complying with SLAs are the ways we data engineers create value. There’s no analysis or decision-making yet at this level, just pure data and systems.

The intrinsic value of information (IVI) helps you evaluate the innate quality of data assets. It tells you how complete, accurate, and unique the information is. Here's an example of calculating IVI:

  • Validity: Percentage of correct records.
  • Completeness: Percentage of total records versus the universe of potential or supposed records.
  • Scarcity: Percentage of your market or competitors that likely have this same data.
  • Lifecycle: The reasonable usable length of utility for any given unit (record) of the information asset (e.g., in months).

The ideal IVI is 1.0, meaning the data is accurate and complete, and no competitors possess it.
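To make the factors above concrete, here is a minimal Python sketch. It assumes the multiplicative form in which Laney's IVI is usually stated: Validity × Completeness × (1 − Scarcity) × Lifecycle. The function name, the range checks, and the sample figures are mine and purely illustrative. I also assume Lifecycle is passed as a normalized factor (defaulting to 1.0) so that a perfect asset scores 1.0; if you plug in raw months instead, the score is no longer capped at 1.0.

```python
def intrinsic_value(validity: float, completeness: float,
                    scarcity: float, lifecycle: float = 1.0) -> float:
    """Assumed form: IVI = validity * completeness * (1 - scarcity) * lifecycle."""
    # Validity, completeness, and scarcity are percentages expressed as fractions.
    for name, value in (("validity", validity),
                        ("completeness", completeness),
                        ("scarcity", scarcity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    if lifecycle < 0:
        raise ValueError("lifecycle must not be negative")
    return validity * completeness * (1.0 - scarcity) * lifecycle


# Hypothetical dataset: 90% valid, 80% complete, held by 25% of competitors.
ivi = intrinsic_value(validity=0.9, completeness=0.8, scarcity=0.25)
print(f"IVI = {ivi:.2f}")  # IVI = 0.54
```

Because the factors are multiplied, a weak score on any single one drags the whole value down, which matches the intuition that incomplete or widely available data is worth less.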

If you would like to value the data from a financial perspective, you can use the cost value of information (CVI) metric, which calculates the data value by determining the cost of collecting, securing, and managing it. Any financial loss that would occur if the data were lost or damaged can optionally be included. Here's an example of how you can calculate CVI:

  • ProcExp: The annualized cost of the processes involved in capturing the data. 
  • Attrib: The portion of process expense attributable to capturing the data. 
  • T: Average life span of any given instance of data.
  • t: The time over which the expense is measured. 
  • n: The number of periods of time until the information is reacquired or until business continuity is no longer affected by the lost or damaged information.

The cost associated with data collection can be challenging to quantify because some data may be collected as part of a business's regular operations. That’s why the formula singles out the portion of expenses attributable to collecting the data.
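As a rough sketch, the terms above can be combined in the form in which Laney's CVI is usually written: (ProcExp × Attrib × T) / t, plus the optional sum of lost revenue over the n periods. The Python below assumes that form. The function and parameter names, as well as the figures, are hypothetical.

```python
from typing import Iterable


def cost_value(proc_exp: float, attrib: float, lifespan: float,
               period: float, lost_revenue: Iterable[float] = ()) -> float:
    """Assumed form: CVI = proc_exp * attrib * T / t + sum(lost revenue per period)."""
    if not 0.0 <= attrib <= 1.0:
        raise ValueError("attrib must be a fraction between 0 and 1")
    if period <= 0:
        raise ValueError("period must be positive")
    # Cost of capturing the data, scaled from the measured period to its life span.
    collection_cost = proc_exp * attrib * lifespan / period
    # Optional term: revenue lost in each period until the data is reacquired.
    return collection_cost + sum(lost_revenue)


# Hypothetical figures: 120,000 per year in process cost, 25% attributable to
# data capture, a 3-year life span, and two periods of 5,000 in lost revenue.
cvi = cost_value(proc_exp=120_000, attrib=0.25, lifespan=3, period=1,
                 lost_revenue=[5_000, 5_000])
print(f"CVI = {cvi:,.0f}")  # CVI = 100,000
```

Leaving `lost_revenue` empty gives the pure replacement-cost view of the asset.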

Describe: Gaining a baseline understanding of the business

At this level, data analysts use the collected data to see what's happening at the company. They get to answer questions like: “How has product X been doing over the last three months?” The answers give stakeholders a better understanding of the business and can guide them on what to do next or lead analysts to even more complex analyses. For example, if product X is a hit, they may want to know why and how to replicate that success.

How to measure value at this level?

At this level, you could still use the models presented in the previous section, since analysts have most likely produced derived datasets. However, it may be more convenient to use metrics that capture the impact of the analysis on the business.

The business value of information (BVI) metric considers the data quality, accuracy, and relevance to business activities. Here’s an example of how to calculate the BVI:

  • Relevance: The information's usefulness to a specific business process (rated 0 to 1).
  • Validity: The percentage of records with correct values.
  • Coverage: The number of records in the dataset compared to the total universe of potential records.
  • Timeliness: The probability that the information is up-to-date at any given time.
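Here is a minimal sketch of how these factors might be combined. It assumes the commonly cited form of Laney's BVI, in which relevance is summed over the business processes that use the data and then multiplied by validity, coverage, and timeliness. The function name and the sample scores are hypothetical.

```python
from typing import Iterable


def business_value(relevance_by_process: Iterable[float], validity: float,
                   coverage: float, timeliness: float) -> float:
    """Assumed form: BVI = sum(relevance per process) * validity * coverage * timeliness."""
    scores = list(relevance_by_process)
    for score in scores + [validity, coverage, timeliness]:
        if not 0.0 <= score <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    return sum(scores) * validity * coverage * timeliness


# Hypothetical dataset used by two business processes (relevance 0.5 and 0.25),
# with 80% valid records, 50% coverage, and always up to date.
bvi = business_value([0.5, 0.25], validity=0.8, coverage=0.5, timeliness=1.0)
print(f"BVI = {bvi:.2f}")  # BVI = 0.30
```

Since relevance is summed, a dataset that serves several processes can score above 1.0, so the number is best used to compare datasets with each other rather than read as a percentage.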

But what if you want to know the value of your collected information to outsiders? Data has become a commodity of great financial worth in recent years, and Laney’s market value of information (MVI) model provides a way to calculate that value. Some companies monetize their data by selling it through hosted data marketplaces, for example, the Snowflake Marketplace. Here’s how you can think about MVI:

The MVI takes into account that most information is not transferred in ownership but instead licensed. Market size analysis and surveys of potential licensees can also be used to determine the premium factor, which is included to account for the diminished marketability of information as it becomes more ubiquitous.
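Laney's MVI is usually stated as (Exclusive Price × Number of Partners) / Premium, where Exclusive Price is what a single buyer would pay for exclusive access and Number of Partners is how many licensees you could realistically sign. The sketch below assumes that form. The parameter names and the figures are hypothetical.

```python
def market_value(exclusive_price: float, partners: int, premium: float) -> float:
    """Assumed form: MVI = exclusive_price * partners / premium."""
    if partners < 0:
        raise ValueError("partners must not be negative")
    if premium <= 0:
        raise ValueError("premium must be positive")
    # Dividing by the premium discounts the price as the data becomes less exclusive.
    return exclusive_price * partners / premium


# Hypothetical figures: an exclusive price of 100,000, ten likely licensees,
# and a premium factor of 4 to reflect that the data is no longer exclusive.
mvi = market_value(exclusive_price=100_000, partners=10, premium=4)
print(f"MVI = {mvi:,.0f}")  # MVI = 250,000
```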

Predict: Answering deeper questions using advanced analytical techniques

Things get more interesting at this stage. With a solid foundation and a good understanding of what's happening in the business, data scientists and machine learning engineers are ready to make educated guesses about the future. They can give stakeholders predictions and estimates in response to questions like “What are sales going to look like in the future?” or “Why did product X take off?”

It’s important to note that not all companies need to reach this hierarchy level. Often, having a data platform and analytics team to support more basic analysis provides enough value.

How to measure value at this level?

As we delve deeper into complex analysis and ascend the data hierarchy of needs, the challenge of quantifying value intensifies. At this level, it may be worth looking at Laney’s economic value of information (EVI), which calculates how much revenue the business can make by incorporating data into business processes.

Here's how you may think about EVI:

  • Revenue_i: The revenue generated with the information asset (informed group).
  • Revenue_c: The revenue generated without the information asset (control group).
  • T: The average expected life span of any given information instance or record. 
  • t: The period of time during which the EVI experiment or trial was executed.

The EVI can be used to see the impact on revenue by running an experiment and comparing the revenue of the informed group (Revenue_i) with that of the control group (Revenue_c).
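A minimal sketch of the calculation follows. It assumes the form in which Laney's EVI is usually written: the revenue difference between the two groups, scaled by T / t. The fuller version of the formula also subtracts the cost of acquiring, administering, and applying the information, which I expose as an optional `expenses` parameter. All names and figures are hypothetical.

```python
def economic_value(revenue_informed: float, revenue_control: float,
                   lifespan: float, trial_period: float,
                   expenses: float = 0.0) -> float:
    """Assumed form: EVI = (revenue_i - revenue_c - expenses) * T / t."""
    if trial_period <= 0:
        raise ValueError("trial_period must be positive")
    # Net revenue uplift observed during the trial.
    net_uplift = revenue_informed - revenue_control - expenses
    # Scale from the trial duration to the expected life span of the data.
    return net_uplift * lifespan / trial_period


# Hypothetical 3-month trial: the informed group earned 130,000, the control
# group 100,000, the data cost 10,000 to obtain and apply, and it stays
# useful for 12 months.
evi = economic_value(130_000, 100_000, lifespan=12, trial_period=3,
                     expenses=10_000)
print(f"EVI = {evi:,.0f}")  # EVI = 80,000
```

`lifespan` and `trial_period` must use the same time unit (months in this example).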

As measuring value at this level becomes more indirect and subjective, I’d like to bring up some other interesting (and less mathematical) metrics I came across during my research.

Barry McCardel, Co-founder and CEO at Hex, argues that the best way to measure your data team's ROI is to let others speak on your behalf. This method may prove helpful when valuing the outcomes of more complex analyses.

If your team is providing valuable contributions, the leaders of other departments should be vocal about it and advocate for more resources. On the other hand, if stakeholders are not supportive, it may indicate that your team needs to reevaluate its operations and align more closely with business outcomes.

Katie Bauer, Head of Data at GlossGenius, shares a similar view on this challenge. She suggests beginning with basic metrics related to engagement and usage. For example, if the data project is a dashboard or a platform, you can track daily, weekly, or monthly active users. These and similar metrics can tell you whether stakeholders are actually using what you offer.
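As an illustration of such usage metrics, the sketch below counts distinct active users per day, ISO week, or month from a log of dashboard views. The event format (a user ID plus a date) and the sample data are assumptions. In practice these events would come from your BI tool's audit log or a similar source.

```python
from collections import defaultdict
from datetime import date


def active_users(events, period="day"):
    """Count distinct users per day, ISO week, or month."""
    users = defaultdict(set)
    for user_id, day in events:
        if period == "day":
            key = day
        elif period == "week":
            key = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
        elif period == "month":
            key = (day.year, day.month)
        else:
            raise ValueError("period must be 'day', 'week', or 'month'")
        users[key].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Hypothetical view log for one dashboard.
events = [
    ("ana", date(2024, 1, 1)),
    ("ben", date(2024, 1, 1)),
    ("ana", date(2024, 1, 3)),
    ("ana", date(2024, 1, 8)),
]
daily = active_users(events, "day")
weekly = active_users(events, "week")
monthly = active_users(events, "month")
print(weekly)
```

Using sets means a user who opens the dashboard many times in one period is still counted once, which is what "active users" normally means.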

It’s worth noting that stakeholder engagement and usage metrics can be helpful at any of the hierarchy levels, not just at this one.

Prescribe and influence: Recommending action based on insights

With all the information gathered, data scientists are ready to give solid advice to the business, act as the experts, and turn potential value into realized value. The top of the hierarchy is the pinnacle of what data science should be. Its main goal is to prescribe data-driven solutions and eventually influence the course of the business.

How to measure value at this level?

Now we find ourselves in a landscape of subjective and immeasurable elements, where data, business, and even human psychology merge to unlock the power of data-driven recommendations.

Still, a metric that could be applied at this level is Laney’s performance value of information (PVI), an approach to measuring the impact of an information asset on business objectives or key performance indicators (KPIs). The PVI answers the question: How much does having this information improve business performance? Answering it requires running a controlled experiment.

Here’s how to estimate the PVI:

  • KPI_i: The KPI measured for business process instances using the information asset (informed group).
  • KPI_c: The KPI measured for business process instances not using the information asset (control group).
  • T: The average usable life span of any data instance.
  • t: The duration over which the KPI was measured.
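A minimal sketch of the calculation, assuming the form in which Laney's PVI is usually written: the relative KPI improvement of the informed group over the control group, scaled by T / t. The function name and the figures are hypothetical.

```python
def performance_value(kpi_informed: float, kpi_control: float,
                      lifespan: float, measurement_period: float) -> float:
    """Assumed form: PVI = (kpi_i / kpi_c - 1) * T / t."""
    if kpi_control == 0:
        raise ValueError("kpi_control must be non-zero")
    if measurement_period <= 0:
        raise ValueError("measurement_period must be positive")
    # Relative improvement of the informed group over the control group.
    relative_gain = kpi_informed / kpi_control - 1.0
    # Scale from the measurement window to the usable life span of the data.
    return relative_gain * lifespan / measurement_period


# Hypothetical experiment: the informed group closed 60 deals, the control
# group 50, measured over 3 months, with data that stays useful for 6 months.
pvi = performance_value(kpi_informed=60, kpi_control=50,
                        lifespan=6, measurement_period=3)
print(f"PVI = {pvi:.2f}")  # PVI = 0.40
```

A positive result means the information improved the KPI, and a negative one means the informed group did worse than the control group.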

The problem with metrics like the PVI is that running controlled experiments is sometimes impossible.

Benn Stancil, Chief Analytics Officer at Mode, points out that, ironically, the industry has a big challenge in measuring itself at this hierarchy level.

Stancil shares the following example to illustrate the issue: Imagine a company is trying to choose between opening a sales office in London or Tokyo. An analyst recommends going with Tokyo, which turns out to be a decent success. However, it's tough to say if it was indeed the best decision as there's no way to know what would have happened if they had chosen London instead.

As Stancil suggests, a possible solution to quantifying value is to measure the speed at which decisions are made based on the analysis. The less time it takes to reach a decision, the better the analyst is performing.

Focusing on this metric has several benefits, including forcing data analysts to understand the problem, encouraging them to see issues from the decision-makers' perspective, providing a counterbalance against excessive analysis, and emphasizing effective communication.

Although measuring the exact time a decision was made may be difficult, it can be estimated or sensed, making it a better metric than subjective measures such as the "goodness" of the outcome.
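If you do keep even rough timestamps, tracking this metric takes very little code. The sketch below assumes you record when a question was asked and when the decision was made (both may be estimates) and reports the median number of days in between. The record format and the dates are hypothetical.

```python
from datetime import datetime
from statistics import median


def median_days_to_decision(records):
    """Median number of days between a question being asked and the decision."""
    durations = [(decided - asked).total_seconds() / 86_400
                 for asked, decided in records]
    if not durations:
        raise ValueError("at least one record is required")
    return median(durations)


# Hypothetical (asked, decided) timestamps for three analyses.
records = [
    (datetime(2024, 1, 2), datetime(2024, 1, 4)),
    (datetime(2024, 1, 10), datetime(2024, 1, 15)),
    (datetime(2024, 2, 1), datetime(2024, 2, 4)),
]
days = median_days_to_decision(records)
print(f"Median time to decision: {days:.1f} days")  # 3.0 days
```

I use the median rather than the mean so that one unusually slow decision doesn't dominate the picture.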

Summing up

The challenge of determining the value of data projects and teams arises from their indirect and intangible nature. In my opinion, the question "How do we measure the value of data teams?" is too broad, so it is worth breaking it down into smaller parts. We can understand data value creation at different levels by examining the data science hierarchy of needs. Each level of the hierarchy requires specific metrics and methods to calculate value, and a further breakdown, for example into distinct business units, may also have to be considered.

At the end of the day, we have to accept that there will always be an element of subjectivity in measuring value; much like art is in the eye of the beholder, the value of data can sometimes only be assessed by those who use it.
