Part III: Data Puppy - Shrinking Datadog Costs

Jul 20th, 2023

Welcome back to the series on hacking the Datadog pricing model. In the previous blog post we explored how that pricing model works. In this part, we’ll go over the key factors to consider and the opportunities they open up for optimization.

Without further ado, let’s dive into what can be done.

Use Committed Prices

You have already decided that Datadog is your observability platform of choice (which is a good decision), so you need to be able to manage spend and commitments with your Datadog account manager.

The easiest saving opportunity is to analyze your usage and adjust your commitments or base fee accordingly - this can help you save anywhere between 10% and 30%, without any engineering involvement.
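As a rough illustration of that analysis, the sketch below compares a few candidate commitment levels against past monthly usage. The prices, the usage numbers and the function names are all hypothetical and do not reflect Datadog's actual rates. The only assumption is that committed units are billed at a lower rate whether you use them or not, and that usage above the commitment is billed at a higher on-demand rate.

```python
def monthly_cost(usage, committed_units, committed_price, on_demand_price):
    """Cost of one month: the full commitment, plus on-demand overage."""
    overage = max(0, usage - committed_units)
    return committed_units * committed_price + overage * on_demand_price


def best_commitment(monthly_usage, candidates, committed_price, on_demand_price):
    """Return (commitment, total_cost) with the lowest total cost over the period."""
    totals = {
        c: sum(monthly_cost(u, c, committed_price, on_demand_price) for u in monthly_usage)
        for c in candidates
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


# Hypothetical host counts for the last six months and made-up unit prices.
usage_history = [80, 85, 90, 95, 100, 120]
commitment, cost = best_commitment(
    usage_history,
    candidates=[60, 80, 100, 120],
    committed_price=15,
    on_demand_price=18,
)
print(commitment, cost)
```

With these made-up numbers, committing to the peak month is the most expensive option, and a commitment close to the typical month wins. Your own result depends entirely on your usage pattern and on the gap between your committed and on-demand rates.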

Once you set your commitment levels, you need the tools to monitor them on a daily basis, alert on anomalies, and proactively identify any threat to those commitment levels.

Build a process of cost governance - when you understand Datadog’s pricing model and how your production system behaves, it is achievable even in products as complex as Datadog.

This process is an ongoing effort and it needs to be repeated monthly / annually (depending on your plan).

Transition from End of Month “Bill Shock” to Push-Based Daily Cost Monitoring

One of the main pain points of using the Datadog platform is not that it’s expensive, it’s the looming bill shock. The billing date might blindside you with unexpected usage.

That being said, the earlier you catch anomalies and cost spikes, the earlier you can address their root cause, and the more predictable your end of the month payment will be.

There are many reasons for cost spikes in the wild:

  • A developer enabling debug logs in production by mistake.
  • Unknowingly adding a high cardinality tag to a custom metric.
  • Spinning up a testing environment and forgetting it.
  • A new feature is released to production and you didn’t foresee its usage increase.

So keep an eye on your usage, and preferably set alerts! Within the Datadog platform this can be achieved by setting monitors on the estimated usage metrics (note that those are usage and not cost metrics), or alternatively leveraging an anomaly engine such as Finout (full disclosure - I work @ Finout).
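To make the idea of a daily check concrete, here is a minimal sketch of a spike detector. It is not how Datadog monitors or any anomaly engine work internally. It simply flags a day whose usage is well above the average of the preceding days. The function name, the window, the threshold and the sample data are all assumptions for illustration, and in practice the daily numbers would come from your usage data.

```python
from statistics import mean


def find_spikes(daily_usage, window=7, threshold=1.5):
    """Return indexes of days whose usage exceeds `threshold` times the
    average of the preceding `window` days."""
    spikes = []
    for day in range(window, len(daily_usage)):
        baseline = mean(daily_usage[day - window:day])
        if baseline > 0 and daily_usage[day] > threshold * baseline:
            spikes.append(day)
    return spikes


# Hypothetical ingested log volume per day, in GB.
ingested_gb = [100, 102, 98, 101, 99, 103, 100, 101, 240, 102]
print(find_spikes(ingested_gb))  # [8]
```

Here the day with 240 GB is flagged the day it happens, instead of being discovered on the invoice.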

Unit Economics of Your Observability Platform

First, let’s agree that a typical Datadog invoice can easily get to 4-10% of your cloud provider invoice - and since we know how expensive your cloud provider is, you can now understand how expensive Datadog is too.

Scalable - Throughput and $ Wise

 

Datadog is designed to scale with your business and infrastructure, and it does that amazingly well. Just keep an eye on the unit economics of the observed system.

In practice, make sure your observability spend is within a “reasonable” ratio to the infrastructure spend. We see from customers that anything between 2%-6% of your infra cost is probably “healthy”.

Monitoring the trend here is key. Don’t get caught with your devs adopting Datadog without the ability to verify that the increase makes business sense.
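The ratio check described above can be sketched in a few lines. The 2%-6% band comes from the observation above, while the function names and the monthly figures are hypothetical.

```python
def observability_ratio(observability_cost, infra_cost):
    """Observability spend as a fraction of infrastructure spend."""
    if infra_cost <= 0:
        raise ValueError("infra_cost must be positive")
    return observability_cost / infra_cost


def months_outside_band(monthly_costs, low=0.02, high=0.06):
    """Return the months whose ratio falls outside the [low, high] band."""
    flagged = []
    for month, (obs_cost, infra_cost) in monthly_costs.items():
        ratio = observability_ratio(obs_cost, infra_cost)
        if not low <= ratio <= high:
            flagged.append((month, round(ratio, 3)))
    return flagged


# Hypothetical (observability, infrastructure) spend per month, in dollars.
costs = {
    "2023-04": (4_000, 100_000),
    "2023-05": (5_500, 105_000),
    "2023-06": (9_000, 110_000),
}
print(months_outside_band(costs))  # [('2023-06', 0.082)]
```

In this example the infrastructure bill grew by 10% over two months while the observability bill more than doubled, which is exactly the kind of trend worth questioning.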


Reduce Waste and Idle Monitoring

This is the fun part - eliminate waste.

Logs

Everything in my post on logging optimization is also applicable to Datadog logging products. Make sure your debug logs are turned off in any scalable system, and only log what you need, with the correct retention.

Datadog offers a cool feature of “logs as metrics”. This can be leveraged to reduce the volume of indexed logs, while still keeping some of their visibility as metrics (note that you’ll still pay for the log ingestion).

Debug log costs by service from the Finout Platform

No, you don’t need all those Custom Metrics!

We love metrics, and especially custom metrics. A code review ending with “well, you are sending too many metrics” is rare. We are educated to feel that “you can never have enough monitoring”, and that we need each and every one of these metrics in case a production incident happens and this specific metric saves our Friday night.

Three months later we are left with tons of idle metrics that are being used only by the Datadog billing systems, so clean up your infrastructure and remove any unused custom metrics. 

We’ve created this Python script to identify unused custom metrics that do not exist in any Dashboard or Monitor. I encourage you to give it a test run.
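The sketch below is not that script. It is only an illustration of the general idea of "metrics that no Dashboard or Monitor references", under the assumption that you have already collected the list of custom metric names and the query strings used by your dashboards and monitors. All names and sample queries are hypothetical.

```python
import re


def find_unused_metrics(metric_names, queries):
    """Return the metric names that appear in none of the query strings."""
    unused = []
    for name in sorted(metric_names):
        # Match the full metric name only, so "app.requests" is not counted
        # as used just because "app.requests.errors" appears in a query.
        pattern = re.compile(r"(?<![\w.])" + re.escape(name) + r"(?![\w.])")
        if not any(pattern.search(query) for query in queries):
            unused.append(name)
    return unused


# Hypothetical inputs: in practice these would be collected from your account.
custom_metrics = {"app.requests", "app.requests.errors", "app.cache.hits"}
dashboard_and_monitor_queries = [
    "sum:app.requests.errors{env:prod}.as_count()",
    "avg(last_5m):avg:app.cache.hits{*} < 10",
]
print(find_unused_metrics(custom_metrics, dashboard_and_monitor_queries))
# ['app.requests']
```

A metric can still be in use somewhere this check does not look (notebooks, ad hoc queries, external consumers), so treat the output as a list of candidates to review.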

Just delete them. You can let go of them.

Metrics cardinality - a blessing in disguise

You’ve deleted all your old, stale, unused custom metrics. You are left with only what is necessary to keep production alive, but what about tag cardinality? Make sure the cardinality of the tags you send is reasonable! The Custom Metrics product is billed by the metric type used (Gauge / Histogram / Counter) and the cardinality of the tags sent.

A good starting point is the list-active-tags-and-aggregations API, which gives you a general understanding of where you stand.

Finally, use the right observability primitives for each metric - Histograms and Distributions are more expensive than Counters and Gauges, so use them only when you are really going to use their added value.
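To see why tag cardinality matters so much, here is a small sketch that counts the distinct tag-value combinations a single metric produces. It assumes, for illustration, that every distinct combination counts separately toward your usage. The function name and the sample tags are hypothetical.

```python
def count_tag_combinations(points, tag_keys):
    """Count distinct combinations of values for `tag_keys` across points."""
    return len({tuple(p.get(key) for key in tag_keys) for p in points})


# Hypothetical submissions of a single metric: 3 services x 2 regions x 50 users.
points = [
    {"service": service, "region": region, "user_id": user}
    for service in ("api", "web", "worker")
    for region in ("us", "eu")
    for user in range(50)
]

with_user_id = count_tag_combinations(points, ["service", "region", "user_id"])
without_user_id = count_tag_combinations(points, ["service", "region"])
print(with_user_id, without_user_id)  # 300 6
```

Dropping the single `user_id` tag takes this one metric from 300 combinations down to 6, and the gap grows with every new user.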

Synthetic Tests

Make sure that your Synthetic tests are actually used and alerted on by at least one Monitor, and verify that they are scheduled at reasonable intervals.

Make sure you don’t have multiple tests for a single endpoint - I know this is obvious, but in a large organization with a lot of staff changing roles, these things tend to happen.
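The checks above are easy to run once you have an inventory of your tests. The sketch below assumes a hypothetical list of test records (`id`, `url`, `interval` in seconds) and a set of test IDs referenced by monitors. The record shape and the 300-second floor are illustrative choices, not anything prescribed by Datadog.

```python
from collections import defaultdict


def audit_synthetic_tests(tests, monitored_test_ids, min_interval_seconds=300):
    """Report duplicate endpoints, unmonitored tests and very frequent tests."""
    by_url = defaultdict(list)
    for test in tests:
        by_url[test["url"]].append(test["id"])

    return {
        "duplicates": {url: ids for url, ids in by_url.items() if len(ids) > 1},
        "unmonitored": [t["id"] for t in tests if t["id"] not in monitored_test_ids],
        "too_frequent": [t["id"] for t in tests if t["interval"] < min_interval_seconds],
    }


# Hypothetical inventory of tests and of the test IDs that monitors alert on.
tests = [
    {"id": "t1", "url": "https://example.com/health", "interval": 60},
    {"id": "t2", "url": "https://example.com/health", "interval": 900},
    {"id": "t3", "url": "https://example.com/login", "interval": 300},
]
report = audit_synthetic_tests(tests, monitored_test_ids={"t1", "t3"})
print(report)
```

In this example `t2` shows up twice: it duplicates the endpoint already covered by `t1`, and no monitor alerts on it, which makes it the first candidate for removal.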

Host Monitoring and APM

This is a tricky one, but in general make sure that you only monitor hosts you care about and not your entire fleet by default, taking into account that a highly containerized environment incurs additional container costs.

Consider using cheaper open source tools for “general monitoring”, and the full extent of the Datadog platform for your most valuable hosts and services (see my “I Have An APM Addiction” talk [Hebrew], which covers exactly this issue).

One of the patterns I’ve used in the past is to create internal low maintenance observability solutions for non-critical workloads (such as dev and staging environments) by leveraging open source tools with very short retentions.


Encourage Effective Monitoring State of Mind

Many engineers, especially less experienced ones, do not necessarily understand the cost of observability - for them it’s simply a single line of code, and lines of code are free!

Create an environment where the engineering organization is aware of observability costs, and treats any log line or metric that is added or changed as a change with cost implications.

Focus on custom metric tag cardinality and log throughput.

End Note

I hope this series gave you some food for thought on how to address your Datadog cost, and position yourself better in your next pricing negotiations.

You can find the link to the original blog post here.

Datadog is an amazing observability platform (although I am more of an open source monitoring kind of a guy 😉), but its cost can get out of hand quite easily and become a hassle for your FinOps or Engineering/DevOps Manager. It doesn’t have to be.

As always, thoughts and comments are welcome on Twitter at @cherkaskyb

Read More About Datadog Costs

How Much Does Datadog Cost?

Understanding Datadog's pricing model is crucial when evaluating it as a solution. Explore the various factors that influence Datadog's pricing and gain insights into its cost structure. Additionally, discover effective considerations for managing usage-based pricing tools like Datadog within the context of FinOps.

Read more: How Much Does Datadog Cost?

Part I: Getting Around the Datadog Pricing Model

In the first part of the blog series, written by our talented Software Engineer, Boris Cherkasky, we explore the question: "Why you should care about your Datadog costs?" Boris dives into crucial aspects of Datadog costs, emphasizing the importance of understanding them. He also sheds light on how Datadog pricing works, shares his experiences and lessons learned as a Datadog user, and discusses strategies to crack the Datadog cost/usage model. Moreover, Boris provides valuable insights on how to effectively gain control over Datadog costs.

Read more: Part I: Getting Around the Datadog Pricing Model

Part II: The Magic That Is In Datadog Pricing

In the second part of the blog series written by our talented Software Engineer, Boris Cherkasky, we cover how in general Datadog products get billed, and uncover the factors that sometimes lead to unexpected end-of-month invoices.

Read more: Part II: The Magic That Is In Datadog Pricing 

Datadog Pricing

Discover the intricacies of Datadog pricing, explore key features such as debug, custom metrics, and synthetic monitoring, and find strategies to optimize costs without compromising on functionality.

Read more: Datadog Pricing Explained

Datadog Debug Pricing

Datadog Debug offers developers the remarkable ability to streamline bug resolution and optimize application performance. To fully harness the potential of this invaluable tool, it is important to grasp its pricing structure, evaluate the value of its advanced features for your specific debugging requirements, and identify key elements that influence Debug pricing. 

In this blog post, we dive deep into these essential aspects, providing you with the knowledge needed to make informed decisions and leverage Datadog Debug effectively for enhanced development workflows.

Read more: Understanding Datadog Debug Pricing 

Datadog Custom Metrics Pricing

Datadog custom metrics empower businesses to capture and analyze application-specific data points, tailored to their unique use cases. The true potential of Datadog custom metrics lies in the precise insights they offer into application performance. Therefore, comprehending the product's pricing structure and evaluating the value of advanced features becomes crucial in making informed decisions to optimize costs effectively.

Read more: Understanding Datadog Custom Metrics Pricing 

Datadog Synthetic Pricing

Integrating Datadog Synthetic Monitoring into your monitoring and observability strategy is a vital step for organizations seeking to proactively monitor and optimize their applications, while ensuring exceptional user experiences and mitigating risks.

In this blog, we will dive into the Datadog Synthetic pricing structure and explore the key factors that influence these costs. By understanding these aspects, you will be equipped to make informed decisions and leverage the full potential of Datadog Synthetic Monitoring.

Read more: Understanding Datadog Synthetic Pricing 

Optimizing Datadog Costs

Discover effective cost optimization strategies for utilizing Datadog to its full potential without incurring unnecessary expenses. By implementing these best practices, organizations can achieve maximum efficiency with Datadog while ensuring a high level of observability. Learn how to reduce monitoring costs without compromising on the quality of insights and monitoring capabilities.

Read more: Optimizing Datadog Costs 
