Using Stats in Splunk Part 1: Basic Anomaly Detection (2024)

One of the most powerful uses of Splunk is its ability to take large amounts of data and pick out the outliers. For some events this is simple: the most and least common values can be picked out with commands like top and rare. However, subtler anomalies, or anomalies that develop over a span of time, require a more advanced approach.

This article explains the standard score (also known as the z-score) in statistics, how to implement it in Splunk's Search Processing Language (SPL), and some caveats associated with the technique. By the end of this article you should be more familiar with these statistical concepts and have some intuition about when such techniques are appropriate.

Commands and subcommands

There are several commands and subcommands that this technique uses. Below is a brief overview of these; feel free to skip this section if you’re already familiar with them.

bin/bucket

The bin and bucket commands (which are interchangeable) group timestamps into fixed-size chunks, such as 15-minute spans, which we can then aggregate with the stats command.
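To make the idea concrete outside of Splunk, here is a minimal Python sketch of what binning does, assuming timestamps in epoch seconds and a 15-minute span. The names `bin_time` and `bucket_events` are hypothetical, and Splunk's own bucket alignment may differ in edge cases (time zones, for example), so treat this as an illustration rather than a reimplementation of `bin`.

```python
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # hypothetical 15-minute span, similar in spirit to span=15m


def bin_time(epoch_seconds: int, span: int = SPAN_SECONDS) -> int:
    """Round a timestamp down to the start of the bucket it falls in."""
    return epoch_seconds - (epoch_seconds % span)


def bucket_events(events, span: int = SPAN_SECONDS) -> dict:
    """Group (timestamp, value) pairs by the start time of their bucket."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[bin_time(ts, span)].append(value)
    return dict(buckets)


if __name__ == "__main__":
    events = [(0, 10), (899, 5), (900, 7), (1805, 1)]
    print(bucket_events(events))
```

Events at 0 and 899 seconds land in the same bucket, while 900 starts a new one; once values are grouped this way, each bucket can be aggregated independently.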

Avg/stdev/count/sum

  • Average: calculates the average (the sum of all values divided by the number of events) of a numerical field.
  • Stdev: calculates the standard deviation of a numerical field. Standard deviation measures how spread out the data is. If the standard deviation is low, you can expect most values to be close to the average; if it is high, the values are more spread out.
  • Count: counts the occurrences of a field's values (roughly, the number of events in which the field appears). You'll want to use this if you're dealing with text data.
  • Sum: adds up all the values of a numerical field. You'll want to use this for numerical data (e.g. if the field contains the number of bytes transferred in the event).
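As a rough illustration of what these functions compute for a single group of events, the following Python sketch uses the standard `statistics` module. The `summarize` helper is hypothetical. It uses the sample standard deviation (dividing by n - 1), which we assume corresponds to Splunk's `stdev`; Splunk also offers `stdevp` for the population version.

```python
import statistics


def summarize(values):
    """Rough analogue of stats count, sum, avg and stdev over one group of values."""
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": statistics.mean(values),
        # Sample standard deviation (divides by n - 1). It needs two or more
        # values, so this sketch returns 0.0 for a single value.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


if __name__ == "__main__":
    bytes_per_event = [2, 4, 4, 4, 5, 5, 7, 9]
    print(summarize(bytes_per_event))
```

For this sample the average is 5 and the sample standard deviation is sqrt(32/7), about 2.14.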

How many events do we need?

When calculating the statistics mentioned above, we need to make sure the sample we choose accurately represents the data. If we choose too small a timeframe, we might not get a representative sample. As a result, our calculations could produce a lot of false positives or miss anomalous events.

Luckily, the Central Limit Theorem offers some insight into how many events we need for a good sample. The short version is that as the sample size increases, the sample mean (average) tends to land closer to the mean of the overall population, and its distribution becomes approximately normal. Since computing an average over all of your data is likely impractical, we can use this to our advantage. A common rule of thumb is that if we can create a search with around 30 data points per time span, we'll likely have enough data for an accurate sample.
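The following Python simulation sketches this effect. It draws repeated samples from a deliberately skewed (exponential) population and measures how much the sample means vary from one sample to the next. The population, the number of trials, and the seed are arbitrary assumptions chosen for illustration; the spread should shrink roughly in proportion to 1/sqrt(n).

```python
import random
import statistics


def spread_of_sample_means(sample_size: int, trials: int = 2000, seed: int = 42) -> float:
    """Standard deviation of `trials` sample means, each computed from
    `sample_size` draws of an exponential population (mean 1, stdev 1)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (5, 30, 100):
        # Larger samples give sample means that cluster more tightly around 1.
        print(n, round(spread_of_sample_means(n), 3))
```

The 30-point rule of thumb is not a guarantee; very skewed or bursty data may need more points per span before the averages settle down.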

Applying what we learned

Given this information, we can do something like the following to calculate some statistics about the normal indexing of data, which we save into a lookup for future reference:

The above produces a lookup containing statistics on the amount of data indexed per index in each 15-minute period.
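As a language-neutral illustration of this baseline step, here is a Python analogue under assumed inputs: each event is a hypothetical `(epoch_seconds, index_name, bytes_indexed)` tuple. The sketch sums bytes per index per 15-minute bucket, takes the average and sample standard deviation of those bucket totals, and writes them out as CSV in the spirit of a lookup file. The function names and CSV columns are assumptions, not the article's SPL.

```python
import csv
import io
import statistics
from collections import defaultdict

SPAN_SECONDS = 15 * 60  # 15-minute buckets, matching the period described above


def baseline(events, span: int = SPAN_SECONDS) -> dict:
    """Per-index average and sample stdev of bytes indexed per bucket.

    events: iterable of (epoch_seconds, index_name, bytes_indexed) tuples
    (a hypothetical input format chosen for this sketch).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for ts, index, nbytes in events:
        bucket = ts - ts % span          # same idea as binning _time into 15m spans
        totals[index][bucket] += nbytes  # total bytes per index per bucket
    result = {}
    for index, per_bucket in totals.items():
        values = list(per_bucket.values())
        result[index] = {
            "avg": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
            "buckets": len(values),
        }
    return result


def write_lookup(stats: dict, fh) -> None:
    """Write the baseline as CSV, loosely analogous to saving a lookup."""
    writer = csv.writer(fh)
    writer.writerow(["index", "avg", "stdev", "buckets"])
    for index, s in sorted(stats.items()):
        writer.writerow([index, s["avg"], s["stdev"], s["buckets"]])


if __name__ == "__main__":
    sample = [(0, "main", 100), (60, "main", 100), (900, "main", 400), (1800, "main", 300)]
    out = io.StringIO()
    write_lookup(baseline(sample), out)
    print(out.getvalue())
```

One design caveat: a bucket in which an index received no data never appears in `totals`, so a stretch of silence does not pull the average down. If complete outages matter for your detection, you would need to fill those buckets with zero first.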

From this we can begin to work on our detection search. We'll join the historical statistics saved in the lookup with a new search that looks for drops. Once they are joined, we can calculate the z-score, which tells us how many standard deviations a particular value is from the average.
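In formula terms, the z-score is z = (x - avg) / stdev, where x is the newly observed value. The Python sketch below applies it to the latest bucket for each index and flags unusually low values. The dictionary layout, the `detect_drops` name, and the default threshold of 3 are assumptions for illustration; choosing the threshold is discussed in the next section.

```python
def z_score(value: float, avg: float, stdev: float) -> float:
    """Number of standard deviations `value` lies from `avg`."""
    if stdev == 0:
        raise ValueError("stdev is zero, so the z-score is undefined")
    return (value - avg) / stdev


def detect_drops(current: dict, baseline: dict, threshold: float = 3.0) -> dict:
    """Return {index: z} for indexes whose latest value is more than
    `threshold` standard deviations BELOW their historical average.

    current:  {index: bytes indexed in the latest bucket}
    baseline: {index: {"avg": ..., "stdev": ...}} (hypothetical layout)
    """
    alerts = {}
    for index, value in current.items():
        stats = baseline.get(index)  # the "join" on index
        if stats is None or stats["stdev"] == 0:
            continue  # no usable history for this index
        z = z_score(value, stats["avg"], stats["stdev"])
        if z < -threshold:  # a negative z-score means below average
            alerts[index] = z
    return alerts


if __name__ == "__main__":
    history = {"main": {"avg": 300, "stdev": 100}, "web": {"avg": 1000, "stdev": 50}}
    latest = {"main": 250, "web": 700, "other": 5}
    print(detect_drops(latest, history))
```

Here "main" is only half a standard deviation below its average and is ignored, while "web" is six standard deviations below and is flagged. Indexes missing from the baseline, or with a standard deviation of zero, are skipped because their z-score is undefined.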

More about z-score

How do we determine what z-score to use as our threshold? There's no single answer, because it depends on how your data is distributed. There are, however, a few rules that can help us decide:

1. 68–95–99.7 rule

This rule applies to normal distributions, where the data follows the familiar bell curve (see https://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg for a good chart). It states that about 68% of values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. The quick takeaway is that if the distribution is normal, we can expect 99.7% of values to have a z-score between -3 and 3.
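The percentages in this rule can be checked from the normal distribution itself. In this short Python sketch, the hypothetical helper `normal_within(k)` uses `math.erf` to return the probability that a normally distributed value lies within k standard deviations of the mean.

```python
import math


def normal_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations
    of the mean, i.e. P(|z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} stdev: {normal_within(k):.4f}")
```

Running it prints roughly 0.6827, 0.9545 and 0.9973, which is where the 68, 95 and 99.7 figures come from.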

2. Chebyshev’s inequality

This is a more general rule: for any probability distribution with a finite mean and variance, at most 1/k² of the values can lie more than k standard deviations from the mean (https://en.wikipedia.org/wiki/Chebyshev%27s_inequality).
The quick takeaway is that for any such distribution, we can expect at least 99% of values to have a z-score between -10 and 10, since 1/10² is 1%.
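Chebyshev's bound is simple enough to compute directly. In this Python sketch (the helper names are hypothetical), `chebyshev_within(k)` gives the guaranteed minimum fraction of values within k standard deviations, and `chebyshev_threshold(coverage)` inverts the bound to find the k needed for a desired coverage.

```python
import math


def chebyshev_within(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean
    for any distribution with finite mean and variance (1 - 1/k^2)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the bound says nothing useful, so clamp it at zero.
    return max(0.0, 1.0 - 1.0 / k**2)


def chebyshev_threshold(coverage: float) -> float:
    """Smallest k for which Chebyshev guarantees at least `coverage`
    of values lie within k standard deviations (solves 1 - 1/k^2 = coverage)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return 1.0 / math.sqrt(1.0 - coverage)


if __name__ == "__main__":
    print(chebyshev_threshold(0.99))
    print(chebyshev_within(3))
```

For example, `chebyshev_threshold(0.99)` is 10 (up to floating-point rounding), matching the takeaway above, while `chebyshev_within(3)` is only about 0.889, compared with roughly 0.997 for normally distributed data. That gap is the price of making no assumptions about the shape of the distribution.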

In the above example, we're assuming that the data follows a normal distribution, but your data may be different. In that case, you should use Chebyshev's inequality to determine the threshold instead.

Conclusion

Hopefully this article provided some insight into how to perform basic anomaly detection using some of Splunk’s built-in SPL commands. It should also give you an idea of what thresholds to use to determine what constitutes an anomaly. Happy Splunking!
