May 7, 2024

User Behavior Analytics: Why False Positives are NOT the Problem


The axiom “garbage in, garbage out” has been around since the early days of computer science and remains apropos today to the data associated with user behavior analytics and insider risk management (IRM).

During a recent Conversations from the Inside (CFTI) episode, Mohan Koo, DTEX President and Co-Founder, spoke about how organizations are often quick to blame the tools for not living up to expectations. By and large, the real issue isn’t the tech or the analytics, he said, but the mismatch between the questions being asked and the data being questioned.

During that same episode of CFTI, Howard Holton, GigaOm CTO, discussed the fundamental importance of having the right data to begin with. He reflected on the evolution of UEBA, highlighting the shift away from “hoovering” up as much data as possible, throwing it in a data lake and running analytics over the top.

Collecting more data without any explicit understanding of how it connects to meaningful outcomes, and then expecting it to sing, is asinine. The result is a mountain of false positives and analysts suffering from “alert fatigue.”

Unfortunately, it took years for organizations to realize that this approach was the failure, and not the mountain of false positives. The answers being sought weren’t being asked of the right data, and the right data wasn’t in the data lake to produce the answers.

“Flip the tables”

Koo observed that what needs to happen (and readers should take this as a recommendation) is to be “laser focused on a very few use cases that need solving that aren’t solvable today.” It is important not to try and boil the ocean, but rather focus on a very small, finite number of use cases that need solving.

Once you determine the first use case, Koo suggests reverse engineering from the outcomes that are being sought. In doing so, organizations can identify the minimum amount of high-quality data they need to ingest to address that specific use case.

Once this has been achieved, the next step is to ask the next question, ingest small sets of well-structured quality data, making sure it is clean and not missing any pieces of the puzzle. “Then and only then have you flipped the table from looking at a mountain of false positives to a small collection of action items that require attention to determine if risk exists.”
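As a rough illustration of working backward from a use case to the data it needs, here is a minimal Python sketch. The use case names and field names are hypothetical and were not given in the episode; the point is only that the ingest list is derived from the questions, not the other way around.

```python
# Hypothetical use cases, each mapped to the fields needed to answer it.
# These names are illustrative assumptions, not a DTEX schema.
USE_CASE_FIELDS = {
    "departing_employee_data_theft": {"user_id", "file_path", "destination", "timestamp"},
    "accidental_data_loss": {"user_id", "file_path", "share_target", "timestamp"},
}


def minimum_fields(use_cases):
    """Return the union of fields required by the selected use cases."""
    needed = set()
    for name in use_cases:
        needed |= USE_CASE_FIELDS[name]
    return needed


def fields_to_drop(available, use_cases):
    """Return fields being collected that serve none of the selected use cases."""
    return set(available) - minimum_fields(use_cases)


# Fields a (hypothetical) collector currently gathers "because they were there".
available = {"user_id", "file_path", "destination", "timestamp",
             "mouse_moves", "screen_brightness"}

selected = ["departing_employee_data_theft"]
needed = minimum_fields(selected)
unused = fields_to_drop(available, selected)
```

Starting with one use case keeps `needed` small; each additional use case only adds the fields that its own question requires.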

The Big 3: Considerations for Big Data

Larger enterprises (Fortune 10, 50, 100) have resources that dwarf those of most organizations, save for a government. Their lessons in managing big data and risk are being consumed by all, and those with fewer resources can learn, and are learning, how to ask the right questions for desired outcomes.

For example, the use case of distinguishing malicious from non-malicious insider risk is broad, as it may require determining whether data was lost or stolen. The distinction may provide perspective and direction. If we then narrow the equation down even further and identify the type of data that is being lost or stolen, we can then lay behavior analytics over the top and detect those items.

The key, according to Koo: “Anytime you are working with big data, there are three things you need to understand.”

  1. Ask the right question. The right question determines the quality of the answer. This means understanding what you are looking to determine from the query.
  2. Understand the data. You need to understand your data well enough to apply the question, “Do I have the right data for that question? Is my data properly positioned for those questions?”
  3. Use only quality data. Quality of data is paramount. The data must be clean and well-structured to consistently answer the questions being asked.

Failure to address these three points will inevitably result in the answer returning a blanket “no” because the data does not support the query. In this case, the recourse is to either change the query or change the data. Success is measured when one realizes that the question being asked is too specific for the data to contain the desired answers.
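One way to make the second and third points concrete is to check, before running any analytics, whether the data can support the question at all. The sketch below is an assumption-laden example: the record layout, the field names, and the 5% missing-value threshold are all hypothetical. It reports which required fields are too incomplete instead of letting the query quietly return nothing.

```python
def assess_readiness(records, required_fields, max_missing_ratio=0.05):
    """Report whether records can support a question that needs required_fields.

    max_missing_ratio is an illustrative threshold, not a recommended value.
    """
    if not records:
        return {"ready": False, "reason": "no records", "missing_ratio": {}}

    missing_ratio = {}
    for field in required_fields:
        # Count records where the field is absent, None, or an empty string.
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        missing_ratio[field] = missing / len(records)

    gaps = sorted(f for f, ratio in missing_ratio.items() if ratio > max_missing_ratio)
    if gaps:
        reason = "incomplete fields: " + ", ".join(gaps)
        return {"ready": False, "reason": reason, "missing_ratio": missing_ratio}
    return {"ready": True, "reason": "ok", "missing_ratio": missing_ratio}


# Hypothetical activity records; the second one is missing its destination.
records = [
    {"user_id": "u1", "file_path": "/a", "destination": "usb"},
    {"user_id": "u2", "file_path": "/b", "destination": None},
]
result = assess_readiness(records, ["user_id", "file_path", "destination"])
```

When `ready` is false, the `reason` points at the choice described above: change the question, or fix the data.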

Quality Questions Asked of Quality Data = Outcomes

The goal, according to Koo and Holton, is to ensure you have a complete understanding of your data sources and data points. What is being collected, why is the information being collected, and what questions are being answered? No longer will there be a “data lake” of information that was collected just because it was there.

This allows the IRM team to use the data to quantify risk, to indicate whether a risk exists, and to refute a claim of misbehavior.

Start out with five use cases, not 500, that fall within the organization’s risk tolerance. Zero tolerance for risk? No worries, but you should probably shutter your doors. If this is the answer, it’s time to “start turning things” and have leadership come to the table to glean an honest answer. From there, understand the most important issues involving the behavior of employees and contractors, and what can be measured, analyzed, and evaluated using the available data. After all, the goal is to understand and mitigate risk, not to get caught up in the circle of “garbage in, garbage out.”

DTEX’s Conversations from the Inside is now on Spotify! Tune in to this original episode on the go or watch the full replay with Mohan Koo and Howard Holton.
