The Need for a Dedicated Analytics Platform for SIEM Users

The evolution of data warehousing during the 1990s clearly showed the power of dedicated systems for in-depth analysis. Attempts to analyze data in mission-critical OLTP systems, which often ran at maximum capacity, were strategically flawed. The explosion of data, particularly unstructured and semistructured data, and the rise of Hadoop have accelerated a long-term shift in analytic depth, breadth and latency. The rise in security threats has increased the strategic importance of machine data because it is often the primary source of information on the patterns of malicious agents within systems. In addition, a big data infrastructure has become an underlying requirement for security analytics solutions that collect and analyze machine-generated data.

SIEM systems aggregate and correlate diverse sources of machine data to detect and monitor current security threats. Like OLTP systems, SIEMs have limited capacity to retain large volumes of historical data. Most SIEM vendors provide add-on components for longer-term archival, mainly for auditing and compliance, but those components don’t support the flexible and scalable ad hoc query capability that many customers require.

Workarounds often entail directing machine data to Hadoop and related open source environments for the required in-depth analysis. This in turn requires labor-intensive effort to organize and manage the array of open source software components needed for effective, timely analytics. Much of this analysis consists of daily searches for known threats across historical data, often petabytes of it, to quickly determine the first appearance, breadth of system penetration and nature of threat activity.

Open source approaches to these requirements generally entail storing all historical data in a dedicated search system. The search results are then passed to data management systems, often Hadoop-based, for quantitative analysis. Additional applications are required to visualize data, provide alerts on significant events and so on. The data interchange between search, analysis and supporting systems is generally labor- and time-intensive and often requires redundant data storage. It is like juggling a separate remote control for each device in a home entertainment setup: each has slightly different functionality, and the user is left to integrate them.
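The daily search for a known threat described above comes down to three questions: when did the indicator first appear, which systems did it touch, and how often. A minimal Python sketch of that summary step follows. The event layout (`time`, `host`, `raw`), the function name `summarize_hits` and the sample events are all assumptions invented for illustration, and the plain substring match stands in for a real search system.

```python
from datetime import datetime

def summarize_hits(events, indicator):
    """Summarize where and when an indicator appears in a list of events.

    Each event is assumed to be a dict with 'time', 'host' and 'raw' keys
    (a hypothetical layout, not any product's record format).
    """
    # Naive substring search over the raw record text.
    hits = [e for e in events if indicator in e["raw"]]
    if not hits:
        return None
    first = min(hits, key=lambda e: e["time"])
    return {
        "first_seen": first["time"],                       # first appearance
        "first_host": first["host"],
        "hosts": sorted({e["host"] for e in hits}),        # breadth of penetration
        "hit_count": len(hits),                            # volume of activity
    }

# Hypothetical sample events, invented for this example.
events = [
    {"time": datetime(2015, 3, 2, 10, 15), "host": "web-01", "raw": "GET /a from 203.0.113.77"},
    {"time": datetime(2015, 3, 1, 8, 0), "host": "db-02", "raw": "conn from 203.0.113.77"},
    {"time": datetime(2015, 3, 3, 9, 30), "host": "web-01", "raw": "GET /b from 198.51.100.4"},
]
summary = summarize_hits(events, "203.0.113.77")
```

At petabyte scale a linear scan like this is impractical, which is why the search step is normally served by an index rather than a loop over every record.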

Machine data often has complex structures that are difficult to represent comprehensively and consistently. Components of a record may not comply with manually or automatically assigned structural representations, yet those components may contain data that is critical for security analysis. For example, a string of text may contain two IP addresses where only one is normally expected. The data ingest system may then extract only one IP address and miss the second, which may be the security threat. Similarly, the complexity of machine data often causes data within records to be assigned to the wrong field in a structured representation, or to no field at all, making such data effectively invisible. One customer reported that approximately 30 percent of their SIEM data was not accessible because raw record components had been assigned incorrectly to fields in structured representations.
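The two-IP-address case is easy to reproduce. The Python sketch below contrasts a parsing rule that expects a single address with one that collects every address in the raw record. The log line, the function names and the simplified IPv4 pattern are assumptions made for this example, not taken from any particular SIEM.

```python
import re

# Simplified IPv4 pattern; fine for illustration, not a strict validator
# (it would also accept out-of-range octets such as 999.1.1.1).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_first_ip(record):
    """Mimic a parsing rule that expects one IP address per record."""
    match = IPV4.search(record)
    return match.group(0) if match else None

def extract_all_ips(record):
    """Return every IP address found in the raw record, in order."""
    return IPV4.findall(record)

# Hypothetical log line, invented for this example.
record = "2015-03-02T10:15:00Z proxy fwd src=10.0.0.5 via=203.0.113.77 status=200"

print(extract_first_ip(record))  # 10.0.0.5
print(extract_all_ips(record))   # ['10.0.0.5', '203.0.113.77']
```

A field-based search on the single extracted address would never surface `203.0.113.77`, even though it is present in the stored record.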

X15 Enterprise™ was specifically designed to handle this problem. X15 initially indexes all components of each raw record. This enables searching all data via Lucene syntax. Security analysts can thus quickly find all records with any data component (e.g., IP addresses, user IDs and MAC addresses). During initial data load, X15 simultaneously determines the structure of all records based on manual and automated record parsing rules. The structure of records found during initial search is immediately and seamlessly exposed for in-depth quantitative, pattern and statistical analysis.
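To make the idea of indexing every component of a raw record concrete, here is a toy inverted index in Python. It is only a sketch of the general technique: it is not Lucene and not X15's implementation. The tokenizer, the function names and the sample records are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Tokens are runs of letters, digits and a few punctuation characters,
# so "src=10.0.0.5" yields "src" and "10.0.0.5".
TOKEN = re.compile(r"[A-Za-z0-9_.:\-]+")

def build_index(records):
    """Map every token in every raw record to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, raw in enumerate(records):
        for token in TOKEN.findall(raw.lower()):
            index[token].add(record_id)
    return index

def search(index, term):
    """Return the sorted ids of records that contain the term."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical raw records, invented for this example.
records = [
    "login ok user=alice src=10.0.0.5",
    "proxy fwd src=10.0.0.5 via=203.0.113.77",
    "login failed user=bob src=10.0.0.9",
]
index = build_index(records)

print(search(index, "10.0.0.5"))      # [0, 1]
print(search(index, "203.0.113.77"))  # [1]
```

Because the index is built from raw text rather than from parsed fields, the second address in record 1 is searchable even if no parsing rule ever assigned it to a field.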

Effective security analysis requires a seamlessly integrated solution that leverages the cost effectiveness of Hadoop with the following capabilities:

  • A comprehensive capability to load diverse sources of machine and reference data
  • Indexing and efficient storage of all raw machine data to permit search access to all record contents
  • Seamless quantitative analysis of the search results to determine the nature of the security threat
  • Access to historical machine data, often petabytes of it, to quickly determine first occurrence, breadth of system penetration and activity patterns
  • Rapid response time across the breadth of data for both initial search and quantitative analysis, with potentially hundreds of users and queries
  • Ability to scale linearly, leveraging available compute and storage resources
  • Support for SQL and JDBC/ODBC access by analytical tools
  • REST-based API access
  • Real-time alerting and querying
  • Graphical user interfaces for easy location and analysis of data

X15 provides these capabilities in a seamlessly integrated event data warehouse that complements a SIEM environment. Users can quickly find the needle in the haystack and determine patterns in seconds. All data is comprehensively indexed for search and quantitative analysis upon ingest. Data patterns are discovered automatically, and the associated metadata is adjusted dynamically, which eliminates the need to specify rigid schemas up front. Data structure is exposed as metadata accessible via Postgres-compliant SQL. Data is stored once, with no redundancy, and is significantly compressed.

X15 is based on an MPP architecture that supports scaling to accommodate large data volumes and numbers of users. Security analysts can continuously search hundreds of terabytes of data and find records containing the data elements of interest within seconds. They can immediately analyze these data sets to determine the nature of a threat. We have worked closely with a broad spectrum of security analysts, and their feedback indicates that X15 provides exactly the paradigm they need for effective security analysis: easily and rapidly find and analyze all available machine data.