A leading e-commerce enterprise processes a high volume of transactions, supported by a global infrastructure of servers and applications. The ability to observe the health of these systems is essential for business continuity, so it is important to architect and instrument them with reliable Logging, Monitoring and Alerting (LMA).
Biarca conducted a comprehensive study of the client’s existing logging infrastructure and proposed several critical changes to build in resiliency. Biarca also proposed standardized log formats and introduced audit logging to enable effective use of analytical tools. Following the client’s buy-in, these enhancements were implemented in production.
Biarca’s infrastructure enhancements proved invaluable in the client’s efforts to reliably identify critical events like security hacks and system outages.
The Challenge
The client’s logging environment had thousands of source servers sending logs to syslog-ng collectors. The logging infrastructure had evolved over time, which led to the following shortfalls:
- Loss of data, as log collectors were unable to handle the volume and velocity of incoming log traffic
- Varied log formats, making it impossible to seamlessly forward logs to analytical tools
- No guaranteed delivery, because sources were sending logs to collectors over UDP, which neither acknowledges nor retransmits lost packets
- Uneven distribution of logs across log collectors
- Risk of undetected security breaches, since there was no mechanism for detecting unauthorized access
The client needed a robust logging infrastructure that could support reliable identification of unauthorized access and outages, with no packet loss.
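The difference between UDP and TCP delivery noted above can be illustrated with a minimal sketch. It uses Python's standard `logging.handlers.SysLogHandler` with `socktype=socket.SOCK_STREAM` so that messages travel over a TCP connection instead of UDP datagrams. The loopback "collector", the logger name `tcp-demo`, and the function `send_over_tcp` are illustrative assumptions; the client's actual sources and syslog-ng collectors are not described at this level in the case study.

```python
import logging
import logging.handlers
import socket
import socketserver
import threading


def send_over_tcp(messages):
    """Send messages to a stand-in collector over TCP; return what it received."""
    received = []

    class Collector(socketserver.StreamRequestHandler):
        """Hypothetical collector: reads one connection until the sender closes it."""

        def handle(self):
            data = self.rfile.read()
            # SysLogHandler terminates each message with a NUL byte by default.
            received.extend(p.decode("utf-8") for p in data.split(b"\x00") if p)

    # Port 0 asks the OS for any free port; the server is listening once constructed.
    with socketserver.TCPServer(("127.0.0.1", 0), Collector) as server:
        worker = threading.Thread(target=server.handle_request)
        worker.start()

        # SOCK_STREAM selects TCP; the handler's default is UDP (SOCK_DGRAM).
        handler = logging.handlers.SysLogHandler(
            address=server.server_address, socktype=socket.SOCK_STREAM
        )
        logger = logging.getLogger("tcp-demo")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.addHandler(handler)
        try:
            for message in messages:
                logger.info(message)
        finally:
            logger.removeHandler(handler)
            handler.close()  # closing the socket lets the collector see end-of-stream

        worker.join(timeout=5)
    return received


if __name__ == "__main__":
    for line in send_over_tcp(["disk full", "login failed"]):
        print(line)
```

Each received line carries the syslog priority prefix, which is `<14>` here (facility `user`, severity `info`). TCP retransmits lost segments, but it does not by itself protect against a collector that is down or overloaded, which is why the redundancy and load-balancing measures still matter.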
Customized Solution from Biarca
Biarca conducted a thorough study of the client’s syslog-ng environment and worked closely with the client’s engineers to implement a solution that led to the following results:
- Achieved resiliency by replacing virtual-machine-based log collectors with redundant, geographically dispersed bare-metal servers.
- Improved log analytics data ingestion quality by standardizing the log format and simplifying internal log handling across the enterprise.
- Enhanced reliability and extensibility in source-to-collector communication by using TCP instead of UDP, together with tagging, DNS names and dedicated network ports.
- Balanced load across log collectors to avoid data loss.
- Enabled visibility into security hacks by implementing an audit log.
- Stress-tested the final solution to ensure that there was no data loss.
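As an illustration of what a standardized, tagged log format can look like, here is a minimal sketch that renders every event in the RFC 5424 syslog layout. The case study does not state which format the client adopted, so the choice of RFC 5424, the function name `format_rfc5424`, and the sample host and application names are assumptions made for the example.

```python
from datetime import datetime, timezone

# Standard syslog severity codes.
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}
FACILITY_LOCAL0 = 16  # assumed facility for application logs


def format_rfc5424(host, app, severity, message, when,
                   facility=FACILITY_LOCAL0, procid="-", msgid="-"):
    """Render one event as an RFC 5424 line with no structured data."""
    pri = facility * 8 + SEVERITY[severity]
    # UTC timestamp with millisecond precision, e.g. 2024-01-02T03:04:05.123Z
    ts = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    # The app name acts as the tag that analytics tools can filter on.
    return f"<{pri}>1 {ts} {host} {app} {procid} {msgid} - {message}"


if __name__ == "__main__":
    stamp = datetime(2024, 1, 2, 3, 4, 5, 123000, tzinfo=timezone.utc)
    print(format_rfc5424("web01", "checkout", "info", "order accepted", stamp))
```

When every source emits the same field order, a downstream analytics tool needs only one parser instead of one per application.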
Our solution was template-driven and designed for orderly scale-out to syslog collection points across the data center, which helped minimize data loss during the scale-out period. We further enhanced the solution by integrating a SaaS-based analytics and machine-learning service that filters out false positives while correctly identifying genuine warning conditions and escalating them proactively.
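The case study does not say how load was balanced or how sources were remapped as collectors were added. One common technique that fits both goals is rendezvous (highest-random-weight) hashing, sketched below as an assumption and not as the method actually deployed. The function name `pick_collector` and the host names are hypothetical.

```python
import hashlib


def pick_collector(source, collectors):
    """Choose a collector for a source using rendezvous hashing."""
    if not collectors:
        raise ValueError("at least one collector is required")

    def score(collector):
        # Hash the (collector, source) pair; the highest score wins.
        digest = hashlib.sha256(f"{collector}|{source}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(collectors, key=score)


if __name__ == "__main__":
    collectors = ["collector-a", "collector-b", "collector-c"]
    for source in ["web01", "web02", "db01"]:
        print(source, "->", pick_collector(source, collectors))
```

Each source maps to the same collector on every call, sources spread roughly evenly across collectors, and adding a collector moves only the sources that the new collector wins, which keeps disruption small during scale-out.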
Customer Value Proposition
The client obtained the following tangible benefits from this project:
- Dependable logging base for analytics tools to identify system hacks and outages
- Smooth migration to redundant, high-capacity bare-metal log collectors
- Resilient and efficient logging mechanism with no packet loss
Written by Team Biarca