History: 1) Tale, Story. 2) A branch of knowledge that records and explains past events. 3) An established record.
In IT, we are accountable for our performance. If something goes wrong, we can’t blame our tools, and we can’t blame our resources. You would never blame the shovel for not digging a hole deep enough. It’s not the shovel’s fault. We choose which shovel to use, how much effort to put into digging the hole, and whether a shovel is even the right tool. The shovel can’t make these decisions; we make them, and we are accountable for the outcome.
So how do we know if we are making the right decisions? Every day we work hard and focus on what is in front of us, which leaves little time to look back. That is why history is so important. If a catastrophe should occur, we need to
a) identify what the outcome of the catastrophe was
b) collect evidence surrounding the catastrophic event
c) research and debate the evidence to determine the cause, and
d) build a plan to ensure the catastrophic event does not occur again.
Many of us answer only to ourselves when it comes to audits, but many also answer to governing authorities. In that case, you are required to retain history so that evidence can be reviewed and decisions made if something unfortunate occurs.
When it comes to the performance history of our systems and networks, as opposed to our records of traps and syslogs, we are self-governing: we decide how much to keep and for how long.
“I want all the history!” OK, this might be a bit much.
“I don’t think I need more than a day.” OK, this may be too little.
It depends on your application of the tools and the environment you work in. We have to challenge ourselves with a few key questions:
- Audit: Will I be audited? (If so, this immediately defines how long you need to keep history and at what collection rate.)
- Performance Bottlenecks: How far back would I need to review performance data to understand a trend?
- Security Compromise: How far back would I need to review logs to understand what led up to a security event?
- Hardware Failure: How far back would I need to review logs to pinpoint the source of a hardware failure?
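One way to turn the answers to these questions into a starting point is a small calculation like the sketch below. It assumes the simplest possible policy: keep history for the longest look-back any question demands, and collect at the finest interval any question demands. The `RetentionNeed` class, the `plan_retention` function, and every number in it are hypothetical placeholders for illustration, not recommendations and not part of any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class RetentionNeed:
    reason: str           # which question drives this need
    days: int             # how far back you would need to look
    poll_interval_s: int  # how often data must be collected, in seconds


def plan_retention(needs):
    """Return (retention_days, poll_interval_s, samples_per_metric).

    Assumed policy: longest look-back wins, finest collection interval wins.
    """
    if not needs:
        raise ValueError("at least one retention need is required")
    retention_days = max(n.days for n in needs)
    poll_interval_s = min(n.poll_interval_s for n in needs)
    # Rough sizing: how many samples one metric accumulates over the window.
    samples_per_metric = retention_days * 86_400 // poll_interval_s
    return retention_days, poll_interval_s, samples_per_metric


# Placeholder answers to the four questions above (assumed values).
needs = [
    RetentionNeed("audit", days=365, poll_interval_s=300),
    RetentionNeed("performance trend", days=90, poll_interval_s=60),
    RetentionNeed("security event", days=180, poll_interval_s=300),
    RetentionNeed("hardware failure", days=30, poll_interval_s=60),
]

days, interval, samples = plan_retention(needs)
print(f"Keep {days} days at {interval}s intervals: {samples:,} samples per metric")
```

This single-window policy is deliberately conservative. If the sample count looks too large for your environment, that is a cue to keep detailed data for a shorter window and summarized data for the longer one.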
As a strategist, I’m often asked whether we should toss the history out and start with a fresh database for the monitoring product, or push through and try to retain the history. My answer is always, “If the data is not at risk or abused, keep it.” Here’s the catch, though: data is only as good as you are intentional with it. Be intentional with what you bring into your monitoring. These tools should be the heartbeat of your IT organization. They help you scale your investments and enable your organization to deliver excellence to your customers.
If the database is corrupt for whatever reason, archive it for reference and build anew. Start over. However, take measures to ensure the integrity of both your database and the historical data you are archiving. Your history can represent the difference between stumbling through your day and being intentional and purposeful: hitting all your roadmap goals and driving excellence.
Let’s go be amazing! Wisdom + Data + Effort = Excellence!
Jason Henson | Global Director of Technical Solutions