I frequently challenge my sales team to give a customer presentation without using any word or acronym that cannot be found in a standard school dictionary.
Among other things, one of the more interesting contributions of the IT industry has been a glut of jargon.
Over the years we have specialized in creating acronyms, abbreviations and jargon to the extent that an IT presentation sounds almost incomprehensible to the uninitiated.
On top of the jargon charts today is the term ILM, or Information Lifecycle Management. It is a very simple concept to understand and implement – but maybe we have complicated it well beyond its simple origin.
ILM is no different from the housekeeping that we do in our homes and offices every day. Our work tables get cluttered regularly with papers and documents. Periodically, we clean up by trashing (deleting) documents that we no longer need or by filing documents in a cupboard or a warehouse (archiving). Note that this cluttering and cleanup happens irrespective of the size of the table – it is simply human nature. Generally, we will not buy a bigger table just to avoid the cleanup, even if the cost of the table drops drastically – because searching for something useful on a cluttered table takes more effort (processing power), and a bigger table occupies more physical space, which is expensive anyway.
Something that is obvious and easy in real life should not look too difficult in the world of digital data. Obviously, the difference is one of size and magnitude. The falling price of hard disks has made it easier to simply store data than to decide what to store. And as we store more and more, that decision becomes more and more difficult.
So how does one start cleaning up? Any ILM consultant will tell you that the first step to ILM is to categorize the data, and I agree. But it is easier said than done. The easiest way to categorize data is to do it at the time of creation – it is much harder for the existing sea of data (though tools do exist for that too).
For those looking for immediate benefits, I suggest a different approach. Rather than trying to take a holistic view of all data, look for the applications where data growth is fastest and data categorization is easiest, and solve those first.
I would put three applications in that category, in order of attractiveness and ease of implementation.
1. e-mail: No other application eats up storage space like e-mail. It is almost a virus. One 1MB attachment gets multiplied into a hundred thousand copies in no time through inadvertent use of "reply all". IT in some organizations has hit upon an easy solution – limiting the size of the mailbox on the server, thus forcing employees to keep local copies of most of their e-mail. While this saves enterprise disk, it gravely exposes the organization to litigation: the local copies are not protected. In an environment where most approvals and communication happen over e-mail, keeping most of that correspondence on employee machines is corporate hara-kiri. Legislation like Sarbanes-Oxley mandates up to 20 years of imprisonment for failing to keep the necessary documentation safe for long periods of time. I do not even know whether IT shares these implications with the CEO before implementing such policies. The ideal solution is to regularly shift the bulk of the e-mail to cheaper storage (the digital equivalent of a large cupboard). Such cheap storage is easily available. However, a couple of things are important when shifting e-mail – all of them fairly easy:
a) User transparency: Ideally, the user should not know that a particular e-mail has been shifted. The way he accesses a particular mail should remain the same – only the time it takes to retrieve it should vary.
b) De-duplication of attachments: There is no point wasting precious space storing multiple copies of the same attachment. We should store just one copy and make it look like many.
c) Automatic categorization: E-mail is easy to categorize, as each message carries all the relevant information – date of creation, sender, last access time, size and so on. All the available e-mail archival solutions can do this for applications like Lotus Notes and MS Exchange, making e-mail the single most attractive candidate for ILM. Both IBM and Veritas have ready solutions for this.
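The two mechanical ideas above – age-based selection and attachment de-duplication – can be sketched in a few lines. This is only an illustration of the principle, not how any real archival product (for Notes or Exchange) works; the message format and the `archive_old_mail` helper are invented for the example.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical in-memory "mail store": each message is a dict with a date
# and a list of attachment byte strings. Real archival products integrate
# with the mail server's APIs; this only shows the selection + de-dup logic.

def archive_old_mail(messages, age_days=90, now=None):
    """Move messages older than `age_days` to an archive list,
    storing each distinct attachment only once (keyed by its SHA-256)."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=age_days)
    attachment_store = {}   # sha256 digest -> attachment bytes (one copy)
    archive = []

    for msg in messages:
        if msg["date"] >= cutoff:
            continue  # still "active" mail; leave it on primary storage
        digests = []
        for blob in msg["attachments"]:
            digest = hashlib.sha256(blob).hexdigest()
            attachment_store.setdefault(digest, blob)  # de-duplicate
            digests.append(digest)
        # the archived message keeps only references to the shared copies
        archive.append({"date": msg["date"], "attachments": digests})

    return archive, attachment_store
```

Note how a hundred identical copies of an attachment collapse into one stored blob plus a hundred cheap references – exactly the "store one copy and make it look like many" behaviour described above.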
2. User files: Everyone creates files on a daily basis – documents, presentations, worksheets, drawings, artwork, designs and so on. We modify these files often and create new versions without deleting the old ones. The concept of "file servers" providing common shared storage for files is perhaps the oldest server concept in IT. Here again, organizations enforce discipline by implementing quotas, forcing users to store files locally – no different from what we discussed for e-mail. And again, it is very easy to implement a solution that migrates files to cheaper storage. While individual files do not carry information tags as comprehensive as e-mail's, they carry enough to achieve a very high level of automated migration. IBM Hierarchical Storage Manager (HSM) and IBM SAN File System are ideal candidates.
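As a toy illustration of what such automated migration does, the sketch below moves files that have not been accessed for a given number of days to an "archive" directory and leaves a link behind, so the old path still works. Real HSM products do this at the file-system level with transparent recall on access; the directory names and the `migrate_cold_files` helper here are just placeholders.

```python
import os
import shutil
import time

def migrate_cold_files(primary_dir, archive_dir, age_days=180):
    """Move files not accessed in `age_days` to `archive_dir`,
    replacing each with a symlink so users keep the same path."""
    cutoff = time.time() - age_days * 86400
    migrated = []
    for root, _dirs, files in os.walk(primary_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # already migrated earlier
            if os.stat(path).st_atime >= cutoff:
                continue  # recently accessed; leave on fast storage
            target = os.path.join(archive_dir,
                                  os.path.relpath(path, primary_dir))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(path, target)          # shift to the cheaper tier
            os.symlink(target, path)           # transparent access, old path
            migrated.append(path)
    return migrated
```

The symlink is the crude stand-in for the "user transparency" requirement: the file still opens from its original location, only a little more slowly.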
3. Database records: Databases are the most critical data stores in any organization. Unfortunately, databases were never designed for size, so as a database grows, its performance comes down. One would like to shift individual records to cheaper storage. Again, selecting the records is not difficult, as the records are more or less categorized already. For a long time there were only a few, expensive applications that could do this, but the problem has been significantly simplified now that database vendors such as IBM (for DB2) and Oracle have come out with their own archiving tools.
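At its core, record-level archiving is a copy-then-delete inside one transaction: rows older than a cutoff move to an archive table that could live on cheaper storage. The sketch below uses SQLite purely for illustration – the table, column names and `archive_old_orders` helper are invented, and real tools from the database vendors handle much more (referential integrity, retrieval, compression).

```python
import sqlite3

def archive_old_orders(conn, cutoff_date):
    """Copy rows older than `cutoff_date` to the archive table and
    remove them from the live table, atomically."""
    with conn:  # commits on success, rolls back on any error
        conn.execute(
            "INSERT INTO orders_archive "
            "SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM orders WHERE order_date < ?", (cutoff_date,)
        )
    return cur.rowcount  # number of records shifted to the archive

# Tiny demonstration with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2001-03-01"), (2, "2004-06-15"), (3, "2005-11-20")],
)
moved = archive_old_orders(conn, "2005-01-01")
```

The live table shrinks (and speeds up), while the old records remain queryable in the archive – which is the whole point of the exercise.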
While this is my view of the order in which ILM deployment should be prioritized, any other order will serve as well. One should be very careful, however, that this application-based approach does not replace the consultative approach to ILM, which you will want to consider for a long-term, company-wide ILM programme. It is simply a choice between solving 70% of the problem now or 100% of the problem over a longer period of time.
Whatever approach you take, the first step is the realization that managing a large amount of largely inactive data is a drain on resources. As a professional storage sales guy, I will be happy to sell more storage at any time – but I know that no customer's budget can keep up with "non-ILMed data growth". It seems I have just created a new piece of jargon – NIDG.
The views are my own and are not necessarily shared by IBM Corporation. To discuss, or to see ILM solutions in action, do not hesitate to contact me – firstname.lastname@example.org