What is CRISP-DM and why does it matter?
CRISP-DM is short for the Cross-Industry Standard Process for Data Mining. This framework for formalizing the Data Mining process was developed in the late 1990s, well before current buzzwords such as Data Science and Data Analytics took hold. Even so, it is still the single most widely adopted process model in Data Analytics today, and it has served as the foundation for many of the other commonly used process models.
Why has it remained relevant after all these years?
Well, that is an easy answer: today it is even more important to have a documented process developed specifically for data analytics. There has been a proliferation of commercially available software programs designed specifically for analytics, and all of them aim to make the actual data preparation and model building process significantly easier. What they do not address are the steps required before any data is uploaded. Additionally, CRISP-DM is a vendor-agnostic approach based on the practical experience of a consortium of major organizations and industry professionals committed to data discovery.
Some common misconceptions!
CRISP-DM, as alluded to above, is simply a solid guide built on a practical, almost intuitive process. It is what I would call a formalization of the logical process that a successful data mining project needs to follow.
What it is not is a software program, a data gathering tool, or a project management system. It does not recommend modern tools, and it does not provide examples or case studies. It is simply a guide, a detailed and well-documented one, but a guide nonetheless.
So how do we integrate the process into our own Data Analytics initiatives?
My goal here is not to regurgitate the text of the process, as there is ample documentation (we have included a full version of the CRISP-DM 1.0 documentation on our website) and commentary on CRISP-DM. Over the next few posts we will dive into each of the steps outlined below individually and provide some real-life examples of tools that help bring this framework to life.
The CRISP-DM Process
- Business Understanding: Defining the scope of the potential project and converting the business objective into a data analytics problem.
- Data Understanding: Reviewing the initial data to determine its completeness, integrity, and validity in relation to the stated objective.
- Data Preparation: Transforming the data into a format that can be easily modeled for analytics.
- Modeling: Applying a model or a series of models based on the data available.
- Evaluation: Assessing the models developed and selecting the one or ones that deliver the most analytical value, while ensuring the process is consistent with the business objectives.
- Deployment: Moving the optimized model into production.
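Although the phases are listed in order, the CRISP-DM 1.0 reference diagram draws them as a cycle with feedback arrows: Data Understanding can send you back to Business Understanding, Modeling can send you back to Data Preparation, Evaluation can send you back to Business Understanding, and the outer circle indicates that the process continues after Deployment. As a minimal sketch, assuming you log the sequence of phases a project actually went through, the Python below checks that the log only uses the forward steps plus those feedback loops. The names `Phase`, `ALLOWED_TRANSITIONS`, and `is_valid_path` are hypothetical, invented for illustration; CRISP-DM itself prescribes no code, and this is a simplified reading of its diagram.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Hypothetical transition table: forward moves follow the nominal order, and
# the backward moves mirror the feedback arrows in the CRISP-DM 1.0 diagram.
# Deployment -> Business Understanding models the outer cycle (a new iteration).
ALLOWED_TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: {Phase.BUSINESS_UNDERSTANDING},
}


def is_valid_path(path):
    """Return True if the path starts with Business Understanding and every
    consecutive pair of phases is an allowed transition."""
    if not path or path[0] is not Phase.BUSINESS_UNDERSTANDING:
        return False
    return all(nxt in ALLOWED_TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    P = Phase
    # A project that revisits its objective once and reworks its data once.
    project = [
        P.BUSINESS_UNDERSTANDING, P.DATA_UNDERSTANDING, P.BUSINESS_UNDERSTANDING,
        P.DATA_UNDERSTANDING, P.DATA_PREPARATION, P.MODELING, P.DATA_PREPARATION,
        P.MODELING, P.EVALUATION, P.DEPLOYMENT,
    ]
    print(is_valid_path(project))  # True
    # Jumping straight to modeling skips the data work, so it is rejected.
    print(is_valid_path([P.BUSINESS_UNDERSTANDING, P.MODELING]))  # False
```

Encoding the phases as a transition table rather than a fixed list keeps the iterative nature of the framework explicit: going back a step is a normal, expected move, not an error.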