
CRISP-DM

Hello, this is my first Medium article in English. I want to explain the CRISP-DM methodology, which is frequently used in data science projects. I hope you find it useful and instructive. If you are ready, let’s get started! :)


CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a widely used framework for data science projects that consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. In this blog post, I will explain what each phase involves and how it can help you achieve your data science goals.


Business Understanding (What does the business need?): This is the first and most important phase of any data science project, because you cannot create a good project without understanding what the customer needs. This phase involves defining the problem, the objectives, the success criteria, the requirements, and the scope of the project. You need to understand the business context, the stakeholders, and the expected outcomes of your analysis. You also need to identify the data sources and the resources available for the project.


Data Understanding (What data do we have / need? Is it clean?): This phase involves collecting, exploring, and describing the data. You need to check the quality, the completeness, and the relevance of the data. You also need to perform some exploratory data analysis (EDA) to gain insights, identify patterns, and discover anomalies in the data. You can use various techniques such as summary statistics, visualizations, and correlation analysis to understand your data better.
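To make this phase more concrete, here is a minimal sketch of a first look at a dataset with pandas. The table, its column names, and its values are made up for illustration; in a real project you would load your own data instead.

```python
import pandas as pd

# Hypothetical toy dataset; the columns and values are invented for this example.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51, 38],
    "income": [32000, 45000, 81000, 52000, None, 60000],
    "churned": [0, 0, 1, 0, 1, 0],
})

# Summary statistics for each numeric column (count, mean, std, quartiles).
summary = df.describe()

# Completeness check: number of missing values per column.
missing_counts = df.isna().sum()

# Pairwise Pearson correlation; rows with a missing value are skipped per pair.
correlations = df.corr()

print(summary)
print(missing_counts)
print(correlations.round(2))
```

Even this small check shows which columns have gaps and which variables move together, and both of those feed directly into the next phase.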

[Figure: Missing values example]

Data Preparation (How do we organize the data for modeling?): Data preparation is often said to take up most of a project’s effort; 80% is a commonly cited rule of thumb. This phase involves transforming, cleaning, and integrating the data. You need to deal with missing values, outliers, duplicates, and inconsistencies in the data. You also need to perform some feature engineering, such as creating new variables, encoding categorical variables, scaling numerical variables, and reducing dimensionality. You can use various tools such as pandas, scikit-learn, and TensorFlow to prepare your data for modeling.
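As a small illustration of these steps, the sketch below removes duplicates, fills a missing value, encodes a categorical column, and scales the numerical columns with pandas and scikit-learn. The data is made up, and filling with the median is only one possible choice for this example; the right strategy depends on your data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with one duplicate row and one missing age.
raw = pd.DataFrame({
    "age": [25.0, 32.0, None, 32.0, 51.0],
    "city": ["Ankara", "Istanbul", "Izmir", "Istanbul", "Ankara"],
    "income": [32000.0, 45000.0, 52000.0, 45000.0, 90000.0],
})

# Remove exact duplicate rows and rebuild the index.
clean = raw.drop_duplicates().reset_index(drop=True)

# Fill the missing age with the median of the observed ages (an assumed strategy).
clean["age"] = clean["age"].fillna(clean["age"].median())

# One-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["city"])

# Scale the numerical columns to zero mean and unit variance.
num_cols = ["age", "income"]
encoded[num_cols] = StandardScaler().fit_transform(encoded[num_cols])

print(encoded)
```

In a real project, the scaler should be fitted on the training data only and then applied to the test data, so that no information leaks from the test set.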

[Figure: Accuracy example]

Modeling (What modeling techniques should we apply?): This phase involves selecting, building, and testing different models. You need to choose the appropriate algorithms, techniques, and parameters for your problem. You also need to train, validate, and compare different models using various metrics such as accuracy, precision, recall, F1-score, ROC curve, and AUC. You can use various libraries such as scikit-learn, TensorFlow, PyTorch, and XGBoost to create and evaluate your models.
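Here is a minimal sketch of comparing two models with scikit-learn. It uses a synthetic dataset from `make_classification` instead of real project data, and the two algorithms and their parameters are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (a stand-in for a real dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 25% of the rows for validation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Candidate models; these choices are only examples.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)               # predicted class labels
    scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, preds),
        "f1": f1_score(y_test, preds),
        "auc": roc_auc_score(y_test, scores),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Looking at several metrics side by side is useful because a model with high accuracy can still have a poor F1-score when the classes are imbalanced.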

[Figure: Example of testing the model with new data]

Evaluation (Which model best meets the business objectives?): This phase involves assessing the performance and the robustness of your models. You need to check if your models meet the business objectives and the success criteria. You also need to test your models on new or unseen data to ensure their generalizability. You can use various methods such as cross-validation, bootstrapping, and sensitivity analysis to evaluate your models.
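Cross-validation, for example, can be sketched in a few lines with scikit-learn. The data below is synthetic, and `SUCCESS_THRESHOLD` is a hypothetical success criterion that I made up for the example; in practice it would come from the Business Understanding phase.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data standing in for the prepared project dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 5-fold stratified cross-validation: each fold is used once as unseen data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

# Hypothetical business success criterion (an assumption for this sketch).
SUCCESS_THRESHOLD = 0.80
meets_criteria = bool(scores.mean() >= SUCCESS_THRESHOLD)

print("fold accuracies:", scores.round(3))
print("mean accuracy:", round(scores.mean(), 3), "| meets criteria:", meets_criteria)
```

The spread of the fold scores is as informative as the mean: large differences between folds suggest that the model is not very robust.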

[Figure: Example of deploying models with Streamlit]

Deployment (How do stakeholders access the results?): This is the final phase of any data science project. It involves deploying your models into production or delivering your results to the stakeholders. You need to ensure that your models are reliable, scalable, and maintainable. You also need to monitor and update your models regularly to account for changes in the data or the environment. You can use various platforms such as AWS, Azure, Google Cloud Platform, Streamlit, or Heroku to deploy your models.
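Whatever platform you choose, one common building block is saving the trained model to a file so that a separate application can load it and make predictions. The sketch below shows this with Python's built-in `pickle` module. The model, the temporary file path, and the `predict` helper are all hypothetical names for this example, not part of any platform's API.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a small model on synthetic data (a stand-in for the final project model).
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the model to disk; a temporary directory is used only for this demo.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)


def predict(features):
    """Load the saved model and return class labels for a 2D array of rows."""
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)
    return loaded.predict(features).tolist()


print(predict(X[:3]))
```

A web app or an API endpoint could call a function like `predict` behind the scenes. Only unpickle files that you trust, because loading a pickle file can execute arbitrary code.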

While writing this article, I used the Bing AI chatbot and the following resources: https://www.datascience-pm.com/crisp-dm-2/, https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining, and https://www.ibm.com/docs/en/spss-modeler/saas?topic=dm-crisp-help-overview.

Thanks for reading my article! Take care, see you later :)



Written by Şevval Özlem ÇARKIT

BAU Computer Engineering'23 | Jr Data Engineer @Turkish Technology ✈️ | You can contact me: linkedin.com/in/sevval-ozlem-carkit
