Data Science Heroes Blog

data preparation

A collection of 6 posts

data cleaning

Automatic data types checking in predictive models

Given some data, we need to create models (xgboost, random forest, regression, etc.). Each of them has its own constraints on data types, and the resulting errors are often unclear. Here's a new function to speed up model creation.

  • Pablo Casas
3 min read
data preparation

Fast data exploration for predictive modeling

Before creating a predictive model, we need to check (and often change) numerical and categorical variables, NAs, and variables with a single unique value or high cardinality. This new function assists with that task.

  • Pablo Casas
4 min read
machine learning

How to use `recipes` package from `tidymodels` for one hot encoding 🛠

A quick introduction to the `recipes` package, from the `tidymodels` family, using one-hot encoding as the example. Useful for automating some data preparation tasks.

  • Pablo Casas
6 min read
data preparation

New discretization method: Recursive information gain ratio maximization

This method discretizes a variable while taking the target variable into account, similar to what decision trees do, but using gain ratio.

  • Pablo Casas
3 min read
rstats

Exploratory Data Analysis & Data Preparation with 'funModeling'

funModeling quick-start: this package contains a set of functions related to exploratory data analysis, data preparation, and model performance. It is used by people coming from business, research, and teaching (professors and students).

  • Pablo Casas
11 min read
rstats

Data discretization made easy with funModeling

tl;dr: Convert numerical variables into categorical ones, as shown in the next image. ⏳ Reading time ~ 6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions…

  • Pablo Casas
6 min read
Data Science Heroes Blog © 2025