Data Science Heroes Blog

machine learning

A collection of 16 posts

data science

Tips before migrating to a newer R version

A summary of common problems that my colleagues and I ran into when migrating R and its packages to a newer version.

  • Pablo Casas
4 min read
machine learning

How to use `recipes` package from `tidymodels` for one hot encoding 🛠

Quick introduction to the `recipes` package, from the `tidymodels` family, based on one hot encoding. Useful for automating some data preparation tasks.

  • Pablo Casas
6 min read
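One hot encoding replaces a categorical column with one 0/1 indicator column per category. The post does this in R with `recipes`; the plain-Python sketch below only illustrates the idea. It learns the levels from training rows first and applies them to new rows later, loosely mirroring the prep/bake split of a recipe. The function names and example columns are hypothetical, and here a level unseen during fitting simply gets all zeros.

```python
def fit_levels(rows, column):
    # Learn the category levels from the training rows only
    return sorted({row[column] for row in rows})


def apply_one_hot(rows, column, levels):
    # Replace `column` with one 0/1 indicator per learned level;
    # a value never seen while fitting produces all zeros
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(new_row)
    return encoded


train = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
levels = fit_levels(train, "color")
print(apply_one_hot([{"id": 3, "color": "red"}], "color", levels))
# [{'id': 3, 'color_blue': 0, 'color_red': 1}]
```

Fitting the levels once and reusing them keeps the training and scoring data with the same columns, which is the main reason to wrap this step in a reusable recipe.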
shap

A gentle introduction to SHAP values in R

Opening the black box in complex models: SHAP values. What are they, and how can we draw conclusions from them? With an R code example!

  • Pablo Casas
5 min read
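SHAP values build on Shapley values from cooperative game theory: each feature's contribution is its average marginal effect on the prediction over all subsets of the other features. The brute-force Python sketch below (the post itself uses R) computes them exactly for a tiny model. It assumes a single baseline point supplies the value of any "absent" feature; real SHAP implementations average over background data and use much faster approximations, since this enumeration is exponential in the number of features. `shapley_values` and `linear_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    # Exact Shapley values by enumerating every subset of the other features.
    # Assumption: a feature outside the subset takes its value from `baseline`.
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def linear_model(z):
    return 2 * z[0] + 3 * z[1] - z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(linear_model, x, baseline)
print(phis)
```

For a linear model each value reduces to weight × (x − baseline), here roughly [2, 6, −3], and the values always add up to f(x) − f(baseline), which is what makes them readable as a breakdown of a single prediction.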
data preparation

New discretization method: Recursive information gain ratio maximization

This method can discretize a variable taking the target variable into consideration, similar to what decision trees do, but using the gain ratio.

  • Pablo Casas
3 min read
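Gain ratio is the information gain of a split divided by its split information, which penalizes very unbalanced cuts. The Python sketch below (the post itself works in R) applies it recursively: find the threshold with the highest gain ratio, split, and repeat on each side. The stopping rules `max_depth` and `min_samples` are hypothetical parameters chosen for illustration, not necessarily those used in the post.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(xs, ys, threshold):
    # Information gain of the split x <= threshold, divided by the split information
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    if not left or not right:
        return 0.0
    w_left, w_right = len(left) / len(ys), len(right) / len(ys)
    info_gain = entropy(ys) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return info_gain / split_info


def best_threshold(xs, ys, min_samples):
    # Try midpoints between consecutive distinct values; keep the best valid one
    values = sorted(set(xs))
    best, best_gr = None, 0.0
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        n_left = sum(1 for x in xs if x <= t)
        if n_left < min_samples or len(xs) - n_left < min_samples:
            continue
        gr = gain_ratio(xs, ys, t)
        if gr > best_gr:
            best, best_gr = t, gr
    return best


def discretize(xs, ys, max_depth=3, min_samples=2):
    # Recursively split on the max-gain-ratio threshold; returns sorted cut points
    if max_depth == 0:
        return []
    t = best_threshold(xs, ys, min_samples)
    if t is None:  # no split improves purity (e.g. the segment is already pure)
        return []
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    cuts = [t]
    for part in (left, right):
        cuts += discretize([x for x, _ in part], [y for _, y in part],
                           max_depth - 1, min_samples)
    return sorted(cuts)


print(discretize(list(range(1, 10)), list("aaabbbaaa")))
```

Because the target drives the cuts, the middle "b" segment ends up in its own bin, something equal-width or equal-frequency bins would only find by chance.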
machine learning

Sample size and class balance on model performance

Analyzing the relationship between sample size and accuracy in a classification model.

  • Pablo Casas
5 min read
exploratory data analysis

Exploratory Data Analysis in R (introduction)

Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function.

  • Pablo Casas
5 min read
machine learning

Introduction to Machine Learning for non-developers

About Machine Learning: We all know that machine learning is about handling data, but it can also be seen as the art of finding order in data by browsing its inner information. Some

  • Pablo Casas
4 min read
rstats

Data discretization made easy with funModeling

tl;dr: Convert numerical variables into categorical ones, as shown in the next image. ⏳ Reading time ~6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions—

  • Pablo Casas
6 min read
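The excerpt does not say which binning rule funModeling applies, so the Python sketch below assumes equal-frequency binning, one common way of turning a numeric variable into a categorical one: cut points are taken at evenly spaced ranks so each bin holds roughly the same number of rows, and each value is then labeled with its interval. The function names and the toy `ages` values are hypothetical; the post itself works in R.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, n_bins):
    # Cut points at evenly spaced ranks, so each bin holds about len(values) / n_bins items
    ordered = sorted(values)
    cuts = []
    for i in range(1, n_bins):
        cut = ordered[i * len(ordered) // n_bins]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points caused by ties
            cuts.append(cut)
    return cuts


def bin_label(value, cuts):
    # Map a number to a left-closed interval label such as "[31, 44)"
    i = bisect_right(cuts, value)
    low = "-Inf" if i == 0 else str(cuts[i - 1])
    high = "Inf" if i == len(cuts) else str(cuts[i])
    return f"[{low}, {high})"


ages = [18, 22, 25, 31, 35, 40, 44, 52, 60, 71]
cuts = equal_frequency_cuts(ages, 3)
print(cuts)  # [31, 44]
print([bin_label(a, cuts) for a in ages])
```

Keeping the cut points separate from the labeling step means the same bins can be reused on new data instead of being recomputed.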
R

xray: The R Package to Have X Ray Vision on your Datasets

This package lets you analyze the variables of a dataset to evaluate how the data is shaped. Consider this the first step when you have your data for modeling; you can use this

  • Pablo Seibelt
3 min read
data science

Data Science Live Book (open source) ~ new big release! 200-pages

Well, after some time and 300+ commits, this is the biggest release of the Data Science Live Book (open source) since its first publication more than 1 year ago :) tl;dr: Hi there!

  • Pablo Casas
3 min read
R

Time Series Analysis Using Max/Min... and some Neuroscience.

Introduction: Time series have maximum and minimum points as general patterns. Sometimes the noise present in them makes it hard to spot the general behavior. In this post, we will smooth time series (reducing noise)

  • Pablo Casas
4 min read
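The idea of smoothing a series before looking for its maximum and minimum points can be sketched in a few lines of Python (the post itself uses R). The sketch assumes a centered moving average as the smoother and treats a point as a local maximum (or minimum) when it is strictly above (or below) both neighbors; the post's exact smoothing method may differ, and the function names and toy series are hypothetical.

```python
def moving_average(series, window=3):
    # Centered moving average; near the edges the window shrinks to the available points
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def local_extrema(series):
    # Indices of strict local maxima and minima (endpoints are never counted)
    maxima, minima = [], []
    for i in range(1, len(series) - 1):
        if series[i - 1] < series[i] > series[i + 1]:
            maxima.append(i)
        elif series[i - 1] > series[i] < series[i + 1]:
            minima.append(i)
    return maxima, minima


raw = [1, 3, 2, 5, 4, 6, 3, 4, 1]
print(local_extrema(raw))                     # ([1, 3, 5, 7], [2, 4, 6])
print(local_extrema(moving_average(raw, 3)))  # ([4], [])
```

On the noisy series every small wiggle counts as a peak or valley; after smoothing only the general peak remains, which is the kind of pattern the max/min analysis is after.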
R

Recommendation Systems in R

These systems are used in cross-selling industries, and they measure correlated items as well as their user ratings. This last point wasn't included in the apriori algorithm (or association rules), used in market basket

  • Pablo Casas
1 min read
R

{Long Vs. Wide} Data Frames

Introduction: This is an excellent resource to understand two types of data frame format: long and wide. Just take a look at figure 1 inside the article. Long format: ggplot2 needs in certain

  • Pablo Casas
1 min read
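In wide format each variable has its own column; in long format each row holds one (id, variable, value) triple. The library-free Python sketch below reshapes in both directions (in R, tidyr's `pivot_longer()` and `pivot_wider()` cover this). The column names and values are hypothetical toy data.

```python
def wide_to_long(rows, id_col):
    # Every non-id column of a wide row becomes its own (id, variable, value) row
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], "variable": key, "value": value})
    return long_rows


def long_to_wide(long_rows, id_col):
    # Group long rows by id and turn each "variable" back into a column
    wide = {}
    for r in long_rows:
        wide.setdefault(r[id_col], {id_col: r[id_col]})[r["variable"]] = r["value"]
    return list(wide.values())


wide = [
    {"country": "A", "y2019": 10, "y2020": 12},
    {"country": "B", "y2019": 7, "y2020": 9},
]
long = wide_to_long(wide, "country")
for r in long:
    print(r)
assert long_to_wide(long, "country") == wide  # the round trip restores the original
```

Two wide rows become four long rows; the long version is the shape plotting libraries typically expect when one column should be mapped to color or facets.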
R

Introduction to automatic machine learning

Introduction: "I want to develop a model that automatically learns over time" is a really challenging objective. In this post, we'll develop a procedure that loads data, builds a model, makes predictions

  • Pablo Casas
5 min read
R

Data Science - Short lesson on cluster analysis

Introduction: In clustering, you let the data be grouped according to their similarity. A cluster model is a group of segments (clusters) containing cases (such as clients, patients, cars, etc.). Once a cluster

  • Pablo Casas
3 min read
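To make "grouped according to their similarity" concrete, here is a minimal k-means sketch in Python. k-means is only one clustering technique and is assumed here for illustration (the lesson, written in R, may use another). Similarity is squared Euclidean distance, and the naive initialization (the first k points) is a simplification; common implementations use random or k-means++ starts. The function name and points are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    # Naive initialization: the first k points become the starting centroids
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(points, 2))  # ([0, 0, 1, 1], [(0.0, 0.5), (10.0, 10.5)])
```

Each resulting segment can then be described by its centroid, which is how a cluster model is usually profiled afterwards.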
R

Dynamic analysis on outliers

Treating outliers. Introduction: Outliers are the extreme values a variable takes. Depending on the model or requirement, it may be necessary to treat them, either by transforming or by deleting them. Variable “Income”

  • Pablo Casas
2 min read
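The two treatments mentioned, transforming or deleting, can be sketched in Python (the post itself uses R). The sketch assumes Tukey's fences (Q1 − 1.5·IQR and Q3 + 1.5·IQR) to decide what counts as an outlier; the post may use a different rule. Values outside the fences are either clipped to the nearest fence or dropped. The `income` values are a hypothetical toy list, not the post's data.

```python
import statistics


def tukey_fences(values, k=1.5):
    # Lower/upper limits at Q1 - k*IQR and Q3 + k*IQR
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    # Transform: clip values outside the fences to the nearest fence
    low, high = tukey_fences(values, k)
    return [min(max(v, low), high) for v in values]


def drop_outliers(values, k=1.5):
    # Delete: keep only the values inside the fences
    low, high = tukey_fences(values, k)
    return [v for v in values if low <= v <= high]


income = [10, 12, 11, 13, 12, 11, 100]
print(tukey_fences(income))   # (8.75, 14.75)
print(cap_outliers(income))   # [10, 12, 11, 13, 12, 11, 14.75]
print(drop_outliers(income))  # [10, 12, 11, 13, 12, 11]
```

Capping keeps every row (useful when rows cannot be lost), while dropping removes the case entirely; which one fits depends on the model, as the post notes.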
Data Science Heroes Blog © 2025