Data Science Heroes Blog

Data analysis with R

data-science-live-book

funModeling: New site, logo and version 🚀

funModeling is focused on exploratory data analysis, data preparation and the evaluation of models. Check the latest functions and website here :)

  • Pablo Casas
2 min read
data science

Tips before migrating to a newer R version

A summary of common problems my colleagues and I ran into when migrating R and its packages to a newer version.

  • Pablo Casas
4 min read
fastai

SPAM detection using fastai ULMFiT - Part 1: Language Model

A tutorial on the fastai ULMFiT model for text classification (and some of the theory behind it) 🤖📚

  • Pablo Casas
3 min read
Python

How Auth0’s Data Team uses R and Python

The Auth0 Data Team shares its tooling, from R to Python, and its favourite open-source libraries for data science and data engineering 🛠

  • Pablo Casas
5 min read
data cleaning

Automatic data types checking in predictive models

Given some data, we need to create models (xgboost, random forest, regression, etc.). Each one has its own constraints regarding data types, and the resulting errors are not always clear. Here's a new function to speed up model creation.

  • Pablo Casas
3 min read
data preparation

Fast data exploration for predictive modeling

Before creating a predictive model, we need to check (and possibly change) numerical and categorical variables, NAs, variables with a single unique value, and high-cardinality variables. This new function assists with that task.

  • Pablo Casas
4 min read
machine learning

How to use the `recipes` package from `tidymodels` for one-hot encoding 🛠

A quick introduction to the `recipes` package, from the `tidymodels` family, using one-hot encoding as the example. Useful for automating some data preparation tasks.

  • Pablo Casas
6 min read
clustering

Playing with dimensions: from Clustering, PCA, t-SNE... to Carl Sagan!

👉 Update! 7/4/20 The new version of this post, with improvements and comments on UMAP, is here: https://escueladedatosvivos.ai/blog/204650/jugando-con-las-dimensiones-clustering-pca-tsne-carl-sagan Playing with dimensions. Hi! This post is an experiment

  • Pablo Casas
7 min read
R

redshiftTools v1.0.0 - CRAN Release!

A new version of the redshiftTools package has arrived with improvements, and it's now available on CRAN! This package lets you efficiently upload data into an Amazon Redshift database using the approach recommended

  • Pablo Seibelt
2 min read
libro-vivo-ciencia-datos

Launch! Libro Vivo de Ciencia de Datos 📗 (open-source)

The Spanish version of the _Data Science Live Book_ is finally available! The book is now open, without language barriers, to Spanish speakers eager to learn 👨‍🎓👩‍🎓. This publication is an edition of the English version, revised for both grammar and technical aspects.

  • Pablo Casas
2 min read
shap

A gentle introduction to SHAP values in R

Opening the black box in complex models: SHAP values. What are they, and how can we draw conclusions from them? With an R code example!

  • Pablo Casas
5 min read
data preparation

New discretization method: Recursive information gain ratio maximization

This method discretizes a variable while taking the target variable into account, similar to what decision trees do but using gain ratio.

  • Pablo Casas
3 min read
R

Feature Selection using Genetic Algorithms in R

From a gentle introduction to a practical solution, this is a post about feature selection using genetic algorithms in R.

  • Pablo Casas
6 min read
telegram

Integrating R and Telegram

Get notified on Telegram when an R script finishes.

  • Pablo Casas
2 min read
tibble

How to apply a function to a matrix/tibble

Scenario: we have an id-value table and a matrix/tibble that contains the ids, and we need the labels. It may be useful when predicting the Key (or Ids) of in a

  • Pablo Casas
2 min read
deep-learning

How to create a sequential model in Keras for R

This tutorial introduces the deep learning classification task with Keras, with a focus on one-hot encoding, layer shapes, training, and model evaluation.

  • Pablo Casas
5 min read
machine learning

Sample size and class balance on model performance

Analyzing how sample size affects the accuracy of a classification model.

  • Pablo Casas
5 min read
bookdown

How to self publish a book: customizing Bookdown

tl;dr: This post is related to How to self-publish a book: A handy list of resources. It's centered around Bookdown and some non-standard customizations I found useful to create the Data Science

  • Pablo Casas
6 min read
bookdown

How to self-publish a book: A handy list of resources

tl;dr: A list of useful resources for self-publishing a book on Amazon using Bookdown.

  • Pablo Casas
9 min read
exploratory data analysis

Exploratory Data Analysis in R (introduction)

Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function.

  • Pablo Casas
5 min read
rstats

R and RStudio installation tutorial

This tutorial walks through the initial set-up needed to start developing machine learning models in the incredible R language.

  • Pablo Casas
4 min read
machine learning

Introduction to Machine Learning for non-developers

About Machine Learning. We all know that machine learning is about handling data, but it can also be seen as: the art of finding order in data by browsing its inner information. Some

  • Pablo Casas
4 min read
learning

"I hate math!" - Education and Artificial Intelligence to find a meaning in what we do

Well, what you hate is the way that math was taught to you. That soup of equations, abstractions, and solutions to problems that we don’t know. It's hard to enjoy the things

  • Pablo Casas
5 min read
data-science-live-book

Data Science Live Book available at Amazon!

Hi there! tl;dr: The Data Science Live Book is now available at Amazon! Kindle & Paperback versions! 🚀 👉 See at Amazon 📗! Link to the black & white version, also available in full color. It

  • Pablo Casas
3 min read
rstats

Exploratory Data Analysis & Data Preparation with 'funModeling'

funModeling quick-start: This package contains a set of functions related to exploratory data analysis, data preparation, and model performance. It is used by people coming from business, research, and teaching (professors and students)

  • Pablo Casas
11 min read
Data Science Heroes Blog © 2025