data-science-live-book funModeling: New site, logo and version 🚀 funModeling is focused on exploratory data analysis, data preparation and the evaluation of models. Check the latest functions and website here :)
data science Tips before migrating to a newer R version A summary of common problems that my colleagues and I ran into when migrating R and its packages to a newer version.
Python How Auth0’s Data Team uses R and Python Auth0's Data Team shares its tooling, from R to Python, and its favourite open-source libraries for data science and data engineering 🛠
data cleaning Automatic data types checking in predictive models Given some data, we need to create models (xgboost, random forest, regression, etc.). Each one has its own constraints regarding data types, and the resulting errors are not always clear. Here's a new function to speed up model creation.
data preparation Fast data exploration for predictive modeling Before creating a predictive model, we need to check (and sometimes change) numerical and categorical variables, NAs, variables with a single unique value, and high-cardinality variables. This new function will assist us in this task.
machine learning How to use `recipes` package from `tidymodels` for one hot encoding 🛠 A quick introduction to the `recipes` package, from the `tidymodels` family, based on one-hot encoding. Useful for automating some data preparation tasks.
R redshiftTools v1.0.0 - CRAN Release! A new version of the package redshiftTools has arrived with improvements and it's now available on CRAN! This package lets you efficiently upload data into an Amazon Redshift database using the approach recommended
shap A gentle introduction to SHAP values in R Opening the black box in complex models: SHAP values. What are they and how to draw conclusions from them? With an R code example!
data preparation New discretization method: Recursive information gain ratio maximization This method discretizes a variable taking the target variable into consideration, similar to what decision trees do, but using the gain ratio.
R Feature Selection using Genetic Algorithms in R From a gentle introduction to a practical solution, this is a post about feature selection using genetic algorithms in R.
tibble How to apply a function to a matrix/tibble Scenario: we have an id-value table and a matrix/tibble that contains the ids, and we need the labels. It may be useful when predicting the Key (or Ids) of in a
deep-learning How to create a sequential model in Keras for R This tutorial introduces the deep learning classification task with Keras, with a focus on one-hot encoding, layer shapes, training, and model evaluation.
machine learning Sample size and class balance on model performance Analyzing how sample size affects the accuracy of a classification model.
bookdown How to self publish a book: customizing Bookdown tl;dr: This post is related to How to self-publish a book: A handy list of resources. It's centered around Bookdown and some non-standard customizations I found useful to create the Data Science
bookdown How to self-publish a book: A handy list of resources tl;dr: A list of useful resources for self-publishing a book on Amazon using Bookdown.
exploratory data analysis Exploratory Data Analysis in R (introduction) Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function.
machine learning Introduction to Machine Learning for non-developers About Machine Learning We all know that machine learning is about handling data, but it can also be seen as: The art of finding order in data by browsing its inner information. Some
data-science-live-book Data Science Live Book available at Amazon! Hi there! tl;dr: The Data Science Live Book is now available at Amazon! Kindle & Paperback versions! 🚀 👉 See at Amazon 📗! Link to the black & white version, also available in full color. It
rstats Exploratory Data Analysis & Data Preparation with 'funModeling' funModeling quick-start This package contains a set of functions related to exploratory data analysis, data preparation, and model performance. It is used by people coming from business, research, and teaching (professors and students)
R A comprehensive guide to connect R to Amazon Redshift Amazon Redshift is one of the hottest databases for Data Warehousing right now. It's one of the most cost-effective solutions available and allows for integration with many popular BI tools. Unfortunately, the status
rstats Data discretization made easy with funModeling tl;dr: Convert numerical variables into categorical, as it is shown in the next image. ⏳ Reading time ~ 6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions—
R xray: The R Package to Have X Ray Vision on your Datasets This package lets you analyze the variables of a dataset, to evaluate how the data is shaped. Consider this the first step when you have your data for modeling, you can use this
data science Data Science Live Book (open source) ~ new big release! 200-pages Well, after some time and 300+ commits, this is the biggest release of the Data Science Live Book (open source) since its first publication more than a year ago :) tl;dr: Hi there!
clustering Playing with dimensions: from Clustering, PCA, t-SNE... to Carl Sagan! Playing with dimensions Hi there! This post is an experiment combining the result of t-SNE with two well-known clustering techniques: k-means and hierarchical. This will be the practical section, in R. But