data-science-live-book funModeling: New site, logo and version 🚀 funModeling is focused on exploratory data analysis, data preparation and the evaluation of models. Check the latest functions and website here :)
data science Tips before migrating to a newer R version A summary of common problems that my colleagues and I had when migrating R / packages to a newer version.
Python How Auth0’s Data Team uses R and Python The Auth0 Data Team shares their tooling, from R to Python, and their favourite open-source libraries for data science and data engineering 🛠
machine learning Introduction to Machine Learning for non-developers About Machine Learning We all know that machine learning is about handling data, but it can also be seen as: The art of finding order in data by browsing its inner information. Some
R A comprehensive guide to connect R to Amazon Redshift Amazon Redshift is one of the hottest databases for Data Warehousing right now. It's one of the most cost-effective solutions available, and it allows for integration with many popular BI tools. Unfortunately, the status
rstats Data discretization made easy with funModeling tl;dr: Convert numerical variables into categorical, as it is shown in the next image. ⏳ Reading time ~ 6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions—
data science Data Science Live Book (open source) ~ new big release! 200 pages Well, after some time and 300+ commits, this is the biggest release of the Data Science Live Book (open source) since its first publication more than a year ago :) tl;dr: Hi there!
data science Model Performance in Data Science Live Book Hi there! I decided to almost rewrite the model validation section, since it didn't reflect real-case scenarios. Hopefully, in the two new chapters you will gain a deeper knowledge on methodological aspects
data science Data Science Live Book - Scoring, Model Performance & profiling - Update! This update contains a new chapter, scoring, which is related to model performance and model deployment and is used when predicting a binary outcome. Link to the scoring chapter. Important: To use the following updates, please
R Anomaly Detection in R Introduction Inspired by this Netflix post, I decided to write a post based on this topic using R. There are several nice packages to achieve this goal, the one we're going to
R Text Mining Analysis: some theory and practice in R Introduction Big Data helps us analyze unstructured data (aka "text") with many techniques; this post presents one: Cosine Similarity. There are also other analysts work, who scraped
R {Long Vs. Wide} Data Frames Introduction This is an excellent resource to understand two data frame formats: Long and Wide. Just take a look at figure 1 inside the article. Long format: ggplot2 needs in certain
R Introduction to automatic machine learning Introduction "I want to develop a model that automatically learns over time", a really challenging objective. In this post we'll develop a procedure that loads data, builds a model, makes predictions
R Data Science - Short lesson on cluster analysis Introduction In clustering you let data be grouped according to their similarity. A cluster model is a group of segments (clusters) containing cases (such as clients, patients, cars, etc.). Once a cluster