exploratory data analysis Exploratory Data Analysis in R (introduction) Exploratory data analysis (EDA) is the very first step in a data project. We will create a code template to achieve this with one function.

machine learning Introduction to Machine Learning for non-developers About Machine Learning We all know that machine learning is about handling data, but it can also be seen as: The art of finding order in data by browsing its inner information. Some

rstats Data discretization made easy with funModeling tl;dr: Convert numerical variables into categorical, as it is shown in the next image. ⏳ Reading time ~ 6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions—

R xray: The R Package to Have X Ray Vision on your Datasets This package lets you analyze the variables of a dataset to evaluate how the data is shaped. Consider this the first step when you have your data for modeling; you can use this

data science Data Science Live Book (open source) ~ new big release! 200-pages Well, after some time and 300+ commits, this is the biggest release of the Data Science Live Book (open source), more than 1 year after the first publication :) tl;dr: Hi there!

R Data Science Live Book (open source) Hi! Well, finally there is the first release of this project: An open source book which will hopefully contain some useful resources for those who want to learn some data analysis/machine learning.

R Time Series Analysis Using Max/Min... and some Neuroscience. Introduction Time series have maximum and minimum points as general patterns. Sometimes the noise present in them makes it hard to spot the general behavior. In this post, we will smooth time series -reducing noise-

R Package funModeling: data cleaning, importance variable analysis and model performance ![Crossplot](http://datascienceheroes.com/img/blog/09_cross_plot_5.PNG) POST UPDATE 09/24/2016 Good news! funModeling documentation evolved into an open source book! Please follow the link below Jump to

R Recommendation Systems in R These systems are used in cross-selling industries, and they measure correlated items as well as their user rate. This last point wasn't included in the apriori algorithm (or association rules), used in market basket

R {Long Vs. Wide} Data Frames Introduction This is an excellent resource to understand two data frame formats: Long and Wide. Just take a look at figure 1 inside the article. Long format: ggplot2 needs in certain

R Introduction to automatic machine learning Introduction "I want to develop a model that automatically learns over time", a really challenging objective. In this post, we'll develop a procedure that loads data, builds a model, makes predictions

R Data Science - Short lesson on cluster analysis Introduction In clustering, you let data be grouped according to their similarity. A cluster model is a group of segments -clusters- containing cases (such as clients, patients, cars, etc.). Once a cluster

R Dynamic analysis on outliers Treating outliers Introduction Outliers are the extreme values of a variable. Depending on the model or requirement, it may be necessary to treat them, either by transforming or deleting them. Variable “Income”