R A comprehensive guide to connect R to Amazon Redshift Amazon Redshift is one of the hottest databases for Data Warehousing right now. It's one of the most cost-effective solutions available and allows for integration with many popular BI tools. Unfortunately, the status
rstats Data discretization made easy with funModeling tl;dr: Convert numerical variables into categorical, as it is shown in the next image. ⏳ Reading time ~ 6 min. Let's start! The package funModeling (from version > 1.6.6) introduces two functions—
R xray: The R Package to Have X Ray Vision on your Datasets This package lets you analyze the variables of a dataset, to evaluate how the data is shaped. Consider this the first step when you have your data for modeling; you can use this
data science Data Science Live Book (open source) ~ new big release! 200-pages Well, after some time and 300+ commits, this is the biggest release of the (open source) Data Science Live Book since its first publication more than 1 year ago :) tl;dr: Hi there!
clustering Playing with dimensions: from Clustering, PCA, t-SNE... to Carl Sagan! Playing with dimensions Hi there! This post is an experiment combining the result of t-SNE with two well-known clustering techniques: k-means and hierarchical. This will be the practical section, in R. But
R Shiny Chart Builder - Explore your database with a point-and-click interface I have a new year's surprise for you! This Shiny app is meant to be a system for basic reporting in the style of most Business Intelligence tools: you can create a report without
R Authentication Proxy on Shiny Open Source A year ago I wrote about a way to authenticate Shiny with Auth0, using Apache: http://blog.datascienceheroes.com/adding-authentication-to-shiny-open-source-edition/ This method works but has some issues. Sebastian Peyrott has written an excellent
data science Model Performance in Data Science Live Book Hi there! I decided to rewrite almost all of the model validation section since it didn't reflect real-case scenarios. Hopefully, in the two new chapters you will gain a deeper knowledge of methodological aspects
data science Data Science Live Book - Scoring, Model Performance & profiling - Update! This update contains a new chapter -scoring- which is related to model performance and model deployment, used when predicting a binary outcome. Link to the scoring chapter. Important: To use the following updates, please
R Time Series Analysis Using Max/Min... and some Neuroscience. Introduction Time series have maximum and minimum points as general patterns. Sometimes the noise present in them makes it hard to spot the general behavior. In this post, we will smooth time series -reducing noise-
R How to bulk upload your data from R into Redshift Amazon's columnar database, Redshift, is a great companion for a lot of Data Science tasks: it allows for fast processing of very big datasets with a familiar query language (SQL). There are 2
R Anomaly Detection in R Introduction Inspired by this Netflix post, I decided to write a post based on this topic using R. There are several nice packages to achieve this goal; the one we're going to
R Text Mining Analysis: some theory and practice in R Introduction Big Data helps us analyze unstructured data (aka "text") with many techniques; this post presents one: Cosine Similarity. There is also work by other analysts, who scraped
R Adding Authentication to Shiny Open Source Edition Shiny Server is a great solution for BI/analytics reporting. It leverages the power of the R language to create interactive reports/dashboards. Maybe you have tried it but are reluctant to
R Recommendation Systems in R These systems are used in cross-selling industries, and they measure correlated items as well as their user ratings. This last point wasn't included in the apriori algorithm (or association rules), used in market basket
R {Long Vs. Wide} Data Frames Introduction This is an excellent resource to understand the 2 types of data frame format: Long and Wide. Just take a look at figure 1 inside the article. Long format: ggplot2 needs in certain
R Introduction to automatic machine learning Introduction "I want to develop a model that automatically learns over time", a really challenging objective. In this post we'll develop a procedure that loads data, builds a model, makes predictions
R Data Science - Short lesson on cluster analysis Introduction In clustering, you let the data be grouped according to their similarity. A cluster model is a group of segments -clusters- containing cases (such as clients, patients, cars, etc.). Once a cluster
EU Life Quality Geo Report Living longer, living better? It's equally important to measure both how long we live and the quality of that life. Analyzing data from [eurostat](http://ec.europa.eu/eurostat/publications/recently-published?p_auth=ZKofrOKp&p_p_
R Dynamic analysis on outliers Treating outliers Introduction Outliers are the extreme values of a variable. Depending on the model or requirement, it may be necessary to treat them, either by transforming or deleting them. Variable “Income”
R Forecasting the Argentinian "Blue Dollar" If you have recently visited my website, Bluelytics, you will have noticed a new section named "Predicción", which is a forecast of the value of the Blue Dollar in a
Python Scraping data from the central bank of Argentina Today I'm going to show you an example of data scraping with the BCRA, which is the central bank of Argentina. On this website, we have a section called "Estadisticas e indicadores&