Tuesday, 19 March 2013

Learning-by-doing: my quest to master ggplot2 (part 1)

Monday, 18 March 2013

R, where should I start?

This is a dynamic post which I will continue to update whenever I find something new. I hope you find the following links useful.

Online Courses for Learning the R language

Free Documentation for Learning the R Language

  1. R for Beginners by Emmanuel Paradis
  2. R Graphics by Paul Murrell
  3. ggplot2 (official documentation)
  4. Advanced R Programming by Hadley Wickham

Online Courses for Data Mining with R

e-Books for Data Mining with R

R Tutorials

  1. Twotorials by Anthony Damico (learning new tricks from short 2-min videos)
  2. Revolution Analytics Free Webinars
  3. ggplot2 Graphics Cheat Sheet
  4. 10 tips for making your R graphics look their best
  5. Making Maps with R
  6. Compiling R 3.0.1 with MKL support
  7. Flowing Data - Tutorials
  8. Quick-R
  9. R-Uni (A List of Free R Tutorials and Resources in University Webpages)

Interesting Blogs and Articles

Useful R Packages

  1. Ten R packages I wish I knew about earlier (Before you do anything, read this blog post first!!)
  2. caret (short for Classification And REgression Training) for a simple way to train and fine-tune models using different algorithms
  3. ff and bigmemory - two packages to solve memory issues with big datasets
  4. quantmod for financial modelling
  5. foreach and doSNOW for parallel computing in R

Integrated Development Environment

Sunday, 17 March 2013

Blend what?


Over the years I have learned quite a few things about machine learning, but I have never thought of writing them down properly. Too often I can't figure out exactly what I did when I look at my old code. The time is NOW!

More importantly, I have fallen in love with the R programming language and the massive number of useful packages from the R community. I want to talk about tricks, tools and useful resources for data mining with R (and sometimes my old favourite, Matlab) here.

Bayesian Ensemble Learning

One of the interesting tricks I learned is called "Bayesian Ensemble Learning". It involves combining (i.e. blending) different models to improve overall prediction accuracy. Although it has its downsides (e.g. it is computationally expensive and difficult to interpret), it is certainly my favourite data mining technique at the moment. I also decided to name this blog after it long before I started writing this first post!
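To make the idea of blending a bit more concrete, here is a minimal sketch in Python. It is only a plain weighted average of two models' predictions, not a full Bayesian scheme, and everything in it is an assumption made for illustration: the numbers are made up, and the helper names `rmse` and `blend` are hypothetical.

```python
from math import sqrt


def rmse(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sqrt(sum(squared_errors) / len(targets))


def blend(model_predictions, weights=None):
    """Weighted average of several models' predictions, point by point.

    With no weights given, every model counts equally. A Bayesian approach
    would instead derive the weights from how well each model explains the
    data; that step is not shown here.
    """
    n_models = len(model_predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)
    n_points = len(model_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_predictions)) / total
        for i in range(n_points)
    ]


# Made-up observations and predictions, for illustration only.
targets = [1.0, 2.0, 3.0, 4.0]
model_a = [1.4, 2.6, 3.3, 4.5]  # tends to over-predict
model_b = [0.8, 1.5, 2.9, 3.4]  # tends to under-predict

blended = blend([model_a, model_b])

print("model A RMSE:", rmse(model_a, targets))
print("model B RMSE:", rmse(model_b, targets))
print("blend RMSE:  ", rmse(blended, targets))
```

In this toy case the two models err in opposite directions, so their errors partly cancel and the blend scores a lower error than either model on its own. Real models rarely complement each other this neatly, which is why choosing the weights matters.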


There is also a need to promote my own research project online, so I guess there will be times when I talk about drainage design, green infrastructure and decision support systems. This is not the main focus of the blog, but I will try to create some funky graphs and explain my research to a wider audience when the time is right (i.e. when I eventually master the art of graphics in R).

OK, so here we go, this is my journey into the wonderful world of data science!