
Long H. Nguyen

Knowledge is worth sharing!


Here is my latest tutorial, “Practical Data Science”. It uses the R programming language and Jupyter notebooks, but the concepts are generic and apply equally to users of Python or other programming languages.

The tutorial covers the essential steps of a data science project: importing data, data manipulation, visualization, modeling, and reporting. For each step, it provides the important libraries and example code so that you can quickly reuse them in your own projects. For example:

  • Importing data: readr, data.table, RMySQL
  • Data manipulation: dplyr, tidyr, lubridate, stringr
  • Data visualization: ggplot2, plotly
  • Data modeling: caret, lm, randomForest, rpart
  • Reporting: Jupyter notebook, RMarkdown
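To give a feel for how these pieces fit together, here is a minimal sketch of the manipulation and modeling steps. It is not taken from the tutorial: the built-in `mtcars` dataset, the chosen columns, and the variable names `cars`, `by_cyl`, and `fit` are all assumptions made for illustration.

```r
# Illustrative sketch only: mtcars and the column choices are assumptions,
# not the data or code used in the tutorial.
library(dplyr)

# Data manipulation with dplyr: keep a few columns and recode one as a factor.
cars <- mtcars %>%
  select(mpg, wt, hp, cyl) %>%
  mutate(cyl = factor(cyl)) %>%
  filter(!is.na(mpg))

# Summarise fuel economy by number of cylinders.
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Data modeling with lm: a simple linear regression of mpg on weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = cars)

print(by_cyl)
print(coef(fit))
```

The same data frame could then be passed to ggplot2 for plotting or to caret for model tuning; those steps are left out here to keep the sketch short.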

The data modeling section starts with an overview of the predictive modeling landscape, then goes through frequently used models (glm, random forest, gbm, neural net). Advanced ML models are also discussed, with tips and tricks for tuning hyperparameters. Finally, it covers stacking, a technique used in many data science competitions.
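For readers new to stacking, the idea can be sketched in a few lines: train several base models, collect their out-of-fold predictions, and fit a meta-model on those predictions. The example below is only a rough illustration under assumed choices (the `mtcars` data, `lm` and `rpart` as base models, 4 folds, and a linear meta-model); it is not the tutorial's implementation.

```r
# Illustrative stacking sketch. The dataset, base models, fold count and
# variable names are assumptions, not the tutorial's code.
library(rpart)  # recommended package, shipped with standard R distributions

set.seed(42)
data <- mtcars
k <- 4
# Randomly assign each row to one of k folds.
fold <- sample(rep(1:k, length.out = nrow(data)))

# Out-of-fold predictions from each base model.
oof <- data.frame(pred_lm = numeric(nrow(data)),
                  pred_tree = numeric(nrow(data)))

for (i in 1:k) {
  train <- data[fold != i, ]
  test  <- data[fold == i, ]

  # Base model 1: linear regression.
  m_lm <- lm(mpg ~ wt + hp, data = train)
  # Base model 2: regression tree (small minsplit because the data is tiny).
  m_tree <- rpart(mpg ~ wt + hp + disp, data = train,
                  control = rpart.control(minsplit = 5))

  oof$pred_lm[fold == i]   <- predict(m_lm, newdata = test)
  oof$pred_tree[fold == i] <- predict(m_tree, newdata = test)
}

# Meta-model: learns how to combine the base models' predictions.
oof$mpg <- data$mpg
meta <- lm(mpg ~ pred_lm + pred_tree, data = oof)
print(coef(meta))
```

Using out-of-fold predictions matters here: fitting the meta-model on predictions the base models made for their own training rows would let it overfit to them.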

Here is the deck:

And the notebooks:

Hope you find it useful.