Practical Data Science

This is my latest tutorial, “Practical Data Science”. It uses the R programming language and Jupyter notebooks. However, the concepts are generic and apply equally well to users of Python or other programming languages.

In this tutorial, I discuss the essential steps in a data science project: importing data, data manipulation, visualization, modeling and reporting. For each step, I provide the important libraries and example code so that you can quickly reuse them in your own projects. For example:

Importing data: readr, data.table, RMySQL
Data Manipulation: dplyr, tidyr, lubridate, stringr
Data Visualization: ggplot2, plotly
Data modeling: caret, lm, randomForest, rpart
Reporting: Jupyter notebook, RMarkdown
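
To give a feel for how a few of these libraries fit together, here is a minimal sketch of the first four steps. It is not taken from the tutorial notebooks: the built-in mtcars dataset, the temporary CSV file, the chosen columns and the variable names are all assumptions made for illustration, and it assumes readr, dplyr and ggplot2 are installed.

```r
# Illustrative sketch only: dataset, columns and names are assumptions,
# not material from the tutorial notebooks.
library(readr)
library(dplyr)
library(ggplot2)

# Importing data: write the built-in mtcars data to a temporary CSV,
# then read it back with readr (col_types = cols() guesses types quietly).
csv_path <- tempfile(fileext = ".csv")
write_csv(mtcars, csv_path)
cars <- read_csv(csv_path, col_types = cols())

# Data manipulation: keep three columns, turn cyl into a factor,
# and convert weight from thousands of pounds to kilograms.
cars_small <- cars %>%
  select(mpg, wt, cyl) %>%
  mutate(cyl = factor(cyl), wt_kg = wt * 1000 * 0.4536)

# Summarise fuel economy by number of cylinders.
mpg_by_cyl <- cars_small %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Data visualization: scatter plot of weight against fuel economy.
p <- ggplot(cars_small, aes(x = wt_kg, y = mpg, colour = cyl)) +
  geom_point() +
  labs(x = "Weight (kg)", y = "Miles per gallon")

# Data modeling: a plain linear regression with lm.
fit <- lm(mpg ~ wt_kg + cyl, data = cars_small)

print(mpg_by_cyl)
print(coef(fit))
```

In a notebook you would display the plot by evaluating `p` in a cell. The same pattern should carry over to your own data by swapping the temporary CSV for a real file path.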

The data modeling section starts with an overview of the predictive modeling landscape, then goes through frequently used models (glm, random forest, gbm, neural net). Advanced ML models are also discussed, with tips and tricks for tuning hyperparameters. Finally, it covers the stacking technique, which is used in many data science competitions.
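
For readers new to stacking, the idea can be sketched roughly as follows. This is a simplified illustration, not the code from the notebooks: it swaps in a linear model and an rpart regression tree as base learners (instead of the heavier models named above), uses the built-in mtcars data, and the fold count, formulas and variable names are all assumptions.

```r
# Illustrative stacking sketch; base learners, data and names are assumptions.
library(rpart)

set.seed(42)
data <- mtcars
k <- 5
# Assign each row to one of k folds at random.
fold <- sample(rep(seq_len(k), length.out = nrow(data)))

# Out-of-fold predictions: each row is predicted by base models
# that were trained without that row.
oof <- data.frame(lm_pred = numeric(nrow(data)),
                  tree_pred = numeric(nrow(data)))
for (i in seq_len(k)) {
  train <- data[fold != i, ]
  held_out <- data[fold == i, ]
  lm_fit <- lm(mpg ~ wt + hp, data = train)
  tree_fit <- rpart(mpg ~ wt + hp + disp, data = train,
                    control = rpart.control(minsplit = 5))
  oof$lm_pred[fold == i] <- predict(lm_fit, newdata = held_out)
  oof$tree_pred[fold == i] <- predict(tree_fit, newdata = held_out)
}

# Meta-learner: a linear model that combines the base predictions.
oof$mpg <- data$mpg
meta_fit <- lm(mpg ~ lm_pred + tree_pred, data = oof)

# To score rows, refit the base learners on all training data and
# pass their predictions to the meta-learner.
lm_full <- lm(mpg ~ wt + hp, data = data)
tree_full <- rpart(mpg ~ wt + hp + disp, data = data,
                   control = rpart.control(minsplit = 5))
new_rows <- data[1:3, ]  # reused training rows, for illustration only
base_preds <- data.frame(
  lm_pred = predict(lm_full, newdata = new_rows),
  tree_pred = predict(tree_full, newdata = new_rows)
)
stacked_pred <- predict(meta_fit, newdata = base_preds)
print(stacked_pred)
```

The key design choice is that the meta-learner only ever sees out-of-fold predictions, so it learns how much to trust each base model on data that model has not seen. In practice you would evaluate the stacked model on a separate held-out set rather than on rows reused from training.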

Here is the deck:

And, the notebooks:

I hope you find it useful.

Long H. Nguyen
