Data Preprocessing in R for Data Science

In data science, data preprocessing is a crucial step in building a machine learning model. Without it, our models will not work properly. Think of it like preparing a farm before planting crops: without proper preparation, planting is difficult and the crop yield suffers, but with it, the rest of the process is easier and we get a good yield.

This is probably going to be the most boring part of this course, but once we are done with it, the rest of the course will be a much smoother ride.

Download the Dataset

We will use a simple dataset stored in a .csv file. You can create the file yourself in Excel using the data below. Note that one Age value and one Salary value are missing.

Country      Age  Salary  Purchased
Netherlands  44   72000   No
Switzerland  27   48000   Yes
France       30   54000   No
Switzerland  38   61000   No
France       40           Yes
Netherlands  35   58000   Yes
Switzerland       52000   No
Netherlands  48   79000   Yes
France       40   83000   No
Netherlands  37   67000   Yes
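
If you would rather build the file from R than from Excel, the sketch below is one way to do it. It types in the ten rows from the table above, marks the two empty cells as NA, writes the result to a CSV file, and reads it back as a check. The file name Data.csv and the use of a temporary folder are assumptions made for illustration; save the file wherever suits your project.

```r
# Build the ten observations from the table above; NA marks the two empty cells.
dataset <- data.frame(
  Country   = c("Netherlands", "Switzerland", "France", "Switzerland", "France",
                "Netherlands", "Switzerland", "Netherlands", "France", "Netherlands"),
  Age       = c(44, 27, 30, 38, 40, 35, NA, 48, 40, 37),
  Salary    = c(72000, 48000, 54000, 61000, NA, 58000, 52000, 79000, 83000, 67000),
  Purchased = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes")
)

# Write it to a CSV file. "Data.csv" is an assumed file name.
# na = "" leaves missing cells empty instead of writing the text "NA".
csv_path <- file.path(tempdir(), "Data.csv")
write.csv(dataset, csv_path, row.names = FALSE, na = "")

# Read the file back and confirm its shape and where values are missing.
check <- read.csv(csv_path)
print(dim(check))
print(colSums(is.na(check)))
```

The read-back step is worth keeping: it confirms that the empty cells come through as missing values rather than as text.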

The table contains four columns and ten observations. Each row records a person's country, age, and salary, and whether that person purchased a certain product from a certain company.

We have two types of variables: independent and dependent. The independent variables are in the first three columns (Country, Age, and Salary), while the dependent variable is the Purchased column. We use the independent variables to predict the dependent variable.
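
To make the split concrete, here is a minimal sketch that separates the independent variables from the dependent one. It inlines four rows of the table as CSV text so it runs on its own. The names X and y are illustrative choices, not something this tutorial prescribes.

```r
# Four rows of the tutorial's table, inlined as CSV text so the snippet is self-contained.
csv_text <- "Country,Age,Salary,Purchased
Netherlands,44,72000,No
Switzerland,27,48000,Yes
France,40,,Yes
Switzerland,,52000,No"
dataset <- read.csv(text = csv_text)

# Independent variables (features): the first three columns.
X <- dataset[, c("Country", "Age", "Salary")]

# Dependent variable (target): the Purchased column.
y <- dataset$Purchased

print(names(X))
print(y)
```

In practice, many R modelling functions take the whole data frame together with a formula such as Purchased ~ ., so an explicit split like this is often optional. It is shown here only to illustrate which columns play which role.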


Importing The Libraries

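Unlike in Python, where most scripts start with import statements, we don't need to load anything extra for this step. R attaches its core packages (such as base, stats, and utils) by default, and RStudio lists the installed packages in its Packages tab. Add-on packages still have to be installed with install.packages() and loaded with library(). We will do that for other required libraries as we continue.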
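
As a rough illustration, the snippet below shows which packages R has attached in the current session and how an add-on package would be installed and loaded. The name somePackage is a placeholder rather than a real package, so those two lines are left commented out.

```r
# Packages attached in the current session (base, stats, utils, and so on).
print(search())

# read.csv lives in the utils package, which is attached by default,
# so it can be called without any library() call.
print("package:utils" %in% search())

# Add-on packages need a one-time install and a library() call in each session.
# "somePackage" is a placeholder, not a real package name:
# install.packages("somePackage")
# library(somePackage)

# installed.packages() lists what is already installed on this machine.
print(head(rownames(installed.packages())))
```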

That's it for this part of Data Preprocessing in R. If you haven't already, check out our previous tutorial on How to Install R and R Studio. If you have, see you in the next one as we continue to learn about Data Science and Machine Learning.

Next: Importing the Dataset in R for Data Science
