Data Preprocessing in R for Data Science

In Data Science, Data Preprocessing is a crucial step in building a Machine Learning model. Without it, our Machine Learning models will not work properly. Think of it as preparing a farm before planting crops. Without proper preparation, planting is difficult and the crop yield suffers. With proper preparation, the rest of the process is easier and we get good yields.

This is probably going to be the most boring part of this course, but once we are done with it, the rest of the course will be a smoother ride.

We will use a simple dataset stored in an Excel file. You can create the Excel file using the data below.

Town      Age  Salary  Purchased
Nairobi   44   72000   No
Mombasa   27   48000   Yes
Kisumu    30   54000   No
Thika     38   61000   No
Nakuru    40           Yes
Naivasha  35   58000   Yes
Kiambu         52000   No
Nyeri     48   79000   Yes
Meru      40   83000   No
Nanyuki   37   67000   Yes

The table contains four columns and ten observations. Each row records a person's town, age, and salary, and whether that person purchased a certain product from a certain company. Two values are missing: the salary for Nakuru and the age for Kiambu.
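If you would rather not create the Excel file yet, the same table can be typed straight into R. The snippet below is only a minimal sketch: the name dataset is our own choice, and we assume the two blank cells should be recorded as NA, which is how R marks a missing value.

```r
# Sketch: the table above entered by hand as a data frame.
# NA marks the two blank cells (Nakuru's salary and Kiambu's age).
dataset <- data.frame(
  Town = c("Nairobi", "Mombasa", "Kisumu", "Thika", "Nakuru",
           "Naivasha", "Kiambu", "Nyeri", "Meru", "Nanyuki"),
  Age = c(44, 27, 30, 38, 40, 35, NA, 48, 40, 37),
  Salary = c(72000, 48000, 54000, 61000, NA,
             58000, 52000, 79000, 83000, 67000),
  Purchased = c("No", "Yes", "No", "No", "Yes",
                "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

str(dataset)              # column types and a preview of the values
colSums(is.na(dataset))   # number of missing values in each column
```

Running colSums(is.na(dataset)) is a quick way to confirm where the gaps are before deciding how to handle them.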

We have two types of variables: independent and dependent. The independent variables are in the first three columns (Town, Age, and Salary), while the dependent variable is the Purchased column. We use the independent variables to predict the dependent variable.
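In R, this split comes down to selecting columns. Here is a small sketch using a few rows from the table. The names features and target are our own labels for illustration, not something R requires.

```r
# A few rows from the table, enough to show the column split
dataset <- data.frame(
  Town = c("Nairobi", "Mombasa", "Kisumu"),
  Age = c(44, 27, 30),
  Salary = c(72000, 48000, 54000),
  Purchased = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Independent variables: the first three columns
features <- dataset[, c("Town", "Age", "Salary")]

# Dependent variable: the Purchased column
target <- dataset$Purchased

print(features)
print(target)
```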


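Importing the Libraries

Unlike in Python, we don’t need to import anything to get started in R. A core set of packages, such as base, stats, and utils, is loaded automatically when R starts, and the Packages tab in RStudio lists every package that is installed. Most of what we need for now is covered by these defaults. Any other package has to be installed once with install.packages() and then loaded with library(); we will do that as the need arises.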
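To see this for yourself, you can ask R which packages are already attached. The sketch below assumes a default R session. The readxl package is only an example of an add-on package, chosen by us for illustration, and its install lines are commented out so that nothing is downloaded when you run the snippet.

```r
# Packages attached when R starts (stats, utils, base, and a few others)
print(search())

# sd() lives in the stats package, which is attached by default,
# so no library() call is needed before using it.
age_sd <- sd(c(44, 27, 30, 38))
print(age_sd)

# Add-on packages are different: install once, then load in each session.
# readxl is only an example here, not a package this tutorial requires.
# install.packages("readxl")   # downloads from CRAN; run once
# library(readxl)              # attaches the package for this session

# Check whether an add-on package is installed, without attaching it
has_readxl <- requireNamespace("readxl", quietly = TRUE)
print(has_readxl)
```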

That’s it for this introduction to Data Preprocessing in R. If you haven’t already, check out our previous tutorial on How to Install R and R Studio. Otherwise, see you in the next one as we continue to learn about Data Science and Machine Learning.

Next, Importing the Dataset in R for Data Science
