In data science, data preprocessing is a crucial step in building a machine learning model. Without it, our models will not work properly. Think of it like preparing a farm before planting crops: without proper preparation, planting is difficult and the crop yield suffers. With proper preparation, the rest of the process is easier and we get a good yield.
This is probably going to be the most boring part of this course, but once we are done with it, the rest of the course will be a smoother ride.
We will use a simple dataset stored in an Excel file. You can create the Excel file using the data below.
The table contains four columns and ten observations. It records whether people from different towns, with different ages and salaries, have purchased a certain product from a certain company.
We have two types of variables: independent and dependent. The independent variables are in the first three columns (Town, Age and Salary), while the dependent variable is in the Purchased column. We use the independent variables to predict the dependent variable.
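To make the split concrete, here is a minimal sketch in R of separating the independent variables from the dependent one. The column names follow the description above, but the rows are invented placeholders, not the tutorial's actual ten observations, and the names dataset, X and y are only illustrative.

```r
# Placeholder rows: these values are invented for illustration only
# and are NOT the tutorial's actual data set.
dataset <- data.frame(
  Town      = c("Town A", "Town B", "Town C", "Town A"),
  Age       = c(44, 27, 30, 38),
  Salary    = c(72000, 48000, 54000, 61000),
  Purchased = c("No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Independent variables (features): the first three columns
X <- dataset[, c("Town", "Age", "Salary")]

# Dependent variable (target): the Purchased column
y <- dataset$Purchased

str(X)    # show the structure of the feature columns
print(y)  # show the values we would try to predict
```

Selecting columns by name rather than by position keeps the split correct even if the column order in the file changes.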
Importing The Libraries
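Unlike in Python, where most scripts start with a list of imports, R attaches its core packages (such as base, stats and utils) automatically at startup, so many of the functions we need are available right away. You can see the installed packages in the Packages tab in RStudio. Any other package has to be installed once and then loaded with library() before it can be used. We are going to install the other required packages as we continue.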
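As a rough illustration of how this works, the sketch below lists the packages attached at startup and then checks for one extra package. The choice of readxl (a package that can read Excel files) and the file name Data.xlsx are assumptions made for this example. The tutorial does not say which package or file name it uses.

```r
# Packages attached by default in a fresh R session (base, stats, utils, ...)
attached <- search()
print(attached)

# Assumption for illustration: readxl is one package that can read Excel files.
# It is not attached by default, so it has to be installed once and then loaded.
pkg <- "readxl"
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (!is_installed) {
  # Installing needs an internet connection, so we only print the command here
  message("Install it once with: install.packages(\"", pkg, "\")")
}

# Hypothetical file name; replace it with the path to your own Excel file
data_file <- "Data.xlsx"

if (is_installed && file.exists(data_file)) {
  dataset <- readxl::read_excel(data_file)  # read the first sheet
  print(head(dataset))
}
```

Calling readxl::read_excel() with the package prefix works without library(readxl). Loading the package with library() simply lets you drop the prefix.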
That’s it on data preprocessing in R. Check out our previous tutorial on how to install R and RStudio if you haven’t already. If you have, see you in the next one as we continue to learn about data science and machine learning.