Your assignment is to analyze the dataset contained in EmployeeData.csv

  

Download EmployeeData.csv, deliver a report, and upload all project files as described below. You can read more about this dataset here. You are required to model the probability of attrition and draw insights from your analysis. In your report, be sure to support your responses.

 

A large company named Canterra employs around 4000 people at any given time. However, every year around 15% of its employees leave the company and need to be replaced from the talent pool available in the job market.
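As a rough check on what these figures imply, the arithmetic can be sketched in R. The headcount of 4000 and the 15% rate are the approximate values quoted above; the variable names are illustrative, and the attrition rate in EmployeeData.csv itself should be computed rather than assumed.

```r
# Approximate figures quoted in the brief; the dataset's own rate may differ.
headcount      <- 4000
attrition_rate <- 0.15

# Expected number of employees to replace each year.
leavers_per_year <- headcount * attrition_rate

# Accuracy of a trivial model that always predicts "No" (nobody leaves).
baseline_accuracy <- 1 - attrition_rate

cat("Leavers per year:", leavers_per_year, "\n")
cat("Always-'No' baseline accuracy:", baseline_accuracy, "\n")
```

If the dataset mirrors this rate, a model that always predicts "No" would already be right about 85% of the time, so accuracy alone is a weak yardstick for the attrition models requested in this assignment.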

 

The dataset contains 18 variables (columns) for 4410 employees. The variables are as follows:

 

Age (in years)

Attrition: Whether the employee left the previous year (Yes, No)

BusinessTravel: How frequently the employee travelled for business purposes last year

DistanceFromHome: Distance from home to location of work (in miles)

Education: Education (1 = ‘Below College’, 2 = ‘College’, 3 = ‘Bachelor’, 4 = ‘Master’, 5 = ‘Doctor’)

EmployeeID

Gender (Female, Male)

JobLevel: Job level at company (scale of 1 to 5, level 1 is lowest and 5 is highest)

MaritalStatus: Marital status of the employee (Single, Married, Divorced)

Income: Annual Income (in $)

NumCompaniesWorked: Number of companies the employee worked at previously

StandardHours: Standard hours of work for the employee

TotalWorkingYears: Total number of years the employee has worked so far

TrainingTimesLastYear: Number of times training was conducted for this employee last year

YearsAtCompany: Total number of years spent at the company by the employee

YearsWithCurrManager: Number of years under current manager

EnvironmentSatisfaction: Satisfaction with Work Environment (1 = ‘Low’, 2 = ‘Medium’, 3 = ‘High’, 4 = ‘Very High’)

JobSatisfaction: Job Satisfaction (1 = ‘Low’, 2 = ‘Medium’, 3 = ‘High’, 4 = ‘Very High’)
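One way to turn the variable list above into column types in R is sketched below. The column names and code labels are taken from the list; the helper name type_employee_columns, the choice to treat the coded scales as ordered factors, and the dropping of single-valued columns are assumptions, not part of the assignment.

```r
# Sketch only: column names come from the variable list; the level orderings
# and the single-value check are assumptions to verify against the real file.
type_employee_columns <- function(df) {
  # Binary outcome: "No" first so that "Yes" (the employee left) is the second level.
  df$Attrition <- factor(df$Attrition, levels = c("No", "Yes"))

  # Unordered categories.
  for (col in c("BusinessTravel", "Gender", "MaritalStatus")) {
    df[[col]] <- factor(df[[col]])
  }

  # Ordinal codes documented in the variable list.
  df$Education <- factor(df$Education, levels = 1:5, ordered = TRUE,
    labels = c("Below College", "College", "Bachelor", "Master", "Doctor"))
  for (col in c("EnvironmentSatisfaction", "JobSatisfaction")) {
    df[[col]] <- factor(df[[col]], levels = 1:4, ordered = TRUE,
      labels = c("Low", "Medium", "High", "Very High"))
  }

  # EmployeeID is an identifier, not a predictor.
  df$EmployeeID <- NULL

  # Drop any column holding a single distinct value; it cannot separate classes.
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  df[, n_distinct > 1, drop = FALSE]
}

# Usage, assuming the file sits in the working directory:
if (file.exists("EmployeeData.csv")) {
  employees <- type_employee_columns(read.csv("EmployeeData.csv"))
  str(employees)
}
```

Distance-based models need numeric inputs, so these factors would still have to be dummy-encoded or converted to integer scores before running KNN or SVM.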

Key Requirements and Questions

 

Pre-process the data and prepare it for running the following classification models (KNN and SVM). (15 points)

Run a K-Nearest Neighbor model to build a predictive model for employees’ attrition. Try out different values of k to find an optimal model. (25 points)

Run a Support Vector Machines model to build a predictive model for employees’ attrition. Try out different model settings to find an optimal model. (25 points)

Compare the best KNN and SVM models by their model evaluation metrics. Which is a better model, and why? (10 points)

Summarize your findings, discuss the pros and cons of your proposed models, and provide your recommendations to management based on insights generated from the above models. (15 points)
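The modelling steps in the questions above could be approached along these lines. This is a minimal sketch, not a model answer: it assumes the class and e1071 packages are installed, and the helper names (prepare_split, class_metrics, compare_models), the 70/30 split, the k grid, and the kernel list are all illustrative choices. For brevity it scores every candidate on the same hold-out set; choosing k and the SVM settings by cross-validation on the training data would give a less optimistic comparison.

```r
library(class)   # provides knn()
library(e1071)   # provides svm()

# Dummy-encode predictors, split into train/test, and scale with training statistics only.
prepare_split <- function(df, outcome = "Attrition", train_frac = 0.7, seed = 1) {
  df[["EmployeeID"]] <- NULL   # identifier, not a predictor
  df <- na.omit(df)            # assumption: dropping incomplete rows is acceptable
  y <- factor(df[[outcome]])
  # model.matrix() expands categorical columns into 0/1 dummies; drop the intercept.
  x <- model.matrix(as.formula(paste(outcome, "~ .")), data = df)[, -1, drop = FALSE]

  set.seed(seed)
  idx <- sample(nrow(x), size = floor(train_frac * nrow(x)))
  mu <- colMeans(x[idx, , drop = FALSE])
  s <- apply(x[idx, , drop = FALSE], 2, sd)
  keep <- s > 0                # constant columns cannot be scaled and carry no signal
  xs <- scale(x[, keep, drop = FALSE], center = mu[keep], scale = s[keep])

  list(x_train = xs[idx, , drop = FALSE], y_train = y[idx],
       x_test = xs[-idx, , drop = FALSE], y_test = y[-idx])
}

# Accuracy, sensitivity, and specificity, treating "Yes" (left) as the positive class.
class_metrics <- function(truth, pred, positive = "Yes") {
  tp <- sum(pred == positive & truth == positive)
  tn <- sum(pred != positive & truth != positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  c(accuracy = (tp + tn) / length(truth),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Fit KNN for each k and SVM for each kernel; return one row of metrics per model.
compare_models <- function(split, k_grid = c(1, 3, 5, 7, 9, 11),
                           kernels = c("linear", "radial")) {
  knn_rows <- lapply(k_grid, function(k) {
    pred <- knn(split$x_train, split$x_test, cl = split$y_train, k = k)
    class_metrics(split$y_test, pred)
  })
  svm_rows <- lapply(kernels, function(kern) {
    # Inputs are already standardized, so svm()'s internal scaling is turned off.
    fit <- svm(x = split$x_train, y = split$y_train, kernel = kern, scale = FALSE)
    class_metrics(split$y_test, predict(fit, split$x_test))
  })
  out <- do.call(rbind, c(knn_rows, svm_rows))
  rownames(out) <- c(paste0("knn_k", k_grid), paste0("svm_", kernels))
  out
}

# Usage, assuming the file sits in the working directory:
if (file.exists("EmployeeData.csv")) {
  split <- prepare_split(read.csv("EmployeeData.csv"))
  print(round(compare_models(split), 3))
}
```

Scaling matters here because both KNN and SVM depend on distances between observations: without it, a dollar-valued column such as Income would dominate columns measured on 1 to 5 scales. Sensitivity is reported alongside accuracy because leavers are the minority class.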

Deliverables

 

A written report answering the questions and explaining your findings (submitted as a PDF document). This report should clearly define the problem statement, data processing steps, your approach and any assumptions you make, the results of analyses you have performed, and the managerial insights you have gained by performing these analyses. In writing it, imagine that you are a consultant submitting the report to your client. This report should not be more than 3 pages long. If needed, you can add an appendix of up to 2 pages with supporting exhibits. Include your names at the top of the first page, or add a cover page with the names.

Your fully functional and annotated R code(s) with proper name(s). In your R file(s), highlight the different steps/questions by including annotations or section titles. Also, make sure your submitted code runs without errors when given only the EmployeeData.csv file as input.

Rubric:

 

Your project will be graded based on:

 

Timeliness:

 

Submitting the complete assignment on or before the deadline. (Faculty may decline late submissions or penalize them in other ways.)

Analysis and Recommendations:

 

Clearly stating the scope and objectives.

Answering all study questions clearly and completely.

Demonstrating appropriate use of concepts and techniques.

Depth of the analysis.

Clarity and quality of the findings and recommendations.

To summarize:

 

Correctness of your approach, your code, answering the questions, and following the steps. (90 points)

Format of the report and your R code. (10 points)
