{ "cells": [ { "cell_type": "markdown", "id": "871b7ef6", "metadata": {}, "source": [ "## Problem Statement:\n", "1. Identify why attrition is high.\n", "2. Identify the root cause of attrition." ] }, { "cell_type": "code", "execution_count": 2, "id": "01798366", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:42:02.179675Z", "start_time": "2022-10-15T15:42:00.905817Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": 3, "id": "cd7b2e44", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:42:05.411725Z", "start_time": "2022-10-15T15:42:03.725746Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>EmployeeNumber</th><th>Attrition</th><th>Age</th><th>BusinessTravel</th><th>DailyRate</th><th>Department</th><th>DistanceFromHome</th><th>Education</th><th>EducationField</th><th>EmployeeCount</th><th>...</th><th>RelationshipSatisfaction</th><th>StandardHours</th><th>StockOptionLevel</th><th>TotalWorkingYears</th><th>TrainingTimesLastYear</th><th>WorkLifeBalance</th><th>YearsAtCompany</th><th>YearsInCurrentRole</th><th>YearsSinceLastPromotion</th><th>YearsWithCurrManager</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>Yes</td><td>41</td><td>Travel_Rarely</td><td>1102</td><td>Sales</td><td>1</td><td>2</td><td>Life Sciences</td><td>1</td><td>...</td><td>1</td><td>80</td><td>0</td><td>8</td><td>0</td><td>1</td><td>6</td><td>4</td><td>0</td><td>5</td></tr>\n",
"<tr><th>1</th><td>2</td><td>No</td><td>49</td><td>Travel_Frequently</td><td>279</td><td>Research &amp; Development</td><td>8</td><td>1</td><td>Life Sciences</td><td>1</td><td>...</td><td>4</td><td>80</td><td>1</td><td>10</td><td>3</td><td>3</td><td>10</td><td>7</td><td>1</td><td>7</td></tr>\n",
"<tr><th>2</th><td>3</td><td>Yes</td><td>37</td><td>Travel_Rarely</td><td>1373</td><td>Research &amp; Development</td><td>2</td><td>2</td><td>Other</td><td>1</td><td>...</td><td>2</td><td>80</td><td>0</td><td>7</td><td>3</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>3</th><td>4</td><td>No</td><td>33</td><td>Travel_Frequently</td><td>1392</td><td>Research &amp; Development</td><td>3</td><td>4</td><td>Life Sciences</td><td>1</td><td>...</td><td>3</td><td>80</td><td>0</td><td>8</td><td>3</td><td>3</td><td>8</td><td>7</td><td>3</td><td>0</td></tr>\n",
"<tr><th>4</th><td>5</td><td>No</td><td>27</td><td>Travel_Rarely</td><td>591</td><td>Research &amp; Development</td><td>2</td><td>1</td><td>Medical</td><td>1</td><td>...</td><td>4</td><td>80</td><td>1</td><td>6</td><td>3</td><td>3</td><td>2</td><td>2</td><td>2</td><td>2</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows × 35 columns</p>\n",
"</div>
" ], "text/plain": [ " EmployeeNumber Attrition Age BusinessTravel DailyRate \\\n", "0 1 Yes 41 Travel_Rarely 1102 \n", "1 2 No 49 Travel_Frequently 279 \n", "2 3 Yes 37 Travel_Rarely 1373 \n", "3 4 No 33 Travel_Frequently 1392 \n", "4 5 No 27 Travel_Rarely 591 \n", "\n", " Department DistanceFromHome Education EducationField \\\n", "0 Sales 1 2 Life Sciences \n", "1 Research & Development 8 1 Life Sciences \n", "2 Research & Development 2 2 Other \n", "3 Research & Development 3 4 Life Sciences \n", "4 Research & Development 2 1 Medical \n", "\n", " EmployeeCount ... RelationshipSatisfaction StandardHours \\\n", "0 1 ... 1 80 \n", "1 1 ... 4 80 \n", "2 1 ... 2 80 \n", "3 1 ... 3 80 \n", "4 1 ... 4 80 \n", "\n", " StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance \\\n", "0 0 8 0 1 \n", "1 1 10 3 3 \n", "2 0 7 3 3 \n", "3 0 8 3 3 \n", "4 1 6 3 3 \n", "\n", " YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion \\\n", "0 6 4 0 \n", "1 10 7 1 \n", "2 0 0 0 \n", "3 8 7 3 \n", "4 2 2 2 \n", "\n", " YearsWithCurrManager \n", "0 5 \n", "1 7 \n", "2 0 \n", "3 0 \n", "4 2 \n", "\n", "[5 rows x 35 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset = pd.read_excel(\"/Users/rishavdas/Downloads/Data/HR_Employee_Attrition-1.xlsx\")\n", "dataset.head()" ] }, { "cell_type": "code", "execution_count": 3, "id": "923e5551", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:42:09.096194Z", "start_time": "2022-10-15T15:42:09.091856Z" } }, "outputs": [ { "data": { "text/plain": [ "(2940, 35)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.shape" ] }, { "cell_type": "code", "execution_count": 4, "id": "efbe0080", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:42:21.673316Z", "start_time": "2022-10-15T15:42:21.653053Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2940 entries, 0 to 
2939\n", "Data columns (total 35 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 EmployeeNumber 2940 non-null int64 \n", " 1 Attrition 2940 non-null object\n", " 2 Age 2940 non-null int64 \n", " 3 BusinessTravel 2940 non-null object\n", " 4 DailyRate 2940 non-null int64 \n", " 5 Department 2940 non-null object\n", " 6 DistanceFromHome 2940 non-null int64 \n", " 7 Education 2940 non-null int64 \n", " 8 EducationField 2940 non-null object\n", " 9 EmployeeCount 2940 non-null int64 \n", " 10 EnvironmentSatisfaction 2940 non-null int64 \n", " 11 Gender 2940 non-null object\n", " 12 HourlyRate 2940 non-null int64 \n", " 13 JobInvolvement 2940 non-null int64 \n", " 14 JobLevel 2940 non-null int64 \n", " 15 JobRole 2940 non-null object\n", " 16 JobSatisfaction 2940 non-null int64 \n", " 17 MaritalStatus 2940 non-null object\n", " 18 MonthlyIncome 2940 non-null int64 \n", " 19 MonthlyRate 2940 non-null int64 \n", " 20 NumCompaniesWorked 2940 non-null int64 \n", " 21 Over18 2940 non-null object\n", " 22 OverTime 2940 non-null object\n", " 23 PercentSalaryHike 2940 non-null int64 \n", " 24 PerformanceRating 2940 non-null int64 \n", " 25 RelationshipSatisfaction 2940 non-null int64 \n", " 26 StandardHours 2940 non-null int64 \n", " 27 StockOptionLevel 2940 non-null int64 \n", " 28 TotalWorkingYears 2940 non-null int64 \n", " 29 TrainingTimesLastYear 2940 non-null int64 \n", " 30 WorkLifeBalance 2940 non-null int64 \n", " 31 YearsAtCompany 2940 non-null int64 \n", " 32 YearsInCurrentRole 2940 non-null int64 \n", " 33 YearsSinceLastPromotion 2940 non-null int64 \n", " 34 YearsWithCurrManager 2940 non-null int64 \n", "dtypes: int64(26), object(9)\n", "memory usage: 804.0+ KB\n" ] } ], "source": [ "dataset.info()" ] }, { "cell_type": "code", "execution_count": 5, "id": "91c9c990", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:42:37.849973Z", "start_time": "2022-10-15T15:42:37.836598Z" } }, "outputs": [ { "data": { 
"text/plain": [ "EmployeeNumber 0\n", "Attrition 0\n", "Age 0\n", "BusinessTravel 0\n", "DailyRate 0\n", "Department 0\n", "DistanceFromHome 0\n", "Education 0\n", "EducationField 0\n", "EmployeeCount 0\n", "EnvironmentSatisfaction 0\n", "Gender 0\n", "HourlyRate 0\n", "JobInvolvement 0\n", "JobLevel 0\n", "JobRole 0\n", "JobSatisfaction 0\n", "MaritalStatus 0\n", "MonthlyIncome 0\n", "MonthlyRate 0\n", "NumCompaniesWorked 0\n", "Over18 0\n", "OverTime 0\n", "PercentSalaryHike 0\n", "PerformanceRating 0\n", "RelationshipSatisfaction 0\n", "StandardHours 0\n", "StockOptionLevel 0\n", "TotalWorkingYears 0\n", "TrainingTimesLastYear 0\n", "WorkLifeBalance 0\n", "YearsAtCompany 0\n", "YearsInCurrentRole 0\n", "YearsSinceLastPromotion 0\n", "YearsWithCurrManager 0\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 5, "id": "2724bf53", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['EmployeeNumber', 'Attrition', 'Age', 'BusinessTravel', 'DailyRate',\n", " 'Department', 'DistanceFromHome', 'Education', 'EducationField',\n", " 'EmployeeCount', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',\n", " 'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',\n", " 'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',\n", " 'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',\n", " 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',\n", " 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',\n", " 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',\n", " 'YearsWithCurrManager'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.columns" ] }, { "cell_type": "code", "execution_count": 6, "id": "75f464ad", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:44:23.384694Z", 
"start_time": "2022-10-15T15:44:23.366885Z" } }, "outputs": [], "source": [ "for i in dataset.select_dtypes('object').columns:\n", " dataset[i] = dataset[i].astype('category')" ] }, { "cell_type": "code", "execution_count": 7, "id": "555ea9f1", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T15:44:27.182415Z", "start_time": "2022-10-15T15:44:27.149923Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2940 entries, 0 to 2939\n", "Data columns (total 35 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 EmployeeNumber 2940 non-null int64 \n", " 1 Attrition 2940 non-null category\n", " 2 Age 2940 non-null int64 \n", " 3 BusinessTravel 2940 non-null category\n", " 4 DailyRate 2940 non-null int64 \n", " 5 Department 2940 non-null category\n", " 6 DistanceFromHome 2940 non-null int64 \n", " 7 Education 2940 non-null int64 \n", " 8 EducationField 2940 non-null category\n", " 9 EmployeeCount 2940 non-null int64 \n", " 10 EnvironmentSatisfaction 2940 non-null int64 \n", " 11 Gender 2940 non-null category\n", " 12 HourlyRate 2940 non-null int64 \n", " 13 JobInvolvement 2940 non-null int64 \n", " 14 JobLevel 2940 non-null int64 \n", " 15 JobRole 2940 non-null category\n", " 16 JobSatisfaction 2940 non-null int64 \n", " 17 MaritalStatus 2940 non-null category\n", " 18 MonthlyIncome 2940 non-null int64 \n", " 19 MonthlyRate 2940 non-null int64 \n", " 20 NumCompaniesWorked 2940 non-null int64 \n", " 21 Over18 2940 non-null category\n", " 22 OverTime 2940 non-null category\n", " 23 PercentSalaryHike 2940 non-null int64 \n", " 24 PerformanceRating 2940 non-null int64 \n", " 25 RelationshipSatisfaction 2940 non-null int64 \n", " 26 StandardHours 2940 non-null int64 \n", " 27 StockOptionLevel 2940 non-null int64 \n", " 28 TotalWorkingYears 2940 non-null int64 \n", " 29 TrainingTimesLastYear 2940 non-null int64 \n", " 30 WorkLifeBalance 2940 non-null int64 \n", " 31 YearsAtCompany 2940 
non-null int64 \n", " 32 YearsInCurrentRole 2940 non-null int64 \n", " 33 YearsSinceLastPromotion 2940 non-null int64 \n", " 34 YearsWithCurrManager 2940 non-null int64 \n", "dtypes: category(9), int64(26)\n", "memory usage: 624.6 KB\n" ] } ], "source": [ "dataset.info()" ] }, { "cell_type": "code", "execution_count": 7, "id": "904c15d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>EmployeeNumber</th><td>2940.0</td><td>1470.500000</td><td>848.849221</td><td>1.0</td><td>735.75</td><td>1470.5</td><td>2205.25</td><td>2940.0</td></tr>\n",
"<tr><th>Age</th><td>2940.0</td><td>36.923810</td><td>9.133819</td><td>18.0</td><td>30.00</td><td>36.0</td><td>43.00</td><td>60.0</td></tr>\n",
"<tr><th>DailyRate</th><td>2940.0</td><td>802.485714</td><td>403.440447</td><td>102.0</td><td>465.00</td><td>802.0</td><td>1157.00</td><td>1499.0</td></tr>\n",
"<tr><th>DistanceFromHome</th><td>2940.0</td><td>9.192517</td><td>8.105485</td><td>1.0</td><td>2.00</td><td>7.0</td><td>14.00</td><td>29.0</td></tr>\n",
"<tr><th>Education</th><td>2940.0</td><td>2.912925</td><td>1.023991</td><td>1.0</td><td>2.00</td><td>3.0</td><td>4.00</td><td>5.0</td></tr>\n",
"<tr><th>EmployeeCount</th><td>2940.0</td><td>1.000000</td><td>0.000000</td><td>1.0</td><td>1.00</td><td>1.0</td><td>1.00</td><td>1.0</td></tr>\n",
"<tr><th>EnvironmentSatisfaction</th><td>2940.0</td><td>2.721769</td><td>1.092896</td><td>1.0</td><td>2.00</td><td>3.0</td><td>4.00</td><td>4.0</td></tr>\n",
"<tr><th>HourlyRate</th><td>2940.0</td><td>65.891156</td><td>20.325969</td><td>30.0</td><td>48.00</td><td>66.0</td><td>84.00</td><td>100.0</td></tr>\n",
"<tr><th>JobInvolvement</th><td>2940.0</td><td>2.729932</td><td>0.711440</td><td>1.0</td><td>2.00</td><td>3.0</td><td>3.00</td><td>4.0</td></tr>\n",
"<tr><th>JobLevel</th><td>2940.0</td><td>2.063946</td><td>1.106752</td><td>1.0</td><td>1.00</td><td>2.0</td><td>3.00</td><td>5.0</td></tr>\n",
"<tr><th>JobSatisfaction</th><td>2940.0</td><td>2.728571</td><td>1.102658</td><td>1.0</td><td>2.00</td><td>3.0</td><td>4.00</td><td>4.0</td></tr>\n",
"<tr><th>MonthlyIncome</th><td>2940.0</td><td>6502.931293</td><td>4707.155770</td><td>1009.0</td><td>2911.00</td><td>4919.0</td><td>8380.00</td><td>19999.0</td></tr>\n",
"<tr><th>MonthlyRate</th><td>2940.0</td><td>14313.103401</td><td>7116.575021</td><td>2094.0</td><td>8045.00</td><td>14235.5</td><td>20462.00</td><td>26999.0</td></tr>\n",
"<tr><th>NumCompaniesWorked</th><td>2940.0</td><td>2.693197</td><td>2.497584</td><td>0.0</td><td>1.00</td><td>2.0</td><td>4.00</td><td>9.0</td></tr>\n",
"<tr><th>PercentSalaryHike</th><td>2940.0</td><td>15.209524</td><td>3.659315</td><td>11.0</td><td>12.00</td><td>14.0</td><td>18.00</td><td>25.0</td></tr>\n",
"<tr><th>PerformanceRating</th><td>2940.0</td><td>3.153741</td><td>0.360762</td><td>3.0</td><td>3.00</td><td>3.0</td><td>3.00</td><td>4.0</td></tr>\n",
"<tr><th>RelationshipSatisfaction</th><td>2940.0</td><td>2.712245</td><td>1.081025</td><td>1.0</td><td>2.00</td><td>3.0</td><td>4.00</td><td>4.0</td></tr>\n",
"<tr><th>StandardHours</th><td>2940.0</td><td>80.000000</td><td>0.000000</td><td>80.0</td><td>80.00</td><td>80.0</td><td>80.00</td><td>80.0</td></tr>\n",
"<tr><th>StockOptionLevel</th><td>2940.0</td><td>0.793878</td><td>0.851932</td><td>0.0</td><td>0.00</td><td>1.0</td><td>1.00</td><td>3.0</td></tr>\n",
"<tr><th>TotalWorkingYears</th><td>2940.0</td><td>11.279592</td><td>7.779458</td><td>0.0</td><td>6.00</td><td>10.0</td><td>15.00</td><td>40.0</td></tr>\n",
"<tr><th>TrainingTimesLastYear</th><td>2940.0</td><td>2.799320</td><td>1.289051</td><td>0.0</td><td>2.00</td><td>3.0</td><td>3.00</td><td>6.0</td></tr>\n",
"<tr><th>WorkLifeBalance</th><td>2940.0</td><td>2.761224</td><td>0.706356</td><td>1.0</td><td>2.00</td><td>3.0</td><td>3.00</td><td>4.0</td></tr>\n",
"<tr><th>YearsAtCompany</th><td>2940.0</td><td>7.008163</td><td>6.125483</td><td>0.0</td><td>3.00</td><td>5.0</td><td>9.00</td><td>40.0</td></tr>\n",
"<tr><th>YearsInCurrentRole</th><td>2940.0</td><td>4.229252</td><td>3.622521</td><td>0.0</td><td>2.00</td><td>3.0</td><td>7.00</td><td>18.0</td></tr>\n",
"<tr><th>YearsSinceLastPromotion</th><td>2940.0</td><td>2.187755</td><td>3.221882</td><td>0.0</td><td>0.00</td><td>1.0</td><td>3.00</td><td>15.0</td></tr>\n",
"<tr><th>YearsWithCurrManager</th><td>2940.0</td><td>4.123129</td><td>3.567529</td><td>0.0</td><td>2.00</td><td>3.0</td><td>7.00</td><td>17.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count mean std min 25% \\\n", "EmployeeNumber 2940.0 1470.500000 848.849221 1.0 735.75 \n", "Age 2940.0 36.923810 9.133819 18.0 30.00 \n", "DailyRate 2940.0 802.485714 403.440447 102.0 465.00 \n", "DistanceFromHome 2940.0 9.192517 8.105485 1.0 2.00 \n", "Education 2940.0 2.912925 1.023991 1.0 2.00 \n", "EmployeeCount 2940.0 1.000000 0.000000 1.0 1.00 \n", "EnvironmentSatisfaction 2940.0 2.721769 1.092896 1.0 2.00 \n", "HourlyRate 2940.0 65.891156 20.325969 30.0 48.00 \n", "JobInvolvement 2940.0 2.729932 0.711440 1.0 2.00 \n", "JobLevel 2940.0 2.063946 1.106752 1.0 1.00 \n", "JobSatisfaction 2940.0 2.728571 1.102658 1.0 2.00 \n", "MonthlyIncome 2940.0 6502.931293 4707.155770 1009.0 2911.00 \n", "MonthlyRate 2940.0 14313.103401 7116.575021 2094.0 8045.00 \n", "NumCompaniesWorked 2940.0 2.693197 2.497584 0.0 1.00 \n", "PercentSalaryHike 2940.0 15.209524 3.659315 11.0 12.00 \n", "PerformanceRating 2940.0 3.153741 0.360762 3.0 3.00 \n", "RelationshipSatisfaction 2940.0 2.712245 1.081025 1.0 2.00 \n", "StandardHours 2940.0 80.000000 0.000000 80.0 80.00 \n", "StockOptionLevel 2940.0 0.793878 0.851932 0.0 0.00 \n", "TotalWorkingYears 2940.0 11.279592 7.779458 0.0 6.00 \n", "TrainingTimesLastYear 2940.0 2.799320 1.289051 0.0 2.00 \n", "WorkLifeBalance 2940.0 2.761224 0.706356 1.0 2.00 \n", "YearsAtCompany 2940.0 7.008163 6.125483 0.0 3.00 \n", "YearsInCurrentRole 2940.0 4.229252 3.622521 0.0 2.00 \n", "YearsSinceLastPromotion 2940.0 2.187755 3.221882 0.0 0.00 \n", "YearsWithCurrManager 2940.0 4.123129 3.567529 0.0 2.00 \n", "\n", " 50% 75% max \n", "EmployeeNumber 1470.5 2205.25 2940.0 \n", "Age 36.0 43.00 60.0 \n", "DailyRate 802.0 1157.00 1499.0 \n", "DistanceFromHome 7.0 14.00 29.0 \n", "Education 3.0 4.00 5.0 \n", "EmployeeCount 1.0 1.00 1.0 \n", "EnvironmentSatisfaction 3.0 4.00 4.0 \n", "HourlyRate 66.0 84.00 100.0 \n", "JobInvolvement 3.0 3.00 4.0 \n", "JobLevel 2.0 3.00 5.0 \n", "JobSatisfaction 3.0 4.00 4.0 \n", "MonthlyIncome 4919.0 8380.00 
19999.0 \n", "MonthlyRate 14235.5 20462.00 26999.0 \n", "NumCompaniesWorked 2.0 4.00 9.0 \n", "PercentSalaryHike 14.0 18.00 25.0 \n", "PerformanceRating 3.0 3.00 4.0 \n", "RelationshipSatisfaction 3.0 4.00 4.0 \n", "StandardHours 80.0 80.00 80.0 \n", "StockOptionLevel 1.0 1.00 3.0 \n", "TotalWorkingYears 10.0 15.00 40.0 \n", "TrainingTimesLastYear 3.0 3.00 6.0 \n", "WorkLifeBalance 3.0 3.00 4.0 \n", "YearsAtCompany 5.0 9.00 40.0 \n", "YearsInCurrentRole 3.0 7.00 18.0 \n", "YearsSinceLastPromotion 1.0 3.00 15.0 \n", "YearsWithCurrManager 3.0 7.00 17.0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.describe().T" ] }, { "cell_type": "markdown", "id": "52c439c5", "metadata": {}, "source": [ "Three steps of analysis:\n", "1. Univariate analysis\n", "2. Bivariate analysis\n", "3. Multivariate analysis" ] }, { "cell_type": "code", "execution_count": 9, "id": "f2368047", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "No 2466\n", "Yes 474\n", "Name: Attrition, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.Attrition.value_counts()" ] }, { "cell_type": "code", "execution_count": 8, "id": "e3ffe151", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "No 83.88%\n", "Yes 16.12%\n", "Name: Attrition, dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.Attrition.value_counts(normalize=True).mul(100).round(2).astype('str') + '%'" ] }, { "cell_type": "code", "execution_count": null, "id": "d024353e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ee9aabb0", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "29acd312", "metadata": {}, "source": [ "## Univariate Continuous Data Analysis" ] }, { "cell_type": "code", "execution_count": 10, "id": "3ab7a0e3", 
"metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:09:21.619640Z", "start_time": "2022-10-15T16:09:21.607328Z" } }, "outputs": [], "source": [ "def continuos_univariate_analysis(data,\n", " feature,\n", " figsize=(12, 8),\n", " kde=False,\n", " ):\n", " f1, (ax_box,\n", " ax_hist) = plt.subplots(nrows=2,\n", " sharex=True,\n", " gridspec_kw={'height_ratios': (0.25, 0.75)},\n", " figsize=figsize)\n", " sns.color_palette(\"viridis\", as_cmap=True)\n", " sns.boxplot(data=data,\n", " x=feature,\n", " ax=ax_box,\n", " showmeans=True,\n", " color='yellow')\n", " sns.histplot(data=data, x=feature, ax=ax_hist, kde=kde, color='blue')\n", " ax_hist.axvline(data[feature].mean(), color='cyan', linestyle='--')\n", " ax_hist.axvline(data[feature].median(), color='orange', linestyle=\"-\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "d4755681", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:09:23.825296Z", "start_time": "2022-10-15T16:09:23.578694Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "continuos_univariate_analysis(dataset, 'Age', kde=True)" ] }, { "cell_type": "markdown", "id": "8e6a4ab4", "metadata": {}, "source": [ "## Observations:\n", "1. Age is normally distributed \n", "2. Majority of the employees do have age between 28 - 43 years \n", "3. Slightly left skewed \n", "4. no outliers observed \n", "5. the orgnaization has more young employees\n", "6. Avg Age is 36 years \n", "7. No such significant difference in age observed for those who are leaving the organization \n" ] }, { "cell_type": "code", "execution_count": 40, "id": "a00df48d", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:39:12.843805Z", "start_time": "2022-10-15T16:39:12.831347Z" } }, "outputs": [ { "data": { "text/plain": [ "No 83.9%\n", "Yes 16.1%\n", "Name: Attrition, dtype: object" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.Attrition.value_counts(normalize=True).mul(100).round(1).astype('str')+'%'" ] }, { "cell_type": "code", "execution_count": 41, "id": "1fd98c65", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:41:41.425154Z", "start_time": "2022-10-15T16:41:41.419101Z" } }, "outputs": [ { "data": { "text/plain": [ "Attrition\n", "No 37.561233\n", "Yes 33.607595\n", "Name: Age, dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.groupby(['Attrition'])['Age'].mean()" ] }, { "cell_type": "code", "execution_count": 42, "id": "b9ede37e", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:43:22.095611Z", "start_time": "2022-10-15T16:43:21.821407Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "continuos_univariate_analysis(dataset, 'MonthlyIncome', kde=True)" ] }, { "cell_type": "code", "execution_count": 14, "id": "69933d3d", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:50:16.477680Z", "start_time": "2022-10-15T16:50:16.465548Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>EmployeeNumber</th></tr>\n",
"<tr><th>JobRole</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Manager</th><td>162</td></tr>\n",
"<tr><th>Research Director</th><td>90</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " EmployeeNumber\n", "JobRole \n", "Manager 162\n", "Research Director 90" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[dataset['MonthlyIncome']>16000].groupby('JobRole').agg({'EmployeeNumber':'count'})" ] }, { "cell_type": "code", "execution_count": 15, "id": "0fd1a1a4", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T16:58:28.042867Z", "start_time": "2022-10-15T16:58:28.015763Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th></th><th>MonthlyIncome</th></tr>\n",
"<tr><th>Attrition</th><th>JobRole</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th rowspan=\"2\" valign=\"top\">Yes</th><th>Manager</th><td>19509.333333</td></tr>\n",
"<tr><th>Research Director</th><td>19395.500000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " MonthlyIncome\n", "Attrition JobRole \n", "Yes Manager 19509.333333\n", " Research Director 19395.500000" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[(dataset['MonthlyIncome'] > 16000)\n", " & (dataset['Attrition'] == 'Yes')].groupby(\n", " ['Attrition', 'JobRole']).agg({'MonthlyIncome': 'mean'})" ] }, { "cell_type": "code", "execution_count": 19, "id": "e9460b5c", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:02:50.474820Z", "start_time": "2022-10-15T17:02:50.446894Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th>Attrition</th><th>No</th><th>Yes</th></tr>\n",
"<tr><th>JobRole</th><th></th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>Healthcare Representative</th><td>7453.557377</td><td>8548.222222</td></tr>\n",
"<tr><th>Human Resources</th><td>4391.750000</td><td>3715.750000</td></tr>\n",
"<tr><th>Laboratory Technician</th><td>3337.223350</td><td>2919.258065</td></tr>\n",
"<tr><th>Manager</th><td>13498.473684</td><td>12729.500000</td></tr>\n",
"<tr><th>Manufacturing Director</th><td>7289.925926</td><td>7365.500000</td></tr>\n",
"<tr><th>Research Director</th><td>13158.600000</td><td>NaN</td></tr>\n",
"<tr><th>Research Scientist</th><td>3328.122449</td><td>2780.468085</td></tr>\n",
"<tr><th>Sales Executive</th><td>6804.617100</td><td>7489.000000</td></tr>\n",
"<tr><th>Sales Representative</th><td>2798.440000</td><td>2364.727273</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ "Attrition No Yes\n", "JobRole \n", "Healthcare Representative 7453.557377 8548.222222\n", "Human Resources 4391.750000 3715.750000\n", "Laboratory Technician 3337.223350 2919.258065\n", "Manager 13498.473684 12729.500000\n", "Manufacturing Director 7289.925926 7365.500000\n", "Research Director 13158.600000 NaN\n", "Research Scientist 3328.122449 2780.468085\n", "Sales Executive 6804.617100 7489.000000\n", "Sales Representative 2798.440000 2364.727273" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[dataset['MonthlyIncome']<=16000].groupby(['Attrition', 'JobRole']).agg({'MonthlyIncome':'mean'}).reset_index()\\\n", " .pivot_table(index='JobRole', columns='Attrition', values='MonthlyIncome')" ] }, { "cell_type": "code", "execution_count": 18, "id": "7b789cce", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:07:26.393828Z", "start_time": "2022-10-15T17:07:26.378188Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>MonthlyIncome</th><th>EmployeeNumber</th></tr>\n",
"<tr><th>Attrition</th><th></th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>No</th><td>5601.936151</td><td>2224</td></tr>\n",
"<tr><th>Yes</th><td>4470.784483</td><td>464</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " MonthlyIncome EmployeeNumber\n", "Attrition \n", "No 5601.936151 2224\n", "Yes 4470.784483 464" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[dataset['MonthlyIncome'] <= 16000].groupby(['Attrition']).agg({\n", " 'MonthlyIncome':\n", " 'mean',\n", " 'EmployeeNumber':\n", " 'count'\n", "})" ] }, { "cell_type": "markdown", "id": "6eb23a68", "metadata": {}, "source": [ "## Observation:\n", "1. Majority of the employees are earning between 2900 dollars to 8000 dollar per month \n", "2. On avg, monthly income is 6000 dollars\n", "3. Those who are earning more than 16000 dollars are treated as outliers \n", "4. 242 employees are earning more than 16000 dollars \n", "5. less than 1% attrition happend for higher salary bracket \n", "6. Those are in higher salary bracket, mostly they are from leadership stacks (Manager/Directors)\n", "7. On avg, those who left the organization were earning less amount of 800 dollars than others \n", "**MonthlyIncome impacts Attrition linearly**" ] }, { "cell_type": "code", "execution_count": 51, "id": "c5778a32", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:09:50.974427Z", "start_time": "2022-10-15T17:09:50.704153Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "continuos_univariate_analysis(dataset, 'DistanceFromHome', kde=True)" ] }, { "cell_type": "code", "execution_count": 20, "id": "ccb378a4", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "continuos_univariate_analysis(dataset, 'MonthlyRate', kde=True)" ] }, { "cell_type": "code", "execution_count": 52, "id": "20c5056c", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:13:43.784244Z", "start_time": "2022-10-15T17:13:43.765193Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>DistanceFromHome</th><th>Age</th></tr>\n",
"<tr><th>Attrition</th><th></th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>No</th><td>8.915653</td><td>37.561233</td></tr>\n",
"<tr><th>Yes</th><td>10.632911</td><td>33.607595</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " DistanceFromHome Age\n", "Attrition \n", "No 8.915653 37.561233\n", "Yes 10.632911 33.607595" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.groupby(['Attrition']).agg({'DistanceFromHome':'mean', 'Age':'mean'})" ] }, { "cell_type": "markdown", "id": "c46f32d9", "metadata": {}, "source": [ "**Binning Technique**" ] }, { "cell_type": "code", "execution_count": 69, "id": "bed5c524", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:27:10.396623Z", "start_time": "2022-10-15T17:27:10.378366Z" } }, "outputs": [ { "data": { "text/plain": [ "(-0.001, 10.0] 2052\n", "(10.0, 15.0] 230\n", "(15.0, 30.0] 658\n", "Name: DistanceFromHome, dtype: int64" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset['DistanceFromHome'].value_counts(bins=[0,10, 15, 30], sort=False)" ] }, { "cell_type": "code", "execution_count": 66, "id": "d0997be0", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:25:22.857280Z", "start_time": "2022-10-15T17:25:22.848450Z" } }, "outputs": [], "source": [ "bins = [0, 10, 15, 30]\n", "labels = [\"Short_distance\", \"Moderate_distance\", \"Higher_distance\"]\n", "dataset['distance_class'] = pd.cut(x=dataset['DistanceFromHome'],\n", " bins=bins,\n", " labels=labels,\n", " include_lowest=True)" ] }, { "cell_type": "code", "execution_count": 70, "id": "c8fff089", "metadata": { "ExecuteTime": { "end_time": "2022-10-15T17:28:11.433581Z", "start_time": "2022-10-15T17:28:11.406077Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th></th><th>EmployeeNumber</th><th>MonthlyIncome</th></tr>\n",
"<tr><th>Attrition</th><th>distance_class</th><th></th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th rowspan=\"3\" valign=\"top\">No</th><th>Short_distance</th><td>1764</td><td>6875.893424</td></tr>\n",
"<tr><th>Moderate_distance</th><td>180</td><td>6796.722222</td></tr>\n",
"<tr><th>Higher_distance</th><td>522</td><td>6699.329502</td></tr>\n",
"<tr><th rowspan=\"3\" valign=\"top\">Yes</th><th>Short_distance</th><td>288</td><td>4557.256944</td></tr>\n",
"<tr><th>Moderate_distance</th><td>50</td><td>6402.760000</td></tr>\n",
"<tr><th>Higher_distance</th><td>136</td><td>4679.808824</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " EmployeeNumber MonthlyIncome\n", "Attrition distance_class \n", "No Short_distance 1764 6875.893424\n", " Moderate_distance 180 6796.722222\n", " Higher_distance 522 6699.329502\n", "Yes Short_distance 288 4557.256944\n", " Moderate_distance 50 6402.760000\n", " Higher_distance 136 4679.808824" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.groupby(['Attrition', 'distance_class']).agg({'EmployeeNumber':'count', 'MonthlyIncome':'mean'})" ] }, { "cell_type": "markdown", "id": "3a10dbe2", "metadata": {}, "source": [ "## Observation :\n", "\n", "1. Majority of them live between 0 - 10 km of distance from office location \n", "2. very less of number of employees are staying far from office \n", "3. No significant distance observed between those who left and those who stayed(only 2km on avg greater than those who left) \n", "4. Those who stay far away/nearby from the office with lower salary than average are likely to leave the organization \n", "\n", "**Distance with lower income matters**" ] }, { "cell_type": "code", "execution_count": null, "id": "d3431d04", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.8 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "vscode": { "interpreter": { "hash": "ecd35c037bd360a9223f97d1b9b8f2c86e12889559e066ee9e282756f5cb5240" } } }, "nbformat": 4, "nbformat_minor": 5 }