Human Resources Optimised with Advanced Analytics

Today we are going to present some insights into employees’ job satisfaction, obtained with Advanced Analytics tools and techniques. As the source for this study, we use the data made available at this link by data scientist Ludovic Benistant, who carried out important anonymization work. Some images contain Brazilian Portuguese words, sorry about that! Let’s go!

Research Questions

Following the DCIM (Data Culture Introduction Methodology) to guide this research, we came up with the following questions:

  • What factors have the greatest influence on employee satisfaction?
  • What are the main satisfaction scenarios that exist?
  • What are the main patterns associated with key satisfaction scenarios?
  • What factors influence professionals to leave?

Data Characteristics

In total, 14,999 employees were evaluated, considering the following variables already sanitized by our scripts:

  • Employee satisfaction level (0 to 10) – Probably filled out by the employee;
  • Last evaluation (0 to 10) – Probably filled in by a manager;
  • Number of projects (2 to 7) – Number of projects in which the employee acted;
  • Average monthly hours (96 to 310);
  • Time spent at the company (2 to 10) – How long the person has worked at the company;
  • Whether they have had an accident at work – (Yes = 1 / No = 0);
  • Whether they have had a promotion in the last 5 years (Yes = 1 / No = 0);
  • Salary Range (Low = 1, Medium = 2, High = 3); note: actual salary values were not made available;
  • Left the company (Yes = 1 / No = 0).
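To make the variable list concrete, here is a minimal Python sketch of a single employee record that checks each value against the ranges listed above. The class name `EmployeeRecord` and all field names are hypothetical; they are not the column names of the original dataset or of our sanitizing scripts.

```python
from dataclasses import dataclass, fields

# Hypothetical field names; each range follows the variable list above.
RANGES = {
    "satisfaction": (0, 10),
    "last_evaluation": (0, 10),
    "projects": (2, 7),
    "monthly_hours": (96, 310),
    "years_at_company": (2, 10),
    "work_accident": (0, 1),          # 1 = yes, 0 = no
    "promoted_last_5_years": (0, 1),  # 1 = yes, 0 = no
    "salary_range": (1, 3),           # 1 = low, 2 = medium, 3 = high
    "left_company": (0, 1),           # 1 = yes, 0 = no
}


@dataclass(frozen=True)
class EmployeeRecord:
    satisfaction: float
    last_evaluation: float
    projects: int
    monthly_hours: int
    years_at_company: int
    work_accident: int
    promoted_last_5_years: int
    salary_range: int
    left_company: int

    def __post_init__(self) -> None:
        # Reject any value that falls outside its documented range.
        for field in fields(self):
            low, high = RANGES[field.name]
            value = getattr(self, field.name)
            if not low <= value <= high:
                raise ValueError(f"{field.name}={value} is outside [{low}, {high}]")


# Illustrative values only, not a row from the real dataset.
record = EmployeeRecord(7.5, 6.0, 4, 200, 3, 0, 0, 2, 0)
```

Validating ranges when a record is built catches sanitization mistakes (for example, a project count of 9) before any analysis runs.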

Number of people per department

[Figure: number of people per department]

Frequency Analysis / Distribution of Satisfaction

[Figure: overall satisfaction level distribution]

Satisfaction scores are concentrated in the 7 to 9 range, and few people have scores between 1.5 and 3.0.
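A frequency distribution like this one is just a count of scores per bin. The Python sketch below assumes 1-point bins on the 0 to 10 scale (`bin_width` is a hypothetical parameter; the chart’s actual binning is not stated) and uses made-up scores.

```python
from collections import Counter


def satisfaction_histogram(scores, bin_width=1.0):
    """Count 0-10 satisfaction scores per bin [k*w, (k+1)*w).

    A score of exactly 10 is placed in the last bin so the top edge is included.
    """
    n_bins = round(10 / bin_width)
    counts = Counter()
    for score in scores:
        if not 0 <= score <= 10:
            raise ValueError(f"score {score} is outside 0-10")
        k = min(int(score // bin_width), n_bins - 1)
        counts[k * bin_width] += 1  # key = lower edge of the bin
    return dict(sorted(counts.items()))


# Made-up scores, for illustration only.
print(satisfaction_histogram([7.2, 8.5, 8.9, 2.0, 7.0, 10.0]))
# {2.0: 1, 7.0: 2, 8.0: 2, 9.0: 1}
```

Narrower bins (for example `bin_width=0.5`) make sparse regions such as the 1.5 to 3.0 band easier to see.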

Results

Ranking of Influence Factors in Work Satisfaction

Processing this dataset with the VORTX Big Data algorithm produced the following ranking (score in parentheses):

  1. Average monthly hours (50)
  2. Time spent at the company (21)
  3. Number of projects (20)
  4. Salary Range (13)
  5. Left the company (10)
  6. Whether they have had a promotion in the last 5 years (9)
  7. Whether they have had an accident at work (9)

The factor “Last evaluation” had no relevant influence and was automatically discarded by VORTX.
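VORTX’s ranking algorithm is proprietary and is not reproduced here. As a rough, swapped-in stand-in, the Python sketch below ranks factors by the absolute Pearson correlation between each factor and satisfaction, and discards factors below a cut-off, much as “Last evaluation” was discarded. The function names, the toy columns and the `min_abs` threshold of 0.05 are hypothetical; Pearson correlation only captures linear association, so its ranking will generally differ from VORTX’s.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_factors(factors, target, min_abs=0.05):
    """Rank factors by |r| with the target, discarding those below min_abs."""
    scores = {name: abs(pearson(values, target)) for name, values in factors.items()}
    kept = [(name, score) for name, score in scores.items() if score >= min_abs]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy columns, not the real dataset.
satisfaction = [1, 2, 3, 4]
factors = {
    "factor_a": [2, 4, 6, 8],    # rises with satisfaction
    "factor_b": [4, 3, 2, 1],    # falls with satisfaction
    "factor_c": [1, -1, -1, 1],  # no linear relation: discarded
}
print(rank_factors(factors, satisfaction))
# [('factor_a', 1.0), ('factor_b', 1.0)]
```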

Satisfaction Scenarios

The table below shows how the platform automatically separated employees into groups. It found 120 groups in total; we focus on the 20 most relevant and set the others aside as isolated cases outside the scope of this analysis.

[Table: satisfaction scenarios found by VORTX]
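Keeping only the most relevant groups and setting the rest aside can be sketched as follows, under the assumption that “most relevant” means “largest”; the post does not state the platform’s actual relevance criterion. `split_main_groups` is a hypothetical helper, and the group labels are made up.

```python
from collections import Counter


def split_main_groups(labels, top_n=20):
    """Return (main, isolated): the top_n largest groups and all remaining groups.

    Groups of equal size keep the order in which they first appear in labels.
    """
    sizes = Counter(labels)
    main = [group for group, _ in sizes.most_common(top_n)]
    isolated = [group for group in sizes if group not in main]
    return main, isolated


# One made-up group label per employee.
labels = [1, 1, 1, 2, 2, 9, 3]
print(split_main_groups(labels, top_n=2))  # ([1, 2], [9, 3])
```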

Model Visual Validation

In our experience, managers are often unsure whether a machine can automate the discovery of insights. Therefore, to validate the model, we chose to plot the raw data and show the insights above visually.

[Figure: group 9, the most dissatisfied]

Pattern of hours worked by the 588 people in scenario 9 (very dissatisfied). X axis: monthly working hours.


[Figure: group 1]

Pattern of hours worked in the largest scenario (1), which has 4,085 employees, good job satisfaction and low turnover. X axis: monthly working hours.

In the view below, each circle represents an employee, shown in four dimensions:

  • The level of satisfaction on the Y axis.
  • Average hours per month on the X axis.
  • Orange for people who left the company and blue for those who remained.
  • Circle size represents the number of years in the company.

[Figure: general pattern for the whole organization]
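The four-dimensional view described above boils down to mapping each employee onto four visual channels. This Python sketch shows that mapping; the function name and the `size_per_year` scaling factor are hypothetical choices, not VORTX settings.

```python
def encode_employee(satisfaction, monthly_hours, left_company, years_at_company,
                    size_per_year=10):
    """Map one employee to the four visual channels of the scatter plot."""
    return {
        "x": monthly_hours,                             # average hours per month
        "y": satisfaction,                              # satisfaction level
        "color": "orange" if left_company else "blue",  # left vs. remained
        "size": years_at_company * size_per_year,       # circle grows with tenure
    }


print(encode_employee(8.0, 200, 0, 3))
# {'x': 200, 'y': 8.0, 'color': 'blue', 'size': 30}
```

A list of these dictionaries can be fed to most plotting libraries; matplotlib’s `pyplot.scatter`, for instance, takes the same channels through its `x`, `y`, `c` and `s` arguments.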

Alright, we just saw the overall pattern for the whole organization. What happens if we break it down by department?

[Figure: Accounting and IT]

[Figure: Management to Product]

[Figure: R&D and Support]

[Figure: Technical]

Conclusions and Recommendations

This study sheds some light on how to improve human resource management, which is at the heart of today’s businesses. Applying data analytics algorithms in this area automates and accelerates pattern discovery in complex environments with, let’s say, 50 variables or more; here there were just a few. Meanwhile, searching for patterns in traditional BI remains purely artisanal work, with a well-known limitation of 4 dimensions per attempt (read more on this in Understanding the differences between BI, Big Data and Data Mining). Automating discovery is an extremely important step in predictive analytics: in this case, anticipating the departure of highly qualified professionals and spotting dissatisfaction that management may have overlooked.

With VORTX’s ability to discover the different scenarios, we were able to analyze the data and conclude that:

  • People in groups 1 and 2 (55% of the company) have reasonable job satisfaction with a weekly load of 50 hours on average, without having received a promotion or suffered an accident at work.
  • The pattern persists in all departments.
  • Among the 20 largest groups, the most satisfied were groups 7 and 10, whose members worked more than 247 hours a month and took on several projects but, having received no promotion, left the company. These people should be retained, since they seem to be highly qualified.
  • Group 16 shows that it is possible to earn a good salary and still be dissatisfied. These 77 people should be interviewed to identify the root cause of their dissatisfaction.
  • The cut-off line for employees who stayed with the company is a minimum of 170 and a maximum of 238 hours worked per month. People with more than 3.5 years at the company work harder and are more satisfied.
  • Monthly hours above 261 resulted in very low levels of satisfaction.
  • Monthly hours below 261 combined with more than 3 projects result in high job satisfaction.
  • Scenario 15 shows the importance of a promotion over the last 5 years of work.
  • Satisfaction decreases for people with more than 5 projects; the ideal number is between 3 and 5. Of course, interpreting this indicator properly requires understanding what the number of projects represents in each department.
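Several of the findings above are simple thresholds, so they can be written down as rules. The Python sketch below encodes only the 261-hour and project-count findings; the function name, the returned labels and the order in which the rules are checked are our assumptions. It is a reading aid, not the VORTX model or a validated predictor.

```python
def satisfaction_outlook(monthly_hours, projects):
    """Apply the reported thresholds in a fixed (assumed) order; returns a coarse label."""
    if monthly_hours > 261:
        return "very low"      # above 261 h/month: very low satisfaction
    if projects > 5:
        return "lower"         # more than 5 projects: satisfaction decreases
    if monthly_hours < 261 and projects > 3:
        return "high"          # below 261 h/month with more than 3 projects
    return "no rule applies"   # e.g. exactly 261 h/month, or 3 projects or fewer


for hours, projects in [(270, 4), (200, 6), (200, 4), (200, 2)]:
    print(hours, projects, satisfaction_outlook(hours, projects))
```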

For managers, it is always good to collect as many indicators as possible, continuously and across all areas. Additional variables that could enrich your model include:

  • The distance between the employee’s home and workplace.
  • The average time taken to get from home to work.
  • The number of children.
  • The number of phone calls or emails sent and received.
  • Gender, age, and the reason for leaving the job.

We hope this information is useful for you guys in some way. If you find it relevant, share it with your colleagues. If in doubt, contact us! A big hug and success in developing your own HR strategy!


What is Aquarela Advanced Analytics?

Aquarela Analytics is a pioneering Brazilian company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and the DCIM methodology, it serves important global customers such as Embraer (aerospace & defence), Scania and Randon Group (automotive), Solar Br Coca-Cola (beverages), Hospital das Clínicas (healthcare), NTS-Brasil (oil & gas), Votorantim Energia (energy), among others.

Stay tuned by following Aquarela on LinkedIn!

Big Data Scenario Discovery, why is it super useful for decision making?

Hi everyone! In today’s demonstration, we show how Big Data Scenario Discovery can profoundly support decision making in various sectors. We use AQUARELA VORTX Big Data, a groundbreaking tool in the machine learning field. The dataset used for the experiment was presented in the previous post about Big Data country auto-segmentation (clustering). The differences are that this one also includes the Gini Index (found later on) and drops the rural electrification rate. Also, this analysis seeks systemic influences towards a GOAL, in this case the Human Development Index (HDI), whereas the previous segmentation simply grouped similar countries according to their general characteristics.

The key questions for the experiment:

  1. How many Human Development Index scenarios exist in total? And which countries belong to them?
  2. Among the 65 indexes, which have the most influence in defining a High or Low Human Development Index?
  3. What is the DNA (set of characteristics) of a High and Low Human Development scenario?

Alright, hang on a minute! Before you see the results, take a look at all the variables analysed in the previous post. Then, using your intuition as much as possible, try to answer these 3 questions yourself. It is a fun and very useful cognitive exercise for validating scenarios. OK?

Results after pushing the Discoverer button:

HDI - Total

This is the overall distribution of 188 countries: most have an HDI between 0.65 and 0.75, and very few are above 0.90. In total, there are 15 different HDI scenarios; the first 3 account for more than 94% of the total, and those are the ones we focus on.
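Assuming the 94% refers to the share of the 188 countries, the claim implies a concrete lower bound: since 0.94 × 188 = 176.72, the first three scenarios must contain at least 177 countries. The Python sketch below does this arithmetic; both function names are hypothetical helpers, and the scenario sizes used in any call are up to the reader (the post does not list them).

```python
def top_k_share(scenario_sizes, k):
    """Fraction of all items covered by the k largest scenarios."""
    ordered = sorted(scenario_sizes, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def min_count_above(percent, total):
    """Smallest whole count c such that c / total is strictly greater than percent / 100."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return percent * total // 100 + 1


# "More than 94%" of 188 countries means at least this many countries.
print(min_count_above(94, 188))  # 177
```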

Scenario 1

The most common scenario and the average HDI

Scenario 2

Countries with the lowest HDI

Scenario 3

Countries with the highest HDI

Where are they located?

[Figure: location of the countries in each scenario]

What factors influence HDI the most and the least?

[Figure: ranking of factors by influence on HDI]

The list shows the top and bottom 10 factors. The factor “Intimate or nonintimate partner violence ever experienced, 2001-2011” was automatically removed from the ranking because it does not correlate with HDI.

What is the DNA of each main scenario?

[Figure: DNA of the main scenarios, all factors]

All factors presented at once. Note that the X-axis scales change dynamically as you hover the mouse over the VORTX data scope screen.

[Figures: DNA of the main scenarios, two further views]

Drilling down into the DNA

Under-Five Mortality rates vs HDI

[Figures: under-five mortality rate vs HDI, three views]

Visualisation filtered by the most relevant factor and by HDI (HDI is the focus of the analysis, so it has the darker colour). Here we see that countries with the highest HDI have the lowest under-five mortality rates.

Gender Inequality Rate vs HDI

[Figures: gender inequality rate vs HDI, three views]

Gross National Income (GNI) per capita vs HDI

[Figures: GNI per capita vs HDI, three views]

Insights and Conclusions of the study

The possibilities for generating new knowledge with this Big Data strategy are endless, but we focused on just a few questions and a few screenshots to demonstrate its value. During this research, it was interesting to see the machine autonomously confirm some previous intuitions while breaking some preconceptions. It is important to note that we are not measuring causation (one factor leading to another, or vice versa); the results show systemic correlations only. Here are some of them that caught our attention:

  • Gender inequality plays a strong role, with an inverse correlation to the Human Development Index, at a time when we are living through the transition from the industrial age to the information age, where knowledge is surpassing the physical differences between genders.
  • Research and development has a direct correlation with HDI.
  • The United States has its own scenario due to its unique systemic characteristics.
  • Gross National Income (GNI) per capita leads the ranking, with values around 40 thousand dollars.
  • Public expenditure ranks ahead of education-related indexes.
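The difference between a direct and an inverse correlation, and the caveat that correlation cannot tell which factor drives which, can be seen in a few lines of Python. The sketch assumes the Pearson coefficient (the post does not say which correlation measure VORTX uses); the helper names and the series are hypothetical toy values, not real country data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def describe(r, tol=1e-9):
    """Label the sign of a correlation coefficient."""
    if r > tol:
        return "direct"
    if r < -tol:
        return "inverse"
    return "none"


hdi = [0.5, 0.6, 0.7, 0.8]         # toy values
inequality = [0.6, 0.5, 0.4, 0.3]  # toy values
r = pearson(hdi, inequality)
print(describe(r))  # inverse
# Swapping the arguments gives the same r, so the coefficient says nothing
# about which variable influences the other.
print(pearson(inequality, hdi) == r)  # True
```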

Business applications

Now let’s see how the questions from the beginning of the article would look in different business scenarios:

Sales

  • How many scenarios exist for your sales? Which customer segments belong to each scenario?
  • Among several business factors, which have the most influence in defining High or Low revenue?
  • What is the DNA (characteristics) of a High and Low revenue scenario?

Industry

  • How many production/maintenance scenarios exist for your production line? Which processes belong to each scenario?
  • Among several production factors, which have the most influence in defining a High or Low outcome, or High or Low maintenance/costs?
  • What is the DNA (characteristics) of a High and Low production/maintenance scenario?

Healthcare

  • How many patient scenarios exist for a specific disease or medical condition? Which patients belong to each scenario?
  • Among several patient characteristics, which have the most influence on High or Low levels of a specific disease or medical condition?
  • What is the DNA (characteristics) of a High and Low medical condition scenario?

All in all, we hope this article helps ease your landing in the newest territories of machine learning. If you need more information on how this solution applies to your business scenario, please let us know. If you found this analysis interesting and worth spreading, please share it. Super thanks on behalf of the Aquarela team!
