Hi everyone. In today's demonstration, we show how Big Data Scenario Discovery can help decision making in a profound way in various sectors. We use AQUARELA VORTX Big Data, a groundbreaking tool in the machine learning field. The dataset used for the experiment was presented in the previous post about Big Data country auto-segmentation (clustering). The differences here are that this dataset also includes the Gini Index (found later on) and leaves out the electrification rate in rural areas. In addition, this analysis looks for systemic influences on a GOAL, in this case the Human Development Index (HDI), whereas the previous segmentation only grouped similar countries according to their general characteristics.

The key questions for the experiment:

  1. How many Human Development Index scenarios exist in total, and which countries belong to each of them?
  2. Amongst 65 indexes, which have the most influence on whether the Human Development Index is High or Low?
  3. What is the DNA (set of characteristics) of a High and a Low Human Development scenario?

Alright, hang on for a minute! Before you see the results, take a look at all the variables analysed in the previous post. Then try to work out for yourself, using your intuition, what the answers to these 3 questions would be. This is a fun and very useful cognitive exercise for scenario validation. OK?

Results after pushing the Discoverer button:

HDI - Total

This is the overall distribution of the 188 countries: most countries have an HDI between 0.65 and 0.75, and very few are above 0.90. In total, there are 15 different HDI scenarios, of which the first 3 account for more than 94% of the total, so those are the ones we focus on.
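To make the "first 3 scenarios cover more than 94%" idea concrete, here is a minimal Python sketch of how such a coverage figure can be computed from scenario sizes. The function name `scenarios_needed` and the toy sizes are our own illustrative assumptions; they are not VORTX output and not the real country counts.

```python
from itertools import accumulate


def scenarios_needed(sizes, coverage):
    """Return how many of the largest scenarios are needed to reach
    the requested share (0..1) of all records."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= coverage:
            return k
    return len(ordered)


# Hypothetical scenario sizes (records per scenario), for illustration only.
toy_sizes = [50, 30, 15, 3, 2]
k = scenarios_needed(toy_sizes, 0.94)
print(f"{k} scenarios cover at least 94% of the toy records")
```

Sorting by size first means the answer is always the smallest number of scenarios that reaches the requested coverage.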

Scenario 1

The most common scenario and the average HDI

Scenario 2

Countries with the lowest HDI

Scenario 3

Countries with the highest HDI

Where are they located?

[Screenshot: location of the countries in each scenario]

What factors influence HDI the most and the least?

Ranking

The list marks the top and bottom 10 factors. The factor "Intimate or Nonintimate partner Violence ever experienced 2001-2011" was automatically removed from the ranking as it does not correlate with HDI.
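For readers who want to try a similar ranking on their own data, the sketch below orders factors by the absolute value of their Pearson correlation with a target and drops factors that barely correlate. This is a simple stand-in that we chose for illustration; the post does not describe how VORTX scores or removes factors. The threshold `min_abs_r`, the factor names and all values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)


def rank_factors(factors, target, min_abs_r=0.1):
    """Rank factors by |r| against the target, dropping weak ones.

    min_abs_r is an assumed cut-off, not a VORTX parameter.
    """
    scored = {name: pearson(values, target) for name, values in factors.items()}
    kept = {name: r for name, r in scored.items() if abs(r) >= min_abs_r}
    return sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data, made up for illustration only.
toy_hdi = [0.4, 0.6, 0.7, 0.9]
toy_factors = {
    "income": [1.0, 2.0, 4.0, 8.0],
    "child_mortality": [90.0, 60.0, 20.0, 5.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
}
ranking = rank_factors(toy_factors, toy_hdi)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

The sign of `r` tells you whether a factor moves with the target (direct) or against it (inverse); the ranking itself only uses the magnitude.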

What is the DNA of each main scenario?

All factors presented at once. Note that the scales on the X axis change dynamically when hovering the mouse over the VORTX data scope screen.
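One simple way to think about the "DNA" of a scenario is as the average value of every factor within that scenario. The sketch below computes such a profile in Python. It is our own simplified reading of the idea, not the VORTX implementation, and the field names (`scenario`, `hdi`, `gni`) and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def scenario_profiles(rows, scenario_key="scenario"):
    """Average every numeric factor per scenario label."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name, value in row.items():
            if name != scenario_key:
                grouped[row[scenario_key]][name].append(value)
    return {
        scenario: {name: mean(values) for name, values in factors.items()}
        for scenario, factors in grouped.items()
    }


# Toy records, made up for illustration only.
toy_rows = [
    {"scenario": 1, "hdi": 0.70, "gni": 10.0},
    {"scenario": 1, "hdi": 0.74, "gni": 14.0},
    {"scenario": 3, "hdi": 0.90, "gni": 40.0},
]
profiles = scenario_profiles(toy_rows)
for scenario, profile in profiles.items():
    print(scenario, profile)
```

Comparing the profiles side by side shows which factors differ most between a High and a Low scenario.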

Drilling down into the DNA

Under-Five Mortality rates vs HDI

Filtering the visualisation by the most relevant factor and HDI (HDI is the focus of the analysis, so it has the darker colour). Here we see that the countries with the highest HDI have the lowest under-five mortality rates.

Gender Inequality Rate vs HDI

[Screenshots: Gender Inequality Rate vs HDI]

Gross National Income (GNI) per capita vs HDI

[Screenshots: Gross National Income (GNI) per capita vs HDI]

Insights and Conclusions of the study

The possibilities for generating new knowledge from this Big Data strategy are endless, but we focused on just a few questions and a few screenshots to demonstrate its value. During this research, we found it interesting to see the machine autonomously confirming some previous intuitions while breaking some preconceptions. It is important to mention that we are not measuring causation, that is, whether one factor leads to another or vice versa; the results show systemic correlations only. Here are some that caught our attention:

  • Gender inequality playing a strong role, with an inverse correlation to the Human Development Index, while we are living through a transition from the industrial age to the information age, in which knowledge is surpassing the physical differences between genders.
  • Research and development having a direct correlation to HDI.
  • The United States having its own scenario due to its unique systemic characteristics.
  • Gross National Income (GNI) per capita leading the ranking, with values around 40 thousand dollars.
  • Public expenditure ranking ahead of education-related indexes.

Business applications

Applying the same questions we had at the beginning of the article, let's now see what they would look like for different business scenarios:

Sales

  • How many scenarios exist for your sales? Which customer segments belong to each scenario?
  • Amongst several business factors, which have the most influence on whether revenue is High or Low?
  • What is the DNA (characteristics) of a High and a Low revenue scenario?

Industry

  • How many production/maintenance scenarios exist for your production line? Which processes belong to each scenario?
  • Amongst several production factors, which have the most influence on whether output or maintenance costs are High or Low?
  • What is the DNA (characteristics) of a High and a Low production/maintenance scenario?

Healthcare

  • How many patient scenarios exist for a specific disease or medical condition? Which patients belong to each scenario?
  • Amongst several patient characteristics, which have the most influence on High or Low levels of a specific disease or medical condition?
  • What is the DNA (characteristics) of a High and a Low medical condition scenario?

All in all, we hope this article helps you land softly on the newest territories of machine learning. If you need more information on how this solution applies to your business scenario, please let us know. If you found this analysis interesting and worth spreading, please share it. Many thanks on behalf of the Aquarela team!

VORTX Big Data

Aquarela developed VORTX Big Data to make predictive analytics a lot easier, more precise and more robust than current solutions on the market, with significant impact on business problems such as churn reduction, business scenario discovery, predictive maintenance, market segmentation and healthcare resource optimisation.

 

Authors
Marcos Santos
Founder and CEO of Aquarela and architect of the VORTX platform. Holds a master's degree in Engineering and Knowledge Management; a new-technology enthusiast with expertise in the Scala functional language and in Machine Learning and AI algorithms.

Joni Hoppen
Founder of Aquarela, professor and lecturer in Data Science, with a master's degree in Information Systems, focused on rapid prototyping of Big Data Analytics and on data culture.

Citation information: Did you like this material? If you would like to enrich your research or report (whether a blog post or an academic article), cite our content as: Aquarela 2018 - Inteligência Artificial para negócios (www.aquare.la).