5 real data challenges in the food sector

The food sector and food security are global concerns, and Brazil is one of the main countries responsible for supplying the world’s demand for food (Estadão). With that in mind, what are the main data management challenges involved in optimizing operational efficiency in Brazil’s food/agribusiness sector, which today accounts for 21% of the country’s GDP?

This article addresses the issue from the perspective of Aquarela’s experience in Advanced Analytics and Artificial Intelligence projects carried out in large operations in Brazil. The risk posed by a lack of information is as relevant as that posed by its excess combined with a lack of analysis, both of which can affect the efficiency of the sector’s logistics chain as a whole.

Below, we have elaborated on some of these main risks.

Characterization of the food sector

The food sector is quite varied due to the sheer length of its production chain, which ranges from agricultural inputs and industrialization through transport logistics to commercialization in consumer markets and, finally, the end consumer.

As a fundamental characteristic, the food sector is directly exposed to factors with high variability and little controllability, such as:

  • Climate (temperature, water volume, luminosity and others);
  • Economic factors such as currency fluctuations;
  • Infrastructure;
  • Domestic/external market demand.

In addition to these factors, below we list some challenges related to data management and show how well-organized data can help mitigate the effects of uncontrollable variables in the food supply chain.

01 – Incompleteness of information

The supply chain is very long. This makes the data complex and difficult to interpret, given the different phases of each process, crop and region, and it means that many important planning decisions are made with very limited information and high risk. In other words, decisions are taken without a view of the complete scenario of the chain, relying largely on the manager’s intuition.

The lack of quality information is a big risk. If data is lacking today, imagine what the scenario was like 10 or 20 years ago.

In recent years, industry and retail have made great advances in digitalization, with various traceability solutions. With the evolution of Industry 4.0 technologies (IoT and 5G) over the coming years, it is likely that the food market, from the agricultural and industrial sectors to the commercial sector, will hold far more complete information for decision-making than is available today.

02 – Data from multiple sources

As data becomes more and more abundant with advances in digitalization and communication, the next problem is analyzing data from multiple, disconnected sources.

Different data is often stored in different systems, leading to incomplete or inaccurate analyses. Combining data manually to form datasets (what are datasets?) for analysis is heavy, time-consuming work and can limit insight into the reality of operations.

The goal is to build Data Lakes suited to the management model in order to democratize data access for market professionals, optimizing their activities with increasingly powerful analytics solutions. This not only frees up the time spent accessing multiple sources, it also enables cross-comparisons and helps ensure that the data is complete.
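To make this concrete, below is a minimal sketch assuming a Python/pandas stack; the file names and columns are hypothetical stand-ins for extracts from disconnected systems, not any real customer environment.

```python
# Minimal sketch, assuming hypothetical extracts from three disconnected systems
# (file names and columns are illustrative only).
import pandas as pd

erp = pd.read_csv("erp_products.csv")                 # product_id, product_name, unit_cost
logistics = pd.read_csv("logistics_lead_times.csv")   # product_id, avg_lead_time_days
sales = pd.read_csv("sales_monthly.csv")              # product_id, month, units_sold, revenue

# Join the sources on a shared key so analyses see one consolidated picture.
dataset = (
    sales
    .merge(erp, on="product_id", how="left")
    .merge(logistics, on="product_id", how="left")
)

# Flag rows where the cross-source join left gaps instead of analyzing blindly.
incomplete = dataset[dataset[["unit_cost", "avg_lead_time_days"]].isna().any(axis=1)]
print(f"{len(incomplete)} of {len(dataset)} rows lack data from at least one source")

# Persist the consolidated dataset for downstream analytics.
dataset.to_csv("consolidated_food_chain.csv", index=False)
```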

03 – Low quality data

Having incorrect data can be just as harmful as not having it, or even more so. Nothing undermines data analysis like inaccurate data, especially when the idea is to apply data science and machine learning practices: without good input, the output will be unreliable.

One of the main causes of inaccurate data is manual error during data entry, especially when information is collected by hand. Another problem is asymmetric data: when information in one system does not reflect changes made in another system and therefore becomes outdated.

Analytics strategic planning projects seek to mitigate and/or eliminate these problems through systematic processes of building data dictionaries, mapping processes and roles, and so on.
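As an illustration of what such dictionary-driven checks can look like, here is a minimal sketch in Python/pandas; the column names, value ranges and file name are invented examples, not a real schema.

```python
# Minimal sketch of automated entry checks driven by a small "data dictionary".
# Column names, value ranges and the file name are hypothetical examples.
import pandas as pd

RULES = {
    "harvest_ton":  {"min": 0.0, "max": 100_000.0},
    "price_brl_kg": {"min": 0.01, "max": 1_000.0},
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per suspicious value so it can be corrected at the source."""
    problems = []
    for col, rule in RULES.items():
        if col not in df.columns:
            problems.append({"column": col, "row": None, "issue": "missing column"})
            continue
        values = pd.to_numeric(df[col], errors="coerce")
        for idx in df.index[values.isna() & df[col].notna()]:
            problems.append({"column": col, "row": idx, "issue": "not a number"})
        for idx in df.index[(values < rule["min"]) | (values > rule["max"])]:
            problems.append({"column": col, "row": idx, "issue": "out of expected range"})
    return pd.DataFrame(problems)

issues = validate(pd.read_csv("field_collection.csv"))
print(issues)
```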

04 – Lack of data talents

Some organizations and companies are unable to reach better levels of operational efficiency because they suffer from a lack of talent in data analysis. In other words, even if the company has solid technologies and data, the people available to carry out the analyses and action plans still count a great deal at the end of the day.

This challenge can be mitigated in three ways:

  • Develop an analytical technology stack that is kept up to date, aligned with the business and supported by current training materials.
  • Add analytical skills to the hiring process. In addition, invest in the constant training of the team on new data technologies related to the technological stack of the operation.
  • Use analytics outsourcing to accelerate the process. In this article, for example, we list the main aspects to be considered when choosing a good supplier.

05 – Customization of values and product characteristics in the food sector

Although, according to Embrapa, about 75% of the world’s entire food sector is based on just 12 plant species and 5 animal species, there are thousands of different products, marketed in multiple ways, at multiple prices and lead times, in the final consumer market.

Just as an example, in the area of animal protein, marketing beef requires investments, infrastructure, lead times and processes that are quite different from those required for the production of pork or even chicken.

Since the processes are different, the data generated by the production chain also differs, requiring customizations in information systems and databases and, as a consequence, changes in the corresponding models.

The recommendation is to parameterize the systems based on the most common classifications in the market and focus on the most important products from a strategic point of view (contribution margin, volume or sales price).

5 real data challenges in the food sector – Final thoughts

In this article, we have collected some relevant points about the real data challenges in the food sector, an area in which Brazil stands out as one of the main global players.

It is a complex area with several risk factors and great opportunities for optimization through the increasingly intensive use of data. We have previously written an article on data strategies for energy trading, a sector that shares some of the same decision-making challenges found in the food sector.

We at Aquarela Analytics work with these challenges every day, making complex things simple with good risk mitigation. So, if you have any questions, get in touch with us!

What is Aquarela Advanced Analytics?

Aquarela Analytics is a pioneering Brazilian company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and the DCIM methodology, it serves major global customers such as Embraer (aerospace & defence), Scania and Randon Group (automotive), Solar Br Coca-Cola (beverages), Hospital das Clínicas (healthcare), NTS-Brasil (oil & gas) and Votorantim Energia (energy), among others.

Stay tuned by following Aquarela on LinkedIn!

The rise of the Self-taught Programmer

The desire to become a self-taught programmer or developer is at an all-time high right now, and the pandemic is partly responsible for the rapid growth of this path. During the pandemic, many physical jobs were lost, but the tech industry experienced immense growth in revenue and job opportunities, and these opportunities began to attract unemployed people, or simply ordinary people looking to get a slice of the industry.

The tech industry offers some of the best working conditions and benefits around, the most famous being the option of working from home, also known as the “home office”.

With all these shiny benefits, people started looking for easier ways to join the tech industry, essentially without going through the hassle of paying for university or college and studying for years on end, and this resulted in an explosion in the number of self-taught programmers.

What does it mean to be a Self-Taught programmer?

When you go to a university or college, you have a fixed curriculum or “roadmap” that shows you exactly what to study, in which order, and how to go about it. When you take the self-taught path, things are very different, because you are choosing the roadmap yourself, maybe with the help of some friends or family members, or maybe even a quick search on Reddit or YouTube. The whole idea is that you are in charge of putting together your own plan of action, which may not always be the best of plans; however, when that plan succeeds, you can proudly call yourself a “self-taught” programmer.

The challenges

Although it seems easy to many, being self-taught is arduous because you are constantly battling your doubts, along with exhausting unpredictability and uncertainty. It takes time, patience, continuous learning, extensive research, building projects, and a lot of failing to become a self-taught programmer, but throughout this process you are building what is often referred to as a “coding muscle”.

I remember back in early 2019 when I decided to embark on the programming journey, full of excitement and cheer, ready to change the world with code, but little did I know what was in store for me. The process was very daunting; I was doubting myself almost every day during those early stages. I would find myself asking questions like: Who am I to do this? I am over 30 already and have no college or university degree, so where exactly do I fit in this vast world of programming? Which programming language should I learn? Do I want to learn back-end or front-end? And the list went on. I am pretty sure that if you are a self-taught programmer, some of those questions sound familiar, because they are just some of the stages most self-taught programmers go through.

Why you should hire a self-taught programmer

Well, self-taught programmers may not have the necessary diplomas or degrees in the programming field, but I can assure you that they can outwork, outthink and outmaneuver many varsity or college graduates.

  • They have vigor, passion, and a huge inner drive to achieve 

For starters, if you are teaching yourself to code, you should either really love it or you must really want it with your whole being because it takes time, a huge amount of patience, dedication, a lot of guts, and just an immense work ethic. Most self-taught programmers possess all of these traits and much more.

  • They have support and know where to get information.

Although it might seem like a lonely journey to many, self-taught programmers often form part of a community where they share their problem-solving skills and ideas with each other. This can be an advantage for the employer, who is not hiring just one programmer: that programmer comes with a whole community of developers with various forms of expertise in different fields and technologies that they can always tap into.

  • Always ready to go

All new employees need to go through onboarding and training, which is a vital experience for the employee but also gets more expensive the longer it drags on. Being self-taught usually means that you have a decent amount of real-world experience, picked up along your learning journey in collaborative projects or freelancing gigs. With that experience, the developer will most likely be ready to start coding in less time and with minimal training, often saving the company time and money.

  • When all else fails, they always have plan C, D, E, and more if need be

Self-taught developers are skilled problem solvers; every great developer has an extensive history of solving problems. Universities give programmers a solid base in theory, but theory goes out of the window when you encounter real-life coding problems and challenges.

A fundamental part of self-teaching is knowing how to untangle yourself when you are stuck in a situation, identifying problems, solving them, and learning from the process.

Read also: Industry 4.0: Web 3.0 and digital transformation

Conclusion

I hope this text does not sound one-sided in favor of the self-taught programmer as opposed to the traditionally university- or college-educated programmer; take it with a grain of salt. Studies have shown that happy employees are up to 13% more productive (according to the University of Oxford), and self-taught developers are passionate about what they do, so there is no doubt that this is an advantage for the company. With all that said, I think we can all agree that the self-taught programmer is here to stay! 🎓

Did you like the article? Leave your comment.


AI and Analytics strategic planning: concepts and impacts

The benefits and positive impacts of the use of data and, above all, artificial intelligence are already a reality in Brazilian industry. These benefits are most evident in areas ranging from dynamic pricing in education and forecasting missed medical appointments to predicting equipment breakdowns and monitoring the auto-parts replacement market. However, to achieve these benefits, organizations need to reach a level of analytical maturity adequate for each challenge they face.

In this article, we discuss the concepts of AI and Analytics strategic planning and look at which scenario characteristics call for this type of project within companies’ digital transformation journey towards Industry 4.0.

What is AI and Analytics strategic planning?

AI and Data Analytics strategic planning is a structuring project that combines a set of consultative activities (preferably carried out by teams with an external view of the organization) to survey scenarios, map analytical processes and catalogue digital assets (systems, databases and others), in order to assess the different levels of analytical maturity of teams, departments and the organization as a whole.

As a result, shared definitions of vision, mission, values, policies, strategies, action plans and good data governance practices are established, raising the organization’s analytical maturity level in the shortest possible time and at the lowest cost.

Symptoms of low analytic maturity scenarios

Although there are many types of businesses, products and services on the market, here we present recurring patterns that help characterize companies’ analytical maturity problems and can prompt interesting reflections:

  1. Is it currently possible to know which analytics initiatives (data analytics) have already taken place and are taking place? Who is responsible? And what were the results?
  2. In analytics initiatives, is it possible to know what data was used and even reproduce the same analysis?
  3. Does data analysis happen randomly, spontaneously, and isolated in departments?
  4. Is it possible to view all data assets or datasets available to generate analytics?
  5. Are there situations in which the same indicator appears with different values depending on the department in which the analysis is carried out?
  6. Are there defined analytic data dictionaries?
  7. What is the analytical technology stack?
  8. Are data analytics structuring projects being considered in strategic planning?

Other common problems

Organizational identity

Scenarios with low analytical maturity do not have data quality problems in isolation. There are usually systemic problems involving the complexity of business processes, the level of training of teams, knowledge management processes and, finally, the choice of ERP, CRM and SCM technologies and how these transactional systems relate to each other.

Security Issues

Companies are living organisms that constantly evolve, with people working in different areas. Over time, control over each employee’s access levels is lost, so unauthorized people end up with access to sensitive information; the opposite also occurs, when people cannot access the data they need for their work.

Excessive use of spreadsheets and duplicates

Spreadsheets are among the most useful and important management tools and, for that reason, they support all kinds of processes. The big side effect of their excessive use is the difficulty of maintaining knowledge about each process: when two or more people are involved and the volume of information and updates starts to grow, it becomes hard to manage knowledge that travels around in blocks of spreadsheets. In addition, many duplications occur, making it virtually impossible to consolidate large volumes of data securely.
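A rough sketch of that first consolidation step is shown below, assuming a hypothetical folder of departmental spreadsheets and hypothetical key columns; it preserves file provenance and flags duplicated records instead of silently double-counting them.

```python
# Minimal sketch: consolidating departmental spreadsheets and flagging duplicates.
# The folder, file pattern and key columns are hypothetical.
from pathlib import Path
import pandas as pd

frames = []
for path in Path("shared_drive/reports").glob("*.xlsx"):
    frame = pd.read_excel(path)
    frame["source_file"] = path.name          # keep provenance for auditing
    frames.append(frame)

consolidated = pd.concat(frames, ignore_index=True)

# The same order can live in several spreadsheets; keep one copy and log the rest.
key = ["order_id", "order_date"]
duplicates = consolidated[consolidated.duplicated(subset=key, keep="first")]
deduplicated = consolidated.drop_duplicates(subset=key, keep="first")

print(f"{len(duplicates)} duplicated rows found across "
      f"{consolidated['source_file'].nunique()} spreadsheet files")
```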

What are the benefits of AI and Analytics strategic planning?

Data-driven management is expected to provide not just drawings and sketches of operations or market conditions, but a high-resolution photograph of present and future reality. It thus supports corporate strategic planning in the short, medium and long term, with the following gains:

  • Procedural and technological readiness for data lakes projects and Advanced Analytics and AI labs.
  • Increased intensity of application of scientific techniques to businesses, such as comparative analysis, scenario simulations, identification of behavior patterns, demand forecasting, and others.
  • Increased accuracy of information.
  • Security of access to information at different levels.
  • Acceleration of onboarding processes: new team members learn the work scenario more quickly and begin to communicate more efficiently.
  • Greater data enrichment through increased interaction between teams from different sectors on analytical challenges.
  • Increased visibility into analytics operations, with digital assets organized for findability, accessibility, interoperability and reuse.
  • Optimized plan of change for data-driven Corporate Governance.
  • Incorporation of Analytical and AI mindset in different sectors.
  • Homogenization of data policies and controls.

AI and Analytics strategic planning – Conclusions and recommendations 

Preparing an AI and Analytics strategic plan is an important step towards reaching the level of data governance that allows the intensive use of analytics and artificial intelligence in operations, since the high failure rate of analytical projects is linked to the low quality of data and processes, and even to the incorrect use of technologies (lack of training).

Structuring projects such as AI and Analytics strategic planning are, or at least should be, the first step in the digital transformation journey of traditional companies. We are therefore convinced that, in the future, every successful company will have a clear and shared idea (vision, mission and values) of what data means to it and to its business model, rather than investing in data technology purely and simply because the competition does.

We believe that this focus on orchestrated (tidy and synchronized) data will be reflected in almost every area: in the range of services, in revenue models, in key resources, processes and cost structures, in corporate culture, in the focus on clients and networks, and in corporate strategy.

Last but not least, it is worth pointing out that, for a successful structuring to happen, a long-term holistic approach must be taken. This means investments in optimized technology, people, and processes to enable continued business growth.

How Aquarela has been acting

We develop new technologies and new data-driven business models, guided by the vision that the amount and availability of data will continue to grow, taking businesses to new heights of optimization.

What we do specifically for companies:

  • We analyze data-generating enterprise ecosystems.
  • We determine analytic maturity and derive action fields for data-driven organizations and services.
  • We develop and evaluate data-based services.
  • We identify and estimate the data’s potential for future business models.
  • We design science-based digital transformation processes and guide their organizational integration.

For more information – Click here.

Did you like the article? Leave your comment.


How VORTX Big Data organises the world?

Hello everyone,

The objective of this post is to show you what happens when we give a machine (VORTX Big Data) several numbers and it figures out by itself how the countries should be organized into different boxes. This technique is called clustering! The questions we will answer in this post are:

  • How are countries segmented based on the world’s indexes?
  • What are the characteristics of each group?
  • Which factors are the most influential for the separation?

Here we go!

Data First – What comes in?

I have gathered 65 indexes for 188 countries of the world. The sources are mainly:

  • UNDESA 2015,
  • UNESCO Institute for Statistics 2015,
  • United Nations Statistics Division 2015,
  • World Bank 2015,
  • IMF 2015.

Selected variables for the analysis were:

  1. Human Development Index HDI-2014
  2. Gini coefficient 2005-2013
  3. Adolescent birth rate 15-19 per 100k 2010-2015
  4. Birth registration under age 5 2005-2013
  5. Carbon dioxide emissions Average annual growth
  6. Carbon dioxide emissions per capita 2011 Tones
  7. Change forest percentile 1900 to 2012
  8. Change mobile usage 2009 2014
  9. Consumer price index 2013
  10. Domestic credit provided by financial sector 2013
  11. Domestic food price level 2009 2014 index
  12. Domestic food price level 2009-2014 volatility index
  13. Electrification rate or population
  14. Expected years of schooling – Years
  15. Exports and imports percentage GPD 2013
  16. Female Suicide Rate 100k people
  17. Foreign direct investment net inflows percentage GDP 2013
  18. Forest area percentage of total land area 2012
  19. Fossil fuels percentage of total 2012
  20. Freshwater withdrawals 2005
  21. Gender Inequality Index 2014
  22. General government final consumption expenditure – Annual growth 2005 2013
  23. General government final consumption expenditure – Percentage of GDP 2005-2013
  24. Gross domestic product GDP 2013
  25. Gross domestic product GDP per capita
  26. Gross fixed capital formation of GDP 2005-2013
  27. Gross national income GNI per capita – 2011  Dollars
  28. Homeless people due to natural disaster 2005 2014 per million people
  29. Homicide rate per 100k people 2008-2012
  30. Infant Mortality 2013 per thousands
  31. International inbound tourists thousands 2013
  32. International student mobility of total tertiary enrolment 2013
  33. Internet users percentage of population 2014
  34. Intimate or no intimate partner violence ever experienced 2001-2011
  35. Life expectancy at birth- years
  36. Male Suicide Rate 100k people
  37. Maternal mortality ratio deaths per 100 live births 2013
  38. Mean years of schooling – Years
  39. Mobile phone subscriptions per 100 people 2014
  40. Natural resource depletion
  41. Net migration rate per 1k people 2010-2015
  42. Physicians per 10k people
  43. Population affected by natural disasters average annual per million people 2005-2014
  44. Population living on degraded land Percentage 2010
  45. Population with at least some secondary education percent 2005-2013
  46. Pre-primary 2008-2014
  47. Primary-2008-2014
  48. Primary school dropout rate 2008-2014
  49. Prison population per 100k people
  50. Private capital flows percentage GDP 2013
  51. Public expenditure on education Percentage GDP
  52. Public health expenditure percentage of GDP 2013
  53. Pupil-teacher ratio primary school pupils per teacher 2008-2014
  54. Refugees by country of origin
  55. Remittances inflows GDP 2013
  56. Renewable sources percentage of total 2012
  57. Research and development expenditure 2005-2012
  58. Secondary 2008-2014
  59. Share of seats in parliament percentage held by woman 2014
  60. Stock of immigrants percentage of population 2013
  61. Taxes on income profit and capital gain 2005-2013
  62. Tertiary -2008-2014
  63. Total tax revenue of GDP 2005-2013
  64. Tuberculosis rate per thousands 2012
  65. Under-five Mortality 2013 per thousands
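VORTX’s internals are proprietary, so, purely to illustrate the idea of clustering indicators like the ones listed above, here is a generic sketch using an open-source stack (scikit-learn). The file name, column layout and the fixed number of clusters are assumptions, and, unlike VORTX, this sketch does not discover the number of groups by itself.

```python
# Generic clustering sketch with scikit-learn (illustration only; NOT the VORTX
# algorithm). Assumes a hypothetical CSV with one row per country and one
# column per index (e.g. "HDI_2014", "Gini", ...).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

indexes = pd.read_csv("world_indexes.csv", index_col="country")

pipeline = make_pipeline(
    SimpleImputer(strategy="median"),   # country indexes often have gaps
    StandardScaler(),                   # put all 65 indexes on a comparable scale
    KMeans(n_clusters=6, n_init=10, random_state=42),
)
labels = pipeline.fit_predict(indexes)

grouped = indexes.assign(cluster=labels)
print(grouped["cluster"].value_counts())            # how many countries per group
print(grouped.groupby("cluster").mean().round(2))   # the mean "DNA" of each group
```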

What comes out?

Let’s start by looking at the map to see where these groups are; then we will move to the VORTX visualization to better understand the DNA (the composition of factors) of each group.

World map

Click on the picture to play around with the map inside Google Maps.

Ok, I see the clusters, but now I want to know what combination of characteristics unites or separates them. The picture below shows the VORTX visualization considering all groups and all factors.

Main groups

On the left side are the groups and their proportions. Segmentation sharpness measures how different the groups are from one another across all factors. On the right side is the total composition of the variables, which we can call the world’s DNA.

In the next figures, you will see how different this picture becomes when we select each group.

Cluster 1

The most typical profile of a country, representing 51.60% of the globe. We call these the average countries.

Cluster 2

The second most common type, representing 26.46% of the globe.

Cluster 3

This is the cluster of the so-called first-world countries, whose results are above average, representing 14.89% of the globe. The United States does not belong to this group, but Canada, Australia, New Zealand and Israel do.

Cluster 4 - USA

The US is numerically so different from the rest of the world that VORTX decided to place it alone in its own group, which had the highest distinctiveness: 38.93%.

United Arab Emirates

Some other countries had no similar countries to share a group with; this is the case of the United Arab Emirates.

Before we finish, below are the 5 most and the 5 least influential factors that VORTX identified as key to creating the groups.

Top 5

  1. Maternal mortality ratio deaths per 100 live births 2013 – 91% influence
  2. Under-five Mortality 2013 thousand – 90%
  3. Human Development Index HDI-2014  – 90%
  4. Infant Mortality 2013 per thousands – 90%
  5. Life expectancy at birth- years – 90%

Bottom 5

  1. Renewable sources percentage of total 2012 – 70% influence
  2. Total tax revenue of GDP 2005-2013 – 72%
  3. Public health expenditure percentage of GDP 2013 – 73%
  4. General government final consumption expenditure – Percentage of GDP 2005-2013 – 73%
  5. General government final consumption expenditure – Annual growth 2005 2013 – 75%
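VORTX computes these influence percentages with its own method. As a rough open-source analogue, one can score how strongly each variable separates the discovered groups, for example with a one-way ANOVA F-test, as in the hypothetical sketch below (file name and column layout assumed).

```python
# Hypothetical approximation of "factor influence" (NOT how VORTX computes it):
# score how strongly each variable separates the discovered groups with a
# one-way ANOVA F-test. Assumes a CSV with one row per country, the index
# columns, and a "cluster" label like the one produced in the sketch above.
import pandas as pd
from sklearn.feature_selection import f_classif

grouped = pd.read_csv("world_indexes_clustered.csv", index_col="country")

features = grouped.drop(columns="cluster")
features = features.fillna(features.median(numeric_only=True))
f_scores, _ = f_classif(features, grouped["cluster"])

influence = pd.Series(f_scores, index=features.columns).sort_values(ascending=False)
print(influence.head(5))   # variables that most distinguish the groups
print(influence.tail(5))   # variables that barely distinguish them
```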

Conclusions

According to VORTX, if you plan to live in another country or sell your product abroad, it would be wise to check which group that country belongs to. If it belongs to the same group as the country you live in, then you know what to expect.

Could other factors be added to or removed from the analysis? Yes, absolutely. However, sometimes it is not that easy to get the information you need at the time you need it. Big Data analyses usually have several constraints and typically depend on the kind of questions posed to the data and to the algorithm, which, in turn, rely on the creativity of the data scientist.

The clustering approach is becoming more and more common in industry due to its strategic role in organizing and simplifying decision-making chaos. After all, how could a manager look at 12,220 cells to define a regional strategy?

Any questions or doubts? Or anything that catches your attention? Please leave a comment!

For those who wish to see the platform operating in practice, here is a video using data from Switzerland. Enjoy it!

 


How Titanic passengers are segmented by VORTX Big Data?

To demonstrate how VORTX works, I selected a well-known dataset with information about the passengers who embarked on the Titanic. Despite the tragic event, this dataset is fairly rich in detail and has been widely used in Machine Learning communities, since it allows the application of several Big Data techniques.

In this case, I am going to apply VORTX, a Big Data tool focused on providing automatic segmentation plus other important decision-making indicators. This technique is called clustering; more information is available in this post (How can big data clustering strategy help business). In the conclusion section, I give some ideas on how this innovative approach can help businesses.

Titanic Dataset summary

According to Encyclopedia Titanica, “On 10 April 1912, the new liner sailed from Southampton, England with 2,208 passengers and crew, but four days later she collided with an iceberg and sank: 1496 people died and 712 survived”. For this analysis, the data we had access to showed the following figures:

  • 1309 people on board, of whom 500 survived (38%) and 809 (62%) died.
  • An average age of 29.88 years (estimated).
  • 466 women, of whom 127 died and 339 survived.
  • 843 men, of whom 682 died and 161 survived.
  • Tickets cost on average £53.65 for women and £76.60 for men.

For more details on the complete dataset – Google for Titanic Dataset.
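For readers who want to check figures like these themselves, here is a minimal sketch; it assumes a commonly circulated titanic.csv with columns named survived, sex, age and fare, which may differ between versions of the dataset.

```python
# Sketch of reproducing summary figures like the ones above from a commonly
# circulated Titanic CSV. Column names (survived, sex, age, fare) are assumed.
import pandas as pd

titanic = pd.read_csv("titanic.csv")

print(titanic["survived"].value_counts(normalize=True))   # share who survived vs. died
print(round(titanic["age"].mean(), 2))                    # average (known) age
print(titanic.groupby("sex")["survived"].value_counts())  # outcomes by gender
print(titanic.groupby("sex")["fare"].mean().round(2))     # average ticket fare by gender
```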

Factors under analysis

Unfortunately, 267 passengers (20.39%) had to be excluded from the analysis due to missing age values. Furthermore, out of the 15 factors present in the original file, I selected the numerical ones with the strongest weights calculated by VORTX. Usually, we classify factors, variables or data attributes into the following 3 categories:

  • Protagonist – Factors with strong positive influence to generate a valuable pattern with clarity.
  • Antagonist – Factors with noise or unclear patterns and negative influence that play against the protagonist.
  • Supporting – Factors that do not play a significant role in changing the path of the analysis, but can enrich the results.

According to their influence power, the protagonists chosen for this analysis were:

  • Age of the passenger = 87.85%
  • How much each passenger paid to embark = 72.69%
  • Number of parents on the ship = 71.69%
  • Number of siblings or spouses on the ship = 72.42%

During the calculation, the gender factor (indicating whether the passenger was male or female) tended to play an antagonist role, meaning it lacked a pattern to help form the groups and dropped the dataset sharpness to 7%. Therefore, it was removed.
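Since VORTX itself is proprietary, the sketch below reproduces a comparable workflow with scikit-learn purely for illustration: it drops rows with missing protagonist values, standardizes the four factors, and chooses the number of groups with a silhouette-score search as a stand-in for VORTX’s automatic group discovery. Column names and the file path are assumptions.

```python
# Open-source analogue of the workflow above (NOT the VORTX algorithm): drop
# rows with missing values in the four "protagonist" factors, standardize
# them, and pick the number of groups by silhouette score.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

titanic = pd.read_csv("titanic.csv")
factors = ["age", "fare", "parch", "sibsp"]   # age, fare, parents/children, siblings/spouses
data = titanic.dropna(subset=factors)[factors]

scaled = StandardScaler().fit_transform(data)

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 11):                         # search a small range of group counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled)
    score = silhouette_score(scaled, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(f"chosen number of groups: {best_k} (silhouette = {best_score:.2f})")
summary = data.assign(cluster=best_labels,
                      survived=titanic.loc[data.index, "survived"])
print(summary.groupby("cluster")[["survived", "fare"]].mean().round(2))
```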

VORTX Results and group characteristics

After processing, VORTX produced the following indicators, most of which are not offered by other algorithms, so I give a brief explanation of each:

  • Dataset Sharpness = 33.64%. It shows how clear or confident the machine is about the discovered grouping patterns. According to our dataset quality scale, sharpness above 20% is already useful for decision making.
  • Automatic discovery of segments (groups) = 8. This is a function that makes the whole process much easier for the data analyst. Unlike k-means and other algorithms, VORTX finds the right (ideal) number of groups by itself, dramatically reducing the segmentation errors that typically happen.
  • Clustering Distinctness = how different the elements of each group are from the dataset as a whole, which is what makes them a group. The most distinctive group is number 5, with 51.48% (darker color), and the least distinctive is group 1, with 8.58%. This means that the elements of group 5 tend to be more homogeneous than those of the other groups.

VORTX VIEW

VORTX screenshot

By analyzing the groups and checking them against who survived the trip, I arrived at the survival rate of each group plus its average ticket fare. So, if you had the characteristics of group 5 or 7, you would have had better chances of surviving.