What are outliers and how to treat them in Data Analytics?

What are outliers? They are data records that differ dramatically from all others, distinguishing themselves in one or more characteristics. In other words, an outlier is a value that escapes normality and can (and probably will) cause anomalies in the results obtained through algorithms and analytical systems. Therefore, outliers always need some degree of attention.

Understanding outliers is critical in data analysis for at least two reasons:

  1. outliers may negatively bias the entire result of an analysis;
  2. the behavior of outliers may be precisely what is being sought.

Depending on the context, outliers go by many other names: aberration, oddity, deviation, anomaly, eccentric, nonconformist, exception, irregularity, dissent, original and so on. Below are some common situations in which outliers arise in data analysis, along with suggested approaches for dealing with them in each case.

How to identify which record is an outlier?

Find the outliers using tables

The simplest way to find outliers in your data is to look directly at the data table or worksheet – the dataset, as data scientists call it. The following table clearly exemplifies a typing error, that is, an error in data input. The age field for the individual Antony Smith shows 470 years, which certainly is not his real age. Looking at the table it is possible to identify the outlier, but it is difficult to say what the correct age would be. Several values are plausible, such as 47, 70 or even 40 years.

Antony Smith age outlier

In a small sample the task of finding outliers with the use of tables can be easy. But when the number of observations goes into the thousands or millions, it becomes impossible. This task becomes even more difficult when many variables (the worksheet columns) are involved. For this, there are other methods.
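When the table is too large to inspect by eye, the same check can be expressed as a simple plausibility rule. The sketch below is only an illustration: the records, the field names and the accepted age range of 0 to 120 are assumptions, and only the 470-year value comes from the example above.

```python
# Hypothetical records; only the 470-year age comes from the example above.
records = [
    {"name": "Antony Smith", "age": 470},
    {"name": "Person B", "age": 34},
    {"name": "Person C", "age": 58},
    {"name": "Person D", "age": 21},
]


def implausible_ages(rows, low=0, high=120):
    """Return the rows whose age falls outside an assumed plausible range."""
    return [row for row in rows if not low <= row["age"] <= high]


flagged = implausible_ages(records)
for row in flagged:
    print(f"Check the age recorded for {row['name']}: {row['age']}")
```

A rule like this finds the suspicious record, but, as noted above, it cannot say which age is the correct one.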

Find outliers using graphs

One of the best ways to identify outliers is by using charts. When plotting a chart, the analyst can clearly see that something different exists. Here are some examples that illustrate how outliers show up in charts.

Case: outliers in the Brazilian health system

In a study already published on Aquarela’s website, we analyzed the factors that lead people to miss medical appointments scheduled in the public health system of the city of Vitória, in the state of Espirito Santo, which caused an approximate loss of 8 million US dollars a year.

In the dataset, several patterns were found. For example, children practically never miss their appointments, and women attend consultations much more than men. However, a curious case was that of an outlier: a 79-year-old woman who scheduled a consultation 365 days in advance and actually showed up for her appointment. This is an example of an outlier that deserves to be studied, because this lady’s behavior can provide relevant information about measures that could be adopted to increase the attendance rate. See the case in the chart below.

sample of 8000 appointments

Case: outliers in the Brazilian financial market

On May 17, 2017 Petrobras shares fell 15.8% and the stock market index (IBOVESPA) fell 8.8% in a single day. Most of the shares of the Brazilian stock exchange saw their price plummet on that day. The main cause of this strong negative variation was the Joesley Batista case, one of the most shocking political events of the first half of 2017.

This case represents an outlier for the analyst who, for example, wants to know the average daily return on Petrobras shares over the last 180 days. Certainly, the Joesley Batista episode pulled the average down sharply. In the chart below, even with many observations, it is easy to identify the point that disagrees with the others.

Petrobras 2017
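To see how strongly a single day can drag an average, the sketch below compares the mean with and without the extreme observation. The return series is hypothetical and much shorter than the 180 days mentioned above; only the 15.8% drop comes from the text.

```python
from statistics import mean, median

# Hypothetical daily returns in percent; only the final -15.8 comes from the text.
daily_returns = [0.4, -0.3, 0.8, -0.6, 0.2, 0.5, -0.4, 0.1, -0.2, -15.8]

mean_with_outlier = mean(daily_returns)
mean_without_outlier = mean(daily_returns[:-1])
median_with_outlier = median(daily_returns)

print(f"mean with the outlier:    {mean_with_outlier:.2f}%")
print(f"mean without the outlier: {mean_without_outlier:.2f}%")
print(f"median with the outlier:  {median_with_outlier:.2f}%")
```

In this toy series the mean changes sign because of one observation, while the median barely moves, which is why it is worth deciding explicitly how such a day should be treated before reporting an average.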

The data point in the above example may be called an outlier, but, taken literally, it is not necessarily a point “outside the curve”. The “curve” in the above graph, although counter-intuitively, is the straight line that cuts through the points. The graph shows that, although different from the others, this data point is not exactly outside the curve: a predictive model could easily infer, with high precision, that a 9% drop in the stock market index would correspond to a 15% drop in Petrobras’ share price. In another case, still with data from the Brazilian stock market, the stock of the company Magazine Luiza appreciated 30.8% on a day when the stock market index rose by only 0.7%. This data point, besides being atypical and distant from the others, also lies outside the curve. See the chart:

This is an outlier that can harm not only descriptive statistics, such as the mean and, to a lesser extent, the median, but also the calibration of predictive models.
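The difference between an extreme point that follows the line and one that falls off it can be sketched by fitting a straight line and looking at the residuals. Everything in the example is an assumption made for illustration: the return pairs are hypothetical, the two extreme pairs merely reuse the magnitudes quoted above and are placed in a single series, and the cut-off of two standard deviations of the residuals is an arbitrary choice.

```python
from statistics import pstdev

# Hypothetical (index return, stock return) pairs in percent.
# The pairs (-8.8, -15.8) and (0.7, 30.8) reuse the magnitudes quoted in the text.
index_returns = [-8.8, -2.0, -1.0, -0.5, 0.0, 0.5, 0.7, 1.0, 2.0, 3.0]
stock_returns = [-15.8, -3.5, -1.8, -1.0, 0.1, 0.9, 30.8, 1.7, 3.6, 5.2]


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(index_returns, stock_returns)
residuals = [y - (slope * x + intercept) for x, y in zip(index_returns, stock_returns)]

# Assumed rule: a point is "outside the curve" when its residual exceeds
# two standard deviations of all residuals.
threshold = 2 * pstdev(residuals)
off_curve = [i for i, r in enumerate(residuals) if abs(r) > threshold]
print("indices outside the curve:", off_curve)
```

With these numbers the large drop stays close to the fitted line, while the large gain on a quiet day is flagged. The fitted slope is itself pulled by that point, which is one way an outlier affects the calibration of a model.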

Find outliers using statistical methods

A more complex but quite precise way of finding outliers is to find the statistical distribution that most closely approximates the distribution of the data and then use statistical methods to detect discrepant points. The following example shows the histogram of a metric well known to drivers: “kilometers per liter”. The dataset used for this example is a public dataset widely used in statistical tests by data scientists. It was extracted from the 1974 “Motor Trend US magazine” and comprises several aspects of the performance of 32 car models. More details at this link.

The histogram is one of the main and simplest graphing tools for the data analyst to use in understanding the behavior of the data. In the histogram below, the blue line represents what the normal (Gaussian) distribution would be based on the mean, standard deviation and sample size, and is contrasted with the histogram in bars. The red vertical lines represent the units of standard deviation. It can be seen that cars with outlier performance for the time could average more than 14 kilometers per liter, which corresponds to more than 2 standard deviations from the mean.

Under a normal distribution, data within two standard deviations of the mean corresponds to about 95% of all data; in this analysis, the outliers are the remaining 5%.
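The two-standard-deviation rule can be sketched in a few lines. The values below are hypothetical and are not the dataset used in the histogram; the cut-off of 2 follows the rule described above and assumes the data is roughly normal.

```python
from statistics import mean, stdev

# Hypothetical fuel efficiency values in kilometers per liter (not the real dataset).
km_per_liter = [6.1, 6.5, 7.0, 7.7, 8.2, 8.5, 8.9, 9.1, 9.7, 10.4, 11.0, 14.4]

mu = mean(km_per_liter)
sigma = stdev(km_per_liter)  # sample standard deviation

# Flag observations more than two standard deviations away from the mean.
flagged = [v for v in km_per_liter if abs(v - mu) > 2 * sigma]
print(f"mean={mu:.2f}, sd={sigma:.2f}, flagged={flagged}")
```

In small samples an extreme value inflates the standard deviation it is judged against, so the cut-off is best treated as a starting point rather than a fixed rule.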

Conclusions: What to do with outliers?

We have seen that it is imperative to pay attention to outliers because they can bias data analysis. In addition to identifying outliers, we suggest some ways to treat them better:

  • exclude the discrepant observations from the data sample: when the discrepant data is the result of a data input error, it needs to be removed from the sample;
  • perform a separate analysis with only the outliers: this approach is useful when you want to investigate extreme cases, such as students who only get good grades, companies that make a profit even in times of crisis, fraud cases, among others;
  • use clustering methods to find an approximation that corrects the outlier and gives it a new value: in cases of data input errors, instead of deleting and losing an entire row of records due to a single outlier observation, one solution is to use clustering algorithms that find the behavior of the observations closest to the given outlier and infer the best approximate value.
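The last suggestion can be sketched with a nearest-neighbour approach, used here as a simple stand-in for the clustering algorithms mentioned above. The records, the field names, the choice of k = 3 and the use of unscaled Euclidean distance are all assumptions made for illustration.

```python
from math import dist
from statistics import median

# Hypothetical records; the first one carries the faulty age from the earlier example.
rows = [
    {"years_employed": 25, "children": 2, "age": 470},
    {"years_employed": 24, "children": 2, "age": 46},
    {"years_employed": 26, "children": 3, "age": 49},
    {"years_employed": 23, "children": 2, "age": 47},
    {"years_employed": 3, "children": 0, "age": 24},
    {"years_employed": 40, "children": 4, "age": 68},
]


def impute_age(target, others, k=3):
    """Estimate an age from the k rows closest to the target on the other fields."""

    def distance(row):
        # Unscaled Euclidean distance; real data would usually need feature scaling.
        return dist(
            (target["years_employed"], target["children"]),
            (row["years_employed"], row["children"]),
        )

    nearest = sorted(others, key=distance)[:k]
    return median(row["age"] for row in nearest)


faulty = rows[0]
valid = [row for row in rows if 0 <= row["age"] <= 120]
imputed_age = impute_age(faulty, valid)
print("imputed age:", imputed_age)
```

The row is kept and only the faulty field is replaced, so the other fields of the record remain available to the analysis.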

Finally, the main conclusion about the outliers can be summarized as follows:

“a given outlier may be what most disturbs your analysis, but it may also be exactly what you are looking for.”

The Future of Financial Analysis with Advanced Analytics.

The way financial analysis is conducted is changing fast. In the last two decades, companies in general have undergone an intense process of computerization, which started in the field of accounting with the use of management systems such as ERPs and CRMs. Today they produce much more data than ever before, and this asset needs to be analyzed from the finance and investment standpoint as well as from the Data Analytics and Advanced Analytics standpoint.

In this article we briefly compare the main changes occurring in the way financial analysts work and what they mean for the future of the profession in relation to the area of Advanced Analytics.

Key motivators of changes in how to do financial analysis

The connectivity of recent years has generated business models that could never have been imagined before, able to serve varied audiences 24 hours a day with unprecedented scalability, such as Uber. In addition, the volume of data has grown in size and complexity, creating a potential for business insights and transformations that goes beyond what purely financial analysis, done only by conventional methods, can capture.

Traditional Financial Analysis and Advanced Analytics Techniques

Financial analysis has well-established and widespread analysis methods. To say whether or not a company is interesting for business or investment is a task that is often satisfied by the analysis of past accounting / financial indicators. To do this, each analyst has specific criteria to evaluate the economic and financial feasibility of new investments, serving both corporate finance and personal finance.

Data Analytics methods, in turn, can be used to automate and optimize financial decisions, according to methods that are already used in the area. However, it is also possible, when using Advanced Analytics techniques, to incorporate machine learning and artificial intelligence algorithms to develop predictions in an innovative way that is still unexplored in the market: this is what will generate a great competitive advantage for companies in the financial market in Industry 4.0.

Data Analytics and Advanced Analytics in practice:

Automate and optimize financial analytics with Data Analytics:

  • creation of automatically updated datasets with macroeconomic data (basic interest rate, inflation, GDP, among others);
  • automated collection of data from financial statements, either from publicly traded companies via the APIs of existing databases, or from privately held companies, through extraction of tables from PDF files, for example;
  • creation of automated descriptive reports on the main changes in the macroeconomic scenario.

Make predictions with Advanced Analytics – machine learning, artificial intelligence, among others:

  • models adaptable to changes in economic and financial reality, capable of making recommendations and indicating directions for decision making.

If financial analysts use spreadsheets, such as Excel, for their analyses, they can optimize the data extraction and cleaning processes with Data Analytics techniques and, in the end, obtain an Excel spreadsheet as output, so that they can carry out the financial analyses they are already accustomed to. However, the great competitive advantage lies in the hands of analysts who can use Advanced Analytics to transform the way they perform their own financial analysis.

The area of finance is also strongly influenced by the use of econometric methods to make forecasts. However, conventional econometric models are usually static. Several robustness tests are usually run to validate such models, but the problem is that many of them are not adaptable to changes in economic and financial reality, which are typical given the dynamism of financial markets. Versatility and adaptability to change are characteristics of models that use machine learning and artificial intelligence techniques, within a coherent implementation of the Data Analytics culture among financial analysts.

The Data Analytics culture presents a way of acquiring analytical knowledge that differs from the traditional model. Analytics knowledge is more decentralized, thanks to the internet and the sharing of code in package form (an influence of computer science and versioning techniques). That is, instead of the analyst spending months or even years building all the calculations in isolation in an Excel spreadsheet to reach a conclusion, with the Data Analytics culture it is possible to import complete sets of code that perform complex analyses on the data in minutes, greatly speeding up the process.

To give an idea of the growth of this type of approach to problem solving, we present below the volume of packages added to CRAN, the main repository of R language packages.

The possibilities become so broad in this new mode that, in a few seconds, it is possible to install and execute commands for automatic generation of Internet Memes, like this one, with only 4 command lines.

For more information on this small package, see this article.


The same process facility extends to financial packages, such as:

TTR, tidyquant, PerformanceAnalytics, PortfolioAnalytics, quantmod, Quandl, among others.

Earlier we wrote about the need to incorporate R, given the traditional limitations of Excel – Leaving from Limited Excel to R or better Python? 

Comparison of traditional methods of financial analysis, Data Analytics and Advanced Analytics

Typically, traditional financial analysis methods produce stable, well-judged valuations without the need to present or discuss the methods used. In Analytics methods, by contrast, communities share code and tools, not just concepts. See the table below for a comparison of the approaches.

Aspect | Financial analysis | Data Analytics | Advanced Analytics
Replication level and analysis speed | Low, since each worksheet is self-contained and changes are not shared. | Intermediate, using good practices of scripting and collaborative work. Spreadsheets in dataset format. | High, using structured systems to operate in a distributed way. Multiplatform and scalable.
Use of Artificial Intelligence | Low | Intermediate | High
Predictive analysis | Trend analysis with strong use of time series and regression methods. In general, these are robust but static models. | Predictions with statistical weights for all variables analyzed, with a wide range of generic algorithms available for analysis. | Continuous improvement of the accuracy, speed and assertiveness of predictive models, with weights for all variables discovered by the algorithms themselves.
Analysis focus | Internal financial health data of the organization, comparison with similar organizations. Macroeconomic analyses made on the basis of theoretical premises. | Internal data, data linked to macroeconomic aspects, analysis of texts (such as minutes and explanatory notes), investigation of relationships with non-financial data as well. | Internal and external data at various levels of granularity.
Key analysis tools | Excel and transactional systems such as ERPs, CRMs and SCMs. Statistical and econometric software, such as SPSS, Eviews, Stata. | R, Python or other specific programming notebooks, data cleaning tools, data mining algorithm suites. Pure text editors (example: Sublime). Git – code versioning and creative artifacts. | Machine learning and artificial intelligence platforms, which contemplate the use of several algorithms. Use of distributed computing platforms, such as Spark and Hadoop.
License | Closed code tools | Open code tools | Mix of open and closed code tools
Analyst main activities | Analysis of financial statements and indicators. Development of economic / financial reports. | Definition of financial analysis structures, preparation of datasets, information flow of the indicators that compose the datasets. Not limited to financial indicators. | Deployment of large-scale models integrated with the transactional systems.

Final considerations and recommendations

The deepest impact of shifting financial analysis towards Analytics paradigms is on the nature of the work of financial analysts, which becomes oriented to package orchestration and data flow through scripts, with less technical dependence on the IT and development departments.

For those who work in financial analysis and intend to adapt to new, increasingly data-driven market trends, we recommend an in-depth study of the basic packages of programming languages (mainly R and Python), learning code versioning tools (such as Git or GitHub), and participating in Data Science best-practice communities in your region or online.

Industry 4.0, Web 3.0 and Digital Transformation

Industry 4.0 is characterized by a change in the flow of value from centrally designed, resource-intensive products to knowledge-intensive, decentralized services designed and produced with strong support from Advanced Analytics and AI through digital transformation.

This process began with the Internet boom in the first decade of the millennium. 2018 seems to be the year of emancipation of Industry 4.0, which ceases to exist only in scientific articles and laboratories and evolves with vigorous support from the budgets of the largest corporations in the world, according to research by the OECD, Gartner Group and PwC.

From our point of view, Industry 4.0 materializes from the concepts of Web 3.0, whose core lies in the democratization of the capacity for action and knowledge (as already discussed in this blog post). But before we get to 4.0, let’s look at its previous versions in perspective:

Industry 1.0

Characterized by the discovery of the economic gains of producing something in series rather than through artisanal (individual) production, which made it possible to mechanize labor previously performed only by people or animals. It was the moment when man began to harness the force of water, wind and fire through steam engines and mills. In 1776, Adam Smith (The Wealth of Nations) presented the advantages of dividing work in a pin factory.

Key Components – Coal and Steam Engines.

Industry 2.0

Its major driver was electricity, which, through generators, motors and artificial lighting, made it possible to establish assembly lines and thus gave rise to the mass production of consumer goods.

Key Components – Electricity and Electromechanical Machines

Industry 3.0

Characterized by automation, its driving force is the use of robots and computers in the optimization of production lines.

Key Components – Computers and Robots

Industry 4.0

Industry 4.0 is characterized by the strong automation of the design, manufacturing and distribution stages of goods and services, with strong use of CI – Collective Intelligence – and AI – Artificial Intelligence. In Industry 4.0, with the evolution of the Web, individuals are increasingly empowered by their agents (smartphones). Catering to the needs of this new consumer is one of the great challenges of the new industry.

To better illustrate this concept we created the following table:

Generations | Concept (Design) | Manufacture | Distribution | Services | Outcome
Before industry age | People | People | People | People | Hand-made work
Industry 1.0 | People | Machines | People | People | Use of electric, thermal, hydraulic energy
Industry 2.0 | People | Machines | People | People | Electric energy as a main driver, assembly line process start
Industry 3.0 | People using machines (computers) as assistants | Machines | People and machines | People | Use of automation (robots and computers)
Industry 4.0 | Collective intelligence + machines | Machines | Machines | Collective intelligence + machines | Use of computational and collective intelligence to create products and services

In order to understand Industry 4.0 it is important to clarify some concepts that make up its foundations: AI – Artificial Intelligence and CI – Collective Intelligence.

Collective Intelligence

Let’s start with CI, which is more tangible, since we constantly use mechanisms that rely on collective intelligence in the production and curation of content, such as Wikipedia, Facebook, Waze and YouTube.

Wikipedia: most Wikipedia content is produced by hundreds of thousands of editors worldwide and curated by millions of users who validate and review it.

Waze: the Waze application uses its users’ own movement to build and refine its maps, providing real-time alternative routes to escape traffic congestion, as well as routes through new road sections created by cities.

Facebook and YouTube: services that today offer a diverse range of content that is spontaneously generated and curated by their users through likes and shares.

What do these mechanisms have in common? They rely on the so-called intelligence of the masses, a concept established by the Marquis de Condorcet in 1785, which defines a degree of certainty and uncertainty about a decision from a collective of individuals.

With hundreds or thousands of individuals each acting in their own way, the sum of all these actions yields a whole that is greater than the sum of the parts. This collective behavior is observed in the so-called swarm effects, in which insects, birds, fish and humans, acting collectively, achieve much greater feats than if they had acted individually.

Condorcet proved this mathematically, inspiring Enlightenment leaders who used his ideas as a basis for the formation of democracies in the 18th and 19th centuries.
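Assuming the result referred to here is what is usually called Condorcet's jury theorem, it can be checked numerically: if each of n independent voters is right with probability p, the probability that a simple majority is right is the sum, over every majority size k, of C(n, k) · p^k · (1 − p)^(n − k). The function name and the example values of p below are illustrative choices.

```python
from math import comb


def majority_is_right(voters, p):
    """Probability that a simple majority of independent voters is correct.

    Each voter is assumed to be right with the same probability p,
    and the number of voters must be odd so that ties cannot happen.
    """
    if voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = voters // 2 + 1
    return sum(
        comb(voters, k) * p**k * (1 - p) ** (voters - k)
        for k in range(needed, voters + 1)
    )


for n in (1, 3, 11, 101):
    print(n, round(majority_is_right(n, 0.6), 3))
```

When p is above 0.5 the collective becomes more reliable as it grows; when p is below 0.5 the effect reverses, which is the "certainty and uncertainty" side of the theorem.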

In a contemporary way, we can look at a database as a large lake of individual experiences that form a collective. Big Data is responsible for collecting and organizing this data and Advanced Analytics for improving, creating and re-creating things (disruption) through intensive statistics and AI.

Artificial Intelligence

On close scrutiny, it is possible to understand AI as an artificial implementation of agents that use the same principles as CI – Collective Intelligence.

That is, instead of real ants or bees, artificial neurons and/or insects are used in a computational world (the cloud). In some ways they simulate real-world behavior and thus obtain, from the intelligence of the masses, decisions, responses and creations.

Take, for instance, this piece, used to support a bridge in the Dutch city of The Hague.

On the left side is the original piece created by engineers. In the middle and on the right are two pieces created with an AI approach called a genetic algorithm. The right-hand piece is 50% smaller and uses 75% less material, and yet, because of its design, it is capable of sustaining the same dynamic load as its left counterpart.
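The mechanics of a genetic algorithm can be sketched with a toy problem that only loosely echoes the bridge piece: use as little material as possible while every segment stays at or above a required thickness. The requirements, the penalty weight, the population size and the mutation rate are all hypothetical, and this is not the method used to design the real part.

```python
import random

# Hypothetical minimum thickness required by each of eight segments.
REQUIRED = [3, 5, 2, 7, 4, 6, 3, 5]
MAX_THICKNESS = 10


def cost(design):
    """Total material used, plus a heavy penalty for any segment that is too thin."""
    material = sum(design)
    shortfall = sum(max(0, need - have) for need, have in zip(REQUIRED, design))
    return material + 100 * shortfall


def crossover(a, b, rng):
    """Single-point crossover: the head of one parent joined to the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(design, rng, rate=0.1):
    """Replace each gene by a random thickness with a small probability."""
    return [rng.randint(1, MAX_THICKNESS) if rng.random() < rate else gene for gene in design]


def evolve(generations=60, size=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(1, MAX_THICKNESS) for _ in REQUIRED] for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: size // 2]  # truncation selection: keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children  # parents survive unchanged (elitism)
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


best_design, history = evolve()
print("best design:", best_design, "cost:", history[-1])
```

Because the better half of each generation is carried over unchanged, the best cost can never get worse from one generation to the next; the lowest possible cost in this toy problem is the sum of the required thicknesses.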

There are hundreds of AI use cases, ranging from the detection of smiles on cameras and cell phones to cars that move autonomously in the midst of cars with human drivers in big cities.

Each AI use case relies on a set of techniques that can involve Machine Learning, insight discovery and optimal decision making through predictive and prescriptive Advanced Analytics and Creative Computing.


The intensive use of CI and AI can generate new products and services creating disruptions that we see today in some industries promoted by companies like Uber, Tesla, Netflix and Embraer.


In the case of Uber, the company makes heavy use of CI to generate competition and, at the same time, collaboration between drivers and passengers, complemented by AI algorithms, to deliver a reliable transportation service at a cost never before available.

Despite being 100% digital, it is revolutionizing the way we are transported and very soon will launch its 100% autonomous taxis and, in the near future, drones that transport their passengers through the skies. This is a clear example of digital transformation from redesign through the perspective of Industry 4.0.


Tesla uses CI from the data captured from the drivers of its electric cars and, applying Advanced Analytics, optimizes its own processes and also uses that data to train the AI that today is able to drive a car safely in the traffic of big cities around the world.

Tesla is a remarkable example of Industry 4.0. They use CI and AI to design their innovative products, a chain of automated factories to produce them and sell them online. And very soon they will transport and deliver their products to the buyer’s door with their new electric and autonomous trucks, completely closing the Industry 4.0 cycle.


Netflix, in turn, uses its users’ viewing history and ratings to generate a list of preference recommendations that serve as input to the creation of originals, such as the hits House of Cards and Stranger Things. In addition, it uses the AI of the Bandit algorithm (from Netflix itself) to generate title covers and curate lists, attracting users (viewers) to consume new content.
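The general idea behind bandit algorithms can be sketched with epsilon-greedy, one of the simplest strategies of this family. This is not Netflix's algorithm: the click rates for three candidate covers, the number of rounds and the exploration rate are hypothetical values used only to show the trade-off between trying alternatives and showing the cover that has performed best so far.

```python
import random


def epsilon_greedy(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate choosing among covers whose true click rates are unknown to the chooser."""
    rng = random.Random(seed)
    shows = [0] * len(click_rates)
    clicks = [0] * len(click_rates)
    for _ in range(rounds):
        if rng.random() < epsilon or 0 in shows:
            # Explore: show a random cover (also used until every cover was shown once).
            arm = rng.randrange(len(click_rates))
        else:
            # Exploit: show the cover with the best observed click rate so far.
            arm = max(range(len(click_rates)), key=lambda i: clicks[i] / shows[i])
        shows[arm] += 1
        if rng.random() < click_rates[arm]:  # simulated viewer reaction
            clicks[arm] += 1
    return shows, clicks


shows, clicks = epsilon_greedy([0.02, 0.05, 0.11])
for cover, (s, c) in enumerate(zip(shows, clicks)):
    print(f"cover {cover}: shown {s} times, clicked {c} times")
```

The appeal of this design is that the system keeps learning while it serves content, instead of running a fixed test first and deciding afterwards.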


Embraer, the world’s third largest producer of civil aircraft and the largest innovation company in Brazil, uses AI, CI and Advanced Analytics in equipment maintenance systems.

By using these techniques, based on maintenance experiments and risk mitigation procedures applied to an AI, it is possible to reduce the costs of troubleshooting processes in high-value equipment, with savings of up to 18% in an industry where apparently low margins can generate considerable competitive impact.

Conclusions and Recommendations

The path to industry 4.0 is paved by the techniques of CI, AI, Advanced Analytics, Big Data, Digital Transformation and Service Design and with good examples of global leaders.

Transformation is often a process that can generate anxiety and discomfort, but it is necessary to achieve the virtues of Industry 4.0.

We suggest starting small and thinking big. Start by thinking about data, the building blocks of all Digital Transformation, and by fostering a Data Culture in your business / department / industry.

And how do you start thinking about data? Start with the definition of your dictionaries: they will be your nautical charts in the middle of the Digital Transformation journey.

Understanding the potential of data and the new business it can generate is instrumental in the transition from producer of physical goods to service provider, whether or not the service is supported by physical products. See Uber and Airbnb: neither owns cars or real estate, but both are responsible for a generous share of the transportation and accommodation markets.

We recommend raising the degree of maturity beginning with a diagnosis, then the elaboration of a plan of action and its application.

At Aquarela we have developed the Business Analytics Canvas Model, a Service Design tool for the development of new data-based business. With it, it is possible to promote the intensive use of CI and AI in the stages of Design and Services, the links that characterize the change from Industry 3.0 to 4.0.

We will soon publish more about Business Analytics Canvas Model and Service Design techniques for Advanced Analytics and AI.

Analytics and AI on Unicorns and Investment Funds

Interest is growing in start-ups that are causing disruption in all areas of business. These so-called unicorns (companies valued at over 1 billion USD) are impacting the lives of millions of people, as well as generating revenues comparable to the GDP of countries in the short time span of just a few years (averaging 1.6 years).

In this article, we present the current global scenario of unicorn companies (valued at over US $ 1 billion) and the venture capital funds associated with them. We analyzed the data using Data Analytics and Artificial Intelligence methodologies to find relevant patterns on the subject, such as: the marginalization of Latin American countries in this market, the decentralization of the investment and innovation market caused by Asian initiatives, especially from China, and the composition of groups with similar business and performance characteristics in general terms.


  • Venture Capitalists
  • Investment Funds
  • Tech Entrepreneurs
  • Macroeconomists and strategists

Key Questions

  • How are the 217 unicorns cataloged and their relationships with global investment funds characterized?
  • What are the main investment funds and their strategic behavior on the market?
  • Which Investors are most appropriate for each sector and how are they distributed geographically?
  • What are the ultra-unicorns?
  • What is the average distance in kilometers between investors and companies in the US and China?
  • What is the correlation of the physical presence of funds with unicorns?
  • How does the polarization of the creation of unicorns take place between the West and the East?

What are unicorn companies?

Unicorn companies are tech-based startups with a valuation above US $ 1 billion, even before they are publicly traded. In November 2017, the total value of unicorn startups in the world corresponded to US $ 753 billion (42% of Brazil’s GDP). US $ 383 billion came from US companies and US $ 253 billion from Chinese companies, which combined correspond to 84.4% of the total!
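The shares quoted above can be reproduced from the rounded totals in the text. The snippet is only a quick arithmetic check; because the inputs are rounded to the nearest billion, the combined share comes out at roughly 84.5%, close to the 84.4% quoted.

```python
# Values in billions of US dollars, as quoted in the text (November 2017).
total_value = 753
us_value = 383
china_value = 253

us_share = 100 * us_value / total_value
china_share = 100 * china_value / total_value
combined_share = 100 * (us_value + china_value) / total_value

print(f"US: {us_share:.1f}%  China: {china_share:.1f}%  combined: {combined_share:.1f}%")
```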


We collected the data sample published by the market research firm CB Insights, with the following raw information:

  • 217 unicorn startups with market value greater than 1 billion US dollars and less than 10 years since the first “series A” investment.
  • 312 investment funds with at least one investment in one of the 217 companies.
  • 34 business sectors.
  • 22 countries.

Note: We performed several rounds of “cognitive capoeira”, which is what we call the creative data wrangling processes used for dataset enrichment and field validation. We simplified the number of categories as much as possible. We verified all the geographical positions and realized that some unicorn companies, mainly Chinese ones, were associated with generic geographical coordinates, either due to difficulties in language (or address) translation or because they had moved their offices to other locations. We focused on the place of origin of each.

Unicorn rankings and georeferenced analyses

Business scenarios for unicorn startups created by VORTX

The business scenarios found by the VORTX artificial intelligence were named according to their most striking features, from the point of view of Aquarela’s team of analysts.

Most unicorn startups

Group 1 – Annual Billion

They are the most typical companies and correspond to 50% of the 217 analysed companies. They present valuations of 2 billion in two years, with the following distribution of sectors:
(26.09%) | eCommerce / Marketplace
(16.05%) | Fintech
(13.38%) | Software & Services
(10.03%) | Healthtech
(6.69%) | Big Data
(6.02%) | On demand
(5.69%) | Social
(5.02%) | Media
(3.68%) | Hardware
(3.34%) | Cyber Security

The main investors related to these companies are:
(4.35%) | Sequoia Capital
(3.01%) | Accel Partners
(2.68%) | GGV Capital
(2.34%) | SV Angel
(2.01%) | New Enterprise Associates
(2.01%) | Index Ventures
(1.67%) | Founders Fund
(1.67%) | Insight Venture Partners
(1.67%) | Alibaba Group
(1.67%) | Khosla Ventures

Group 2 – Explosives

Companies that correspond to 38.82% of the base and present an average valuation of 2 billion with an extremely short lifetime, with an average age under one year. The main sectors of these companies are:
(9.52%) | Fintech
(9.52%) | Software & Services
(8.23%) | On demand
(7.36%) | Healthtech
(6.93%) | Education
(5.19%) | Cyber Security
(5.19%) | Real Estate Management
(4.33%) | Social
(4.33%) | eCommerce / Marketplace
(3.90%) | Travel-TravelTech

Top 10 Investors
(2.60%) | Warburg Pincus
(2.60%) | Sequoia Capital China
(2.16%) | Khosla Ventures
(2.16%) | Goldman Sachs
(2.16%) | Tencent Holdings
(1.73%) | Google Ventures
(1.73%) | Tencent
(1.73%) | Sequoia Capital
(1.30%) | QiMing Venture Partners
(1.30%) | Temasek Holdings

Group 3 – Designers

Companies with an average valuation of 4.6 billion and an average age of 4.5 years, with the following main industries:
(28.95%) | Software & Services
(23.68%) | eCommerce / Marketplace
(18.42%) | Cyber Security
(7.89%) | Fintech
(7.89%) | Social
(5.26%) | Media
(5.26%) | Big Data
(2.63%) | Clothing and Accessories

Top 10 Investors in This Group
(5.26%) | Accel Partners
(5.26%) | Andreessen Horowitz
(5.26%) | Temasek Holdings
(5.26%) | Technology Crossover Ventures
(5.26%) | Sequoia Capital
(5.26%) | General Atlantic
(5.26%) | SoftBank Group
(2.63%) | Firstmark Capital
(2.63%) | Ceyuan Ventures
(2.63%) | Polaris Partners

Ultra-Unicorn Startups

From group 4 onwards, we consider the companies “ultra-unicorns”: they are the exceptions (or outliers) within the dataset.

Group 4 – Data-Driven

They are US companies in the areas of eCommerce, Marketplace and Big Data, with an average valuation of $24.64 billion and an average age of 6 years. The only two companies are Airbnb and Palantir, and their investors are:

Top 10 Investors
(16.67%) | Founders Fund
(16.67%) | RRE Ventures
(16.67%) | ENIAC Ventures
(16.67%) | In-Q-Tel
(16.67%) | General Catalyst Partners
(16.67%) | Andreessen Horowitz

Group 5 – Logistics

Companies mostly from the logistics area, with an average valuation of 20 billion dollars and an average age of 3.5 years. The top industries in this group are Supply Chain and Logistics, and Facilities.

Top 10 Investors
(16.67%) | Rothenberg Ventures
(16.67%) | Founders Fund
(16.67%) | T. Rowe Price
(16.67%) | SoftBank Group
(16.67%) | Draper Fisher Jurvetson
(16.67%) | Benchmark Capital

Group 6 – Uber

With a $68 billion valuation, Uber gained a scenario just for itself according to VORTX: an interesting case of an outlier among companies that are already outliers. It belongs to the on-demand industry category, and its key investors are:

  • Lowercase Capital
  • Google Ventures
  • Benchmark Capital

Group 7 – Chinese Uber

With a gigantic market to explore, China has an Uber-like company that is very strong in its local market and reached a valuation of $50 billion just 2 years after its founding. The company is called Didi Chuxing. The main investors are:

  • Softbank Corp.
  • Tiger Global Management
  • Matrix Partners

Group 8 – Chinese Hardware

Xiaomi, with a valuation of $46 billion after 5 years in the hardware area. Key investors:

  • QiMing Venture Partners
  • Digital Sky Technologies
  • Qualcomm Ventures

Group 9 – China Internet Plus

China Internet Plus, with a valuation of $30 billion after 1 year in the eCommerce / Marketplace area. Key investors:

  • Global DST
  • Trustbridge Partners
  • Capital Today

Group 10 – Green Industry

Bloom Energy, with a valuation of $2.7 billion after 8 years in the clean industry area. Key investors:

  • Kleiner Perkins Caufield & Byers
  • ATEL Ventures
  • DAG Ventures

The displacement of the American axis of unicorn startups

Although most unicorn startups originate in North America, other regions, such as Europe and especially Asia, have begun to gain strength in this market. The following GIF shows the mean of the unicorns’ geographic coordinates (latitude/longitude), as well as that of the investors, over the last 5 years (2012-2017). It is possible to see how the axis has moved away from the United States, driven by the strong Asian movement. The trend also remains in the northern hemisphere.
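As an illustration, the yearly mean point described above could be computed along the following lines. This is only a minimal sketch: the records, the coordinate values and the function name `mean_point_by_year` are hypothetical and are not taken from the dataset used in the report. It uses a plain arithmetic mean of latitude and longitude, which is a simplification that misbehaves for points on both sides of the 180° meridian.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, latitude, longitude). Illustrative values only,
# not the coordinates used in the original analysis.
records = [
    (2012, 37.77, -122.42),
    (2012, 40.71, -74.01),
    (2017, 37.77, -122.42),
    (2017, 39.90, 116.40),
    (2017, 51.51, -0.13),
]


def mean_point_by_year(rows):
    """Return {year: (mean_lat, mean_lon)} using a plain arithmetic mean."""
    by_year = defaultdict(list)
    for year, lat, lon in rows:
        by_year[year].append((lat, lon))
    return {
        year: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for year, pts in sorted(by_year.items())
    }


centers = mean_point_by_year(records)
for year, (lat, lon) in centers.items():
    print(f"{year}: lat={lat:.2f}, lon={lon:.2f}")
```

In this toy data the mean longitude increases between the two years, which is the kind of eastward shift the animation is meant to show.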

The regional average of distances from venture capital funds presents relevant information in absolute terms: most investors are in the United States, but the center point of this axis is shifting to the right on the map, as is the central point of the unicorns, which is already approaching Europe. Initiatives in the southern hemisphere are proportionately so few that they do not pull these points toward the center of the map. Only 3 funds are below the equator.


China’s presence in the global startup market is staggering, featuring more and more ultra-unicorn companies. Moreover, the Chinese strategy apparently aims to create equivalents of American solutions, relying on its domestic economy and on the difficulties foreign companies face in adapting their solutions to the languages and culture of China. India is also producing more startups and investors, contributing to the shift of the American axis.

In contrast, Brazil so far has no companies or investment funds listed in the ranking. Although the country is one of the largest economies in the world, the large-scale technologies used there are traditionally American. Late entry into these new markets may resemble the late entry into the industrial age, but on a much shorter time scale. The operating time of the unicorn companies is very short, averaging 1.64 years. South Americans need to be aware of global movements.

Uber, the absolute leader, presents a model so innovative that its business advantages are not restricted to the taxi market but extend to the technology industry, since its information service does not require issuing invoices, at least in Brazil. In addition, information on human flows in strategic cities around the world gives Uber a series of Big Data Analytics opportunities to commercialize this knowledge, and also serves as a competitive differentiator for the allied investment funds.

With the VORTX scenario discovery analytics, it becomes possible not only to understand the points outside the curve, but also to help entrepreneurs and investment funds choose the sectors and partner profiles that best fit the reality of each business. Also, taking the average geographic distance of each fund’s investments, it is possible to infer whether there are greater or lesser chances of it investing in a company that could become a unicorn startup.
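One possible way to compute an average investment distance for a fund is sketched below. It assumes that the fund and each invested company are represented by a (latitude, longitude) pair and that distance is measured along the Earth’s surface with the haversine formula. The report does not state how the distance was actually measured, so the function names, the coordinates and the choice of formula are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_investment_distance_km(fund_location, company_locations):
    """Average distance from one fund to each company it invested in."""
    return mean(haversine_km(*fund_location, *c) for c in company_locations)


# Hypothetical fund and portfolio coordinates (illustrative values only).
fund = (37.77, -122.42)
portfolio = [(37.77, -122.42), (40.71, -74.01)]
avg_km = mean_investment_distance_km(fund, portfolio)
print(f"average investment distance: {avg_km:.0f} km")
```

A fund with a small average distance invests mostly close to home, while a large value indicates a geographically spread portfolio.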

More than half of these companies are based in the United States, followed by China, which is gaining a lot of space on the international scene, although very little is heard about these companies outside China.

We hope this report can assist mid/long-term investors and entrepreneurs. For any questions or comments, just let us know.

Yellow Ribbon September – towards celebration of life

Aquarela starts September engaged in the campaign for valuing life, bringing to light a subject that has to be talked about. All the way from schools to the corporate world, mental suffering can be silently present in a colleague, neighbor or relative, and a refuge can make all the difference for them.

Suicide is a phenomenon present in all cultures since the beginning of human history. It is related to emotional, mental, social and economic aspects.

The person suffers from ambivalent feelings: they do not want to die, but they want to put an end to their psychic pain (or physical pain, in chronic cases). Because the subject is treated as a taboo and surrounded by prejudice, it becomes stigmatized, which makes it difficult to reach out for help or simply to have a conversation. The subject is simply avoided.

However, this year, the ‘Blue Whale’ “fever” and the TV series ‘13 Reasons Why’ raised public interest in suicide. Some parents lost sleep, searched for information and sought help from health professionals. But thoughts of suicide are not present only in the minds of the young; they occur in other age groups too, including the elderly. That is one more reason why suicide has to be discussed.

The good news is that suicide can be prevented, as long as it is treated as a public health issue, with information and prevention projects. Some relevant data follow.

World Health Organization Data

According to the Pan American Health Organization (PAHO/WHO):

  • over 800 000 people die every year from suicide;
  • suicide is the second leading cause of death among young people between the ages of 15 and 29;
  • only 60 of the 172 member nations provide data that is considered to be of good quality;
  • it is estimated that 28 countries have national suicide prevention strategies;
  • in the Mental Health Action Plan 2013-2020, the WHO member states have committed to reducing suicide rates by 10% by 2020;
  • around 75% of suicides happen in low- and middle-income countries;
  • in wealthy countries, three times as many men as women die by suicide;
  • in high-income countries, the highest suicide rates are related to alcohol abuse and depression;
  • 90% of all suicides can be avoided;
  • in Brazil, the average is 6 to 7 deaths per 100 000 inhabitants, which is considered low. However, this figure is not reliable, since data quality in our country has a lot of room for improvement.

“Every 40 seconds one person dies by suicide”

Artificial Intelligence and suicide

Artificial Intelligence (AI) can provide means of identifying patterns and suicidal behavioral tendencies, helping to refine preventive actions.

Recently, suicide-related movements, such as the previously mentioned ‘Blue Whale’, have gained visibility through their dissemination on social networks. There are also cases of people who express their feelings individually, also through social networks.

Considering that, Artificial Intelligence algorithms and big data techniques can support more precise inference about which individuals need help. Companies like Facebook, Instagram and Google have already announced that they will use AI on their platforms to provide warnings and prevention.

But much more can be done with the new technologies by bringing together technologists, teachers, professors, psychologists and other professionals. Together they can provide preventive measures, identify people at risk, and offer protection through a support network.

An analysis from Aquarela

Based on the death records of the 645 municipalities of the state of São Paulo, Joni Hoppen, one of Aquarela’s founders, found that:

  • out of 300 000 deaths, 2,223 were suicides;
  • in most of these deaths the profession was unknown or not informed. The exception was masons;
  • either the lack of a professional identity can lead to suicide, or health professionals and family members have great difficulty describing these people’s jobs;
  • Joni had difficulty determining whether the masons really died by suicide, or whether the deaths were work accidents reported as suicides due to labor issues;
  • he applied a filter for “lawyers”, which returned 18 records. The ratio of lawyers in the state compared with other occupations such as janitors, shopkeepers and security guards indicates that people in a favorable economic situation are also present in the statistics;
  • men with a high level of education die by suicide more often.

You can see the whole post (in Portuguese) here.

Humans construct their identity based on personal, social and professional relationships. Jobs carry socio-historical meanings and define the role of an individual in society, and this role affects how each person is seen by others and how they evaluate themselves. When those views become dysfunctional, health issues such as depression and suicidal thoughts can appear.


For people who are considering suicide not to be ashamed or afraid of reaching out for professional help, information and a welcoming environment are necessary.

It is necessary to be open to their pains and sufferings, without judgment or prejudices, showing interest and being available for them.

Discussing the issue helps the population as well as institutions to establish prevention strategies. One of the objectives of intervention is to recover self-esteem, promote emotional well-being and establish bonds of affection that can provide a support network for individuals.

In Brazil, we have the Centro de Valorização da Vida (CVV) (Life Valorization Center), an NGO that provides free volunteer services of emotional support and suicide prevention through chat, telephone, Skype and email, always preserving the individual’s privacy.

Additional information:

Booklet distributed by the Conselho Federal de Medicina (Federal Council of Medicine): http://www.flip3d.com.br/web/pub/cfm/index9/?numero=14#page/1

WHO’s first report on suicide: http://www.who.int/mental_health/suicide-prevention/world_report_2014/en/