How to choose the best AI and Data Analytics provider?

Choosing an artificial intelligence provider for analytics projects such as dynamic pricing and demand forecasting is, without a doubt, a decision that should be on the table of every manager in the industry. If you are looking to speed up the process, one way out is to hire companies specialized in the subject.

A successful analytics implementation is, to a large extent, the result of a well-balanced partnership between internal teams and the teams of the analytics service provider, so this is an important decision. Here, we cover some of the key concerns.

Assessing the AI provider based on competencies and scale

First, you must evaluate your options based on the skills of the analytics provider. Below are some criteria:

  • Consistent working method in line with your organization’s needs and size.
  • Individual skills of team members and their way of working.
  • Experience within your industry, as opposed to standard market offerings.
  • Experience in your business’s segment.
  • Commercial maturity of solutions such as the analytics platform.
  • Market references and the ability to scale teams.
  • Ability to integrate external data to generate insights you cannot produce internally.

Whether developing an internal analytics team or hiring externally, the fact is that you will probably spend a lot of money and time with your analytics and artificial intelligence provider (partner), so it is important that they bring the right skills to your department’s business or process.

Consider all the options in the analytics offering

We have seen many organizations limit their options to Capgemini, EY, Deloitte, Accenture and other major consultancies, or simply develop internal analytics teams.

But there are many other good options on the market, including Brazilian companies whose rapid growth is worth watching, mainly within the country’s main technology centers, such as Florianópolis and Campinas.

Adjust expectations and avoid analytical frustrations

We have seen, on several occasions, the frustrated creation of fully internal analytics teams, whether for configuring data lakes, data governance, machine learning or systems integration.

The scenario for AI adoption is similar, at least for now, to the era when companies developed their own internal ERPs in data processing departments. Today, of the 4,000 largest technology accounts in Brazil, only 4.2% still maintain internal ERP development, predominantly banks and governments, which makes total sense from the point of view of strategy and core business.

We investigated these cases a little more and noticed that there are at least four factors behind the results:

  • Non-data-driven culture and vertical segmentation prevent the necessary flow (speed and quantity) of ideas and data that make analytics valuable.
  • Waterfall project management performed in the same manner as if the teams were creating physical artifacts or ERP systems; this style is not suitable for analytics.
  • Difficulty in hiring professionals with knowledge of analytics in the company’s business area, together with the lack of onboarding programs suited to the challenges.
  • Technical and unforeseen challenges happen very often, so it is necessary to have resilient professionals used to this “cognitive capoeira” (as we call it here). Real-life datasets are never as ready and calibrated as those of machine learning examples such as the Titanic passengers dataset. They usually have outliers (What are outliers?), are tied to complex business processes and are full of rules, as in the example of the dynamic pricing of London subway tickets (article in Portuguese).

While there is no single answer on how to deploy robust analytics, governance and artificial intelligence processes, remember that you are responsible for the relationship with these teams, and for the relationship between the production and analytics systems.

Understand the strengths of the analytics provider, but also recognize their weaknesses

It is difficult to find professionals with depth and both functional and technical qualities in the market, especially if your business profile is industrial, involving knowledge of rare processes, for instance the physical-chemical process for creating brake pads or other specific materials.

But, like any organization, these analytics providers can also have weaknesses, such as:

  • Lack of international readiness in implementing analytics (methodology, platform) to ensure a fast implementation.
  • Lack of migration strategy, data mapping and ontologies.
  • No guarantee of knowledge transfer and documentation.
  • Lack of practical experience in the industry.
  • Difficulty absorbing the client’s business context.

Therefore, knowing the provider’s methods and processes well is essential.
The pillars of a good Analytics and AI project are the methodology and the technological stack (What is a technological stack?). Seek to understand the background of the new provider and ask about their experiences with other customers of similar size to yours.

Also, try to understand how this provider solved complex challenges in other businesses, even if these are not directly linked to your challenge.

Data Ethics

Ethics in the treatment of data is a must-have, so we cannot fail to highlight this compliance topic. Data has long been at the center of management’s attention, but new laws are now being created, such as the GDPR in Europe and the LGPD in Brazil.

Pay attention to how your data will be treated, transferred and stored by the provider, and check whether its name is clear in Google searches and even with public organizations.

Good providers are those who, in addition to knowing the technology well, have guidelines for dealing with your business’s information. For example:

  • They have very clear and well-defined security processes.
  • They use end-to-end encryption.
  • They track their software updates.
  • They respect NDAs (non-disclosure agreements); NDAs should not be treated as mere boilerplate when it comes to data.
  • Their communication channels are aligned and segmented by security levels.
  • They are well regarded by the data analysis community.

Conclusions and recommendations

Choosing your Analytics provider is one of the biggest decisions you will make for your organization’s digital transformation.

Regardless of which provider you choose for your company, it is important that you assemble an external analytics consulting team that makes sense for your organization and that has a successful, proven technological and business track record that supports your industry’s demands.

What is Aquarela Advanced Analytics?

Aquarela Analytics is a pioneering Brazilian company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and DCIM methodology, it serves important global customers such as Embraer (aerospace & defence), Scania and Randon Group (automotive), Solar Br Coca-Cola (beverages), Hospital das Clínicas (healthcare), NTS-Brasil (oil & gas) and Votorantim Energia (energy), among others.

Stay tuned by following Aquarela’s LinkedIn!

AI for demand forecasting in the food industry

The concept of a balance point between supply and demand is used to explain various situations in our daily lives, from bread at the neighborhood bakery, sold at the equilibrium price that matches the quantities desired by buyers and sellers, to the trading of company securities on the stock market.

On the supply side, defining the correct price to charge and, above all, the right quantity are common issues in the planning and execution of many companies’ strategies.

In this context, how are technological innovations in the data area establishing themselves in the food sector?

The construction of the demand forecast

The demand projection is often built from historical sales data, growth prospects for the sector or even targets set to drive sales of a certain product.

By relying only on these means of forecasting, without considering the specific growth of each SKU (Stock Keeping Unit), companies can fall into the traps of subjectivity or generalism.

The expansion of a sector does not result in growth of the same magnitude across the entire product mix. For example, does a projected annual growth of 6% for the food sector necessarily imply equivalent growth for the premium meat segment?

Possibly not, as this market niche may be more resilient or more sensitive than the food sector as a whole, or it may even suffer from recent changes in consumer habits. A quick check of per-SKU growth, as sketched below, makes the contrast concrete.
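
To make this concrete, per-SKU growth can be contrasted with the sector-level figure in a few lines of pandas. This is a minimal sketch with made-up numbers; the column names (“sku”, “year”, “units_sold”) are illustrative assumptions, not a prescribed schema.

```python
# Hypothetical sketch: sector growth vs. per-SKU growth (numbers made up).
import pandas as pd

sales = pd.DataFrame({
    "sku":        ["bread", "bread", "premium_beef", "premium_beef"],
    "year":       [2019, 2020, 2019, 2020],
    "units_sold": [1000, 1080, 500, 490],
})

yearly = sales.pivot(index="sku", columns="year", values="units_sold")
yearly["growth_pct"] = (yearly[2020] / yearly[2019] - 1) * 100
print(yearly["growth_pct"])
# bread grew ~8% while premium_beef shrank ~2%,
# even if the sector as a whole grew ~6%.
```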

Impacts of Demand Forecasting Errors

For companies, especially large ones with economies of scale and wide geographic reach, an error in the demand forecast can have several consequences, such as:

  • Stockouts;
  • Perishable waste (What is FIFO?);
  • Drop in production;
  • Idle stock (slow-moving items);
  • Pricing errors.

Adversities like these directly impact companies’ bottom lines: loss of market share, higher costs or poor dilution of fixed costs, growing losses of perishable products, employee frustration over missed targets and, above all, broken confidence among recurring customers who depend on supply for their operations.

The demand forecast in the food sector

The food industry is situated in a context of highly perishable products with the following characteristics:

  • High inventory turnover;
  • Parallel supply in different locations;
  • Large number of SKUs, points of production and points of sale;
  • Verticalized supply chain;
  • Non-linearity in data patterns;
  • Seasonality.

These characteristics make the sector a business niche that is more sensitive to deviations in demand forecast and adjacent planning.

Supply chain opportunity

As an alternative to the traditional demand forecast format, there are opportunities to use market and AI data to assist managers in the S&OP (Sales & Operations Planning) process, as well as in the S&OE (Sales and Operations Execution) process.

During the S&OP process, demand forecasting supported by AI facilitates the work of the marketing and sales areas, as well as reducing uncertainty and increasing predictability for the supply chain areas.

In the S&OE process, AI can be used to identify new opportunities and to correct deviations from what was planned.

In addition to the technical attributes that AI can add to the process, basing decisions on data reduces points of conflict between teams, reduces historical disputes over SKU preferences and makes the process more transparent across areas.

Previously, on our blog, we addressed the challenges of forecasting demand from our point of view (part 1, in Portuguese). In those articles, we discuss the advantages of the predictive approach to demand, taking into account factors such as seasonality, geographic/regional preferences and changes in consumer behavior.

We understand that the need for a predictive approach through data, mainly data external to the company, is increasingly evident.

The role of machine learning in the food sector

The use of AI through machine learning techniques, associated with a coherent analytics technology stack (What is a technological stack?), provides greater information speed, data organization at different granularities (region, state, city and neighborhood), seasonality adjustments, exploration of opportunities and real-time decision-making.
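
As a rough illustration of what such a setup can look like, below is a minimal sketch of a per-SKU demand model with simple seasonal features, using scikit-learn. The file name and column names are hypothetical assumptions; a production model would need far more careful feature engineering and validation.

```python
# Minimal sketch of a seasonal demand-forecast model per SKU.
# "sales_history.csv" and its columns (sku, date, units_sold) are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

history = pd.read_csv("sales_history.csv", parse_dates=["date"])
history["month"] = history["date"].dt.month                         # seasonality signal
history["lag_1"] = history.groupby("sku")["units_sold"].shift(1)    # last month's sales
history["lag_12"] = history.groupby("sku")["units_sold"].shift(12)  # same month last year
history = history.dropna()

X = history[["month", "lag_1", "lag_12"]]
y = history["units_sold"]
model = GradientBoostingRegressor().fit(X, y)
# The fitted model can then score the next period's features for each SKU.
```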

In the case of the food sector, greater accuracy in forecasting demand means:

  • Inventory optimization among Distribution Centers (DCs);
  • Reduction of idle stocks;
  • Decrease in disruptions that cause loss of market share due to substitute products;
  • Direct reduction in losses with perishability (FIFO).

The great technical and conceptual challenge faced by data scientists (The profile of data scientists in the view of Aquarela), however, is the modeling of analysis datasets (what are datasets?) that will serve for the proper training of the machines.

Please note that:

“Training machines with data from the past alone will cause them to replicate the same mistakes and successes of the past, especially in terms of pricing. The goal should therefore be to create hybrid models that help the AI reproduce, with more intensity and emphasis, the behaviors desired by the management strategy.”

In the case of Aquarela Analytics, the demand forecast module of Aquarela Tactics makes it possible to obtain forecasts integrated into corporate systems and management strategies. It was created based on real nationwide retail data and algorithms designed to meet specific demands in the areas of marketing, sales, supply chain, operations and planning (S&OP and S&OE).

Conclusions and recommendations

In this article, we present some key characteristics of the operation of demand forecasting in the food sector. We also comment, based on our experiences, on the role of structuring analytics and AI in forecasting demand. Both are prominent and challenging themes for managers, mathematicians and data scientists.

Technological innovations in forecasting, especially with the use of Artificial Intelligence algorithms, are increasingly present in the operation of companies and their benefits are increasingly evident in industry publications.

In addition to avoiding the downsides of underestimating demand, the predictive approach, when done well, makes it possible to gain market share in current products and a great competitive advantage by forecasting opportunities in other niches before competitors.

What is a technological stack?

A stack represents the set of integrated systems needed to run a single application without additional software. Above all, one of the main goals of a technology stack is to improve communication about how an application is built. The chosen technology package may contain:

  • the programming languages used;
  • the frameworks and tools a developer needs to interact with the application;
  • known performance attributes and limitations;
  • a survey of strengths and weaknesses of the application in general.

As a rule, stacks must have a specific purpose. For instance, if you look at the web 3.0 stack (what is web 3.0?), you will see how different it is from a data analysis stack built around the statistical language R. That is, when constructing a stack you should always ask: what is the underlying business purpose?
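
One lightweight way to make that purpose explicit is to document the stack as structured data that hiring, risk and capacity planning can reference. The entries below are purely illustrative assumptions, not a recommended stack:

```python
# Hypothetical sketch: an analytics stack documented as data.
analytics_stack = {
    "business_purpose": "demand forecasting for retail",
    "languages": ["Python", "R", "SQL"],
    "data_storage": ["PostgreSQL", "object storage (data lake)"],
    "processing": ["Apache Spark"],
    "ml_frameworks": ["scikit-learn"],
    "orchestration": ["Apache Airflow"],
    "known_limitations": ["batch-oriented; no sub-second scoring"],
}
```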

Where does this term come from?

The term comes from the software development community, where it is also quite common to speak of a full-stack developer.

A full-stack developer is, in turn, the professional who knows how to work in all layers of technologies of a 100% functional application.

Why is the technological stack so important?

Firstly, just as the accountant has all company transactions registered for financial management, developers and project leaders need equivalent records of the development team’s work.

Secondly, developers cannot manage their work effectively without at least knowing what is happening and which technology assets are available (systems, databases, programming languages, communication protocols) and so on.

Mapping the technological stack is just as important as taking inventory in a company that sells physical products. It is in the technological stack that both the business strategy and the main learning (maturity) from the system tests the company has been through are concentrated.

The technological stack is the working dictionary of developers, in the same manner that data analysts look at their data dictionaries to understand the meaning of variables and columns. It is an important item of maturity in the governance of organizations.

Without prior knowledge of the technological stack, management is unable to plan hiring, risk mitigation plans, plans to increase service capacity and, of course, the strategy for using data in the business area.

Technology stacks are particularly useful for hiring developers, analysts and data scientists.

“Companies that try to recruit developers often include their technology stack in their job descriptions.”

For this reason, professionals interested in advancing their careers should plan the development of their skills in line with market demand.

Technological stack example

Take the professional social network LinkedIn, for example: it relies on a combination of frameworks, programming languages and artificial intelligence algorithms to stay online. Here are some examples of technologies used in its stack:

Technological Stack – Linkedin for 300 million hits – Author Philipp Weber (2015)

Is there a technological stack for analytics?

Yes. Currently, the areas of analytics, machine learning and artificial intelligence are known for the massive use of information systems techniques and technologies. Likewise, analytical solutions require very specific stacks to meet the functional (what the system should do) and non-functional (how the system should do it: security, speed, etc.) business requirements of each application.

As with the foundation of a house, the order in which the stack is built is important and is directly linked to the maturity of the IT and analytics teams, so we recommend reading this article – The 3 pillars of the maturity of analytics teams (in Portuguese).

In more than 10 years of research into different types of technologies, we went through several technological compositions until we reached the current configuration of the Aquarela Vortx platform. The main results of this stack for customers are:

  • reduction of technological risk (learning is already incorporated in the stack);
  • continuous technological updates;
  • speed of deployment and systems integration (go-live);
  • maturity in maintaining the systems in production;
  • quality of the interfaces and flows in the production environment, as the stack makes maintaining technicians’ knowledge more efficient.

Conclusions and recommendations

In conclusion, we presented our vision of the technological stack concept and its importance for analytics projects, which, in turn, impacts strategic planning. It is worth bearing in mind that technological stacks, just like businesses, are always evolving.

Success in defining stacks is directly linked to the maturity of the IT and analytics teams (The 3 pillars of the maturity of analytics teams – in Portuguese).

Regardless of the sector, the decisions involved in shaping the technological stack are a factor of success or failure in IT and analytics projects, because they directly interfere with operations and business strategy.

Finally, we recommend reading this other article on technology risk mitigation with support from specialized companies (How to choose the best data analytics provider?, in Portuguese).

What are outliers and how to treat them in Data Analytics?

What are outliers? They are data records that differ dramatically from all others; they distinguish themselves in one or more characteristics. In other words, an outlier is a value that escapes normality and can (and probably will) cause anomalies in the results obtained through algorithms and analytical systems. Therefore, outliers always need some degree of attention.

Understanding outliers is critical when analyzing data, for at least two reasons:

  1. outliers may negatively bias the entire result of an analysis;
  2. the behavior of outliers may be precisely what is being sought.

While working with outliers, many words can represent them depending on the context. Some other names are: aberration, oddity, deviation, anomaly, eccentricity, nonconformity, exception, irregularity, dissent, original and so on. Below are some common situations in which outliers arise in data analysis, with suggestions on how best to deal with them in each case.

How to identify which record is an outlier?

Find outliers using tables

The simplest way to find outliers in your data is to look directly at the data table or worksheet – the dataset, as data scientists call it. The following table clearly exemplifies a typing error, that is, a data-input error. The age field for the individual Antony Smith certainly does not represent an age of 470 years. Looking at the table it is possible to identify the outlier, but it is difficult to say what the correct age would be; there are several possibilities, such as 47, 70 or even 40 years.

Antony Smith age outlier

In a small sample, finding outliers with tables can be easy. But when the number of observations goes into the thousands or millions, it becomes impossible, and even more difficult when many variables (the worksheet columns) are involved. For this, there are other methods, including simple programmatic checks like the sketch below.
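
As a minimal sketch of such a check, a programmatic range test in pandas scales where visual inspection does not. The data mirrors the table above; the plausibility bounds are illustrative assumptions:

```python
# Flag implausible ages in a table too large to inspect by eye.
import pandas as pd

people = pd.DataFrame({
    "name": ["Antony Smith", "Maria Souza", "John Doe"],
    "age":  [470, 34, 58],
})

# Any age outside a plausible human range is flagged for review.
outliers = people[(people["age"] < 0) | (people["age"] > 120)]
print(outliers)  # -> Antony Smith, age 470
```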

Find outliers using graphs

One of the best ways to identify outlier data is by using charts. When plotting a chart, the analyst can clearly see that something different exists. Here are some examples that illustrate how outliers stand out in graphics.

Case: outliers in the Brazilian health system

In a study already published on Aquarela’s website, we analyzed the factors that lead people to miss medical appointments scheduled in the public health system of the city of Vitória, in the state of Espírito Santo, no-shows that caused an approximate loss of 8 million US dollars a year.

In the dataset, several patterns were found, for example: children practically do not miss their appointments, and women attend consultations much more than men. However, a curious case was that of an outlier: a 79-year-old who scheduled a consultation 365 days in advance and actually showed up for her appointment.

This is an example of an outlier that deserves to be studied, because this lady’s behavior can reveal measures that could be adopted to increase the attendance rate. See the case in the chart below.

Sample of 8,000 appointments

Case: outliers in the Brazilian financial market

On May 17, 2017, Petrobras shares fell 15.8% and the stock market index (IBOVESPA) fell 8.8% in a single day. Most shares on the Brazilian stock exchange saw their prices plummet that day. The main driver of this strong negative variation was the Joesley Batista scandal, one of the most shocking political events of the first half of 2017.

This case represents an outlier for the analyst who, for example, wants to know the average daily return of Petrobras shares over the last 180 days. Certainly, the Joesley episode pulled that average strongly downward. In the chart below, even among many observations, it is easy to identify the point that disagrees with the others.

Petrobras 2017

The data point in the example above may be called an outlier but, taken literally, it is not necessarily “outside the curve.” The “curve” in the graph, although counter-intuitive, is represented by the straight line that cuts through the points. From the graph you can also see that, although different from the others, the point is not exactly outside the curve.

A predictive model could easily infer, with high precision, that a 9% drop in the stock market index would represent a 15% drop in Petrobras’ share price. In another case, still with data from the Brazilian stock market, the stock of the company Magazine Luiza appreciated 30.8% on a day when the stock market index rose by only 0.7%. This data point, besides being atypical and distant from the others, represents a true outlier. See the chart:

This is the kind of outlier that can harm not only descriptive statistics, such as the mean and the median, but also the calibration of predictive models, as the quick sketch below illustrates.
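
A quick sketch, with made-up return values, shows how a single extreme day drags the mean while barely moving the median:

```python
# Illustrative only: five ordinary daily returns plus one outlier day (%).
import statistics

daily_returns = [0.4, -0.2, 0.1, 0.3, -0.1, 30.8]

print(statistics.mean(daily_returns))    # ~5.22: pulled up by the outlier
print(statistics.median(daily_returns))  # 0.2: barely affected
```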

Find outliers using statistical methods

A more complex but quite precise way of finding outliers in a data analysis is to find the statistical distribution that most closely approximates the distribution of the data and to use statistical methods to detect discrepant points. The following example represents the histogram of the known driver metric “kilometers per liter”.

The dataset used in this example is a public dataset widely used in statistical tests by data scientists. It comes from the 1974 Motor Trend US magazine and comprises several aspects of the performance of 32 car models. More details at this link.

The histogram is one of the main and simplest graphing tools for the data analyst to use in understanding the behavior of the data.

In the histogram below, the blue line represents what the normal (Gaussian) distribution would be based on the mean, standard deviation and sample size, and is contrasted with the histogram in bars.

The red vertical lines represent units of standard deviation. It can be seen that cars with outlier performance for the era could average more than 14 kilometers per liter, which corresponds to more than 2 standard deviations from the mean.

Under a normal distribution, data within two standard deviations of the mean corresponds to about 95% of all data; the outliers in this analysis represent the remaining 5%.
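
A minimal sketch of this two-standard-deviation rule on the same public mtcars data, assuming statsmodels is available to fetch the R dataset (the miles-per-gallon to kilometers-per-liter conversion factor is ~0.4251):

```python
# Two-standard-deviation outlier check on mtcars fuel economy.
import statsmodels.api as sm

mtcars = sm.datasets.get_rdataset("mtcars").data
kpl = mtcars["mpg"] * 0.425144            # miles per gallon -> km per liter

mean, sd = kpl.mean(), kpl.std()
outliers = kpl[(kpl < mean - 2 * sd) | (kpl > mean + 2 * sd)]
print(outliers)  # cars beyond two standard deviations (~14+ km/l here)
```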

Outliers in clustering

In this video (in English, with subtitles) we present the identification of outliers in a visual way, using a visual clustering process with national flags.

Conclusions: What to do with outliers?

We have seen that it is imperative to pay attention to outliers because they can bias data analysis. Beyond identifying outliers, we suggest some ways to treat them better:

  • Exclude the discrepant observations from the data sample: when the discrepant data is the result of a data-input error, it needs to be removed from the sample;
  • perform a separate analysis with only the outliers: this approach is useful when you want to investigate extreme cases, such as students who only get good grades, companies that make a profit even in times of crisis or fraud cases, among others;
  • use clustering methods to find an approximation that corrects and assigns a new value to the outlier data: in cases of data-input errors, instead of deleting and losing an entire row of records due to a single outlier observation, one solution is to use clustering algorithms that find the behavior of the observations closest to the given outlier and infer the best approximate value (see the sketch after this list).
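
Below is a hedged sketch of that third approach, using scikit-learn’s KNNImputer as a stand-in for a clustering-style neighbor search. The small dataset and its columns are made up for illustration:

```python
# Repair an input-error outlier from its nearest neighbours instead of
# dropping the whole row. Columns: height (cm), weight (kg), age (years).
import numpy as np
from sklearn.impute import KNNImputer

people = np.array([
    [170.0, 70.0, 45.0],
    [172.0, 74.0, 47.0],
    [168.0, 68.0, 44.0],
    [171.0, 71.0, 470.0],   # 470 is clearly a typing error
])

people[3, 2] = np.nan                  # mark the outlier as missing
imputer = KNNImputer(n_neighbors=3)    # borrow values from the 3 closest rows
repaired = imputer.fit_transform(people)
print(repaired[3, 2])                  # ~45.3, inferred from similar people
```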

Finally, the main conclusion about the outliers can be summarized as follows:

“A given outlier may be what most disturbs your analysis, but it may also be exactly what you are looking for.”

14 sectors for applying Big Data and their input datasets

Hello folks, 

In the vast majority of talks with clients and prospects about Big Data, we soon noticed an astonishing gap between the business itself and the expectations for Data Analytics projects. Therefore, we carried out research to answer the following questions:

  • What are the main business sectors that already use Big Data?
  • What are the most common Big Data results per sector?
  • What is the minimum dataset needed to reach those results per sector?

The summary is organized in the table below.

  • 1 - Bank, Credit and Insurance. Raw data examples: transaction history; registration forms; external references such as the Credit Protection Service; micro and macroeconomic indices; geographic and demographic data. Business opportunities: credit approval; interest rate changes; market analysis; default prediction; fraud detection; identification of new niches; credit risk analysis.
  • 2 - Security. Raw data examples: access history; registration forms; news texts and web content. Business opportunities: detection of patterns of physical or digital behavior that pose any type of risk.
  • 3 - Health. Raw data examples: medical records; geographic and demographic data; genome sequencing. Business opportunities: predictive diagnosis (forecasting); analysis of genetic data; detection of diseases and treatments; health maps based on historical data; adverse effects of medications/treatments.
  • 4 - Oil, gas and electricity. Raw data examples: distributed sensor data. Business opportunities: optimization of production resources; fault prediction and detection.
  • 5 - Retail. Raw data examples: transaction history; registration forms; purchase paths in physical and/or virtual stores; geographic and demographic data; advertising data; customer complaints. Business opportunities: increased sales through product-mix optimization based on purchase behavior patterns; billing analysis (as-is, trends) across high volumes of customers and transactions; credit profiles by region; increased satisfaction/loyalty.
  • 6 - Production. Raw data examples: production management system/ERP data; market data. Business opportunities: optimization of production against sales; decreased storage time/volume; quality control.
  • 7 - Representative organizations. Raw data examples: customers’ registration forms; event data; business process management and CRM systems. Business opportunities: suggestion of optimal combinations of company profiles, customers and business leverage with suppliers; identification of synergy opportunities.
  • 8 - Marketing. Raw data examples: micro and macroeconomic indices; market research; geographic and demographic data; user-generated content; competitor data. Business opportunities: market segmentation; optimized allocation of advertising resources; discovery of niche markets; brand/product performance; trend identification.
  • 9 - Education. Raw data examples: transcripts and attendance records; geographic and demographic data. Business opportunities: personalization of education; predictive analytics for school dropout.
  • 10 - Financial / Economic. Raw data examples: lists of assets and their values; transaction history; micro and macroeconomic indices. Business opportunities: identification of the optimal purchase value of complex assets with many analysis variables (vehicles, real estate, stocks, etc.); trends in asset values; discovery of opportunities.
  • 11 - Logistics. Raw data examples: product data; routes and delivery points. Business opportunities: optimization of goods flows; inventory optimization.
  • 12 - E-commerce. Raw data examples: customer registration; transaction history; user-generated content. Business opportunities: increased sales through automatic product recommendations; increased satisfaction/loyalty.
  • 13 - Games, social networks and freemium platforms. Raw data examples: access history; user registration; geographic and demographic data. Business opportunities: increased conversion from free to paying users by detecting user behavior and preferences.
  • 14 - Recruitment. Raw data examples: registration of prospective employees; professional history and CVs; connections on social networks. Business opportunities: evaluation of a person’s profile for a specific job role; criteria for hiring, promotion and dismissal; better allocation of human resources.

Conclusions

  • The table presents a summary for easy understanding of the subject. However, for each business there are many more variables, opportunities and, of course, risks. It is highly recommended to use multivariate analysis algorithms to help you prioritize the data and reduce the project’s cost and complexity.
  • There are many more sectors in which excellent results have been derived from Big Data and data science initiatives; however, we believe these can serve as examples for the many other similar businesses willing to use Big Data.
  • Common to all sectors, Big Data projects need relevant and clear input data, so it is important to have a good understanding of these datasets and of the business model itself. We have noticed that many businesses are not yet collecting the right data in their systems, which suggests the need for pre-Big Data projects (we will write about this soon).
  • One obstacle for Big Data projects is the great effort required to collect, organize and clean the input data. This can surely cause overall frustration among stakeholders.
  • At least as far as we are concerned, plug & play Big Data solutions that automatically fetch the data and deliver the analysis immediately still do not exist. In 100% of the cases, all team members (technical and business) need to cooperate: creating hypotheses, selecting data samples, calibrating parameters, validating results and then drawing conclusions. Therefore, an advanced, scientifically based methodology must be used that takes into account business as well as technical aspects of the problem.

7 characteristics to differentiate BI, Data Mining and Big Data

Hi everybody

One of the most frequent questions in our day-to-day work at Aquarela relates to a common misconception around the concepts of Business Intelligence (BI), Data Mining and Big Data. Since all of them deal with exploratory data analysis, it is not strange to see wide misunderstandings. The purpose of this post is to quickly illustrate the most striking features of each one, helping readers define their information strategy, which depends on the organization’s strategy, maturity level and context.

The basics of each involve the following steps:

  1. Survey questions: what does the customer want to learn (find out) about his/her business? For example: How many customers do we serve each month? What is the average value of the product? Which product sells best?
  2. Study of data sources: what internal/external data are available to answer the business questions? Where are the data? How can I obtain them? How can I process them?
  3. Setting the size (scope) of the project: who will be involved in the project? What is the size of the analysis or the sample? Which tools will be used? How much will be charged?
  4. Development: operationalization of the strategy, performing data transformations, processing and interactions with the stakeholders to validate the results and assumptions, finding out whether the business questions were well addressed and the results are consistent.

Until now, BI, Data Mining and Big Data sound virtually the same, right? So, in the table below we summarize what makes them different from each other across seven characteristics, followed by important conclusions and suggestions.

Comparative table

Conclusions and Recommendations

Although our research restricts itself to 7 characteristics, the results show that there are significant and important differences between BI, Data Mining and Big Data, serving as an initial framework to help decision makers analyze and choose what best fits their business needs. The most important points are:

  • We see that companies with a consolidated BI solution have more maturity to embark on extensive Data Mining and/or Big Data projects. Discoveries made by Data Mining or Big Data can be quickly tested and monitored by a BI solution, so the solutions can and must coexist.
  • Big Data makes sense only with large volumes of data, and the best option for your business depends on what questions are being asked and what data are available. All solutions are input-data dependent; consequently, if the quality of the information sources is poor, chances are the answer will be wrong: “garbage in, garbage out”.
  • BI panels can help you make sense of your data in a very visual and easy way, but you cannot do intensive statistical analysis with them. That requires more complex solutions alongside data scientists to enrich the perception of the business reality, by finding new correlations and new market segments (classification and prediction) and by designing infographics that show global trends based on multivariate analysis.
  • Big Data extends the analysis to unstructured data, e.g. social network posts, pictures, videos, music, etc. However, the degree of complexity increases significantly, requiring expert data scientists in close cooperation with business analysts.
  • To avoid frustration, it is important to take into consideration the differences in the value proposition of each solution and its outputs. Do not expect real-time data monitoring from a Data Mining project. In the same sense, do not expect a BI solution to discover new business insights; that is the role of the other two solutions.
  • Big Data can be considered, in part, the combination of BI and Data Mining: BI brings a set of structured data, while Data Mining brings a range of algorithms and data discovery techniques. What makes Big Data a plus is the new large-scale distributed processing, storage and memory technology for digesting gigantic volumes of a wide range of heterogeneous data, more specifically unstructured data.
  • The results of all three can generate intelligence for the business, just as the good use of a simple spreadsheet can also generate intelligence, but it is important to assess whether that is sufficient to meet the ambitions and dilemmas of your business.
  • The true power of Big Data has not yet been fully recognized; however, today’s most technologically advanced companies base their entire strategy on the power of the advanced analytics enabled by Big Data. In many cases they offer their services free of charge in order to gather valuable data from users, e.g. Gmail, Facebook, Twitter and OLX.
  • The complexity of data, as well as its volume and file types, tends to keep growing, as presented in a previous post. This implies a growing demand for Big Data solutions.

In the next post we will present interesting sectors for applying exploratory data analysis and how this can be done in each case. Thank you for joining us.
