Geographic Normalization: what is it and what are its implications?

There is great value in representing reality through visualizations, especially spatial information. If you have ever seen a map, you know that the polygons that make up the political boundaries of cities and states are generally irregular (see Figure 1a). This irregularity makes analysis difficult and is something traditional Business Intelligence tools cannot handle well.

Notice the green dot in Figure 1b: it sits over polygon ("neighborhood") no. 14, located between no. 16 and no. 18. Now answer: which region has the greatest influence on the green dot, neighborhood no. 16 or no. 18? Is the green dot representative of region no. 14, region no. 16 or no. 18?

To answer questions like these and to minimize the bias introduced by visualizations with irregular polygons, the Vortx Platform performs what is known as Geographic Normalization, transforming irregular polygons into polygons of a single size and regular shape (see Figure 1c).

After geographic normalization, it is possible to analyze the data of a given space using absolute statistics, not only relative ones, and without the distortions caused by polygons of different sizes and shapes.

Geographic normalization - map of Florianópolis
Figure 1 – Source: Adapted from the Commercial and Industrial Association of Florianópolis – ACIF (2018)

Every day, people, companies and governments make countless decisions that take geographic space into account. Which gym is closest to home for me to enroll in? Where should we install the company's new Distribution Center? Where should the municipality place its health centers?

So, in today’s article, we propose two questions:

  1. What happens when georeferenced information is distorted?
  2. How close can our generalizations about space get?

Geographic normalization

Working with polygons and regions

Recall that the concept of a polygon comes from geometry: "a flat, closed figure formed by straight line segments". When a polygon has all sides equal and, consequently, all angles equal, we call it a regular polygon. When this is not the case, it is an irregular polygon.

We use the political division of a territory to understand its contrasts, usually delimiting nations, states and municipalities, but we can also delimit regions by other characteristics, such as the Caatinga region, the Amazon Basin, the Eurozone or Trump and Biden voter zones. In short, all that is needed is to enclose a portion of space that shares some common characteristic. Regional polygons are therefore widely used to represent regions or the territorial organization within them.

Several market tools fill polygons with different shades of color according to each region's data, looking for contrasts among them. But be careful! If the sizes and shapes of the polygons are not constant, geographic biases may arise, making the visualization susceptible to misinterpretation.

Thus, the irregular-polygon approach is limited in the following ways:

  • Uneven comparisons between regions;
  • The need to relativize indicators by population, area or other factors;
  • It does not allow more granular analyses;
  • It demands more attention from analysts when making statements about specific regions.

Purpose of geographic normalization

The purpose of geographic normalization is therefore to overcome the typical problems of analyzing data tied to irregular polygons by transforming the organization of the territory into a set of polygons (in this case, hexagons) of regular size and shape.

In the example below, we compare the two approaches:

1) analysis with mesoregional polygons; and 2) hexagons over the southeastern region of Brazil.

Figure 2 – Source: Aquarela Advanced Analytics (2020)

Geographic Normalization seeks to minimize possible analysis distortions generated by irregular polygons by replacing them with polygons of regular shape and size. This provides an elegant, visually pleasing and precise alternative, capable of revealing previously unknown patterns.

Normalization also makes the definition of neighborhood between polygons clearer and simpler, which in turn improves the fit of artificial intelligence algorithms that search for spatially autocorrelated patterns and events.

After all, according to the First Law of Geography:

“Everything is related to everything else, but near things are more related than distant things.”

Waldo Tobler
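To make the notion of spatial autocorrelation concrete, here is a minimal, self-contained sketch of Moran's I in Python. The cell values and the neighborhood matrix are hypothetical and purely illustrative; they are not part of the Vortx Platform.

```python
# Minimal sketch of Moran's I, a classic measure of spatial autocorrelation
# (values and neighborhood structure below are hypothetical, for illustration).
import numpy as np

def morans_i(values: np.ndarray, weights: np.ndarray) -> float:
    """Moran's I = (n / sum(w)) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    x = values - values.mean()
    n = len(values)
    num = (weights * np.outer(x, x)).sum()
    den = (x ** 2).sum()
    return (n / weights.sum()) * num / den

# Four cells in a row; each cell is a neighbor of the adjacent one(s).
values = np.array([1.0, 2.0, 8.0, 9.0])
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

print(f"Moran's I: {morans_i(values, weights):.2f}")  # positive: similar values cluster
```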

Geographic normalization can also be done with other shapes, such as equilateral triangles or squares. Of the regular polygons that tile a plane, however, the hexagon introduces the least bias, since it is the closest to a circle and all of its neighbors lie at the same center-to-center distance.

With normalization, it is possible to summarize the statistics of the points (inhabitants, homes, schools, health centers, supermarkets, industries, etc.) contained within each hexagon, so that the unit of analysis has a constant area and the summaries are statistically meaningful, as sketched below. Mature analytics companies, with a robust and well-consolidated data lake, have an advantage in this type of approach. Also check out our article: How to choose the best AI or data analytics provider?
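Below is a minimal sketch of the core of this aggregation: assigning point records to regular hexagonal cells and counting points per cell. It assumes planar x/y coordinates and hand-written axial hexagon math; a production pipeline would typically rely on a dedicated hexagonal index (such as H3) and proper map projections.

```python
# Minimal, planar sketch of hexagonal binning (pointy-top hexagons, axial
# coordinates). Point coordinates are hypothetical; real pipelines would use a
# hexagonal spatial index and projected coordinates instead.
import math
from collections import Counter

def point_to_hex(x: float, y: float, size: float) -> tuple[int, int]:
    """Return the axial (q, r) coordinates of the hexagon containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding to snap fractional axial coords to the nearest hex center.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)

# Hypothetical point layer (e.g. schools) as planar (x, y) pairs, in km.
points = [(0.2, 0.1), (0.3, 0.4), (2.1, 1.9), (2.2, 2.0), (5.0, 0.5)]
counts = Counter(point_to_hex(x, y, size=1.0) for x, y in points)

for cell, n in sorted(counts.items()):
    print(f"hex {cell}: {n} point(s)")  # constant-area cells -> comparable counts
```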

Usage of normalized geography

Normalized geography can also be used in interactive maps. Maps of this type allow a very interesting level of drill-down in the analyses, as we can see in the animation below, which shows a Vortx Platform visualization of schools in the city of Curitiba, Brazil.

The darker the hexagon, the greater the number of schools. Note that we can also access other data through the pop-up and change the size of the hexagons as desired.

“The greater the amount of point data available in a region, the smaller the possible size of the hexagons”. 

Limitations of the normalized analysis

Like any representation of reality, models that use normalized analysis, although of great value for decision making, do not completely replace the representation of spatial data with irregular polygons, especially when:

  • There is a clear political division to be considered;
  • There is no reasonable amount of data;
  • There is no consensus on the size of regular polygons.

In addition, the computational cost of producing normalized maps must also be taken into consideration: processing is driven not only by the number of observations of the phenomenon under analysis but also by the treatment of the geography itself. For example, conventional workstations can take hours to run basic geostatistical calculations for the 5,573 cities in Brazil.

Geographic Normalization – Conclusions and recommendations 

In this article we explained geographic normalization, its importance, its advantages and the cautions needed when conducting spatial analyses. In addition, we compared two important approaches to spatial data analysis. It is worth noting that these approaches are complementary for a better understanding of how data are distributed over space. We therefore recommend viewing the analyses from multiple facets.

We saw that, when the geographic space is represented in an equitable way, a series of benefits to the analyses becomes feasible, such as:

  • Alignment of the size of views according to business needs;
  • Adaptation of the visualizations according to data availability;
  • Being able to make “fair” comparisons through absolute indicators of each region;
  • Observation of intensity areas with less bias;
  • Simplification of neighborhood definition between polygons, thus providing better adherence to spatial algorithms;
  • Finding patterns and events that autocorrelate in space with greater accuracy;
  • Usage of artificial intelligence algorithms (supervised and unsupervised) to identify points of interest that would not be identified without normalization. More information at: Application of Artificial Intelligence in georeferenced analyses.

Finally, every tool has a purpose, and geo-referenced visualizations can lead to either good or bad decisions.

Therefore, the correct visualization, combined with well-chosen and well-implemented algorithms and an appropriate analytical process, can enhance critical decisions and lead to the competitive advantages that are so important in the face of current economic challenges.

What is Aquarela Advanced Analytics?

Aquarela Analytics is a Brazilian pioneering company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and the DCIM methodology, it serves important global customers such as Embraer (aerospace), Randon Group (automotive), Solar Br Coca-Cola (food and beverage), Hospital das Clínicas (healthcare), NTS-Brazil (oil and gas) and Votorantim (energy), among others.

Stay tuned by following Aquarela on LinkedIn!

AI provider? How to choose the best AI and Data Analytics provider?

Choosing an artificial intelligence provider for analytics projects, dynamic pricing or demand forecasting is, without a doubt, a process that should be on the table of every manager in the industry. If you are considering speeding up the process, one way out is hiring companies specialized in the subject.

A successful analytics implementation is, to a large extent, the result of a well-balanced partnership between internal teams and the teams of an analytics service provider, so this is an important decision. Here, we cover some of the key concerns.

Assessing the AI provider based on competencies and scale

First, you must evaluate your options based on the skills of the analytics provider. Below are some criteria:

  • Consistent working method in line with your organization’s needs and size.
  • Individual skills of team members and way of working.
  • Experience within your industry, as opposed to the standard market offerings.
  • Experience in the segment of your business.
  • Commercial maturity of solutions such as the analytics platform.
  • Market reference and ability to scale teams.
  • Ability to integrate external data to generate insights you can’t have internally.

Whether developing an internal analytics team or hiring externally, the fact is that you will probably spend a lot of money and time with your analytics and artificial intelligence provider (partner), so it is important that they bring the right skills to your department's business or process.

Consider all the options in the analytics offering.

We have seen many organizations limit their options to Capgemini, EY, Deloitte, Accenture and other major consultancies, or simply develop internal analytics teams.

However, there are many other good options on the market, including Brazilian companies whose rapid growth is worth paying attention to, mainly within the country's main technological centers, such as Florianópolis and Campinas.

Adjust expectations and avoid analytical frustrations

We have seen, on several occasions, frustrated attempts to create fully internal analytics teams, whether for configuring data lakes, data governance, machine learning or systems integration.

The scenario for AI adoption is similar, at least for now, to the time when companies developed their own internal ERPs in data processing departments. Today, of the 4,000 largest technology accounts in Brazil, only 4.2% maintain internal ERP development, predominantly banks and governments, which makes total sense from the point of view of strategy and core business.

We investigated these cases a little more and noticed that there are at least four factors behind the results:

  • Non-data-driven culture and vertical segmentation prevent the necessary flow (speed and quantity) of ideas and data that make analytics valuable.
  • A waterfall project management style, applied as if the teams were creating physical artifacts or ERP systems; this style is not suitable for analytics.
  • Difficulty hiring professionals with analytics knowledge in the company's business area, together with the lack of onboarding programs suited to the challenges.
  • Technical and unforeseen challenges happen very often, so it is necessary to have resilient professionals used to this "cognitive capoeira" (as we call it here). Real-life datasets are never as ready and well calibrated as those in machine learning examples such as the Titanic passengers dataset. They usually have outliers (What are outliers?), are tied to complex business processes and are full of rules, as in the example of the dynamic pricing of London subway tickets (article in Portuguese).

While there is no single answer to how to deploy robust analytics, governance and artificial intelligence processes, remember that you are responsible for the relationship with these teams, and for the relationship between the production and analytics systems.

Understand the strengths of the analytics provider, but also recognize its weaknesses

It is difficult to find professionals in the market with depth and with functional and technical qualities, especially if your business profile is industrial and involves knowledge of rare processes, for instance the physical-chemical process for creating brake pads or other specific materials.

But, like any organization, analytics providers can also have weaknesses, such as:

  • Lack of international readiness in the implementation of analytics (methodology, platform) to ensure a fast implementation;
  • Lack of a migration strategy, data mapping and ontologies;
  • No guarantee of knowledge transfer and documentation;
  • Lack of practical experience in the industry;
  • Difficulty absorbing the client's business context.

Therefore, knowing the provider's methods and processes well is essential.
The pillars of a good Analytics and AI project are the methodology and its technological stack (What is a technological stack?). So seek to understand the background of the new provider and ask about their experience with other customers of a size similar to yours.

Also, try to understand how this provider solved complex challenges in other businesses, even if these are not directly linked to your challenge.

Data Ethics

Ethics in the treatment of data is a must-have, so we cannot fail to highlight this compliance topic. Data has long been at the center of management's attention, but new laws are now being created, such as the GDPR in Europe and the LGPD in Brazil.

Pay attention to how your data will be treated, transferred and stored by the provider, and check whether its name comes up clean in Google searches and even with public organizations.

Good providers are those who, in addition to knowing the technology well, have guidelines for dealing with the information of your business, such as:

  • They have very clear and well-defined security processes;
  • They use end-to-end encryption;
  • They track their software updates;
  • They respect NDAs (non-disclosure agreements), which should not be treated as a mere formality when it comes to data;
  • Their communication channels are aligned and segmented by security level;
  • They are well regarded by the data analysis community.

Conclusions and recommendations

Choosing your Analytics provider is one of the biggest decisions you will make for your organization’s digital transformation.

Regardless of which provider you choose, it is important to assemble an external analytics consulting team that makes sense for your organization and that has a successful, proven technological and business track record supporting your industry's demands.


AI for demand forecasting in the food industry

The concept of an equilibrium point between supply and demand is used to explain various situations in our daily lives, from bread at the neighborhood bakery, sold at the equilibrium price that matches the quantities desired by buyers and sellers, to the trading of company securities on the stock market.

On the supply side, defining the correct price to charge and, above all, the right quantity are common issues in the planning and execution of many companies' strategies.

In this context, how are technological innovations in the data area establishing themselves in the food sector?

The construction of the demand forecast

The demand projection is often built from historical sales data, growth prospects for the sector or even targets set to drive sales of a certain product.

When relying only on these means of forecasting, without considering the specific growth of each SKU (Stock Keeping Unit), companies can fall into the traps of subjectivity or over-generalization.

The expansion of a sector does not translate into growth of the same magnitude for the entire product mix. For example, does a projected annual growth of 6% for the food sector necessarily imply equivalent growth for the premium meat segment?

Possibly not, as this market niche may be more resilient or more sensitive than the sector as a whole, or it may even be affected by recent changes in consumer habits. The short sketch below illustrates the point.
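The numbers in the sketch below are hypothetical, chosen only to show how a sector-wide aggregate can grow exactly 6% while individual segments grow or shrink at very different rates.

```python
# Minimal sketch (hypothetical numbers): aggregate sector growth can hide
# very different growth rates per SKU / segment.
sales = {
    # segment: (last year, this year), in arbitrary units
    "staples":      (700.0, 756.0),   # +8.0%
    "premium_meat": (200.0, 194.0),   # -3.0%
    "plant_based":  (100.0, 110.0),   # +10.0%
}

sector_prev = sum(prev for prev, _ in sales.values())
sector_curr = sum(curr for _, curr in sales.values())
print(f"sector growth: {(sector_curr / sector_prev - 1):+.1%}")  # +6.0%

for segment, (prev, curr) in sales.items():
    print(f"{segment:>13}: {(curr / prev - 1):+.1%}")
```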

Impacts of Demand Forecasting Errors

For companies, especially large ones with economies of scale and wide geographic reach, an error in the demand forecast can have several consequences, such as:

  • Stockouts;
  • Perishable waste (What is FIFO?);
  • Drop in production;
  • Idle stock (slow moving);
  • Pricing errors.

Adversities like these directly impact companies' bottom lines, as they result in loss of market share, higher costs or poor dilution of fixed costs, increased losses of perishable products, employee frustration with targets and, above all, a loss of confidence among recurring customers who depend on supply for their operations.

The demand forecast in the food sector

The food industry is situated in a context of highly perishable products with the following characteristics:

  • High inventory turnover;
  • Parallel supply in different locations;
  • Large number of SKUs, points of production and points of sale;
  • Verticalized supply chain;
  • Non-linearity in data patterns;
  • Seasonality.

These characteristics make the sector a business niche that is more sensitive to deviations in demand forecast and adjacent planning.

Supply chain opportunity

As an alternative to the traditional demand forecasting format, there are opportunities to use market data and AI to assist managers in the S&OP (Sales & Operations Planning) process, as well as in the S&OE (Sales and Operations Execution) process.

During the S&OP process, demand forecasting supported by AI facilitates the work of the marketing and sales areas, as well as reducing uncertainty and increasing predictability for the supply chain areas.

In the S&OE process, AI can be used to identify new opportunities and to correct deviations from what was planned.

In addition to the technical attributes that AI brings to the process, basing decisions on data reduces points of conflict between teams, reduces historical disputes over SKU preferences and makes the process more transparent across areas.

Previously on our blog, we addressed the challenges of demand forecasting from our point of view (pt. 1, in Portuguese). In those articles, we discuss the advantages of the predictive approach to demand, taking into account factors such as seasonality, geographic and regional preferences and changes in consumer behavior.

We understand that the need for a predictive approach based on data, mainly data external to the company, is increasingly pressing.

The role of machine learning in the food sector

The use of AI through machine learning techniques, combined with a coherent analytics technology stack (What is a technological stack?), provides greater information speed, data organization at different granularities (region, state, city and neighborhood), seasonality adjustments, exploration of opportunities and real-time decision making. A minimal forecasting sketch along these lines follows below.
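As a purely illustrative example (this is not the Aquarela Tactics implementation), the sketch below forecasts each SKU by repeating the value observed twelve months earlier, scaled by the SKU's recent year-over-year growth. The column names and the monthly layout of the sales history are assumptions.

```python
# Minimal per-SKU forecasting sketch (illustrative only). Assumes a monthly
# sales history with hypothetical columns: "sku", "month" (first-of-month
# timestamps) and "units".
import pandas as pd

def seasonal_naive_forecast(history: pd.DataFrame, horizon: int = 3) -> pd.DataFrame:
    """Forecast each SKU by repeating the value observed 12 months earlier,
    scaled by the SKU's recent year-over-year growth."""
    out = []
    for sku, g in history.sort_values("month").groupby("sku"):
        s = g.set_index("month")["units"].asfreq("MS")        # regular monthly series
        yoy = s.pct_change(12).tail(6).mean()                  # recent YoY growth
        yoy = 0.0 if pd.isna(yoy) else yoy
        last = s.index[-1]
        for h in range(1, horizon + 1):
            target = last + pd.DateOffset(months=h)
            base = s.get(target - pd.DateOffset(months=12))    # same month last year
            if base is not None and not pd.isna(base):
                out.append({"sku": sku, "month": target, "forecast": base * (1 + yoy)})
    return pd.DataFrame(out)
```

In practice this naive rule would be a baseline against which richer models (with regional granularity, promotions and external data) are compared.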

In the case of the food sector, greater accuracy in forecasting demand means:

  • Inventory optimization across Distribution Centers (DCs);
  • Reduction of idle stocks;
  • Decrease in disruptions that cause loss of market share due to substitute products;
  • Direct reduction in losses with perishability (FIFO).

The great technical and conceptual challenge faced by data scientists (The profile of data scientists in the view of Aquarela), however, is the modeling of the analysis datasets (what are datasets?) that will serve for the proper training of the models.

Please note that:

“Training models on past data alone will cause them to replicate the same mistakes and successes of the past, especially in terms of pricing. The goal should therefore be to create hybrid models that help the AI reproduce, with more intensity and emphasis, the behaviors desired by the management strategy.”

In the case of Aquarela Analytics, the demand forecasting module of Aquarela Tactics makes it possible to obtain forecasts integrated into corporate systems and management strategies. It was created based on real nationwide retail data and algorithms designed to meet specific demands in the areas of marketing, sales, supply chain, operations and planning (S&OP and S&OE).

Conclusions and recommendations

In this article, we presented some key characteristics of demand forecasting operations in the food sector. We also commented, based on our experience, on the role of structured analytics and AI in forecasting demand. Both are prominent and challenging themes for managers, mathematicians and data scientists.

Technological innovations in forecasting, especially with the use of Artificial Intelligence algorithms, are increasingly present in the operation of companies and their benefits are increasingly evident in industry publications.

In addition to avoiding the downsides of underestimating demand, the predictive approach, when done well, makes it possible to gain market share in current products and a strong competitive advantage by spotting opportunities in other niches before competitors do.


What is a technological stack?

A stack is the set of integrated systems needed to run a single application without additional software. Above all, one of the main goals of describing a technology stack is to improve communication about how an application is built. The chosen technology package may include:

  • the programming languages used;
  • the frameworks and tools a developer needs to interact with the application;
  • known performance attributes and limitations;
  • a survey of the application's overall strengths and weaknesses.

As a rule, stacks must have a specific purpose. For instance, if you look at a web 3.0 stack (what is web 3.0?), you will see how different it is from a data analysis stack built around the R statistical language. That is, when building a stack you should always ask: what is the underlying business purpose? A simple sketch of such a description is shown below.
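As a simple illustration of documenting a stack against a business purpose, the snippet below describes a hypothetical analytics stack as structured data. The layer names and technologies are assumptions for the example, not Aquarela's actual stack.

```python
# A minimal, hypothetical sketch of how an analytics stack could be documented
# as structured data (the entries below are illustrative only).
analytics_stack = {
    "purpose": "demand forecasting for retail",
    "languages": ["Python", "SQL"],
    "data_layer": {"warehouse": "PostgreSQL", "object_storage": "S3-compatible"},
    "processing": ["pandas", "Apache Spark"],
    "ml": ["scikit-learn"],
    "serving": {"api": "REST", "dashboards": "BI tool of choice"},
    "non_functional": {"encryption": "end-to-end", "availability": "99.9%"},
}

def describe(stack: dict) -> None:
    """Print a human-readable summary of the documented stack."""
    for layer, tech in stack.items():
        print(f"{layer}: {tech}")

describe(analytics_stack)
```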

Where does this term come from?

The term comes from the software development community, where it is also quite common to speak of a full-stack developer.

A full-stack developer, in turn, is a professional who knows how to work across all technology layers of a fully functional application.

Why is the technological stack so important?

Firstly, on the one hand the accountant has all company transactions registered for financial management; on the other hand, developers and project leaders need equivalent information about the technology the development team uses.

Secondly, developers cannot manage their work effectively without at least knowing what is happening and which technology assets are available (systems, databases, programming languages, communication protocols), and so on.

The technological stack is just as important as taking inventory in a company that sells physical products. It is in the technological stack that both the business strategy and the main learning (maturity) from the system tests the company has been through are concentrated.

The technological stack is the developers' working dictionary, in the same way that data analysts look at their data dictionaries to understand the meaning of variables and columns. It is an important maturity item in the governance of organizations.

Without prior knowledge of the technological stack, management is unable to plan hiring, risk mitigation, increases in service capacity and, of course, the strategy for using data in the business area.

Technology stacks are particularly useful for hiring developers, analysts and data scientists.

“Companies that try to recruit developers often include their technology stack in their job descriptions.”

For this reason, professionals interested in advancing their careers should pay attention to the strategy of personal development of their skills in a way that is in line with market demand.

Technological stack example

Take the professional social network LinkedIn, for example: it relies on a combination of frameworks, programming languages and artificial intelligence algorithms to stay online. Here are some examples of technologies used in its stack:

Technological Stack – Linkedin for 300 million hits – Author Philipp Weber (2015)

Is there a technological stack for analytics?

Yes. Currently, the areas of analytics, machine learning and artificial intelligence are known for the massive use of information systems techniques and technologies. Likewise, analytical solutions require very specific stacks to meet the functional (what the system should do) and non-functional (how the system should do it: security, speed, etc.) business requirements of each application.

Like the foundation of a house, the order in which the stack is built is important and is directly linked to the maturity of the IT and analytics teams, so we recommend reading this article: The 3 pillars of the maturity of analytics teams (in Portuguese).

In more than 10 years of research into different types of technologies, we have gone through several technological compositions until reaching the current configuration of the Aquarela Vortx platform. The main results of this stack for customers are:

  • Reduction of technological risk (learning is already incorporated into the stack);
  • Technological up-to-dateness;
  • Speed of deployment and systems integration (go-live);
  • Maturity in maintaining systems in production;
  • Quality of the interfaces and flows in the production environment, since the stack makes maintaining the technicians' knowledge more efficient.

Conclusions and recommendations

In conclusion, we presented our vision of the technological stack concept and why it is also important for analytical projects, which, in turn, impacts strategic planning. It is worth bearing in mind that technological stacks, like the business itself, are always evolving.

The success of defining successful stacks is directly linked to the maturity of the IT and analytics teams (The 3 pillars of the maturity of the analytics teams – In Portuguese).

Regardless of the sector, the decisions involved in shaping the technological stack are a factor of success or failure in IT and analytics projects, because they directly affect operations and business strategy.

Finally, we recommend reading this other article on mitigating technology risk with support from specialized companies: How to choose the best data analytics provider? (in Portuguese).


14 sectors for applying Big Data and their input datasets

Hello folks, 

In the vast majority of our talks with clients and prospects about Big Data, we soon noticed an astonishing gap between the business itself and the expectations for Data Analytics projects. We therefore carried out research to answer the following questions:

  • What are the main business sectors that already use Big Data?
  • What are the most common Big Data results per sector?
  • What is the minimum dataset needed to reach those results per sector?

The summary is organized by sector below.

  1. Bank, Credit and Insurance
     • Raw data examples: transaction history; registration forms; external references such as the Credit Protection Service; micro and macroeconomic indices; geographic and demographic data.
     • Business opportunities: credit approval; interest rate changes; market analysis; default prediction; fraud detection; identification of new niches; credit risk analysis.
  2. Security
     • Raw data examples: access history; registration forms; news texts and web content.
     • Business opportunities: detection of physical or digital behaviour patterns that pose any type of risk.
  3. Health
     • Raw data examples: medical records; geographic and demographic data; genome sequencing.
     • Business opportunities: predictive diagnosis (forecasting); analysis of genetic data; detection of diseases and treatments; health maps based on historical data; adverse effects of medications/treatments.
  4. Oil, gas and electricity
     • Raw data examples: distributed sensor data.
     • Business opportunities: optimization of production resources; fault prediction and detection.
  5. Retail
     • Raw data examples: transaction history; registration forms; purchase paths in physical and/or virtual stores; geographic and demographic data; advertising data; customer complaints.
     • Business opportunities: increasing sales by optimizing the product mix based on behaviour patterns during purchase; billing analysis (as-is, trends) over high volumes of customers and transactions; credit profiles by region; increasing satisfaction/loyalty.
  6. Production
     • Raw data examples: production management / ERP system data; market data.
     • Business opportunities: optimization of production against sales; reduced storage time/volume; quality control.
  7. Representative organizations
     • Raw data examples: customers' registration forms; event data; business process management and CRM systems.
     • Business opportunities: suggestion of optimal combinations of company profiles, customers and business leverage with suppliers; identification of synergy opportunities.
  8. Marketing
     • Raw data examples: micro and macroeconomic indices; market research; geographic and demographic data; user-generated content; competitor data.
     • Business opportunities: market segmentation; optimization of advertising resource allocation; finding niche markets; brand/product performance; identifying trends.
  9. Education
     • Raw data examples: transcripts and attendance records; geographic and demographic data.
     • Business opportunities: personalization of education; predictive analytics for school dropout.
  10. Financial / Economic
     • Raw data examples: lists of assets and their values; transaction history; micro and macroeconomic indexes.
     • Business opportunities: identifying the optimal purchase value of complex assets with many analysis variables (vehicles, real estate, stocks, etc.); determining trends in asset values; discovery of opportunities.
  11. Logistics
     • Raw data examples: product data; routes and delivery points.
     • Business opportunities: optimization of goods flows; inventory optimization.
  12. E-commerce
     • Raw data examples: customer registration; transaction history; user-generated content.
     • Business opportunities: increased sales through automatic product recommendations; increased satisfaction/loyalty.
  13. Games, social networks and platforms (freemium)
     • Raw data examples: access history; user registration; geographic and demographic data.
     • Business opportunities: increasing the conversion rate of free users into paying users by detecting users' behaviour and preferences.
  14. Recruitment
     • Raw data examples: registration of prospective employees; professional history and CV; connections on social networks.
     • Business opportunities: evaluation of a person's profile for a specific job role; criteria for hiring, promotion and dismissal; better allocation of human resources.

Conclusions

  • The table above presents a summary for easy understanding of the subject. However, for each business there are many more variables, opportunities and, of course, risks. It is highly recommended to use multivariate analysis algorithms to help you prioritize the data and reduce the project's cost and complexity (see the sketch after this list).
  • There are many more sectors in which excellent results have been derived from Big Data and data science initiatives. However, we believe these can serve as examples for the many other types of similar businesses willing to use Big Data.
  • Common to all sectors, Big Data projects need relevant and clear input data; therefore it is important to have a good understanding of these datasets and of the business model itself. We have noticed that many businesses are not yet collecting the right data in their systems, which suggests the need for pre-Big Data projects. (We will write about this soon.)
  • One obstacle for Big Data projects is the great effort needed to collect, organize and clean the input data. This can surely cause overall frustration among stakeholders.
  • At least as far as we are concerned, plug & play Big Data solutions that automatically get the data and deliver the analysis immediately still don't exist. In 100% of the cases, all team members (technical and business) need to cooperate: creating hypotheses, selecting data samples, calibrating parameters, validating results and then drawing conclusions. An advanced, scientifically based methodology must therefore be used, taking into account both business and technical aspects of the problem.
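As one illustration of the multivariate prioritization mentioned above, the sketch below uses PCA (via scikit-learn) to get a first sense of which input variables carry most of the variance before committing to a full project. The dataset and column names are synthetic and hypothetical.

```python
# Minimal sketch: PCA as one multivariate technique to rank input variables
# by how much of the variance they carry (synthetic, hypothetical data).
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(200, 4)),
    columns=["transactions", "income", "age", "region_density"],
)

X = StandardScaler().fit_transform(df)   # put variables on a comparable scale
pca = PCA().fit(X)

print("explained variance ratio:", pca.explained_variance_ratio_.round(2))
# Rough variable "importance": contribution of each variable to the top components.
weights = np.abs(pca.components_[:2]).sum(axis=0)
for name, w in sorted(zip(df.columns, weights), key=lambda t: -t[1]):
    print(f"{name:>15}: {w:.2f}")
```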


7 characteristics to differentiate BI, Data Mining and Big Data

Hi everybody

One of the most frequent questions in our day-to-day work at Aquarela relates to a common confusion between the concepts of Business Intelligence (BI), Data Mining and Big Data. Since all of them deal with exploratory data analysis, it is not strange to see widespread misunderstanding. The purpose of this post is therefore to quickly illustrate the most striking features of each one, helping readers define their information strategy, which depends on the organization's strategy, maturity level and context.

The basics of each involve the following steps:

  1. Survey questions: what does the customer want to learn (find out) about his/her business? For example: how many customers do we serve each month? What is the average value of the product? Which product sells best?
  2. Study of data sources: what internal/external data are available to answer the business questions? Where are the data? How can I obtain these data? How can I process them?
  3. Setting the size (scope) of the project: who will be involved in the project? What is the size of the analysis or of the sample? Which tools will be used? And how much will it cost?
  4. Development: operationalization of the strategy, performing data transformations, processing, and interactions with stakeholders to validate results and assumptions, finding out whether the business questions were well addressed and the results are consistent.

So far BI, Data Mining and Big Data look virtually the same, right? So, in the table below we summarize what makes them different from each other across seven characteristics, followed by important conclusions and suggestions.

Comparative table (Aquarela, English version)

Conclusions and Recommendations

Although our research restricts itself to 7 characteristics, the results show that there are significant and important differences between BI, Data Mining and Big Data, serving as an initial framework to help decision makers analyze and decide what fits their business needs best. The most important points are:

  • We see that companies with a consolidated BI solution have more maturity to embark on extensive Data Mining and/or Big Data projects. Discoveries made through Data Mining or Big Data can be quickly tested and monitored by a BI solution, so the solutions can and must coexist.
  • Big Data only makes sense for large volumes of data, and the best option for your business depends on the questions being asked and on the data available. All solutions depend on input data; consequently, if the quality of the information sources is poor, chances are the answer will be wrong: "garbage in, garbage out".
  • BI dashboards can help you make sense of your data in a very visual and easy way, but you cannot do intensive statistical analysis with them. That requires more complex solutions, alongside data scientists, to enrich the perception of the business reality by finding new correlations and new market segments (classification and prediction) and by designing infographics that show global trends based on multivariate analysis.
  • Big Data extends the analysis to unstructured data, e.g. social network posts, pictures, videos, music, etc. However, the degree of complexity increases significantly, requiring expert data scientists in close cooperation with business analysts.
  • To avoid frustration, it is important to take into consideration the differences in the value proposition of each solution and its outputs. Do not expect real-time data monitoring from a Data Mining project. In the same sense, do not expect a BI solution to discover new business insights; that is the role of the other two solutions.
  • Big Data can be considered, in part, a combination of BI and Data Mining. While BI comes with a set of structured data, Data Mining comes with a range of algorithms and data discovery techniques. What makes Big Data stand out is the new large-scale distributed processing, storage and memory technology able to digest gigantic volumes of heterogeneous data, more specifically unstructured data.
  • The results of all three can generate intelligence for the business, just as the good use of a simple spreadsheet can also generate intelligence, but it is important to assess whether this is sufficient to meet the ambitions and dilemmas of your business.
  • The true power of Big Data has not yet been fully recognized; however, today's most technologically advanced companies base their entire strategy on the advanced analytics that Big Data provides, and in many cases they offer their services free of charge in order to gather valuable data from users, e.g. Gmail, Facebook, Twitter and OLX.
  • The complexity of data, as well as its volume and file types, tends to keep growing, as presented in a previous post. This implies a growing demand for Big Data solutions.

In the next post we will present interesting sectors for applying exploratory data analysis and how this can be done in each case. Thank you for joining us.

