7 characteristics to differentiate BI, Data Mining and Big Data

Hi everybody

One of the most frequent questions in our day-to-day work at Aquarela concerns a common confusion between the concepts of Business Intelligence (BI), Data Mining, and Big Data. Since all of them deal with exploratory data analysis, the confusion is not surprising. The purpose of this post is therefore to quickly illustrate the most striking features of each one, helping readers define their information strategy, which depends on the organization’s strategy, maturity level and context.

The basics of each involve the following steps:

  1. Survey questions: What does the customer want to learn (find out) about his/her business? For example: How many customers do we serve each month? What is the average value of the product? Which product sells best?
  2. Study of data sources: What internal/external data are available to answer the business questions? Where are the data? How can I obtain these data? How can I process them?
  3. Setting the size (scope) of the project: Who will be involved in the project? What is the size of the analysis or the sample? Which tools will be used? How much will be charged?
  4. Development: operationalization of the strategy, performing several data transformations, processing, and interactions with the stakeholders to validate the results and assumptions, finding out whether the business questions were well addressed and the results are consistent.
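
To make step 1 more concrete, here is a minimal sketch of how the three example questions could be answered from a small list of sales records. The records, the field layout and the function names are all hypothetical and invented for illustration; a real project would read from the sources identified in step 2.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sales records: (month, customer_id, product, value)
SALES = [
    ("2024-01", "c1", "notebook", 10.0),
    ("2024-01", "c2", "pen", 2.0),
    ("2024-01", "c1", "pen", 2.0),
    ("2024-02", "c3", "notebook", 10.0),
    ("2024-02", "c3", "pen", 2.0),
    ("2024-02", "c4", "pen", 2.0),
]


def customers_per_month(sales):
    """How many distinct customers do we serve each month?"""
    seen = defaultdict(set)
    for month, customer, _product, _value in sales:
        seen[month].add(customer)
    return {month: len(customers) for month, customers in seen.items()}


def average_value(sales):
    """What is the average value of a sold item?"""
    return mean(value for _month, _customer, _product, value in sales)


def best_selling_product(sales):
    """Which product sells best (by number of items sold)?"""
    counts = Counter(product for _month, _customer, product, _value in sales)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(customers_per_month(SALES))
    print(average_value(SALES))
    print(best_selling_product(SALES))
```

Questions of this kind are the same whether the project ends up as BI, Data Mining or Big Data; what changes is the volume and type of data behind them.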

Up to this point, BI, Data Mining and Big Data look virtually the same, right? So, in the table below, we summarize what makes them different from each other across seven characteristics, followed by important conclusions and suggestions.

[Image: comparative table of BI, Data Mining and Big Data]

Conclusions and Recommendations

Although our research restricts itself to seven characteristics, the results show that there are significant differences between BI, Data Mining and Big Data. They serve as an initial framework to help decision makers analyze and decide which solution best fits their business needs. The most important points are:

  • We see that companies with a consolidated BI solution are more mature and better prepared to embark on extensive Data Mining and/or Big Data projects. Discoveries made by Data Mining or Big Data can be quickly tested and monitored by a BI solution. So, the solutions can and must coexist.
  • Big Data makes sense only for large volumes of data, and the best option for your business depends on what questions are being asked and what data are available. All solutions depend on their input data. Consequently, if the quality of the information sources is poor, chances are that the answer will be wrong: “garbage in, garbage out”.
  • BI panels (dashboards) can help you make sense of your data in a very visual and easy way, but you cannot do intense statistical analysis with them. This requires more complex solutions alongside data scientists to enrich the perception of the business reality by finding new correlations and new market segments (classification and prediction), and by designing infographics that show global trends based on multivariate analysis.
  • Big Data extends the analysis to unstructured data, e.g. social networking posts, pictures, videos, music, etc. However, the degree of complexity increases significantly, requiring expert data scientists in close cooperation with business analysts.
  • To avoid frustration, it is important to take into consideration the differences in the value proposition of each solution and its outputs. Do not expect real-time monitoring data from a Data Mining project. In the same sense, do not expect a BI solution to discover new business insights; this is the role of the other two solutions.
  • Big Data can be considered partly a combination of BI and Data Mining. While BI comes with a set of structured data, Data Mining comes with a range of algorithms and data discovery techniques. What makes Big Data a plus is the new large-scale distributed processing, storage and memory technology that digests gigantic volumes of heterogeneous data, more specifically unstructured data.
  • The results of all three can generate business intelligence, just as good use of a simple spreadsheet can, but it is important to assess whether this is sufficient to meet the ambitions and dilemmas of your business.
  • The true power of Big Data has not yet been fully recognized. However, today’s most technologically advanced companies base their entire strategy on the power and advanced analytics provided by Big Data, and in many cases they offer their services free of charge in order to gather valuable data from users, e.g. Gmail, Facebook, Twitter and OLX.
  • The complexity of data, as well as its volume and variety of file types, tends to keep growing, as presented in a previous post. This implies a growing demand for Big Data solutions.
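
The “garbage in, garbage out” point above can be illustrated with a very small input check that runs before any analysis. This is only a sketch under stated assumptions: the records, the field names (`id`, `email`, `monthly_spend`) and the three checks are hypothetical, and real projects typically need many more rules.

```python
# Hypothetical customer records; None marks a missing value
RECORDS = [
    {"id": 1, "email": "ana@example.com", "monthly_spend": 120.0},
    {"id": 2, "email": None, "monthly_spend": 80.0},
    {"id": 2, "email": "bob@example.com", "monthly_spend": -5.0},
    {"id": 3, "email": "carla@example.com", "monthly_spend": 60.0},
]


def quality_report(records):
    """Count simple input problems before the data feeds any analysis."""
    seen_ids = set()
    report = {"missing_email": 0, "duplicate_id": 0, "negative_spend": 0}
    for row in records:
        if row["email"] is None:
            report["missing_email"] += 1
        if row["id"] in seen_ids:
            # The same identifier appeared in an earlier record
            report["duplicate_id"] += 1
        seen_ids.add(row["id"])
        if row["monthly_spend"] < 0:
            report["negative_spend"] += 1
    return report


if __name__ == "__main__":
    print(quality_report(RECORDS))
```

A report like this does not fix the data, but it tells you early whether the answers produced by a BI, Data Mining or Big Data solution deserve to be trusted.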

In the next post we will present interesting sectors for applying exploratory data analysis and how this can be done in each case. Thank you for joining us.

What is Aquarela Advanced Analytics?

Aquarela Analytics is a pioneering Brazilian company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and DCIM methodology, it serves important global customers such as Embraer (aerospace & defence), Scania and Randon Group (automotive), Solar Br Coca-Cola (beverages), Hospital das Clínicas (healthcare), NTS-Brasil (oil & gas) and Votorantim Energia (energy), among others.

Stay tuned by following Aquarela’s LinkedIn!

What is Web 3.0 and why it is so important for business?

Greetings to all!
Web 3.0 is important both as a concept and as a technology, and here is why.

Day after day, the amount of data and information on the internet grows exponentially, as we discussed in the last post. New sites, images, videos and all sorts of digital material appear every second. With this huge set of data, a major challenge is how to cost-effectively extract what is relevant to our day-to-day activities. Therefore:

In a complex, ever-changing, information-intensive context, Web 3.0 tools are valuable for users in organizing information and business processes at large scale.

The evolution of the Web

Since the emergence of the first version of the Web, created in the early 90s by Tim Berners-Lee in Switzerland, its technologies have undergone significant changes on the way to Web 3.0, especially in terms of user interactivity and the massification of internet usage.

In short, according to our research at Aquarela Analytics, the Web’s history has three major stages:

The Static Web – Web 1.0

Web 1.0 presented data and information in a predominantly static way and was characterised by low user interaction with the content: users could rarely leave comments, or manipulate or create content on a website.

Technologies and methods of Web 1.0 are still widely used for displaying static content such as laws and manuals, like this example: http://copyright.gov/title17/92preface.html . This very text was also built on this paradigm.

That generation of the Web was marked by the centralisation of content production, in portals such as AOL and directories such as Yahoo and Craigslist.

On Web 1.0 the user is responsible for his/her own navigation and for identifying relevant content, having a predominantly passive role in the process.

Another important aspect is that just a few produce information that is consumed by many, much like the broadcasting model widely used in the media industry by TV, radio, newspapers and magazines.

Web 1.0’s greatest virtue was the democratisation of information access.

The Interactive Web – Web 2.0

Web 2.0, in contrast to Web 1.0, has its content predominantly generated by its users, in a process where many users produce content and many consume it.

An example of this model is Wikipedia. Other examples of user-generated content platforms are blogs, social networks and YouTube. In Web 2.0, users are no longer just content consumers; they become producers or co-producers of content.

In this version of the Web, search engines become more advanced and proliferate, since lists of links in directories can no longer cope with the huge volume of content made by many.

Web 2.0’s great virtue is the democratisation of content production.

The Actionable Intelligent Web – Web 3.0

Web 3.0, or the Semantic Web, combines the virtues of Web 1.0 and 2.0 by adding machine intelligence.

In 2001, Tim Berners-Lee, the creator of the Web, published an article in Scientific American magazine laying the foundation of the Semantic Web.

In the article, Berners-Lee describes how two siblings organise the logistics of their mother’s health treatment using intelligent agents, which do all the planning and execution of the process automatically, interacting with clinical systems, with each other and with home devices.

In Web 3.0, machines work alongside users in content production and decision-making, transforming the traditionally supportive role of the internet infrastructure into that of a protagonist in content and process generation.

Furthermore, Web 3.0 services can unite users and computers for problem-solving and knowledge-intensive creation tasks. With its large processing capacity, Web 3.0 is therefore able to bring services and products with high added value to people and businesses, because of their accuracy and high customisation.

Web 3.0’s great virtue is the democratisation of the capacity of action and knowledge, which was previously only accessible to large businesses and governments.

Evolution of the Web summarized

[Image: Web 3.0 compared with previous versions]

Web 3.0 examples

Examples of Web 3.0 applications are Wolfram Alpha and Apple’s Siri, which can summarise large amounts of information into knowledge and useful actions for people. 

Wolfram Alpha

We can make a small comparison between Wolfram Alpha and Google by typing the phrase “Brazil vs. Argentina” into both search engines. The results differ considerably:

Search results Google vs WolframAlpha

In the case of Google, the results turn out to be mostly about football games between Brazil and Argentina. Note that neither the word “football” nor the word “games” was mentioned in the search.

Wolfram Alpha, in contrast, treats the search as a comparison between two countries and consequently brings organised statistical, historical, geographical (maps), demographic, linguistic and other useful data for comparative analysis.

Siri

Apple’s Siri, in turn, uses speech recognition and artificial intelligence techniques to return results and perform actions in response to requests such as:

“Where is the nearest pizzeria?” or

“How far am I from the nearest gas station?” or “Make an appointment at 9:00 am tomorrow.”

Traditional tools (Web 1.0 and 2.0) search by matching the query word by word against the text published on the network. As a result, they are often biased towards what is most abundant and end up not returning what is most relevant to the user at that moment.

Web 3.0 systems, however, seek contextualised knowledge to assist people in their jobs, pointing to a series of analyses and potentially helpful information.

One of the distinctions of a Web 3.0 search engine is the reduced time that users need to spend sailing in a sea of information to find what they really want solved.

Companies like Apple and IBM have been investing heavily in Web 3.0 technologies. Google Inc., for example, has made several acquisitions of companies in the Semantic Web area over the past decade, such as Applied Semantics and Metaweb Technologies, Inc., among others.
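
The contrast between the two approaches can be sketched with a toy example. It is not how Google, Wolfram Alpha or Siri work internally; the mini-corpus, the facts table and both function names are hypothetical, chosen only to show word-by-word matching next to a query that is first interpreted as a comparison between two known entities.

```python
import re

# Hypothetical mini-corpus; football pages are the most abundant
DOCUMENTS = [
    "Brazil vs. Argentina football match highlights",
    "Brazil vs. Argentina World Cup qualifier score",
    "Argentina and Brazil sign trade agreement",
]

# Hypothetical structured knowledge about entities the system recognises
COUNTRY_FACTS = {
    "brazil": {"continent": "South America", "language": "Portuguese"},
    "argentina": {"continent": "South America", "language": "Spanish"},
}


def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Rank documents by how many query words they contain."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0]


def contextual_answer(query, facts):
    """Treat 'A vs. B' as a comparison when both sides are known entities."""
    match = re.fullmatch(r"\s*(\w+)\s+vs\.?\s+(\w+)\s*", query.lower())
    if not match:
        return None
    left, right = match.groups()
    if left not in facts or right not in facts:
        return None
    # Pair each attribute of the first entity with the second entity's value
    return {attr: (facts[left][attr], facts[right][attr]) for attr in facts[left]}


if __name__ == "__main__":
    print(keyword_search("Brazil vs. Argentina", DOCUMENTS))
    print(contextual_answer("Brazil vs. Argentina", COUNTRY_FACTS))
```

The keyword search returns whatever text shares the most words with the query, so the abundant football pages come first, while the contextual function returns a side-by-side comparison because it recognised both terms as countries.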

Conclusions and recommendations

We are living in an interesting time in history, in which the Web begins to bring more knowledge and capacity for action to its users, resulting in considerable changes in several aspects of daily life.

This new type of Web is moving fast towards a more dynamic and faster changing environment, where the democratisation of the capacity of action and knowledge can speed up business in almost all areas.

The areas impacted by Web 3.0 range from retail to applied molecular medicine, and from micro-businesses to large corporations.

It is worthwhile for innovative minds, whether business people, politicians or researchers, to understand this new horizon of possibilities and be prepared for the new generation of businesses.

Some new businesses based on the Semantic Web are already emerging and increasingly gaining momentum in national and international markets.

Web 3.0 is the progressive evolution of the Web. By not keeping up with this evolution, managers expose their organizations to the risk of suddenly becoming obsolete or irrelevant at a time of paradigm shift, like giants of the past such as Kodak, Nokia and Altavista.

In future posts, we will talk about the Data Analytics and Big Data solutions that we have developed and which we believe to be a way to materialize business faster (earlier) than Web 3.0 and Linked Open Data (LOD), although all of them are becoming more and more intertwined. It is important to understand how Web 3.0 is advancing through Big Data and LOD.

Several interesting challenges ahead!
