There is great value in representing reality through visualizations, especially of spatial information. If you have ever looked at a map, you know that the polygons that make up the political boundaries of cities and states are generally irregular (see Figure 1a). This irregularity makes analyses difficult and, as a result, such data cannot be handled well by traditional Business Intelligence tools.
Notice the green dot in Figure 1b: it sits over polygon (‘neighborhood’) n.14, located between n.16 and n.18. So answer now: which region has the greatest influence on the green dot, neighborhood n.16 or n.18? Is the green dot representative of region n.14, n.16 or n.18?
To answer questions like these and to minimize the bias generated by visualizations with irregular polygons, the Vortx Platform does what is known as Geographic Normalization, transforming irregular polygons into polygons of a single size and regular shape (see Figure 1c).
After this “geographic normalization”, it becomes possible to analyze the data of a given space using absolute statistics, not only relative ones, and without the distortions caused by polygons of different sizes and shapes.
Every day, people, companies and governments make countless decisions considering the geographic space. Which gym is closest to home for me to enroll? Where should we install the company’s new Distribution Center? Or, where should the Municipality place the health centers?
So, in today’s article, we propose two questions:
- What happens when georeferenced information is distorted?
- How close can our generalizations about space get?
What will I find in this article?
Working with polygons and regions
Recall that the concept of a polygon comes from geometry, where it is defined as “a flat, closed figure formed by straight line segments”. When a polygon has all sides equal and, consequently, all angles equal, we call it a regular polygon. When this is not the case, it is an irregular polygon.
We use the political division of a territory to understand its contrasts, usually delimiting Nations, States and Municipalities, for example, but we can also delimit regions by many other characteristics, such as the Caatinga region, the Amazon Basin region, the Eurozone, or Trump and Biden voter zones. In short, all that is needed is to enclose some place in space by a shared characteristic. Regional polygons are therefore widely used to represent particular regions or the organization of a territory.
Several market tools fill polygons with different shades of color according to each region’s data, looking for contrasts among them. But be careful! If the sizes and shapes of the polygons are not constant, geographic biases may arise, making the visualization susceptible to misinterpretation.
Thus, the polygon approach is limited in the following respects:
- Comparisons between regions are uneven;
- Indicators must be relativized by population, area or other factors;
- It does not allow more granular analyses;
- It demands more attention from analysts when making statements about certain regions.
Purpose of geographic normalization
Geographic normalization, therefore, exists to overcome the typical problems of analyzing data tied to irregular polygons by transforming the organization of the territory into a set of polygons (in this case, hexagons) of regular size and shape.
In the example below, we compare the two approaches:
1) analysis with mesoregional polygons; and 2) hexagons over the southeastern region of Brazil.
Geographic Normalization seeks to minimize possible distortions of analysis generated by irregular polygons by replacing them with polygons of regular shape and size. This provides an elegant, eye-pleasing and precise alternative, capable of showing initially unknown patterns.
Normalization makes the definition of neighborhoods between polygons clearer and simpler, including promoting better adherence to artificial intelligence algorithms that search for patterns and events that are spatially autocorrelated.
After all, according to the First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler
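One reason hexagons suit spatially autocorrelated analysis is that neighborhood relations become trivial to define. A minimal sketch (the axial coordinate scheme below is an illustrative assumption, not how the Vortx Platform necessarily indexes its cells):

```python
# In axial coordinates (q, r), every hexagonal cell has exactly six
# neighbours, each sharing a full edge and each at the same
# centre-to-centre distance.
AXIAL_DIRECTIONS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]

def hex_neighbors(q: int, r: int) -> list[tuple[int, int]]:
    """Return the six axial coordinates adjacent to cell (q, r)."""
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]

print(hex_neighbors(0, 0))
```

Contrast this with a square grid, where the four edge neighbors and four corner neighbors sit at two different distances, which complicates the spatial-weight matrices used by autocorrelation measures.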
Geographic normalization can also be done with other shapes, such as equilateral triangles or squares. Among these, however, the hexagon introduces the least bias: for a given area it has the shortest perimeter, and the center of each cell is equidistant from the centers of all six of its neighbors.
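The perimeter claim is easy to check with elementary geometry. For a regular polygon with n sides and area A, the side length follows from A = (1/4)·n·s²·cot(π/n), which gives a perimeter of P = 2·√(n·A·tan(π/n)). A short sketch comparing the three candidate shapes at equal area:

```python
import math

def perimeter_for_unit_area(n_sides: int) -> float:
    """Perimeter of a regular n-gon whose area is 1.
    Derived from A = (1/4) * n * s^2 * cot(pi/n)."""
    return 2 * math.sqrt(n_sides * math.tan(math.pi / n_sides))

for name, n in [("triangle", 3), ("square", 4), ("hexagon", 6)]:
    print(f"{name:8s} perimeter = {perimeter_for_unit_area(n):.3f}")
# The hexagon comes out shortest, the triangle longest.
```

A shorter boundary per unit area means fewer points land ambiguously near a cell edge, which is one way the hexagon reduces binning bias.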
With normalization, it is possible to summarize the statistics of points (inhabitants, homes, schools, health centers, supermarkets, industries, etc.) contained within these hexagons, so that the area of analysis is constant and the resulting summaries are statistically meaningful. Mature analytics companies, with a robust and well-consolidated data lake, have an advantage in this type of approach. Also check out our article on How to choose the best AI or data analytics provider?
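The point-summarization step can be sketched in a few lines: assign each point to its hexagonal cell, then aggregate per cell. This is a minimal illustration using standard cube rounding on an axial grid and toy projected coordinates; it is not the Vortx Platform’s implementation, and production systems typically use a hierarchical hexagonal index instead.

```python
import math
from collections import Counter

def point_to_hex(x: float, y: float, size: float) -> tuple[int, int]:
    """Map a point to the axial (q, r) cell of a pointy-top hex grid
    whose hexagons have circumradius `size` (cube rounding)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round the fractional axial coordinates via cube coordinates,
    # then repair the component with the largest rounding error so
    # that x + y + z == 0 still holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hex_bin(points, size: float) -> Counter:
    """Count how many points fall in each hexagonal cell."""
    return Counter(point_to_hex(x, y, size) for x, y in points)

# Toy "schools" in projected coordinates (illustrative data only).
points = [(0.1, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9)]
counts = hex_bin(points, size=1.0)
print(counts)
```

Replacing `Counter` with a sum or mean over any point attribute yields the per-hexagon summaries described above, and shrinking `size` trades statistical significance per cell for spatial granularity.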
Usage of normalized geography
Normalized geography can also be used through interactive maps. Maps of this type allow a very useful level of zoom in the analyses, as we can see in the animation below, which shows a Vortx Platform visualization of schools in the city of Curitiba, Brazil.
The darker the hexagon, the greater the number of schools. Note that we can also access other data through the pop-up and change the size of the hexagons as desired.
“The greater the amount of point data available in a region, the smaller the possible size of the hexagons”.
Limitations of normalized analysis
Like any representation of reality, models that use normalized analysis, although of great value in decision-making, do not completely replace the display of spatial data in irregular polygons, especially when:
- There is a clear political division to be considered;
- There is no reasonable amount of data;
- There is no consensus on the size of regular polygons.
In addition, the computational cost of producing normalized maps must also be taken into account, since processing depends not only on the number of observations of the analyzed phenomenon but also on the treatment of the geography under analysis. For example, conventional workstations can take hours to run basic geostatistical calculations for the 5,573 municipalities of Brazil.
Geographic Normalization – Conclusions and recommendations
In this article we explained geographic normalization: its importance, its advantages and the cautions required when conducting spatial analyses. In addition, we compared two important approaches to spatial data analysis. It is worth noting that these approaches are complementary and together provide a better understanding of how data is distributed over space. We therefore recommend viewing the analyses from multiple facets.
We have seen that, by partitioning geographic space in an equitable way, a series of benefits to the analyses becomes feasible, such as:
- Alignment of the size of views according to business needs;
- Adaptation of the visualizations according to the availability of data;
- Being able to make “fair” comparisons through absolute indicators of each region;
- Observation of intensity areas with less bias;
- Simplification of neighborhood definition between polygons, thus providing better adherence to spatial algorithms;
- Finding patterns and events that autocorrelate in space with greater accuracy;
- Usage of artificial intelligence algorithms (supervised and unsupervised) to identify points of interest that would not be identified without normalization. More information at: Application of Artificial Intelligence in georeferenced analyses.
Finally, every tool has a purpose, and georeferenced visualizations can lead to good or bad decisions.
Therefore, using the correct visualization, along with the right and well-implemented algorithms, based on an appropriate analytical process, can enhance critical decisions that will lead to great competitive advantages that are so important in face of current economic challenges.
What is Aquarela Advanced Analytics?
Aquarela Analytics is a pioneering Brazilian company and a reference in the application of Artificial Intelligence in industry and large companies. With the Vortx platform and the DCIM methodology, it serves important global customers such as Embraer (aerospace), Randon Group (automotive), Solar Br Coca-Cola (food), Hospital das Clínicas (health), NTS-Brazil (oil and gas), Votorantim (energy), among others.
Stay tuned by following Aquarela on LinkedIn!
Undergraduate in Economic Sciences (UFSC), he works as a data scientist at Aquarela, with competence in programming and data analysis in R and a specialty in dynamic visualization frameworks and analytics dashboards.
Master in Production Engineering with a degree in Transport Engineering and Logistics. During his master’s degree, he specialized in macrologistics and regional economics and developed research on reverse logistics, the relocation of production chains, logistics outsourcing and operations research.
Founder and Director of International/Digital Expansion, Master in Business Information Technology at the University of Twente, The Netherlands. Professor and lecturer in the area of Data Science, specialist in intelligent systems architecture and new business development for industry.