At Business Lab Consulting, we’re proud to share our customers’ feedback. Today, we’d like to talk to you about IBL and its wiiv loyalty programme, launched in 2019.
The Head of Loyalty, Cécile MASSON HENRY, gives us her impressions of our collaboration.
When IBL launched the wiiv programme, the team was ‘new to the field’ and had to ‘wait until we had a minimum amount of data before starting the project’. It was against this backdrop that IBL chose to call on Business Lab Consulting, an agency offering not only skills in data visualisation, but also expertise in loyalty marketing.
IBL faced a number of difficulties during the implementation of the data project for its wiiv loyalty programme:
Despite these challenges, the collaboration with Business Lab Consulting has had a significant impact on IBL’s services within this department. The greatest contribution has been the training and support provided to:
During the tender process, Business Lab Consulting stood out as the only agency offering both:
IBL particularly appreciated the training and support provided by Business Lab Consulting, enabling it to acquire the skills needed to make the most of its data.
Although there were some initial challenges in terms of monitoring and support due to the dependence on a single consultant, Business Lab Consulting has strengthened its team, enabling it to be more responsive and work more smoothly together.
To sum up, IBL states:
‘With Business Lab Consulting, we had everything under one roof – visualisation expert, loyalty programme expert, data marketing expert and quality training, with the added pleasure of working with people we value and who give a huge amount of themselves to the success of our project.’
On a scale of 1 to 10, IBL recommends Business Lab Consulting's services at 8.5/10.
At Business Lab Consulting, we are delighted with the trust placed in us by IBL for its wiiv loyalty programme. We will continue to provide high quality services, taking seriously the areas for improvement that we can always work on, constantly adapting to the evolving needs of our customers in the field of loyalty marketing.
At Business Lab Consulting, we’re proud to share our customers’ feedback. Today, we present a testimonial from KDI, a company that called on our services to optimize its data management and operations. KDI’s Director of Logistics and Procurement gives us his impressions of our collaboration.
The applications we have developed meet all KDI’s initial needs, whether in terms of data analysis, visualization or integration.
Our collaboration has enabled KDI to better understand and analyze their business, by providing clear tools and solutions to effectively exploit their data and improve their overall performance.
KDI’s decision to call on our services was based on two key factors:
Two aspects of our service were particularly appreciated:
KDI highlights two major advantages of Business Lab Consulting:
These strengths have enabled KDI to obtain tailor-made solutions, perfectly adapted to their needs.
KDI rates its satisfaction at 9/10, a score that reflects the high quality of our services. The Director of Logistics and Procurement says he would recommend Business Lab Consulting without hesitation.
In conclusion, KDI sums up our service in the following terms:
At Business Lab Consulting, we are delighted with the trust placed in us by KDI, and will continue to provide services of the highest quality to meet our customers’ data management and exploitation needs.
As Steve Jobs once said, “Design is not just what it looks like and feels like. Design is how it works.” This principle applies perfectly to data visualization. In this final episode of our series, we’ll explore the often-overlooked dangers related to design in data presentation.
Color choice is a crucial aspect of data visualization design, yet it’s often mishandled. Poorly chosen colors can make visualizations difficult to read or even misleading. Here are some common color-related pitfalls:
Consider this example of a poorly designed dashboard:
In this dashboard, the use of similar colors for different categories makes it difficult to distinguish between crime types. A better approach would be to use a clear, distinct color palette with high contrast between categories.
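As a minimal sketch (the crime categories and counts below are invented for illustration, not taken from the dashboard discussed above), here is one way to pick a high-contrast qualitative palette in matplotlib instead of several shades of the same hue:

```python
import matplotlib.pyplot as plt

# Hypothetical crime categories and counts, invented for this sketch
categories = ["Theft", "Burglary", "Assault", "Vandalism", "Fraud"]
counts = [480, 310, 220, 150, 90]

# 'tab10' is a qualitative palette: neighbouring colours contrast strongly,
# so categories stay distinguishable (unlike five shades of the same blue)
colors = plt.cm.tab10.colors[:len(categories)]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, counts, color=colors)
ax.set_ylabel("Reported incidents")
ax.set_title("Distinct qualitative colours keep categories readable")
plt.tight_layout()
plt.show()
```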
Sometimes, in our quest for simplicity, we miss opportunities to enhance understanding through design. Thoughtful addition of visual elements can greatly improve engagement and memorability.
For example, consider this improved visualization of Edgar Allan Poe’s works:
This visualization uses design elements to evoke the dark ambiance of Poe’s works, making the visualization more memorable and engaging. The inverted y-axis and blood-red color scheme add to the ominous feel, while the portrait and signature provide context and personality.
Good design isn’t just about visual appeal; it must also consider usability. Visualizations that are difficult to manipulate or understand can frustrate users and limit the effectiveness of data communication.
Key usability considerations include:
Here’s an example of a dashboard with potential usability issues:
While this dashboard offers numerous interaction options, without careful user interface design, it can become overwhelming and difficult to use effectively. A better approach would be to simplify the interface, prioritize key information, and provide clear guidance on how to interact with the visualization.
In this final article of our series, we’ve explored the seventh type of error we can encounter when working with data: design dangers. We’ve seen how color choices, missed opportunities, and usability issues can affect the effectiveness of our data visualizations.
Throughout this seven-part series, we’ve covered a wide range of common pitfalls in working with data, from how we think about data to how we present it. By being aware of these pitfalls and learning how to avoid them, we can significantly improve our ability to work effectively with data and communicate valuable insights.
Remember, good design in data visualization is not just about making things look pretty. It’s about enhancing understanding, facilitating insights, and enabling better decision-making. As you continue your data journey, keep these principles in mind to create visualizations that are not only visually appealing but also clear, informative, and user-friendly.
This series of articles is strongly inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We highly recommend this excellent read to deepen your understanding of data-related pitfalls and how to avoid them!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-the-introduction/
In the dynamic world of Business Intelligence (BI), where the complexity of data meets the evolving needs of users, Lean UX Design is emerging as a revolutionary approach. This user-centered methodology promises to radically transform the way we design and develop BI solutions.
By quickly identifying what works and what doesn’t, Lean UX saves precious resources.
“Thanks to DATANALYSIS’ Lean UX approach, we reduced our BI development costs by 30% while increasing user satisfaction by 50%.”
– Marie Dupont, CIO, TechInnovate SA
BI solutions designed with users, for users, guarantee better adoption and use.
In an ever-changing BI environment, Lean UX enables you to pivot quickly and efficiently.
Adopting Lean UX in your BI development can seem daunting.
In a world where data is king, Lean UX offers a way to turn that data into actionable insights faster and more accurately than ever before. For companies looking to make the most of their BI investments, Lean UX isn’t just an option, it’s a competitive necessity.
At BUSINESS LAB CONSULTING, we’re passionate about applying Lean UX to BI development. Our team of experts is ready to guide you through this transformation to optimize your processes, reduce your costs and significantly improve the user experience of your BI solutions.
Data visualization is a powerful tool for communicating complex information clearly and concisely. However, it can also be a source of numerous errors that can lead to misinterpretations. In this episode, we’ll explore the most common graphical gaffes and how to avoid them.
One of the most common pitfalls in data visualization is creating graphs that mislead, often unintentionally. This can happen in several ways:
For example, consider this graph showing drug-related crime cases in Orlando:
This graph seems to show an alarming increase in drug-related crimes. However, upon closer examination, we see that the Y-axis doesn’t start at zero, visually exaggerating the increase.
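Here is a minimal sketch of the fix, using made-up monthly counts: anchoring the y-axis at zero shows the same data at its true scale.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly counts of drug-related cases, invented for this sketch
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
cases = [510, 515, 522, 528, 534, 541]

fig, (ax_zoomed, ax_honest) = plt.subplots(1, 2, figsize=(10, 4))

# Left: a tight y-range visually exaggerates what is only a ~6% rise
ax_zoomed.plot(months, cases, marker="o")
ax_zoomed.set_title("Default axis (exaggerated)")

# Right: anchoring the axis at zero shows the change at its true scale
ax_honest.plot(months, cases, marker="o")
ax_honest.set_ylim(0, max(cases) * 1.1)
ax_honest.set_title("Y-axis starting at zero")

plt.tight_layout()
plt.show()
```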
It’s easy to fall into the trap of data dogmatism, thinking there’s only one “right” way to visualize data. In reality, the choice of graph type depends on the context, audience, and message you want to convey.
For example, although pie charts are often criticized, they can be effective for showing parts of a whole, especially when there are few categories:
This pie chart clearly shows that theft accounts for nearly half of all reported crimes in Orlando.
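For what it’s worth, a sketch with hypothetical shares (not the actual Orlando figures) shows how readable a pie chart stays when there are only a few categories:

```python
import matplotlib.pyplot as plt

# Hypothetical shares of reported crimes, invented for this sketch
labels = ["Theft", "Burglary", "Assault", "Other"]
shares = [48, 22, 18, 12]

fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(shares, labels=labels, autopct="%1.0f%%", startangle=90)
ax.set_title("With few categories, part-to-whole reads clearly")
plt.show()
```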
In data visualization, one can fall into the trap of thinking that we must always seek the “optimal” visualization at the expense of “satisfactory” solutions. In reality, it’s often more practical and effective to find a visualization that meets the needs sufficiently well, rather than spending excessive time seeking perfection.
For example, this horizontal bar chart can be “satisfactory” for showing the most common types of crimes, even if it’s not necessarily “optimal”:
This graph is easy to understand and quickly provides essential information, even if it could potentially be optimized further.
In this article, we explored the sixth type of error we can encounter when working with data: graphical gaffes. We’ve seen how to avoid misleading graphs, data dogmatism, and the false dichotomy between optimization and satisfaction.
In the next and final article in our series, we’ll explore the 7th type of error: design dangers. We’ll see how design choices can affect the perception and interpretation of visualized data.
This series of articles is strongly inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We highly recommend this excellent read to deepen your understanding of data-related pitfalls and how to avoid them!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-the-introduction/
In our quest to make the most of data, we often fall into the trap of considering intuition and analysis as mutually exclusive approaches. However, as we’ll see in this episode on analytical aberrations, intuition plays a crucial role in the data analysis process.
There was a time when advertisements boasted about moving from intuition to analysis in decision-making, as if one should replace the other. This view is mistaken: intuition isn’t obsolete in the data age – it’s actually more valuable than ever.
Intuition is the spark that powers the engine of analysis. It helps us:
Predicting the future from data can be risky. Extrapolating current trends can lead to significant errors if we don’t account for natural limits or potential changes.
For example, if we look at life expectancy in North and South Korea from 1960 to 1980, we might be tempted to predict a continuous, linear increase. However, reality turned out quite differently, especially for North Korea, which experienced a significant decline in the 1990s.
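A tiny sketch with invented life-expectancy figures shows how mechanical the mistake is: a straight-line fit over 1960-1980 happily extrapolates forever, with no notion of ceilings or shocks.

```python
import numpy as np

# Hypothetical life-expectancy figures for 1960-1980, invented for this sketch
years = np.arange(1960, 1981)
life_exp = 50 + 0.5 * (years - 1960)   # a steady +0.5 year/year trend

# Fit a straight line and extrapolate 40 years beyond the observed data
slope, intercept = np.polyfit(years, life_exp, deg=1)
print(f"Predicted for 2020: {slope * 2020 + intercept:.0f} years")  # 80 years

# The model simply assumes the 1960-1980 trend continues forever; it cannot
# anticipate famines, wars or biological limits, which is exactly the pitfall.
```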
When working with time-series data, we must be careful in our interpretations between data points. A simple slope graph connecting two points in time can mask significant fluctuations between these points.
For example, consider life expectancy in certain countries between 1960 and 2015. A simple slope graph showing the change between these two years could give the impression of a steady and constant increase. However, this simplified representation would mask periods of conflict, economic hardship, or rapid progress in public health that significantly impacted life expectancy over the years.
Take the case of Cambodia, Timor-Leste, Sierra Leone, and Rwanda. A simple slope graph would show an increase in life expectancy between 1960 and 2015, but would completely obscure the tragic periods of war and genocide these countries experienced. For instance, life expectancy in Cambodia fell to less than 20 years in 1977 and 1978, a crucial fact that would be completely ignored in a simple interpolation between 1960 and 2015.
Forecasts, especially long-term ones, can be particularly prone to errors. A striking example is the unemployment forecasts made by different U.S. presidential administrations. These forecasts tend to show a rapid return to a “normal” rate of 4-6%, regardless of the actual economic situation.
This phenomenon can be explained by several factors. First, there’s political pressure to present optimistic outlooks. Second, there’s a natural tendency to assume that extreme or unusual situations will correct themselves quickly. Finally, forecasting models are often based on historical data and may not adequately account for structural changes in the economy.
For example, during the 2008 financial crisis, unemployment forecasts made just before or at the beginning of the crisis failed to anticipate the magnitude and duration of the increase in unemployment. Similarly, forecasts made at the height of the crisis often underestimated the time it would take for the unemployment rate to return to pre-crisis levels.
It’s crucial to ensure that the measures we use are relevant and meaningful. Too often, we focus on measures that are easy to obtain rather than those that are truly important for understanding a phenomenon or making decisions.
In sports, for example, many traditional measures can be misleading. Take the case of professional basketball: a player’s average speed on the court might seem like an interesting measure, but it doesn’t necessarily reflect the player’s real impact on the game.
LeBron James, one of the best players of all time, was criticized during the 2018 playoffs for having the lowest average speed on the court. However, this measure didn’t account for his real impact on the game, measured by more relevant statistics like the Player Impact Estimate (PIE).
This graph shows the relationship between average speed and PIE for NBA players. We can see that LeBron James (point in the top left) has a very high PIE despite a relatively low average speed, illustrating why average speed alone is an inadequate measure of a player’s performance.
This case illustrates the importance of choosing measures that truly reflect what we’re trying to evaluate, rather than settling for measures that are easy to obtain but potentially misleading.
In this article, we explored the fifth type of error we can encounter when using data to illuminate the world around us: analytical aberrations. We’ve seen how intuition and analysis can work together, and how to avoid the pitfalls of exuberant extrapolations, ill-advised interpolations, funky forecasts, and moronic measures.
In the next article, we’ll explore the 6th type of error in our series: graphical gaffes. We’ll see how errors in data visualization can lead to misinterpretations and poorly informed decisions.
This series of articles is strongly inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We highly recommend this excellent read to deepen your understanding of data-related pitfalls and how to avoid them!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-the-introduction/
“There are lies, damned lies and statistics.” – B. Disraeli
Why such distaste for a field that, according to the Merriam-Webster dictionary, is simply “a branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data”? Why is the field of statistics seen in such a negative light by so many people?
There are four main reasons:
Descriptive statistics are intended to summarize the main characteristics of a data set. However, incorrect or inappropriate use can lead to misleading conclusions. A typical example is the use of the mean to summarize a distribution, without taking into account variability or skewness. Another common error is to present percentages without explaining the total number of people, which can be misleading as to the true extent of a phenomenon. It is therefore crucial to understand the assumptions and limitations of each descriptive measure in order to use it correctly.
Let’s take the example of analyzing salaries within a company. If we simply look at average salaries, we might conclude that the company is paying its employees well. However, if management salaries are very high compared to the rest of the employees, the average would be biased upwards. It would be more relevant to use the median, which gives the salary in the middle, or to look at the complete salary distribution for a more accurate view.
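A quick sketch with invented salaries makes the point: a few very high management salaries pull the mean well above what a typical employee earns, while the median barely moves.

```python
import numpy as np

# Hypothetical monthly salaries (in €): seven employees and two managers
salaries = np.array([2200, 2400, 2500, 2600, 2800, 3000, 3100, 12000, 18000])

print(f"Mean:   {salaries.mean():,.0f} €")      # 5,400 € – pulled up by the two outliers
print(f"Median: {np.median(salaries):,.0f} €")  # 2,800 € – closer to a typical salary
```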
Statistical inference aims to draw conclusions about a population from a sample of that population. However, this process is subject to error. Sampling errors and Type I and II errors are common. In addition, errors can be exacerbated by confusion between correlation and causation. A solid understanding of the principles of statistical inference is essential to avoid these pitfalls.
Let’s imagine a public health study seeking to establish a link between a particular dietary habit (such as eating organic) and better overall health. If the study finds a positive correlation, it doesn’t necessarily mean that eating organic causes better health. There could be confounding factors, such as income level or lifestyle, that influence both eating habits and health status. Here, we can fall into the trap of confusing correlation with causation.
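A small simulation (entirely invented) shows how a confounder manufactures a correlation: here income drives both the “eats organic” score and the health score, organic food has no effect at all, and yet the two measures correlate strongly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Confounder: income influences both diet and health in this toy model
income = rng.normal(50, 15, n)

# Neither variable depends on the other – only on income plus noise
eats_organic = income + rng.normal(0, 10, n)
health_score = income + rng.normal(0, 10, n)

corr = np.corrcoef(eats_organic, health_score)[0, 1]
print(f"Correlation: {corr:.2f}")  # ≈ 0.7, despite zero causal link between them
```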
Sampling is a crucial stage in any data collection process. Yet many errors can occur at this stage. The sample may not be representative of the target population, due to selection bias or non-response. What’s more, the sample size may be insufficient to detect an effect. Careful sample planning is therefore essential to obtain reliable results.
Consider a customer satisfaction survey conducted by an e-commerce company. If the company only solicits opinions from customers who have made a recent purchase, it runs the risk of obtaining a distorted picture of overall customer satisfaction. Indeed, dissatisfied customers may have stopped making purchases and therefore not be included in the sample. This is an example of selection bias.
A common mistake in data analysis is to ignore the impact of sample size on results. A large sample size can make a very small effect significant, while too small a sample size may not have sufficient power to detect an existing effect. Furthermore, statistical significance does not necessarily mean practical significance. So it’s important to consider sample size when interpreting results.
Suppose you’re conducting a study to assess the effect of a drug on lowering blood pressure. If you have a very large sample of patients, you may see a statistically significant drop in blood pressure. However, this drop may be very small, say 0.1 mm Hg, a clinically insignificant value despite its statistical significance. This is an example where sample size can make a small effect significant. On the other hand, if the sample is too small, a real effect may be missed. It is therefore important to consider clinical or practical significance in addition to statistical significance.
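A short simulation (invented numbers, one-sample t-test) illustrates the gap between statistical and practical significance: with a million patients, a 0.1 mm Hg drop is “highly significant” yet clinically meaningless.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000  # a very large sample

# Hypothetical blood-pressure drops (mm Hg): true mean effect of only 0.1 mm Hg
drops = rng.normal(loc=0.1, scale=10.0, size=n)

t_stat, p_value = stats.ttest_1samp(drops, popmean=0.0)
print(f"mean drop = {drops.mean():.2f} mm Hg, p-value = {p_value:.2e}")
# The p-value is essentially zero, yet a ~0.1 mm Hg drop changes nothing clinically.
```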
Digging deeper into this issue, Ben Jones (the author who inspired this series) managed to find figures on kidney cancer rates as well as demographics for every US county, and he created an interactive dashboard (figure below) to illustrate visually what Kahneman, Wainer and Zwerling describe quite clearly in words.
Notice a few elements in the dashboard. On the filled choropleth map, the darkest orange counties (high rates relative to the overall U.S. rate) and the darkest blue counties (low rates relative to the overall U.S. rate) are often side by side.
Also, note how in the scatterplot below the map, the marks form a funnel shape, with less populated counties (on the left) more likely to deviate from the reference line (the overall U.S. rate), and more populated counties like Chicago, L.A. and New York are more likely to be close to the overall reference line.
One final observation: if you hover over a county with a small population in the interactive online version, you’ll notice that the average number of cases per year is extremely low, sometimes 4 cases or less. A small deviation – even just 1 or 2 cases – in a subsequent year will pull a county from the bottom of the list to the top, or vice versa.
In the next article, we’ll explore the 5th type of error we may encounter when using data to illuminate the world around us: Analytical aberrations.
This article is heavily inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We recommend this excellent read!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-the-introduction/
In our data analysis projects, mathematical errors can creep in as soon as a calculated field is created to generate additional information from our initial dataset. This type of error typically appears when converting units, aggregating data, counting distinct values or working with ratios – the cases detailed below.
These are obviously just a few of the types of operation where errors can occur. But in our experience, these are the main causes of the problems we encounter.
And, in each of these cases, it doesn’t take a genius engineer or scientist to correct them. It just takes a little care and a lot of rigor!
In this article, we won’t dwell too much on this common mistake. In fact, there are a large number of articles and anecdotes which illustrate this type of problem perfectly and in detail (which we also discussed in the previous article).
The most famous and costly example is the loss of the Mars Climate Orbiter probe. If you’d like to find out more, please click here: Mars Climate Orbiter – Wikipedia
You may argue that none of us works for NASA or has to land a probe on a distant planet, so this doesn’t concern us. Well, you may well come across this type of error when handling time data (hours, days, seconds, minutes, years), financial data (different currencies) or stock data (units, kilos, pallets, bars, etc.).
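A minimal sketch with invented stock movements shows how cheap the safeguard is: convert everything to a single reference unit before summing.

```python
# Hypothetical stock movements recorded in mixed units, invented for this sketch
UNIT_TO_KG = {"kg": 1.0, "t": 1000.0, "g": 0.001}

movements = [
    {"product": "A", "qty": 250, "unit": "kg"},
    {"product": "A", "qty": 1.2, "unit": "t"},
    {"product": "A", "qty": 500, "unit": "g"},
]

# Naive sum mixes units: 250 + 1.2 + 500 = 751.2 "somethings"
naive_total = sum(m["qty"] for m in movements)

# Converting to a single reference unit first gives the real figure
total_kg = sum(m["qty"] * UNIT_TO_KG[m["unit"]] for m in movements)

print(naive_total)  # 751.2 – meaningless
print(total_kg)     # 1450.5 kg
```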
We aggregate data when we group records that have an attribute in common. There are all sorts of such groupings that we deal with in our world as soon as we can establish hierarchical links; time (day, week, month, year), geography (cities, region, country), organizations (employees, teams, companies) and so on.
Aggregations are a powerful tool for making sense of the world, but they involve several risk factors:
Summary statistics are a typical example of what aggregates can hide. In this example, the four data sets have exactly the same sums, means and standard deviations on both coordinates (X, Y). Yet as soon as we plot the points, it’s easy to see that the four stories are significantly different.
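This is the classic Anscombe’s quartet; a minimal sketch using seaborn’s example dataset (fetched over the network on first use) reproduces the effect – near-identical summary statistics, four very different pictures:

```python
import seaborn as sns

# Anscombe's quartet, loaded from seaborn's example datasets
df = sns.load_dataset("anscombe")

# Nearly identical means and standard deviations on both coordinates...
print(df.groupby("dataset")[["x", "y"]].agg(["mean", "std"]).round(2))

# ...yet the plots tell four very different stories
sns.lmplot(data=df, x="x", y="y", col="dataset", col_wrap=2, height=3)
```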
As soon as data is aggregated, we are trying to summarize a situation. We must always remember that this summary masks the details and context that explain it. So be careful when, in a discussion, the people you’re talking to only mention averages, sums or medians, without going into the details of what may have led to that particular scenario.
Take, for example, a dataset in which we observe the number of bird strikes on aircraft for an airline.
Our objective is to determine the month(s) of the year with the most incidents. This gives:
The answer to our question was therefore August, if we exclude the data for the year for which we didn’t have all the records.
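A minimal pandas sketch (the records are invented, not the airline’s data) shows the safeguard: drop incomplete years before comparing months, otherwise the missing months bias the ranking.

```python
import pandas as pd

# Hypothetical bird-strike records, invented for this sketch
strikes = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-08-03", "2021-08-15", "2021-09-02", "2022-07-21",
        "2022-08-05", "2022-08-19", "2023-01-10", "2023-02-08",
    ]),
    "incidents": [3, 2, 1, 2, 4, 3, 1, 2],
})

# 2023 only covers January and February: keep complete years only
complete_years = [2021, 2022]
subset = strikes[strikes["date"].dt.year.isin(complete_years)]

by_month = (
    subset.groupby(subset["date"].dt.month)["incidents"]
    .sum()
    .sort_values(ascending=False)
)
print(by_month)  # month 8 (August) comes out on top once incomplete years are excluded
```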
This is the last example of the problems linked to aggregations that we’re going to discover in this article. This is one of the author’s “favorite” mistakes. Some might even call it a specialty!
It comes into play when it’s necessary to count the distinct individuals in a given population. Let’s say we’re looking at our customer base and want to know how many unique individuals are in it.
Counting the distinct ids for the whole company gives us a count of our unique customers:
But if we look at each product line and display a sum without paying attention:
We found 7 more customers!
This happens simply because there are customers in the customer base of the company studied who take both services AND licenses, and who end up being counted twice in the total!
This is a problem with simple solutions in all modern data visualization and BI software, but it tends to hide itself in a series of calculations and aggregations, causing sometimes surprising discrepancies at the end of the chain.
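Here is a minimal pandas sketch of the trap (customer IDs invented for illustration): summing per-product-line distinct counts double-counts every customer who buys from more than one line.

```python
import pandas as pd

# Hypothetical orders: C3 and C5 buy both licenses and services
orders = pd.DataFrame({
    "customer_id":  ["C1", "C2", "C3", "C3", "C4", "C5", "C5"],
    "product_line": ["Licenses", "Licenses", "Licenses", "Services",
                     "Services", "Licenses", "Services"],
})

# Correct: distinct customers across the whole company
print(orders["customer_id"].nunique())  # 5

# Misleading: distinct customers per product line, then summed
per_line = orders.groupby("product_line")["customer_id"].nunique()
print(per_line.sum())                   # 7 – C3 and C5 are counted twice
```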
We’ll illustrate this point with an example taken from one of the dashboards we made for one of our customers. With all our expertise, we also sometimes jump headlong into this type of error:
And yes, we’re talking about an occupancy rate that’s “slightly” over 100%!
How is this possible? A simple oversight!
The sum of the divisions is not equal to the division of the sums…
In this case, we had a data set similar to the one below:
Is the occupancy rate equal to:
The sum of the individual occupancy rates? FALSE!
This gives us a total of 30% + 71% + 100% + 50% + 92% + 70%, i.e. 413%.
And that’s exactly the error we made on an even larger data set…
Or the ratio of total passengers to total available capacity? 125/146 = 86%. That’s more accurate!
Note: the average of individual occupancy rates would also be wrong.
In short, whenever you manipulate a ratio, divide the sum of the numerator values by the sum of the denominator values to avoid this type of problem.
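A short pandas sketch (the flights below are invented, not the client’s table) makes the rule concrete: neither the sum nor the average of individual rates is a valid overall rate; only the ratio of the totals is.

```python
import pandas as pd

# Hypothetical flights (passengers carried vs. seats available), invented for this sketch
flights = pd.DataFrame({
    "passengers": [3, 10, 19, 8, 46, 14],
    "capacity":   [10, 14, 19, 16, 50, 20],
})

flights["rate"] = flights["passengers"] / flights["capacity"]

print(flights["rate"].sum())    # ≈ 4.13 – the absurd "413%"
print(flights["rate"].mean())   # ≈ 0.69 – also wrong as a global figure
print(flights["passengers"].sum() / flights["capacity"].sum())  # ≈ 0.78 – the correct rate
```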
This is just one example of a ratio error. Honorable mentions can be given to the treatment of NULL values in a calculation, or to the comparison of ratios that are not calculated with the same denominators.
In the next article, we’ll explore the 4th type of obstacle we may encounter when using data to shed light on the world around us: statistical slippage. (Spoiler: “There are lies, damned lies and statistics” – B. Disraeli)
This article was strongly inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We recommend this excellent read!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-ep-2-7-technical-errors-how-is-data-created/
To stay with the construction metaphor, if problems of this nature exist, they will be hidden and barely visible in the final building. Particular care must therefore be taken during the data collection, processing and cleaning stages. It’s not for nothing that an estimated 80% of the time on a data science project goes into this type of task.
To avoid falling into this trap, and to limit the load required to carry out these potentially tedious operations, we need to accept three fundamental principles:
Accepting these principles does not remove the obligation to go through this preliminary work before any analysis, but the good news is that knowing how to identify these risks, and learning as we go along, helps to limit the scope of this second obstacle.
Data is dirty. I’d even go so far as to say that all data is dirty (see first principle above), with problems of formatting, data entry, inconsistent units, NULL values and so on.
Some well-known examples of this trap:
Take the loss of NASA’s Mars Climate Orbiter in 1999, for example: a $125 million error caused by mixing two unit systems, imperial and metric. Thruster data supplied in imperial units was interpreted as metric, the trajectory calculation went wrong, and the probe was destroyed.
Fortunately, not all errors of this nature will cost us so much money! But they do have a significant impact on the results and ROI of the analyses we carry out.
Typical cleaning operations include (see the sketch after this list):
- Standardizing fields (phone number, email, etc.): +262 692 00 11 22 / 00262692001122 / 06-92-00-11-22 all refer to the same record, and much of this work can be automated with appropriate processing;
- Filling in empty fields using other data in the table: for example, we can deduce the country of residence from telephone numbers, zip codes, cities, etc.;
- Using suitable rules to identify potentially identical rows, such as two records sharing the same e-mail address, telephone number or company ID;
- Using distance-calculation algorithms to flag values that are similar in spelling, pronunciation, common characters, etc.
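As a sketch of the first and last points (the normalize_phone and similarity helpers below are simplified, hypothetical examples, not a production-grade library), a few lines of Python already collapse the three phone spellings above into one value and flag near-identical names for review:

```python
import re
from difflib import SequenceMatcher

def normalize_phone(raw: str, default_country: str = "262") -> str:
    """Collapse the many ways of writing the same number into one canonical form."""
    digits = re.sub(r"\D", "", raw)     # keep digits only
    if digits.startswith("00"):         # 00262... -> 262...
        digits = digits[2:]
    elif digits.startswith("0"):        # 0692... -> 262692...
        digits = default_country + digits[1:]
    return "+" + digits

# The three spellings from the example above collapse to a single value
for raw in ["+262 692 00 11 22", "00262692001122", "06-92-00-11-22"]:
    print(normalize_phone(raw))         # +262692001122 every time

# A rough similarity score helps flag likely duplicate records for human review
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("Business Lab Consulting", "Businesslab Consulting"))  # ≈ 0.98
```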
From these examples and our own experience, we can see that this type of error mainly stems from data entry, collection or scraping processes, whether carried out automatically or by humans. So, in addition to the solutions that can be implemented in data preparation processes, improving these upstream steps will also greatly improve the quality of the data to be processed. That requires education, training, and the definition of rules and standards that are clearly known and shared (data governance is never far away).
Finally, we should also ask ourselves at what point we can consider the data sufficiently clean. After all, we can always do more and better, but the costs involved can often outweigh the expected returns.
In the IT world, there’s an image that sums up this type of problem:
Often, the mistake lies between the screen and the seat!
And yes, even the best data scientists, data analysts or data engineers can make mistakes in the data cleansing, transformation and preparation stages.
Frequently, we manipulate several files from different sources and different applications, which multiplies both the dirty-data issues and the risks involved in manipulating the files themselves:
And this problem can also be made more complex depending on the tools used in our analyses:
In this case, it’s often a matter of technical constraints inherent to the craft of data preparation, and taking the time to understand the risks and processes in place will save a great deal of time in delivering reliable, high-performance data analysis.
In the next article, we’ll explore the 3rd type of obstacle we may encounter when using data to shed light on the world around us: Mathematical errors.
This article was strongly inspired by the book “Avoiding Data Pitfalls – How to Steer Clear of Common Blunders When Working with Data and Presenting Analysis and Visualizations” by Ben Jones, Founder and CEO of Data Literacy, published by Wiley. We recommend this excellent read!
You can find all the topics covered in this series here: https://www.businesslab.mu/blog/artificial-intelligence/data-7-pitfalls-to-avoid-ep-1-7-epistemological-errors-how-do-we-think-about-data/