Data Warehouse – Businesslab

Business Intelligence, Data Governance, Data Marketing, Data Mining and Data Integration, Data Quality Management, Data Regulations, Data Warehouse

04 septembre 20241

DATA: 7 pitfalls to avoid. The introduction.

DATA! DATA ! DATA everywhere!

These days, data is everywhere, featuring prominently in all new projects and corporate strategies. It’s the key to performance in these uncertain times. At Business Lab consulting, we’re the first to be convinced that it’s a powerful tool that accelerates performance…when it’s well used, well understood and well mastered!

In this new series of articles, we’re going to talk about the big bad wolf; the devil that hides in the detail (or sometimes reveals itself in broad daylight) and discuss with you the 7 main types of pitfalls posed by data and its use. As far as possible, we’ll try to illustrate them with an example from our own experience, because as experts we’ve had the good fortune to come up against each of them in our missions…

Note: these are the pitfalls discussed in Ben Jones’ book, “7 data pitfalls”, which we highly recommend!

Enough suspense, let’s now unveil the 7 families of DATA deadly sins that we’ll be exploring in greater detail over the next 7 weeks:

1. Epistemological errors: how do we think about data?

We often use data with the wrong frame of mind, or with erroneous preconceptions. So, if we go into an analysis project thinking that the data is a perfect representation of reality; if we draw definitive conclusions based on predictions without questioning them; or if we look in the available information for anything that might confirm an opinion already made; then we can create critical errors in the very foundations of these projects.

2. Technical errors: how are the data processed?

Technical and technological issues are often a major source of error in the world of data. Once you’ve identified the information you need, there’s a whole series of obstacles to overcome. Are my sensors working? Do my processes not generate duplicates? Is my data clean or up to date? Complex issues in our projects! After all, isn’t it said that a data analyst spends most of his time and energy preparing and cleaning his data?

3. Mathematical errors: how are the data calculated?

So now you know what your math lessons from school, college and high school are all about! There’s something for every level and taste! If you’ve never combined data at different levels of detail, or made mistakes when calculating ratios, or forgotten that you shouldn’t mix carrots and bananas, we’d love to hear from you!

4. Statistical errors: how are data related?

As the saying goes, “There are lies, damned lies and statistics”. This is the most complex trap to get to grips with, because it takes a lot of skill to fully understand what’s at stake. However, in a world where machine learning, datamining and AI are king, it’s a family of errors that’s only becoming more common!

Do the measures of central tendency or variation we use lead us astray? Are the samples we work with representative of the population we want to study? Are our comparison tools valid and statistically significant?

5. Analytical aberrations: how are the data analyzed?

Golden rule: we’re all analysts (whether we have that title or not).

As soon as we use data to make decisions, then we are analysts, and therefore prone to making decisions based on aberrant analyses. For example, are you familiar with vanity metrics? Or have you ever made extrapolations that don’t make sense in the light of the data used?

These last two topics will be even more important to us than the previous ones, because we’re gaga for Data Visualization, so we’ve got plenty of examples of graphical blunders and aesthetic missteps!

6. Graphic blunders: how are data visualized?

Unlike statistical errors or analytical aberrations, graphical blunders are well known and easily identifiable. Why? Because they can be seen (often from a distance). Have we chosen the right type of chart for our analysis? Is the effect I want to show clearly visible?

7. Aesthetic hazards: can beauty be the enemy of goodness?

What’s the difference with graphic blunders?

Here we’re talking about the overall design of the final product and the interactions we’ve defined within it to ensure that the audience we’re trying to convince has the most ergonomic and aesthetically pleasing experience possible! Does the choice of colors we’ve made confuse or simplify the analysis? Have we used our creativity to make our dashboards pleasing to the eye, and have we used aesthetics to bring impact to the analysis we’re making? Is the final product easy to use and ergonomic, or are the interactions complex and time-consuming?

Are you ready to follow us through the twists and turns of everything that can go wrong with your data analysis projects, so that you don’t fall into these traps?

See you next week!

Did this article inspire you?

Let's talk !

Business Intelligence, Company, Data Governance, Data Marketing, Data Mining and Data Integration, Data Quality Management, Data Regulations, Data Warehouse, Machine Learning, Self-service Analytics, Technology

20 juin 20240

Getting started with Business Intelligence: practical tips

« Wisdom is about extracting gold from raw data; with sharp Business Intelligence, every piece of information becomes a nugget. »

This adage perfectly sums up the potential of BI, provided you follow a few practical tips. Existing information goldmines allow companies to turn them into nuggets of gold shaped in their own image.

Definition

Business Intelligence (BI) is a set of processes, technologies and tools used to collect, analyse, interpret and present data in order to provide actionable information to an organisation’s decision-makers and stakeholders. The main objective of BI is to help companies make strategic decisions based on reliable and relevant data.

BI is widely used in many areas of business, such as financial management, human resources management, marketing, sales, logistics and supply chain, among others. In short, Business Intelligence aims to transform data into actionable knowledge to improve an organisation’s overall performance.

Before looking at the practical tips, let’s look at the elements that define BI. To put BI into practice within your business, there are 5 main steps you need to follow to achieve relevant and effective BI.

Data collection	Data is collected from a variety of sources inside and outside the company, such as transactional databases, business applications, social media, customer surveys, etc.
Data cleansing and transformation	The data collected is cleaned, normalised and transformed into a format that is compatible for analysis. This often involves eliminating duplicates, correcting errors and standardising data formats.
Data analysis	Data is analysed using various techniques such as statistical analysis, data mining, predictive models and machine learning algorithms to identify trends, patterns and insights.
Data visualisation	The results of analysis are generally presented in the form of dashboards, reports, graphs and other interactive visualisations to facilitate understanding and decision-making.
Informations dissemination	The information obtained is shared with decision-makers and stakeholders throughout the organisation, enabling them to make informed decisions based on reliable data.

Practical tips

Now that we have a broad understanding of the definition of BI, it’s important to remember that getting started with Business Intelligence (BI) can be a challenge, but with a strategic approach and some practical advice, you can put in place an effective infrastructure for your business.

Here are some practical tips for getting started with relevant and effective Business Intelligence.

Clarify your objectives

Before you start implementing BI, clearly identify the business objectives you want to achieve. Whether you want to improve decision-making, optimise business processes or better understand your customers, clear objectives will help you focus your efforts.

Start with the basics

Don’t try to do everything at once. Start with pilot projects or specific initiatives to familiarise yourself with BI concepts and tools. This will also enable you to measure results quickly and adjust accordingly.

Identify your data sources

Identify your organisation’s internal and external data sources. This can include transactional databases, spreadsheets, CRM systems, online marketing tools, etc. Ensure that the data you collect is reliable, complete and relevant to your objectives.

Clean and prepare your data

Data quality is essential for effective BI. Put processes in place to clean, standardise and prepare your data before analysing it. This often involves eliminating duplicates, correcting errors and standardising data formats.

Choose the right tools

There are many BI solutions on the market, so look for those that best suit your needs. Considers factors such as ease of use, the ability to manage large sets of data, integration with your existing systems and cost.

Train your team

Make sure your team is formed to use BI tools and interpretation of data. BI is a powerful tool, but its effectiveness depends on the ability of your team to use it properly.

Communicate and collaborate

Involve stakeholders from the start of the BI implementation process. Their support and comments will be essential to ensure the long-term success of your initiative BI.

Start small and grow

Don’t try to implement all BI functionalities at once. Start with pilot projects or specific initiatives, and then gradually extend your use of BI according to the results obtained.

Involve stakeholders

Involve stakeholders right from the start of the BI implementation process. Their support and feedback will be essential in ensuring the long-term success of your BI initiative.

Measure and adjust

Track the performance of your BI and measure its impact on your business. Use this information to identify areas for improvement and make adjustments to your BI strategy over time.

By following these initial practical tips, you can get off to a good start with Business Intelligence and start leveraging your data to make informed decisions and drive business growth.

CONCLUSION

A Business Intelligence (BI) project is considered successful when it succeeds in adding value to the business by meeting its business objectives effectively and efficiently. Here are some key indicators that can define a successful BI project:

Alignment with business objectives: the BI project must be aligned with the company’s strategic objectives. It must contribute to improving decision-making, optimising business processes, increasing profitability or strengthening the company’s competitiveness.

Effective use of data: a successful BI project makes effective use of data to provide usable information. This means collecting, cleansing, analysing and presenting data in the right way to meet business needs.

User adoption: end-users must adopt BI tools and use them on a regular basis to make decisions. A successful BI project is one that meets users’ needs and is easy to use and understand.

Improved performance: a successful BI project translates into improved business performance. This can take the form of increased sales, reduced costs, improved productivity or any other performance measure relevant to the business.

Positive return on investment (ROI): a successful BI project generates a positive return on investment for the business. This means that the benefits gained from using BI outweigh the costs of implementing and maintaining the project.

Scalability and flexibility: a successful BI project is capable of adapting to the changing needs of the business and evolving with it. It must be flexible enough to support new needs, new types of data or new usage scenarios.

Management support and commitment: a successful BI project benefits from the support and commitment of the company’s management. Management must recognise the value of BI and provide the necessary resources to support the project throughout its lifecycle.

In summary, a successful BI project is one that contributes to achieving the company’s business objectives by effectively using data to make informed decisions. It is characterised by its alignment with business objectives, its adoption by users, its positive impact on business performance and its positive return on investment.

Did this article inspire you ?

Let's talk !

Business Intelligence, Company, Data Governance, Data Marketing, Data Mining and Data Integration, Data Quality Management, Data Regulations, Data Warehouse, Machine Learning, Technology

23 mai 20240

Basic SQL : what is it?

For a very long time, SQL was reserved for knowledgeable and technical people in the IT department, and not just any company entity or department could do it. It used to be the exclusive preserve of the company’s IT department. Now, with the spread of « IT », many departments are able to access their company’s data using SQL to query their databases, including marketing, accounting, management control, human resources and many others!

Are you a company specialising in e-commerce, healthcare, retail or simply an SME / SMI? Do you have a set of data stored in a database?

It’s essential to know the basics of structured query language (SQL) so that you can quickly get answers to your queries.

DEFINITION

SQL, or Structured Query Language, is a programming language specially designed for managing and manipulating relational databases.

It provides a standardised interface enabling users to communicate with databases and carry out operations such as inserting, updating, deleting and retrieving data efficiently.

THE BASICS OF SQL

Remember that SQL is nothing more than a way of reading the contents of a relational database to retrieve the information a user needs to meet a requirement.

DATA STRUCTURING

SQL is based on the relational model, which organises data in the form of tables. Each table is made up of columns (fields) representing specific attributes, and rows containing the records.

Table structure :

In the world of SQL, table structure is crucial. Each table is defined by columns, where each column represents a particular attribute of the data you are storing. For example, an « employees » table might have columns such as « surname« , « first name« , « age« , etc. These tables are linked by keys. These tables are linked by keys, which can be unique identifiers for each record, facilitating relationships between different tables.

The main operations (or commands / basic SQL queries)

SELECT : Used to extract data from one or more tables. The SELECT clause is used to specify the columns to be retrieved, the filter conditions and the sort order. This clause is one of the most fundamental in SQL. The WHERE clause, often used with SELECT, is used to filter the results according to specific conditions. For example, you might want to retrieve only those employees whose age is greater than 30, or as in the example below, only those employees in the sales department.

SELECT last name, first name FROM employees WHERE department = ‘Sales’;

INSERT: Used to add new rows to a table

INSERT INTO customers (last name, first name, email) VALUES (‘Doe’, ‘John’, ‘john.doe@email.com’);

UPDATE: Used to add new rows to a table

UPDATE products SET price = price * 1.1 WHERE category = ‘Electronics’;

DELETE: Used to delete rows from a table under certain conditions

DELETE FROM orders WHERE order_date < ‘2023-01-01‘;

Filtering and sorting

To filter the results, SQL uses the WHERE clause, which allows you to specify conditions for selecting the data. In addition, the ORDER BY clause is used to sort the results according to one or more columns.

Filtering and sorting are essential operations in the SQL language, making it possible to retrieve specific data and organise it in a meaningful way. Let’s explore these concepts with some practical examples

Filtering with the WHERE clause

The WHERE clause is used to filter the results of a query by specifying conditions. This allows you to select only the data that meets these criteria.

–Select employees with a salary greater than 50000

SELECT last name, first name, salary

FROM employees

WHERE salary > 50000;

In this example, only employees with a salary greater than 50000 will be included in the results.

Filtering with the ORDER BY clause

The ORDER BY clause is used to sort the results of a query according to one or more columns. You can specify the sort order (ascending or descending)

–Select customers and sort alphabetically by name

SELECT last name, first name, email

FROM customers

ORDER BY name ASC;

In this example, the results will be sorted in ascending alphabetical order by customer name.

Filtering and sorting can also be combined, i.e. combining the WHERE clause and the ORDER BY clause to filter the results at the same time

–Select products in the ‘Electronics’ category and sort by descending price

SELECT product_name, price

FROM products

WHERE category = ‘Electronics

ORDER BY price DESC;

There are other ways of filtering and sorting with operators, but this becomes SQL that is no longer basic, but for a more experienced audience.

By understanding these filtering and sorting concepts, you will be able to extract specific data from your SQL databases in a targeted and organised way.

Joins

Joins are essential for combining data from several tables.

Common types of joins include INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN, each offering specific methods for associating rows between different tables.

Example of a simple join:

SELECT customer.name, orders.date

FROM customers

INNER JOIN orders ON customers.customer_id = orders.customer_id;

Types of joins :

INNER JOIN: Returns the rows when the join condition is true in both tables.

LEFT JOIN (or LEFT OUTER JOIN): Returns all the rows in the left-hand table and the corresponding rows in the right-hand table.

RIGHT JOIN (or RIGHT OUTER JOIN): The opposite of LEFT JOIN.

FULL JOIN (or FULL OUTER JOIN): Returns all rows when the join condition is true in one of the two tables.

Constraints for data integrity and Indexes to optimise performance

Constraints play a crucial role in guaranteeing data integrity. Primary keys ensure that each record in a table is unique, while foreign keys establish links between different tables. Uniqueness constraints ensure that no duplicate values are allowed in a specified column.

Indexes are data structures that improve query performance by speeding up data searches. Creating an index on a column makes searching easier, but it is essential to use them wisely, as they can also increase the size of the database.

Conclusion

SQL is a powerful and universal tool for working with relational databases. Understanding its fundamentals enables developers and data analysts to interact effectively with database management systems, making it easier to manipulate and retrieve crucial information. Whether for simple tasks or more complex operations, SQL remains an essential part of data management.

It offers a range of tools for interacting with relational databases in a powerful and flexible way. By understanding these basic concepts, you’ll be better equipped to effectively manipulate data, create custom reports and answer complex questions from large datasets. Whether you’re a developer, data analyst or database administrator, mastering SQL is an invaluable asset in the world of data management.

Did this article inspire you ?

Let's talk !

06 mars 20240

Data Warehouses vs Data Lakes: a comparative dive into the Tech World

In the ever-evolving world of technology, two terms have been making waves: Data Warehouses and Data Lakes. Both are powerful tools for data storage and analysis, but they serve different purposes and have unique strengths and weaknesses. Let’s dive into the world of data and explore these two tech giants.

Data Warehouses have been around for a while, providing a structured and organized way to store data. They are like a well-organized library, where each book (data) has its place. Recent advancements have made them even more efficient. The convergence of data lakes and data warehouses, for instance, has led to a more unified approach to data storage and analysis. This means less data movement and more efficiency – a win-win!

Moreover, the integration of machine learning models and AI capabilities has automated data analysis, providing more advanced insights. Imagine having a personal librarian who not only knows where every book is but can also predict what book you’ll need next!

However, every rose has its thorns. Data warehouses can be complex and costly to set up and maintain. They may also struggle with unstructured data or real-time data processing. But they shine when there is a need for structured, historical data for reporting and analysis, or when data from different sources needs to be integrated and consistent.

On the other hand, Data Lakes are like a vast ocean of raw, unstructured data. They are flexible and scalable, thanks to the development of the Data Mesh. This allows for a more distributed approach to data storage and analysis. Plus, the increasing use of machine learning and AI can automate data analysis, providing more advanced insights.

However, without proper management, data lakes can become « data swamps », with data becoming disorganized and difficult to find and use. Data ingestion and integration can also be time-consuming and complex. But they are the go-to choice when there is a need for storing large volumes of raw, unstructured data, or when real-time or near-real-time data processing is required.

In depth

DATA WAREHOUSES

Advancements

1. Convergence of data lakes and data warehouses: This allows for a more unified approach to data storage and analysis, reducing the need for data movement and increasing efficiency.

2. Easier streaming of real-time data: This allows for more timely insights and decision-making.

3. Integration of machine learning models and AI capabilities: This can automate data analysis and provide more advanced insights.

4. Faster identification and resolution of data issues: This improves data quality and reliability.

Setbacks

1. Data warehouses can be complex and costly to set up and maintain.

2. They may not be suitable for unstructured data or real-time data processing.

Best scenarios for implementation

1. When there is a need for structured, historical data for reporting and analysis.

2. When data from different sources needs to be integrated and consistent.

DATA LAKES

Advancements

1. Development of the Data Mesh: This allows for a more distributed approach to data storage and analysis, increasing scalability and flexibility.

2. Increasing use of machine learning and AI: This can automate data analysis and provide more advanced insights.

3. Tools promoting a structured dev-test-release approach to data engineering: This can improve data quality and reliability.

Setbacks

1. Data lakes can become « data swamps » if not properly managed, with data becoming disorganized and difficult to find and use.

2. Data ingestion and integration can be time-consuming and complex.

Best scenarios for implementation

1. When there is a need for storing large volumes of raw, unstructured data.

2. When real-time or near-real-time data processing is required.

In conclusion, both data warehouses and data lakes have their own advantages and setbacks. The choice between them depends on the specific needs and circumstances of the organization. It’s like choosing between a library and an ocean – both have their charm, but the choice depends on what you’re looking for. So, whether you’re a tech enthusiast or a business leader, understanding these two tools can help you make informed decisions in the tech world. After all, in the world of data, knowledge is power!

This article inspired you ?

Let's talk !