What is a Data Warehouse?


Nowadays, companies depend on information for decision making. Once data is collected, stored, and integrated efficiently, it becomes possible to analyze the information that matters, which is essential to optimize profits, generate income, or contain costs in any organization. In this post we explain the data storage tool known as the Data Warehouse: what it is, what it is for, and its main characteristics.

For context, companies use data from multiple sources. These can be internal, such as personnel data, sales or purchasing status, customer monitoring, and new opportunities, or external, such as information on the competition, the market, and potential clients. The wider the range of information used for decision making, the greater the amount of data that has to be stored.

We can process all this information with methods such as ETL (Extract, Transform, Load) and then store the result in a Data Warehouse, an electronic warehouse where companies keep large amounts of valuable information. There, the data is stored securely and is easy to retrieve and analyze.

The data stored in a Data Warehouse is both historical and current, which gives an even broader overview. By definition, a Data Warehouse only stores data that has been modeled or structured, unlike a Data Lake, where we can find data that will ultimately not be useful to us.

Among the advantages of a Data Warehouse, we can highlight its ease of use, the ability to transform information into knowledge, its contribution to decision making, and the increase in productivity.

 

1. What is a data warehouse used for?

To look at what a Data Warehouse is used for, we need to keep talking about data. As discussed in the previous section, information is vital for decision making. Among its most common uses is the analysis of different types of data:

Market trends for investments.

Financial status of clients for insurance, whether home, car, motorcycle, or life insurance, or granting loans.

Analysis of web users to create marketing audiences.

Purchasing trends to determine pricing or discount policies.

In addition, the information stored in the Data Warehouse allows data scientists to build Machine Learning or Artificial Intelligence models, which further enhance results such as generating audiences for marketing or predicting fluctuations in the financial market.
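As an illustration of the purchasing-trend analysis listed above, here is a minimal sketch of the kind of aggregate query an analyst might run. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse; the `sales` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sales rows: (sale_date, product, amount).
SAMPLE_SALES = [
    ("2024-01-05", "helmet", 40.0),
    ("2024-01-20", "helmet", 60.0),
    ("2024-02-03", "helmet", 30.0),
    ("2024-02-11", "gloves", 15.0),
]


def build_warehouse(rows):
    """Create an in-memory table that stands in for a warehouse fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def monthly_sales(conn):
    """Return total sales per month and product, ordered by month then product."""
    query = """
        SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total
        FROM sales
        GROUP BY month, product
        ORDER BY month, product
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for month, product, total in monthly_sales(build_warehouse(SAMPLE_SALES)):
        print(month, product, total)
```

A falling monthly total for a product could be one input for a discount decision. How the result is read is a business decision that the query itself does not make.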

 

2. Data warehouse characteristics

The main features are based on the following points:

It can take data from multiple sources, regardless of origin, as long as the data complies with the second point.

The data has already undergone a first round of processing. This means that it has been cleaned, and that what is stored in the Data Warehouse is (for the most part) useful, classified, and consolidated in an organized system.

Finally, its ability to support large amounts of data makes it ideal for storing historical data, which grows day by day.

 

3. Different types of Data Warehouse

There are currently three defined types of Data Warehouse:

Offline

The data is updated at set intervals, such as daily, weekly, or monthly.

In real time

It is constantly updated to provide the latest information available. Every time new data is generated, it is loaded automatically.

An example could be the points of sale of a local chain: the warehouse is updated with each sale.

Integrated

These work together with other information systems, giving those systems access to the data so they can produce reports.
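The difference between the offline and real-time types can be sketched in a few lines. This is only an illustration under stated assumptions: sqlite3 stands in for the warehouse, and the `OfflineLoader` and `RealTimeLoader` classes and the `sales` table are hypothetical names, not part of any product.

```python
import sqlite3


def create_warehouse():
    """Create an in-memory database standing in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
    return conn


def row_count(conn):
    """Number of sales currently visible in the warehouse."""
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


class OfflineLoader:
    """Buffers sales and writes them only when flush() runs (e.g. once a day)."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []

    def record_sale(self, store, amount):
        self.pending.append((store, amount))  # not yet in the warehouse

    def flush(self):
        self.conn.executemany("INSERT INTO sales VALUES (?, ?)", self.pending)
        self.conn.commit()
        self.pending.clear()


class RealTimeLoader:
    """Writes every sale to the warehouse as soon as it happens."""

    def __init__(self, conn):
        self.conn = conn

    def record_sale(self, store, amount):
        self.conn.execute("INSERT INTO sales VALUES (?, ?)", (store, amount))
        self.conn.commit()
```

With the offline loader, a sale only becomes visible to analysts after the next scheduled `flush()`. With the real-time loader it is visible immediately, at the cost of one write per sale.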

 

4. Who uses a Data Warehouse?

Data Warehouses are mainly used by Data Analysts, who obtain this information and analyze it to make decisions or to search for insights. Data Scientists also use Data Warehouses to create Machine Learning and Artificial Intelligence models.

Business Intelligence systems also use Data Warehouses as data sources, since they are reliable and follow a schema, which makes the data easier to use and access and leads to more accurate analysis.

 

5. How does a Data Warehouse work?

Storing the useful data is the easy part of the process. The real complexity lies in the preceding work: the points that must be taken into account when planning and implementing data storage in a Data Warehouse.

It is essential to be clear about several aspects when implementing a Data Warehouse. Among them: define the scope, define the business needs that must be satisfied, and be clear about the data sources you will work with, their availability, the relevant ETL process for each source, and how often the warehouse will be fed.

All of this must be taken into account from the beginning, since several of these points have an impact from the first minute of development and can be complex to change later. This is because information from various sources can be interconnected, and modifying one source may mean modifying the entire structure, from ingestion to transformation.
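One way to keep these planning points explicit is to write them down as a small, checkable structure before development starts. The sketch below is only an assumption about how that could look: the `WarehousePlan` and `SourcePlan` classes, their fields, and the allowed refresh values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of refresh periods a source may declare.
ALLOWED_REFRESH = {"daily", "weekly", "monthly", "real-time"}


@dataclass
class SourcePlan:
    name: str      # e.g. "crm" or "web-analytics"
    etl_job: str   # name of the ETL process that handles this source
    refresh: str   # how often the warehouse is fed from this source


@dataclass
class WarehousePlan:
    scope: str
    business_needs: List[str]
    sources: List[SourcePlan] = field(default_factory=list)

    def problems(self):
        """Return a list of planning gaps; an empty list means none were found."""
        issues = []
        if not self.scope.strip():
            issues.append("scope is not defined")
        if not self.business_needs:
            issues.append("no business needs listed")
        if not self.sources:
            issues.append("no data sources listed")
        for source in self.sources:
            if source.refresh not in ALLOWED_REFRESH:
                issues.append(f"source {source.name} has unknown refresh {source.refresh}")
        return issues
```

Calling `problems()` on a plan surfaces missing decisions early, while they are still cheap to change.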

 

6. Structures of a Data Warehouse

A basic structure for a data warehouse starts with the data sources, which can be of any type (structured, semi-structured, or unstructured) and from which we obtain the "raw data" or "dirty data".

This data is stored in a Data Lake. At this point we can already use the data, but it will be difficult to draw good conclusions, since it is full of useless and disposable information.

This is where the aforementioned ETL process ("Extract, Transform, Load") is carried out. The information is cleaned and shaped, discarding what is considered useless and leaving only the data that analysts can actually use.

Once this process is complete, the output is stored in the Data Warehouse. The volume therefore grows over time, eventually building up a history of all the useful information.
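The flow described in this section (raw data, then ETL, then the warehouse) can be sketched as follows. This is a minimal illustration, not a production pipeline: a Python list stands in for the Data Lake, sqlite3 stands in for the Data Warehouse, and the record fields and cleaning rules are assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records, standing in for files in a Data Lake.
RAW_SALES = [
    {"id": "1", "customer": " Ana ", "amount": "120.50", "date": "2024-01-05"},
    {"id": "2", "customer": "Luis", "amount": "n/a", "date": "2024-01-05"},
    {"id": "3", "customer": "", "amount": "80", "date": "2024-01-06"},
    {"id": "4", "customer": "Marta", "amount": "45.00", "date": "2024-01-06"},
]


def extract():
    """Extract: read the raw records from the source."""
    return list(RAW_SALES)


def transform(records):
    """Transform: clean each record and discard the ones that cannot be used."""
    clean = []
    for rec in records:
        customer = rec["customer"].strip()
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # discard rows whose amount is not a number
        if not customer:
            continue  # discard rows without a customer
        clean.append((int(rec["id"]), customer, amount, rec["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order and return the number of rows loaded."""
    rows = transform(extract())
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(warehouse), "rows loaded")
```

Of the four raw records, only the two with a valid customer and amount reach the warehouse. Running the load repeatedly over new batches is what builds up the history mentioned above.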

 

7. Data Warehouse in the cloud: why migrate to the cloud?

There are several reasons to migrate a Data Warehouse to the cloud. Agility stands out among them, since computing capacity is no longer tied to a local physical machine with its own limitations.

This brings us to the second point, costs, which are easier to manage because solutions such as Google's BigQuery charge for consumption. We do not have to expand the storage capacity of a local machine: as we need more, our use of BigQuery simply increases, and lower use reduces costs.
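The idea of paying for consumption can be shown with simple arithmetic. The sketch below compares a generic pay-per-use model with a fixed-capacity machine. The usage figures and prices are invented for illustration and are not BigQuery's actual pricing.

```python
def pay_per_use_cost(usage_tb, price_per_tb):
    """Monthly cost when billing follows consumption: usage times unit price."""
    return [tb * price_per_tb for tb in usage_tb]


def fixed_capacity_cost(usage_tb, monthly_cost):
    """Monthly cost of a machine sized for peak demand: the same every month."""
    return [monthly_cost for _ in usage_tb]


if __name__ == "__main__":
    usage = [2, 10, 4]  # hypothetical terabytes used in three months
    print(pay_per_use_cost(usage, price_per_tb=5))      # cost follows usage
    print(fixed_capacity_cost(usage, monthly_cost=40))  # cost ignores usage
```

In the pay-per-use case the bill rises and falls with usage, while the fixed machine costs the same in quiet months. Which model is cheaper overall depends entirely on the real prices and usage pattern.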

Security is also a key factor in data management. By keeping all the data in a cloud such as Google's, we can trust that it will be safe, since GCP takes care of this aspect.

Another differentiating factor is availability, since a Data Warehouse hosted in the cloud is not affected by local electricity or internet outages. When an on-premise server suffers one of these problems, or a component fails, the issue can be fixed, but the data is inaccessible until it is. The advantage is that the cloud avoids this type of problem.

In addition, having data available in the cloud makes it possible to use online analytical processing (OLAP), removing the hardware and latency barriers.

To obtain all these benefits, it is not necessary to start from scratch: you can migrate an on-premise data warehouse to the cloud.

 

7.1 Main advantages of moving the data warehouse to the cloud

As mentioned in the reasons for migrating to the cloud, a cloud Data Warehouse offers several advantages. The main ones are data security, high availability of information, and low latency.

The computing power to process data quickly and obtain the desired information is also extremely important, as is the direct connection with dashboarding tools such as Looker Studio or Looker.

We also see the change in how costs are estimated as an advantage, since hardware failures and hardware upgrades no longer have to be accounted for.

 
