No matter how smooth a plan looks in theory, practice always brings adjustments, because every real case has its own characteristics that a general approach cannot account for. Let's look at how some of the world's leading brands have adapted a well-known way of storing information, data warehousing, to their own needs.
The Basis for Making Decisions
The need to make business decisions based on data analysis has long been beyond doubt. But before data can be analyzed, it has to be collected, sorted, and prepared. This is what data warehousing specialists do. To achieve the best results, it makes sense to study how others have assembled high-quality custom solutions from the same construction set.
Data warehousing handles huge amounts of data
A data warehouse is a digital storage system that integrates and reconciles large amounts of data from different sources. It helps companies turn data into valuable information and make informed decisions based on it. A data warehouse combines current and historical data and acts as a single source of reliable information for the business.
Data enters the warehouse from operational systems, such as an enterprise resource planning (ERP) system or a customer relationship management (CRM) system, through an ETL (extract, transform, load) process. Sources also include databases, partners' operational systems, IoT devices, weather apps, and social media. The infrastructure can be on-premises or cloud-based, and cloud deployments have come to predominate in recent years.
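The ETL flow described above can be illustrated with a minimal Python sketch. It assumes two hypothetical sources (an ERP extract and a CRM extract) whose records use different field names, number formats, and date formats, and it uses an in-memory SQLite database as a stand-in for the warehouse. The table name `fact_orders` and all sample values are invented for illustration; a production pipeline would read from real systems and load into a dedicated warehouse platform.

```python
import sqlite3

# Hypothetical raw extracts: similar records from two source systems that
# use different field names, number formats, and date formats.
ERP_ROWS = [
    {"cust_id": "C001", "order_total": "1,250.50", "order_date": "2023-03-01"},
    {"cust_id": "C002", "order_total": "980.00", "order_date": "2023-03-02"},
]
CRM_ROWS = [
    {"customer": "c003", "amount_usd": 310.0, "date": "03/05/2023"},
]


def extract():
    """Extract: yield raw records from each source, tagged with their origin."""
    for row in ERP_ROWS:
        yield "erp", row
    for row in CRM_ROWS:
        yield "crm", row


def transform(source, row):
    """Transform: map a source record onto one common warehouse schema."""
    if source == "erp":
        amount = float(row["order_total"].replace(",", ""))  # "1,250.50" -> 1250.5
        return (row["cust_id"], amount, row["order_date"], source)
    # CRM dates are MM/DD/YYYY; rewrite them as ISO 8601 (YYYY-MM-DD).
    month, day, year = row["date"].split("/")
    return (row["customer"].upper(), float(row["amount_usd"]), f"{year}-{month}-{day}", source)


def load(conn, records):
    """Load: write the conformed records into the warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(customer_id TEXT, amount REAL, order_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order: extract -> transform -> load."""
    load(conn, [transform(source, row) for source, row in extract()])


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_etl(warehouse)
    for row in warehouse.execute("SELECT * FROM fact_orders ORDER BY order_date"):
        print(row)
```

Keeping the three steps as separate functions mirrors the ETL split: sources can be added to `extract` and mapping rules to `transform` without touching the loading logic.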
A data warehouse is used not only to store information but also to process structured and unstructured data such as video, photos, and sensor readings. Some data warehouses offer built-in analytics and in-memory database technology (data is held in RAM rather than on disk), which provides access to reliable data in real time.
Once the data is organized, it is sent to data marts for further analysis with BI tools or data science methods.
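As a rough illustration of how a data mart relates to the warehouse, here is a Python sketch that derives a small, pre-aggregated mart (daily revenue for one region) from a warehouse fact table. SQLite stands in for both the warehouse and the mart, and the table names `fact_sales` and `mart_daily_revenue` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical warehouse fact rows: (sale_date, region, product, revenue).
SAMPLE_FACTS = [
    ("2023-03-01", "EMEA", "pumps", 1200.0),
    ("2023-03-01", "EMEA", "valves", 300.0),
    ("2023-03-02", "AMER", "pumps", 800.0),
    ("2023-03-02", "EMEA", "pumps", 500.0),
]


def build_warehouse():
    """Create an in-memory 'warehouse' with one fact table (stand-in only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (sale_date TEXT, region TEXT, product TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", SAMPLE_FACTS)
    conn.commit()
    return conn


def build_sales_mart(conn, region):
    """Materialize a subject-specific mart: daily revenue for one region.

    BI queries can then hit this small, pre-aggregated table instead of
    scanning the full fact table.
    """
    conn.execute("DROP TABLE IF EXISTS mart_daily_revenue")
    conn.execute("CREATE TABLE mart_daily_revenue (sale_date TEXT, revenue REAL)")
    conn.execute(
        "INSERT INTO mart_daily_revenue "
        "SELECT sale_date, SUM(revenue) FROM fact_sales "
        "WHERE region = ? GROUP BY sale_date",
        (region,),
    )
    conn.commit()


if __name__ == "__main__":
    db = build_warehouse()
    build_sales_mart(db, "EMEA")
    print(db.execute("SELECT * FROM mart_daily_revenue ORDER BY sale_date").fetchall())
```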
Why consider data warehousing cases
Studying known data warehousing implementations helps, first of all, to avoid repeating other people's mistakes. Starting from a working solution, you can improve your own results.
- With a data warehouse, executives can access data from different sources, so they do not have to make decisions blindly.
- A data warehouse enables quick retrieval and analysis: you can query large amounts of data quickly without involving extra staff.
- Before data is loaded into the warehouse, the system runs data cleansing tasks that convert it into a consistent format for subsequent analytical reports.
- The warehouse contains large amounts of historical data and allows you to study past trends and issues to predict events and improve the business structure.
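The cleansing step mentioned in the list above (converting data into a consistent format before it is loaded) might look like the following Python sketch. The sample records, the country lookup table, and the list of accepted date formats are assumptions made for illustration; real cleansing rules come from each organization's own data standards.

```python
from datetime import datetime

# Hypothetical raw records from different sources with inconsistent formats.
RAW = [
    {"id": " 17 ", "country": "usa", "signup": "2023-01-15"},
    {"id": "17", "country": "US", "signup": "15.01.2023"},  # duplicate of id 17
    {"id": "42", "country": "Germany", "signup": "01/20/2023"},
]

# Assumed lookup table mapping source spellings to ISO 3166 alpha-2 codes.
COUNTRY_CODES = {"usa": "US", "us": "US", "germany": "DE", "de": "DE"}

# Date formats accepted from the sources, tried in this order.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text):
    """Try each known source format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def cleanse(records):
    """Convert records to one consistent format and drop duplicate ids."""
    seen = set()
    clean = []
    for rec in records:
        key = int(rec["id"].strip())
        if key in seen:
            continue  # keep only the first occurrence of each id
        seen.add(key)
        clean.append({
            "id": key,
            "country": COUNTRY_CODES[rec["country"].strip().lower()],
            "signup": parse_date(rec["signup"]),
        })
    return clean
```

The order of `DATE_FORMATS` matters: a value such as `03/04/2023` is read as March 4 because `%m/%d/%Y` is the only slash format listed, so sources with ambiguous day/month conventions need an explicit per-source rule.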
At the same time, blindly copying other people's decisions will not work either. Your case is unique and probably requires a custom approach; at best, well-known storage solutions can serve as a starting point. You can do it yourself, or you can contact DATAFOREST specialists for professional services. We have solid experience and positive customer stories in building and operating data warehouses.
Case 1: How Amazon Does Data Warehousing
Amazon is one of the world's largest and most successful companies, with a diversified business spanning e-commerce, cloud computing, digital content, and more. As a company that generates vast amounts of data (and also sells data warehousing services), Amazon needs to manage and analyze its data effectively.
Two main businesses
Amazon's data warehousing needs are driven by the company's vast and diverse data sources, which require sophisticated tools and technologies to manage and analyze effectively.
1. One of the main drivers of Amazon's business is its e-commerce platform, which allows customers to purchase a wide range of products through its website and mobile apps. Here, Amazon's data warehousing needs focus on collecting, storing, and analyzing data on customer behavior, purchase history, and other metrics. This data is used to optimize Amazon's product recommendation engine, personalize the shopping experience for individual customers, and identify growth strategies.
2. Amazon's other primary business unit is Amazon Web Services (AWS), which offers managed cloud computing services to businesses and individuals. AWS generates significant amounts of data from its cloud infrastructure, including customer usage and performance data. To manage and analyze this data effectively, Amazon relies on data warehousing technologies such as Amazon Redshift, which enables AWS to provide fast analytics and insights to its customers.
3. Beyond these core businesses, Amazon also has significant data warehousing needs in digital content (e.g., video, music, and books) and in advertising, which relies on data analysis to identify key demographics and target ads more effectively to specific audiences.
By investing in data warehousing and analytics capabilities as part of its digital transformation, Amazon can maintain its competitive edge and continue to grow and innovate in the years to come.
Obstacles on the way to the goal
Amazon faced several specific challenges in its data warehousing implementation.
• The company needed to integrate data from various sources into a centralized data warehouse. This required developing custom data pipelines to collect the data and transform it into a standard format.
• Amazon's data warehousing needs are vast and constantly growing, so a scalable solution was required. The company built a distributed data warehouse architecture using technologies such as Amazon Redshift, which allows petabyte-scale data storage and analysis.
• As a company that generates big data, Amazon needed its data warehousing solution to provide real-time analytics and insights. Achieving such performance requires optimizing data storage, indexing, and querying processes.
• Amazon stores sensitive customer data in its warehouse, so data security is a priority. To protect against security threats, the company implements various security measures, including encryption, access controls, and threat detection.
• Building and maintaining a data warehousing solution can be expensive. To minimize costs, Amazon uses cloud-based data warehousing (Redshift), which offers a cost-effective, pay-as-you-go pricing model.
Amazon's data warehousing implementation required careful planning, significant investment in technology and infrastructure, and ongoing optimization and maintenance to ensure high performance and reliability.
Change for the better
Having assessed all its needs, chosen the right tools, and implemented a successful data warehouse, Amazon achieved the following main business outcomes:
• Improved data-driven decision-making
• Better customer enablement
• Improved performance
• Competitive advantage
Amazon's data warehousing implementation has driven the company's growth and success. This is hardly surprising: a provider of data storage services should understand data storage. In this case, the cobbler's children are not left without shoes.
Case 2: Data Warehousing Adventure with UPS
United Parcel Service (UPS) is an American parcel delivery and supply chain management company founded in 1907, with annual revenue of 71 billion dollars and logistics services in more than 175 countries. The company also provides distribution, customs brokerage, mail, and consulting services. UPS processes approximately 300 million tracking requests daily, a scale made possible in part by intelligent data warehousing.
One mile for $50 million
In 2013, UPS stated that it hosted the world's largest DB2 relational database, running in two United States data centers to support global operations. Over time, global operations grew, and so did the amount of semi-structured data. The goal was to use these different forms of stored data to make better business decisions.
One of the fundamental problems was route optimization. According to an interview with the UPS CTO, cutting 1 mile a day per driver could save 1.5 million gallons of fuel per year, or $50 million in total.
However, the data was scattered: some of it was in DB2, some in other repositories, some in local storage, and some in spreadsheets. UPS needed to solve the data infrastructure problem first and only then optimize routes.
Four "V"s
A big data ecosystem efficiently handles the four "V"s: volume, veracity, velocity, and variety. UPS experimented with Hadoop clusters and integrated its storage and computing systems into this ecosystem. It scaled up its data warehousing and computing power to handle petabytes of data, one of UPS's most significant technological achievements.
The following Hadoop ecosystem components were likely used:
• HDFS for storage
• MapReduce for distributed processing
• Kafka for streaming
• Sqoop (SQL-to-Hadoop) for ingestion
• Hive & Pig for structured queries on unstructured data
• Monitoring for DataNodes and NameNodes
But this is just an educated guess: for confidentiality reasons, UPS has not disclosed the tools and technologies used in its big data ecosystem.
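Since UPS has not confirmed its stack, the following is only a generic illustration of the MapReduce programming model named in the list above, written in plain Python rather than on Hadoop. It counts delivered packages per route from a handful of hypothetical tracking events; the event format and route IDs are invented.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical tracking events: (route_id, package_id, status).
EVENTS = [
    ("R1", "P1", "delivered"),
    ("R2", "P2", "delivered"),
    ("R1", "P3", "delivered"),
    ("R1", "P4", "exception"),
    ("R2", "P5", "delivered"),
]


def map_phase(event):
    """Map: emit (key, value) pairs; here, one count per delivered package."""
    route_id, _package_id, status = event
    if status == "delivered":
        yield route_id, 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)


def map_reduce(events):
    """Run map, shuffle, and reduce locally, one step after another."""
    # On Hadoop, map and reduce tasks run in parallel across many nodes and
    # the framework moves data between them during the shuffle.
    pairs = chain.from_iterable(map_phase(event) for event in events)
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(map_reduce(EVENTS))
```

The value of the model is that `map_phase` and `reduce_phase` only ever see one record or one key at a time, which is what lets a framework distribute them across a cluster.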
Constellation of Orion
The result was ORION (On-Road Integrated Optimization and Navigation), a four-year route optimization project with costs of about one billion dollars a year. ORION drew on these data stores and big data processing, analyzing more than 300 million data points to optimize thousands of routes per minute based on real-time information. In addition to the economic benefits, ORION cut approximately 100 million delivery miles and reduced carbon emissions by about 100,000 tons.
Case 3: 42 ERP Instances Into One Data Warehouse
In general, details of specific data warehousing implementations tend to be kept confidential, since contracts may contain consent and legitimate-interest clauses. Some examples are publicly available, but the vast majority sit in paid libraries: the subject is relevant enough that people earn money from it. That is why some cases are published "openly" but without the brand name.
Brand X needs help
A world leader in industrial pumps, valves, actuators, controls, and related products needed help extracting data from disparate ERP systems. The company wanted to pull data from 42 ERP instances, standardize it into flat files, and collect all the information in one data warehouse. To complicate matters further, the ERP systems came from different vendors (Oracle, SAP, BAAN, Microsoft, PRMS).
The client also wanted a core set of metrics and a central dashboard combining information from its locations worldwide. The project was prompted by a surge in requests to database management for corporate data. The company knew it needed a central repository for data from all its locations worldwide. Requests often came from the top down, and whenever an administrator needed access to the right data, extracting it ran into logistical problems. And so the project got started.
The foundation stone
The hired third-party development center drew up a roadmap under which ERP data was taken from 8 major databases and placed in a corporate data warehouse. This entailed integrating 5 Oracle ERP instances with 3 SAP ERP instances. Rapid Marts were also integrated with the Oracle ERP systems to speed up the project.
One of the main challenges was the lack of standardized fields and operational data definitions across the ERP systems. To solve this problem, the contractor developed a data services tool that accesses the back end of each database and presents the information in a usable form. From then on, the customer knew which fields to use and how to configure them whenever a new ERP instance came up. These data definition patterns became the project's foundation stone and completely changed how the customer's data is handled. They were also the starting point for a shared consensus on the data.
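The idea of reusable data definition patterns can be sketched as a mapping from each vendor's field names to one canonical schema. The Python below is a hypothetical illustration, not the contractor's actual tool: the canonical field names are assumptions, and the vendor column names are only examples of the kind of mappings a real data dictionary would supply.

```python
# Canonical field definitions shared by every ERP instance (assumed names).
CANONICAL_FIELDS = ("order_id", "customer_id", "net_amount")

# Hypothetical mapping patterns, one per ERP vendor, translating
# vendor-specific column names into the canonical definitions.
# Column names are illustrative; real ones come from each data dictionary.
FIELD_PATTERNS = {
    "sap": {"VBELN": "order_id", "KUNNR": "customer_id", "NETWR": "net_amount"},
    "oracle": {
        "ORDER_NUMBER": "order_id",
        "SOLD_TO_ORG_ID": "customer_id",
        "ORDERED_AMOUNT": "net_amount",
    },
}


def standardize(vendor, row):
    """Rename a source row's fields to the canonical schema.

    Unmapped source columns are ignored. Raises KeyError if the vendor has
    no pattern yet or a canonical field is missing, so a newly onboarded
    ERP instance fails loudly instead of loading partial data.
    """
    pattern = FIELD_PATTERNS[vendor]
    out = {pattern[col]: value for col, value in row.items() if col in pattern}
    missing = [field for field in CANONICAL_FIELDS if field not in out]
    if missing:
        raise KeyError(f"{vendor} row is missing canonical fields: {missing}")
    return out
```

With this design, another instance from a known vendor needs no new code, and a new vendor needs only a new entry in `FIELD_PATTERNS`.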
All roads lead to data warehousing
The company now has one common, consistent way to obtain critical indicators. The project's long-term effect is easy access to information. What was once a long, inconsistent process of gathering relevant information at an aggregate level is now streamlined: data is stored in one central repository controlled by one team.
Data Warehousing: Different Cases — General Conclusions
Each organization builds its data warehouse with its own methods and tools because business needs differ. In this sense, data warehousing resembles a mosaic or a children's construction set: you can build different figures from the same parts by arranging the elements in your own way. And if a part is lost or broken, you need to make a new one, or find a substitute and "file it down to fit."
Common ground between different cases of data warehousing
There are several common themes and practices among successful data warehousing implementations, including:
• Successful data warehousing implementations start with a clear understanding of the business objectives and how the warehouse (or data lake) can support them.
• The data modeling process is critical to the success of data warehousing.
• The data warehouse is only as good as the data it contains.
• Successful data warehousing requires efficient data integration processes that can handle large volumes of data and ensure consistency and accuracy.
• Data warehousing needs ongoing performance tuning to optimize query performance.
• A critical factor in data warehousing is a user-friendly interface that makes it easy for end users to access the data and perform complex queries and analyses.
• Continuous improvement is essential to ensure the data warehouse remains relevant and valuable to the business.
Competent data warehousing implementations combine technical expertise and a deep understanding of business details and user needs.
Your case is not mentioned anywhere
When setting up data warehousing, one would like to find a description of an identical case and simply follow the plan. But the probability of this is negligible: you will have to adapt to the specifics of the customer's business and consider your own knowledge and capabilities, as well as the project's technical and financial conditions. Then you take pieces of the puzzle, or parts of the construction set, and build your own data warehouse. The downside: you have to do the work. The upside: the data storage decisions and the implementation will be entirely your own.
Data Warehousing Is Like a Trampoline
Changes in data warehousing, like any technological and methodological changes, aim to improve how data is collected, stored, and analyzed. They take the customer's business to a new level, and the contractor to a new level of its own. It is like a gymnast and a trampoline: separately, they are just an athlete and a piece of equipment, but together they produce a third quality, the ability to rise sharply.
If you are facing the task of organizing a new data warehousing system, or are simply interested in what you have read, get in touch with DATAFOREST to exchange views.
What is the benefit of data warehousing for business?
A data warehouse is a centralized repository that contains integrated data from various sources and systems. Data warehousing provides several benefits for businesses: improved decision-making, greater operational efficiency, better customer insights, and a competitive advantage.
What is the definition of a successful data warehousing implementation?
The specific definition of a successful data warehouse implementation will vary depending on the goals of the organization and the particular use case for data warehousing. Some common characteristics are: meeting business requirements, high data quality, scalability, user adoption, and positive ROI.
What are the general considerations for implementing data warehousing?
Implementing data warehousing involves some general considerations: business objectives, data sources, quality and modeling, technology selection, performance tuning, user adoption, ongoing maintenance, and support.
What are the most famous examples of the implementation of data warehousing?
There are many famous examples of the implementation of data warehousing across industries:
• Walmart has one of the largest data warehousing implementations in the world
• Amazon's data warehousing solution is known as Amazon Redshift
• Netflix uses a data warehouse to store and analyze data from its streaming platform
• Coca-Cola has a warehouse to consolidate data from business units and analyze it
• Bank of America uses data warehousing to analyze customer data and improve customer experience
What are the challenges while implementing data warehousing, and how to overcome them?
Based on the experiences of organizations that have implemented data warehousing, some common challenges and solutions are:
• Ensuring the quality of the data being stored and analyzed. Establish data quality standards and implement data validation and cleansing rules for each data type.
• Integrating data from disparate sources. Establishing a clear data integration strategy that considers the different data sources, formats, and protocols involved is vital.
• As the amount of data stored in a data warehouse grows, performance issues may arise. A company should regularly monitor query performance and optimize the data warehouse to keep it efficient and effective.
• Ensuring that sensitive data stored in the data warehouse is secure. This involves implementing appropriate measures such as access controls, encryption, and regular security audits to protect privacy.
• Significant changes to existing processes and workflows. This is addressed by establishing a transparent change management process that involves decision-makers and users at all levels.
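For the data quality challenge above, type-aware validation could be sketched as follows. The schema, field names, and range rules are hypothetical; the sketch shows one common pattern: check each field's type and value range, and route failing rows to a reject list with reasons rather than silently dropping them.

```python
from datetime import date

# Hypothetical validation rules: field -> (expected type, value check).
SCHEMA = {
    "order_id": (int, lambda v: v > 0),
    "amount": (float, lambda v: v >= 0.0),
    "order_date": (date, lambda v: v <= date.today()),
}


def validate(row):
    """Return a list of problems found in one row (empty list means valid)."""
    problems = []
    for field, (expected_type, rule) in SCHEMA.items():
        if field not in row or row[field] is None:
            problems.append(f"{field}: missing")
        elif not isinstance(row[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
        elif not rule(row[field]):
            problems.append(f"{field}: failed range check")
    return problems


def split_valid(rows):
    """Separate rows that pass validation from rejects (kept with reasons)."""
    valid, rejected = [], []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

The type check is deliberately strict: an `amount` of `10` (an `int`) would be rejected, so a real pipeline would usually cast types during cleansing before this step.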
What is an example of how successful data warehousing has affected a business?
An example of how successful data warehousing has affected Amazon is its recommendation engine, which suggests products to customers based on their browsing and purchase history. By using artificial intelligence and machine learning algorithms to analyze customer data, Amazon has improved the accuracy of its recommendations, resulting in increased sales and customer satisfaction.
What role does data integration play in data warehousing?
Data integration is critical to data warehousing, enabling businesses to consolidate and standardize data from multiple sources, ensure data quality, and establish effective data governance practices.
How are data quality and governance tracked in data warehousing?
Data quality and governance are tracked in data warehousing through a combination of data profiling, monitoring, and management processes, together with data governance frameworks that define policies and procedures for managing data quality. This way, businesses can ensure that their data is accurate, consistent, and compliant with regulations, enabling effective decision-making and supporting the success of their business applications.
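Data profiling and monitoring, mentioned above, can start with very simple per-column metrics. This Python sketch computes null rates and distinct counts for a hypothetical sample table and turns them into alerts using assumed thresholds (a maximum null rate of 10% and a uniqueness rule for `customer_id`); in a real governance framework these thresholds would be defined as policy.

```python
# Hypothetical sample of a warehouse table, as a list of row dicts.
ROWS = [
    {"customer_id": "C1", "country": "US", "email": "a@example.com"},
    {"customer_id": "C2", "country": None, "email": "b@example.com"},
    {"customer_id": "C3", "country": "DE", "email": None},
    {"customer_id": "C3", "country": "DE", "email": None},
]


def profile(rows):
    """Compute per-column metrics: null rate, non-null count, distinct values."""
    columns = {col for row in rows for col in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "non_null": len(non_null),
            "distinct": len(set(non_null)),
        }
    return report


def quality_alerts(report, max_null_rate=0.1, unique_columns=()):
    """Flag columns that break simple, assumed governance thresholds."""
    alerts = []
    for col, stats in report.items():
        if stats["null_rate"] > max_null_rate:
            alerts.append(f"{col}: null rate {stats['null_rate']:.0%}")
        if col in unique_columns and stats["distinct"] < stats["non_null"]:
            dupes = stats["non_null"] - stats["distinct"]
            alerts.append(f"{col}: {dupes} duplicate(s)")
    return alerts
```

Run on a schedule, a report like this gives the monitoring side of governance a concrete signal: a rising null rate or new duplicates point to a problem upstream before it reaches dashboards.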
How can the benefits of data warehousing be measured?
The benefits of data warehousing for a business can be measured through improvements in data quality, efficiency, decision-making, revenue and profitability, and customer satisfaction. By tracking these metrics, businesses can assess the effectiveness of their data warehousing initiatives and make informed decisions about future investments in data management and analytics, including cloud services.
How to avoid blunders when warehousing data?
By following best practices, businesses can avoid common mistakes, minimize the risk of blunders when warehousing data, and ensure their data warehousing initiatives succeed and deliver data that is practical to analyze with business intelligence tools.