Join Alan Simon for an in-depth discussion in this video, Exploring the Hadoop ecosystem, part of Transitioning from Data Warehousing to Big Data.
- When it comes to applying big data technology to the world of business intelligence and data warehousing, Hadoop is definitely a game-changer for managing enterprise data far beyond what we've ever been able to do with traditional data warehousing. Specifically, Hadoop helps us address the three Vs that we looked at earlier. First, it helps us break the volume barrier of relational databases and handle capacities of data far beyond anything we've ever been able to handle with traditional data warehouses.
Second, Hadoop helps us support a variety of data formats, all within the same system. We have the structured data: the numbers, character strings, and dates that we've typically had in data warehousing for the reporting that we've done. But we can also now, within that same environment, take advantage of semi-structured data: tweets, blogs, and emails, as well as unstructured data: audio, video, and images. And we can bring all of that data together for the types of analytics that we need to run in this new era.
And then finally, Hadoop is architected for high-velocity data intake and usage. We bring Hadoop-driven data into an environment and make it available to users far faster than we ever did with traditional data warehousing and business intelligence. We bring all of these together: the volume, the variety, and the velocity, and it opens up an entirely new realm of possibilities for the types of analytics we need to do, compared with what we ever did in the business intelligence era. Hadoop is best thought of as an entire ecosystem.
We find a data storage environment within Hadoop, and we also have a number of different languages, tools, and APIs. In addition, the vendors that bring Hadoop, which is an open source environment, to the marketplace add their own enhancements and extensions. At the core of Hadoop we find HDFS, or the Hadoop Distributed File System. This is the portion of Hadoop that's used to distribute and manage data across numerous servers, to handle very large capacities of data far beyond what relational databases have been able to handle.
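To make the idea of distributing data across numerous servers a little more concrete, here is a minimal Python sketch, not actual Hadoop code. It assumes the commonly cited HDFS defaults of 128 MB blocks and three replicas per block, and it uses a simple round-robin placement purely for illustration; real HDFS placement is rack-aware and considerably more involved. The function name `place_blocks` and the server names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumed default HDFS block size (Hadoop 2.x and later)
REPLICATION = 3      # assumed default HDFS replication factor


def place_blocks(file_size_mb: int, servers: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Map each block index of a file to the servers holding a replica.

    Round-robin placement is a simplification chosen for this sketch;
    it is not how HDFS actually selects servers.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    # Ceiling division: a partial final block still counts as a block.
    num_blocks = -(-file_size_mb // block_size_mb)
    placement: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # Start each block on the next server so replicas spread evenly.
        placement[block] = [servers[(block + r) % len(servers)]
                            for r in range(replication)]
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical servers
    for block, holders in place_blocks(500, nodes).items():
        print(block, holders)
```

In this sketch a 500 MB file is split into four blocks, and each block lives on three different servers, so losing any single server does not lose data. That redundancy, combined with the ability to add more servers, is the general idea behind handling capacities beyond a single machine.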
Interestingly, some elements of HDFS, in its ability to handle data across all those different servers, are conceptually similar to the distributed database management system approach that we looked at earlier in this course, which wound up not being a viable solution for managing our enterprise data and in turn led to the generation of data warehousing that we've been working with for many years. The Hadoop environment has a number of uniquely and cleverly named tools, languages, and APIs. Some of these you will become familiar with; others may be more tangential to what you do.
Not all of them relate to the usage of Hadoop as a next-generation data warehouse for next-generation business intelligence and analytics, but some of them do. And we'll look later in the course at which of these you are likely to become familiar with. When it comes to vendors enhancing and extending Hadoop, I mentioned that many Hadoop vendors add their own enhancements and extensions, and we look here at three different ones: Cloudera, Pivotal, and IBM.
And they each have their own data warehouse and SQL database type of extensions: Impala, HAWQ, and Big SQL, respectively. We'll look later at some of the details of each of these, but the thing you should remember is that vendors take the core Hadoop code base and adjust it as needed to try to make it as competitive and as functional as possible out there in the marketplace. What does all of this mean to you? If you're coming from the world of business intelligence and data warehousing into this new realm of big data and analytics, you do need to have a strong understanding of Hadoop.
But you don't necessarily need to be an expert in every technical aspect of Hadoop to be able to architect and design advanced analytical solutions. Many of these tools will be very important to you; others will be less so. But you will need to have a solid understanding of Hadoop, at least at the architectural level, and that's where we'll take a deeper look.
- Exploring big data, Hadoop, and analytics
- Examining the shortcomings of traditional data warehousing
- Comparing big data architectures for next-generation data warehousing
- Understanding alternatives
- Building a roadmap
- Managing big data-driven projects
- Monitoring and measuring success