Join Alan Simon for an in-depth discussion in this video Building your roadmap, part of Transitioning from Data Warehousing to Big Data.
- It's easy to look at a set of very well-architected diagrams of your organization's future state that show big data-driven analytics providing order-of-magnitude improvements in data-driven insights and decisions over where you are today. But getting from where you currently are to that well-thought-out future state requires an equally well-thought-out road map. Let's take a look at the steps you need to go through to build a road map to move from today's data warehousing into the big data era in your organization.
First, you need to assess where you are. We've seen how you can quickly grade how well your current environment is or isn't doing and where your hot spots are that need to be addressed. You need to do plenty of homework and research. Take a look at vendors and core technologies, see what other organizations are doing with big data, and see what's working for them and what isn't working as well. Sell the ideas and concepts of big data within your organization, and be prepared to counter misconceptions and misunderstandings about big data, Hadoop, and modern analytics.
We've also seen several alternatives for how you can bring Hadoop into your environment. One is the supersized data staging area; or you may decide to take the plunge and build your next generation of data warehouses directly on top of Hadoop itself. All of these phases are part of your road map, and once you're done with those you'll then need to proceed with planning, implementing, and then managing the environment once it's built. Let's look at each of those in more detail. When it comes to planning the big data program, all of the things that you would need to do for any large-scale business or technology program apply here.
You need to build your budgets. Make sure you have the right skills and the right number of resources available. You need to build your project plans and schedules. You also have to have contingency plans so that if you run into problems you can fall back to a safe place without winding up with serious damage. You need to put your status-reporting mechanisms in place so your executives can keep track of where you are in your program. You need to make sure you have ways to track issues as well as resolve them in a timely manner.
You have to support all the necessary testing and quality assurance of all the different capabilities in your environment, as well as give users a say in whether or not the capabilities that have been built meet their needs. Then, of course, you need to make sure you have ways to actually deploy the systems out to the users and out to the field when they are built and ready to go. Beyond all of those typical aspects of program planning, you need to make sure you address concerns related to migrating from one environment to another, since almost certainly you will not be building a brand-new big data environment without having to cut existing capabilities over into it.
Here's an example of a migration-oriented road map and the associated plans. Suppose you have the typical data warehouse, with your source systems feeding data via ETL into a relational data staging area, which in turn sends data downstream into a separate set of tables where the users access that information for their reports and dashboards. Rather than address all three of these components at once, what we may decide to do is leave our source systems exactly as is, leave our relational user-accessible data, our target data warehouse, exactly as is, and instead initially focus on the infrastructure in between.
What we do then is take that relational data staging area, get rid of it, and replace it with a Hadoop data staging area, as well as transform our ETL into ELT processes. Hadoop has a capability known as Sqoop, which is used to bring data from relational databases into Hadoop and, in the other direction, out of Hadoop and into relational databases. So as part of this initial phase of work, we transform our ETL into Sqoop-based ELT, both into the staging area and then out of the staging area into the target data warehouse.
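As a rough sketch of that Sqoop-based ELT flow, the commands might look like the following. The hostnames, databases, tables, and HDFS paths here are illustrative assumptions, not details from the course.

```shell
# Illustrative only: connection strings, tables, and paths are hypothetical.
# Extract/load step: pull a source table from a relational database
# into the Hadoop staging area (stored as files in HDFS).
sqoop import \
  --connect jdbc:mysql://source-db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /staging/orders

# (Transformations then run inside Hadoop, for example via Hive jobs,
# writing their results to another staging directory.)

# Final step: push the transformed results out of Hadoop into the
# target relational data warehouse.
sqoop export \
  --connect jdbc:mysql://warehouse-db.example.com/dw \
  --username etl_user -P \
  --table orders_fact \
  --export-dir /staging/orders_transformed
```

Since these commands require a running Hadoop cluster and live databases, treat them as a CLI sketch of the pattern rather than something to paste and run as-is.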
Then, once we have this in place, we can start adding new sources using Hadoop's data-movement technologies: Sqoop, and then Flume for nonrelational streaming data, as well as other capabilities if necessary. We can start quickly adding data sources into the staging area, leave our user-accessible data, our data warehouse, in place, and then progressively start rolling out our analytic capabilities and other aspects of our system. The key point to remember is to take the overall body of work that you need to accomplish and decompose it into smaller, manageable groups of activities.
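For the nonrelational streaming sources, a minimal Flume agent configuration might land log events directly in the Hadoop staging area. This is a hedged config fragment; the agent, source, channel, and sink names and the paths are hypothetical.

```
# Illustrative Flume agent config: agent1, weblogs, mem1, and hdfs1
# are hypothetical names; paths are assumptions.
agent1.sources  = weblogs
agent1.channels = mem1
agent1.sinks    = hdfs1

# Tail an application log as the streaming source
agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/app/access.log
agent1.sources.weblogs.channels = mem1

# Buffer events in memory between source and sink
agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 10000

# Write events into the Hadoop staging area, partitioned by date
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.hdfs.path = /staging/weblogs/%Y-%m-%d
agent1.sinks.hdfs1.channel = mem1
```

The point of the fragment is the shape of the pattern: a streaming source flows through a channel into an HDFS sink, so new feeds reach the staging area without touching the downstream data warehouse.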
Be successful with each one of those before you move on to the next. Once you have your plans in place, it's time to implement the system and then manage it. Keep in mind that moving from data warehousing to big data is not just a technology platform migration; without a doubt, you will be incorporating new functionality requirements for analytics, and those will be occurring both during and after your implementation efforts. Here's what might change from your current state to your future state as part of your migration.
New data sets will be incorporated. New reports will be created. New families of predictive analytics and discovery analytics will be introduced, and over time all of your analytics will move toward workflow-driven prescriptive analytics, where the systems advise you what your options are and what your recommended actions should be. All along the way you should think agile. Not necessarily a formal agile methodology, but conceptually. Think small bodies of work.
Think rapid success. Think all of the basic tenets of an agile methodology. That should serve you very well. You need to do significant amounts of architecture work, but once you've done a good job at that, adding new functionality very rapidly as requirements surface should work out very well for you.