From the course: Introduction to SAP BI/BW

Extracting data overview - SAP ERP Tutorial


- [Instructor] Okay, so in this presentation we're going to briefly walk you through what makes up the SAP BW system, the objects that are used within SAP BW, and how everything works together to form a cohesive solution that makes SAP BW such a great data warehousing tool. It's important to understand the difference between the OLTP system and the OLAP system. BW is our OLAP system, which provides analytics, and ECC is an example of a transactional OLTP system, where things happen in real time and data's entered manually by users. The BW system, the OLAP side shown in the bottom half of this image, is built for data mining, cleansing data, and setting up all of your data in a reportable fashion. The OLTP side, the top half, is taking in all sorts of data: it's not cleansed, it's real-time, and it's transactional. The OLAP side, the bottom half, is historical. It's not real-time unless you configure it to be, and it's optimized for reporting. A good example of a transactional system: pretend you're a large retailer like Wal-Mart. When a customer comes in and buys a product, a transaction is written to the database of the transactional environment. If they're running SAP, that would be the ECC system. The OLAP system can't see this transactional information until it's loaded into the data warehouse on the BW side of things. So every night, Wal-Mart will pull from the OLTP system and push that data into the OLAP environment, which is the BW data warehouse. During that pull process, BW will cleanse the data, removing any invalid values that would've caused issues in reporting, and essentially stage and align the data exactly how the reports are architected for the end user. So what's the difference between OLAP and OLTP?
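The nightly extract-cleanse-load cycle described above can be sketched in a few lines. This is an illustrative Python stand-in, not actual SAP BW code; the record fields and the "invalid value" rule are assumptions made up for the example.

```python
# Hypothetical sketch of the nightly OLTP -> OLAP pull: extract raw
# transactions, cleanse out invalid values, then stage into the warehouse.

def extract(oltp_rows):
    """Pull raw transactional rows from the OLTP (ECC) side."""
    return list(oltp_rows)

def cleanse(rows):
    """Drop rows with invalid values that would break reporting."""
    return [r for r in rows if r.get("amount") is not None and r["amount"] >= 0]

def load(warehouse, rows):
    """Stage the cleansed data into the OLAP (BW) warehouse."""
    warehouse.extend(rows)
    return warehouse

oltp = [
    {"customer": "C1", "amount": 19.99},
    {"customer": "C2", "amount": None},   # bad value, removed during cleansing
    {"customer": "C3", "amount": 5.00},
]
warehouse = []
load(warehouse, cleanse(extract(oltp)))
print(len(warehouse))  # 2 rows survive cleansing
```

The point is the ordering: cleansing happens during the pull, so only report-ready data ever lands on the OLAP side.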
Again, think of the OLTP system as the data entry point, with very high detail, and the BW system as where the data is summarized, transformed, and made ready to be reported off of. It doesn't matter how many systems you have connected to BW. You can cleanse that data any way you see fit in order to get it all into one centralized repository, whether that be a DataStore Object or an InfoCube, and then create reports off of those InfoProviders. BW is read-only. You can't go into any tables in the BW database and change values around. BW was designed only to insert records during the loading process, so you can't go into a table and change a value. That has to happen on the ECC side of the house, in the OLTP transactional system, and then that data gets loaded into BW to reflect the change. And because BW is read-only, it's optimized for query processing. In this graphic, we can see the difference between the back end and the front end of BW. The top part of the screen is the front end. These are the tools that tap into BW to display reports for our end users. They can't change any data in the warehouse; they can only report on it. SAP BW is made up of two things: a server, which runs all the SAP software, and a database, which stores all the tables that the server reads from. Together, the server and the database make up the BW environment. On the bottom right of the screen, you can see three objects, which are external data sources that we want to bring into our BW environment. We can pull from a whole assortment of different source systems. This is just a brief example of what you can pull from.
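The read-only, insert-only behavior described above can be illustrated with a tiny sketch. This is not how BW is implemented internally; it's just a minimal Python model of a store that only ever appends load requests and never updates in place, with corrections arriving as new records from the OLTP side.

```python
# Illustrative insert-only store: loads append numbered requests, and there is
# deliberately no update method -- corrections must be loaded as new records.

class InsertOnlyStore:
    def __init__(self):
        self._requests = []          # each load lands as a new request

    def load_request(self, rows):
        """Loading appends a request; existing requests are never modified."""
        self._requests.append(list(rows))
        return len(self._requests)   # request number

    def read_all(self):
        """Queries read across every loaded request."""
        return [row for request in self._requests for row in request]

store = InsertOnlyStore()
store.load_request([{"cost_center": "CC10", "amount": 100}])
store.load_request([{"cost_center": "CC10", "amount": -100},   # reversal loaded
                    {"cost_center": "CC10", "amount": 120}])   # from the OLTP side
print(sum(r["amount"] for r in store.read_all()))  # 120
```

Because nothing is ever updated in place, reads never contend with writes mid-record, which is the intuition behind "read-only means optimized for query processing."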
So if you have a legacy system, chances are you can connect it to the BW application; if you want to load flat files into BW, it's very easy; and if you want to pull from an SAP source system, BW is ready and willing to accept that connection because it's already natively built into the functionality. Where can we extract data from? We can extract from the BW system that we're logged into, so if you want to pull from a table within your own BW database, you can do that. We can connect to a different SAP system; in this example, the EH6 environment, which is not BW, is another system that BW can talk to and pull data from. Then there's BusinessObjects Data Services, and other external systems. So if you're running an older version of Oracle, or some other third-party vendor has a particular niche piece of software, they can provide BW with a driver that allows you to connect to their application. Out of the box, SAP has no idea what those applications could be, so the vendor will more often than not provide you with the drivers you need to connect to their database. You have the file source system, where you can connect to pretty much any kind of file; DB Connect and UD Connect, which connect to other databases; and Web Services, a newer connector introduced in BW 7.3. Again, this is the Source System screen, where you can select where you want to pull your data from. To show you a live example of this, here we're logged into transaction RSA1, which is BW's primary transaction, and we're currently under the Modeling section. We toggle over to Source Systems, and we can see all the different source systems that have been created inside this BW environment. Let me just minimize all this.
You can see BW, SAP, and Data Services, and if you want to view the data sources within these environments, all you have to do is click into any of these source systems and it will display all the different data sources. The icons you're seeing here are data sources, and these two squares are application components. You can think of an application component as a top-level folder for organizing all of your data sources. If you want to create a new application component, you simply come up to the top bar, this gray one, right-click, and choose Create Application Component. Again, that's just for organizational purposes. So data sources live inside of source systems, and that's how we get data from outside of BW into BW: using these vehicles called data sources. The screen we're looking at now is the modeling workbench. This is where we do all of our design and configuration of BW objects to get our data ready to be loaded and optimized for reporting for our end users. You can see there's a bunch of different options to choose from in this left-hand pane. You can search for a particular object if you know what it's called, or view the data flow of a project you've developed. We'll get into more detail on what these different icons mean a little later in this video, but you can see that everything is organized in an object hierarchy: the lowest level is where the data is getting pulled from, which is your source, and the top level is our InfoCube, which is the target we actually run our reports from. Real quick, you can see the different icons: the InfoPackage, the data source, the data transfer process, the transformation, and the InfoCube. That hierarchy is similar across all of these development hierarchies, and the objects you use for your project depend on what your end result needs to be: how you want to stage that data and get it ready for reporting.
This object here is an InfoObject, and normally you won't see InfoObjects in the InfoProvider screen, so when you're clicking through these views it's actually filtering out the objects that aren't part of that particular subset. Any InfoObject that's master data can be configured to be an InfoProvider, so you could actually write reports off of this customer number, but you can't write a report off of an InfoObject under this section, which is just a generic InfoObject. I'll scroll through here. Much like data sources get categorized and stored under application components, you can store your InfoObjects under things called catalogs. It's the same idea as the application component: you're creating a topical folder to assign your objects into for organizational purposes. It's not required for development, but it's a best practice to have those. These are our InfoObjects here, this icon with the triangle and the little grid behind it. These are characteristics, and if I expand this, these are key figures. We'll go into more detail on what both of these objects mean, the characteristic and the key figure, a little later in this video, but here's a quick way of remembering the difference. Key figures are what we use to calculate our data. Characteristics are what we use to describe the data that's being calculated. So you wouldn't be able to calculate, for instance, a cost center, but the cost center has an assigned amount. Just as a review, the Data Warehousing Workbench, which is transaction RSA1, is where we do all of our modeling, which incorporates the design process and the maintenance of the existing projects you've created; administration, which is the optimizing of your data and your structures; scheduling, where you set the schedules for your loads and when you want them to kick off, whether they be hourly, nightly, or monthly loads; and of course monitoring the status of your loads.
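The characteristic versus key figure split can be made concrete with a small sketch. This is plain Python, not a real InfoObject definition, and the field names are illustrative: the cost center is the characteristic that describes the data, and the amount is the key figure that gets calculated.

```python
# Sketch of characteristics vs. key figures: you aggregate the key figure
# (amount), sliced by the characteristic (cost_center) -- you would never
# "sum" the cost center itself.
from collections import defaultdict

rows = [
    {"cost_center": "CC100", "amount": 250.0},   # cost_center: characteristic
    {"cost_center": "CC100", "amount": 150.0},   # amount: key figure
    {"cost_center": "CC200", "amount": 300.0},
]

totals = defaultdict(float)
for row in rows:
    totals[row["cost_center"]] += row["amount"]

print(dict(totals))  # {'CC100': 400.0, 'CC200': 300.0}
```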
So if we go back, I can show you an example of each of these. When we administrate within the BW system, say we got some bad data in this InfoCube, all you need to do is highlight the cube by clicking on it, right-click, and select Manage. This gives you the ability to remove a bad request if some data came through that wasn't accurate. Say some data that wasn't actually ready for production accidentally got pushed into production: you would just highlight the row and press Delete, and the request would get dropped. You can optimize performance of a particular InfoCube by creating indexes at the database level, or creating aggregates at the database level. There are all sorts of tips and tricks you can use to increase your performance. You can also view the data, so if you just want to double-check that your data's accurate, you can look at the content of your InfoCube, and this can all be done within this interface. If we want to look at the status of this DTP load, we can double-click on our DTP, and up at the top here you can see this little heart-rate-looking monitor. Click on that, and we can see that there's currently no data selected. Let's see if we can change our filter range. This DTP hasn't run yesterday or today, so let's make this a free date selection if we want to, or do this month and last month. We'll run that and see if this has been run recently, and it has. So this particular request, 2091, is tied to this DTP load. We can see that 24 records were loaded, and everything's agreeing. If there were any issues with this load, everything would be colored red for the particular process that failed, and if it's currently running, it's yellow, meaning be cautious because this might fail, but be optimistic 'cause we're hoping for green. Let's get our data in there. And then if you want to schedule, we can define when these data transfer processes run by going to a process chain.
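The "this month and last month" date selection used above is just a date-range filter. Here's a small Python sketch of that logic; it's pure date arithmetic, not actual DTP configuration.

```python
# Sketch of a DTP-style "this month and last month" filter: compute the range
# boundaries, then test whether a record's date falls inside them.
from datetime import date

def month_filter_range(today):
    """Return (low, high) covering last month and the current month to date."""
    if today.month == 1:
        low = date(today.year - 1, 12, 1)   # last month is December of prior year
    else:
        low = date(today.year, today.month - 1, 1)
    return low, today

def in_filter(record_date, today):
    low, high = month_filter_range(today)
    return low <= record_date <= high

today = date(2024, 3, 15)
print(in_filter(date(2024, 2, 3), today))    # True  (last month)
print(in_filter(date(2023, 12, 31), today))  # False (outside the range)
```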
Now, we'd simply create a new process chain that would run at whatever predefined interval you want it to run on. So if we just double-click into this process chain that someone's created, it displays as a network. It's a lot cleaner that way; I'm not really a fan of the tree output. We can see that there's an order defined for this particular load. It's deleting old data, then loading new data, and it has different targets that it's dropping data from before it loads. The DataStore activations we'll get into a little later, but you can see it's creating indexes and doing all of this automatically. So there's no human intervention here unless something actually fails. You can set up alerts, where if one of these processes fails it'll send you an email, or call your phone and send you a text. You do that by creating system messages and then associating those messages with one of these tasks. So if a task turns red and fails, it'll immediately alert you: "Hey, this failed. We need you to go back in there, take a look at what's going on, and investigate why it's failing." More often than not, when things fail on BW it's not because of any bad data; it's because there aren't enough processes available to run a particular job, and it might time out. Or if two things are trying to run at the same time, there might be a collision at the database level, which would cause a failure. There are a number of things that can go wrong when you're loading, but usually if your processes are timed properly and you have schedules where multiple things aren't running at the same time, you should have no issues. You could design all of your chains to kick off sequentially so nothing would ever run at the same time, but if you have a lot of data, chances are you're going to have to do some scheduling where multiple things run in parallel; as long as they're not hitting the same objects, you should be fine. So BW's ETL, or Extract, Transform, Load, is as follows.
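The process-chain idea above (steps running in a defined order, with a failure stopping the chain and firing an alert) can be sketched as a toy runner. This is an illustrative Python stand-in, not BW's process chain engine; the step names echo the chain in the video, and the alert callback is a stand-in for BW system messages.

```python
# Toy process-chain runner: execute steps in order; on the first failure,
# fire the alert (e.g. an email or text) and stop the chain.

def run_chain(steps, alert):
    """steps is a list of (name, callable); returns 'green' or 'red'."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            alert(f"{name} failed: {exc}")
            return "red"             # chain stops at the failed task
    return "green"                   # every task succeeded

def delete_old_data():
    pass                             # succeeds

def load_new_data():
    pass                             # succeeds

def create_indexes():
    raise RuntimeError("timeout")    # simulate too few work processes

alerts = []
status = run_chain([
    ("delete old data", delete_old_data),
    ("load new data", load_new_data),
    ("create indexes", create_indexes),
], alerts.append)
print(status, alerts)  # red ['create indexes failed: timeout']
```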
BW uses source systems, which contain data sources, to extract data from those particular source systems. We then transform that data using transformations. So from source to target, let's say you want your calendar day to carry a two-digit year instead of a four-digit year; you can do that by adding some custom coding to this particular column here called Rule. And lastly, the L, data loading: we use data transfer processes to load our data. If you look down at the bottom of the screen here, it shows you how the data transfer process works. It taps into the data source and applies any filters that are created at the DTP level, so you go under Extraction here. You can specify if you just want a particular year or a certain range of values, and then it's going to apply that filter. If there are any errors, they'll show up in the Error Handling Log, and then it actually executes the transformation. This guy is going to run when we run our DTP, and then it's going to push the data into our InfoCube. Let's take a look at both a transformation and a DTP in a live BW system. We open this InfoArea and we can see a transformation, so we'll double-click. Over here, it shows our data source and our target. We're loading from a data source, sales underscore data underscore one, right to our InfoCube, so there's no intermediary DSO. You should always try to use a DSO. It's a best practice because it allows you to keep data at a more detailed level, and then whatever you really want in your end user reports you'd put in InfoCubes, because they're optimized for performance. We'll get into more details on why in the actual training course. Now, let's look at a DTP.
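The four-digit-to-two-digit year rule mentioned above is a field-level transformation. In BW this would be an ABAP routine on the Rule column; here is a plain Python stand-in, assuming (for illustration only) a calendar day stored as a YYYYMMDD string.

```python
# Sketch of a transformation rule: map the calendar day field from a
# four-digit year ('20240315', YYYYMMDD) to a two-digit year ('240315').

def transform_calday(calday_yyyymmdd):
    """Drop the century digits: 'YYYYMMDD' -> 'YYMMDD'."""
    return calday_yyyymmdd[2:]

def run_transformation(source_rows):
    """Apply the field rule from source (data source) to target (InfoCube)."""
    return [{**row, "calday": transform_calday(row["calday"])}
            for row in source_rows]

source = [{"calday": "20240315", "amount": 24.0}]
print(run_transformation(source))  # [{'calday': '240315', 'amount': 24.0}]
```

This mirrors the source-to-target flow: the DTP extracts and filters, the transformation applies the rule per field, and the result lands in the target.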