Note: This course was created by Packt Publishing. We are pleased to host this training in our library.
- Memory organization
- Parallel programming models
- Designing a parallel program and evaluating performance
- Working with threads in Python
- Synchronizing threads and using multithreading
- Spawning a process
- Running a process in the background
- Synchronizing processes
- Using the mpi4py Python module
- Using collective communication
- Reducing operations
- Managing events, tasks, and routines with Asyncio
- Distributing tasks
Skill Level Intermediate
- [Voiceover] Hi, welcome to the first section of the course, Getting Started with Parallel Computing and Python. In this section we'll deal with parallel computing and its memory architecture. We'll also look at memory organization and parallel programming models. Next we'll see how to design a parallel program and how to evaluate its performance. Further, we'll get introduced to Python, and we'll work with processes and threads in it. So let's begin with the first video of this section, The Parallel Computing Memory Architecture. In this video we'll learn about Flynn's taxonomy, which includes SISD, MISD, SIMD, and MIMD.
We'll now take a look at the parallel computing memory architecture. Based on the number of instructions and data that can be processed simultaneously, computer systems are classified into four categories. One, Single Instruction Single Data, SISD. Two, Single Instruction Multiple Data, SIMD. Three, Multiple Instructions Single Data, MISD. Four, Multiple Instructions Multiple Data, MIMD. This classification is known as Flynn's taxonomy. Let's understand what SISD is.
The SISD computing system is a uniprocessor machine. It executes a single instruction that operates on a single data stream. In SISD, machine instructions are processed sequentially. In a clock cycle, the CPU executes the following operations. One, fetch. The CPU fetches the data and instructions from a memory area that is called a register. Two, decode. The CPU decodes the instructions. Three, execute. The instruction is carried out on the data. The result of the operation is stored in another register.
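To make the three stages concrete, here is a minimal sketch of a toy SISD machine in Python. The instruction format, the operation names ADD and MUL, and the register names r0 to r3 are all assumptions made up for this illustration; they don't correspond to any real CPU.

```python
# Toy SISD machine: one instruction stream, one data stream.
# The instruction tuples and register names are hypothetical.

def run(program, registers):
    """Process instructions strictly one after another."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]              # fetch
        op, dest, src_a, src_b = instruction   # decode
        a, b = registers[src_a], registers[src_b]
        if op == "ADD":                        # execute
            result = a + b
        elif op == "MUL":
            result = a * b
        else:
            raise ValueError(f"unknown operation: {op}")
        registers[dest] = result               # store the result in another register
        pc += 1                                # set up the next cycle
    return registers


program = [
    ("ADD", "r2", "r0", "r1"),  # r2 = r0 + r1
    ("MUL", "r3", "r2", "r2"),  # r3 = r2 * r2
]
final = run(program, {"r0": 2, "r1": 3, "r2": 0, "r3": 0})
print(final["r3"])  # 25
```

Nothing in the loop overlaps: each instruction finishes before the next one is fetched, which is the sequential behavior described above.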
Once the execution stage is complete, the CPU sets itself to begin another CPU cycle. This is how the SISD architecture schema works. The algorithms that run on this type of computer are sequential, or serial, since they do not contain any parallelism. Examples of SISD computers are hardware systems with a single CPU. The main elements of these architectures, also known as Von Neumann architectures, are as shown. One, central memory unit. This is used to store both instructions and program data.
Two, CPU. This is used to get the instructions and/or data from the memory unit; it decodes the instructions and sequentially executes them. Three, the I/O system. This refers to the input data and output data of the program. Conventional single-processor computers are classified as SISD systems. This figure shows specifically which areas of a CPU are used in the stages of fetch, decode, and execute. Now let's see what MISD is. In this model, "n" processors, each with its own control unit, share a single memory unit.
In each clock cycle, the data received from the memory is processed by all processors simultaneously, each in accordance with the instructions received from its control unit. In this case, the parallelism, instruction-level parallelism, is obtained by performing several operations on the same piece of data. The types of problems that can be solved efficiently in these architectures are rather special, such as those regarding data encryption. For this reason, MISD computers did not find a place in the commercial sector. They are more of an intellectual exercise than a practical configuration.
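As a rough sketch of the MISD idea, the Python snippet below feeds one piece of data to several units, each applying its own instruction. The three units are hypothetical stand-ins for processors, and the loop runs them one after another, so it only models the data flow, not real simultaneous execution.

```python
# MISD sketch: several instruction streams, one data stream.
# The three "units" are hypothetical stand-ins for processors.

def misd_step(datum, units):
    """Give the same datum to every unit; each applies its own instruction."""
    return [unit(datum) for unit in units]


units = [
    lambda x: x + 1,   # unit 0: increment
    lambda x: x * 2,   # unit 1: double
    lambda x: x ** 2,  # unit 2: square
]

results = misd_step(7, units)
print(results)  # [8, 14, 49]
```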
This is how the MISD architecture scheme looks. Next we'll take a look at SIMD. A SIMD computer consists of "n" identical processors, each with its own local memory, where it's possible to store data. All processors work under the control of a single instruction stream. In addition to this, there are "n" data streams, one for each processor. The processors work simultaneously on each step and execute the same instruction, but on different data elements. This is an example of data-level parallelism. SIMD architectures are much more versatile than MISD architectures.
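The same-instruction, different-data idea can be sketched in a few lines of Python. The function names and the sample values here are assumptions for illustration only, and the list comprehension visits the elements one by one, so this models the SIMD pattern rather than running on real SIMD hardware.

```python
# SIMD sketch: one instruction stream, n data streams.
# Names and values are hypothetical; the loop only models the pattern.

def simd_step(instruction, local_memories):
    """Apply the same instruction to every processor's local data element."""
    return [instruction(value) for value in local_memories]


local_memories = [1, 2, 3, 4]  # one data element per processor (n = 4)

def scale_by_ten(x):           # the single instruction sent to all processors
    return x * 10

print(simd_step(scale_by_ten, local_memories))  # [10, 20, 30, 40]
```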
Numerous problems covering a wide range of applications can be solved by parallel algorithms on SIMD computers. Another interesting feature is that the algorithms for these computers are relatively easy to design, analyze, and implement. The limitation is that a SIMD computer can address only those problems that can be divided into a number of identical subproblems, each of which is then solved at the same time through the same set of instructions. Among the supercomputers developed according to this paradigm, we must mention the Connection Machine (Thinking Machines, 1985)
and the MPP (NASA, 1983). As we will see in section six, GPU Programming with Python, the advent of modern graphics processing units, or GPUs, built with many embedded SIMD units, has led to a more widespread use of this computational paradigm. We now move on to understand what MIMD is. This class of parallel computers is the most general and most powerful class according to Flynn's classification. There are "n" processors, "n" instruction streams, and "n" data streams in this.
Each processor has its own control unit and local memory, which makes MIMD architectures more computationally powerful than those used in SIMD. Each processor operates under the control of a flow of instructions issued by its own control unit. Therefore, the processors can potentially run different programs on different data, solving subproblems that are different and can be part of a single larger problem. In MIMD, parallelism is achieved at the level of threads and/or processes. This also means that the processors usually operate asynchronously. The computers in this class are used to solve those problems that do not have the regular structure required by the SIMD model.
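Here is a minimal sketch of the MIMD pattern using Python's standard threading module: two threads, each running a different function on different data. The two worker functions and their inputs are hypothetical subproblems chosen for illustration. Note that in CPython the global interpreter lock makes threads interleave rather than run in true parallel, so this shows the structure of the model, not a speedup.

```python
import threading

# MIMD sketch: each thread runs its own program on its own data.
# The two worker functions are hypothetical subproblems.

results = {}

def sum_values(values):
    results["sum"] = sum(values)

def longest_word(words):
    results["longest"] = max(words, key=len)

threads = [
    threading.Thread(target=sum_values, args=([1, 2, 3, 4],)),
    threading.Thread(target=longest_word, args=(["cpu", "memory", "bus"],)),
]
for t in threads:
    t.start()  # the threads proceed independently of each other
for t in threads:
    t.join()   # wait until both subproblems are finished

print(results["sum"], results["longest"])  # 10 memory
```

Each worker writes to its own key, so the two threads never touch the same entry and no lock is needed in this small case.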
Nowadays, this architecture is applied to many PCs, supercomputers, and computer networks. However, there is a drawback that you need to consider: asynchronous algorithms are difficult to design, analyze, and implement. Great. So now we know how the parallel computing memory architecture works. In the next video we'll take a look at memory organization.