From the course: AWS for Developers: Data-Driven Serverless Applications with Kinesis (2019)

Kinesis introduction

- [Instructor] One of the AWS services that we will use during this course is Amazon Kinesis. Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data. Let's quickly see the benefits of using Amazon Kinesis. Real time. Data can be ingested in real time and processed within seconds. Fully managed. Kinesis is a fully managed service. You don't need to manage infrastructure. Scalable. Kinesis can handle any amount of streaming data and process it from different sources with very low latency. Pay as you go. With this service, you pay for how much you use. Amazon Kinesis is part of the AWS serverless offering. It's a backend-as-a-service offering.

There are four different flavors of Kinesis available, depending on the data that is going to be ingested. Kinesis Data Streams. Enables you to build custom, real-time applications that process data streams. Kinesis Video Streams. Makes it easy to stream video from connected devices to AWS for analytics, machine learning, and other processing. Kinesis Data Firehose. Loads data streams into AWS data stores. Kinesis Data Analytics. Analyzes data streams with SQL or Java.

In this course, we are going to focus on Amazon Kinesis Data Streams. Let's see how they work in a nutshell. First, we have the input to the stream; anything that is data can be an input. Second, the data gets into the stream and, one by one, the events are processed. Third, the processing can be done by different tools, like Amazon Kinesis Data Analytics, Spark, or compute services like EC2 or AWS Lambda. Finally, the outputs can be analyzed later in some other tool.

Let's now see some use cases for Kinesis Data Streams. There are many use cases where Kinesis Data Streams can be very handy. Log and event data collection. Kinesis can be used to collect log and event data from different sources. The data can be aggregated, for example, and displayed in a dashboard. Real-time analytics.
You can gain real-time insights from your data by processing events as they arrive in the stream. Mobile data capture. Your mobile applications can push data to Kinesis. Kinesis can take data from hundreds of thousands of devices and make it available to you as soon as it's produced. Gaming data feed. Kinesis can be used to collect data about the interactions of a player in a game.

Let's define some key concepts that we will use when we work with Kinesis. Shard. It's the base throughput unit of an Amazon Kinesis data stream. Let's dig a bit more into what a shard is. A shard is an append-only log and a unit of streaming capacity. A shard contains a sequence of records ordered by arrival time. One shard can ingest up to a thousand data records per second, or one megabyte per second. More shards mean more ingestion capacity. You specify the number of shards needed when creating the stream, but you can change it later. You can add and remove shards dynamically as your data throughput changes.

A data stream is a logical grouping of shards. There are no bounds on the number of shards within a data stream. Data streams retain data for 24 hours by default, and for up to seven days when the retention is extended.

A partition key is a meaningful identifier. It is specified by your data producer when putting data into a Kinesis data stream, and it's useful for consumers, as they can use the partition key to replay or build a history associated with that key. The partition key is also used to route data records to different shards of a stream. A sequence number is a unique identifier for each data record. A sequence number is assigned by Kinesis when a new record is added. A data record is the unit of data stored in a stream. A record is composed of a sequence number, a partition key, and a data blob. A data blob is the data your producer adds to the stream. The maximum size of the blob after base64 decoding is one megabyte.
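To make the shard limits and the partition key routing a bit more concrete, here is a small Python sketch. It is not taken from the course. It assumes the per-shard limits quoted above (1,000 records per second, one megabyte per second, with a megabyte taken as 2**20 bytes) and relies on the documented behavior that Kinesis maps a partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains it. The function names `shards_needed` and `shard_index_for` are made up for illustration, and the evenly split hash ranges are an assumption that stops holding once shards have been split or merged.

```python
import hashlib
import math

# Per-shard ingest limits as quoted in the transcript.
RECORDS_PER_SECOND_PER_SHARD = 1000
BYTES_PER_SECOND_PER_SHARD = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

HASH_SPACE = 2 ** 128  # an MD5 digest read as an integer falls in [0, 2**128)


def shards_needed(records_per_second: float, bytes_per_second: float) -> int:
    """Estimate how many shards cover an ingest rate (always at least one)."""
    by_count = math.ceil(records_per_second / RECORDS_PER_SECOND_PER_SHARD)
    by_size = math.ceil(bytes_per_second / BYTES_PER_SECOND_PER_SHARD)
    return max(1, by_count, by_size)


def shard_index_for(partition_key: str, shard_count: int) -> int:
    """Pick a shard index for a partition key.

    Assumes the hash space is split evenly across `shard_count` shards.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs the remainder of the integer division.
    return min(hash_key // range_size, shard_count - 1)
```

For example, `shards_needed(2500, 500_000)` returns 3, because the record rate, not the byte rate, is the limiting factor. Calling `shard_index_for("player-1", 4)` repeatedly always yields the same index, which is why all records sharing a partition key land in the same shard and stay in order.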
The data producer is an application that emits data records to a Kinesis data stream. The data producer assigns partition keys to records, and these partition keys ultimately determine which shard ingests each data record in that stream. The data consumer is a Kinesis application or AWS service that gets the data from all the shards in the stream as it is generated. Most data consumers retrieve the most recent data in the shard.
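As a rough illustration of the producer and consumer roles, here is a minimal Python sketch. It is not the course's code. It assumes a client object shaped like the one returned by `boto3.client("kinesis")` and uses only its `put_record`, `list_shards`, `get_shard_iterator`, and `get_records` calls. The helper names `put_event` and `read_events`, and the choice of JSON as the blob encoding, are assumptions made for the example.

```python
import json


def put_event(client, stream_name: str, partition_key: str, event: dict) -> dict:
    """Producer side: send one JSON-encoded event to the stream."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # the data blob
        PartitionKey=partition_key,              # decides which shard gets it
    )
    # Kinesis reports the shard that took the record and the
    # sequence number it assigned.
    return {
        "shard_id": response["ShardId"],
        "sequence_number": response["SequenceNumber"],
    }


def read_events(client, stream_name: str,
                iterator_type: str = "LATEST", limit: int = 100) -> list:
    """Consumer side: make a single pass over every shard in the stream.

    "LATEST" starts just after the newest record; "TRIM_HORIZON" starts
    from the oldest record still retained. This sketch ignores list_shards
    pagination and does not keep polling with NextShardIterator, which a
    long-running consumer would need to do.
    """
    events = []
    for shard in client.list_shards(StreamName=stream_name)["Shards"]:
        iterator = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType=iterator_type,
        )["ShardIterator"]
        batch = client.get_records(ShardIterator=iterator, Limit=limit)
        events.extend(json.loads(record["Data"]) for record in batch["Records"])
    return events
```

With boto3 installed and AWS credentials configured, the client could be created with `boto3.client("kinesis")` and passed in together with the name of an existing stream. Taking the client as a parameter is a design choice that lets the functions be exercised against a stub without touching AWS.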
