From the course: Apache Flink: Real-Time Data Engineering

Reading from a stream source - Flink Tutorial


- [Instructor] In this video, we will continue with the example from the previous video, built on the basic streaming operations class. To read data from a data source, Flink needs to create a source connection and attach it to a data stream object. For a file-based data source, we define the input format as a TextInputFormat and point it at the data directory where the files are created. We then use the readFile() method on the streaming environment object to read the contents of this directory. In this method, we first specify the format of the file, then the directory to monitor for new or changed files. When new or changed files are found, Flink automatically picks them up for processing and attaches them to the data stream. The file processing mode is set to PROCESS_CONTINUOUSLY, which means Flink will keep monitoring the directory for any new files. The monitoring interval is set to one second. To receive data into this data stream, we need to declare the data type of each record. We start with String as the data type, reading records into the audit trail string data stream. A string record has limited use on its own; it needs to be converted into a POJO class or a tuple for further processing. We will convert each string record into an AuditTrail POJO object using a map function. The map function continuously processes new records as they arrive on the stream. It prints each received record so we can check the output, and it converts the record into an AuditTrail object. The AuditTrail class contains the logic to interpret the string and convert it into a corresponding object. Back to the main code. Let's run this code and review the results. The log messages generated by the stream generator are printed in blue to differentiate them from the messages generated by the Flink job.
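The setup narrated above can be sketched roughly as follows. This is a minimal illustration rather than the course's exact code: the directory path, the job name, and the AuditTrail class are assumptions based on the narration, and the job would need the Flink DataStream dependencies on the classpath to compile and run.

```java
// Sketch of a continuously monitored file source using Flink's
// DataStream API. The directory path and the AuditTrail POJO are
// placeholders assumed from the narration.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class BasicStreamingOperations {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        String dataDir = "data/raw_audit_trail";  // assumed path
        TextInputFormat auditFormat = new TextInputFormat(new Path(dataDir));

        // Monitor the directory continuously, re-scanning every second.
        DataStream<String> auditTrailStr = env.readFile(
                auditFormat,
                dataDir,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                1000);  // monitoring interval in milliseconds

        // Print each received record, then convert it into a POJO.
        DataStream<AuditTrail> auditTrailObj = auditTrailStr
                .map(new MapFunction<String, AuditTrail>() {
                    @Override
                    public AuditTrail map(String record) {
                        System.out.println("--- Received Record : " + record);
                        return new AuditTrail(record);
                    }
                });

        env.execute("Basic Streaming Operations");
    }
}
```

An anonymous MapFunction is used instead of a lambda here because Flink's type extraction for generic return types is more reliable with explicit classes.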
As the generator continues to produce events, we see that the map function receives them and converts them into AuditTrail objects. In the next video, we will perform some computations on this data.
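The parsing logic inside the POJO is not shown in the narration. A self-contained sketch of what such a class might look like, assuming a simple comma-separated record layout (the class name and field names here are hypothetical, not the course's actual schema):

```java
// Hypothetical AuditTrail-style POJO that parses a comma-separated
// record such as "1,user2,Modify,Customer,1601022410". The field
// layout is an assumption for illustration only.
public class AuditTrailRecord {
    public int id;
    public String user;
    public String operation;
    public String entity;
    public long timestamp;

    public AuditTrailRecord(String line) {
        // Split the raw string and map each attribute to a field.
        String[] attributes = line.split(",");
        this.id = Integer.parseInt(attributes[0].trim());
        this.user = attributes[1].trim();
        this.operation = attributes[2].trim();
        this.entity = attributes[3].trim();
        this.timestamp = Long.parseLong(attributes[4].trim());
    }

    @Override
    public String toString() {
        return "AuditTrailRecord{id=" + id + ", user=" + user
                + ", operation=" + operation + ", entity=" + entity
                + ", timestamp=" + timestamp + "}";
    }
}
```

Note the public no-argument-free fields: Flink treats a class with public fields (or getters/setters) and a recognizable structure as a POJO, which lets it use efficient built-in serialization for the stream records.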