Amazon Kinesis
Amazon Kinesis is a managed, scalable, cloud-based service for processing large volumes of streaming data in real time. It is designed for real-time applications: developers can take in any amount of data from many sources, and the processing applications, which can run on EC2 instances, scale up and down with the load.
It is used to capture, store, and process data from large, distributed streams such as event logs and social media feeds. After processing the data, Kinesis distributes it to multiple consumers simultaneously.
When to Use Amazon Kinesis?
It is used in situations where data moves rapidly and must be processed continuously. Amazon Kinesis can be used in the following situations −
Data log and data feed intake − We need not wait to batch up the data; we can push data to an Amazon Kinesis stream as soon as it is produced. This also protects against data loss if the data producer fails. For example, system and application logs can be continuously added to a stream and be available within seconds when required.
Real-time graphs − We can extract graphs and metrics from an Amazon Kinesis stream to create reports, without waiting for data batches.
Real-time data analytics − We can run real-time analytics on streaming data by using Amazon Kinesis.
Limits of Amazon Kinesis
The following limits should be kept in mind when using Amazon Kinesis Streams −
Records of a stream are accessible for up to 24 hours by default; this can be extended to up to 7 days by enabling extended data retention.
The maximum size of a data blob (the data payload before Base64-encoding) in one record is 1 megabyte (MB).
One shard supports up to 1000 PUT records per second.
For more information related to limits, visit the following link − https://docs.aws.amazon.com/kinesis/latest
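As a small illustration of the record size limit, the sketch below checks whether a payload fits in one record and, if not, splits it into chunks. The function names are hypothetical, and the code assumes the 1 MB limit means 1,048,576 bytes; reassembling the chunks on the consumer side is left out.

```python
from typing import List

# Per-record payload limit quoted above (assumed here to be 1 MiB).
MAX_RECORD_BYTES = 1024 * 1024


def fits_in_one_record(payload: bytes) -> bool:
    """Return True if the payload is within the per-record size limit."""
    return len(payload) <= MAX_RECORD_BYTES


def split_payload(payload: bytes, max_bytes: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Split a payload into consecutive chunks of at most max_bytes each."""
    return [payload[i:i + max_bytes] for i in range(0, len(payload), max_bytes)]
```

A producer could call `fits_in_one_record` before sending and fall back to `split_payload` for oversized data.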
Kinesis
Before learning about Kinesis, you should understand what streaming data is.
What is streaming data?
Streaming data is data that is generated continuously by thousands of data sources, which typically send small data records simultaneously.
The following are examples of streaming data:
- Purchases from online stores
People buying products on amazon.com generate streaming data, such as transactions and product details.
- Stock prices
Stock prices are another example of streaming data.
- Game data
Suppose a user is playing Angry Birds and the application is streaming data back to a central server. This data could describe what the user is doing and what the current score is.
- Social network data
Social network data is another example of streaming data. Suppose you visit Facebook, update your status, and post on a friend's wall. All of this data is streamed.
- Geospatial data
When you use Uber, your device is connected to the internet. The Uber application constantly reports where the driver is and where you are, and it queries the map to give you the best possible route to your destination. This is also a good example of streaming data.
- IoT sensor data
IoT sensors continuously monitor the world around them, for example by reporting temperature readings.
What is Kinesis?
Kinesis is a platform on AWS to which you send your streaming data. It makes it easy to load and analyze streaming data, and it also lets you build custom applications based on your business needs.
Core Services of Kinesis
- Kinesis Streams
- Kinesis Firehose
- Kinesis Analytics
Kinesis Streams
- Kinesis streams consist of shards.
- Each shard supports up to 5 read transactions per second, up to a maximum total read rate of 2 MB per second, and up to 1,000 records per second for writes, up to a maximum total write rate of 1 MB per second.
- The data capacity of your stream is a function of the number of shards that you specify for the data stream. The total capacity of the Kinesis stream is the sum of the capacities of all shards.
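Because stream capacity is the sum of the shard capacities, a shard count can be estimated from the expected traffic. The sketch below is one way to do that arithmetic using the per-shard figures listed above; the function name and the example traffic numbers are hypothetical.

```python
import math

# Per-shard limits taken from the figures listed above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Estimate the smallest shard count that satisfies every per-shard limit."""
    by_write_size = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_size = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_size, by_write_count, by_read_size)
```

For example, a workload writing 3.5 MB per second in 2,500 records per second and reading 7 MB per second is bounded by both the write size and the read size, so `shards_needed(3.5, 2500, 7.0)` gives 4.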
Architecture of Kinesis Stream
Suppose we have EC2 instances, mobile phones, laptops, and IoT devices producing data. They are known as producers because they produce the data. The data is moved to the Kinesis stream and stored in shards. By default, the data is stored in shards for 24 hours; you can increase the retention period to 7 days. Once the data is stored in shards, EC2 instances known as consumers take the data from the shards and turn it into useful data. Once the consumers have performed their calculations, the useful data is moved to one of the AWS services, such as DynamoDB, S3, EMR, or Redshift.
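The producer, shard, and consumer flow can be pictured with a toy in-memory model. This is only a sketch, not how the service is implemented: the `ToyStream` class is hypothetical, and the even split of the hash range across shards is an assumption. It does rely on one real detail, which is that Kinesis picks a shard by taking the MD5 hash of a record's partition key.

```python
import hashlib
from collections import defaultdict

RETENTION_SECONDS = 24 * 60 * 60  # default 24-hour retention described above


class ToyStream:
    """Hypothetical in-memory stand-in for a stream made of shards."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self.shards = defaultdict(list)  # shard index -> [(arrival_time, data)]

    def shard_for(self, partition_key):
        """Map a partition key to a shard index via its 128-bit MD5 hash."""
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        hash_value = int.from_bytes(digest, "big")
        # Assumption: the 128-bit hash range is split evenly across shards.
        return hash_value * self.shard_count // (1 << 128)

    def put(self, partition_key, data, now):
        """Producer side: store a record in its shard and return the shard index."""
        shard = self.shard_for(partition_key)
        self.shards[shard].append((now, data))
        return shard

    def read(self, shard, now):
        """Consumer side: return the records still inside the retention window."""
        return [data for arrived, data in self.shards[shard]
                if now - arrived < RETENTION_SECONDS]
```

Records with the same partition key always land in the same shard, and a record disappears from `read` once it is older than the retention window.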
Kinesis Firehose
- Kinesis Firehose is a service used for delivering streaming data to destinations such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch.
- With Kinesis Firehose, you do not have to manage the resources.
Architecture of Kinesis Firehose
Suppose you have EC2 instances, mobile phones, laptops, and IoT devices producing data. They are known as producers. Producers send the data to Kinesis Firehose. With Kinesis Firehose, you do not have to manage resources such as shards or streams, and you do not have to manually adjust the number of shards to keep up with the data. It is completely automated. You do not even have to worry about consumers. Data can optionally be analyzed by using a Lambda function; once it has been analyzed, it is sent directly to S3. One important point is that Kinesis Firehose has no automatic retention window, whereas a Kinesis stream has one, with a default of 24 hours that can be extended to up to 7 days. Kinesis Firehose either analyzes the data or sends it directly to S3 or another location.
The other location can be Redshift. In that case, the data is first written to S3 and then copied to Redshift.
If the location is an Elasticsearch cluster, the data is sent directly to the cluster.
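As a rough illustration of the optional Lambda step, the sketch below shows a transformation handler written in Python. It assumes the incoming records carry JSON objects; the `processed` field is a hypothetical enrichment, not part of any AWS API. The event and response shapes (a `records` list with a `recordId`, Base64-encoded `data`, and a `result` such as `Ok` or `ProcessingFailed`) follow the contract Firehose uses for data-transformation functions.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each record, tag it, and hand it back to Firehose."""
    output = []
    for record in event["records"]:
        try:
            doc = json.loads(base64.b64decode(record["data"]))
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Return the original data unchanged and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        doc["processed"] = True  # hypothetical enrichment for illustration
        line = json.dumps(doc) + "\n"  # one JSON object per line at the destination
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is what lets Firehose match transformed records to the originals.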
Kinesis Analytics
Kinesis Analytics is a service of Kinesis in which streaming data is processed and analyzed using standard SQL.
Architecture of Kinesis Analytics
Kinesis Analytics works with Kinesis Firehose and Kinesis Streams. It allows you to run SQL queries on the data that exists within them. You can use the SQL queries to store the results in S3, Redshift, or an Elasticsearch cluster. Essentially, the data is analyzed inside Kinesis using a SQL-type query language.
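Queries over streaming data are typically evaluated over time windows rather than over a finished table. The sketch below is a plain Python analogue of such a query, counting events per key in fixed, non-overlapping windows. It is not Kinesis Analytics SQL; the function name and the sample events are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events is an iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping each window start time to a Counter of keys.
    """
    windows = {}
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows
```

With 60-second windows, trades at seconds 0, 30, and 59 fall into the window starting at 0, and a trade at second 60 starts the next window.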
Differences between Kinesis Streams and Kinesis Firehose
- A Kinesis stream is managed manually, while Kinesis Firehose is fully automated.
- A Kinesis stream can send data to many services, while Kinesis Firehose delivers data only to a fixed set of destinations: S3, Redshift, or Elasticsearch.
- A Kinesis stream has an automatic retention window (24 hours by default, extendable to 7 days), while Kinesis Firehose has no automatic retention window.
- A Kinesis stream sends data to consumers for analysis and processing, while Kinesis Firehose needs no consumers because it can analyze the data itself by using a Lambda function.
How to Use Amazon Kinesis?
Following are the steps to use Amazon Kinesis −
Step 1 − Set up Kinesis Stream using the following steps −
Sign in to your AWS account. Select Amazon Kinesis from the AWS Management Console.
Click Create stream and fill in the required fields, such as the stream name and the number of shards. Click the Create button.
The Stream will now be visible in the Stream List.
Step 2 − Set up users on the Kinesis stream. Create new users and assign a policy to each user. (The procedure for creating users and assigning policies to them was discussed earlier.)
Step 3 − Connect your application to Amazon Kinesis; here we are connecting Zoomdata to Amazon Kinesis. Following are the steps to connect.
Log in to Zoomdata as Administrator and click Sources in the menu.
Select the Kinesis icon and fill the required details. Click the Next button.
Select the desired Stream on the Stream tab.
On the Fields tab, create unique label names as required, and click the Next button.
On the Charts tab, enable the charts for the data. Customize the settings as required, and then click the Finish button to save the settings.
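The console steps above can also be scripted. The sketch below assumes a boto3 Kinesis client is passed in as `client`; the helper names, the stream name, and the account number in the example ARN are hypothetical. The client methods used (`create_stream`, `get_waiter`, `put_record`, `get_shard_iterator`, `get_records`) are boto3 Kinesis calls, and the policy lists only a minimal set of actions that a real deployment would likely need to adjust.

```python
import json


def create_stream(client, stream_name, shard_count):
    """Step 1: create a stream and wait until it is ready to use."""
    client.create_stream(StreamName=stream_name, ShardCount=shard_count)
    # Creation is asynchronous, so poll until the stream exists.
    client.get_waiter("stream_exists").wait(StreamName=stream_name)


def stream_policy(stream_arn):
    """Step 2: a policy document allowing reads and writes on one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:PutRecord",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
            ],
            "Resource": stream_arn,
        }],
    }


def send_event(client, stream_name, event, partition_key):
    """Step 3, producer side: push one JSON event as soon as it is produced."""
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return response["SequenceNumber"]


def read_shard(client, stream_name, shard_id, limit=100):
    """Step 3, consumer side: read one batch starting at the oldest record."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]


# With boto3 installed and credentials configured, a real client would be
# created with: client = boto3.client("kinesis")
```

Passing the client in as a parameter keeps the helpers easy to try against a stub before pointing them at a real account. The dictionary returned by `stream_policy` can be serialized with `json.dumps` and attached to a user.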
Features of Amazon Kinesis
Real-time processing − It allows us to collect and analyze information, such as stock trade prices, in real time instead of waiting for a batch report.
Easy to use − Using Amazon Kinesis, we can create a new stream, set its requirements, and start streaming data quickly.
High throughput, elastic − The throughput of a stream is determined by its number of shards, so capacity can be scaled up or down as the data rate changes.
Integrate with other Amazon services − It can be integrated with Amazon Redshift, Amazon S3 and Amazon DynamoDB.
Build Kinesis applications − Amazon Kinesis provides developers with client libraries that enable the design and operation of real-time data processing applications. Add the Amazon Kinesis Client Library to a Java application, and it will notify the application when new data is available for processing.
Cost-efficient − Amazon Kinesis is cost-efficient for workloads of any scale. We pay as we go for the resources used and pay hourly for the throughput required.