Friday, June 27, 2014

Google Cloud Platform


Google Cloud Platform is a portfolio of cloud computing products from Google. It offers hosting on the same supporting infrastructure that Google uses internally for end-user products like Google Search and YouTube.

Google Cloud Platform comprises a family of products, each of which includes:
  • a web interface
  • a command-line tool
  • a REST API



Google Cloud Dataflow
        
Cloud Dataflow is one of the key tools unveiled so far, aimed at helping with data handling, application development and more.
Developers can use this service to create data pipelines that ingest, transform and analyse data in both batch and streaming modes.

The same service handles both real-time streaming data and batches of data uploaded to the system.
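To make the ingest, transform and analyse idea concrete, here is a minimal sketch in plain Python. It does not use the Cloud Dataflow SDK; every name in it (ingest, transform, analyse, run_pipeline, stream_source) is made up for illustration. The only point is that one pipeline definition can be fed either a finished batch or a source that produces records one at a time.

```python
from collections import Counter
from typing import Iterable, Iterator


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Ingest step: yield non-empty records one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def transform(records: Iterable[str]) -> Iterator[str]:
    """Transform step: split each record into lowercase words."""
    for record in records:
        for word in record.lower().split():
            yield word


def analyse(words: Iterable[str]) -> Counter:
    """Analyse step: count how often each word occurs."""
    return Counter(words)


def run_pipeline(source: Iterable[str]) -> Counter:
    """Chain the three steps; the source may be a list or a generator."""
    return analyse(transform(ingest(source)))


def stream_source() -> Iterator[str]:
    """Hypothetical stand-in for a live feed that emits records over time."""
    yield "cloud dataflow"
    yield ""
    yield "Cloud pipelines"


# Batch mode: all the data is available up front.
batch_result = run_pipeline(["cloud dataflow", "", "Cloud pipelines"])

# Streaming-style mode: records arrive one by one from a generator.
stream_result = run_pipeline(stream_source())
```

Both calls go through exactly the same three steps. A real streaming system also has to deal with unbounded input, windows and late data, which this toy version ignores.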



 Dataflow is based on a number of technologies the company has been using internally, including Flume and MillWheel.

What is Flume ..... ??  :o

FlumeJava combines high-level abstractions for parallel data and computation, deferred evaluation and optimization, and efficient parallel primitives. Together these yield an easy-to-use system.


To enable parallel operations to run efficiently, FlumeJava defers their evaluation, instead internally constructing an execution plan dataflow graph. When the final results of the parallel operations are eventually needed, FlumeJava first optimizes the execution plan, and then executes the optimized operations on appropriate underlying primitives (e.g., MapReduces). 
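The deferred evaluation idea can be sketched in a few lines of Python. This is a toy, not FlumeJava's actual API: the DeferredCollection class and its methods are hypothetical, and the only "optimization" shown is fusing consecutive map steps into one. It assumes nothing beyond the description above: operations are recorded in a plan, the plan is optimized, and work happens only when the result is requested.

```python
class DeferredCollection:
    """Toy collection that records operations instead of running them."""

    def __init__(self, data, plan=None):
        self._data = list(data)
        self._plan = list(plan or [])  # list of (kind, function) steps

    def map(self, fn):
        # Nothing runs here; the step is only appended to the plan.
        return DeferredCollection(self._data, self._plan + [("map", fn)])

    def filter(self, pred):
        return DeferredCollection(self._data, self._plan + [("filter", pred)])

    def optimized_plan(self):
        """Fuse consecutive map steps into a single map step."""
        fused = []
        for kind, fn in self._plan:
            if kind == "map" and fused and fused[-1][0] == "map":
                prev = fused[-1][1]
                # Default arguments bind the current functions to this lambda.
                fused[-1] = ("map", lambda x, f=prev, g=fn: g(f(x)))
            else:
                fused.append((kind, fn))
        return fused

    def run(self):
        """Execute the optimized plan; this is where work finally happens."""
        items = self._data
        for kind, fn in self.optimized_plan():
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


calls = []


def add_one(x):
    calls.append(x)  # record every real invocation
    return x + 1


def double(x):
    return x * 2


pipeline = DeferredCollection([1, 2, 3, 4]).map(add_one).map(double).filter(lambda x: x > 4)

calls_before_run = list(calls)  # still empty: building the plan ran nothing
result = pipeline.run()         # the two maps are fused, then the filter runs
```

In FlumeJava the optimized plan is executed on underlying primitives such as MapReduces; here it simply runs as list comprehensions in a single process.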

What is MillWheel ..... ??  :o

 MillWheel is a framework for building low-latency data-processing applications that is widely used at Google.






The first Cloud Dataflow SDK is for Java. Google is also providing a dashboard for monitoring these pipelines right from the developer console.



Google Cloud Dataflow - The Quick Summary

  • Fully managed data pipeline service
  • Works with real-time streaming data or batch uploads
  • Designed to scale up to petabytes of data
  • Meant to address the performance issues of using MapReduce for building pipelines
  • Based on FlumeJava and MillWheel




