Cloudera Unleashes Impala: Google-Inspired Tool for Analyzing Data

  • Google has a tradition of releasing research papers detailing the software it uses to drive its online services, allowing open source projects to take advantage of its ideas.
  • A relatively new project from Google called F1 is a relational database management system (RDBMS), which the search giant uses together with its Spanner database to run its online ad system.
  • Google only revealed F1 last May and has yet to release a paper on the technology, but Silicon Valley startup Cloudera has already created its own open source version, called Impala.
  • Cloudera hired Marcel Kornacker, one of the main engineers on the F1 project, and has been working on Impala for two years. Impala is “a means of instantly analyzing the massive amounts of data stored in Hadoop,” explains Wired. Hadoop is an open source platform for distributing and crunching data across many servers.
  • Hadoop is used as a batch processing platform for a variety of data-crunching tasks. Cloudera has brought Hadoop to the business world and now aims, with Impala, to help companies get more out of the platform.
  • “With open source tools such as Hive, you can also analyze Hadoop data in much the same way you would query a traditional database using the common Structured Query Language, or SQL,” notes the article. “Impala lets you query the same data ‘in real-time’ — i.e., in seconds. According to Cloudera, it’s 10 times faster than a tool like Hive.”
  • Cloudera is four years old but is only now starting to build “what I wanted to build when we started the company,” says co-founder Jeff Hammerbacher.