We promised a while back to publish the code from the live-coding
GridGain presentation we gave at QCon London earlier this year. Since the
presentation was in Scala, the code we will be posting here is in Scala too.
First, a brief intro. We all know Hadoop’s word-counting example, which
takes a file of words and produces another file with the number
of occurrences next to each word. Hadoop handles this example very well;
the main caveat is that it is not real time.
The word-counting example we did at QCon actually counted words in real
time. The program was split into two parts. The first part was responsible for
loading the words in real time into the GridGain data grid, and the second part
queried the grid every 3 seconds to continuously print out the top 10 words
stored so far.
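To make the two-part split concrete, here is a minimal sketch in plain Java. It is not the code from the presentation and it does not use the GridGain API: a `ConcurrentHashMap` stands in for the data grid, and the class name `RealTimeWordCount`, the methods `load` and `top`, the 100 ms feed rate and the sample text are all made up for illustration. Only the 3-second polling interval and the top-10 cutoff come from the description above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class RealTimeWordCount {
    // Stand-in for the data grid (assumption): word -> occurrences seen so far.
    static final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Part one: called by the loader for every incoming word.
    static void load(String word) {
        counts.merge(word, 1L, Long::sum); // atomic per-key increment
    }

    // Part two: the "query", returning the n most frequent words so far.
    // Ties are broken alphabetically so the result is deterministic.
    static List<Map.Entry<String, Long>> top(int n) {
        return counts.entrySet().stream()
            .map(e -> Map.entry(e.getKey(), e.getValue())) // snapshot each entry
            .sorted((a, b) -> {
                int byCount = Long.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            })
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical sample input; a real loader would read from a stream of text.
        String[] words = "the quick brown fox jumps over the lazy dog the end".split(" ");
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);

        // Loader: feeds one word every 100 ms, cycling through the sample text.
        int[] next = {0};
        pool.scheduleAtFixedRate(
            () -> load(words[next[0]++ % words.length]), 0, 100, TimeUnit.MILLISECONDS);

        // Query: prints the current top 10 every 3 seconds while loading continues.
        pool.scheduleAtFixedRate(
            () -> System.out.println(top(10)), 3, 3, TimeUnit.SECONDS);

        Thread.sleep(7000); // let the demo run for two query rounds, then stop
        pool.shutdownNow();
    }
}
```

The point of the sketch is only the shape of the program: the loader and the query run concurrently against the same store, so each printout reflects whatever has been loaded up to that moment. In the real example the store is distributed across grid nodes rather than held in one map.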
The example was done using ‘Scalar’ – the GridGain DSL for Scala, but it ...
A few days ago I blogged about how GridGain easily supports starting many
GridGain nodes in a single JVM – which is a huge productivity boost
during development. I got a lot of requests to show the code – so
here it is (next page).
This is an example that we are shipping with the upcoming 4.3 release (entire ...
import org.gridgain.grid.*;
import org.gridgain.grid.spi.discovery.tcp.*;
import org.gridgain.grid.spi.discovery.tcp.ipfinder.*;
import org.gridgain.grid.typedef.*;
import javax.swing.*;
import ...
GridGain is Java-based middleware for in-memory processing of big data in a
distributed environment. It is based on a high-performance in-memory data
platform that integrates a fast In-Memory MapReduce implementation with
In-Memory Data Grid technology, delivering easy-to-use and easy-to-scale
software. Using GridGain you can process terabytes of data, on 1000s of nodes,
in under a second.
GridGain typically resides between business, analytics, transactional or BI
applications and long-term data storage such as RDBMS, ERP or Hadoop HDFS,
and provides an in-memory data platform for high p...
Wikibon produced an interesting piece (it looks like it was paid for by
Aerospike, a NoSQL database that recently emerged by resurrecting the failed
CitrusLeaf and acqui-hiring AlchemyDB, and whose product, of course, was
recommended in the end) that compares NoSQL databases that store data on
flash-based SSDs with those that store data in DRAM.
There are a number of factual problems with that paper and I want to point them ...
Note that Wikibon doesn’t mention GridGain in this study (we are not a
NoSQL datastore per se, after all), so I don’t have any skin in this game
other than annoyance with biased and factu...
One of the features in GridGain’s In-Memory Data Platform that often goes
unmentioned is the ability to launch multiple GridGain nodes in a single JVM.
Now, as trivial as it sounds… can you start multiple JBoss or WebLogic or
Infinispan or GigaSpaces or Coherence or (gulp) Hadoop 100% independent
runtimes in a single JVM? The answer is no. Even for a simple test run
you’ll have to start multiple instances on your computer (or on multiple
computers), and debug them via a remotely connected debugger, different log
windows, different configurations, etc. In one word – awkward…
Not so...