It seems to me this will replace Storm for many uses, and the team appears to know it; they've been putting effort into taking on exactly that kind of use case:
"Based on user feedback, Kafka seems to be the the primary means that developers are interested in ingesting data from. We rewrote the onyx-kafka plugin from scratch to essentially mirror what Storm's Kafka Spout provides. That is, the Onyx Kafka plugin will dynamically discover brokers from ZooKeeper and automatically reconnect to an In Sync Replica (ISR) if any brokers go down. We also took a little detour to create onyx-s3. onyx-s3 is an S3 reader that handles faults by check-pointing to ZooKeeper."
As a Storm user, I can confirm this. I'm eyeing Onyx with great interest. The design seems cleaner, and the development environment is much more reasonable than Storm's.
Onyx feels more like a library than a framework, so it's not as 'heavy' as other frameworks. And since it relies on pure data and pure functions, it was trivial to get up and running when I was evaluating it. There were about 20 lines of boilerplate I needed to copy; then I just built a workflow with the functions I had already been using, and it all just kinda worked.
I was able to develop in the REPL, which was a huge win, and deploying is as simple as uberjar / copy to host.
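To give a flavor of what that looks like (a simplified sketch, not my exact code): a task is an ordinary pure function over a segment map, and the workflow is just a vector of edges connecting task names:

    ;; Any pure function of segment -> segment works as a task.
    (defn add-tax [segment]
      (update-in segment [:price] * 1.1))

    ;; The workflow is plain data: [from to] edges.
    (def workflow
      [[:in :add-tax]
       [:add-tax :out]])

The catalog that binds :in and :out to an input/output medium is the bulk of the boilerplate I mentioned.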
In another thread recently, someone mentioned that Phoenix wasn't a good name choice for a project because it's been used so many times. I feel like Onyx is the same way. This thing might displace the other Onyxes for a while, only to be lost to history when some other software project is eventually also called Onyx.
As long as Onyx is easily searchable while it remains relevant, I don't see this as a problem. Yes, this particular incarnation of 'Onyx' might be lost to history after it has stopped being relevant. So what?
We at Cognician are using Onyx to calc stats for our event-sourced Datomic data. User gestures and events go in via the web server, and our Onyx workflow picks them up on the Datomic transaction report queue, runs calculations, and writes them back to Datomic.
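The queue side is roughly this (a simplified sketch, not our actual code):

    (require '[datomic.api :as d])

    ;; d/tx-report-queue returns a BlockingQueue of transaction reports;
    ;; each report's :tx-data holds the newly asserted datoms, which become
    ;; the segments our calculation tasks consume.
    (defn tx-reports [conn]
      (let [queue (d/tx-report-queue conn)]
        (repeatedly #(.take ^java.util.concurrent.BlockingQueue queue))))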
We have around 10 workflows to store pre-calculated values and 'short-circuit' reference collections at the moment, and we're adding more all the time as we find hotspots in our web-tier Datomic queries that we want to speed up.
It's wonderful that we can use all the same notions we're familiar with in the rest of our stack (Clojure, ClojureScript, Datomic) – data-oriented, functional, immutable, dynamically typed. We get to use the same simple paradigm for the entire lifecycle of a user interaction. It's incredibly empowering.
We started with 0.5, and patiently fought through the difficulties that HornetQ produced, because despite those difficulties, it was a real pleasure to write code for Onyx. Now that 0.6 is out, with metrics, no HornetQ, a significantly faster dev mode thanks to the core.async transport, and a cleaner lifecycle API, it feels like we've been given superpowers!
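For a sense of how light dev mode is now (recalled from the 0.6 API, so arities may be slightly off), a whole environment plus job submission is a handful of REPL calls:

    ;; Spin up an in-process dev environment and some virtual peers.
    (def env (onyx.api/start-env env-config))
    (def peer-group (onyx.api/start-peer-group peer-config))
    (def peers (onyx.api/start-peers 3 peer-group))

    ;; A job is just a map of the data structures built earlier.
    (onyx.api/submit-job
      peer-config
      {:workflow workflow
       :catalog catalog
       :task-scheduler :onyx.task-scheduler/balanced})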
Michael and Lucas (the two core team members I've interacted with) are incredibly receptive to feedback and tremendously eager to help out if you get stuck, and we have learned a hell of a lot about this game from them.
If you're in Clojure at all, and you need something like this, look no further. Heck, even if you're not in Clojure, you should take a look.
We are finalizing our production tests right now. We will be using it to do all of our user event processing (workflow: app -> kafka -> onyx -> redis && s3).
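(Rough sketch of that fan-out as workflow edges; the task names here are invented:)

    [[:read-kafka :process-event]
     [:process-event :write-redis]
     [:process-event :write-s3]]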
Soon after that we will move our audio upload/transcoding process into a different pipeline as well.
If you're already using Clojure, this is a no-brainer if you want to do stream or batch processing.
"Based on user feedback, Kafka seems to be the the primary means that developers are interested in ingesting data from. We rewrote the onyx-kafka plugin from scratch to essentially mirror what Storm's Kafka Spout provides. That is, the Onyx Kafka plugin will dynamically discover brokers from ZooKeeper and automatically reconnect to an In Sync Replica (ISR) if any brokers go down. We also took a little detour to create onyx-s3. onyx-s3 is an S3 reader that handles faults by check-pointing to ZooKeeper."