07/01/2015 Leave a comment
For the second time in the same year, I was able to attend another major Cassandra event : the EU Summit in London in December 2014.
Again, the format was quite similar to the one in San Francisco held earlier in September, the first day is a training day and the second day is reserved for conferences.
I The training day
For this summit I chose to attend the Cassandra/Spark training and that was definitely the right choice. In the morning we had an introduction to Spark architecture and the installation, settings and tuning of Datastax Entreprise which embeds Cassandra and Spark in a single product.
The trainer (Patrick Callaghan) presents with much details all the Spark config parameters (SPARK_MASTER_OPTS, SPARK_WORKER_OPTS, SPARK_WORKER_MEMORY, SPARK_WORKER_CORES …) that will impact memory and CPU usage.
The morning finished by some crash courses on RDD concept, an introduction to the Cassandra-Spark connector and a hands on exercise to start the Spark master, workers and to connect to the REPL and default web interface as basic health check.
The afternoon session started with presentation of RDD lineage, acyclic dependency graph, lazy evaluation and persistence strategies in Spark. To be honest most of those topics are quite common but Patrick pushed the explanation into details by giving real examples of how and when persisting data with Spark can make a huge difference in term of performance for your pipeline.
There was also a big chapter on Spark partitioning/shuffling strategies and how some aggregation functions like join, grouping and sorting may result in shipping a bunch of data over the network because they change the previous partitioning.
One striking example is the very common
map() function. If you have a PairRDD (RDD consisting of a couple
map() will remove any previous partitioner and may result in data shuffling down the processing pipeline.
After the theory, the practice. The set of exercises are well designed and the difficulty level built up progressively.
In a nutshell, this training really worths to be attended.
II The conference day
The first hour of the keynote, presented by Billy Bosworth, is essentially the same as the one in San Francisco, with different people invited on stage. The second part is a small recap by Jonathan Ellis on new features in 2.1 and some announcements of features coming in 3.0:
hints storage: hints are no-longer stored in Cassandra tables but just as plain append-only files, à-la commitlog. It will improve hints delivery and avoid coordinators to be overwhelmed by hints and related compaction in case of multiple nodes failure
JSON syntax support for CQL: with 3.0 you can insert or retrieve data with JSON formatting, which greatly simplifies your day if you need to send the data directly to third parties using JSON as exchange format. It also opens the door for potentially interesting automatic object mapping leveraging existing libraries like Jackson mapper
UDF: a few words about user-defined functions and the syntax. Jonathan did not expand much on the subject because there is a complete talk on it by Robert Stupp later
Global index: a long awaited feature. Until now secondary indices are distributed locally, meaning that you suffer the fan-out phenomenon when your cluster size grows. With global index, the approach is more classic, all the partition is sitting on the same node. What’s about wide rows ? Jonathan did not give any detail on it
Lesson Learned — From SQL to Cassandra with Billions Contacts Data
- I started the first conference session of the day as speaker, with Brice Dutheil. Essentially this talk presents the work we have done at Libon to migrate data from Oracle to Cassandra.
In the first part we presented Libon business, the functional use-cases and the need to migrate to Oracle.
Then we introduce the 3-phases migration strategy with a double-run and zero downtime. We played with Cassandra timestamp to give higher priority for live production updates over the batch migration process.
We also explained in detail the strategy to mitigate code refactoring to the persistence layer only and leveraging the huge code coverage we have with existing unit and integration tests. The data model is designed to scale linearly with the growing number of users and contacts because we always take care to have user_id/contact_id as component of the partition key.
The last part of the talk focused on the tooling and data type optimization with the usage of Achilles to optimize performance and simplify the coding.
If you missed this session, the video is here.
User Defined Functions in Cassandra 3.0
This is a very interesting talk presenting the sexy feature of user-defined functions (UDF) coming in Cassandra 3.0. The talk was given by Robert Stupp, the Cassandra committer that implemented the feature!
A new CQL syntax has been introduced to allow users pushing their UDF to the server:
CREATE FUNCTION sin(value double)
Once pushed to the server, the code will be compiled and distributed to all nodes so that your UDF can be executed on any node. The UDFs can be used in the SELECT clause of your CQL queries:
SELECT sin(angle) FROM trigonometry WHERE partition = ...;
The simple UDF is used as building block for aggregate functions. The aggregate declaration syntax is:
CREATE AGGREGATE minimun (int)
STYPE int //type of the state
The initial state is set to null. For each fetched row (CQL row, not physical partition), the myUDF function is called and the state is updated with the value returned by myUDF.
You can tune further the initial state value and the final computation of the state with the extended syntax for aggregates:
CREATE AGGREGATE average (int)
INITCOND (0, 0);
Robert recommended to make your UDF pure, meaning no side effects, no socket opening or file access because the UDF code is executed server-side and side effects may cause performance degradation or worst, crash the JVM.
When asked about sand boxing, the answer is that right now there is no hard verification of the UDF code because code checking for side effect is a complex problem in it self. One can think about a white list of allowed Java packages import but it would be too restrictive and user want some time to import their own library in the UDF for custom behaviour.
Black listing pacakges or classes also proves to be hard and not enough because you can never be sure that the black list is comprehensive enough. Let’s suppose that I create my own class: MyFile, encapsulating the core java.io.File class. My custom class will completely bypass the black list and thus allow me to perform expensive side effects server-side…
Last but not least, Robert give us a quick preview of what will be possible to achieve with UDFs:
The video of the talk is now online here
Cassandra at Apple at Massive Scale
I’ve missed this talk at the San Francisco summit so I wanted to catch up. Apart from the impressive figures showing the number of Cassandra nodes deployed at Apple, the talk was too technical and focused mainly on the technical issues they encountered and how they fixed and contributed back to the code base.
How Cassandra was used at Apple and for which use-case ? No information on that. I was quite disappointed by this talk considering the initial hype. In the defence of Sankalp Kohli, the speaker, the legals have probably imposed some restrictions on the internal content that can be publicly exposed.
The video is not available, legal restrictions again I guess.
Hit the Turbo Button on Your Cassandra Application Performance
Yet another talk by the super star Patrick McFadin. This time Patrick focused on the usage of the driver and some common mistakes :
- only prepare the statements once, never re-prepare the same statements many time, it is completely useless and hurt performance
- for insert performance, use executeAsync()
- batch statement is NOT for performance, it’s for eventual atomicity
Many people get bitten by the batch. The abuse of batch statement is very bad because the job that is not done by the client application is delegated to the coordinator. For very large batches with different partition keys, the coordinator will maintain the payload in its JVM heap and block for all statements to be acknowledged by different nodes before releasing the resources. This can lead to a chain of undesired event like heap size pressure, long JVM garbage collection, node flapping etc…
The only sensible case where using batch to optimize network traffic is when all the statements inside the batch have the same partition key, which is very unlikely most of the time.
As a counter-measure to bad usage of batch, some thresholds have been introduced in the code. Above a certain batch size threshold, Cassandra will raise a warning message in the log. Above another threshold the batch will fail. Interestingly, the thresholds are not set on number of statements in a batch but on the payload size.
Then, Patrick exposed some perf improvement with the new row cache refactoring in 2.1. Now the row cache can keep the most recent cells in memory. This is well suited for time series data model where you need to access recent data.
The talk finished by an annual rant about storage. People sometimes expect the impossible with rotational disks. With around 10ms of access time at best, there is no way to push the performance of a node above some limit. SSD, in contrast, have access time an order of magnitude lower, around 70 micro second for the best. In a nutshell, you get what you pay, there is no magic.
The video of the talk is here.
Add a Bit of ACID to Cassandra
This is a quite unusual talk. People from OK.ru is presenting their own fork of Cassandra called COne that allow ACID transactions in Cassandra. Indeed they changed drastically the masterless architecture to assign roles to Cassandra nodes. There are 2 roles: update servers and storage servers. In the classical master/slave architecture, the master role is given to one server. In COne, the roles is assigned to a data center instead of a single server, so there is one DC dedicated to transaction management and another one to storage.
All reads can target directly the storage DC whereas upserts must go through the transaction DC. Consensus in this DC is achieved using QUORUM.
In addition, clients become fats because they know all about the topology and act as their own coordinator.
For the transactional part, in details, Oleg Anastasyev explained that a pessimistic locking mechanism is used in combination of an implementation of Lamport timestamp to guarantee temporal ordering of the cells. A transaction in C*One consists of several steps and can be rolled back.
The design seems innovative thought it suffers the same limitation of all master/slave architectures: all writes must go to the transaction DC. However the size of this DC can be increased. This is the trade-off Ok.ru accepts to pay for ACID transactions.
Right now COne is not open-sourced. When I asked Oleg the reason why they keep it closed source, he told me that at Ok.ru they have designed a special infrastructure (network connection, server storage …) to make COne work and this may not be easy to replicate this architecture. In one word, C*One has been designed to meet closely Ok.ru requirements and has a lot of optimizations for their business logic. It may not be a good fit to another project.
The video of the talk is here
Spotify: How to Use Apache Cassandra
This was a very interesting talk explaining how Spotify pushes the usage of Cassandra internally, empowers and gives more responsibility to the devs. “You code it, you maintain it” is the motto. Until 2013, the developers push their code into production and the SRE(site reliability engineer) teams are responsible for the cluster status in production and are on-call duty. Since 2013, the developers themselves are involved in on-call duty and the SRE teams act just facilitator/expertise for Cassandra.
This approach is the right one because unlike traditional RDBMS, with Cassandra, the data model has a crucial impact on the performance at runtime so the developers should be accountable for their data model design. And what is the best way to make them accountable other than putting then on-call duty ?
Then the speaker Jimmy Mårdell explained how managing repair in a big cluster (more than 100+ nodes) is a complex task. Fortunately those issues are mostly solved by incremental repair in Cassandra 2.1.
The end of the talk shed some light on the new DateTieredCompaction, which has been developed internally at Spotify before being merged to the open source code base. The explanation went into great detail and I found the pictures very explanatory, example worth thousand words. If you’re interested on how it is implemented, just start watching at the video here
The video of the talk is here
Having attended the EU Summit last year, I can see the real difference with this edition. The number of attendees triple, people are no longer asking whether Cassandra is a appropriate choice, they are now at the point of asking how they can leverage Cassandra in their infrastructure. I’ve also met some interesting people from the academic background with new projects using Cassandra as the Geo Sun project at the university of Reunion (French territory in Indian Ocean). The idea is to have a network of weather metrics (temperature, pressure, …) captors, saved the raw data into Cassandra, run data mining algorithms on them to provide predictions and consequently adjust the local electrical production.
This kind of projects fit into the Sensors and IoT (Internet of Things) perfect use-cases for Cassandra. Coupled with Spark and the Datastax Cassandra/Spark connector, you’ll have a powerful platform for big data ingestion and analytics.