October 2017

How Performance Testing Improved the Nitrogen Release

From Sai Sindhur Malleni

One of my responsibilities as a performance engineer working on OpenStack is to make sure that OpenStack scales, regardless of the use case or the backend technologies in use. The flexibility to use multiple open source and proprietary backends to support the various services in OpenStack, such as Neutron, Cinder and Glance, is what makes OpenStack a force to be reckoned with in the cloud ecosystem.

As OpenStack adoption increases across verticals, the SDN controller, which manages all of the virtual networking, is increasingly taking center stage. OpenDaylight, with its flexible and extensible architecture, support for multivendor networking devices, network programmability enabling control of the underlay and overlay, and tight integration with OpenStack, is becoming the de facto choice of Neutron backend in NFV deployments.

Over the last 6 weeks, several colleagues and I were involved in a massive effort, spanning multiple teams and time zones, to test and improve the scale and performance of OpenDaylight. The scope of this work is in line with the objectives of the S3P WorkGroup and we are quite happy with the progress made thus far. This blog post goes over all the hardening that went into OpenDaylight in the scale and performance realms for the Nitrogen Release.

Our lab inventory consisted of 13 Dell R630 nodes with Intel Haswell processors (28 cores and 56 threads), 128 GB of memory and an Intel X710 quad-port NIC. We were using custom-built Carbon SR-2 RPMs (custom built so that we could test some patches before they were merged). We deployed OpenDaylight in both clustered and standalone modes, using two configurations:

  1. 3 OpenStack controllers, 3 ODLs clustered, 1 OpenStack undercloud and the rest of the nodes as compute nodes
  2. 1 OpenStack controller, 1 ODL, 1 OpenStack undercloud and the rest of the nodes as compute nodes

Browbeat was used to orchestrate the tests with Rally, monitor the environment using a collectd/Graphite/Grafana stack, and store test results in Elasticsearch. Browbeat takes a simple YAML-based configuration file listing the control plane tests you want to run and orchestrates them on the OpenStack cloud. The Rally scenarios we ran included creating networks, subnets, Neutron ports, routers and security group rules, and booting VMs on subnets, each run 500 “times” at concurrencies of 8, 16 and 32. Here “times” denotes the total number of resources of each type to create, and concurrency denotes how many of them are created in parallel. Together, these two knobs let us vary the load on OpenDaylight.
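To make the two knobs concrete, here is a minimal Python sketch of the idea. It is not Browbeat or Rally code: `run_scenario` and `create_network` are hypothetical names, and the action is a stand-in that makes no Neutron call. It simply runs an action a fixed number of times while capping how many calls are in flight.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scenario(action, times, concurrency):
    """Call `action` `times` times, with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves the submission order of the results.
        return list(pool.map(action, range(times)))


# Bookkeeping used only to observe how many calls overlap.
in_flight = 0
peak_in_flight = 0
lock = threading.Lock()


def create_network(i):
    """Hypothetical stand-in for a scenario action such as creating a network."""
    global in_flight, peak_in_flight
    with lock:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
    # A real action would call the Neutron API here.
    with lock:
        in_flight -= 1
    return f"net-{i}"


results = run_scenario(create_network, times=500, concurrency=8)
print(len(results), "resources created, peak parallelism", peak_in_flight)
```

Raising `times` increases the total amount of state the controller has to hold, while raising `concurrency` increases the request rate it has to absorb at any one moment.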

One of the biggest issues we focused on during this round of testing was out-of-memory (OOM) errors, and the subsequent death of the OpenDaylight process, when creating Neutron resources at scale. Using our automation and monitoring tools (Ansible, collectd, Graphite and Grafana), we were able to actively monitor heap memory usage, and we used Eclipse MAT to analyze the hprof files dumped on OOM. Ansible was used to install collectd (a lightweight daemon that monitors system resource usage) on the nodes, Graphite was the data store for the resulting time series data, and Grafana was used to visualize the data as graphs. This is a great example of using several open source technologies to make another open source technology better. The memory leaks we observed were traced back to unclosed transactions in the openflowplugin. Our test team took this information back to the ODL upstream teams, who promptly addressed the issue with fixes included in the Nitrogen release. It was inspirational to see that level of collaboration in the community.
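The general shape of an unclosed-transaction leak can be sketched in a few lines of Python. This is an illustration of the pattern only, not openflowplugin code: `DataStore`, `Transaction`, `leaky_read` and `safe_read` are hypothetical names, and the assumption is simply that a store keeps a reference to every transaction until it is closed.

```python
class DataStore:
    """Hypothetical store that tracks every transaction that is still open."""

    def __init__(self):
        self.open_transactions = set()

    def new_transaction(self):
        tx = Transaction(self)
        self.open_transactions.add(tx)
        return tx


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = bytearray(1024)  # state pinned while the transaction is open

    def close(self):
        self.store.open_transactions.discard(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def leaky_read(store):
    tx = store.new_transaction()
    return len(tx.snapshot)  # tx is never closed, so the store keeps it alive


def safe_read(store):
    with store.new_transaction() as tx:  # closed on exit, even on error
        return len(tx.snapshot)


leaky_store, safe_store = DataStore(), DataStore()
for _ in range(1000):
    leaky_read(leaky_store)
    safe_read(safe_store)

print(len(leaky_store.open_transactions), len(safe_store.open_transactions))
```

In the leaky variant the number of retained transactions grows with every operation, which is the kind of steadily climbing heap usage that shows up in monitoring graphs before an OOM.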

We also looked into the networking-odl v2 driver, which is the glue that connects OpenStack Neutron to OpenDaylight. The driver uses a journal table to keep a queue of the operations that occurred on Neutron and need to be mirrored to the OpenDaylight controller. We identified several optimization possibilities in the way journaling was done. We also profiled the Galera database cluster used to house this table. With a few optimizations, we were able to achieve a 20x reduction in the CPU consumption of the mysqld process.
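As a rough illustration of the journaling idea, the sketch below uses an in-memory SQLite table as the queue. The table layout, column names and the `record`/`sync_pending` helpers are assumptions made for this example and are not taken from networking-odl, which stores its journal in the Neutron database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE journal (
           seqnum INTEGER PRIMARY KEY AUTOINCREMENT,
           object_type TEXT NOT NULL,
           operation TEXT NOT NULL,
           state TEXT NOT NULL DEFAULT 'pending'
       )"""
)


def record(object_type, operation):
    """Queue an operation that happened on the Neutron side."""
    conn.execute(
        "INSERT INTO journal (object_type, operation) VALUES (?, ?)",
        (object_type, operation),
    )
    conn.commit()


def sync_pending(send):
    """Replay pending rows oldest-first and mark each one completed."""
    rows = conn.execute(
        "SELECT seqnum, object_type, operation FROM journal "
        "WHERE state = 'pending' ORDER BY seqnum"
    ).fetchall()
    for seqnum, object_type, operation in rows:
        send(object_type, operation)  # e.g. a REST call to the controller
        conn.execute(
            "UPDATE journal SET state = 'completed' WHERE seqnum = ?", (seqnum,)
        )
    conn.commit()
    return len(rows)


record("network", "create")
record("subnet", "create")
record("port", "create")

sent = []
synced = sync_pending(lambda object_type, operation: sent.append((object_type, operation)))
print(synced, sent)
```

Because every Neutron operation turns into reads and writes against this table, how often it is polled and how rows are selected has a direct effect on database CPU usage.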

Clustering was another area of focus. We identified an issue where OpenStack VMs wouldn’t boot when a clustered OpenDaylight configuration was used. To give a bit of context, OpenStack considers a VM active only when the actual plumbing of the OVS interfaces and flows has happened on the hypervisor, so OpenStack’s compute service, Nova, waits for the Neutron port to be set to active before a VM is considered “active”. When using OpenDaylight, a websocket is used to communicate port status information from the OpenDaylight controller to networking-odl, which in turn sets the port status in the Neutron database. In a clustered OpenDaylight setup there are 3 ODLs and 3 instances of networking-odl, one on each OpenStack controller. HAProxy assigns a Virtual IP (VIP) to the ODL cluster; this VIP could be on any one of the OpenDaylight nodes (not necessarily the ODL cluster leader). However, Neutron events that trigger flow creation and activation of the operational port occur only on the leader, so when the VIP is not on the leader, the websocket is not established against the leader, and port status information fails to reach Neutron from OpenDaylight via networking-odl. The issue was fixed by having each networking-odl instance on an OpenStack controller establish a websocket with its local OpenDaylight member. This was a major win in hardening the integration points between OpenDaylight and OpenStack. A good amount of failover testing was also done to test the clustering feature of ODL.
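The failure mode and the fix can be modeled with a small Python sketch. It is a simplification under stated assumptions: the controller names are hypothetical, each controller is assumed to host one ODL member, and only the leader is assumed to emit port status notifications, as described above. No real websocket is opened.

```python
CONTROLLERS = ["controller-0", "controller-1", "controller-2"]


def subscriptions_via_vip(vip_holder):
    """Every networking-odl instance connects through the VIP,
    so all of them land on whichever member currently holds it."""
    return {controller: vip_holder for controller in CONTROLLERS}


def subscriptions_local():
    """Each networking-odl instance connects to the ODL member on its own node."""
    return {controller: controller for controller in CONTROLLERS}


def receivers(subscriptions, leader):
    """Controllers whose websocket is attached to the member emitting updates."""
    return sorted(c for c, member in subscriptions.items() if member == leader)


# VIP on controller-0 while controller-1 is the leader: nobody hears the updates.
print(receivers(subscriptions_via_vip("controller-0"), leader="controller-1"))

# Local subscriptions: one instance is always attached to the leader.
print(receivers(subscriptions_local(), leader="controller-1"))
```

With local subscriptions, the set of receivers is never empty regardless of which member is the leader or where the VIP sits, which is why the fix also holds up across failover.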

This testing also led to the overall improvement of OpenStack, as we were able to tune certain kernel parameters to help scale OpenStack to hundreds of networks and instances. Overall, we are extremely confident that all the work that went into improving the scalability, stability and performance of OpenDaylight means an OpenDaylight release that is better than ever before. In my colleague Daniel Farrell’s words, “It seems to me that this was one of the more important stability improvements in ODL’s history. The combination of expert performance testers, dedicated hardware, close support from experts with direct access [to] the testing environment and tight connections to the relevant upstream projects had never happened so well.”

I have to say that I have had a very positive experience working with the upstream OpenDaylight community. It is really heartening to see the developers, packaging team and performance engineers come together and do more than any individual could ever do alone. As a performance engineer, I was thrilled to see the OpenDaylight community treat performance and scale as first-class citizens and focus on them before the release rather than after it.

 

OpenDaylight Nitrogen Release: It’s All About The Apache Karaf Upgrade (by Ryan Goulding)

This post was originally published on the Inocybe Technologies blog.

With the release of Nitrogen, OpenDaylight continues to solidify itself as the de facto open networking controller standard. Although the release contains widespread enhancements and improvements, its main focus is the major version upgrade of the underlying Apache Karaf container from 3.0.8 to 4.0.9. Among other functions, Apache Karaf is used to coordinate ODL microservices and their lifecycle, provide a common logging infrastructure, enable remote management through JMX and a Unix-like shell, facilitate dynamic configuration and hot deployment, and provide a common set of base resources across the product. Karaf thus has several responsibilities and has served as a crucial piece of infrastructure in the overall OpenDaylight architecture since its inclusion in the Helium release. It’s kind of a big deal ;)!

Since ODL Karaf dependency management is centralized in the odlparent project, an upgrade may initially seem trivial: just a one-liner version bump, right? Unfortunately, that is far from the case. As previously stated, each release of Karaf provides a set of common dependencies that should remain constant across applications in the container. This is done in order to standardize on a base set of hypothetically stable, interoperable and secure dependencies based on what is available from upstream projects at the time. Thus, even a minor version bump of the Karaf container in odlparent can require a number of changes that cascade across consuming projects in ODL, especially if Karaf’s upstream dependencies contain API changes.

Major versions of Apache Karaf, which often contain new features, dependencies, frameworks, and API changes, are arduous and expensive to consume. Moving to a new Karaf version is similar to transferring an existing building onto a new foundation; there will be some jagged pieces along the bottom that need work to fit properly. That said, it is critically important for ODL to keep current with Karaf releases in order to consume security fixes and avoid upstream dependency end-of-life scenarios. Centralization of common ODL dependencies in odlparent eases some of the heartburn associated with cross-project upgrades. Support for version ranges in ODL, which continues to be developed, will ease upgrade woes in future releases. Additionally, a significant amount of testing infrastructure was added in the Nitrogen release timeframe to enhance runtime dependency resolution checks and identify problems earlier and more predictably.

Another noteworthy change in the Nitrogen timeframe is the decision of odlparent contributors to disaggregate the subproject’s release from the traditional simultaneous release plan. This means that odlparent now releases independently of the rest of the ODL projects, and consuming projects depend on a released version rather than a snapshot. Snapshot artifacts are prone to change from day to day, which can be a nuisance for consumers. For example, imagine that a snippet of code compiles on Tuesday and, without any changes, fails to compile on Wednesday morning. This problem is avoided by depending on released artifacts, as released artifacts should never change without a proper version bump. Consuming projects can treat the released odlparent artifacts as true upstream dependencies, and upgrades of those dependencies are now consumed with greater precision and control. Breaking odlparent out of the simultaneous release additionally allows the odlparent development team to continue to perform necessary upgrades of widely used core dependencies without breaking downstream projects, enabling a disaggregated and saner development workflow. Although odlparent is the only project to break off from the simultaneous release during the Nitrogen timeframe, contributors to other projects have expressed interest in following a similar plan in the near future. This is exciting for developers and users alike, as it helps break down the traditionally monolithic release process of OpenDaylight into more manageable chunks.

The decision to invest in an infrastructure-focused release reflects the maturity of the platform and its community. The focus is not only on doing things, but on doing them right. The upgrade of the Karaf version in ODL Nitrogen is a double-edged sword; the foundation of ODL is greatly improved, but at the expense of requiring the sole focus of an entire, albeit shortened, release cycle to adapt to changes. The improvements weren’t free, but were necessary in order to continue to deliver a stable and secure platform. Fortunately for ODL application developers and consumers, several articles and code examples are available to assist in transitioning to the updated ODL Nitrogen release. Additionally, the support lifecycle of the previous release, ODL Carbon, has been extended beyond the traditional lifespan in order to help ease transition woes.

If you’d like to learn more about the OpenDaylight Nitrogen Release, the ODL Foundation’s Nitrogen blog is available here: https://www.opendaylight.org/blog/2017/09/26/opendaylight-introduces-nitrogen. Dive in and download OpenDaylight’s Nitrogen release today from: https://www.opendaylight.org/downloads.

Written by Ryan Goulding, Principal Software Engineer at Inocybe Technologies.