Globo.com Uses OpenDaylight Due to Its Community Size, Flexibility, and Well-defined APIs
Initial use case deploys ACLs to virtual switches, resulting in reduced cost, a 25x increase in the number of ACLs, and a 64% decrease in ACL deployment time
Grupo Globo is a privately held group of companies based in Brazil. They create, produce, and distribute quality content, with the mission of informing, entertaining, and contributing to the education of their country. Moreover, Grupo Globo aspires to be the hub for information, entertainment, and culture, where everyone comes together. The group’s consolidated net revenues through sales, services, and advertising were over $4.4B USD in 2017, making it a major presence in Brazil. The group reaches millions of viewers, listeners, readers, and internet users through television networks, films, music, subscription television, radio services, newspapers and magazines, and online services. As an example of their scale, the group produces over 30,000 hours of journalistic and sports audio-visual content per year and the equivalent of 20 feature films of entertainment per day, has over 87 million users, and reaches over 78% of Brazil’s internet users.
Globo.com, a company in Grupo Globo, provides internet-related services and platforms to the group’s companies. The technology team uses a wide range of technologies and practices, including software defined networking (SDN), big data, AI, cloud computing, containers, NoSQL databases, web development technologies, continuous integration/delivery, and agile software development techniques. The company has also been an enthusiastic supporter and user of open source technologies since 2001. In addition to consuming a number of upstream community projects for internal efforts, Globo.com has also been a prolific contributor. Currently, the open source team hosts 375 open source projects on GitHub. Some of the major cloud-related projects are Tsuru, DBaaS, NetworkAPI, ACL API, FSaaS, and DNSaaS.
To serve the diverse and demanding needs of the group, Globo.com created an internal private cloud. The cloud, originally built in 2008, includes CloudStack, Xen hypervisor, OVS virtual switch, and Zabbix monitoring, all of which are open source technologies. The cloud has evolved to offer internal customers containers using Kubernetes and bare-metal resources in addition to virtual machines. In 2017, the Globo.com team operated over 5,000 servers that moved over 3.1TB/s of data. As an early user of open source cloud technologies, Globo.com learned that to ensure reliable platform delivery in this ever-changing environment, they needed to implement practices such as automated tests, agile software development, and continuous integration. The team also uses testbeds to run experiments to support each technical decision they make.
Beyond the sheer bandwidth, the scale of the network required to support the cloud is impressive. In 2017, it consisted of over 500 devices, 8,000 virtual networks, 6,300 VLANs, 50,000 access control lists (ACLs), and 1,700 tenant environments. To meet these demands, Globo.com has been constantly innovating in the area of networking. When Globo.com created their private network, there was no automated network management software that met their needs across different device models and vendors. This led them to develop their own in-house network management API. The API has been open sourced as the NetworkAPI project and is integrated with RabbitMQ as a queue service and Celery as consumer workers. A similar effort is the ACL API project, which manages ACLs.
Figure 1: Globo.com Network Architecture
The scale of the network also creates a significant cost. As a result, Globo.com was looking for techniques to lower networking costs without compromising security, capacity, high availability, or resiliency. More specifically, they were looking for technologies that enabled efficient use of network resources and bandwidth while evolving the infrastructure based on application requirements.
Globo.com found Software Defined Networking (SDN) to be a promising solution to the above problem since it separates the networking control plane from the data plane. By using a model-driven “infrastructure-as-code” approach around the control plane, their infrastructure could evolve as fast as their applications. Network Functions Virtualization (NFV) is also a promising technology since networking can be offered as a set of services in the Globo.com cloud infrastructure.
To achieve the promise of SDN/NFV, Globo.com selected OpenDaylight as a key technology. In addition to OpenDaylight, they are using or evaluating other open source technologies to embrace a full open networking stack. These projects include Mininet network simulator, OpenConfig vendor-neutral model-driven network management, P4 language with white box switches, PNDA.io for network analysis, OPNFV to build NFV as a service, OVN as a control plane agent on top of virtual switches, POX SDN control application development environment, and Open Compute Project to use white box datacenter hardware.
Globo.com chose OpenDaylight as their main SDN controller since it has a large open source community behind it, good documentation, and well-defined interfaces for users and services. OpenDaylight is also fully modular, so it can be used as a control plane for many different use cases. Through its APIs it is possible to connect different Globo.com cloud services with very little effort. As it is pure software, it is easy to monitor, troubleshoot, and evolve rapidly.
In 2017, Globo.com had 6 OpenDaylight clusters across 56 servers. The team has also built an SDN lab where they replicate their datacenter production environment and run SDN experiments on it. They also perform partial deployments where solutions are gradually delivered to production, thus avoiding any network downtime.
OpenDaylight is already helping Globo.com with horizontal scaling through software, better distribution of applications on compute resources across datacenter racks, and deployment of NFV as a service. As a consequence, the software-based infrastructure is more dynamic. Notably though, the most tangible result of OpenDaylight has been to overcome previous limitations in deploying ACLs to switches.
Globo.com uses ACLs to control access to resources in their data centers. Each rack is a Point of Delivery (POD) that terminates layer 3 networks. Before OpenDaylight, ACLs were deployed on top-of-rack (ToR) switches, where they resided in limited-capacity TCAM chips in each ToR switch. While this approach performs well, TCAMs are very expensive and can store only about 2,000 ACL rules, which is a small subset of the total ACLs required by Globo.com for each POD.
With that in mind, Globo.com moved forward with a bifurcated approach where OpenDaylight communicates with and deploys application ACLs over the OpenFlow protocol to virtual switches, which in turn run inside a hypervisor/physical node. The Globo Network API and the ACL API mentioned above also supplement OpenDaylight in this exercise. Generic ACLs, on the other hand, continue to be deployed on ToR switches.
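As a rough illustration of what deploying an application ACL over OpenFlow can involve, the sketch below translates a single ACL rule into a flow entry shaped like the flow model used by OpenDaylight's OpenFlow plugin. The `AclRule` class, the `acl_to_flow` helper, the sample addresses, and the use of table 0 are assumptions made for this example and are not taken from Globo.com's NetworkAPI or ACL API. The JSON field names should be checked against the OpenDaylight release in use.

```python
import json
from dataclasses import dataclass


# Hypothetical ACL rule; not the schema used by Globo.com's ACL API.
@dataclass(frozen=True)
class AclRule:
    rule_id: str
    src_cidr: str
    dst_cidr: str
    tcp_dst_port: int
    permit: bool
    priority: int = 100


def acl_to_flow(rule: AclRule) -> dict:
    """Build a flow entry for one TCP ACL rule.

    Keys are modeled on the OpenDaylight OpenFlow plugin flow model
    (an assumption; verify against the deployed release).
    """
    flow = {
        "id": rule.rule_id,
        "table_id": 0,  # assumed: ACLs live in the first table
        "priority": rule.priority,
        "match": {
            "ethernet-match": {"ethernet-type": {"type": 0x0800}},  # IPv4
            "ip-match": {"ip-protocol": 6},  # TCP
            "ipv4-source": rule.src_cidr,
            "ipv4-destination": rule.dst_cidr,
            "tcp-destination-port": rule.tcp_dst_port,
        },
    }
    if rule.permit:
        # Permit: hand the packet to the switch's normal forwarding path.
        flow["instructions"] = {
            "instruction": [{
                "order": 0,
                "apply-actions": {"action": [{
                    "order": 0,
                    "output-action": {"output-node-connector": "NORMAL"},
                }]},
            }]
        }
    # Deny: in OpenFlow, a matching flow with no instructions drops the packet.
    return {"flow-node-inventory:flow": [flow]}


rule = AclRule("acl-web-1", "10.0.1.0/24", "10.0.2.10/32", 443, permit=True)
payload = acl_to_flow(rule)
print(json.dumps(payload, indent=2))
```

A client would then submit this payload to the controller's northbound REST interface. That step is omitted here because the endpoint path depends on the OpenDaylight version.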
This separation provides several important advantages. In the previous approach, it would take the team 25 seconds to deploy a set of 100 ACLs on 2 ToR switches. Through the OpenDaylight solution, this time has been reduced to 9 seconds for the same set of ACLs. More importantly, since the ACLs are distributed on virtual switches, the number of ACLs can be scaled dramatically and the granularity of ACL distribution has improved. ACLs on virtual switches use host memory instead of TCAMs; as a result there is more than adequate space to store the necessary number of ACLs, which currently scales up to 50,000.
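The headline percentages can be checked directly from the figures in the text. The short calculation below is only a sanity check; it assumes the 25x figure compares the roughly 2,000-rule TCAM limit with the 50,000 ACLs now supported.

```python
# Figures quoted in the text.
old_seconds, new_seconds = 25, 9           # time to deploy a set of 100 ACLs
tcam_rules, vswitch_rules = 2_000, 50_000  # ACL capacity before and after

# Relative reduction in deployment time, as a percentage.
time_reduction_pct = (old_seconds - new_seconds) / old_seconds * 100

# How many times more ACLs fit once they live in host memory.
acl_scale_factor = vswitch_rules / tcam_rules

# The same time saving expressed as a speedup.
speedup = old_seconds / new_seconds

print(f"Deployment time reduction: {time_reduction_pct:.0f}%")  # 64%
print(f"ACL capacity increase: {acl_scale_factor:.0f}x")        # 25x
print(f"Speedup: {speedup:.1f}x")                               # 2.8x
```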
Figure 2: Base Workflow of the Globo.com Solution to Deploy ACLs
Figure 3: Reduction in ACL deployment time
In addition to the ACL activity, Globo.com has also conducted load experiments on OpenDaylight with the following findings per controller:
- 300,000 individual flows
- 62 requests per second
- 222 millisecond mean request completion time
- 9 virtual switches without losing performance
Globo.com continues to better characterize these numbers and use them for both internal planning and external community education purposes.
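To show how per-controller figures like these could feed internal planning, here is a minimal back-of-the-envelope sketch. It assumes flows spread evenly across controllers and that capacity scales linearly, with no headroom or redundancy, which is a simplification. The one-million-flow target and the function names are hypothetical.

```python
import math

# Per-controller findings quoted above.
MAX_FLOWS_PER_CONTROLLER = 300_000
REQUESTS_PER_SECOND = 62
MEAN_COMPLETION_TIME_S = 0.222


def controllers_needed(total_flows: int) -> int:
    """Minimum controllers for a flow count, assuming linear scaling."""
    return math.ceil(total_flows / MAX_FLOWS_PER_CONTROLLER)


def mean_requests_in_flight() -> float:
    """Little's law (L = arrival rate * mean time in system).

    Estimates how many requests a controller is handling at once in
    steady state at the measured rate and completion time.
    """
    return REQUESTS_PER_SECOND * MEAN_COMPLETION_TIME_S


# Hypothetical planning target of one million flows.
print(controllers_needed(1_000_000))        # 4
print(f"{mean_requests_in_flight():.1f}")   # 13.8
```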
Community Engagement and Future Direction
Globo.com has faced some scaling issues, feature and documentation gaps, and inadequate benchmarks. They also identified several API and interoperability issues with networking vendor equipment. Commendably, the team views this as an opportunity to fill these gaps by contributing to the community through blog posts, talks, meetings, and reports such as this one.
For future initiatives, Globo.com is exploring using OpenDaylight for the edge. OpenDaylight is an effective control plane for this solution in which the team will automatically update peering routes to maximize egress throughput and integrate with Kubernetes BGP. Another opportunity is to start managing ToR switches through OpenConfig and automatically communicate between overlay and underlay networks.
In summary, SDN and NFV solutions together enable network transformation for Globo.com as they decouple network control from hardware appliances to permit scaling without making hardware changes or being locked to a specific vendor solution. Globo.com has chosen OpenDaylight as their primary SDN controller platform. The initial use case is to use OpenDaylight to deploy ACLs on virtual switches. The previous solution used TCAMs, and since TCAMs are small, expensive, and limited, using OpenDaylight for ACL deployment reduces the overall network hardware cost and turns the environment into a dynamic, software-based infrastructure while reducing vendor lock-in.
Grupo Globo https://grupoglobo.globo.com/
Globo.com github https://github.com/globocom
Globo.com open source contributions https://opensource.globo.com/
Open Compute Project https://www.opencompute.org/