All posts by Martin Villalba

Evaluating Netflix OSS tools using ZeroToDocker images in Azure

Introduction

ZeroToDocker is a project that allows anyone with a Docker host to run a single node of any Netflix OSS technology with a single command. The portability of Docker allows us to run the tools locally or in different cloud environments such as AWS or Azure. However, it is important to keep in mind that some of the Netflix OSS tools work only in AWS. In these cases, although we could start a Docker container running the application in other environments such as Azure, the tools won’t be able to provide the expected functionality.

If you are not familiar with Docker and you would like to read more about it, you can check out this blog post.

Available Docker Images

Netflix OSS provides Docker images for the following tools:

  • Genie
  • Inviso
  • Atlas
  • Asgard
  • Eureka
  • Edda
  • A Karyon based Hello World Example
  • A Zuul Proxy used to proxy to the Karyon service
  • Security Monkey
  • Exhibitor managed Zookeeper

It is important to keep in mind that these images are not intended to be used in production environments.

Additionally, as we mentioned before, some of the Netflix OSS services corresponding to the images offer functionalities associated exclusively with AWS:

  • Atlas: According to the Atlas wiki in the Zero to Docker repository, it appears that the Atlas image requires the AWS APIs in order to work.
  • Asgard: It offers a web interface for application deployments and cloud management in Amazon Web Services (AWS).
  • Edda: It polls AWS resources via AWS APIs and records the results.
  • Security Monkey: It monitors policy changes and alerts on insecure configurations in an AWS account.

Evaluating Netflix OSS

In this section we show how you can test Genie and the “Hello Netflix OSS” sample application in a Docker environment. This sample application is based on Karyon and interacts with Eureka and Zuul.

As we mentioned before, these images can be run in different environments. In our case, we will test them in Azure.

If you would like to know how to set up an Azure VM with Docker, please take a look at these posts:

Running Genie on Docker

This section describes the steps to set up Genie 2.2.1 using the Docker image. If you’re looking for a different version, please see the list of releases here. The steps described in this document are based on the instructions provided here.

Please keep in mind that the Docker image we will use is not considered production-ready.

Configuration

This section describes how to set up and configure the containers required to run the example.

Setup MySQL

The first step is to set up MySQL. In order to start a new container running the MySQL image, we need to run the following command:

docker run --name mysql-genie -e MYSQL_ROOT_PASSWORD=genie -e MYSQL_DATABASE=genie -d mysql:5.6.21

 

If you don’t have the MySQL image on your host, it will be downloaded. Otherwise, the container will start using the existing image:

clip_image001

The previous command will start a container named “mysql-genie”. We’ll use that name later to reference this container from Genie in order to establish a connection.
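Optionally, if you prefer to download the image ahead of time instead of letting “docker run” fetch it on first use, you can pull it explicitly:

docker pull mysql:5.6.21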

To verify if MySQL is running properly, we can do 2 things:

  • Run the “docker ps” command to check that the container is running

    clip_image002

  • Access the MySQL container
    • Run the following command to access the MySQL container.

docker exec -it mysql-genie mysql -pgenie

      clip_image003

    • Additionally, we can execute the “show Databases” command to make sure the “genie” database was created:

      show Databases;

      clip_image004

      We can see that there is a database called “genie”. If we check the tables of that database, we’ll see that it is empty.

    • Run the “use genie” command to switch to the genie database.

      use genie;

      clip_image005

    • Execute the “show Tables;” command. No information will be displayed.
    • Finally, exit the MySQL container by running “exit”.

Set up Hadoop to Run Example

We just need to run the “sequenceiq/hadoop-docker” image. In this case, we will run the command in interactive mode to be able to configure our Hadoop container and verify that everything is working as expected.

docker run --name hadoop-genie -it -p 10020:10020 -p 19888:19888 -p 211:8088 sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -bash

clip_image006

Since we already have an endpoint configured for port 211 in our Azure VM, we included the port mapping “211:8088” to be able to access the Hadoop Resource manager.
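If you don’t have such an endpoint yet, it can be created with the Azure cross-platform CLI. The following is just a sketch: it assumes the classic (ASM) “azure” CLI is installed and that the VM is named “vm-docker-demo”, as in our example:

azure vm endpoint create vm-docker-demo 211 211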

Once we have Hadoop running, we will modify the /etc/hosts file. We will add “hadoop-genie” (the container name) after the container ID on the first line (space separated). This will allow the Hadoop daemons to be resolved by container name when a job is later submitted from the Genie node.

So, we will run the following command to start editing the hosts file.

vi /etc/hosts

 

After editing the file, it should look as follows:

172.17.0.7 83893db7d234 hadoop-genie

127.0.0.1 localhost

::1 localhost ip6-localhost ip6-loopback

fe00::0 ip6-localnet

ff00::0 ip6-mcastprefix

ff02::1 ip6-allnodes

ff02::2 ip6-allrouters
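As an alternative to editing the file by hand, an equivalent entry can be appended with a single command. This is just a sketch; it assumes that “hostname -i” returns the container IP and “hostname” returns the container ID, which is the usual behavior inside a Docker container:

echo "$(hostname -i) $(hostname) hadoop-genie" >> /etc/hosts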

 

Finally, we will start the Job History Server:

/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

clip_image007

Running the “jps” command, we should see something like this:

bash-4.1# jps

356 SecondaryNameNode

1001 Jps

190 DataNode

514 ResourceManager

112 NameNode

598 NodeManager

932 JobHistoryServer

 

We will leave Hadoop running in the current SSH client and open a new one to start working with Genie.

Run the Genie Container

Once we have opened a new SSH client, accessed our VM and configured Docker properly, we can run the Genie container.

In our case, we have an endpoint in our Azure VM configured for port 210, so we will run the Genie container mapping public port 210 to container port 8080, which is the default Genie port.

docker run -p 210:8080 --name genie --link mysql-genie:mysql-genie --link hadoop-genie:hadoop-genie -d netflixoss/genie:2.2.1

clip_image008

Once the Genie container is running, we can verify if everything is working properly.

First, we will check the connection with the MySQL container from Genie:

  • Access the MySQL container by running:

    docker exec -it mysql-genie mysql -pgenie

  • Run the “use genie” command

    use genie;

    clip_image009

  • Finally, check the genie database tables by running:

    show tables;

    clip_image010

To exit the MySQL container, we need to run “exit”.

Once we have verified the connection with the database, we can access the Genie UI from a browser. We should be able to access Genie by accessing our VM URL and providing port 210. In our case, the URL is:

http://vm-docker-demo.cloudapp.net:210

clip_image011

As you can see, there are no clusters or jobs.

In order to finish our verification, we will check that the connection with Hadoop is working. To do this, we will access the Genie container by running:

docker exec -it genie /bin/bash

 

Then, we will ping the Hadoop container. We should see information about packets that have been sent and received successfully.

ping hadoop-genie

 

To stop the ping command, we will press “Ctrl + C”.

clip_image012

Run the example

We have everything in place, and we are ready to run the example. The example configures Genie with the Hadoop configuration information for the Hadoop container mentioned earlier, as well as two commands (Hadoop and Pig). Then, it will launch a Hadoop MR job, which is the example provided by Hadoop: “hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'”.

We are already in the Genie container, so we can start to run the example.

First, we’ll execute the setup script to register the Hadoop cluster and Hadoop / Pig commands.

/apps/genie/example/setup.py

clip_image013

In order to verify that everything was registered as expected, we can go to the Genie UI and check the commands and clusters sections:

  • Home page:

    clip_image014

  • Clusters view:

    clip_image015

  • Commands view:

    clip_image016

Additionally, we can verify that “excite.log.bz2” is in HDFS by using:

hadoop fs -ls

clip_image017

Finally, if everything is OK, we can run the example job.

/apps/genie/example/run_hadoop_job.py

clip_image018

Once the Job is started, we will see it in the Genie UI:

clip_image019

If we access the Jobs section, we should see it there too:

clip_image020

We can also see the output of the Job by accessing:

http://{azure-vm-dns-name}:210/genie-jobs/{job-id}

clip_image021

Here, we can check the different logs.

When the job finishes executing, the console should resemble the following:

clip_image022

That’s it! We ran Genie and submitted a job to a Hadoop cluster. Additionally, we were able to check the different logs corresponding to the job.

Running the “Hello Netflix OSS” sample on Docker

This section describes how to run the “Hello Netflix OSS” sample image in combination with Eureka and Zuul, running the corresponding Docker images in an Azure VM.

Please take into account that the Docker images we will use are not considered production-ready.

Run Eureka

The first thing we will do is to start a container running the Eureka image. Since we have an endpoint for port 212 in our Azure VM, we will map it to container port 8080 since Eureka is accessible through that port.

docker run -d -p 212:8080 --name eureka netflixoss/eureka:1.1.147

clip_image001[6]

After running the image, we should be able to access the Eureka page through the following URL:

http://{azure-vm-dns-name}:212/eureka

In our case, the URL is:

http://vm-docker-demo.cloudapp.net:212/eureka/

clip_image002

Hello Netflix OSS sample application

Once Eureka is running, we can start the sample image:

docker run -d -p 213:8080 -p 214:8077 --name hello-netflix-oss --link eureka:eureka netflixoss/hello-netflix-oss:1.0.27

clip_image003[5]

In this case, we configured the ports to be able to access the application through port 213 and the embedded Karyon admin services console through port 214.

Additionally, we defined a link with the Eureka container to allow Eureka to communicate with the sample application container.

Zuul

Although the application is already accessible, we will start a container running Zuul to access the sample application through Zuul.

So, we need to start a container running the Zuul image and link it to Eureka.

docker run -e "origin.zuul.client.DeploymentContextBasedVipAddresses=HELLO-NETFLIX-OSS" -p 210:8080 -d --name zuul --link eureka:eureka netflixoss/zuul:1.0.28

clip_image006[6]

Here, we are making Zuul accessible through port 210 of our Azure VM. If we access the Zuul port, we will see that the sample application is displayed:

clip_image007[6]

At this point, if we check the Eureka application, it will show both applications: Zuul and the sample application.

clip_image009

In this scenario we were able to run an application based on Karyon, establish communication with Eureka and access the application through Zuul.

Netflix OSS – Security Tools

Netflix has released different security tools and solutions to the open source community. The security-related open source efforts generally fall into one of two categories:

  • Operational tools and systems to make security teams more efficient and effective when securing large and dynamic environments
  • Security infrastructure components that provide critical security services for modern distributed systems.

Below you can find further information about some of the security tools released by Netflix.

Security Monkey

Security Monkey monitors policy changes and alerts on insecure configurations in an AWS account. While Security Monkey’s main purpose is security, it also proves to be a useful tool for tracking down potential problems, as it is essentially a change tracking system.

A Docker image is available for it, but its functionality only works with AWS.

Scumblr

Scumblr is a web application that allows performing periodic searches and storing/taking actions on the identified results. Scumblr uses the Workflowable gem to allow setting up flexible workflows for different types of results.

Workflowable is a gem that allows easy creation of workflows in Ruby on Rails applications. Workflows can contain any number of stages and transitions, and can trigger customizable automated actions as states are triggered.

Scumblr searches utilize plugins called Search Providers. Each Search Provider knows how to perform a search via a certain site or API (Google, Bing, eBay, Pastebin, Twitter, etc.). Searches can be configured from within Scumblr based on the options available by the Search Provider. Examples of things you might want to look for are:

  • Compromised credentials
  • Vulnerability / hacking discussion
  • Attack discussion
  • Security relevant social media discussion

Message Security Layer

Message Security Layer (MSL) is an extensible and flexible secure messaging framework that can be used to transport data between two or more communicating entities. Data may also be associated with specific users, and treated as confidential or non-replayable if so desired.

MSL does not attempt to solve any specific use case or communication scenario. Rather, it is capable of supporting a wide variety of applications and leveraging external cryptographic resources. There is no one-size-fits-all implementation or configuration; proper use of MSL requires the application designer to understand their specific security requirements.

Netflix OSS – Insight, Reliability and Performance Tools

As part of the Netflix OSS platform, Netflix has released tools to get operational insight into an application, collect different kinds of metrics and validate reliability by ensuring that the application can support different kinds of failures.

In this blog post we list and briefly describe some of these tools.

Atlas

Atlas was developed by Netflix to manage dimensional time series data for near real-time operational insight. Atlas features in-memory data storage, allowing it to gather and report very large numbers of metrics very quickly.

Atlas captures operational intelligence. Whereas business intelligence is data gathered for the purpose of analyzing trends over time, operational intelligence provides a picture of what is currently happening within a system.

Edda

Edda is a service that polls your AWS resources via AWS APIs and records the results. It allows you to quickly search through your resources and shows you how they have changed over time.

Previously this project was known within Netflix as Entrypoints (and mentioned in some blog posts), but the name was changed as the scope of the project grew. Edda (meaning “a tale of Norse mythology”) seemed appropriate for the new name, as the application records the tales of Asgard.

Spectator

Spectator is a simple library for instrumenting code to record dimensional time series. When running at Netflix with the standard platform, use the spectator-nflx-plugin library to get bindings for internal tools like Atlas and Chronos.

Vector

Vector is an open source on-host performance monitoring framework which exposes hand-picked, high-resolution system and application metrics to every engineer’s browser. Having the right metrics available on-demand and at a high resolution is key to understanding how a system behaves to correctly troubleshoot performance issues.

Vector provides a simple way for users to visualize and analyze system and application-level metrics in near real-time. It leverages the battle tested open source system monitoring framework, Performance Co-Pilot (PCP), layering on top a flexible and user-friendly UI.

Ice

Ice provides a bird’s-eye view of a large and complex cloud landscape from a usage and cost perspective. Cloud resources are dynamically provisioned by dozens of service teams within the organization, and any static snapshot of resource allocation has limited value.

Ice is a Grails project. It consists of three parts: processor, reader and UI. Processor processes the Amazon detailed billing file into data readable by reader. Reader reads data generated by processor and renders them to UI. UI queries reader and renders interactive graphs and tables in the browser.

Ice communicates with AWS Programmatic Billing Access and maintains knowledge of the following key AWS entity categories:

  • Accounts
  • Regions
  • Services (e.g. EC2, S3, EBS)
  • Usage types (e.g. EC2 – m1.xlarge)
  • Cost and Usage Categories (On-Demand, Reserved, etc.)

The UI allows you to filter directly on the above categories to custom tailor your view.

Simian Army

The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.

Simian Army consists of services (Monkeys) in the cloud for generating various kinds of failures, detecting abnormal conditions, and testing our ability to survive them. The goal is to keep our cloud safe, secure, and highly available.

Currently the simians include Chaos Monkey, Janitor Monkey, and Conformity Monkey.

Netflix OSS – Data Persistence Tools Overview

Handling a huge amount of data operations per day required Netflix to improve existing open source software with their own tools. The cloud usage at Netflix and the scale at which it consumes/manages data has required them to build tools and services that enhance the datastores they use.

In this blog post we list and briefly describe some of the tools released by Netflix to store and serve data in the cloud.

EVCache

EVCache is a memcached & spymemcached based caching solution that is mainly used for AWS EC2 infrastructure for caching frequently used data.

EVCache is an abbreviation for:

  • Ephemeral – The data stored is for a short duration as specified by its TTL (Time To Live)
  • Volatile – The data can disappear at any time (Evicted)
  • Cache – An in-memory key-value store

It offers the following features:

  • Distributed Key-Value store, i.e., the cache is spread across multiple instances
  • AWS Zone-Aware – Data can be replicated across zones
  • Registers and works with Eureka for automatic discovery of new nodes/services

Dynomite

Dynomite is a thin, distributed dynamo layer for different storages and protocols.

It is a generic dynamo implementation that can be used with many different key-value pair storage engines. Currently these include Redis and Memcached. Dynomite supports multi-datacenter replication and is designed for high availability.

The ultimate goal with Dynomite is to be able to implement high availability and cross-datacenter replication on storage engines that do not inherently provide that functionality. The implementation is efficient, not complex (few moving parts), and highly performant.

Astyanax

Astyanax is a high level Java client for Apache Cassandra. Apache Cassandra is a highly available column oriented database.

It borrows many concepts from Hector but diverges in the connection pool implementation as well as the client API. One of the main design considerations was to provide a clean abstraction between the connection pool and Cassandra API so that each may be customized and improved separately. Astyanax provides a fluent style API which guides the caller to narrow the query from key to column as well as providing queries for more complex use cases that we have encountered. The operational benefits of Astyanax over Hector include lower latency, reduced latency variance, and better error handling.

Some of the features provided by this client are:

  • High level, simple object oriented interface to Cassandra
  • Fail-over behavior on the client side
  • Connection pool abstraction. Implementation of a round robin connection pool
  • Monitoring abstraction to get event notification from the connection pool
  • Complete encapsulation of the underlying Thrift API and structs
  • Automatic retry of downed hosts
  • Automatic discovery of additional hosts in the cluster
  • Suspension of hosts for a short period of time after several timeouts
  • Annotations to simplify use of composite columns

Dyno

Dyno is Netflix’s home-grown Java client for Dynomite. Dynomite adds sharding and replication on top of Redis and Memcached as underlying datastores, and the Dynomite server implements the underlying datastore protocol and presents that as its public interface. Hence, one can just use popular Java clients like Jedis, Redisson and SpyMemcached to directly speak to Dynomite. Dyno encapsulates client-side complexity and best practices in one place instead of having every application repeat the same engineering effort, e.g., topology aware routing, effective failover, load shedding with exponential backoff, etc.

Dyno implements patterns inspired by Astyanax on top of popular clients like Jedis, Redisson and SpyMemcached.

Some of Dyno’s features are:

  • Connection pooling of persistent connections – this helps reduce connection churn on the Dynomite server with client connection reuse.
  • Topology aware load balancing (Token Aware) for avoiding any intermediate hops to a Dynomite coordinator node that is not the owner of the specified data
  • Application specific local rack affinity based request routing to Dynomite nodes
  • Application resilience by intelligently failing over to remote racks when local Dynomite rack nodes fail
  • Application resilience against network glitches by constantly monitoring connection health and recycling unhealthy connections
  • Capability of surgically routing traffic away from any nodes that need to be taken offline for maintenance
  • Flexible retry policies such as exponential backoff, etc.
  • Insight into connection pool metrics
  • Highly configurable and pluggable connection pool components for implementing your advanced features

Netflix OSS – Common Runtime Services & Libraries

Netflix has released as open source software several of the tools, libraries and services they use to power microservices. The cloud platform is the foundation and technology stack for the majority of the services within Netflix. This platform consists of cloud services, application libraries and application containers.

Below you can find information about the services and libraries used by Netflix that were released as open source software.

Eureka

Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. We call this service the Eureka Server. Eureka also comes with a Java-based client component, the Eureka Client, which simplifies the interactions with the service. The client also has a built-in load balancer that does basic round-robin load balancing. At Netflix, a much more sophisticated load balancer wraps Eureka to provide weighted load balancing based on several factors (like traffic, resource usage, error conditions, etc.) to provide superior resiliency.

Apart from playing a critical part in mid-tier load balancing, at Netflix, Eureka is used for the following purposes.

  • For aiding Netflix Asgard in:
    • Fast rollback of versions in case of problems, avoiding the relaunch of hundreds of instances which could take a long time
    • Rolling pushes, to avoid propagation of a new version to all instances in case of problems
  • For Cassandra deployments to take instances out of traffic for maintenance
  • For Memcached caching services to identify the list of nodes in the ring
  • For carrying other additional application specific metadata about services for various other reasons

Archaius

Archaius is a configuration management library with a focus on Dynamic Properties sourced from multiple configuration stores. It includes a set of configuration management APIs used by Netflix. It is primarily implemented as an extension of Apache’s Commons Configuration Library.

It provides the following functionalities:

  • Dynamic, Typed Properties
  • High throughput and Thread Safe Configuration operations
  • A polling framework that allows obtaining property changes of a Configuration Source
  • A Callback mechanism that gets invoked on effective/”winning” property mutations (in the ordered hierarchy of Configurations)
  • A JMX MBean that can be accessed via JConsole to inspect and invoke operations on properties
  • Out of the box, Composite Configurations (With ordered hierarchy) for applications (and most web applications willing to use convention based property file locations)
  • Implementations of dynamic configuration sources for URLs, JDBC and Amazon DynamoDB
  • Scala dynamic property wrappers

At the heart of Archaius is the concept of a Composite Configuration which can hold one or more Configurations. Each Configuration can be sourced from a Configuration Source such as: JDBC, REST, .properties file, etc.

Ribbon

Ribbon is an Inter Process Communication (remote procedure calls) library with built-in software load balancers. The primary usage model involves REST calls with various serialization scheme support.

Ribbon is a client-side IPC library that is battle-tested in cloud. It provides the following features:

  • Load balancing
  • Fault tolerance
  • Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model
  • Caching and batching

There are three sub projects:

  • ribbon-core: includes load balancer and client interface definitions, common load balancer implementations, integration of client with load balancers and client factory
  • ribbon-eureka: includes load balancer implementations based on Eureka client, which is the library for service registration and discovery
  • ribbon-httpclient: includes the JSR-311 based implementation of REST client integrated with load balancers

Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Hystrix is designed to do the following:

  • Give protection from and control over latency and failure from dependencies accessed (typically over the network) via third-party client libraries
  • Stop cascading failures in a complex distributed system
  • Fail fast and rapidly recover
  • Fallback and gracefully degrade when possible
  • Enable near real-time monitoring, alerting, and operational control

Karyon

Karyon is a framework and library that essentially contains the blueprint of what it means to implement a cloud-ready web service. All the other fine grained web services and applications that form our SOA graph can essentially be thought of as being cloned from this basic blueprint.

Karyon can be thought of as a nucleus that contains the following main ingredients:

  • Bootstrapping, dependency and lifecycle management (via Governator)
  • Runtime Insights and Diagnostics (via karyon-admin-web module)
  • Configuration Management (via Archaius)
  • Service discovery (via Eureka)
  • Powerful transport module (via RxNetty)

Governator

Governator is a library of extensions and utilities that enhance Google Guice to provide classpath scanning and automatic binding, lifecycle management, configuration to field mapping, field validation and parallelized object warmup.

Prana

Prana is a sidecar for your NetflixOSS based services. It simplifies the integration with NetflixOSS services since it exposes Java based client libraries of various services like Eureka, Ribbon, and Archaius over HTTP. It makes it easy for applications, especially those written in non-JVM languages, to exist in the NetflixOSS eco-system.

Prana is a Karyon & RxNetty based application that exposes features of java-based client libraries of various NetflixOSS services over an HTTP API. It is conceptually “attached” to the main application and complements it by providing features that are otherwise available as libraries within a JVM-based application.

Prana is used extensively within Netflix alongside applications built in non-JVM programming languages like Python and Node.js, or services like Memcached, Spark, and Hadoop.

Prana’s features include:

  • Advertising applications via the Eureka Service Discovery Service
  • Discovery of hosts of an application via Eureka
  • Health Check of services
  • Load Balancing http requests via Ribbon
  • Fetching Dynamic Properties via Archaius

Zuul

Zuul is an edge service that provides dynamic routing, monitoring, resiliency and security among other things. It is the front door for all requests from devices and web sites to the backend of the Netflix streaming application. As an edge service application, Zuul is built to enable dynamic routing, monitoring, resiliency and security. It also has the ability to route requests to multiple Amazon Auto Scaling Groups.

Zuul uses a range of different types of filters that enable it to quickly and nimbly apply functionality to the edge service. These filters help Netflix perform the following functions:

  • Authentication and Security – identifying authentication requirements for each resource and rejecting requests that do not satisfy them
  • Insights and Monitoring – tracking meaningful data and statistics at the edge in order to give us an accurate view of production
  • Dynamic Routing – dynamically routing requests to different backend clusters as needed
  • Stress Testing – gradually increasing the traffic to a cluster in order to gauge performance
  • Load Shedding – allocating capacity for each type of request and dropping requests that exceed the limit
  • Static Response handling – building some responses directly at the edge instead of forwarding them to an internal cluster
  • Multiregion Resiliency – routing requests across AWS regions in order to diversify our ELB usage and move our edge closer to our members

At the center of Zuul is a series of Filters that are capable of performing a range of actions during the routing of HTTP requests and responses.
The following are the key characteristics of a Zuul Filter:

  • Type: most often defines the stage during the routing flow when the Filter will be applied (although it can be any custom string)
  • Execution Order: applied within the Type, defines the order of execution across multiple Filters
  • Criteria: the conditions required in order for the Filter to be executed
  • Action: the action to be executed if the Criteria is met

Zuul provides a framework to dynamically read, compile, and run these Filters. Filters do not communicate with each other directly – instead they share state through a RequestContext which is unique to each request.

Zuul contains multiple components:

  • zuul-core – library which contains the core functionality of compiling and executing Filters
  • zuul-simple-webapp – webapp which shows a simple example of how to build an application with zuul-core
  • zuul-netflix – library which adds other NetflixOSS components to Zuul – using Ribbon for routing requests, for example
  • zuul-netflix-webapp – webapp which packages zuul-core and zuul-netflix together into an easy-to-use package

Netflix OSS – Build and Delivery Tools Overview

Among the build and delivery tools released by Netflix as part of the Netflix OSS platform, you can find build resources such as Nebula (which makes Gradle plugins easy to build, test and deploy), as well as tools to manage resources in AWS and to support deployments to that platform.

Below you can find a brief description of some of the build and delivery tools released by Netflix.

Nebula

The nebula-plugins organization was set up to facilitate the generation, governance, and releasing of Gradle plugins. This is done by providing a space to host plugins in SCM, a CI environment, and a repository. A single GitHub organization is used, to which anyone or any plugin can be added. CloudBees jobs are created for every plugin to provide a standard set of jobs. Releases are posted to Bintray, proxied to JCenter and synced to Maven Central.

Aminator

Aminator is a tool for creating EBS AMIs. This tool currently works for CentOS/RedHat Linux images and is intended to run on an EC2 instance.
It creates a custom AMI from just:

  • A base AMI ID
  • A link to a deb or rpm package that installs your application.

This is useful for many AWS workflows, particularly ones that take advantage of auto-scaling groups.

Asgard

Asgard is a web-based tool for managing cloud-based applications and infrastructure. It offers a web interface for application deployments and cloud management in Amazon Web Services (AWS).

Netflix has been using Asgard for cloud deployments since early 2010. It was initially named the Netflix Application Console.

Netflix OSS – Big Data Tools Overview

Behind the scenes, Netflix has a rich ecosystem of big data technologies facilitating their algorithms and analytics. They use and contribute to broadly-adopted open source technologies including Hadoop, Hive, Pig, Parquet, Presto, and Spark. Additionally, they have developed and contributed some additional tools and services which have further elevated their data platform.

Below you can find information about some tools of the Netflix OSS platform that offer functionalities associated with big data.

Genie

Genie is a federated job execution engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Presto, Sqoop and more. It also provides APIs for managing many distributed processing cluster configurations and the commands and applications which run on them.

From the perspective of the end-user, Genie abstracts away the physical details of various (potentially transient) computational resources (like YARN, Spark, Mesos clusters etc.). It then provides APIs to submit and monitor jobs on these clusters without users having to install any clients themselves or know details of the clusters and commands.

A big advantage of this model is the scalability that it provides for client resources. This solves a very common problem where a single machine is configured as an entry point to submit jobs to large clusters and the machine gets overloaded. Genie allows the use of a group of machines which can increase and decrease in number to handle the increasing load, providing a very scalable solution.

Within Netflix it is primarily used to handle scaling out jobs for their big data platform in the cloud, but it doesn’t require AWS or the cloud to benefit users.

Inviso

Inviso is a lightweight tool that provides the ability to search for Hadoop jobs, visualize the performance, and view cluster utilization.

This tool is based on the following components:

  • REST API for Job History: REST endpoint to load an entire job history file as a json object
  • ElasticSearch: Search jobs and correlate Hadoop jobs for Pig and Hive scripts
  • Python Scripts: Scripts to index job configurations into ElasticSearch for querying. These scripts can accommodate a pub/sub model for use with SQS or another queuing service to better distribute the load or allow other systems to know about job events.
  • Web UI: Provides an interface to search and visualize jobs and cluster data

Lipstick

Lipstick combines a graphical depiction of a Pig workflow with information about the job as it executes, giving developers insight that previously required a lot of sifting through logs (or a Pig expert) to piece together.

Aegisthus

Aegisthus is a Bulk Data Pipeline out of Cassandra. It implements a reader for the SSTable format and provides a map/reduce program to create a compacted snapshot of the data contained in a column family.

Netflix OSS Overview

Netflix is considered one of the biggest cloud applications out there. As such, the people at Netflix have faced many different kinds of challenges trying to avoid failures in their service. Over time, they implemented different tools to support and improve their cloud environment, making the Netflix application more reliable, fault-tolerant and highly available.

The really good news is that Netflix has made some of these tools open source. There are now tools available to make a cloud environment more reliable, coming from a company that uses them in a huge infrastructure. One thing to consider is that Netflix utilizes AWS for services and content delivery and, as a consequence, some of the implemented tools provide functionalities for this particular cloud environment. However, other tools offer more generic features that can be used in other environments.

It is important to consider that, although Netflix has embraced the open source concept, the shared code provides solutions for computing. The company is not sharing any of its innovations and technology around streaming video.

Open Source tools

The different tools released as part of the Netflix OSS platform can be categorized according to the functionality they provide. In this section you can find a brief description of these categories.

Additionally, you can find further information about the different categories and the associated tools in the blog posts that are referenced at the end of each category description. It is important to keep in mind that, at this point, more than 50 projects can be found in the Netflix OSS GitHub repository. As a consequence, we list and describe only the main tools corresponding to each category.

Big Data Tools

Behind the scenes, Netflix has a rich ecosystem of big data technologies facilitating their algorithms and analytics. They use and contribute to broadly-adopted open source technologies including Hadoop, Hive, Pig, Parquet, Presto, and Spark. Additionally, they have developed and contributed some additional tools and services which have further elevated their data platform.

To know more about the main tools of the Netflix OSS platform that offer functionalities associated with big data, please check out the Netflix OSS – Big Data Tools Overview blog post.

Build and Delivery Tools

In this category you can find build resources such as Nebula, which makes Gradle plugins easy to build, test and deploy. Additionally, this category includes tools to manage resources in AWS and to support deployments to this platform.

In the Netflix OSS – Build and Delivery Tools Overview blog post you can find more information about the available tools.

Common Runtime Services & Libraries

In this category you can find tools, libraries and services to power microservices. The cloud platform is the foundation and technology stack for the majority of the services within Netflix. This platform consists of cloud services, application libraries and application containers.

Take a look at the Netflix OSS – Common Runtime Services & Libraries Overview blog post to know more about the services and libraries used by Netflix that were released as open source software.

Data Persistence Tools

Handling a huge amount of data operations per day required Netflix to improve existing open source software with their own tools. The cloud usage at Netflix and the scale at which it consumes/manages data has required them to build tools and services that enhance the datastores they use.

In this category you will find tools to store and serve data in the cloud. Take a look at the Netflix OSS – Data Persistence Tools Overview blog post to read more about these tools.

Insight, Reliability and Performance Tools

In this category you can find tools to get operational insight into an application, collect different kinds of metrics and validate reliability by ensuring that the application can support different kinds of failures.

In the Netflix OSS – Insight, Reliability and Performance Tools Overview blog post you can find more information about these tools.

Security Tools

Netflix has released different security tools and solutions to the open source community. The security-related open source efforts generally fall into one of two categories:

  • Operational tools and systems to make security teams more efficient and effective when securing large and dynamic environments
  • Security infrastructure components that provide critical security services for modern distributed systems.

Check out the Netflix OSS – Security Tools Overview blog post to find further information about some of the security tools released by Netflix.

Getting Started

There are different ways to start working with the Netflix OSS tools.

The Zero to Cloud workshop offers a tutorial focused on bringing up the Netflix OSS stack on a fresh AWS account, in a similar style to how Netflix does it internally. To try it, you would need to have an AWS account and the required resources to set up the infrastructure.

Another way to start playing with the Netflix OSS tools is to analyze sample applications such as IBM Acme Air and Flux Capacitor. These applications use several of the Netflix OSS tools, so they could be useful to understand how the tools can be used outside Netflix. In this case, you may also need to have the proper cloud infrastructure to set up the tools and execute them.

Finally, the fastest way to test some of the Netflix OSS tools is to use ZeroToDocker. If you are familiar with Docker, you can use the Docker images provided by Netflix to get some of the tools up and running in just a few minutes. Additionally, since some of the tools do not require AWS to work, you can run and test them in other cloud environments or locally.

Docker Compose: Scaling Multi-Container Applications

Introduction

In the Docker Compose: Creating Multi-Container Applications blog post we talked about Docker Compose, the benefits it offers, and used it to create a multi-container application. Then, in the Running a .NET application as part of a Docker Compose in Azure blog post, we explained how to create a multi-container application composed of a .NET web application and a Redis service. So far, so good.

However, although we can easily get multi-container applications up and running using Docker Compose, in real environments (e.g. production) we need to ensure that our application will continue responding even if it receives numerous requests. In order to achieve this, those in charge of configuring the environment usually create multiple instances of the web application and set up a load balancer in front of them. So, the question here is: Could we do this using Docker Compose? Fortunately, Docker Compose offers a really simple way to create multiple instances of any of the services defined in the Compose.

Please notice that although Docker Compose is not considered production ready yet, the goal of this post is to show how a particular service can be easily scaled using this feature so you know how to do it when the final version is released.

Running applications in more than one container with “scale”

Docker Compose allows you to generate multiple containers for your services running the same image. Using the “scale” command in combination with a load balancer, we can easily configure scalable applications.

The “scale” command sets the number of containers to run for a particular service. For example, if we want to run a front end web application in 10 different containers we would use this command.
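For instance, assuming the front end service were defined in the docker-compose.yml file under the name “web” (a hypothetical name used only for this illustration), the command would look like this:

docker-compose scale web=10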

Considering the scenario we worked on in the Running a .NET application as part of a Docker Compose in Azure blog post, how could we scale the .NET web application service to run in 3 different containers at the same time? Let’s see…

Check/update the docker-compose.yml file

The first thing we need to do is ensure that the service we want to scale does not specify the external/host port. If we specify that port, the service cannot be scaled since all the instances would try to use the same host port. So, we just need to make sure that the service we want to scale only defines the private port in order to let Docker choose a random host port when the container instances are created.

But, how do we specify only the private port? The port value can be configured as follows:

  • If we want to specify the external/host port and the private port, the “ports” configuration would look like this:
    “<external-port>:<private-port>”
  • If we want to specify only the private port, this would be the “ports” configuration:
    “<private-port>”

In our scenario, we want to scale the .NET web application service called “net”; therefore, that service should not specify the external port. As you can see in our docker-compose.yml file displayed below, the ports specification for the “net” service only contains one port, which is the private one. So, we are good to go.

net:

  image: websiteondocker

  ports:

   - "210"

  links:

   - redis

redis:

  image: redis

 

Remember that the private port we specify here must be the one we provided when we published the .NET application from Visual Studio since the application is configured to work on that port.

Scaling a service

Now that we have the proper configuration in the docker-compose.yml file, we are ready to scale the web application.

If we don’t have our Compose running or have modified the docker-compose.yml file, we would need to recreate the Compose by running “docker-compose up -d“.

Once we have the Compose running, let’s check the containers we have running as part of the Compose by executing “docker-compose ps“:

clip_image002

As you can see, there is one container running that corresponds to the “net” service (.NET web application) and another container corresponding to the Redis service.

Now, let’s scale our web application to run in 3 containers. To do this, we just need to run the scale command as follows:

docker-compose scale net=3

 

In the previous command, “net” is the name of the service that we want to scale and “3” is the number of instances we want. As a result of running this command, 2 new containers running the .NET web application will be created.

clip_image002[7]

If we check the Docker Compose containers now, we’ll see the new ones:

clip_image003
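Since the scaled “net” containers only define the private port, Docker assigns a random host port to each new instance. If we want to know which host port a given instance got, we can ask Docker directly. This is a sketch assuming one of the container names generated by Compose in this scenario, “netcomposetest_net_3”:

docker port netcomposetest_net_3 210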

We need to consider that Docker Compose remembers the number of instances set in the scale command. So, from now on, every time we run “docker-compose up -d” to recreate the Compose, 3 containers running the .NET web application will be created. If we only want 1 instance of the web application again, we can run “docker-compose scale net=1”. In this case, Docker Compose will delete the extra containers.

At this point, we have 3 different containers running the .NET web application. But, how hard would it be to add a load balancer in front of these containers? Well, adding a load balancer container is pretty easy.

Configuring a load balancer

There are different proxy images that offer the possibility of balancing the load between different containers. We tested one of them: tutum/haproxy.

When we created the .NET web application, we included logic to display the name of the machine where the requests are processed:

@{

    ViewBag.Title = "Home Page";

}

<h3>Hits count: @ViewBag.Message</h3>

<h3>Machine Name: @Environment.MachineName</h3>

 

So, once we set a load balancer in front of the 3 containers, the application should display different container IDs.

Let’s create the load balancer. In our scenario, we can create a new container using the tutum/haproxy image to balance the load between the web application containers by applying any of the following methods:

  • Manually start the load balancer container:
    We can manually start a container running the tutum/haproxy image by running the command displayed below. We would need to provide the different web app container names in order to indicate to the load balancer where it should send the requests.

docker run -d -p 80:80 --link <web-app-1-container-name>:<web-app-1-container-name> --link <web-app-2-container-name>:<web-app-2-container-name> … --link <web-app-N-container-name>:<web-app-N-container-name> tutum/haproxy

 

  • Include the load balancer configuration as part of the Docker Compose:
    We can update the docker-compose.yml file in order to include the tutum/haproxy configuration. This way, the load balancer would start when the Compose is created and the site would be accessible just by running one command. Below, you can see what the configuration corresponding to the load balancer service would look like. The “haproxy” service definition specifies a link to the “net” service. This is enough to let the load balancer know that it should distribute the requests between the instances of the “net” service, which correspond to the .NET web application.

haproxy:

  image: tutum/haproxy

  links:

   - net

  ports:

   - "80:80"

 

In our scenario, we will apply the second approach since it allows us to start the whole environment by running just one command. Although in general we think that it is better to include the load balancer configuration in the Compose configuration file, please keep in mind that starting the load balancer together with the rest of the Compose may not always be the best solution. For example, if you scaled the web application service adding new instances and you want the load balancer to start considering those instances without the site being down too long, restarting the load balancer container manually may be faster than recreating the whole compose.

Continuing with our example, let’s update the “docker-compose.yml” file to include the “haproxy” service configuration.

First, open the file:

vi docker-compose.yml

 

Once the file opens, press i (“Insert”) to start editing the file. Here, we will add the configuration corresponding to the “haproxy” service:

haproxy:

  image: tutum/haproxy

  links:

   - net

  ports:

   - "80:80"

net:

  image: websiteondocker

  ports:

   - "210"

  links:

   - redis

redis:

  image: redis

 

Finally, to save the changes we made to the file, just press Esc and then :wq (write and quit).

At this point, we are ready to recreate the Compose by running “docker-compose up -d“.

image

As you can see in the previous image, the existing containers were recreated and additionally, a new container corresponding to the “haproxy” service was created.

So, Docker Compose started the load balancer container, but is the site working? Let’s check it out!

First, let’s look at the container we have running:

image

As you can see, the load balancer container is up and running in port 80. So, since we already have an endpoint configured in our Azure VM for this port, let’s access the URL corresponding to our VM.

clip_image001

The site is running! Please notice that the Container ID is displayed on the page. Checking the displayed value against the result we got from the “docker ps” command, we can see that the request was processed by the “netcomposetest_net_3” container.

If we reload the page, this time the request should be processed by a different container.

clip_image002

This time, the request was processed by the “netcomposetest_net_4” container.

At this point we have validated that the .NET web application is running in different containers and that the load balancer is working. Plus, we have verified that all the containers are consuming information from the same Redis service instance since, as you can see, the amount of hits increased even when the requests were processed by different web application instances.
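If we want to double-check that all the instances share the same counter, we can read the “hits” key directly from the Redis container. This is a sketch assuming the Redis container follows the usual Compose naming convention and is called “netcomposetest_redis_1”:

docker exec -it netcomposetest_redis_1 redis-cli get hits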

Now, what happens if we need to stop one of the web application containers? Do we need to stop everything? The answer is “No”. We can stop a container, and the load balancer will notice it and won’t send new requests to that container. The best thing here is that the site continues running!

Let’s validate this in our example. Since we have 3 web application containers running, we can stop 2 of them and then try to access the site.

To stop the containers, we can run the “docker stop <container-name>” command. Looking at the result we got from the “docker ps” command, we can see that our containers are called “netcomposetest_net_3“, “netcomposetest_net_4” and “netcomposetest_net_5“. Let’s stop the “netcomposetest_net_3” and “netcomposetest_net_4” containers.
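Using the names listed above, both containers can be stopped with a single command:

docker stop netcomposetest_net_3 netcomposetest_net_4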

clip_image003[1]

Now, if we reload the page, we will see that the site is still working!

clip_image004[1]

This time the request was processed by the only web application container we have running: “netcomposetest_net_5“.

If we keep reloading the page, we will see that all the requests are processed by this container.

clip_image005

Running a .NET application as part of a Docker Compose in Azure

Introduction

In the “Docker Compose: Creating Multi-Container Applications” blog post we talked about Docker Compose, the benefits it offers, and used it to create a multi-container application based on the example provided in the Quick Start section of the Overview of Docker Compose page. The example consists of a Python web application that consumes information from a Redis service.

Since at our company we work mainly with Microsoft technologies, we wondered, “How hard would it be to configure a similar scenario for a .NET 5 web application?” Let’s find out!

Required configuration

The first thing we need is a Linux virtual machine with Docker running in Azure. We already have one in place, but in case you don’t, in the post “Getting Started with Docker in Azure” we explain how to create a Virtual Machine in Azure directly from Visual Studio 2015 and deploy a .NET application to a Docker container in that machine.
And, of course, in order to work with Docker Compose, we need to have the feature installed in the VM. If you haven’t installed it yet, you can follow the steps outlined in the “Docker Compose: Creating Multi-Container Applications” blog post.

Preparing our .NET application

Once we have the Virtual Machine running in Azure with Docker and Docker Compose installed, we are ready to start working. So, let’s open Visual Studio 2015 and create a new ASP.NET Web Application project using the Web Site template from the available ASP.NET Preview Templates.

clip_image001[4]

Now that we have a .NET 5 application to start working on, we will modify it to reproduce the scenario we saw with Python.

As we did with the Python application, we will configure our .NET application to get information from a Redis service and also update its information. This way, we can reproduce the scenario of getting the information from Redis that tells us how many times the web page was accessed, increase the value and store the updated value back in Redis.

The first thing we need to solve here is how we should set up the communication between our web site and the Redis service without making the Redis port public. To achieve this, we will configure the Compose to set up a link between our app and the Redis service (we’ll explain how to do this later). If we specify that link, when the Compose is created, an environment variable called “REDIS_PORT_6379_TCP_ADDR” is generated by Docker. We will simply use this variable to establish communication with Redis. Let’s see what the resulting code looks like.

The following is the code corresponding to the Index action of the Home controller.

public IActionResult Index()
{
    ViewBag.Message = this.GetHitsFromRedis();
    return View();
}

public int GetHitsFromRedis()
{
    int hits = 0;
    var db = Environment.GetEnvironmentVariable("REDIS_PORT_6379_TCP_ADDR");

    using (var redisClient = new RedisClient(db))
    {
        try
        {
            hits = redisClient.Get<Int32>("hits");
            hits++;
            redisClient.Set<Int32>("hits", hits);
            return hits;
        }
        catch (Exception ex)
        {
            return -1;
        }
    }
}

 

The logic displayed above is getting the value corresponding to the “REDIS_PORT_6379_TCP_ADDR” environment variable, using it to create a Redis client, and finally using the client to get the hits value and store the updated value back to Redis.

In addition to updating the controller, we also performed some changes to the Index view and the layout to remove the HTML created by default.

For the Index view, we removed the default code and added logic to display the amount of hits (accesses to the page) obtained from the controller and the name of the machine where the web application is running. We added this second line in order to validate that the site is running in the container created by the Compose.

@{

    ViewBag.Title = "Home Page";

}

<h3>Hits count: @ViewBag.Message</h3>

<h3>Machine Name: @Environment.MachineName</h3>

 

Regarding the layout of the site, we removed unnecessary code and left the following:

@inject IOptions<AppSettings> AppSettings

<!DOCTYPE html>

<html>

    <head>

        <meta charset="utf-8" />

        <meta name="viewport" content="width=device-width, initial-scale=1.0" />

        <title>@ViewBag.Title – @AppSettings.Options.SiteTitle</title>

    </head>

    <body>

        <div class="container body-content">

            @RenderBody()

            <hr />

            <footer>

                <p>&copy; 2015 – @AppSettings.Options.SiteTitle</p>

            </footer>

        </div>

        @RenderSection("scripts", required: false)

    </body>

</html>

 

Once we finish updating the .NET application, we need to republish it in order to update the Docker image in the VM.

clip_image002

When the application is published, you can configure it to use any port you want. However, you need to consider that the image will be parametrized to use that port, so when we instantiate the containers to run the image without overriding the entry point, the application will use the port we configured when we published it.

Configure a Docker Compose to use the .NET web application

In order to start configuring the Compose, we will create a new directory for the required files.

We will call this new directory “netcomposetest” and create it by running:

mkdir netcomposetest

 

Then, we need to change the current directory in order to work in the new one:

cd netcomposetest

 

Once we are working in the proper directory, we will create the “docker-compose.yml” file required to create a Docker Compose. To create the file we just need to run:

> docker-compose.yml

 

Finally, we will use vi to edit the file by running:

vi docker-compose.yml

 

Once the file opens, we press i (“Insert”) to start editing the file.

In this file we need to define the services that will be part of the Docker Compose. In our case, we will configure the .NET application and the Redis service:

net:

  image: websiteondocker

  ports:

   - "210"

  links:

   - redis

redis:

  image: redis

 

As you can see, we configured a service called “net” to use the image corresponding to our .NET application. We configured it to use the same port we used when we published the app. In this case, we just provided the private port. When Docker Compose creates the container for the web app, it will assign a public port and will map it to private port 210. Additionally, below the ports specification, we defined a link to the Redis service.

Finally, to save the changes we made to the file, we just need to press Esc and then :wq (write and quit).

At this point we should be ready to create the Compose.

In order to later check that the expected containers were created, we will run the “docker ps” command to see what containers we have running before starting the Compose:

clip_image002[6]

As you can see in the previous image, the only container we have running is the one corresponding to the .NET application we published from Visual Studio.

After checking the list of running containers, we can go ahead and run the following command to create the Docker Compose:

docker-compose up -d

 

Upon running this command, the containers should be created.

clip_image001

Immediately after running the command, we ran “docker ps” to validate that the containers were running.

Although the expected containers were listed the first time we ran “docker ps“, when we ran that command again a few seconds later, the container corresponding to the .NET application was not listed anymore. We then ran the “docker ps” command adding the “-a” option to see the stopped containers, and we saw that the status of the .NET app container created by the Compose was “Exited“.

image

To figure out what happened, we ran the logs command for that container and confirmed that it had crashed:

image
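For reference, the logs are retrieved with the “docker logs” command followed by the container name; assuming the Compose-generated name “netcomposetest_net_1” for the “net” service container, the command would be:

docker logs netcomposetest_net_1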

While researching this issue, we found a similar issue and a workaround. The proposed workaround consists of modifying the .NET application entry point defined in the Dockerfile.

The original Dockerfile generated by Visual Studio specifies a specific entry point:

FROM microsoft/aspnet:vs-1.0.0-beta4

ADD . /app

WORKDIR /app/approot/src/{{ProjectName}}

ENTRYPOINT ["dnx", ".", "Kestrel", "--server.urls", "http://localhost:{{DockerPublishContainerPort}}"]

 

The workaround consists of applying a “sleep” before the “dnx” as follows:

FROM microsoft/aspnet:vs-1.0.0-beta4

ADD . /app

WORKDIR /app/approot/src/{{ProjectName}}

ENTRYPOINT sleep 10000000 | dnx . Kestrel --server.urls http://localhost:{{DockerPublishContainerPort}}

 

After applying this change to the Dockerfile, we republished the application from Visual Studio and ran the “docker-compose up -d” command again in order to regenerate the containers and use the new .NET application image. This time the container corresponding to the .NET app didn’t crash.

In order to validate that the site is running and working as expected, we could create an endpoint in the Azure VM for the .NET application port, but to make it simpler we will start a proxy (tutum/haproxy) in port 80 pointing to the .NET app container. To do this we need to know the container name, so we need to run “docker ps” (we could also run “docker-compose ps“):

clip_image002[8]

Since the name of the .NET application container is “netcomposetest_net_1“, we can start the proxy by running:

docker run -d -p 80:80 --link netcomposetest_net_1:netcomposetest_net_1 tutum/haproxy

 

After running this command, a new container should be created:

clip_image002[10]

Now, the .NET web application should be available in port 80:

clip_image003[4]

As you can see, the displayed machine name is the ID corresponding to the .NET app container.

And if we refresh the page, the hits count should increase.

clip_image004

Since the hits count increased as expected, we know that the application was able to connect to Redis and retrieve/update values.

That’s it! We have our .NET web application running as part of a multi-container application created using Docker Compose.