.NET Core: Cross Platform – Windows

The .NET Execution Environment (DNX) is a software development kit (SDK) and runtime environment that has everything you need to build and run .NET applications for Windows, Mac and Linux.

Package managers have completely changed the face of modern software development, and they are tightly woven into the three main .NET Core tools: DNVM, DNU and DNX.

.NET Version Manager (DNVM)

The .NET Version Manager retrieves versions of the DNX (distributed as NuGet packages) and lets you switch between them when you have several on your machine. In short, DNVM is a set of command line utilities for downloading DNX versions and configuring which .NET runtime to use.
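Conceptually, a version manager keeps multiple runtime versions side by side and marks one as active. A toy shell sketch of that idea (the version strings are illustrative, not real dnvm output):

```shell
# Toy model of a version manager's "list" view: several installed
# runtimes, with the active one marked by an asterisk.
runtimes="1.0.0-beta7 1.0.0-beta8"
active="1.0.0-beta8"

for v in $runtimes; do
  if [ "$v" = "$active" ]; then
    echo "* $v"
  else
    echo "  $v"
  fi
done
```

Switching versions is then just a matter of changing which entry is marked active and adjusting the PATH accordingly, which is essentially what `dnvm use` does.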
You can use the PowerShell script below to install DNVM. It simply downloads and executes the dnvminstall.ps1 script from the aspnet/Home repo.

@powershell -NoProfile -ExecutionPolicy unrestricted -Command "&{$Branch='dev';iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/aspnet/Home/dev/dnvminstall.ps1'))}"

It installs the dnvm.ps1 command line tool and adds it to the %PATH% environment variable. No version of DNX is installed at this point. It is also recommended to check for and upgrade to the latest version of DNVM using the following command:

dnvm upgrade



DNVM solves the bootstrapping problem of getting and selecting the correct version of the DNX to run. You’ll find that the NuGet gallery hosts cross-platform versions of DNX.

.NET Execution Environment (DNX)

The .NET Execution Environment (DNX) contains the code required to bootstrap and run an application. This includes things like the compilation system, SDK tools, and the native CLR hosts.

DNX provides a consistent development and execution environment across multiple platforms (Windows, Mac and Linux) and across different .NET flavors (.NET Framework, .NET Core and Mono).

It is easy to install the .NET Core version of DNX, using the DNVM install command:

dnvm install -r coreclr latest


You can then use dnvm to list and select the active DNX version on your system (in my case, the latest version of DNX CoreCLR is 1.0.0-beta8):

dnvm use 1.0.0-beta8 -r coreclr

dnvm list


Hello world

A DNX project is simply a folder with a project.json file. The name of the project is the folder name.

Let’s first create a folder, set it as our current directory in command line:

mkdir HelloWorld && cd HelloWorld


Now create a new C# file HelloWorld.cs, and paste in the code below:

using System;

public class Program {
    public static void Main(string[] args) {
        Console.WriteLine("Hello World from Core CLR!");
    }
}


Next, we need to provide the project settings DNX will use. Create a new project.json file in the same folder, and edit it to match the listing shown here:


{
    "version": "1.0.0-*",
    "dependencies": { },
    "frameworks": {
        "dnx451": { },
        "dnxcore50": {
            "dependencies": {
                "System.Console": "4.0.0-beta-*"
            }
        }
    }
}

.NET Development Utility (DNU)

DNU is a command line tool that helps with the development of applications using DNX. You can use DNU to build, package and publish DNX projects. Or, as in the following example, you can use DNU to install a new package into an existing project or to restore all package dependencies for an existing project.


The project.json file defines the app dependencies and target frameworks in addition to various metadata properties
about the app. See Working with DNX Projects for more details.

Because .NET Core is completely factored into packages, we need to explicitly pull in the libraries our project depends on. We’ll run the following command to download and install all packages listed in project.json:

dnu restore


You’ll notice that even though our project only requires System.Console, several dependent libraries are downloaded and installed as a result.

Run the app with DNX

At this point, we’re ready to run the app. You can do this by simply entering dnx run at the command prompt. You should see a result like this one:

dnx run

Further reading





Intro to .NET Core

.NET Core: Cross Platform – OS X

Intro to .NET Core

.NET is a general purpose development platform. It has several key features that are attractive to many developers, including automatic memory management and modern programming languages, that make it easier to efficiently build high-quality apps. Multiple implementations of .NET are available, based on open .NET Standards that specify the fundamentals of the platform.

.NET Implementations

There are various implementations of .NET, some coming from Microsoft, some coming from other companies and groups:

  • The .NET Framework is the premier implementation of the .NET Platform available for Windows server and client developers.

    There are additional stacks built on top of the .NET Framework, for example Windows Forms and Windows Presentation Foundation (WPF) for UI, Windows Communication Foundation (WCF) for middleware services and ASP.NET as a web framework.

  • Mono is an open source implementation of Microsoft’s .NET Framework based on the ECMA standards for C# and the Common Language Runtime.
  • .NET Native is the set of tools used to build .NET Universal Windows Platform (UWP) applications. .NET Native compiles C# to native machine code that performs like C++.

I’ll explain a little bit more on what is .NET Core below. But first, let’s take a look at the .NET Ecosystem.

.NET Ecosystem

The .NET ecosystem is undergoing a major shift and restructuring in 2015. There are a lot of “moving pieces” that need to be tied together in order for this new ecosystem and all of the recommended scenarios to work. It is a very vibrant and diverse ecosystem.

You might not know it, but most of these projects are fostered by the .NET Foundation, an independent organization. Yes! These projects are open source!



A wild .NET implementation appeared!

.NET Core is a cross-platform implementation of .NET that is primarily being driven by ASP.NET 5 workloads, but also by the need and desire to have a modern runtime that is modular and whose features and libraries can be cherry picked based on the application’s needs.

It includes a small runtime that is built from the same codebase as the .NET Framework CLR. The .NET Core runtime includes the same GC and JIT (RyuJIT), but doesn’t include features like Application Domains or Code Access Security.

There are several characteristics of .NET Core:

  • Cross-platform support is the first important feature. For applications, it is important to use the platforms that provide the best environment for their execution, so an application platform that enables an app to run on different operating systems with minimal or no changes is a significant boon.
  • Open source, because it has proven to be a great way to enable a larger set of platforms, supported by community contribution.
  • Better packaging story – the framework is distributed as a set of packages that developers can pick and choose from, rather than a single, monolithic platform. .NET Core is the first implementation of the .NET platform that is distributed via the NuGet package manager.
  • Better application isolation, as one of the scenarios for .NET Core is to enable applications to “take” the runtime they need and deploy it with the application, rather than depending on shared components on the target machine. This plays well with current software development trends and with using container technologies like Docker for consolidation.
  • Modular – .NET Core is a set of runtime, library and compiler components. Microsoft uses these components in various configurations for device and cloud workloads.

NuGet as a 1st class delivery vehicle

In contrast to the .NET Framework, the .NET Core platform will be delivered as a set of NuGet packages.

Using NuGet allows for much more agile usage of the individual libraries that comprise .NET Core. It also means that an application can list a collection of NuGet packages (and associated version information) and this will comprise both system/framework as well as third-party dependencies required. Further, third-party dependencies can now also express their specific dependencies on framework features, making it much easier to ensure the proper packages and versions are pulled together during the development and build process.

If, for example, you need to use immutable collections, you can install the System.Collections.Immutable package via NuGet. The NuGet version will also align with the assembly version, and will use semantic versioning.
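Predictable ordering of semantic versions is what makes pinning and restoring package versions reliable. A quick illustration using GNU sort's version ordering (plain shell, not NuGet itself; the version strings are hypothetical, and sort -V is only an approximation of NuGet's SemVer comparison, since prerelease ordering differs):

```shell
# Sort hypothetical package versions as version strings, not
# lexicographically; the highest version ends up last.
printf '%s\n' 1.1.0 1.0.0 1.0.0-beta8 1.0.0-beta7 | sort -V
```

A plain lexicographic sort would mis-order e.g. 1.10.0 before 1.2.0, which is exactly what version-aware comparison avoids.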



Open Source and Cross-Platform

Last year, the main .NET Core repositories were made open source: CoreFX (framework libraries) and CoreCLR (runtime) are public on GitHub. The main reasons for this are to leverage a stronger ecosystem and to lay the foundation for a cross-platform .NET; it is also a natural progression of the .NET Foundation’s current open source efforts:

However, as of only a few months ago (April), you can install .NET Core on Windows, Linux and OS X. This makes code written for it portable across application stacks, such as Mono, and across platforms, making it feasible to move applications between environments with ease.


Further reading







.NET Core: Cross Platform – Windows

.NET Core: Cross Platform – OS X

Tips for Running a TeamCity CI Server in Microsoft Azure

Note: Before starting to read this post, read this one, where the JetBrains team (creators of TeamCity and ReSharper, among other good stuff) explains how they’ve enhanced the scalability of their own server. Many of the tips explained below are based on those articles.

In this post I’m going to share some tips for running a TeamCity server end-to-end in Microsoft Azure. This includes the server virtual machine configuration, the agent virtual machines, the database, etc. I won’t cover instructions on how to install and configure TeamCity; instead I’ll focus mainly on specific advice on how to take advantage of Azure services to achieve a better scalability and availability of the server. The key aspects will be:

Let’s get started! But let me first warn you that if you are ‘Penny Pinching in the cloud’, you’re not going to like this. :)

The TeamCity Server

Virtual Machine

Not much to mention on this point. Notice that having the agents running in a separate virtual machine and the database on an external service makes it unnecessary to have super-fast hardware on the server virtual machine.


Database

With the release of SQL v12 in Azure, TeamCity can use an Azure SQL database as its external database. So we’ve created a SQL database server with v12 enabled and then created a database in the Premium P2 tier, which also provides ‘geo-replication’ out of the box.


Disks and Data

TeamCity Data Directory is the directory in the file system used by TeamCity server to store configuration settings, build results and current operation files. The directory is the primary storage for all the configuration settings and holds all the data that is critical to the TeamCity installation. The build history, users and their data are stored in the database.

The server should be configured to use different disks for the TeamCity binaries and the Data Directory, as follows:

  • The TeamCity binaries are placed in the C: drive of the virtual machine (OS Disk)
  • Since the Azure virtual machine’s D: drive is a solid state drive (SSD) that provides temporary storage only, it’s not advisable to store the TeamCity data directory there. Instead, the data directory goes on a separate, attached VHD disk (E:). For instructions on how to attach another disk to an Azure virtual machine, see this article.


Java Virtual Machine (JVM)

To avoid memory warnings it’s better to use the 64-bit JVM for the TeamCity server (see instructions here). TeamCity (both server and agent) requires JRE 1.6 (or later) to operate:


It’s also recommended to allocate the maximum possible memory: 4 GB (more information here). This mainly means setting the ‘TEAMCITY_SERVER_MEM_OPTS’ Windows environment variable to the following values:

-Xmx3900m -XX:MaxPermSize=300m -XX:ReservedCodeCacheSize=350m


It’s always advisable to monitor the memory consumption from the Diagnostics page in the Server Administration section from time to time:


The TeamCity Agents

For better scalability of the server, it’s advisable to run the TeamCity agents in a different Azure virtual machine. The agents should only include the TeamCity agent software, plus all the software prerequisites to run builds (Visual Studio, Webdeploy, certificates, etc.).

To avoid configuring all the software prerequisites multiple times (once for each agent), you can do it once and then create a virtual machine image. You get the side benefit of being able to use this same image for the cloud agent configuration (more on this below). You can learn how to create/use an Azure virtual machine image in this article.

When you create the new agents based on the virtual machine image, you should remember to update the agent name in the ‘C:\BuildAgent\conf\buildAgent.properties’ file (see image below) and then restart the agent (from services.msc). Otherwise, all the agents will have the same name and will conflict when registering on the server machine. Also, make sure you open port 9090 in the Windows Firewall before creating the agent image.
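The rename step can be scripted instead of edited by hand. A minimal sketch, using a throwaway stand-in for buildAgent.properties so it is self-contained (the property layout and the agent name are hypothetical; on a real agent you would target C:\BuildAgent\conf\buildAgent.properties):

```shell
# Stand-in for the agent's buildAgent.properties file.
props=$(mktemp)
cat > "$props" <<'EOF'
serverUrl=http://teamcity-server:8111
name=agent-template
workDir=../work
EOF

# Give this cloned agent a unique name before it first registers,
# e.g. derived from the VM hostname.
new_name="tcagent-01"   # hypothetical unique name per VM
sed -i "s/^name=.*/name=${new_name}/" "$props"

grep '^name=' "$props"   # → name=tcagent-01
```

After changing the name, restart the agent service so it re-registers with the server under the new identity.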


Using Cloud Agents

With the release of the Azure Integration plugin (included out-of-the-box in TeamCity 9), you can use TeamCity Cloud Agents to provision agents virtual machines on-demand based on the build queue state. You can have a set of ‘fixed’ agents (that are always running) and a set of cloud agents that start when more build power is needed.

TeamCity triggers the creation of a new agent virtual machine when it has more than one build in the queue and no available agents. It can also be triggered manually (see below). The maximum number of agents created can also be configured, but it won’t create more than what the license allows. For example, to be able to scale from 3 to 6 agents using the cloud configuration, you need to have 6 TeamCity agent licenses available. At the same time, this can help you save some money in your Cloud bill, as TeamCity will delete the agents if they are idle for some time.

The requirements to configure the plug-in include:

  • Downloading the publish settings file (browse here) and uploading the management certificate obtained from that file (text only) to the TeamCity virtual machine.
  • Configuring a cloud service that will be used to provision the virtual machines (you can create an empty cloud service)
  • Providing a virtual machine image name, the maximum number of agents to be created, the virtual machine size and a name prefix. TeamCity will use the prefix plus a number to set the virtual machine name, e.g. ‘tcbld-1’, ‘tcbld-2’.
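The naming scheme from the last requirement is easy to picture; a tiny sketch of how a prefix plus a running index yields the VM names (the prefix and count mirror the example above):

```shell
prefix="tcbld"    # name prefix configured in the cloud profile
max_agents=3      # maximum number of agents allowed

# TeamCity-style names: prefix plus a running number.
for i in $(seq 1 "$max_agents"); do
  echo "${prefix}-${i}"
done
```

Keeping the prefix short matters in practice, since Windows computer names are limited to 15 characters.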




For any questions, you can reach me on twitter @sebadurandeu.

Evaluating Netflix OSS tools using ZeroToDocker images in Azure


ZeroToDocker is a project that allows anyone with a Docker host to run a single node of any Netflix OSS technology with a single command. The portability of Docker allows us to run the tools locally or in different cloud environments such as AWS or Azure. However, it is important to keep in mind that some of the Netflix OSS tools work only in AWS. In these cases, although we could start a Docker container running the application in other environments such as Azure, the tools won’t be able to provide the expected functionality.

If you are not familiar with Docker and you would like to read more about it, you can check out this blog post.

Available Docker Images

Netflix OSS provides Docker images for the following tools:

  • Genie
  • Inviso
  • Atlas
  • Asgard
  • Eureka
  • Edda
  • A Karyon based Hello World Example
  • A Zuul Proxy used to proxy to the Karyon service
  • Security Monkey
  • Exhibitor managed Zookeeper

It is important to keep in mind that these images are not intended to be used in production environments.

Additionally, as we mentioned before, some of the Netflix OSS services corresponding to the images offer functionalities associated exclusively with AWS:

  • Atlas: According to the Atlas wiki in the Zero to Docker repository, it appears that the Atlas image requires the AWS APIs in order to work.
  • Asgard: It offers a web interface for application deployments and cloud management in Amazon Web Services (AWS)
  • Edda: It polls AWS resources via AWS APIs and records the results.
  • Security Monkey: It monitors policy changes and alerts on insecure configurations in an AWS account.

Evaluating Netflix OSS

In this section we show how you can test Genie and the “Hello Netflix OSS” sample application in a Docker environment. This sample application is based on Karyon and interacts with Eureka and Zuul.

As we mentioned before, these images can be run in different environments. In our case, we will test them in Azure.

If you would like to know how to set up an Azure VM with Docker, please take a look at these posts:

Running Genie on Docker

This section describes the steps to set up Genie 2.2.1 using the Docker image. If you’re looking for a different version, please see the list of releases here. The steps described in this document are based on the instructions provided here.

Please note that the Docker image we will use is not considered production-ready.


This section describes how to set up and configure the containers required to run the example.

Setup MySQL

The first step is to set up MySQL. In order to start a new container running the MySQL image, we need to run the following command:

docker run --name mysql-genie -e MYSQL_ROOT_PASSWORD=genie -e MYSQL_DATABASE=genie -d mysql:5.6.21


If you don’t have the MySQL image in your host, it will be downloaded. Otherwise, the container will start using the existing image:


The previous command will start a container named “mysql-genie”. We’ll use that name later to reference this container from Genie in order to establish a connection.

To verify if MySQL is running properly, we can do 2 things:

  • Run the “docker ps” command to check that the container is running


  • Access the MySQL container
    • Run the following command to access the MySQL container.

      docker exec -it mysql-genie mysql -pgenie


    • Additionally, we can execute the “show Databases” command to make sure the “genie” database was created:

      show Databases;


      We can see that there is a database called “genie”. If we check the tables of that database, we’ll see that it is empty.

    • Run the “use genie” to use the genie database.

      use genie;


    • Execute the “show Tables;” command. No information will be displayed.
    • Finally, exit the MySQL container by running “exit”.

Set up Hadoop to Run Example

We just need to run the “sequenceiq/hadoop-docker” image. In this case, we will run the command in interactive mode to be able to configure our Hadoop container and verify that everything is working as expected.

docker run --name hadoop-genie -it -p 10020:10020 -p 19888:19888 -p 211:8088 sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -bash


Since we already have an endpoint configured for port 211 in our Azure VM, we included the port mapping “211:8088” to be able to access the Hadoop Resource manager.

Once we have Hadoop running, we will modify the /etc/hosts file, adding “hadoop-genie” (the container name) after the container id on the first line (space separated). This will allow the daemons to resolve each other by container name when a job is later submitted from the Genie node.

So, we will run the following command to start editing the hosts file.

vi /etc/hosts


After editing the file, it should look as follows:

83893db7d234 hadoop-genie localhost

::1 localhost ip6-localhost ip6-loopback

fe00::0 ip6-localnet

ff00::0 ip6-mcastprefix

ff02::1 ip6-allnodes

ff02::2 ip6-allrouters
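The same edit can be made non-interactively instead of via vi. A sketch using sed on a throwaway copy of the file (the container id is the example one shown above; inside the real container you would target /etc/hosts itself):

```shell
# Stand-in hosts file with the container-id line first.
hosts=$(mktemp)
printf '83893db7d234 localhost\n::1 localhost ip6-localhost ip6-loopback\n' > "$hosts"

# Insert "hadoop-genie" right after the container id, on line 1 only.
sed -i '1s/^\([^ ]*\)/\1 hadoop-genie/' "$hosts"

head -n 1 "$hosts"   # → 83893db7d234 hadoop-genie localhost
```

Restricting the substitution to line 1 keeps the IPv6 entries further down untouched.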


Finally, we will start the Job History Server:

/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver


Running the “jps” command, we should see something like this:

bash-4.1# jps

356 SecondaryNameNode

1001 Jps

190 DataNode

514 ResourceManager

112 NameNode

598 NodeManager

932 JobHistoryServer


We will leave Hadoop running in the current SSH client and open a new one to start working with Genie.

Run the Genie Container

Once we have opened a new SSH client, accessed our VM and configured Docker properly, we can run the Genie container.

In our case, we have an endpoint in our Azure VM configured for port 210, so we will run the Genie container mapping public port 210 to container port 8080, which is the default Genie port.

docker run -p 210:8080 --name genie --link mysql-genie:mysql-genie --link hadoop-genie:hadoop-genie -d netflixoss/genie:2.2.1


Once the Genie container is running, we can verify if everything is working properly.

First, we will check the connection with the MySQL container from Genie:

  • Access the MySQL container by running:

    docker exec -it mysql-genie mysql -pgenie

  • Run the “use genie” command

    use genie;


  • Finally, check the genie database tables by running:

    show tables;


To exit the MySQL container, we need to run “exit”.

Once we have verified the connection with the database, we can access the Genie UI from a browser at our VM’s URL on port 210. In our case, the URL is:



As you can see, there are no clusters or jobs.

In order to finish our verification, we will check that the connection with Hadoop is working. To do this, we will access the Genie container by running:

docker exec -it genie /bin/bash


Then, we will ping the Hadoop container. We should see information about packets that have been sent and received successfully.

ping hadoop-genie


To stop the ping command, we will press “Ctrl + C”.


Run the example

We have everything in place, and we are ready to run the example. The example configures Genie with the Hadoop configuration information for the Hadoop container mentioned earlier, as well as two commands (Hadoop and Pig). Then, it will launch a Hadoop MR job, which is the example provided by Hadoop: "hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'".

We are already in the Genie container, so we can start to run the example.

First, we’ll execute the setup script to register the Hadoop cluster and Hadoop / Pig commands.



In order to verify that everything was registered as expected, we can go to the Genie UI and check the commands and clusters sections:

  • Home page:


  • Clusters view:


  • Commands view:


Additionally, we can verify that “excite.log.bz2” is in HDFS by using:

hadoop fs -ls


Finally, if everything is OK, we can run the example job.



Once the Job is started, we will see it in the Genie UI:


If we access the Jobs section, we should see it there too:


We can also see the output of the Job by accessing:



Here, we can check the different logs.

When the job finishes executing, the console should resemble the following:


That’s it! We ran Genie and submitted a job to a Hadoop cluster. Additionally, we were able to check the different logs corresponding to the job.

Running the “Hello Netflix OSS” sample on Docker

This section describes how to run the “Hello Netflix OSS” sample image in combination with Eureka and Zuul, running the corresponding Docker images in an Azure VM.

Please take into account that the Docker images we will use are not considered production-ready.

Run Eureka

The first thing we will do is to start a container running the Eureka image. Since we have an endpoint for port 212 in our Azure VM, we will map it to container port 8080 since Eureka is accessible through that port.

docker run -d -p 212:8080 --name eureka netflixoss/eureka:1.1.147


After running the image, we should be able to access the Eureka page through the following URL:


In our case, the URL is:



Hello Netflix OSS sample application

Once Eureka is running, we can start the sample image:

docker run -d -p 213:8080 -p 214:8077 --name hello-netflix-oss --link eureka:eureka netflixoss/hello-netflix-oss:1.0.27


In this case, we configured the ports to be able to access the application through port 213 and the embedded Karyon admin services console through port 214:

Additionally, we defined a link with the Eureka container to allow Eureka to communicate with the sample application container.


Although the application is already accessible directly, we will start a container running Zuul so we can reach the sample application through it.

So, we need to start a container running the Zuul image and link it to Eureka.

docker run -e "origin.zuul.client.DeploymentContextBasedVipAddresses=HELLO-NETFLIX-OSS" -p 210:8080 -d --name zuul --link eureka:eureka netflixoss/zuul:1.0.28


Here, we are making Zuul accessible through port 210 of our Azure VM. If we access the Zuul port, we will see that the sample application is displayed:


At this point, if we check the Eureka application, it will show both applications: Zuul and the sample application.


In this scenario we were able to run an application based on Karyon, establish a communication with Eureka and access the application through Zuul.

Netflix OSS – Security Tools

Netflix has released different security tools and solutions to the open source community. The security-related open source efforts generally fall into one of two categories:

  • Operational tools and systems to make security teams more efficient and effective when securing large and dynamic environments
  • Security infrastructure components that provide critical security services for modern distributed systems.

Below you can find further information about some of the security tools released by Netflix.

Security Monkey

Security Monkey monitors policy changes and alerts on insecure configurations in an AWS account. While Security Monkey’s main purpose is security, it also proves to be a useful tool for tracking down potential problems, as it is essentially a change-tracking system.

It has a Docker image, but its functionality works only with AWS.


Scumblr

Scumblr is a web application that allows performing periodic searches and storing/taking actions on the identified results. Scumblr uses the Workflowable gem to allow setting up flexible workflows for different types of results.

Workflowable is a gem that allows easy creation of workflows in Ruby on Rails applications. Workflows can contain any number of stages and transitions, and can trigger customizable automated actions as states are triggered.

Scumblr searches utilize plugins called Search Providers. Each Search Provider knows how to perform a search via a certain site or API (Google, Bing, eBay, Pastebin, Twitter, etc.). Searches can be configured from within Scumblr based on the options available by the Search Provider. Examples of things you might want to look for are:

  • Compromised credentials
  • Vulnerability / hacking discussion
  • Attack discussion
  • Security relevant social media discussion

Message Security Layer

Message Security Layer (MSL) is an extensible and flexible secure messaging framework that can be used to transport data between two or more communicating entities. Data may also be associated with specific users, and treated as confidential or non-replayable if so desired.

MSL does not attempt to solve any specific use case or communication scenario. Rather, it is capable of supporting a wide variety of applications and leveraging external cryptographic resources. There is no one-size-fits-all implementation or configuration; proper use of MSL requires the application designer to understand their specific security requirements.

Netflix OSS – Insight, Reliability and Performance Tools

As part of the Netflix OSS platform, Netflix has released tools to get operation insight about an application, take different kind of metrics and validate reliability by ensuring that the application can support different kinds of failures.

In this blog post we list and briefly describe some of these tools.


Atlas

Atlas was developed by Netflix to manage dimensional time series data for near real-time operational insight. Atlas features in-memory data storage, allowing it to gather and report very large numbers of metrics very quickly.

Atlas captures operational intelligence. Whereas business intelligence is data gathered for the purpose of analyzing trends over time, operational intelligence provides a picture of what is currently happening within a system.


Edda

Edda is a service that polls your AWS resources via AWS APIs and records the results. It allows you to quickly search through your resources and shows you how they have changed over time.

Previously this project was known within Netflix as Entrypoints (and mentioned in some blog posts), but the name was changed as the scope of the project grew. Edda (meaning “a tale of Norse mythology”), seemed appropriate for the new name, as our application records the tales of Asgard.


Spectator

Spectator is a simple library for instrumenting code to record dimensional time series. When running at Netflix with the standard platform, use the spectator-nflx-plugin library to get bindings for internal tools like Atlas and Chronos.


Vector

Vector is an open source on-host performance monitoring framework which exposes hand-picked, high-resolution system and application metrics to every engineer’s browser. Having the right metrics available on-demand and at a high resolution is key to understanding how a system behaves and to correctly troubleshooting performance issues.

Vector provides a simple way for users to visualize and analyze system and application-level metrics in near real-time. It leverages the battle tested open source system monitoring framework, Performance Co-Pilot (PCP), layering on top a flexible and user-friendly UI.


Ice

Ice provides a bird’s-eye view of a large and complex cloud landscape from a usage and cost perspective. Cloud resources are dynamically provisioned by dozens of service teams within the organization, and any static snapshot of resource allocation has limited value.

Ice is a Grails project consisting of three parts: processor, reader and UI. The processor turns the Amazon detailed billing file into data readable by the reader; the reader serves that data to the UI; and the UI renders interactive graphs and tables in the browser.

Ice communicates with AWS Programmatic Billing Access and maintains knowledge of the following key AWS entity categories:

  • Accounts
  • Regions
  • Services (e.g. EC2, S3, EBS)
  • Usage types (e.g. EC2 – m1.xlarge)
  • Cost and Usage Categories (On-Demand, Reserved, etc.)

The UI allows you to filter directly on the above categories to custom-tailor your view.

Simian Army

The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.

Simian Army consists of services (Monkeys) in the cloud for generating various kinds of failures, detecting abnormal conditions, and testing our ability to survive them. The goal is to keep our cloud safe, secure, and highly available.

Currently the simians include Chaos Monkey, Janitor Monkey, and Conformity Monkey.

Netflix OSS – Data Persistence Tools Overview

Handling a huge number of data operations per day required Netflix to improve existing open source software with tools of their own. The scale at which Netflix consumes and manages data in the cloud has required them to build tools and services that enhance the datastores they use.

In this blog post we list and briefly describe some of the tools released by Netflix to store and serve data in the cloud.


EVCache is a caching solution based on memcached and spymemcached that is mainly used on AWS EC2 infrastructure for caching frequently used data.

EVCache is an abbreviation for:

  • Ephemeral – The data stored is for a short duration as specified by its TTL (Time To Live)
  • Volatile – The data can disappear at any time (Evicted)
  • Cache – An in-memory key-value store

It offers the following features:

  • Distributed Key-Value store, i.e., the cache is spread across multiple instances
  • AWS Zone-Aware – Data can be replicated across zones
  • Registers and works with Eureka for automatic discovery of new nodes/services
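The "ephemeral, volatile" semantics above boil down to a key-value store where every entry carries a TTL and can vanish at any time. A minimal single-process sketch of that contract (the real EVCache is a distributed, zone-aware memcached layer, not this class):

```python
import time

class TTLCache:
    """Minimal in-memory key-value store with per-entry TTL, illustrating
    EVCache's 'ephemeral, volatile' semantics. Illustrative only."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: evict lazily on read
            return None
        return value

cache = TTLCache()
cache.set("user:42", {"name": "Ada"}, ttl_seconds=0.05)
print(cache.get("user:42"))   # value is still live
time.sleep(0.06)
print(cache.get("user:42"))   # None: the entry has expired
```

Callers must always be prepared for a miss, which is exactly the "volatile" part of the acronym.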


Dynomite is a thin, distributed dynamo layer for different storages and protocols.

It is a generic dynamo implementation that can be used with many different key-value pair storage engines. Currently these include Redis and Memcached. Dynomite supports multi-datacenter replication and is designed for high availability.

The ultimate goal with Dynomite is to be able to implement high availability and cross-datacenter replication on storage engines that do not inherently provide that functionality. The implementation is efficient, not complex (few moving parts), and highly performant.
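Dynomite's sharding follows the Dynamo model of placing nodes on a hash ring so that each key has a deterministic owner. The toy ring below shows the routing idea only; Dynomite's actual token assignment, rack awareness and replication are more involved:

```python
import bisect
import hashlib

class TokenRing:
    """Toy consistent-hash ring: each node owns a token, and a key is routed
    to the first node whose token is >= the key's hash (wrapping around).
    A concept sketch, not Dynomite's implementation."""

    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def owner(self, key):
        idx = bisect.bisect(self._tokens, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

Because the owner is a pure function of the key and the ring, any client that knows the topology can route directly to the right node, which is what makes Dyno's token-aware routing (described below) possible.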


Astyanax is a high level Java client for Apache Cassandra. Apache Cassandra is a highly available column oriented database.

It borrows many concepts from Hector but diverges in the connection pool implementation as well as the client API. One of the main design considerations was to provide a clean abstraction between the connection pool and Cassandra API so that each may be customized and improved separately. Astyanax provides a fluent style API which guides the caller to narrow the query from key to column as well as providing queries for more complex use cases that we have encountered. The operational benefits of Astyanax over Hector include lower latency, reduced latency variance, and better error handling.

Some of the features provided by this client are:

  • High level, simple object oriented interface to Cassandra
  • Fail-over behavior on the client side
  • Connection pool abstraction. Implementation of a round robin connection pool
  • Monitoring abstraction to get event notification from the connection pool
  • Complete encapsulation of the underlying Thrift API and structs
  • Automatic retry of downed hosts
  • Automatic discovery of additional hosts in the cluster
  • Suspension of hosts for a short period of time after several timeouts
  • Annotations to simplify use of composite columns
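The fluent "narrow from key to column" style described above can be sketched with a plain dict standing in for a column family. This mimics the shape of the API only; it is not Astyanax's real Java interface:

```python
class Query:
    """Sketch of a fluent query that narrows from row key to column,
    in the style Astyanax provides. Not the actual Astyanax API."""

    def __init__(self, column_family):
        self._cf = column_family
        self._key = None

    def get_key(self, key):
        self._key = key
        return self            # returning self is what makes the API fluent

    def get_column(self, column):
        return self._cf.get(self._key, {}).get(column)

cf = {"user:42": {"name": "Ada", "city": "London"}}
print(Query(cf).get_key("user:42").get_column("name"))   # Ada
```

Each step in the chain constrains the query further, so the type of the builder guides the caller toward a valid request.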


Dyno is Netflix’s home-grown Java client for Dynomite. Dynomite adds sharding and replication on top of Redis and Memcached as underlying datastores, and the Dynomite server implements the underlying datastore protocol and presents it as its public interface. Hence, one can use popular Java clients like Jedis, Redisson and SpyMemcached to speak directly to Dynomite. Dyno encapsulates client-side complexity and best practices in one place instead of having every application repeat the same engineering effort, e.g., topology-aware routing, effective failover, and load shedding with exponential backoff.

Dyno implements patterns inspired by Astyanax on top of popular clients like Jedis, Redisson and SpyMemcached.

Some of Dyno’s features are:

  • Connection pooling of persistent connections – this helps reduce connection churn on the Dynomite server with client connection reuse.
  • Topology aware load balancing (Token Aware) for avoiding any intermediate hops to a Dynomite coordinator node that is not the owner of the specified data
  • Application specific local rack affinity based request routing to Dynomite nodes
  • Application resilience by intelligently failing over to remote racks when local Dynomite rack nodes fail
  • Application resilience against network glitches by constantly monitoring connection health and recycling unhealthy connections
  • Capability of surgically routing traffic away from any nodes that need to be taken offline for maintenance
  • Flexible retry policies such as exponential backoff, etc.
  • Insight into connection pool metrics
  • Highly configurable and pluggable connection pool components for implementing your advanced features
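One of the retry policies listed above, exponential backoff, is simple enough to sketch directly. This is an illustration of the pattern in Python, not Dyno's actual retry code:

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.05, rng=random.Random(0)):
    """Retry fn() on connection errors, doubling the delay each attempt
    and adding jitter so retrying clients don't stampede in lockstep.
    A pattern sketch, not Dyno's implementation."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                       # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) * (0.5 + rng.random())
            time.sleep(delay)               # back off before retrying

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient glitch")
    return "ok"

print(call_with_backoff(flaky))   # succeeds on the third attempt
```

The jitter term matters in practice: without it, all clients that failed together retry together, re-creating the overload they backed off from.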

Netflix OSS – Common Runtime Services & Libraries

Netflix has released as open source software several of the tools, libraries and services they use to power microservices. The cloud platform is the foundation and technology stack for the majority of the services within Netflix. This platform consists of cloud services, application libraries and application containers.

Below you can find information about the services and libraries used by Netflix that were released as open source software.


Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. We call this service the Eureka Server. Eureka also comes with a Java-based client component, the Eureka Client, which simplifies the interactions with the service. The client also has a built-in load balancer that does basic round-robin load balancing. At Netflix, a much more sophisticated load balancer wraps Eureka to provide weighted load balancing based on several factors (like traffic, resource usage, error conditions, etc.) to provide superior resiliency.
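The basic round-robin strategy built into the Eureka client can be sketched in a few lines. This illustrates the balancing strategy only, not the Eureka client's actual API:

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin load balancer: hand out registered instances
    in rotation. A concept sketch, not the Eureka client."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def choose(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.choose() for _ in range(4)])   # wraps back to the first instance
```

Weighted balancing, as used at Netflix, would replace the plain rotation with a choice biased by traffic, resource usage and error metrics.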

Apart from playing a critical part in mid-tier load balancing, Eureka is used at Netflix for the following purposes:

  • For aiding Netflix Asgard in:
    • Fast rollback of versions in case of problems, avoiding the relaunch of hundreds of instances which could take a long time
    • Rolling pushes, to avoid propagation of a new version to all instances in case of problems
  • For Cassandra deployments to take instances out of traffic for maintenance
  • For Memcached caching services to identify the list of nodes in the ring
  • For carrying other additional application specific metadata about services for various other reasons


Archaius is a configuration management library with a focus on Dynamic Properties sourced from multiple configuration stores. It includes a set of configuration management APIs used by Netflix. It is primarily implemented as an extension of Apache’s Commons Configuration Library.

It provides the following functionalities:

  • Dynamic, Typed Properties
  • High throughput and Thread Safe Configuration operations
  • A polling framework that allows obtaining property changes of a Configuration Source
  • A Callback mechanism that gets invoked on effective/”winning” property mutations (in the ordered hierarchy of Configurations)
  • A JMX MBean that can be accessed via JConsole to inspect and invoke operations on properties
  • Out of the box, Composite Configurations (With ordered hierarchy) for applications (and most web applications willing to use convention based property file locations)
  • Implementations of dynamic configuration sources for URLs, JDBC and Amazon DynamoDB
  • Scala dynamic property wrappers

At the heart of Archaius is the concept of a Composite Configuration which can hold one or more Configurations. Each Configuration can be sourced from a Configuration Source such as: JDBC, REST, .properties file, etc.
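The "winning" property in an ordered Composite Configuration is simply the first source in the hierarchy that defines it. A minimal sketch of that lookup rule (a concept illustration, not the Archaius API):

```python
class CompositeConfiguration:
    """Ordered lookup across configuration sources: the first source that
    defines a property wins, mirroring Archaius's composite hierarchy.
    Illustrative only; Archaius adds polling, callbacks and typed properties."""

    def __init__(self, *sources):
        self._sources = sources   # highest-priority source first

    def get(self, key, default=None):
        for source in self._sources:
            if key in source:
                return source[key]
        return default

runtime_overrides = {"timeout.ms": 200}
app_properties = {"timeout.ms": 500, "retries": 3}

config = CompositeConfiguration(runtime_overrides, app_properties)
print(config.get("timeout.ms"))   # 200: the override wins
print(config.get("retries"))      # 3: falls through to app properties
```

Archaius layers dynamic behavior on top of this: sources are polled for changes, and callbacks fire only when the effective (winning) value actually changes.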


Ribbon is an Inter Process Communication (remote procedure calls) library with built-in software load balancers. The primary usage model involves REST calls with various serialization scheme support.

Ribbon is a client-side IPC library that is battle-tested in the cloud. It provides the following features:

  • Load balancing
  • Fault tolerance
  • Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model
  • Caching and batching

There are three sub projects:

  • ribbon-core: includes load balancer and client interface definitions, common load balancer implementations, integration of client with load balancers and client factory
  • ribbon-eureka: includes load balancer implementations based on Eureka client, which is the library for service registration and discovery
  • ribbon-httpclient: includes the JSR-311 based implementation of REST client integrated with load balancers


Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Hystrix is designed to do the following:

  • Give protection from and control over latency and failure from dependencies accessed (typically over the network) via third-party client libraries
  • Stop cascading failures in a complex distributed system
  • Fail fast and rapidly recover
  • Fallback and gracefully degrade when possible
  • Enable near real-time monitoring, alerting, and operational control


Karyon is a framework and library that essentially contains the blueprint of what it means to implement a cloud-ready web service. All the other fine grained web services and applications that form our SOA graph can essentially be thought of as being cloned from this basic blueprint.

Karyon can be thought of as a nucleus that contains the following main ingredients:

  • Bootstrapping, dependency and lifecycle management (via Governator)
  • Runtime Insights and Diagnostics (via karyon-admin-web module)
  • Configuration Management (via Archaius)
  • Service discovery (via Eureka)
  • Powerful transport module (via RxNetty)


Governator is a library of extensions and utilities that enhance Google Guice to provide classpath scanning and automatic binding, lifecycle management, configuration to field mapping, field validation and parallelized object warmup.


Prana is a sidecar for your NetflixOSS-based services. It simplifies integration with NetflixOSS services by exposing the Java-based client libraries of various services like Eureka, Ribbon, and Archaius over HTTP. This makes it easy for applications, especially those written in non-JVM languages, to exist in the NetflixOSS ecosystem.

Prana is a Karyon & RxNetty based application that exposes features of java-based client libraries of various NetflixOSS services over an HTTP API. It is conceptually “attached” to the main application and complements it by providing features that are otherwise available as libraries within a JVM-based application.

Prana is used extensively within Netflix alongside applications built in non-JVM programming languages like Python and NodeJS, or services like Memcached, Spark, and Hadoop.

Among Prana’s features are:

  • Advertising applications via the Eureka Service Discovery Service
  • Discovery of hosts of an application via Eureka
  • Health Check of services
  • Load Balancing http requests via Ribbon
  • Fetching Dynamic Properties via Archaius


Zuul is an edge service that provides dynamic routing, monitoring, resiliency and security among other things. It is the front door for all requests from devices and web sites to the backend of the Netflix streaming application. As an edge service application, Zuul is built to enable dynamic routing, monitoring, resiliency and security. It also has the ability to route requests to multiple Amazon Auto Scaling Groups.

Zuul uses a range of different types of filters that enable functionality to be applied to the edge service quickly and nimbly. These filters help Netflix perform the following functions:

  • Authentication and Security – identifying authentication requirements for each resource and rejecting requests that do not satisfy them
  • Insights and Monitoring – tracking meaningful data and statistics at the edge in order to give us an accurate view of production
  • Dynamic Routing – dynamically routing requests to different backend clusters as needed
  • Stress Testing – gradually increasing the traffic to a cluster in order to gauge performance
  • Load Shedding – allocating capacity for each type of request and dropping requests that exceed the limit
  • Static Response handling – building some responses directly at the edge instead of forwarding them to an internal cluster
  • Multiregion Resiliency – routing requests across AWS regions in order to diversify our ELB usage and move our edge closer to our members

At the center of Zuul is a series of Filters that are capable of performing a range of actions during the routing of HTTP requests and responses.
The following are the key characteristics of a Zuul Filter:

  • Type: most often defines the stage during the routing flow when the Filter will be applied (although it can be any custom string)
  • Execution Order: applied within the Type, defines the order of execution across multiple Filters
  • Criteria: the conditions required in order for the Filter to be executed
  • Action: the action to be executed if the Criteria is met

Zuul provides a framework to dynamically read, compile, and run these Filters. Filters do not communicate with each other directly – instead they share state through a RequestContext which is unique to each request.
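The Type / Order / Criteria / Action characteristics and the shared RequestContext can be modeled compactly. The sketch below is a simplified Python rendering of the pattern, not Zuul's actual Java classes:

```python
class RequestContext(dict):
    """Per-request shared state; Filters communicate only through this."""

class Filter:
    filter_type = "pre"    # Type: stage in the routing flow
    filter_order = 0       # Execution Order within the stage

    def should_filter(self, ctx):   # Criteria
        return True

    def run(self, ctx):             # Action
        raise NotImplementedError

class AuthFilter(Filter):
    filter_order = 1
    def run(self, ctx):
        ctx["authenticated"] = ctx.get("token") == "secret"

class RejectFilter(Filter):
    filter_order = 2
    def should_filter(self, ctx):
        return not ctx.get("authenticated", False)
    def run(self, ctx):
        ctx["response"] = "401 Unauthorized"

def run_filters(filters, ctx, stage="pre"):
    """Run the filters of one stage in order, each reading and writing
    the shared RequestContext rather than calling each other directly."""
    for f in sorted(filters, key=lambda f: f.filter_order):
        if f.filter_type == stage and f.should_filter(ctx):
            f.run(ctx)
    return ctx

ctx = run_filters([RejectFilter(), AuthFilter()], RequestContext(token="wrong"))
print(ctx["response"])   # 401 Unauthorized
```

Because filters share state only through the context, Zuul can compile and hot-load new filters at runtime without any filter knowing about the others.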

Zuul contains multiple components:

  • zuul-core – library which contains the core functionality of compiling and executing Filters
  • zuul-simple-webapp – webapp which shows a simple example of how to build an application with zuul-core
  • zuul-netflix – library which adds other NetflixOSS components to Zuul – using Ribbon for routing requests, for example
  • zuul-netflix-webapp – webapp which packages zuul-core and zuul-netflix together into an easy-to-use package

Netflix OSS – Build and Delivery Tools Overview

Among the build and delivery tools released by Netflix as part of the Netflix OSS platform you can find build resources such as Nebula (which makes Gradle plugins easy to build, test and deploy) as well as tools to manage resources in AWS and to support deployments to that platform.

Below you can find a brief description of some of the build and delivery tools released by Netflix.


The nebula-plugins organization was set up to facilitate the generation, governance, and release of Gradle plugins. It does so by providing a space to host plugins in SCM, CI, and a repository. A single GitHub organization is used, to which anyone or any plugin can be added. CloudBees jobs are created for every plugin to provide a standard set of jobs. Releases are posted to Bintray, proxied to JCenter and synced to Maven Central.


Aminator is a tool for creating EBS AMIs. This tool currently works for CentOS/RedHat Linux images and is intended to run on an EC2 instance.
It creates a custom AMI from just:

  • A base AMI ID
  • A link to a deb or rpm package that installs your application

This is useful for many AWS workflows, particularly ones that take advantage of auto-scaling groups.


Asgard is a web-based tool for managing cloud-based applications and infrastructure. It offers a web interface for application deployments and cloud management in Amazon Web Services (AWS).

Netflix has been using Asgard for cloud deployments since early 2010. It was initially named the Netflix Application Console.

Netflix OSS – Big Data Tools Overview

Behind the scenes, Netflix has a rich ecosystem of big data technologies facilitating their algorithms and analytics. They use and contribute to broadly-adopted open source technologies including Hadoop, Hive, Pig, Parquet, Presto, and Spark. Additionally, they have developed and contributed some additional tools and services which have further elevated their data platform.

Below you can find information about some tools of the Netflix OSS platform that offer functionalities associated to big data.


Genie is a federated job execution engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Presto, Sqoop and more. It also provides APIs for managing many distributed processing cluster configurations and the commands and applications which run on them.

From the perspective of the end-user, Genie abstracts away the physical details of various (potentially transient) computational resources (like YARN, Spark, Mesos clusters etc.). It then provides APIs to submit and monitor jobs on these clusters without users having to install any clients themselves or know details of the clusters and commands.

A big advantage of this model is the scalability that it provides for client resources. This solves a very common problem where a single machine is configured as an entry point to submit jobs to large clusters and the machine gets overloaded. Genie allows the use of a group of machines which can increase and decrease in number to handle the increasing load, providing a very scalable solution.

Within Netflix it is primarily used to handle scaling out jobs for their big data platform in the cloud, but it doesn’t require AWS or the cloud to benefit users.


Inviso is a lightweight tool that provides the ability to search for Hadoop jobs, visualize the performance, and view cluster utilization.

This tool is based on the following components:

  • REST API for Job History: REST endpoint to load an entire job history file as a json object
  • ElasticSearch: Search jobs and correlate Hadoop jobs for Pig and Hive scripts
  • Python Scripts: Scripts to index job configurations into ElasticSearch for querying. These scripts can accommodate a pub/sub model for use with SQS or another queuing service to better distribute the load or allow other systems to know about job events.
  • Web UI: Provides an interface to search and visualize jobs and cluster data


Lipstick combines a graphical depiction of a Pig workflow with information about the job as it executes, giving developers insight that previously required a lot of sifting through logs (or a Pig expert) to piece together.


Aegisthus is a bulk data pipeline out of Cassandra. It implements a reader for the SSTable format and provides a map/reduce program to create a compacted snapshot of the data contained in a column family.