Netflix OSS Overview

Netflix is considered one of the biggest cloud applications out there. As such, the people at Netflix have faced many different kinds of challenges trying to avoid failures in their service. Over time, they implemented different tools to support and improve their cloud environment, making the Netflix application more reliable, fault-tolerant and highly available.

The really good news is that Netflix has made some of these tools open source. Tools to make a cloud environment more reliable are now available from a company that uses them in a huge infrastructure. One thing to consider is that Netflix relies on AWS for services and content delivery and, as a consequence, some of these tools provide functionality specific to that cloud environment. However, other tools offer more generic features that can be used in other environments.

It is important to consider that, although Netflix has embraced the open source concept, the shared code provides solutions for cloud computing infrastructure. The company is not sharing any of its innovations and technology around streaming video.

Open Source tools

The different tools released as part of the Netflix OSS platform can be categorized according to the functionality they provide. In this section you can find a brief description of these categories.

Additionally, you can find further information about the different categories and the associated tools in the blog posts that are referenced at the end of each category description. It is important to keep in mind that, at this point, more than 50 projects can be found in the Netflix OSS GitHub repository. As a consequence, we list and describe only the main tools corresponding to each category.

Big Data Tools

Behind the scenes, Netflix has a rich ecosystem of big data technologies facilitating their algorithms and analytics. They use and contribute to broadly-adopted open source technologies including Hadoop, Hive, Pig, Parquet, Presto, and Spark. Additionally, they have developed and contributed some additional tools and services which have further elevated their data platform.

To learn more about the main tools of the Netflix OSS platform that offer functionality associated with big data, please check out the Netflix OSS – Big Data Tools Overview blog post.

Build and Delivery Tools

In this category you can find build resources such as Nebula, which makes Gradle plugins easy to build, test and deploy. Additionally, this category includes tools to manage resources in AWS and to support deployments to this platform.

In the Netflix OSS – Build and Delivery Tools Overview blog post you can find more information about the available tools.

Common Runtime Services & Libraries

In this category you can find tools, libraries and services to power microservices. The cloud platform is the foundation and technology stack for the majority of the services within Netflix. This platform consists of cloud services, application libraries and application containers.

Take a look at the Netflix OSS – Common Runtime Services & Libraries Overview blog post to know more about the services and libraries used by Netflix that were released as open source software.

Data Persistence Tools

Handling a huge number of data operations per day required Netflix to extend existing open source software with tools of their own. The scale at which Netflix consumes and manages data in the cloud has led them to build tools and services that enhance the datastores they use.

In this category you will find tools to store and serve data in the cloud. Take a look at the Netflix OSS – Data Persistence Tools Overview blog post to read more about these tools.

Insight, Reliability and Performance Tools

In this category you can find tools to gain operational insight into an application, collect different kinds of metrics, and validate reliability by ensuring that the application can withstand different kinds of failures.

In the Netflix OSS – Insight, Reliability and Performance Tools Overview blog post you can find more information about these tools.

Security Tools

Netflix has released different security tools and solutions to the open source community. The security-related open source efforts generally fall into one of two categories:

  • Operational tools and systems to make security teams more efficient and effective when securing large and dynamic environments
  • Security infrastructure components that provide critical security services for modern distributed systems.

Check out the Netflix OSS – Security Tools Overview blog post to find further information about some of the security tools released by Netflix.

Getting Started

There are different ways to start working with the Netflix OSS tools.

The Zero to Cloud workshop offers a tutorial focused on bringing up the Netflix OSS stack on a fresh AWS account, in a similar style to how Netflix does it internally. To try it, you would need to have an AWS account and the required resources to set up the infrastructure.

Another way to start playing with the Netflix OSS tools is to analyze sample applications such as IBM Acme Air and Flux Capacitor. These applications use several of the Netflix OSS tools, so they can be useful to understand how the tools are used outside Netflix. In this case, you may also need the proper cloud infrastructure to set up and run the tools.

Finally, the fastest way to test some of the Netflix OSS tools is to use ZeroToDocker. If you are familiar with Docker, you can use the Docker images provided by Netflix to get some of the tools up and running in just a few minutes. Additionally, since some of the tools do not require AWS to work, you can run and test them in other cloud environments or locally.

Microsoft Azure Media Services SDK for Java v0.8.0 released and new samples available

Azure Media Services SDK for Java

This week the Azure SDK team published new releases of the Azure SDK for Java packages, which contain updates and support for more Microsoft Azure platform services features.

In particular, there was a new release (v.0.8.0) of the Azure Media Services SDK for Java that contains lots of new features such as Content Protection (Dynamic Encryption) and support for more entities/operations (StreamingEndpoint, EncodingReservedUnitType, StorageAccount, etc.). Below you can find the full change log for this release.

Here at Southworks I’ve been working with Emanuel Vecchio on preparing some Java console sample applications showing how to use the new features recently added to the Azure Media Services SDK for Java. As a result, we created the azure-sdk-for-media-services-java-samples GitHub repository containing the following samples:

Media Services SDK for Java sample projects in Eclipse


v0.8.0 Change Log




We recently contributed to the creation of manifoldJS, a tool that helps web developers bring their websites to the app stores with minimal effort, giving them the chance to reach more users, platforms and devices without having to change the workflow, content and deployment model they use today on their websites.

manifoldJS uses the W3C Manifest for Web Apps (a standard that describes the metadata of a website) to create apps for a number of platforms, including iOS, Android, Windows, Firefox OS and Chrome OS. There are two ways to generate apps with manifoldJS: through the site's app-generator wizard, or by installing and running the Node.js CLI tool locally.

The tool was introduced at the //BUILD/ 2015 Day 2 Keynote, where Steve Guggenheimer (Microsoft’s Chief Technical Evangelist) and John Shewchuk (Technical Fellow and the CTO for the Microsoft Developer Platform) showed how to use manifoldJS to generate web apps for the Shiftr website.

MeetUp: Machine Learning – Making Sense of Our Data

We are organizing a new Meetup within the “Buenos Aires Cloud Computing” group about Machine Learning with Microsoft Azure.

When and where?

The meetup will take place on Wednesday, August 19th, 2015 at the Southworks offices (Perú 375, 1st Floor, CABA).

What is it about?

The idea is to go through an introduction to Machine Learning, take a look at the Microsoft tools available for Machine Learning projects, and develop a practical case showing how we can use this technology to get sense and value out of our data.

For more information, you can check out the posts we have been writing on our blog:

Where do I sign up?

You can sign up on the MeetUp website through this link.

We hope to see you there!


Using Data Factory in a real world scenario

The purpose of this post is to show you the features of Data Factory and the kind of scenarios you can create with it.
Remember our Machine Learning Twitter Sentiment Analysis post? That service can be used to generate more complex scenarios.

Data Factory can help us create a scenario where all our related data converges, even from different sources. The data can be prepared, transformed and analyzed. Data Factory lets you compose services into managed data flow pipelines that transform data using HDInsight, Azure Batch, Azure Machine Learning, and more to generate relevant information.


Let’s see how all our pieces can be combined to generate useful information.

Our scenario is about a company that has just launched a marketing campaign and wants to leverage the Twitter response by analyzing and aggregating tweets to identify the positive and negative sentiments of its customers.

We are using this sample as a base for creating the scenario. If you want to try it, just follow the instructions found in the link.

The raw tweets stored in Azure Blob storage will be read and aggregated to generate the TweetCountPerHour, TweetCountPerDay and the TotalTweetCount datasets. The tweets will then be sent to the Sentiment Analysis Machine Learning model to be classified as Positive, Negative or Neutral. Finally, the sentiment for individual tweets will be aggregated to determine the overall sentiment, which is the number of tweets with Positive, Negative and Neutral Sentiment.
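
The final aggregation step described above can be sketched in a few lines of JavaScript. This is a hypothetical illustration of what the pipeline computes (the actual work is done by Data Factory activities), with made-up sample data:

```javascript
// Sketch of the overall-sentiment aggregation: given individual classified
// tweets, count how many fall into each sentiment bucket.
function aggregateSentiment(tweets) {
  return tweets.reduce(function (totals, tweet) {
    totals[tweet.sentiment] = (totals[tweet.sentiment] || 0) + 1;
    return totals;
  }, {});
}

var sample = [
  { text: "love it", sentiment: "Positive" },
  { text: "meh", sentiment: "Neutral" },
  { text: "great!", sentiment: "Positive" }
];

console.log(aggregateSentiment(sample)); // { Positive: 2, Neutral: 1 }
```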

Having this information will allow us to determine whether or not the campaign was successful.

The following image shows the diagram of the data factory scenario.

solution diagram

You can see how the first input of the process is the raw tweets, and how data is being transformed and then moved to SQL Azure.

If we inspect the generated data in the Azure SQL Server, we can see the following results.


In the results we can see that based on the gathered tweets, almost 60% of people are tweeting positively about our brand, and only 13% show negative sentiment. We can conclude, based on this data, that our new campaign has been successful.

Hope this is useful to you.


Getting ready for the “Programming in HTML5 with JavaScript and CSS3” Microsoft Exam

Knowing HTML, CSS and JavaScript is essential to any web developer, and the Microsoft Exam 70-480 “Programming in HTML5 with JavaScript and CSS3” allows you to become certified in those technologies. Although in my opinion the exam is not a reliable measure of how well you can program in those languages (this is a debatable topic, but I won’t talk about this here), in my case it provided the motivation I needed to study something I had been postponing.

I had some basic knowledge of HTML and CSS, but JavaScript was new to me. To get trained, I read the Official Training Guide back to back (at an average of two chapters a day, I finished it in two weeks). I found it to be an excellent book; it provides both theoretical information and practice exercises with code samples, not to mention the quizzes at the end of each lesson to test your knowledge. It has a few disadvantages though: there are a few errors here and there, and it doesn’t cover 100% of the exam topics. Oh, and the black-and-white code snippets were a pain to read; I recommend buying the eBook format to copy and paste snippets into your favorite IDE to try them out (or use jsfiddle).

Anyway, I realized that the book didn’t cover everything on the exam after I compared the contents to the Skills tested in the exam. You need to dig deeper than the book, and of course the internet is an excellent source.

Here are a few topics I had to review before taking the exam:

  • CSS animations with the “transition” and “transform” properties. They’re not covered at all in the training guide, and for the exam you need to know, for instance, how to rotate an element counter-clockwise. Also, it’s not an exam topic but it’s worth knowing about the existence of keyframes, which allow you to specify intermediate steps for your animations (something you can’t do with transitions).
  • CSS exclusions, which basically only work in IE and are a way to specify how content flows around an element (like a “float”).
  • The JavaScript stuff, i.e. how the “this” keyword really works in the context of a script, a function, an object, and an event handler. (This is a big topic and can take some time to grasp if you are new to JavaScript – I recommend this article).
  • More JavaScript stuff: call() versus apply() functions to force a certain meaning to the “this” keyword, and the various ways of attaching event handlers to elements.
  • How to do layouts with columns, grids, and flexible boxes. Again, these aren’t covered in the book, and they’re quite large topics. Basically, the multi-column properties are just what they sound like; grids are like tables and they only work in Internet Explorer, and flexible boxes give a “container” element the ability to alter its child elements’ width and height to best fill the available space.
  • Know how to style text with underlines, sizes, shadows, etc. Also, know how to put shadows and curved borders around boxes with the box-shadow and border-radius CSS properties.
  • CSS positioning. For me it was one of the hardest concepts in the exam, and it is a key concept of CSS, so you’d better spend some time on it. Here’s my quick summary:
    • Static positioning tells the browser to put the element where it should go according to the HTML. With this scheme, setting the “top”, “left”, etc. values doesn’t make any sense.
    • Relative positioning takes the element, places it where it should go according to the HTML, and then moves it relative to this position.
    • Absolute positioning tells the browser to *remove* the element from the HTML flow and position it relative to its first non-static ancestor. If all the parent elements are static, it will be positioned relative to the browser window.
    • Fixed positioning is like absolute positioning, but always in relation to the browser window.
  • How to make calls to web services using AJAX, in all four combinations: sync, async, with jQuery and without jQuery. Also, know how to be cross-browser compatible without jQuery (i.e. using ActiveX objects instead of XMLHttpRequest objects).
  • How to implement classes and inheritance in JavaScript. Coming from C#, this was a huge move for me and it was not intuitive at first. The fact that you define a class by wrapping it inside a function blew my mind 🙂
  • Web workers. Know how they work, i.e. it’s a script placed in a different file that executes in the background, and it doesn’t have access to the DOM, so you communicate with it through bidirectional messages. Also, know how to terminate them from the client and within the worker itself.
  • How to get the location of a user using the geolocation APIs, in the two flavors: one-time requests or continuous updates.
  • The RegEx object in JavaScript, and how to write patterns to test strings against, so as to validate user input. Regexes are a big topic in themselves, so you should devote some time to them if you’re new to the subject.
  • Media queries, also not explained in the book.
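
To make the call()/apply() point above concrete, here is a minimal sketch (the function and object names are my own, not from the exam materials):

```javascript
// call() and apply() both invoke a function with an explicit `this`;
// call() takes the arguments one by one, apply() takes them as an array.
function formatName(prefix, suffix) {
  return prefix + this.name + suffix;
}

var topic = { name: "JavaScript" };

console.log(formatName.call(topic, "[", "]"));    // "[JavaScript]"
console.log(formatName.apply(topic, ["[", "]"])); // "[JavaScript]"
```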
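
The classes-and-inheritance point looks like this in practice: a “class” is just a constructor function, and inheritance is built by chaining prototypes (a sketch with made-up names):

```javascript
// A "class" is a constructor function; methods live on its prototype.
function Animal(name) {
  this.name = name;
}
Animal.prototype.speak = function () {
  return this.name + " makes a sound";
};

// Inheritance: reuse the parent constructor and chain the prototypes.
function Dog(name) {
  Animal.call(this, name); // call() forces `this` to be the new Dog
}
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;
Dog.prototype.speak = function () {
  return this.name + " barks";
};

var rex = new Dog("Rex");
console.log(rex.speak());           // "Rex barks"
console.log(rex instanceof Animal); // true
```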

And here are a few tips that got me through my learning:

  • The first thing you should do before preparing for the exam is to actually learn what the exam is about and identify the areas you need to work on and which ones are new to you. For this, read the Skills section of the exam description.
  • When in doubt, ask StackOverflow. I found it especially useful to search for the differences between topics, e.g. the difference between “return true” and “return false” in event handlers, the different jQuery methods to set the content of an element (val(), text() and html()), the meaning of the “accept”, “dataType” and “contentType” parameters in AJAX…
  • Do the practice exercises of the book. They are not just “copy and paste” instructions; they actually tell you, for example, “implement a function that will take user input and perform this and that” and then show you the code to do it; you can code it yourself and then “peek” at the answer. Also, try to add your own functionality to the sites you create.
  • Have a look at a browser compatibility table to see what features and APIs are available in each browser.
  • To learn CSS: If you find a site you like or with an interesting layout, try copying it to see what you come up with.

Happy studying and good luck in the exam!

NodeBots Day 2015 – Medellin

Last Saturday I had the opportunity to take part in NodeBots Day 2015 in Medellín @NodeBotsMed. It was a really fun day, and the first of these events I have taken part in.


The venue pleasantly surprised me. The RutaN building is gorgeous. In the style you see all over Medellín, it is colored with plenty of green and nature.

NodeBots Day Medellin - RutaN


Following the NodeBots Day spirit there was no formal talk, but rather an introduction by our facilitator @Julian_Duque, who told us a bit about the origins of JavaScript robotics, the Serial Port library, and the Johnny-Five module.

Hands-on

We all quickly got down to work and, armed with SumoBots kits, started assembling the sumobots. SumoBots are open source models specially designed for NodeBots Days: easy to build, easy to assemble, and very inexpensive. It may sound like an exaggeration, but it had been a long time since I had done manual work with technology that now feels unfamiliar to us: cables, screws, drills, box cutters, glue, sandpaper!

(Source: MeetupJS)

Firmware and Software

To give the NodeBots some firmware and program them for a bit of play, we had ElectricImp Dev Kits available and used the Imp IDE to flash the little modules. That last part was a whole experience in itself, one that brings a smile to the inner nerd kid we all have!

Internet of Things

ElectricImp makes implementing an IoT solution very easy. Basically, it lets you connect the modules (Devices) over WiFi to ElectricImp’s cloud of ‘Agents’. For this, we used the Imp IDE to upload the tyrion and imp-io libraries so we could control the NodeBots remotely.


ElectricImp Architecture


What could be more fun to do on a Saturday afternoon than sumo fights between NodeBots controlled remotely from the cloud? There is nothing more to say; here are some photos uploaded by attendees to the NodeBots Meetup page.

The new friends of the SumoBots team: Carlos Alberto Mesa Rivillas, Elias Quintero and Alejandro Berrío.

NodeBots Day Medellin - Equipo

More to read

Introduction to Azure Data Factory

We live in a world where data is coming at us from everywhere. IoT is evolving so quickly that right now it seems almost every device is capable of producing valuable information (from water quality sensors to smartwatches). At the same time, the amount of data collected is growing exponentially in volume, variety, and complexity, making the process of extracting useful information from terabytes of data stored in different places (data sources in a variety of geographic locations) a complex scenario that requires creating custom logic that has to be maintained and updated over time.

There are several tools and services nowadays that are used to simplify the process of extracting, transforming and loading (ETL) data from different (and most likely heterogeneous) data sources into a single source: an Enterprise Data Warehouse (EDW). Their goal is to obtain meaningful business information (insights) that could help improve products and make decisions.

In this post, we are going to explore Azure Data Factory, the Microsoft cloud service for performing ETL operations to compose streamlined data pipelines that can be later consumed by BI tools or monitored to pinpoint issues and take corrective actions.

Azure Data Factory

Azure Data Factory is a fully managed service that merges the traditional EDWs with other modern Big Data scenarios like Social feeds (Twitter, Facebook), device information and other IoT scenarios. This service lets you:

  • Easily work with diverse data storage and processing systems, meaning you can process both on-premises data (like a SQL Server) and cloud data sources such as Azure SQL Database, Blob, Tables, HDInsight, etc.
  • Transform data into trusted information, via Hive, Pig and custom C# code activities that can be fully managed by Data Factory on our behalf (meaning that, for instance, no manual Hadoop cluster setup or management is required)
  • Monitor data pipelines in one place. For this, you can use an up-to-the-moment monitoring dashboard to quickly assess end-to-end data pipeline health, pinpoint issues, and take corrective action if needed.
  • Get rich insights from transformed data. You can create data pipelines that produce trusted data, which can be later consumed by BI and analytic tools.


Now that we know the basics let’s see each of these features in a real scenario. For this, we are going to use the Gaming customer profiling sample pipeline provided in the Azure Preview Portal. You can easily deploy this Data Factory in your own Azure subscription following this tutorial and explore it using the Azure Preview Portal. For instance, this is the Data Factory diagram of this sample (you can visualize it by clicking the Diagram tile inside the Data Factory blade):


The following is a brief description of the sample:

“Contoso is a gaming company that creates games for multiple platforms: game consoles, hand held devices, and personal computers (PCs). Each of these games produces tons of logs. Contoso’s goal is to collect and analyze the logs produced by these games to get usage information, identify up-sell and cross-sell opportunities, develop new compelling features, etc. to improve business and provide better experience to customers. This sample collects sample logs, processes and enriches them with reference data, and transforms the data to evaluate the effectiveness of a marketing campaign that Contoso has recently launched.”

Easily work with diverse data storage and processing systems

Azure Data Factory currently supports the following data sources: Azure Storage (Blob and Tables), Azure SQL, Azure DocumentDB, On-premises SQL Server, On-premises Oracle, On-premises File System, On-premises MySQL, On-premises DB2, On-premises Teradata, On-premises Sybase and On-premises PostgreSQL.

For instance, the Data Factory sample combines information from Azure Blob Storage:


Transform data into trusted information

Azure Data Factory currently supports the following activities: Copy Activity (on-premises to cloud, and cloud to on-premises), HDInsight Activity (Pig, Hive, MapReduce, Hadoop Streaming transformations), Azure Machine Learning Batch Scoring Activity, Azure SQL Stored Procedure activity, Custom .NET activities.

In the Data Factory sample, one of the pipelines executes 2 activities: an HDInsight Hive Activity to bring data from 2 different blob storage tables into a single blob storage table and a Copy Activity to copy the results of the previous activity (in an Azure Blob) to an Azure SQL Database.


Monitor data pipelines in one place

You can use the Azure Preview Portal to view details about the Data Factory resource, like linked services, datasets and their details, the latest runs of the activities and their status, etc. You can also configure the resource to send notifications when an operation is complete or has failed (more details here)


Get rich insights from transformed data

You can use data pipelines to deliver transformed data from the cloud to on-premises sources like SQL Server, or keep it in your cloud storage sources for consumption by BI tools and other applications.

In this sample we collect log information and reference data that is then transformed to evaluate the effectiveness of marketing campaigns, as seen in the image below:


Next steps

Twitter Sentiment Analysis

> Note: If you are not familiar with machine learning you can start with this post which explains the basic concepts of Machine Learning and the Azure Machine Learning service.

The purpose of this post is to explain how to build an experiment for sentiment analysis using Azure Machine Learning and then publish it as a public API that can be consumed by any application that needs this feature for a particular business scenario (e.g. gathering users’ opinions about a product or brand). Since the Azure Marketplace already offers a Text Analytics API for English, we decided to create one for Spanish. And to simplify things, we used the sample Twitter Sentiment analysis experiment available in the Azure Machine Learning Gallery.

Creating a custom dataset

This was our greatest challenge: creating a valid dataset with Spanish content. There is an existing dataset used in the sample experiment we are going to use as a basis for our work, which you can find here. This experiment is based on an original dataset of 1,600,000 tweets classified as negative or positive. The Azure ML Studio sample dataset contains only 10% of this data (160,000 records). In supervised learning, the more training data you have, the more accurate your trained model will be, which is why the first thing we want is a dataset with a considerable amount of data.

As this dataset is in English, the predictive model will learn to process English text. But since we want to create a service using the Spanish language, our data needs to be in Spanish.

To get the data in Spanish we could use Spanish tweets and manually classify them (which would take a long time) or use the original dataset translated to Spanish. In the latter option, the hard work of classifying the data is already done and we could use an automatic translation tool to do the work for us. Although automatic translation is not 100% accurate, the keywords will be there, so we’re going to go with this approach to make sure we have a good quantity of training data.

For this reason we created a very simple console application that uses the Bing Translate API to translate our dataset and return it in the correct format.
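
As a rough illustration of what that console application does (here `translate` is a stand-in stub for the actual Bing Translate API call, and the row format is assumed to be tab-separated label and text):

```javascript
// Hypothetical sketch: re-emit each classified row with its text translated,
// keeping the tab-separated (sentiment_label, tweet_text) format of the dataset.
function translateDataset(rows, translate) {
  return rows
    .map(function (row) {
      var parts = row.split("\t"); // [sentiment_label, tweet_text]
      return parts[0] + "\t" + translate(parts[1]);
    })
    .join("\n");
}

// Usage with a fake translator:
var out = translateDataset(["4\thello", "0\tbad"], function (text) {
  return "(es) " + text;
});
console.log(out); // prints the two rows, tab-separated, one per line
```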

Once we have the dataset ready, the next step is to upload it to Azure ML studio so it is available to use in the experiments.

To upload the recently created dataset, in the Azure ML portal click NEW, select DATASET, and then click FROM LOCAL FILE. In the dialog box that opens, select the file you want to upload, type a name, and select the dataset type (this is usually inferred automatically). In our case, it is a TAB separated values file (.tsv).

uploading a new dataset

The dataset contains only 2 columns: sentiment_label, which is 0 for a negative sentiment and 4 for a positive one, and tweet_text, which holds the text of the tweet.

Sample input data

Once the dataset is created, we will take advantage of the existing sample experiment of the Machine Learning Gallery, available here.

Open the experiment by clicking Open in Studio as shown below.

sample experiment

Then, you will be prompted to copy the experiment from the Gallery to your workspace.

copying from gallery

At this point let’s remove the Reader module from the experiment and add the custom dataset we created. Connect the dataset to the Execute R Script module.

Run the experiment.

Running the experiment

Pre-processing the data

This experiment uses several modules to pre-process the data before analyzing its content (like removing punctuation marks or special characters, or adjusting the data to fit the algorithm used). For more information about the data preprocessing, you can read the information available in the experiment page in the Gallery.

Scoring the model

After running the predictive experiment, let’s create the scoring model. To do this, point to SET UP WEB SERVICE and select Predictive Web Service [Recommended].

setting up the web service

Once the Predictive Experiment is created, we need to update this experiment to make it work as expected. First delete the Filter Based Feature Selection Module and reconnect the Feature Hashing module to the Score Model module.

Delete the connection between the Score Model module and the Web Service Output module by right-clicking it and clicking Delete.

deleting a connection

Between those two modules, add a Project Columns module, and then an Execute R Script module. Connect them in sequence and also with the Web Service Output module. The resulting experiment will resemble the following image.

resulting experiment

Now let’s configure the Project Columns module. Select it and in the Properties pane, click Launch column selector. In the dialog box that opens, in the row with the Include dropdown, go to the text field and add the four available columns (sentiment_label, tweet_text, Scored Labels, and Scored Probabilities).

projecting columns

Lastly, select the Execute R Script to configure it. Click inside the R Script text box and replace the existing script with the following:

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame

# Set thresholds for classification
threshold1 <- 0.60
threshold2 <- 0.45
positives <- which(dataset1["Scored Probabilities"] > threshold1)
negatives <- which(dataset1["Scored Probabilities"] < threshold2)
neutrals <- which(dataset1["Scored Probabilities"] <= threshold1 &
                  dataset1["Scored Probabilities"] >= threshold2)

# Assign a label to each row based on its score
new.labels <- matrix(nrow = nrow(dataset1), ncol = 1)
new.labels[positives] <- "positive"
new.labels[negatives] <- "negative"
new.labels[neutrals] <- "neutral"

data.set <- data.frame(assigned = new.labels,
                       confidence = dataset1["Scored Probabilities"])
colnames(data.set) <- c('Sentiment', 'Score')

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set")

This will return two columns as the output of the service: Sentiment and Score.

The sentiment column will be returned as Positive, Neutral or Negative and the Score column will be the Score Probability. The classification will be made based on the defined thresholds and will fall into the following 3 categories:

  • Less than 0.45: Negative
  • Between 0.45 and 0.60: Neutral
  • Above 0.60: Positive
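
The same thresholding can be expressed as a small JavaScript helper, shown here as a hypothetical client-side mirror of what the R script does server-side:

```javascript
// Map a scored probability to a sentiment using the experiment's thresholds:
// above 0.60 is Positive, below 0.45 is Negative, anything in between is Neutral.
function classifySentiment(score) {
  if (score > 0.60) return "Positive";
  if (score < 0.45) return "Negative";
  return "Neutral";
}

console.log(classifySentiment(0.72)); // "Positive"
console.log(classifySentiment(0.50)); // "Neutral"
console.log(classifySentiment(0.10)); // "Negative"
```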

Now that everything is set up, we can run the experiment.

Publishing and Testing the Web Service

Once the predictive experiment finishes, click Deploy Web Service. The deployed service screen will appear. Click Test.

published web service

In the Enter data to predict dialog box, enter a text in Spanish in the TWEET_TEXT parameter and click the check mark button.

entering data to predict

Wait for the web service to predict the results, which will be shown as an alert.

prediction results

We have made the following test page that uses our generated API to test the service.

testing app

Next Steps

We tested the resulting API with some sample text, and we are pleased with the outcome (the model learned how to classify Spanish texts quite well). Nevertheless, there are some ways to improve the model we have created, such as:

  • Trying other training algorithms and comparing their performance.
  • Improving the input dataset, either by building a brand new dataset with manually classified information in Spanish or by using common keywords to obtain classified results.

Given that this is a proof of concept, we consider this to be a successful experiment.

Docker Compose: Scaling Multi-Container Applications


In the Docker Compose: Creating Multi-Container Applications blog post we talked about Docker Compose and the benefits it offers, and used it to create a multi-container application. Then, in the Running a .NET application as part of a Docker Compose in Azure blog post, we explained how to create a multi-container application composed of a .NET web application and a Redis service. So far, so good.

However, although we can easily get multi-container applications up and running using Docker Compose, in real environments (e.g. production) we need to ensure that our application will continue responding even if it receives numerous requests. In order to achieve this, those in charge of configuring the environment usually create multiple instances of the web application and set up a load balancer in front of them. So, the question here is: Could we do this using Docker Compose? Fortunately, Docker Compose offers a really simple way to create multiple instances of any of the services defined in the Compose.

Please note that although Docker Compose is not yet considered production-ready, the goal of this post is to show how easily a particular service can be scaled using this feature, so you will know how to do it when the final version is released.

Running applications in more than one container with “scale”

Docker Compose allows you to generate multiple containers for your services running the same image. Using the “scale” command in combination with a load balancer, we can easily configure scalable applications.

The “scale” command sets the number of containers to run for a particular service. For example, if we wanted to run a front-end web application in 10 different containers, we would use a command like the following (assuming the service is named “web” in the docker-compose.yml file):

docker-compose scale web=10

Considering the scenario we worked on in the Running a .NET application as part of a Docker Compose in Azure blog post, how could we scale the .NET web application service to run in 3 different containers at the same time? Let’s see…

Check/update the docker-compose.yml file

The first thing we need to do is ensure that the service we want to scale does not specify the external/host port. If we specify that port, the service cannot be scaled since all the instances would try to use the same host port. So, we just need to make sure that the service we want to scale only defines the private port in order to let Docker choose a random host port when the container instances are created.

But, how do we specify only the private port? The port value can be configured as follows:

  • If we want to specify the external/host port and the private port, the “ports” configuration would look like this:
  • If we want to specify only the private port, this would be the “ports” configuration:
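As a sketch of both variants (the service name “web” and image name “mywebapp” are placeholders, not values from our scenario):

```yaml
# Variant 1: host port and private port; the service cannot be scaled
# because every instance would claim host port 8080.
web:
  image: mywebapp
  ports:
   - "8080:80"
---
# Variant 2: private port only; Docker assigns a random host port to
# each container, so the service can be scaled.
web:
  image: mywebapp
  ports:
   - "80"
```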

In our scenario, we want to scale the .NET web application service called “net“; therefore, that service should not specify the external port. As you can see in our docker-compose.yml file displayed below, the ports specification for the “net” service only contains one port, which is the private one. So, we are good to go.


net:
  image: websiteondocker
  ports:
   - "210"
  links:
   - redis
redis:
  image: redis


Remember that the private port we specify here must be the one we provided when we published the .NET application from Visual Studio since the application is configured to work on that port.

Scaling a service

Now that we have the proper configuration in the docker-compose.yml file, we are ready to scale the web application.

If we don’t have our Compose running or have modified the docker-compose.yml file, we would need to recreate the Compose by running “docker-compose up -d“.

Once we have the Compose running, let’s check the containers we have running as part of the Compose by executing “docker-compose ps“:


As you can see, there is one container running that corresponds to the “net” service (.NET web application) and another container corresponding to the Redis service.

Now, let’s scale our web application to run in 3 containers. To do this, we just need to run the scale command as follows:

docker-compose scale net=3


In the previous command, “net” is the name of the service that we want to scale and “3” is the number of instances we want. As a result of running this command, 2 new containers running the .NET web application will be created.


If we check the Docker Compose containers now, we’ll see the new ones:


We need to consider that Docker Compose remembers the number of instances set in the scale command. So, from now on, every time we run “docker-compose up -d” to recreate the Compose, 3 containers running the .NET web application will be created. If we only want 1 instance of the web application again, we can run “docker-compose scale net=1”. In this case, Docker Compose will delete the extra containers.

At this point, we have 3 different containers running the .NET web application. But, how hard would it be to add a load balancer in front of these containers? Well, adding a load balancer container is pretty easy.

Configuring a load balancer

There are different proxy images that offer the possibility of balancing the load between containers. We tested one of them: tutum/haproxy.

When we created the .NET web application, we included logic to display the name of the machine where the requests are processed:


@{
    ViewBag.Title = "Home Page";
}

<h3>Hits count: @ViewBag.Message</h3>

<h3>Machine Name: @Environment.MachineName</h3>


So, once we set a load balancer in front of the 3 containers, the application should display different container IDs.

Let’s create the load balancer. In our scenario, we can create a new container using the tutum/haproxy image to balance the load between the web application containers by applying any of the following methods:

  • Manually start the load balancer container:
    We can manually start a container running the tutum/haproxy image by running the command displayed below. We would need to provide the different web app container names in order to indicate to the load balancer where it should send the requests.

docker run -d -p 80:80 --link <web-app-1-container-name>:<web-app-1-container-name> --link <web-app-2-container-name>:<web-app-2-container-name> … --link <web-app-N-container-name>:<web-app-N-container-name> tutum/haproxy


  • Include the load balancer configuration as part of the Docker Compose:
    We can update the docker-compose.yml file in order to include the tutum/haproxy configuration. This way, the load balancer would start when the Compose is created and the site would be accessible just by running one command. Below, you can see what the configuration corresponding to the load balancer service would look like. The “haproxy” service definition specifies a link to the “net” service. This is enough to let the load balancer know that it should distribute the requests between the instances of the “net” service, which correspond to the .NET web application.


haproxy:
  image: tutum/haproxy
  links:
   - net
  ports:
   - "80:80"


In our scenario, we will apply the second approach since it allows us to start the whole environment by running just one command. Although we generally think it is better to include the load balancer configuration in the Compose configuration file, keep in mind that starting the load balancer together with the rest of the Compose may not always be the best solution. For example, if you scaled the web application service by adding new instances and want the load balancer to start considering those instances without the site being down too long, restarting the load balancer container manually may be faster than recreating the whole Compose.

Continuing with our example, let’s update the “docker-compose.yml” file to include the “haproxy” service configuration.

First, open the file:

vi docker-compose.yml


Once the file opens, press i (“Insert”) to start editing the file. Here, we will add the configuration corresponding to the “haproxy” service:


haproxy:
  image: tutum/haproxy
  links:
   - net
  ports:
   - "80:80"
net:
  image: websiteondocker
  ports:
   - "210"
  links:
   - redis
redis:
  image: redis


Finally, to save the changes we made to the file, just press Esc and then :wq (write and quit).

At this point, we are ready to recreate the Compose by running “docker-compose up -d“.


As you can see in the previous image, the existing containers were recreated and additionally, a new container corresponding to the “haproxy” service was created.

So, Docker Compose started the load balancer container, but is the site working? Let’s check it out!

First, let’s look at the container we have running:


As you can see, the load balancer container is up and running in port 80. So, since we already have an endpoint configured in our Azure VM for this port, let’s access the URL corresponding to our VM.


The site is running! Please notice that the Container ID is displayed on the page. Checking the displayed value against the result we got from the “docker ps” command, we can see that the request was processed by the “netcomposetest_net_3” container.

If we reload the page, this time the request should be processed by a different container.


This time, the request was processed by the “netcomposetest_net_4” container.

At this point we have validated that the .NET web application is running in different containers and that the load balancer is working. Plus, we have verified that all the containers are consuming information from the same Redis service instance since, as you can see, the amount of hits increased even when the requests were processed by different web application instances.

Now, what happens if we need to stop one of the web application containers? Do we need to stop everything? The answer is “No”. We can stop a container, and the load balancer will notice it and won’t send new requests to that container. The best thing here is that the site continues running!

Let’s validate this in our example. Since we have 3 web application containers running, we can stop 2 of them and then try to access the site.

To stop the containers, we can run the “docker stop <container-name>” command. Looking at the result we got from the “docker ps” command, we can see that our containers are called “netcomposetest_net_3”, “netcomposetest_net_4” and “netcomposetest_net_5”. Let’s stop the “netcomposetest_net_3” and “netcomposetest_net_4” containers:

docker stop netcomposetest_net_3 netcomposetest_net_4


Now, if we reload the page, we will see that the site is still working!


This time the request was processed by the only web application container we have running: “netcomposetest_net_5“.

If we keep reloading the page, we will see that all the requests are processed by this container.