
Azure Data Architect | DBA

CouchDB on Suse VM in Azure For Less Than $100/month (Part 1 of 5)

The following outlines a series of posts designed to walk readers through building and benchmarking a reasonably cheap, fast and reliable location tracking app using CouchDB on an OpenSUSE VM in the Azure cloud. This first installment will introduce the project.

  1. Introduction
    1. Goals
    2. Cheap, Fast, Reliable
    3. Eventual Consistency
  2. Installation
    1. Create VM
    2. Installs
    3. SSH Security
    4. Capture VM
  3. Client Code
    1. Create Databases
    2. Create Benchmark/Testing Code
    3. Create Views
    4. Start Replication
    5. Cron Jobs
  4. Performance Tweaks
    1. stale=update_after
    2. CouchDB local.ini mods
  5. Mapping Data
    1. Aggregate Heat Map
    2. Live Map

 

1. Introduction

a. Goals

The goal of this project is to create a repository for tracking data that can ingest roughly 10,000 records per minute while costing less than $100/month on Azure virtual machines (VMs). For this project, a tracking record consists of (x,y) coordinates, a timestamp and a unique device identifier. The idea is to be able to draw maps based on the data collected. The aim is not to provide real-time location information, but to show aggregated locations over time.
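As a minimal sketch of what one such record might look like as a JSON document, consider the Python below. The class name `TrackingRecord`, the field names (`device_id`, `x`, `y`, `timestamp`) and the sample values are assumptions for illustration only, not the schema used later in this series.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class TrackingRecord:
    """One tracking data point (hypothetical field names)."""
    device_id: str   # unique device identifier
    x: float         # x coordinate
    y: float         # y coordinate
    timestamp: str   # ISO 8601 string in UTC

    def to_json(self) -> str:
        # CouchDB stores documents as JSON, so serialize the fields to a JSON object.
        return json.dumps(asdict(self), sort_keys=True)


# Arbitrary sample values, chosen only to show the shape of a record.
record = TrackingRecord(
    device_id="device-0001",
    x=-122.33,
    y=47.61,
    timestamp=datetime(2014, 1, 1, 12, 0, 0, tzinfo=timezone.utc).isoformat(),
)
print(record.to_json())
```

Keeping each record this small is one plausible way to keep per-document overhead low at the target ingestion rate.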

b. Cheap, Fast, Reliable

In what is known in some circles as the project management triangle, a project can usually pick only two of cheap, fast and reliable. Since the budget for this project is $100/month, cheap is already a goal. In fact, according to the Azure price list (http://azure.microsoft.com/en-us/pricing/details/virtual-machines/#Linux), the project could run for as little as $26/month. However, using A1 VMs in the standard tier yields better results while remaining under the $100/month budget.

While an ingestion rate of 10,000 records per minute is not blisteringly fast, it is fast enough for this project. Beyond ingestion, map-reduce and reporting on the data must also be fast. With cron keeping the map-reduce views fresh at that ingestion rate, sum queries for a given time period returned in under a second even as the total record count exceeded 200 million. For data aggregations such as this, the speed of ingestion matters less than the speed at which data can be retrieved for reports. So, while the system is taking in tens of thousands of records each minute, it can also return specific results in less than a second. The project therefore meets both its price and its performance requirements.

Reliability is the corner of the triangle that gets traded off for price and speed. Even so, given the price and speed, the reliability of Azure VMs is actually quite good. So, while this setup is not free, nor blindingly fast, nor 100% reliable, it does present a balanced approach to meeting all three criteria.
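To put those numbers in perspective, the arithmetic below converts the target ingestion rate into per-second and per-day figures and estimates how long it takes to accumulate 200 million records. It assumes a constant, sustained rate of exactly 10,000 records per minute, which is a simplification; the constant names are made up for this sketch.

```python
RECORDS_PER_MINUTE = 10_000    # target ingestion rate from the goals
TOTAL_RECORDS = 200_000_000    # record count mentioned in the text

# Assumes the rate is constant around the clock.
records_per_second = RECORDS_PER_MINUTE / 60
records_per_day = RECORDS_PER_MINUTE * 60 * 24
days_to_total = TOTAL_RECORDS / records_per_day

print(f"{records_per_second:.0f} records/second")
print(f"{records_per_day:,} records/day")
print(f"{days_to_total:.1f} days to reach {TOTAL_RECORDS:,} records")
```

Under that assumption the rate works out to roughly 167 records per second, 14.4 million records per day, and about two weeks to pass 200 million records.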

c. Eventual Consistency

As mentioned in the goals, this project is not designed to provide real-time location information. One key reason behind this is the eventual consistency of replicated databases in CouchDB. The benefit of a multi-master setup is scaling clients across multiple servers. The drawback is that not all data are immediately available across the replication set.
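The toy model below illustrates the idea. It is not CouchDB's replication protocol: the `Node` class and `replicate` function are hypothetical stand-ins that ignore revisions and conflict handling, and only show that each server sees its own writes until replication catches up.

```python
class Node:
    """A toy stand-in for one database server (not CouchDB itself)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # document id -> document body

    def write(self, doc_id, doc):
        self.docs[doc_id] = doc


def replicate(source, target):
    """Copy documents the target has not seen yet; return how many were copied."""
    missing = {k: v for k, v in source.docs.items() if k not in target.docs}
    target.docs.update(missing)
    return len(missing)


node_a, node_b = Node("a"), Node("b")

# Clients write to whichever node they are connected to.
node_a.write("rec-1", {"device_id": "device-0001", "x": 1.0, "y": 2.0})
node_b.write("rec-2", {"device_id": "device-0002", "x": 3.0, "y": 4.0})

# Before replication, each node only sees its own writes.
before = (len(node_a.docs), len(node_b.docs))

# Replicating in both directions brings the nodes into agreement.
replicate(node_a, node_b)
replicate(node_b, node_a)
after = (len(node_a.docs), len(node_b.docs))

print("before:", before, "after:", after)
```

A report run against either node between the writes and the replication would miss the other node's record, which is why aggregated maps over time suit this setup better than live positions.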
