Saturday, August 18, 2012

Choosing a NoSQL Database

Where to Start?

When starting a new project, it is necessary to figure out which of the multitude of No SQL providers to choose from.

I found the following diagram very helpful: No-SQL Diagram
Take note that it covers both CAP requirements and data model (Key-Value, Column or Document).

The following article brings home what CAP means in the real world:  MongoDB vs CouchDB

Once we have eliminated a few choices (we are picking from a short-list) the following article is helpful to understand the trade-offs we are making:  No-SQL Summary

What does CAP mean to me ?

When we introduce a second data center into our design, we are forced into an overall AP model.
We know there will be times that the two data centers will lose connectivity from each other (P) and the primary reason we are doing this is to keep our site available (A).

Even if we run an Active-Passive configuration, we must accept that our primary DC may lose power before replicating data to our secondary. If/when we get the primary datacenter back online we will need to reconcile the data (eventually consistent).

The question becomes, do we want to handle this inconsistency at the application or the data-store layer, putting this another way:
Do we want our data store to span data centers?

If we span data centers we need to deal with eventually consistent data on the site, we will have different data at each site and totals will not always match, we have to make sure that these inconsistencies don't impact the user (too much).

If we isolate the data centers (data-store clusters don't span DC's) all the data-stores within the datacenter can be CP (if a fatal problem occurs it is still possible to cut over to the other data-center). But within each DC things are now much simpler as all the stores are consistent.
The fail-over/back events (at the datacenter level) cause inconsistencies that have to be handled at the application layer, however this may be okay if such fail-overs are rare and the desired SLA is more generous.

In summary if data-stores clusters are restricted to a single data center (or AWS region) our application logic can be simpler, than it would be, if it had to handle eventually consistent data.
However we run the risk of having to reconcile data during a datacenter failover. This risk can be mitigated by spreading the nodes of the cluster across independent power/network zones (availability zones) hence reducing the chance of an outage at the datacenter. But this risk can not be eliminated unless we handle inconsistent data at the application layer.

Data Model

The choice of data model comes down to how we want to handle "document linkage". Key/Value stores are easy to understand and the most efficient for simple use cases. However we inevitably get to the point where we want to "serialize" more complex objects into a single value.
Then we discover that we need to access the data in a more complex way and we end up with a more complex solution:
  • Creating reference documents, where the value is a list of keys.
  • Mangling the keys so that we can load data based on a pattern.
  • Using a third party index (such as solr).
Document stores are specifically optimized for this usage pattern - to be able to extract (and update) parts of multiple documents in a single query.

Hence the choice between these technologies is based on how complex we expect our use cases to be.

Sanity Check our Choice

Running a google query on "DATABASE_NAME doesn't scale" is a good way to find the articles about the negatives.

The most important question is why (the given DB doesn't scale), since if a product only has one limiting feature its likely someone is working on fixing it (we can also make an assessment of how likely the fix is).

Anyway here are a couple of good negative articles:

Monday, May 28, 2012

Reseting the Default Root Password on OSX

If you have ever set the root password on an OSX machine using the: passwd
Command you may be wondering how to reset it back to the default;

rm /var/db/shadow/hash/FFFFEEEE-DDDD-CCCC-BBBB-AAAA00000000*

Edit /var/db/dslocal/nodes/Default/users/root.plist

Remove the key "authentication_authority" and any values it has.

Reboot the machine.

Thursday, May 24, 2012

Configuring Centos 6.2 without a Keyboard or Mouse

Turn on Synergy

Note: replace HOSTNAME  with the host that is running the synergy server (synergys).

Add the following to the bottom of: /etc/gdm/Init/Default

/usr/bin/killall synergyc
sleep 1
/usr/bin/synergyc HOSTNAME

exit 0

Create the file: /etc/gdm/PostLogin/Default

/usr/bin/killall synergyc
sleep 1
exit 0

Add the following to the bottom of: /etc/gdm/PreSession/Default

/usr/bin/killall synergyc
sleep 1
/usr/bin/synergyc HOSTNAME

exit 0

Enable Auto Login

Add the following to: /etc/gdm/custom.conf


Sunday, April 22, 2012

Shortest Ever Java Generics Tutorial

Location of Declaration

Formal type parameters can be applied at either the class or method level:

public class SomeClass<A> {

    // <A> Declared at the class level
    A methodA(A a) {
        return a;

    // <B> Declared at the method level.
    <B> B methodB(B b) {
        return b;

Class level type parameters can only by used in non-static context.

Wildcard Declarations

Wildcard parameters allow you more flexibility in your assignments,
but they don't allow you to violate type safety.
So the following declaration is illegal:

ArrayList<Number> numbers = new ArrayList<BigDecimal>();

Because you could violate type safety by adding a BigInteger to the numbers collection.
However the following is okay:

ArrayList<? extends Number> numbers = new ArrayList<BigDecimal>();

Because you can't use the numbers variable to add new elements to the array.

A quick way to think of the wildcard forms is:

DeclarationDescription EffectUsable methods
List<Number>List of NumbersRead/Writelist.get(0);
List<? extends Number>List of Number or List of BigDecimalRead onlylist.get(0);
List<? super Number>List of Number or List of ObjectWrite onlylist.add(number);
List<?>List of unknown...list.size();

The last form may not seem that useful, however you can still perform many "Collection" operations such as getting the size or reversing the elements in the list.


Since all objects inherit from java.lang.Object it is possible to read an Object from any of the lists.
However the last two cases will be of type java.lang.Object
(The only way to maintain type safety is to down cast to Object).

Since removal of elements doesn't create any type safety issues, it is possible to remove elements from all 4 examples.
So the second and forth examples only really prevent addition of new elements.

Another way to put it is that remove() is a collection level operation, like size.


  1. In the interests of brevity I have bent the truth (a bit) with respect to wildcard declarations.
  2. There might be a shorter tutorial that I am not aware of.