There may be a document-oriented database in your future.
The technology is catching on widely and is proving itself a cornerstone for many companies operating at large scale on the web, among them LinkedIn, Facebook, Orbitz, and Zynga.
Many deployments are based on open-source code, which is where the broader NoSQL movement, if that's the right word, has its origins. But a number of commercial ventures are also plying the document-database waters, following the strategy Red Hat pioneered with Linux: commercializing open-source code by adding significant functionality and offering a menu of technical consulting services to support it.
Recent weeks have seen some major developments that underscore the maturing of the market. 10Gen, the company behind a DBMS called MongoDB, took a round of investment from two major players: Intel, through its venture capital arm, and Red Hat, which is moving aggressively into cloud-based services. (10Gen had only recently taken money from In-Q-Tel, an investment fund associated with the CIA.)
While 10Gen declined to say how much money its new backers had put up, it did reveal that it now has more than 500 paying customers, half of them so-called Global 2000 companies. Among these are Disney and Foursquare, a location-based social networking site.
Meanwhile, 10Gen's main competitor, Couchbase, launched a beta of its release 2.0 product. It adds secondary indexing on the contents of JSON documents, plus the ability to replicate databases between server clusters, whether to prepare for disaster recovery or to bring data closer to users and thus reduce latency.
Couchbase has told Curt Monash, the noted DBMS market researcher, that it has more than 350 subscription customers, including LinkedIn; game maker Zynga (also an investor in Couchbase); and Orbitz, the travel booking site.
These companies, along with a variety of open-source projects building document-oriented databases, see potential mainly in greenfield situations, but also where users are bumping into the limitations of their well-established relational systems. Mobile and web-based applications are the main target, because they typically have to manage tremendous numbers of records, any one of which may need to be served up in a flash. What's more, these databases must be open to expansion over time as customers add new categories of data.
Relational packages can get stretched to the limit when confronted with these requirements. Beyond perhaps eight or nine servers working together, their performance stops scaling well. To go further, the database must be split into pieces (sharded) across separate clusters, and every application must be updated to know where each set of records now lives.
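One way document stores avoid that application rewrite is to route requests by hashing the key: each key maps to one of a fixed number of logical buckets, and a cluster map records which server currently owns each bucket. The Python sketch below is loosely modeled on that idea; the bucket count of 1024, the CRC32 hash, the `ClusterMap` class, and the server names are all assumptions for illustration, not any product's actual API.

```python
import zlib
from typing import Dict, List

NUM_BUCKETS = 1024  # assumption: a fixed count of logical partitions

def bucket_for(key: str) -> int:
    """Hash a document key to one of NUM_BUCKETS logical buckets."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

class ClusterMap:
    """Hypothetical map from each logical bucket to the server that owns it."""

    def __init__(self, servers: List[str]) -> None:
        # Spread buckets round-robin across the starting servers.
        self.owner: Dict[int, str] = {
            b: servers[b % len(servers)] for b in range(NUM_BUCKETS)
        }

    def server_for(self, key: str) -> str:
        # Applications call this instead of hard-coding record locations.
        return self.owner[bucket_for(key)]

    def add_server(self, server: str) -> None:
        # Reassign every n-th bucket to the new server. Only the map changes
        # here; a real system would also copy the affected data across.
        n = len(set(self.owner.values())) + 1
        for b in range(0, NUM_BUCKETS, n):
            self.owner[b] = server

cluster = ClusterMap(["db1", "db2", "db3"])
print(cluster.server_for("user:42"))
cluster.add_server("db4")
print(cluster.server_for("user:42"))  # same key, possibly a new server
print(sum(1 for s in cluster.owner.values() if s == "db4"))  # 256
```

Because growing the cluster only edits the map, application code that looks records up through `server_for` keeps working unchanged.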
Relational databases also have trouble accommodating new kinds of records. It can take weeks of time and effort to re-engineer the schema that defines the rows and columns making up each table.
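The document approach to this problem can be sketched in a few lines of Python. The documents below are hypothetical user profiles stored as JSON text; the second carries an `interests` field the first lacks, and adding it required no schema change. The `profiles` and `interests` names are invented for illustration. Reading code simply treats a missing field as empty.

```python
import json

# Hypothetical user-profile documents stored as JSON text, keyed by ID.
# They need not share the same fields.
profiles = {
    "user:1": json.dumps({"name": "Ada", "city": "London"}),
    # A newer document adds an "interests" field; no schema migration is run.
    "user:2": json.dumps({"name": "Lin", "city": "Austin",
                          "interests": ["travel", "games"]}),
}

def interests(key: str) -> list:
    """Read a field that older documents may not have; missing means empty."""
    doc = json.loads(profiles[key])
    return doc.get("interests", [])

print(interests("user:1"))  # []
print(interests("user:2"))  # ['travel', 'games']
```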
What a document-oriented DBMS does is treat each record as a self-contained document, perhaps only a few bytes in size, whose fields can differ from one document to the next. These documents are generally formatted in JSON and stored under a key whose hash value points straight to them, which makes for extremely rapid retrieval.
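To make the key-and-hash idea concrete, here is a toy in-memory store in Python. It is a minimal sketch under our own assumptions (a hypothetical `DocumentStore` class, a fixed table of 64 slots, separate chaining for collisions, and CRC32 as the hash function) and does not describe how any particular product implements its storage engine.

```python
import json
import zlib
from typing import List, Optional, Tuple

class DocumentStore:
    """Toy key -> JSON document store built on an explicit hash table.

    Assumptions: a fixed number of slots and separate chaining. Real systems
    also resize, persist to disk, and replicate; none of that is shown here.
    """

    def __init__(self, slots: int = 64) -> None:
        self.slots: List[List[Tuple[str, str]]] = [[] for _ in range(slots)]

    def _slot(self, key: str) -> List[Tuple[str, str]]:
        # The key's hash picks one slot directly; only the few records that
        # share that slot are ever checked.
        return self.slots[zlib.crc32(key.encode("utf-8")) % len(self.slots)]

    def put(self, key: str, doc: dict) -> None:
        slot = self._slot(key)
        text = json.dumps(doc)  # store the document as JSON text
        for i, (existing, _) in enumerate(slot):
            if existing == key:
                slot[i] = (key, text)  # replace the old version
                return
        slot.append((key, text))

    def get(self, key: str) -> Optional[dict]:
        for existing, text in self._slot(key):
            if existing == key:
                return json.loads(text)
        return None

store = DocumentStore()
store.put("user:42", {"name": "Ada", "segments": ["travel"]})
print(store.get("user:42"))  # {'name': 'Ada', 'segments': ['travel']}
print(store.get("user:99"))  # None
```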
AOL, for instance, is using Couchbase's code to store around 500 million user profiles, all of them ready for near-instantaneous retrieval so that it can choose just the right ads for whatever webpages those users happen to be visiting. AOL has just 40 ms to customize an ad and send it on its way to a remote screen.
Storing data on SSDs can certainly help performance, but it isn't always necessary. In the kinds of online games Zynga runs, for instance, perhaps only a million users are active at any given moment, and their records can be pulled into RAM for speedy retrieval.
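One common way to keep the active working set in RAM is a least-recently-used (LRU) cache in front of slower storage. The Python sketch below is illustrative only: `WorkingSetCache` and its `load_from_disk` callback are hypothetical names standing in for whatever a real DBMS does internally, and the tiny capacity is chosen just to show eviction.

```python
from collections import OrderedDict
from typing import Callable

class WorkingSetCache:
    """Keep recently used records in RAM; fall back to slower storage.

    Assumption: load_from_disk stands in for an SSD or disk read.
    """

    def __init__(self, capacity: int, load_from_disk: Callable[[str], dict]) -> None:
        self.capacity = capacity
        self.load_from_disk = load_from_disk
        self.ram: "OrderedDict[str, dict]" = OrderedDict()
        self.misses = 0

    def get(self, key: str) -> dict:
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as most recently used
            return self.ram[key]
        self.misses += 1
        record = self.load_from_disk(key)  # slow path
        self.ram[key] = record
        if len(self.ram) > self.capacity:
            self.ram.popitem(last=False)  # evict the least recently used
        return record

disk = {f"player:{i}": {"id": i} for i in range(10)}
cache = WorkingSetCache(capacity=2, load_from_disk=disk.__getitem__)
cache.get("player:1")
cache.get("player:2")
cache.get("player:1")  # served from RAM
print(cache.misses)  # 2
cache.get("player:3")  # evicts player:2, the least recently used
print(list(cache.ram))  # ['player:1', 'player:3']
```

As long as the active players fit within the cache's capacity, nearly every read is served from memory and the slower tier is touched only on a miss.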
If it's high-performance, flexible data storage you need, a document-oriented DBMS may be just the thing.