NoSQL Databases: comparing MongoDB, HDInsight, and DocumentDB

Click here to view original web page at www.kdnuggets.com

Tags: DocumentDB, HDInsight, MongoDB, NoSQL

We compare 3 major NoSQL databases: MongoDB, DocumentDB, and HDInsight in terms of data models, scalability, availability, query types, and support for transactions.

By Jack Dawson, BigDropInc

There are many NoSQL solutions, but this post looks at how to choose among 3 major ones:

  • MongoDB, a leading open-source document database
  • Microsoft Azure DocumentDB, a fully managed, highly scalable NoSQL document database service
  • Microsoft Azure HDInsight, a 100% Apache Hadoop-based service in the cloud

Where is each one more appropriate?

Data Model

If your data is highly fluid, MongoDB and DocumentDB are the best options because both use a flexible data structure. Both store data as documents: MongoDB in BSON (Binary JSON) format and DocumentDB in JSON format.

Another similarity is that both reserve an ID field that uniquely identifies each record: ‘_id’ in MongoDB and ‘id’ in DocumentDB. If you do not supply a value, MongoDB generates an ObjectId and DocumentDB generates a GUID.
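To make the document model concrete, here is a minimal sketch in Python using only the standard library. The helper `new_document` and the field names (`name`, `address`, `orders`, `loyalty_tier`) are hypothetical, and the GUID is generated locally for illustration; a real MongoDB driver would normally generate an ObjectId for ‘_id’ instead.

```python
import json
import uuid


def new_document(id_field, **fields):
    """Build a document whose reserved key holds a freshly generated GUID string."""
    doc = {id_field: str(uuid.uuid4())}
    doc.update(fields)
    return doc


# The same customer record, keyed the MongoDB way ('_id') and the DocumentDB way ('id').
mongo_doc = new_document("_id", name="Ada", address={"city": "London"}, orders=[101, 102])
docdb_doc = new_document("id", name="Ada", address={"city": "London"}, orders=[101, 102])

# Documents are flexible: a second record can carry entirely different fields.
other_doc = new_document("_id", name="Grace", loyalty_tier="gold")

print(json.dumps(mongo_doc, indent=2))
```

Nothing forces the two documents to share a shape, which is what makes this model a good fit for fluid data.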

Hosting

More and more organizations are moving to cloud hosting, but if your policy still calls for on-premises hosting, MongoDB is the solution for you. Both DocumentDB and HDInsight are hosted in the Azure cloud. Note that MongoDB can also be hosted in the cloud, making it a good option if your business is in transition.

Scalability & Availability

If you plan to increase your capacity in the future, DocumentDB and MongoDB are good options because both allow vertical scaling as well as horizontal scaling (sharding). DocumentDB, which is hosted in the Azure cloud, runs on a cluster of servers, all of which support read and write operations. MongoDB, on the other hand, has sharded clusters to which nodes are added using scripts. Its primary servers support both read and write operations, while its secondary nodes support only read operations.
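Horizontal scaling can be illustrated with a short sketch of hash-based shard routing. This is an illustration of the general idea only, not how MongoDB or DocumentDB partition data internally; the function names and the `customer-N` keys are made up for the example.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


keys = [f"customer-{n}" for n in range(1000)]
three = distribute(keys, 3)

# Show how many keys landed on each of the three shards.
print({shard: len(members) for shard, members in three.items()})
```

Because the hash is stable, a given key always routes to the same shard, and adding shards spreads the same keys over more servers.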

Availability is not a problem with either MongoDB or DocumentDB. MongoDB provides high availability by configuring a secondary server to take over as the primary when the primary server goes down. DocumentDB relies on Azure features to manage server availability.
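The failover idea can be sketched as a toy model. The `ReplicaSet` class and node names below are hypothetical, and promotion here simply picks the next healthy member; a real MongoDB replica set chooses a new primary through an election among its members.

```python
class ReplicaSet:
    """Toy replica set: one primary accepts writes; a secondary takes over on failure."""

    def __init__(self, members):
        self.healthy = list(members)
        self.primary = self.healthy[0]
        self.data = {}

    def write(self, key, value):
        if self.primary is None:
            raise RuntimeError("no primary available")
        self.data[key] = value  # replication to secondaries is assumed, not modeled
        return self.primary

    def fail(self, member):
        """Mark a member as down; promote a secondary if the primary was lost."""
        self.healthy.remove(member)
        if member == self.primary:
            self.primary = self.healthy[0] if self.healthy else None


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("k", 1)
rs.fail("node-a")
print(rs.primary)  # node-b
```

Writes keep succeeding after the failure because a secondary has taken over the primary role.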

DocumentDB is designed specifically for web and mobile applications. This means you will not get the best from it if you are building something else.

Transactions and Analytics

MongoDB and HDInsight both support MapReduce queries. If you have high volumes of data that require MapReduce queries, you should go for HDInsight because you will get better performance than you would with MongoDB.
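For readers new to the pattern, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It does not use the MongoDB or Hadoop APIs; the `orders` data and function names are invented for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Emit (key, value) pairs for every input record."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ada", "total": 30},
    {"customer": "grace", "total": 20},
    {"customer": "ada", "total": 50},
]

totals = reduce_phase(
    shuffle(map_phase(orders, lambda o: [(o["customer"], o["total"])])),
    lambda customer, values: sum(values),
)
print(totals)  # {'ada': 80, 'grace': 20}
```

In a Hadoop-based service the map and reduce steps run in parallel across many machines, which is where the performance advantage on large volumes comes from.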

If you need to process transactional data, use either MongoDB or DocumentDB. HDInsight will not help you because it supports only read operations, meaning you can use it only for analysis.

If you want both analytics and a transactional system, MongoDB is the best option because it supports aggregation. DocumentDB and HDInsight do not support this feature yet. You can also integrate MongoDB with Hadoop for additional features. Note that HDInsight is itself a Hadoop distribution.

Management and Consistency

Management is not a problem with either DocumentDB or MongoDB. With a DocumentDB account, you get a web interface from Azure for managing and monitoring the account. You can monitor things such as usage, and you can even modify the metrics to match your specific business needs. MongoDB has Ops Manager, which gives you dashboards, charts, and customized alerts for easy usage monitoring, and it also allows for customized metrics.

For consistency, both DocumentDB and MongoDB are good options because they provide ACID properties (at the document level) to ensure documents are updated safely. If there is an error, the operation rolls back. With MongoDB, developers can specify write concerns. With DocumentDB, you can choose among different consistency levels, which determine how a read operation behaves when it follows a write operation.
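Document-level atomicity with rollback can be sketched as follows. This is a simplified in-memory model, not either database's actual mechanism; `atomic_update`, `withdraw`, and the account document are hypothetical names used only for the example.

```python
import copy


def atomic_update(store, doc_id, mutate):
    """Apply mutate() to a copy of the document; commit only if it succeeds."""
    working = copy.deepcopy(store[doc_id])
    mutate(working)          # may raise; the stored document is still untouched
    store[doc_id] = working  # commit: swap in the updated document in one step


def withdraw(amount):
    """Return a mutation that records and applies a withdrawal."""
    def mutate(doc):
        doc["history"].append(-amount)
        if doc["balance"] < amount:
            raise ValueError("insufficient funds")
        doc["balance"] -= amount
    return mutate


store = {"acct-1": {"balance": 100, "history": []}}
atomic_update(store, "acct-1", withdraw(40))
try:
    atomic_update(store, "acct-1", withdraw(500))  # fails partway through
except ValueError:
    pass

print(store["acct-1"])  # {'balance': 60, 'history': [-40]}
```

The failed withdrawal had already appended to `history` on the working copy, but because the copy was never committed, the stored document shows no trace of it.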

Query Types

In MongoDB, there are system-defined operators and methods for operations such as filtering and aggregation. MongoDB has a ‘find’ method that lets you specify which fields to return. MongoDB also allows you to search nested structures. DocumentDB supports user-defined functions (UDFs), triggers, and stored procedures, but it has the disadvantage that there is no GROUP BY option and there are no functions such as average and sum; you have to write custom logic for these.
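The sketch below mimics these query types in plain Python over in-memory documents: a filter with a field projection, a search on a nested field, and the kind of hand-written group-and-average logic described above. The `find` and `group_average` helpers and the sample data are hypothetical and are not the real driver APIs; in MongoDB itself, the grouping would typically be expressed with the aggregation pipeline's `$group` stage and `$avg` operator.

```python
from collections import defaultdict


def find(docs, predicate, fields):
    """Filter documents and return only the requested fields (a projection)."""
    return [{f: d[f] for f in fields if f in d} for d in docs if predicate(d)]


def group_average(docs, key, value):
    """Hand-written GROUP BY with AVG over a list of documents."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d in docs:
        sums[d[key]] += d[value]
        counts[d[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


docs = [
    {"city": "London", "name": "Ada", "spend": 30, "address": {"zip": "N1"}},
    {"city": "London", "name": "Alan", "spend": 50, "address": {"zip": "E2"}},
    {"city": "Paris", "name": "Marie", "spend": 20, "address": {"zip": "75005"}},
]

# Search a nested field and return only 'name'.
print(find(docs, lambda d: d["address"]["zip"].startswith("N"), ["name"]))  # [{'name': 'Ada'}]

# Average spend per city, computed with custom logic.
print(group_average(docs, "city", "spend"))  # {'London': 40.0, 'Paris': 20.0}
```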

Support

More people use MongoDB than use DocumentDB or HDInsight. MongoDB therefore has an advantage over the other solutions, because there are companies like Remote DBA Support with well-trained, experienced personnel for it. There are also large communities where you can find most of the information you need.

Why NoSQL

MongoDB, DocumentDB and HDInsight have several features in common because they are all NoSQL solutions. Carlo Strozzi first used the term NoSQL in 1998 to refer to his open-source relational DB. The term later saw re-introduction by Eric Evans in 2009.

So, why would any business go for a NoSQL solution when there are so many established Relational Database Management Systems (RDBMSs) on the market?

  1. Studies have shown that between 80% and 90% of all organizations use unstructured data. This is data from mobile devices, social media, customer statistics, and other sources. Organizations are now abandoning traditional DB solutions such as MySQL in favor of NoSQL solutions like MongoDB, HDInsight, and DocumentDB.
  2. The greatest advantage of NoSQL databases over relational DBs is that NoSQL DBs perform better. They are quick to develop with and they can work on low-end devices.
  3. There is no data format limitation with NoSQL. In an RDBMS, you have to fit data to a specific format before adding it to a server. Over time, servers and technologies age and those formats become outdated. With NoSQL, data formats are fluid, which extends the shelf life of your systems.
  4. You get less downtime with NoSQL since you can spread your workload across several servers.
  5. NoSQL gives you higher storage capacity compared to RDBMS. NoSQL is also capable of handling more complex data. These two factors mean reduced data management and storage costs.
  6. NoSQL requires less back-end maintenance compared to RDBMS. There is little or no lag or downtime in service, since NoSQL updates are self-installing.
  7. RDBMSs have been around longer and so have broader support, but note that you can still get JOIN and ACID support if you choose a NoSQL DB that offers them.

Some of the ideas in this post are based on this post.

Bio: Jack Dawson is a web developer and UI/UX specialist at BigDropInc.com. He works at a design, branding and marketing firm, having founded the same firm 9 years ago. He likes to share knowledge and points of view with other developers and consumers on platforms.
