{"id":268,"date":"2018-03-13T10:07:37","date_gmt":"2018-03-13T02:07:37","guid":{"rendered":"http:\/\/www.max-shu.com\/blog\/?p=268"},"modified":"2018-03-13T10:07:37","modified_gmt":"2018-03-13T02:07:37","slug":"cassandra%e3%80%81mongodb%e3%80%81couchdb%e3%80%81redis%e3%80%81riak%e3%80%81hbase%e3%80%81membase%e3%80%81neo4j%e7%ad%89nosql%e6%95%b0%e6%8d%ae%e5%ba%93%e6%af%94%e8%be%83","status":"publish","type":"post","link":"http:\/\/www.max-shu.com\/blog\/?p=268","title":{"rendered":"Comparison of NoSQL databases: Cassandra, MongoDB, CouchDB, Redis, Riak, HBase, Membase, Neo4j and others"},"content":{"rendered":"<div>\n<h2>MongoDB<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0C++<\/li>\n<li><strong>Main point:<\/strong>\u00a0Retains some friendly properties of SQL. (Query, index)<\/li>\n<li><strong>License:<\/strong>\u00a0AGPL (Drivers: Apache)<\/li>\n<li><strong>Protocol:<\/strong>\u00a0Custom, binary (BSON)<\/li>\n<li>Master\/slave replication (auto failover with replica sets)<\/li>\n<li>Sharding built-in<\/li>\n<li>Queries are JavaScript expressions<\/li>\n<li>Run arbitrary JavaScript functions server-side<\/li>\n<li>Better update-in-place than CouchDB<\/li>\n<li>Uses memory mapped files for data storage<\/li>\n<li>Performance over features<\/li>\n<li>Journaling (with --journal) is best turned on<\/li>\n<li>On 32-bit systems, limited to ~2.5GB<\/li>\n<li>An empty database takes up 192MB<\/li>\n<li>GridFS to store big data + metadata (not actually an FS)<\/li>\n<li>Has geospatial indexing<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0If you need dynamic queries. If you prefer to define indexes, not map\/reduce functions. If you need good performance on a big DB. 
If you wanted CouchDB, but your data changes too much, filling up disks.<\/p>\n<p><strong>For example:<\/strong>\u00a0For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.<\/p>\n<\/div>\n<div>\n<h2>Riak (V1.0)<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Erlang &amp; C, some JavaScript<\/li>\n<li><strong>Main point:<\/strong>\u00a0Fault tolerance<\/li>\n<li><strong>License:<\/strong>\u00a0Apache<\/li>\n<li><strong>Protocol:<\/strong>\u00a0HTTP\/REST or custom binary<\/li>\n<li>Tunable trade-offs for distribution and replication (N,\u00a0R,\u00a0W)<\/li>\n<li>Pre- and post-commit hooks in JavaScript or Erlang, for validation and security.<\/li>\n<li>Map\/reduce in JavaScript or Erlang<\/li>\n<li>Links &amp; link walking: use it as a graph database<\/li>\n<li>Secondary indices: but only one at once<\/li>\n<li>Large object support (Luwak)<\/li>\n<li>Comes in &#8220;open source&#8221; and &#8220;enterprise&#8221; editions<\/li>\n<li>Full-text search, indexing, querying with Riak Search server (beta)<\/li>\n<li>In the process of migrating the storage backend from &#8220;Bitcask&#8221; to Google&#8217;s &#8220;LevelDB&#8221;<\/li>\n<li>Masterless multi-site replication and SNMP monitoring are commercially licensed<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0If you want something Cassandra-like (Dynamo-like), but no way you&#8217;re gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault-tolerance, but you&#8217;re ready to pay for multi-site replication.<\/p>\n<p><strong>For example:<\/strong>\u00a0Point-of-sale data collection. Factory control systems. Places where even seconds of downtime hurt. 
Could be used as a well-update-able web server.<\/p>\n<\/div>\n<div>\n<h2>CouchDB (V1.1.1)<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Erlang<\/li>\n<li><strong>Main point:<\/strong>\u00a0DB consistency, ease of use<\/li>\n<li><strong>License:<\/strong>\u00a0Apache<\/li>\n<li><strong>Protocol:<\/strong>\u00a0HTTP\/REST<\/li>\n<li>Bi-directional (!) replication,<\/li>\n<li>continuous or ad-hoc,<\/li>\n<li>with conflict detection,<\/li>\n<li>thus, master-master replication. (!)<\/li>\n<li>MVCC &#8211; write operations do not block reads<\/li>\n<li>Previous versions of documents are available<\/li>\n<li>Crash-only (reliable) design<\/li>\n<li>Needs compacting from time to time<\/li>\n<li>Views: embedded map\/reduce<\/li>\n<li>Formatting views: lists &amp; shows<\/li>\n<li>Server-side document validation possible<\/li>\n<li>Authentication possible<\/li>\n<li>Real-time updates via _changes (!)<\/li>\n<li>Attachment handling<\/li>\n<li>thus, CouchApps (standalone js apps)<\/li>\n<li>jQuery library included<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.<\/p>\n<p><strong>For example:<\/strong>\u00a0CRM, CMS systems. 
Master-master replication is an especially interesting feature, allowing easy multi-site deployments.<\/p>\n<\/div>\n<div>\n<h2>Redis (V2.4)<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0C\/C++<\/li>\n<li><strong>Main point:<\/strong>\u00a0Blazing fast<\/li>\n<li><strong>License:<\/strong>\u00a0BSD<\/li>\n<li><strong>Protocol:<\/strong>\u00a0Telnet-like<\/li>\n<li>Disk-backed in-memory database,<\/li>\n<li>Currently without disk-swap (VM and Diskstore were abandoned)<\/li>\n<li>Master-slave replication<\/li>\n<li>Simple values or hash tables by keys,<\/li>\n<li>but complex operations like ZREVRANGEBYSCORE.<\/li>\n<li>INCR &amp; co (good for rate limiting or statistics)<\/li>\n<li>Has sets (also union\/diff\/inter)<\/li>\n<li>Has lists (also a queue; blocking pop)<\/li>\n<li>Has hashes (objects of multiple fields)<\/li>\n<li>Sorted sets (high score table, good for range queries)<\/li>\n<li>Redis has transactions (!)<\/li>\n<li>Values can be set to expire (as in a cache)<\/li>\n<li>Pub\/Sub lets one implement messaging (!)<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0For rapidly changing data with a foreseeable database size (should fit mostly in memory).<\/p>\n<p><strong>For example:<\/strong>\u00a0Stock prices. Analytics. Real-time data collection. 
Real-time communication.<\/p>\n<\/div>\n<div>\n<h2>HBase (V0.92.0)<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Java<\/li>\n<li><strong>Main point:<\/strong>\u00a0Billions of rows X millions of columns<\/li>\n<li><strong>License:<\/strong>\u00a0Apache<\/li>\n<li><strong>Protocol:<\/strong>\u00a0HTTP\/REST (also Thrift)<\/li>\n<li>Modeled after Google&#8217;s BigTable<\/li>\n<li>Uses Hadoop&#8217;s HDFS as storage<\/li>\n<li>Map\/reduce with Hadoop<\/li>\n<li>Query predicate push down via server side scan and get filters<\/li>\n<li>Optimizations for real time queries<\/li>\n<li>A high performance Thrift gateway<\/li>\n<li>HTTP supports XML, Protobuf, and binary<\/li>\n<li>Cascading, hive, and pig source and sink modules<\/li>\n<li>Jruby-based (JIRB) shell<\/li>\n<li>Rolling restart for configuration changes and minor upgrades<\/li>\n<li>Random access performance is like MySQL<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0When you use the Hadoop\/HDFS stack. When you need random, realtime read\/write access to BigTable-like data.<\/p>\n<p><strong>For example:<\/strong>\u00a0For data that&#8217;s similar to a search engine&#8217;s data<\/p>\n<\/div>\n<div>\n<h2>Neo4j (V1.5M02)<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Java<\/li>\n<li><strong>Main point:<\/strong>\u00a0Graph database &#8211; connected data<\/li>\n<li><strong>License:<\/strong>\u00a0GPL, some features AGPL\/commercial<\/li>\n<li><strong>Protocol:<\/strong>\u00a0HTTP\/REST (or embedding in Java)<\/li>\n<li>Standalone, or embeddable into Java applications<\/li>\n<li>Full ACID conformity (including durable data)<\/li>\n<li>Both nodes and relationships can have metadata<\/li>\n<li>Integrated pattern-matching-based query language (&#8220;Cypher&#8221;)<\/li>\n<li>Also the &#8220;Gremlin&#8221; graph traversal language can be used<\/li>\n<li>Indexing of nodes and relationships<\/li>\n<li>Nice self-contained web admin<\/li>\n<li>Advanced path-finding with multiple 
algorithms<\/li>\n<li>Indexing of keys and relationships<\/li>\n<li>Optimized for reads<\/li>\n<li>Has transactions (in the Java API)<\/li>\n<li>Scriptable in Groovy<\/li>\n<li>Online backup, advanced monitoring and High Availability are AGPL\/commercial licensed<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.<\/p>\n<p><strong>For example:<\/strong>\u00a0Social relations, public transport links, road maps, network topologies.<\/p>\n<\/div>\n<div>\n<h2>Cassandra<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Java<\/li>\n<li><strong>Main point:<\/strong>\u00a0Best of BigTable and Dynamo<\/li>\n<li><strong>License:<\/strong>\u00a0Apache<\/li>\n<li><strong>Protocol:<\/strong>\u00a0Custom, binary (Thrift)<\/li>\n<li>Tunable trade-offs for distribution and replication (N,\u00a0R,\u00a0W)<\/li>\n<li>Querying by column, range of keys<\/li>\n<li>BigTable-like features: columns, column families<\/li>\n<li>Has secondary indices<\/li>\n<li>Writes are much faster than reads (!)<\/li>\n<li>Map\/reduce possible with Apache Hadoop<\/li>\n<li>I admit being a bit biased against it, because of the bloat and complexity it has, partly because of Java (configuration, seeing exceptions, etc.)<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0When you write more than you read (logging). If every component of the system must be in Java. (&#8220;No one gets fired for choosing Apache&#8217;s stuff.&#8221;)<\/p>\n<p><strong>For example:<\/strong>\u00a0Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) 
Writes are faster than reads, so one natural niche is real time data analysis.<\/p>\n<\/div>\n<div>\n<h2>Membase<\/h2>\n<ul>\n<li><strong>Written in:<\/strong>\u00a0Erlang &amp; C<\/li>\n<li><strong>Main point:<\/strong>\u00a0Memcache compatible, but with persistence and clustering<\/li>\n<li><strong>License:<\/strong>\u00a0Apache 2.0<\/li>\n<li><strong>Protocol:<\/strong>\u00a0memcached plus extensions<\/li>\n<li>Very fast (200k+\/sec) access of data by key<\/li>\n<li>Persistence to disk<\/li>\n<li>All nodes are identical (master-master replication)<\/li>\n<li>Provides memcached-style in-memory caching buckets, too<\/li>\n<li>Write de-duplication to reduce IO<\/li>\n<li>Very nice cluster-management web GUI<\/li>\n<li>Software upgrades without taking the DB offline<\/li>\n<li>Connection proxy for connection pooling and multiplexing (Moxi)<\/li>\n<\/ul>\n<p><strong>Best used:<\/strong>\u00a0Any application where low-latency data access, high concurrency support and high availability is a requirement.<\/p>\n<p><strong>For example:<\/strong>\u00a0Low-latency use-cases like ad targeting or highly-concurrent web apps like online gaming (e.g. 
Zynga).<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>MongoDB Written in:\u00a0C++ Main point:\u00a0Retains some friend &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[63],"tags":[193,195,198,199,194,200,201,196,197,61],"class_list":["post-268","post","type-post","status-publish","format-standard","hentry","category-63","tag-cassandra","tag-couchdb","tag-hbase","tag-membase","tag-mongodb","tag-neo4j","tag-nosql","tag-redis","tag-riak","tag-61"],"views":1697,"_links":{"self":[{"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/268","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=268"}],"version-history":[{"count":1,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/268\/revisions"}],"predecessor-version":[{"id":269,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/268\/revisions\/269"}],"wp:attachment":[{"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=268"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=268"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.max-shu.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=268"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}