Here is a brief overview of the leading open source Big Data technologies currently available on the market and their functions.
Hadoop - An open source framework for storing and processing very large data sets across clusters of commodity servers, used by many of the largest web and media companies today. Its processing engine, MapReduce, works in two phases. In the map phase, tasks run in parallel across the nodes of the cluster, each one filtering or transforming (for example, searching) the portion of the data stored on its node. In the reduce phase, those intermediate results are merged into a finished output table.

Hadoop stores its data in the Hadoop Distributed File System, or HDFS, which is modeled on Google's proprietary Google File System (GFS). HDFS is a highly fault-tolerant, scalable file system written in Java. It normally runs as a cluster of inexpensive computers with cheap drives (there can be thousands of them), splitting each file into blocks and replicating those blocks across several machines so that losing a single drive or server does not lose data. HDFS can support up to 4,500 servers and 200 petabytes of addressable storage in a single cluster. Since a petabyte is about 1,000 terabytes (TB), that is a total capacity of about 200,000 TB, or 200 million gigabytes (GB). Locating a file is fast because a central server, the NameNode, keeps the file system metadata in memory; MapReduce jobs that actually scan the data, however, are batch jobs that typically take minutes or hours rather than milliseconds. The HDFS file system is accessed with specialized shell commands (e.g. "hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2" or "hadoop fs -ls /user/hadoop/dir1/filename.txt").
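To make the map and reduce phases more concrete, here is a minimal sketch of the classic word-count job written in plain Python. It is not Hadoop's actual API (real jobs are usually written against Hadoop's Java Mapper and Reducer classes or run through Hadoop Streaming), the function names are hypothetical, and everything runs sequentially in one process. Each input "split" simply stands in for a block that a separate map task would process on its own node.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (Hadoop does this between the phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Hypothetical reducer: merge all values for one key into a single count."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Each split stands in for an HDFS block handled by its own map task;
    # here the "parallel" map tasks simply run one after another.
    intermediate = chain.from_iterable(
        map_phase(line) for split in splits for line in split
    )
    grouped = shuffle(intermediate)
    # Each key's values would go to one reducer; results form the output table.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

Calling word_count([["big data is big"], ["data is everywhere"]]) returns {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}. The shuffle step in the middle is the part the framework performs for you: it guarantees that every value emitted for the same key reaches the same reducer.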
There is, of course, a lot more to Big Data than a cluster of file servers running a distributed file system. Below are the most popular Big Data database technologies and concepts that I'm familiar with, but first let's clear up what Big Data databases are about. Big Data databases are primarily focused on storing non-relational data, and NoSQL is the umbrella term for databases that store data without the fixed tables of the relational model. The main data models are document (ex. MongoDB, CouchDB), graph, key-value (ex. Riak), and wide-column (ex. Cassandra). At the highest level, data in these systems can be viewed as keys and values, where the keys act as the indexes and the values are more complex structures (documents, hashes, graphs, etc.).
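As a rough illustration of the idea that each of these models is addressed by key, the sketch below represents every store as a plain Python dictionary. This is a deliberate simplification, not how MongoDB, Riak, or Cassandra are implemented or queried; the sample keys, values, and helper functions (kv_get, docs_where, neighbours) are hypothetical. What differs between the models is the shape of the value and how much of it the database can see inside.

```python
import json
from typing import Any, Optional

# All sample keys, values, and helper functions below are hypothetical.

# Key-value (Riak-style): the value is an opaque blob; the store only knows the key.
kv_store: dict[str, bytes] = {
    "user:alice": json.dumps({"name": "Alice", "city": "Portland"}).encode(),
}

# Document (MongoDB/CouchDB-style): the value is a structured document the
# database can inspect, so queries can filter on fields inside it.
doc_store: dict[str, dict[str, Any]] = {
    "user:alice": {"name": "Alice", "city": "Portland"},
    "user:bob": {"name": "Bob", "city": "Seattle"},
}

# Wide-column (Cassandra-style): row key -> {column name: value};
# rows in the same table may hold different sets of columns.
wide_column: dict[str, dict[str, str]] = {
    "alice": {"email": "alice@example.com", "phone": "555-0100"},
    "bob": {"email": "bob@example.com"},
}

# Graph: node key -> set of adjacent node keys (an adjacency list).
graph: dict[str, set[str]] = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": set(),
}


def kv_get(key: str) -> Optional[dict[str, Any]]:
    """Key-value access: fetch the whole blob by key and decode it client-side."""
    raw = kv_store.get(key)
    return json.loads(raw) if raw is not None else None


def docs_where(field: str, value: Any) -> list[str]:
    """Document access: return keys of documents whose field equals value."""
    return [key for key, doc in doc_store.items() if doc.get(field) == value]


def neighbours(node: str) -> set[str]:
    """Graph access: follow the edges out of one node key."""
    return graph.get(node, set())
```

For example, docs_where("city", "Seattle") returns ["user:bob"] because a document store can look inside each value, whereas kv_get has to fetch the entire value by its key and leave the interpretation to the client.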
The various leading database technologies (in no particular order) are: