In February of this year, I published the article, Is my developer team ready for big data?, with a figure representing the subjective complexity of mobile, cloud, big data, and NoSQL technologies:
My editor for that article, Marie Beaugureau, pushed back on the figure—and rightly so. What’s the scale we’re using here? What makes big data and NoSQL more complex than cloud or mobile?
To give you some background, I’ve written and taught courses on all four of these subjects. By writing and teaching these courses, I’ve created a good baseline of complexity across all four subjects, and for how hard it is to teach each one to a beginner in the field. These baselines span hundreds of companies, thousands of use cases, thousands of students, and several countries (but primarily based in the U.S.).
For the past two years, I’ve tried to show—in a simple way—how big data is different.
I attribute the complexity of big data to two primary reasons. The first being that you need to know 10 to 30 different technologies, just to create a big data solution. The second reason is that distributed systems are just plain hard.
So many technologies
Let’s start off with a diagram that shows a sample architecture for a mobile product, with a database back end. Figure 2 (click on links to view graphics) illustrates a run-of-the-mill mobile application that uses a web service to store data in a database.
Let’s contrast the simple architecture for a mobile app with Figure 3, which shows a starter Hadoop solution.
As you can see, Hadoop weighs-in at double the number of boxes in our diagram—why? What’s the big deal? A “simple” Hadoop solution actually isn’t very simple at all. You might think of this more as a starting point or, to put it another way, as “crawling” with Hadoop. The “Hello World” of a Hadoop solution is more complicated than other domain’s intermediate to advanced setups. See my source code for a Hello World in big data that’s not so simple.
Now, let’s look at a complicated mobile project’s architecture diagram in Figure 4.
Note: that’s the same diagram as the simple mobile architecture. Usually, a more complex mobile solution requires more code or more web service calls, but no additional technologies are added to the mix.
Let’s contrast a simple big data/Hadoop solution with a complex big data/Hadoop solution in Figure 5.
Yes, that’s a lot of boxes, representing various types of components you may need when dealing with a complex big data solution. This is what I call the “running” phase of a big data project.
You might think I’m exaggerating the number of technologies to make a point; I’m not. I’ve taught at companies where this is their basic architectural stack. I’ve also taught at companies with twice as much complexity in their architectural stack.
Read the rest of the article at oreilly.com.