MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

(13 reviews)
Authors: Donald Miner, Adam Shook
ISBN : 1449327176
New from $40.49
Format: PDF

About the Author

Donald Miner serves as a Solutions Architect at EMC Greenplum, advising and helping customers implement and use Greenplum's big data systems. Prior to working with Greenplum, Dr. Miner architected several large-scale and mission-critical Hadoop deployments with the U.S. Government as a contractor. He is also involved in teaching, having previously instructed industry classes on Hadoop and a variety of artificial intelligence courses at the University of Maryland, BC. Dr. Miner received his PhD from the University of Maryland, BC in Computer Science, where he focused on Machine Learning and Multi-Agent Systems in his dissertation.

Adam Shook is a Software Engineer at ClearEdge IT Solutions, LLC, working with a number of big data technologies such as Hadoop, Accumulo, Pig, and ZooKeeper. Shook graduated with a B.S. in Computer Science from the University of Maryland Baltimore County (UMBC) and took a job building a new high-performance graphics engine for a game studio. Seeking new challenges, he enrolled in the graduate program at UMBC with a focus on distributed computing technologies. He quickly found development work as a U.S. government contractor on a large-scale Hadoop deployment. Shook is involved in developing and instructing training curriculum for both Hadoop and Pig. He spends what little free time he has working on side projects and playing video games.

Product Details

Paperback: 230 pages
Publisher: O'Reilly Media; 1 edition (December 22, 2012)
Language: English
ISBN-10: 1449327176
ISBN-13: 978-1449327170
Product Dimensions: 0.6 x 7 x 9.2 inches
Shipping Weight: 12.6 ounces

In the 1990s O'Reilly books had a well-earned reputation for quality. O'Reilly authors such as Simson Garfinkel explained technical topics with precision, clarity, and wit. I proudly kept a whole shelf of O'Reilly books at work, and I imbibed copious java from their tenth anniversary mug. I'm sorry to see that O'Reilly's traditional quality has gone the way of the Internet bubble. MapReduce Design Patterns represents the absolute nadir of technical writing, and it never should have been published in its current form.

One of the most poorly written parts of the book is Appendix A on Bloom filters. As I was writing my original review of the book, I thought it might be helpful to point readers to a better explanation of the topic. Turning to Wikipedia as a potential reference, I was struck by the number of similarities between it and Appendix A. It now appears that this appendix plagiarizes the Wikipedia article "Bloom filter." To see this, compare the opening paragraph of the Wikipedia article (January 19, 2013) to the first two paragraphs of the book's appendix (which you can see in the sample pages here):

Wiki: A Bloom filter, conceived by Burton Howard Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. (Paragraph 1, sentence 1)

MRDP: Conceived by Burton Howard Bloom in 1970, a Bloom filter is a probabilistic data structure used to test whether a member is an element of a set. (Page 221, paragraph 1, sentence 1)

Wiki: False positive retrieval results are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set".
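For readers who want to see the behavior both quotes describe, here is a minimal sketch of a Bloom filter in plain Java. It is not the book's code and not Hadoop's BloomFilter class; the class name, the bit-array size, and the double-hashing scheme are all assumptions chosen for illustration. It shows why a lookup can answer "maybe in set" or "definitely not in set", but never miss an element that was added.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch; sizes and hash scheme are illustrative assumptions. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(String item, int i) {
        int h1 = 0x811C9DC5; // FNV-1a style 32-bit hash over the UTF-8 bytes
        for (byte b : item.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 16777619;
        }
        int h2 = item.hashCode() | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Sets numHashes bits for the item. Bits are never cleared. */
    void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(item, i));
        }
    }

    /**
     * Returns false only if some bit for the item is unset, which means the
     * item was definitely never added. A true result may be a false positive.
     */
    boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("hadoop");
        System.out.println(filter.mightContain("hadoop")); // always true once added
        System.out.println(filter.mightContain("pig"));    // usually false, but may be a false positive
    }
}
```

Because bits are only ever set, an added item always finds all of its bits set, so false negatives cannot happen; false positives occur when other items happen to have set the same bits.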

It's a good book that shows you a lot of MapReduce patterns using Hadoop.

The main trouble is that you cannot trust the example source code at all. I cloned the code from GitHub onto my Mac and found several bugs.

https://github.com/adamjshook/mapreducepatterns

I'm running the book's code on a MacBook with:

- hadoop-1.0.4
- Mac OS X 10.6.8
- Java version "1.6.0_43"
- Eclipse
- Data for running the examples from the Stack Exchange Data Dump (Dec 2011 update)

So far, these are the bugs I've found:

Page 31
The error is in the MedianStdDevCombiner code.
I'm still looking for the bug in this full example. When you execute it, you get different results from the previous, plain median and standard deviation example on the same input data. The values are nearly double those of the previous example, when they should be identical.
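To make the expectation concrete, here is a small plain-Java sketch (no Hadoop dependency) of why a combiner must not change the result. It is not the book's MedianStdDevCombiner; the class and method names are hypothetical, and it assumes the combiner collapses values into value-to-count maps that the reducer merges by summing counts. It also assumes the sample standard deviation (dividing by n - 1).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: median and standard deviation from merged value -> count maps. */
final class MedianStdDevCheck {
    // Combiner-style step: collapse raw values into value -> count.
    static TreeMap<Integer, Long> toCounts(List<Integer> values) {
        TreeMap<Integer, Long> counts = new TreeMap<>();
        for (int v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    // Reducer-style step: merge partial maps by summing the counts per value.
    static TreeMap<Integer, Long> mergeCounts(List<TreeMap<Integer, Long>> partials) {
        TreeMap<Integer, Long> merged = new TreeMap<>();
        for (TreeMap<Integer, Long> partial : partials) {
            for (Map.Entry<Integer, Long> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    // Median of the multiset described by the counts (must be non-empty).
    static double median(TreeMap<Integer, Long> counts) {
        long total = 0;
        for (long c : counts.values()) total += c;
        if (total == 0) throw new IllegalArgumentException("no values");
        long lo = (total - 1) / 2; // zero-based position of the lower middle element
        long hi = total / 2;       // zero-based position of the upper middle element
        long seen = 0;
        Integer loVal = null;
        Integer hiVal = null;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (loVal == null && seen > lo) loVal = e.getKey();
            if (seen > hi) { hiVal = e.getKey(); break; }
        }
        return (loVal + hiVal) / 2.0;
    }

    // Sample standard deviation (n - 1 in the denominator); NaN for a single value.
    static double stdDev(TreeMap<Integer, Long> counts) {
        long n = 0;
        double sum = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            n += e.getValue();
            sum += (double) e.getKey() * e.getValue();
        }
        double mean = sum / n;
        double squares = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            double d = e.getKey() - mean;
            squares += e.getValue() * d * d;
        }
        return Math.sqrt(squares / (n - 1));
    }
}
```

If the merged counts from two splits equal the counts built from all the values at once, the median and standard deviation must match too. A result that is nearly double would be consistent with counts or values being accumulated more than once somewhere, but that is only a guess about the book's code.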

Pages 35-36
The error I found is in the Inverted Index example.
In the mapper function, if "getWikipediaURL" returns null, you get a NullPointerException. You need to check whether the result of this function is null before setting the value of the "link" variable.
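The null check can be sketched in plain Java like this. The getWikipediaURL below is a hypothetical stand-in, not the book's implementation, and mapRecord is an invented name for the mapper step; the only point is that records without a link are skipped before anything is set or emitted.

```java
/** Sketch of a null-safe mapper step; all names here are illustrative assumptions. */
final class NullSafeLinkExtraction {
    private static final String PREFIX = "http://en.wikipedia.org/wiki/";

    // Hypothetical stand-in for the book's helper: returns null when no link is found.
    static String getWikipediaURL(String text) {
        if (text == null) {
            return null;
        }
        int start = text.indexOf(PREFIX);
        if (start < 0) {
            return null;
        }
        int end = start;
        while (end < text.length()
                && !Character.isWhitespace(text.charAt(end))
                && text.charAt(end) != '"') {
            end++;
        }
        return text.substring(start, end);
    }

    // Mapper-style step: returns {link, postId}, or null when the record should be skipped.
    static String[] mapRecord(String postId, String body) {
        String url = getWikipediaURL(body);
        if (url == null) {
            return null; // check before using the value, so no NullPointerException
        }
        return new String[] { url, postId };
    }

    public static void main(String[] args) {
        System.out.println(mapRecord("1", "no links here") == null);
        System.out.println(mapRecord("2", "see http://en.wikipedia.org/wiki/Bloom_filter now")[0]);
    }
}
```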

Pages 117-118
The ReduceSideJoinWithBloomDriver code on GitHub has no reference to loading the Bloom filter from any argument [something like DistributedCache.addCacheFile(...... ]. This file is nearly a copy/paste of the previous ReduceSideJoin.java.

Page 122
In ReplicatedJoinMapper you always get a java.io.FileNotFoundException, because the code tries to read and decompress a folder rather than a concrete "file.gz" inside that folder. You only need to add an index to your files inside the DistributedCache.
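The general idea behind that fix, sketched without Hadoop: if the path is a directory, open each .gz file inside it instead of handing the directory itself to the decompressor. This is an assumption about the shape of the fix, not the repository's code; the class, the method names, and the part-0000x file names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: read gzip files inside a folder instead of the folder itself. */
final class GzipDirectoryReader {
    // Reads every line from one .gz file, or from each *.gz file in a directory.
    static List<String> readAllLines(Path path) {
        try {
            List<Path> files = new ArrayList<>();
            if (Files.isDirectory(path)) {
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(path, "*.gz")) {
                    for (Path p : stream) files.add(p);
                }
                Collections.sort(files); // deterministic order across runs
            } else {
                files.add(path);
            }
            List<String> lines = new ArrayList<>();
            for (Path file : files) {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) lines.add(line);
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: writes a small gzip file.
    static void writeGzip(Path file, String content) {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a temporary folder with two gzip parts and reads it back.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("cache");
            writeGzip(dir.resolve("part-00000.gz"), "a\nb\n");
            writeGzip(dir.resolve("part-00001.gz"), "c\n");
            return readAllLines(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the Hadoop job itself, the equivalent would be picking a concrete file out of the cached paths rather than the folder, which is presumably what "add an index" refers to.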
