PragProg

Seven Databases in Seven Weeks, Second Edition

A Guide to Modern Databases and the NoSQL Movement

by: Luc Perkins, Jim Wilson, Eric Redmond

Published	2018-04-06
Internal code	pwrdata
Print status	In Print
Pages	358
User level	Intermediate
Keywords	NoSQL, Redis, Neo4J, Couch, Mongo, HBase, Riak, Postgres, EC2, CAP
Related titles	All of the books in the “Seven in Seven” series; Data Science Essentials in Python; SQL Antipatterns
ISBN	9781680502534
Other ISBN	Channel epub: 9781680505979; Channel PDF: 9781680505986; Kindle: 9781680505955; Safari: 9781680505962
BISACs	COM021000 COMPUTERS / Databases / General; COM051230 COMPUTERS / Software Development & Engineering / General

https://media.pragprog.com/covers/pwrdata.jpg

Highlight

Data is getting bigger and more complex by the day, and so are your choices in handling it. Explore some of the most cutting-edge databases available—from a traditional relational database to newer NoSQL approaches—and make informed decisions about challenging data storage problems. This is the only comprehensive guide to the world of NoSQL databases, with in-depth practical and conceptual introductions to seven different technologies: Redis, Neo4J, CouchDB, MongoDB, HBase, Postgres, and DynamoDB. This second edition includes a new chapter on DynamoDB and updated content for each chapter.

Description

While relational databases such as MySQL remain as relevant as ever, the alternative NoSQL paradigm has opened up new horizons in performance and scalability and changed the way we approach data-centric problems. This book presents the essential concepts behind each database alongside hands-on examples that make each technology come alive.

With each database, tackle a real-world problem that highlights the concepts and features that make it shine. Along the way, explore five database models—relational, key/value, columnar, document, and graph—from the perspective of challenges faced by real applications. Learn how MongoDB and CouchDB are strikingly different, make your applications faster with Redis and more connected with Neo4J, build a cluster of HBase servers using cloud services such as Amazon’s Elastic MapReduce, and more. This new edition brings a brand new chapter on DynamoDB, updated code samples and exercises, and a more up-to-date account of each database’s feature set.

Whether you’re a programmer building the next big thing, a data scientist seeking solutions to thorny problems, or a technology enthusiast venturing into new territory, you will find something to inspire you in this book.

Q&A with Author Luc Perkins

Q: Why did you choose to work on the second edition of Seven Databases in Seven Weeks?

A: Well, I first became intimately acquainted with the NoSQL space about five years ago, when I took on the role of technical writer at Basho Technologies, the company behind the NoSQL database Riak. From the very beginning I found the space endlessly interesting, so full of promise and inspiring technology yet also very tricky to navigate.

Relational databases are quite interesting to me as well, but they tend to be very structurally similar. NoSQL databases, on the other hand, tend to be much more individualistic, you could say. Each has its own special strengths, weaknesses, and quirks and presents you with a set of trade-offs you’ve probably never encountered in another database. So the book was an opportunity to take my more localized knowledge of the space and really stretch my knowledge and my thinking outward.

Q: What was the hardest part about working on the book?

A: In general, I’d say making the book up to date. Unsurprisingly, a ton has changed since the original edition. The NoSQL space is notoriously fast-moving, and it’s hard enough to keep up with one database, let alone seven. That means that I had to check every single code snippet, CLI command, claim, and diagram in the book to make sure that it still worked, presented accurate information, etc. Then I had to make sure that newer features were mentioned or showcased when necessary. For a book that’s really seven books in one, this was quite a task, though an extremely rewarding one.

Q: What are some of the main differences between the first and second edition?

A: First, and most importantly, everything in the book works now. We’re all used to bit rot in code but it happens in books, too. Database systems change a lot over time. Many of the CLI commands and code snippets from the first edition eventually started throwing cryptic errors or flat-out not working at all.

But there are some other, more specific changes. The chapter on Riak was removed and replaced with a chapter on Amazon’s DynamoDB. Riak is a fascinating database but its future is very uncertain. DynamoDB is also a fascinating database but it feels like a living, breathing project. Furthermore, the querying language for Neo4j was updated to Cypher (instead of the original and now largely defunct Gremlin).

Q: What’s your favorite database in the book?

A: Oh gosh, that’s very tricky, because I have a special fondness and a place in my heart reserved for each of them. But if I had to pick I’d say Redis. It has a pretty small surface area for such a widely used system and a very well-defined domain of problems that it seeks to address. If I had to build a new application that used all seven databases in the book, the Redis portion of the application would be the one I’d be most eager to work on.

Q: Do you have any general advice for readers? Databases are complex and it may not be readily apparent how even an extremely technically savvy reader should proceed.

A: I’d say take it nice and slow. The content is spread across “days” for a reason. You don’t have to follow the schema we present, of course, but this is not single-sitting material. Take a minute to really absorb the diagrams and technical definitions. Try to understand each database’s “worldview,” so to speak, and use that as a thinking cap for each chapter’s material. Try to imagine times when each database would be indispensable. And if you and a database just aren’t getting along, skip to the next one and come back later. You may come back with fresh insight and a new slate of questions.

Contents and Extracts

Preface

Why a NoSQL Book
Why Seven Databases
What’s in This Book
What This Book Is Not
Code Examples and Conventions
Credits
Online Resources

Introduction

It Starts with a Question
The Genres
Onward and Upward

PostgreSQL

That’s Post-greS-Q-L
Day 1: Relations, CRUD, and Joins
Day 2: Advanced Queries, Code, and Rules
Day 3: Full Text and Multidimensions
Wrap-Up

HBase

Introducing HBase
Day 1: CRUD and Table Administration
Day 2: Working with Big Data (excerpt)
Day 3: Taking It to the Cloud
Wrap-Up

MongoDB

Hu(mongo)us
Day 1: CRUD and Nesting
Day 2: Indexing, Aggregating, Mapreduce (excerpt)
Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS
Wrap-Up

CouchDB

Relaxing on the Couch
Day 1: CRUD, Fauxton, and cURL Redux
Day 2: Creating and Querying Views
Day 3: Advanced Views, Changes API, and Replicating Data
Wrap-Up

Neo4J

Neo4j Is Whiteboard Friendly
Day 1: Graphs, Cypher, and CRUD
Day 2: REST, Indexes, and Algorithms (excerpt)
Day 3: Distributed High Availability
Wrap-Up

DynamoDB

DynamoDB: The “Big Easy” of NoSQL
Day 1: Let’s Go Shopping!
Day 2: Building a Streaming Data Pipeline
Day 3: Building an “Internet of Things” System Around DynamoDB
Wrap-Up

Redis

Data Structure Server Store
Day 1: CRUD and Datatypes
Day 2: Advanced Usage, Distribution
Day 3: Playing with Other Databases
Wrap-Up

Wrapping Up

Genres Redux
Making a Choice
Where Do We Go from Here?

Database Overview Tables

The CAP Theorem

Eventual Consistency
CAP in the Wild
The Latency Trade-Off