Seven Databases in Seven Weeks, Second Edition

A Guide to Modern Databases and the NoSQL Movement

by: Luc Perkins, Jim Wilson, Eric Redmond

Published 2018-04-06
Internal code pwrdata
Print status In Print
Pages 358
User level Intermediate
Keywords NoSQL, Redis, Neo4j, Couch, Mongo, HBase, Riak, Postgres, EC2, CAP
Related titles
  • All of the books in the “Seven in Seven” series
  • Data Science Essentials in Python
  • SQL Antipatterns
ISBN 9781680502534
Other ISBN Channel epub: 9781680505979
Channel PDF: 9781680505986
Kindle: 9781680505955
Safari: 9781680505962
BISACs COM021000 COMPUTERS / Databases / General
COM051230 COMPUTERS / Software Development & Engineering / General

Highlight

Data is getting bigger and more complex by the day, and so are your choices in handling it. Explore some of the most cutting-edge databases available, from a traditional relational database to newer NoSQL approaches, and make informed decisions about challenging data storage problems. This is the only comprehensive guide to the world of NoSQL databases, with in-depth practical and conceptual introductions to seven different technologies: Redis, Neo4j, CouchDB, MongoDB, HBase, Postgres, and DynamoDB. This second edition includes a new chapter on DynamoDB and updated content for each chapter.

Description

While relational databases such as MySQL remain as relevant as ever, the alternative, NoSQL paradigm has opened up new horizons in performance and scalability and changed the way we approach data-centric problems. This book presents the essential concepts behind each database alongside hands-on examples that make each technology come alive.

With each database, tackle a real-world problem that highlights the concepts and features that make it shine. Along the way, explore five database models (relational, key/value, columnar, document, and graph) from the perspective of challenges faced by real applications. Learn how MongoDB and CouchDB are strikingly different, make your applications faster with Redis and more connected with Neo4j, build a cluster of HBase servers using cloud services such as Amazon’s Elastic MapReduce, and more. This new edition brings a brand-new chapter on DynamoDB, updated code samples and exercises, and a more up-to-date account of each database’s feature set.

Whether you’re a programmer building the next big thing, a data scientist seeking solutions to thorny problems, or a technology enthusiast venturing into new territory, you will find something to inspire you in this book.


Q&A with Author Luc Perkins

Q: Why did you choose to work on the second edition of Seven Databases in Seven Weeks?

A: Well, I first became intimately acquainted with the NoSQL space about five years ago, when I took on the role of technical writer at Basho Technologies, the company behind the NoSQL database Riak. From the very beginning I found the space endlessly interesting: full of promise and inspiring technology, yet also very tricky to navigate.

Relational databases are quite interesting to me as well, but they tend to be very structurally similar. NoSQL databases, on the other hand, tend to be much more individualistic, you could say. Each has its own special strengths, weaknesses, and quirks and presents you with a set of trade-offs you’ve probably never encountered in another database. So the book was an opportunity to take my more localized knowledge of the space and really stretch my thinking outward.

Q: What was the hardest part about working on the book?

A: In general, I’d say bringing the book up to date. Unsurprisingly, a ton has changed since the original edition. The NoSQL space is notoriously fast-moving, and it’s hard enough to keep up with one database, let alone seven. That means that I had to check every single code snippet, CLI command, claim, and diagram in the book to make sure that it still worked and presented accurate information. Then I had to make sure that newer features were mentioned or showcased when necessary. For a book that’s really seven books in one, this was quite a task, though an extremely rewarding one.

Q: What are some of the main differences between the first and second edition?

A: First, and most importantly, everything in the book works now. We’re all used to bit rot in code, but it happens in books, too. Database systems change a lot over time. Many of the CLI commands and code snippets from the first edition eventually started throwing cryptic errors or stopped working altogether.

But there are some other, more specific changes. The chapter on Riak was removed and replaced with a chapter on Amazon’s DynamoDB. Riak is a fascinating database but its future is very uncertain. DynamoDB is also a fascinating database but it feels like a living, breathing project. Furthermore, the querying language for Neo4j was updated to Cypher (instead of the original and now largely defunct Gremlin).

Q: What’s your favorite database in the book?

A: Oh gosh, that’s very tricky, because I have a special fondness and a place in my heart reserved for each of them. But if I had to pick I’d say Redis. It has a pretty small surface area for such a widely used system and a very well-defined domain of problems that it seeks to address. If I had to build a new application that used all seven databases in the book, the Redis portion of the application would be the one I’d be most eager to work on.

Q: Do you have any general advice for readers? Databases are complex and it may not be readily apparent how even an extremely technically savvy reader should proceed.

A: I’d say take it nice and slow. The content is spread across “days” for a reason. You don’t have to follow the schema we present, of course, but this is not single-sitting material. Take a minute to really absorb the diagrams and technical definitions. Try to understand each database’s “worldview,” so to speak, and use that as a thinking cap for each chapter’s material. Try to imagine times when each database would be indispensable. And if you and a database just aren’t getting along, skip to the next one and come back later. You may come back with fresh insight and a new slate of questions.

Contents and Extracts

Preface

  • Introduction
    • It Starts with a Question
    • The Genres
    • Onward and Upward
  • PostgreSQL
    • That’s Post-greS-Q-L
    • Day 1: Relations, CRUD, and Joins
    • Day 2: Advanced Queries, Code, and Rules
    • Day 3: Full Text and Multidimensions
    • Wrap-Up
  • HBase
    • Introducing HBase
    • Day 1: CRUD and Table Administration
    • Day 2: Working with Big Data (excerpt)
    • Day 3: Taking It to the Cloud
    • Wrap-Up
  • MongoDB
    • Hu(mongo)us
    • Day 1: CRUD and Nesting
    • Day 2: Indexing, Aggregating, MapReduce (excerpt)
    • Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS
    • Wrap-Up
  • CouchDB
    • Relaxing on the Couch
    • Day 1: CRUD, Fauxton, and cURL Redux
    • Day 2: Creating and Querying Views
    • Day 3: Advanced Views, Changes API, and Replicating Data
    • Wrap-Up
  • Neo4j
    • Neo4j Is Whiteboard Friendly
    • Day 1: Graphs, Cypher, and CRUD
    • Day 2: REST, Indexes, and Algorithms (excerpt)
    • Day 3: Distributed High Availability
    • Wrap-Up
  • DynamoDB
    • DynamoDB: The “Big Easy” of NoSQL
    • Day 1: Let’s Go Shopping!
    • Day 2: Building a Streaming Data Pipeline
    • Day 3: Building an “Internet of Things” System Around DynamoDB
    • Wrap-Up
  • Redis
    • Data Structure Server Store
    • Day 1: CRUD and Datatypes
    • Day 2: Advanced Usage, Distribution
    • Day 3: Playing with Other Databases
    • Wrap-Up
  • Wrapping Up
    • Genres Redux
    • Making a Choice
    • Where Do We Go from Here?
  • Database Overview Tables
  • The CAP Theorem
    • Eventual Consistency
    • CAP in the Wild
    • The Latency Trade-Off