This book is a good starting point for anyone who is thinking about creating a data lake for the first time or is having trouble getting the most out of one that is already in place. It is also for readers who want to understand what new big data technologies and techniques have to offer their organization. Managers may want to read it once and refer back to it when big data issues arise at work. For practitioners, it can be a helpful resource as they plan and carry out big data lake projects.

This book is the result of interviews with practitioners and executives from more than a hundred organizations, ranging from typical corporate enterprises to governments and data-driven businesses like Google, LinkedIn, and Facebook. It explains what a data lake is, why businesses need one, and the best practices for creating one. After reading it, you will have a brief understanding of data warehousing, big data, and data science, and you will know how to design a self-service model and give analysts access to the data. You will also learn from experts in various industries how to implement a data lake, which approaches businesses take to build one, and which methods you can use to architect your own. The book may also help you succeed in the big data and AI-driven era. It offers numerous precise suggestions for setting up and managing data across the enterprise, and it is both deep and easy to read. I hope you will find this book beneficial and useful.

Topics covered by this book:

  • Chapter 1, Introduction to Data Lakes, covers Creating a Successful Data Lake, Roadmap to Data Lake Success, and Data Lake Architectures.

  • Chapter 2, Historical Perspective, covers The Drive for Self-Service Data: The Birth of Databases, The Analytics Imperative: The Birth of Data Warehousing, and The Data Warehouse Ecosystem.

  • Chapter 3, Introduction to Big Data and Data Science, covers Hadoop Leads the Historic Shift to Big Data, How Processing and Storage Interact in a MapReduce Job, and What Should Your Analytics Organization Focus On?

  • Chapter 4, Starting a Data Lake, covers What and Why of Hadoop, Preventing Proliferation of Data Puddles, and Advantage of Big Data.

  • Chapter 5, From Data Ponds/Big Data Warehouses to Data Lakes, covers Essential Functions of a Data Warehouse, Moving to a Data Pond, and Growing Data Ponds into a Data Lake: Loading Data That’s Not in the Data Warehouse.

  • Chapter 6, Optimizing for Self-Service, covers The Beginnings of Self-Service, Business Analysts, Data Wrangling in the Data Lake, and Analyzing and Visualizing.