Is Data Science emerging as a New Domain in Computer Science?
I've just finished reading Chapter 5 of Beautiful Data. I planned to write a blog post about this book; however, this chapter contained some new insights for me which I thought were valuable to share. The book has some excellent chapters covering significant developments in data storage, retrieval and analysis. Chapter 5, "Information Platforms and the Rise of the Data Scientist", is written by Jeff Hammerbacher.
The chapter explores the challenges Facebook faced in analysing the data it was collecting, and how existing RDBMS solutions (MySQL and Oracle) were not up to the task of collecting and enabling analysis of highly fluid data such as clickstreams from millions of users (currently 2.5 petabytes are stored, and new data is collected at 14 TB/day). The author goes on to discuss the solution they developed internally at Facebook, based on cloud technologies such as Hadoop and on unstructured data.
Analysis of large-scale data is becoming a common problem across many domains. Web companies such as Facebook and Google are not the only ones in the world that analyse huge amounts of data. Several scientific experiments, such as the LHC at CERN, produce gigantic amounts of data that need to be analysed (the recent book The Fourth Paradigm by Microsoft Research explores data-intensive scientific initiatives).
So many new skills are required to manage this data: designing storage architectures, high speed retrieval architectures, authoring data analysis workflows and finally communicating the results of the analysis. All these tasks are multi-disciplinary. Some tasks are related to Computer Science (design of data storage and retrieval systems), some to Business Analysis (authoring data analysis), some tasks belong to statisticians (the actual algorithms performing the analysis) and some to engineers (the underlying infrastructure for storing and processing the data).
Can this multi-disciplinary approach to data management be termed "Data Science"? It is a term that I believe is increasingly gaining traction.