How to Recover Data from a Formatted HFS+ Drive
Last week my Time Machine backup hard disk suddenly died! Every time I plugged in the hard disk, my Mac did not recognise the file system (it was an HFS+ partition) and asked me to format it. I did format it, only to realise that I had lost some precious data.
How do you recover data from a formatted hard disk? Enter Data Rescue from PROSoft Engineering. I used Data Rescue II (only to realise afterwards that a new version was available). It took more than two days to scan the 1 TB hard disk (a sector-by-sector analysis), after which it recreated the files it found.

Recreating the files was also a lengthy process and took nearly 10 hours. Once the files are recreated, Data Rescue presents a list of the kinds of files you might be interested in restoring. I selected the ones I cared about: my research papers and my iPhoto collection.

Recovering the selected files (around 55.6 GB) took around 3 hours.
All in all, I'm really grateful for such fantastic software. For all the Windows users out there: my NTFS hard disk has failed as well (a bad start to the year!). I'm currently using ParetoLogic's Data Recovery Pro; the recovery has been running for 5 days and is still in progress.
Is Data Science Emerging as a New Domain in Computer Science?
I've just finished reading Chapter 5 of Beautiful Data. I had planned to write a blog post about the whole book, but this chapter contained some new insights that I thought were worth sharing. The book has some excellent chapters covering significant developments in data storage, retrieval and analysis. Chapter 5, titled "Information Platforms and the Rise of the Data Scientist", was written by Jeff Hammerbacher.
The chapter explores the challenges Facebook faced in analysing the data it collects, and how existing RDBMS solutions (MySQL and Oracle) were not up to the task of collecting and enabling analysis of highly fluid data such as clickstreams from millions of users (currently 2.5 petabytes are stored, and new data arrives at 14 TB/day). The author goes on to describe the solution developed internally at Facebook, built on cloud technologies such as Hadoop and designed around unstructured data.
Analysis of large-scale data is becoming a common problem across many domains. Web companies such as Facebook and Google are not the only ones in the world that analyse huge amounts of data. Scientific experiments such as the CERN LHC also produce gigantic amounts of data that need to be analysed (the recent book The Fourth Paradigm from Microsoft Research explores data-intensive scientific initiatives).
Managing this data requires many new skills: designing storage and high-speed retrieval architectures, authoring data analysis workflows and, finally, communicating the results of the analysis. All these tasks are multidisciplinary. Some belong to Computer Science (the design of data storage and retrieval systems), some to Business Analysis (authoring the data analysis), some to statisticians (the actual algorithms performing the analysis) and some to engineers (the underlying infrastructure for storing and processing the data).
Can this multidisciplinary approach to data management be termed "Data Science"? I believe it is a term that is increasingly gaining traction.