The Aadhaar database, which will eventually contain details of every Indian resident, will put valuable information at risk if it starts working with CIA-funded startup MongoDB, activists fear.
According to a story in online magazine Moneylife (the report has since been pulled from the site), MongoDB, which will become a vendor to the Unique Identification Authority of India (UIDAI), the body tasked with assigning a unique identification number to over 1.2 billion Indians, will take data from UIDAI to “undertake analysis.”
The fact that the CIA’s non-profit venture capital arm is backing the company that developed MongoDB, as originally reported by The Economic Times, is worrying. However, it’s too early to squarely blame UIDAI without knowing their side of the story. We don’t think MongoDB will get the data to “undertake analysis,” as the Moneylife report claimed.
Sunil Abraham of the Center for Internet & Society pointed out in the ET story that a source code audit should take care of this problem, if it has not already been taken care of. However, the folks who’ve created MongoDB and the like are nothing short of geniuses, so the auditors looking at the code need to be equally competent.
Putting MongoDB in Perspective
Some of the world’s largest websites, such as those of The New York Times, Craigslist & eBay, are built on MongoDB. However, given the scale at which the National Security Agency in the United States conducts Internet surveillance and the sensitivity of India’s national identity database, the fears of activists aren’t completely misplaced.
For the uninitiated, MongoDB is an open-source database management system developed by 10gen, a New York-based company founded by Dwight Merriman, the former Chief Technology Officer of DoubleClick, and others. The company is funded by a clutch of venture capital firms & In-Q-Tel, the non-profit venture capital arm of the American Central Intelligence Agency.
MongoDB belongs to a newer class of database technology called NoSQL, used to store records. Oracle and other proprietary vendors built their products on an older model, the Relational Database Management System (RDBMS), which stores data in tables with fixed columns. That model doesn’t work very well with unstructured data (the kind of data that is generated on social media), which is multiplying by the day. Other open source NoSQL databases like CouchDB and Cassandra can also be used in MongoDB’s place.
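To make the difference concrete, here is a minimal sketch in Python. It does not use MongoDB itself: the relational side uses the built-in sqlite3 module, and the document side is imitated with plain JSON. The table, field names and sample people are all made up for illustration and have nothing to do with the actual Aadhaar schema.

```python
import json
import sqlite3

# Relational style: columns are declared up front and every row must fit them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO residents (id, name, city) VALUES (?, ?, ?)",
    (1, "Asha", "Pune"),
)


def insert_with_undeclared_column() -> bool:
    """Try to store a field the table was never told about."""
    try:
        conn.execute(
            "INSERT INTO residents (id, name, languages) VALUES (?, ?, ?)",
            (2, "Ravi", "hi,en"),
        )
        return True
    except sqlite3.OperationalError:
        # The table has no 'languages' column, so the insert is rejected.
        return False


relational_accepts_extra = insert_with_undeclared_column()

# Document style: each record describes its own shape, so two records
# in the same collection can carry different (even nested) fields.
documents = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {
        "_id": 2,
        "name": "Ravi",
        "languages": ["hi", "en"],
        "address": {"city": "Delhi", "pin": "110001"},
    },
]
stored = [json.dumps(doc) for doc in documents]


def city_of(raw: str):
    """Read the city whether it is a top-level or a nested field."""
    doc = json.loads(raw)
    return doc.get("address", {}).get("city") or doc.get("city")


cities = [city_of(raw) for raw in stored]
```

The relational table refuses the record with the extra field until someone alters the schema, while the document list simply stores both shapes side by side. That flexibility is the usual argument for a document store; the price is that the application has to cope with records that don't all look alike.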
Aadhaar uses a bunch of big data technologies like Hadoop, MySQL, Apache Solr (built on Lucene) & MongoDB. On the Mongo cluster, the UIDAI stores the enrollment records, demographics and photograph of each Aadhaar card holder.
According to available data (see pic below), the data access technologies being used for Aadhaar are Hadoop HDFS, Hive, HBase, Solr, MySQL & MongoDB. Except for MySQL & MongoDB, the rest of the technology stack comes from the Apache Software Foundation. Surely the UIDAI had its reasons for choosing MongoDB. However, in the interest of simplicity, could they have used CouchDB from Apache instead of MongoDB?
Typically, you would use MongoDB to set up a very large distributed data storage system, with the data residing on your own servers. If the UIDAI has inspected the source code, made sure that it is up to their security standards and added their own security features, it should be fairly difficult for a third party, like the CIA, to gain access to it. However, there is always an outside chance of loopholes in the code (or even a backdoor) that the folks who originally developed the technology would know of and could exploit. This is a risk you run if you aren’t working completely in-house. Defense & space establishments used to do everything in-house, but that trend has been changing of late. Critical government institutions have now started working with third-party vendors and startups, though they do have a very strict diligence process in place.
According to UIDAI officials whom I spoke to when I visited their facility last year, data is anonymized & encrypted before storage. So unless someone wants Obama & party to see what’s inside, it’s a tough nut to crack.
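The officials didn’t describe how this is done, so the following is only a rough sketch of one common way to anonymize an identifier before it is stored: replacing it with a keyed hash (HMAC). It is an assumption for illustration, not UIDAI’s actual method. The key, the field names and the sample number are all made up, and the encryption step is left out because it would need a vetted cryptography library rather than the Python standard library.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key. In a real system it would live in a key store,
# well away from the database holding the records.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC).

    Without the key, the token cannot be recomputed from a guessed
    identifier, which a plain unkeyed hash would allow.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up record; the number is not a real identifier.
record = {"resident_id": "1234-5678-9012", "city": "Pune"}

# What would be written to storage: the raw identifier is dropped
# and only the token is kept.
stored = {
    "resident_token": pseudonymize(record["resident_id"], SECRET_KEY),
    "city": record["city"],
}
```

Anyone who gets hold of the stored record sees only a 64-character token in place of the number. The same number always maps to the same token under one key, so records can still be linked, but the protection is only as good as the secrecy of that key.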
What are your thoughts?
[With inputs from team Kreeo, an enterprise collaboration startup which works on big data technologies.]