Data science has become an important part of resource engineering and could be pivotal in helping the earth optimise consumption and emissions. At the same time, data science has other uses, particularly for business.
The disclosure that a firm which uses data mining, data brokerage and data analysis to advise political groups on strategy managed to access personal records of Facebook users has grown into a major crisis.
Data science is analysis using mathematics, along with computer utilities, for extracting useful information out of jumbled or apparently unrelated data. We now have the ability, with computers, to deal with complex computations on huge quantities of data. Scientific discoveries, till a few decades ago, were the result of deep insight and experiments designed to test theories that could explain observed facts.
While large collections of data were still necessary, handling data was laborious and the data was analysed to look for specific patterns. A difference with the arrival of computers is that large quantities of data can now be handled fast and at low cost.
Methods have hence been developed to exploit this ability and get computers to take data which has not been sorted or collected for a specific purpose and simply discover trends and connections between components of the data.
And with the increased ability to deal with large data, the raw data itself has become more valuable. Enterprises now design their operating processes so that data is collected at all stages and then made use of, generally by an automated system, to improve the processes for greater production and lower cost. With the current proliferation of the Internet, the marketplace and advertising spaces have shifted to computer screens.
Individual purchase preferences or product interests can now be recorded every time a person logs in and makes a purchase or visits a website. This information can then be used to direct focused offers or information to the person for the benefit of the purchaser and the seller and the economy in general.
The rising market of computer advertising has thus led to demand for popular websites, as places where advertising panels can be placed. And among websites, social media platforms have a leading presence. As each user of the site connects with his or her friends and the friends in turn connect with their other friends, the number of persons that each user gets connected to grows exponentially.
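The growth in reach described here can be illustrated with a rough calculation. The Python sketch below is only an illustration under simplifying assumptions that do not come from any real platform: every user is taken to have the same number of friends (the figure of 100 is hypothetical) and no two users are taken to share a friend, so the result is an upper bound rather than a realistic count.

```python
def reach_upper_bound(friends_per_user: int, hops: int) -> int:
    """Upper bound on the people reachable within `hops` steps of one user.

    Assumes (hypothetically) that every user has exactly
    `friends_per_user` friends and that no two users share a friend.
    """
    total = 0
    frontier = 1  # start from the original user
    for hop in range(hops):
        if hop == 0:
            # All of the original user's friends are new contacts.
            new_people = frontier * friends_per_user
        else:
            # Each person adds their friends, minus the one who led to them.
            new_people = frontier * (friends_per_user - 1)
        total += new_people
        frontier = new_people
    return total


if __name__ == "__main__":
    for hops in (1, 2, 3):
        print(hops, reach_upper_bound(100, hops))
```

Under these assumptions the count is multiplied by roughly the number of friends at every step, which is what exponential growth means. Real networks grow more slowly, because friends' circles overlap heavily.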
Businesses and institutions of all kinds now have specially trained social media personnel to develop networks through which advertising or special messages can be conveyed to an ever-widening readership.
And, as users do not come face-to-face with their friends, and many are friends only on the social media site, users have a sense of anonymity and are often very forthcoming and communicative about their personal lives.
What happened in the current Facebook crisis is that a social sciences researcher in the UK developed an “app”, a computer programme that runs on Android phones, and invited phone users to complete surveys and receive personality ratings, logging in with their Facebook accounts to do so. Facebook, at the time, allowed such Android phone programmes to extract various bits of information.
While this may not, by itself, have been harmful, the researcher, who got his hands on the data of thousands of Facebook users, passed on the information to Cambridge Analytica, a data analysis firm.
This firm specialises in analyses of data to extract trends of information and to classify individuals as suitable for specific messages, either product advertising or messages that have political bearing, to influence the receiver and also others with whom the receiver is connected.
It is reported that Cambridge Analytica used such information, as gathered from social media sources, to advise President Donald Trump in his 2016 election campaign. Even in Indian politics, different parties alleged that Cambridge Analytica had been engaged to advise or design election strategy.
The revelation that personal information that users had disclosed in their Facebook profiles, and believed to be restricted to their circle of “friends”, had been made public caused great unease among users.
There is now a continuing rush of users closing down their Facebook accounts. Facebook faces the possibility of revenue losses and is deposing before investigating bodies on how it allowed private data to go public.
The main defence of Facebook is that the researcher whose app was allowed to collect data for research passed it on to Cambridge Analytica in violation of the terms on which he was allowed access to the data.
Cambridge Analytica also says that it has now deleted all the data accessed. But there are suspicions that this has not been done, and there is also the question of the many other Android apps that have accessed data, and of those that are still doing so.
While the inquisition of Facebook goes on, states and different groups are examining the question of what safeguards need to be there, by statute, on information that users of different services share with the service provider.
A further question is what limits to place on the uses that service providers can make of the information they become privy to. Given that data mining techniques may now be able to extract from social media data even personal facts about users that they had not specifically shared, it has become important to consider how much use of data should be permitted.
Using the data for commercial advertisement targeting is an important source of revenue for websites that provide users with communication or other services free of charge.
This apart, the information on popular websites is valuable for research in sociology and for planning and optimising public services, even crime prevention, without being personally intrusive.
Excessive curbs on the information which can be mined from transactions over the Internet would hence deprive responsible organisations of an important resource.
The Indian Railways had once examined the possibility of using for commercial or other purposes the demographic data that gets captured every time a ticket is booked.
While the idea is yet to be implemented, there could be the objection that the information on age and residential address, which a passenger shares while booking a ticket, is only for ensuring fair allotment of reserved space.
It could be argued that personal information should hence not be shared, even with agencies that plan more useful provision of reserved space and certainly not with commercial firms for profit.
With current technical capability, the whole genetic information of a person could be collected every time a blood test, to check the sugar or haemoglobin level, is taken.
This would be an invaluable resource for medical or demographic research. It would be unfortunate if the law said that the information should be deleted as soon as the limited, pathological purpose has been served.
The same month of March 2018, when these questions of data privacy surfaced, also saw the passing of John Sulston, a pioneer of genome sequencing who shared the Nobel Prize for medicine in 2002.
Sulston had been a campaigner for free sharing of science data, particularly genetic data that was sought to be controlled by drug companies. That information should be free was an article of faith with Sulston, who was one of those who stood bail for Julian Assange of Wikileaks fame.
The case for sharing results of research is fairly obvious. The same case could be made for sharing, for public good, the huge volume of personal information that gets generated in all transactions of modern living.