Data and IIIT

Introduction

Data… this is a word that gets thrown around quite a lot these days, but few of us ever stop to think about it. Data collection has become so mainstream that we willingly let organisations and governments take and use all the data we generate on a day-to-day basis, without sparing it a second thought. Recently, however, awareness has been rising amongst the populace, mainly because of just how large-scale and commonplace this collection of data is. The Snowden leaks brought to light how governments have no qualms about monitoring their own citizens on a massive scale. So how is this relevant to our college? Well, one should remember that ours is a research institute, focused primarily on Information Technology. Our institute is involved in cutting-edge research in fields that rely on a lot of data. We, the students here, are by far the largest generators of data that the institute has access to, and this data could potentially be used to great effect. With advances in machine learning and big data (yes, we know the buzzwords as well!), data has become precious. All this is great when looked at from the perspective of industry and academia, where more data equals better research and more profits. However, there is a flip side to all this: privacy.

What Could Go Wrong?

It all boils down to how much we as individuals value our privacy. At first glance, one might set all this aside as far too cynical. “What would they even do with random pieces of information about me?” is a question that one hears far too often. And the answer is: a lot. Let us look at a few instances of what can be done. In the recent past, personal data (things like phone numbers, names, etc.) has grown more and more important in all walks of life. This hasn’t gone unnoticed by criminals: not only have acts such as identity theft become more common, they have also become easier. One can find reports of identity theft online every other day, and yet, as major an issue as it might be, it remains statistically rare.

The real issue now isn’t about these blatant forms of criminality, but about those that lie in the gray area between what’s legal and what’s not, the gray area that deals with ethics. For instance, every single day we generate metadata through our browsing history, social network activity, locations visited, etc. This information, collected by the internet service provider, is used to work out what kind of content to serve to people at a very individual level. This might seem harmless at face value, but these same usage statistics are very valuable to many companies, as they can be used to model an individual’s choices and, in effect, simulate each individual user. Having someone else somewhere who knows how you think and can predict your decisions, possibly even before you encounter a particular situation: this is the scary flip side of the coin.

Security

Another problem with giving such personal data to other organisations is the way the data is handled at their end. Most organisations store such information securely in some encrypted format. The key point to notice here is most, not all. If even one such organisation stored this data insecurely, that would compromise one’s privacy completely, irrespective of how well the others handle it. Just recently, Aadhaar information was compromised from certain servers, and information on lakhs of people was out in the open, visible to one and all through a simple Google search. Remember, this is the government, so we have no alternative to handing over our data. Such instances just show how vulnerable we are in the current world, where the data that makes us us is no longer solely in our control. Data in IIIT does not have a history of remaining secure either. Case in point: the 2014 incident in which the ISAS grades portal was breached and all its information was uploaded online. Moreover, the new IMS has not yet inspired the confidence of the community. In fact, errors during course registration have led to the very professionalism of the persons hired being called into question.

The Future is Now, Unfortunately

You no longer make your friends. Facebook does it for you. You no longer decide your crushes. Facebook does it for you. As if the philosophers denouncing free will weren’t enough, we have technologists denouncing it as well. While the above statement is dramatised, it is not necessarily hyperbolic. Facebook’s ‘friend suggestion’ feature, coupled with the algorithm that decides whose pictures to show on your newsfeed, shapes both who is more likely to become your friend and who you are more likely to have a crush on. Facebook, at least, we can choose to opt out of (or that’s what they want you to think). But what about governments? These you can’t opt out of. Who knows, you might just be charged with sedition if you choose to. Consider the case of the ‘risk assessment score’ that is now used in some American states to help judges with decisions such as bail and sentencing. Without getting into the nitty-gritty of it, the judges get a score from an algorithm, estimating how likely a person is to reoffend, that helps them make their decision. These algorithms draw on your personal information in order to decide how risky you are. You like reading crime fiction? The algorithm might just work against you. In the case of the United States, moreover, such scores have been reported to show a distinct racial bias, with the black community getting the worse end of the deal.

Let’s Get Back to Our College

How are we dealing with data in our college? One need only walk around campus for a short while to get a sense of just HOW much data the college collects about each student. There are CCTV cameras recording video footage at many places on campus, videos being taken for attendance purposes, biometric registration being made compulsory for all students, and much more; all this in addition to the massive amount of data that the college already gets during admissions. Data is almost never collected without a very important, innocent purpose: CCTV cameras are for security, biometrics for convenience in addition to security, and attendance videos for verification. The gray area always lies in how important the purpose is versus the risk of the data falling into the wrong hands or being abused.

Attendance Videos

Let us consider the attendance videos in slightly more detail. Every class with more than 100 students currently has a centralized system that involves video recording of the students in the class. Such footage is a goldmine for training machine learning models. There is a distinct lack of policy as to what is done with the footage, and where and how it is stored. For how long is it retained before being disposed of? One of the more pressing issues is whether such videos are being, or will be, used in research without explicit student permission. Though we have been told that such data is properly deleted, and that the college does not have the infrastructure to archive and store it for any long period of time, the lack of any sort of formal policy in this matter makes one feel very uneasy. What is also startling are the rumours going around in the community that these videos are, indeed, being used for research. So much so that when the staff member responsible for taking the attendance videos was asked what was done with them, he claimed that they were not kept with him but forwarded to a faculty member who used them for work on facial recognition (!).

Handwritten Assignments

There is a new way of submitting assignments in town: through the Shiksha portal. Handwritten assignments are scanned, uploaded and stored, though for how long, nobody knows. Not only are these assignments stored, but they have also been used for research purposes, without student permission. This is a breach of trust and privacy. There is a system being developed (or already developed) that would allow a machine to “translate” one person’s handwriting into another’s. What this means is that someone could “write” in someone else’s handwriting. What this also means is that a mismatch in handwriting, one of the reasons the prime suspect in the Zodiac killer case was never charged, would no longer be a reliable criterion. Signatures would probably no longer make sense as a method of verification either. The first person whose handwriting could be emulated would be you. You, whose handwriting was used in research.

Conclusion

There is no publicly accessible policy with regard to student data collected for academic purposes. There is no publicly accessible policy with regard to the recent collection of biometric information. Policy for biometrics was limited to a few email exchanges that the security committee had with the students who raised concerns about it at the time. A lot of aspects of the day-to-day functioning of our institute rest on trust and mutual understanding. But in the case of sensitive issues such as these, where privacy and personal student information are concerned, we feel that a formal written policy is a reasonable expectation. Such a policy would help put the student body’s concerns to rest, as well as give everyone clarity as to what can and cannot be done. In addition to this, each one of us should become more aware of the research that we are doing, as well as the data and privacy concerns that come along with it.