How Data Science Is Helping with Online Youth Safety by Dennis O'Reilly


Data science is the practice of extracting intelligence from mountains of information collected and stored in every form and format imaginable. The field holds great promise for industries as diverse as agriculture, publishing, finance, and aerospace. Yet little attention has been paid to how predictive analytics, machine learning, and other data science techniques will benefit children, even though the youngest of us are poised to reap some of the greatest benefits, beginning with the use of these technologies to promote online youth safety.

Making the Internet of Things safe for all ages

The world of the future will be populated by connected devices, some visible and some hidden from view. Whenever a connection is established, a record of the transaction is created, and the ever-growing trail of such records poses a security risk for the people it can identify. These "breadcrumbs" include IP addresses, cell-tower signals, GPS tracks, and other personally identifiable information.

The advent of connected toys raises concerns about the data-collection practices of the toy vendors. Consumer Reports recommends that parents determine the data collected by connected-toy vendors and avoid toys that pose a risk to the child's and the family's privacy. However, it will be nearly impossible for parents to control the flow of personal data between their children and devices linked to the internet as part of the Internet of Things (IoT).

BitDefender estimates that there will be 50 billion connected devices in the world by 2020, and a large percentage of those devices will collect information from and about children. These interactions can be a boon to the young, but only if children are safe from harm, their privacy is honored, and their personal data is not misused. Data scientists will play an important role in developing and implementing "privacy by design," which limits the communications IoT devices are allowed to make and the types of data they collect, retain, and reuse.
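One way to picture "privacy by design" in practice is as a data-minimization filter that runs before anything is stored. The sketch below is a hypothetical illustration, not a real vendor's implementation: the field names, the allowlist, and the 30-day retention policy are all invented for the example.

```python
# A hypothetical sketch of "privacy by design" as data minimization:
# before a connected toy's telemetry is stored, drop every field that is
# not on an explicit allowlist and tag what remains with a retention limit.
# All field names and the policy values here are invented for illustration.

ALLOWED_FIELDS = {"device_model", "firmware_version", "battery_level"}
RETENTION_DAYS = 30  # assumed policy: purge even allowed data after 30 days

def minimize(telemetry: dict) -> dict:
    """Keep only allowlisted fields; everything else (audio clips, GPS,
    a child's name) is discarded before it ever reaches storage."""
    kept = {k: v for k, v in telemetry.items() if k in ALLOWED_FIELDS}
    kept["_retention_days"] = RETENTION_DAYS
    return kept

raw = {
    "device_model": "TalkyBear-2",
    "battery_level": 87,
    "child_name": "Ava",       # never stored
    "gps": (40.71, -74.00),    # never stored
}
print(minimize(raw))
```

The design choice here is that the filter sits at the point of collection, so sensitive data is never retained in the first place, rather than being deleted after the fact.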

Removing bias from machine learning and other algorithms

What's in a name? Plenty, when it comes to assessing resumes submitted by job candidates. Scientific American reports on the unconscious biases that people demonstrate when judging the qualifications of applicants for employment based on the candidate's name. Studies show that a resume from someone named "John" is rated higher than the same resume submitted by a person named "Julia," while a resume sent by "Linda" is favored over the same resume submitted by "Lakisha."

These same biases, whether conscious or unconscious, are transferred to algorithms by the programmers who create them and by the historical data those algorithms are trained on. This tendency can be particularly harmful when bias is present in the algorithms that make up an artificial intelligence system, which by definition is intended to act "human." The most common form of AI at present is machine learning: systems of algorithms that learn from data and, in effect, generate new algorithms. Examples include Amazon's automatic book recommendations based on your reading history (among other factors) and Netflix's video recommendations based on what you've watched in the past.

Children are especially vulnerable to the biases in AI algorithms because they lack an adult's ability to think critically about how they are interacting with connected devices, applications, and other data-collecting systems. One way data scientists are working to protect children online is by improving the methods used to enforce provisions of the Children's Online Privacy Protection Act (COPPA) and other regulations enacted to safeguard youngsters and other vulnerable groups. For example, in the journal Proceedings of Privacy Enhancing Technologies, researchers describe a new framework for the "scalable dynamic analysis" of the algorithms that power applications written for Google's Android mobile platform to ensure they comply with COPPA.
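To make the idea of dynamic analysis concrete, the sketch below shows one kind of check such a pipeline might run: scanning an app's captured outbound requests for personal data that COPPA restricts in child-directed apps. This is a simplified, hypothetical illustration, not the researchers' framework; real systems instrument the running app and intercept live traffic, whereas here the "captured traffic" is just a list of invented request strings.

```python
# A simplified sketch of a dynamic-analysis check: flag outbound requests
# that appear to transmit personal data (device IDs, GPS coordinates,
# email addresses). The URLs and patterns below are invented examples.

import re

PII_PATTERNS = {
    "android_id": re.compile(r"\b[0-9a-f]{16}\b"),
    "gps_coordinates": re.compile(r"lat=-?\d{1,3}\.\d+&lon=-?\d{1,3}\.\d+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def flag_pii(requests):
    """Return (request, matched_pii_types) for each request that leaks PII."""
    findings = []
    for req in requests:
        hits = [kind for kind, pat in PII_PATTERNS.items() if pat.search(req)]
        if hits:
            findings.append((req, hits))
    return findings

captured_traffic = [
    "https://ads.example.com/track?device=9774d56d682e549c",
    "https://api.example.com/level?score=1200",
    "https://analytics.example.com/loc?lat=37.4219&lon=-122.0840",
]

for request, kinds in flag_pii(captured_traffic):
    print(kinds, request)
```

The "scalable" part of the researchers' approach comes from automating checks like this across thousands of apps, rather than auditing each one by hand.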

Kate Crawford, co-founder of New York University's AI Now Institute, is one of several data scientists working to ensure machine-learning algorithms and other AI applications don't create more problems than they solve. The journal Nature describes efforts by Crawford and other data scientists to make algorithms "transparent and accountable" so third parties can confirm independently that they contain no inherent biases that would be detrimental to children or others.

How data science helps inform policy for protecting children online

A recent report issued by UNICEF highlights the potentially dire consequences of failing to incorporate protections for children in the application of data science now and in the future. For example, personal data about children is collected without parents being able to grant informed consent; by law, minors are considered incapable of waiving a right or entering into a contract.

The researchers call for a new approach to integrating ethics in data science, especially the collection, retention, and reuse of personal information, targeted specifically at protecting children's right to privacy and ensuring their safety when they are online. The framework the researchers propose extends across "institutional, national and international practices" and pertains to the "entire data cycle": from collection, through day-to-day use, to ultimate erasure when necessary to protect children.

Another approach to safeguarding children from the dangers of big data is proposed by data scientists as part of a series of articles on data ethics from O'Reilly Media. The researchers point out that applying the "Golden Rule" to data science — treat other people's data the way you would treat your own — is insufficient to ensure ethical practices. The researchers recommend an approach based on "the five C's": consent, clarity, consistency, control (and transparency), and consequences (and harm). The data scientists emphasize the need to make the five C's a cornerstone of every company's culture.

Ultimately, what these and other data-science researchers are calling for is a version of medicine's Hippocratic Oath to ensure that first and foremost, technology products "do no harm" — especially to children.


Dennis O'Reilly began writing about workplace technology as an editor for Ziff-Davis' Computer Select, back when CDs were newfangled and IBM's PC XT was wowing the crowds at Comdex. He spent more than seven years running PC World's award-winning Here's How section, beginning in 2000. O'Reilly has written about everything from web search to PC security to Microsoft Excel customizations. Along with designing, building, and managing several different websites, Dennis created the Travel Reference Library, a database of travel guidebook reviews that was converted to the web in 1996 and operated through 2000.

May 14, 2019


Hakin9 TEAM
Hakin9 is a monthly magazine dedicated to hacking and cybersecurity. In every edition, we try to focus on different approaches to show various techniques - defensive and offensive. This knowledge will help you understand how most popular attacks are performed and how to protect your data from them. Our tutorials, case studies and online courses will prepare you for the upcoming, potential threats in the cyber security world. We collaborate with many individuals and universities and public institutions, but also with companies such as Xento Systems, CATO Networks, EY, CIPHER Intelligence LAB, redBorder, TSG, and others.