2021.01.28
2021 is likely to bring a heated public debate over how best to regulate big tech companies. In 2020, President Trump and President-elect Biden both argued for the repeal of Section 230, the US law that shields internet platforms from legal liability for the consequences of user-generated content, a law that Wired magazine once called Silicon Valley's "source of invincibility." Meanwhile, here in Canada, the Competition Bureau has launched an investigation into the business practices of Amazon.ca. In the past, efforts such as these have not received widespread public support in North America. Even after a decade of widely reported cyber controversies, until very recently there was little political will for big tech to be subordinated to big government. That appears to be changing, however, as the COVID-19 pandemic drags on and is complicated by misinformation going viral on social media. The taboo against legislating in cyberspace is fading, and Canada must now have a public conversation about how to regulate big tech. Perhaps the best place to enter that conversation is an analysis of how Canadians use algorithms to inform their decision-making. Because digital networks so profoundly shape our most important decisions, there is a strong case to be made that Canada should enact legislation requiring digital platforms to be transparent about their use of algorithms to amplify content.
To make that case, it is important to explain how algorithmic amplification works. An algorithm is, at its simplest, a set of rules that helps a person make a decision, and it can be understood through a non-digital example. Imagine you are at the library deciding which book to read next. You recently enjoyed a horror novel by Stephen King, so you decide to check out another King novel. This time, however, you read The Green Mile, one of King's novels in a genre other than horror. You discover it is not to your taste, so you refine the criteria by which you choose your next book to the stipulation "a novel written by Stephen King in the horror genre." This feedback loop narrows the range of options available to you with every visit to the library, making it easier to select a book with each new piece of information. However, though Stephen King is a prolific writer, the total number of novels he has authored is still quite small as a dataset. Determining which of his 61 novels to read is a difficult decision, but it is a decision with only 61 alternatives. When the number of alternatives exceeds what we can possibly evaluate on our own, we often turn to an algorithm that amplifies certain alternatives based on factors outside our own experientially derived preferences.
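To make the feedback loop concrete, here is a minimal sketch in Python. The catalogue, its field names, and the genre labels are illustrative assumptions rather than any real library system; the point is only that each new piece of feedback adds a rule that shrinks the shortlist.

```python
# A minimal sketch of the library feedback loop described above.
# The catalogue and its fields are illustrative assumptions.

catalogue = [
    {"title": "It", "author": "Stephen King", "genre": "horror"},
    {"title": "The Green Mile", "author": "Stephen King", "genre": "drama"},
    {"title": "Misery", "author": "Stephen King", "genre": "horror"},
    {"title": "Dune", "author": "Frank Herbert", "genre": "science fiction"},
]

def shortlist(books, criteria):
    """Keep only the books that satisfy every criterion learned so far."""
    return [b for b in books if all(b[key] == value for key, value in criteria.items())]

# First visit: all we know is that we enjoyed a King novel.
criteria = {"author": "Stephen King"}
print([b["title"] for b in shortlist(catalogue, criteria)])
# ['It', 'The Green Mile', 'Misery']

# After disliking The Green Mile, the feedback loop adds a rule.
criteria["genre"] = "horror"
print([b["title"] for b in shortlist(catalogue, criteria)])
# ['It', 'Misery']
```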
For example, if you have read all of King's novels and want to expand the dataset in order to increase the number of novels available to you, you can ask a librarian to recommend books similar to King's. That is a form of content amplification: the librarian introduces selection criteria outside of your own experience that bias particular alternatives. A librarian with a cursory understanding of genre fiction may be able to point you to authors like Dean Koontz or Anne Rice, whereas a librarian who happens to be a horror aficionado will have an algorithm so refined she can recommend obscure titles by Richard Bachman, John Swithen, and Beryl Evans, three pseudonyms King has used to publish lesser-known works that serve as shibboleths among dedicated fans. In this way, you benefit from the librarian's algorithmic amplification of alternatives that would otherwise have been unknown to you. Unfortunately, algorithmic amplification creates serious problems when it is digitized and executed at scale. To understand why, imagine that instead of asking a librarian for a recommendation you use a search engine like Google to seek out information according to the following parameters: "horror novels like Stephen King". When I type this into Google Search, the engine returns 43,300,000 results. Clearly, no algorithm I could run in my own head could organize that many web pages, so I trust Google's digital algorithms to do that work for me.
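The librarian's recommendation can be sketched the same way. In the toy Python below, the similarity scores are invented stand-ins for her expertise, not data from any real recommender; amplification simply means the highest-scoring alternatives get surfaced while everything else stays invisible.

```python
# An illustrative sketch of content amplification: a recommender
# re-ranks alternatives using knowledge the reader does not have.
# The similarity scores stand in for the librarian's expertise.

candidates = {
    "Watchers (Dean Koontz)": 0.7,
    "Interview with the Vampire (Anne Rice)": 0.6,
    "The Regulators (Richard Bachman)": 0.9,   # pseudonymous King title
    "Pride and Prejudice (Jane Austen)": 0.1,
}

def amplify(scored, top_n=3):
    """Surface the highest-scoring alternatives; the rest stay invisible."""
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return [title for title, score in ranked[:top_n]]

print(amplify(candidates))
# ['The Regulators (Richard Bachman)', 'Watchers (Dean Koontz)',
#  'Interview with the Vampire (Anne Rice)']
```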
Google uses a bundle of algorithms to produce an optimized list of webpages tailored to my search, based on my own search history as well as numerous other parameters; an entire industry, Search Engine Optimization (SEO), exists to tailor content to those ranking algorithms. This is where the need for legislation becomes apparent. Google is notoriously secretive about how its algorithms "optimize" search results. In fact, the company is currently being investigated by the U.S. Congress's Antitrust Subcommittee for using algorithms to subordinate organic search results to Google's own (less relevant) content, a technique the Subcommittee refers to as "self-preferencing." And while Google is probably the most high-profile and technologically sophisticated operator of algorithms, the lack of transparency and the potential for predatory content amplification are by no means restricted to search ranking. The secrecy surrounding algorithmic amplification also enables malicious actors to harm citizens in cyberspace as well as in the real world. Perhaps the most famous example of this harm was the compromise of the Facebook newsfeed through so-called "false amplification" techniques during the 2016 U.S. election. Macedonian teenagers and Russian trolls gamed the algorithms that curated individual Facebook users' feeds to spread fake news, disinformation, and conspiracy theories, a strategy now known as LikeWar. Two Russian-linked Facebook groups even organized opposing protests at the same time outside an Islamic community center in Houston, Texas on 21 May 2016. The confrontation escalated into verbal attacks between Heart of Texas and United Muslims of America, two fictitious political action groups invented by trolls in St. Petersburg that rallied real people in Houston.
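To see why that opacity matters, consider the following sketch. It is emphatically not Google's actual ranking code; with invented URLs and scores, it only illustrates how a hidden self-preference bonus can quietly push a platform's own, less relevant page above organic results without the user ever seeing the rule.

```python
# NOT Google's algorithm: a toy ranker showing how a hidden
# "self-preference" bonus can reorder results. All URLs and
# scores are invented for illustration.

results = [
    {"url": "indie-horror-blog.example/king-readalikes", "relevance": 0.95, "own": False},
    {"url": "searchco.example/books/horror", "relevance": 0.60, "own": True},
    {"url": "library.example/horror-guide", "relevance": 0.85, "own": False},
]

SELF_PREFERENCE_BONUS = 0.4  # invisible to users; zero under a transparent policy

def rank(pages, bonus=SELF_PREFERENCE_BONUS):
    """Sort by relevance plus a secret boost for the platform's own pages."""
    return sorted(pages, key=lambda p: p["relevance"] + (bonus if p["own"] else 0.0),
                  reverse=True)

for page in rank(results):
    print(page["url"])
# The platform's own, less relevant page now tops the organic results.
```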
Perhaps even more concerning than SEO-hacking by corporate interests and political disinformation, however, is the recent adoption by Canadian law enforcement agencies of predictive policing and algorithmic surveillance technologies. These technologies are not without merit in themselves, but they have been implemented with almost no scrutiny from media or legislators, and so their potential to perpetuate and magnify systemic discrimination through entrenched feedback loops has gone almost completely unanalyzed. When, in the summer of 2020, the University of Toronto's Faculty of Law reviewed the practices of several police forces across Canada that had adopted such technologies, it recommended an immediate nation-wide moratorium on all algorithmic policing technologies pending a comprehensive review through judicial inquiry. Its report specifically cited the need for transparency in the development and implementation of such technologies.
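The feedback loop the report warns about can be demonstrated with a toy simulation. In the sketch below, two districts share the same true incident rate, but patrols are allocated according to recorded incidents, so the district that starts with more records keeps accumulating them; every number is an illustrative assumption.

```python
# A toy simulation of the feedback loop in data-driven patrol allocation.
# Both districts share the same true incident rate; only the historical
# records differ. Every number here is an illustrative assumption.

import random

random.seed(1)
TRUE_RATE = 0.1                  # identical underlying rate in both districts
recorded = {"A": 30, "B": 10}    # district A starts with more records
PATROLS_PER_ROUND = 100

for _ in range(10):
    total = sum(recorded.values())
    for district in recorded:
        # Patrols follow the data; new detections follow the patrols.
        patrols = PATROLS_PER_ROUND * recorded[district] / total
        detections = sum(random.random() < TRUE_RATE for _ in range(int(patrols)))
        recorded[district] += detections

print(recorded)  # district A's "crime problem" keeps growing in the data
```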
These examples are not meant as evidence for a kind of techno-nihilism. There is no question that using digital algorithms to process massive quantities of information enables us to make rapid decisions we otherwise could not. Algorithmic amplification was one of the first tools researchers turned to in the search for a COVID-19 vaccine. Arguably, the necessity of algorithmic amplification grows with the importance of the decisions it supports. Put more simply, the more important the decision, the more data we need to process to make it an informed one. This is the best argument for demanding transparency in big tech's use of algorithms. And there are policy answers out there. The U of T investigators recommended that police forces produce "algorithmic impact assessments" before implementing new data-driven policing techniques, and that they publish annual reports disclosing details about how algorithmic policing technologies are being used, including information about any associated data. Such policy tools would allow the public to have an informed debate about the merits and dangers of these technologies. Legislating similar requirements across algorithm-enabled industries would bring some order to the chaos that is so often found in cyberspace, and which is, unfortunately, spilling over into the real world with increasing frequency.
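What might such a disclosure look like in practice? The sketch below imagines a minimal public record, loosely modelled on the report's call for impact assessments and annual disclosures; every field name and value is a hypothetical offered for illustration, not a proposed standard.

```python
# A hypothetical shape for a public algorithmic impact assessment.
# Field names and values are assumptions, not a proposed standard.

from dataclasses import dataclass

@dataclass
class AlgorithmicImpactAssessment:
    system_name: str
    operator: str
    purpose: str
    data_sources: list[str]
    amplification_criteria: list[str]  # what the system boosts, and why
    known_risks: list[str]
    review_date: str

assessment = AlgorithmicImpactAssessment(
    system_name="Example content-ranking system",
    operator="Example Platform Inc.",
    purpose="Order items in a user's feed",
    data_sources=["user engagement history", "advertiser bids"],
    amplification_criteria=["predicted engagement", "paid promotion"],
    known_risks=["engagement bias toward divisive content"],
    review_date="2021-01-28",
)
print(assessment)
```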