14. Ethics

Data science is both an academic and an applied discipline. It is taught in universities, and its techniques are widely practiced in industry, government, and other walks of life. In its academic mode, questions of ethics and ethical conduct arise primarily in the context of research. In the United States, at least, the basic ethical framework for research in data science comes from two government reports: the Belmont Report and the Menlo Report. The Belmont Report, on which the Menlo Report builds, was a response to an egregious and tragic ethical failure known as the Tuskegee Syphilis Study, or Tuskegee Experiment. Reviewing the history of that experiment is important instrumentally, because it helps us understand where our current practice of research ethics comes from. But it is also important more broadly, as an example of an injustice that both reflected and perpetuated class and racial discrimination. The Tuskegee case reminds us that everything we do in data science has potential consequences beyond ourselves.

Acknowledgement

Some of the material in this chapter is adapted from Matthew J. Salganik's book Bit by Bit: Social Research in the Digital Age, which discusses the research principles and frameworks outlined here, along with several problems specific to the digital age, in much greater detail. We thank him for his excellent work and recommend his book to anyone interested in reading further.