2020 Pacific Symposium on Biocomputing

Our travels to the Big Island to discuss ethical and technical challenges in genome privacy

by Charlotte Brannon

Gamze and I started off 2020 with a bang by attending the Pacific Symposium on Biocomputing (PSB) on the Big Island of Hawaii. This year was the 25th anniversary of PSB. The conference was held on the west coast of the island at the Fairmont Orchid hotel. We attended several of the conference sessions and workshops, but we were most excited about the “Navigating Ethical Quandaries with the Privacy Dilemma of Biomedical Datasets” workshop because Gamze was a co-organizer and I was giving a talk. I also took a bunch of photos, which are not *technically* related to ethics, privacy, or genomics, but nonetheless accompany this post.

Left: arriving at the Kona airport
Middle: views of the island on the drive to my hotel
Right: arriving at the conference on day 1

“Ethical quandaries” come up a lot in computational biology and bioinformatics in more ways than one. In fact, ours was not the only workshop at PSB this year that addressed questions of ethics. One of the other workshops focused on Artificial Intelligence (AI) ethics in Biomedicine. Whereas Gamze and I are specifically interested in protecting individual privacy while sharing biomedical datasets, AI presents a different set of ethical challenges. For example, scientists making use of AI technology have to think about how to prevent bias in their algorithms. This is a tricky question, as AI/machine learning algorithms make use of real-world data to learn how to approach new data. Yet, real-world data is often biased, which will lead to a biased algorithm. Humans are inherently biased, but at least we can hold people socially and legally accountable for acting on biases. If someone were to develop a machine learning algorithm to, say, identify cancer in patient imaging data, and it were biased toward certain populations, we currently wouldn’t have a good way of holding that algorithm “accountable” for its bias. 

Additionally, algorithms that can identify hidden patterns in large amounts of data don’t always get it right. In the AI ethics workshop, Chris Re from Stanford gave an example of a machine learning algorithm that was supposed to determine a person’s sex based on an image of their eye. At first, the algorithm seemed to be incredibly accurate. It turned out, though, that the algorithm was determining sex based on the presence or absence of mascara on the eyelashes, which were visible in each photo in the training data set. Imagine if we had algorithms like this making decisions about health care diagnoses and treatments… For this reason, it was important to have a workshop to discuss ways scientists can continue to refine and make use of AI technology while addressing ethical issues. This workshop reminded us of the amazing invited talk at the 2019 GA4GH plenary meeting by Julia Stoyanovich, who advised the audience to “data responsibly,” as the world is biased.

Here’s another “ethical quandary” – how can we allow researchers to share biomedical data and make groundbreaking scientific discoveries without violating individuals’ privacy? This was the subject of the “Navigating Ethical Quandaries with the Privacy Dilemma of Biomedical Datasets” workshop, which was co-organized by both technical experts (like Gamze, a postdoc in computational biology and bioinformatics focusing on genome privacy, and Dr. Steven Brenner, a Professor of Computational Biology) and legal or ethical experts (like Jennifer Wagner, a JD/PhD focusing on anthropology, genetics, law, and bioethics; Megan Doerr, a governance specialist at Sage Bionetworks; and John Wilbanks, the chief commons officer at Sage Bionetworks). At PSB 2019, Gamze and Steven (together with other technical experts) held a session focused on privacy, which was very successful and ended with a fruitful panel discussion. Based on how well it went, they submitted a workshop proposal for 2020 focused mainly on privacy. Simultaneously, Megan Doerr and others submitted a proposal for an ethics workshop, and the conference organizers asked the two groups to merge into one workshop.

Megan opened this year’s workshop by saying that the marriage of technical and ethical minds in this field used to be a “marriage of convenience,” but now it is a “marriage of love.” This got a big laugh, but we actually ended up circling back to this analogy throughout the workshop. What does a “marriage of love” mean between ethics and technical tools when sharing biomedical datasets? To me, it could mean a few things, but at the very least it meant that as technical and legal/ethical experts, we were actually going to engage with one another, rather than simply talk at each other. Privacy and ethics cannot be separated in these discussions. 

The workshop consisted of several talks, some presenting technical tools/algorithms for protecting patient privacy when sharing biomedical data, and others discussing the ethics of biomedical research. For example, Corey Hudson from Sandia National Labs gave a talk about hackable vulnerabilities in genomic analysis pipelines; I gave a talk about applications of blockchain technology to promote security of genomics data; and Jennifer Wagner gave a talk about legal regulation of diverse biomedical datasets. Lucila Ohno-Machado, Heidi Sofia and Xiaoqian Jiang mentioned the iDASH center and challenges (which we discussed in our previous blog post). The complete list of talks and speakers is as follows:

John Wilbanks, Translational bioethics in a monopoly network era

Jennifer Wagner, Legal Angles and the ‘Illusion’ of Certainty: Regulation of Diverse Data Sets

Heidi Sofia, Advancing data sharing with security and privacy at NIH

Lucila Ohno-Machado, Responsible data sharing: Patient preferences, institutional policies, and privacy technology

Corey Hudson, From buffer overflowing genomics tools to securing genomic pipelines

Charlotte Brannon, Applications of blockchain technology to genomics

Xiaoqian Jiang, Secure Cohort Identification for Clinical Trial using Heterogeneous Healthcare Data

After this mix of technical and legal/ethical talks, we had a panel discussion with all of the speakers, which brought us back to some foundational issues at the intersection of these two modes of thought (the technical and ethical). One question that came up was, what does it mean to share data legally versus ethically? What should the law do to protect patient privacy while permitting biomedical research to move ahead? Megan Doerr brought up the issue of informed consent. As privacy issues remain, we want people to be informed before sharing their data. Yet, achieving truly informed consent is challenging. For example, Megan pointed out, if you ask people whether they are willing to share their “genomic data,” this may not meet the standard for informed consent–many people do not know what the term “genomic data” means. However, it turns out that most people do know what “DNA” means, and therefore it may be a more appropriate term to use when obtaining consent for data sharing. This is a case where asking about “genomic data” might be checking the box of getting consent, but we can do so more ethically by using terms that people really understand. There were also discussions about the problems with broad consent vs. specific consent. 

Evening social event at PSB

As a quick tangent, I even saw examples of this during my time on the Big Island. Over the course of the conference, I had to take several taxi rides between the conference and my hotel a little way down the coast. Most of the drivers asked what conference I was attending, and what kind of research I was involved in – they were used to meeting people who had come to the Big Island for a conference. In these conversations, it was challenging to find the right words and approach to discuss technical topics. Most people I chatted with did not know what I meant by “genome privacy” (the term Gamze and I use colloquially in the lab), but most people understood when I said something like, “looking for ways to share people’s DNA data for scientific research without violating personal privacy.” Actually, most people were eager to discuss this with me for the rest of the car ride! One woman and I had a 30-minute conversation about how machine learning works, and what sorts of ethical problems the field of AI faces. This showed me that people are interested in these topics, if they are made accessible.

One guy I met was especially interested in the focus of our workshop. Upon hearing that I was attending a workshop about DNA privacy, he immediately told me that in Hawaii, people are especially invested in this issue, and that Native Hawaiians in particular are wary of using DNA to determine ancestry. I had never heard this before, but later found this 2015 article, which reports a controversy over using blood quantum to determine Hawaiian identity. The article quotes Williamson Chang, a Native Hawaiian law professor at the University of Hawaii, who says it is a very “un-Hawaiian idea” to rely on DNA to define identity, and that personal identity and family heritage are more important than the percent composition of blood.

I found this interesting because it shows that different populations have different stakes when it comes to genome privacy – we need to be able to consider this diversity when making privacy laws and standards, or designing processes like informed consent. I appreciated the chances I had at the conference to talk to several non-scientists about the implications of genomic privacy protection.

Photo from Gamze’s visit to Molokai, the fifth-largest of the main Hawaiian islands

At one of the conference social events after the workshop, a fellow trainee asked me how I enjoyed the workshop, whether I thought it was productive, and what I would do if I could wave a magic wand and achieve three things in the field. I almost wish this had been the core question of the workshop panel. What three things could we, a mix of lawyers, scientists, and ethicists, agree on that would move the field forward? I couldn’t answer the question by myself, and I’m not sure we would have identified three things within the 45 minutes for our panel. We have many lingering questions, which we hope to come back to next year:

  • There are weak or no consequences for privacy violators. How can we impose sanctions so that people and organizations will follow privacy law?
  • What is currently going on in data sharing that is legal, but not ethical? We want to hear from more speakers who can address this.
  • How can we do a better job of moving privacy-protecting tools into real-world scenarios? 

Next year at PSB, we hope to have a chance to discuss some of these questions, and more. In the meantime, we wanted to acknowledge this year’s workshop organizers, listed below.

Workshop Organizers: Gamze Gursoy, Megan Doerr, Steven E. Brenner, Haixu Tang, John Wilbanks, Jennifer K. Wagner

Remembering Dr. Martin Luther King Jr. with Dr. Angela Davis

Reverend Doctor Martin Luther King Jr (taken from biography.com)

As a scientist who has to read a mountain of academic literature every day, I can never seem to find enough time to read about the history of the country I live in. However, I have a deep appreciation of the Civil Rights Movement and always try to educate myself and learn from it.

Dr. Angela Davis giving a lecture called “people get ready!” at Woolsey Hall, Yale University. Photo taken by Gamze.

I feel extremely lucky that the Yale Afro-American Cultural Center invited Dr. Angela Davis – who is an activist and a civil rights icon – to give a lecture commemorating Dr. Martin Luther King Jr., and that I was able to attend. The event, held in Woolsey Hall, was sold out shortly after it was announced. There are too many quotes from the night to list here. Among the many important issues she addressed, I want to specifically mention two. She said, “We need to revise the way in which we narrate the history of black people in the Americas.” I got all my history education in my native country, which is far, far away from the US. Therefore, I was not educated in colonial history or the US’s recent history – so maybe that’s why I was hung up on this sentence, as it will make me re-evaluate what I learn about history in this country. She also talked about the importance of feminist movements to battle racism, which made me think of the women’s suffrage movement and how it only served white women (there was a great NY Times article last year on how suffrage betrayed black women). She added that she was not talking about “capitalist feminism” or “glass-ceiling feminism,” but rather about a more intersectional and inclusive form of feminism.

Next, as part of the 2020 MLK commemoration, Yale is hosting activist and renowned poet Nikki Giovanni, again in a sold-out event. Sadly, neither Charlotte nor I was able to get a ticket for that event. I will probably have to follow it on Twitter.

2019 iDASH Competition & Workshop

In October 2019, we attended the iDASH workshop in Bloomington, IN at the Indiana University Luddy School of Informatics, Computing, and Engineering. iDASH stands for Integrating Data for Analysis, Anonymization, and Sharing, and each year (we believe since 2014) they organize a ‘Secure genome analysis competition’. We are very competitive (especially Gamze) so naturally we wanted to get involved. 

The iDASH organizers set four tracks of the competition, each a different challenge. The 2019 challenges were:

Track I: Distributed Gene-Drug Interaction Data Sharing based on Blockchain and Smart Contracts

Track II: Secure Genotype Imputation using Homomorphic Encryption

Track III: Privacy-preserving Machine Learning as a Service on SGX

Track IV: Secure Collaborative Training of Machine Learning Model

These challenges were exciting to us because they each took place at the cutting edge of privacy research. The idea of storing gene-drug interaction data in the Ethereum blockchain is positively odd–but many of the best ideas are. They were also learning opportunities for us; when we got started, neither of us was an expert in the methods mentioned in the tasks. This meant we were going to learn a lot in the process of designing solutions. The competition was a way for a community of scientists and technical specialists to benchmark methods to protect individual privacy and data security in the context of genomic data. 

The challenges were announced in April; we designed and implemented our solutions over the summer; and the competition culminated in a workshop on October 26 in Bloomington, IN.

At the workshop, our team, ‘Team Gerstein Lab,’ was awarded 3rd place for the Track I challenge. We gave a talk describing our approach to storing gene-drug interaction data in a smart contract. This was Charlotte’s first time giving a talk at any sort of conference or workshop!

Aside from the exciting technical advances presented in the workshop, one of our favorite things about the day was the number of women speakers we saw (including us). In February 2018, Gamze attended a workshop on genomic privacy in Lausanne, Switzerland. While it was a great event, she was frustrated that among the many speakers there was only one woman. She complained in a tweet, and in response the organizers apologized and said, “sorry, there are not many female experts in the field of genome privacy that we could invite.” This year’s iDASH workshop revealed that this is not the case… there are indeed many women experts in this field, as evidenced not only by the number of female invited speakers but also by the workshop attendees and challenge participants.

Throughout the day there were several talks, between three and five per track. Dr. Li Xiong, Professor of Computer Science at Emory University, gave a fascinating keynote talk on the use of differential privacy for health data analysis. Then, the winning teams within each track gave talks. In Track I (which included our talk), the main challenge people faced was engineering: how to actually configure the Ethereum blockchain network and deploy your solution to the chain. The other tracks faced their own issues. In the SGX challenge, for example, teams had to work within the memory constraints imposed by the SGX enclave size.

The day wrapped up with a panel discussion with Xiaofeng Wang (IU; moderator), Anamaria Costache (Intel), Xiang Xie (PlatON), Rundong Zhou (Baidu), Mariya Georgieva (Inpher), and Gamze Gursoy (Yale). This was really useful; it created space for experts to informally discuss some of the more nuanced issues in the field. For example, how can we develop tools that scientists and bioinformaticians will actually use? Or how can we trust privacy-protecting tools developed by for-profit companies, such as Intel? One challenge that was mentioned was that many of the bioinformaticians who work with private data are not even aware that many of these computations can now be done in the encrypted space.

Genome privacy and security are relatively new areas of study and concern, and we loved joining the iDASH community to learn more about them and contribute our solutions and ideas. Already looking forward to iDASH 2020!

The iDASH 2019 website can be found at http://www.humangenomeprivacy.org/2019/index.html

Our first post

Welcome to GC content! Gamze Gürsoy and Charlotte Brannon here. We are two scientists currently working in Mark Gerstein’s group at Yale University. We hail from different parts of the world (Gamze from Istanbul, Charlotte from Houston), are at different stages of our careers (Gamze a postdoc, Charlotte a postgrad), and have a variety of different research interests (too many for a parenthetical). But we share a passion for our research, and an interest in the intersection of biology, computer science, and ethics. Also, as feminist scientists, we are excited about women’s voices in science. We are starting this blog to have a forum to write about this mix of interesting issues. We hope you enjoy it!

“GC content” typically refers to the percentage of Guanine and Cytosine base pairs in a region of DNA. According to us, it also refers to “Gamze and Charlotte’s” content…

“GC content” is our clever way of highlighting ourselves–Gamze and Charlotte–and our focus on genomics. GC content refers to the percentage of Guanine-Cytosine base pairs in a particular fragment of DNA or RNA. A GC base pair is held together by 3 hydrogen bonds, and these 3 bonds represent our commitment to science, Yale, and publications… just kidding! We have no more of an analogy, at least not yet. In this blog we hope to write about science, and also topics pertaining to our own lives and interests (i.e. our content). By including a variety of posts, we hope to bring humanity to science and vice versa.
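For the curious, the GC content of a sequence can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the function name `gc_content` is made up for this sketch, the input is a single-stranded sequence given as a string, and every character (including ambiguous bases such as N) counts toward the length.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases in a DNA or RNA sequence.

    Hypothetical helper for illustration; assumes a non-empty string
    of base letters and treats upper and lower case the same.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Count the bases that are either guanine (G) or cytosine (C).
    gc_count = sum(1 for base in seq if base in "GC")
    # Express the count as a percentage of the total sequence length.
    return 100.0 * gc_count / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGC"))  # half of the four bases are G or C
```

A real analysis would more likely rely on an established bioinformatics library and decide explicitly how to handle ambiguous bases; the sketch just shows the arithmetic behind the name.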

Please visit our “About us” page to learn more.
