Acknowledging that we have a problem with health equity is a start. Now, we begin to address it by open-sourcing our toolset: available now on GitHub.
At KenSci, we live at the intersection of healthcare and technology. Healthcare, and the vocation of medicine, are steeped in responsibility and were instituted with the intent to do good for humanity - to promote equity in health. Health equity means giving each individual an equal opportunity to achieve their full health potential. Unfortunately, this intent has not always been realized. Among many challenges, social and economic inequity has permeated healthcare, resulting in significant, long-term health disparities among groups defined by, but not limited to, race, ethnicity, age, and sex [1-6]. As the growth of data and computation changes healthcare, we must keep these challenges in mind and work proactively to consider the unintended consequences that any new technologies may have. With new technologies such as Artificial Intelligence and Machine Learning being applied across the continuum of care, it is prudent to ensure that technology aids humanity and not the reverse. Hence, Responsible AI that is fair, transparent, robust, and accountable is central to driving adoption in the field and promoting health equity.
A central tenet of Responsible AI is that AI should be aware of diversity and fairness as a fundamental construct of the data, models, KPIs, and the context in which it is being applied. Machine Learning models are algorithms trained on existing data, and as such, they often carry with them the biases of prior practices. Without vigilance and methods of surveillance, even the most well-intentioned machine learning model can propagate the biases that were present in the training data. Of course, biases in prognostic models in healthcare are not novel [8-10]. Yet, given the advanced tooling capabilities of today, we are now in a position to recognize these biases and work to rectify them.
“I am excited about KenSci’s recent efforts to improve fairness and reduce bias in healthcare. This is an important yet historically under-appreciated field, and the scientific community absolutely needs to address it right now,” said Cynthia Rudin, Professor of Computer Science, Electrical and Computer Engineering, and Statistical Science at Duke University.
A multitude of biases may be present in healthcare and can stem from the data, the algorithms, or the model outputs (see Figure 1). Importantly, no tool or strategy will remove all evidence of bias. Instead, we suggest Responsible AI best practices including close collaboration between data scientists and subject matter experts to consider the range of biases and ask the following questions:
- Which biases may exert harm on groups or individuals, particularly those representing historically marginalized groups?
- Which biases are having the greatest impact on current operations?
- Which biases are within the purview of the modelers or those applying the models to challenge?
Figure 1: The many sources of bias in healthcare, which can come from data, models, KPIs, and the context. Source: Slides from “Fairness in Machine Learning for Healthcare” (Tutorial at KDD 2020)
To help answer these questions, we created an open-source library, fairMLHealth, and are making it available now on GitHub. This library, and the accompanying body of work, seeks to address a deficiency in the healthcare AI domain. Despite the prominence of discussions and published works around fairness and bias in healthcare AI, practical efforts to assist in identifying these issues, such as specific software libraries, have been largely missing. Preexisting fairness libraries prioritize general aspects of fairness; our library allows for an easy comparison of numerous fairness metrics across multiple machine learning algorithms. It is also designed to be extensible as new research emerges and improvements are made in this field. Developing a community around this topic is non-trivial, given the difficulty of understanding both the nuances of the fairness metrics and the computational challenges in machine learning. The fairMLHealth library also provides templates and tutorials on the meaning of various fairness metrics, their relevance, their limitations, and their use in applied settings. This library builds on KenSci’s commitment to Responsible AI. Our team has published numerous manuscripts, delivered tutorials at major AI and ML conferences, and conversed with healthcare leaders around the world to promote key elements of Responsible AI - including fairness, bias, explainability, and human oversight [12-15].
Figure 2: The fairMLHealth tool allows for an easy comparison of numerous fairness metrics across multiple machine learning algorithms. Source: https://kensciresearch.github.io/fairMLHealth/
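To make the idea of comparing fairness metrics across models concrete, here is a minimal sketch in plain Python. It does not use the fairMLHealth API; the function names, the two toy "models", and the data are all hypothetical and chosen only for illustration. It computes two widely used gap measures for each model: the difference in selection rates between groups (demographic parity) and the difference in true positive rates between groups (equal opportunity).

```python
from collections import defaultdict


def split_by_group(y_true, y_pred, groups):
    """Collect (labels, predictions) separately for each group value."""
    split = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        split[g][0].append(t)
        split[g][1].append(p)
    return split


def selection_rate(y_pred):
    """Fraction of cases the model flags as positive."""
    return sum(y_pred) / len(y_pred)


def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives the model flags as positive.

    Assumes the group contains at least one actual positive.
    """
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged)


def fairness_report(y_true, y_pred, groups):
    """Largest between-group gap for two common fairness measures."""
    split = split_by_group(y_true, y_pred, groups)
    rates = [selection_rate(p) for _, p in split.values()]
    tprs = [true_positive_rate(t, p) for t, p in split.values()]
    return {
        "demographic_parity_diff": max(rates) - min(rates),
        "equal_opportunity_diff": max(tprs) - min(tprs),
    }


# Hypothetical toy data: eight patients in two groups, two candidate models.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
models = {
    "model_1": [1, 1, 0, 0, 1, 0, 0, 0],
    "model_2": [1, 0, 1, 0, 1, 0, 1, 0],
}

reports = {
    name: fairness_report(y_true, preds, groups)
    for name, preds in models.items()
}
for name, report in reports.items():
    print(name, report)
```

On this toy data the first model shows a gap on both measures while the second shows none, even though the second model is less accurate overall. That trade-off is exactly the kind of thing a side-by-side comparison is meant to surface for discussion with subject matter experts.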
Consideration of fairness and bias is essential across AI applications in healthcare. The realities of data collection, issues of transparency in end-user acceptance, and the sensitivities around long-existing health disparities all point to this as a requirement. And the tools to get started, to uncover the biases in these complex models, can only be supplied by the scientific community.
At KenSci, we use these tools as we prepare and deliver algorithms that risk-stratify patients across different disease types and cohorts. In collaborative work predicting outcomes following shoulder arthroplasty [16, 17], for example, we are using the fairness library to identify patient cohorts with disparate performance. Importantly, this tool also enables the evaluation of intersectional fairness. Intersectionality is an oft-overlooked but critical consideration for health equity when aspiring to the promises of precision medicine.
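The value of an intersectional view can be illustrated with a small sketch, again in plain Python rather than the fairMLHealth API. The records, attribute names (`sex`, `age_band`), and helper functions below are hypothetical. The example computes recall (true positive rate) per cohort, first for each attribute alone and then for their combination.

```python
from collections import defaultdict


def recall_by_cohort(records, attributes):
    """Recall (true positive rate) for each combination of the given attributes."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        if r["y_true"] == 1:  # recall only considers actual positives
            key = tuple(r[a] for a in attributes)
            positives[key] += 1
            hits[key] += r["y_pred"]
    return {key: hits[key] / positives[key] for key in positives}


def make_positives(sex, age_band, y_pred, n=2):
    """Build n hypothetical patients who all truly had the outcome."""
    return [
        {"sex": sex, "age_band": age_band, "y_true": 1, "y_pred": y_pred}
        for _ in range(n)
    ]


# Hypothetical toy cohort: the model catches every case in two cells
# and misses every case in the other two.
records = (
    make_positives("F", "<65", 1)
    + make_positives("F", "65+", 0)
    + make_positives("M", "<65", 0)
    + make_positives("M", "65+", 1)
)

by_sex = recall_by_cohort(records, ["sex"])
by_age = recall_by_cohort(records, ["age_band"])
by_both = recall_by_cohort(records, ["sex", "age_band"])

print("by sex:      ", by_sex)
print("by age band: ", by_age)
print("intersection:", by_both)
```

In this deliberately extreme toy cohort, recall is 0.5 for every sex and every age band taken alone, so a single-attribute audit would report no disparity. Only the intersectional breakdown reveals that the model misses every case among older women and younger men.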
“Faster and wider adoption of machine learning based clinical decision-support tools will occur if healthcare professionals deem these predictive algorithms to be trustworthy,” said Chris Roche, VP of Extremities at Exactech, Inc. “To be proven trustworthy, these evidence-based AI techniques must not only be accurate in their predictions but also transparent in their limitations, requiring clear communication of any potential bias. Tools such as the FairMLHealth library help ensure that machine learning predictive models are adequately tested for various bias metrics that may arise from within the data or from creation of the models.”
At KenSci, the accountability of our models is not an afterthought, or an optional product feature or part of a premium SKU - it is a basic component. All model developers should be able to hold their work accountable using the best techniques available today. By making the fairMLHealth library freely available on GitHub, we also invite the wider community to join us. Like so many other challenging problems in healthcare, issues of bias and transparency will be best solved by collaborating across groups. Saving lives with data and AI begins with addressing basic problems in health equity. This then leads to building trust. It takes a village. We’ve taken the first step and now turn towards the healthcare and technology community to continue to guide us.
- Allen C, Ahmad MA, Eckert C, Hu J, Kumar V, Teredesai A. fairML-Health: Tools and tutorials for evaluation of fairness and bias in healthcare applications of machine learning models. 2020 Oct 23. https://github.com/KenSciResearch/fairMLHealth.
- Ahmad MA, Patel A, Eckert C, Kumar V, Teredesai A. Fairness in Machine Learning for Healthcare. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 Aug 23 (pp. 3529-3530).
- Thomas SB, Quinn SC. The Tuskegee Syphilis Study, 1932 to 1972: implications for HIV education and AIDS risk education programs in the black community. Am J Public Health. 1991;81(11):1498-1505. doi:10.2105/ajph.81.11.1498
- Randall, Vernellia R. "Slavery, Segregation and Racism: Trusting the Health Care System Ain't Always Easy--An African American Perspective on Bioethics." St. Louis U. Pub. L. Rev. 15 (1995): 191.
- de Malave, Florita Z. Louis. Sterilization of Puerto Rican women: a selected, partially annotated bibliography. University of Wisconsin System, Women's Studies Librarian, 1999.
- Bierman, Arlene S. "Sex matters: gender disparities in quality and outcomes of care." Cmaj 177, no. 12 (2007): 1520-1521.
- Chen, Esther H., Frances S. Shofer, Anthony J. Dean, Judd E. Hollander, William G. Baxt, Jennifer L. Robey, Keara L. Sease, and Angela M. Mills. "Gender disparity in analgesic treatment of emergency department patients with acute abdominal pain." Academic Emergency Medicine 15, no. 5 (2008): 414-418.
- Wailoo, Keith. "Sickle cell disease—a history of progress and peril." N Engl J Med 376, no. 9 (2017): 805-807.
- Angwin, J.; Larson, J.; Mattu, S.; and Kirchner, L. 2016b. Machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. Accessed May 26, 2017.
- Stanley J. Pitfalls of artificial intelligence decision making highlighted in Idaho ACLU case. Privacy & Technology. 2017.
- Levey AS, Stevens LA, Schmid CH, et al. A new equation to estimate glomerular filtration rate. Ann Intern Med 2009;150: 604-12.
- Grobman WA, Lai Y, Landon MB, et al. Development of a nomogram for prediction of vaginal birth after cesarean delivery. Obstet Gynecol 2007;109:806-12.
- Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms.
- Ahmad MA, Eckert C, Teredesai A. Interpretable machine learning in healthcare. In Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics 2018 Aug 15 (pp. 559-560).
- Eckert C and Ahmad M. Interpretable Machine Learning: What Clinical Informaticists Need to Know. AMIA Clinical Informatics Conference. April 2019.
- Ahmad MA, Teredesai A, Eckert C. Fairness, accountability, transparency in AI at scale: lessons from national programs. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency 2020 Jan 27 (pp. 690-690).
- Ahmad MA, Patel A, Eckert C, Kumar V, Teredesai A. Fairness in Machine Learning for Healthcare. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020 Aug 23 (pp. 3529-3530).
- Kumar V, Roche C, Overman S, Simovitch R, Flurin PH, Wright T, Zuckerman J, Routman H, Teredesai A. Using Machine Learning to Predict Clinical Outcomes After Shoulder Arthroplasty with a Minimal Feature Set. Journal of Shoulder and Elbow Surgery. 2020 Aug 19.
- Kumar V, Roche C, Overman S, Simovitch R, Flurin PH, Wright T, Zuckerman J, Routman H, Teredesai A. What Is the Accuracy of Three Different Machine Learning Techniques to Predict Clinical Outcomes After Shoulder Arthroplasty?. Clinical Orthopaedics and Related Research®. 2020 Sep 4.
Banner image source:
- The Guardian: https://www.theguardian.com/world/2020/aug/17/black-babies-survival-black-doctors-study
- U.S. News: https://www.usnews.com/news/healthiest-communities/articles/2019-08-01/black-babies-at-highest-risk-of-infant-mortality
Authors: Carly Eckert, Muhammad Aurangzeb Ahmad, Christine Allen, Vikas Kumar, Arpit Patel, Corinne Stroum, Juhua Hu, and Ankur Teredesai
Corresponding Author: Muhammad Aurangzeb Ahmad (firstname.lastname@example.org)