The Rockefeller Foundation has announced the launch of a funder collaborative aimed at providing data scientists, researchers, and social entrepreneurs in low- and middle-income countries with the resources they need to produce new datasets that address an underserved population or problem, augment existing datasets so they are more representative of a target population, and/or update old datasets to be more sustainable.
According to the foundation, the data required to build AI applications in many low-resource settings often are outdated, missing key information, or not representative of underserved populations — if they exist at all — resulting in biases and lower accuracy. Machine learning tools then "learn" these biases, which can result in harmful outcomes for people of color, women, and other marginalized populations.
Launched with $4 million in pooled funds from the Rockefeller Foundation, Google.org, and Canada's International Development Research Centre, the Lacuna Fund aims to address such gaps by awarding grants for the creation, expansion, and/or maintenance of equitably labeled datasets; deepening understanding of how to effectively and efficiently fund the development of such datasets; and enabling underserved populations to take advantage of AI advances.
To that end, the fund initially will focus on three areas: language, where natural language processing (NLP) requires labeled training data that do not currently exist for many languages; agriculture, in which AI-enabled tools have great potential to increase production and resilience and advance broader sustainable development goals; and health, with a focus on expanding datasets related to COVID-19 and respiratory illness response to make them more representative of and useful in treating populations impacted by the virus. The first call for proposals focused on underrepresented languages is supported by the German development agency GIZ on behalf of the Federal Ministry for Economic Cooperation and Development.
"Labeled datasets have fueled massive innovation in machine learning over the last decade," said Evan Tachovsky, lead data scientist and director of innovation at the Rockefeller Foundation. "However, people and problems from low-income countries have been left out of these datasets and in turn haven't benefited from new technology. We're proud to launch the Lacuna Fund to close these data gaps and fuel the next generation of machine learning that works for all."
"The goal is open datasets. But this is bigger than just creating the datasets," said Daphne Luong, director of engineering at Google AI Research. "We want to create innovative, scalable and replicable data protocols, so they can be applied to different data domains as well as other geographical regions. Eventually we hope that more representative and accessible data will allow machine learning to better serve communities worldwide."