A 2023 paper estimating that 80% of U.S. workers have tasks exposed to large language models has been cited by the IMF and the European Parliament and referenced in U.S. Senate proposals. Three years later, the mismatch between the paper's original conclusions and the settings where its findings are being applied has consequences for policy decisions. The limitations of these scores are not independent: calculating scores against one model, using an American taxonomy, and decomposing work into discrete tasks compounds these constraints rather than simply accumulating them. More dynamic, representative, and actionable measurement tools exist, but they are not reaching policymakers at the pace the policy conversation requires. Both policymakers and researchers have a role in closing the gap between the evidence we have and the decisions we need to make, and both need to start treating workers as partners in that process, not subjects of analysis.
The most widely cited version of this estimate comes from a 2023 paper, “GPTs are GPTs,” by Eloundou et al. Its headline findings: 80% of U.S. workers have at least 10% of their occupational tasks exposed to large language models, and 19% have 50% or more. These numbers have traveled widely: they have been cited by the IMF and the OECD, referenced in U.S. Senate proposals, and built upon by research institutions across multiple countries. Like all empirical tools, the GPTs are GPTs scores are a bounded instrument, and the distance between what they were designed to answer and what they are being asked to support deserves attention. The figure below maps this distance across recent AI labor market research.
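To make the headline thresholds concrete, here is a minimal sketch of how a statistic of the form "X% of workers have at least Y% of their tasks exposed" can be computed. Everything in it is an assumption for illustration: the occupations, employment counts, and per-task exposure flags are invented, and the names `Occupation`, `exposed_task_share`, and `workforce_share_at_or_above` are hypothetical. The paper's actual rating rubric and weighting are more involved than a single true/false flag per task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Occupation:
    name: str
    employment: int           # number of workers (invented figures)
    task_exposed: List[bool]  # one flag per task: True if rated "exposed"


def exposed_task_share(occ: Occupation) -> float:
    """Fraction of an occupation's tasks that are flagged as exposed."""
    return sum(occ.task_exposed) / len(occ.task_exposed)


def workforce_share_at_or_above(occupations: List[Occupation], threshold: float) -> float:
    """Employment-weighted share of workers whose occupation has at least
    `threshold` of its tasks flagged as exposed."""
    total = sum(o.employment for o in occupations)
    above = sum(
        o.employment for o in occupations if exposed_task_share(o) >= threshold
    )
    return above / total


# Hypothetical toy data, not taken from the paper.
occupations = [
    Occupation("hypothetical_clerk", 500, [True, True, True, False]),
    Occupation("hypothetical_technician", 300, [True, False, False, False, False]),
    Occupation("hypothetical_roofer", 200, [False] * 10),
]

share_10 = workforce_share_at_or_above(occupations, 0.10)
share_50 = workforce_share_at_or_above(occupations, 0.50)
print(f"Workers with >=10% of tasks exposed: {share_10:.0%}")
print(f"Workers with >=50% of tasks exposed: {share_50:.0%}")
```

In this toy example, the share of workers counted depends heavily on the threshold chosen, which is one reason a single headline number says little about how deeply any given job is affected.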
The GPTs are GPTs scores measure what was technically feasible for a GPT-4-era model, evaluated against the U.S. Department of Labor’s occupational taxonomy, for tasks with verifiable outputs that can be completed faster with AI assistance. That is a specific and answerable question, and the paper addresses it carefully. But that specificity matters for three main reasons. First, the scores reflect a model from early 2023. Since then, frontier AI capabilities have improved substantially, with one index estimating a gap of roughly 26 percentage points between the model represented by the GPTs are GPTs scores and current AI capabilities. Second, the scores are built on an American occupational taxonomy that does not transfer cleanly onto other labor markets, even with translation. Third, the scores model work as a bundle of discrete, scorable tasks, which captures what can be itemized but not the judgment, relationships, and context that often constitute the most consequential parts of a job.
Source: cohere