Joshua Plotkin, a biology professor at the University of Pennsylvania, uses mathematics to study the forces that drive organisms like viruses to evolve.
But now, with the help of an unlikely tool, he wants to use similar mathematical models to shed light on something equally dynamic: language.
"At the beginning, when evolutionary biology really got going with Darwin, it was actually really closely intercalated with language studies and cultural evolution," said Plotkin.
The vast amounts of text preserved in ancient libraries and archives gave linguists a head start over budding biologists in developing their own theories of change.
Lacking a comparably large fossil record, Charles Darwin cited the work of linguists to lend credence to his ideas about survival, competition, and fitness in the animal kingdom.
But today, thanks to technological advances like DNA analysis, Plotkin says much more is known about how organisms change than about how cultural markers like language change, despite the existence of Google Books and social media platforms.
Since antiquity, linguists have used written texts to show how words and grammar have transformed over time and trace the impact of inventions, migrations, and cultural shifts.
Plotkin wanted to apply the math he used to study genes to digitized text archives, in hopes of uncovering previously hidden patterns in language.
But when Plotkin shared his ideas with his linguist friends, they were skeptical.
Not of his math, but of the data he wanted to use.
"They were kind of like, 'That's not really language,'" Plotkin said.
Real language, they argued, wasn't the polished prose found in books or even the casual scrawlings posted on someone's Facebook wall.
It was the fast-paced, messy conversations that people have in everyday, real-life interactions.
Spoken language is where subtle changes take root and spread from person to person, only to show up much later in written texts.
But spoken language is also more difficult to study with traditional methods than written text. Shifts in pronunciation and word usage happen in real time and are hard to capture and quantify.
"And I thought, if we just had some data set of how spoken language has changed over time. That would be like, perfect -- a goldmine of really studying a real, important cultural change," Plotkin said.
Then his linguist friends told him that just such a treasure trove of data existed, a short walk from his office.
That goldmine is called the Philadelphia Neighborhood Corpus, a collection of more than 400 recordings of everyday Philadelphians, spanning from 1972 to 2012.
The project, spearheaded by pioneering linguist William Labov, is an archive of the city's linguistic evolution, preserved not in written form but in raw, recorded speech.
Meredith Tamminga, a linguist and one of Labov's former students, now oversees the collection, which for years was stored in physical form in her mentor's former lab, a Victorian house on campus.