
Most conversations don’t end when people want them to, study finds

Ever feel trapped in a never-ending conversation? Well, you’re not the only one, according to a new study. A group of researchers surveyed over 800 people and found that conversations almost never end when both parties want them to – and that people have little idea of when their partner actually wants to stop.

Image credit: Flickr / Felipe Cabrera

Back when he was studying for his master’s degree at the University of Oxford, Adam Mastroianni used to attend black-tie events and wonder how many people were stuck in conversations they couldn’t get out of. “What if we’re all trapped in conversations because we mistakenly think the other person wants to continue?” he asked himself.

It’s probably happened to all of us at some point. You’re not really into the conversation, but you don’t want to be rude so you half-heartedly keep going. Turns out, it’s very common.

Most previous studies of conversation were done by linguists or sociologists. Psychologists did look at conversations, but only as a way to study other things, such as how people use words to persuade. A few studies have explored the phrases people use at the end of conversations, but not when people choose to say them.

“People feel like it’s a social rupture to say: ‘I’m ready to go’, or to say: ‘I want to keep going although I feel like you don’t want to keep going.’ Because of that, we’re pretty skilled at not broadcasting that information,” Mastroianni told New Scientist. “Whatever you think the other person wants, you may well be wrong. So you might as well leave at the first time it seemed appropriate.”

Now a Ph.D. student in psychology, Mastroianni wanted to get some answers. With a group of researchers, he surveyed over 800 people randomly recruited from a crowdsourcing marketplace website. Participants answered questions about recent conversations they had had, including how they felt about their length and the way they ended.

The researchers also worked with more than 250 students and non-students drawn from the pool of volunteers available for studies in the Harvard University psychology department. Each took part in a one-on-one conversation with another participant whom they didn’t already know. They could chat for as long as they wanted, up to a maximum of 45 minutes.

When they finished talking, both participants could leave the room, and each was quizzed about the conversation. If the conversation lasted the full 45 minutes, one of the researchers stepped into the room to end it. Most of the pairs engaged in chitchat about where they grew up or what they were studying; the exchanges were so dull it was “hard to watch them,” Mastroianni recalls.

The findings showed that conversations rarely ended when people wanted them to, even when both participants wanted to stop. The length of the conversations was off by about 50% compared with how long people would have liked them to last. Only about 10% of the conversations ended even though both people wanted to continue.

“They could have kept going; they had time left. But for some reason they stopped, maybe thinking they were doing a nice thing by letting the other person go,” Mastroianni said. People in conversations want different endpoints and know very little about what their partners actually want. But this doesn’t mean they don’t enjoy the conversations, he added.

Thalia Wheatley, a social psychologist at Dartmouth College who was not involved in the study, told Scientific American it was “astounding” to find that people are so bad at judging when a conversation partner wishes to wrap things up. Conversations are otherwise “such an elegant expression of mutual coordination,” she said. “And yet it all falls apart at the end because we just can’t figure out when to stop.”

The researchers only covered people from the United States, which raises the question of whether the rules of conversation are clearer in other cultures. The study was published in the journal PNAS.


People learn to predict which words come after ‘um’ in a conversation — but not with foreigners

People can learn to predict what a speaker will say after a disfluency (such as ‘um’ or ‘aaah’). However, this only seems to work with those who share their native tongue, not with foreigners.

Dialogue.

Image via Pixabay.

Even flowing conversation is peppered with disfluencies — short pauses and ‘umm’s, ‘ahh’s, ‘ugh’s. On average, people produce roughly 6 disfluencies per 100 words. A new paper reports that such disfluencies do not occur randomly — they typically come before ‘hard-to-name’ or low-frequency words (such as ‘automobile’ instead of ‘car’).

The team notes that, while previous research has shown that people can use disfluencies to predict when such a low-frequency (uncommon) word is incoming, no research had established whether listeners actively track the occurrence of ‘uh’ even when it appears in unexpected places. And that’s exactly what the present study wanted to find out.

Small pauses for big words

The team asked two groups of Dutch participants (41 in total, 30 of whom produced usable data) to look at sets of two images on a screen (one ‘common’, such as a hand, and one ‘uncommon’, such as an igloo) while listening to both fluent and disfluent instructions. These instructions would tell participants to click on one of the two images. One of the groups received instructions spoken in a ‘typical’ manner — in which the talker would say ‘uh’ before low-frequency words — while the other group received ‘atypical’ instructions — in which the talker said ‘uh’ before high-frequency words.

Eye-tracking devices were used to keep track of where each participant was looking during the trial. What the team was interested in finding was whether participants in the second group would keep track of the unexpected ‘uh’s and would learn to expect the common object after them.

At the start of the experiment, participants listening to ‘typical’ instructions immediately looked at the igloo upon hearing the disfluency, as did those in the atypical group. Note that the team intentionally left a relatively long pause between the ‘uh’ and the following word, so the participants looked at an object even before hearing the word itself. However, people in the atypical group quickly learned to adjust this natural prediction and started looking at the common object upon hearing a disfluency.

“We take this as evidence that listeners actively keep track of when and where talkers say ‘uh’ in spoken communication, adjusting what they predict will come next for different talkers,” explains lead author Hans Rutger Bosker from the Max Planck Institute for Psycholinguistics.

The team also wanted to see if this effect would hold for non-native speakers. In a follow-up experiment — one that used the same set-up and instructions but this time spoken with a heavy Romanian accent — participants learned to predict uncommon words following the disfluencies of a ‘typical’ (‘uh’ before low-frequency words) non-native talker. However, they didn’t start predicting high-frequency words in an ‘atypical’ non-native speaker, despite the fact that the same sentences were used in the native and non-native experiments.

“This probably indicates that hearing a few atypical disfluent instructions (e.g., the non-native talker saying ‘uh’ before common words like “hand” and “car”) led listeners to infer that the non-native speaker had difficulty naming even simple words in Dutch,” says co-author Geertje van Bergen.

“As such, they presumably took the non-native disfluencies to not be predictive of the word to follow — in spite of the clear distributional cues indicating otherwise.”

The findings suggest an interplay between ‘disfluency tracking’ and ‘pragmatic inferencing’, according to the team. In non-science speak, that largely means we only track disfluencies if the talker’s voice makes us believe they are a reliable umm’er.

“We’ve known about disfluencies triggering prediction for more than 10 years now, but we demonstrate that these predictive strategies are malleable. People actively track when particular talkers say ‘uh’ on a moment by moment basis, adjusting their predictions about what will come next,” explains Bosker.

The paper “How tracking the distribution of native and non-native disfluencies influences online language comprehension” has been published in the Journal of Memory and Language.


The brains of singing mice might hold the secret of how we engage in conversation

Singing mice from the cloud forests of Costa Rica could help us better understand how our brains handle speech.

Singing mouse.

Image credits NYU School of Medicine.

The male Alston’s singing mouse (Scotinomys teguina) is quite the skillful bard. These tiny mammals can produce songs from a repertoire of almost one hundred audible noises and a host of sounds we can’t even perceive. There’s also surprising structure to their musical interactions — much like humans engaged in conversation, the mice challenge their competitors by singing in turn, a new paper explains.

The brains of these mice can help us understand the brain mechanisms that underpin our own ability to converse with one another. We tend to take this ability pretty much for granted, but it’s nowhere near widespread in nature, the paper notes. Standard laboratory mice, for example, produce ultrasonic sounds without evident timing of exchanges.

I talk, then you talk, and that’s our communication hack

“Our work directly demonstrates that a brain region called the motor cortex is needed for both these mice and for humans to vocally interact,” says senior study author Michael Long, PhD, an associate professor of neuroscience at the New York University (NYU) School of Medicine.

Evolution has separated the duties of sound production and control circuits (i.e. those that handle the timing of replies) in the brains of singing mice, the team reports. This is similar to what is seen in crickets, some species of birds, and “possibly human discussion”, adds study co-first author Arkarup Banerjee, a post-doctoral researcher in Long’s lab.

The findings are based on electromyography measurements that the team performed on singing mice to determine the relationship between different brain centers and muscle contractions. The readings were taken from two mice as they coordinated their responses.

It’s an exciting find, the team adds, as we simply don’t have suitable mammalian models for studying back-and-forth communication in the wild. A lot of animals engage in vocalization, sure, but their communication is more like a chatroom where everybody talks at the same time than a balanced conversation. Up to now, they explain, the most reliable animal model neuroscientists could use to study vocal exchanges was the marmoset (family Callitrichidae), but it, too, comes with significant limitations: marmosets’ conversational turns are very slow compared to human speech, and are unlikely to result from fast muscle responses to sensory cues.

And no hard feelings, marmosets, but that just doesn’t cut it:

“We need to understand how our brains generate verbal replies instantly using nearly a hundred muscles if we are to design new treatments for the many Americans for whom this process has failed, often because of diseases such as autism or traumatic events, like stroke,” says Long.

The team found that the brains of singing mice come equipped with specialized areas that control how their muscles create specific notes. Separate circuits in the motor cortex enable the fast starts and stops that form a conversation between vocal partners. The former areas allow these mice to create the actual sounds, while the latter control their timing to prevent a cacophony.

The mice’s songs also change in social situations, as individuals “bend and break the songs” to converse. The researchers also report finding a functional “hotspot” in the front side of the motor cortex — the orofacial motor cortex, or OMC — that regulates song timing.

In the future, the team plans to apply the mouse model to guide similar explorations of human speech circuits. They hope that understanding how two brains engage in conversation can help us identify what goes wrong in the context of disorders that interfere with communication, and perhaps even find cures.

The paper “Motor cortical control of vocal interaction in neotropical singing mice” has been published in the journal Science.


AI spots depression by looking at your patterns of speech

A new algorithm developed at MIT can help spot signs of depression from a simple sample (text or audio) of conversation.

Text bubble.

Image credits Maxpixel.

Depression has often been referred to as the hidden illness of modern times, and the figures seem to support this view: 300 million people around the world have depression, according to the World Health Organization. The worst part is that many people live and struggle with undiagnosed depression day after day for years, and it has profoundly negative effects on their quality of life.

Our quest to root out depression in our midst has brought artificial intelligence into the fray. Machine learning has seen increased use as a diagnostic aid against the disorder in recent years. Such applications are trained to pick up on words and intonations of speech that may indicate depression. However, they’re of limited use, as the software draws on an individual’s answers to specific questions.

In a bid to bring the full might of the silicon brain to bear on the matter, MIT researchers have developed a neural network that can look for signs of depression in any type of conversation. The software can accurately predict if an individual is depressed without needing any other information about the questions and answers.

Hidden in plain sight

“The first hints we have that a person is happy, excited, sad, or has some serious cognitive condition, such as depression, is through their speech,” says first author Tuka Alhanai, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“If you want to deploy [depression-detection] models in scalable way […] you want to minimize the amount of constraints you have on the data you’re using. You want to deploy it in any regular conversation and have the model pick up, from the natural interaction, the state of the individual.”

The team based their algorithm on a technique called sequence modeling, which sees use mostly in speech-processing applications. They fed the neural network samples of text and audio recordings of questions and answers used in diagnostics, from both depressed and non-depressed individuals, one by one. The samples were obtained from a dataset of 142 interactions from the Distress Analysis Interview Corpus (DAIC).

The DAIC contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. Each subject is rated in terms of depression on a scale from 0 to 27 using the Personal Health Questionnaire. Scores above a cutoff between moderate (10 to 14) and moderately severe (15 to 19) are considered depressed, while all scores below that threshold are considered not depressed. Out of all the subjects in the dataset, 28 (20 percent) were labeled as depressed.

Simple diagram of the network. LSTM stands for Long Short-Term Memory.
Image credits Tuka Alhanai, Mohammad Ghassemi, and James Glass (2018), Interspeech.
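
To make the sequence-modeling idea more concrete, here is a minimal, hypothetical sketch of an LSTM-based classifier in PyTorch. It is not the authors’ actual architecture or feature pipeline; the feature dimensions, layer sizes, and names below are assumptions for illustration only.

```python
# Minimal, hypothetical sketch of an LSTM-based sequence classifier.
# NOTE: this is NOT the authors' actual model; dimensions, features,
# and names are illustrative assumptions only.
import torch
import torch.nn as nn

class DepressionSequenceModel(nn.Module):
    def __init__(self, feature_dim=100, hidden_dim=128):
        super().__init__()
        # The LSTM reads one question-answer segment's features at a time
        # (e.g. text embeddings or acoustic features), keeping a running
        # summary of the whole interview in its hidden state.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # A single sigmoid unit turns that summary into a probability
        # that the interview shows depression-related speech patterns.
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, sequences):
        # sequences: (batch, time_steps, feature_dim)
        _, (last_hidden, _) = self.lstm(sequences)
        return self.classifier(last_hidden[-1])

# Hypothetical usage: 4 interviews, 30 segments each, 100-dim features.
model = DepressionSequenceModel()
scores = model(torch.randn(4, 30, 100))
print(scores.shape)  # torch.Size([4, 1])
```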

The model drew on this wealth of data to uncover speech patterns for people with or without depression. For example, past research has shown that words such as “sad,” “low,” or “down,” may be paired with audio signals that are flatter and more monotone in depressed individuals. Individuals with depression may also speak more slowly and use longer pauses between words.

The model’s job was to determine whether any patterns of speech from an individual were predictive of depression or not.

“The model sees sequences of words or speaking style, and determines that these patterns are more likely to be seen in people who are depressed or not depressed,” Alhanai says. “Then, if it sees the same sequences in new subjects, it can predict if they’re depressed too.”

Samples from the DAIC were also used to test the network’s performance. It was measured on its precision (whether the individuals it identified as depressed had actually been diagnosed as depressed) and recall (whether it could identify all the subjects in the dataset who had been diagnosed as depressed). It scored 71% on precision and 83% on recall, for an averaged combined score of 77%. While that may not sound hugely impressive, the authors write that this outperforms similar models in the majority of tests.
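
For reference, here is the simple arithmetic behind combining those two figures; for these particular values, both the plain average and the harmonic mean (the F1 score commonly reported for classifiers) come out at roughly 77%. A minimal sketch:

```python
# Combining the reported precision and recall into a single score.
precision = 0.71  # flagged-as-depressed subjects who truly were depressed
recall = 0.83     # truly depressed subjects the model managed to flag

average = (precision + recall) / 2                  # simple mean
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean (F1)

print(f"average = {average:.2f}, F1 = {f1:.2f}")    # both ~0.77
```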

The model had a much harder time spotting depression from audio than text. For the latter, the model needed an average of seven question-answer sequences to accurately diagnose depression. With audio, it needed around 30 sequences. The team says this “implies that the patterns in words people use that are predictive of depression happen in a shorter time span in text than in audio,” a surprising insight that should help tailor further research into the disorder.

The results are significant as the model can detect patterns indicative of depression, and then map those patterns to new individuals, with no additional information. It can run on virtually any kind of conversation. Other models, by contrast, only work with specific questions — for example, a straightforward inquiry, “Do you have a history of depression?”. The models then compare a subject’s response to standard ones hard-wired into their code to determine if they are depressed.

“But that’s not how natural conversations work,” Alhanai says.

“We call [the new model] ‘context-free,’ because you’re not putting any constraints into the types of questions you’re looking for and the type of responses to those questions.”

The team hopes their model will be used to detect signs of depression in natural conversation. It could, for instance, be remade into a phone app that monitors its user’s texts and voice communication for signs of depression, and alert them to it. This could be very useful for those who can’t get to a clinician for an initial diagnosis, due to distance, cost, or a lack of awareness that something may be wrong, the team writes.

However, in a post-Cambridge-Analytica-scandal world, that may be just outside the comfort zone of many. Time will tell. Still, the model can also be used as a diagnostic aid in clinical offices, says co-author James Glass, a senior research scientist in CSAIL.

“Every patient will talk differently, and if the model sees changes maybe it will be a flag to the doctors,” he says. “This is a step forward in seeing if we can do something assistive to help clinicians.”

Truth be told, while the model does seem very good at spotting depression, the team doesn’t really understand what crumbs it follows to do so. “The next challenge is finding out what data it’s seized upon,” Glass concludes.

Apart from this, the team also plans to expand their model with data from many more subjects — both for depression and other cognitive conditions.

The paper “Detecting Depression with Audio/Text Sequence Modeling of Interviews” has been presented at the Interspeech 2018 conference.


Chimps and Bonobos use sounds and gestures back-and-forth, mimicking human conversation

A conversation is a two-way street where cooperation is paramount. When cooperation between two or more people ends, like in the heat of an argument when shouting ensues, the conversation is officially over too. But although humans are the only Earthlings gifted with the power of speech, researchers found at least two other species, namely bonobos and chimpanzees, make use of conversational cooperation.

Mother chimp with her infant. Credit: M. Fröhlich

The team, made up of researchers from the Max Planck Institute for Ornithology and the Max Planck Institute for Evolutionary Anthropology, monitored the communicative gestures of mother-infant pairs in four communities: two of chimpanzees and two of bonobos. They chose to follow mother-infant interactions because these are somewhat analogous to human mother-baby interactions, in the sense that both are limited to unarticulated sounds and gestures.

After two years of closely following the bonobos and chimps from the Salonga National Park and Luo Scientific Reserve in the Democratic Republic of Congo, researchers came to the conclusion that communicative exchanges in both species resemble cooperative turn-taking sequences in human conversation. In other words, the mothers and infants recognized the pair was engaged in a conversation, and each took turns to signal their thoughts or listen.

There were some slight but important differences in the way the two species converse, too. Marlen Froehlich, one of the lead authors of the study published in Scientific Reports, said, “(for bonobos) gaze plays a more important role and they seem to anticipate signals before they have been fully articulated.” Chimps, on the other hand, take their time and seem to use more complex cooperative elements like signaling, pausing and responding.

“By taking into consideration intra- and inter-species variability and by focusing on the mother-infant dyad, our results showed that all observed dyads across groups frequently engaged in turn-taking sequences to negotiate joint travel. They established participation frameworks via gaze, body orientation and the adjustment of initiation distance, and they used adjacency pair-like sequences characterized by gesture-response pairs and response waiting. Regarding temporal relationships between signals and responses, we found that mother-infant dyads of both species used the whole spectrum of responses, including immediate, overlapping and even delayed responses. Immediate responses match the temporal relations between turns in human speech consisting of relatively little cultural variation (e.g. overall cross-linguistic median of 100 ms, ranging from 0 ms in the English and Japanese culture, for instance, to 300 ms in the Danish and Lao culture),” the authors wrote in the study.

Following the way great apes cooperate, interact and converse in their highly complex societies might one day unravel the origin of human speech, a subject of great interest but also debate among scholars. Many agree, however, that the first precursors of speech were gestures. As such, bonobos might be the most representative animal model for understanding the very elemental prerequisites for human speech.

“Communicative interactions of great apes thus show the hallmarks of human social action during conversation and suggest that cooperative communication arose as a way of coordinating collaborative activities more efficiently,” says Simone Pika, head of the study.