Tag Archives: speech

Baby seals can modulate the pitch of their voice, much like humans do

Credit: John O’ Connor.

In the 1980s, one of the biggest attractions at the New England Aquarium was Hoover the Talking Seal. The name says it all, as Hoover was famous for parroting human speech. The seal was rescued as a pup by a man from Maine, who took Hoover home before the animal was moved to the aquarium. During this time, Hoover remarkably learned how to imitate some of his owner’s vocal antics, including the Maine accent.

You can clearly hear the seal saying “Hoover get over here! Come on, come on!” in this 1984 recording.

Vocal learning, the ability to acquire vocalizations through learning, is very rare in the animal kingdom. However, scientists are very much interested in this trait as it may reveal the evolutionary path that took our ancestors from blabbering primates to the highly articulate beings we are today, capable of speech, song, and a wide-ranging vocal repertoire.

Besides humans, this ability is mostly restricted to birds, which are, evolutionarily speaking, as far away from us as dinosaurs are. This is why Hoover’s story is so important: seals seem to be one of the rare examples of vocal control and plasticity in mammals. But is this quality present across the species or was Hoover just some oddball?

This inquiry inspired Andrea Ravignani from the Max Planck Institute for Psycholinguistics to conduct a new study that investigated the vocalization abilities of seals. Ravignani and colleagues studied eight harbor seal pups no older than three weeks that were held in captivity at a rehabilitation center in the Netherlands before being released back into the wild.

Over the course of several days, the pups were exposed to audio recordings of noises from the nearby Wadden Sea. These recordings were not chosen by accident. Co-author Laura Torres Borda of the University of Paris 13 went there with a microphone to record the natural noises of the sea because that’s what seal pups are used to in their natural habitat.

“The main challenge was to design the experiment in a way which would be meaningful to better understand the origins of human speech,” Ravignani said.

The recordings were played back at three volumes, ranging from near silence to 65 decibels (roughly the noise of a fast car passing 25 feet away), with a tone similar to that of the seal pups’ calls.

Humans and animals alike raise their voices when there’s noise in the environment so they’re heard and understood better. This is known as the Lombard effect, and one seal clearly demonstrated this phenomenon, producing louder calls when the audio levels were higher. But that’s not all they did.

“If baby seals acted as most animals, we would just expect them to increase the intensity of their voices as noise increased. However, what seals did was lower the pitch of their voices to escape the frequency range of noise, something that only animals with good control of their larynx (including humans but potentially excluding most mammals) can do. This shows vocal plasticity in seal pups from an early age, suggesting that seals may be one of the very few mammals which, like humans when singing or speaking a tonal language, can flexibly modulate the pitch of their voices,” Ravignani told ZME Science.

The fact that seals can modulate their vocalizations spontaneously and without training is striking. Even chimps, our closest living relatives, cannot do this, which makes the origin of speech perhaps even more mysterious. Humans are the only mammals that we know of that have a direct neural connection between the cortex and the larynx (the organ at the top of the neck that is responsible for the tone of your voice). Perhaps seals share this neural connection as well, which is what scientists intend to find out.

“In general, this work is part of my larger research agenda aimed at establishing seals as prime animal model species to better understand the origins and evolution of human speech and music. While at first, they may seem an unusual model, they offer untapped potential for comparative research, being more closely related to us than the much-studied songbirds and parrots, while spontaneously showing more music-like and speech-like behaviors than say apes and monkeys,” Ravignani said.

Just last week, researchers from the same Max Planck Institute for Psycholinguistics found that a type of lemur from Madagascar, known as the indri, has songs that exhibit a kind of rhythm only previously seen in humans. Together with seal vocalizations, these findings suggest that various building blocks for human speech may be found across different animal species. It’s just that we’ve somehow been lucky enough to put them all together.

“By finding another mammal who can modulate the pitch of its voice, we can start building an evolutionary tree of building blocks of speech, and show that some of these are not in fact uniquely human,” Ravignani said.

The findings appeared in the journal Philosophical Transactions of the Royal Society B: Biological Sciences.

A wearable artificial graphene throat transforms human throat movements into different sounds. Credit: ACS Nano.

Wearable artificial voice box could help mute people speak

Chinese researchers have developed a thin artificial ‘voice box’ that can be attached to the neck like a temporary tattoo. The wearable device is capable of converting slight movements of the skin into sounds. Someday, the researchers hope, a more advanced version might help mute people speak with relative ease.

We take speech for granted, but this seemingly effortless gesture is extremely fragile. About 8 in 10,000 people are born mute and many more suffer accidents or illnesses that can damage vocal cords, leading to speech impairments.

There are actually a surprising number of ways that speech can be robbed from us. There are disorders like stuttering and apraxia, which cause people to scramble syllables, as well as motor neuron diseases, which affect muscle control required to articulate. There are also brain injuries, stroke, multiple sclerosis, and even autism, which can cause speech impairments.

In the United States alone, there are over 2 million people who require digital “augmentative and alternative communication” (AAC) methods to help them communicate. For instance, the late Stephen Hawking used a voice synthesizer that turns text into speech, the familiar robotic speech that eventually became one of Hawking’s trademark features.

Scientists are now working on more sophisticated voice-assisting technologies. For instance, a device that uses electrodes fitted in the human brain can detect and decipher neural signals — what a person means to say — transforming them into a digital signal which a voice synthesizer can utter.

Elsewhere, Chinese researchers at the Institute of Microelectronics & Beijing National Research Center for Information Science and Technology have developed prototypes that can convert the motions of human skin, such as a pulse or heartbeat, into other forms of energy, such as sound.

Previously, one such prototype that converted skin motions into sound had to be taped to the skin and wasn’t comfortable enough to wear for long periods of time. Now, a new version which was recently described in ACS Nano is so thin that the device can be attached to the neck like a temporary tattoo, using just water.

The artificial voice box consists of laser-scribed graphene on a thin sheet of polyvinyl alcohol film. It measures only 0.6 by 1.2 inches (1.5 by 3 cm), or roughly twice the size of a thumbnail.

To demonstrate the device, the researchers attached the film to a volunteer’s throat and connected it to a small armband equipped with a circuit board, microcomputer, power amplifier, and decoder. When the volunteer imitated the motions of speech without actually pushing air to produce sounds, the device was able to convert these movements into sounds. For now, these are simple words like “OK” or “No” but the researchers say that, in the future, people with speech impairments could use a similar device to generate speech with their throats, just like any other person.

Credit: KylaCaresWP, Flickr.

Humans and monkeys respond differently to music and speech

Speech and music contain harmonic frequencies which we perceive to have “pitch”. The capacity to differentiate pitch from noise (sound that lacks pitch) is considered to be an intrinsic human quality — but how unique is this ability? A new study suggests that although humans and macaque monkeys share a similar visual cortex, there are important differences in the auditory cortex which processes sound.

Sam Norman-Haignere and colleagues at Columbia University measured cortical responses to both natural and synthetic harmonic tones and noise in human subjects and macaque monkeys. These sounds also included recorded macaque vocalizations that were pitched in post-production.

During one experiment involving four human participants and three macaque monkeys, the researchers noticed strong responses to harmonic tones in humans and virtually no response in the monkeys. In another experiment that studied the brain responses of six humans and five monkeys to natural and modified macaque vocalizations, the researchers found that the human brain shows stronger selectivity for harmonic vocalizations. Meanwhile, the macaques seem to lack the capacity to discern the pitched version.

The team of researchers concludes that auditory cortical organization differs between humans and macaques. These differences are likely driven by humans’ propensity for speech and music.

Pitch allows us to convey mood or emphasis when speaking. For instance, read these sentences aloud:

  • I never said she stole my money.
  • I never said she stole my money.
  • I never said she stole my money.

Each of the sentences above carries a different meaning due to the emphasis on certain words through pitch change. Previously, a study published in the journal Neuron involving epilepsy patients narrowed down the brain region responsible for pitch and its variations — the dorsal laryngeal motor cortex. Such studies are particularly useful to sufferers of aprosodia, a neurological condition that some researchers have described as “a disruption in the expression or comprehension of the changes in pitch, loudness, rate, or rhythm that convey a speaker’s emotional intent.”

High and low pitches are created by the vibration of the vocal cords. Flexing the muscles of the larynx changes the tension in the folds, and higher tension makes them vibrate faster, raising the pitch.

“We speculate that the greater sensitivity of the human cortex to harmonic tones is driven in development or evolution by the demands imposed by speech and music perception,” the authors concluded in the journal Nature.

Scientists present device that transforms brain activity into speech

The future is here: scientists have unveiled a new decoder that synthesizes a person’s speech using brain signals associated with the movements of their jaw, larynx, lips, and tongue. This could be a game changer for people suffering from paralysis, speech impairment, or neurological impairments.

Illustrations of electrode placements on the research participants’ neural speech centers, from which activity patterns recorded during speech (colored dots) were translated into a computer simulation of the participant’s vocal tract (model, right) which then could be synthesized to reconstruct the sentence that had been spoken (sound wave & sentence, below). Credit: Chang lab / UCSF Dept. of Neurosurgery.

Technology that can translate neural activity into speech would be a remarkable achievement in itself — but for people who are unable to communicate verbally, it would be absolutely transformative. But speaking, a process which most of us take for granted in our day to day lives, is actually a very complex process, one that’s very hard to digitize.

“It requires precise, dynamic coordination of muscles in the articulator structures of the vocal tract: the lips, tongue, larynx and jaw,” explain Chethan Pandarinath and Yahia Ali in a commentary on the new study.

Breaking up speech into its constituent parts doesn’t really work. Spelling, if you think about it, is a sequential concatenation of discrete letters, whereas speech is a highly efficient form of communication involving a fluid stream of overlapping, complex, multi-articulator vocal tract movements, and the brain patterns associated with these movements are equally complex.

Image of an example array of intracranial electrodes of the type used to record brain activity in the current study. Credit: UCSF.

The first step was to record cortical activity from the brains of five participants. These volunteers had their brain activity recorded as they spoke several hundred sentences aloud. The movements of the vocal tract were also tracked. Then, scientists reverse-engineered the process, producing speech from brain activity. In trials of 101 sentences, listeners could readily identify and transcribe the synthesized speech.

Several studies have used deep-learning methods to reconstruct audio signals from brain signals, but in this study, a team led by postdoctoral researcher Gopala Anumanchipalli tried a different approach. They split the process into two stages: one that decodes the movements associated with speech, and another that synthesizes speech from those movements. The speech was played to another group of people, who had no problem understanding it.

In separate tests, researchers asked one participant to speak sentences and then mime speech (making the same movements as speaking, just without the sound). This test was also successful, with the authors concluding that it is possible to decode features of speech that are never audibly spoken.

The rate at which speech was produced was remarkable. Losing the ability to communicate due to a medical condition is devastating. Devices that use movements of the head and eyes to select letters one by one can help, but they produce a communication rate of about 10 words/minute, far slower than the roughly 150 words/minute of natural speech. This new technology is comparable to the natural speech rate, marking a dramatic improvement.
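As a rough back-of-the-envelope check on those two rates, the Python sketch below works out how long the same message would take at each. The 30-word message length and all names in the code are assumptions made for illustration; only the 10 and 150 words/minute figures come from the text.

```python
def seconds_to_convey(words: int, words_per_minute: float) -> float:
    """Time in seconds needed to get `words` words across at a given rate."""
    return words * 60 / words_per_minute


LETTER_SELECTION_WPM = 10   # head/eye-tracking spellers, per the article
NATURAL_SPEECH_WPM = 150    # typical natural speech, per the article

message_words = 30  # hypothetical message length, chosen for illustration

slow_seconds = seconds_to_convey(message_words, LETTER_SELECTION_WPM)
fast_seconds = seconds_to_convey(message_words, NATURAL_SPEECH_WPM)
speedup = NATURAL_SPEECH_WPM / LETTER_SELECTION_WPM

print(f"Letter-by-letter: {slow_seconds:.0f} s")
print(f"Natural rate:     {fast_seconds:.0f} s")
print(f"Speed-up:         {speedup:.0f}x")
```

Under these assumptions, a message that takes three minutes to spell out letter by letter would take about twelve seconds at a natural speaking rate.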

It’s important to note that this device doesn’t attempt to understand what someone is thinking — only to be able to produce speech. Edward Chang, one of the study authors, explains:

“The lab has never investigated whether it is possible to decode what a person is thinking from their brain activity. The lab’s work is solely focused on allowing patients with speech loss to regain the ability to communicate.”

While this is still a proof-of-concept and needs much more work before it can be practically implemented, the results are compelling. With continued progress, we can finally hope to empower individuals with speech impairments to regain the ability to speak their minds and reconnect with the world around them.

The study was published in Nature. https://doi.org/10.1038/s41586-019-1119-1

Text bubble.

AI spots depression by looking at your patterns of speech

A new algorithm developed at MIT can help spot signs of depression from a simple sample (text or audio) of conversation.

Image credits Maxpixel.

Depression has often been referred to as the hidden illness of modern times, and the figures seem to support this view: 300 million people around the world have depression, according to the World Health Organization. The worst part about it is that many people live and struggle with undiagnosed depression day after day for years, and it has profoundly negative effects on their quality of life.

Our quest to root out depression in our midst has brought artificial intelligence into the fray. Machine learning has seen increased use as a diagnostic aid against the disorder in recent years. Such applications are trained to pick up on words and intonations of speech that may indicate depression. However, they’re of limited use as the software draws on an individual’s answers to specific questions.

In a bid to bring the full might of the silicon brain to bear on the matter, MIT researchers have developed a neural network that can look for signs of depression in any type of conversation. The software can accurately predict if an individual is depressed without needing any other information about the questions and answers.

Hidden in plain sight

“The first hints we have that a person is happy, excited, sad, or has some serious cognitive condition, such as depression, is through their speech,” says first author Tuka Alhanai, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“If you want to deploy [depression-detection] models in scalable way […] you want to minimize the amount of constraints you have on the data you’re using. You want to deploy it in any regular conversation and have the model pick up, from the natural interaction, the state of the individual.”

The team based their algorithm on a technique called sequence modeling, which sees use mostly in speech-processing applications. They fed the neural network samples of text and audio recordings of questions and answers used in diagnostics, from both depressed and non-depressed individuals, one by one. The samples were obtained from a dataset of 142 interactions from the Distress Analysis Interview Corpus (DAIC).

The DAIC contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. Each subject is rated for depression on a scale from 0 to 27, using the Personal Health Questionnaire. Scores above a cutoff between moderate (10 to 14) and moderately severe (15 to 19) are considered depressed, while all scores below that threshold are considered not depressed. Out of all the subjects in the dataset, 28 (20 percent) were labeled as depressed.

Simple diagram of the network. LSTM stands for Long Short-Term Memory.
Image credits Tuka Alhanai, Mohammad Ghassemi, James Glass, (2018), Interspeech.

The model drew on this wealth of data to uncover speech patterns for people with or without depression. For example, past research has shown that words such as “sad,” “low,” or “down,” may be paired with audio signals that are flatter and more monotone in depressed individuals. Individuals with depression may also speak more slowly and use longer pauses between words.

The model’s job was to determine whether any patterns of speech from an individual were predictive of depression or not.

“The model sees sequences of words or speaking style, and determines that these patterns are more likely to be seen in people who are depressed or not depressed,” Alhanai says. “Then, if it sees the same sequences in new subjects, it can predict if they’re depressed too.”

Samples from the DAIC were also used to test the network’s efficiency. It was measured on its precision (whether the individuals it identified as depressed had been diagnosed as depressed) and recall (whether it could identify all subjects who were diagnosed as depressed in the entire dataset). It scored 71% on precision and 83% on recall for an averaged combined score of 77%, the team writes. While it may not sound that impressive, the authors write that this outperforms similar models in the majority of tests.
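To make the two metrics concrete, here is a minimal Python sketch. The confusion counts in it are hypothetical, picked only to illustrate the definitions, and are not the study's data. The text does not say how the 77% combined score was averaged, so the sketch shows both the plain arithmetic mean and the harmonic mean (the F1 score) of the reported 71% and 83%.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of the subjects flagged as depressed, the share who really were."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Of the subjects who really were depressed, the share that got flagged."""
    return true_pos / (true_pos + false_neg)


def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, for illustration only (not taken from the paper).
true_pos, false_pos, false_neg = 10, 4, 2
example_precision = precision(true_pos, false_pos)  # 10 / 14
example_recall = recall(true_pos, false_neg)        # 10 / 12

# Figures reported in the article.
reported_precision, reported_recall = 0.71, 0.83
mean_score = arithmetic_mean(reported_precision, reported_recall)
f1 = f1_score(reported_precision, reported_recall)

print(f"Example precision: {example_precision:.1%}, recall: {example_recall:.1%}")
print(f"Arithmetic mean of reported figures: {mean_score:.1%}")
print(f"F1 of reported figures: {f1:.1%}")
```

Both ways of combining the reported figures land at roughly 77%, so the article's number is consistent with either reading.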

The model had a much harder time spotting depression from audio than text. For the latter, the model needed an average of seven question-answer sequences to accurately diagnose depression. With audio, it needed around 30 sequences. The team says this “implies that the patterns in words people use that are predictive of depression happen in a shorter time span in text than in audio,” a surprising insight that should help tailor further research into the disorder.

The results are significant as the model can detect patterns indicative of depression, and then map those patterns to new individuals, with no additional information. It can run on virtually any kind of conversation. Other models, by contrast, only work with specific questions — for example, a straightforward inquiry, “Do you have a history of depression?”. The models then compare a subject’s response to standard ones hard-wired into their code to determine if they are depressed.

“But that’s not how natural conversations work,” Alhanai says.

“We call [the new model] ‘context-free,’ because you’re not putting any constraints into the types of questions you’re looking for and the type of responses to those questions.”

The team hopes their model will be used to detect signs of depression in natural conversation. It could, for instance, be remade into a phone app that monitors its user’s texts and voice communication for signs of depression, and alert them to it. This could be very useful for those who can’t get to a clinician for an initial diagnosis, due to distance, cost, or a lack of awareness that something may be wrong, the team writes.

However, in a post-Cambridge-Analytica-scandal world, that may be just outside the comfort zone of many. Time will tell. Still, the model can be used as a diagnostic aid in clinical offices, says co-author James Glass, a senior research scientist in CSAIL.

“Every patient will talk differently, and if the model sees changes maybe it will be a flag to the doctors,” he says. “This is a step forward in seeing if we can do something assistive to help clinicians.”

Truth be told, while the model does seem very good at spotting depression, the team doesn’t really understand what crumbs it follows to do so. “The next challenge is finding out what data it’s seized upon,” Glass concludes.

Apart from this, the team also plans to expand their model with data from many more subjects — both for depression and other cognitive conditions.

The paper “Detecting Depression with Audio/Text Sequence Modeling of Interviews” was presented at the Interspeech conference.

Marmosets (Callithrix jacchus) take 3000–5000 ms between turn-taking exchanges. Credit: Wikimedia Commons.

From insects to whales, all sorts of animals take turns to communicate

Until not long ago, two-way communication was thought to be an exclusively human trait. But a new study shows just how flawed this idea was, reporting that many animals, from elephants to mosquitoes, employ turn-taking behavior when communicating among themselves. The findings might one day help scientists pinpoint the origin of human speech.

An international team of researchers reviewed the current scientific literature on animal communication. From the hundreds of studies that they analyzed spanning over 50 years of research, the authors found that the orderly exchange of communicative signals is far more common in the animal kingdom than anyone thought.

The challenge lay in piecing together the fragmented information from the various studies published thus far, whether the focus was on the chirps of birds or the whistles of dolphins. But once this task was complete, the researchers were impressed to learn how complex animal communication really is.

Take timing, for instance, which is a key feature of communicative turn-taking in both humans and non-human animals. In some species of songbird, the latency between notes produced by two different birds is less than 50 milliseconds. On the other end of the spectrum, sperm whales exchange sequences of clicks with a gap of about two seconds between turns. We humans lie somewhere in the middle, producing utterances with gaps of around 200 milliseconds between turns.

Interestingly, we humans aren’t the only species that considers it rude to interrupt. The researchers found that both black-capped chickadees and European starlings practiced so-called “overlap avoidance” during turn-taking communication. If an overlap occurred, individuals would go silent or fly away, a sign that overlapping may be seen as an unacceptable violation of the social rules of turn-taking.
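The timing and overlap ideas above can be sketched in a few lines of Python. The code takes a list of calls, each with a caller and start and end times, and computes the gap between consecutive turns by different individuals; a negative gap means the second call started before the first one ended. The `Call` type, the `turn_gaps` function, and the timestamps are all hypothetical, invented for illustration, and are not taken from the review.

```python
from dataclasses import dataclass


@dataclass
class Call:
    caller: str
    start_ms: float
    end_ms: float


def turn_gaps(calls: list[Call]) -> list[float]:
    """Gaps in ms between consecutive calls made by different callers.

    Calls are ordered by start time. A negative gap means the second
    call overlapped the first.
    """
    ordered = sorted(calls, key=lambda c: c.start_ms)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if previous.caller != current.caller:
            gaps.append(current.start_ms - previous.end_ms)
    return gaps


# Made-up exchange between two individuals, A and B.
calls = [
    Call("A", 0, 400),
    Call("B", 600, 1000),   # starts 200 ms after A finishes
    Call("A", 1200, 1500),  # starts 200 ms after B finishes
    Call("B", 1450, 1800),  # starts 50 ms before A finishes: an overlap
]

gaps = turn_gaps(calls)
overlaps = [gap for gap in gaps if gap < 0]

print("Gaps between turns (ms):", gaps)
print("Overlapping turns:", len(overlaps))
```

Applied to real recordings, the same gap measure would let the 50-millisecond songbird exchanges, the 200-millisecond human ones, and the two-second sperm whale ones be compared on a single scale.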

Although temporal coordination in animal communication has attracted interest over several decades, no clear picture has yet emerged as to why individuals exchange signals, the researchers wrote.

It’s quite likely that this turn-taking behavior underlies the evolution of human speech, so a more systematic cross-species examination could yield striking results. The authors even offer a framework that might enable such a comparison.

“The ultimate goal of the framework is to facilitate large-scale, systematic cross-species comparisons,” Dr. Kobin Kendrick, from the University of York’s Department of Language and Linguistic Science and one of the authors of the new study, said in a statement.

“Such a framework will allow researchers to trace the evolutionary history of this remarkable turn-taking behavior and address longstanding questions about the origins of human language.”

The team included researchers from the Universities of York and Sheffield, the Max Planck Institute for Evolutionary Anthropology in Germany, and the Max Planck Institute for Psycholinguistics in the Netherlands.

“We came together because we all believe strongly that these fields can benefit from each other, and we hope that this paper drives more cross-talk between human and animal turn-taking research in the future,” said Dr. Sonja Vernes, from the Max Planck Institute for Psycholinguistics.

Scientific reference: Taking turns: Bridging the gap between human and animal communication. Proceedings of the Royal Society B.

Credit: Flickr, Tambako The Jaguar.

Monkey ‘vocabulary’ could clue us in on evolution of human speech

Looking to pinpoint the origin of human speech, German researchers at Tübingen University have identified the smallest units comprising the vocalization of marmoset monkeys. Like human speech, the monkeys’ vocalization is made up of individual syllables of fixed length, from short ‘tsiks’ and ‘ekks’ to quiet ‘phees’.

Scientists have found that there’s an inherent rhythm with which humans are capable of producing syllables, which are a seventh of a second long on average. We’re prevented from producing any shorter syllables by our biological machinery, which includes both the structure of the voice box and the neural pathways that govern speech in the brain.

To understand the evolution of human speech, scientists are trying to identify the fundamental processes that enable speech and language generation, but also the fundamental units of language.

Dr. Steffen Hage of the Werner Reichardt Centre for Integrative Neuroscience (CIN) at the University of Tübingen has a hunch that the biological fundamentals of speech may have looked very similar in our ancestors. He and colleagues looked for clues in our closest relatives still alive today: other primates.

The team focused on marmoset monkeys, which are small primates that live high up in the canopies of South American rainforests. There are more than 20 species, and most could fit comfortably in an adult human’s hand. They come in a wide variety of colors, from black to brown to silver to bright orange, and have soft and silky hair. Many have tufts of hair or manes on either side of their faces, which are sparsely furred or naked, making them quite adorable looking.

A pygmy marmoset. Credit: Public Domain Pictures.

Typically, scientists who study the rhythm and length of syllables focus on other animals such as passerine birds, but Hage and colleagues decided that marmosets are more interesting, being far more closely related to us.

The researchers recorded thousands upon thousands of monkey vocalizations in a sound chamber. They intentionally interrupted the monkeys’ “tsiks” and “ekks” with white noise at regular intervals to make them fall quiet. The researchers might have been rude, shamelessly talking over the monkeys, but at least they found out something that could turn out to be very important.

“The marmosets’ ‘phee’ had so far been considered part of their basic vocabulary, alongside the ‘tsik’ and ‘ekk’. We observed that they would stop right in the middle of their ‘phee’ calls when we disrupted them with noise. Moreover, that would only happen at specific points within the call,” said Thomas Pomberger, one of the study’s co-authors.

What the researchers learned was that the long ‘phee’ call actually consists of small units of about the same length as a ‘tsik’ or ‘ekk’, lasting about 100 milliseconds.

“Until now, the supposed existence of the long ‘phee’ has not allowed for the conclusion that we can draw now: just like us, marmoset monkeys have a ‘hardwired’ rhythm that controls their vocalisation. It is even similarly fast,” said Hage in a statement.

This sort of rhythm could have evolved in an early ancestor as a prerequisite of speech. Of course, the evolution of speech is still an open question, but little by little, we’re getting there — one tsik and ekk at a time.

Scientific reference: Thomas Pomberger, Cristina Risueno-Segovia, Julia Löschner, Steffen R. Hage: Precise Motor Control Enables Rapid Flexibility in Vocal Behavior of Marmoset Monkeys. In: Current Biology (in press). 22 February 2018.


Brain network that picks words from the background noise revealed

Scientists have identified the brain networks that help focus on one voice or conversation in a noisy room — known as the “cocktail party effect”. They hope that by emulating the way these areas work, modern voice recognition software can be made to function much more efficiently.


Image credits Gerd Altmann / Pixabay.

When you’re at a party, your brain allows you to tune in to a single conversation while lowering the volume of background noise, so to speak. Now, have you ever tried to give a voice command to a device in any type of noisy setting? If yes, you can probably understand why scientists would love to get their hands on a similar voice recognition system for our gadgets.

A new study might offer a way forward for such a technology. Neuroscientists led by Christopher Holdgraf from the University of California, Berkeley, recorded the brain activity of participants listening to a previously distorted sentence after they were told what it meant. The team worked with seven epilepsy patients who had electrodes placed on the surface of their brain to track seizures.

They played a very distorted recording of a sentence to each participant, which almost none of them was able to initially understand. An unaltered recording of the same sentence was played afterwards, followed by the garbled version once more.

“After hearing the intact sentence,” the paper explains, the subjects understood the “noisy version” without any difficulty.

Brain recordings show that this moment of recognition coincided with patterns of activity in areas known to be involved in understanding sound and speech. When subjects listened to the garbled version, the team saw little activity in these areas, but hearing the clear sentence then caused their brains to light up.

This was the first time we saw the way our brains alter their response when listening to an understandable or garbled sound. When hearing the distorted phrase again, auditory and speech processing areas lit up and changed their pattern of activity over time, apparently tuning in to the words among the distortion.

“The brain actually changes the way it focuses on different parts of the sound,” explained the researchers.

“When patients heard the clear sentences first, the auditory cortex enhanced the speech signal.”

The team is now trying to expand on their findings and understand how the brain distinguishes between the background and the sounds we’re actually interested in hearing.

“We’re starting to look for more subtle or complex relationships between the brain activity and the sound,” Mr Holdgraf said.

“Rather than just looking at ‘up or down’, it’s looking at the details of how the brain activity changes across time, and how that activity relates to features in the sound.”

This, he added, gets closer to the mechanisms behind perception. If we understand how our brains filter out the noise, we can help people with speech and hearing impediments better hear the world around them. The team hopes to use the findings to develop a speech decoder — a brain implant to interpret people’s imagined speech — which could help those with certain neurodegenerative diseases that affect their ability to speak.

The full paper “Rapid tuning shifts in human auditory cortex enhance speech intelligibility” has been published in the journal Nature Communications.


That urge to complete other people’s sentences? Turns out the brain has its own Auto Correct

The hippocampus might have a much more central role to play in language and speech than we’ve ever suspected, a team of US neuroscientists claims. They examined what happens in people’s brains when they finish someone else’s sentence.


Image credits Isa Karakus / Pixabay.

Do you ever get that urge to blurt out the last word of somebody else’s sentence? Happens to me all the time. And it seems scientists do it too, because a team led by Vitoria Piai, senior researcher at the Donders Centre for Cognition and Radboud University Medical Centre, looked into the brains of 12 epileptic patients to make heads or tails of the habit. What they’ve found flies in the face of everything we currently know about how memory and language interact in our brains.

The 12 patients were taking part in a separate study trying to understand their unique patterns of brain activity. Each one of their brains was monitored with a set of electrodes. Piai and her team told the participants a series of six-syllable (but incomplete) sentences, “she came in here with the…” or “he locked the door with the…” for example. After the sentence was read out to them the researchers held up a card with the answer printed on it, all the while monitoring how the patients’ hippocampi — on their non-epileptic side of the brain — responded.

When the missing word was obvious, ten out of the twelve subjects showed bursts of synchronised theta waves in the hippocampus, a process indicative of memory association.

“The hippocampus started building up rhythmic theta activity that is linked to memory access and memory processing,” said Robert Knight from the Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley and co-author of the paper.

But when the answer wasn’t so straightforward, their hippocampi ramped up even more as they tried (without success) to find the correct word, like an engine revving with the clutch pushed in.

The original auto correct

“[The results] showed that when you record directly from the human hippocampal region, as the sentence becomes more constraining, the hippocampus becomes more active, basically predicting what is going to happen.”

Just like the auto correct feature replaces an unusual word the first time you use it but adapts over time, not only ceasing to replace it but also starting to fill it in for you, the findings suggest that our minds try to fill blanks in dialogue by drawing on our memory stores of language and the interlocutor’s particularities of speech, linking memory and language.

“Despite the fact that the hippocampal area of the medial part of the temporal lobe is well known to be linked to spatial and verbal memory in humans, the two fields have been like ships running in the fog, unaware that the other ship is there,” Knight added.

This would mean that the hippocampus plays a much more important role in language, previously thought to be the domain of the cortex — though right now, the team doesn’t know exactly how this link works. Because of this, the team hopes to continue their work to better understand the bridge between memory and language, which will hopefully give us a better understanding of the brain itself.

Another implication would be that, because at least part of the act of speaking is handled by the hippocampus and not the cortex, language might not be so human-only as we’d like to believe.

The full paper “Direct brain recordings reveal hippocampal rhythm underpinnings of language processing” has been published in the journal PNAS.

zebra finch

How baby songbirds can tell us a thing or two about how we learn to speak

Credit: Pixabay

We remember very little of what happened before the age of four, yet this part of our childhood is critical. During this time, we learn to walk, navigate human society and speak, with some help from mom and dad, of course. Zebra finches are not all that different in this respect. Only the males sing, and they learn it all while they’re still babies by attentively listening to a tutor. Scientists tapped into the brains of zebra finches and uncovered a neural mechanism that may be key to shaping the brains of the birds, and also those of humans, in this critical stage of their development.

“For young animals, the early sensory experiences are very important and strongly affect brain development,” said Prof. Yoko Yazaki-Sugiyama and Dr. Shin Yanagihara from Okinawa Institute of Science and Technology Graduate University (OIST). “This stage is called the ‘critical period’ where the brain circuits are very flexible and can be easily changed and modified. We wanted to know how the early sensory experiences during the critical period shape brain circuits and lead to appropriate behaviours.”

Only male zebra finches learn how to sing songs. These songs attract mates, and as such are essential to reproduction. If a young male can’t master the complex songs, then he has no chance of passing on his genes.

To learn the song, a juvenile male attentively listens to his father’s song and memorizes it. Animal behaviorists believe young finches store the song in their auditory memory, gradually adapting it as they vocalize until the birds develop their own songs, similar to the ones they were tutored with. This is very similar to human speech development, which occurs during a critical period when children are all eyes and ears to what parents say.

Yanagihara and colleagues predicted that when the juvenile zebra finches are tutored by their fathers, the experience modifies the young brain circuits to form memories.

The team set out to confirm this prediction by looking at how neurons in the higher auditory cortex responded to the sound of a tutor’s song. The brains of juvenile birds were monitored while they listened to different songs: their own, the tutor’s, those of other zebra finches, and songs from other songbird species. Isolated control zebra finches were also monitored. You can listen to what these various songs sound like below.

The investigation suggests that zebra finches respond to all sounds with non-selective neurons. There are, however, selective neurons that ‘fire’ almost exclusively in response to the sound of a tutor’s song.

“In the normal, tutored birds, we encountered a group of neurons that responded very strongly to the tutor song, after they had learned the song, but did not respond to the other songs,” Yanagihara said.

“However, for the birds which had no tutor experiences, we did not see any response to the tutor-song, or in this case the genetic-father-song, and no selective neurons at all.”

Five percent of the neurons in the higher auditory cortex (27 neurons) reacted to the tutor song, indicating where early auditory memory might be located in the zebra finch brain. This is important because, for both finches and humans, these critical brain circuits are still poorly understood.

“This study gives some idea of how the brain acquires memories during the critical period,” Yanagihara said. “This is a step in understanding how the neuronal mechanisms of memory and early sensory experiences form brain circuits in the early developmental stage, not only in birds, but also in humans and other species.”

Humans are not unique in understanding the basics of language

A paper published recently in Nature Communications details how a team led by Dr. Ben Wilson and Professor Chris Petkov used a brain imaging technique to identify the neuronal evolutionary origins of language. Their findings help us understand how we learn to speak, and could allow new treatments for those who lost this ability from aphasia following a stroke or dementia.

Image via wikimedia

By scanning the brains of macaque monkeys, the researchers identified an area in the front of the brain that, in both humans and macaques, recognizes a sequence of sounds as speech, and is responsible for analyzing if the sounds are in legal order or in an unexpected, illegal order.

“Young children learn the rules of language as they develop, even before they are able to produce language. So, we used a ‘made up’ language first developed to study infants, which our lab has shown the monkeys can also learn. We then determined how the human and monkey brain evaluates the sequences of sounds from this made up language,” said Professor Petkov.

Human and monkey subjects were played an example sequence from the made-up language, to hear the correct order in the sequence of sounds. After this they were played new sequences, some of which were in an incorrect order, and the team scanned their brains using fMRI. In both species, there was neuronal response in the same region of the brain — the ventral frontal and opercular cortex — when the sounds were correctly ordered.

The findings suggest that this region’s functionality is shared between humans and macaques, revealing a common cerebral evolutionary source. This brain region seems to monitor the orderliness or organization of sounds and words, which is an important cognitive function, at the core of the more complex language abilities of humans. The findings are the first scientific evidence that other animals share with us at least some of the functions this area serves, which include understanding language in humans.

“Identifying this similarity between the monkey and human brain is also key to understanding the brain regions that support language but are not unique to us and can be studied in animal models using state-of-the-art neuroscientific technologies,” Professor Petkov explains.

“This will help us answer questions on how we learn language and on what goes wrong when we lose language, for example after a brain injury, stroke or dementia.”

Building on these developments, the Newcastle University team, with their neurology collaborators at Cambridge and Reading Universities, has begun a project to study the function of this brain region and its role in language impairment in aphasic stroke patients, which might lead to better diagnosis and prognosis of language impairment.


brain and speech center

Researchers home in on speech center in the brain

Researchers have long theorized that the superior temporal sulcus (STS) is involved in processing speech rhythms, but it’s only recently that this has been confirmed by a team at Duke University. Their findings show that the STS is sensitive to the timing of speech, a crucial element of spoken language. This could help further our understanding of how some speech-impairing conditions arise in the brain, or help tutors design next-generation, computer-assisted foreign language courses.

Image: The Linguist

Timing and the brain


Human brains are particularly efficient in perceiving, producing, and processing fine rhythmic information in music and speech. However, music is processed differently from speech, suggesting some underlying, specific mechanisms. For instance, any type of sound, whether rhythmic or not, triggers activity in the temporal lobe’s auditory cortex, but only speech lights up the STS.

Any linguist can tell you timing is everything in a language. Namely, it involves timing and stitching ultra short, short and long sounds together. Phonemes are the shortest, most basic unit of speech and last an average of 30 to 60 milliseconds. By comparison, syllables take longer: 200 to 300 milliseconds, while most whole words are longer still. It’s an immense amount of information to process, lest we forget that data related to speech is distinct from other sounds like the environment (birds chirping, water splashing) or music, which shares a rhythmic sequence.

The Duke researchers took speech recorded in a foreign language and cut it into short chunks ranging from 30 to 960 milliseconds in length. Using a novel computer algorithm, they then re-assembled the chunks into new sounds that the authors call ‘speech quilts’. It’s basically gibberish, but it still sounds like some language. The shorter the pieces of the resulting speech quilts, the greater the disruption was to the original structure of the speech.
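To make the quilting idea concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the article describes only as "novel"; it assumes a naive approach that cuts a signal into fixed-length segments and shuffles them, with no smoothing at the joins. The names `make_quilt` and `count_breaks`, the 16 kHz sample rate and the toy signal are all illustrative assumptions.

```python
import random

def make_quilt(samples, sample_rate, segment_ms, seed=0):
    """Cut `samples` into fixed-length segments and shuffle their order.

    Naive stand-in for a speech quilt: segment boundaries are not smoothed.
    """
    seg_len = max(1, int(sample_rate * segment_ms / 1000))
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    random.Random(seed).shuffle(segments)
    return [s for seg in segments for s in seg]

def count_breaks(quilt):
    """Count places where two neighbouring samples were not neighbours originally."""
    return sum(1 for a, b in zip(quilt, quilt[1:]) if b != a + 1)

# Toy "recording": one second at an assumed 16 kHz, each sample is its own index,
# so any break in the original order is easy to detect.
signal = list(range(16000))
quilt_30 = make_quilt(signal, 16000, 30)
quilt_960 = make_quilt(signal, 16000, 960)

print(count_breaks(quilt_30), count_breaks(quilt_960))
```

Shorter segments leave many more places where the original order can be broken, which mirrors the article's point that shorter pieces disrupt the structure of the speech more.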

The sounds were then played to volunteers who had their brain activity monitored. The researchers hypothesized that the STS would have a better response to speech quilts made up of longer segments. Indeed, this was what happened: the STS became highly active during the 480- and 960-millisecond quilts compared with the 30-millisecond quilts.


To make sure they weren’t actually seeing some other response, the authors also played other sounds intended to mimic speech, but with some key differences. One of the synthetic sounds they created shared the frequency of speech but lacked its rhythms. Another removed all the pitch from the speech. A third used environmental sounds. Again, each control sound was chopped and quilted before being played to the participants. The STS didn’t seem responsive to the quilting manipulation when it was applied to these control sounds, as reported in Nature Neuroscience.

“We really went to great lengths to be certain that the effect we were seeing in STS was due to speech-specific processing and not due to some other explanation, for example, pitch in the sound or it being a natural sound as opposed to some computer-generated sound,” said co-author Tobias Overath, an assistant research professor of psychology and neuroscience at Duke.


A wild-born orangutan has learned to communicate like a human

A female orangutan born in the wild has learned to use her tongue to whistle and produce vowel sounds just like a human, suggesting that all great apes are able to do so. Although orangutans are known to create diverse vocalisations, what Tilda can do is unique.

Meet Tilda, the first ever orangutan to make human vocalisations. She can click her tongue, producing two calls which were never before observed in any apes, and can create sounds similar to our pronunciation of voiceless consonants (something present in several African languages). Tilda can also whistle.

It’s not clear how she learned to do all this, but it’s believed that it happened because she worked in the “entertainment business”. Tilda’s vocalisations have now been described in a paper in PLOS ONE. She uses these signals to ask for more food, even clapping and pointing in the direction of food. Biologist Adriano Lameira from the University of Amsterdam, explains:

“They are what we would call attention gathering or come-hither calls, which indeed are mostly used when the human caretakers are handling food,” Lameira said. “I would translate them into, ‘Come here and give that food to me!’”

It was previously believed that apes don’t have enough control over their vocal structures to emit human-like sounds, but this clearly shows that the theory is wrong.

Image: Archive Cologne Zoo

“The extent of motoric control that great apes exert over their vocal structures, both laryngeal and supra-laryngeal, may be much higher than hitherto presumed,” the authors write in PLOS ONE. “The notion that great ape calls are hard-wired and inflexible is likely an artefact of our very poor understanding of the call communication of these species, rather than that their calls are factually hard-wired or inflexible,” Lameira added.

Interestingly, this may not only help us understand if and how apes can speak, but it can also help us understand where our own speech originated from. By further studying how great apes use these sounds, we may finally understand “the conditions that brought together for the first time the two basic building blocks of speech,” as the researchers write.

“The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals which exhibit a speech-like rhythm of ~5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech.”

Journal Reference: Adriano R. Lameira, Madeleine E. Hardus, Adrian M. Bartlett, Robert W. Shumaker, Serge A. Wich, Steph B. J. Menken. Speech-Like Rhythm in a Voiced and Voiceless Orangutan Call. Published: January 08, 2015. DOI: 10.1371/journal.pone.0116136

In photo: sixteen year old inventor Arsh Shah Dilbagi demonstrating his breath to voice synthesizer.

Indian teenager invents cheap device that turns breath into speech


About 1.4% of the world’s population today is speech impaired, due to conditions such as amyotrophic lateral sclerosis (ALS), locked-in syndrome (LIS), encephalopathy (SEM), Parkinson’s disease, and paralysis. Imagine all the people living in Germany today were unable to speak and you’ll come to realize just how far reaching this condition is. So, aside from those who are paralyzed, there are a lot of people who can’t speak, making any kind of relationship with friends and family unbearable; the patient is essentially trapped in a situation where he or she is forced to live inside his or her own head until the end of days. An Indian teenager sought to address this heartbreaking world problem and succeeded in building a device that is easy to make, cheap and effective. Most of all, it’s extremely ingenious, since it can translate orderly breaths into speech.

Follow my breath

If you followed the work of the esteemed physicist Stephen Hawking or have seen him on TV, you may have noticed that he uses a complex computer interface to speak. Oddly enough, his voice is one of the most recognized on the planet, and it’s all synthesized! The tech he employs is, however, extremely expensive.

Sixteen-year-old Arsh Shah Dilbagi took a different route. Instead of building complex and expensive IR sensors that trigger off twitches in the cheek muscle under the eye, like those used by Hawking’s machine, Dilbagi designed a system that can translate a user’s breath into electrical signals. As such, the device is only made out of a pressure-sensitive diaphragm etched directly into a silicon chip, and an amplifying device to increase the sound of the user’s breath. This allowed him to keep the price tag at $80, compared to the thousands someone would need to shell out for a device similar to Hawking’s.

The tech, called ‘TALK’, can identify two types of breaths, as well as different intensities and timing, so that the user can effectively spell out words using Morse code. An embedded microprocessor then reads the timed breaths as dots and dashes and translates them into words. A second microprocessor synthesizes the words into a voice. It’s remarkably simple and effective, even though the user needs to be trained to use Morse code, but it sure beats the alternative.
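The decoding step can be illustrated with a short Python sketch. How TALK actually tells its two breath types apart is not described here, so the sketch assumes each breath arrives as a (duration, pause afterwards) pair in seconds, that longer breaths count as dashes, and that longer pauses separate letters and words. The function name `decode_breaths` and all thresholds are hypothetical.

```python
# International Morse code for the letters A to Z.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}

def decode_breaths(breaths, dash_threshold=0.6, letter_gap=1.0, word_gap=2.5):
    """Turn (duration, pause_after) breath pairs into text.

    All thresholds are in seconds and are assumed values for illustration.
    """
    text, symbol = [], ""
    for duration, pause_after in breaths:
        # Assumption: a long breath is a dash, a short one is a dot.
        symbol += "-" if duration >= dash_threshold else "."
        if pause_after >= letter_gap:
            # A long enough pause closes the current letter.
            text.append(MORSE.get(symbol, "?"))
            symbol = ""
            if pause_after >= word_gap:
                text.append(" ")
    if symbol:
        # Flush a letter left unfinished at the end of the input.
        text.append(MORSE.get(symbol, "?"))
    return "".join(text).strip()

# Four short breaths, a pause, then two short breaths: "...." and ".."
hi = [(0.2, 0.2), (0.2, 0.2), (0.2, 0.2), (0.2, 1.2), (0.2, 0.2), (0.2, 3.0)]
print(decode_breaths(hi))  # HI
```

Unknown patterns are shown as "?" rather than dropped, so a user in training can see where a letter went wrong.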

“After testing the final design with myself and friends and family, I was able to arrange a meeting with the Head of Neurology at Sir Ganga Ram Hospital, New Delhi and tested TALK (under supervision of doctor and in controlled environment) with a person suffering from SEM and Parkinson’s Disease,” Dilbagi reports. “The person was able to give two distinguishable signals using his breath and the device worked perfectly.”

Dilbagi is currently the only finalist in Asia enrolled in Google’s Global Science Fair, a competition that’s open to 13 to 18-year-olds from anywhere in the world. Let’s wish him the best of luck!


Who talks more, men or women? It all depends on the context, study finds

“We women talk too much, nevertheless we only say half of what we know.”  Nancy Witcher Astor, Viscountess

There’s a deeply entrenched stereotype that portrays women as extremely talkative or, at least, much chattier than men. Ask most people, both men and women, and they will agree, but is this merely a subjective impression or does it indeed reflect reality? A new study teases out a more accurate picture of the matter. Researchers at Northeastern University found that in the end it all boils down to context, and that in some cases men are the real chatterboxes.

The communication patterns of both men and women have been of great interest to researchers, but so far studies have turned in mixed results and have made the matter controversial. For instance, a 2007 University of California meta-analysis of previously published studies investigating men’s and women’s communication patterns found that in fact men are the most talkative. Older studies report that women chat more, while others report that there’s no actual difference. It’s very confusing, and as with most psychological studies, the mixed results stem from the lack of a common frame of reference. Some studies rely on self-reporting (subjectivity limitation), while others make direct observations (the ‘being-watched’ syndrome that causes people to act and respond differently).

Gauging conversations


Northeastern professor David Lazer and his team took a different approach. Making use of the technology at their disposal, the researchers employed “sociometers”, wearable devices roughly the size of a smartphone that quantify social interactions. These were fitted to 79 students and 54 call-centre employees who were involved in two distinct social settings: a university environment where the participants were divided into groups and asked to work on a project, and a work situation in which employees were tracked during twelve one-hour lunch breaks.

In the task-based university setting, conversations were dominated by men or by women depending on how large the group was. When the groups consisted of six or more participants, it was men who did the most talking, whereas women spent more time than men speaking with just one or two other people when the task was collaborative (62% more talkative than men). During the lunch-break setting, women were found to be slightly more likely than men to engage in conversations, both long and short-duration. This reported difference, however, is so slight that the researchers couldn’t infer which of the two genders was definitely more talkative.

“In the one setting that is more collaborative we see the women choosing to work together, and when you work together you tend to talk more,” said Lazer, who is also co-director of the NULab for Texts, Maps, and Networks, Northeastern’s research-based center for digital humanities and computational social science. “So it’s a very particular scenario that leads to more interactions. The real story here is there’s an interplay between the setting and gender which created this difference.”

Where does the notion that women ‘fill the air’ come from then? It is possible that this stereotype forms during childhood. Girls’ language skills develop earlier, enabling them to become articulate at a younger age, and it may be possible that this notion has stuck with us ever since we were very little. This idea, too, seems to be flawed, as studies have failed to report a significant difference in the amount of spoken words by gender in children.

While men and women are indeed biologically and, in some instances, cognitively different, it may be the case that we’ve been brainwashed by society into exaggerating the gender gap. Then again, maybe men are really from Mars and women from Venus.

The findings were reported in the journal Scientific Reports.

Speech Jammer

Speech-jamming gun puts annoying conversations to an end

Are you fed up with meaningless, rambling conference speakers? All too tired of phone calls around you at work? Wish there was a mute button for your girlfriend? Finally, all your prayers have been answered! Presenting the ultimate silencer, the speech-jamming gun.

Japanese scientists Kazutaka Kurihara, at the National Institute of Advanced Industrial Science and Technology in Tsukuba, and Koji Tsukada, at Ochanomizu University, recently presented their radical solution, unveiling to the world a device which feeds back the words uttered by a targeted speaker with a delay of 0.2 seconds. The idea is simple, and has been confirmed by psychologists in the past: when a person’s voice is recorded and played back to him with a delay of a fraction of a second, it’s impossible for the person in question to keep speaking, and this is exactly what the speech-jammer does.

Capitalizing on “delayed auditory feedback” (DAF), the handheld device, consisting of a microphone and a speaker, can be pointed at a person who is speaking to render him silent; he either keeps quiet or risks going mad. The centerpiece of the device is its parametric directional speaker, which modulates the laser-targeted person’s voice into an ultrasonic beam. This perturbs the air in a narrow beam, demodulating the audio to generate audible sound to anyone within that beam.
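The delay at the heart of DAF is easy to sketch in software. The Python snippet below is only an illustration of a fixed delay line, not the device's actual firmware: it assumes the voice arrives as a list of numeric samples and returns the same samples shifted by the 0.2 seconds mentioned above. The function name `delayed_feedback` and the tiny sample rate in the example are hypothetical.

```python
from collections import deque

def delayed_feedback(samples, sample_rate, delay_s=0.2):
    """Return `samples` delayed by `delay_s` seconds, padded with silence."""
    delay = int(round(sample_rate * delay_s))
    # Pre-fill the buffer with silence so the first outputs are zeros.
    buffer = deque([0.0] * delay)
    out = []
    for x in samples:
        buffer.append(x)               # newest microphone sample goes in
        out.append(buffer.popleft())   # oldest buffered sample comes out
    return out

# With an assumed rate of 10 samples per second, 0.2 s equals 2 samples.
print(delayed_feedback([1, 2, 3, 4, 5], sample_rate=10))
```

A real device would run this loop continuously on live audio; the buffer length is the only thing that sets the delay.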

After tests, its creators state that “the system can disturb remote people’s speech without any physical discomfort.” They go on to add that the jamming-gun works better against speech that involves reading aloud than against spontaneous monologue. Applications? Maintain silence in public libraries and “facilitate discussion” in group meetings. “We have to establish and obey rules for proper turn-taking when speaking,” they say.

Check out a demo of the jamming-gun in the YouTube video below.

Link to reference paper.

via MIT’s Technology Review

The girl who silenced the world for five minutes

Her name is Severn Suzuki, and here you have perhaps one of the most impressive speeches of all time, delivered by her (only 12 years old at the time) at a UN meeting, the Earth Summit in 1992.

[After 5 minutes]

An incredible story

Severn Suzuki was born into a remarkable family, with her mother being a writer, and her father being a geneticist and environmental activist. She showed extreme determination and leadership at an age (9) where other children are still learning how to play with toys, by founding the Environmental Children’s Organization (ECO), a group of children who wanted to learn more about the environment and teach other children about it.

When she was 12, she raised money with other children from ECO to attend the Earth Summit in Rio de Janeiro, where that clip is taken from. Along with group members Michelle Quigg, Vanessa Suttil, and Morgan Geisler, Severn Suzuki attended the summit, where she presented environmental issues that affect the world from a child’s perspective; she was applauded by summit members for minutes, and the video of her speech became one of the most inspiring ever.

Furthermore, one year later she published a book, Tell the World, which presents easy environmental steps to take for every family out there.

Where is Severn Suzuki now


In case you’re wondering what she’s up to nowadays, she graduated from Yale in 2002, and now she’s an environmental activist, and even had her own show for children on Discovery. She was involved in an internet-based think tank which was used as an advisor for Kofi Annan, but the project was disbanded when she continued her education.

I’ve searched all over the internet but couldn’t find a way to contact her in the hope that maybe she could share a few words with us. If you know or happen to stumble upon an email address or something, it’d be really great; once again, we would like to bow our heads to Severn Suzuki, and the things she has accomplished over the years.