Imperial College London researchers claim they have developed a speech analysis method that supports applications such as speech recognition and speaker identification while removing sensitive attributes such as emotion, gender, and health status. Their framework takes voice data along with privacy settings as additional input and uses those settings to filter out sensitive attributes that could otherwise be extracted from recorded speech.
Speech signals are a rich data source containing both linguistic and paralinguistic information, including age, gender, state of health, personality, mood, and emotional state. This raises privacy concerns when raw voice data is transferred to servers: attacks such as attribute inference can reveal attributes that were never meant to be shared. In fact, the researchers note that an attacker could exploit a speech recognition model to infer additional attributes about its users.
The team’s approach aims to limit the success of inference attacks using a two-phase process. In the first phase, users adjust their privacy settings, each of which is associated with tasks (e.g., speech recognition) that may be performed on their voice data. In the second phase, the framework learns disentangled representations of the voice data in order to control the dimensions that reflect the independent factors relevant to a particular task. The framework can generate three types of output: linguistic embeddings (i.e., numerical representations of the spoken content), speaker embeddings (numerical representations of the speakers), or speech reconstructions created by concatenating the linguistic embeddings with synthetic speaker identities.
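The two-phase idea above can be sketched in miniature: privacy settings select which tasks are permitted, and only the disentangled factors those tasks require are released. This is an illustrative toy, not the authors' actual code; all names (`TASK_OUTPUTS`, `filter_representations`, the factor keys) are hypothetical.

```python
# Toy sketch of the described framework: user privacy settings select
# allowed tasks, and only the disentangled representations those tasks
# need are released. All names here are illustrative assumptions.

from dataclasses import dataclass, field

# Each permitted task maps to the representations it requires.
TASK_OUTPUTS = {
    "speech_recognition": {"linguistic_embedding"},
    "speaker_verification": {"speaker_embedding"},
    "speech_reconstruction": {"linguistic_embedding", "synthetic_identity"},
}

@dataclass
class PrivacySettings:
    allowed_tasks: set = field(default_factory=set)

def filter_representations(factors: dict, settings: PrivacySettings) -> dict:
    """Release only the factors required by the user's allowed tasks;
    everything else (e.g. emotion cues) is withheld."""
    permitted = set()
    for task in settings.allowed_tasks:
        permitted |= TASK_OUTPUTS.get(task, set())
    return {name: vec for name, vec in factors.items() if name in permitted}

# Toy disentangled factors extracted from one utterance.
factors = {
    "linguistic_embedding": [0.1, 0.9],
    "speaker_embedding": [0.4, 0.2],
    "emotion_embedding": [0.7, 0.3],   # sensitive: never released here
    "synthetic_identity": [0.5, 0.5],
}

settings = PrivacySettings(allowed_tasks={"speech_recognition"})
released = filter_representations(factors, settings)
print(sorted(released))  # only the linguistic embedding is shared
```

The point of the sketch is the access-control structure, not the representation learning itself: in the actual system the disentanglement would be learned by a neural model, not hand-labeled as here.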
In experiments, the researchers used five public data sets (IEMOCAP, RAVDESS, SAVEE, LibriSpeech, and VoxCeleb), recorded for various purposes including speech recognition, speaker recognition, and emotion recognition, to train, validate, and test the framework. They found the framework could achieve high speech recognition accuracy while hiding a speaker’s identity. However, recognition error increased slightly depending on the privacy preferences specified, a limitation the co-authors are confident can be addressed in future work.
“It is clear that [things like] the change in energy in each pitch class per frame reflect the success of the proposed framework in altering the prosodic representation related to the user’s emotions [and other attributes] to protect their privacy,” the researchers wrote in a preprint paper. “Protecting users’ privacy when analyzing speech remains a particularly challenging task. However, our experiments and results show that it is possible to achieve an adequate level of privacy while maintaining a high level of utility for voice-based systems.”
The researchers plan to focus on extending their framework to provide controls that depend on the devices and services users interact with. They also intend to investigate privacy-preserving, interpretable, and customizable applications made possible by disentangled representations.
This latest study follows a paper from Chalmers University of Technology and Sweden’s RISE research institute proposing a privacy-preserving technique that learns to mask attributes such as gender in voice data. Like the Imperial College London team, they used a model trained to filter sensitive information out of recordings and then generate new, private data conditioned on the filtered details, ensuring sensitive information stayed hidden without impairing realism or utility.