
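Latest Technologies for Telling Mixed Voices Apart: Source Separation and Target Speech Extraction (混ざった声を聞き分ける最新技術:音源分離と目的音声抽出)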

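Material type
Article
Author
Rintaro Ikeshita (池下 林太郎) et al.
Publisher
The Institute of Electronics, Information and Communication Engineers
Publication date
2025-04-01
Format
Digital
Journal
電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW (IEICE ESS Fundamentals Review), vol. 18, no. 4
Pages
pp. 267-278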

Item details

Bibliographic information

Digital

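Material type
Article
Publication date
2025-04-01
Publication date (W3CDTF)
2025-04-01
Title (journal)
電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW (IEICE ESS Fundamentals Review)
Volume and issue (journal)
vol. 18, no. 4
Volume
18
Issue
4
Pages
267-278
Issue date (W3CDTF)
2025-04-01
Publisher (journal)
The Institute of Electronics, Information and Communication Engineers
Language code of text
ja
Target audience
General
References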
Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
Self-Supervised Speech Representation Learning: A Review
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis
Blind and Neural Network-Guided Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam
Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering
Target Speech Extraction with Conditional Diffusion Model
Speaker Activity Driven Neural Speech Extraction
The Conversation: Deep Audio-Visual Speech Enhancement
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model
Joint Dereverberation and Separation With Iterative Source Steering
ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation
Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models
Multi-Stream Diffusion Model for Probabilistic Integration of Model-Based and Data-Driven Speech Enhancement
Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning
Auxiliary-Function-Based Independent Component Analysis for Super-Gaussian Sources
Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions
Multichannel blind deconvolution and equalization using the natural gradient
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening
Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
Neural Blind Source Separation and Diarization for Distant Speech Recognition
Neural Target Speech Extraction: An overview
BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
Acoustic Modeling for Google Home
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
Fast and Stable Blind Source Separation with Rank-1 Updates
Speech Dereverberation
Beamforming: a versatile approach to spatial filtering
An auxiliary-function approach to online independent vector analysis for real-time blind source separation
Real-Time Independent Vector Analysis for Convolutive Blind Source Separation
End-to-End SpeakerBeam for Single Channel Target Speech Recognition
End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation
Streaming Target-Speaker ASR with Neural Transducer
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining
ICASSP 2023 Speech Signal Improvement Challenge
Personal VAD: Speaker-Conditioned Voice Activity Detection
Multi-Channel Linear Prediction-Based Speech Dereverberation With Sparse Priors
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures
U-Net: Convolutional Networks for Biomedical Image Segmentation
Deep clustering: Discriminative embeddings for segmentation and separation
Independent Low-Rank Matrix Analysis with Decorrelation Learning
SEGAN: Speech Enhancement Generative Adversarial Network
Blind separation of instantaneous mixtures of nonstationary sources
Looking to listen at the cocktail party
Fast fixed-point independent vector analysis algorithms for convolutive blind source separation
On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction
Independent component analysis, A new concept?
Blind Source Separation Exploiting Higher-Order Frequency Dependencies
A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization
Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-Negative Matrix Factorization
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
Fast and robust fixed-point algorithms for independent component analysis
Inverse filtering of room acoustics
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
Partner institution / database
National Institute of Informatics: CiNii Research
Source institution / database
Japan Link Center
Crossref

Digital

Abstract
<p>This paper reviews the latest advances in source separation and target speech extraction. Source separation isolates the individual sounds in an acoustic signal in which multiple voices and other sounds were recorded together; target speech extraction recovers only the speech of a desired speaker. These technologies make speech easier for humans to understand and also improve the performance of downstream speech applications. Two major approaches are covered: signal-model-based methods and neural-network-based methods. After an overview of each approach and its characteristics, two representative techniques are explained in detail: blind source separation in reverberant environments and target speech extraction based on voice characteristics. Finally, the future prospects of the field are discussed.</p>
DOI
10.1587/essfr.18.4_267
Online access
Publicly available on the internet
Partner institution / database
Japan Science and Technology Agency: J-STAGE