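混ざった声を聞き分ける最新技術：音源分離と目的音声抽出 [Latest Techniques for Telling Mixed Voices Apart: Sound Source Separation and Target Speech Extraction]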

Material type: Article

Author: 池下林太郎 et al.

Publisher: The Institute of Electronics, Information and Communication Engineers

Publication date: 2025-04-01

Material Format: Digital

Journal name: 電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW 18(4)

Publication Page: p.267-278


Detailed bibliographic record

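Summary, etc.: This article reviews recent technical trends in sound source separation, which separates and extracts the individual sounds from an acoustic signal in which multiple voices and other sounds were recorded together, and in target speech extraction, which extracts only the speech of a specific speaker. These technologies not only make speech easier for people to understand but also help improve the performance of downstream speech applications. Two...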

Holdings of Libraries in Japan

This page shows libraries in Japan other than the National Diet Library that hold the material.

List of Cooperating Institutions and Databases

Please contact your local library for information on how to use materials or whether it is possible to request materials from the holding libraries.

Other

J-STAGE (Digital)
CiNii Research (Search Service, Digital)
You can check the holdings of institutions and databases with which CiNii Research is linked at the site of CiNii Research.

Bibliographic Record

You can check the details of this material, its authority (keywords that refer to materials on the same subject, author's name, etc.), etc.

Digital

Material Type: Article
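Title: 混ざった声を聞き分ける最新技術：音源分離と目的音声抽出 [Latest Techniques for Telling Mixed Voices Apart: Sound Source Separation and Target Speech Extraction]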
Author Heading: 池下林太郎
落合翼
デルクロアマーク
加茂直之
荒木章子
中谷智広
Publication Date: 2025-04-01
Publication Date (W3CDTF): 2025-04-01
Periodical title: 電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW
No. or year of volume/issue: 18(4)
Volume: 18
Issue: 4
Pages: 267-278
Publication date of volume/issue (W3CDTF): 2025-04-01
Publication (Periodical Title): The Institute of Electronics, Information and Communication Engineers
Text Language Code: ja
Subject Heading: Sound source separation
Target speech extraction
Speech enhancement
Neural network
Statistical signal processing
音源分離
目的音声抽出
音声強調
ニューラルネットワーク
統計的信号処理
Target Audience: General
DOI: 10.1587/essfr.18.4_267
https://doi.org/10.1587/essfr.18.4_267
Related Material (URI): https://www.jstage.jst.go.jp/article/essfr/18/4/18_267/_pdf
References: Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization
https://cir.nii.ac.jp/crid/1360013247960125184
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
https://cir.nii.ac.jp/crid/1360017285976659328
Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
https://cir.nii.ac.jp/crid/1360017288938536576
Self-Supervised Speech Representation Learning: A Review
https://cir.nii.ac.jp/crid/1360018681026941568
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
https://cir.nii.ac.jp/crid/1360020998593619200
Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis
https://cir.nii.ac.jp/crid/1360021391864315136
Blind and Neural Network-Guided Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation
https://cir.nii.ac.jp/crid/1360022497271473536
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam
https://cir.nii.ac.jp/crid/1360022497271474816
Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering
https://cir.nii.ac.jp/crid/1360023718185513088
Target Speech Extraction with Conditional Diffusion Model
https://cir.nii.ac.jp/crid/1360023718190903552
Speaker Activity Driven Neural Speech Extraction
https://cir.nii.ac.jp/crid/1360023721022896896
The Conversation: Deep Audio-Visual Speech Enhancement
https://cir.nii.ac.jp/crid/1360292620452689280
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model
https://cir.nii.ac.jp/crid/1360294724151259904
Joint Dereverberation and Separation With Iterative Source Steering
https://cir.nii.ac.jp/crid/1360298344751830784
ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis
https://cir.nii.ac.jp/crid/1360298345006791552
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
https://cir.nii.ac.jp/crid/1360298345534859264
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation
https://cir.nii.ac.jp/crid/1360298757172601984
Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
https://cir.nii.ac.jp/crid/1360299770033024512
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models
https://cir.nii.ac.jp/crid/1360303972865665280
Multi-Stream Diffusion Model for Probabilistic Integration of Model-Based and Data-Driven Speech Enhancement
https://cir.nii.ac.jp/crid/1360305193162588800
Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning
https://cir.nii.ac.jp/crid/1360564063957693312
Auxiliary-Function-Based Independent Component Analysis for Super-Gaussian Sources
https://cir.nii.ac.jp/crid/1360574093891358592
Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions
https://cir.nii.ac.jp/crid/1360574095834182400
Multichannel blind deconvolution and equalization using the natural gradient
https://cir.nii.ac.jp/crid/1360574096310094464
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis
https://cir.nii.ac.jp/crid/1360576118774756224
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening
https://cir.nii.ac.jp/crid/1360576198891445760
Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms
https://cir.nii.ac.jp/crid/1360581245815860352
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction
https://cir.nii.ac.jp/crid/1360584340722947456
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
https://cir.nii.ac.jp/crid/1360584344422604288
Neural Blind Source Separation and Diarization for Distant Speech Recognition
https://cir.nii.ac.jp/crid/1360586669070045696
Neural Target Speech Extraction: An overview
https://cir.nii.ac.jp/crid/1360586669077197056
BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
https://cir.nii.ac.jp/crid/1360586670922860544
Acoustic Modeling for Google Home
https://cir.nii.ac.jp/crid/1360845538912703744
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
https://cir.nii.ac.jp/crid/1360845538958015488
Fast and Stable Blind Source Separation with Rank-1 Updates
https://cir.nii.ac.jp/crid/1360849945129821312
Speech Dereverberation
https://cir.nii.ac.jp/crid/1360855569491177472
Beamforming: a versatile approach to spatial filtering
https://cir.nii.ac.jp/crid/1360855570749772544
An auxiliary-function approach to online independent vector analysis for real-time blind source separation
https://cir.nii.ac.jp/crid/1360857672362558080
Real-Time Independent Vector Analysis for Convolutive Blind Source Separation
https://cir.nii.ac.jp/crid/1360861294591336832
End-to-End SpeakerBeam for Single Channel Target Speech Recognition
https://cir.nii.ac.jp/crid/1360862718470465792
End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning
https://cir.nii.ac.jp/crid/1360866922521943936
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
https://cir.nii.ac.jp/crid/1360866923244509056
Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation
https://cir.nii.ac.jp/crid/1360866923244510848
Streaming Target-Speaker ASR with Neural Transducer
https://cir.nii.ac.jp/crid/1360866925857149824
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
https://cir.nii.ac.jp/crid/1360868143116022784
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining
https://cir.nii.ac.jp/crid/1360868143116024192
ICASSP 2023 Speech Signal Improvement Challenge
https://cir.nii.ac.jp/crid/1360868144047296128
Personal VAD: Speaker-Conditioned Voice Activity Detection
https://cir.nii.ac.jp/crid/1360868144548988672
Multi-Channel Linear Prediction-Based Speech Dereverberation With Sparse Priors
https://cir.nii.ac.jp/crid/1360868146034314240
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures
https://cir.nii.ac.jp/crid/1361137045715499520
U-Net: Convolutional Networks for Biomedical Image Segmentation
https://cir.nii.ac.jp/crid/1361418520447167744
Deep clustering: Discriminative embeddings for segmentation and separation
https://cir.nii.ac.jp/crid/1361699993835302016
Independent Low-Rank Matrix Analysis with Decorrelation Learning
https://cir.nii.ac.jp/crid/1361699994108636032
SEGAN: Speech Enhancement Generative Adversarial Network
https://cir.nii.ac.jp/crid/1361699995466029184
Blind separation of instantaneous mixtures of nonstationary sources
https://cir.nii.ac.jp/crid/1361699996014168320
Looking to listen at the cocktail party
https://cir.nii.ac.jp/crid/1362262944112939392
Fast fixed-point independent vector analysis algorithms for convolutive blind source separation
https://cir.nii.ac.jp/crid/1362262944432271744
On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction
https://cir.nii.ac.jp/crid/1362262945679875200
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction
https://cir.nii.ac.jp/crid/1362544418584529024
Independent component analysis, A new concept?
https://cir.nii.ac.jp/crid/1362825894259639040
Blind Source Separation Exploiting Higher-Order Frequency Dependencies
https://cir.nii.ac.jp/crid/1362825894624610688
A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation
https://cir.nii.ac.jp/crid/1362825895038653440
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization
https://cir.nii.ac.jp/crid/1363107368278328064
Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components
https://cir.nii.ac.jp/crid/1363388843261071488
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
https://cir.nii.ac.jp/crid/1363388844549361536
Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-Negative Matrix Factorization
https://cir.nii.ac.jp/crid/1363388845538344320
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization
https://cir.nii.ac.jp/crid/1363670320394938240
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
https://cir.nii.ac.jp/crid/1363670320629704960
Fast and robust fixed-point algorithms for independent component analysis
https://cir.nii.ac.jp/crid/1363670321134359680
Inverse filtering of room acoustics
https://cir.nii.ac.jp/crid/1364233268668325376
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
https://cir.nii.ac.jp/crid/1364233270330171392
Data Provider (Database): National Institute of Informatics : CiNii Research
https://cir.nii.ac.jp/
Original Data Provider (Database): Japan Link Center
https://japanlinkcenter.org/top
Crossref
https://www.crossref.org

Digital

Summary, etc.: This paper reviews recent advances in sound source separation and target speech extraction. Source separation separates and extracts the individual sounds from an acoustic signal in which multiple voices and other sounds were recorded together; target speech extraction extracts only the speech of a specific speaker. These technologies not only make speech easier for people to understand but also help improve the performance of downstream speech applications. Two important approaches are covered: methods based on signal models and methods based on neural networks. The paper outlines each approach and its characteristics, then describes in detail two representative techniques: blind source separation in reverberant environments and target speech extraction based on voice characteristics. Finally, it touches on the future prospects of this field.
DOI: 10.1587/essfr.18.4_267
Access Restrictions: Publicly available on the Internet
Data Provider (Database): Japan Science and Technology Agency : J-STAGE
http://www.jstage.jst.go.jp
