Latest Techniques for Telling Mixed Voices Apart: Source Separation and Target Speech Extraction (混ざった声を聞き分ける最新技術:音源分離と目的音声抽出)

Material type
Article
Author
池下 林太郎 et al.
Publisher
The Institute of Electronics, Information and Communication Engineers
Publication date
2025-04-01
Material Format
Digital
Journal name
電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW 18(4)
Publication Page
p.267-278

Detailed bibliographic record

Summary, etc.:

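This article reviews the latest technical trends in source separation, which separates and extracts the individual sounds from an acoustic signal in which multiple voices and other sounds were recorded together, and in target speech extraction, which extracts only the speech of a specific speaker. These technologies not only make speech easier for people to hear but also help improve the performance of downstream speech applications. Two...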

Holdings of Libraries in Japan

This page shows libraries in Japan other than the National Diet Library that hold the material.

Other

  • J-STAGE
    Digital
  • CiNii Research
    Search Service
    Digital
    You can check the holdings of institutions and databases linked with CiNii Research on the CiNii Research site.

Bibliographic Record

You can check the details of this material, its authority (keywords that refer to materials on the same subject, author's name, etc.), etc.

Digital

Material Type
Article
Publication Date
2025-04-01
Publication Date (W3CDTF)
2025-04-01
Periodical title
電子情報通信学会 基礎・境界ソサイエティ FUNDAMENTALS REVIEW
No. or year of volume/issue
18(4)
Volume
18
Issue
4
Pages
267-278
Publication date of volume/issue (W3CDTF)
2025-04-01
Publication (Periodical Title)
The Institute of Electronics, Information and Communication Engineers
Text Language Code
ja
Target Audience
General
References
Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
Self-Supervised Speech Representation Learning: A Review
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
Fast Online Source Steering Algorithm for Tracking Single Moving Source Using Online Independent Vector Analysis
Blind and Neural Network-Guided Convolutional Beamformer for Joint Denoising, Dereverberation, and Source Separation
Improving Speaker Discrimination of Target Speech Extraction With Time-Domain Speakerbeam
Microphone Array Signal Processing and Deep Learning for Speech Enhancement: Combining model-based and data-driven approaches to parameter estimation and filtering
Target Speech Extraction with Conditional Diffusion Model
Speaker Activity Driven Neural Speech Extraction
The Conversation: Deep Audio-Visual Speech Enhancement
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model
Joint Dereverberation and Separation With Iterative Source Steering
ISS2: An Extension of Iterative Source Steering Algorithm for Majorization-Minimization-Based Independent Vector Analysis
SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures
Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation
Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
Target Speech Extraction with Pre-Trained Self-Supervised Learning Models
Multi-Stream Diffusion Model for Probabilistic Integration of Model-Based and Data-Driven Speech Enhancement
Majorization-Minimization Algorithms in Signal Processing, Communications, and Machine Learning
Auxiliary-Function-Based Independent Component Analysis for Super-Gaussian Sources
Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions
Multichannel blind deconvolution and equalization using the natural gradient
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening
Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms
Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
Neural Blind Source Separation and Diarization for Distant Speech Recognition
Neural Target Speech Extraction: An overview
BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
Acoustic Modeling for Google Home
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
Fast and Stable Blind Source Separation with Rank-1 Updates
Speech Dereverberation
Beamforming: a versatile approach to spatial filtering
An auxiliary-function approach to online independent vector analysis for real-time blind source separation
Real-Time Independent Vector Analysis for Convolutive Blind Source Separation
End-to-End SpeakerBeam for Single Channel Target Speech Recognition
End-to-End Multi-Speaker Speech Recognition Using Speaker Embeddings and Transfer Learning
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation
Streaming Target-Speaker ASR with Neural Transducer
Masked Modeling Duo: Towards a Universal Audio Pre-Training Framework
AudioLDM 2: Learning Holistic Audio Generation With Self-Supervised Pretraining
ICASSP 2023 Speech Signal Improvement Challenge
Personal VAD: Speaker-Conditioned Voice Activity Detection
Multi-Channel Linear Prediction-Based Speech Dereverberation With Sparse Priors
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures
U-Net: Convolutional Networks for Biomedical Image Segmentation
Deep clustering: Discriminative embeddings for segmentation and separation
Independent Low-Rank Matrix Analysis with Decorrelation Learning
SEGAN: Speech Enhancement Generative Adversarial Network
Blind separation of instantaneous mixtures of nonstationary sources
Looking to listen at the cocktail party
Fast fixed-point independent vector analysis algorithms for convolutive blind source separation
On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction
Independent component analysis, A new concept?
Blind Source Separation Exploiting Higher-Order Frequency Dependencies
A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization
Low-latency real-time blind source separation for hearing aids based on time-domain implementation of online independent vector analysis with truncation of non-causal components
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-Negative Matrix Factorization
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research
Fast and robust fixed-point algorithms for independent component analysis
Inverse filtering of room acoustics
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
Data Provider (Database)
National Institute of Informatics : CiNii Research

Digital

Summary, etc.
This paper reviews the latest advances in source separation and target speech extraction. Source separation recovers the individual sounds from an acoustic signal in which multiple voices and other sounds were recorded together; target speech extraction recovers only the speech of the desired speaker. These technologies make speech easier for humans to understand and also improve downstream speech applications. Two important approaches are discussed: signal-model-based methods and neural-network-based methods. Representative techniques from these approaches, namely blind source separation in reverberant environments and target speech extraction based on voice features, are then explained in detail. Finally, the future prospects of this field are discussed.
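This article explains the latest technical trends in source separation, which separates and extracts the individual sounds from an acoustic signal in which multiple voices and other sounds were recorded together, and in target speech extraction, which extracts only the speech of a specific speaker. These technologies not only make speech easier for people to hear but also help improve the performance of downstream speech applications. Two important approaches are covered: methods based on signal models and methods based on neural networks. The article outlines each approach and its characteristics, then introduces in detail two representative techniques: blind source separation in reverberant environments and target speech extraction based on voice characteristics. It closes with a look at the future prospects of this technical field.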
DOI
10.1587/essfr.18.4_267
Access Restrictions
Available on the Internet
Data Provider (Database)
Japan Science and Technology Agency : J-STAGE