The master’s thesis of Nour Al-Huda Muzaffar Suleiman, a student in the Department of Information and Communications Engineering, was examined on the morning of Monday, June 24, 2024, in the postgraduate studies hall at the college. The examination committee was chaired by Prof. Muhammad Imad Abdel Sattar of Al-Nahrain University, College of Information Engineering. Its members were Dr. Ammar Adel Hassan of the University of Baghdad, College of Engineering, Department of Computer Engineering, and Dr. Fatima Bahjat Ibrahim of the University of Baghdad, Al-Khwarizmi College of Engineering, Department of Information and Communications Engineering. The thesis was supervised by Prof. Ahmed Sattar Hadi of the University of Baghdad, Al-Khwarizmi College of Engineering, Department of Information and Communications Engineering. Following the student’s presentation, the committee chair announced that the researcher had earned a grade of Very Good.
The thesis proposes an Audio-Visual Source Separation (AVSS) system that uses deep neural networks and visual cues to separate an individual speaker's speech from audio mixtures containing multiple speakers. The system is speaker-independent. It first preprocesses the audio and visual signals and then extracts features from each. It combines convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and facial attribute detection via MTCNN. The short-time Fourier transform (STFT) and Mel-frequency cepstral coefficients (MFCC) are used in extracting features, and principal component analysis (PCA) reduces the feature dimensionality, which makes training more efficient. Evaluated on AV speech, the system improves the signal-to-distortion ratio (SDR) by 7.7 dB over prior methods.
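For readers unfamiliar with the metric, the sketch below illustrates how an SDR figure in decibels can be computed. It is not taken from the thesis: it assumes the simplest energy-ratio definition, SDR = 10 log10(||s||^2 / ||s - s_hat||^2), whereas published evaluations often use more elaborate variants. The function names `sdr_db` and `sdr_improvement_db` and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB (assumed simple definition, not the thesis's exact metric)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def sdr_improvement_db(reference, mixture, estimate):
    """SDR gain of the separated estimate over the unprocessed mixture, in dB."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy signals (hypothetical): a clean target, the same target with strong
# interference, and a separated estimate with a small residual error.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
estimate = [1.1, -0.9, 1.1, -0.9]

print(f"SDR of mixture:  {sdr_db(reference, mixture):.2f} dB")
print(f"SDR of estimate: {sdr_db(reference, estimate):.2f} dB")
print(f"Improvement:     {sdr_improvement_db(reference, mixture, estimate):.2f} dB")
```

Because the scale is logarithmic, a gain of 7.7 dB under this definition corresponds to the distortion energy dropping by a factor of about 5.9 relative to the target's energy.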