This course introduces the fundamental technologies employed in Sound and
Music Computing (SMC), focusing on speech and music. It covers the concept of
sound and its representations in the analog and digital domains, as well as in
the time and frequency domains. Moreover, the course provides hands-on
experience with relevant Machine Learning (ML) tools and an in-depth review of
related technologies in sound data analytics, including Automatic Speech
Recognition (ASR) and Automatic Music Transcription (AMT). Topics in sound
synthesis and automatic music generation will be covered for breadth.
Prospective students are expected to have some exposure to ML, as they will
build, adapt, and/or modify common ML pipelines for speech or music processing
as part of their group projects.
In summary, this course covers two aspects of speech and music processing:
understanding and generation. In addition to giving students hands-on
experience with the Discrete Fourier Transform (DFT), Automatic Speech
Recognition (ASR), and Automatic Music Transcription (AMT), we will seek to
strengthen students' ability to communicate effectively in the context of a
group project.
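The DFT mentioned above can be illustrated with a minimal sketch: a naive O(N²) implementation in pure Python applied to a toy cosine signal. This is for intuition only; in coursework one would use an optimized FFT routine (e.g. from NumPy or SciPy) rather than this direct summation.

```python
import cmath
import math

def dft(x):
    """Naive Discrete Fourier Transform:
    X[k] = sum_n x[n] * exp(-2*pi*1j*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Toy signal: one full cosine cycle over N = 8 samples.
# Its energy should concentrate in bins k = 1 and k = N - 1
# (conjugate symmetry for real-valued input).
signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
magnitudes = [abs(X) for X in dft(signal)]
```

Here `magnitudes[1]` and `magnitudes[7]` come out at N/2 = 4, while all other bins are (numerically) zero, which is the frequency-domain representation of the single-frequency input.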
Learning Objectives:
- Understand the Discrete Fourier Transform (DFT) in the context of audio
analysis and synthesis.
- Understand the building blocks of automatic speech recognition (ASR)
systems, and be able to implement and evaluate an ASR system with a
state-of-the-art (SOTA) toolkit such as SpeechBrain in a group project.
- Understand, implement, and evaluate approaches to automatic music
transcription (AMT) in a group project.
- Work in groups, present solutions in both oral and written formats, and
discuss speech and music processing with other students.