Speech Command Classification

Speech gif
Speech confusion matrix
Speech loss plot

Project information

  • Category: Artificial Intelligence, Speech Recognition
  • Project Tools: Python, Google Colab
  • Project date: July, 2023
  • Project URL: GitHub Repository

Project description

This project implements a voice assistant that recognizes a spoken command and classifies it into one of 12 classes: bonds, currency, coin, bank, gold, petroleum, derivatives, metals, stock fund, fixed income fund, mixed fund, and tradable fund. The commands were recorded in Persian, and the model was trained on these recordings. To do so, we implemented a custom convolutional neural network in Python, which reached an accuracy of 85%.
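
As a rough illustration of the classification setup described above, the sketch below maps the 12 class names to integer labels and shows how an accuracy figure is computed from predictions. The index ordering, the names `CLASSES`, `LABEL_TO_INDEX`, and `accuracy`, and the toy labels are all assumptions made for this example; they are not taken from the project's code or results.

```python
# Class names as listed in the project description (English translations).
CLASSES = [
    "bonds", "currency", "coin", "bank", "gold", "petroleum",
    "derivatives", "metals", "stock fund", "fixed income fund",
    "mixed fund", "tradable fund",
]

# Assumed ordering: each index follows the order of the list above.
LABEL_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels made up for illustration; these are not the project's results.
actual = [LABEL_TO_INDEX[name] for name in ["gold", "bank", "coin", "metals"]]
predicted = [LABEL_TO_INDEX[name] for name in ["gold", "bank", "coin", "gold"]]
toy_accuracy = accuracy(predicted, actual)
print(f"toy accuracy: {toy_accuracy:.2f}")
```

In practice the same function would be applied to the model's predictions on a held-out set of recordings.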

In this project, we use a custom CNN to process the raw audio directly. Voice data usually goes through more advanced transformations before classification, but a CNN can also learn from the raw data itself. Our architecture consists of five blocks, each combining a convolutional layer, batch normalization, and max pooling. After the last block, a linear layer outputs the classification result.
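
To make the architecture more concrete, the sketch below traces how the length of a raw waveform shrinks as it passes through five convolution, batch normalization, and max pooling stages, and how many features reach the final linear layer. Every hyperparameter here (16 kHz one-second input, kernel sizes, strides, pool sizes, channel counts) is a hypothetical choice for illustration, as are the names `Block`, `conv_out_len`, and `trace_shapes`; the project's actual values may differ.

```python
from dataclasses import dataclass

NUM_CLASSES = 12  # from the project description


@dataclass(frozen=True)
class Block:
    """One conv -> batch norm -> max pool stage (all values illustrative)."""
    out_channels: int
    kernel_size: int
    stride: int
    pool_size: int


def conv_out_len(length: int, kernel_size: int, stride: int) -> int:
    # Output length of an unpadded 1-D convolution or pooling window.
    return (length - kernel_size) // stride + 1


def trace_shapes(num_samples: int, blocks: list[Block]) -> list[tuple[int, int]]:
    """Return (channels, length) after each block for a mono waveform."""
    shapes = []
    length = num_samples
    for block in blocks:
        length = conv_out_len(length, block.kernel_size, block.stride)
        # Batch normalization rescales values but leaves the shape unchanged.
        # Max pooling is assumed to use a stride equal to its window size.
        length = conv_out_len(length, block.pool_size, block.pool_size)
        shapes.append((block.out_channels, length))
    return shapes


# Hypothetical hyperparameters: a wide first kernel, then small kernels.
BLOCKS = [
    Block(out_channels=16, kernel_size=80, stride=4, pool_size=4),
    Block(out_channels=32, kernel_size=3, stride=1, pool_size=4),
    Block(out_channels=64, kernel_size=3, stride=1, pool_size=4),
    Block(out_channels=128, kernel_size=3, stride=1, pool_size=4),
    Block(out_channels=256, kernel_size=3, stride=1, pool_size=2),
]

# Assumed input: 1 second of mono audio sampled at 16 kHz.
shapes = trace_shapes(16000, BLOCKS)
channels, length = shapes[-1]
linear_in_features = channels * length  # flattened input to the linear layer

for i, (c, n) in enumerate(shapes, start=1):
    print(f"block {i}: {c} channels x {n} samples")
print(f"linear layer: {linear_in_features} -> {NUM_CLASSES}")
```

Tracing shapes this way is a quick check that the input does not shrink to zero length before the last stage and that the linear layer is sized to match the flattened output.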