Introduction
In this guide, we'll learn how to train models on sounds, which opens up many possibilities for creative projects. For example, you can make a game where players control characters with voice commands in their own language. You could also create an app that helps kids practice pronouncing words correctly. Another idea is an advergame that catches customers' attention by challenging them to mimic the voice of a movie star, with prizes for those who succeed.
Training the Model
Follow the steps in the P5JS - ml5 and Teachable Machine Integration tutorial, then go to the audio training section of Teachable Machine to start.
Start by recording the background noise of your environment. This improves the model's accuracy because background noise is always present: without a background class, the model may report a command when the user has said nothing and only noise is audible. Feeding the model background noise helps prevent these false detections during classification.
The training process is much the same as for the image classifier. Even though we record sound samples, the web page converts the sound into image data by computing its spectrogram.
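To make the sound-to-image idea concrete, here is a minimal sketch of how a signal can be turned into a spectrogram-like grid. It only illustrates the principle: the exact preprocessing used by Teachable Machine is not described here, and the names magnitudeSpectrum and spectrogram, the frame size, and the test signal are all assumptions made for this example.

```javascript
// Magnitude spectrum of one frame using a naive DFT.
// O(n^2), which is fine for illustration but too slow for real-time audio.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const bins = [];
  // Only the first n/2 bins carry unique information for a real signal
  for (let k = 0; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    bins.push(Math.hypot(re, im));
  }
  return bins;
}

// Cut the signal into non-overlapping frames.
// Each frame becomes one column of the "image".
function spectrogram(samples, frameSize) {
  const columns = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    columns.push(magnitudeSpectrum(samples.slice(start, start + frameSize)));
  }
  return columns;
}

// Hypothetical test signal: a sine wave with 4 cycles per 32-sample frame
const samples = Array.from({ length: 64 }, (_, t) =>
  Math.sin((2 * Math.PI * 4 * t) / 32)
);
const image = spectrogram(samples, 32);
```

Each column of image holds the strength of every frequency bin in one slice of time, so a spoken word becomes a two-dimensional pattern that an image-style classifier can learn.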
Case Study
The following code demonstrates a simple game. I've trained the model on two Turkish words, "sol" (left) and "sağ" (right). The circle on the canvas tends to move right, and the challenge is to keep it within the canvas. Otherwise, you fail. This can work as a game for improving Turkish pronunciation, but it could be used for any language: as long as you train the model with sufficient data, it can adapt to anything.
Even though I've trained the model with just three classes (one for background noise and one for each of the two words, with 20 samples each), it still works. Some incorrect detections add an interesting twist to the game, almost as if the computer is trying to challenge the player.
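If you would rather reduce the effect of occasional incorrect detections than keep them as a twist, one common approach is to smooth the predictions with a majority vote over the last few labels. The sketch below is one possible way to do it and is not part of the original game; createLabelSmoother and the window size of 5 are hypothetical choices.

```javascript
// Returns a function that remembers the last `size` labels
// and reports the most frequent one among them.
function createLabelSmoother(size) {
  const recent = [];
  return function smooth(label) {
    recent.push(label);
    if (recent.length > size) {
      recent.shift(); // drop the oldest label
    }
    // Count how often each label appears in the window
    const counts = new Map();
    for (const item of recent) {
      counts.set(item, (counts.get(item) || 0) + 1);
    }
    // Start from the newest label so that ties favor it
    let best = label;
    for (const item of recent) {
      if (counts.get(item) > counts.get(best)) {
        best = item;
      }
    }
    return best;
  };
}

// Hypothetical usage: smooth each label before acting on it
const smooth = createLabelSmoother(5);
const smoothed = ["sol", "sol", "Background Noise", "sol"].map(smooth);
```

With this window, a single stray "Background Noise" between several "sol" results no longer changes the reported label. The trade-off is a short delay before a genuine change of word takes effect.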
Source Code index.html
<html>
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Sound classification using p5.js</title>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.9.4/p5.min.js"></script>
    <script src="https://unpkg.com/ml5@1/dist/ml5.min.js"></script>
  </head>
  <body>
    <script src="sketch.js"></script>
  </body>
</html>
Source Code sketch.js
// Holds the ml5 sound classifier
let classifier;
// Placeholder status text (not used below)
let label = "Model loading...";
// URL of the Teachable Machine sound model
let soundModelURL = "https://teachablemachine.withgoogle.com/models/XicRv7--q/";
// Labels of interest (not used below)
let words = ["sol", "Background Noise"];
// Horizontal position of the circle
let xx;
// Most recent label predicted by the model
let predictedWord = "";

function preload() {
  // Classifier options; the default probabilityThreshold is 0
  let options = { probabilityThreshold: 0.5, overlapFactor: 0.05 };
  // Load the Teachable Machine sound model with these options
  classifier = ml5.soundClassifier(soundModelURL, options);
}

function setup() {
  createCanvas(650, 450);
  // Classify the sound from the microphone in real time
  classifier.classifyStart(gotResult);
  // Start the circle in the middle of the canvas
  xx = width / 2;
}

function draw() {
  background(250);
  // Show the instructions
  displayWords();
  // Move the circle left on "sol"; otherwise it drifts right
  if (predictedWord === "sol") {
    xx = xx - 1;
  } else {
    xx = xx + 1;
  }
  // Game over once the circle leaves the right edge:
  // stop classifying and stop the draw loop after this frame
  if (xx > width) {
    classifier.classifyStop();
    predictedWord = "Pronounce better";
    noLoop();
  }
  // Once the model outputs results, display the predicted word
  if (predictedWord !== "") {
    fill(211, 107, 255);
    textAlign(CENTER, CENTER);
    textSize(64);
    text(predictedWord, width / 2, 90);
  }
  fill(200, 0, 0);
  circle(xx, height / 2, 50);
}

// Display the instructions at the top of the canvas
function displayWords() {
  textAlign(CENTER, CENTER);
  textSize(22);
  fill(96);
  text('Say "sol" in Turkish to keep the circle on the canvas', width / 2, 40);
}

// Called every time the classifier produces results
function gotResult(results) {
  // The results are in an array ordered by confidence
  console.log(results);
  // Store the top label so that draw() can display it
  predictedWord = results[0].label;
}
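As a possible refinement of gotResult, the helper below treats the background class and low-confidence predictions as "no command". It is a sketch under assumptions rather than part of the original game: pickCommand, the 0.75 cutoff, and the sample data are hypothetical, and it assumes each result has a label and a confidence field and that the background class is named "Background Noise", as in the words array above.

```javascript
// Returns the top label only if it is a real command
// predicted with enough confidence; otherwise returns null.
function pickCommand(results, minConfidence, noiseLabel) {
  if (!results || results.length === 0) {
    return null; // nothing was predicted
  }
  const top = results[0]; // results are ordered by confidence
  if (top.label === noiseLabel) {
    return null; // background noise is not a command
  }
  if (top.confidence < minConfidence) {
    return null; // too uncertain to act on
  }
  return top.label;
}

// Hypothetical data in the shape logged by console.log(results)
const exampleResults = [
  { label: "sol", confidence: 0.91 },
  { label: "Background Noise", confidence: 0.09 },
];
const command = pickCommand(exampleResults, 0.75, "Background Noise");
```

Inside gotResult, you could call pickCommand(results, 0.75, "Background Noise") and update predictedWord only when the returned value is not null.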
Here is another example, trained during class with the voices of the people in the room. The supported colors are Kırmızı (red), Mavi (blue), Mor (purple), Pembe (pink), and Yeşil (green). Choose a color and say its name to check whether the app recognizes it.
Link to P5JS Code - Section A
The following example was trained in Section B with the names of everyday objects. You can check the accuracy of the model by saying computer, mirror, or chair.
Link to P5JS Code - Section B