This is a tutorial on understanding and teaching machine learning with mBlock 5.
This section explores how to introduce artificial intelligence concepts to beginners, such as primary and middle school students, and how to design activities and problems that help learners think about and experience artificial intelligence. "Knowledge Point" and "Think About It" items are provided throughout the article as references for teachers preparing teaching plans.
This work is licensed under the Creative Commons Attribution 4.0 International license agreement.
The latest version of mBlock, mBlock 5, provides a deep learning visual model. It is an important tool for experiencing, learning, and creating visual AI applications. On mBlock 5, you can quickly train a machine learning model to be used in Scratch programming. mBlock 5 can also be used in combination with hardware such as robots to create rich interactive effects.
Note: Using the deep learning feature requires a camera and a computer with sufficient performance. The feature is still experimental. Before teaching a class, check whether the computers in the classroom can run it smoothly.
Open the following page in your browser and select the mBlock 5 version that matches the operating system on your computer.
Click the Panda sprite to select it.
Then, click + extension to open the Extension center.
In the Extension center that pops up, click + Add at the bottom of Teachable Machine.
WHY SWITCH TO PANDA? Because in mBlock 5, extensions are sprite-related; an Arduino or MicroPython robot cannot run complex machine-vision models, and therefore the Teachable Machine extension is not provided for devices. What if you want to use it to control a robot? You can use the communication variables to communicate with the robot. But that's another topic.
Here is an example of how to use a machine learning visual model with a "rock-paper-scissors" program.
Generally speaking, the use of a machine learning model includes three steps, namely training, testing, and application. If you want to create an application that can identify the rock, paper, scissors gestures, first you need to teach the computer what a rock is, what scissors are, and what paper is.
Knowledge Point: The machine learning model we're going to build takes an input (say, a picture) and classifies it (for example, as "scissors", "rock", or "paper"). Such a model is also called a classifier. Classifiers are very useful in our daily life. For example, parking lots use them to record license plate numbers: a camera photographs the license plate, and the classifier sorts each digit into one of the categories 0 to 9.
After adding the extension, in the new TM category, click Training model to train the model.
If your computer has multiple cameras, you can choose which one to use in the upper left corner of the screen.
After selecting the camera, make a "Rock" motion with your hand, then hold down the Learn button on the first category item with the mouse to let the machine learning model know what a "Rock" is. When the number of samples is more than 10, release the mouse to finish the learning.
Knowledge Point: How does a machine learn what is "Rock" and what is "Scissors"? Someone has to show it pictures and tell it, "This is rock, and this is scissors." The pictures we take are called training samples, and attaching the information "this is rock" to a sample is called tagging (or labeling). Many large companies hire people to do the tagging: they tell a computer all day that "This is the cat's left eye." This is also why large Internet companies such as Baidu or Google often take the lead in machine learning: people use their products every day, providing them with lots of training samples.
Knowledge Point: The machine learning model we are going to create is a supervised learning model. As the name suggests, the machine learns under the supervision of the labeled samples we provide. Another type of learning is called "unsupervised learning," in which the computer sorts, for example, photos of fruits that carry no tagging information. The result is usually that "these photos belong to one category" and "those photos belong to another".
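For teachers who want to show the idea of "learning from labeled samples" in text code, here is a minimal sketch. It is not how mBlock 5 works internally: it assumes each hand picture has already been reduced to two made-up numbers (fingers extended, and how open the hand is), and it uses a simple nearest-neighbor rule. The sample values and the names TRAINING and classify are hypothetical.

```python
import math

# Hypothetical labeled training samples: (features, label).
# The features are invented for illustration:
# (number of extended fingers, hand openness from 0 to 1).
TRAINING = [
    ((0, 0.1), "Rock"),
    ((0, 0.2), "Rock"),
    ((2, 0.4), "Scissors"),
    ((2, 0.5), "Scissors"),
    ((5, 0.9), "Paper"),
    ((5, 1.0), "Paper"),
]


def classify(features, samples=TRAINING):
    """Return the label of the training sample closest to `features`."""
    nearest = min(samples, key=lambda sample: math.dist(sample[0], features))
    return nearest[1]


# A new, unlabeled input gets the label of its most similar training sample.
print(classify((4, 0.8)))
```

The labels in TRAINING are what make this supervised: without them, the program could only group similar samples together, not name the groups.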
Think About It: What are the uses of supervised learning? What are the uses of unsupervised learning?
In the same way, train the model to recognize "Scissors" and "Paper". Change the category names to Rock, Scissors, and Paper.
Then, switch between different gestures in front of the camera, and you can see that the percentage shown for each category keeps changing. Each percentage indicates how confident the computer is that the image it sees belongs to that category. The category with the highest confidence is the computer's classification result.
Knowledge Point: What is "confidence"? Confidence is a score that the trained machine learning model calculates for each category when it receives a new input. Although it is displayed as a percentage, it should not be read as the probability of a random event. It only shows which category the model favors and helps the model decide on a result.
Have a Try: Try a few rock-paper-scissors motions, and get other students to try them too. Can the computer classify the movements accurately? Try it 30 times and write down on paper how many times the computer got it right and how many times it got it wrong. By dividing the number of correct model judgments by the number of attempts, we get the model's accuracy, which tells us how effective the machine learning model is. The effectiveness of a machine learning model is called its performance.
Think About It: Compare your models with other groups'. Whose model works better? Why? Can you try to improve your model's performance? Is it better with more samples? Are there other ways to improve model performance?
Knowledge Point: Underfitting and overfitting. If just a few samples are provided, the computer will fail to learn the characteristics of a thing, like the blind men touching an elephant, resulting in poor performance. This is called underfitting. Overfitting is the opposite: the machine learning model fits the given samples too closely and fails to respond correctly to general situations. For example, it can recognize one person's gesture at a particular angle as "scissors" but cannot tell that another person's hand gesture is also "scissors". Overfitting is like students who learn by rote and cannot apply what they learn to a changing environment.
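The calculation can be written down as a short sketch. The function name accuracy and the tally of 24 correct results out of 30 are made up for illustration; replace them with the numbers from your own paper record.

```python
def accuracy(correct: int, attempts: int) -> float:
    """Return the fraction of attempts the model classified correctly."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= correct <= attempts:
        raise ValueError("correct must be between 0 and attempts")
    return correct / attempts


# Hypothetical classroom tally: 24 correct judgments in 30 attempts.
score = accuracy(24, 30)
print(f"Accuracy: {score:.0%}")
```

An accuracy of 1.0 means every attempt was judged correctly, and 0.0 means none were.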
After the training is complete, click Use the model so that you can write programs with the model in mBlock 5. You can click Build a new model to clear the current model and train a new one.
After you click Use the model, three blocks are added to the TM category: recognition result, confidence of ( ), and recognition result is ( )?
recognition result is the label of the object the computer thinks it sees. It's a string, for example, Rock or Scissors.
confidence of ( ) is the confidence of the computer in a certain category. It's a decimal number between 0 and 1, corresponding to the percentage you saw in the training window. You can use it to make more specific judgments. For example, when the confidence for every category is less than a certain number, you can decide that the computer has not reached a conclusion.
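The "no conclusion" idea can be sketched in text code as follows. This is only an illustration of the logic, not an mBlock 5 API: the function decide, the 0.6 threshold, and the confidence values are assumptions. In mBlock 5 you would build the same logic from the confidence of ( ) block and an "if" block.

```python
from typing import Dict, Optional


def decide(confidences: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return the most confident category, or None if no category
    reaches the threshold (the computer has not reached a conclusion)."""
    best = max(confidences, key=confidences.get)
    if confidences[best] < threshold:
        return None
    return best


# Hypothetical confidence values, as decimals between 0 and 1.
print(decide({"Rock": 0.91, "Scissors": 0.06, "Paper": 0.03}))
print(decide({"Rock": 0.40, "Scissors": 0.35, "Paper": 0.25}))
```

A higher threshold makes the program more cautious: it answers less often, but is less likely to react to a half-formed gesture.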
recognition result is ( )? is used to get a result of yes or no. You can place it in the "if" statement block as a condition for program execution.
Note: Before using a model, you need to click Open recognition window. The computer uses a trained machine learning model only when the recognition window is open. Machine learning models are resource intensive. Turning them on may slow down the computer. Turn them on only when you need to use them.
Knowledge Point: How do you make machine learning models work faster? A lot of people use graphics processing units (GPUs) for computing. Central processing units (CPUs) are often called the brain of computers. Unlike CPUs, GPUs are dedicated to graphics computing and are typically placed on a graphics card to make video games look cool. Games look so cool because a GPU can perform linear algebra computations (linear algebra is a college math subject) very quickly. The speed of these computations determines how fast position, angle, and light and shadow in 3D space can be calculated (which determines how smoothly the game runs). Similarly, machine learning models use a large amount of linear algebra computation, and therefore many people use GPUs (graphics cards) to speed up machine learning computation. Nowadays, some mobile phone makers in China use their self-developed chips for machine learning. In this way, their cameras can quickly identify the object being photographed and process the images to make them more beautiful.
Knowledge Point: Training a model requires far more computing power than using a model. As a result, many people train a model on many computers working together (which is called a computing cluster) or on supercomputers, and then run the trained model on small devices, such as mobile phones. The computing power of China's Tianhe series of supercomputers is among the best in the world. Search for "Tianhe-2" to learn more about them.
Now, it is time for Scratch programming. Think about how you can use this model. Here's a simple (but boring) example. Try to make it interesting.
Have a Try: Use other blocks of mBlock 5 in combination with the TM blocks to see what you can create. If you are confident, you can write a program that lets the computer play rock-paper-scissors against you. You can also think about what else you can teach a computer to recognize.
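As a starting point for the game, the rules can be sketched in text code like this. It is a hypothetical outline rather than an mBlock 5 program: player_move stands in for the value of the recognition result block, and the names BEATS, judge, and play_round are invented for this example. The same logic can be rebuilt with "if" blocks and the pick random block.

```python
import random

# Each move maps to the move it defeats.
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}


def judge(player_move: str, computer_move: str) -> str:
    """Decide the outcome of one round from the player's point of view."""
    if player_move == computer_move:
        return "draw"
    if BEATS[player_move] == computer_move:
        return "player wins"
    return "computer wins"


def play_round(player_move: str, rng=random):
    """Let the computer pick a random move and judge the round."""
    computer_move = rng.choice(list(BEATS))
    return computer_move, judge(player_move, computer_move)


# "Rock" stands in for whatever gesture the model recognized.
computer_move, outcome = play_round("Rock")
print(f"Computer played {computer_move}: {outcome}")
```

Storing the rules in a single table keeps the judging code short and makes it easy to add more gestures later.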
Think About It: Besides license plate recognition, what else can we use machine learning models to do?
Knowledge Point: The machine vision model of mBlock 5, adapted from Google's Teachable Machine, uses a technique called a Convolutional Neural Network (CNN). A CNN allows regions of an image to be abstracted to a higher level. With their excellent performance, CNNs have made neural networks and machine learning popular again (the concept of neural networks first appeared in 1943). The technique also allows mBlock 5 to achieve high machine learning performance with a small sample size (a dozen to a few dozen samples).
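To give a feel for how a CNN turns a region of an image into a higher-level feature, here is a minimal sketch of the sliding-filter operation at the heart of a convolutional layer. It is not the mBlock 5 implementation: the tiny 4x4 image and the hand-picked edge filter are assumptions for illustration, whereas a real CNN learns its filter values during training and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return
    the sum of element-wise products at each position."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A tiny grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked filter that responds where brightness rises from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

for row in convolve2d(image, kernel):
    print(row)
```

Each output row is [0, 2, 0]: the filter stays silent in the flat regions and responds only where the dark-to-bright edge is, so the output describes "where the edge is" rather than individual pixels.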