Monday, June 14, 2010

Explanation of my Thesis

Ever heard of an image recognition system? Well, it's a system that recognizes what is in an image, up to the level of object and facial recognition. Honda's ASIMO is one example that uses facial detection and recognition. Now, what if image recognition runs repeatedly, say 30 times per second? Then we have a visual recognition system that gives us continuous output at 30 fps.
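To picture the idea, here's a toy sketch in Python. The `recognize_frame` stub and the dummy frame data are placeholders I made up for illustration, not a real detector; the point is only that running recognition once per frame turns a still-image system into a continuous one.

```python
def recognize_frame(frame):
    # Placeholder classifier: a real system would run object/face
    # detection here. This stub just labels any non-empty frame.
    return "face" if sum(frame) > 0 else "none"

def video_recognition(frames):
    """Run the per-image recognizer on every frame of the stream.

    At 30 fps, one second of video means 30 recognition passes,
    giving one label per frame, i.e. continuous output.
    """
    return [recognize_frame(frame) for frame in frames]

# 30 dummy frames = one second of 30 fps video -> 30 labels
labels = video_recognition([[1, 0, 1]] * 30)
```

The loop itself is trivial; the hard part (covered later in the post) is doing the per-frame work fast enough to keep up with the camera.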

Besides that, there is something more. Ever heard of sound recognition? Yes, you can speak, and your words can be recognized by the machine. In exactly the same way, a series of continuous recognition steps gives us an audio recognition system.
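Audio works on the same principle, except the stream is samples instead of frames. A rough sketch of the front end, assuming a 44.1 kHz sample rate and 20 ms analysis frames (both numbers are just common choices, not fixed by my thesis):

```python
def frame_audio(samples, rate=44100, frame_ms=20):
    """Split a continuous sample stream into short frames.

    Each frame (here 20 ms = 882 samples at 44.1 kHz) would then be
    fed to the recognizer, so a one-second utterance becomes a
    series of 50 recognition steps.
    """
    frame_len = rate * frame_ms // 1000  # samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# One second of (silent) audio -> 50 frames of 882 samples each
frames = frame_audio([0.0] * 44100)
```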

However, is something still missing here? Yes, of course. We have an audio recognition system and a visual recognition system, but does the machine know where that audio and video come from? Is it in front of the machine, behind it, to the left, or to the right? We need a device that tells us exactly which direction the machine is facing. That directional awareness comes from combining a gyroscope and an accelerometer.
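One standard way to combine the two sensors is a complementary filter: the gyroscope is accurate over short intervals but drifts, while the accelerometer gives a drift-free (but noisy) tilt reference from gravity. A minimal sketch of one fusion step, with a typical blending factor of 0.98 (the exact value is a tuning choice, not something from my thesis):

```python
import math

def complementary_filter(angle, gyro_rate, ax, az, dt, alpha=0.98):
    """One sensor-fusion step for a single tilt axis.

    angle     : previous angle estimate (degrees)
    gyro_rate : angular rate from the gyroscope (degrees/second)
    ax, az    : accelerometer readings (gravity components)
    dt        : time since last update (seconds)
    """
    # Drift-free tilt estimate from the direction of gravity
    accel_angle = math.degrees(math.atan2(ax, az))
    # Trust the integrated gyro short-term, the accelerometer long-term
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * accel_angle

# Stationary sensor (gravity straight down): any gyro drift in the
# estimate is slowly pulled back toward the accelerometer's 0 degrees
angle = complementary_filter(10.0, 0.0, 0.0, 1.0, 0.01)
```

Run repeatedly at the gyro's sample rate, this keeps a stable heading estimate that the audio and visual systems can be referenced against.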



So now, how on earth are we going to do all of this concurrently? If you put an Intel Core 2 Duo processor inside, you are not going to get 30 fps video plus 44 kHz audio processing plus kHz-rate gyro and accelerometer output. The key lies in parallel processing, which is exactly where Intel and AMD are heading. The processor with the most true cores at the moment comes from AMD, with 12 cores. So, are 12 cores enough? No. What do we have, then? The answer is CUDA from Nvidia: with hundreds of cores inside, Nvidia has turned their graphics cards into something very useful for scientific computation, essentially a supercomputer.
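The shape of the problem is easy to show in software. Here's a toy sketch using Python's standard `concurrent.futures` to run the three sensor pipelines side by side; the three `process_*` functions are made-up stand-ins, and real hardware parallelism (CUDA, FPGA) replaces this pool in the actual system:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three sensing pipelines
def process_video(frame):
    return ("video", len(frame))

def process_audio(chunk):
    return ("audio", len(chunk))

def process_imu(sample):
    return ("imu", sample)

# Submit all three workloads at once instead of running them in turn
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(process_video, [0] * 640 * 480),
               pool.submit(process_audio, [0] * 882),
               pool.submit(process_imu, 0.5)]
    results = [f.result() for f in futures]
```

On a dual-core CPU the three pipelines end up time-slicing and miss their deadlines; with enough independent processing units, each stream gets its own hardware and all deadlines can be met.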

But do we need such an advanced, power-hungry system inside a mobile robot? Of course not. So what we can do is borrow a concept from Sony's PlayStation 3: custom integrated circuit design. But I'm not at that level yet. What next? Finally, I turned to Altera and their Field Programmable Gate Arrays (FPGAs): the best tools for low power, high bandwidth, parallel processing, and fast time to market.

So this FPGA will act as the central processor for all of these audio, visual, and directional awareness sensors. The objective is to create a complete sensing network that behaves like a human's. How is that possible? The answer lies in artificial neural networks.
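To give a flavour of what that means, here's a minimal neural network forward pass in Python. The weights, biases, and three-element "fused sensor" input vector are invented purely for illustration; a trained network would learn its own weights.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_layer, output_layer):
    """One forward pass: sensor inputs -> hidden layer -> output."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    return [neuron(hidden, w, b) for w, b in output_layer]

# Hypothetical fused sensor vector: [visual, audio, orientation]
out = forward([0.9, 0.2, 0.5],
              hidden_layer=[([0.5, -0.3, 0.8], 0.1),
                            ([0.2, 0.7, -0.5], 0.0)],
              output_layer=[([1.0, -1.0], 0.0)])
```

Every neuron is independent of its neighbours within a layer, which is exactly why neural networks map so naturally onto the parallel fabric of an FPGA.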
