Thanks to a lot of careful design, PhotoTime makes organizing your pictures intuitive and easy. Under the hood, however, there’s cutting-edge technology at work, and the advances that make PhotoTime possible are constantly evolving. This week, we wanted to speak to an expert: our own Chief Scientist, Dr. Wei Xia, who gave a talk about large-scale visual recognition at the GPU Technology Conference last week. What led him to the field of computer vision, and how did that lead him to Orbeus (PhotoTime’s parent company) and this project? And what exactly is in store for this growing technology and the applications it makes possible?
How did you get interested in computer vision?
Wei: When I was a kid, I was very interested in science fiction, and through this I became fascinated by robots and artificial intelligence. After I went to college, I joined the Media and Communication Lab at Huazhong University of Science and Technology, where I worked in computer vision, which was quite fancy at the time. I then applied as an intern and began my years of research in the field.
And what inspired you to pursue a PhD degree in computer vision?
Wei: As an intern, I did several related projects at the lab and accumulated some fundamental knowledge about computer vision. I participated as the team leader in an innovative project for undergraduates with funding of 10,000 RMB (~1,600 USD) through the Ministry of Education. Our team developed a trademark retrieval system. Although the system was quite simple and not so practical, it did make us believe that we could do something really meaningful. Pursuing a PhD seemed like a good way to fulfill this goal.
During his PhD studies at the National University of Singapore, Dr. Xia was introduced to the PASCAL Visual Object Classes segmentation challenge. PASCAL is an annual contest seeking the most advanced methods of object recognition, broken up into categories such as object classification, large-scale recognition and action recognition in still images. The experience was a real challenge. “On beginning this challenge, I knew almost nothing about segmentation,” Dr. Xia recalls. “It was tough and lonely. Most of the time I was the only person who knew where the problems were, and I didn’t yet know how to solve them. But I persevered, and thanks to the lessons I’d learned about good research, I managed to take first prize.”
How did you meet Orbeus and what made you decide to join the team?
Wei: Since I was quite interested in the possibility of applying computer vision technology to real-world problems, I kept watching the news for startups in this field during my PhD years. Before I knew anyone from Orbeus, I came across their websites and tried their APIs. A few months later, a friend of mine introduced me to Meng Wang, the CTO of Orbeus. We had a good talk about the field and how we could push it forward. I was excited about what they were doing, as well as our shared beliefs, so I decided to join them as a research scientist, hoping that we could come up with some real killer apps in the visual recognition field.
How do you view the work Orbeus is doing?
Wei: Orbeus is trying some very interesting things to narrow the gap between academia and real consumer products. Our API website demonstrates strong capabilities in both facial and object recognition, while the newly released app, PhotoTime, brings the combined technology of computer vision and deep learning to everyday users by solving a very urgent problem: searching and organizing photos. It has the potential to be the first killer app in this field, with a lot of possibilities.
Despite his dedication to Orbeus and its growing list of accomplishments, Dr. Xia admits that the transition from a laboratory to a startup was quite difficult at times. “In a startup, other than research, you have many more responsibilities,” he explains. “For example, you have to communicate with the product team, and sometimes even serve as an effective marketer.” He also notes a difference in focus. “The overall philosophies are different. Research has more to do with novelty; commercial enterprises demand performance. The finished product must reflect more than theoretical improvement. Some very clever engineering skills are required to make the product work reliably and efficiently.”
The GPU Technology Conference was your first show. How did you feel about it?
Wei: In a word, it was mind-blowing. I learned a lot from the keynotes and from talking with a lot of talented researchers and engineers. Since deep learning was the key topic of GTC15, the most important message I want to convey is that, although we have made great progress in this field, from the computational hardware to the algorithmic level of deep neural networks, we are still far from a real solution. Every researcher should think about the possibility of the killer app: autonomous cars, medical applications or photo-search apps. PhotoTime has taken the first step. I hope it will provide us with much more feedback and many useful hints.
Deep learning is a hot topic now. What are your expectations for visual recognition in the next three years?
Wei: Personally, with the aid of deep learning, together with the ever-increasing amount of data and computational power, I think the supervised image classification problem will be solved in the next three years. However, that does not mean the visual recognition problem will be solved entirely. For example, deep learning is still far from perfect at solving higher-level recognition problems, like object detection and segmentation. Of course, real-time video analysis is another challenge. Although I am not sure whether these problems will be solved in a short time, I do strongly believe that the coming progress will be great. The other key area where I’d like to see progress is unsupervised learning. Whether a machine can learn the basic structure of the world without human supervision is also an interesting topic, and research in this direction is still not as popular as supervised visual recognition.
Outside of the lab and the office, Dr. Xia has some of the same hobbies as anyone else. He’s got a penchant for film and loves basketball, and when asked about his biggest influences, he likes to keep things classic: “In philosophy, Albert Einstein, Bill Gates and Steve Jobs have influenced me the most. Einstein lit the fire of scientific curiosity for me when I was young. Bill Gates showed me the great possibilities that technology can bring to us, while Jobs’ famous words about combining arts and technology have always inspired me to think about technology from different perspectives.” But there’s a more personal influence, whom he credits as his greatest inspiration. “It’s my wife,” he says with a smile. “She has a very optimistic mind about life and a strong appreciation for anything beautiful, however small it may be.”