
AI pioneer emphasizes the importance of acknowledging human agency in discussions about intelligent machines.

Fei-Fei Li, a key figure in the current artificial intelligence revolution, faced skepticism from some computer scientists when she first proposed the concept of ImageNet, a massive visual database that took several years to develop. However, Li, who currently serves as a founding director at Stanford University’s Institute for Human-Centered Artificial Intelligence, has now released a memoir detailing her groundbreaking efforts in creating this dataset, which significantly advanced the field of computer vision in AI.

The book, “The World I See,” also chronicles her formative years, abruptly uprooted from China to New Jersey, and follows her through academia, Silicon Valley and the halls of Congress as the growing commercialization of artificial intelligence technology sparked public attention and backlash. She spoke with The Associated Press about the book and the current AI moment. The interview has been edited for length and clarity.

Q: Your book describes how you envisioned ImageNet as more than just a huge data set. Can you explain?

A: ImageNet is really the quintessential story of identifying a North Star problem in AI and then finding a way toward it. That North Star for me was rethinking how we can solve the problem of visual intelligence. One of the fundamental problems of visual intelligence is understanding, or seeing, objects, because the world is made up of objects. Human vision is built on our understanding of objects. And there are many, many, many of them. ImageNet was really an attempt to define the object recognition problem and also to provide a path to solving it, which is a big-data path.

Q: If I could travel back in time 15 years to when you were working on ImageNet and tell you about DALL-E, Stable Diffusion, Google Gemini and ChatGPT, what would surprise you the most?

A: It doesn’t surprise me that everything you mentioned – DALL-E, ChatGPT, Gemini – is built on big data. They are pre-trained on large amounts of data. That’s exactly what I was hoping for. What surprised me is that we got to generative AI faster than most of us predicted. Generating content is really not that easy for people. Most of us are not natural-born artists. The most natural form of generation for humans is words, because speaking is generative, but drawing and painting are not generative acts for the average person. We need the Van Goghs of the world.

Q: What do you think most people want from smart machines, and does it align with what scientists and tech companies are building?

A: I think fundamentally people want dignity and a good life. It is almost a founding principle of our country. Machines and technology should be aligned with universal human values – human dignity and a better life – including freedom and all those things. Sometimes when we talk about technology, or when we build it, whether intentionally or unintentionally, we don’t talk about this enough. When I say “we,” that includes technologists and companies, but also providers. It is our shared responsibility.

Q: What are the biggest misconceptions about artificial intelligence?

A: The biggest misconception about artificial intelligence in journalism is when journalists use “artificial intelligence” as the subject of a verb and put humans in the object position. The human factor is very, very important. We create technology, we use technology and we govern technology. The media, and public discourse under the influence of the media, talk about artificial intelligence without proper respect for human agency. We have so many articles, so many conversations, that start with “AI brings blah, blah, blah; AI does blah, blah, blah; AI delivers blah, blah, blah; AI destroys blah, blah, blah.” And I think we need to recognize this.

Q: Having studied neuroscience before getting into computer vision, how different or similar are AI processes to human intelligence?

A: Having scratched the surface of neuroscience, I have an even greater respect for the differences between them. We don’t really know the intricate details of how our brains think. We have an inkling of lower-level visual tasks, such as seeing colors and shapes. But we don’t know how people write Shakespeare, how we come to love someone, how we designed the Golden Gate Bridge. There is so much complexity in the science of the human brain that it remains a mystery. We don’t know how we do all of this on less than 30 watts, the energy the brain uses. Why are we so terrible at math when we are so quick to see, navigate and manipulate the physical world? The brain is an infinite source of inspiration for what AI should be and do. Its neural architecture (the Nobel Prize-winning neurophysiologists Hubel and Wiesel were its discoverers) was the beginning of the inspiration for the artificial neural network. We borrowed that architecture, even though it doesn’t exactly reproduce mathematically what the brain does. There is a lot of intertwined inspiration. But we also have to respect that there are a lot of unknowns, so it’s hard to answer how similar they are.
