I’m going to make an assumption that you’re aware of a book written a gazillion years ago called 1984. It predicted a seriously messed up dystopian society where everyone was constantly stressed out because they had no privacy and there were way too many dumb rules. One of the most famous concepts from that book was that “Big Brother is watching.” Wow, super creepy, right?! Well, good thing that something like that is only fiction.
Right? Uh… right?
Computer vision. It’s a real thing, and it’s… watching you already. Is this the birth of Big Brother or a tool that we can use to our advantage?
What is it?
Imagine an airport. You step off the shuttle and walk into the terminal. The line for security is long, but you’re in the fast lane today—literally. No need to pull out your ID or boarding pass. You step up to a kiosk, look into a small camera, and wait.
A second or two passes. Then… [ping.] You’re cleared to go. No overly handsy searches for you today. Not because someone recognized you, but because something did. A machine scanned your face, matched it against a database, and verified your identity—automatically, invisibly, and in a fraction of a second.
What just happened is more than just airport efficiency. It’s a glimpse into the world of computer vision—a branch of AI that’s teaching machines to see the world the way we do. Or at least, to try.
How does it work?
So what is computer vision, exactly?
At its core, it’s a way of teaching machines to interpret visual information—photos, videos, even real-time scenes from a camera. It’s not just about seeing pixels; it’s about understanding what those pixels represent. Is that a dog? A tree? A stop sign? Your face?
Just like we learn to recognize objects over time—from countless experiences and examples—machines learn too. But instead of a childhood filled with dogs and cats and picture books, they get training datasets. Sometimes millions of labeled images, each one telling the system: this is a cat, this is a car, this is a person wearing sunglasses.
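If you like seeing things in code, a labeled dataset is, at its simplest, just a list of image-and-answer pairs. Here’s a tiny Python sketch; to be clear, the filenames and labels are made up for illustration:

    # A labeled dataset boils down to (image, correct answer) pairs.
    # These filenames and labels are illustrative, not a real dataset.
    labeled_examples = [
        ("img_0001.jpg", "cat"),
        ("img_0002.jpg", "car"),
        ("img_0003.jpg", "person_wearing_sunglasses"),
    ]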
Using techniques from deep learning, especially convolutional neural networks, computer vision systems process images in layers. I talked about this in a previous episode. The first layers might detect simple features like edges and corners. Later layers identify more complex patterns—like shapes, textures, and even facial features.
It’s kind of like assembling a puzzle. The machine pieces together low-level information to form a high-level guess about what it’s seeing.
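For the curious, here’s what those stacked layers might look like as a minimal sketch in PyTorch. This is a toy model I’m inventing for illustration; the layer sizes and the name TinyVisionNet are assumptions, not anyone’s production system:

    import torch
    import torch.nn as nn

    class TinyVisionNet(nn.Module):
        # A toy convolutional classifier; all sizes here are illustrative.
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                # Early layers: small filters that respond to edges and corners
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                # Later layers: combine simple features into shapes and textures
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
            # Assumes 32x32 input images: two poolings leave an 8x8 feature grid
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)       # pixels -> low-level -> higher-level features
            x = torch.flatten(x, 1)    # one long vector per image
            return self.classifier(x)  # a score for each possible label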
And just like we get better with practice, so do these systems. The more examples they see, the better they can recognize things in new, unfamiliar images. That’s how facial recognition at the airport works: it compares your face to one it already knows and looks for a match with a high enough level of confidence.
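Here’s a rough Python sketch of that matching step. Real systems first turn each face into an “embedding,” a vector of numbers summarizing its key features; get_embedding below is a hypothetical stand-in for that trained network, and the 0.8 threshold is just an illustrative number:

    import numpy as np

    def cosine_similarity(a, b):
        # How closely two feature vectors point the same way (1.0 = identical)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def is_match(live_face, enrolled_face, threshold=0.8):
        # get_embedding is a hypothetical stand-in for a trained face network;
        # the threshold is illustrative and would be tuned carefully in practice
        return cosine_similarity(get_embedding(live_face),
                                 get_embedding(enrolled_face)) >= threshold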
But vision isn’t always about recognition. Sometimes it’s about detection—like spotting a pedestrian in front of a self-driving car. Or segmentation—like identifying the boundaries of a tumor in a medical scan. Or tracking—like following the movement of a soccer ball in a broadcast.
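To make detection concrete, here’s a short sketch using a pretrained detector from the torchvision library (assuming a recent version; the image filename is a placeholder):

    import torch
    from torchvision.io import read_image
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.transforms.functional import convert_image_dtype

    # Load a detector pretrained to spot common objects (cars, people, dogs...)
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

    # "street_scene.jpg" is a placeholder filename for this example
    image = convert_image_dtype(read_image("street_scene.jpg"))

    with torch.no_grad():
        result = model([image])[0]  # a dict with 'boxes', 'labels', and 'scores'

    for box, score in zip(result["boxes"], result["scores"]):
        if score > 0.7:  # keep only reasonably confident detections
            print(box.tolist(), float(score))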
All of this is happening because the machine has been trained to “see” in ways that are useful, fast, and, ideally, accurate.
Why does it matter?
You’ve already met one example of computer vision in the wild—airport security. But machines with eyes are everywhere once you start looking around.
Take your phone, for instance. When it unlocks with a glance, it’s scanning key features of your face—like the distance between your eyes, or the shape of your jaw. That’s facial recognition, just like at the airport, only packed into your pocket.
Or think about self-driving cars. These vehicles rely on an entire suite of cameras to interpret their surroundings in real time. Computer vision systems help them identify other cars, stop signs, lane markings, and that raccoon darting across the road with something disgusting in its mouth. Without vision, these cars are blind.
As I’ve mentioned in previous episodes, computer vision is being used to analyze scans—like X-rays, MRIs, or retinal images—to help doctors spot things like tumors or diabetic eye disease. It doesn’t replace the doctor, but it can act like a second set of eyes. And sometimes, it sees things a human might miss.
On farms, drones fly over crops and use computer vision to detect signs of disease or stress from drought conditions—helping farmers act before problems spread.
And of course, the dystopian master plan wouldn’t be complete without social media. Ever wonder how apps can tag your friends in a photo automatically? That’s computer vision doing face detection and recognition in the background. Sometimes it’s convenient. Sometimes… a little creepy. Alright, fine, super creepy.
And then there are accessibility tools. Like apps that help people who are blind or visually impaired by describing the world around them: “person standing five feet away,” “door ahead on the left,” “red light,” “other person now standing way too close to my face.”
Whether it’s diagnosing cancer or describing a room, computer vision is giving machines a kind of visual awareness. Not quite the same as ours—but useful in all kinds of ways.
Drawbacks
For all its power, though, computer vision still has blind spots—sometimes literal ones.
Let’s start with bias. These systems are only as good as the data they’re trained on. And if those datasets don’t reflect the full range of human diversity—say, in terms of skin tone, age, or lighting conditions—the results can be skewed. That’s led to some very real problems, especially with facial recognition misidentifying people of color at higher rates. In high-stakes settings like law enforcement, that kind of error isn’t just inconvenient—it’s dangerous.
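One basic sanity check, sketched below, is to measure error rates separately for each group in your test data rather than settling for one overall number. The variable names here are hypothetical:

    from collections import defaultdict

    def error_rate_by_group(predictions, truths, groups):
        # predictions/truths/groups are hypothetical parallel lists:
        # the model's guess, the right answer, and each person's group
        errors, counts = defaultdict(int), defaultdict(int)
        for pred, truth, group in zip(predictions, truths, groups):
            counts[group] += 1
            if pred != truth:
                errors[group] += 1
        # A big gap between groups is a red flag, even if overall accuracy looks fine
        return {g: errors[g] / counts[g] for g in counts}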
Then there’s context. A human might look at a stop sign partially covered by snow and still recognize it. A computer vision model might not. Or it might flag a turtle on a racetrack as a rifle. Incidentally, this actually happened once in an early experiment at MIT. Turns out, some systems can be tricked by what are called “adversarial examples”—tiny changes in an image that throw off the algorithm completely, even if a human wouldn’t notice a thing.
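If you’re curious how those adversarial tricks work, here’s the flavor of the classic “fast gradient sign” approach in a minimal PyTorch sketch. It assumes you already have a trained classifier (like the toy network from earlier), and the epsilon value is illustrative:

    import torch
    import torch.nn.functional as F

    def make_adversarial(model, image, true_label, epsilon=0.01):
        # Ask the model which direction of pixel change increases its error...
        image = image.clone().requires_grad_(True)
        loss = F.cross_entropy(model(image.unsqueeze(0)),
                               torch.tensor([true_label]))
        loss.backward()
        # ...then nudge every pixel a tiny step in that direction. Often
        # invisible to a human, but enough to flip the model's answer.
        return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()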
Privacy is another concern. Cameras are everywhere—on phones, in doorbells, in public spaces—and as computer vision gets better, it becomes easier to track and identify people without their consent. Some cities have pushed back by banning facial recognition in public spaces. Others are leaning in.
And finally, there’s the black box problem. Many computer vision systems—especially those built with deep learning—can make eerily accurate predictions without offering much insight into how they got there. That can be a tough sell in fields like healthcare or criminal justice, where understanding the “why” is just as important as the “what.”
Computer vision is powerful, but it’s not infallible. Sure, the telescreens in 1984 were creepy—but at least they didn’t auto-tag your face and upload it to the cloud.
Conclusion
Machines may not see the world the way we do—but that’s not always a bad thing. Where we see only a blurry scan, a computer might spot a tumor. When we get distracted behind the wheel, a camera might stay focused. Where we overlook the details, an algorithm might catch something vital.
Computer vision is still growing up. It’s learning—fast—but it’s also learning from us. What we choose to teach it, and how carefully we guide that learning, will shape the kind of vision we build into our world. Kinda like the difference between building R2-D2 and the killer robots from Doctor Who.
So as we train our machines to recognize faces, streets, tumors, and text, the real challenge is teaching them what matters. Not just what something is—but why it matters, and to whom.
Because the better we understand how AI sees, the better we can shape what it should see.