It’s very difficult, if not impossible, for us humans to understand how robots see the world. Their cameras work like our eyes do, but the space between the image that a camera captures and actionable information about that image is filled with a black box of machine learning algorithms that are trying to translate patterns of features into something that they’re familiar with. Training these algorithms usually involves showing them a set of different pictures of something (like a stop sign), and then seeing if they can extract enough common features from those pictures to reliably identify stop signs that aren’t in their training set.
This works pretty well, but the common features that machine learning algorithms come up with generally are not “red octagons with the letters S-T-O-P on them.” Rather, they’re looking for features that all stop signs share, but that would not be in the least bit comprehensible to a human looking at them. If this seems hard to visualize, that’s because it reflects a fundamental disconnect between the way our brains and artificial neural networks interpret the world.