I concur, and as you say, it comes from a video frame and thus from a video. The fact that the video frame contains only a single pixel seems to change nothing.
If I were to agree with this, then would you be willing to agree that the single-pixel ambient light sensor adorning many pocket supercomputers is a camera?
And that recording a series of samples from this sensor would result in a video?