Lessons Learned from Kaggle’s Airbus Challenge

lettergram · on Jan 27, 2019

One thing I always recommend when doing segmentation is first altering the color space, even when using a CNN (which should, once trained, essentially learn to perform that step itself). This is especially true of pre-trained models, where you don't know whether they have been tuned for that.

In terms of edge detection alone, you'll see nearly a 10% improvement just from shifting the color space to LAB:

https://austingwalters.com/edge-detection-in-computer-vision...

What's even better about the color space conversion is that you can use it as a pre-processing step to dramatically reduce the number of edges to search:

https://austingwalters.com/chromatags/

In the case of a blue sea, you can shift to the LAB color space and probably search only the 'A' channel, which represents green to red. The 'B' channel represents yellow to blue, so the 'A' channel is the one less likely to produce edges in the ocean.

This means you only process 1 channel, which dramatically speeds up most algorithms.
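As a rough illustration of the single-channel idea, here is a minimal sketch that converts sRGB to CIELAB with NumPy only and runs a plain finite-difference gradient on the 'A' channel. The function names (srgb_to_lab, gradient_magnitude) and the synthetic "gray ship on a blue sea" image are assumptions made up for this example, not code from the linked posts; in practice a library routine such as OpenCV's cv2.cvtColor with cv2.COLOR_BGR2LAB would normally do the conversion.

```python
import numpy as np

# Standard linear-sRGB -> XYZ matrix and D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert an (H, W, 3) uint8 sRGB image to float CIELAB (L, a, b)."""
    c = rgb.astype(np.float64) / 255.0
    # Undo the sRGB transfer curve to get linear light.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # CIE f(t): cube root above the threshold, linear segment below it.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def gradient_magnitude(channel):
    """Finite-difference gradient magnitude of a single 2-D channel."""
    gy, gx = np.gradient(channel)
    return np.hypot(gx, gy)


# Synthetic test image (an assumption for illustration): uniform blue sea
# with a small neutral-gray "ship" in the middle.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[..., 2] = 255                # pure blue background
image[3:5, 3:5] = (128, 128, 128)  # gray 2x2 block

lab = srgb_to_lab(image)
a_channel = lab[..., 1]            # only this one channel is searched
edges = gradient_magnitude(a_channel)
```

The gradient here is a stand-in for whatever edge detector you actually use; the point is only that it runs over one 2-D array rather than three.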

Just food for thought.

ska · on Jan 27, 2019

The related L*C*h space (L-star, C-star, h-star) works well too, and is easier to interpret than LAB.

These spaces benefit from being approximately perceptually linear, which also helps when any metric is computed on them.
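For reference, the cylindrical form is just a polar re-expression of LAB's a and b axes: chroma is the distance from the neutral axis and hue is the angle around it. A minimal sketch, where the function name lab_to_lch is a hypothetical helper chosen for this example:

```python
import math


def lab_to_lch(L, a, b):
    """Convert CIELAB (L, a, b) to cylindrical (L, chroma, hue in degrees)."""
    chroma = math.hypot(a, b)                        # distance from the gray axis
    hue = math.degrees(math.atan2(b, a)) % 360.0     # angle, wrapped to [0, 360)
    return L, chroma, hue
```

Lightness passes through unchanged, which is part of why the result is easier to read: "how colorful" and "which color" become separate numbers.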

CoolGuySteve · on Jan 27, 2019

Great meta article. Some other general advice concerning iterative development:

- Test your ideas on a smaller data sample at first to iterate and debug more rapidly.

- Caching your augmented data to disk may also improve iteration speed. Or it may not, depending on the tradeoffs between kernel disk cache, disk bandwidth, and compute.

- Wrap your functions in timers so you can notice when a new processing stage is unnaturally slow, rather than having to work out later what happened.

- Make sure you're actually set up to use the GPU and not accidentally running on the CPU.
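The timer advice above could look something like the following sketch, using only the standard library. The decorator name timed and the augment function are hypothetical placeholders for this example; swap in your own pipeline stages.

```python
import functools
import time


def timed(func):
    """Print and record the wall-clock time of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so slow failures are visible too.
            elapsed = time.perf_counter() - start
            wrapper.timings.append(elapsed)
            print(f"{func.__name__}: {elapsed:.3f}s")
    wrapper.timings = []
    return wrapper


@timed
def augment(batch):
    # Hypothetical stand-in for a real augmentation stage.
    return [x * 2 for x in batch]


result = augment([1, 2, 3])
```

Keeping the timings in a list on the wrapper makes it easy to compare a stage before and after a change instead of relying on scrollback.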