If the tag-line of this post doesn’t make any sense, then you might not be up to speed with the current trend in deep learning: the transformer. I have been aware of the transformer architecture for a while now. I spend time at work exploring large language models (LLMs) like BART and T5, and, like everyone else, I have seen OpenAI's impressive GPT-3 demo. So over Christmas I decided it was time to dig into the details and truly get up to speed.
The transformer was originally designed for NLP: the original paper applied it to sequence-to-sequence machine translation. However, I’m not an NLP expert. I also know transformers are data hungry, and I have easy access to a lot of satellite imagery (about 300GB). Fundamentally, the attention / transformer architecture just operates on sequences, and there has already been work applying it to images, e.g. ViT (Vision Transformer). So why not try using transformers for weather forecasting?
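To make "images as sequences" concrete, here is a minimal sketch of the ViT-style first step: cutting an image into non-overlapping square patches and flattening each patch into a vector, so the image becomes a sequence of tokens. This is an illustration, not code from this project; the function name `image_to_patch_sequence` is hypothetical, and it assumes the image height and width are divisible by the patch size. In ViT proper, each flattened patch is then passed through a learned linear projection and given a position embedding before entering the transformer; those steps are omitted here.

```python
import numpy as np

def image_to_patch_sequence(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. one row per "token" that a transformer could attend over.
    (Hypothetical helper for illustration.)
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    # (H/p, p, W/p, p, C): split each spatial axis into (grid index, offset within patch)
    patches = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    # Group the two grid axes together: (H/p, W/p, p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into the sequence axis and each patch into a single vector
    return patches.reshape(-1, patch_size * patch_size * c)

# Example: a 64x64 single-channel image (say, one satellite band) with 16x16 patches
image = np.random.rand(64, 64, 1)
sequence = image_to_patch_sequence(image, patch_size=16)
print(sequence.shape)  # (16, 256): 16 tokens, each a flattened 16x16 patch
```

Patches are ordered row by row across the image grid, so the position of each token in the sequence still encodes where it came from spatially; that ordering is what a position embedding would later make explicit to the model.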