ChatGPT and Other Transformers: How to Select a Large Language Model for Your NLP Projects

Three types of transformers: Encoder model, decoder model, and sequence-to-sequence model

Alina Zhang


All images in the article were created by the author

We are in the golden age of natural language processing. Since the birth of GPT and BERT in 2018, a gang of transformers has emerged and gradually become the workhorse of the industry. The star players among large language models are shown in the following figure:

Star players of large language models.

But how do you know which language model will perform best on your task? How do you choose the best model for your NLP project?

Common NLP tasks include text classification, text summarization, question answering, text generation, named entity recognition, sentiment analysis, language modeling, and translation. To decide which model is the best candidate for a given task, we first need to answer three questions:

  1. Why are transformers so powerful?
  2. What are the three types of transformers?
  3. What are the advantages of different types of large language models?

Why transformers like GPT, BERT, and BART are so powerful — standing on the shoulders of giants

Whether your father is a scientist or a high school dropout, you have to start learning from scratch, because you cannot inherit knowledge directly from his brain or anyone else's.

But what if your father's knowledge were transferred to you in full the moment you were born? What if you were a one-month-old scientist? That would save much of the time humans spend on education and accelerate the development of science and technology.

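A transformer is a deep learning model with a large number of layers. The combination of the attention mechanism, parallelizable computation, and transfer learning makes the transformer a powerful tool. Of the three, transfer learning is the most distinctive: like the infant scientist imagined above, a model that has already been pretrained on a large body of text carries that knowledge over to a new task instead of learning from scratch.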
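To make the attention mechanism a little more concrete, here is a minimal sketch in plain Python. It assumes the scaled dot-product form of attention from the original Transformer paper, which the article does not spell out. The function names and the toy 2-dimensional token vectors are made up for illustration and do not come from any particular library.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query, plus the attention weights.

    queries, keys: lists of vectors of the same length d_k
    values: list of vectors, one per key
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example (hypothetical data): three tokens with 2-dimensional embeddings.
# In self-attention the same vectors serve as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each pass through the loop over queries is independent of the others, so real implementations replace the loop with a single matrix multiplication. That independence is what makes the computation parallelizable.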