Microsoft and Nvidia team up to train one of the world’s largest language models

Microsoft and Nvidia today announced that they trained what they claim is the largest and most capable AI-powered language model to date: Megatron-Turing Natural Language Generation.

The successor to the companies’ Turing NLG 17B and Megatron-LM models, MT-NLG contains 530 billion parameters and, Microsoft and Nvidia say, achieves “unmatched” accuracy across a broad set of natural language tasks, including reading comprehension, commonsense reasoning, and natural language inference.

When benchmarked, Microsoft says, MT-NLG can infer basic mathematical operations even when the symbols are “badly obfuscated.” While not extremely accurate, the model seems to go beyond memorization for arithmetic and manages to complete tasks containing questions that prompt it for an answer, a major challenge in NLP.

It’s well established that models like MT-NLG can amplify the biases in the data on which they were trained, and indeed, Microsoft and Nvidia acknowledge that the model “picks up stereotypes and biases from the [training] data.” That’s likely because a portion of the dataset was sourced from communities with pervasive gender, race, physical, and religious prejudices, which curation can’t completely address.

