AI can write just like me… I’ve seen how OpenAI’s GPT2 system can produce a column in my style. Guardian journalist

In February 2019, the non-profit research company OpenAI, co-founded by Elon Musk and backed by tech investors Reid Hoffman and Peter Thiel, revealed a language modelling program called GPT-2. The system was trained on around 40GB of internet text gathered from the most up-voted links on the social media site Reddit.

The AI system works by predicting the next word after a given sequence of words. It is so good at this that it not only obtained high scores on benchmark tests but could also “adapt to the style and content of the conditioning text”, allowing “the user to generate realistic and coherent continuations about a topic of their choosing”.
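To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not how GPT-2 works internally: GPT-2 is a large neural network, while this toy simply counts which word most often follows another in a tiny made-up corpus. The function names and the corpus are illustrative assumptions, not anything taken from OpenAI.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_text(model, prompt, max_words=5):
    """Extend the prompt one predicted word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = predict_next(model, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


# Toy corpus (an assumption for illustration; GPT-2 used about 40GB of text).
corpus = (
    "the unicorns spoke perfect english . "
    "the unicorns lived in a remote valley . "
    "the researchers found the unicorns in a valley ."
)
model = train_bigram(corpus)
print(continue_text(model, "the researchers", max_words=3))
```

Even this crude counter produces a plausible continuation of a short prompt. The gap between it and GPT-2 lies in how much context the model takes into account and how much text it has learned from.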

The OpenAI team was so impressed by the system's ability to generate coherent and flexible texts that they took a step that is rare in the open-source community: they decided against releasing the complete program, opting instead to release a limited version of the system.

Impressive samples of AI content

In their blog post introducing the system, OpenAI provided several samples of the program at work. On feeding their AI program the following input:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English,

the AI system continued the story by producing the following text after 10 attempts:

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science. Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved. Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow. Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

AI text generator that is too good to be released

After publication, OpenAI's work received wide coverage, with some alarming headlines: “OpenAI built a text generator so good, it’s considered too dangerous to release” (TechCrunch), “AI can write just like me. Brace for the robot apocalypse” (Guardian).

Although the OpenAI system still suffers from problems such as excessive repetition and abrupt changes of topic, the latest breakthrough nevertheless indicates that text production could at some point share the fate of some other human skills: AI could become superior at it, much as it has bettered humans at complex games such as Go and Dota 2.

This makes the implications of AI content generation in the near future worth exploring.

AI content production – parallels with Neural Machine Translation

To explore what may happen when improved AI models enter the content production industry, we can turn to another field that has already undergone part of a similar transition: translation services.

Translation was long thought to be impervious to automation because of the complex nature of human language. This, however, did not stop the translation industry from being one of the first fields to look to machines for automation, with the first major efforts dating back to the 1960s.

Despite gradual improvements in machine translation over the subsequent decades, the major breakthrough was the recent success of so-called neural machine translation (NMT) systems, which use neural networks and vector representations of words and have replaced the previous statistical methods. NMT systems are already used by Facebook and Microsoft in their translation products.

General-purpose NMT systems like those of Google and DeepL often achieve near-human performance. Even better results, both in practice and on measures such as the BLEU metric, are achieved by specialized NMT models trained on domain-specific data.
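For readers unfamiliar with BLEU, the score measures how many word sequences (n-grams) in a machine translation also appear in a human reference translation, with a penalty for output that is too short. The following is a simplified sketch, assuming a single reference, whitespace tokenization and n-grams up to length 2. Standard BLEU is computed over a whole corpus with n-grams up to length 4, so this function (its name `simple_bleu` included) is an illustration and not a substitute for an established implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip so a repeated n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of precisions times a brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * geo_mean


score = simple_bleu("the cat sat on a mat", "the cat sat on the mat")
print(round(score, 3))  # 0.707
```

A score of 1.0 means the candidate matches the reference exactly. Domain-specific models tend to score higher because they reproduce more of the terminology and phrasing a human translator in that field would use.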

The emergence of AI text quality editors?

These developments have led to the increasing use of NMT systems in translation services: texts are first translated by an NMT program and then checked by a human editor, with the workflow usually integrated into computer-assisted translation (CAT) tools.

While the quality of AI content production systems is not yet high enough for general use in production environments, their eventual introduction may follow a path similar to that of translation services, with content first produced by AI models and subsequently checked by human “content quality editors”.

Implications for Digital Marketing and SEO practitioners

High-quality content is consistently cited as one of the major ranking factors for search engines, and content was an important focus of Google's Penguin and Hummingbird updates.

AI content generation models could be exploited to generate spammy content, and indeed this was one of the reasons why OpenAI withheld the release of its full GPT-2 model. However, such mass-produced, unaltered texts can be detected fairly easily: the MIT-IBM Watson AI Lab and HarvardNLP have already introduced a tool that detects text generated by the GPT-2 model.

Greater potential could lie in using AI-generated texts as a first step, similar to NMT translations, with humans adding the final touch. For the business-minded among you, there is the added benefit that AI can deliver text quickly and at almost zero cost.

Future AI programs may also genuinely surprise us with their quality, as has happened before.

When observing the games of Go played by the AI program AlphaGo, experienced Go players marveled at patterns of play so unique that they had not been seen in more than 2,000 years of human Go playing.

This leads us to wonder about the potential of AI content production.

What if AI content generation models could one day provide us with digital marketing texts that surpass those written by humans?

What if, at some point in the future, the best copywriters turn out to be AI programs?

Or maybe the best results will be achieved by humans working in tandem with AI programs.