25 October 2018
#explore is the online magazine of the TÜV NORD GROUP. A six-member team looks at themes relating to digitalisation, mobility, virtual reality, industry 4.0, security testing, innovation, networking, risk analysis, quality assurance, artificial intelligence and IT. To date, 102 articles have been published in English and German under four headings. Among the most widely read articles are “What is the difference between safety and security?”, “Security comes first in the Safari Park” and “The range and practicality of electric cars have improved significantly”. 34 terms from A to Z have so far been explained by the glossary. #explore has been online for 1,051,200 minutes. And is now two years old. Happy birthday!
We didn’t write this birthday greeting ourselves. A computer wrote it for us. To be more precise, a program for automatic text generation. Such systems are now being used more and more often: the US news agency AP, for instance, uses software which generates about 4,000 standardised sports and financial reports each quarter. FOCUS Online has been posting automatically generated weather news for all the major cities and regions in Germany every day for the last two years. Since May 2018, the news portal has also been getting its financial news from machines. And the German newspapers the Stuttgarter Zeitung and the Stuttgarter Nachrichten publish an automatically generated particulates radar map every day.
“Fact-based routine texts” are one area in which machines can keep up with human writers - and far outperform them when it comes to the speed of generation, explains Sebastian Golly, head of text generation at Berlin-based company Retresco. These are texts whose information content is based on large quantities of data and which use fixed forms and structures. Just like sports news or weather reports. But also product descriptions or real estate sales particulars for online portals.
Thousands of match reports in one single weekend
The idea is for machines not to replace human journalists or text composers but instead to relieve them of the burden of routine tasks. They are also supposed to help editors and publishers to expand their range without high additional costs and thereby to attract new target groups. For instance, machines can be left to write reports about football matches from the lower leagues which editors would not otherwise have the capacity to cover. “Machines can easily write texts for 70,000 matches on a single weekend and, for example, also cover youth leagues which might have only a few hundred followers,” Golly reports. Automatic text generation is also intended to make it possible to tailor reports to individual readers: The stock exchange report is generated at the moment the user accesses the site - based on current data and the user’s interests.
“Machines can easily write texts for 70,000 matches on a single weekend and, for example, also cover youth leagues which might have only a few hundred followers.”
At the same time, the robotic text generators aren’t miracle machines which you can feed with a few bits of data and which then spit out a perfect text. If they are to have any idea of what a football report is all about, they first of all have to be trained. One possible way is machine learning: “If there’s already a large collection of texts and data on which these texts are based, then we can use an automated learning mechanism to instruct the system in formulations that match particular data constellations,” explains computer linguist Golly. If there aren’t yet the large numbers of sample texts and associated data needed for a specific kind of text, some manual work is required first. For this purpose, the computer linguists and developers from Retresco work with sports journalists or meteorologists. “In collaboration with these experts, my colleagues teach the system the formulations and define the conditions under which these formulations will be appropriate,” explains Golly.
Journalism school for computers
Manual intervention was also necessary for our #explore birthday text. First of all, we collated all the data that was supposed to appear in the text in an Excel spreadsheet, which we then uploaded into a Retresco text generation system. We then started off by writing our own text. We formulated statement sentences and told the system at which points it should access which data and insert it into the sentence. This allowed the system to make causal connections between statements. In addition, we created sentence variations and defined synonyms for individual terms. We then clicked on the button, and, lo and behold, the text was generated. It does, of course, sound just as we wrote it in advance, because our little test doesn’t really do the system full justice. After all, automatic text generation isn’t designed to generate a single original text. Its function is instead to use data and predefined text modules to produce varied, grammatically correct and readable texts concerning recurring events such as football matches.
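The workflow described above - statement sentences with data slots, sentence variations and synonyms for individual terms - can be pictured with a minimal sketch in Python. All names and data here are hypothetical illustrations, not Retresco’s actual system:

```python
import random

# Hypothetical data, as it might come from the uploaded spreadsheet
DATA = {"articles": 102, "terms": 34, "age_years": 2}

# Synonyms defined for an individual term
SYNONYMS = {"published": ["published", "posted", "released"]}

# Sentence variations with slots pointing at the data
TEMPLATES = [
    "To date, {articles} articles have been {published}.",
    "So far, {articles} articles have been {published}.",
]

def generate(data, rng=random.Random(0)):
    template = rng.choice(TEMPLATES)             # pick one sentence variation
    synonym = rng.choice(SYNONYMS["published"])  # vary an individual term
    return template.format(published=synonym, **data)

print(generate(DATA))
```

Each call picks a variation and a synonym, so repeated generation over the same data yields varied but factually identical sentences - the core idea behind “varied, grammatically correct and readable texts”.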
But if machines are to have the slightest idea of what to say and when, they need human support. The challenge for the computer linguists and developers at Retresco, then, is this: Sports journalists have a feeling for which formulation fits where and when. “If we’re going to be able to teach a computer, we need to formalise this feeling and translate it into specific rules,” Golly explains. If, for instance, it should emerge from the data that a goal was struck from some distance, the system can use the formulation which says that player XY “hammered the ball in between the posts”. “And if we know that the player had come off the bench shortly beforehand, we can refer to him as the joker in the pack,” adds Sebastian Golly. If such detailed information is missing, the text engine can draw on more neutral formulations accordingly.
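The rules Golly describes - a formulation is only appropriate under certain data conditions, with a neutral fallback when detail is missing - amount to conditional formulation selection. A minimal sketch, with made-up field names and thresholds purely for illustration:

```python
# Hypothetical sketch: conditions over the match data decide
# which formulation fits; a neutral phrase is the fallback.
def describe_goal(event):
    if event.get("shot_distance_m", 0) >= 25:
        phrase = "hammered the ball in from distance"
    elif event.get("substituted_on_recently"):
        phrase = "proved the joker in the pack"
    else:
        phrase = "scored"  # neutral formulation when detail is missing
    return f"{event['player']} {phrase}."

print(describe_goal({"player": "XY", "shot_distance_m": 28}))
# → "XY hammered the ball in from distance."
print(describe_goal({"player": "AB"}))
# → "AB scored."
```

The editorial craft lies in formalising a journalist’s feeling for which phrase fits where into exactly such conditions - which is why sports journalists and computer linguists define the rules together.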
© iStock | The first “robot message” was published by the Los Angeles Times in 2014. The computer-generated text reported on a small earthquake.
Teaching text robots to see
The robot journalists are thus already capable of using colourful formulations. But the world of sense perception has so far always been a closed book to them: here, a human eye is still needed. “We generate live property particulars for online real estate portals, for instance - here, all the facts concerning this property have already been couched in glowing language and written down. But things that can’t be derived from the data, such as the pretty brick facade, for example, are added in by the human advertiser,” Golly explains. The idea is for computers to increasingly learn to see. Or, to be more precise, to derive information from photos and videos. The experts from Retresco are working on a research project to this end. “If we generate product descriptions for shoes, for example, we rely on standard data such as the colour or type of the shoe. But if the heel height isn’t included in this information, it’s possible to derive it from the pictures,” Golly says.
Opinions remain a human prerogative
However, to expect a computer to provide emotional or cryptic descriptions of places or situations such as those looked for in current affairs reports, portraits or literature is going to remain an unreasonable demand. “Artificial intelligence is the perfection of in-the-box thinking. As soon as it gets out of the box, things get difficult,” says Sebastian Golly. What this means in practice is that, when it comes to routine jobs, robot journalists are hard to beat, but commentaries, opinion, creativity and the classification of events in political, economic and sporting contexts do not form part of their skill set. “As soon as they’re asked not just to describe events in the world but also to name their causes, such systems come up against their limitations,” Golly states. Text-generating computers are just as incapable of identifying the reasons for the collapse of shares in a company as they are of assessing the significance of unforeseen events. Controversial referee decisions, or an injury to a key player that will rule him out of an imminent cup match, won’t feature in the robotic texts. To date, computers have rarely succeeded in assigning the appropriate weighting to incidental and key information.
And yet, as a study by the Ludwig-Maximilians-Universität (LMU) in Munich has revealed, the robot texts still go down pretty well with readers. In this study, the 986 respondents assigned particularly high levels of credibility to the computer-generated texts. A result which the researchers attribute to the high fact and data density of the generated texts.
"We have to disclose the data used in the generation of a particular text and reveal how the statements have been derived from the data.”
Fake news at your fingertips?
Computer-generated texts are not, however, automatically objective and neutral. “It would be technically possible without much fuss to generate football texts that would present the home team in a better light,” Sebastian Golly concedes. You could, for instance, simply get the computer to write in terms of an “undeserved defeat”. If robot texts appear to be particularly credible on the one hand but can present events from a one-sided or distorted perspective on the other, doesn’t this just open the floodgates to abuse? Couldn’t you use computers as tireless fake news generators to send thousands of false reports out into the world every day? To counter this danger, Golly and his colleagues work with “AlgorithmWatch”, an initiative that advocates for ethical and transparent algorithms. Together they are seeking to develop criteria and standards and, “in the best-case scenario, a label for good automatic text generation,” Golly says. If this endeavour is to succeed, it will be crucial to ensure that their own algorithms work transparently: “We have to disclose the data used in the generation of a particular text and reveal how the statements have been derived from the data,” Golly stresses. This will help us know in the future which robotic text generators we can trust.