Natural Language Processing 101: The technology behind ‘automatic fake news’!

The ramifications of ‘fake news’ have been more than evident in the not-so-distant past, and the combination of man and machine could take things to a different level altogether. In that context, it is essential to understand, at a high level, how a machine generates fake news; the ‘human’ form is the one we usually see in action nowadays.

While ‘true’ NLP (Natural Language Processing, the branch of AI that is essentially behind fake news generation) is still far away in the distant future, the field has four essential techniques currently in use, including by those who try to generate fake news automatically.

a) Distributional Semantics: Words get their meaning from the context in which they are used, and that is also how they relate to one another. These algorithms try to find the relationships between words by looking for patterns in which words are used together and how often. Once the model has learned these relationships from a vast amount of data, it can automatically generate sentences, paragraphs or even whole articles. This does not mean that the generated text will be entirely meaningful to humans, because the machine is only using the associations between words to generate it, not their ‘meaning’.
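
To make the idea concrete, here is a minimal sketch in Python of the simplest possible version: a model that only remembers which word tends to follow which. The function names (`build_bigram_model`, `generate`) and the tiny corpus are assumptions made up for illustration; real systems learn far richer patterns from far more data.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words observed right after it."""
    words = text.lower().split()
    model = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word].append(next_word)
    return model

def generate(model, start_word, length=10, seed=0):
    """Walk the model, picking each next word in proportion to how often it followed."""
    rng = random.Random(seed)
    output = [start_word]
    for _ in range(length - 1):
        followers = model.get(output[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        output.append(rng.choice(followers))
    return " ".join(output)

# Made-up toy corpus, purely for illustration.
corpus = (
    "the minister said the report was false "
    "the report said the minister was wrong "
    "the minister was angry"
)
model = build_bigram_model(corpus)
print(generate(model, "the", length=8))
```

Every pair of neighbouring words in the output has been seen in the corpus, so the text looks locally plausible, yet nothing in the code checks whether the sentence as a whole is true or even sensible.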

b) Frame Semantics: Sentences consist of parts, much as English grammar divides them into “Parts of Speech”. These algorithms break a sentence down into such parts, the ‘4Ws’: ‘Who’, ‘What’, ‘Where’ and ‘When’. While the first type of algorithm (a) does not understand the text, this type can tell apart the various constituents of a sentence. However, it works well only for very simple sentences. Voice assistants usually use this approach.
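
A rough sketch of this slot-filling idea, in Python, might look like the following. The pattern `FRAME_PATTERN` and the function `extract_frame` are hypothetical and handle exactly one sentence shape; production systems use trained parsers rather than a single regular expression.

```python
import re

# Hypothetical pattern for one very simple sentence shape:
# "<who> <verb> <what> in <where> on <when>"
FRAME_PATTERN = re.compile(
    r"^(?P<who>\w+) (?P<verb>\w+) (?P<what>.+?) in (?P<where>.+?) on (?P<when>.+)$"
)

def extract_frame(sentence):
    """Return the who/verb/what/where/when slots, or None if the sentence does not fit."""
    match = FRAME_PATTERN.match(sentence.strip().rstrip("."))
    return match.groupdict() if match else None

# A simple sentence fits the frame...
print(extract_frame("Asha opened a bakery in Pune on Monday."))
# ...while a more complex one does not fit at all.
print(extract_frame("Although it rained, the match that Asha watched went on."))
```

The second call shows the limitation mentioned above: as soon as the sentence stops being very simple, the frame no longer applies.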

c) Model-theoretical Semantics: Language was created to share knowledge. This approach works by logical deduction, based on the ‘first principle’ that all human knowledge can be derived from logical rules. For example, to find the most populous city in India, it can first find the most populous city in each state (including UTs) and then pick the most populous among those. However, this approach is no longer used much because of the large number of exceptions it runs into. It is almost the same as building a model of all human knowledge, and hence is difficult to put into production.
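
The city example can be sketched in a few lines of Python. The knowledge base below is entirely made up (placeholder region and city names, invented population figures), and the two rule functions are illustrative assumptions, not how any particular system is built.

```python
# Illustrative, made-up facts: regions, cities and populations are placeholders.
POPULATION = {
    "State A": {"City A1": 120, "City A2": 340},
    "State B": {"City B1": 560, "City B2": 90},
    "Territory C": {"City C1": 410},
}

def most_populous_in_region(cities):
    """Rule 1: the largest city in a region is the one with the highest population."""
    return max(cities.items(), key=lambda item: item[1])

def most_populous_overall(knowledge_base):
    """Rule 2: the largest city overall is the largest among the regional winners."""
    regional_winners = [most_populous_in_region(c) for c in knowledge_base.values()]
    return max(regional_winners, key=lambda item: item[1])

city, population = most_populous_overall(POPULATION)
print(city, population)
```

The deduction itself is trivial; the hard part is that every fact has to be entered into the knowledge base by someone, which is why the approach struggles outside narrow domains.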

d) Grounded Semantics: Language was essentially created to accomplish tasks. When we share knowledge, there is an inherent ‘action’ associated with it that needs to be accomplished, which is why that particular ‘knowledge’ is required. In this approach, the model learns on its own through conversation and the task-based interaction that follows. The model starts with a blank slate; as commands are issued and it is shown how to execute them, it starts learning, and soon it no longer needs human help to execute the commands. While these algorithms come closest to real language understanding, they still cannot comprehend all verbs and phrases through demonstration alone, as many tasks are too complex to reduce to simple phrases.
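
As a heavily simplified sketch of learning from demonstration, the Python class below starts with no knowledge, is shown command/action pairs, and then guesses the action for new commands by letting each known word vote. The class `DemonstrationLearner` and the action labels are hypothetical; real grounded-language systems are far more sophisticated than this word-counting toy.

```python
from collections import Counter, defaultdict

class DemonstrationLearner:
    """Starts as a blank slate and learns word-to-action associations."""

    def __init__(self):
        # word -> Counter of actions that were demonstrated alongside it
        self.action_counts = defaultdict(Counter)

    def demonstrate(self, command, action):
        """A human issues a command and shows which action carries it out."""
        for word in command.lower().split():
            self.action_counts[word][action] += 1

    def execute(self, command):
        """Pick the action most strongly associated with the command's words."""
        votes = Counter()
        for word in command.lower().split():
            votes.update(self.action_counts.get(word, {}))
        if not votes:  # no known word: the learner still needs human help
            return None
        return votes.most_common(1)[0][0]

learner = DemonstrationLearner()
learner.demonstrate("move the block left", "MOVE_LEFT")
learner.demonstrate("move the block right", "MOVE_RIGHT")
learner.demonstrate("push it left", "MOVE_LEFT")
learner.demonstrate("go right", "MOVE_RIGHT")

print(learner.execute("slide left"))   # a new phrasing built on a known word
print(learner.execute("jump"))         # a verb that was never demonstrated
```

The last call illustrates the limitation described above: a verb that has never been demonstrated gives the learner nothing to go on.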

[Source: Technology Review]
