Microservice Recipe: Text Parsing With NLP Compromise Library

28 Jun 2016

A basic example service for text retrieval, parsing and manipulation using NLP Compromise.

Eventn services execute inside a secure sandboxed JavaScript runtime that exposes not only core ECMAScript 2015 (ES6) JavaScript functionality but also a set of selected NPM modules.

One such supported module is "NLP Compromise". Described as "a cool way to use natural language in javascript", NLP Compromise is a lightweight NLP library supporting a variety of useful text parsing and manipulation functionality.

Use Case

This recipe provides an example of how to build a microservice to support collecting real-time conversation data, as well as functionality for fast data lookup and retrieval for NLP parsing.

Sample Data

Although any textual data set could be used, a sample SMS message data set was selected due to the interesting linguistic challenges posed by the shorthand nature of the content. The nlp-corpus project from nlp_compromise also provides "medium length texts for nlp integration tests". This includes a set of 55k SMS messages from the National University of Singapore. Here is a sample:

'Hey pple...$700 or $900 for 5 nights...Excellent location wif breakfast hamper!!!',
'Yun ah.the ubi one say if ü wan call by tomorrow.call 67441233 look for irene.ere only got bus8,22,65,61,66,382. Ubi cres,ubi tech park.6ph for 1st 5wkg days.èn',
'Hey tmr maybe can meet you at yck',
'Oh...i asked for fun. Haha...take care. ü',
'We are supposed to meet to discuss abt our trip... Thought xuhui told you? In the afternoon. Thought we can go for lesson after that',
't finish my film yet...',
'm having dinner with my cousin...',
'Oh... Kay... On sat right?',
'I need... Coz i never go before',

Given that the data is already provided in JS format, loading it into a service store is trivial. Each record had a random session_id added to represent a unique identifier provided by an upstream system for message retrieval. The final format can be seen below:

{ records:
   [ { session_id: 'pkh2ngvt',
       msg: 'm walking in citylink now ü faster come down... Me very hungry...' },
     { session_id: 'pkchnnutv',
       msg: '5 nights...We nt staying at port step liao...Too ex' },
     { session_id: 'f3zhnnutt',
       msg: 'Hey pple...$700 or $900 for 5 nights...Excellent location wif breakfast hamper!!!' },
     { session_id: 'pka7n2utv',
       msg: 'Yun ah.the ubi one say if ü wan call by tomorrow.call 67441233 look for irene.ere only got bus8,22,65,61,66,382. Ubi cres,ubi tech park.6ph for 1st 5wkg days.èn' },
     { session_id: 'f3x7nngvv',
       msg: 'Hey tmr maybe can meet you at yck' } ] }
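As a rough sketch of how the raw message strings could be wrapped into records of the shape shown above: the helper names `randomSessionId` and `toRecords` are hypothetical, and the random generator is only a stand-in for whatever identifier an upstream system would really supply.

```javascript
// Hypothetical helper: generate a short lowercase alphanumeric identifier.
// The real identifiers would come from an upstream system; this is a stand-in.
function randomSessionId(length = 9) {
    const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
    let id = '';
    for (let i = 0; i < length; i++) {
        id += alphabet[Math.floor(Math.random() * alphabet.length)];
    }
    return id;
}

// Wrap each raw message string in a { session_id, msg } record.
function toRecords(messages) {
    return {
        records: messages.map(msg => ({ session_id: randomSessionId(), msg }))
    };
}

const sample = toRecords([
    'Hey tmr maybe can meet you at yck',
    'Oh... Kay... On sat right?'
]);
console.log(sample.records.length); // 2
```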

Example POST Function

Eventn provides a POST function that is executed when an HTTP POST request is made. This is typically used for data collection. The Eventn architecture is highly scalable and capable of consuming hundreds of thousands of events per second, with real-time analytics available immediately. For this example, the records were grouped into batches of 100 and submitted asynchronously.
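On the client side, the batching step might be sketched as follows. This is an illustration only: `chunk` and `submitAll` are hypothetical names, `serviceUrl` is a placeholder for the service endpoint, and a global `fetch` (Node.js 18+ or a browser) is assumed. The request body uses a `records` array, matching the record format shown earlier.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Submit every batch without waiting for the previous one to finish.
// `serviceUrl` is a placeholder; `fetch` is assumed to be a global.
function submitAll(serviceUrl, records, batchSize = 100) {
    const requests = chunk(records, batchSize).map(batch =>
        fetch(serviceUrl, {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ records: batch })
        })
    );
    return Promise.all(requests);
}
```

Firing all requests at once keeps the sketch short; a real loader would probably cap the number of requests in flight.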

As the JavaScript code below demonstrates, each record was simply saved to the service store. Any other ETL or NLP processes could of course also be performed during the load (or collection) stage. For example, if the service were being built for a more analytical use case, key properties would likely be extracted into their own fields.

function onPost(context, request) {
    const inserts = request.payload.records.map(function(record){
        return context.stores.default()
                .table()
                .insert({ data: JSON.stringify(record) });
    });
    return Promise.all(inserts);
}
module.exports = onPost;
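For the more analytical use case mentioned above, a few properties could be derived from each record before it is stored. The sketch below is only an illustration: the function name `extractFields` and the field names `msg_length`, `word_count` and `has_number` are hypothetical and not part of the original service.

```javascript
// Derive a few illustrative fields from a record before it is stored.
// The field names are hypothetical and not part of the original service.
function extractFields(record) {
    const words = record.msg.trim().split(/\s+/).filter(Boolean);
    return {
        session_id: record.session_id,
        msg_length: record.msg.length,   // number of characters
        word_count: words.length,        // whitespace-separated tokens
        has_number: /\d/.test(record.msg) // true if any digit appears
    };
}

const fields = extractFields({
    session_id: 'f3x7nngvv',
    msg: 'Hey tmr maybe can meet you at yck'
});
console.log(fields.word_count); // 8
```

The resulting object could be merged into the stored record inside the map callback of onPost, so that later queries can filter on these fields without re-parsing the text.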

Service Functionality

The basic idea of the service is that, for a given session_id, the record is retrieved from the store and, based on a request parameter, an example NLP action is performed on the text using the nlp_compromise API. The following examples were selected:

  • .root() - returns an 'aggressive-but-accurate' machine-readable reduction. Very useful for the SMS data given the slang, shorthand, bad punctuation etc.
  • .verbs() - extracts verbs. Useful for providing context in aggregate.
  • .to_past() - translates the text to past tense.
  • .to_present() - translates the text to present tense.
  • .to_future() - translates the text to future tense.
  • .tags() - Part-of-Speech tagging. Assigns a grammatical tag to each term.

Example GET Function

Eventn provides a GET function that is executed when an HTTP GET request is made. This is typically used for analytics retrieval and, in the context of the service functionality listed above, provides the service logic for the NLP.

let nlp = require('nlp_compromise');

function onGet(context, request) {
    // Check we have a session_id as part of the request
    const session_id = request.query.session_id || false;
    if (!session_id) return "Missing session_id parameter";

    // Select record based on session_id
    return context.stores.default()
        .raw(`SELECT ts_created, data FROM ${context.id} WHERE data-> "$.session_id" = "${session_id}"`)
        .then(record => {
            let d = JSON.parse(record[0][0].data).msg;

            // use the "parser" query param
            switch (request.query.parser) {
                case "verbs":
                    return nlp.text(d).verbs();
                case "root":
                    return nlp.text(d).root();
                case "past":
                    return nlp.sentence(d).to_past().text();
                case "present":
                    return nlp.sentence(d).to_present().text();
                case "future":
                    return nlp.sentence(d).to_future().text();
                case "tags":
                    return nlp.text(d).tags();
                default:
                    return d;
            }
        });
}
module.exports = onGet;

The Eventn web application includes a visual Service Editor for rapid service development and testing.

[Image: Microservice Editor]

The example function above will expect two URL parameters as part of the request:

  • session_id - the unique identifier of the record to retrieve
  • parser - the NLP function to execute, which can be one of "verbs", "root", "past", "present", "future", "tags". Providing no parser parameter will yield the default text value.
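Because onGet interpolates session_id directly into its query string, it may be worth checking both parameters before using them. The following is a minimal sketch under stated assumptions: `validateQuery` and `PARSERS` are hypothetical names, and the lowercase-alphanumeric pattern for session_id is inferred from the sample identifiers rather than documented anywhere.

```javascript
// Parser names accepted by the example onGet function.
const PARSERS = ['verbs', 'root', 'past', 'present', 'future', 'tags'];

// Returns { error } for a bad request, otherwise the normalised parameters.
// The session_id pattern is an assumption based on the sample identifiers.
function validateQuery(query) {
    const sessionId = query.session_id;
    if (typeof sessionId !== 'string' || !/^[a-z0-9]+$/.test(sessionId)) {
        return { error: 'Missing or malformed session_id parameter' };
    }
    // An absent or unrecognised parser falls back to the raw text, as in onGet.
    const parser = PARSERS.includes(query.parser) ? query.parser : 'default';
    return { session_id: sessionId, parser };
}

console.log(validateQuery({ session_id: 'f38vsdpvt', parser: 'future' }));
// { session_id: 'f38vsdpvt', parser: 'future' }
```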

An example request would look like: https://service.eventn.com/SV_F39TA5UV?session_id=f38vsdpvt&parser=future
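A request URL of this form can be assembled with the standard URL API, which takes care of encoding the query parameters. The function name `buildRequestUrl` is hypothetical; the service URL and parameter values are the ones from the example request.

```javascript
// Build a request URL for the service; searchParams handles the encoding.
function buildRequestUrl(serviceUrl, sessionId, parser) {
    const url = new URL(serviceUrl);
    url.searchParams.set('session_id', sessionId);
    // Omitting the parser makes the service return the default text value.
    if (parser) {
        url.searchParams.set('parser', parser);
    }
    return url.toString();
}

console.log(buildRequestUrl('https://service.eventn.com/SV_F39TA5UV', 'f38vsdpvt', 'future'));
// https://service.eventn.com/SV_F39TA5UV?session_id=f38vsdpvt&parser=future
```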

Example Results

Default

"Yup... I havent been there before... You want to go for the yoga? I can call up to book"

.root()

"yup i havent been there before you want to go for the yoga i can call up to book"

.to_present()

"Yup... I havent been there before... You wants to go for the yoga? I can call up to book"

.to_future()

"Yup... I havent been there before... You will want to go for the yoga? I can call up to book"

.to_past()

"Yup... I havent been there before... You wanted to go for the yoga? I could call up to book"

.tags()

[ [ "Expression", "Person", "Adjective", "Noun", "Conjunction", "Person", "Infinitive", "Preposition", "Infinitive", "Conjunction", "Determiner", "Noun" ], [ "Person", "Modal", "Noun" ] ]

.verbs()

[ { "whitespace": { "preceding": "", "trailing": " " }, "text": "want", "normal": "want", "expansion": null, "reasoning": [ "lexicon_pass" ], "pos": { "Verb": true, "Infinitive": true }, "tag": "Infinitive" }, { "whitespace": { "preceding": "", "trailing": " " }, "text": "go", "normal": "go", "expansion": null, "reasoning": [ "lexicon_pass" ], "pos": { "Verb": true, "Infinitive": true }, "tag": "Infinitive" }, { "whitespace": { "preceding": "", "trailing": " " }, "text": "can", "normal": "can", "expansion": null, "reasoning": [ "lexicon_pass" ], "pos": { "Verb": true, "Modal": true }, "tag": "Modal" } ]
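The .verbs() result is verbose, so for aggregate use it may help to reduce it to a count per normalised verb. The sketch below assumes only the `normal` field visible in the result above; the function name `countVerbs` is hypothetical, and the input is a trimmed copy of that result.

```javascript
// Reduce the verbose .verbs() output to a frequency count of normalised verbs.
function countVerbs(verbs) {
    const counts = {};
    for (const verb of verbs) {
        counts[verb.normal] = (counts[verb.normal] || 0) + 1;
    }
    return counts;
}

// Trimmed version of the result above, keeping only the fields used here.
const verbs = [
    { text: 'want', normal: 'want', tag: 'Infinitive' },
    { text: 'go', normal: 'go', tag: 'Infinitive' },
    { text: 'can', normal: 'can', tag: 'Modal' }
];
console.log(countVerbs(verbs)); // { want: 1, go: 1, can: 1 }
```

Summing these counts across many messages would give the kind of aggregate context that the verbs extraction is meant to provide.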
