Ingest Stopword index pipeline

Stopwords are easily generated using the NiFi pipeline.

Stopword Index Field Mapping From Data Specification

The following diagram illustrates the Stopword indexing pipeline implemented in Apache NiFi. The flow consists of three main stages:
  1. Generate a Stopword dictionary document for Elasticsearch from the input Stopwords for each language.
  2. (IF POST) Extract the current stopwords from the product index dictionary and add them to the document generated in Stage 1.
  3. Update the Product's language-specific dictionaries with the Stopword document generated in Stages 1 and 2.
Initial
PUT or POST REST Call: http://<Hostname>:30700/connectors/JsonStopword/data
{
    "stopwords": {
        "english": {
            "stopwords": ["step1", "car"]
        },
        "french": {
            "stopwords": ["step2", "dark"]
        }
    }
}
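The request above can be sketched in Python as follows. This is a minimal illustration, not part of the pipeline itself: the host name `nifi.example.internal`, the helper `build_stopword_request`, and the `Content-Type` header are assumptions, since the source only gives the `<Hostname>` placeholder and the JSON body. The request is built but not sent, so the sketch runs offline.

```python
import json
import urllib.request

# Hypothetical host name; the document only gives the placeholder <Hostname>.
HOSTNAME = "nifi.example.internal"
URL = f"http://{HOSTNAME}:30700/connectors/JsonStopword/data"


def build_stopword_request(stopwords_by_language, merge_with_existing):
    """Build (but do not send) a request for the JsonStopword connector.

    merge_with_existing=True  -> POST (current index stopwords are kept)
    merge_with_existing=False -> PUT  (current index stopwords are not added)
    """
    payload = {
        "stopwords": {
            language: {"stopwords": list(words)}
            for language, words in stopwords_by_language.items()
        }
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumed header; the source does not state which headers are required.
        headers={"Content-Type": "application/json"},
        method="POST" if merge_with_existing else "PUT",
    )


request = build_stopword_request(
    {"english": ["step1", "car"], "french": ["step2", "dark"]},
    merge_with_existing=True,
)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

Choosing the HTTP method from a single flag mirrors the PUT/POST distinction described in Stage 2.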
Stage 1. Generate Stopword dictionary document
The following dataflow describes how the language-specific Stopword data is transformed using the CreateStopwordBodyPart1 Groovy script.
Output:
{
    "analysis" : {
        "filter" : {
            "custom_english_stopwords_dictionary" : {
                "stopwords": ["step1", "car"],
                "type" : "stop"
            },
            "custom_french_stopwords_dictionary" : {
                "stopwords": ["step2", "dark"],
                "type" : "stop"
            }
        }
    }
}
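The Stage 1 transformation can be approximated in Python as shown below. This is only a sketch of the input-to-output mapping seen in the examples; it is not the CreateStopwordBodyPart1 Groovy script, whose source is not shown here, and the function name `create_stopword_body` is hypothetical.

```python
def create_stopword_body(request_body):
    """Map the per-language request body to an analysis/filter document.

    Each language becomes a filter named custom_<language>_stopwords_dictionary
    of type "stop", following the naming seen in the Stage 1 output.
    """
    filters = {}
    for language, entry in request_body["stopwords"].items():
        filters[f"custom_{language}_stopwords_dictionary"] = {
            "stopwords": list(entry["stopwords"]),
            "type": "stop",
        }
    return {"analysis": {"filter": filters}}


request_body = {
    "stopwords": {
        "english": {"stopwords": ["step1", "car"]},
        "french": {"stopwords": ["step2", "dark"]},
    }
}
document = create_stopword_body(request_body)
```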
Stage 2. (IF POST) Extract current stopwords in the product index dictionary, and add them to the generated document
When the user makes a POST* request, the following dataflow performs these steps:
  1. A GET call retrieves the current Stopword dictionaries per language from the product index.
  2. The CreateStopwordBodyPart2 Groovy script transforms the language-specific Stopword data and merges it with the document generated in Stage 1.

*If the user makes a PUT request instead, the current language-specific Stopword dictionaries in the index are not added to the document from Stage 1.

Step 2 Output:
{
    "analysis" : {
        "filter" : {
            "custom_english_stopwords_dictionary" : {
                "stopwords": ["the", "step1", "car"],
                "type" : "stop"
            },
            "custom_french_stopwords_dictionary" : {
                "stopwords": ["step2", "dark"],
                "type" : "stop"
            }
        }
    }
}
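The merge performed for a POST request can be sketched in Python as follows. This is an illustration under assumptions, not the CreateStopwordBodyPart2 Groovy script: the function name `merge_current_stopwords` is hypothetical, the current index dictionaries are assumed to have the same analysis/filter shape as the Stage 1 document, and existing words are assumed to come first with duplicates dropped, which matches the example output but is not confirmed by the source.

```python
def merge_current_stopwords(generated, current):
    """Merge stopwords already in the index into the Stage 1 document.

    Existing words are placed first, new words after, duplicates removed.
    Filters present only in `current` are not carried over in this sketch.
    """
    current_filters = current.get("analysis", {}).get("filter", {})
    merged_filters = {}
    for name, definition in generated["analysis"]["filter"].items():
        existing = current_filters.get(name, {}).get("stopwords", [])
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        words = list(dict.fromkeys([*existing, *definition["stopwords"]]))
        merged_filters[name] = {"stopwords": words, "type": definition["type"]}
    return {"analysis": {"filter": merged_filters}}


stage_one = {
    "analysis": {
        "filter": {
            "custom_english_stopwords_dictionary": {"stopwords": ["step1", "car"], "type": "stop"},
            "custom_french_stopwords_dictionary": {"stopwords": ["step2", "dark"], "type": "stop"},
        }
    }
}
# Assumed current index state: only the English dictionary holds a word.
in_index = {
    "analysis": {
        "filter": {
            "custom_english_stopwords_dictionary": {"stopwords": ["the"], "type": "stop"}
        }
    }
}
merged = merge_current_stopwords(stage_one, in_index)
```

For a PUT request this step would be skipped and the Stage 1 document used unchanged.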
Stage 3. Update the Product's language-specific dictionaries with the generated stopword document
The following dataflow updates (overwrites) the language-specific dictionary with the previously generated document in three steps:
  1. Close Product index
  2. Update Product index
  3. Open Product index
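The three steps above correspond to the Elasticsearch close index, update index settings, and open index APIs; analysis settings such as stop filters can only be changed while an index is closed. The sketch below lists the calls in order without sending them. The index name `product` and the helper `plan_dictionary_update` are assumptions, and the exact requests issued by the NiFi flow are not shown in the source.

```python
import json


def plan_dictionary_update(index_name, stopword_document):
    """Return the ordered (method, path, body) calls for the update.

    Nothing is sent; the list only documents the close/update/open sequence.
    """
    return [
        ("POST", f"/{index_name}/_close", None),
        ("PUT", f"/{index_name}/_settings", json.dumps(stopword_document)),
        ("POST", f"/{index_name}/_open", None),
    ]


stopword_document = {
    "analysis": {
        "filter": {
            "custom_english_stopwords_dictionary": {
                "stopwords": ["the", "step1", "car"],
                "type": "stop",
            }
        }
    }
}
# "product" is a hypothetical index name used for illustration only.
calls = plan_dictionary_update("product", stopword_document)
for method, path, body in calls:
    print(method, path)
```

Reopening the index as the final call matters even when the settings update fails, so a real flow would normally route the failure path to the open step as well.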