2021-08-14

Fun project using batch and stream processing

Sharing a fun project built around several scenarios should be useful for anyone interested in how Apache Spark is used in real projects. How Spark is applied depends on a project's main aim: it can serve various workloads such as ETL/ELT, anomaly detection, ML, graph analysis, stream processing, etc.

Batch and stream processing

 

In this project I used Spark, in Scala, for both batch and stream processing across five separate cases.

  1. First case: for each product, compute:

    • net sales price and amount
    • gross sales price and amount
    • average sales amount over the last 5 selling days
    • top selling location
    • write the results to JSON.
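In the project this runs as a Spark batch job over DataFrames (a rolling average like this would typically be a `Window` with `rowsBetween`); the per-product computation itself can be sketched in plain Scala. The record fields below are assumptions for illustration, not the repo's actual schema:

```scala
// Hypothetical sale record; field names are assumptions, not the repo's schema.
case class Sale(productId: String, day: Int, netPrice: Double,
                grossPrice: Double, amount: Long, location: String)

// Per-product: net/gross totals, average amount over the last 5 selling days,
// and the top-selling location.
def productStats(sales: Seq[Sale]): Map[String, (Double, Double, Long, Double, String)] =
  sales.groupBy(_.productId).map { case (pid, rows) =>
    val net   = rows.map(_.netPrice).sum
    val gross = rows.map(_.grossPrice).sum
    val amt   = rows.map(_.amount).sum
    // total amount per selling day, then average over the last 5 days
    val lastDays = rows.groupBy(_.day).toSeq
      .map { case (d, rs) => (d, rs.map(_.amount).sum) }
      .sortBy(_._1).takeRight(5).map(_._2)
    val avg5 = lastDays.sum.toDouble / lastDays.size
    // location with the highest total sold amount
    val topLoc = rows.groupBy(_.location).toSeq
      .map { case (loc, rs) => (loc, rs.map(_.amount).sum) }
      .maxBy(_._2)._1
    pid -> (net, gross, amt, avg5, topLoc)
  }
```

In Spark, `topLoc` would typically be a `row_number()` over a window ordered by the summed amount, keeping rank 1, before the final `write.json(...)`.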

  2. Second case: find the top 10 sellers by daily net sales and, for each, get:

    • sales amount
    • top selling category
    • write the results to JSON.
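The second case is a grouped aggregation followed by an ordered take. A plain-Scala sketch of the per-day computation (record and field names are assumptions, not the repo's schema):

```scala
// Hypothetical per-sale record for one day; names are assumptions.
case class SellerSale(sellerId: String, netSales: Double, amount: Long, category: String)

// Top N sellers by net sales for one day, with total amount and top category.
def topSellers(day: Seq[SellerSale], n: Int = 10): Seq[(String, Double, Long, String)] =
  day.groupBy(_.sellerId).map { case (sid, rows) =>
    // category with the highest sold amount for this seller
    val topCat = rows.groupBy(_.category).toSeq
      .map { case (c, rs) => (c, rs.map(_.amount).sum) }
      .maxBy(_._2)._1
    (sid, rows.map(_.netSales).sum, rows.map(_.amount).sum, topCat)
  }.toSeq.sortBy(-_._2).take(n)
```

In Spark this maps to a `groupBy`/`agg` followed by `orderBy(desc(...)).limit(10)` and a JSON write.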

  3. Third case: find price changes for each product:

    • if the price rose, label it RISE; if it fell, FALL; if unchanged, SAME
    • write the results to JSON.
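The RISE/FALL/SAME labeling is the heart of the third case. In Spark the previous price would typically come from `lag` over a window partitioned by product (an assumption about the implementation); the comparison itself is simple:

```scala
// Label a price change relative to the previous price.
def priceChange(prev: Double, curr: Double): String =
  if (curr > prev) "RISE" else if (curr < prev) "FALL" else "SAME"

// Label a product's whole price history; the first observation has no
// previous price, so a history of n prices yields n - 1 labels.
def labelHistory(prices: Seq[Double]): Seq[String] =
  prices.zip(prices.drop(1)).map { case (prev, curr) => priceChange(prev, curr) }
```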

  4. Fourth case: per location, within a 10-minute window, find:

    • count of distinct sellers
    • count of products
    • write the results to a Kafka topic.
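The fourth case maps naturally onto Structured Streaming's `groupBy(window($"ts", "10 minutes"), $"location")` with the result written to a Kafka sink. The bucketing and counting logic can be sketched without Spark (the event shape is an assumption, not the repo's schema):

```scala
// Hypothetical event shape; field names are assumptions.
case class Event(location: String, sellerId: String, productId: String, epochSec: Long)

// Align a timestamp to the start of its 10-minute window.
def windowStart(epochSec: Long, windowSec: Long = 600L): Long =
  epochSec - epochSec % windowSec

// Per (location, window start): distinct-seller count and product count.
// "Product count" is taken here as the number of product records seen.
def windowedCounts(events: Seq[Event]): Map[(String, Long), (Int, Int)] =
  events.groupBy(e => (e.location, windowStart(e.epochSec))).map { case (key, rows) =>
    key -> (rows.map(_.sellerId).distinct.size, rows.size)
  }
```

In a real streaming job, exact `distinct` counts per window can be expensive; Spark's `approx_count_distinct` is the usual trade-off.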

  5. Fifth case: for each product, find:

    • how many times it is viewed together with other products
    • write the results to Elasticsearch.
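The fifth case is a co-occurrence count: each view session yields all unordered product pairs, and the pair counts are what get indexed into Elasticsearch (in Spark, typically via the elasticsearch-spark connector). A plain-Scala sketch, where a "session" is simply the list of products viewed together — an assumption about how the input is grouped:

```scala
// Count, for each unordered product pair, how often the two were co-viewed.
// Sorting each session's products gives a canonical (a, b) key per pair.
def coViewCounts(sessions: Seq[Seq[String]]): Map[(String, String), Int] =
  sessions.flatMap { products =>
    products.distinct.sorted.combinations(2).map { case Seq(a, b) => (a, b) }
  }.groupBy(identity).map { case (pair, hits) => pair -> hits.size }
```

A session with a single product contributes no pairs, and `distinct` keeps repeated views of the same product within one session from inflating the counts.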

To access the code, visit my GitHub repo.
The repo will most likely be tidied up periodically, with improvements (code changes, new cases, data flow diagrams, etc.) applied to it. Stay tuned.
