[{"data":1,"prerenderedAt":837},["ShallowReactive",2],{"blog:2015:time-window-events-with-apache-spark-streaming":3,"blogMore-Development":823,"comments-time-window-events-with-apache-spark-streaming":836},{"id":4,"title":5,"body":6,"category":807,"commentCount":40,"date":808,"description":12,"excerpt":809,"extension":810,"filenames":811,"hidden":812,"image":811,"meta":813,"minutes":113,"navigation":116,"path":814,"seo":815,"showCategory":811,"stem":816,"tags":817,"updated":811,"url":820,"wordCount":821,"__hash__":822},"content\u002Fblog\u002F2015\u002Ftime-window-events-with-apache-spark-streaming.md","Time window events with Apache Spark Streaming",{"type":7,"value":8,"toc":801},"minimark",[9,13,16,21,27,247,251,259,262,272,275,279,282,295,306,376,382,390,399,403,409,788,791,797],[10,11,12],"p",{},"If you’re working with Spark Streaming, you might run into an interesting problem if you want to output an event based on multiple messages within a specific time period.",[10,14,15],{},"For example, I want to send a security alert if I see 10 DDOS attempts to an IP address in a five-minute window.",[17,18,20],"h2",{"id":19},"groupbykeyandwindow","groupByKeyAndWindow",[10,22,23,26],{},[24,25,20],"code",{}," allows us to choose the IP address for the key and 5 minutes for the window. If we wanted to subsequently collect the sourceIp and the timestamp, it looks like this:",[28,29,34],"pre",{"className":30,"code":31,"language":32,"meta":33,"style":33},"language-scala shiki shiki-themes everforest-light dracula","var messageLimit = 10\nvar messageWindow = Minutes(5)\nval scc = new StreamingContext(conf, Minutes(1))\n\n\u002F\u002F ... 
setup Kafka consumer via SparkUtils\nkafkaConsumer\n    .flatMap(parseSecurityMessage)\n    .filter(m => m.securityType == 'DDOS')\n    .map(m => m.targetIp -> Seq((m.timestamp, m.sourceIp)))\n    .reduceByKeyAndWindow({(x, y) => x ++ y}, messageWindow)\n    .filter(g => g._2.length >= messageLimit)\n    .foreachRDD(m => m.foreach(createAlertEvent))\n\nscc.start()\nscc.streamUntilTerminated()\n","scala","",[24,35,36,57,80,111,118,125,131,137,165,185,202,219,230,235,241],{"__ignoreMap":33},[37,38,41,45,49,53],"span",{"class":39,"line":40},"line",1,[37,42,44],{"class":43},"smiwp","var",[37,46,48],{"class":47},"s6Vpi"," messageLimit ",[37,50,52],{"class":51},"s9HRq","=",[37,54,56],{"class":55},"s3Ipq"," 10\n",[37,58,60,62,65,67,71,74,77],{"class":39,"line":59},2,[37,61,44],{"class":43},[37,63,64],{"class":47}," messageWindow ",[37,66,52],{"class":51},[37,68,70],{"class":69},"saSZQ"," Minutes",[37,72,73],{"class":47},"(",[37,75,76],{"class":55},"5",[37,78,79],{"class":47},")\n",[37,81,83,86,89,91,94,97,100,103,105,108],{"class":39,"line":82},3,[37,84,85],{"class":51},"val",[37,87,88],{"class":47}," scc ",[37,90,52],{"class":51},[37,92,93],{"class":43}," new",[37,95,96],{"class":69}," StreamingContext",[37,98,99],{"class":47},"(conf, ",[37,101,102],{"class":69},"Minutes",[37,104,73],{"class":47},[37,106,107],{"class":55},"1",[37,109,110],{"class":47},"))\n",[37,112,114],{"class":39,"line":113},4,[37,115,117],{"emptyLinePlaceholder":116},true,"\n",[37,119,121],{"class":39,"line":120},5,[37,122,124],{"class":123},"sSX4p","\u002F\u002F ... 
setup Kafka consumer via SparkUtils\n",[37,126,128],{"class":39,"line":127},6,[37,129,130],{"class":47},"kafkaConsumer\n",[37,132,134],{"class":39,"line":133},7,[37,135,136],{"class":47},"    .flatMap(parseSecurityMessage)\n",[37,138,140,143,146,149,152,156,160,163],{"class":39,"line":139},8,[37,141,142],{"class":47},"    .filter(m ",[37,144,145],{"class":51},"=>",[37,147,148],{"class":47}," m.securityType ",[37,150,151],{"class":51},"==",[37,153,155],{"class":154},"s2G2r"," '",[37,157,159],{"class":158},"sMoiV","DDOS",[37,161,162],{"class":154},"'",[37,164,79],{"class":47},[37,166,168,171,173,176,179,182],{"class":39,"line":167},9,[37,169,170],{"class":47},"    .map(m ",[37,172,145],{"class":51},[37,174,175],{"class":47}," m.targetIp ",[37,177,178],{"class":51},"->",[37,180,181],{"class":69}," Seq",[37,183,184],{"class":47},"((m.timestamp, m.sourceIp)))\n",[37,186,188,191,193,196,199],{"class":39,"line":187},10,[37,189,190],{"class":47},"    .reduceByKeyAndWindow({(x, y) ",[37,192,145],{"class":51},[37,194,195],{"class":47}," x ",[37,197,198],{"class":51},"++",[37,200,201],{"class":47}," y}, messageWindow)\n",[37,203,205,208,210,213,216],{"class":39,"line":204},11,[37,206,207],{"class":47},"    .filter(g ",[37,209,145],{"class":51},[37,211,212],{"class":47}," g._2.length ",[37,214,215],{"class":51},">=",[37,217,218],{"class":47}," messageLimit)\n",[37,220,222,225,227],{"class":39,"line":221},12,[37,223,224],{"class":47},"    .foreachRDD(m ",[37,226,145],{"class":51},[37,228,229],{"class":47}," m.foreach(createAlertEvent))\n",[37,231,233],{"class":39,"line":232},13,[37,234,117],{"emptyLinePlaceholder":116},[37,236,238],{"class":39,"line":237},14,[37,239,240],{"class":47},"scc.start()\n",[37,242,244],{"class":39,"line":243},15,[37,245,246],{"class":47},"scc.streamUntilTerminated()\n",[17,248,250],{"id":249},"problem","Problem",[10,252,253,254,258],{},"The problem is ",[255,256,257],"strong",{},"your event fires many times as the stateless RDD is re-run every batch 
period",".",[10,260,261],{},"The simplest solution would be to make the batch interval the same as your message window size, but that causes more problems, namely:",[263,264,265,269],"ul",{},[266,267,268],"li",{},"Your job can’t perform any other triggers on the source data at a shorter interval",[266,270,271],{},"You won’t know about these alerts until some time after they happen (in this case, up to 5 minutes)",[10,273,274],{},"Keeping this state in an external store would be terrible, and neither Spark counters nor globals are much use here.",[17,276,278],{"id":277},"solution","Solution",[10,280,281],{},"We need to do two things:",[283,284,285,292],"ol",{},[266,286,287,288,291],{},"Stop the RDD re-running and instead use the streaming state. We can do this by using the ",[24,289,290],{},"reduceByKeyAndWindow"," overload that allows us to specify the inverse function for removing data as it goes out of the window.",[266,293,294],{},"Introduce a small amount of in-RDD state used to identify when the event is clear and when it should fire again.",[10,296,297,298,301,302,305],{},"Let us assume we have a class to handle part 2 named ",[24,299,300],{},"WindowEventTrigger"," that provides add and remove methods and a boolean ",[24,303,304],{},"triggerNow"," flag that identifies when the event should re-fire. 
Our RDD body would now look like this:",[28,307,309],{"className":30,"code":308,"language":32,"meta":33,"style":33},"kafkaConsumer\n    .flatMap(parseSecurityMessage)\n    .filter(m => m.securityType == 'DDOS')\n    .map(m => m.targetIp -> WindowEventTrigger(Seq(m.timestamp, m.sourceIp), messageLimit))\n    .reduceByKeyAndWindow(_ add _, _ remove _, messageWindow)\n    .filter(_._2.triggerNow)\n    .foreachRDD(m => m.foreach(createAlertEvent))\n",[24,310,311,315,319,337,358,363,368],{"__ignoreMap":33},[37,312,313],{"class":39,"line":40},[37,314,130],{"class":47},[37,316,317],{"class":39,"line":59},[37,318,136],{"class":47},[37,320,321,323,325,327,329,331,333,335],{"class":39,"line":82},[37,322,142],{"class":47},[37,324,145],{"class":51},[37,326,148],{"class":47},[37,328,151],{"class":51},[37,330,155],{"class":154},[37,332,159],{"class":158},[37,334,162],{"class":154},[37,336,79],{"class":47},[37,338,339,341,343,345,347,350,352,355],{"class":39,"line":113},[37,340,170],{"class":47},[37,342,145],{"class":51},[37,344,175],{"class":47},[37,346,178],{"class":51},[37,348,349],{"class":69}," WindowEventTrigger",[37,351,73],{"class":47},[37,353,354],{"class":69},"Seq",[37,356,357],{"class":47},"(m.timestamp, m.sourceIp), messageLimit))\n",[37,359,360],{"class":39,"line":120},[37,361,362],{"class":47},"    .reduceByKeyAndWindow(_ add _, _ remove _, messageWindow)\n",[37,364,365],{"class":39,"line":127},[37,366,367],{"class":47},"    .filter(_._2.triggerNow)\n",[37,369,370,372,374],{"class":39,"line":133},[37,371,224],{"class":47},[37,373,145],{"class":51},[37,375,229],{"class":47},[10,377,378,379,381],{},"How this works is simple. We have a case class called ",[24,380,300],{}," that we map into the stream for each incoming message. 
It then:",[283,383,384,387],{},[266,385,386],{},"Tracks incoming messages: when the count reaches the trigger level, it sets the flag and records the event that caused the trigger",[266,388,389],{},"Tracks outgoing messages: it resets once the event that caused the trigger leaves the window",[391,392,393],"blockquote",{},[10,394,395,396,398],{},"By switching to the in-memory ",[24,397,20],{},", Spark needs to persist state in case executors go down or data has to be shuffled between them. Make sure your StreamingContext has a checkpoint directory set on reliable storage such as HDFS.",[17,400,402],{"id":401},"windoweventtrigger-class","WindowEventTrigger class",[10,404,405,406,408],{},"Here is the ",[24,407,300],{}," class for you to use.",[28,410,412],{"className":30,"code":411,"language":32,"meta":33,"style":33},"case class WindowEventTrigger[T] private(eventsInWindow: Seq[T], triggerNow: Boolean, private val lastTriggeredEvent: Option[T], private val triggerLevel: Int) {\n  def this(item: T, triggerLevel: Int) = this(Seq(item), false, None, triggerLevel)\n\n  def add(incoming: WindowEventTrigger[T]): WindowEventTrigger[T] = {\n    val combined = eventsInWindow ++ incoming.eventsInWindow\n    val shouldTrigger = lastTriggeredEvent.isEmpty && combined.length >= triggerLevel\n    val triggeredEvent = if (shouldTrigger) combined.seq.drop(triggerLevel - 1).headOption else lastTriggeredEvent\n    new WindowEventTrigger(combined, shouldTrigger, triggeredEvent, triggerLevel)\n  }\n\n  def remove(outgoing: WindowEventTrigger[T]): WindowEventTrigger[T] = {\n    val reduced = eventsInWindow.filterNot(y => outgoing.eventsInWindow.contains(y))\n    val triggeredEvent = if (lastTriggeredEvent.isDefined && outgoing.eventsInWindow.contains(lastTriggeredEvent.get)) None else lastTriggeredEvent\n    new WindowEventTrigger(reduced, false, triggeredEvent, triggerLevel)\n  
}\n}\n",[24,413,414,499,554,558,596,614,637,667,677,682,686,722,739,764,778,782],{"__ignoreMap":33},[37,415,416,419,422,424,427,430,433,436,438,442,445,447,449,451,454,456,458,461,464,466,469,472,475,478,480,482,484,486,488,491,493,496],{"class":39,"line":40},[37,417,418],{"class":43},"case",[37,420,421],{"class":43}," class",[37,423,349],{"class":69},[37,425,426],{"class":47},"[",[37,428,429],{"class":69},"T",[37,431,432],{"class":47},"] ",[37,434,435],{"class":51},"private",[37,437,73],{"class":47},[37,439,441],{"class":440},"s7cAX","eventsInWindow",[37,443,444],{"class":47},": ",[37,446,354],{"class":69},[37,448,426],{"class":47},[37,450,429],{"class":69},[37,452,453],{"class":47},"], ",[37,455,304],{"class":440},[37,457,444],{"class":47},[37,459,460],{"class":69},"Boolean",[37,462,463],{"class":47},", ",[37,465,435],{"class":51},[37,467,468],{"class":51}," val",[37,470,471],{"class":47}," lastTriggeredEvent",[37,473,474],{"class":51},":",[37,476,477],{"class":69}," Option",[37,479,426],{"class":47},[37,481,429],{"class":69},[37,483,453],{"class":47},[37,485,435],{"class":51},[37,487,468],{"class":51},[37,489,490],{"class":47}," triggerLevel",[37,492,474],{"class":51},[37,494,495],{"class":69}," Int",[37,497,498],{"class":47},") {\n",[37,500,501,504,508,510,513,515,517,519,522,524,527,530,532,535,537,539,542,546,548,551],{"class":39,"line":59},[37,502,503],{"class":43},"  def",[37,505,507],{"class":506},"sS4Kt"," this",[37,509,73],{"class":47},[37,511,512],{"class":440},"item",[37,514,444],{"class":47},[37,516,429],{"class":69},[37,518,463],{"class":47},[37,520,521],{"class":440},"triggerLevel",[37,523,444],{"class":47},[37,525,526],{"class":69},"Int",[37,528,529],{"class":47},") ",[37,531,52],{"class":51},[37,533,507],{"class":534},"stJs5",[37,536,73],{"class":47},[37,538,354],{"class":69},[37,540,541],{"class":47},"(item), ",[37,543,545],{"class":544},"sRn2c","false",[37,547,463],{"class":47},[37,549,550],{"class":69},"None",[37,552,553],{"class":47},", 
triggerLevel)\n",[37,555,556],{"class":39,"line":82},[37,557,117],{"emptyLinePlaceholder":116},[37,559,560,562,565,567,570,572,574,576,578,581,583,585,587,589,591,593],{"class":39,"line":113},[37,561,503],{"class":43},[37,563,564],{"class":506}," add",[37,566,73],{"class":47},[37,568,569],{"class":440},"incoming",[37,571,444],{"class":47},[37,573,300],{"class":69},[37,575,426],{"class":47},[37,577,429],{"class":69},[37,579,580],{"class":47},"])",[37,582,474],{"class":51},[37,584,349],{"class":69},[37,586,426],{"class":47},[37,588,429],{"class":69},[37,590,432],{"class":47},[37,592,52],{"class":51},[37,594,595],{"class":47}," {\n",[37,597,598,601,604,606,609,611],{"class":39,"line":120},[37,599,600],{"class":51},"    val",[37,602,603],{"class":47}," combined ",[37,605,52],{"class":51},[37,607,608],{"class":47}," eventsInWindow ",[37,610,198],{"class":51},[37,612,613],{"class":47}," incoming.eventsInWindow\n",[37,615,616,618,621,623,626,629,632,634],{"class":39,"line":127},[37,617,600],{"class":51},[37,619,620],{"class":47}," shouldTrigger ",[37,622,52],{"class":51},[37,624,625],{"class":47}," lastTriggeredEvent.isEmpty ",[37,627,628],{"class":51},"&&",[37,630,631],{"class":47}," combined.length ",[37,633,215],{"class":51},[37,635,636],{"class":47}," triggerLevel\n",[37,638,639,641,644,646,649,652,655,658,661,664],{"class":39,"line":133},[37,640,600],{"class":51},[37,642,643],{"class":47}," triggeredEvent ",[37,645,52],{"class":51},[37,647,648],{"class":43}," if",[37,650,651],{"class":47}," (shouldTrigger) combined.seq.drop(triggerLevel ",[37,653,654],{"class":51},"-",[37,656,657],{"class":55}," 1",[37,659,660],{"class":47},").headOption ",[37,662,663],{"class":43},"else",[37,665,666],{"class":47}," lastTriggeredEvent\n",[37,668,669,672,674],{"class":39,"line":139},[37,670,671],{"class":43},"    new",[37,673,349],{"class":69},[37,675,676],{"class":47},"(combined, shouldTrigger, triggeredEvent, 
triggerLevel)\n",[37,678,679],{"class":39,"line":167},[37,680,681],{"class":47},"  }\n",[37,683,684],{"class":39,"line":187},[37,685,117],{"emptyLinePlaceholder":116},[37,687,688,690,693,695,698,700,702,704,706,708,710,712,714,716,718,720],{"class":39,"line":204},[37,689,503],{"class":43},[37,691,692],{"class":506}," remove",[37,694,73],{"class":47},[37,696,697],{"class":440},"outgoing",[37,699,444],{"class":47},[37,701,300],{"class":69},[37,703,426],{"class":47},[37,705,429],{"class":69},[37,707,580],{"class":47},[37,709,474],{"class":51},[37,711,349],{"class":69},[37,713,426],{"class":47},[37,715,429],{"class":69},[37,717,432],{"class":47},[37,719,52],{"class":51},[37,721,595],{"class":47},[37,723,724,726,729,731,734,736],{"class":39,"line":221},[37,725,600],{"class":51},[37,727,728],{"class":47}," reduced ",[37,730,52],{"class":51},[37,732,733],{"class":47}," eventsInWindow.filterNot(y ",[37,735,145],{"class":51},[37,737,738],{"class":47}," outgoing.eventsInWindow.contains(y))\n",[37,740,741,743,745,747,749,752,754,757,759,762],{"class":39,"line":232},[37,742,600],{"class":51},[37,744,643],{"class":47},[37,746,52],{"class":51},[37,748,648],{"class":43},[37,750,751],{"class":47}," (lastTriggeredEvent.isDefined ",[37,753,628],{"class":51},[37,755,756],{"class":47}," outgoing.eventsInWindow.contains(lastTriggeredEvent.get)) ",[37,758,550],{"class":69},[37,760,761],{"class":43}," else",[37,763,666],{"class":47},[37,765,766,768,770,773,775],{"class":39,"line":237},[37,767,671],{"class":43},[37,769,349],{"class":69},[37,771,772],{"class":47},"(reduced, ",[37,774,545],{"class":544},[37,776,777],{"class":47},", triggeredEvent, triggerLevel)\n",[37,779,780],{"class":39,"line":243},[37,781,681],{"class":47},[37,783,785],{"class":39,"line":784},16,[37,786,787],{"class":47},"}\n",[10,789,790],{},"Happy streaming,",[10,792,793],{},[794,795,796],"em",{},"[)amien",[798,799,800],"style",{},"html pre.shiki code .smiwp, html code.shiki 
.smiwp{--shiki-default:#F85552;--shiki-dark:#FF79C6}html pre.shiki code .s6Vpi, html code.shiki .s6Vpi{--shiki-default:#5C6A72;--shiki-dark:#F8F8F2}html pre.shiki code .s9HRq, html code.shiki .s9HRq{--shiki-default:#F57D26;--shiki-dark:#FF79C6}html pre.shiki code .s3Ipq, html code.shiki .s3Ipq{--shiki-default:#DF69BA;--shiki-dark:#BD93F9}html pre.shiki code .saSZQ, html code.shiki .saSZQ{--shiki-default:#DFA000;--shiki-dark:#8BE9FD}html pre.shiki code .sSX4p, html code.shiki .sSX4p{--shiki-default:#939F91;--shiki-default-font-style:italic;--shiki-dark:#6272A4;--shiki-dark-font-style:inherit}html pre.shiki code .s2G2r, html code.shiki .s2G2r{--shiki-default:#5C6A72;--shiki-dark:#BD93F9}html pre.shiki code .sMoiV, html code.shiki .sMoiV{--shiki-default:#DFA000;--shiki-default-font-style:inherit;--shiki-default-text-decoration:inherit;--shiki-dark:#FF5555;--shiki-dark-font-style:italic;--shiki-dark-text-decoration:underline}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s7cAX, html code.shiki .s7cAX{--shiki-default:#5C6A72;--shiki-default-font-style:inherit;--shiki-dark:#FFB86C;--shiki-dark-font-style:italic}html 
pre.shiki code .sS4Kt, html code.shiki .sS4Kt{--shiki-default:#8DA101;--shiki-dark:#50FA7B}html pre.shiki code .stJs5, html code.shiki .stJs5{--shiki-default:#5C6A72;--shiki-default-font-style:inherit;--shiki-dark:#BD93F9;--shiki-dark-font-style:italic}html pre.shiki code .sRn2c, html code.shiki .sRn2c{--shiki-default:#3A94C5;--shiki-dark:#BD93F9}",{"title":33,"searchDepth":59,"depth":59,"links":802},[803,804,805,806],{"id":19,"depth":59,"text":20},{"id":249,"depth":59,"text":250},{"id":277,"depth":59,"text":278},{"id":401,"depth":59,"text":402},"Development","2015-06-27T12:46:25+00:00","[object Object]","md",null,false,{},"\u002Fblog\u002F2015\u002Ftime-window-events-with-apache-spark-streaming",{"title":5,"description":12},"blog\u002F2015\u002Ftime-window-events-with-apache-spark-streaming",[818,819],"Apache Spark","Scala","\u002Fblog\u002F2015\u002Ftime-window-events-with-apache-spark-streaming\u002F",729,"Qb3XwySdIVIB_-R97fTSeQlhbKltvij1_0kbhxHqCrg",[824,828,832],{"title":825,"date":826,"url":827},"Transactions in the MongoDB EF Core Provider","2025-10-25","\u002Fblog\u002F2025\u002Fmongodb-explicit-transactions\u002F",{"title":829,"date":830,"url":831},"Queryable Encryption with the MongoDB EF Core Provider","2025-09-22","\u002Fblog\u002F2025\u002Fmongodb-queryable-encryption\u002F",{"title":833,"date":834,"url":835},"Lazy Loading with EF Core Proxies","2025-04-02","\u002Fblog\u002F2025\u002Fef-proxies\u002F",[],1780900526310]