# Hotspot analysis (using G*)

## Problem definition

• We have a map, in this case a map of land surface temperatures.
• We want to find hotspots, i.e., areas on the map that are significantly different from their surrounding area.
• We want to use the Getis-Ord G* statistic as the tool for finding the hotspots.
• We want to parallelize the computation on our Spark cluster.

## Hotspot analysis using GeoTrellis

In this section we show a simplified version of the hotspot analysis. We use the GeoTrellis library to achieve the parallelization. We make the following assumptions:

• we use 2-dimensional data (only the spatial part, without the time component)
• we store our data as a layer of tiles in a GeoTrellis catalog (distributed raster)
• our hotspot analysis uses the standard G* with a variable window

First, we need to express the G* formula in terms of the map algebra operations.
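As a sketch of that rewriting, the standard form of the statistic for a pixel $i$ can be stated as follows, where $x_j$ are pixel values, $w_{ij}$ are the weights of the window centred on $i$, $\bar{X}$ and $S$ are the global mean and standard deviation, and $N$ is the number of pixels. The symbols $A$ and $B$ are chosen here to match the variable names in the Scala code of this page.

$$
G_i^* = \frac{\sum_{j} w_{ij} x_j - \bar{X} \sum_{j} w_{ij}}{S \sqrt{\dfrac{N \sum_{j} w_{ij}^2 - \left(\sum_{j} w_{ij}\right)^2}{N - 1}}}
$$

When the same weight matrix is used for every pixel, only the first term of the numerator depends on $i$, and it is a focal sum. The remaining parts are constants:

$$
A = \bar{X} \sum_{j} w_{ij}, \qquad
B = S \sqrt{\frac{N \sum_{j} w_{ij}^2 - \left(\sum_{j} w_{ij}\right)^2}{N - 1}}, \qquad
G_i^* = \frac{\operatorname{focalSum}_i - A}{B}
$$

So the whole computation reduces to one focal operation followed by one local operation.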

## Scala code

The full source code can be found in our GitHub repository: biggis-project/biggis-landuse

// imports assumed for the GeoTrellis 1.x/2.x package layout
import geotrellis.raster._
import geotrellis.raster.mapalgebra.focal.Kernel
import geotrellis.spark._
import org.apache.spark.rdd.RDD

// typical type definition used by geotrellis
type SpatialRDD = RDD[(SpatialKey, Tile)]

def getisord(rdd: SpatialRDD, weightMatrix: Kernel,
             globalMean: Double, globalStdev: Double, numPixels: Long): SpatialRDD = {

  // weights of the kernel, their sum and the sum of their squares
  val wcells: Array[Double] = weightMatrix.tile.toArrayDouble
  val sumW: Double = wcells.sum
  val sumW2: Double = wcells.map(x => x * x).sum

  // variables used in the getis-ord formula
  val A: Double = globalMean * sumW
  val B: Double = globalStdev * Math.sqrt((numPixels * sumW2 - sumW * sumW) / (numPixels - 1))

  rdd.withContext {
    // buffer each tile with its neighbours so the focal op. sees pixels across tile borders
    _.bufferTiles(weightMatrix.extent)
      .mapValues { tileWithCtx =>
        tileWithCtx.tile
          .focalSum(weightMatrix, Some(tileWithCtx.targetArea)) // focal op.
          .mapDouble { x => (x - A) / B } // local op.
      }
  }
}
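To see what the focal and local steps compute without a Spark cluster, the following is a minimal sketch in plain Scala on a small in-memory grid. It is not part of the project code: the names `GetisOrdSketch`, `Grid`, `focalSum` and `getisOrd` are hypothetical, the global statistics use the population standard deviation, and cells outside the grid are simply skipped at the borders (GeoTrellis handles borders through buffered tiles instead).

```scala
object GetisOrdSketch {
  // Hypothetical stand-in for a raster tile: rows of cell values.
  type Grid = Array[Array[Double]]

  // Focal op.: for every cell, sum weight * value over the kernel window.
  // Assumption of this sketch: neighbours outside the grid are skipped.
  def focalSum(grid: Grid, weights: Grid): Grid = {
    val r = weights.length / 2
    val rows = grid.length
    val cols = grid(0).length
    Array.tabulate(rows, cols) { (row, col) =>
      var acc = 0.0
      for (dr <- -r to r; dc <- -r to r) {
        val rr = row + dr
        val cc = col + dc
        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols)
          acc += weights(dr + r)(dc + r) * grid(rr)(cc)
      }
      acc
    }
  }

  // G* for every cell: focal sum followed by the local op. (x - A) / B.
  def getisOrd(grid: Grid, weights: Grid): Grid = {
    val cells = grid.flatten
    val n = cells.length
    val mean = cells.sum / n
    // population standard deviation of all cells
    val stdev = math.sqrt(cells.map(x => (x - mean) * (x - mean)).sum / n)

    val w = weights.flatten
    val sumW = w.sum
    val sumW2 = w.map(x => x * x).sum

    val a = mean * sumW
    val b = stdev * math.sqrt((n * sumW2 - sumW * sumW) / (n - 1))

    focalSum(grid, weights).map(_.map(x => (x - a) / b))
  }
}
```

For example, on a 5x5 grid of ones with a single warm cell in the middle and a 3x3 window of unit weights, the cell in the middle scores higher than the corner cells, whose windows never reach the warm cell.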


Let's assume we already have the following variables:

• layerReader: helper class to query tiles from geotrellis catalog/layer,
• layerId: ID of the raster layer used as input raster,
• kernelRadius: radius of the weight matrix, in pixels.
// RDD (distributed dataset from Apache Spark) representing all tiles in the layer
val queryResult: SpatialRDD =

// here, we use a circular kernel as a weight matrix

The result, outRdd, is an RDD (a distributed dataset from Apache Spark) that can be further processed or stored as a new layer in the GeoTrellis catalog.
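To make the circular weight matrix mentioned above more concrete, here is a small plain-Scala sketch of one way such a matrix could be built from a radius. The names `KernelSketch` and `circularWeights` are hypothetical, and the rule used here (weight 1.0 when the cell centre lies within the radius, 0.0 otherwise) is an assumption; the kernel produced by GeoTrellis may rasterize the circle differently.

```scala
object KernelSketch {
  // Square (2r+1) x (2r+1) matrix: 1.0 inside the circle of the given radius, 0.0 outside.
  def circularWeights(radius: Int): Array[Array[Double]] = {
    val size = 2 * radius + 1
    Array.tabulate(size, size) { (row, col) =>
      val dy = row - radius
      val dx = col - radius
      if (dx * dx + dy * dy <= radius * radius) 1.0 else 0.0
    }
  }
}
```

With binary weights like these, the sum of the weights and the sum of the squared weights are equal, so sumW and sumW2 in the getisord function take the same value.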