In-class Exercise 6

Author

Ho Zi Jun

Published

September 30, 2024

Modified

September 30, 2024

6 Overview

Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method for revealing and describing how hot spot and cold spot areas evolve over time. The analysis consist of four main steps:

Building the space-time cube ,
Usind data to perform Getis-Ord local Gi* statistic for each bin by using an FDR correction,
Evaluating hot and cold spot trends by using Mann-Kendall trend test,
Categorising each study area location based on the z-score and p-value for each location with data, and with the hot spot z-score and p-value for each bin. Sieving away those that do not conform to the significance level.

6.1 Getting started

6.1.1 Installing and Loading the R Packages

pacman::p_load(sf, sfdep, tmap, plotly, tidyverse)

6.2 The Data

Hunan, a geospatial data set in ESRI shapefile format, and
Hunan_GDPPC, an attribute data set in csv format.

6.3 Importing geospatial data

In the code chunk below, st_read() of sf package is used to import Hunan shapefile into R.

hunan <- st_read(dsn = "data/geospatial", 
                 layer = "Hunan")

Reading layer `Hunan' from data source 
  `C:\zjho008\ISSS626-GAA\In-class_Ex\In-class_Ex06\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 88 features and 7 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 108.7831 ymin: 24.6342 xmax: 114.2544 ymax: 30.12812
Geodetic CRS:  WGS 84

6.4 Importing attribute table

In the code chunk below, read_csv() of readr is used to import Hunan_GDPPC.csv into R.

GDPPC <- read_csv("data/aspatial/Hunan_GDPPC.csv")

Rows: 1496 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): County
dbl (2): Year, GDPPC

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

6.5 Creating a Time Series Cube

spacetime and spacetime cubes illustrates the basic concept of spatio-temporal cube and its implementation in sfdep package.

Spacetime cube is useful for fixed administrative boundary, planning area, planing subzone etc but not applicable for dynamic space events such as forest areas, flooding for instance.

In the code chunk below, spacetime() of sfdep is used to create a spatio-temporal cube.

GDPPC_st <- spacetime(GDPPC, hunan,  # two data files: spatial and attribute
                      .loc_col = "County", # indicating which field is spatial
                      .time_col = "Year") # indicating which field is the attribute

Note

Original time/date field cannot be used as it is in continuous form Hence, date has to be converted to integer or to drop away the time to have a continuous Day/Month/Year indicators.

Next, is_spacetime_cube() of sfdep package which will be used to verify if GDPPC_st is indeed a space-time cube object.

is_spacetime_cube(GDPPC_st)

[1] TRUE

The TRUE return confirms that GDPPC_st object is indeed an time-space cube.

6.6 Computing Gi*

In this section we will compute the local Gi* statistics.

GDPPC_nb <- GDPPC_st %>%
  activate("geometry") %>% # to use the geometric layer and exclude the attributes; this line is needed before computing the weight matrix
  
  mutate(nb = include_self(st_contiguity(geometry)), # include_self function 
    
# parsing tp calculate the spatial weight - using mutate to attain the two columns 
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1, 
                                  alpha = 1), # extra parameters to emphasise distance decay
         .before = 1) %>%
  set_nbs("nb") %>% # for the data to be arranged in time-sequence
  set_wts("wt")

! Polygon provided. Using point on surface.

Warning: There was 1 warning in `stopifnot()`.
ℹ In argument: `wt = st_inverse_distance(nb, geometry, scale = 1, alpha = 1)`.
Caused by warning in `st_point_on_surface.sfc()`:
! st_point_on_surface may not give correct results for longitude/latitude data

Sorting should not be done after time-space cube is calculated

Note that this dataset now has neighbours and weights for each time-slice.

Using head() function

head(GDPPC_nb)

spacetime ────

Context:`data`

88 locations `County`

17 time periods `Year`

── data context ────────────────────────────────────────────────────────────────

# A tibble: 6 × 5
   Year County  GDPPC nb        wt       
  <dbl> <chr>   <dbl> <list>    <list>   
1  2005 Anxiang  8184 <int [6]> <dbl [6]>
2  2005 Hanshou  6560 <int [6]> <dbl [6]>
3  2005 Jinshi   9956 <int [5]> <dbl [5]>
4  2005 Li       8394 <int [5]> <dbl [5]>
5  2005 Linli    8850 <int [5]> <dbl [5]>
6  2005 Shimen   9244 <int [6]> <dbl [6]>

6.7 Computing Gi*

Now to utilise th new columns to manually calculate the local Gi* for each location. We can do this by grouping by Year and using local_gstar_perm() of sfdep package.

After which, we use unnest() to unnest gi_star column of the newly created gi_starts data.frame.

gi_stars <- GDPPC_nb %>% 
  group_by(Year) %>% 
  mutate(gi_star = local_gstar_perm(
    GDPPC, nb, wt)) %>% 
  tidyr::unnest(gi_star)

6.8 Mann-Kendall Test

To perform confirmatory analysis whether there is a monotonic (meaning there is no trend) or no monotonic trend

With Gi* measures calculated the next step is to evaluate each location for a trend using the Mann-Kendall test. The code chunk below uses the Changsha county.

cbg <- gi_stars %>% 
  ungroup() %>% # since it is a 'cube' to filter away the other county
  filter(County == "Changsha") |> 
  select(County, Year, gi_star)

Plotting the result by using ggplot2 functions.

ggplot(data = cbg, 
       aes(x = Year, 
           y = gi_star)) +
  geom_line() +
  theme_light()

From the plot, we can are unable to interpret much as it is static.

6.8.1 Interacitve Mann-Kendall Plot

Creating an interactive plot by using ggplotly() of plotly package.

p <- ggplot(data = cbg, 
       aes(x = Year, 
           y = gi_star)) +
  geom_line() +
  theme_light()

ggplotly(p)

For such a test it is advisable to have at least 10 years of data.

6.8.2 Mann-Kendall Test

Reject the null-hypothesis null if the p-value is smaller than the alpha value (i.e. 1-confidence level)

6.8.3 Printing Mann-Kendall Test Report

Kendall package is a special package to run this calculation

cbg %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)  # to generate the report

# A tibble: 1 × 5
    tau      sl     S     D  varS
  <dbl>   <dbl> <dbl> <dbl> <dbl>
1 0.485 0.00742    66  136.  589.

In the above result, sl is the p-value. This result tells us that there is a slight upward but insignificant trend.

To attain the p-values for some of which are closer or further away from one.

strong close to 1

6.8.4 Mann-Kendall test data.frame

We can replicate this for each location by using group_by() of dplyr package.

ehsa <- gi_stars %>%
  group_by(County) %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>%
  tidyr::unnest_wider(mk)

head(ehsa)

# A tibble: 6 × 6
  County        tau        sl     S     D  varS
  <chr>       <dbl>     <dbl> <dbl> <dbl> <dbl>
1 Anhua      0.191  0.303        26  136.  589.
2 Anren     -0.294  0.108       -40  136.  589.
3 Anxiang    0      1             0  136.  589.
4 Baojing   -0.691  0.000128    -94  136.  589.
5 Chaling   -0.0882 0.650       -12  136.  589.
6 Changning -0.750  0.0000318  -102  136.  589.

6.8.5 Mann-Kendall test data.frame

We can also sort to show significant emerging hot/cold spots

emerging <- ehsa %>% 
  arrange(sl, abs(tau)) %>% 
  slice(1:10)
head(emerging)

# A tibble: 6 × 6
  County        tau         sl     S     D  varS
  <chr>       <dbl>      <dbl> <dbl> <dbl> <dbl>
1 Shuangfeng  0.868 0.00000143   118  136.  589.
2 Xiangtan    0.868 0.00000143   118  136.  589.
3 Xiangxiang  0.868 0.00000143   118  136.  589.
4 Chengbu    -0.824 0.00000482  -112  136.  589.
5 Dongan     -0.824 0.00000482  -112  136.  589.
6 Wugang     -0.809 0.00000712  -110  136.  589.

6.9 Performing Emerging Hotspot Analysis (To confirm on the classification )

Lastly, we will perform EHSA analysis by using emerging_hotspot_analysis() of sfdep package.

It takes a spacetime object x (i.e. GDPPC_st), and the quoted name of the variable of interest (i.e. GDPPC) for .var argument. The k argument is used to specify the number of time lags which is set to 1 by default. Lastly, nsim map numbers of simulation to be performed.

ehsa <- emerging_hotspot_analysis(
  x = GDPPC_st, 
  .var = "GDPPC", 
  k = 1, 
  nsim = 99 #no of simulations is 100
)

6.9.1 Visualising the distribution of EHSA classes

In the code chunk below, ggplot2 functions are used to reveal the distribution of EHSA classes using a bar chart.

ggplot(data = ehsa,
       aes(x = classification)) +
  geom_bar()

The bar chart above shows that sporadic cold spots class has the highest number of counties.

Note that the p-value is calculated here and some of them are not statistically significant despite the representation of the bar chart.

6.9.2 Visualising EHSA

In this section, it illustrates how to visualise the geographic distribution EHSA classes. However, before we can do so, we need to join both hunan and ehsa together by using the code chunk below.

hunan_ehsa <- hunan %>%
  left_join(ehsa,
            by = join_by(County == location))

tmap functions are used to plot a categorical choropleth map by using the code chunk below:

Code

ehsa_sig <- hunan_ehsa  %>%
  filter(p_value < 0.05)
tmap_mode("plot")

tmap mode set to plotting

Code

tm_shape(hunan_ehsa) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
  tm_fill("classification") + 
  tm_borders(alpha = 0.4)

Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

We can backtrack to cbg whether it is an oscillating hotspot and compare with the chart.