Introduction

1 File Operations

1.1 Import GIS

There are multiple formats. This example uses a “shapefile”. It would be more accurate to refer to them as “shapefiles” because it is comprised of a directory of files. This shapefile was downloaded from the U.S. Census Cartographic Boundary Files. This is the “state” file and its scale is 1:20m.

layer <- "cb_2023_us_state_20m"
dsn <- here::here("data/cb_2023_us_state_20m")
us <- sf::st_read(dsn = dsn, layer = layer)
#> Reading layer `cb_2023_us_state_20m' from data source 
#>   `/Users/rkw/Dropbox/coding/rproj/mapswR/data/cb_2023_us_state_20m' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 52 features and 9 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -179 ymin: 17.9 xmax: 180 ymax: 71.4
#> Geodetic CRS:  NAD83
results <- capture.output(sf::st_read(dsn = dsn, layer = layer))

Careful reading of the above output tells you important information about the file. For the sake of speed, it may be tempting to skip this crucial information. The better advice is to review it carefully.

Note

A thorough understanding of import options can save a programmer a lot of time later in an analysis. The st_read() function contains many arguments allowing for flexibility on import. Note that all geometries are converted to the “Multi” variety. For example, polygons become multipolygons. If this is undesired behavior, then set promote_to_multi = F.

1.2 Export GIS

if(!dir.exists("./data/us")) dir.create(path = "./data/us", recursive = TRUE)
dsn <- here::here("data/us/us.shp")
sf::st_write(us, dsn = dsn, append=FALSE)
#> Deleting layer `us' using driver `ESRI Shapefile'
#> Writing layer `us' to data source 
#>   `/Users/rkw/Dropbox/coding/rproj/mapswR/data/us/us.shp' using driver `ESRI Shapefile'
#> Writing 52 features with 9 fields and geometry type Multi Polygon.

1.3 Get Info

Within the Rstudio viewer, the file behaves much like any dataframe or tibble. That feeling is intentional. The author’s of the sf package have worked hard to preserve a dplyr like workflow so that programmers don’t have to learn new skills. However, this is not your typical dataframe!

1.3.1 Class

The object is both a dataframe and of the “sf” class.

class(us)
#> [1] "sf"         "data.frame"

1.3.2 Attributes

attributes(us)
#> $names
#>  [1] "STATEFP"  "STATENS"  "GEOIDFQ"  "GEOID"    "STUSPS"   "NAME"    
#>  [7] "LSAD"     "ALAND"    "AWATER"   "geometry"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52
#> 
#> $class
#> [1] "sf"         "data.frame"
#> 
#> $sf_column
#> [1] "geometry"
#> 
#> $agr
#> STATEFP STATENS GEOIDFQ   GEOID  STUSPS    NAME    LSAD   ALAND  AWATER 
#>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA> 
#> Levels: constant aggregate identity

1.3.3 Dependencies

Software dependencies can cause lots of headaches when incompatiable or out dated. You can see the complementary software along with the versions by running the command below.

sf::sf_extSoftVersion()
#>           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
#>       "3.11.0"        "3.5.3"        "9.1.0"         "true"         "true" 
#>           PROJ 
#>        "9.1.0"

1.3.4 CRS

A Coordinate Reference System (CRS) tells the computer how the coordinates were calculated. This is a topic of importance, but generally it should just work. Don’t change it unless you’re confident in what you’re doing.

sf::st_crs(us)[[1]]
#> [1] "NAD83"

“NAD83” is an abbreviation for the “North American Datum of 1983”.

#|label: basic-crs-2
cat(sf::st_crs(us)[[2]], sep = "\n")
#> GEOGCRS["NAD83",
#>     DATUM["North American Datum 1983",
#>         ELLIPSOID["GRS 1980",6378137,298.257222101,
#>             LENGTHUNIT["metre",1]]],
#>     PRIMEM["Greenwich",0,
#>         ANGLEUNIT["degree",0.0174532925199433]],
#>     CS[ellipsoidal,2],
#>         AXIS["latitude",north,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>         AXIS["longitude",east,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>     ID["EPSG",4269]]

1.3.5 Bounding Box

A “bounding box” gives the range (minimum and maximum) along the two axes. The coordinates were printed to the console when the object was imported initially, but can be accessed via st_bbox.

sf::st_bbox(us)
#>   xmin   ymin   xmax   ymax 
#> -179.2   17.9  179.8   71.4

1.3.6 Geometry Column

The spatial portion of the dataframe is stored in a column within the dataframe. Usually, this column is named “geometry” and can be accessed via object$geometry. However, a more conventional way is the following:

sf::st_geometry(us)
#> Geometry set for 52 features 
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -179 ymin: 17.9 xmax: 180 ymax: 71.4
#> Geodetic CRS:  NAD83
#> First 5 geometries:

1.3.7 Validity

A simple way to check if the geometries were imported correctly is to run the following command:

all(sf::st_is_valid(us))
#> [1] TRUE

Note that 52 rows of the dataframe generated a “TRUE” response, meaning that they are all valid. In the event of a “FALSE”, the function sf::st_make_valid might fix the problem.

1.3.8 Plot

Many authorities recommend doing a quick plot. A quick plot will help discover if the correct file was downloaded or if there are any additional problems with the data.

For a quick plot, my default command is:

plot(us["NAME"], axes = T)

The map looks to be the correct map and there were no errors when plotting. Looking at the 52 names in the NAME field reveal Washington D.C and Puerto Rico are included. There are no additional states at the 150E 55N location. This suggests that some of the Aleutian Islands in Alaska have breached a division within the map. Here’s a re-plot where the longitudinal center of the map is shifted -90 degrees. The sf package also includes the st_shift_longitude function for a similar result.

us["NAME"] |> st_break_antimeridian(lon_0 = -90) |> 
     st_transform('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +lon_0=-90') |> 
    plot(axes = T)
Figure 1: Map of United States with longitude rotated -90 degrees.

2 Geometry Creation

The workflow to creating an sf class object is: sfg >> sfc >> sf. An sf class object contains both geometries and data called attributes.

2.1 SFG

The sfg class represents simple feature geometries. There are 17 possible geometry types. We will discuss LINE, POINT, POLYGON/MULTIPOLYGON and only in 2-dimensional space. All features are built upon points.

2.1.1 Point

(pt <- sf::st_point(c(0, 0)))
class(pt)
#> [1] "XY"    "POINT" "sfg"

2.1.2 Line

(line <- sf::st_linestring(rbind(c(0, 0), c(1, 0))))
class(line)
#> [1] "XY"         "LINESTRING" "sfg"

2.1.3 Polygon

Note that the final point is a repeat of the first point in order to close the polygon.

(plgn <- sf::st_polygon(list(rbind(c(0,0), c(0,1), c(1,1), c(1,0), c(0,0)))))
class(plgn)
#> [1] "XY"      "POLYGON" "sfg"

2.1.4 Plot

Figure 2: Examples of simple feature geometries.

2.2 SFC

Next a geometry is included in a simple feature column. A simple feature column is a column-list of sfg objects. It might be a little confusing because in the example below there is only a single sfg object that is then converted to a list-column.

beijing_sfg <- sf::st_point(c(116.4, 39.9))
class(beijing_sfg)
#> [1] "XY"    "POINT" "sfg"
(beijing_sfc <- sf::st_sfc(list(beijing_sfg),crs = "WGS84"))
#> Geometry set for 1 feature 
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 116 ymin: 39.9 xmax: 116 ymax: 39.9
#> Geodetic CRS:  WGS 84
class(beijing_sfc)
#> [1] "sfc_POINT" "sfc"

2.3 SF

city <- "Beijing"
country <- "China"
(beijing_sf <- sf::st_sf(city = city,
                    country = country,
                    beijing_sfc))
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 116 ymin: 39.9 xmax: 116 ymax: 39.9
#> Geodetic CRS:  WGS 84
#>      city country      beijing_sfc
#> 1 Beijing   China POINT (116 39.9)

2.4 Plot

chn <- rnaturalearth::ne_countries(country = "China")
plot(chn$geometry, axes = T)
plot(beijing_sfc, add = T, cex = 2, col = "steelblue", pch = 16)
Figure 3: Map of China with Beijing shown as blue point.

3 CRS

3.1 Coordinate Reference Systems

  • The earth is not a perfectly round sphere. Matter of fact, if one measures the circumference of the earth between the North and South poles, the measurement would be 11.5 kilometers shorter than the circumference of the equator. The earth is an ellipsoid rather than a sphere. An accurate Coordinate Reference System must account for these nuances.

  • Coordinate Reference System (CRS), which defines how the spatial elements of the data relate to the surface of the Earth (or other bodies). CRSs are either geographic or projected. (Lovelace et al., 2019; Moraga, 2023)

  • Geographic CRS use latitude and longitude; projected CRS use Cartesian coordinates to reference a two-dimensional or flat representation of the earth.

3.2 Geographic CRS

  • Geographic CRS use latitude and longitude to identify unique positions on the Earth’s surface. By way of reminder, the equator runs east and west, bisecting the Earth into the Northern and Southern hemispheres. It is halfway between the North and South poles. The equator’s latitude is equal to 0 degrees.

  • Latitude values are the angle values north or south of the equator (0 degrees) and range from –90 degrees at the south pole to 90 degrees at the north pole.

  • The prime meridian is an arbitrarily-chosen meridian running North and South, dividing the Earth into Eastern and Western hemispheres.Earth’s current international standard prime meridian is the International IERS Reference Meridian. The current standard differs from the previous Greenwich Meridian.

  • Longitude values range from –180 degrees when running west to 180 degrees when running east

  • The latitude and longitude values where the Equator and Prime Meridean insersect is (0, 0).

  • Latitude and longitude coordinates may be expressed in degrees, minutes, and seconds, or in decimal degrees. In decimal form, northern latitudes are positive, and southern latitudes are negative. Also, eastern longitudes are positive, and western longitudes are negative.

Format Latitude Longitude

  • “Taken together, the geodetic datum (e.g, WGS84), the type of map projection (e.g., Mercator) and the parameters of the projection (e.g., location of the origin) specify a coordinate reference system, or CRS, a complete set of assumptions used to translate the latitude and longitude information into a two-dimensional map.” (hadley)
#obtain CRS info from object 
sf::st_crs(us_map)
  • The WKT format will usually start with a PROJCRS[…] tag for a projected coordinate system, or a GEOGCRS[…] tag for a geographic coordinate system. The CRS output will also consist of a user defined CS definition which can be an EPSG code, or a string defining the datum and projection type. In the past, a coordinate system could be defined by its EPSG numeric code or the PROJ4 formatted string. Now, an sf object’s coordinate system can also be designated with Well Known Text (WTK/WTK2) format. “When possible, adopt an EPSG code which comes from a well established authority. However, if customizing a coordinate system, it may be easiest to adopt a PROJ4 syntax.”(Gimond, 2023)

3.3 Projections

  • All projected CRSs are based on a geographic CRS. A projection takes a three-dimensional object, the earth, and reduces it to a two-dimensional object. The conversion always results in deformation.

  • There are three types of projections: conic, cylindrical, and planar (azimuthal). Conic projections are appropriate for maps of mid-latitude areas. Cylindrical projections are used when displaying the entire world. (The “Mercator” map, by Gerardus Mercator back in 1569, is an example). A planar projection is typically used in mapping polar regions.

  • “Projections are often named based on a property they preserve: equal-area preserves area, azimuthal preserve direction, equidistant preserve distance, and conformal preserve local shape.”(Lovelace et al., 2019)

“you will have to make choices about what distortions you are prepared to accept when drawing a map. This is the job of the map projection.”

“North American Datum” (NAD83) is a good choice, whereas if your perspective is global the “World Geodetic System” (WGS84) is probably better.

When selecting geographic CRSs, the answer is often WGS84. It is used not only for web mapping, but also because GPS datasets and thousands of raster and vector datasets are provided in this CRS by default. WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326. Lovelace.

# to see list of available projections
sf_proj_info(type = "proj")

Upon import, most objects will have a CRS already set. The challenge is to know what CRS you’re working with and for it to match in all layers of the map. Occassionally, a need to “transform” the CRS will arise. We’ll use the United States GIS data from the U.S Census bureau.

3.4 Query

3.4.1 User Input

The user input element can contain the AUTHORITY:CODE representation (e.g., EPSG:4326), the CRS’s name (e.g., WGS 84), or the proj-string definition (e.g., “+proj=lcc +lat_1=20 +lat_2=60 lon_0=-96 +datum=NAD83 +units=m”).

sf::st_crs(us)[[1]]
#> [1] "NAD83"
sf::st_crs(us)$IsGeographic # geographic or projected?
#> [1] TRUE
sf::st_crs(us)$units_gdal # meters or degrees?
#> [1] "degree"
sf::st_crs(us)$proj4string # proj string
#> [1] "+proj=longlat +datum=NAD83 +no_defs"

3.4.2 WKT

“WKT” is an abbreviation for “well known text string”.

cat(sf::st_crs(us)[[2]], sep = "\n")
#> GEOGCRS["NAD83",
#>     DATUM["North American Datum 1983",
#>         ELLIPSOID["GRS 1980",6378137,298.257222101,
#>             LENGTHUNIT["metre",1]]],
#>     PRIMEM["Greenwich",0,
#>         ANGLEUNIT["degree",0.0174532925199433]],
#>     CS[ellipsoidal,2],
#>         AXIS["latitude",north,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>         AXIS["longitude",east,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>     ID["EPSG",4269]]

3.5 Transform

Use the CRS that is best for your work. However, keep this in mind prior to conversion:

When selecting geographic CRSs, the answer is often WGS84. It is used not only for web mapping, but also because GPS datasets and thousands of raster and vector datasets are provided in this CRS by default. WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326. This ‘magic number’ can be used to convert objects with unusual projected CRSs into something that is widely understood.(Lovelace et al., 2019)

#|label: basics-crs-transform
us_wgs84 <- sf::st_transform(us, "EPSG:4326")
sf::st_crs(us_wgs84)
#> Coordinate Reference System:
#>   User input: EPSG:4326 
#>   wkt:
#> GEOGCRS["WGS 84",
#>     ENSEMBLE["World Geodetic System 1984 ensemble",
#>         MEMBER["World Geodetic System 1984 (Transit)"],
#>         MEMBER["World Geodetic System 1984 (G730)"],
#>         MEMBER["World Geodetic System 1984 (G873)"],
#>         MEMBER["World Geodetic System 1984 (G1150)"],
#>         MEMBER["World Geodetic System 1984 (G1674)"],
#>         MEMBER["World Geodetic System 1984 (G1762)"],
#>         MEMBER["World Geodetic System 1984 (G2139)"],
#>         ELLIPSOID["WGS 84",6378137,298.257223563,
#>             LENGTHUNIT["metre",1]],
#>         ENSEMBLEACCURACY[2.0]],
#>     PRIMEM["Greenwich",0,
#>         ANGLEUNIT["degree",0.0174532925199433]],
#>     CS[ellipsoidal,2],
#>         AXIS["geodetic latitude (Lat)",north,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>         AXIS["geodetic longitude (Lon)",east,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>     USAGE[
#>         SCOPE["Horizontal component of 3D system."],
#>         AREA["World."],
#>         BBOX[-90,-180,90,180]],
#>     ID["EPSG",4326]]
Finding CRS

Finding the right CRS for your project can be a challenge. In researching this project, I discovered the crsuggest package, available here, and this website.

A helpful package is the crsuggestpackage which can be installed from CRAN: install.packages("crsuggest"). An example would be New Zealand. When exported from the rnaturalearth package, it is in the common “WGS 84” format. Because of its proximity to the South Pole, this can cause distortions and a better CRS may be available.

rnaturalearth::ne_states(country = "New Zealand") |> 
    crsuggest::suggest_crs()
#> # A tibble: 10 × 6
#>   crs_code crs_name                         crs_type crs_gcs crs_units crs_proj4
#>   <chr>    <chr>                            <chr>      <dbl> <chr>     <chr>    
#> 1 3851     NZGD2000 / NZCS2000              project…    4167 m         +proj=lc…
#> 2 9191     WGS 84 / NIWA Albers             project…    4326 m         +proj=ae…
#> 3 3994     WGS 84 / Mercator 41             project…    4326 m         +proj=me…
#> 4 3857     WGS 84 / Pseudo-Mercator         project…    4326 m         +proj=me…
#> 5 6932     WGS 84 / NSIDC EASE-Grid 2.0 So… project…    4326 m         +proj=la…
#> # ℹ 5 more rows

4 Operations - Attributes

This section deals with typical dataframe manipulations like subsetting, filtering and creating new columns with vector datasets. Let’s continue to use the us dataframe of class sf. I prefer to work in a pipline using the dplyr package so the base package equivalents will be omitted.

4.1 Drop Geometry

The original us dataframe has 10 columns, one of which is a geometry list column. Once dropped, the new dataframe has 9 columns.

#basics-attr-drop
setequal(ncol(sf::st_drop_geometry(us)), ncol(us))
#> [1] FALSE

4.2 Filter

us |> 
    dplyr::filter(NAME %in% c("Texas", "California"))
#> Simple feature collection with 2 features and 9 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -124 ymin: 25.8 xmax: -93.5 ymax: 42
#> Geodetic CRS:  NAD83
#>   STATEFP  STATENS     GEOIDFQ GEOID STUSPS       NAME LSAD    ALAND   AWATER
#> 1      48 01779801 0400000US48    48     TX      Texas   00 6.77e+11 1.90e+10
#> 2      06 01779778 0400000US06    06     CA California   00 4.04e+11 2.03e+10
#>                         geometry
#> 1 MULTIPOLYGON (((-107 31.9, ...
#> 2 MULTIPOLYGON (((-119 33.5, ...

4.3 Select

us |> 
    sf::st_drop_geometry() |> 
    dplyr::filter(NAME %in% c("Texas", "California")) |> 
    dplyr::select(NAME, ALAND)
#>         NAME    ALAND
#> 1      Texas 6.77e+11
#> 2 California 4.04e+11

4.4 Slice

us |> 
    sf::st_drop_geometry() |> 
    dplyr::select(NAME, ALAND) |> 
    dplyr::arrange(dplyr::desc(ALAND)) |> 
    dplyr::slice_head(n = 5)
#>         NAME    ALAND
#> 1     Alaska 1.48e+12
#> 2      Texas 6.77e+11
#> 3 California 4.04e+11
#> 4    Montana 3.77e+11
#> 5 New Mexico 3.14e+11

4.5 Summarize

This example divides states into big and small depending if their area exceeds the average. Then, it computes the average by big and small. The geometry column stays with the dataframe until droped in the last line.

library(dplyr)
library(magrittr)
us |> 
    mutate(avg_lnd = mean(ALAND), .after = ALAND) |> 
    mutate(typ_lnd = ifelse(ALAND > avg_lnd, "big", "small"), .after = avg_lnd) |>
    group_by(typ_lnd) |> 
    summarize(avg_lnd_typ = mean(ALAND)) |> 
    sf::st_drop_geometry()
#> # A tibble: 2 × 2
#>   typ_lnd   avg_lnd_typ
#> * <chr>           <dbl>
#> 1 big     335454262870.
#> 2 small    84437535114.

5 Operations - Geometric

5.1 Boundary

The st_boundary function will convert the geometry type from multipolygon to linestring.

swus <- us[us$NAME %in% c("Texas", "Oklahoma", "New Mexico"), "geometry"]
bndry_swus <- sf::st_boundary(us[us$NAME %in% c("Texas", "Oklahoma", "New Mexico"), ])
plot(swus$geometry, col = c("red", "steelblue", "yellow"), main = "Southwest US", axes = T)
plot(bndry_swus$geometry, add = T, lwd = 5, col = "grey20")
Figure 4: Map of New Mexico, Oklahoma, and Texas with border layer widened for emphasis.

5.2 Buffer

txs <- us[us$NAME == "Texas", "geometry"]
txs_buffer <- sf::st_buffer(txs, dist = 100000)
plot(txs_buffer$geometry, axes = T, main = "Texas")
plot(txs$geometry, add = T, col = "steelblue")
Figure 5: Map of Texas with black outline showing buffer.

5.3 Centroids

This function gets used frequently, especially for labels. The function returns the center, in the form of a point geometry, for a polygon. For labels, it works well for a polygon that’s symetrical along the xy axis like Nebraska.

nbrsk <- us[us$NAME == "Nebraska", "geometry"]
nbrsk_ctr <- st_centroid(nbrsk)
plot(nbrsk$geometry, axes = T, main = "Nebraska")
plot(nbrsk_ctr$geometry, add = T, cex = 2, pch = 16, col = "steelblue")
Figure 6: Map of Nebraska with centroid/point displayed in blue.

For states that are not symetrical, like Florida, it will place the label in a seemingly odd location.

flrd <- us[us$NAME == "Florida", "geometry"]
flrd_ctr <- st_centroid(flrd)
plot(flrd$geometry, axes = T, main = "Florida")
plot(flrd_ctr$geometry, add = T, cex = 2, pch = 16, col = "steelblue")
Figure 7: Map of Florida with centroid/point displayed in blue.

5.4 Convex Hull

plot(sf::st_convex_hull(us[us$NAME %in% c("Texas", "California", "Florida", "Michigan"), "NAME"]), key.pos = NULL, main = "CA, TX, FL and MI")
Figure 8: Map of California, Texas, Florida and Michigan displayed as convex hulls.

5.5 Simplification

Simplification can be appropriate when building a choropleth map of national boundaries, but not when deciding a which side of a boundary line a treasure is located. You should always simplify polygons to the point where they are recognizable and still perform the function intended. I usually simplify upon import so that plotting does not take excessive time. Reduced file sizes are a big benefit when pushing to repositories too.

Below is the United States Cartographic boundary file that we’ve been experimenting with. The file is subset to Alaska and then the bounding box was restried to the famous “Yukon” area with many small islands. Our original file was the smallest available at a scale of 1:20,000,000. Using the st_simplify function, the file is shrunk even further to the point where the details are becoming opaque.

Another great tool is the rmapshaper package and an accompanying website is offered that allows for experimentation.

alsk <- us[us$NAME == "Alaska", ]
alsk_smplfy <- sf::st_simplify(alsk, dTolerance = 5000)
par(mfrow=c(1,2))
par(mar=c(5,4,4,2))
plot(alsk$geometry,
     xlim = c(-137.985830, -130),
    ylim = c(54.8, 59.3),
    axes = T,
    main = "Original"
)
par(mar=c(5,4,4,2))
plot(alsk_smplfy$geometry,
     xlim = c(-137.985830, -130),
    ylim = c(54.8, 59.3),
    axes = T,
    main = "Simplified")
Figure 9: Map of Yukon, Alaska, showing loss of detail from simplification.

5.6 Affine Transformations

Affine transformations allow for shifting, rotating, and scaling. They can speed the generation of experimental objects to plot. This example is taken directly from the sf vignette: “3. Manipulating Simple Feature Geometries”.

(p = st_point(c(0,2))) # create point
p + 1
p + c(-5, -5)
p * p
(plgn <- st_polygon(list(rbind(c(0, 0), c(0, 1), c(1, 1), c(1, 0), c(0, 0)))))
(plgn_1 <- (plgn + c(4, 0)) * 2)
plgn_sfc <- st_sfc(plgn, plgn_1)
plgn_sf <- st_sf(data.frame(name = c("A", "B"), plgn_sfc))
class(plgn_sf)
#> [1] "sf"         "data.frame"
plot(plgn_sf)
Figure 10: Two polygons with “B” created via affine transfomation.

5.7 Clipping

The traditional set operations can be used in clipping geospatial data. They are x or y, the difference between x and y, the intersection of x and y and the union of x and y. The example in Geocomputation for R inspired this plot with the only change being the state of Georgia is used as the polygon.(Lovelace et al., 2019)

Figure 11: Illustration of two Georgias, the actual one and the one created after affine transformation.
knitr::include_graphics("../images/venn_georgia_all.png")
Figure 12: Illustration of the difference between two Georgias using spatial clipping.

The next plot shows how to subset to a point that is shared by the two Georgias.

Code
ga <-
    us |> 
    dplyr::filter(NAME == "Georgia") |>
    dplyr::select(geometry)

# pull geometry only
ga_orgnl <- sf::st_geometry(ga)
# affine transformation
ga_affn  <- ga_orgnl * 1.03
# reset CRS
ga_affn <- sf::st_set_crs(ga_affn, value = "EPSG:4269")

# convert back to sf and combine for bbox
ga_orgnl <- sf::st_sf(name = "original", geometry = ga_orgnl)
ga_affn <- sf::st_sf(name = "transformed", geometry = ga_affn)

# combine
ga <- dplyr::bind_rows(ga_affn, ga_orgnl)

#subset to point within intersection
x <- ga[ga$name == "original", ]
y <- ga[ga$name == "transformed", ]
bb = st_bbox(st_union(x, y))
box = st_as_sfc(bb)
x_and_y = st_intersection(x, y)
set.seed(2024)
p = st_sample(x = box, size = 10)
p_xy1 = p[x_and_y]

plot.new()
par(mar = c(0, 0, 0, 0))
plot(box, border = "grey", lty = 2)
plot(x$geometry, add = TRUE, border = "grey", lwd = 2)
plot(y$geometry, add = TRUE, border = "grey", lwd = 2)
plot(p, add = TRUE, cex = 2, pch = 20)
plot(p_xy1, cex = 5, col = "red", add = TRUE)
text(x = c(-86, -82.8), y = 32.8, labels = c("x", "y"), cex = 2)

Points subset to those, in this case one, shared by the actual Georgia and the transformed Georgia.

Points subset to those, in this case one, shared by the actual Georgia and the transformed Georgia.

5.8 Union by Grouping

The first example uses simple points, while this examples uses a map of the US and simulated data.

us_main <- 
    us |> 
    dplyr::filter(!(NAME %in% c("Alaska", "District of Columbia",
                                "Hawaii", "Puerto Rico"))) |>
    dplyr::select(NAME, geometry) |>
    dplyr::mutate(var_cont = rnorm(48))

us_df <-
    us_main %>%
    st_centroid() %>%
    st_coordinates(ctr) %>%
    as_tibble() %>%
    rename(lon = X, lat = Y) %>%
    {bind_cols(us_main, .)} %>%
    mutate(avg_lon = mean(lon)) %>%
    mutate(coast = ifelse(lon < avg_lon, "West", "East")) |>
    mutate(coast = factor(coast)) |>
    group_by(coast) |>
    summarise(var_cont = sum(var_cont))

par(mfrow = c(1, 2))
par(mar=c(5,4,4,2))
plot(us_df["var_cont"], main = "Continous Variable")

Illustration of the continental US being grouped by a factor variable. Note that the internal state boundaries have dissolved.

Illustration of the continental US being grouped by a factor variable. Note that the internal state boundaries have dissolved.
par(mar=c(5,4,4,2))
plot(us_df["coast"], main = "Factor Variable")

Illustration of the continental US being grouped by a factor variable. Note that the internal state boundaries have dissolved.

Illustration of the continental US being grouped by a factor variable. Note that the internal state boundaries have dissolved.

6 Measurement

6.1 Area

sf::st_area(us[us$NAME %in% c("Alaska", "Rhode Island"), ])
#> Units: [m^2]
#> [1] 3.20e+09 1.55e+12

6.2 Distance

sf::st_distance(us[us$NAME %in% c("Alaska", "Rhode Island", "New Jersey"), ])
#> Units: [m]
#>         [,1]    [,2]    [,3]
#> [1,]       0 4370984  173573
#> [2,] 4370984       0 4236239
#> [3,]  173573 4236239       0

6.3 Perimeter

sf::st_perimeter(us[us$NAME %in% c("Alaska", "Rhode Island"), ])
#> Units: [m]
#> [1]   292847 20133913

6.4 Length

sf::st_sfc(st_linestring(rbind(c(0, 0), c(0, 1))), crs = 4326) |> 
sf::st_length()
#> 111195 [m]