layer <- "cb_2023_us_state_20m"
dsn <- here::here("data/cb_2023_us_state_20m")
us <- sf::st_read(dsn = dsn, layer = layer)
#> Reading layer `cb_2023_us_state_20m' from data source
#> `/Users/rkw/Dropbox/coding/rproj/mapswR/data/cb_2023_us_state_20m'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 52 features and 9 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -179 ymin: 17.9 xmax: 180 ymax: 71.4
#> Geodetic CRS: NAD83
results <- capture.output(sf::st_read(dsn = dsn, layer = layer))Introduction
1 File Operations
1.1 Import GIS
There are multiple formats. This example uses a “shapefile”. It would be more accurate to refer to them as “shapefiles” because it is comprised of a directory of files. This shapefile was downloaded from the U.S. Census Cartographic Boundary Files. This is the “state” file and its scale is 1:20m.
Careful reading of the above output tells you important information about the file. For the sake of speed, it may be tempting to skip this crucial information. The better advice is to review it carefully.
A thorough understanding of import options can save a programmer a lot of time later in an analysis. The st_read() function contains many arguments allowing for flexibility on import. Note that all geometries are converted to the “Multi” variety. For example, polygons become multipolygons. If this is undesired behavior, then set promote_to_multi = F.
1.2 Export GIS
if(!dir.exists("./data/us")) dir.create(path = "./data/us", recursive = TRUE)
dsn <- here::here("data/us/us.shp")
sf::st_write(us, dsn = dsn, append=FALSE)
#> Deleting layer `us' using driver `ESRI Shapefile'
#> Writing layer `us' to data source
#> `/Users/rkw/Dropbox/coding/rproj/mapswR/data/us/us.shp' using driver `ESRI Shapefile'
#> Writing 52 features with 9 fields and geometry type Multi Polygon.1.3 Get Info
Within the Rstudio viewer, the file behaves much like any dataframe or tibble. That feeling is intentional. The author’s of the sf package have worked hard to preserve a dplyr like workflow so that programmers don’t have to learn new skills. However, this is not your typical dataframe!
1.3.1 Class
The object is both a dataframe and of the “sf” class.
1.3.2 Attributes
attributes(us)
#> $names
#> [1] "STATEFP" "STATENS" "GEOIDFQ" "GEOID" "STUSPS" "NAME"
#> [7] "LSAD" "ALAND" "AWATER" "geometry"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#> [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
#> [51] 51 52
#>
#> $class
#> [1] "sf" "data.frame"
#>
#> $sf_column
#> [1] "geometry"
#>
#> $agr
#> STATEFP STATENS GEOIDFQ GEOID STUSPS NAME LSAD ALAND AWATER
#> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> Levels: constant aggregate identity1.3.3 Dependencies
Software dependencies can cause lots of headaches when incompatiable or out dated. You can see the complementary software along with the versions by running the command below.
1.3.4 CRS
A Coordinate Reference System (CRS) tells the computer how the coordinates were calculated. This is a topic of importance, but generally it should just work. Don’t change it unless you’re confident in what you’re doing.
“NAD83” is an abbreviation for the “North American Datum of 1983”.
#|label: basic-crs-2
cat(sf::st_crs(us)[[2]], sep = "\n")
#> GEOGCRS["NAD83",
#> DATUM["North American Datum 1983",
#> ELLIPSOID["GRS 1980",6378137,298.257222101,
#> LENGTHUNIT["metre",1]]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433]],
#> CS[ellipsoidal,2],
#> AXIS["latitude",north,
#> ORDER[1],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> AXIS["longitude",east,
#> ORDER[2],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> ID["EPSG",4269]]1.3.5 Bounding Box
A “bounding box” gives the range (minimum and maximum) along the two axes. The coordinates were printed to the console when the object was imported initially, but can be accessed via st_bbox.
1.3.6 Geometry Column
The spatial portion of the dataframe is stored in a column within the dataframe. Usually, this column is named “geometry” and can be accessed via object$geometry. However, a more conventional way is the following:
1.3.7 Validity
A simple way to check if the geometries were imported correctly is to run the following command:
Note that 52 rows of the dataframe generated a “TRUE” response, meaning that they are all valid. In the event of a “FALSE”, the function sf::st_make_valid might fix the problem.
1.3.8 Plot
Many authorities recommend doing a quick plot. A quick plot will help discover if the correct file was downloaded or if there are any additional problems with the data.
For a quick plot, my default command is:
The map looks to be the correct map and there were no errors when plotting. Looking at the 52 names in the NAME field reveal Washington D.C and Puerto Rico are included. There are no additional states at the 150E 55N location. This suggests that some of the Aleutian Islands in Alaska have breached a division within the map. Here’s a re-plot where the longitudinal center of the map is shifted -90 degrees. The sf package also includes the st_shift_longitude function for a similar result.
2 Geometry Creation
The workflow to creating an sf class object is: sfg >> sfc >> sf. An sf class object contains both geometries and data called attributes.
2.1 SFG
The sfg class represents simple feature geometries. There are 17 possible geometry types. We will discuss LINE, POINT, POLYGON/MULTIPOLYGON and only in 2-dimensional space. All features are built upon points.
2.1.1 Point
2.1.2 Line
2.1.3 Polygon
Note that the final point is a repeat of the first point in order to close the polygon.
2.1.4 Plot
2.2 SFC
Next a geometry is included in a simple feature column. A simple feature column is a column-list of sfg objects. It might be a little confusing because in the example below there is only a single sfg object that is then converted to a list-column.
beijing_sfg <- sf::st_point(c(116.4, 39.9))
class(beijing_sfg)
#> [1] "XY" "POINT" "sfg"
(beijing_sfc <- sf::st_sfc(list(beijing_sfg),crs = "WGS84"))
#> Geometry set for 1 feature
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 116 ymin: 39.9 xmax: 116 ymax: 39.9
#> Geodetic CRS: WGS 84
class(beijing_sfc)
#> [1] "sfc_POINT" "sfc"2.3 SF
city <- "Beijing"
country <- "China"
(beijing_sf <- sf::st_sf(city = city,
country = country,
beijing_sfc))
#> Simple feature collection with 1 feature and 2 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 116 ymin: 39.9 xmax: 116 ymax: 39.9
#> Geodetic CRS: WGS 84
#> city country beijing_sfc
#> 1 Beijing China POINT (116 39.9)2.4 Plot
3 CRS
3.1 Coordinate Reference Systems
The earth is not a perfectly round sphere. Matter of fact, if one measures the circumference of the earth between the North and South poles, the measurement would be 11.5 kilometers shorter than the circumference of the equator. The earth is an ellipsoid rather than a sphere. An accurate Coordinate Reference System must account for these nuances.
Coordinate Reference System (CRS), which defines how the spatial elements of the data relate to the surface of the Earth (or other bodies). CRSs are either geographic or projected. (Lovelace et al., 2019; Moraga, 2023)
Geographic CRS use latitude and longitude; projected CRS use Cartesian coordinates to reference a two-dimensional or flat representation of the earth.
3.2 Geographic CRS
Geographic CRS use latitude and longitude to identify unique positions on the Earth’s surface. By way of reminder, the equator runs east and west, bisecting the Earth into the Northern and Southern hemispheres. It is halfway between the North and South poles. The equator’s latitude is equal to 0 degrees.
Latitude values are the angle values north or south of the equator (0 degrees) and range from –90 degrees at the south pole to 90 degrees at the north pole.
The prime meridian is an arbitrarily-chosen meridian running North and South, dividing the Earth into Eastern and Western hemispheres.Earth’s current international standard prime meridian is the International IERS Reference Meridian. The current standard differs from the previous Greenwich Meridian.
Longitude values range from –180 degrees when running west to 180 degrees when running east
The latitude and longitude values where the Equator and Prime Meridean insersect is (0, 0).
Latitude and longitude coordinates may be expressed in degrees, minutes, and seconds, or in decimal degrees. In decimal form, northern latitudes are positive, and southern latitudes are negative. Also, eastern longitudes are positive, and western longitudes are negative.
| Format | Latitude | Longitude |
|---|
- “Taken together, the geodetic datum (e.g, WGS84), the type of map projection (e.g., Mercator) and the parameters of the projection (e.g., location of the origin) specify a coordinate reference system, or CRS, a complete set of assumptions used to translate the latitude and longitude information into a two-dimensional map.” (hadley)
- The WKT format will usually start with a PROJCRS[…] tag for a projected coordinate system, or a GEOGCRS[…] tag for a geographic coordinate system. The CRS output will also consist of a user defined CS definition which can be an EPSG code, or a string defining the datum and projection type. In the past, a coordinate system could be defined by its EPSG numeric code or the PROJ4 formatted string. Now, an sf object’s coordinate system can also be designated with Well Known Text (WTK/WTK2) format. “When possible, adopt an EPSG code which comes from a well established authority. However, if customizing a coordinate system, it may be easiest to adopt a PROJ4 syntax.”(Gimond, 2023)
3.3 Projections
All projected CRSs are based on a geographic CRS. A projection takes a three-dimensional object, the earth, and reduces it to a two-dimensional object. The conversion always results in deformation.
There are three types of projections: conic, cylindrical, and planar (azimuthal). Conic projections are appropriate for maps of mid-latitude areas. Cylindrical projections are used when displaying the entire world. (The “Mercator” map, by Gerardus Mercator back in 1569, is an example). A planar projection is typically used in mapping polar regions.
“Projections are often named based on a property they preserve: equal-area preserves area, azimuthal preserve direction, equidistant preserve distance, and conformal preserve local shape.”(Lovelace et al., 2019)
“you will have to make choices about what distortions you are prepared to accept when drawing a map. This is the job of the map projection.”
“North American Datum” (NAD83) is a good choice, whereas if your perspective is global the “World Geodetic System” (WGS84) is probably better.
When selecting geographic CRSs, the answer is often WGS84. It is used not only for web mapping, but also because GPS datasets and thousands of raster and vector datasets are provided in this CRS by default. WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326. Lovelace.
# to see list of available projections
sf_proj_info(type = "proj")
Upon import, most objects will have a CRS already set. The challenge is to know what CRS you’re working with and for it to match in all layers of the map. Occassionally, a need to “transform” the CRS will arise. We’ll use the United States GIS data from the U.S Census bureau.
3.4 Query
3.4.1 User Input
The user input element can contain the AUTHORITY:CODE representation (e.g., EPSG:4326), the CRS’s name (e.g., WGS 84), or the proj-string definition (e.g., “+proj=lcc +lat_1=20 +lat_2=60 lon_0=-96 +datum=NAD83 +units=m”).
3.4.2 WKT
“WKT” is an abbreviation for “well known text string”.
cat(sf::st_crs(us)[[2]], sep = "\n")
#> GEOGCRS["NAD83",
#> DATUM["North American Datum 1983",
#> ELLIPSOID["GRS 1980",6378137,298.257222101,
#> LENGTHUNIT["metre",1]]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433]],
#> CS[ellipsoidal,2],
#> AXIS["latitude",north,
#> ORDER[1],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> AXIS["longitude",east,
#> ORDER[2],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> ID["EPSG",4269]]3.5 Transform
Use the CRS that is best for your work. However, keep this in mind prior to conversion:
When selecting geographic CRSs, the answer is often WGS84. It is used not only for web mapping, but also because GPS datasets and thousands of raster and vector datasets are provided in this CRS by default. WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326. This ‘magic number’ can be used to convert objects with unusual projected CRSs into something that is widely understood.(Lovelace et al., 2019)
#|label: basics-crs-transform
us_wgs84 <- sf::st_transform(us, "EPSG:4326")
sf::st_crs(us_wgs84)
#> Coordinate Reference System:
#> User input: EPSG:4326
#> wkt:
#> GEOGCRS["WGS 84",
#> ENSEMBLE["World Geodetic System 1984 ensemble",
#> MEMBER["World Geodetic System 1984 (Transit)"],
#> MEMBER["World Geodetic System 1984 (G730)"],
#> MEMBER["World Geodetic System 1984 (G873)"],
#> MEMBER["World Geodetic System 1984 (G1150)"],
#> MEMBER["World Geodetic System 1984 (G1674)"],
#> MEMBER["World Geodetic System 1984 (G1762)"],
#> MEMBER["World Geodetic System 1984 (G2139)"],
#> ELLIPSOID["WGS 84",6378137,298.257223563,
#> LENGTHUNIT["metre",1]],
#> ENSEMBLEACCURACY[2.0]],
#> PRIMEM["Greenwich",0,
#> ANGLEUNIT["degree",0.0174532925199433]],
#> CS[ellipsoidal,2],
#> AXIS["geodetic latitude (Lat)",north,
#> ORDER[1],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> AXIS["geodetic longitude (Lon)",east,
#> ORDER[2],
#> ANGLEUNIT["degree",0.0174532925199433]],
#> USAGE[
#> SCOPE["Horizontal component of 3D system."],
#> AREA["World."],
#> BBOX[-90,-180,90,180]],
#> ID["EPSG",4326]]A helpful package is the crsuggestpackage which can be installed from CRAN: install.packages("crsuggest"). An example would be New Zealand. When exported from the rnaturalearth package, it is in the common “WGS 84” format. Because of its proximity to the South Pole, this can cause distortions and a better CRS may be available.
rnaturalearth::ne_states(country = "New Zealand") |>
crsuggest::suggest_crs()
#> # A tibble: 10 × 6
#> crs_code crs_name crs_type crs_gcs crs_units crs_proj4
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 3851 NZGD2000 / NZCS2000 project… 4167 m +proj=lc…
#> 2 9191 WGS 84 / NIWA Albers project… 4326 m +proj=ae…
#> 3 3994 WGS 84 / Mercator 41 project… 4326 m +proj=me…
#> 4 3857 WGS 84 / Pseudo-Mercator project… 4326 m +proj=me…
#> 5 6932 WGS 84 / NSIDC EASE-Grid 2.0 So… project… 4326 m +proj=la…
#> # ℹ 5 more rows4 Operations - Attributes
This section deals with typical dataframe manipulations like subsetting, filtering and creating new columns with vector datasets. Let’s continue to use the us dataframe of class sf. I prefer to work in a pipline using the dplyr package so the base package equivalents will be omitted.
4.1 Drop Geometry
The original us dataframe has 10 columns, one of which is a geometry list column. Once dropped, the new dataframe has 9 columns.
4.2 Filter
us |>
dplyr::filter(NAME %in% c("Texas", "California"))
#> Simple feature collection with 2 features and 9 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -124 ymin: 25.8 xmax: -93.5 ymax: 42
#> Geodetic CRS: NAD83
#> STATEFP STATENS GEOIDFQ GEOID STUSPS NAME LSAD ALAND AWATER
#> 1 48 01779801 0400000US48 48 TX Texas 00 6.77e+11 1.90e+10
#> 2 06 01779778 0400000US06 06 CA California 00 4.04e+11 2.03e+10
#> geometry
#> 1 MULTIPOLYGON (((-107 31.9, ...
#> 2 MULTIPOLYGON (((-119 33.5, ...4.3 Select
4.4 Slice
4.5 Summarize
This example divides states into big and small depending if their area exceeds the average. Then, it computes the average by big and small. The geometry column stays with the dataframe until droped in the last line.
library(dplyr)
library(magrittr)
us |>
mutate(avg_lnd = mean(ALAND), .after = ALAND) |>
mutate(typ_lnd = ifelse(ALAND > avg_lnd, "big", "small"), .after = avg_lnd) |>
group_by(typ_lnd) |>
summarize(avg_lnd_typ = mean(ALAND)) |>
sf::st_drop_geometry()
#> # A tibble: 2 × 2
#> typ_lnd avg_lnd_typ
#> * <chr> <dbl>
#> 1 big 335454262870.
#> 2 small 84437535114.5 Operations - Geometric
5.1 Boundary
The st_boundary function will convert the geometry type from multipolygon to linestring.
swus <- us[us$NAME %in% c("Texas", "Oklahoma", "New Mexico"), "geometry"]
bndry_swus <- sf::st_boundary(us[us$NAME %in% c("Texas", "Oklahoma", "New Mexico"), ])
plot(swus$geometry, col = c("red", "steelblue", "yellow"), main = "Southwest US", axes = T)
plot(bndry_swus$geometry, add = T, lwd = 5, col = "grey20")5.2 Buffer
5.3 Centroids
This function gets used frequently, especially for labels. The function returns the center, in the form of a point geometry, for a polygon. For labels, it works well for a polygon that’s symetrical along the xy axis like Nebraska.
For states that are not symetrical, like Florida, it will place the label in a seemingly odd location.
5.4 Convex Hull
5.5 Simplification
Simplification can be appropriate when building a choropleth map of national boundaries, but not when deciding a which side of a boundary line a treasure is located. You should always simplify polygons to the point where they are recognizable and still perform the function intended. I usually simplify upon import so that plotting does not take excessive time. Reduced file sizes are a big benefit when pushing to repositories too.
Below is the United States Cartographic boundary file that we’ve been experimenting with. The file is subset to Alaska and then the bounding box was restried to the famous “Yukon” area with many small islands. Our original file was the smallest available at a scale of 1:20,000,000. Using the st_simplify function, the file is shrunk even further to the point where the details are becoming opaque.
Another great tool is the rmapshaper package and an accompanying website is offered that allows for experimentation.
alsk <- us[us$NAME == "Alaska", ]
alsk_smplfy <- sf::st_simplify(alsk, dTolerance = 5000)
par(mfrow=c(1,2))
par(mar=c(5,4,4,2))
plot(alsk$geometry,
xlim = c(-137.985830, -130),
ylim = c(54.8, 59.3),
axes = T,
main = "Original"
)
par(mar=c(5,4,4,2))
plot(alsk_smplfy$geometry,
xlim = c(-137.985830, -130),
ylim = c(54.8, 59.3),
axes = T,
main = "Simplified")5.6 Affine Transformations
Affine transformations allow for shifting, rotating, and scaling. They can speed the generation of experimental objects to plot. This example is taken directly from the sf vignette: “3. Manipulating Simple Feature Geometries”.
5.7 Clipping
The traditional set operations can be used in clipping geospatial data. They are x or y, the difference between x and y, the intersection of x and y and the union of x and y. The example in Geocomputation for R inspired this plot with the only change being the state of Georgia is used as the polygon.(Lovelace et al., 2019)
The next plot shows how to subset to a point that is shared by the two Georgias.
Code
ga <-
us |>
dplyr::filter(NAME == "Georgia") |>
dplyr::select(geometry)
# pull geometry only
ga_orgnl <- sf::st_geometry(ga)
# affine transformation
ga_affn <- ga_orgnl * 1.03
# reset CRS
ga_affn <- sf::st_set_crs(ga_affn, value = "EPSG:4269")
# convert back to sf and combine for bbox
ga_orgnl <- sf::st_sf(name = "original", geometry = ga_orgnl)
ga_affn <- sf::st_sf(name = "transformed", geometry = ga_affn)
# combine
ga <- dplyr::bind_rows(ga_affn, ga_orgnl)
#subset to point within intersection
x <- ga[ga$name == "original", ]
y <- ga[ga$name == "transformed", ]
bb = st_bbox(st_union(x, y))
box = st_as_sfc(bb)
x_and_y = st_intersection(x, y)
set.seed(2024)
p = st_sample(x = box, size = 10)
p_xy1 = p[x_and_y]
plot.new()
par(mar = c(0, 0, 0, 0))
plot(box, border = "grey", lty = 2)
plot(x$geometry, add = TRUE, border = "grey", lwd = 2)
plot(y$geometry, add = TRUE, border = "grey", lwd = 2)
plot(p, add = TRUE, cex = 2, pch = 20)
plot(p_xy1, cex = 5, col = "red", add = TRUE)
text(x = c(-86, -82.8), y = 32.8, labels = c("x", "y"), cex = 2)5.8 Union by Grouping
The first example uses simple points, while this examples uses a map of the US and simulated data.
us_main <-
us |>
dplyr::filter(!(NAME %in% c("Alaska", "District of Columbia",
"Hawaii", "Puerto Rico"))) |>
dplyr::select(NAME, geometry) |>
dplyr::mutate(var_cont = rnorm(48))
us_df <-
us_main %>%
st_centroid() %>%
st_coordinates(ctr) %>%
as_tibble() %>%
rename(lon = X, lat = Y) %>%
{bind_cols(us_main, .)} %>%
mutate(avg_lon = mean(lon)) %>%
mutate(coast = ifelse(lon < avg_lon, "West", "East")) |>
mutate(coast = factor(coast)) |>
group_by(coast) |>
summarise(var_cont = sum(var_cont))
par(mfrow = c(1, 2))
par(mar=c(5,4,4,2))
plot(us_df["var_cont"], main = "Continous Variable")














