Skip to contents

For two lists of Swiss municipality IDs at any two points in time, this function creates a data frame with two columns where each row represents a match between municipality IDs. This can be used as an intermediate table for merging two data sets with municipality identifiers taken at different, possibly unknown, points in time.

Usage

swc_get_mapping(ids_from, ids_to)

Arguments

ids_from

A list of "source" municipality IDs, preferably a factor

ids_to

A list of "target" municipality IDs, preferably a factor

Value

A data frame with columns prefixed by from. and to that represents the computed match. The municipality IDs are stored in the columns from.mId and to.mId. The columns

from.MergeType and to.MergeType contain valid if the municipality is contained in both the input and the mapping table,

missing if the municipality is missing from the input, and

extra if the municipality is in the input but not in the mapping table; most columns are NA for such rows. In addition, the column

MergeType offers a summary of the "from" and "to" status: Rows with values other than "valid" or "missing" should be examined.

Details

It is advisable to use factors as list of municipality IDs. By that, comparisons and merges for municipality IDs are automatically checked for compatibility.

Note that the "from" list must be from an earlier time than the "to" list. Trying to compute the mapping the other way round results in an error. This is intentional: As municipalities are usually merged, it makes sense to use the most recent data set as target for the mapping. This can also be a file with suitable geometries to allow for visualization.

For two lists of municipalities, we construct a mapping from the first list to the second. First, the most probable mutation number in the "municipality mutations" data set is computed.

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
data(SwissPop)
data(SwissBirths)

# Show mismatch of municipality IDs:
ids_from <- with(SwissPop, MunicipalityID)
ids_to <- with(SwissBirths, MunicipalityID)
setdiff(ids_from, ids_to)
#>   [1] "327"  "328"  "330"  "343"  "384"  "531"  "601"  "618"  "621"  "752" 
#>  [11] "753"  "851"  "854"  "862"  "864"  "882"  "887"  "926"  "933"  "974" 
#>  [21] "984"  "986"  "994"  "1027" "1028" "1029" "1034" "1035" "1036" "1038"
#>  [31] "1042" "1060" "1087" "1090" "1101" "1105" "1106" "1124" "1133" "1134"
#>  [41] "1141" "1144" "1148" "1149" "1601" "1602" "1603" "1604" "1605" "1606"
#>  [51] "1607" "1608" "1609" "1610" "1611" "1612" "1613" "1614" "1615" "1616"
#>  [61] "1617" "1618" "1619" "1620" "1621" "1622" "1623" "1624" "1625" "1626"
#>  [71] "1627" "1628" "1629" "2001" "2002" "2003" "2007" "2012" "2017" "2018"
#>  [81] "2020" "2023" "2026" "2031" "2032" "2036" "2037" "2042" "2046" "2062"
#>  [91] "2064" "2069" "2070" "2071" "2074" "2075" "2077" "2081" "2082" "2083"
#> [101] "2085" "2088" "2091" "2092" "2093" "2094" "2095" "2100" "2101" "2103"
#> [111] "2105" "2107" "2108" "2109" "2110" "2112" "2132" "2133" "2136" "2139"
#> [121] "2141" "2142" "2144" "2146" "2150" "2151" "2154" "2156" "2158" "2159"
#> [131] "2161" "2176" "2180" "2181" "2182" "2188" "2190" "2191" "2202" "2203"
#> [141] "2204" "2210" "2214" "2215" "2219" "2227" "2229" "2232" "2241" "2247"
#> [151] "2253" "2263" "2269" "2282" "2322" "2324" "2326" "2327" "2329" "2330"
#> [161] "2331" "2332" "2443" "2447" "2460" "2494" "2496" "2531" "2552" "2905"
#> [171] "2911" "2912" "2913" "2916" "2918" "2934" "3335" "3336" "3351" "3354"
#> [181] "3355" "3357" "3371" "3376" "3406" "3523" "3573" "3591" "3600" "3601"
#> [191] "3602" "3632" "3639" "3641" "3642" "3664" "3665" "3666" "3667" "3692"
#> [201] "3702" "3704" "3706" "3709" "3710" "3771" "3773" "3774" "3775" "3776"
#> [211] "3841" "3842" "3843" "3844" "3845" "3846" "3892" "3912" "3913" "3914"
#> [221] "3915" "3928" "3930" "3942" "3943" "3944" "3971" "3973" "3984" "4011"
#> [231] "4036" "4043" "4070" "4101" "4116" "4118" "4162" "4168" "4171" "4174"
#> [241] "4178" "4180" "4225" "4278" "5011" "5016" "5031" "5032" "5033" "5034"
#> [251] "5035" "5036" "5037" "5038" "5039" "5040" "5041" "5042" "5043" "5044"
#> [261] "5045" "5046" "5047" "5062" "5065" "5066" "5067" "5068" "5069" "5070"
#> [271] "5074" "5075" "5080" "5092" "5093" "5094" "5098" "5101" "5104" "5106"
#> [281] "5107" "5110" "5111" "5114" "5116" "5122" "5123" "5127" "5128" "5134"
#> [291] "5142" "5145" "5147" "5150" "5153" "5156" "5158" "5159" "5164" "5165"
#> [301] "5168" "5169" "5175" "5177" "5179" "5182" "5183" "5184" "5185" "5188"
#> [311] "5190" "5191" "5201" "5204" "5209" "5211" "5215" "5217" "5218" "5220"
#> [321] "5223" "5228" "5232" "5234" "5235" "5241" "5244" "5245" "5246" "5247"
#> [331] "5248" "5252" "5256" "5258" "5259" "5262" "5264" "5265" "5267" "5301"
#> [341] "5302" "5303" "5305" "5306" "5308" "5311" "5312" "5313" "5314" "5316"
#> [351] "5318" "5319" "5320" "5321" "5322" "5433" "5452" "5453" "5454" "5455"
#> [361] "5457" "5459" "5460" "5461" "5462" "5463" "5517" "5519" "5524" "5525"
#> [371] "5526" "5528" "5532" "5536" "5538" "5558" "5567" "5569" "5570" "5602"
#> [381] "5603" "5605" "5608" "5612" "5630" "5641" "5647" "5677" "5681" "5687"
#> [391] "5691" "5781" "5783" "5784" "5786" "5787" "5793" "5794" "5795" "5796"
#> [401] "5797" "5800" "5801" "5802" "5811" "5814" "5815" "5818" "5820" "5823"
#> [411] "5824" "5825" "5826" "5829" "5901" "5916" "5917" "5918" "5920" "5927"
#> [421] "5936" "6051" "6059" "6060" "6062" "6063" "6065" "6066" "6067" "6070"
#> [431] "6071" "6085" "6086" "6088" "6103" "6107" "6174" "6175" "6176" "6179"
#> [441] "6180" "6196" "6200" "6231" "6233" "6237" "6245" "6247" "6251" "6457"
#> [451] "6460" "6501" "6502" "6503" "6505" "6506" "6507" "6508" "6509" "6510"
#> [461] "6746" "6747" "6749" "6752" "6755" "6756" "6772" "6777" "6779" "6780"
#> [471] "6786" "6788" "6791" "6794" "6795" "6796" "6797" "6798" "6799" "6801"
#> [481] "6802" "6804" "6805"
setdiff(ids_to, ids_from)
#>  [1] "632"  "756"  "855"  "888"  "948"  "1151" "1630" "1631" "1632" "2050"
#> [11] "2051" "2052" "2114" "2115" "2116" "2162" "2233" "2234" "2235" "2337"
#> [21] "2338" "2503" "3340" "3358" "3359" "3378" "3617" "3671" "3713" "3792"
#> [31] "3847" "3931" "3932" "3955" "4049" "4184" "5048" "5049" "5050" "5137"
#> [41] "5138" "5236" "5237" "5238" "5269" "5323" "5324" "5397" "5398" "5464"
#> [51] "5540" "5541" "5571" "5613" "5804" "5805" "5831" "6074" "6075" "6076"
#> [61] "6090" "6118" "6181" "6203" "6204" "6252" "6461" "6512" "6807" "6808"
#> [71] "6809" "6810"

# Compute mapping and count non-matching municipality IDs:
mapping <- swc_get_mapping(ids_from = ids_from, ids_to = ids_to)
#> 2023-11-05 05:21:05.776557 INFO::meltMutations
#> 2023-11-05 05:21:05.938916 INFO::meltMutations
#> 2023-11-05 05:21:05.953015 INFO::computeMunList
#> 2023-11-05 05:21:05.967004 INFO::meltMutations
#> 2023-11-05 05:21:05.99671 INFO::meltMutations
#> 2023-11-05 05:21:06.012439 INFO::computeMunList
with(mapping, sum(mapping$mIdAsNumber.from != mapping$mIdAsNumber.to))
#> [1] 487

# Communes that are "missing" are mostly lakes and other special communes:
subset(mapping, MatchType == "missing")[, c("mIdAsNumber.from", "mShortName.from")]
#>      mIdAsNumber.from          mShortName.from
#> 253              5238 C'za Corticiasca/Valcol.
#> 2222             4510                   Sulgen

# These should be looked at in some detail, and fixed manually:
subset(mapping, !(MatchType %in% c("valid", "missing")))
#>     mHistId.from cAbbreviation.from mId.from
#> 589        11250                 TI     <NA>
#> 590        11251                 TI     <NA>
#>                              mLongName.from          mShortName.from
#> 589 C'za Bidogno/Sala Capriasca/Corticiasca C'za Bid./Sala C./Corti.
#> 590     C'za Sala Capriasca/Vaglio/Lugaggia C'za Sala C./Vag./Lugag.
#>     MatchType.from mHistId.to cAbbreviation.to mId.to mLongName.to
#> 589        missing      14938               TI   5226    Capriasca
#> 590        missing      14938               TI   5226    Capriasca
#>     mShortName.to MatchType.to mIdAsNumber.from mIdAsNumber.to    MatchType
#> 589     Capriasca        valid             5236           5226 missing.from
#> 590     Capriasca        valid             5237           5226 missing.from

# Test for injectivity. The result shows that the mapping is almost injective,
# only one "from" commune is mapped to more than one other "to" commune.
# This situation requires further examination.
mapping.dupes <- subset(mapping, duplicated(mIdAsNumber.from))
(noninjective.mapping <- subset(
  mapping, mIdAsNumber.from %in% mapping.dupes$mIdAsNumber.from
))
#>     mHistId.from cAbbreviation.from mId.from mLongName.from mShortName.from
#> 610        11403                 TI     5246       Caneggio        Caneggio
#> 611        11403                 TI     5246       Caneggio        Caneggio
#>     MatchType.from mHistId.to cAbbreviation.to mId.to      mLongName.to
#> 610          valid      14499               TI   5249 Castel San Pietro
#> 611          valid      14974               TI   5269           Breggia
#>         mShortName.to MatchType.to mIdAsNumber.from mIdAsNumber.to MatchType
#> 610 Castel San Pietro        valid             5246           5249     valid
#> 611           Breggia        valid             5246           5269     valid

# Simple treatment (just for this example): Remove duplicates, and use only
# valid matches:
cleaned.mapping <- subset(
  mapping,
  !duplicated(mIdAsNumber.from) & MatchType == "valid"
)

# Now merge the two datasets based on the mapping table:
SwissPop.1970 <- subset(SwissPop, Year == "1970")
SwissPopMapping.1970 <- merge(SwissPop.1970,
  cleaned.mapping[, c("mId.from", "mId.to")],
  by.x = "MunicipalityID", by.y = "mId.from"
)

# Datasets from the "from" table must be suitably aggregated.  For the given
# case of population totals we use the sum.
SwissPopMapping.1970.agg <- group_by(
  SwissPopMapping.1970,
  mId.to,
  HouseholdSize
) %>%
  summarize(Households = sum(Households))
#> `summarise()` has grouped output by 'mId.to'. You can override using the
#> `.groups` argument.
with(SwissPopMapping.1970.agg, stopifnot(
  length(unique(mId.to)) * length(levels(HouseholdSize)) ==
    length(mId.to)
))

# The aggregated "from" dataset now can be merged with the "to" dataset:
SwissBirths.1970 <- subset(SwissBirths, Year == "1970")
SwissPopBirths.1970 <- merge(SwissPopMapping.1970.agg, SwissBirths.1970,
  by.x = "mId.to", by.y = "MunicipalityID"
)

# Some more communes are still missing from the 1970 statistics, although
# the matches are valid:
subset(mapping, mIdAsNumber.to %in% setdiff(
  SwissPopMapping.1970.agg$mId.to, SwissBirths.1970$MunicipalityID
))[
  ,
  c("mId.from", "mShortName.from", "MatchType")
]
#>      mId.from          mShortName.from MatchType
#> 16       3533                Marmorera     valid
#> 30       3503                   Mutten     valid
#> 36       3593                    Duvin     valid
#> 42       3583                  Schnaus     valid
#> 48       3577                    Luven     valid
#> 122      3803                  Braggio     valid
#> 135      3806                    Cauco     valid
#> 146      3743                    Lavin     valid
#> 147      3742                   Guarda     valid
#> 176      3811                    Selma     valid
#> 197      3631                   Almens     valid
#> 198      3616        Waltensburg/Vuorz     valid
#> 217      3711                Rongellen     valid
#> 220      3708                   Mathon     valid
#> 221      3707                Lohn (GR)     valid
#> 222      3706             Innerferrera     valid
#> 226      3702            Ausserferrera     valid
#> 234      3670                   Urmein     valid
#> 235      3669               Tschappina     valid
#> 346      5129                  Sonogno     valid
#> 390      5688                    Syens     valid
#> 439      5732                     Vich     valid
#> 441      5233             Vico Morcote     valid
#> 452      5690         Villars-le-Comte     valid
#> 455      5651     Villars-Sainte-Croix     valid
#> 458      5935           Villars-Epeney     valid
#> 479      3923                  Castiel     valid
#> 549      5650          Vaux-sur-Morges     valid
#> 576      5932                   Ursins     valid
#> 591      5173                  Certara     valid
#> 609      5307      Campo (Vallemaggia)     valid
#> 627      2008           Châtillon (FR)     valid
#> 641      5475      Chavannes-le-Veyron     valid
#> 650      5906                  Chanéaz     valid
#> 720      5668                   Cremin     valid
#> 728      5479                 Cuarnens     valid
#> 816      5102                  Corippo     valid
#> 933      5621                   Aclens     valid
#> 976      6172                   Bister     valid
#> 980      5155                    Bogno     valid
#> 983      4302                  Böbikon     valid
#> 985      5747                 Bofflens     valid
#> 992      5622                Bremblens     valid
#> 999      5096     Brione sopra Minusio     valid
#> 1049     5513           Bioley-Orjulaz     valid
#> 1224     5109                   Gresso     valid
#> 1247     2079               Grangettes     valid
#> 1315     5671           Dompierre (VD)     valid
#> 1427     4106                  Mönthal     valid
#> 1500     5488                   Mauraz     valid
#> 1502     4314                 Mellikon     valid
#> 1503      389                Meienried     valid
#> 1512     5012                   Moleno     valid
#> 1540     6216                 Mex (VS)     valid
#> 1730     6109                    Inden     valid
#> 1800     6178               Martisberg     valid
#> 1803     5676             Martherenges     valid
#> 1872     5315                 Linescio     valid
#> 1911     5564                 Novalles     valid
#> 1964     5798                   Ropraz     valid
#> 2016     5081                   Sobrio     valid
#> 2031     5762                   Sergey     valid
#> 2042     5686                  Sarzens     valid
#> 2167     2216           Pierrafortscha     valid
#> 2372      532                Bangerten     valid
#> 2375      535 Deisswil b. Münchenbuch.     valid
#> 2455      629              Oberhünigen     valid
#> 2464     4084                Islisberg     valid
#> 2467      362                  Ittigen     valid
#> 2468      363            Ostermundigen     valid
#> 2474     6716               Mettembert     valid
#> 2476     3613                   Pigniu     valid
#> 2521     2522                 Hersiwil     valid
#> 2540     4826                  Mammern     valid
#> 2541      631                Trimstein     valid
#> 2542      630             Allmendingen     valid
#> 2550     4643              Bottighofen     valid
#> 2576     2851           Kilchberg (BL)     valid
#> 2805      972                   Berken     valid
#> 2835     4621          Warth-Weiningen     valid
#> 2839     4716               Bettwiesen     valid
#> 2860     6728                 Vellerat     valid
#> 2865     4724                Eschlikon     valid
#> 2881     4786               Wilen (TG)     valid
#> 2887     4546             Schlatt (TG)     valid
#> 2891     4723                  Braunau     valid

# The "from" list must be from an earlier time than the "to" list.
try(swc_get_mapping(ids_from = ids_to, ids_to = ids_from))
#> 2023-11-05 05:21:10.145095 INFO::meltMutations
#> 2023-11-05 05:21:10.187964 INFO::meltMutations
#> 2023-11-05 05:21:10.199548 INFO::computeMunList
#> 2023-11-05 05:21:10.211782 INFO::meltMutations
#> 2023-11-05 05:21:10.233549 INFO::meltMutations
#> 2023-11-05 05:21:10.241401 INFO::computeMunList
#> Error in getMunicipalityMappingWorker(mutations, hist.list.from, mid.from,  : 
#>   mid.from <= mid.to is not TRUE