Finding Patterns in Convenience Store Locations with Geospatial Association Rule Mining | by Elliot Humphrey | Apr, 2023

Photo by Matt Liu on Unsplash

When walking around Tokyo you will often pass numerous convenience stores, locally known as “konbinis”, which makes sense since there are over 56,000 convenience stores in Japan. Different chains are often located very close to one another; it is not uncommon to see stores around the corner from each other or on opposite sides of the street. Given Tokyo’s population density, it is understandable that competing businesses are forced closer together. But could there be any relationships between which chains of convenience stores are found near each other?

The goal is to collect location data for several convenience store chains in a Tokyo neighbourhood, and to understand whether there are any relationships between which chains are co-located. This will require:

  • Ability to query the location of different convenience stores in Tokyo, in order to retrieve each store’s name and location
  • Finding which convenience stores are co-located with each other within a pre-defined radius
  • Using the data on co-located stores to derive association rules
  • Plotting and visualising results for inspection

Let’s begin!

For our use case we want to find convenience stores in Tokyo, so first we’ll need to do a little homework on which store chains are common. A quick Google search tells me that the main stores are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.

Now that we know what we are searching for, let’s turn to OSMnx, a great Python package for searching data in OpenStreetMap (OSM). According to OSM’s schema, we should be able to find the store name in either the ‘brand:en’ or ‘brand’ field.

We can start by importing some useful libraries for getting our data, and defining a function to return a table of locations for a given convenience store chain within a specified area:

import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    """
    Returns a dataframe of coordinates of an entity from OSM.

    place (str): a location (e.g., 'Tokyo, Japan')
    tags (dict): key of the entity attribute in OSM (e.g., 'name') and its value (e.g., amenity name)
    results (DataFrame): table of latitude and longitude with entity value
    """
    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf (columns are minx, miny, maxx, maxy)
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0,3], bounding.iloc[0,1], bounding.iloc[0,2], bounding.iloc[0,0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    # (geometries_from_bbox is the osmnx 1.x name; newer releases use features_from_bbox)
    point = osmnx.geometries_from_bbox(north, south, east, west, tags)
    point = point[point.geometry.within(location)].copy()
    # Making sure we are dealing with points: polygons are replaced by their centroid
    point['geometry'] = point['geometry'].apply(lambda x : x.centroid if isinstance(x, Polygon) else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']

    results = pd.DataFrame({'name' : list(point['name']),
                            'longitude' : list(point['geometry'].x),
                            'latitude' : list(point['geometry'].y)})

    # Label every row with the value that was searched for
    results['name'] = list(tags.values())[0]
    return results

convenience_stores = point_finder(place = 'Shinjuku, Tokyo',
                                  tags={"brand:en" : " "})

We can pass each convenience store name and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each convenience store looks like:

Frequency count of convenience stores. Image by author.
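A minimal sketch of that combining and counting step is given here. The `combine_chains` helper and its `finder` argument are hypothetical names introduced for illustration; `finder` stands in for a function like `point_finder` above, and using the `brand:en` key for every chain is an assumption (some stores may only carry `brand`). So that the sketch runs without querying OSM, the usage lines pass a stand-in finder that returns made-up coordinates.

```python
import pandas as pd

# Chain names from the text; the tag key is an assumption ('brand' may be needed instead).
CHAINS = ["FamilyMart", "Lawson", "7-Eleven", "Ministop", "Daily Yamazaki", "NewDays"]

def combine_chains(finder, place, chains=CHAINS, key="brand:en"):
    """Call finder(place, tags) once per chain and stack the results into one table."""
    frames = [finder(place, {key: chain}) for chain in chains]
    return pd.concat(frames, ignore_index=True)

def fake_finder(place, tags):
    """Stand-in for an OSM query: returns two made-up points per chain."""
    name = list(tags.values())[0]
    return pd.DataFrame({"name": [name, name],
                         "longitude": [139.70, 139.71],
                         "latitude": [35.69, 35.70]})

stores = combine_chains(fake_finder, "Shinjuku, Tokyo")
# Frequency of each chain, the quantity shown in the bar chart
store_counts = stores["name"].value_counts()
```

With real data, `fake_finder` would be swapped for the OSM-backed function, and `store_counts` could be plotted directly with `store_counts.plot.bar()`.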

Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward with an interactive mapping library, which provides a nice interface for creating visualisations that can be saved as HTML objects or visualised directly in Jupyter notebooks:

Location map of Shinjuku convenience stores, colour coded by store name. Image by author.
Location map of Shinjuku convenience stores, colour coded by density in a two minute walking radius (168m). Image by author.

Now that we have our data, the next step is to find the nearest neighbours of each convenience store. To do this, we will use scikit-learn’s ‘BallTree’ class to find the names of the convenience stores within a two minute walking radius. We are not interested in how many stores count as nearest neighbours, so we will only look at which convenience store chains are within the defined radius.
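As a quick check on where the radius comes from, the sketch below converts a walking time into metres and then into the radians that a haversine BallTree query expects. The walking speed of 1.4 m/s is an assumption (it is the value that reproduces the 168 m figure), and the helper names are made up for this illustration.

```python
EARTH_RADIUS_M = 6_371_000   # mean Earth radius in metres
WALKING_SPEED_M_S = 1.4      # assumed walking speed in metres per second

def walking_radius_m(minutes, speed_m_s=WALKING_SPEED_M_S):
    """Distance covered on foot in the given number of minutes."""
    return minutes * 60 * speed_m_s

def metres_to_radians(distance_m, earth_radius_m=EARTH_RADIUS_M):
    """Convert a surface distance to an angle, the unit used by the haversine metric."""
    return distance_m / earth_radius_m

radius_m = walking_radius_m(2)              # two minute walk
radius_rad = metres_to_radians(radius_m)    # value to pass as the search radius
```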

from collections import Counter
from sklearn.neighbors import BallTree

# Convert location to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a balltree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours in a 2 minute walking radius
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)

# Replace the neighbour indices with store names
df = pd.DataFrame(is_within)
df.columns = ['indices']
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# Reset to a clean 0..n-1 index so it lines up with the BallTree's positional indices
convenience_stores = convenience_stores.reset_index(drop=True)
# create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()

# replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to original df
convenience_stores['neighbours'] = df['indices']

# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each stores frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]

If we want to improve the accuracy of our work, we could replace the haversine distance measure with something more accurate (e.g., walking times calculated using networkx), but we’ll keep things simple.
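For reference, the haversine metric treats the Earth as a sphere and measures great-circle distance, so it ignores streets and buildings entirely. A minimal standard-library sketch of the formula follows; the function name and the two test coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2, radius_m=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Two made-up points 0.001 degrees of latitude apart
d = haversine_m(35.690, 139.700, 35.691, 139.700)
within_radius = d <= 168  # would these two points count as neighbours?
```

A walking-time measure would generally give longer distances than this, since a route along streets can never be shorter than the great-circle path.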

This will give us a DataFrame where each row corresponds to a location, and a binary count of which convenience store chains are nearby:

Sample DataFrame of convenience store nearest neighbours for each location. Image by author.
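To make the shape of that table concrete, here is a small sketch that applies the same counting steps to a handful of made-up neighbour lists (not the Shinjuku data). The variable names are chosen for this illustration only.

```python
from collections import Counter
import pandas as pd

# Made-up, already de-duplicated neighbour lists, one per store location
neighbours = [
    ["FamilyMart", "Lawson"],
    ["7-Eleven"],
    ["no-neighbours"],
    ["7-Eleven", "FamilyMart"],
]

# One dict of chain -> count per location; counts are 0/1 because lists hold unique names
counts = [dict(Counter(row)) for row in neighbours]
columns = sorted({name for row in neighbours for name in row})

# Missing chains become 0, columns are put in a fixed alphabetical order
table = pd.DataFrame(counts).fillna(0)[columns].astype(int)
```

Each row of `table` is one location and each column one chain, which is the "transaction" layout that association rule mining works on.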

We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we examine only the rules related to frequent occurrences in our dataset (i.e., co-located convenience store chains). We use the metric ‘lift’ when deriving rules; lift is the proportion of locations that contain both the antecedent and the consequent, divided by the support we would expect if the two were independent.
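Written out, the standard definitions behind these metrics are as follows, where T is the set of locations, and A (antecedent) and C (consequent) are sets of chains. The symbols are notation introduced here for illustration.

```latex
\mathrm{supp}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|}
\qquad
\mathrm{conf}(A \Rightarrow C) = \frac{\mathrm{supp}(A \cup C)}{\mathrm{supp}(A)}
\qquad
\mathrm{lift}(A \Rightarrow C) = \frac{\mathrm{supp}(A \cup C)}{\mathrm{supp}(A)\,\mathrm{supp}(C)}
```

A lift of exactly 1 means A and C co-occur as often as independence would predict; values above 1 mean they co-occur more often than that.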

from mlxtend.frequent_patterns import association_rules, apriori

# Calculate apriori
frequent_set = apriori(output_df, min_support = 0.05, use_colnames = True)
# Create rules
rules = association_rules(frequent_set, metric = 'lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)

This gives us the following results table:

Association rules for convenience store data. Image by author.
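For intuition about the frequent-itemset step that runs before any rules are derived, here is a simplified pure-Python sketch of the Apriori idea: count itemsets level by level, and only extend itemsets whose subsets are all frequent. It is an illustration on made-up data, not mlxtend's implementation, and the function name is hypothetical.

```python
from itertools import combinations

def apriori_sketch(rows, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(rows)
    items = sorted({item for row in rows for item in row})
    current = [frozenset([item]) for item in items]  # level 1 candidates
    frequent = {}
    k = 1
    while current:
        # Keep the candidates at this level that are frequent enough
        level = {}
        for cand in current:
            supp = sum(cand <= row for row in rows) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemset candidates
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have any infrequent k-item subset
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Made-up neighbour sets, one per location
rows = [
    {"7-Eleven", "FamilyMart"},
    {"7-Eleven", "FamilyMart", "Lawson"},
    {"7-Eleven"},
    {"FamilyMart"},
    {"Lawson"},
]
frequent = apriori_sketch(rows, min_support=0.4)
```

On this toy data the only frequent pair is 7-Eleven with FamilyMart, so no three-chain itemset is ever counted; that pruning is what keeps the search tractable on larger tables.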

We will now interpret these association rules to draw some high-level takeaways. To interpret this table, it’s best to first read more about association rules.

Okay, back to the table.

Support is telling us how often different convenience store chains are actually found together. Therefore we can say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. On the other hand, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.
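To see how support, confidence and lift relate, the sketch below computes all three by hand for one rule. The data is made up and deliberately tiny, so the numbers do not correspond to the Shinjuku results; the function names are chosen for this illustration.

```python
# Made-up neighbour sets, one per store location (not the Shinjuku data)
locations = [
    {"7-Eleven", "FamilyMart"},
    {"7-Eleven", "FamilyMart", "Lawson"},
    {"7-Eleven"},
    {"FamilyMart"},
    {"Lawson"},
]

def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)

def confidence(antecedent, consequent, rows):
    """How often the consequent appears among rows containing the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, rows) / support(antecedent, rows)

def lift(antecedent, consequent, rows):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, rows) / support(consequent, rows)

supp_pair = support({"7-Eleven", "FamilyMart"}, locations)         # 2 of 5 rows
conf_rule = confidence({"7-Eleven"}, {"FamilyMart"}, locations)    # 2 of the 3 7-Eleven rows
lift_rule = lift({"7-Eleven"}, {"FamilyMart"}, locations)
```

Here FamilyMart appears in 3 of 5 rows overall but in 2 of the 3 rows with a 7-Eleven, so the lift comes out slightly above 1.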

Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.

Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and a FamilyMart are already co-located, the high lift value of 1.42 suggests a strong association with Lawson.

If we had just stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.

An example of why geospatial association rules can be insightful for businesses is in determining new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur.

The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, which we demonstrate with association rules, it would make sense for both of these chains to pay more attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.

In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done by extracting data from OpenStreetMap, finding nearest neighbour convenience store chains, visualising data on maps, and creating association rules using the Apriori algorithm.

Thanks for reading!
