## Understanding spatial trends in the location of Tokyo convenience stores

When walking around Tokyo you will often pass numerous convenience stores, locally known as “konbini”, which makes sense given that there are **over 56,000** convenience stores in Japan. Often, different convenience store chains are located very close to one another; it is not uncommon to see stores around the corner from each other or on opposite sides of the street. Given Tokyo’s population density, it is understandable that competing businesses are forced close together. However, **could there be any relationship between which chains of convenience stores are found near each other?**

The goal will be to collect location data for numerous convenience store chains in a Tokyo neighbourhood, and use it to understand whether certain chains tend to be co-located. To do this we will need:

- A way to query the location of different convenience stores in Tokyo, retrieving each store’s name and coordinates
- Finding which convenience stores are co-located with each other within a pre-defined radius
- Using the data on co-located stores to derive association rules
- Plotting and visualising results for inspection

Let’s begin!

For our use case we want to find convenience stores in Tokyo, so first we’ll need to do a little homework on what the common store chains are. A quick Google search tells me that the main chains are **FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays**.

Now we know what we are searching for, let’s turn to OSMnx, a great Python package for querying data from OpenStreetMap (OSM). According to OSM’s schema, we should be able to find the store name in either the *‘brand:en’* or *‘brand’* field.

We can start by importing some useful libraries for getting our data, and defining a function to return a table of locations for a given convenience store chain within a specified area:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon
import shapely
import osmnx
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (i.e., 'Tokyo, Japan')
        tags (dict): key of entity attribute in OSM (i.e., 'Name')
                     and value (i.e., amenity name)

    Returns:
        results (DataFrame): table of latitude and longitude with entity value
    '''
    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf
    bounding = gdf.bounds
    north, south, east, west = (bounding.iloc[0, 3], bounding.iloc[0, 1],
                                bounding.iloc[0, 2], bounding.iloc[0, 0])
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    point = osmnx.geometries_from_bbox(north, south, east, west, tags=tags)
    point.set_crs(crs=4326)
    point = point[point.geometry.within(location)]
    # Making sure we are dealing with points
    point['geometry'] = point['geometry'].apply(
        lambda x: x.centroid if type(x) == Polygon else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']
    results = pd.DataFrame({'name': list(point['name']),
                            'longitude': list(point['geometry'].x),
                            'latitude': list(point['geometry'].y)})
    results['name'] = list(tags.values())[0]
    return results

convenience_stores = point_finder(place='Shinjuku, Tokyo',
                                  tags={"brand:en": " "})
```

We can pass each convenience store name and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the **Shinjuku** neighbourhood in Tokyo, and see what the abundance of each convenience store looks like:
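As a sketch of that abundance check (the store names and coordinates below are illustrative placeholders, not the real Shinjuku data), a simple frequency count over the combined table does the job:

```python
import pandas as pd

# Hypothetical combined table of store locations (illustrative data only)
convenience_stores = pd.DataFrame({
    'name': ['FamilyMart', '7-Eleven', 'FamilyMart', 'Lawson',
             '7-Eleven', 'FamilyMart'],
    'longitude': [139.70, 139.71, 139.69, 139.70, 139.72, 139.70],
    'latitude': [35.69, 35.69, 35.70, 35.69, 35.69, 35.70],
})

# Count how many locations each chain has
store_counts = convenience_stores['name'].value_counts()
print(store_counts)
```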

Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward when using **Kepler.gl**, which includes a nice interface for creating visualisations that can be saved as HTML objects or rendered directly in Jupyter notebooks:

Now that we have our data, the next step is to find the nearest neighbours of each convenience store. To do this, we will use scikit-learn’s `BallTree` class to find the names of the nearest convenience stores within a two minute walking radius. We are not interested in how many stores are considered nearest neighbours, only in which convenience store chains fall within the defined radius.

```python
from collections import Counter
from sklearn.neighbors import BallTree

# Convert locations to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a ball tree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours in a 2 minute walking radius (~168 m, expressed
# as radians on a sphere of Earth's radius, 6,371,000 m)
is_within, distances = tree.query_radius(locations_radians,
                                         r=168/6371000,
                                         count_only=False,
                                         return_distance=True)

# Replace the neighbour indices with store names, dropping each store's
# own index from its neighbour list
df = pd.DataFrame({'indices': list(is_within)})
df['indices'] = [[val for val in row if val != idx]
                 for idx, row in enumerate(df['indices'])]

# Create a temporary index column and set it as the index
convenience_stores = convenience_stores.reset_index()
convenience_stores = convenience_stores.set_index('index')

# Create an index-to-name mapping
index_name_mapping = convenience_stores['name'].to_dict()

# Replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(
    lambda lst: list(set(map(index_name_mapping.get, set(lst)))))

# Append back to the original dataframe
convenience_stores['neighbours'] = df['indices']

# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours']
                                    for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set(item for sublist in convenience_stores['neighbours']
                      for item in sublist)

# Count each store's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
```

If we wanted to improve the accuracy of our work, we could replace the haversine distance measure with something more realistic (i.e., walking times calculated using networkx), but we’ll keep things simple.
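As a quick sanity check on the radius used above (this is just the standard haversine formula written out by hand, not part of the original pipeline, and the coordinates are illustrative), two points roughly 100 m apart should fall inside `r = 168/6371000` radians:

```python
import math

def haversine_radians(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in radians."""
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))

EARTH_RADIUS_M = 6371000

# Two points in Shinjuku roughly 100 m apart (illustrative coordinates)
d_radians = haversine_radians(35.6900, 139.7000, 35.6909, 139.7000)
d_metres = d_radians * EARTH_RADIUS_M
print(round(d_metres))  # ~100 m, inside the 2 minute walking radius
```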

This will give us a DataFrame where each row corresponds to a location, and a binary count of which convenience store chains are nearby:
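On toy data (hypothetical neighbour lists, not the real Shinjuku output), the counting step above produces a table like this:

```python
from collections import Counter
import pandas as pd

# Hypothetical neighbour lists for four store locations
neighbours = [['FamilyMart', '7-Eleven'],
              ['Lawson'],
              ['no-neighbours'],
              ['7-Eleven']]

# Same counting logic as in the pipeline: one row per location,
# one column per chain observed as a neighbour anywhere
unique_elements = set(item for row in neighbours for item in row)
counts = [dict(Counter(row)) for row in neighbours]
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
print(output_df)
```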

We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum **support** of 5%, so that we examine only the rules related to frequent occurrences in our dataset (i.e., co-located convenience store chains). We use the metric **lift** when deriving rules: lift is the ratio of the observed support for the antecedent and consequent occurring together to the support expected if the two were independent.

```python
from mlxtend.frequent_patterns import association_rules, apriori

# Calculate frequent itemsets with apriori
frequent_set = apriori(output_df, min_support=0.05, use_colnames=True)

# Create rules
rules = association_rules(frequent_set, metric='lift')

# Sort rules by the support value
rules = rules.sort_values(['support'], ascending=False)
```

This gives us the following results table:

We will now interpret these association rules to draw some high-level takeaways. To make sense of this table, it’s best to read more about association rules first.

Okay, back to the table.

Support tells us how often different convenience store chains are actually found together. We can therefore say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. By contrast, the association between 7-Eleven and Lawson shows a higher lift but a lower confidence.
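As a worked example of that lift calculation (the ~31% joint support is from the table above, but the individual supports here are illustrative placeholders, not the real Shinjuku values):

```python
# support(A and B): fraction of locations containing both chains
support_both = 0.31

# Hypothetical individual supports for each chain (illustrative only)
support_a = 0.55   # e.g. 7-Eleven
support_b = 0.48   # e.g. FamilyMart

# Lift: observed co-occurrence relative to what independence would predict
lift = support_both / (support_a * support_b)
print(round(lift, 2))  # > 1: the chains co-occur more often than by chance
```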

Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.

Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and a FamilyMart are already co-located, a lift value of 1.42 suggests a strong association with Lawson.

If we had just stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.

An example of why geospatial association rules can be insightful for businesses is in determining new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur.

The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, which we demonstrate with association rules, it would make sense for both of these chains to pay more attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.

In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done using data extraction from OpenStreetMap, finding nearest neighbour convenience store chains, visualising data on maps, and creating association rules using an Apriori algorithm.

Thanks for reading!