
An intelligent location study using machine learning algorithms to select a location for an Italian restaurant in the city of San Francisco

The Italian restaurants of San Francisco are part of the city's culture, the customs of its inhabitants, and its tourist circuit. They have been studied by various writers, have inspired countless artistic creations, and serve as traditional meeting places. In this project, the aim is to find an optimal location for a new Italian restaurant using the machine learning techniques from the "The Battle of Neighborhoods: Coursera Capstone Project" course (1). Treating Italian restaurants as a subset of all restaurants, we will first try to detect candidate locations based on the following factors that will influence our decision:

1- Places that are not yet full of restaurants.

2- Areas with few or no Italian restaurants nearby.

3- Near the center, if possible, assuming the first two conditions are met.

With these simple parameters we will program an algorithm to discover what solutions can be obtained.
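
As a rough illustration of how these three criteria could be combined, the sketch below filters and ranks a few candidate areas. The field names, thresholds, and sample values are hypothetical; they are not taken from the analysis that follows.

```python
# Hypothetical candidate areas; all numbers are invented for illustration.
candidates = [
    {"name": "A", "restaurants_nearby": 1, "italian_nearby": 0, "km_from_center": 2.5},
    {"name": "B", "restaurants_nearby": 9, "italian_nearby": 2, "km_from_center": 0.8},
    {"name": "C", "restaurants_nearby": 2, "italian_nearby": 0, "km_from_center": 1.2},
    {"name": "D", "restaurants_nearby": 0, "italian_nearby": 1, "km_from_center": 0.5},
]

MAX_RESTAURANTS = 2  # criterion 1: assumed threshold for "not yet full of restaurants"
MAX_ITALIAN = 0      # criterion 2: assumed threshold for competing Italian restaurants


def rank_candidates(areas):
    """Keep areas meeting criteria 1 and 2, then sort by distance to the center (criterion 3)."""
    eligible = [a for a in areas
                if a["restaurants_nearby"] <= MAX_RESTAURANTS
                and a["italian_nearby"] <= MAX_ITALIAN]
    return sorted(eligible, key=lambda a: a["km_from_center"])


ranked = rank_candidates(candidates)
print([a["name"] for a in ranked])
```

Area D is the closest to the center but is dropped because it already has a competitor nearby, which is why the proximity criterion is applied only after the first two.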

Data Source

The following data sources will be needed to extract and generate the required information:

1- The centers of the candidate areas will be generated algorithmically, and the approximate addresses of these centers will be obtained using one of the Geopy geocoders. (2)

2- The number of restaurants in each area, with their type and location, will be obtained using the Foursquare API. (3)

The data will be used in the following scenarios:

1- To estimate the density of all restaurants, and of Italian restaurants in particular, from the extracted data.

2- To identify areas that are not very dense and not very competitive.

3- To calculate the distances between competing restaurants.
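
The density and distance calculations listed above can be sketched with planar coordinates in meters. The helper names and the sample coordinates below are made up for illustration and are not the notebook's own functions.

```python
import math


def count_within(center, points, radius_m):
    """Number of points no farther than radius_m from center (planar coordinates in meters)."""
    return sum(1 for p in points if math.dist(center, p) <= radius_m)


def nearest_distance(center, points):
    """Distance in meters to the closest point, or None when the list is empty."""
    return min((math.dist(center, p) for p in points), default=None)


# Made-up projected coordinates (meters) for one candidate area and nearby venues.
candidate = (0.0, 0.0)
restaurants = [(100.0, 0.0), (0.0, 250.0), (400.0, 300.0)]
italian = [(400.0, 300.0)]

print(count_within(candidate, restaurants, 300))  # local restaurant density
print(nearest_distance(candidate, italian))       # distance to the nearest competitor
```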

Locate the candidates

The target area will be the center of the city, where tourist attractions are more numerous than elsewhere. From this we will create a grid of cells covering the area of interest, which is about 12x12 kilometers centered on downtown San Francisco.

In [140]:
import requests

from geopy.geocoders import Nominatim


address = '199 Gough St, San Francisco, CA 94102, USA'
geolocator = Nominatim(user_agent="usa_explorer")
location = geolocator.geocode(address)
lat = location.latitude
lng = location.longitude
sf_center = [lat, lng]
print('Coordinate of {}: {}'.format(address, sf_center), ' location : ', location)
Coordinate of 199 Gough St, San Francisco, CA 94102, USA: [37.7752096, -122.4227735]  location :  Rich Table, 199, Gough Street, Western Addition, San Francisco, San Francisco City and County, California, 94102, United States

We create a grid of equidistant candidate areas centered on the city center and extending 6 km around it. To do this, we build the grid of locations in a 2D Cartesian coordinate system (UTM), which allows us to calculate distances in meters.

Next, we project these coordinates back to degrees of latitude/longitude so they can be displayed on maps with Mapbox and Folium (3).

In [141]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

# Transformers between WGS84 longitude/latitude (EPSG:4326) and UTM zone 10N (EPSG:32610).
# always_xy=True keeps the (lon, lat) and (x, y) argument order used below.
_to_xy = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32610', always_xy=True)
_to_lonlat = pyproj.Transformer.from_crs('EPSG:32610', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    # Project longitude/latitude in degrees to UTM x/y in meters
    return _to_xy.transform(lon, lat)

def xy_to_lonlat(x, y):
    # Inverse projection: UTM x/y in meters back to longitude/latitude in degrees
    return _to_lonlat.transform(x, y)

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate Verification')
print('-------------------------------')
print('San Francisco Center Union Square longitude={}, latitude={}'.format(sf_center[1], sf_center[0]))
x, y = lonlat_to_xy(sf_center[1], sf_center[0])
print('San Francisco Center Union Square UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('San Francisco Center Union Square longitude={}, latitude={}'.format(lo, la))
Coordinate Verification
-------------------------------
San Francisco Center Union Square longitude=-122.4227735, latitude=37.7752096
San Francisco Center Union Square UTM X=550833.4653390996, Y=4181031.39254272
San Francisco Center Union Square longitude=-122.4227735, latitude=37.7752096
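
To illustrate why the grid is built in projected coordinates rather than directly in degrees, the sketch below estimates how much ground one degree covers at San Francisco's latitude. It uses a spherical Earth approximation, which is an assumption made here for simplicity; the notebook itself relies on the UTM projection from pyproj.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation (assumption)


def meters_per_degree(lat_deg):
    """Approximate ground length of one degree of latitude and of longitude at lat_deg."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon


lat_m, lon_m = meters_per_degree(37.7752096)
print(round(lat_m), round(lon_m))
```

Because a degree of longitude is noticeably shorter than a degree of latitude here, a grid that is regular in degrees would not be regular in meters.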

We create a hexagonal grid of cells: we offset every other row horizontally and adjust the vertical spacing between rows so that each cell center is equidistant from all its neighbors.

In [142]:
sf_center_x, sf_center_y = lonlat_to_xy(sf_center[1], sf_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = sf_center_x - 6000
x_step = 600
y_min = sf_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitude = []
longitude = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(sf_center_x, sf_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitude.append(lat)
            longitude.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitude), 'candidate area centers generated')
364 candidate area centers generated
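
The equidistance property of the hexagonal grid can be checked with a few lines of arithmetic. The sketch below reuses the 600 m spacing and the factor k from the cell above; the variable names are illustrative. It also gives a rough estimate of how many cells fit in a circle of 6 km radius.

```python
import math

spacing = 600.0          # horizontal distance between neighbors in the same row
k = math.sqrt(3) / 2     # row height as a fraction of the spacing
row_height = spacing * k
half_shift = spacing / 2 # horizontal offset applied to every other row

same_row = math.hypot(spacing, 0.0)
adjacent_row = math.hypot(half_shift, row_height)
print(same_row, adjacent_row)

# Rough cell count: area of the 6 km circle divided by the area served by one cell.
circle_area = math.pi * 6000.0 ** 2
cell_area = spacing * row_height
print(round(circle_area / cell_area))
```

The estimate is in line with the 364 candidate locations stored later in the notebook.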

Let's look at the data we have so far: the city center location and the centers of the candidate areas:

In [143]:
import folium
In [144]:
tileset = r'https://api.mapbox.com'
attribution = (r'Map data © <a href="http://openstreetmap.org">OpenStreetMap</a>'
                ' contributors, Imagery © <a href="http://mapbox.com">MapBox</a>')

map_sf = folium.Map(location=sf_center, zoom_start=14, tiles=tileset, attr=attribution)
folium.Marker(sf_center, popup='San Francisco').add_to(map_sf)
for lat, lon in zip(latitude, longitude):
    # Draw each candidate area as a circle of 300 m radius
    folium.Circle([lat, lon], radius=300, color='purple', fill=False).add_to(map_sf)
map_sf
Out[144]:

At this point we have the coordinates of the centers of the candidate areas to be evaluated. They are evenly spaced (the distance between each point and its neighbors is exactly the same) and lie within approximately 6 km of downtown San Francisco. Next, we obtain an approximate address for each of them.

In [145]:
def get_address(lat, lng):
    # Reverse geocode a coordinate pair into a street address
    try:
        geolocator = Nominatim(user_agent="usa_explorer")
        location = geolocator.reverse((lat, lng))
        return location.address
    except Exception:
        # Covers network errors and coordinates with no match (location is None)
        return 'nothing found'


addr = get_address(sf_center[0], sf_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(sf_center[0], sf_center[1], addr)) 
print(type(location[0]))
Reverse geocoding check
-----------------------
Address of [37.7752096, -122.4227735] is: Rich Table, 199, Gough Street, Western Addition, San Francisco, San Francisco City and County, California, 94102, United States
<class 'str'>
In [146]:
print('Getting Locations: ', end='')
addresses = []
for lat, lon in zip(latitude, longitude):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', United States', '') 
    addresses.append(address)
    print(' .', end='')
print(' done.')
Getting Locations:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
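
The loop above sends one geocoding request per candidate location. The public Nominatim service asks clients to stay at or below one request per second, so a small throttle can help. The sketch below is a standard-library illustration only: `throttled` and `fake_lookup` are hypothetical names, and the short delay is chosen so the example runs quickly. geopy also ships `geopy.extra.rate_limiter.RateLimiter` for the same purpose.

```python
import time


def throttled(func, min_delay_seconds):
    """Wrap func so that consecutive calls start at least min_delay_seconds apart."""
    last_call = [None]

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


def fake_lookup(lat, lon):
    """Stand-in for a geocoder call; formats the input and makes no network request."""
    return '{:.4f}, {:.4f}'.format(lat, lon)


lookup = throttled(fake_lookup, min_delay_seconds=0.05)
start = time.monotonic()
results = [lookup(37.77, -122.42) for _ in range(3)]
elapsed = time.monotonic() - start
print(results[0], round(elapsed, 2))
```

With a real geocoder, the delay would be set to one second or more.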
In [180]:
import pandas as pd

df_locations = pd.DataFrame({'Dirección': addresses,
                             'Latitude': latitude,
                             'Longitude': longitude,
                             'X': xs,
                             'Y': ys,
                             'Distance from centroid': distances_from_center})

df_locations.head()
Out[180]:
   Dirección                                           Latitude    Longitude              X             Y  Distance from centroid
0  San Jose Avenue, Excelsior, San Francisco, San...  37.723793  -122.443598  549033.465339  4.175316e+06             5992.495307
1  nothing found                                      37.723760  -122.436790  549633.465339  4.175316e+06             5840.376700
2  335, Edinburgh Street, Excelsior, San Francisc...  37.723727  -122.429982  550233.465339  4.175316e+06             5747.173218
3  John McLaren Park Playground, Burrows Street, ...  37.723694  -122.423174  550833.465339  4.175316e+06             5715.767665
4  400, Yale Street, Portola, San Francisco, San ...  37.723661  -122.416365  551433.465339  4.175316e+06             5747.173218
In [181]:
df_locations.shape
Out[181]:
(364, 6)
In [182]:
df_locations.to_pickle('./Dataset/sf_locations.pkl')    

Foursquare

Now we will use the Foursquare API to explore the restaurants around each grid location. We limit the search to the food category and retrieve the latitude and longitude of all restaurants, flagging the Italian ones.

In [183]:
client_id = 'xxx'
client_secret = 'xxx'
VERSION = 'xxx'

For each candidate location within roughly 6 km of downtown San Francisco, we query the Foursquare API for all venues in the restaurant category and single out those that correspond to Italian restaurants.

In [184]:
food_category = '4d4b7105d754a06374d81259' # Foursquare 'Food' category, which covers restaurants

sf_italian_categories = ['4bf58dd8d48988d110941735', '55a5a1ebe4b013909087cbb6', '55a5a1ebe4b013909087cb7c', '55a5a1ebe4b013909087cba7',
                       '55a5a1ebe4b013909087cba1', '55a5a1ebe4b013909087cba4', '55a5a1ebe4b013909087cb95', '55a5a1ebe4b013909087cb89',
                       '55a5a1ebe4b013909087cb9b', '55a5a1ebe4b013909087cb98', '55a5a1ebe4b013909087cbbf', '55a5a1ebe4b013909087cb79',
                       '55a5a1ebe4b013909087cbb0', '55a5a1ebe4b013909087cbb3', '55a5a1ebe4b013909087cb74', '55a5a1ebe4b013909087cbaa',
                       '55a5a1ebe4b013909087cb83', '55a5a1ebe4b013909087cb8c', '55a5a1ebe4b013909087cb92', '55a5a1ebe4b013909087cb8f',
                       '55a5a1ebe4b013909087cb86', '55a5a1ebe4b013909087cbb9', '55a5a1ebe4b013909087cb7f', '55a5a1ebe4b013909087cbbc',
                       '55a5a1ebe4b013909087cb9e', '55a5a1ebe4b013909087cbc2', '55a5a1ebe4b013909087cbad'] # Italian restaurant category IDs
In [185]:
def is_restaurant(categories, specific_filter=None):
    # categories: list of (name, id) tuples
    # Returns (is a restaurant, belongs to one of the specific_filter categories)
    restaurant_words = ['restaurant', 'sushi', 'hamburger', 'seafood']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', USA', '')
    address = address.replace(', United States', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=1000):
    # Query the Foursquare "explore" endpoint for venues of one category around a point
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        # One tuple per venue: (id, name, categories, (lat, lng), address, distance in meters)
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except Exception:
        # Any request or parsing failure is treated as "no venues found"
        venues = []
    return venues
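
To make the parsing step concrete without calling the API, the sketch below walks a hand-made dictionary shaped like the nested structure the code above reads (`response`, `groups`, `items`, `venue`). The venues, addresses, and the `summarize` helper are invented for illustration; only the first category ID is taken from `sf_italian_categories`.

```python
# Made-up response fragment; real Foursquare responses contain many more fields.
sample_response = {
    'response': {'groups': [{'items': [
        {'venue': {'id': 'v1', 'name': 'Trattoria Example',
                   'categories': [{'id': '4bf58dd8d48988d110941735', 'name': 'Italian Restaurant'}],
                   'location': {'lat': 37.77, 'lng': -122.42, 'distance': 120}}},
        {'venue': {'id': 'v2', 'name': 'Example Bakery',
                   'categories': [{'id': 'hypothetical-id', 'name': 'Bakery'}],
                   'location': {'lat': 37.78, 'lng': -122.41, 'distance': 340}}},
    ]}]}
}

ITALIAN_ID = '4bf58dd8d48988d110941735'  # first ID listed in sf_italian_categories


def summarize(response):
    """Return (name, is_restaurant, is_italian, distance) for each venue in the response."""
    rows = []
    for item in response['response']['groups'][0]['items']:
        venue = item['venue']
        names = [c['name'].lower() for c in venue['categories']]
        ids = [c['id'] for c in venue['categories']]
        is_italian = ITALIAN_ID in ids
        is_res = is_italian or any('restaurant' in n for n in names)
        rows.append((venue['name'], is_res, is_italian, venue['location']['distance']))
    return rows


rows = summarize(sample_response)
for row in rows:
    print(row)
```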
In [186]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    sf_italian = {}
    location_restaurants = []

    print('Obtaining the candidates', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, client_id, client_secret, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=sf_italian_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    sf_italian[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, sf_italian, location_restaurants


restaurants = {}
sf_italian = {}
location_restaurants = []
loaded = False
try:
    # Reuse previously saved results when they are available
    with open('./Dataset/restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
        print('Restaurant data loaded.')
    with open('./Dataset/sf_italian_350.pkl', 'rb') as f:
        sf_italian = pickle.load(f)
        print('Italian restaurant data loaded.')
    with open('./Dataset/location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
        print('Restaurants per location loaded.')
    loaded = True
except Exception:
    # No usable saved data: fall back to downloading from Foursquare
    print('Restaurant Data Downloading')


if not loaded:
    restaurants, sf_italian, location_restaurants = get_restaurants(latitude, longitude)
    
Restaurant Data Downloading
Obtaining the candidates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
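
The cell above tries to load saved results before downloading them again. A minimal sketch of that load-or-compute pattern, including the save step, is shown below. `load_or_compute` is a hypothetical helper, and the example writes to a temporary directory rather than to the notebook's Dataset folder.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return (data, loaded_from_disk): read the pickle at path, or compute and save it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f), True
    data = compute()
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return data, False


cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, 'Dataset', 'demo_restaurants.pkl')

# The first call computes and saves; the second call reads the saved file instead.
first, from_disk_1 = load_or_compute(cache_path, lambda: {'v1': ('v1', 'Trattoria Example')})
second, from_disk_2 = load_or_compute(cache_path, lambda: {})
print(from_disk_1, from_disk_2, first == second)
```

Saving the results this way avoids repeating several hundred API calls each time the notebook is rerun.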
In [187]:
import numpy as np
In [188]:
print('**Results**')
print('Total Number of Restaurants:', len(restaurants))
print('Total Number of Italian restaurants:', len(sf_italian))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(sf_italian) / len(restaurants) * 100))
print('Average of Venues per grid:', np.array([len(r) for r in location_restaurants]).mean())
**Results**
Total Number of Restaurants: 1681
Total Number of Italian restaurants: 118
Percentage of Italian restaurants: 7.02%
Average of Venues per grid: 4.052197802197802
In [189]:
print('List of All Restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))
List of All Restaurants
-----------------------
('4ceec3b83b03f04de88d3bdc', "Henry's Hunan Restaurant", 37.72218603642267, -122.43659651808754, '4753 Mission St, San Francisco, CA 94112', 176, False, 549651.5465355547, 4175141.074954057)
('4a0e123af964a520c2751fe3', 'Taquerias El Farolito', 37.72122961664814, -122.43739536867459, '4817 Mission St (at Onondaga St), San Francisco, CA 94112', 286, False, 549581.7824803684, 4175034.538650764)
('546960f7498eac74bd5baf47', 'Tao Sushi', 37.721036775089686, -122.4376651904847, '4808 Mission At (Onondaga Ave), San Francisco, CA', 312, False, 549558.1316389883, 4175013.0004052045)
('4b244110f964a520c76424e3', 'Taqueria Guadalajara', 37.7212324569519, -122.43763599260711, '4798 Mission St (at Onondaga Ave), San Francisco, CA 94112', 291, False, 549560.574468874, 4175034.726401493)
('4a6b8478f964a520ecce1fe3', 'Mexico Tipico', 37.72501226746621, -122.43447912554541, '4581 Mission St (at Brazil Ave), San Francisco, CA 94112', 246, False, 549836.2556397481, 4175455.7650877447)
('4a91a3faf964a520171b20e3', 'Beijing Restaurant 北京小馆', 37.723599683798, -122.43719187724251, '1801 Alemany Blvd (at Ocean Ave), San Francisco, CA 94112', 39, False, 549598.1357189683, 4175297.6010806696)
('588e3e6632b072494c6cf57e', 'An Chi', 37.72343008519264, -122.43573516334256, '4683 Mission St, San Francisco, CA 94112', 99, False, 549726.6248046655, 4175279.5569969686)
('4aff274cf964a5200b3522e3', 'Hawaiian Drive Inn #28', 37.72114068878443, -122.43738942911332, '4827 Mission St, San Francisco, CA 94112', 296, False, 549582.3652084664, 4175024.675411926)
('57bd06c8cd10e903763a7664', 'Hwaro', 37.725637597880784, -122.43431782363075, '4516 Mission St, San Francisco, CA 94112', 322, False, 549850.0512717982, 4175525.230272441)
('5941ec67e2ead1688f4f464a', 'El Gran Taco Loco', 37.724746, -122.43448300000001, '4591 Mission St, San Francisco, CA 94112', 230, False, 549836.0926191276, 4175426.2211156166)
...
Total: 1681
In [190]:
print('List of all Italian restaurants')
print('---------------------------')
for r in list(sf_italian.values())[:10]:
    print(r)
print('...')
print('Total:', len(sf_italian))
List of all Italian restaurants
---------------------------
('4be4bf122457a593e2b9aa15', 'Marche Club', 37.728095, -122.432397, '4346 Mission St (btwn Tingley St & Theresa St), San Francisco, CA 94112', 91, True, 550017.6701432205, 4175798.899217597)
('4ef010c00e01e1fde2099099', 'Manzoni', 37.73467816914885, -122.43389799980405, '2790 Diamond St, San Francisco, CA 94131', 302, True, 549880.9832699064, 4176528.490363779)
('5195394d498e344eeb952b4f', 'Trattoria Da Vittorio', 37.739295412112625, -122.46759110305597, '150 West Portal Ave, San Francisco, CA 94127', 151, True, 546909.2447572381, 4177023.347445145)
('4be72d932457a593b8a6ad15', 'Spiazzo Ristorante', 37.74049906835031, -122.46611414213069, '33 West Portal Ave, San Francisco, CA 94127', 306, True, 547038.6154491554, 4177157.632339159)
('4b2edd7df964a520a2e724e3', 'Vega', 37.7391742135669, -122.41743951497574, '419 Cortland Ave (btwn Bennington & Wool), San Francisco, CA 94110', 253, True, 551328.0990331663, 4177036.2170301196)
('4ae4ff0cf964a520f49f21e3', 'VinoRosso', 37.73901245660888, -122.41534272358848, '629 Cortland Ave (at Anderson Street), San Francisco, CA 94110', 263, True, 551512.9563385877, 4177019.42214691)
('49bed272f964a520e3541fe3', 'La Ciccia', 37.74200800946477, -122.42653101682663, '291 30th St (at Church), San Francisco, CA 94131', 311, True, 550525.1341258159, 4177345.6763315448)
('58c6b74f730a925fc305a126', 'Ardiana', 37.74248738572593, -122.42650722060347, '1781 Church St, San Francisco, CA 94131', 306, True, 550526.9048224975, 4177398.875309537)
('4b5fb718f964a5209dc929e3', 'Cafe Stefano', 37.74236536, -122.423196, '59 30th St (btw Mission & San Jose), San Francisco, CA 94110', 16, True, 550818.7219270115, 4177387.1293513896)
('4be1d60c4283c9b68da754f8', 'South Beach Cafe', 37.74791482485267, -122.43318557739258, '800 Embarcadero, San Francisco, CA 94107', 84, True, 549934.8644184133, 4177997.458160931)
...
Total: 118
In [191]:
print('Restaurants around locations')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))
Restaurants around locations
---------------------------
Restaurants around location 101: 
Restaurants around location 102: Rainbow Cafe
Restaurants around location 103: 
Restaurants around location 104: restaurante pressman@berman, Le Chateau De Bob
Restaurants around location 105: 
Restaurants around location 106: Lolinda, Foreign Cinema, El Techo, Loló, Radio Habana Social Club, Naked Kitchen, Californios, Udupi Palace
Restaurants around location 107: Heirloom Café, Bon, Nene, El Metate, flour + water, Sushi Hon, Mis Antojitos, El Porvenir Produce Market, Sasaki
Restaurants around location 108: La Paz Restaurant Pupuseria, VBOWLS
Restaurants around location 109: 
Restaurants around location 110: ChocolateLab

All restaurants found in the study area are shown in gray, and the Italian restaurants among them are highlighted in red.

In [192]:
map_sf = folium.Map(location=sf_center, zoom_start=13, tiles=tileset, attr=attribution)
folium.Marker(sf_center, popup='San Francisco').add_to(map_sf)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    # Italian restaurants in red, all other restaurants in grey
    color = 'red' if is_italian else 'grey'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_sf)
map_sf
Out[192]: