
Calculate Isochrones and Closeness Centrality

Learn how to use the Shortest Path One-to-All (SPOA) and Breadth First Search (BFS) functions to calculate isochrones and closeness centrality.
You will learn
  • How to use the Shortest Path One-to-All (SPOA) function to calculate minimum distances
  • How to use the Breadth First Search (BFS) function to implement closeness centrality
Created by VijayKrishnanSR, July 26, 2021


In this tutorial, you will explore two more fundamental functions: Shortest Path One-to-All (SPOA) and Breadth First Search (BFS). The first does what its name suggests: given a start vertex, it calculates the distance/cost to every other vertex in the graph. SPOA can be used to calculate isochrones, i.e. areas with the same drive-time distance. BFS, in turn, is a fundamental building block for many graph algorithms, including Closeness Centrality, one of several standard centrality measures that identify the importance of a vertex in a network.

This tutorial consists of two steps:

  • Calculate minimum distance using SPOA function
  • Implement closeness centrality using BFS function
  • Step 1

    The Shortest_Path_One_To_All function is similar to the Shortest_Path function you used in the previous tutorials. As the name suggests, it calculates the minimum distances/costs from a start vertex to all other vertices in the network. The result is not a single WeightedPath, but a Graph that contains only the edges of the shortest paths. The vertices of the output Graph carry an attribute CALCULATED_COST, which indicates the minimum distance/cost required to reach the vertex.

    The SPOA statement is constructed like this:

    SQL
    GRAPH g_spoa = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "CALCULATED_COST",
        (EDGE e, DOUBLE current_path_cost) => DOUBLE {
            IF(:current_path_cost < :i_max) { RETURN :e."length"/DOUBLE(:e."SPEED_MPH"); }
            ELSE { END TRAVERSE; }
        });
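Conceptually, SHORTEST_PATHS_ONE_TO_ALL behaves like a Dijkstra-style one-to-all traversal that stops expanding a path once its accumulated cost reaches the maximum. Below is a rough, database-free Python sketch of these semantics; the `spoa` helper, the adjacency dictionary, and all costs are invented for illustration:

```python
import heapq

def spoa(adj, start, i_max):
    """One-to-all shortest paths with a cost cutoff, loosely mimicking
    SHORTEST_PATHS_ONE_TO_ALL: a path stops being expanded (END TRAVERSE)
    once its accumulated cost reaches i_max.
    adj: {vertex: [(neighbor, edge_cost), ...]}
    Returns {vertex: cost}, playing the role of CALCULATED_COST."""
    cost = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        c, v = heapq.heappop(heap)
        if c > cost[v]:
            continue  # stale heap entry, a shorter path was found meanwhile
        if c >= i_max:
            continue  # END TRAVERSE: do not expand this path any further
        for n, w in adj.get(v, []):
            nc = c + w
            if nc < cost.get(n, float("inf")):
                cost[n] = nc
                heapq.heappush(heap, (nc, n))
    return cost

# Tiny invented network; edge costs are seconds of drive time
adj = {1: [(2, 100.0), (3, 250.0)], 2: [(3, 100.0), (4, 300.0)], 3: [(4, 80.0)]}
print(spoa(adj, 1, 300.0))  # {1: 0.0, 2: 100.0, 3: 200.0, 4: 280.0}
```

The returned dictionary corresponds to the CALCULATED_COST attribute on the vertices of the output graph.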
    
    1. Here is the complete statement that you can run in your example. As before, first create the table type, then the GRAPH procedure:

      SQL
      CREATE TYPE "TT_SPOA_VERTICES" AS TABLE ("osmid" BIGINT, "CALCULATED_COST" DOUBLE);
      CREATE OR REPLACE PROCEDURE "GS_SPOA"(
      IN i_startVertex BIGINT, 		-- the key of the start vertex
      IN i_max DOUBLE,				-- the maximum distance/cost
      OUT o_vertices "TT_SPOA_VERTICES"
      )
      LANGUAGE GRAPH READS SQL DATA AS
      BEGIN
      GRAPH g = Graph("DAT260", "LONDON_GRAPH");
      VERTEX v_start = Vertex(:g, :i_startVertex);
      -- Running shortest paths one to all, which returns a subgraph. The WEIGHT based path length to a vertex is stored in the attribute CALCULATED_COST
      GRAPH g_spoa = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "CALCULATED_COST",
      	(EDGE e, DOUBLE current_path_cost) => DOUBLE {
      		-- 1 mph = 0.44704 m/s, so "length" [m] divided by speed [m/s] yields travel time in seconds
      		IF(:current_path_cost < :i_max) { RETURN :e."length"/(DOUBLE(:e."SPEED_MPH")*0.44704); }
      		ELSE { END TRAVERSE; }
      	});
      o_vertices = SELECT :v."osmid", :v."CALCULATED_COST" FOREACH v IN Vertices(:g_spoa);
      END;
      
    2. As an example of what you can do with the SPOA method, calculate where you can go within 300 seconds from your starting point at Canary Wharf:

      SQL
      CALL "GS_SPOA" (1433737988, 300, ?);
      
    3. To visualize the data, let’s wrap it in a function and do some post-processing on the graph results. With resultType = POINTS the function returns the raw data, i.e. the individual street intersections that are reachable within a given maximum travel time. With resultType = CONVEXHULL it calculates the convex hull of these points and returns a single shape. Finally, with resultType = HEXAGON a spatial clustering is applied, and travel times are averaged for each cluster cell.

      SQL
      CREATE OR REPLACE FUNCTION "F_SPOA_VERTICES"(
      IN i_startVertex BIGINT, 		-- the key of the start vertex
      IN i_max DOUBLE,				-- the maximum distance/cost
      IN i_resultType VARCHAR(20)	-- indicates if the result should be POINTS, CONVEXHULL, or HEXAGON
      )
      RETURNS TABLE("ID" BIGINT, "SHAPE" ST_GEOMETRY(32630), "CALCULATED_COST" DOUBLE)
      LANGUAGE SQLSCRIPT READS SQL DATA AS
      BEGIN
      CALL "GS_SPOA"(:i_startVertex, :i_max, o_path_vertices);
      IF (:i_resultType = 'POINTS') THEN
      	RETURN SELECT pv."osmid" AS "ID", lv."SHAPE", pv."CALCULATED_COST"
      	FROM :o_path_vertices AS pv
      	LEFT JOIN "LONDON_VERTICES" lv ON pv."osmid" = lv."osmid";
      ELSEIF (:i_resultType = 'CONVEXHULL') THEN
      	RETURN SELECT :i_startVertex AS "ID", ST_CONVEXHULLAGGR("SHAPE") AS "SHAPE", :i_max AS "CALCULATED_COST" FROM (
      	SELECT pv."osmid", lv."SHAPE", pv."CALCULATED_COST"
      	FROM :o_path_vertices AS pv
      	LEFT JOIN "LONDON_VERTICES" lv ON pv."osmid" = lv."osmid");
      ELSEIF (:i_resultType = 'HEXAGON') THEN
      	RETURN SELECT ST_CLUSTERID() AS "ID", ST_CLUSTERCELL() AS "SHAPE", CAST(AVG("CALCULATED_COST") AS DOUBLE) AS "CALCULATED_COST" FROM (
      	SELECT pv."osmid", lv."SHAPE", pv."CALCULATED_COST"
      	FROM :o_path_vertices AS pv
      	LEFT JOIN "LONDON_VERTICES" lv ON pv."osmid" = lv."osmid")
      	GROUP CLUSTER BY "SHAPE" USING HEXAGON X CELLS 50;
      END IF;
      END;
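For intuition about the CONVEXHULL variant: it collapses all reachable points into a single polygon. Here is a minimal Python sketch of a convex hull using Andrew’s monotone chain; the coordinates are invented, and in the function above ST_ConvexHullAggr does this work inside the database:

```python
def convex_hull(points):
    """Andrew's monotone chain: returns the hull vertices in
    counter-clockwise order, starting from the lowest-leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a-o) x (b-o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Invented projected coordinates for reachable street intersections
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(pts))  # the four corners; interior points drop out
```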
      
    4. You can use these three statements to explore from your starting point Canary Wharf:

      SQL
      SELECT * FROM "F_SPOA_VERTICES"(1433737988, 60, 'POINTS') ORDER BY "CALCULATED_COST" DESC;
      SELECT * FROM "F_SPOA_VERTICES"(1433737988, 60, 'CONVEXHULL') ORDER BY "CALCULATED_COST" DESC;
      SELECT * FROM "F_SPOA_VERTICES"(1433737988, 240, 'HEXAGON') ORDER BY "CALCULATED_COST" DESC;
      
    5. Using a GIS application like Esri ArcGIS Pro or QGIS, the HEXAGON result can be color-coded by the average CALCULATED_COST, resulting in a map visualization like the one below. Areas with the same color can be reached within the same drive time - so-called isochrones.

      [Figure: isochrone map, hexagon cells color-coded by average drive time]
    6. Let’s say you next want to do business with cyclists in London. Your goal is to open a new bike repair shop. To find the right location for your shop, you could look at the existing repair stations and their reach. Maybe you can find white spots on the map where there is not much competition.

      So let’s calculate 3-minute drive-time areas around all the bike repair stations in London. You’ll use the MAP_MERGE operator to process multiple POIs:

      SQL
      CREATE OR REPLACE FUNCTION "F_SPOA_VERTICES_MULTI" (IN i_filter VARCHAR(5000), IN i_max DOUBLE, IN i_resultType VARCHAR(20))
      RETURNS TABLE("ID" BIGINT, "SHAPE" ST_GEOMETRY(32630), "CALCULATED_COST" DOUBLE)
      LANGUAGE SQLSCRIPT READS SQL DATA AS
      BEGIN
      startPOIs = APPLY_FILTER("LONDON_POI", :i_filter);
      res = MAP_MERGE(:startPOIs, "F_SPOA_VERTICES"(:startPOIs."VERTEX_OSMID", :i_max, :i_resultType));
      RETURN SELECT * FROM :res;
      END;
      SELECT * FROM "F_SPOA_VERTICES_MULTI"(' "amenity" = ''bicycle_repair_station'' ', 180, 'CONVEXHULL');
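Conceptually, MAP_MERGE calls the mapped function once for every row of the input table and unions the per-call result tables. A minimal Python analogue; `map_merge`, the POI keys, and `f_spoa_vertices` are invented stand-ins for the SQLScript operator and the function above:

```python
def map_merge(rows, fn):
    """Apply fn to each input row and union (concatenate) the result tables."""
    merged = []
    for row in rows:
        merged.extend(fn(row))
    return merged

# Invented POI start vertices and a stand-in for F_SPOA_VERTICES,
# returning (vertex id, calculated cost) rows per start vertex
start_pois = [101, 202]

def f_spoa_vertices(start):
    return [(start, 0.0), (start + 1, 120.0)]

print(map_merge(start_pois, f_spoa_vertices))
# [(101, 0.0), (102, 120.0), (202, 0.0), (203, 120.0)]
```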
      

    The result is a set of CONVEXHULL polygons that indicate “good repair shop coverage”.

    [Figure: convex-hull drive-time areas around the bike repair stations]
  • Step 2

    For this step, you will use a different network: the London Tube network. It is a simple dataset in which the tube stations are the vertices and the connections between adjacent stations along a tube line are the edges.

    [Figure: London Tube network]
    1. First, take a look at the tables, then create a graph workspace with these statements:

      SQL
      SELECT * FROM "LONDON_TUBE_STATIONS";
      SELECT * FROM "LONDON_TUBE_CONNECTIONS";
      CREATE GRAPH WORKSPACE "TUBE_GRAPH"
      EDGE TABLE "LONDON_TUBE_CONNECTIONS"
      	SOURCE COLUMN "SOURCE"
      	TARGET COLUMN "TARGET"
      	KEY COLUMN "ID"
      VERTEX TABLE "LONDON_TUBE_STATIONS"
      	KEY COLUMN "ID";
      

      When you traverse a network in a breadth-first-search manner, you explore the vertices level by level. Starting at a vertex, you first visit all its direct neighbors, then the neighbors of the neighbors, and so forth. BFS is a fundamental building block for many custom graph algorithms. In SAP HANA Cloud, SAP HANA database, you can “hook” into the vertex/edge “visit” events and execute any custom logic there.

      SQL
      TRAVERSE BFS ('ANY') :g FROM :v_start
      ON VISIT VERTEX (Vertex v_visited, BIGINT lvl) {
      [any logic goes here]
      };
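The same level-by-level traversal with a visit hook can be sketched in plain Python; the `on_visit` callback plays the role of the ON VISIT VERTEX block, and the graph data is invented:

```python
from collections import deque

def traverse_bfs(adj, start, on_visit):
    """Breadth-first traversal; calls on_visit(vertex, level) once per vertex."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        v, lvl = queue.popleft()
        on_visit(v, lvl)  # the "hook": any logic goes here
        for n in adj.get(v, []):
            if n not in seen:
                seen.add(n)
                queue.append((n, lvl + 1))

# Invented undirected graph as adjacency lists
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
levels = {}
traverse_bfs(adj, "A", lambda v, lvl: levels.setdefault(v, lvl))
print(levels)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```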
      

      In the example below, you will hook into the VISIT VERTEX event and calculate a count and a cost. These numbers are then used to derive multiple closeness centrality measures.

    2. As usual, create the table type first:

      SQL
      CREATE TYPE "TT_RESULT_CC" AS TABLE (
      "ID" BIGINT, "CLOSENESS_CENTRALITY" DOUBLE, "NORMALIZED_CLOSENESS_CENTRALITY" DOUBLE, "HARMONIC_CENTRALITY" DOUBLE, "NORMALIZED_HARMONIC_CENTRALITY" DOUBLE
      );
      
    3. Then, create the GRAPH procedure:

      SQL
      CREATE OR REPLACE PROCEDURE "GS_CC_SINGLE_SOURCE"(
      IN i_start BIGINT,
      OUT o_vertices "TT_RESULT_CC")
      LANGUAGE GRAPH READS SQL DATA AS
      BEGIN
      GRAPH g = Graph("DAT260","TUBE_GRAPH");
        -- you need to add attributes to the vertices to store the data
      ALTER g ADD TEMPORARY VERTEX ATTRIBUTE (DOUBLE "CLOSENESS_CENTRALITY");
      ALTER g ADD TEMPORARY VERTEX ATTRIBUTE (DOUBLE "NORMALIZED_CLOSENESS_CENTRALITY");
      ALTER g ADD TEMPORARY VERTEX ATTRIBUTE (DOUBLE "HARMONIC_CENTRALITY");
      ALTER g ADD TEMPORARY VERTEX ATTRIBUTE (DOUBLE "NORMALIZED_HARMONIC_CENTRALITY");
        -- initialize the start vertex and some variables
      VERTEX v_start = Vertex(:g, :i_start);
      BIGINT v_sumNodes = 0L;
      BIGINT v_sumCost = 0L;
      DOUBLE v_sumReciprocCost = 0.0;
        -- now you are traversing the graph from the start vertex, following the edges in any direction.
        -- when a vertex is visited, the vertex is accessible as "v_visited". The "level" information is stored in "lvl".
        -- within the vertex visit event, you increase the sum of visited nodes and the sum of costs.
      TRAVERSE BFS ('ANY') :g FROM :v_start ON VISIT VERTEX (Vertex v_visited, BIGINT lvl) {
          IF (:lvl > 0L){
          	v_sumNodes = :v_sumNodes + 1L;
      	    v_sumCost = :v_sumCost + :lvl;
      	    v_sumReciprocCost = :v_sumReciprocCost + 1.0/DOUBLE(:lvl);
      	}
      };
        -- if the traversal is finished, you derive the final measures
      IF (:v_sumCost > 0L AND :v_sumReciprocCost > 0.0 AND :v_sumNodes > 1L){
      	v_start."CLOSENESS_CENTRALITY" = 1.0/DOUBLE(:v_sumCost);
      	v_start."NORMALIZED_CLOSENESS_CENTRALITY" = DOUBLE(:v_sumNodes)/DOUBLE(:v_sumCost);
      	v_start."HARMONIC_CENTRALITY" = :v_sumReciprocCost;
      	v_start."NORMALIZED_HARMONIC_CENTRALITY" = :v_sumReciprocCost/DOUBLE(:v_sumNodes);
      }
      MULTISET<Vertex> m_v = v IN Vertices(:g) WHERE :v."CLOSENESS_CENTRALITY" >= 0.0;
      o_vertices = SELECT :v."ID", :v."CLOSENESS_CENTRALITY", :v."NORMALIZED_CLOSENESS_CENTRALITY", :v."HARMONIC_CENTRALITY", :v."NORMALIZED_HARMONIC_CENTRALITY" FOREACH v IN :m_v;
      END;
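To make the four measures concrete, here is a small Python sketch mirroring the procedure’s logic: a BFS collects the level (hop distance) of every reached vertex, and the measures are derived from the sums, with the start vertex excluded just like the `:lvl > 0L` check above. The graph data is invented:

```python
from collections import deque

def closeness_measures(adj, start):
    """BFS from start, then the four measures from the procedure:
    sum of reached vertices (excluding start) and sum of their levels."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for n in adj.get(v, []):
            if n not in levels:
                levels[n] = levels[v] + 1
                queue.append(n)
    reached = [lvl for lvl in levels.values() if lvl > 0]  # skip the start vertex
    sum_nodes, sum_cost = len(reached), sum(reached)
    sum_reciproc = sum(1.0 / lvl for lvl in reached)
    return {
        "CLOSENESS_CENTRALITY": 1.0 / sum_cost,
        "NORMALIZED_CLOSENESS_CENTRALITY": sum_nodes / sum_cost,
        "HARMONIC_CENTRALITY": sum_reciproc,
        "NORMALIZED_HARMONIC_CENTRALITY": sum_reciproc / sum_nodes,
    }

# Invented path graph A - B - C: from B, every vertex is one hop away
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness_measures(adj, "B"))
# closeness 0.5, normalized 1.0, harmonic 2.0, normalized harmonic 1.0
```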
      
    4. Once the procedure is created, you can call it:

      SQL
      CALL "GS_CC_SINGLE_SOURCE"(117, ?);
      

      The closeness centrality measure of a single vertex in a network is not very meaningful on its own. You need to calculate the centrality of all vertices to find the most important one. You could do this by adding a loop to your GRAPH program, but a nicer way to implement such loops in parallel is the MAP_MERGE operator in SQLScript. For this, you need to wrap the procedure in a function, as you did in tutorial 9, step 4.

    5. Create the function like this:

      SQL
      CREATE OR REPLACE FUNCTION "F_CC_SINGLE_SOURCE"(IN i_start BIGINT)
      RETURNS "TT_RESULT_CC"
      LANGUAGE SQLSCRIPT READS SQL DATA AS
      BEGIN
      CALL "GS_CC_SINGLE_SOURCE"(:i_start, result);
      RETURN :result;
      END;
      
    6. Next, you can invoke this function for a set of input parameters in parallel.

      SQL
      CREATE OR REPLACE FUNCTION "F_CC_MAP_MERGE" ()
      RETURNS "TT_RESULT_CC"
      LANGUAGE SQLSCRIPT READS SQL DATA AS
      BEGIN
      startVertices = SELECT DISTINCT "ID" FROM "LONDON_TUBE_STATIONS";
      result = MAP_MERGE(:startVertices, "F_CC_SINGLE_SOURCE"(:startVertices."ID"));
      RETURN :result;
      END;
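Since each start vertex is processed independently, MAP_MERGE can fan the work out in parallel and concatenate the per-vertex result tables. A hedged Python analogue using a thread pool; `f_cc_single_source` here is an invented stand-in returning fake scores:

```python
from concurrent.futures import ThreadPoolExecutor

def f_cc_single_source(start):
    # Invented stand-in for the per-vertex centrality function:
    # returns a one-row "table" of (vertex id, score)
    return [(start, 1.0 / (start + 1))]

start_vertices = [0, 1, 2, 3]

# Each call is independent, so they can run in parallel; the per-call
# result tables are then concatenated, which is the MAP_MERGE pattern.
with ThreadPoolExecutor() as ex:
    tables = list(ex.map(f_cc_single_source, start_vertices))
merged = [row for table in tables for row in table]
print(merged)  # four (id, score) rows, one per start vertex, in input order
```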
      
    7. Then you can select from it:

      SQL
      SELECT * FROM "F_CC_MAP_MERGE"() ORDER BY "NORMALIZED_CLOSENESS_CENTRALITY" DESC;
      
    8. Again, you can mix the results from a graph function with other SQL operations, like a JOIN.

      SQL
      SELECT *
        FROM "F_CC_MAP_MERGE"() AS C
      LEFT JOIN "LONDON_TUBE_STATIONS" AS S
        ON C."ID" = S."ID"
      ORDER BY "NORMALIZED_CLOSENESS_CENTRALITY" DESC;
      

      Using a GIS application for advanced visualization, you can see the tube stations; the blue ones are the most important/central to the network.

      [Figure: tube stations color-coded by closeness centrality]

    In this tutorial, you got to know more complex procedures and functions that offer helpful visualization approaches for your spatial data.

    That is all for this tutorial group. You got to know various methods to work with spatial and graph data.

    Thanks for reading!

    Follow our tag in the SAP Community to stay up-to-date on the latest updates and newest content, where you will also find more learning materials on SAP HANA Cloud.


