Skip to Content

Create a Calculation View with K-anonymity (XS Advanced)

Anonymize data to protect privacy.
You will learn
jung-thomasThomas JungJanuary 5, 2021
Created by
Lsubatin
September 5, 2018
Contributors
jung-thomas
Lsubatin

Prerequisites

  • This tutorial is designed for SAP HANA on premise and SAP HANA, express edition. It is not designed for SAP HANA Cloud.
  • How to create a Calculation View using an anonymization node
  • Use k-anonymity to hide data that can identify individuals by comparing other identifying information. See the SAP Help for more information
  • How to create a calculation view
  • Use artifacts to import data from text file

GitHub repository available

This project can be cloned from this repository: https://github.com/SAP/hana-xsa-calculation-view-anonymity

  • Step 1

    Map a space to your tenant

    Make sure your project is using a space that is mapped to a tenant. Here is how to how to map it to a tenant database.

    Start the script server in your tenant database

    If you have not already, log in to your System database as SYSTEM and execute the following statement. In HANA express, as hxeadm

    SQL
    Copy
    hdbsql -i 90 -d systemdb -u SYSTEM
    alter database HXE  ADD 'scriptserver';
    quit;
    

    You can also do this from the Database Explorer tool, as user SYSTEM and logged into the SYSTEMDB database:

    Add scriptserver

    Note: This example assumes the tenant database is the default first tenant created in SAP HANA, express edition. Replace HXE with the name of your tenant. You can query available tenants from table M_DATABASES

    Deploy your database module in the space mapped to the tenant

    FOLLOW THESE STEPS ONLY if you created the database module before mapping the space to the tenant, you will see the HDI container in the space but it will still be connected to the system database. The system database is meant for administration purposes and does not have the necessary engines to execute advanced analytics such as anonymization.

    Use command xs ds to delete the HDI container and deploy it again once the space has been mapped to the tenant database.

    Note: An indication of using the system database after completing the SQL statement in the previous prerequisite is an error stating column store error search table error No ScriptServer available or similar.

  • Step 2

    Right-click Workspace and choose New > Project from template.

    Create a new project

    Choose SAP HANA Database Application and then click Next.

    Create a new project

    Call your project ANALYTICS and click Next.

    Create a new project

    Make sure you are using the space that is mapped to the tenant database. Also you must clear out the Namespace field. Click Finish.

    Create a new project

    Click Finish to create your project.

  • Step 3

    In the db/src folder, create a folder called data and a file called jobs.hdbtable. Use the following code for the table:

    SQL
    Copy
    COLUMN TABLE JOBS (
    	ID INTEGER GENERATED BY DEFAULT AS IDENTITY (NO CYCLE NO CACHE NO MINVALUE START WITH 300001000 INCREMENT BY 1 MAXVALUE 399999999) NOT NULL COMMENT 'Opinion ID',
    	ORIGIN_COUNTRY NVARCHAR(100),
    	CURRENT_ROLE NVARCHAR(100),
    	JOB_LIKES NVARCHAR(100),
    	AGE INTEGER,
    	CURRENT_TENURE INTEGER,
    	primary key (ID)
    )
    
    
  • Step 4

    In the data folder, create a file called data.csv with the following content:

    text
    Copy
    ORIGIN_COUNTRY	CURRENT_ROLE	JOB_LIKES	AGE	CURRENT_TENURE
    United States	Sr Architect	Designing systems with SAP HANA	37	12
    United States	Development Lead	Creating solutions for my customers	45	3
    United States	Development Lead	Providing customers with HANA models	45	3
    Australia	Architect	Helping internal customers with their HANA implementations	37	9
    Australia	Development Lead	Helping co-workers advance their careers		2
    United States	Technical manager	Creating new solutions for customers	32	12
    United States	Sr Architect	Learning new technologies in Big Data	37	4
    Mexico	CIO	Helping my company achieve strategic goals	46	14
    United States	Manager	Creating solutions on time and on budget	46	9
    United States	Development manager	Innovating with SAP HANA	32	13
    Canada	Developer Evangelist	Creating fun tutorials	28	7
    Costa Rica	CFO	Generating value for shareholders	43	4
    United States	Architect	The people I work with	65	17
    Poland	Developer Evangelist	Turning ideas into technical solutions	28	9
    Poland	BW/HANA Consultant	Creating complex IT solutions	58	17
    United States	Marketing	Implementing fun developer programs	24	4
    United States	CDO	Turning data into business value	54	3
    United Kingdom	Solution Advisor	Helping my customers gain value from their data	34	18
    United States	Intern	Learning new technologies in Big Data	21	2
    

    Create another file called load.hdbtabledata and add the following contents into it

    json
    Copy
    {
    "format_version": 1,
    "imports": [{
      "target_table": "JOBS",
      "source_data": {
        "data_type": "CSV",
        "file_name": "data.csv",
        "has_header": true,
        "dialect": "HANA",
        "type_config": {
          "delimiter": "\t"
        }
      },
      "import_settings": {
        "import_columns": ["ORIGIN_COUNTRY", "CURRENT_ROLE", "JOB_LIKES", "AGE", "CURRENT_TENURE"]
      },
      "column_mappings": {
        "ORIGIN_COUNTRY": 1, "CURRENT_ROLE" : 2, "JOB_LIKES" : 3, "AGE" : 4, "CURRENT_TENURE" : 5
      }
    }]
    }
    
    

    Save all of the files and Build the db module.

  • Step 5

    Create folder called models under src and a calculation view of type cube called K_ANONYMITY

    New calculation view

    Drag and drop an anonymization node.

    New CV

    Click on the + sign to add a data source

    New CV

    Search for the jobs table and add it to the node.

    New CV

    Double-click on the node and add all of the fields to the output. Then click on Details to configure the anonymity.

    Add all fields to output

    Set ID as the sequence column, 2 as the value for k and click on + to add quasi columns:

    Add all fields to output

    Choose the ORIGIN_COUNTRY

    Add all fields to output

    Click on the arrow to configure the values for the anonymity

    Add all fields to output

    Import the values and replace the values for the first level. Save,

    Add all fields to output

    Click Back. Connect the JOBS node to the aggregation node.

    Add all fields to output

    Why is this useful?

    Anonymization of data helps companies protect privacy and comply with data protection regulations, such as GDPR, together with many other measures to mask data that could potentially identify an individual.
     
    In this example, there are single combinations of job roles in specific countries. In other words, if the other fields were sensitive information such as the salary and somebody knew who filled in the spreadsheet, they could deduce an individual’s salary because of the unique combination of job role and country.

  • Step 6

    Click on the aggregation node and add all of the columns to the output.

    Add all fields to output

    Save and build the view.

    Add all fields to output

    Once finished, right-click and choose the data preview. Choose Raw Data to complete the validation below.

    Sort the column CURRENT TENURE descending. What is the value of the first Origin Country?

Back to top