Create a Calculation View with K-anonymity (XS Advanced)

Intermediate

20 min.

Big Data, SAP HANA, Intermediate, SAP HANA, express edition, Tutorial, SAP Web IDE

Anonymize data to protect privacy.

You will learn

Thomas JungJanuary 5, 2021

Created by

September 5, 2018

Contributors

Prerequisites

This tutorial is designed for SAP HANA on premise and SAP HANA, express edition. It is not designed for SAP HANA Cloud.

How to create a Calculation View using an anonymization node
Use k-anonymity to hide data that can identify individuals by comparing other identifying information. See the SAP Help for more information
How to create a calculation view
Use artifacts to import data from text file

GitHub repository available

This project can be cloned from this repository: https://github.com/SAP/hana-xsa-calculation-view-anonymity

Step 1
Map a space to your tenant

Make sure your project is using a space that is mapped to a tenant. Here is how to how to map it to a tenant database.

Start the script server in your tenant database

If you have not already, log in to your System database as SYSTEM and execute the following statement. In HANA express, as hxeadm

SQL
Copy
hdbsql -i 90 -d systemdb -u SYSTEM alter database HXE ADD 'scriptserver'; quit;

You can also do this from the Database Explorer tool, as user SYSTEM and logged into the SYSTEMDB database:

Note: This example assumes the tenant database is the default first tenant created in SAP HANA, express edition. Replace HXE with the name of your tenant. You can query available tenants from table M_DATABASES

Deploy your database module in the space mapped to the tenant

FOLLOW THESE STEPS ONLY if you created the database module before mapping the space to the tenant, you will see the HDI container in the space but it will still be connected to the system database. The system database is meant for administration purposes and does not have the necessary engines to execute advanced analytics such as anonymization.

Use command xs ds to delete the HDI container and deploy it again once the space has been mapped to the tenant database.

Note: An indication of using the system database after completing the SQL statement in the previous prerequisite is an error stating column store error search table error No ScriptServer available or similar.
Step 2
Right-click Workspace and choose New > Project from template.

Choose SAP HANA Database Application and then click Next.

Call your project ANALYTICS and click Next.

Make sure you are using the space that is mapped to the tenant database. Also you must clear out the Namespace field. Click Finish.

Click Finish to create your project.

Step 3

In the db/src folder, create a folder called data and a file called jobs.hdbtable. Use the following code for the table:

SQL

Copy

COLUMN TABLE JOBS (
	ID INTEGER GENERATED BY DEFAULT AS IDENTITY (NO CYCLE NO CACHE NO MINVALUE START WITH 300001000 INCREMENT BY 1 MAXVALUE 399999999) NOT NULL COMMENT 'Opinion ID',
	ORIGIN_COUNTRY NVARCHAR(100),
	CURRENT_ROLE NVARCHAR(100),
	JOB_LIKES NVARCHAR(100),
	AGE INTEGER,
	CURRENT_TENURE INTEGER,
	primary key (ID)
)

Step 4

In the data folder, create a file called data.csv with the following content:

text

Copy

ORIGIN_COUNTRY	CURRENT_ROLE	JOB_LIKES	AGE	CURRENT_TENURE
United States	Sr Architect	Designing systems with SAP HANA	37	12
United States	Development Lead	Creating solutions for my customers	45	3
United States	Development Lead	Providing customers with HANA models	45	3
Australia	Architect	Helping internal customers with their HANA implementations	37	9
Australia	Development Lead	Helping co-workers advance their careers		2
United States	Technical manager	Creating new solutions for customers	32	12
United States	Sr Architect	Learning new technologies in Big Data	37	4
Mexico	CIO	Helping my company achieve strategic goals	46	14
United States	Manager	Creating solutions on time and on budget	46	9
United States	Development manager	Innovating with SAP HANA	32	13
Canada	Developer Evangelist	Creating fun tutorials	28	7
Costa Rica	CFO	Generating value for shareholders	43	4
United States	Architect	The people I work with	65	17
Poland	Developer Evangelist	Turning ideas into technical solutions	28	9
Poland	BW/HANA Consultant	Creating complex IT solutions	58	17
United States	Marketing	Implementing fun developer programs	24	4
United States	CDO	Turning data into business value	54	3
United Kingdom	Solution Advisor	Helping my customers gain value from their data	34	18
United States	Intern	Learning new technologies in Big Data	21	2

Create another file called load.hdbtabledata and add the following contents into it

json

Copy

{
"format_version": 1,
"imports": [{
  "target_table": "JOBS",
  "source_data": {
    "data_type": "CSV",
    "file_name": "data.csv",
    "has_header": true,
    "dialect": "HANA",
    "type_config": {
      "delimiter": "\t"
    }
  },
  "import_settings": {
    "import_columns": ["ORIGIN_COUNTRY", "CURRENT_ROLE", "JOB_LIKES", "AGE", "CURRENT_TENURE"]
  },
  "column_mappings": {
    "ORIGIN_COUNTRY": 1, "CURRENT_ROLE" : 2, "JOB_LIKES" : 3, "AGE" : 4, "CURRENT_TENURE" : 5
  }
}]
}

Save all of the files and Build the db module.

Step 5
Create folder called models under src and a calculation view of type cube called K_ANONYMITY

Drag and drop an anonymization node.

Click on the + sign to add a data source

Search for the jobs table and add it to the node.

Double-click on the node and add all of the fields to the output. Then click on Details to configure the anonymity.

Set ID as the sequence column, 2 as the value for k and click on + to add quasi columns:

Choose the ORIGIN_COUNTRY

Click on the arrow to configure the values for the anonymity

Import the values and replace the values for the first level. Save,

Click Back. Connect the JOBS node to the aggregation node.

Why is this useful?

Anonymization of data helps companies protect privacy and comply with data protection regulations, such as GDPR, together with many other measures to mask data that could potentially identify an individual.

In this example, there are single combinations of job roles in specific countries. In other words, if the other fields were sensitive information such as the salary and somebody knew who filled in the spreadsheet, they could deduce an individual’s salary because of the unique combination of job role and country.
Step 6
Click on the aggregation node and add all of the columns to the output.

Save and build the view.

Once finished, right-click and choose the data preview. Choose Raw Data to complete the validation below.

Sort the column CURRENT TENURE descending. What is the value of the first Origin Country?