-
First, you must create a sample table PERSONALDATA
that has sensitive information on people regarding their age. Run the following SQL codes in your SAP HANA Database Explorer.
CREATE COLUMN TABLE PERSONALDATA (
-- sequence column
ID BIGINT NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
-- identifier
NAME NVARCHAR(10),
-- quasi-identifiers (QIDs) (to be generalized)
GENDER NVARCHAR(1) NOT NULL,
CITY NVARCHAR(10) NOT NULL,
-- sensitive data
AGE NVARCHAR(3) NOT NULL);
The SQL codes used in this tutorial are available to you in a GitHub repository. You can find them here.
-
Now, you need to insert information into each column of your table.
INSERT INTO PERSONALDATA VALUES ('Peter', 'm', 'Berlin', '30');
INSERT INTO PERSONALDATA VALUES ('Sigrid', 'f', 'Cologne', '31');
INSERT INTO PERSONALDATA VALUES ('François', 'm', 'Paris', '24');
INSERT INTO PERSONALDATA VALUES ('Bernhard', 'm', 'Munich', '31');
INSERT INTO PERSONALDATA VALUES ('Pierre', 'm', 'Nice', '25');
INSERT INTO PERSONALDATA VALUES ('Andrea', 'f', 'Hamburg', '32');
INSERT INTO PERSONALDATA VALUES ('Juliette', 'f', 'Marseille', '28');
INSERT INTO PERSONALDATA VALUES ('Fabienne', 'f', 'Toulouse', '28');
-
Based on the table created above, here is an example of how a completed statement could look like for an anonymized view on a table that contains two quasi-identifying columns (gender and location) as well as a sensitive age column. In this view, at least K=2
people should be indistinguishable based on their quasi-identifiers and at least L=2
people should have different age values:
CREATE VIEW PERSONALDATA_L_ANON (ID, GENDER, LOCATION, AGE)
AS SELECT ID, GENDER, CITY AS LOCATION, AGE
FROM PERSONALDATA
WITH ANONYMIZATION (ALGORITHM 'L-DIVERSITY'
PARAMETERS '{"k": 2, "l": 2}'
COLUMN ID PARAMETERS '{"is_sequence": true}'
COLUMN GENDER PARAMETERS '{"is_quasi_identifier":true, "hierarchy":{"embedded": [["f"], ["m"]]}}'
COLUMN LOCATION PARAMETERS '{"is_quasi_identifier":true, "hierarchy":{"embedded": [["Berlin", "Germany"], ["Cologne", "Germany"], ["Paris", "France"], ["Munich", "Germany"],
["Nice", "France"], ["Hamburg", "Germany"], ["Marseille", "France"], ["Toulouse", "France"]]}}'
COLUMN AGE PARAMETERS '{"is_sensitive":true}');
This is what the data looked like before anonymization:
The anonymized view looks like this:
Here you can see that location has been anonymized to level 2 (*) to achieve that at least 2 individuals have different age values in a group of people who share the same quasi-identifying attributes. In this case, everyone who is male or female.
Advantage of L-Diversity
If we would have only used k-anonymity, we would not be able to leverage privacy guarantees, because, for example, two female individuals from France share the same age (28), therefore revealing their sensitive information:
Keep in mind that if you use K-Anonymity with L-Diversity, you can still combine both with the advanced parameters like min/max, loss, or weighted quasi-identifiers as well as data change strategies.
For more details on the parameters of L-Diversity, you can read our technical documentation here.
Well done!
You have completed the fourth tutorial of this group! Now you know how to complement K-Anonymity with L-Diversity to create anonymized views that consider sensitive data.
Get to know a different approach to anonymizing data in the next tutorial , where you will learn how to use differential privacy in an anonymized view.