Skip to Content
test
0 %
Import a Census Sample Dataset
Details
// Explore More Tutorials

Import a Census Sample Dataset

09/10/2018

Import SAP Predictive Analytics Census Sample Dataset in your SAP HANA, express edition instance

You will learn

In this tutorial, you will learn how to download and import the SAP Predictive Analytics Census sample dataset into your SAP HANA, express edition instance.

Step 1: Sample file structure

The sample structure for the Sample Data Census is the following:

|--sample_census_3.3.1_en-us_production.zip
    |-- Census.zp
    |    |-- Census
    |        |-- Census01.csv
    |        |-- Desc_Census01.csv
    |-- metadata.xml

Note: desc and KxDesc files are SAP Predictive Analytics dataset description files and will not be loaded.

This Census data set (Census01.csv) presents the data on 48842 individual Americans, of at least 17 years of age. Each individual is characterized by 15 data items. These data, or variables, are described in the following table:

Variable Description Example of Values
age Age of individuals Any value greater than 17
workclass Employer category Private, Self-employed, …
fnlwgt Statistical Weight variable Any numerical value
education Level of study 11th, Bachelors
education_num Number of years of study A value between 1 and 16
marital_status Marital status Divorced, Married, …
occupation Job classification Sales, Cleaners, …
relationship Position in family Husband, Wife, …
race Ethnicity
sex Gender Male, Female, …
capital_gain Annual capital gains Any numerical value
capital_loss Annual capital losses Any numerical value
native country Country of origin United States, France, …
class Variable indicating if the salary is greater $50 k if salary > $50 k then 1 else 0
Log on to answer question
Step 2: Create the table structure

Connect to the HXE tenant using the ML_USER user credentials and execute the following SQL statement to create the table structure:

CREATE TABLE PA_DATA.CENSUS (
    AGE             INT,
    WORKCLASS       VARCHAR(20),
    FNLWGT          INT,
    EDUCATION       VARCHAR(20),
    EDUCATION_NUM   INT,
    MARITAL_STATUS  VARCHAR(30),
    OCCUPATION      VARCHAR(20),
    RELATIONSHIP    VARCHAR(20),
    RACE            VARCHAR(20),
    SEX             VARCHAR(10),
    CAPITAL_GAIN    INT,
    CAPITAL_LOSS    INT,
    HOURS_PER_WEEK  INT,
    NATIVE_COUNTRY  VARCHAR(30),
    CLASS           INT
);
Log on to answer question
Step 3: Download the Sample Data

Open the SAP Predictive Analytics documentation page in a browser and click on the View All for the Sample section.

This will display the list of sample dataset available.

Right-click on the Download the Sample Data Census link and use the Save link address to get the download URL.

Open the download URL and save the sample data archive either:

  • the Eclipse host if you want to use the SAP HANA Tools for Eclipse
  • the SAP HANA, express host if you want the IMPORT FROM SQL command

Extract the files (located in the Census subdirectory in the embedded Census.zip archive).

Import Using the SAP HANA Tools for Eclipse

You can extract the sample file anywhere you want on the Eclipse host.

You can now move to Step 3: Import Using the SAP HANA Tools for Eclipse.

Import Using the IMPORT FROM SQL command

Here is an example script that you can reuse to download and extract the dataset directly from the SAP HANA, express edition host:

URL=https://help.sap.com/http.svc/download?deliverable_id=20555035
OUTPUT_FILE=sample_census
OUTPUT_DIR=/usr/sap/HXE/HDB90/work/$OUTPUT_FILE

 # create a new subdirectory for the sample data'
mkdir $OUTPUT_DIR

 # download the archive in the sample data directory'
 # wget -O $OUTPUT_DIR/$OUTPUT_FILE.zip $URL
curl $URL -o $OUTPUT_DIR/$OUTPUT_FILE.zip

 # switch to the new directory'
cd $OUTPUT_DIR

 # extract all archives and embedded archives'
while [ "`find . -type f -name '*.zip' | wc -l`" -gt 0 ]; \
  do find -type f -name "*.zip" \
     -exec unzip -o --  '{}' \; \
     -exec rm -- '{}' \;; done

It requires CURL (or WGET) to be installed.

The dataset files should now be located in: /usr/sap/HXE/HDB90/work/sample_census/Census

You can now move to Step 3: Import Using the IMPORT FROM SQL command.

Log on to answer question
Step 4: Import Using the SAP HANA Tools for Eclipse

You can import the data using the SAP HANA Tools for Eclipse Import feature using the following details:

  • File Details:

    • Set the Field Delimiter value to Comma (,).
    • Check the Header row exists box and set the Header row value to 1
    • Check the Import all data box.
  • Target Table:

    • Select the Existing radio, and pick PA_DATA.CENSUS

  • Manage Table Definition and Data Mappings:

    Using the Mapping menu icon, select the One by One option.

The source files should be mapped with the following target tables:

Source File Target Table
Census01.csv PA_DATA.CENSUS
Log on to answer question
Step 5: Import Using the IMPORT FROM SQL command

The dataset files should be located in: /usr/sap/HXE/HDB90/work/sample_census/Census

Connect to the HXE tenant using the ML_USER user credentials using your SQL query tool.

Execute the following SQL statement:

IMPORT FROM CSV FILE '/usr/sap/HXE/HDB90/work/sample_census/Census/Census01.csv' INTO PA_DATA.CENSUS
WITH
   RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   OPTIONALLY ENCLOSED BY '"'
   SKIP FIRST 1 ROW
   FAIL ON INVALID DATA
   ERROR LOG '/usr/sap/HXE/HDB90/work/sample_census/Census/Census01.csv.err'
;
Log on to answer question
Step 6: Verify the imported data

You can verify that the data was imported properly using the following SQL statement:

SELECT 'CENSUS' as TABLE_NAME, count(1) as ROW_COUNT from PA_DATA.CENSUS;
Log on to answer question
Step 7: Validation

Provide an answer to the question below then click on Validate.

How many rows were imported from the file in the CENSUS table?
×

Next Steps

Back to top