
Set Up Jupyter Notebook and Client Library for Document Classification

Requires Customer/Partner License
Set up a local JupyterLab instance as well as the Jupyter Notebook and the client library for Document Classification.
You will learn
  • How to install Python and Anaconda
  • How to set up a local JupyterLab instance
Juliana Morais (Juliana-Morais), November 4, 2021
Created by tobias61, May 18, 2020
Contributors: Juliana-Morais, tobias61

To give you a head start, a client library for the Document Classification service is available. The library already implements all necessary operations, such as creating a dataset and training a model. The client library can be found here.

Additionally, a Jupyter Notebook is used in the following tutorials to make the steps interactive and easy to follow. For more information on Jupyter Notebooks, see Jupyter. The notebook that was created for this tutorial mission is also available in the client library repository.

  • Step 1

    First, you need to install Anaconda. Anaconda is a platform that offers tools to process large datasets and is often used by data scientists. Installing Anaconda also installs the Python programming language.

    If you are on Linux, a script for installing Anaconda is available here. The first three commands install Anaconda; the remaining commands are used in the next step of this tutorial.

    If you are on a different operating system or the script does not work for you, head over to the Anaconda installation guide and look for your operating system. Once there, follow the installation guide.

    To verify the installation, search for Anaconda Prompt in your programs and open it. In the command prompt, enter conda --version. The installed version is printed, as in the image below.

    Check Anaconda Version

    You have successfully installed Anaconda.
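
The same check can also be run from Python, for example when no Anaconda Prompt is available. The following is a minimal sketch, not part of the original tutorial: the helper name tool_version is hypothetical, and it assumes that conda is reachable through the PATH of the environment that runs the code.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the output of `<command> --version`, or None if the command is not on PATH.

    `tool_version` is a hypothetical helper name used only for this sketch.
    """
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


version = tool_version("conda")
print(version if version else "conda was not found on PATH")
```

If the function returns None, the most likely cause is that the terminal or notebook was not started from an Anaconda environment.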

  • Step 2

    You now need to run the remaining commands of the script mentioned in the previous step.

    Open the Anaconda Prompt again and copy the commands from line 7 to line 10 of the script into the command prompt. Anaconda then installs JupyterLab, a user interface for working with Jupyter Notebooks.

    Once the installation is done, enter jupyter lab to start JupyterLab. As soon as JupyterLab is running, your browser automatically opens the corresponding web page. You can also find the URLs in the output of the command prompt, as shown in the image below.

    Start JupyterLab

    You have successfully installed a local instance of JupyterLab and you are now able to work with Jupyter Notebooks.
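
If jupyter lab does not start, one quick way to narrow down the problem is to check whether the JupyterLab package is visible to the active Python environment. The sketch below is an illustration under assumptions: the helper name is_installed is hypothetical, and it assumes the package is importable under the module name jupyterlab.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment.

    `is_installed` is a hypothetical helper; pass top-level module names only.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed("jupyterlab"):
    print("JupyterLab is available in this environment.")
else:
    print("JupyterLab was not found; check that the right environment is active.")
```

Run this with the same Python interpreter that Anaconda installed, since a different interpreter on the system may not see the package.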

  • Step 3

    Now that everything is installed, you can open the Jupyter Notebook that is used in the following tutorials. To do so, you clone the repository that includes the notebook and the example dataset.

    Open JupyterLab in your browser, using the URL from the command prompt from the previous step. Once opened, click the tile under the heading Notebook to open an empty notebook.

    Alternatively, you can click File > New > Notebook to open an empty notebook.

    Open Empty Notebook

    In the notebook, click into the first cell and enter the following command:

    !git clone https://github.com/SAP/business-document-processing.git
    

    Then click Run. This command clones the repository.

    Clone Repository

    Once the command finishes, a folder appears in the file browser on the left. This folder is the cloned repository. This tutorial refers to it as document-classification-client; because git names the folder after the repository in the URL, it may instead appear as business-document-processing when you use the command above.

    Now navigate into the folder document-classification-client > examples using the navigation pane on the left. This folder contains the actual notebook, called train_and_evaluate_custom_model.ipynb. Open the notebook by double-clicking it. The content of the notebook then appears on the right side of the page.

    Open Notebook
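
Running the clone cell a second time fails, because git refuses to clone into an existing, non-empty folder. The following sketch shows one way to make the step repeatable. It is an illustration, not the tutorial's method: the function names default_clone_dir and clone_if_missing are hypothetical, and the code assumes git is installed and on the PATH. It also shows how git derives the folder name from the URL, which is why the folder may be named differently from the name used in this step.

```python
import subprocess
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def default_clone_dir(url):
    """Return the folder name git derives from a clone URL.

    git uses the last path component of the URL and drops a trailing '.git'.
    """
    name = PurePosixPath(urlparse(url).path.rstrip("/")).name
    return name[: -len(".git")] if name.endswith(".git") else name


def clone_if_missing(url, parent="."):
    """Clone `url` below `parent` unless the target folder already exists."""
    target = Path(parent) / default_clone_dir(url)
    if target.exists():
        # Nothing to do: the repository was cloned in an earlier run.
        return target
    subprocess.run(["git", "clone", url, str(target)], check=True)
    return target


REPO_URL = "https://github.com/SAP/business-document-processing.git"
print(default_clone_dir(REPO_URL))
# To clone from a notebook cell: clone_if_missing(REPO_URL)
```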
  • Step 4

    Jupyter Notebooks are interactive environments for writing and running code and are often used in data science to explore datasets.

    A notebook consists of a sequence of cells. Each cell mainly contains text or code, and code cells can also output diagrams and graphics.

    Throughout this tutorial mission, you walk through the notebook, explore the capabilities of the Document Classification service, and get to know the capabilities of Jupyter Notebooks.

    Notebook Structure
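
To make the cell structure concrete: a notebook file (.ipynb) is a JSON document whose cells list holds one entry per cell, each with a cell_type such as markdown or code. The sketch below counts the cell types of a tiny notebook that is made up for this example; it is not the tutorial notebook, and the function name count_cell_types is hypothetical.

```python
import json
from collections import Counter

# A made-up, minimal notebook used only for illustration.
SAMPLE_NOTEBOOK = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["x = 1"],
                "outputs": [],
                "execution_count": None,
            },
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["print(x)"],
                "outputs": [],
                "execution_count": None,
            },
        ],
    }
)


def count_cell_types(notebook_json):
    """Count how many cells of each type a notebook (given as JSON text) contains."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


print(dict(count_cell_types(SAMPLE_NOTEBOOK)))
```

To inspect a real notebook, read the file's text first, for example with pathlib.Path("some_notebook.ipynb").read_text(encoding="utf-8"), and pass it to the function.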
  • Step 5

    Select the correct command that starts your JupyterLab instance:
