Install & Setup notebook (Jupyter/Zeppelin)



With Metatron Discovery, you can analyze various data using ‘Workbook’ and ‘Workbench’.
Additionally, for more advanced analysis, it supports interconnection with third-party notebook applications.

In this post, we will learn how to install the Jupyter and Zeppelin notebook servers.

Jupyter

Install Jupyter through Anaconda. Anaconda is recommended because data analysis requires many Python libraries, and Anaconda bundles most of them.

Anaconda3

  • https://www.anaconda.com/distribution/ (shows the latest version of Anaconda)
  • We need Python 3.x. You can download it there.

$ bash ~/Anaconda3-2018.12-MacOSX-x86_64.sh

After the installation, install R-kernel. (Only Python3-kernel comes with the package)

$ conda install -c r r --yes
$ conda install -c r r-essentials --yes
$ conda install -c r r-httr
$ conda install -c r r-jsonlite

# if you want to install more packages…

$ conda install -c r r-rserve --yes
$ conda install -c r r-devtools --yes
$ conda install -c r r-rcurl --yes
$ conda install -c r r-RJSONIO --yes
$ conda install -c r r-jpeg --yes
$ conda install -c r r-png --yes

# if you want to update to the latest R packages

$ conda update -c r --all

To use the R kernel on Jupyter, install the required native libraries and make sure the symbolic links below exist, then verify the versions (for CentOS). The listing shows each link and its target; these are not commands to run:

/usr/lib64/libpng12.so.0 -> /usr/lib64/libpng12.so.0.50.0
/usr/lib64/libXrender.so.1 -> /usr/lib64/libXrender.so.1.3.0
/usr/lib64/libXext.so.6 -> /usr/lib64/libXext.so.6.4.0
/usr/lib64/libc.so.6 -> libc-2.17.so
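To check that the links above resolve correctly, a small helper like the one below can be used (a sketch; `check_links` and the library list are illustrative, not part of Metatron):

```shell
# check_links: print each library link's resolved target, or flag it as
# missing so you know which native package still needs installing.
check_links() {
  dir="$1"; shift
  for lib in "$@"; do
    if [ -e "$dir/$lib" ]; then
      printf '%s -> %s\n' "$lib" "$(readlink -f "$dir/$lib")"
    else
      printf 'missing: %s\n' "$lib"
    fi
  done
}

# check the libraries listed above
check_links /usr/lib64 libpng12.so.0 libXrender.so.1 libXext.so.6 libc.so.6
```

A `missing:` line means the corresponding native package (e.g. libpng, libXrender) still has to be installed with your system package manager.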

In addition, if you’d like to install deep learning libraries or sparklyr, run the following commands (for CentOS):

$ conda install -c conda-forge tensorflow
$ conda install -c conda-forge keras
$ conda install -c r r-sparklyr

And to use matplotlib on Jupyter, install the native libraries and make sure the symbolic links below exist, then verify the versions (for CentOS). Again, the listing shows each link and its target:

/usr/lib64/libGL.so.1 -> /usr/lib64/libGL.so.1.2.0
/usr/lib64/libxshmfence.so.1 -> /usr/lib64/libxshmfence.so.1.0.0
/usr/lib64/libglapi.so.0 -> /usr/lib64/libglapi.so.0.0.0
/usr/lib64/libXdamage.so.1 -> /usr/lib64/libXdamage.so.1.1.0
/usr/lib64/libXfixes.so.3 -> /usr/lib64/libXfixes.so.3.1.0
/usr/lib64/libXxf86vm.so.1 -> /usr/lib64/libXxf86vm.so.1.0.0

Generate-config

Generate a Jupyter config file (also used for configuring pgcontents).

$ jupyter notebook --generate-config
$ vi /home/metatron/.jupyter/jupyter_notebook_config.py

Open the config file and add the lines below (this is a Python file, so comments use #).

# common config
c.NotebookApp.notebook_dir = '/user/Metatron/jupyter'

# It is assumed that the notebook server connected with Discovery
# does not use authentication.
c.NotebookApp.allow_origin = '*'
c.NotebookApp.disable_check_xsrf = True
c.NotebookApp.token = ''

# listen on all interfaces, not only localhost
c.NotebookApp.ip = '0.0.0.0'
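If you prefer to script this step, the same settings can be appended idempotently (a sketch; the grep guard simply checks whether the block was already added, and CONFIG defaults to the file generated above):

```shell
# Append the Discovery settings to the generated config, but only once.
CONFIG="${CONFIG:-$HOME/.jupyter/jupyter_notebook_config.py}"
mkdir -p "$(dirname "$CONFIG")"
if ! grep -q 'allow_origin' "$CONFIG" 2>/dev/null; then
  cat >> "$CONFIG" <<'EOF'
# common config
c.NotebookApp.notebook_dir = '/user/Metatron/jupyter'
# the Discovery-connected server runs without authentication
c.NotebookApp.allow_origin = '*'
c.NotebookApp.disable_check_xsrf = True
c.NotebookApp.token = ''
# listen on all interfaces, not only localhost
c.NotebookApp.ip = '0.0.0.0'
EOF
fi
```

Running it a second time leaves the config unchanged, so it is safe to include in a provisioning script.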

Custom Packages

pymetis

A utility package for the Python kernel used by Metatron on Jupyter.

$ git clone https://github.com/metatron-app/discovery-jupyter-py-utils.git
$ cd discovery-jupyter-py-utils/

$ python setup.py sdist
$ pip uninstall pymetis

$ cp dist/pymetis-0.0.3.tar.gz {ANACONDA_HOME}/pkgs/
$ pip install {ANACONDA_HOME}/pkgs/pymetis-x.x.x.tar.gz   # current ver. 0.0.3
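Since the version number in the tarball name changes between releases, a small guard like this avoids installing a stale or missing file (a sketch; the pip install line is left commented so you can review the matched file first):

```shell
# Pick up whatever sdist the build produced and install that exact file.
PKG=$(ls dist/pymetis-*.tar.gz 2>/dev/null | head -n 1)
if [ -n "$PKG" ]; then
  echo "installing $PKG"
  # pip install "$PKG"
else
  echo "no sdist found; run: python setup.py sdist" >&2
fi
```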

RMetis

A utility package for the R kernel used by Metatron on Jupyter.

$ git clone https://github.com/metatron-app/discovery-jupyter-r-utils
$ cd discovery-jupyter-r-utils
$ R CMD build .   # the source directory; a relative or absolute path also works, e.g. /home/metatron/discovery-jupyter-r-utils

$ cp RMetis_0.0.3.tar.gz ${ANACONDA_HOME}/pkgs/
$ R CMD INSTALL --no-multiarch ${ANACONDA_HOME}/pkgs/RMetis_x.x.x.tar.gz   # current ver. 0.0.3

Run

When all the above configurations are done, start the Jupyter process with the commands below. After that, connect to http://localhost:8888 and check if everything works fine.

If needed, you can change the port in ~/.jupyter/jupyter_notebook_config.py.

$ mkdir {ANACONDA_HOME}/logs
$ nohup jupyter notebook >> {ANACONDA_HOME}/logs/jupyter.log 2>&1 &
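Because Jupyter takes a moment to start under nohup, a polling check is handy before pointing Discovery at it (a sketch; `wait_for_port` is an illustrative helper, and 8888 is the default notebook port):

```shell
# wait_for_port: poll an HTTP endpoint until it answers or the retry
# budget runs out. Returns 0 once the port responds, 1 on timeout.
wait_for_port() {
  host="$1"; port="$2"; tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "http://$host:$port"; then
      echo "up: $host:$port"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $host:$port"
  return 1
}
```

Usage: run `wait_for_port localhost 8888` right after the nohup command above; it prints `up: localhost:8888` once the server answers.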

Set a Spark directory

To execute the scripts created with Jupyter as an API, you need to install Spark on the same server as Metatron. (It runs as the Spark driver node.)

After installation, point Metatron at the Spark directory via the spark.home.dir property in conf/metatron-env.sh:

$ vi conf/metatron-env.sh

export METATRON_JAVA_OPTS="-Dspark.home.dir={SPARK_HOME}"
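A quick way to verify the path you set actually contains a Spark install (a sketch; the SPARK_HOME value is the example path used elsewhere in this post):

```shell
# spark.home.dir must point at a directory with bin/spark-submit in it.
SPARK_HOME=/home/metatron/servers/spark-2.2.0-bin-hadoop2.7
if [ -x "$SPARK_HOME/bin/spark-submit" ]; then
  echo "spark found at $SPARK_HOME"
else
  echo "no spark-submit under $SPARK_HOME" >&2
fi
```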

 

 


Zeppelin

Download and extract the installer from the link below.

Install

Download the binary package from the Zeppelin home page (http://zeppelin.apache.org/download.html) and extract it. (You can follow the install guide on the Zeppelin site.)

Custom Packages

Discovery-interpreter

A utility package for the Spark interpreter used by Metatron on Zeppelin.

$ git clone https://github.com/metatron-app/discovery-zeppelin-interpreter.git
$ mvn clean package -P prod -P spark-2.2 -DskipTests   # use "-Dspark.version=${spark_version}" instead of "-P spark-2.2" for other Spark versions
$ cp target/discovery-zeppelin-interpreter-{spark.version}-1.0.0.jar {ZEPPELIN_HOME}/lib/interpreter
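To confirm the interpreter jar landed where Zeppelin loads interpreters from (a sketch; substitute your actual ZEPPELIN_HOME for the example path):

```shell
# The jar name embeds the Spark version it was built against.
ZEPPELIN_HOME=/home/metatron/servers/zeppelin   # example path
ls "$ZEPPELIN_HOME"/lib/interpreter/discovery-zeppelin-interpreter-*.jar 2>/dev/null \
  || echo "interpreter jar not found; re-run the cp step above"
```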

Run

When all the above configurations are done, start the Zeppelin process with the command below. After that, connect to http://localhost:8080 and check if everything works fine.

If needed, you can change the port in conf/zeppelin-site.xml.

$ {ZEPPELIN_HOME}/bin/zeppelin-daemon.sh start

(optional) run in yarn-client mode

If you want to run the Zeppelin Spark interpreter's master in yarn-client mode, you need to set up the Zeppelin/Spark/Hadoop configuration.

from https://zeppelin.apache.org/docs/0.7.3/install/yarn_install.html

$ vi {ZEPPELIN_HOME}/conf/zeppelin-env.sh

export MASTER=yarn-client
export SPARK_HOME=/home/metatron/servers/spark-2.2.0-bin-hadoop2.7
export HADOOP_CONF_DIR=/home/metatron/servers/hadoop-2.7.2/etc/hadoop

(optional) run with R interpreter

from https://zeppelin.apache.org/docs/0.7.3/interpreter/r.html

To run Zeppelin with the R Interpreter, the SPARK_HOME environment variable must be set. The best way to do this is by editing conf/zeppelin-env.sh. If it is not set, the R Interpreter will not be able to interface with Spark. You should also copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml. That will ensure that Zeppelin sees the R Interpreter the first time it starts up.
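The two prerequisites described above can be scripted as follows (a sketch, run from your ZEPPELIN_HOME; the SPARK_HOME path is the example used earlier in this post, and both steps are guarded so re-running them is harmless):

```shell
# 1) make sure SPARK_HOME is exported in zeppelin-env.sh (only add it once)
mkdir -p conf
grep -q '^export SPARK_HOME=' conf/zeppelin-env.sh 2>/dev/null \
  || echo 'export SPARK_HOME=/home/metatron/servers/spark-2.2.0-bin-hadoop2.7' >> conf/zeppelin-env.sh

# 2) materialize zeppelin-site.xml so Zeppelin registers the R interpreter
#    on first start
if [ -f conf/zeppelin-site.xml.template ] && [ ! -f conf/zeppelin-site.xml ]; then
  cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
fi
```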


One Response

  1. David Wang says:

    Hi, I’m having some trouble setting this up. I have Jupyter working on my Linux server, but I’m getting the error: Fail to create notebook of jupyter from http://192.168.0.152:8888/?token=71374f5ef5a965c4316bada0bc66472de7f7c8542f1c9fa1
    when I try creating a notebook from Metatron. Did I configure my metatron-env.sh file correctly?

    export JAVA_HOME="/usr/lib/jvm/jre-openjdk"
    export METATRON_JAVA_OPTS="-Dspark.home.dir=/home/rlovelace/spark-3.0.1-bin-hadoop2.7" # Additional jvm options. for example, "-Dprop1=val1 -Dprop2=value2"
    export METATRON_MEM # metatron jvm mem options Default "-Xms2048m -Xmx2048m"

    export METATRON_ENV_MODE # ENV mode, profile mode, default local
    export METATRON_EXTRA_PROFILE # Extra profile, comma separated
    export METATRON_LOG_DIR # Where log files are stored. PWD by default.
    export METATRON_PID_DIR # The pid files are stored. ${METATRON_HOME}/run by default.
    export METATRON_IDENT_STRING # A string representing this instance of metatron. $USER by default.
    export METATRON_NICENESS # The scheduling priority for daemons. Defaults to 0.
    export $METATRON_CLASSPATH_OVERRIDES # additional classpath

    export METATRON_DB_TYPE # h2 or mysql. h2 by default
    export METATRON_H2_DATA_DIR # h2 db data directory
