This article is intended as an introduction to GeoMesa. It walks you through all the installation steps for a local dev environment and shows you how to run a few basic examples. The examples (which will be introduced in Part 3) are all borrowed from the GeoMesa help site (copying quite a lot of their content) - while this site provides a lot of good info, it doesn’t lend itself that well to a consistent practical introduction.
Here are a few useful links:
- GeoMesa Website
- GeoMesa FAQ
- Project Overview
- Maven Repository
- GitHub (see there for SBT dependencies)
- GeoMesa: Scaling up Geospatial Analysis
- GeoMesa Spark: Basic Analysis
- GeoMesa Spark
- Using the GeoMesa Accumulo Spark Runtime
- geomesa-accumulo-compute example: Source data from Spark and analyse it
- Command Line Tools
GeoMesa is an add-on to one of the following data stores:
- Accumulo
- Kafka
- HBase
- Bigtable
- Cassandra
Installation
The instructions below are for a local dev environment only!
Hadoop
If you haven’t got Hadoop installed yet, follow my instructions outlined in Setting up a Hadoop Dev Environment for Pentaho Data Integration - just the first part, there is no need to install Pentaho Data Integration.
Zookeeper
To install Zookeeper, follow the script below and adjust where required:
# adjust working directory if required
export WORK_DIR=~/apps
# ########### ZOOKEEPER #############
# https://zookeeper.apache.org/releases.html
# requires:
cd $WORK_DIR
export VERSION=3.4.10
wget http://apache.mirrors.nublue.co.uk/zookeeper/zookeeper-$VERSION/zookeeper-$VERSION.tar.gz
tar -xzvf zookeeper-$VERSION.tar.gz
echo " " >> ~/.bashrc
echo "# ======== ZOOKEEPER ======== #" >> ~/.bashrc
echo "export ZOOKEEPER_HOME=$WORK_DIR/zookeeper-$VERSION" >> ~/.bashrc
echo 'export PATH=$PATH:$ZOOKEEPER_HOME/bin' >> ~/.bashrc
source ~/.bashrc
cd zookeeper-$VERSION
cp conf/zoo_sample.cfg conf/zoo.cfg
# define permanent data dir
mkdir -p ~/zookeeper-data
# point dataDir at the directory created above; perl expands $ENV{HOME} to your home directory
perl -0777 -i.original -pe 's@/tmp/zookeeper@$ENV{HOME}/zookeeper-data@igs' conf/zoo.cfg
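Since a dataDir left under /tmp is the cause of the Accumulo initialisation problem covered in the Troubleshooting section, it can be worth checking the result of the substitution. The following is only a minimal sketch: the helper names zk_data_dir and zk_data_dir_is_persistent are made up for this article and are not part of ZooKeeper.

```shell
# Hypothetical helper: print the last active (non-commented) dataDir entry of a zoo.cfg file.
zk_data_dir() {
  sed -n 's/^[[:space:]]*dataDir=//p' "$1" | tail -n 1
}

# Hypothetical helper: succeed only if dataDir is set and does not live under /tmp.
zk_data_dir_is_persistent() {
  dir=$(zk_data_dir "$1")
  case "$dir" in
    '' | /tmp | /tmp/*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage, assuming ZOOKEEPER_HOME is exported as in the script above.
cfg="${ZOOKEEPER_HOME:-.}/conf/zoo.cfg"
if [ -f "$cfg" ]; then
  if zk_data_dir_is_persistent "$cfg"; then
    echo "dataDir looks fine: $(zk_data_dir "$cfg")"
  else
    echo "WARNING: dataDir in $cfg is unset or under /tmp"
  fi
fi
```

The check only looks at the config file, so it does not tell you whether the directory exists or is writable.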
Now start the service:
sh ./bin/zkServer.sh start
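To confirm that ZooKeeper actually came up, you can send it the ruok four-letter command, which a healthy server answers with imok. This is a rough sketch under a few assumptions: nc (netcat) is installed, ZooKeeper listens on the default clientPort 2181 from zoo_sample.cfg, and the function name zk_is_ok is just something I made up here.

```shell
# Hypothetical helper: ask ZooKeeper "are you ok?" and expect "imok" back.
# Arguments (both optional): host, defaulting to localhost, and port, defaulting to 2181.
zk_is_ok() {
  host=${1:-localhost}
  port=${2:-2181}
  # An unreachable server simply yields an empty reply.
  reply=$(echo ruok | nc "$host" "$port" 2>/dev/null) || reply=""
  [ "$reply" = "imok" ]
}

# Usage:
#   if zk_is_ok; then echo "ZooKeeper is up"; else echo "ZooKeeper is not responding"; fi
```

Alternatively, ./bin/zkServer.sh status reports whether the server is running.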
Accumulo
Important: GeoMesa version 1.3.1 is only compatible with Accumulo version 1.7.x.
A note about Accumulo 1.8: “GeoMesa supports Accumulo 1.8 when built with the accumulo-1.8 profile. Accumulo 1.8 introduced a dependency on libthrift version 0.9.3 which is not compatible with Accumulo 1.7/libthrift 0.9.1. The default supported version for GeoMesa is Accumulo 1.7.x and the published jars and distribution artifacts reflect this version. To upgrade, build locally using the accumulo-1.8 profile.”
Again, follow the script below and adjust where required:
# ########### ACCUMULO #############
# https://accumulo.apache.org/downloads/
# requires: hdfs, zookeeper
cd $WORK_DIR
# export VERSION=1.8.1
export VERSION=1.7.3
wget http://www.mirrorservice.org/sites/ftp.apache.org/accumulo/$VERSION/accumulo-$VERSION-bin.tar.gz
tar -xzvf accumulo-$VERSION-bin.tar.gz
echo " " >> ~/.bashrc
echo "# ======== ACCUMULO ======== #" >> ~/.bashrc
echo "export ACCUMULO_HOME=$WORK_DIR/accumulo-$VERSION" >> ~/.bashrc
echo 'export PATH=$PATH:$ACCUMULO_HOME/bin' >> ~/.bashrc
source ~/.bashrc
cd accumulo-$VERSION
./bin/build_native_library.sh
cp conf/examples/1GB/standalone/* conf/
# by default, Accumulo's HTTP monitor binds only to the local network interface.
# to be able to access it over the Internet, you have to set the value
# of ACCUMULO_MONITOR_BIND_ALL to true.
sed 's/# export ACCUMULO_MONITOR_BIND_ALL/export ACCUMULO_MONITOR_BIND_ALL/' < conf/accumulo-env.sh > conf/accumulo-env.test.sh
rm conf/accumulo-env.sh
mv conf/accumulo-env.test.sh conf/accumulo-env.sh
chmod 700 conf/accumulo-env.sh
# instance.volumes: where to store the data on HDFS
perl -0777 -i.original -pe 's@<name>instance.volumes</name>\n\s+<value></value>@<name>instance.volumes</name>\n<value>hdfs://localhost:8020/accumulo</value>@igs' conf/accumulo-site.xml
# change instance.secret
perl -0777 -i.original -pe 's@<value>DEFAULT</value>@<value>password</value>@igs' conf/accumulo-site.xml
To avoid the following warning message, change hdfs-site.xml:
WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
Adjust your Hadoop config accordingly, i.e. set dfs.datanode.synconclose to true.
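The warning goes away once the dfs.datanode.synconclose property is set to true in hdfs-site.xml (using the same property/name/value form as the other entries in that file) and HDFS has been restarted. The sketch below reads the value back through the hdfs CLI; the function name check_synconclose is my own invention, and I am assuming that the hdfs command on your PATH reads the same config directory as your local DataNode.

```shell
# Hypothetical helper: print the configured value of dfs.datanode.synconclose
# and succeed only if it is "true".
check_synconclose() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs not found on PATH" >&2
    return 1
  fi
  # "hdfs getconf -confKey <key>" prints the value of a single config key.
  value=$(hdfs getconf -confKey dfs.datanode.synconclose) || value=""
  echo "dfs.datanode.synconclose=$value"
  [ "$value" = "true" ]
}

# Usage:
#   check_synconclose || echo "adjust hdfs-site.xml and restart HDFS"
```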
Before running Accumulo for the first time, it has to be initialised. This is a manual process. Make sure Hadoop is running!
./bin/accumulo init
Name the instance something like BISSOL_CONSULTING and provide password as the password.
Once the command completes, you can start Accumulo:
./bin/start-all.sh
The Web UI is available on http://localhost:9995/
Note: Whenever you put your machine/laptop to sleep, Accumulo will stop. So once you are back, you have to start it up again.
Troubleshooting
Waiting for Accumulo to be initialized
If you get this error message, check Zookeeper:
$ZOOKEEPER_HOME/bin/zkCli.sh
ls /accumulo
# should not be empty
ls /accumulo/instances
# should show instance names
If you get:
ls /accumulo
Node does not exist: /accumulo
“Well that’s your problem! Your ZooKeeper is empty. Accumulo uses the information it places in ZooKeeper to bootstrap and find the data in HDFS.” Source
Did you configure the ZooKeeper dataDir to be something other than /tmp? Change the dataDir property in $ZOOKEEPER_HOME/conf/zoo.cfg to something like this:
dataDir=/var/lib/zookeeper/
Now stop Accumulo, restart ZooKeeper, remove the existing HDFS /accumulo directory, run accumulo init and then start Accumulo again.
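The recovery steps above can be sketched as a small script. Treat it as an illustration only: the names DRY_RUN, run and reset_accumulo are made up for this article, and the paths assume the ZOOKEEPER_HOME and ACCUMULO_HOME variables exported during installation. Because removing /accumulo from HDFS deletes all Accumulo data, the sketch only prints the commands unless you explicitly set DRY_RUN=0.

```shell
# DRY_RUN is a name chosen for this sketch; it defaults to 1 (print only).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: either print a command or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Hypothetical helper: the five recovery steps, in order.
reset_accumulo() {
  run "$ACCUMULO_HOME/bin/stop-all.sh"           # stop Accumulo
  run "$ZOOKEEPER_HOME/bin/zkServer.sh" restart  # restart ZooKeeper with the new dataDir
  run hdfs dfs -rm -r /accumulo                  # remove the stale HDFS directory (destructive!)
  run "$ACCUMULO_HOME/bin/accumulo" init         # interactive: asks for instance name and password
  run "$ACCUMULO_HOME/bin/start-all.sh"          # start Accumulo again
}

# Usage:
#   reset_accumulo              # prints the commands
#   DRY_RUN=0; reset_accumulo   # actually executes them
```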
Changing Root User’s Password
If you ever plan to change the root user’s password, you must not use accumulo init --reset-security for this purpose, as it deletes all the users when you run it, in addition to changing the Accumulo root password. You can change a single user’s password in the Accumulo shell with the passwd command.
accumulo shell -u root
passwd
GeoMesa
Standard Install
Follow the script and adjust where required:
# ########### GEOMESA ACCUMULO ADD-ON #############
# http://www.geomesa.org/#downloads
# http://www.geomesa.org/documentation/user/accumulo/install.html
cd $WORK_DIR
# download and unpackage the most recent distribution
export VERSION=1.3.1
wget http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-accumulo-dist_2.11/$VERSION/geomesa-accumulo-dist_2.11-$VERSION-bin.tar.gz
tar xvf geomesa-accumulo-dist_2.11-$VERSION-bin.tar.gz
cd geomesa-accumulo_2.11-$VERSION
echo " " >> ~/.bashrc
echo "# ======== GEOMESA ======== #" >> ~/.bashrc
echo "export GEOMESA_ACCUMULO_HOME=$WORK_DIR/geomesa-accumulo_2.11-$VERSION" >> ~/.bashrc
echo 'export PATH=$PATH:$GEOMESA_ACCUMULO_HOME/bin' >> ~/.bashrc
source ~/.bashrc
Installing the distributed runtime library
There are two options:
- copy jars to Accumulo tablet servers in the cluster (old approach, do not use)
- use Accumulo namespace
There are two runtime JARs available, with and without raster support. Only one is needed and including both will cause classpath issues.
As of Accumulo 1.6 we can use namespaces to isolate the GeoMesa classpath from the rest of Accumulo.
There is a utility script available to do this:
./bin/setup-namespace.sh -u root -n myNamespace
However, this didn’t work correctly last time I tried it, so we can use the manual approach below instead:
# alternatively you can use this
accumulo shell -u root -p password
> createnamespace myNamespace
> grant NameSpace.CREATE_TABLE -ns myNamespace -u root
> config -s general.vfs.context.classpath.myNamespace=hdfs://localhost:8020/accumulo/classpath/myNamespace/[^.].*.jar
> config -ns myNamespace -s table.classpath.context=myNamespace
> exit
Then copy the distributed runtime jar into HDFS under the path you specified:
hdfs dfs -mkdir -p /accumulo/classpath/myNamespace
hdfs dfs -copyFromLocal dist/accumulo/geomesa-accumulo-distributed-runtime_2.11-$VERSION.jar /accumulo/classpath/myNamespace
Note: When connecting to a data store using Accumulo namespaces, you must prefix the tableName parameter with the namespace. For example, refer to the my_catalog table as myNamespace.my_catalog.
Prepare command line tools
You can configure environment variables and classpath settings in geomesa-accumulo_2.11-$VERSION/bin/geomesa-env.sh.
This is not required in our case as we have all the correct env variables set already.
Run this script:
./bin/geomesa configure
Due to licensing restrictions, the dependencies for shapefile support and raster ingest must be installed separately. Run these interactive scripts:
./bin/install-jai.sh
./bin/install-jline.sh
Test the command that invokes the GeoMesa Tools:
geomesa
For more details, see Command Line Tools
Build from Source
Note: Latest builds are also available directly from the Locationtech Artifactory
As an alternative you can build GeoMesa from source.
Just a brief description here:
$ git clone https://github.com/locationtech/geomesa.git
$ cd geomesa
$ git checkout -b geomesa-1.3.1 geomesa_2.11-1.3.1
$ mvn clean install -DskipTests=true
# OR TO COMPILE THE LATEST VERSION FOR ACCUMULO 1.8.1
$ git checkout master
$ mvn clean install -Paccumulo-1.8 -DskipTests=true
In case you are wondering what maven install does: “This command tells Maven to build all the modules, and to install it in the local repository. The local repository is created in your home directory (or alternative location that you created it), and is the location that all downloaded binaries and the projects you built are stored. That’s it! If you look in the target subdirectory, you should find the build output and the final library or application that was being built. Note: Some projects have multiple modules, so the library or application you are looking for may be in a module subdirectory.” Source
Note: As Jim from the GeoMesa mailing list pointed out: “you’ll need to make sure that sbt picks up the artifacts which you have built locally”.
If you need the very latest, you can also build off the master branch. Then you can, for example, just copy the tar.gz file from the geomesa-accumulo-dist target directory and extract it in a convenient directory:
cp geomesa/geomesa-accumulo/geomesa-accumulo-dist/target/geomesa-accumulo_2.11-1.3.2-SNAPSHOT-bin.tar.gz .
tar -xzvf geomesa-accumulo_2.11-1.3.2-SNAPSHOT-bin.tar.gz
Spark
Follow the install script and adjust where required:
# ########### SPARK #############
#
cd $WORK_DIR
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
rm -rf spark-2.1.0-bin-hadoop2.7.tgz
echo " " >> ~/.bashrc
echo "# ======== SPARK ======== #" >> ~/.bashrc
echo "export SPARK_HOME=$WORK_DIR/spark-2.1.0-bin-hadoop2.7" >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
source ~/.bashrc
echo "--- Finished installing SPARK ---"
GeoServer
The below commands will help you to install GeoServer:
# ########### GEOSERVER #############
# http://geoserver.org/download/
# choose platform independent version
# http://docs.geoserver.org/stable/en/user/installation/linux.html
# http://docs.geoserver.org/stable/en/user/index.html
cd $WORK_DIR
export VERSION=2.9.1
wget https://freefr.dl.sourceforge.net/project/geoserver/GeoServer/$VERSION/geoserver-$VERSION-bin.zip
unzip geoserver-$VERSION-bin.zip
echo " " >> ~/.bashrc
echo "# ======== GEOSERVER ======== #" >> ~/.bashrc
echo "export GEOSERVER_HOME=$WORK_DIR/geoserver-$VERSION" >> ~/.bashrc
echo 'export PATH=$PATH:$GEOSERVER_HOME/bin' >> ~/.bashrc
source ~/.bashrc
# change port
cd $GEOSERVER_HOME
perl -0777 -i.original -pe 's@jetty.port=8080@jetty.port=8077@igs' start.ini
Next we have to install the WPS extension. On the downloads page, click on the Archived tab to find the plugin for version 2.9.1, then click on WPS (and not WPS Hazelcast) in the Extension > Services section.
# install WPS extension
cd webapps/geoserver/WEB-INF/lib/
wget https://netix.dl.sourceforge.net/project/geoserver/GeoServer/$VERSION/extensions/geoserver-$VERSION-wps-plugin.zip
unzip geoserver-$VERSION-wps-plugin.zip
rm geoserver-$VERSION-wps-plugin.zip
cd $GEOSERVER_HOME
Next let’s add the GeoMesa Accumulo dependencies as described in Installing GeoMesa Accumulo in GeoServer:
# install GeoMesa’s Accumulo data store as a GeoServer plugin
# interactive dialog. option to install it fully automatically also available,
# see docu: http://www.geomesa.org/documentation/user/accumulo/install.html#installing-geomesa-accumulo-in-geoserver
cd $GEOMESA_ACCUMULO_HOME
sh ./bin/manage-geoserver-plugins.sh --install
# install Accumulo, Zookeeper, Hadoop, and Thrift dependencies
sh ./bin/install-hadoop-accumulo.sh $GEOSERVER_HOME/webapps/geoserver/WEB-INF/lib/
cd $GEOSERVER_HOME
# start service
nohup sh ./bin/startup.sh &
The Web UI will be available on:
http://localhost:8077/geoserver/web/
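GeoServer takes a little while to start, so the page may not respond immediately. Here is a small sketch that polls the URL until it answers. The function name wait_for_http, the default of 30 attempts and the 2 second pause are arbitrary choices of mine, and curl is assumed to be installed.

```shell
# Hypothetical helper: poll a URL until it returns a successful HTTP response.
# Arguments: the URL, and optionally the maximum number of attempts (default 30).
wait_for_http() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard the body
    if curl -s -f -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_http http://localhost:8077/geoserver/web/ && echo "GeoServer is up"
```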
Check on the Welcome page that WPS is listed under Service Capabilities.
The default username and password are admin and geoserver.
Further reading (but not required for this setup):
Jupyter
The official guide is Deploying GeoMesa Spark with Jupyter Notebook.
The steps might vary depending on your OS. See that documentation for more details.
Since my system had all the essential dev tools installed, the setup was as simple as this:
# ########### JUPYTER #############
# http://jupyter.org/install.html
export GEOMESA_ACCUMULO_VERSION=1.3.1
cd $WORK_DIR
# ensure that you have the latest pip
pip3 install --upgrade pip
# then install the Jupyter Notebook using:
sudo pip3 install jupyter
Next we have to install the Spark kernel Toree. In this case I did not follow the docu but the Toree README, as this seemed to be more up to date:
sudo pip3 install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
Important: DO NOT install Toree with Jupyter yet, as we have to add the GeoMesa dependencies first.
Next we have to build the specific GeoMesa Jupyter visualisation plugins for Leaflet and Vegas from source:
git clone https://github.com/locationtech/geomesa.git
cd geomesa
git checkout -b geomesa-$GEOMESA_ACCUMULO_VERSION geomesa_2.11-$GEOMESA_ACCUMULO_VERSION
mvn clean install -Pvegas -pl geomesa-jupyter/geomesa-jupyter-vegas
Note: Vegas is built with the vegas profile.
Next let’s install all these GeoMesa dependencies with Jupyter. I am now following the official instructions again.
Three files are referenced:
- geomesa dependency
- geomesa-vegas dependency
- geomesa-leaflet dependency
# requires following vars to be set
# GEOMESA_ACCUMULO_HOME
# GEOMESA_ACCUMULO_VERSION
# SPARK_HOME
# WORK_DIR
export jars="file://$GEOMESA_ACCUMULO_HOME/dist/spark/geomesa-accumulo-spark-runtime_2.11-$GEOMESA_ACCUMULO_VERSION.jar,file://$WORK_DIR/geomesa/geomesa-jupyter/geomesa-jupyter-vegas/target/original-geomesa-jupyter-vegas_2.11-$GEOMESA_ACCUMULO_VERSION.jar,file://$WORK_DIR/geomesa/geomesa-jupyter/geomesa-jupyter-leaflet/target/geomesa-jupyter-leaflet_2.11-$GEOMESA_ACCUMULO_VERSION.jar"
jupyter toree install \
--replace \
--user \
--kernel_name "GeoMesa Spark $GEOMESA_ACCUMULO_VERSION" \
--spark_home=${SPARK_HOME} \
--spark_opts="--master yarn --jars $jars"
Note: This is the very basic setup. Read the online doc if you require support for Shapefiles, Converters or GeoTools RDD Provider.
Finally, let’s start the Jupyter notebook server:
jupyter notebook
This will automatically open the web page on the following URL:
http://localhost:8888
Create a new notebook via the Jupyter web page.
Important: Just after you have created the new notebook, watch the Jupyter logs to make sure that all the dependencies are correctly picked up. You might see some errors for wrong file references etc.
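Wrong file references can also be caught before starting the kernel by checking that every entry of the comma-separated jars list points to an existing file. This is a minimal sketch: missing_jars is a made-up helper name, and it assumes all entries use the file:// prefix as in the export above and contain no wildcard characters.

```shell
# Hypothetical helper: list every entry of a comma-separated file:// list
# whose target does not exist. Returns non-zero if anything is missing.
# The function body runs in a subshell so the IFS change does not leak out.
missing_jars() (
  IFS=,
  rc=0
  for uri in $1; do
    path=${uri#file://}   # strip the file:// prefix
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      rc=1
    fi
  done
  exit "$rc"
)

# Usage, with the jars variable exported earlier:
#   missing_jars "$jars" && echo "all jar files found"
```

If a jar is reported as missing, check that the corresponding Maven module has been built and that GEOMESA_ACCUMULO_VERSION matches the version in the file name.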
Errors
Spark context stopped while waiting for backend
ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
This is an issue with Java 8 and YARN. Use the workaround described here: Add the following to yarn-site.xml within your Hadoop folder:
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
We have everything set up now - stay tuned for Part 2 where we will look at GeoMesa and Accumulo Basics.