Concepts (AEN 4.2.1)#
System overview¶
The Anaconda Enterprise Notebooks platform consists of 3 main service groups: AEN server, AEN gateway and AEN compute, which are called “nodes”:
- Server node: The administrative front end to the system, where users log in, user accounts are stored, and administrators manage the system.
- Gateway node(s): A reverse proxy that authenticates users and directs them to the proper compute node for their project. After installation, users do not interact with this node directly because it routes them automatically.
- Compute nodes: Where projects are stored and run.
These services can be run on a single machine or distributed across multiple servers.
Organizationally, each AEN installation has exactly 1 server instance and 1 or more gateway instances. Each compute node can only be connected to a single gateway. The collection of compute nodes served by a single gateway is called a data center. You can add data centers to the AEN installation at any time.
EXAMPLE: An AEN deployment with 2 data centers, where the first gateway serves a cluster of 20 physical computers and the second gateway serves 30 virtual machines, must have the following services installed and running:
- 1 AEN server instance
- 2 AEN gateway instances
- 50 AEN compute instances (20 + 30)
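The counting rule in this example can be sketched in a few lines of Python. This is only an illustration of the topology described above, not part of AEN; the `DataCenter` class, the `required_instances` function, and the host names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """One gateway plus the compute nodes it serves (hypothetical model)."""
    gateway: str
    compute_nodes: list[str] = field(default_factory=list)


def required_instances(data_centers: list[DataCenter]) -> dict[str, int]:
    """Count the AEN service instances a deployment needs."""
    all_nodes = [node for dc in data_centers for node in dc.compute_nodes]
    # Each compute node can be connected to only a single gateway.
    if len(all_nodes) != len(set(all_nodes)):
        raise ValueError("a compute node is attached to more than one gateway")
    return {
        "server": 1,                   # exactly one server per installation
        "gateway": len(data_centers),  # one gateway per data center
        "compute": len(all_nodes),     # one compute instance per node
    }


# The deployment from the example: 20 physical machines and 30 virtual machines.
physical = DataCenter("gw-1", [f"phys-{i}" for i in range(20)])
virtual = DataCenter("gw-2", [f"vm-{i}" for i in range(30)])
counts = required_instances([physical, virtual])
print(counts)  # {'server': 1, 'gateway': 2, 'compute': 50}
```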
Nodes must be configured and maintained separately.
Server node¶
The server node handles login, user accounts, administration, and project creation and management, and it interfaces with the database. It is the main entry point to AEN for all users. The server node handles project setup and ensures that users are sent to the correct project data center.
Since AEN is web-based, it uses the standard HTTP port 80 or HTTPS port 443 on the server.
AEN uses MongoDB for its internal data persistence. MongoDB typically runs on the same host as the server but can also be installed on a separate host.
Server nodes use NGINX to handle the user-facing AEN web interface. NGINX acts as a request proxy for the actual server web process, which runs on a high-numbered port that listens only on localhost. NGINX also serves static content.
The server is installed in the /opt/wakari/wakari-server directory.
Server processes¶
When you view the status of server processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manages wakari-worker and multiple wk-server processes. |
user | wakari |
configuration | /opt/wakari/wakari-server/etc/supervisord.conf |
log | /opt/wakari/wakari-server/var/log/supervisord.log |
control | service wakari-server |
ports | none |
wk-server | details |
---|---|
description | Handles user interaction and passes jobs on to the wakari gateway. Access to it is managed by NGINX. |
user | wakari |
command | /opt/wakari/wakari-server/bin/wk-server |
configuration | /opt/wakari/wakari-server/etc/wakari/ |
control | service wakari-server |
logs | /opt/wakari/wakari-server/var/log/wakari/server.log |
ports | Not used in versions after 4.1.2 * |
* AEN 4.1.2 and earlier use port 5000, on localhost only. Later versions of AEN use a Unix socket instead. The Unix socket path is unix:/opt/wakari/wakari-server/var/run/wakari-server.sock
wakari-worker | details |
---|---|
description | Asynchronously executes tasks from wk-server. |
user | wakari |
logs | /opt/wakari/wakari-server/var/log/wakari/worker.log |
control | service wakari-server |
nginx | details |
---|---|
description | Serves static files and acts as a proxy for all other requests, which it passes to the wk-server process. * |
user | nginx |
configuration | /etc/nginx/nginx.conf, /opt/wakari/wakari-server/etc/conf.d/www.enterprise.conf |
logs | /var/log/nginx/woc.log, /var/log/nginx/woc-error.log |
control | service nginx status |
port | 80 |
* In AEN 4.1.2 and earlier, the wk-server process runs on port 5000 on localhost only. In later versions of AEN, the wk-server process uses the Unix socket path unix:/opt/wakari/wakari-server/var/run/wakari-server.sock.
NGINX runs at least two processes:
- Master process running as root user.
- Worker processes running as nginx user.
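The version-dependent upstream described in the footnote above can be expressed as a small helper. This is a hedged sketch only: `parse_version` and `wk_server_upstream` are hypothetical names, and writing "localhost" as `127.0.0.1` is an assumption. The port and the socket path are taken from the text.

```python
SOCKET_PATH = "/opt/wakari/wakari-server/var/run/wakari-server.sock"


def parse_version(text):
    """Turn a dotted version such as '4.1.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def wk_server_upstream(aen_version):
    """Return the address NGINX would proxy to for a given AEN version."""
    if parse_version(aen_version) <= (4, 1, 2):
        # AEN 4.1.2 and earlier: TCP port 5000, localhost only.
        # Assumption: localhost is written here as 127.0.0.1.
        return "127.0.0.1:5000"
    # Later versions: a Unix socket instead of a TCP port.
    return "unix:" + SOCKET_PATH


print(wk_server_upstream("4.1.2"))
print(wk_server_upstream("4.2.1"))
```

A helper like this can be useful in a health-check script that has to work against both older and newer installations.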
Gateway node¶
The gateway node serves as an access point for a given group of compute nodes. It acts as a proxy service and manages the authorization and mapping of URLs and ports to services running on those nodes. Gateway nodes provide a consistent interface for the user.
NOTE: The gateway may also be referred to as a data center because it serves as the proxy for a collection of compute nodes.
You can put a gateway in each data center in a tiered scale-out fashion.
The AEN gateway is installed in the /opt/wakari/wakari-gateway directory.
Gateway processes¶
When you view the status of gateway processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manages the wk-gateway process. |
user | wakari |
configuration | /opt/wakari/wakari-gateway/etc/supervisord.conf |
log | /opt/wakari/wakari-gateway/var/log/supervisord.log |
control | service wakari-gateway |
ports | none |
wakari-gateway | details |
---|---|
description | Passes requests from the AEN Server to the Compute nodes. |
user | wakari |
configuration | /opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json |
logs | /opt/wakari/wakari-gateway/var/log/wakari/gateway.application.log, /opt/wakari/wakari-gateway/var/log/wakari/gateway.log |
working dir | / (root) |
port | 8089 (webcache) |
Compute node(s)¶
Compute nodes are where applications such as Jupyter Notebook and Workbench actually run. They are also the hosts that a user sees when using the Terminal app or when using SSH to access a node. Compute nodes contain all user-visible programs.
Compute nodes only need to communicate with a gateway, so they can be completely isolated by a firewall.
Each project is associated with one or more compute nodes that are part of a single data center.
AEN compute nodes are installed in the /opt/wakari/wakari-compute directory.
Each compute node in the AEN system requires a compute launcher service to mediate access to the server and gateway.
Compute processes¶
When you view the status of compute processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manages the wk-compute process. |
user | wakari |
configuration | /opt/wakari/wakari-compute/etc/supervisord.conf |
log | /opt/wakari/wakari-compute/var/log/supervisord.log |
control | service wakari-compute |
working dir | /opt/wakari/wakari-compute/etc |
ports | none |
wk-compute | details |
---|---|
description | Launches compute processes. |
user | wakari |
configuration | /opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json, /opt/wakari/wakari-compute/etc/wakari/scripts/config.json |
logs | /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.application.log, /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.log |
working dir | / (root) |
control | service wakari-compute |
port | 5002 (rfe) |
The wk-compute process loads each of the following configuration files, in this order:
- /etc/wakari/config.json
- /etc/wakari/compute-launcher-config.json
- ./compute-launcher-config.json
- Any configuration file specified by the -c option.
If an option is specified in multiple files, the last one encountered takes precedence.
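The "last one wins" rule can be illustrated with a short Python sketch. It is not the wk-compute implementation. It assumes that the files are JSON objects, that missing files are skipped, and that overriding happens per top-level key. The function name `load_config`, the temporary file names, and the option keys are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_config(paths):
    """Merge JSON config files in order; later files override earlier ones."""
    merged = {}
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # assumption: files that do not exist are skipped
        with path.open() as handle:
            # Assumption: a shallow, top-level override is sufficient.
            merged.update(json.load(handle))
    return merged


# Demonstration with throwaway files standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "config.json"
    override = Path(tmp) / "override.json"
    base.write_text(json.dumps({"port": 5002, "projectRoot": "/projects"}))
    override.write_text(json.dumps({"projectRoot": "/data/projects"}))
    config = load_config([base, override, Path(tmp) / "missing.json"])

print(config)  # {'port': 5002, 'projectRoot': '/data/projects'}
```

When a setting does not seem to take effect, walking through the files in load order like this is a quick way to find which file supplies the final value.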
Supervisor and supervisord¶
AEN uses a process control system called “Supervisor” to run its services. Supervisor is run by the AEN Service Account user, usually wakari or aen_admin.
The Supervisor daemon process is called “supervisord”. It runs in the background and should rarely need to be restarted.
Anaconda environments¶
Each project has an associated conda environment containing the packages needed for that project. When a project is first started, AEN clones a default environment with the name “default” into the project directory.
For more information about environments, see Working with environments.
Projects and permissions¶
AEN users interact with the system predominantly through projects.
Each project is associated with a single data center within the AEN environment. Each project has a team of users, which includes one owner: the user who created the project.
Projects live in the projectRoot folder on the compute node (by default, /projects).
The project directory is created the first time a project is started. The start-project script clones it from /opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton.
Project directory permissions are:
owner: rwx, user who created the project
group: rwx, group of the owner
other: --x, to allow access to the Public folder
ACL: rwx for any other team members
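For reference, the owner, group, and other entries above correspond to a numeric mode of 771. The sketch below derives that value with Python's standard `stat` module. The constant name `PROJECT_DIR_MODE` is only illustrative, and the ACL entry is not part of the mode bits, so it is not represented here.

```python
import stat

PROJECT_DIR_MODE = (
    stat.S_IRWXU    # owner: rwx
    | stat.S_IRWXG  # group: rwx
    | stat.S_IXOTH  # other: --x
)

print(oct(PROJECT_DIR_MODE))                           # 0o771
print(stat.filemode(stat.S_IFDIR | PROJECT_DIR_MODE))  # drwxrwx--x
```

The second line of output is the form that `ls -l` shows for a directory with these permissions, without the trailing `+` that many systems add when an ACL is present.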
Files and subdirectories within the project directory have the same permissions as the project directory, except:
- The public folder and everything in it are open to anyone.
- Any files hardlinked into the root anaconda environment (/opt/wakari/anaconda) are owned by the root or wakari users.
Project file and directory permissions are maintained by the start-project script. All files and directories in the project have their permissions set when the project is started, except for files owned by root or the AEN_SRVC_ACCT user (by default, wakari or aen_admin).
Permissions on files owned by root or the AEN_SRVC_ACCT user are left unchanged so that the permission settings of any linked files in the /opt/wakari/anaconda directory are not altered.
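The skip rule can be pictured as a directory walk that ignores entries owned by protected accounts. The sketch below is an assumption about the general approach, not the actual start-project script. `paths_to_update` is a hypothetical name, and the protected user IDs are passed in directly rather than looked up from account names.

```python
import os
import tempfile


def paths_to_update(project_dir, protected_uids):
    """Yield paths whose permissions may be reset, skipping protected owners."""
    for root, dirs, files in os.walk(project_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat judges a symlink by its own owner, not its target's.
            if os.lstat(path).st_uid in protected_uids:
                continue  # leave files owned by root or the service account alone
            yield path


# Demonstration in a throwaway directory (Unix only, because of os.getuid).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notebook.ipynb"), "w").close()
    everything = sorted(paths_to_update(tmp, protected_uids=set()))
    nothing = list(paths_to_update(tmp, protected_uids={os.getuid()}))

print(len(everything), len(nothing))  # 1 0
```

Because a hard link shares its owner with the original file, an ownership check of this kind is enough to leave files linked from /opt/wakari/anaconda untouched.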
CAUTION: Do not start a project as the AEN_SRVC_ACCT user. The permissions system does not correctly manage project files owned by this user.