Posts Tagged ‘operations’

11
Feb
11

Self-Classifying Puppet Nodes

Puppet has a very cool node classification system which pretty much lets you do what you want (by writing your own classifier) if the default one doesn’t work for you. There are already a couple of good posts on this, and it’s worth reading the following: Jordan Sissel, Gary Larizza, as well as the official docs on external node classifiers.

So, from the above posts, I’m going to take a few of the ideas, mix them up, and go through the steps to reproduce on your own system. The goal of the end configuration is to have a node come online, identify itself using its Role, Platform, and Environment, and then be issued the relevant classes. What’s important here is that nodes must be classified into Roles and Platforms before they reach puppet (Environment too, but that is already handled by puppet). Dividing nodes by their Platform/Role gives us the simplicity needed when you’re managing a large number of machines across different clusters. It’s easier to group machines than it is to individually assign classes to each node. Of course, not all your puppetized nodes need to belong to a group, as they might just be one machine performing a specific action. In cases like this, we must be able to add exceptions easily.

For the purpose of this post, let’s assume we have 2 clusters in Europe and USA, and each cluster has several Application and Web Servers. I’m also assuming you’re following the recommended puppet-mcollective-facter setup, because it works well.

From a high level overview, we want to write a facter plugin for mcollective which will read a facts file on the host. This facts file will contain the Role and Platform information that can be used from Puppet. We then need an mcollective agent so we can update this file at a later stage if we need to. Finally, we look at how to create a node classification system that can use these facts to hand out the right manifest.

Identification

Facter is the game, and we need a new fact. Facter gives puppet access to information about a host at run-time, like what country a host is in or what distribution of Linux it’s running. We’re going to add 2 new facts, and for the sake of best practices, we’ll make it extensible. I don’t like polluting the existing facter namespace with odd names, so I’m going to prefix all facts with a name (use your company name or whatever you want).

The following is a facter plugin that will parse the file /etc/company.facts and append its entries to the existing facts.


require 'facter'

if File.exist?("/etc/company.facts")
    File.readlines("/etc/company.facts").each do |line|
        if line =~ /^(.+)=(.+)$/
            var = "company_"+$1.strip
            val = $2.strip

            Facter.add(var) do
                setcode { val }
            end
        end
    end
end

Given the following facts file /etc/company.facts:

Role = Web
Platform = USA

We will get the following from facter

...
company_role = Web
company_platform = USA
...

These variables are now available straight away in your puppet manifests.
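If you want to check the parsing rules without Facter installed, the same logic can be pulled into a plain Ruby method. This is only a rough sketch: parse_company_facts and its prefix argument are names I made up for illustration, and the explicit downcase is there to mirror the lowercase fact names shown in the output above.

```ruby
# Hypothetical helper (not part of Facter): turn "Key = Value" lines into
# a hash of prefixed, lower-cased fact names.
def parse_company_facts(text, prefix = "company_")
  facts = {}
  text.each_line do |line|
    # Same pattern as the plugin: text before "=" is the key, text after is the value.
    next unless line =~ /^(.+)=(.+)$/
    facts[prefix + $1.strip.downcase] = $2.strip
  end
  facts
end

sample = "Role = Web\nPlatform = USA\nthis line has no separator and is skipped\n"
parse_company_facts(sample).each { |name, value| puts "#{name} = #{value}" }
```

Lines that don’t match the pattern are silently ignored, which is the same behaviour as the plugin.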

Updating the Facts
Before I continue on to using these facts in puppet, it’s important to have a way to update the facts. Equally important is that you build the facts into your server deploy process. We have a script that installs mcollective and puppet when we commission a new server, and one of the first things it does is create this file and automatically populate the Role and Platform based on the commissioning parameters.

Apart from server deploy-time, we can write a small mcollective RPC agent which will get/set/delete values from our facts file. The file has a simple key-value structure, so the following should do the job:

module MCollective
	module Agent
		class Companyfact<RPC::Agent
			metadata	:name		=> "Company Fact Agent",
					:description	=> "Key/values in a text file",
					:author		=> "Puppet Master Guy",
					:license	=> "GPL",
					:version	=> "Version 1",
					:url		=> "www.company.com",
					:timeout	=> 10
			
			companyfile = "/etc/company.facts"
	
			def parse_facts(fname)
				begin
					if File.exist?(fname)
						kv_map = {}
						File.readlines(fname).each do |line|
							if line =~ /^(.+)=(.+)$/	
								key = $1.strip
								val = $2.strip
								kv_map[key] = val
							end						 
						end					 
						return kv_map
					else
						f = File.open(fname,'w')
						f.close
						return {}
					end 			
				rescue
					logger.warn("Could not access company facts file. There was an error in companyfacts.rb:parse_facts")
					return {}
				end
			end

			def write_facts(fname, facts)

				unless File.exist?(File.dirname(fname))
					Dir.mkdir(File.dirname(fname))
				end

				begin
					f = File.open(fname,"w+")
					facts.each do |k,v|
						f.puts("#{k} = #{v}")
					end
					f.close
					return true
				rescue
					return false
				end
			end

			action "get" do
				validate :key, String
				
				kv_map = parse_facts(companyfile)
				if kv_map[request[:key]] != nil
					reply[:value] = kv_map[request[:key]]
				end
			end

			action "put" do
				validate :key, String
				validate :value, String

				kv_map = parse_facts(companyfile)
				kv_map.update({request[:key] => request[:value]})

				if write_facts(companyfile,kv_map)
					reply[:msg] = "Settings Updated!"
				else
					reply.fail!  "Could not write file!"
				end

			end
			action "delete" do
				validate :key, String

				kv_map = parse_facts(companyfile)	
				kv_map.delete(request[:key])

				if write_facts(companyfile,kv_map)
					reply[:msg] = "Setting deleted!"
				else
					reply.fail!  "Could not write file!"
				end

			end
		end
	end
end
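The agent’s two file helpers can be exercised outside MCollective to confirm that whatever gets written can be parsed back. The sketch below uses stand-alone copies with different names (read_fact_file and write_fact_file are hypothetical, not MCollective API) and a temporary file instead of /etc/company.facts.

```ruby
require "tempfile"

# Illustrative copy of the agent's parse step: "key = value" lines into a hash.
def read_fact_file(path)
  return {} unless File.exist?(path)
  File.readlines(path).each_with_object({}) do |line, map|
    map[$1.strip] = $2.strip if line =~ /^(.+)=(.+)$/
  end
end

# Illustrative copy of the agent's write step: one "key = value" line per entry.
def write_fact_file(path, facts)
  File.open(path, "w") do |f|
    facts.each { |k, v| f.puts("#{k} = #{v}") }
  end
end

Tempfile.create("company.facts") do |tmp|
  tmp.close
  facts = read_fact_file(tmp.path)      # an empty file gives an empty hash
  facts["role"] = "Web"                 # what the "put" action does
  write_fact_file(tmp.path, facts)
  puts read_fact_file(tmp.path).inspect
  facts.delete("role")                  # what the "delete" action does
  write_fact_file(tmp.path, facts)
  puts read_fact_file(tmp.path).inspect
end
```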

We also need the DDL:

metadata        :name           => "Company Fact Agent",
		:description    => "Key/values in a text file",
		:author         => "Puppet Master Guy",
		:license        => "GPL",
		:version        => "Version 1",
		:url            => "www.company.com",
		:timeout        => 10

action "get",	:description => "fetches a value from a file" do
	display :failed

	input :key,
		:prompt		=> "Key",
		:description	=> "Key you want from the file",
		:type		=> :string,
		:validation	=> '^[a-zA-Z0-9_]+$',
		:optional	=> false,
		:maxlength	=> 90
	
	output :value,
		:description	=> "Value",
		:display_as	=> "Value" 
end

action "put", :description => "Value to add to file" do
	display :failed

	input :key,
		:prompt		=> "Key",
		:description	=> "Key you want to set in the file",
		:type 		=> :string,
		:validation	=> '^[a-zA-Z0-9_]+$',
		:optional	=> false,
		:maxlength	=> 90

	input :value,
                :prompt         => "Value",
                :description    => "Value you want to set in the file",
                :type           => :string,
                :validation     => '^[a-zA-Z0-9_]+$',
                :optional       => false,
                :maxlength      => 90

	output :msg,
		:description	=> "Status",
		:display_as	=> "Status"
end

action "delete", :description => "Delete a key/value pair if it exists" do
        display :failed

        input :key,
                :prompt         => "Key",
                :description    => "Key you want to change in the file",
                :type           => :string,
                :validation     => '^[a-zA-Z0-9_]+$',
                :optional       => false,
                :maxlength      => 90

        output :msg,
                :description    => "Status",
                :display_as     => "Status"
end
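The :validation pattern in the DDL limits what you can store, which is easy to forget when a put gets rejected. Here is a small sketch for checking candidate keys and values, assuming the pattern is applied as an ordinary Ruby regular expression together with the maxlength of 90 (passes_ddl? is a hypothetical helper, not an MCollective call).

```ruby
# The pattern used for :validation in the DDL above.
DDL_PATTERN = /^[a-zA-Z0-9_]+$/

# Hypothetical helper: would this string be accepted as a key or value?
def passes_ddl?(input, maxlength = 90)
  input.length <= maxlength && !!(input =~ DDL_PATTERN)
end

%w[role Web web-server USA east_1].each do |candidate|
  puts "#{candidate}: #{passes_ddl?(candidate) ? 'accepted' : 'rejected'}"
end
```

Under that assumption a value like web-server is refused because of the hyphen, so either stick to letters, digits and underscores or loosen the pattern in the DDL.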

For a quick refresh on using your mc-rpc agent, we can set a key using the following:
mc-rpc -v --agent companyfact --action put --argument key=role --argument value=Web

And we can get a key using the following:
mc-rpc -v --agent companyfact --action get --argument key=role

And we can delete a key using the following:
mc-rpc -v --agent companyfact --action delete --argument key=role

Self-Classifying Nodes
This is where we want to be. A node comes in and says to puppet, I’m a Web machine on platform USA.

The default basic setup is to use a node definition for each node, or to plug in some sort of external classifier. I’m going to build on Jordan Sissel’s blog that I mentioned at the start. Essentially, every node goes through the ‘default’ node definition, which then goes to the ‘truth enforcer’. This truth enforcer will look at the facts of the node and hand off the relevant classes accordingly. Note that if you want to add exceptions, just create a node definition for the exception node. Simple.

So the enforcer node is a very basic definition:

node default {
  include truth::enforcer
}

From here, we create a truth enforcer class like so (using our example). Naturally this is just an example of how it might be used:

class truth::enforcer {

        $groupname = "${company_platform}:${company_role}"
        case $groupname {
                "USA:Web" : {
                        include roles::web
                }
        }

        case $company_role {
                "Application" : {
                        include roles::application
                }
        }       
}
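To reason about which classes a node ends up with, it can help to model the enforcer outside Puppet. The following is a rough Ruby mirror of the class above, not something Puppet itself runs; classes_for is a made-up name and the rules are just the two from the example.

```ruby
# Hypothetical mirror of truth::enforcer: map a node's facts to class names.
def classes_for(facts)
  classes = []
  group = "#{facts['company_platform']}:#{facts['company_role']}"
  # Rules that depend on both platform and role.
  classes << "roles::web" if group == "USA:Web"
  # Rules that depend on the role alone apply on every platform.
  classes << "roles::application" if facts["company_role"] == "Application"
  classes
end

puts classes_for("company_platform" => "USA", "company_role" => "Web").inspect
puts classes_for("company_platform" => "Europe", "company_role" => "Application").inspect
```

A node whose facts match no rule gets an empty list, which is the case you would cover with an explicit node definition.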

That’s pretty much it as far as getting a self-classifying puppet node goes. One more thing that’s worth mentioning is that this also ties in well with Extlookup to manage your parameters. You can use something like the following configuration which I find works well:

$extlookup_precedence = ["fqdn_%{fqdn}", "role_%{company_role}-%{company_platform}", "platform_%{company_platform}", "common"]
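To see what that precedence list turns into for a given node, here is a small sketch that substitutes facts into the %{...} placeholders. It only models the name expansion; my understanding is that extlookup then looks for a CSV file with each name in its data directory, in this order, but treat that part as an assumption. lookup_order is a hypothetical helper.

```ruby
PRECEDENCE = [
  "fqdn_%{fqdn}",
  "role_%{company_role}-%{company_platform}",
  "platform_%{company_platform}",
  "common"
].freeze

# Replace each %{name} placeholder with the matching fact (empty if unknown).
def lookup_order(precedence, facts)
  precedence.map do |entry|
    entry.gsub(/%\{(\w+)\}/) { facts.fetch($1, "") }
  end
end

facts = {
  "fqdn" => "web1.example.com",
  "company_role" => "Web",
  "company_platform" => "USA"
}
puts lookup_order(PRECEDENCE, facts)
```

The most specific source (the host itself) comes first and common is the fallback, so a per-host override always wins over a role or platform default.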

Comments or questions welcome.

07
Jan
11

Pulling a list of hosts from MCollective for Puppet

In one of my previous posts, I wrote about using foreman (a node classifier and dashboard for puppet) to retrieve a list of hosts (and meta information) so that you can use it in a puppet manifest. We’re looking to do something like iterate over a set of nodes in a template (I use it for generating the Munin server config). Stored resources provide a way to centralize information from nodes, but it isn’t very intuitive, and gets a little tricky to plan and maintain.

While foreman is good for a lot of use-cases, not everyone uses it. So, I want to provide an alternative for those that don’t use foreman. The alternative uses MCollective to populate a list of hosts based on information given to us by the MC Registration plugin. Before I dive in, I’d like to quickly cover off a blog from the MCollective architect, R.I.Pienaar, on this very topic. His blog post (PuppetSearch) brings together 3 things. MongoDB, MC Registration, and Puppet in a very powerful way. It is a really good solution, is more robust, and we’ve since moved on to something similar. The following blog is more just to help understand how flexible puppet can be, and how well it integrates with MCollective. The advantage of using PuppetSearch is that you can load a specific node, and you can query using MongoDB syntax.

So, we’re looking to achieve something quite simple really. We want a subset of hosts matching a basic query, with their meta information, in a variable in a puppet manifest.

MCollective and Registration
MCollective is great, and very flexible. One of the core plugins for MCollective is called registration. Essentially, every node/host sends a registration message at a pre-determined interval set in the configuration. The registration message is sent as a broadcast, and so any client can pick up the registration messages of any other client. We only want one registration handler, but it’s nice to know that there can be more than one handler (1 per node).

The registration message can be anything. In our case, we want to send the client identifier and its facts, and you may want to add some more information along the way. This message is picked up by a handler, which then processes the message.

Currently, we’ve taken the lead of R.I.Pienaar (from his blog above) and shoved the messages straight into a MongoDB instance so that we can query it and use it for different parts of our operations infrastructure. Since that’s done, I’m going to cover off the plain old text file version. It’s extremely similar in architecture, but the code does completely different things.

Installing the registration plugin isn’t too difficult. We need to do it in 2 parts which is pretty standard and has been well documented.

Part 1 is getting the clients to register with all the information they have, including facts, and anything else you like. For this, one of the documented registrations will do just fine. This file will sit in your MCollective plugins/mcollective/registration directory. You then need to adjust your config file for the server to say what registration plugin you’re using and how often.

server.cfg

registerinterval = 300
registration = Meta

After this, you should have registration messages flying around. You need to handle them. The RegistrationAgent provides a simple handler which will write the messages to a text file per client. check_mcollective is optional, and we won’t be using it. It provides a link to nagios if you wish to explore. So, for the client we want to handle the registration messages (probably your mcollective/puppet server), we want to put registration.rb in MCollective plugins/mcollective/agent.

Restart MCollective on all nodes that have been affected, and you should start seeing the registration text files populating in /var/tmp/mcollective/. You can change this directory by specifying it in the client.cfg on the node where the handler is. (plugin.registration.directory = '/etc/mcollective/registered/')

Ok, so that’s MCollective registration done. If you want to add more information, just make some changes to meta.rb.

Hostlist (Puppet function)

Ok, here we come onto the real topic which is to use what we’ve implemented above to retrieve a list of hosts with their facts. The function itself is very easy because all the information we have is already in YAML and we can just load it, and spit it out!

So the function below is a puppet parser function which can be used in manifests. Please excuse my Ruby…


# mc_hostlist.rb
# Duncan Phillips

# Retrieve a list of hosts and their meta information by querying the data stored by the registration agent.
# info on the registration agent can be found at http://marionette-collective.org/reference/plugins/registration.html

# Usage: mc_hostlist([class],[fact])
# If neither is specified, all hosts are returned.
# Class and Fact are filters and can both be specified
# Fact can be specific or non-specific. i.e. machine with fact, or machine with fact=z

# e.g. mc_hostlist("class=hosting", "fact=operatingsystem=Ubuntu")
#[hostname => {facts : {fact1 : value1}, classes : {class1 : value 1}}]

require 'yaml'

module Puppet::Parser::Functions
	newfunction(:mc_hostlist, :type => :rvalue) do |args|
		#populate our array/map
		hosts = Dir.entries("/var/tmp/mcollective")
		for h in hosts do
			begin
				if (h == '.') or (h == '..')
					hosts[hosts.index(h)] = nil; 
				else
					hfile = open("/var/tmp/mcollective/"+h)
					raw = hfile.read.gsub("!ruby/sym ","")
					hosts[hosts.index(h)]=YAML.load(raw).merge({"fqdn"=>h})
				end
			rescue Exception => e
				raise Puppet::ParseError, "There was an exception: #{e}"
			end
		end

		args.each do |arg|

			name, value, factvalue = arg.split("=")

			case name
			when "fact"
				hosts=hosts.compact
				for h in hosts do
					if hosts[hosts.index(h)]["facts"][value]
						if (factvalue) and (hosts[hosts.index(h)]["facts"][value] != factvalue)
							hosts[hosts.index(h)]=nil
						end
					else
						hosts[hosts.index(h)]=nil
					end
				end
			when "class"
				hosts=hosts.compact
				for h in hosts do
					if hosts[hosts.index(h)]["classes"].index(value) == nil
						hosts[hosts.index(h)]=nil
					end
				end
			 
			else
				raise Puppet::ParseError, "mc_hostlist: Invalid parameter #{name}"
			end #case
		end #args
	
		return hosts.compact

	end #func
end

This is a puppet parser function, so it needs to be installed into a module. You can find out more about this here. I recommend just putting it into the common module, in which case it will go into modules/common/lib/puppet/parser/functions/. After this you’ll need to resync the plugins (usually a puppet run will suffice if pluginsync is turned on in the configs).
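The filtering rules inside mc_hostlist can be tried without Puppet or any registration files. Below is a stand-alone sketch of just the filter step, working on in-memory hashes shaped like the registration data (a "facts" hash and a "classes" array per host). filter_hosts and the sample hosts are made up for illustration.

```ruby
# Hypothetical stand-alone version of the filter step in mc_hostlist.
# Each host is a hash with "fqdn", "facts" (hash) and "classes" (array).
def filter_hosts(hosts, filters)
  filters.inject(hosts) do |remaining, filter|
    name, value, factvalue = filter.split("=")
    case name
    when "fact"
      # Keep hosts that have the fact, and the given value if one was supplied.
      remaining.select do |h|
        h["facts"].key?(value) && (factvalue.nil? || h["facts"][value] == factvalue)
      end
    when "class"
      remaining.select { |h| h["classes"].include?(value) }
    else
      raise ArgumentError, "invalid filter #{name}"
    end
  end
end

hosts = [
  { "fqdn" => "web1.example.com", "facts" => { "operatingsystem" => "Ubuntu" }, "classes" => ["Web"] },
  { "fqdn" => "app1.example.com", "facts" => { "operatingsystem" => "CentOS" }, "classes" => ["Application"] }
]
puts filter_hosts(hosts, ["class=Web", "fact=operatingsystem=Ubuntu"]).map { |h| h["fqdn"] }.inspect
```

Each filter narrows the list left by the previous one, so passing both a class and a fact means a host has to match both.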

Into the manifest

So, how do we use this? I’m going to give some insight into how one can use this to generate a Munin conf file… I won’t go into other bits, but will look at what’s relevant here.

Below is an example of how one might use a list of all hosts which have the class ‘Web‘ to create an aggregate graph. We can aggregate anything, for now we’ll create a graph of the load for every node in the list.


	$hl_web  = mc_hostlist("class=Web")

	file {	"/etc/munin/munin.conf": content => template("munin/munin.conf"), }

We can then use this in the template file as below (Once again, excuse my Ruby):


<% hl_web.each do |h| -%>

# Register the nodes

[GroupName;<%= h['fqdn'] %>]
        address <%= h['fqdn'] %>
        use_node_name yes
<% end %>

# Create a new group Totals which holds aggregate graphs

[Totals; GroupName]

        # Generate our aggregate graph

        web_load.graph_title Load Average
        web_load.graph_category GroupName
        web_load.graph_scale no
        web_load.graph_vlabel Load
        web_load.graph_order \<% hl_web.each_with_index do |h,i| -%><% if (h != '') %>
                <%= h['fqdn'][/[a-zA-Z0-9]*/] %>=GroupName;<%= h['fqdn'] %>:load.load <% if i != hl_web.size-1 -%>\<% end -%><% end -%><% end %>
        <% hl_web.each_with_index do |h,i| %>
        <% if (h != '') -%>web_load.<%= h['fqdn'][/[a-zA-Z0-9]*/] %>.draw LINE1<% end %>
        <% end %>

End result: A nice aggregate graph that will dynamically add hosts as they register.
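If you want to preview the node section of the template before letting puppet loose on munin.conf, ERB from the Ruby standard library can render it directly. This is a sketch under a few assumptions: it only covers the per-host block, the host list is hard-coded, and it uses the trim_mode keyword and result_with_hash, which need a reasonably recent Ruby. render_nodes is a hypothetical helper.

```ruby
require "erb"

# Only the "register the nodes" part of the template, with "-%>" trimming
# the newline after each control line.
TEMPLATE = <<~'EOT'
  <% hosts.each do |h| -%>
  [GroupName;<%= h['fqdn'] %>]
          address <%= h['fqdn'] %>
          use_node_name yes
  <% end -%>
EOT

# Render the node entries for a list of host hashes.
def render_nodes(hosts)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(hosts: hosts)
end

puts render_nodes([{ "fqdn" => "web1.example.com" }, { "fqdn" => "web2.example.com" }])
```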

18
Nov
10

Get Better: Processes

The get better series puts a focus on some utils and knowledge areas where people like me tend to learn enough to get going, but just forget to come back later, learn more, and get the most out of what you’re given.

When it comes to process management, my usual is to check top for what’s consuming cpu and memory, then have a glance at the load. If I’m looking to see whether something is running or not, just a quick ps|grep does the trick. This only touches the surface, as these two utilities provide huge amounts of insight into processes, how they’re running, what section of the code is running, what the program stack is doing, and so on…

I’d like to delve into various aspects of processes, from some simple adjustments in using ps, to some of the less documented fields in /proc/pid/x .

more on ps

nb: I will be skipping over some parameters that most will already be familiar with like custom output formats(o), thread display(H), sorting(k), filtering.

Firstly, I want to cover something that a lot of people might have glanced over in the man page and not seen. %CPU is not really the percentage of the machine’s CPU that the process is consuming right now. According to man ps, it is the CPU time used divided by the time the process has been running (the cputime/realtime ratio), so it is an average over the process’s whole lifetime. A process that was busy an hour ago and idle since will still show a non-zero figure, and this is why adding up the CPU usage for all processes is very unlikely to give you 100%. Most processes only use the CPU for a small part of the time they have been alive.
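As a worked example, the man page for ps describes %CPU as CPU time used divided by the time the process has been running. The arithmetic is trivial, but writing it out makes the "lifetime average" nature obvious. cpu_percent is a made-up helper and the numbers are invented.

```ruby
# cputime/realtime ratio expressed as a percentage, rounded to one decimal.
def cpu_percent(cpu_seconds, elapsed_seconds)
  return 0.0 if elapsed_seconds <= 0
  (100.0 * cpu_seconds / elapsed_seconds).round(1)
end

# A process that has used 30 seconds of CPU since it started 10 minutes ago.
puts cpu_percent(30, 600)
# The same 30 seconds of CPU, but the process has now been alive for an hour.
puts cpu_percent(30, 3600)
```

The second figure is lower even though the process did exactly the same amount of work, simply because it has been alive for longer.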

Let’s start with a quick win that some might not have come across already… the awesome Forest mode (f). This output format groups processes under the parent that started them and displays them as children. This is extremely useful to see which commands have been forked off from other commands, and where dependencies lie.
Below: very easy to see just what’s going on:


 /usr/sbin/sshd
  \_ sshd: root@pts/0 
      \_ -bash
          \_ ps axwwf -o command

Once you can see dependencies, we’d like to look at things that help us debug a bit more what might be causing something to eat cpu, or hang, etc… For that, we’re going to be looking at the Status (STAT) and Waiting Channel (WCHAN) columns. Status comes in most views, but waiting channel is more limited. It’s available in the long view (l) format, or you can use a custom output.

Status is pretty simple. A process can be in one of about 7 basic states which are all documented in the man pages. They are mostly self-explanatory. In practice, most processes will be in D/R/S. Uninterruptible Sleep (D) means that the executing code is currently in the kernel, which usually amounts to I/O access. While in this state the process won’t respond to signals; instead they are queued, and when the code returns from kernel mode, the queue is read and the signals are processed. Interruptible Sleep (S) means that the process is waiting for an event, and it will wake up to process any signals sent, such as kill. Running (R) means the process is running or waiting in the run queue.
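For scripts that post-process ps output, a small lookup table for the first letter of the STAT column is handy. The sketch below covers the common codes as I understand them from man ps; the trailing modifier characters (s, +, < and so on) are ignored, and describe_stat is a hypothetical helper.

```ruby
# First character of the STAT column mapped to a short description.
STATE_NAMES = {
  "D" => "uninterruptible sleep (usually I/O)",
  "R" => "running or runnable",
  "S" => "interruptible sleep",
  "T" => "stopped",
  "X" => "dead",
  "Z" => "defunct (zombie)"
}.freeze

# Describe a STAT value such as "Ss+" by its leading state letter.
def describe_stat(stat)
  STATE_NAMES.fetch(stat[0], "unknown")
end

%w[Ss+ R+ D< Z].each { |s| puts "#{s}: #{describe_stat(s)}" }
```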

The waiting channel is lesser known, but good to know. It provides insight as to what a sleeping process is waiting for. A short code is given which may or may not help depending on what you’re trying to find out. ps shortens this field to 6 characters, so if you don’t quite get what you want, you’re going to need to dig a bit more. ps gets this code from /proc/pid/wchan, so you can just cat that for your full code. To take this further involves delving into kernel code to trace exactly what might be causing a stall. Things get serious here, so I’m going to leave it at that for now. Note that for kernels <2.6, System.map must be installed. I didn’t find a lot of info out there on WCHAN, so if you find anything, drop me a message.
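Reading the untruncated name from /proc/pid/wchan is a one-liner, but a little wrapper makes it safer to use in scripts. In this sketch read_wchan is a made-up helper, the proc_root argument exists only so it can be pointed at a fake directory tree, and treating an empty value or "0" as "not waiting" is my assumption about what the kernel writes there.

```ruby
require "tmpdir"

# Return the wait channel name for a pid, or nil if it is unavailable.
def read_wchan(pid, proc_root = "/proc")
  path = File.join(proc_root, pid.to_s, "wchan")
  return nil unless File.readable?(path)
  name = File.read(path).strip
  # Assumption: an empty value or "0" means the process is not waiting.
  name.empty? || name == "0" ? nil : name
rescue SystemCallError
  nil
end

# Demonstrate against a fake /proc tree so the example runs anywhere.
Dir.mktmpdir do |fake_proc|
  Dir.mkdir(File.join(fake_proc, "4242"))
  File.write(File.join(fake_proc, "4242", "wchan"), "poll_schedule_timeout")
  puts read_wchan(4242, fake_proc)
end
# On a Linux host you would call read_wchan(some_pid) with the default root.
```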

A quick note on the TIME column. I was quick to assume in the beginning that this was how long the process had been running. Then I noticed that some processes have 0:00 in their time field. Running in forest mode, I can see that all these processes either have child processes, or are in a sleep state. Reading up, this field is the accumulated CPU time (user + system) that the process has actually used, not the wall-clock time since it started, so a process that mostly sleeps stays at 0:00.

TTY is also an interesting field. Simply put, it tells us what terminal the command is running on. Using this info we can see what other processes are being run from that terminal (using ps -t followed by the terminal name, e.g. ps -t pts/0), try to redirect the output, or play with the terminal settings using stty. You can read up more on terminals here.

The scheduler class (CLS) is available by specifying the optional format class. This tells us what scheduler class the process is using. In almost all cases we will see that the class is TS which, according to man ps, is for the conventional time-sharing scheduler algorithm.

Pending Signals is interesting to see whether a process has received one or more signals in its queue. When the process returns from kernel space, it will process these signals if they are not in the Ignore Signals field. I won’t go further into this as it also leads into kernel code, which is not my strong point :)

more from /proc
Proc is central to the kernel and, on the side, it gives us a look into what’s going on in our system. Part of the proc system is dedicated to processes and information on them. This information can be found in /proc/pid where pid is the actual process id.

/proc/pid/environ is a list of the environment variables that the process is using.
/proc/pid/cwd is a link to the working directory
/proc/pid/root tells us what the process thinks its root directory is (different for cases of chroot’d apps)
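One thing to know about /proc/pid/environ is that the entries are separated by NUL bytes rather than newlines, so a plain cat runs them all together. A tiny sketch for splitting the raw contents into a hash (parse_environ is a hypothetical helper, and the sample string stands in for the file contents):

```ruby
# Split NUL-separated "NAME=value" entries into a hash.
def parse_environ(raw)
  raw.split("\0").each_with_object({}) do |entry, env|
    # Split on the first "=" only, since values may contain "=" themselves.
    name, value = entry.split("=", 2)
    env[name] = value.to_s unless name.nil? || name.empty?
  end
end

raw = "PATH=/usr/bin:/bin\0HOME=/root\0OPTS=a=b\0"
parse_environ(raw).each { |name, value| puts "#{name} -> #{value}" }
# On a Linux host: parse_environ(File.binread("/proc/self/environ"))
```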

/proc/pid/fd is a very interesting and useful section. It provides a list of file descriptors, letting us know what files the process has open and is accessing. The lsof util gives us insight into this as well. The usual suspects are normal files, sockets and pipes. In kernels >2.6.22, there is also a directory called /proc/pid/fdinfo which will give you pos and flags. Pos is the position of the file pointer and flags gives us the flags with which it was opened (write/read/append). The primary benefit of knowing the open files is simply to know what the application is doing, or where it is writing logs, communicating, etc…
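The flags value in fdinfo is printed in octal, which makes it awkward to read by eye. Here is a rough decoder for the pos and flags lines. The constants are the Linux values on common architectures (an assumption, they differ on other systems), parse_fdinfo is a made-up helper, and the sample text stands in for a real /proc/pid/fdinfo/N file.

```ruby
O_ACCMODE = 0o3      # mask for the access mode bits
O_APPEND  = 0o2000   # Linux value on common architectures (assumption)

# Decode the "pos" and "flags" lines of an fdinfo file.
def parse_fdinfo(text)
  fields = {}
  text.each_line do |line|
    key, value = line.split(":", 2)
    fields[key.strip] = value.strip if value
  end
  flags = fields.fetch("flags", "0").to_i(8)   # flags are printed in octal
  access = { 0 => "read", 1 => "write", 2 => "read/write" }[flags & O_ACCMODE]
  { pos: fields.fetch("pos", "0").to_i, access: access, append: (flags & O_APPEND) != 0 }
end

puts parse_fdinfo("pos:\t1024\nflags:\t02001\n").inspect
```

A log file opened for appending would typically show up as write access with the append bit set, which is a quick way to spot where an application is logging.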

That’s all I’ve got on processes at the moment. Hope this helps some people…

19
Aug
10

Ushering in a new Nagios with Ninja and Geomap

Nagios is one of those pretty essential tools for managing rather large systems. Having been around for almost 10 years now, most of the work has been focused around getting a stable capable system. The user interface has taken a bit of a back seat, staying a little clunky in true web 1.0 style, while the rest of the web has geared up into more interactive and usable interfaces. Ninja, a frontend developed by Op5, is looking to change this and bring Nagios up to speed. Enter the Ninja.

Op5 sells custom monitoring SaaS based off Nagios and have made the majority of their code public and accessible. They have made a huge contribution to Nagios in their release of Ninja and while not their only contribution, Ninja offers a new interface with customizable components and widgets. These features have set it on a path to become the new face of Nagios. It has a shiny interface with some new features and reporting.

One of my favourite features was unfortunately also one of the hardest to get working. Geomap is a flash application that sits on the dashboard (Tactical Overview) and lets you visualize your network status across the globe. You can set up nodes and assign them to points on the map, and create connections between points. This way, a quick trip to the Dash can tell you if an international connection has gone down or whether one of your routers is causing an outage for several machines.

The complete setup can be a little tricky and involves following multiple guides which you just copy and paste. The geomap is still in beta, and so there is no guide for it at all. So, if you want to give it a go, I suggest firing up an Ubuntu EC2 machine and following along to get going. For ease of access I have the entire script at pastebin.

The first step is to get the basic nagios install going. The following little snippet will install the dependencies (apache + some libraries), add a nagios user, install nagios and set up some basic auth for apache. Note that you could also install Nagios from your package manager if you want, but if it installs to a different location, just take note. You’ll also need to check that the users/groups are set up.


# install our dependencies
apt-get install --assume-yes apache2
apt-get install --assume-yes libapache2-mod-php5
apt-get install --assume-yes build-essential
apt-get install --assume-yes libgd2-xpm
apt-get install --assume-yes libgd2-xpm-dev

# stop apache running since we're going to reconfigure it 
/etc/init.d/apache2 stop

# add our nagios user
/usr/sbin/useradd -m -s /bin/bash nagios

# set the password
passwd nagios

# create the nagcmd group and add both the nagios and apache (www-data) users to it
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data

# let's fetch nagios + plugins and install

mkdir nagiosInstalls && cd nagiosInstalls
wget --quiet http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.1.tar.gz
wget --quiet http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz
 
tar xzf nagios-3.2.1.tar.gz && cd nagios-3.2.1
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf
cd ..

# optional - change email address in nagios config to your email
sed -i 's/nagios@localhost/nagiosadmin@yourcompany.com/' /usr/local/nagios/etc/objects/contacts.cfg

# setup htaccess for the front-end. If you want to skip this you can, but just remember to
# edit /etc/apache/conf/nagios.conf and remove the basic auth section

htpasswd -b -c /usr/local/nagios/etc/htpasswd.users nagiosadmin nagiosadmin

# install the plugins

tar xzf nagios-plugins-1.4.15.tar.gz && cd nagios-plugins-1.4.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
cd ..

Next up we want to install Merlin (another contribution from Op5). Merlin converts nagios information into a database where each entity is a table. This offers good flexibility and scalability, the kind of things we’re looking for when monitoring large networks!

Unfortunately Merlin hasn’t been packaged on any distro that I know of yet, so for now we’ll need to install from source.


# first up, install mysql, if already installed, skip this...

# set some debconf stuff for mysql so we don't get asked too many questions, this is optional...
# NB: change your_password to whatever you want your password for mysql to be.
cat <<EOF | debconf-set-selections
debconf debconf/frontend select Readline
mysql-server-5.0 mysql-server/root_password_again your_password
mysql-server-5.0 mysql-server/root_password your_password
mysql-server-5.0 mysql-server-5.0/really_downgrade boolean false
mysql-server-5.0 mysql-server-5.0/need_sarge_compat boolean false
mysql-server-5.0 mysql-server-5.0/start_on_boot boolean true
mysql-server-5.0 mysql-server-5.0/nis_warning note
mysql-server-5.0 mysql-server-5.0/postrm_remove_databases boolean false
mysql-server-5.0 mysql-server-5.0/need_sarge_compat_done boolean true
EOF

apt-get install --assume-yes mysql-server

# merlin will need these libraries if you don't have them installed
apt-get install --assume-yes libapache2-mod-php5 libdbi0 libdbi0-dev libdbd-mysql php5-cli php5-mysql

# READ - DON'T JUST PASTE
# in this section we log in to mysql, create a new database and a new user, and grant the privileges accordingly

mysql -u root -e 'create database merlin'
mysql -u root -e "grant all privileges on merlin.* to merlin@localhost identified by 'merlin'"
mysql -u root -e 'flush privileges'

# finally, install merlin

wget --quiet http://www.op5.org/op5media/op5.org/downloads/merlin-0.6.8.tar.gz
tar -zxvf merlin-0.6.8.tar.gz && cd merlin-0.6.8
make

# Note: You'll probably see an error message at the end. I'm not sure if everyone gets it, but I did and it didn't have an impact on anything.
./install-merlin.sh --nagios-cfg=/usr/local/nagios/etc/nagios.cfg --dest-dir=/usr/local/nagios/addons/merlin --batch || echo #merlin install always fails, but not really.

cd ..

Next up is ninja. This is where things start getting a little harder and we have to juggle some config files.
Install paths matter hugely. If you installed nagios differently, be sure to check the following snippet carefully.


# download ninja and the reports module
wget --quiet http://www.op5.org/op5media/op5.org/downloads/ninja-1.0.1.tar.gz
wget --quiet http://www.op5.org/op5media/op5.org/downloads/reports-module-2.0.10.tar.gz

# installing ninja is just copying the folder across (it's a plugin for nagios)

tar -zxvf ninja-1.0.1.tar.gz
cp -a ninja-1.0.1 /usr/local/nagios/addons/ninja

# now we need to modify some configs

pushd /usr/local/nagios/addons/ninja
sed -i 's~$merlin_path = .*$~$merlin_path = "/usr/local/nagios/addons/merlin";~g' install_scripts/auth_import_mysql.php
sed -i 's~$nagios_cfg_path = .*$~$nagios_cfg_path = "/usr/local/nagios/etc";~g' install_scripts/auth_import_mysql.php
sed -i 's/\/bin\/awk/\/usr\/bin\/awk/g' install_scripts/auth_import_mysql.php
install_scripts/ninja_db_init.sh /usr/local/nagios/addons/ninja
popd

# install reports module
tar zxf reports-module-2.0.10.tar.gz
cp -a reports-module-2.0.10 /usr/local/nagios/addons/reports-module
pushd /usr/local/nagios/addons/reports-module

# ninja also needs graphviz
apt-get install --assume-yes php5-gd graphviz

# now we need the libmysql
failed=0
apt-get install --assume-yes libmysqld-dev || failed=1
if [ "$failed" -eq 1 ]
then
        echo "WARNING: Failed to install the right mysql development files. Trying to find the right package"
        # look up the package that provides the mysql development files
        p=$(apt-cache search libmysql | grep 'database development files' | awk '{print $1}')
        echo "Installing $p"
        [ "x$p" != "x" ] && apt-get install $p
fi

# we're getting ready to run the ninja setup script which integrates itself with nagios and merlin
# before we do that, we need to change some paths

sed -i 's~mod_path=/opt/monitor/op5/reports/module~mod_path=/usr/local/nagios/addons/reports-module~g' scripts/setup.sh
sed -i 's~prefix=/opt/monitor~prefix=/usr/local/nagios~g' scripts/setup.sh
sed -i 's~php $mod_path/find_configured.php \\~~g' scripts/setup.sh
sed -i 's~> /tmp/$name.interesting~~g' scripts/setup.sh
sed -i 's~archived="/opt/monitor/var/archives/nagios-*.log"~archived="$prefix/var/archives/nagios-*.log"~g' scripts/setup.sh
sed -i 's~nagioslog=/opt/monitor/var/nagios.log~nagioslog=$prefix/var/nagios.log~g' scripts/setup.sh
sed -i 's~/etc/rc.d/init.d/monitor start~~g' scripts/setup.sh

# run it!

bash scripts/setup.sh
make

# install reports into the db
mysql monitor_reports < /usr/local/nagios/addons/ninja/install_scripts/reports.sql

# change an installed config

cd ../ninja/application/config/
sed -i '/nagios_base_path/ s~/opt/monitor~/usr/local/nagios~g' config.php
popd

Ok, so we’ve got Nagios, Ninja and Merlin installed. Note that your apache isn’t configured yet, so don’t worry that you can’t load up ninja or nagios just yet!
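
Most of the sed edits above exist to replace op5's default /opt/monitor prefix. A quick way to see whether anything was missed is to grep for the old prefix afterwards. This is just a sketch: `old_prefix_lines` is a hypothetical helper, and the scratch file stands in for a real config so nothing on the system is touched.

```shell
# old_prefix_lines (hypothetical helper): print, with file name and line number,
# every line in the given files that still mentions the op5 default prefix.
# /dev/null is added so grep always prints the file name; "|| true" keeps a
# clean result (no matches) from counting as a failure.
old_prefix_lines() {
    grep -n '/opt/monitor' "$@" /dev/null || true
}

# Example on a scratch file.
scratch=$(mktemp)
printf 'prefix=/opt/monitor\nmod_path=/usr/local/nagios/addons/reports-module\n' > "$scratch"

old_prefix_lines "$scratch"     # one hit before the rewrite
sed -i 's~prefix=/opt/monitor~prefix=/usr/local/nagios~g' "$scratch"
old_prefix_lines "$scratch"     # prints nothing once the rewrite has been applied
```

On a real install you could point it at scripts/setup.sh or config.php. Any hits are worth a look, though not every one is necessarily a problem.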

This next step is to get Geomap working. As I said in the opening paragraph, Geomap is a funky little contribution which maps hosts/connections on a world map.

If you want geomap working, this next snippet is for that!

Currently, nagvis does not have the bits and pieces we need, but Op5 has a custom build which does. So this next step involves taking nagvis from op5-monitor (Nagvis falls under the GPL, and Op5 have kindly said that this is fine).
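
The extraction step picks the package with `ls | grep nagvis`, which quietly assumes exactly one file matches. A slightly more defensive sketch is shown here. `pick_one_rpm` is a hypothetical helper and the file names in the example are invented, so treat it as an illustration rather than part of the op5 tooling.

```shell
# pick_one_rpm (hypothetical helper): print the single rpm in directory $1
# whose name contains $2; complain and return 1 if there are none or several.
pick_one_rpm() {
    matches=$(ls "$1" | grep "$2" | grep '\.rpm$' || true)
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]
    then
        echo "expected one $2 rpm in $1, found $count" >&2
        return 1
    fi
    echo "$matches"
}

# Example in a scratch directory; the rpm names are made up.
scratch=$(mktemp -d)
touch "$scratch/nagvis-1.0.rpm" "$scratch/merlin-1.0.rpm"
nv=$(pick_one_rpm "$scratch" nagvis)
echo "$nv"
```

In the real rpm directory the result could then be handed to `rpm2cpio "$nv" | cpio -dimv`.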


# We need to rip nagvis from the rpm
apt-get install --assume-yes cpio rpm
wget http://download.op5.com/shop/op5-monitor-software-install-latest.tar.gz
tar -xzvf op5-monitor-software-install-latest.tar.gz && pushd monitor-software*
cd rpm
nv=`ls | grep nagvis`
rpm2cpio $nv | cpio -dimv
cd opt/monitor/op5/nagvis/
 
# install nagvis 
chmod u+x install.sh
./install.sh -n /usr/local/nagios -B /usr/local/nagios/bin/nagios -b /usr/local/bin -p /usr/local/nagios/addons/nagvis -u nagios -g nagcmd -w /tmp/ -i merlinmy -q || echo

# fix up some permissions and configs
chmod g+wx /usr/local/nagios/addons/nagvis/var/
cd /usr/local/nagios/addons/nagvis/etc
sed -i 's~base="/opt/monitor/op5/nagvis/"~base="/usr/local/nagios/addons/nagvis/"~g' nagvis.ini.php
chmod g+w /usr/local/nagios/addons/nagvis/etc/geomap/*
popd

Ok, whew, this is quite a process. Everything is installed and waiting; all we have to do now is get our apache setup running. A bit of an annoyance is that geomap needs some nagvis stuff behind ssl, so we need to generate a certificate.

We need /etc/ssl/server.crt and /etc/ssl/server.key.insecure. You can read up on generating certificates here or here
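
If you only want something to test with, a self-signed certificate is enough. The sketch below is an assumption, not necessarily the method from the guides linked above: it uses `openssl req -x509` to write an unencrypted key and a certificate under the two file names the apache config expects. The `make_self_signed` helper, the host name, the 365-day lifetime and the scratch directory are all choices made for illustration.

```shell
# make_self_signed (hypothetical helper): write server.key.insecure and
# server.crt into directory $1 for host name $2.
# -nodes leaves the key unencrypted, so apache can start without a passphrase.
make_self_signed() {
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=$2" \
        -keyout "$1/server.key.insecure" \
        -out "$1/server.crt"
}

# Try it in a scratch directory first; use /etc/ssl (as root) for the real thing.
ssl_dir=$(mktemp -d)
if command -v openssl >/dev/null 2>&1
then
    make_self_signed "$ssl_dir" "nagios.example.com"
    ls "$ssl_dir"
else
    echo "openssl is not installed"
fi
```

Browsers will warn about a self-signed certificate, which is fine for trying geomap out but not something to leave on a public machine.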


#enable apache ssl and rewrite
a2enmod ssl
a2enmod rewrite

#backup old apache conf.d directory
mv /etc/apache2/conf.d /etc/apache2/conf.d.pre_nagios
mkdir /etc/apache2/conf.d

# if you know your ip, just set ip= instead of the following attempt to auto-detect your ip
ip_list=`ifconfig | grep 'inet addr' | tr ':' ' ' | awk '{print $3}'`
ip=${ip_list%%[^0-9\.]*}
[ "x$ip_list" != "x$ip" ] && echo "WARNING : Multiple IP Addresses Found, using first one for apache configs."

# let's create our apache config, you might want yours to be different. This is just something to get you going

cat > /etc/apache2/sites-available/nagios.conf << EOF
<IfModule !mod_alias.c>
        LoadModule alias_module modules/mod_alias.so
</IfModule>
 
NameVirtualHost *:80
 
<VirtualHost $ip:80>
 
 
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
 
<Directory "/usr/local/nagios/sbin">
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>
 
Alias /nagios "/usr/local/nagios/share"
 
<Directory "/usr/local/nagios/share">
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>
 
 
Alias /nagvis "/usr/local/nagios/addons/nagvis/share"
 
<Directory "/usr/local/nagios/addons/nagvis/share">
  Options FollowSymLinks
  AllowOverride None
  Order allow,deny
  Allow from all
 
  AuthName "NagVis Access"
  AuthType Basic
  AuthUserFile /usr/local/nagios/etc/htpasswd.users
  Require valid-user
 
  <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /nagvis
   
    RewriteCond %{REQUEST_URI} ^/nagvis(/index\.php|/|)(\?.*|)$
    RewriteRule ^(index\.php|)(\?.*|)$ /nagvis/frontend/nagvis-js/\$1\$2 [R=301,L]
    RewriteCond %{REQUEST_URI} ^/nagvis/config\.php.*$
    RewriteRule ^config\.php(.*) /nagvis/frontend/wui/\$1 [R=301,L]
   
    RewriteCond %{REQUEST_URI} ^/nagvis/frontend/nagvis-js
    RewriteCond %{QUERY_STRING} map=(.*)
    RewriteRule ^(.*)$ /nagvis/frontend/nagvis-js/index.php?mod=Map&act=view&show=%1 [R=301,L]
 
    RewriteCond %{REQUEST_URI} ^/nagvis/frontend/wui
    RewriteCond %{QUERY_STRING} map=(.*)
    RewriteRule ^(.*)$ /nagvis/frontend/wui/index.php?mod=Map&act=edit&show=%1 [R=301,L]
 
    RewriteCond %{REQUEST_URI} ^/nagvis/frontend/nagvis-js
    RewriteCond %{QUERY_STRING} !mod
    RewriteCond %{QUERY_STRING} rotation=(.*)
    RewriteRule ^(.*)$ /nagvis/frontend/nagvis-js/index.php?mod=Rotation&act=view&show=%1 [R=301,L]
  </IfModule>
</Directory>
 
        <IfModule !mod_alias.c>
                LoadModule alias_module modules/mod_alias.so
        </IfModule>
 
        Alias /ninja /usr/local/nagios/addons/ninja
        <Directory "/usr/local/nagios/addons/ninja">
                Order allow,deny
                Allow from all
                DirectoryIndex index.php
        </Directory>
 
</VirtualHost>
 
<VirtualHost $ip:443>
        SSLEngine On
        SSLCertificateFile /etc/ssl/server.crt
        SSLCertificateKeyFile /etc/ssl/server.key.insecure
       
        DocumentRoot /usr/local/nagios/addons/nagvis/share/netmap/
        Alias /nagvis "/usr/local/nagios/addons/nagvis/share/"
       
        RewriteEngine On
        RewriteRule ^crossdomain.xml$ nagvis/netmap/crossdomain.xml
 
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
 
 
        <Directory "/usr/local/nagios/addons/nagvis/share/">
                Order allow,deny
                Allow from all
                DirectoryIndex index.php
                AllowOverride None
        </Directory>
 
</VirtualHost>
EOF
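
One thing to watch in a heredoc like this: because the `EOF` delimiter is not quoted, the shell expands every `$name` before apache ever sees the file. That is what fills in `$ip`, but it would also swallow mod_rewrite backreferences such as `$1` unless they are written as `\$1`. A minimal sketch of that behaviour, using a made-up address and a hypothetical `render_demo` function:

```shell
ip="192.0.2.10"    # documentation-only example address (assumption)

# render_demo (hypothetical): print a two-line config through an unquoted heredoc.
# $ip is expanded by the shell; \$1 is passed through as a literal $1.
render_demo() {
cat << EOF
<VirtualHost $ip:80>
RewriteRule ^config\.php(.*) /nagvis/frontend/wui/\$1 [R=301,L]
EOF
}

render_demo
# prints:
# <VirtualHost 192.0.2.10:80>
# RewriteRule ^config\.php(.*) /nagvis/frontend/wui/$1 [R=301,L]
```

After writing the real file, grepping it for `RewriteRule` is a quick way to confirm the backreferences survived.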

Lastly, and this is important, we need to fix up some permissions and start our services!


# Permissions

mkdir /usr/local/nagios/addons/ninja/application/logs || echo
chown nagios:nagcmd -R /usr/local/nagios/addons/
chmod g+wx /usr/local/nagios/addons/ninja/application/logs/

# Daemons
/etc/init.d/merlind stop
/etc/init.d/nagios stop
/etc/init.d/apache2 stop
sleep 10
/etc/init.d/nagios start
/etc/init.d/merlind start
/etc/init.d/apache2 start

That’s it. It’s quite a complicated process for now, but hopefully in the coming months ninja will be integrated more closely with nagios and we’ll see the need for this process disappear!

These scripts come with no guarantee, and it’s recommended you don’t run anything on a production machine unless you know what you’re doing.



