[osiris] Re: Osiris scaling
David Thiel
lx at redundancy.redundancy.org
Thu Jun 29 16:41:57 EDT 2006
On Thu, Jun 29, 2006 at 10:15:41AM -0400, David Vasil wrote:
> I was wondering what sort of configurations people with large Osiris
> deployments are using. How many hosts are you scanning? How often are
> you scanning them? How many files are scanned per scan? What hardware
> is the MD run on?
I'm scanning ~1400 hosts, every 3 hours. My scan configs are fairly
close to the defaults, with a couple extra inclusions. The hardware
is an unimpressive 1U Rackable running FreeBSD, with a hot spare that
takes backups of it.
> How do you manage that many hosts? I have a friend who was wondering
> how well osiris scaled to possibly 1000's of hosts. I can only imagine
> that it would be an absolute management nightmare to have anywhere near
> 1000 hosts in a single deployment. What are your experiences?
My strategy is this:
- Every host in the environment gets added to the LDAP directory and to
CFengine.
- CFengine installs the osiris packages on every newly rolled-out machine
and starts the agent.
- I periodically run a script that walks through every host name in LDAP,
uses tcping to ensure that port 2265 is open, and checks whether the
host has already been initialized. If the host is listening but
not in osiris, an option is given to initialize the host (just by
creating a directory with a template file, searched-and-replaced
with the hostname). It then checks to see if there are any hosts
left over in osiris that aren't in LDAP, aren't listening or
don't resolve, giving an option to delete them. There are
a couple of additional sanity checks, but this is about it.
- I use another script that parses the osiris syslogs to provide daily
reports on changes, similar to the e-mail reports, but
condensed. In its basic form, it shows:
- List of changed files, and on which hosts they changed
- Lists of hosts that changed, and which files changed on them
- What users or groups were changed anywhere in the environment
This is to cut down on things like the several thousand alerts
or e-mails that occur when you do something like upgrade sshd
across the board.
The result ends up looking like this (not a real report, just
snippets from various ones):
#############################
Affected files report:
#############################
/etc/ssh/sshd_config
doppn1 [inode]
doppn1 [mtime]
doppn1 [ctime]
peregrine [mtime]
peregrine [ctime]
podbse1-16: cd26be65eea2554f5c2d83a86b4b1fd988b1f9f5 -> 22440191d031c6ceeb2e41d63484776817652409
podbse116 [inode]
podbse116 [mtime]
podbse116 [ctime]
podbse116 [bytes]
podpse720: cd26be65eea2554f5c2d83a86b4b1fd988b1f9f5 -> 22440191d031c6ceeb2e41d63484776817652409
podpse720 [inode]
podpse720 [mtime]
podpse720 [ctime]
podpse720 [bytes]
podbse115: cd26be65eea2554f5c2d83a86b4b1fd988b1f9f5 -> 22440191d031c6ceeb2e41d63484776817652409
#############################
Affected hosts report:
#############################
nyse34
/boot [mtime]
/boot [ctime]
/boot/config [mtime]
/boot/config [ctime]
/boot/kernel.h [mtime]
/boot/kernel.h [ctime]
/etc/gmond.conf [checksum]
/etc/gmond.conf [inode]
/etc/gmond.conf [uid]
/etc/gmond.conf [gid]
/etc/gmond.conf [mtime]
/etc/gmond.conf [ctime]
/etc/gmond.conf [bytes]
/etc/init.d/cfservd [checksum]
#############################
User change report:
#############################
missing: dthiel
se116
se720
se115
se312
se313
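For anyone building a similar report, the two views above (files by host,
hosts by file) can be produced from one pass over the change records. What
follows is only a minimal sketch, not the script used here: it assumes the
osiris syslog lines have already been parsed into (host, path, attribute)
tuples, and the names build_reports and render are hypothetical.

```python
from collections import defaultdict


def build_reports(changes):
    """Index (host, path, attribute) records by file and by host.

    The tuple layout is an assumption; parsing real osiris syslog
    lines into this shape is left to site-specific code.
    """
    by_file = defaultdict(lambda: defaultdict(list))
    by_host = defaultdict(lambda: defaultdict(list))
    for host, path, attr in changes:
        by_file[path][host].append(attr)
        by_host[host][path].append(attr)
    return by_file, by_host


def render(title, index):
    """Render one two-level index in the banner style shown above."""
    banner = "#" * 29
    lines = [banner, title, banner]
    for outer in sorted(index):
        lines.append(outer)
        for inner in sorted(index[outer]):
            for attr in index[outer][inner]:
                lines.append(f"{inner} [{attr}]")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("doppn1", "/etc/ssh/sshd_config", "inode"),
        ("doppn1", "/etc/ssh/sshd_config", "mtime"),
        ("peregrine", "/etc/ssh/sshd_config", "mtime"),
    ]
    files, hosts = build_reports(sample)
    print(render("Affected files report:", files))
    print(render("Affected hosts report:", hosts))
```

Grouping this way is what collapses an environment-wide change, such as an
sshd upgrade, into a single entry per file instead of one alert per host.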
Obviously these scripts are pretty site-specific and far from perfect,
but it should be pretty easy to come up with the glue you need for your
own environment. I had hoped to be able to switch over to using the
Host Integrity console, but it failed to materialize, at least with the
functionality needed for really large-scale environments.
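As a starting point for that glue, the LDAP-versus-osiris reconciliation
described earlier boils down to a port probe plus two set comparisons. The
sketch below is illustrative only: it uses Python's socket module in place
of tcping, takes the two host lists as plain iterables (fetching them from
LDAP and from the osiris host directory is site-specific), and the names
is_listening and reconcile are hypothetical.

```python
import socket

OSIRIS_AGENT_PORT = 2265  # agent port mentioned in the post


def is_listening(host, port=OSIRIS_AGENT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Resolution failures, refusals and timeouts all raise OSError
    subclasses, so a host that does not resolve counts as not listening.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reconcile(ldap_hosts, osiris_hosts, probe=is_listening):
    """Return (to_init, to_delete) candidate lists for operator review.

    to_init:   in LDAP, answering on the agent port, not yet in osiris.
    to_delete: in osiris but missing from LDAP, or not answering.
    """
    ldap_hosts = set(ldap_hosts)
    osiris_hosts = set(osiris_hosts)
    to_init = sorted(h for h in ldap_hosts - osiris_hosts if probe(h))
    to_delete = sorted(
        h for h in osiris_hosts if h not in ldap_hosts or not probe(h)
    )
    return to_init, to_delete
```

The function only returns candidates; prompting before initializing or
deleting a host, as described above, stays with the calling script.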
-David