Monitoring: events correlation on a timeline

June 30th, 2008

Another monitoring feature to put on a wish list.

At Oanda, FxNewsEffects shows a price graph with related news plotted.

It would be cool to have such a feature for monitoring: the graph would be whatever we monitor, and the news would be company-level events (new install, config change, network issues).

It might not be so far ahead:

Vigilo is a complete monitoring system designed for large environments (network and servers) thanks to a fully scalable and modular architecture. Built around Nagios, Vigilo integrates metrology graphs and events correlation. Vigilo also provides new features: notifications dashboard, centralized configuration tool, SNMP traps, etc.

Well, Vigilo is a bit too big for us, and a bit too young (e.g. its web site is only in French for now, though the docs are in English).

And what they call events correlation is a tool that couples graphs to make them easier to compare.

Still, that’s what we might be able to expect in the near future.

And we could start by putting all our events (releases, config changes, issues) on the same calendar, in a tool that has a programmatic interface.
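
As a rough idea of what such a programmatic interface could offer, here is a minimal sketch in Java. The class `EventTimeline`, its fields and its `near` method are hypothetical names made up for this illustration; they do not come from Vigilo or any other tool. It stores company-level events and returns those close in time to a point of interest on a monitored graph.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event log; all names here are illustrative only.
class EventTimeline {
    static final class Event {
        final Instant when;
        final String kind;         // e.g. "release", "config", "network"
        final String description;
        Event(Instant when, String kind, String description) {
            this.when = when;
            this.kind = kind;
            this.description = description;
        }
    }

    private final List<Event> events = new ArrayList<>();

    void add(Instant when, String kind, String description) {
        events.add(new Event(when, kind, description));
    }

    // Returns the events recorded within `window` before or after `point`,
    // for example the moment a monitored graph shows a spike.
    List<Event> near(Instant point, Duration window) {
        Instant from = point.minus(window);
        Instant to = point.plus(window);
        List<Event> hits = new ArrayList<>();
        for (Event e : events) {
            if (!e.when.isBefore(from) && !e.when.isAfter(to)) {
                hits.add(e);
            }
        }
        return hits;
    }
}
```

A graphing front end could then call `near(spikeTime, Duration.ofMinutes(30))` and plot the returned events next to the curve.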

Collaboration tools

June 25th, 2008

The more I look at it, the less I like eGroupware.

Collaboration web solutions are the wrong approach: they are front-end centric, and the GUI is the main feature, hiding the core services (calendar, directory, file sharing). At best you might have some import/export features…

It should be the other way round; the GUI should just be one interface amongst others to the core services.

Behind the scenes we should find LDAP for directories, WebDAV for file sharing (or a wiki replacement?), iCalendar for calendars.

This way we could find the contacts from any mail application, files directly in our finder/explorer, calendars in our calendar tools.

Information would be available in our software of choice or on mobile devices.
And it would work online and offline (merging or adding a record offline might not work, but the update would be automatic at the next connection).
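
To illustrate the calendar side, here is a minimal sketch of a core service exporting one entry as iCalendar text, which any calendar tool can then read. The class `ICalExport` and the `PRODID` value are made up for this example; the property names (`UID`, `DTSTART`, `SUMMARY`…) are the standard iCalendar ones. It is deliberately simplified: `DTSTAMP` reuses the start time, and special characters in the summary are not escaped.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical exporter; only the iCalendar property names are standard.
class ICalExport {
    // iCalendar UTC date-time form, e.g. 20080625T090000Z
    private static final DateTimeFormatter UTC =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'").withZone(ZoneOffset.UTC);

    static String event(String uid, Instant start, Instant end, String summary) {
        String crlf = "\r\n"; // iCalendar lines end with CRLF
        return "BEGIN:VCALENDAR" + crlf
                + "VERSION:2.0" + crlf
                + "PRODID:-//example//sketch//EN" + crlf   // placeholder product id
                + "BEGIN:VEVENT" + crlf
                + "UID:" + uid + crlf
                + "DTSTAMP:" + UTC.format(start) + crlf    // simplification: reuse the start time
                + "DTSTART:" + UTC.format(start) + crlf
                + "DTEND:" + UTC.format(end) + crlf
                + "SUMMARY:" + summary + crlf              // no escaping in this sketch
                + "END:VEVENT" + crlf
                + "END:VCALENDAR" + crlf;
    }
}
```

The point is that the GUI becomes just one consumer of this output among others.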

Actively Geographically Distributed Teams and WebDAV

May 26th, 2007

Geographically distributed team

Our setup for file version control (Subversion) was made for geographically distributed teams: one code repository that anyone can check out locally. The work is then done on that local copy.

Actively geographically distributed team

Now we are not just distributed, we also move around: we have a copy of the current work open on our desktops at the office, and another one at home.

We need to commit our code in the middle of a piece of work before moving from home to the office in order to find it again. And it is a pain because:

  • it creates many commits for the same feature you add to the software
  • if you forget to commit, you won’t be able to use your recent work

Well, there are different ways to overcome the first inconvenience: work in a different branch, commit as often as needed, and merge back to the main trunk only once the feature is implemented; or copy the code over and, at one of your workplaces, work on a copy not managed by Subversion.

WebDAV

Nearly all of us have heard of it, but what is it exactly?

It is a set of extensions to HTTP for editing files on a remote web server, intended to make the WWW writable… a big wiki.

I set it up at home; the Apache config is just a matter of loading the WebDAV module and defining the document directory (I didn’t go through user authentication and rights management, but I read that it is possible).

Then you can either browse the files with a web client (Firefox), or mount the remote file system like any other file system and from there edit the files.
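
Under the hood, both ways of browsing come down to HTTP requests that use extra WebDAV methods. As a minimal sketch (assuming Java 11 or later for `java.net.http`; the class name `WebDavList` and any URL passed to it are placeholders), this is how a client could ask a WebDAV server to list a directory with `PROPFIND`:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; the server URL is supplied by the caller.
class WebDavList {
    // Builds a PROPFIND request for a collection and its direct children (Depth: 1).
    // An empty body asks the server for all properties.
    static HttpRequest listing(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .method("PROPFIND", HttpRequest.BodyPublishers.noBody())
                .header("Depth", "1")
                .build();
    }

    // Sends the request; a WebDAV server normally answers 207 Multi-Status
    // with an XML body describing each file.
    static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + "\n" + response.body();
    }
}
```

Calling `send(listing(...))` only makes sense against a running WebDAV server; mounting the share as a file system hides exactly this kind of exchange.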

It could be possible to use it at Olsen to check out our code there, and work on the very same version of our checked-out files from wherever we might be.

Notes:

  • Subversion uses a version of the WebDAV protocol to exchange files and file properties over the net, but it doesn’t provide a remote filesystem
  • all of this was very easy… on a Mac… because WebDAV is built in the OS, it might require some extra work on another OS.

References:

  • how to set up webdav: http://www.gregwestin.com/webdav_for_ical.php
  • webdav info: http://en.wikipedia.org/wiki/WebDAV
  • webdav and subversion: http://subversion.tigris.org/webdav-usage.html

Object persistence

March 21st, 2007

We had a discussion with Vito and Poongodi about object persistence. Object persistence means storing objects on disk for history, or to bring an object back to its previous state after a restart. Poongodi presented Spring and Hibernate, and Vito presented db4o.

Pros

  • db4o is fast (claimed to be 55x faster than Hibernate), easy to use, and requires close to no configuration
  • Hibernate is backed by a standard database (which has drivers for any language), is backward compatible with our existing databases, has a big user base, and is supposed to be of industrial design

Cons

  • db4o is quite new, is not backward compatible, and has drivers for Java and .NET only
  • Hibernate requires writing an XML file for each mapped class

Description

Hibernate takes XML files describing the mapping of object data members to a relational database and interprets them at run time to read and update the database, so we could map objects to existing tables and thus keep backward compatibility. Spring is a framework that helps with database connectivity.

db4o uses the reflective capabilities of Java to store objects to files automatically.

Knowing MySQL, I would be tempted to stick to it, but the db4o docs say that it can do everything MySQL does (including replication). Actually it would even be possible to couple it with Hibernate and MySQL as a storage engine, but this is not a solution because it would bring the drawbacks of Hibernate into db4o.

Still, whatever we go for, we should use a DAO layer so that we can plug in whatever storage strategy we choose.
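
To make the DAO idea concrete, here is a minimal sketch. `Trade`, `TradeDao` and `InMemoryTradeDao` are hypothetical names chosen for the illustration, not classes from our code base. The rest of the application would depend on the interface only, so the storage strategy behind it can change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object, used only for this sketch.
class Trade {
    final String id;
    final double amount;
    Trade(String id, double amount) {
        this.id = id;
        this.amount = amount;
    }
}

// The application code talks to this interface and nothing else.
interface TradeDao {
    void save(Trade trade);
    Optional<Trade> findById(String id);
}

// In-memory stand-in; a Hibernate-backed or db4o-backed class
// would implement the same interface.
class InMemoryTradeDao implements TradeDao {
    private final Map<String, Trade> store = new HashMap<>();

    @Override
    public void save(Trade trade) {
        store.put(trade.id, trade);
    }

    @Override
    public Optional<Trade> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }
}
```

Switching from one persistence engine to another then means writing one new implementation of `TradeDao`, without touching the callers.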

Follow up

I have to make a decision… let me read a bit more about it…

Other links: Spring/Hibernate example, Torque, Java Data Objects

Disk full and bandwidth consumption tracking

March 19th, 2007

On Friday evening, ops12 started to complain about the disk being full… I didn’t see that until Sunday evening, a few minutes before Gary’s phone call.

Disk full

The disk got full because it was already near its limit (database extractions we did) and because of log growth. Logs should grow only up to a certain limit (fixed-size rotating log files, over a fixed number of files), so the disk space required should be deterministic… still, we didn’t compute the needed space.
Usually it occurs when someone upgrades their software (Oanda for example) and shuts down their services for a few hours: we then can’t connect and log it, ultimately our software stops, gets restarted, retries, re-logs… until it fills the disk.
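
The computation we skipped is simple: with rotating logs, the worst case for one log is the size limit per file times the number of files. Here is a minimal sketch, assuming `java.util.logging` is the logging library (the post does not say which one we use); `LogBudget` and its method names are made up for the example.

```java
import java.io.IOException;
import java.util.logging.FileHandler;

// Hypothetical helper; assumes java.util.logging, which may not be what we run.
class LogBudget {
    // Worst-case disk usage of one rotating log:
    // size limit per file times number of files kept.
    static long maxBytes(int limitBytesPerFile, int fileCount) {
        return (long) limitBytesPerFile * fileCount;
    }

    // Creates a rotating handler whose footprint is bounded (approximately,
    // since the limit is checked per record) by maxBytes(limit, count).
    static FileHandler rotatingHandler(String pattern, int limitBytesPerFile, int fileCount)
            throws IOException {
        return new FileHandler(pattern, limitBytesPerFile, fileCount, true);
    }
}
```

For instance, 5 files of 10 MB each give `maxBytes(10 * 1024 * 1024, 5)`, i.e. 52,428,800 bytes per application; summing that over every application on the machine gives the space to reserve for logs.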

Of course we have some checks and notifications for that… but usually things like that happen during the weekend.

Make sure we get SMS notifications for the production server.
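
As an illustration of the kind of check that should trigger such a notification, here is a minimal sketch using the standard `java.nio.file` API. The class `DiskCheck`, the 90% threshold in the usage note and the way the alert is sent are assumptions, not a description of our current monitoring.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical disk check; how the alert is delivered (SMS, mail) is out of scope.
class DiskCheck {
    // Fraction of the volume in use, between 0.0 and 1.0.
    static double usedFraction(long totalBytes, long usableBytes) {
        return totalBytes == 0 ? 0.0 : 1.0 - (double) usableBytes / totalBytes;
    }

    // True when usage of the volume holding `path` reaches `threshold`.
    static boolean shouldAlert(String path, double threshold) throws IOException {
        FileStore store = Files.getFileStore(Paths.get(path));
        return usedFraction(store.getTotalSpace(), store.getUsableSpace()) >= threshold;
    }
}
```

Run periodically with something like `shouldAlert("/var/log", 0.9)`, it leaves some margin before the disk is actually full.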

Main problems that show up are:

  • the database running out of space throws errors, producing more logs
  • our software gets no space to write its PID files and gets started again and again, producing more logs and possibly exhausting the bandwidth

After cleaning up, the quickest fix is to reboot the machine, so make sure your software is started by init scripts triggered by cron… hoping the machine reboots :)

Bandwidth consumption tracking

After rebooting we saw unusual bandwidth consumption (look at Sunday evening here):
[Graph: internet bandwidth consumption - weekly]

Then finding the machine that causes it is another problem. In the end it was ops12; you can tell by pattern matching:

[Graph: internet bandwidth consumption - hourly]
[Graph: ops12 traffic]

Here it is clear because it is caused by the AutoTrader, which I had to shut down for some time (not enough rates on DB, making too much noise, around 1 mail/minute, for me to work), so the pattern is clear. Otherwise you have to guess, and it is not easy because of the scale difference (look at the internet graph: its peaks are at 100 kB whereas ops12’s peaks are at 1 GB).
For the record: we didn’t change anything, so it must be DB that changed the way their servers behave.

No DB quotes on Friday evening and Monday morning

March 19th, 2007

This is a recurring problem: we get tons of SMS and email because we stop trading too late and restart trading too early.

DB and Oanda trading times are not the same… that’s why, and “exotic” currencies (EUR-HUF) take even longer to be quoted.

It is not a big deal because we net out the pending trades, so when we can trade again we trade the position difference at once (we might still miss a peak).
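
Netting out simply means replacing the queue of pending trades by their sum. A minimal sketch, with a made-up class name and signed amounts (positive for buy, negative for sell) as an assumed convention:

```java
// Hypothetical helper; signed amounts are an assumed convention.
class PositionNetting {
    // Single order equivalent to all pending trades:
    // positive means buy, negative means sell, zero means nothing to do.
    static long net(long... pendingTrades) {
        long total = 0;
        for (long trade : pendingTrades) {
            total += trade;
        }
        return total;
    }
}
```

With pending trades of +100, -40 and +10, a single trade of +70 is sent once quotes are available again.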

The solution is to release the notification filter I am supposed to code, and to improve the weekend detection, which is not sufficient now: I just check whether it is Saturday or Sunday, which means we still get emails until Friday midnight and from Sunday midnight.
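
A better weekend detection could treat Friday evening and Monday morning as part of the break. Here is a minimal sketch; the class `TradingWeek` is made up, and the closing and opening times are parameters because the real market hours are not given here and differ between DB and Oanda. The date-time passed in is assumed to be already expressed in the market's time zone.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper; closing and opening times are caller-supplied assumptions.
class TradingWeek {
    // True when `now` falls in the quiet period that starts on Friday at
    // `fridayClose` and ends on Monday at `mondayOpen`.
    static boolean isWeekendBreak(LocalDateTime now, LocalTime fridayClose, LocalTime mondayOpen) {
        DayOfWeek day = now.getDayOfWeek();
        LocalTime time = now.toLocalTime();
        switch (day) {
            case SATURDAY:
            case SUNDAY:
                return true;
            case FRIDAY:
                return !time.isBefore(fridayClose);   // from the close onwards
            case MONDAY:
                return time.isBefore(mondayOpen);     // until the open
            default:
                return false;
        }
    }
}
```

The notification filter could then drop "no quotes" alerts whenever `isWeekendBreak` returns true, with a separate pair of times per counterparty or currency.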

Parsing generic Enum

March 16th, 2007

I hadn’t noticed these interesting methods of Enum before:

public static <T extends Enum<T>> T valueOf(Class<T> enumType, String name)
public final Class<E> getDeclaringClass()

They make it possible to parse an enum from a generic method:

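import java.sql.ResultSet;
import java.sql.SQLException;

public class Field<T> {
    public String name;  // column name; its declaration is missing from the original snippet
    public T value;      // must be initialized so that its runtime type can be inspected

    // Reads this field from the result set; FieldParseException is our own exception class.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public void parseOtsdb(ResultSet rs) throws FieldParseException {
        try {
            if (value instanceof Double) {
                value = (T) (Double) rs.getDouble(name);
            } else if (value instanceof Enum) {
                // getDeclaringClass() gives the enum type, whatever T is
                value = (T) Enum.valueOf(((Enum) value).getDeclaringClass(), rs.getString(name));
            } else {
                throw new FieldParseException("unrecognized type: " + value);
            }
        } catch (SQLException e) {
            throw new FieldParseException("unrecognized field: " + name + ", exception: " + e.getMessage());
        }
    }
}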

Isn’t that cool? Whatever the Enum type of your Field instance, it will be parsed all the same!
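
To try the trick without a database, here is a minimal self-contained sketch of the same idea, with a plain `Map` standing in for the `ResultSet`. `EnumField` and the `Side` enum are made up for the demonstration. Note that `getDeclaringClass()` is used rather than `getClass()`, because an enum constant with its own body has an anonymous subclass as its class.

```java
import java.util.Map;

// Hypothetical stand-alone variant; a Map replaces the ResultSet.
class EnumField<T> {
    enum Side { BUY, SELL }   // demo enum, not from the real code

    final String name;
    T value;                  // needs a non-null sample so the runtime type is known

    EnumField(String name, T initial) {
        this.name = name;
        this.value = initial;
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    void parse(Map<String, String> row) {
        if (value instanceof Enum) {
            // The current value tells us which enum type to parse into.
            Class enumType = ((Enum) value).getDeclaringClass();
            value = (T) Enum.valueOf(enumType, row.get(name));
        } else {
            throw new IllegalArgumentException("unrecognized type: " + value);
        }
    }
}
```

Creating `new EnumField<>("side", EnumField.Side.BUY)` and parsing a row where `side` is `SELL` leaves the field holding `Side.SELL`; an unknown constant name makes `Enum.valueOf` throw `IllegalArgumentException`.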

Report

March 16th, 2007

In case anybody wonders, I am now setting up automated roll-over for the DB AutoTrader.

You can see activity here.
I have to finish that pretty quickly to move on to the Oanda Java port, which is needed to test OIP.

Some work has already been done for Oanda; for example, the position checker already runs on the code of the future Oanda Java AutoTrader.

The OIP modification is that a user has several accounts: right now we just take the first account, whereas here we have to specify which one we should trade.