[osiris] Re: HELP! Hours of session key negotiation failure

Justis Peters josiris at vitrumenterprises.com
Fri Nov 10 02:32:35 EST 2006


John,

I hope that maybe you've found a solution by now.  This one has me
stumped.  As a side effect, though, you've piqued my curiosity about ISCS.

See below, in case my further grasping at straws turns up something
useful for you.

John A. Sullivan III wrote:
> Here is the telnet response to the failing client:
>
> [root@ltskbunk ~]# telnet mail1dc1 2265
> Trying 172.26.204.31...
> Connected to mail1dc1.atlasgroup.net (172.26.204.31).
> Escape character is '^]'.
> Connection closed by foreign host.
>
> Immediate closure.
>
> By the way, here is the tcpdump of that telnet - same pattern as the
> osiris connection attempt.  SYN SYN ACK initiated by the manager
> followed by FIN FIN ACK initiated by the failed client.  Almost like it
> doesn't like the proposal.
> <snip />
>   
> Here is the trace of the successful telnet:
> <snip />
> SYN SYN ACK PUSH ACK all initiated by the manager.
>
> In the failed case, the client seems to dislike something the manager is
> doing.
>   
It seems odd that the client sends its FIN immediately after finishing
the handshake.  Unless I'm missing something obvious, there isn't even
anything for the client to dislike.  There's been no data received from
the manager.
Based on that premise, I think this has very little to do with session
keys.  The log message is technically correct, since it says, "daemon
did not present session key."  There are so many other possible points
of failure before getting that far, though.
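
If you'd rather not eyeball telnet and tcpdump each time, here is a
rough sketch of the same check in Python.  It only connects and reports
what the peer does first.  The function name "probe" and the result
labels are my own invention, not anything from Osiris, and it assumes
you have Python 3 on the machine you run the test from.

```python
import socket

def probe(host, port, timeout=3.0):
    """Connect to host:port and report what the peer does first.

    Hypothetical helper, not part of Osiris.  Returns one of:
      "refused" - nothing is listening on the port
      "closed"  - peer completed the handshake, then sent FIN with no data
      "reset"   - peer tore the connection down with RST
      "data"    - peer sent at least one byte
      "silent"  - connection stayed open but nothing arrived in `timeout`
    Other connection errors (unreachable host, connect timeout) propagate.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    with sock:
        try:
            chunk = sock.recv(1)
        except socket.timeout:
            return "silent"
        except ConnectionResetError:
            return "reset"
        # An empty read means the peer closed its side without sending data.
        return "data" if chunk else "closed"
```

Something like probe("mail1dc1", 2265) against the failing client should
come back "closed" if it matches your telnet result.  I can't say what
the working client returns; your trace suggests it at least stays open.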

So, here's the grasping at straws.  Consider these points, but keep in
mind that the cause could be something else entirely.  I'm reaching a
little beyond my knowledge at this point:
1) Is it possible that the client is simply experiencing a segmentation
fault or respawning for some reason?
2) From the client machine that's failing, try "telnet localhost 2265"
and repeat the tcpdump capture you used before.  I think we've
already established that this probably isn't networking, but I've still
got slight doubts.  With the extra layers you've added in with Xen and
ISCS, it just makes it harder to know if you've eliminated problems that
are outside the application.  For this test, we just want to get as
close to the socket as we can.
3) If you think the results of that test warrant it, then try the same
thing but with netcat.  You can shut down the osirisd client and then
run "netcat -l -p 2265".  Now that netcat is listening on port 2265, try
your "telnet localhost 2265" test again.
4) Can you think of anything regarding Xen or your tightened security
that could screw up OpenSSL?  It might be having difficulty getting to
the random number generator or the cryptography libraries.
5) You might consider hacking the osirisd code to include a few extra
log messages, to provide context as to where the failure occurs.
Alternatively, you could try some sort of debugger (like gdb) and step
through the code.  Which one you choose depends on which you think has
the steeper learning curve.  Personally, I think the log messages are
easier.
6) Maybe in the morning someone else on the list will chip in with other
ideas.
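
In case netcat isn't installed on the client for point 3, the listener
can be approximated with a few lines of Python.  This is only a sketch
of a stand-in for "netcat -l -p 2265", with osirisd shut down first.
The names "make_listener" and "serve_one" are hypothetical, and the
loopback default is my assumption so that it matches the "telnet
localhost" test.

```python
import socket

def make_listener(port, host="127.0.0.1"):
    """Open a TCP listening socket (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding right after osirisd has released the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock

def serve_one(listener, hold=5.0):
    """Accept one connection and keep it open until the peer closes
    or stays idle for `hold` seconds.  Returns (peer address, bytes
    received)."""
    conn, peer = listener.accept()
    received = b""
    with conn:
        conn.settimeout(hold)
        try:
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break  # peer closed its side
                received += chunk
        except socket.timeout:
            pass  # peer went quiet; give up and close
    return peer, received
```

Running serve_one(make_listener(2265)) and then "telnet localhost 2265"
in another terminal should leave the telnet session open instead of
closing it immediately.  If it still closes right away, that would
point at something below the application.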

Best of luck, John.  I hope you find the solution soon.

Kind regards,
Justis



