Wednesday, February 5, 2014

Database connection pooling problems

We have some problems running Oracle and ASP.NET application in different network segments. Similar to Garreth Hooper's problem in his blog, we encountered a ORA-12571 TNS: packet writer failure during period of server inactivity.
This all came from the best practice of connection pooling, which in my opinion a complex matter.

The Original Problems

In one side, we have application servers, hosting applications written either in ASP.NET, Java, or PHP, and on the other side we have database servers.
The first problem is, opening a connection from application to db server takes time, even though this is not a certainity (a lot of factors came into play, such as database server configuration, firewall setup, etc), this fact seems to encourage application developers to introduce connection pooling system to the architecture. 
The second problem comes from the fact that a connection open is a session open in the database server, and it almost always takes precious resources, so databases naturally want to ensure that idle connections doesn't linger too long.
The third problem, network firewalls want to police the network, so instead of database server, the network firewall does the connection disconnect in the name of network performance management.

Connection Pool

The connection pool, is a list of connections kept open to the database server. Even if we know nothing about this, the standard ODP.NET connection uses connection pools, and C3PO / DBCP connection pools are the norm in Java Spring Framework applications. So we must live with them. The thing is, connection pools prevents the application to actually close database connections. Instead of closing them, the close operation returns the connection to the pool.
If, for example, the application request a new connection 30 minutes after the close operation, the connection pool might returns the same connection. The connection, might be already killed by the database server, or killed by the network firewall. Which will cause us similar problem, both with Oracle database servers and MySQL database servers.
The configuration points that we might need to configure are :
1) the database server parameters
2) db connection string in the application
3) connection pool parameters in the application (this is the same as 2 on .NET platforms)

Solution A : Ensure connections are not killed

So, we could try to ensure that connections would not be killed at all. That includes configuring database server so there is no idle disconnect, and configure firewall to give special cases to database ports. Oh, the last one is impossible, I forgot. So we could, in oracle databases, enter this one-liner in NETWORK/admin/sqlnet.ora and restart the tns listener, so the database would always initiate ping/network activity on the connection after 10 minutes of idle time :
This would ensure the network firewall stays happy and prevent it from killing our connections.
Then we also needs to ensure the oracle user's profile is having unlimited idle time, so the database would not kill our connections :
Or, for MySQL databases, set the wait_timeout to a large value :
#The number of seconds the server waits for activity on a connection before closing it
#The number of seconds the server waits for activity on an interactive connection before closing it.

For MySQL, the configuration  you might want to set is keepalive setting in the client connection parameters.  For example, in the MySQL/net connection string :

For MySQL/Java connections :

Solution B Ensure connections are valid, reconnect if not

 In Java's DBCP connection pooling, we could set these parameters to ensure that each connections are valid before passing it to the application code :


 So all connections would be tested before use, preventing users from seeing connection errors.
For Oracle connections on .NET platform, we could set in the connection string as follows :
User Id=scott;Password=tiger;Data Source=oracle;Validate connections=true
For MySQL connections in .NET platform, the validation already hardcoded, there is no need to set any option (see the source).  


Please test suggestions in the article in a controlled test environment before applying them in the production environments, because my environment is definitely different from yours :).


No comments: