During the recent release of a new ASP.NET website, the launch went horribly wrong and required a rollback.
Setup
Front of house:
- Windows Server 2003 Standard
- IIS 6
- Microsoft .NET 1.1
- Oracle Data Provider v10.1.0.301
Back of house:
- Red Hat Enterprise Linux 4
- Oracle 10g Enterprise Release 2, clustered using RAC
Scenario
Whilst the code was in the development environment, we saw no issues. Once the code was published to a staging environment and the load on the temporary, underpowered server increased, errors started creeping in. At the time, we wrote them off as a combination of Oracle 10g Release 1 Standard and the server simply not having the resources to handle the load. We had previously seen poorly written legacy code bring a dual 3.2 GHz Xeon running Oracle 10g Standard Release 1 (dedicated server processes) to its knees with similar random errors.
Searching online pointed to a common theme: it was an Oracle Data Provider problem. Unfortunately, everything we found related to the version 9 data provider, while we were already running version 10. After four developers had spent a few hours trying to resolve the error, we logged a support request with Oracle via MetaLink. Over the following three hours, through a continuous stream of phone and email correspondence, one of Oracle's technicians informed us of a patch set available for our current Oracle Data Provider.
Patch Information
Patch Number | #4355425
Description | Oracle ODP.NET Patchset 10.1.0.3.04
Product | Oracle Data Provider for .NET
Release | Oracle 10.1.0.3
Bugs Addressed:
4228597 | OracleDataAdapter returning incorrect schema information
4205389 | ODP.net hangs with multiple WHEN clauses on CASE statement
4190650 | Direct path INSERT doesn’t work via ODP.net
4066828 | Unmanaged exceptions return -3000 error without further information
4028378 | Need attributes/methods on OracleParameter/OracleCommand classes
4020081 | ODP.net -3000 errors under high load
3937454 | Calling cancel before command execution causes an error
3930596 | Output bind variable initialized with blanks using ParameterDirection.OUTPUT
3897454 | Aborting selecting thread fails with internal error -3000
3893458 | ODP internal error -3000 in ExecuteReader() method
Out of the list of bug fixes, two items are worth highlighting:
- 4020081: As we were running short of servers during the development phase, we had Oracle 10g Release 1 Standard installed on a standard desktop machine with a single hard drive. Once the application was released to testing, the Linux load average on this machine was regularly reaching 15-20. This could have been contributing to the Data Provider Internal Error.
- 3893458: The legacy version of this application had been running on v10.1.0.301 of the Oracle Data Provider for nearly a year without any issues. That code base made very limited use of PL/SQL stored procedures, while all new code was being funneled through PL/SQL. Strangely, the legacy code would have been calling the ExecuteReader() method, while the new code was calling ExecuteNonQuery(). If this bug were the direct cause, you would expect it to be the other way around. I think it is still worth listing here, as the decision to funnel everything through PL/SQL is a fundamental shift in the application architecture.
It should be noted that without the help of the Oracle technician, we would never have found this patch on MetaLink in a timely fashion, which I consider a serious shortcoming of the site. However, after we downloaded and applied the patch manually, the random errors vanished immediately.
There was a lesson learned from this experience: we discounted the Oracle Data Provider as a likely problem early on because we were running a newer version of the provider than the one in the error reports we were finding. Looking back on it, we should have immediately upgraded the Oracle Data Provider to the latest public release, simply to rule out a bug in an older version.
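As a rough illustration of that check, here is a minimal Python sketch (not part of the site, which is .NET) that compares an installed provider version against the latest known patch set. The function names `parse_version` and `is_outdated` are hypothetical, and it assumes that our installed v10.1.0.301 corresponds to 10.1.0.3.01 in the patch set's five-part numbering, which Oracle never confirmed to us.

```python
def parse_version(text):
    """Split a dotted version string such as '10.1.0.3.04' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def is_outdated(installed, latest):
    """Return True when the installed version sorts before the latest release."""
    a, b = parse_version(installed), parse_version(latest)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so '10.1.0.3' compares equal to '10.1.0.3.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    # Assumption: the installed provider is written here in the same five-part
    # scheme as the patch set; the post itself records it as v10.1.0.301.
    installed = "10.1.0.3.01"
    latest = "10.1.0.3.04"  # the ODP.NET patch set described in this post
    if is_outdated(installed, latest):
        print(f"Provider {installed} is older than {latest}: upgrade before debugging further")
```

Comparing the versions as tuples of integers rather than as strings avoids being misled by leading zeros or by parts of differing length.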