Thought Delimited: Database techniques

Showing posts with label Database techniques. Show all posts

Monday, April 29, 2013

Quick CFML Tips: Script-based Queries And Tag/Script Hybrid CFCs

While refactoring a particular process in a ColdFusion project at work, I learned two things I thought worth passing on:

1. There are certain scenarios where doing queries in cfscript, instead of using the <cfquery> tag, does come with a significant performance penalty. In my case, the scenario was one where I was taking two recordsets created by querying the database and running subqueries against those recordsets in a recursive fashion. My guess is that instantiating the query object needed to perform query functions in cfscript (which I was doing with each recursion) was the main issue. When I converted the queries and functions involved in the recursive process to use tag-based syntax, the amount of time it took the process to run dropped by over 33%.

Not saying that this means you should avoid doing script-style queries, just that if there are a large number of such queries in a long-running process, you might want to experiment with refactoring them.

2. I already knew that I could have one or more cfscript blocks inside the body of a CFC function, but I didn't know that I could combine tag-syntax functions and script-syntax functions in the same CFC like so:

<cfcomponent accessors="true">
    <cffunction name="doX" output="false" returntype="void">
        ...
    </cffunction>

    <cfscript>
        public void function doY () {
            ....
        }
    </cfscript>
</cfcomponent>

...it had never crossed my mind to try it that way. Certainly useful when you need to run a CFML function that has no cfscript equivalent in whatever version of CFML engine you're using (I should note that I was using ColdFusion 9.0.1 in both of the above cases).

Wednesday, March 13, 2013

Adding Oracle Support to BugLogHQ: How the Application Design Made it Easy

As I mentioned in my previous blog post, in order to try out BugLogHQ in my work environment, I knew I would have to update the code so it could run against Oracle database tables. So I started looking through the code to see how difficult of a task that would be, to see if it was worth the trouble. Changing the code to support another database would mean at least looking at (if not updating) every block of code that performed a SQL operation: if there were a lot of such blocks, it could be a time-consuming task.

Fortunately for me, BugLogHQ was designed to abtract most of the SQL operations behind a DAO layer. Each database table used by BugLogHQ is mapped to a DAO CFC in the components/db folder of the BugLogHQ application, and those CFCs spell out the name and cfqueryparam data type for each table field and inherit getter and setter-type functions from a base DAO class. When it comes time to persist data from instantiated DAO objects into the database, the DAOs are processed by the dbDataProvider.cfc in the components/lib/dao/ folder, and that CFC contains the SQL code for performing all of the CRUD operations.

(I should point out that this is a somewhat simplified description of all of the "moving parts" involved in data persistence in BugLogHQ, but after following the code through the process of adding and removing records I determined that I didn't need to make changes to other objects such as the DAO factory).

The main issue I needed to address was the fact that the current code for inserting a new record took advantage of the autoincrement feature in the other databases supported by BugLogHQ (MySQL, MS SQL, etc.). Oracle doesn't come with an autoincrement field option. Instead, Oracle has what are called sequences, which are essentially autoincrementing numbers that exist separately from the database tables (meaning you could, if you wanted to, pull the next integer from a particular sequence into any table). So in the SQL install script I wrote to create the needed tables in Oracle (mirroring the install scripts for the other database engines present in the BugLogHQ install folder), I added lines to create sequences for each table, with each sequence name being the table name with "_seq" appended to the end of it. I then added code to the _insert() function in the dbDataProvider.cfc to use the sequences for the primary key value when inserting a record:

  
<cffunction name="_insert" access="private" returntype="any">
  <cfargument name="columns" required="true" type="struct">
  <cfargument name="_mapTableInfo" type="struct" required="true">
  ...  
  <cfset var dbtype = variables.oConfigBean.getDBType()>
  <cfset var lstFields = structKeyList(arguments.columns)>
  <cfset var tableName = arguments._mapTableInfo.tableName>
  <cfset var seqName= tableName & "_seq">
  <cfset var pkeyName= arguments._mapTableInfo.PKName>
  ...
  <cfif dbtype eq "oracle">
    <cftransaction>	
      <cfquery name="qry" datasource="#DSN#" username="#username#" password="#password#">
        INSERT INTO #getSafeTableName(tableName)# (#pkeyName#,#lstFields#)
        VALUES (
        #seqName#.nextVal,
        <cfloop list="#lstFields#" index="col">
          <cfqueryparam cfsqltype="#arguments.columns[col].cfsqltype#" value="#arguments.columns[col].value#" null="#arguments.columns[col].isNull#">
          <cfif i neq listLen(lstFields)>,</cfif>
            <cfset i = i + 1>
          </cfloop>
        )
      </cfquery>
				
      <cfquery name="qry" datasource="#DSN#" username="#username#" password="#password#" result="qryInfo">
        select #seqName#.currVal as lastId from dual
      </cfquery>
    </cftransaction>
    <cfset newID = qry.lastID>
  <cfelse>
  ...

So if the dbtype value (which comes from the config/buglog-config.xml.cfm file) is "oracle", the Oracle insert code block will execute. The first cfquery block will insert the record, using the nextVal() function of the sequence associated with the table to get the next value of the sequence, while the second cfquery block retrieves the current/latest value of the sequence via Oracle's DUAL "table" (a construct designed to perform pretty much any utility operation that returns a single record or value).

The other issue I needed to address was adjusting some of the field data types. Oracle's "varchar2" data type can only hold 4,000 characters of text; to store larger amounts of text, you need to use fields with the CLOB data type. And in order to use cfqueryparam with such fields, you need to set the cfsqltype parameter of cfqueryparam to "cf_sql_clob". So I updated the code in the entryDAO and extensionDAO CFCs to handle that (with the entryDAO.cfc code shown below):

<cfif variables.oDataProvider.getConfig().getDBType() EQ "oracle">
  <cfset addColumn("exceptionDetails", "cf_sql_clob")>
  <cfset addColumn("HTMLReport", "cf_sql_clob")>
<cfelse>
  <cfset addColumn("exceptionDetails", "cf_sql_varchar")>
  <cfset addColumn("HTMLReport", "cf_sql_varchar")>
</cfif>

After that, all that I had left to do was to add Oracle-specific conditional blocks to two other queries (ones that involved using database functions specific to each database platform) in two other files (components/hq/appService.cfc and components/lib/entryFinder.cfc), and I was done.

Had I done all of the changes I just described perfectly on the first try - which admittedly I didn't :) - it would have taken me less time than it did to write this blog post. And that's a mark of tight, well-designed application architecture, where certain key aspects of application behavior are centralized in a manner that lets you add functionality (in this case database engine support) by updating just a few files responsible for handling that particular task.

Thursday, November 29, 2012

Quick ColdFusion Tip: Use ClearParams() When Doing SQL Inserts in a Loop

Today I had the need to write code that would loop over a recordset, parse the data in each iteration, and then insert the transformed data into another table. Nothing I haven't done countless times, but this time I was using the query functions in cfscript to do my inserts. I knew the resulting code was going to be a bit slow (as inserting thousands of rows takes time), but when I ran it it was a LOT slower than I expected.

After adding some code to time the various parts of my routine, I discovered that the time it took for each insert transaction was steadily growing with each iteration of the loop, to the point where it was taking 500+ milliseconds per insert. But why?

Then I saw the problem. I had forgotten to invoke the query object's clearParams() function at either the beginning or end of my loop. Apparently ColdFusion will let you create a query parameter with the same name attribute using addParam() - as was happening in my loop - and not throw an error (which is what I would have expected to happen), but it leads to a performance issue with the SQL execution.

In the few times where I've reused a query object with different parameters, I've been careful to use clearParams(), but I simply overlooked it this time. Lesson learned.

Sunday, December 4, 2011

Announcing tableSnapshots: A ColdFusion Tool For Preserving and Reverting Database Table Data

Several months ago, I was doing some functional testing on a web application that was taking forever. Each run of the test resulted in changes to the data in several tables, and in order to reset the test I had to go into the tables and undo the changes. It occurred to me that it would be nice to have a tool or script that could preserve the current table data and then later write that version of the data back to the tables.

That marked the beginning of the tableSnapshots tool.

Creating the initial version of the tool just to suit my own needs was quick and easy compared to the work I've put into the tool since then to make it worth sharing, but I'm pretty happy with the result. It's essentially a small web application that you can drop into any folder on your ColdFusion-enabled (ColdFusion 8 or 9) web server. Once you update the configuration settings in its Application.cfc file with the name of your datasource, the tool enables you to:

Take a "snapshot" of one or more of your database tables, storing all of the current table data as JSON data in a text file (NOTE: this tool will not work on tables with binary data fields).
Replace the current data in a table with the data stored in a snapshot, creating a backup snapshot of the data being replaced in the process.
View the data in a snapshot or backup snapshot in a jQuery-enhanced table that allows you to sort and filter the data.
Delete older snapshot and backup snapshot files.

All of this functionality is available via a web interface. Creating a snapshot is as easy as clicking on the "Create" menu option, selecting the tables from your datasource that you want to generate snapshots from an HTML table (enhanced with the jQuery Datatables plugin), and clicking a button. You reload the data from one or more snapshot files in the same fashion (though of course reloading the data takes longer), and you can view the data stored in any snapshot or backup file in tabular format. The tool even makes it easy to delete snapshot and backup files you no longer need so you don't have to manually delete them from your file system.

The tool serializes the table data using the functions provided in the JSONUtil project because the "strictMapping" option provided by JSONUtil avoids the data conversion issues that can occur with the native ColdFusion JSON functions (such as strings being converted to numbers and "true"/"false" string values being converted to "yes/no"). However, it uses the native JSON functions for deserializing the data because they seem to perform better when working with large data sets.

This tool was built and tested on ColdFusion 9.0.1. It should work on Adobe ColdFusion 8, and perhaps other CFML engines as well, but it has not been tested in those environments. It relies on the <cfdbinfo> tag to acquire the data types of the table fields it processes, so access to that tag is a requirement.

The current version supports Apache Derby, MySQL, and Oracle: there is one .cfc file in the tool for each of these databases that tells tableSnapshots how to map the data stored in JSON format to the relevant table fields during the snapshot reload process. So adding support for additional databases simply means writing similar .cfc files and placing them in the main tool directory. Further details about how tableSnapshots works and what configuration options are available are described in the "Technical Details" page within the application itself.

As I said at the beginning, the original purpose of tableSnapshots was to make it easy to rollback data in multiples tables during testing. But there are a couple of other potential uses as well:

You could use it to simply track changes to data over time (since each snapshot file is timestamped).
If for some reason you don't have an IDE or database client program that lets you browse table data, you can create snapshots and view the data with the snapshot view option (which gives you the ability to sort and filter the data).
You can use it to copy table data from one type of database server to another. I used it for this purpose myself: I made a snapshot of the data, created a similar table with a different name on the other datasource, renamed the snapshot file to match that different name, reconfigured and reset tableSnapshots to interact with the new datasource, and "reloaded" the snapshot data into the new table.
If your development database isn't being backed up as well or as often as you'd like, you could use the snapshots as your own backup.
You can create snapshots as data for use while offline or in mock objects. tableSnapshots comes with a mini-tool in the "tools/jsonasquery" subfolder that can be used to pull snapshot data into a query object.

I do want to be clear that tableSnapshots is meant for use as a development tool. While there's no risk of data loss when taking a snapshot, reloading snapshots involves deleting data and no software is perfect, so make sure you take additional steps to safeguard any data that you're afraid to lose.

tableSnapshots is available for download from both RIAForge and GitHub, so have at it!

Wednesday, May 14, 2008

Table normalization verses long-term data storage

I'm currently working on an application that involves long-term storage of assessment data. Users submit records of their activities and assess their performance, and then reviewers look over those assessments and denote whether they agree or disagree with them. Each assessment database record is related to a reviewer through the unique reviewer id that is part of the assessment record, and I can use that relationship to retrieve the reviewer's name whenever I display the assessment record.

It's a standard example of table normalization. If the reviewer's name was stored within the assessment record itself, and the reviewer changed their name for some reason (marriage, divorce, mid-life crisis, etc.), the application would have to update the name in both the reviewer's record AND the assessment record. But by using the reviewer's id in the assessment record to establish a relationship between the assessment record and the reviewer record, the reviewer's name only needs to be recorded or updated once.

However, this project will entail keeping the assessment data for an undetermined number of years. With the data arrangement I just described, that means I would have to store the assessment records and all of the related reviewer records if I want to be able to keep showing the name of the reviewer when looking at older assessment records. That could result in keeping a lot of extra data about reviewers (addresses, e-mail addresses, logins, passwords, etc.) who are no longer associated with the program simply because we need to keep their name tied to the assessments.

I think this is one of those situations where it makes sense to repeat a little data. Recording the reviewer's name in the assessment records allows me to let the administrative users of the application delete reviewer user accounts without impacting historical data. It means a bit more work in keeping the reviewer's name the same in both records, but in the long run I think it's worth the effort.

Thursday, February 7, 2008

Switching Between Sets of Database Tables On The Fly

Ever need to tell your ColdFusion app to run select queries against a different set of tables while you do something to the original set (like perform a batch update)?

I did. Here's how I did it.

First, I gave each table in each set a two-part name, two words separated by an underscore. The first part of the name was descriptive of the data in the table ("chapters","sections", etc.) while the second part of the name identified the data set the table belonged to ("live", "archive", whatever). So the "chapters_live" and the "chapters_archive" tables were exactly the same in structure.

Then I created an application variable called "application.tableSuffix" and set its initial value to "live".

(Can you see where this is leading?)

Finally, I changed my select queries to use table names comprised of the first, data-descriptive part of the table name, the underscore, and the value of application.tableSuffix:

<cfquery name="showChapters" datasource="ds">
   select chapterTitle, chapterIntro
   from chapters_#application.tableSuffix#
   order by chapterOrder ASC
</cfquery>

Once that was done, I could change the table set my select queries ran against at any time by simply changing the value of application.tableSuffix. Two things worth noting:

This technique only makes sense if only one set of tables can be updated by the user: if users can make changes to either set, you'll have to figure out how to resolve the differences between the data in each set.

If the switch could potentially be in place for awhile, you might want to store the current tableSuffix value in a database record or a file and then retrieve that data in the onApplicationStart method of your Application.cfc file so that the current value is preserved even if the application scope expires.