22 November 2012

ColdFusion - Find common elements in two lists

I recently stumbled across an interesting programming question, how to find the duplicates in two lists? In this case I had two lists of email addresses and wanted to know which email addresses were in both lists. Sounds simple.

I'm curious to know what is the optimum approach, both in terms of simplicity and in terms of speed.

The first solution (compareLists) is the one I came up with, the second (listCommon) is one I adapted a little from Phillip's Coldfusion Blog.

<cffunction name="compareLists" access="public" returnType="string" output="false">
    <cfargument name="strList1" type="string" required="true" />
    <cfargument name="strList2" type="string" required="true" />

    <cfscript>
        var stuCompare        = {};
        var intCount        = 0;
        var strTempElement    = "";
        
        //loop through second list
        for(intCount=1; intCount LTE listLen(arguments.strList2); intCount++){
            strTempElement    = listGetAt(arguments.strList2,intCount);
            if(listContainsNoCase(arguments.strList1,strTempElement)){
                //Add key to struct
                stuCompare[strTempElement]    = "";
            }
        }
    
        return structKeyList(stuCompare);
    </cfscript>
</cffunction>

<cffunction name="listCommon" access="public" output="false" returnType="string">
    <cfargument name="strList1" type="string" required="true" />
    <cfargument name="strList2" type="string" required="true" />
     
    <cfset var arrList1 = ListToArray(arguments.strList1) />
    <cfset var arrList2 = ListToArray(arguments.strList2) />
     
    <cfset arrList1.retainAll(arrList2) />
     
    <!--- Return in list format --->
    <cfreturn ArrayToList(arrList1) />
</cffunction>

I was a little surprised you could use a java function like retainAll quite so nativity, but other than that it's fairly self explanatory.
So to time the methods, I took a leaf out of Ben Nadel's book and used getTickCount() to count the processing of 10,000 iterations of the above functions with two small lists.

Function Name Time (in milliseconds)
compareLists 849
listCommon 611
compareLists 352
listCommon 182
compareLists 232
listCommon 234
compareLists 208
listCommon 151
compareLists 80
listCommon 104

So as you can see, the listCommon method is much quicker. However it seems after the first few runs, caching kicks in and from then on the compareLists method remains very fast. I don't really want to get into caching, I'd rather focus on the differences between the two methods. Does anyone have any suggestions or input? Anyone got any examples of how they'd do this kind of thing in Java or possibly even something lower level?

1 comment:

Adam Cameron said...

Hey dude.
Hey, you might wanna get rid of those first two comments: they're spam (it's best not to encourage spammers, yes?)

Also, you should probably be using listFind() not listContains(): listContains() will find partial matches (eg: it'll find "mat" in "match").

Interesting that CFML is quicker than the Java equiv though... I guess it would be the overhead of converting to/from an array.

I suspect, though, that the performance would be the other way around with larger lists. List-processing in CF is rather slow. I imagine this will start being apparent with longer lists.

--
Adam