skip to main content

Use PATINDEX to match data from another table

One of my favorite SQL Server functions is PATINDEX. It can be used to get the position of pattern in an expression or a table column.

As per the official docs, the most basic usage is as follows.

SELECT PATINDEX('%er%', 'interesting data');

This returns 4.

I have used PATINDEX similarly to find whether a pattern exists in an expression or to extract a pattern out of an expression.

Say, I have a column of data which contains PID (process ID) information and may contain the word PID more than once. I need to get the PID 88920, which is the real PID, in my contrived case.

DECLARE @UnstructuredText VARCHAR(100)
SET @UnstructuredText='PIDXWWEPID88920;-LSKPIDD199922;PID9911AKKAA'

To check the existence of the pattern, we can use IIF and PATINDEX. IIF is a recent addition, so if you are using SQL Server 2012 and below, you can use a CASE WHEN END to do the same thing.

SELECT IIF(PATINDEX('%PID[0-9;]%', @UnstructuredText) > 0, 'yes', 'no')

which would return yes.

To extract the PID, we could employ CHARINDEX and SUBSTRING along with PATINDEX.

DECLARE @UnstructuredText VARCHAR(100)
SET @UnstructuredText='PIDXWWEPID88920;-LSKPIDD199922;PID9911AKKAA'
DECLARE @Start INT
DECLARE @End INT
SET @Start = PATINDEX('%PID[0-9;]%', @UnstructuredText)
SET @End = CHARINDEX(';', @UnstructuredText, @start)
SELECT REPLACE(SUBSTRING(@UnstructuredText, @Start, @End-@Start),'PID','') AS REALPID

This returns the real PID.

REALPID
------
88920

Recently, I came across a scenario where I would need to match a set of strings from one table to another. I gave PATINDEX a try and amazed that this could be done and that I never even thought about this.

Say, I have a table, CompaniesByInvestmentType, that holds a list of types of investments. This data is derived from an XML file and is a bit unstructured.

CREATE TABLE CompaniesByInvestmentType (InvestmentType VARCHAR(MAX))
INSERT INTO CompaniesByInvestmentType VALUES
('Funds - ABC Insurance and Funds Co'),
('Bonds - XYX Financial Holdings'),
('Alternative - GLS Investments'),
('Stocks - ABA Stocks and Insurances'),
('Real estate - Bluth Company'),
('Real estate - AAA Realtors'),
('Stocks - B2 Home Loan and Stock Management'),
('Real estate - Pioneer Housing Projects'),
('Funds - BAK Stocks and Funds'),
('Stocks - Newstock Management'),
('Real estate - BBP Classic Realty'),
('Funds - Greenrock Infra and Mutual Funds'),
('Alternative - MS Options and Futures')
SELECT InvestmentType from CompaniesByInvestmentType
InvestmentType
------------------------
Funds - ABC Insurance and Funds Co
Bonds - XYX Financial Holdings
Alternative - GLS Investments
Stocks - ABA Stocks and Insurances
Real estate - Bluth Company
Real estate - AAA Realtors
Stocks - B2 Home Loan and Stock Management
Real estate - Pioneer Housing Projects
Funds - BAK Stocks and Funds
Stocks - Newstock Management
Real estate - BBP Classic Realty
Funds - Greenrock Infra and Mutual Funds
Alternative - MS Options and Futures
(13 row(s) affected)

I have another table, AllowedInvestmentTypes, that has the allowed investment types.

CREATE TABLE AllowedInvestmentTypes (AllowedType VARCHAR(50))
INSERT INTO AllowedInvestmentTypes VALUES
('Funds'),
('Alternative')
SELECT AllowedType FROM AllowedInvestmentTypes
AllowedType
------
Funds
Alternative
(2 row(s) affected)

To get all the companies that fall under these two types, we can join both the tables on the condition that we match the start of the type and that the match ends with a hyphen.

SELECT 
    InvestmentType AS AllowedInvestmentType 
    FROM CompaniesByInvestmentType C
INNER JOIN
    AllowedInvestmentTypes AIT
ON 
    PATINDEX(AIT.AllowedType + '-%', REPLACE(C.InvestmentType, " ", "") = 1

This query will fetch us only Funds and Alternative investment types.

AllowedInvestmentType
--------------------
Funds - ABC Insurance and Funds Co
Alternative - GLS Investments
Funds - BAK Stocks and Funds
Funds - Greenrock Infra and Mutual Funds
Alternative - MS Options and Futures

References:
patindex workbench