AI in Payment Fraud Detection

May 2, 2024

Wallets, keys and phones are objects we all pay close attention to. Because they matter to us, we take care not to have them stolen. The same, however, cannot always be said for our financial data. Credit card information is frequently stolen digitally and then used for payment fraud. According to estimates, payment fraud cost e-commerce 42 billion dollars in 2022. For businesses, it can result in significant financial losses, damage to reputation, and legal liabilities. Firms also incur costs for investigating and resolving fraud cases, as well as potential fines and penalties for failing to meet security standards. The most common types of payment fraud are:

Phishing: A form of social engineering in which attackers persuade people to reveal sensitive information or to install malware.
Skimming: The use of a small electronic device, called a skimmer, to capture victims’ card numbers.
Identity fraud: The unauthorized use of other people’s personal and financial information.
Refund fraud: A type of payment fraud in which an individual or a group of people falsely claims a refund or reimbursement from a company, government, or financial institution.

To fight these crimes, payment providers, banks and governments have started to implement payment fraud detection systems. Two of the most important regulations and industry standards that also help counter payment fraud are:

General Data Protection Regulation (GDPR)
Payment Card Industry Data Security Standard (PCI DSS)

Traditionally, anti-fraud systems were rule-based: they relied on predefined rules to identify potential instances of fraud. For example, if a transaction of an unusually large amount occurs, the system can flag it and ask for a human review. However, this approach has an important limitation: it struggles to adapt to evolving fraud patterns, which makes it less effective against new fraud techniques. For this reason, an anti-fraud approach based on machine learning has been adopted in recent years. Thanks to its ability to analyze large volumes of data, machine learning enables financial institutions and payment providers to detect payment fraud that humans would normally not recognize. The principal applications of machine learning in payment fraud detection are:

Anomaly detection: Identifying unusual patterns and deviations from normal transaction behavior. This includes detecting transactions with unusual amounts, unusual transaction times, or transactions from unusual locations.
Network analysis: It uses graph analysis to detect unusual relationships between financial entities, helping uncover criminal organization activities and networks of fraudulent actors. This involves analyzing transaction networks to identify clusters of connected entities engaging in suspicious behavior.
Identity verification: It uses machine learning to verify user identity information, such as identity documents or facial recognition data. This includes techniques like document authentication, biometric verification, and behavioral analysis to ensure the authenticity of user identities.
Text analysis: It uses unsupervised learning to analyze unstructured data, such as email and message text, to detect keywords indicating fraud. Natural language processing techniques are used to extract meaningful information from text data, such as identifying phishing attempts or fraudulent communications.
Risk rating: It utilizes machine learning algorithms like logistic regression to assign a risk level to transactions. Transactions exceeding a certain risk threshold are flagged as potentially fraudulent and subjected to human review. Other techniques, such as decision trees or random forests, are also used for risk assessment.
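To make the contrast between a fixed rule and the anomaly detection idea from the list above more concrete, here is a minimal sketch. It is only an illustration: the names `exceeds_limit` and `is_anomalous`, the 10,000 limit, the z-score threshold of 3 and the sample amounts are all assumptions made for this example, not values taken from a real system.

```python
from statistics import mean, stdev

# Hypothetical fixed limit for the rule-based check (illustrative value only).
AMOUNT_LIMIT = 10_000.0


def exceeds_limit(amount):
    """Rule-based check: flag any amount above a fixed, hand-written limit."""
    return amount > AMOUNT_LIMIT


def is_anomalous(amount, history, z_threshold=3.0):
    """Anomaly check: flag an amount far from the cardholder's usual spending.

    The z-score measures how many standard deviations the amount lies
    from the mean of the historical amounts.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold


# Made-up past amounts for one cardholder.
history = [20.0, 35.0, 18.0, 42.0, 27.0, 31.0, 25.0, 38.0]

for amount in (30.0, 950.0):
    print(amount, exceeds_limit(amount), is_anomalous(amount, history))
```

In this toy example, a purchase of 950 passes the fixed rule but stands out against the cardholder's own history, which is the kind of deviation a static limit can miss.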

Among all these different fraud detection systems, we now focus on a technical description of how risk rating works.

Logistic Regression

A popular way to build a fraud detection system is through logistic regression, a widely used classification algorithm that predicts a binary outcome from a set of independent variables. In this case, the variables are the information available about the payment, such as the amount, time of day, location and purchased goods, which are evaluated to predict the probability that the payment is fraudulent. We will now walk through the steps involved in building a model of this sort, from cleaning the training database to testing and evaluation.

Data Cleaning and database division

The first phase consists of selecting a database on which to train the model. We chose a database of transactions made with credit cards by European cardholders over two days in September 2013. Since fraud represents a small minority of transactions, the data is very unbalanced: of the 284,807 transactions in the sample, only 492 are fraudulent (0.172%). [data available here]

Since the data is often noisy, we must clean it and prepare it before training the classifier. During this procedure we fill in any missing values and remove outliers; failing to do so could lead to poor model performance or errors in the training phase. First, we fill the missing values with the mean of the corresponding variable. Then, outliers are removed with a clustering method that divides the dataset into three bins: two for the high and low outliers and one for the rest. The first two are then discarded.

Finally, the database is divided into training and testing samples. The training set is used to construct the classifier (model), while the testing set is used to test (evaluate) the built classifier. In this work, the cross-validation method is used to divide the database:

As shown in the figure, the database is divided into 10 parts. In the first iteration (𝑘 = 1), the first nine parts form the training set, while the last part is the testing set. In the second iteration (𝑘 = 2), the first eight parts together with the tenth part form the training set, while the ninth part is the testing set. This process continues until the last iteration (𝑘 = 10), where the first part is the testing set and the last nine parts are the training set.
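The cleaning steps and the fold layout described in this section can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the exact procedure used here: the outlier step uses percentile cut-offs (5th and 95th, chosen arbitrarily) as a stand-in for the clustering method, and the function names and sample values are invented for the example.

```python
from statistics import mean, quantiles


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


def drop_outliers(values, lower_pct=5, upper_pct=95):
    """Split values into low / middle / high bins and keep the middle bin.

    Assumption: percentile cut-offs replace the clustering step.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if low <= v <= high]


def ten_fold_splits(records, n_folds=10):
    """Yield (k, train, test) in the iteration order described above."""
    fold_size, remainder = divmod(len(records), n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(records[start:end])
        start = end
    for k in range(1, n_folds + 1):
        test_index = n_folds - k  # k=1 -> last part, k=10 -> first part
        train = [r for i, fold in enumerate(folds) if i != test_index for r in fold]
        yield k, train, folds[test_index]


# Made-up values: fill first, then remove outliers, then split.
filled = fill_missing_with_mean([10.0, None, 14.0, 12.0])
amounts = [12.0, 15.5, 14.0, 13.2, 900.0, 11.8, 0.01, 14.7, 13.9, 12.6]
kept = drop_outliers(amounts)
splits = list(ten_fold_splits(list(range(20))))
```

Each of the ten splits uses a different part as the testing set, so every record is used for testing exactly once.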

Building the classifier

When building the classifier, logistic regression is preferred over linear regression because its output is bounded between 0 and 1 and can be read as a probability, which suits a binary classification task. Logistic regression takes the data as input (interpreted as variables) and returns as output the estimated probability that the transaction is a fraud, call it $y$. The mathematical steps to obtain the logistic equation from linear regression are given below.

The equation of the straight line can be written as:

$$y=a_0+a_1\times x_1+a_2\times x_2+\dots+a_k\times x_k$$

where $x_1,x_2,\dots,x_k$ are the variables and $a_0,a_1,a_2,\dots,a_k$ are the coefficients that we are going to estimate during training.

In logistic regression, $y$ can only lie between 0 and 1, whereas a straight line is unbounded. Instead of modelling $y$ directly, we divide $y$ by $(1-y)$ to obtain the odds, defined as the ratio of favorable to unfavorable outcomes. The odds range from 0 to infinity:

$$\frac{y}{1-y}=\begin{cases}0 & \text{for } y=0\\ \infty & \text{as } y\to 1\end{cases}$$

Taking the logarithm of the odds gives the logit, so the logistic regression equation is defined as:

$$\log\left(\frac{y}{1-y}\right)=a_0+a_1\times x_1+a_2\times x_2+\dots+a_k\times x_k$$

The logit can take any value from negative to positive infinity. To transform it back into a probability, which must be between 0 and 1, we apply the inverse logit function, also known as the sigmoid function:

$$y=\frac{1}{1+e^{-(a_0+a_1x_1+\dots+a_kx_k)}}$$

The fraud class takes the value “1”, while the non-fraud class takes the value “0”. A threshold of 0.5 is used to differentiate between the two classes: a transaction with $y$ above 0.5 is classified as fraud, and otherwise as non-fraud.
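The equations above translate fairly directly into code. The sketch below is a minimal, assumption-laden illustration: it estimates the coefficients with plain batch gradient descent (the article does not say which fitting method was used), and the function names, learning rate, number of epochs and the tiny two-variable dataset are all invented for the example.

```python
import math


def sigmoid(z):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(y):
    """Log-odds of a probability y strictly between 0 and 1."""
    return math.log(y / (1.0 - y))


def predict_proba(coefficients, features):
    """y = sigmoid(a_0 + a_1*x_1 + ... + a_k*x_k)."""
    a0, weights = coefficients[0], coefficients[1:]
    z = a0 + sum(a * x for a, x in zip(weights, features))
    return sigmoid(z)


def fit(rows, labels, learning_rate=0.1, epochs=2000):
    """Estimate a_0..a_k by batch gradient descent on the log-loss.

    Assumption: gradient descent is used here only for illustration.
    """
    k = len(rows[0])
    n = len(rows)
    coefficients = [0.0] * (k + 1)
    for _ in range(epochs):
        gradient = [0.0] * (k + 1)
        for x, label in zip(rows, labels):
            error = predict_proba(coefficients, x) - label
            gradient[0] += error
            for j in range(k):
                gradient[j + 1] += error * x[j]
        coefficients = [
            a - learning_rate * g / n for a, g in zip(coefficients, gradient)
        ]
    return coefficients


def classify(coefficients, features, threshold=0.5):
    """Return 1 (fraud) when the probability exceeds the threshold, else 0."""
    return 1 if predict_proba(coefficients, features) > threshold else 0


# Made-up, already scaled data: label 1 = fraud, label 0 = non-fraud.
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
labels = [0, 0, 0, 1, 1, 1]
coefficients = fit(rows, labels)
predictions = [classify(coefficients, x) for x in rows]
```

In practice one would normally rely on an established library rather than a hand-written loop; the point here is only to show how the sigmoid, the coefficients and the 0.5 threshold fit together.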

Testing and evaluating

Since the cross-validation method divides the database into 10 parts, there are 10 testing sets. To determine the accuracy, sensitivity and error rate of each test, we rely on a confusion matrix, which is built from four counts: true positives, true negatives, false positives and false negatives.
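As a small illustration of how these four counts are obtained, the sketch below tallies them from a list of actual labels and a list of predicted labels, with 1 meaning fraud and 0 meaning non-fraud. The function name `confusion_matrix` and the sample labels are made up for the example.

```python
def confusion_matrix(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = fraud, 0 = non-fraud)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # fraud correctly flagged
        elif a == 0 and p == 0:
            counts["TN"] += 1  # legitimate payment correctly passed
        elif a == 0 and p == 1:
            counts["FP"] += 1  # legitimate payment wrongly flagged
        else:
            counts["FN"] += 1  # fraud that was missed
    return counts


# Made-up labels for one test set.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]
counts = confusion_matrix(actual, predicted)
print(counts)
```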

We define accuracy as the percentage of records in the test set that are correctly classified (as fraudulent or non-fraudulent). The final accuracy of the trained classifier is the average accuracy of the model over the ten test sets.
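The metrics mentioned in this section can be computed from the confusion matrix counts as sketched below. The formulas are the standard definitions; the function names and the per-fold counts are invented for illustration and are not results from the dataset.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of actual frauds that the model caught (also called recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def error_rate(tp, tn, fp, fn):
    """Share of all records that were classified incorrectly."""
    return 1.0 - accuracy(tp, tn, fp, fn)


def average_accuracy(fold_counts):
    """Mean accuracy over the test folds of a cross-validation run."""
    scores = [accuracy(c["TP"], c["TN"], c["FP"], c["FN"]) for c in fold_counts]
    return sum(scores) / len(scores)


# Made-up counts for two folds (a real run would have ten).
fold_counts = [
    {"TP": 2, "TN": 4, "FP": 1, "FN": 1},
    {"TP": 3, "TN": 5, "FP": 0, "FN": 0},
]
final_accuracy = average_accuracy(fold_counts)
```

Because only 0.172% of the transactions are fraudulent, a classifier that labels everything as non-fraud would still reach roughly 99.8% accuracy, so it is worth reading accuracy together with sensitivity.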

If the model meets the target metrics, it can then be deployed in real financial applications; otherwise, further training will be required.

Authors: Matteo Mello, Riccardo Scibetta
