Telecommunications, Cable & Utilities
Experian is excited to have been chosen as one of the first data and analytics companies that will enable access to Social Security Administration (SSA) data for the purpose of verifying identity against the federal agency's records. The agency's involvement, in the wake of Congressional interest and successful legislation, will create a seismic shift in the landscape of identity verification. Ultimately, the ability to leverage SSA data will reduce the impact of identity fraud and synthetic identity fraud and put real dollars back into the pockets of the people and businesses that absorb the costs of fraud today.

As this era of government and private sector collaboration begins, many of our clients and partners are breathing a sigh of relief. We see this in a common question our customers ask every day: "Do I still need an analytical solution for synthetic ID now that eCBSV is on the horizon?" The common assumption is that help is on the way and that this long tempest of rising losses and identity uncertainty is about to leave us.

Or is it? We don't believe it's the end of the synthetic ID storm. This is the eye. Rather than basking in the calm of this moment, we should be thinking ahead and assessing our vulnerabilities, because the second half of this storm will be worse than the first.

Consider this: The people who develop and exploit synthetic IDs are playing a long game. It takes time, research, planning and careful execution to create an identity that facilitates fraud. The bigger the investment, the bigger the spoils. Synthetic IDs are being used to purchase luxury automobiles. They're passing lender marketing criteria and being offered credit. The criminals have made their investment, and it's unlikely they will walk away from it.

So, what does the SSA's pending involvement mean to them? How will they prepare? These aren't hard questions. They'll do what you would do in the eye of a storm — maximize the value of the preparations that are in place. Gather what you can quickly and brace yourself for the uncertainty that's coming. In short, there's a rush to monetize synthetic IDs on the horizon, and this is no time to declare ourselves safe.

It's doubtful that the eCBSV process will be the silver bullet that ends synthetic ID fraud — and certainly not on day one. It's more likely that the practical demands of the data exchange, volume constraints, response times and the actionability of the results will take time to optimize. In the meantime, the criminals aren't going to sit by and watch as their schemes unravel and lose value. We should take some comfort that we've made it through the first half of the storm, but recognize and prepare for what still needs to be faced.
The future is, factually speaking, uncertain. We don't know whether we'll find a cure for cancer, what the economic outlook holds, whether we'll be living in an algorithmic world or whether our work cubicle mate will soon be replaced by a robot. While futurists can dish out some exciting and downright scary visions for the future of technology and science, there are no future facts. However, the uncertainty presents opportunity.

Technology in today's world

From the moment you wake up to the moment you go back to sleep, technology is everywhere. The highly digital life we live and the development of our technological world have become the new normal. According to the International Telecommunication Union (ITU), almost 50% of the world's population uses the internet, leading to over 3.5 billion daily searches on Google and more than 570 new websites being launched each minute. And even more mind-boggling? Over 90% of the world's data has been created in just the last couple of years.

With data growing faster than ever before, the future of technology is even more interesting than what is happening now. We're just at the beginning of a revolution that will touch every business and every life on this planet. By 2020, at least a third of all data will pass through the cloud, and within five years, there will be over 50 billion smart connected devices in the world.

Keeping pace with digital transformation

At the rate at which data and our ability to analyze it are growing, businesses of all sizes will be forced to modify how they operate. Businesses that digitally transform will be able to offer customers a seamless and frictionless experience and, as a result, claim a greater share of profit in their sectors.

Take, for example, the financial services industry - specifically banking. Whereas most banking used to be done at a local branch, recent reports show that 40% of Americans have not stepped through the door of a bank or credit union within the last six months, largely due to the rise of online and mobile banking. According to Citi's 2018 Mobile Banking Study, mobile banking is one of the top three most-used apps by Americans. Similarly, the Federal Reserve reported that more than half of U.S. adults with bank accounts have used a mobile app to access their accounts in the last year, presenting forward-looking banks with an incredible opportunity to increase the number of relationship touchpoints they have with their customers by introducing a wider array of banking products via mobile.

Be part of the movement

Rather than viewing digital disruption as worrisome and challenging, embrace the uncertainty and potential that advances in new technologies, data analytics and artificial intelligence will bring. The pressure to innovate amid technological progress poses an opportunity for us all to rethink the work we do and the way we do it. Are you ready?

Learn more about powering your digital transformation in our latest eBook. Download eBook

Are you an innovation junkie? Join us at Vision 2020 for future-facing sessions like:
- Cloud and beyond - transforming technologies
- ML and AI - real-world expandability and compliance
Experian Boost provides a unique opportunity to help dealers build loyalty while helping consumers.
Earlier this year, the Consumer Financial Protection Bureau (CFPB) issued a Notice of Proposed Rulemaking (NPRM) to implement the Fair Debt Collection Practices Act (FDCPA). The proposal, which goes into deliberation in September and won't be finalized until after that date at the earliest, would provide consumers with clear-cut protections against harassment by debt collectors and straightforward options to address or dispute debts. Additionally, the NPRM would set strict limits on the number of calls debt collectors may place to reach consumers weekly, as well as clarify how collectors may communicate lawfully using technologies developed after the FDCPA's passage in 1977.

So, what does this mean for collectors? The compliance conundrum is ever present, especially in the debt collection industry. Debt collectors are expected to continuously adapt to changing regulations, forcing them to spend time, energy and resources on maintaining compliance. As the most recent onslaught of developments and proposed new rules has been pushed out to the financial community, compliance professionals are once again working to implement changes.

According to the Federal Register, here are some key ways the new regulation would affect debt collection:
- Limited to seven calls: Debt collectors would be limited to attempting to reach consumers by phone about a specific debt no more than seven times per week.
- Ability to unsubscribe: Consumers who do not wish to be contacted via newer technologies, including voicemails, emails and text messages, must be given the option to opt out of future communications.
- Use of newer technologies: Newer communication technologies, such as emails and text messages, may be used in debt collection, with certain limitations to protect consumer privacy.
- Required disclosures: Debt collectors will be obligated to send consumers a disclosure with certain information about the debt and related consumer protections.
- Limited contact: Consumers will be able to limit the ways debt collectors contact them, for example at a specific telephone number, while they are at work or during certain hours.

Now that you know the details, how can you prepare? At Experian, we understand the importance of an effective collections strategy. Our debt collection solutions automate and moderate dialogues and negotiations between consumers and collectors, making it easier for collection agencies to reach consumers while staying compliant.
- Powerful locating solution: Locate past-due consumers more accurately, efficiently and effectively. TrueTrace℠ adds value to each contact by increasing your right-party contact rate.
- Exclusive contact information: Mitigate your compliance risk with a seamless and unparalleled solution. With Phone Number ID™, you can identify who a phone is registered to, the phone type, carrier and the activation date.

If you aren't ready for the new CFPB regulation, what are you waiting for? Learn more

Note: Click here for an update on the CFPB's proposal.
Have you seen the latest Telephone Consumer Protection Act (TCPA) class action lawsuit? TCPA litigation in the communications, energy and media industries is dominating the headlines, with companies paying up to millions of dollars in damages. Consumer disputes have increased more than 500 percent in the past five years, and regulations continue to tighten. Now more than ever, it's crucial to build effective and cost-efficient contact strategies. But how? First, know your facts. Second, let us help.

What is the TCPA?

As you're aware, the TCPA aims to safeguard consumer privacy by regulating telephone solicitations and the use of prerecorded messages, auto-dialed calls, text messages and unsolicited faxes. The rule has been amended and more tightly defined over time.

Why is TCPA compliance important?

Businesses found guilty of violating TCPA regulations face steep penalties – fines range from $500 to $1,500 per individual infraction! Companies have been hit with hefty penalties upwards of hundreds of thousands, and in some cases millions, of dollars. Many have questions and are seeking to understand how they might adjust their policies and call practices.

How can you protect yourself?

To help avoid the risk of compliance violations, it's essential to assess call strategies and put best practices in place to increase right-party contact rates. Strategies to gain compliance and mitigate risk include:
- Focus on right- and wrong-party contact to improve customer service: Monitoring and verifying consumer contact information can seem like a tedious task, but with the right combination of data, including skip tracing data drawn from consumer credit data, alternative data and other exclusive sources, past-due consumers can be located faster.
- Scrub often for updated or verified information: Phone numbers can change continuously, and they're only one piece of a consumer's contact information. Verifying contact information for TCPA compliance with a partner you can trust can help make data quality routine.
- Determine when and how often you dial cell phones: Or, given new considerations proposed by the CFPB, consider looking at collections via your consumers' preferred communication channel – online vs. over the phone.
- Provide consumers with user-friendly mechanisms to opt out of receiving communications.

At Experian, our TCPA solutions can help you monitor and verify consumer contact information, locate past-due consumers, improve your right-party contact rates and automate your collections process. Get started
If you're a credit risk manager or a data scientist responsible for modeling consumer credit risk at a lender, a fintech, a telecommunications company or even a utility company, you're certainly exploring how machine learning (ML) will make you even more successful with predictive analytics. You know your competition is looking beyond the algorithms that have long been used to predict consumer payment behavior: algorithms with names like regression, decision trees and cluster analysis. Perhaps you're experimenting with or even building a few models with artificial intelligence (AI) algorithms that may be less familiar to your business: neural networks, support vector machines, gradient boosting machines or random forests. One recent survey found that 25 percent of financial services companies are ahead of the industry; they're already implementing or scaling up adoption of advanced analytics and ML.

My alma mater, the Virginia Cavaliers, recently won the 2019 NCAA national championship in nail-biting overtime. With the utmost respect to Coach Tony Bennett, this victory got me thinking more about John Wooden, perhaps the greatest college coach ever. In his book Coach Wooden and Me, Kareem Abdul-Jabbar recalled starting at UCLA in 1965 with what was probably the greatest freshman team in the history of basketball. What was their new coach's secret as he transformed UCLA into the best college basketball program in the country? I can only imagine their surprise at the first practice when the coach told them, "Today we are going to learn how to put on our sneakers and socks correctly. … Wrinkles cause blisters. Blisters force players to sit on the sideline. And players sitting on the sideline lose games."

What's that got to do with machine learning? Simply put, the financial services companies ready to move beyond the exploration stage with AI are those that have mastered the tasks that come before and after modeling with the new algorithms. Any ML library — whether it's TensorFlow, PyTorch, extreme gradient boosting or your company's in-house library — simply enables a computer to spot patterns in training data that can be generalized for new customers. To win in the ML game, the team and the process are more important than the algorithm. If you've assembled the wrong stakeholders, if your project is poorly defined or if you've got the wrong training data, you may as well be sitting on the sideline.

Consider these important best practices before modeling:
- Careful project planning is a prerequisite — Assemble all the key project stakeholders, and insist they reach a consensus on specific and measurable project objectives. When during the project life cycle will the model be used? A wealth of new data sources is available. Which data sources and attributes are appropriate candidates for use in the modeling project? Does the final model need to be explainable, or is a black box good enough? If the model will be used to make real-time decisions, what data will be available at runtime? Good ML consultants (like those at Experian) use their experience to help their clients carefully define the model development parameters.
- Data collection and data preparation are incredibly important — Explore the data to determine not only how important and appropriate each candidate attribute is for your project, but also how you'll handle missing or corrupt data during training and implementation. Carefully select the training and validation data samples and the performance definition.
Any biases in the training data will be reflected in the patterns the algorithm learns and therefore in your future business decisions. When ML is used to build a credit scoring model for loan originations, a common source of bias is the difference between the application population and the population of booked accounts. ML experts from outside the credit risk industry may need to work with specialists to appreciate the variety of reject inference techniques available.
- Segmentation analysis — In most cases, more than one ML model needs to be built, because different segments of your population perform differently. The segmentation needs to be done in a way that makes sense — both statistically and from a business perspective. Intriguingly, some credit modeling experts have had success using an AI library to inform segmentation and then a more tried-and-true method, such as regression, to develop the actual models.

During modeling: With a good plan and well-designed data sets, the modeling project has a very good chance of succeeding. But no automated tool can make the tough decisions that can make or break whether the model is suitable for use in your business — such as trade-offs between the ML model's accuracy and its simplicity and transparency. Engaged leadership is important.

After modeling:
- Model validation — Your project team should be sure the analysts and consultants appreciate and mitigate the risk of overfitting the model parameters to the training data set. Validate that any ML model is stable. Test it with samples from a different group of customers — preferably a different time period from which the training sample was taken (see the sketch at the end of this post).
- Documentation — AI models can have important impacts on people's lives. In our industry, they determine whether someone gets a loan, a credit line increase or an unpleasant loss mitigation experience. Good model governance practice insists that a lender won't make decisions based on an unexplained black box. In a globally transparent model, good documentation thoroughly explains the data sources and attributes and how the model considers those inputs. With a locally transparent model, you can further explain how a decision is reached for any specific individual — for example, by providing FCRA-compliant adverse action reasons.
- Model implementation — Plan ahead. How will your ML model be put into production? Will it be recoded into a new computer language, or can it be imported into one of your systems using a format such as the Predictive Model Markup Language (PMML)? How will you test that it works as designed?
- Post-implementation — Just as with an old-fashioned regression model, it's important to monitor both the usage and the performance of the ML model. Your governance team should check periodically that the model is being used as it was intended. Audit the model periodically to know whether changing internal and external factors — which might range from a change in data definition to a new customer population to a shift in the economic environment — might impact the model's strength and predictive power.

Coach Wooden used to say, "It isn't what you do. It's how you do it." Just like his players, the most successful ML practitioners understand that a process based on best practices is as important as the "game" itself.
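To make the validation point above concrete, here is a minimal sketch of an out-of-time stability check. It is illustrative only: the DataFrame, the column names (app_date, bad) and the cutoff date are hypothetical, and the gradient boosting model stands in for whatever algorithm your team has chosen.

```python
# Minimal sketch: score a model on a later, out-of-time window and compare
# its separation with the development window. All names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def out_of_time_check(apps: pd.DataFrame, features: list, cutoff: str):
    train = apps[apps["app_date"] < cutoff]      # development window
    holdout = apps[apps["app_date"] >= cutoff]   # later, out-of-time window

    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(train[features], train["bad"])

    # A large drop from auc_dev to auc_oot is a warning sign that the model
    # may not generalize to newer customers.
    auc_dev = roc_auc_score(train["bad"], model.predict_proba(train[features])[:, 1])
    auc_oot = roc_auc_score(holdout["bad"], model.predict_proba(holdout[features])[:, 1])
    return auc_dev, auc_oot
```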
Your model is only as good as your data, right? Actually, there are many considerations in developing a sound model, one of which is data. Yet if your data is bad or dirty or doesn't represent the full population, can it be used? This is where sampling can help. When done right, sampling can lower your cost to obtain data needed for model development. When done well, sampling can turn a tainted and underrepresented data set into a sound and viable model development sample.

First, define the population to which the model will be applied once it's finalized and implemented. Determine what data is available and what population segments must be represented within the sampled data. The more variability in internal factors — such as changes in marketing campaigns, risk strategies and product launches — and external factors — such as economic conditions or competitor presence in the marketplace — the larger the sample size needed. A model developer often will need to sample over time to incorporate seasonal fluctuations in the development sample.

The most robust samples are pulled from data that best represents the full population to which the model will be applied. It's important to ensure your data sample includes customers or prospects declined by the prior model and strategy, as well as approved but nonactivated accounts. This ensures full representation of the population to which your model will be applied. Also, consider the number of predictors or independent variables that will be evaluated during model development, and increase your sample size accordingly.

When it comes to spotting dirty or unacceptable data, the golden rule is know your data and know your target population. Spend time evaluating your intended population and group profiles across several important business metrics. Don't underestimate the time needed to complete a thorough evaluation.

Next, select the data from the population to aptly represent the population within the sampled data. Determine the best sampling methodology that will support the model development and business objectives. Sampling generates a smaller data set for use in model development, allowing the developer to build models more quickly. Reducing the data set's size decreases the time needed for model computation and saves storage space without losing predictive performance. Once the data is selected, weights are applied so that each record appropriately represents the full population to which the model will be applied.

Several traditional techniques can be used to sample data:
- Simple random sampling — Each record is chosen by chance, and each record in the population has an equal chance of being selected.
- Random sampling with replacement — Each record chosen by chance is included in the subsequent selection.
- Random sampling without replacement — Each record chosen by chance is removed from subsequent selections.
- Cluster sampling — Records from the population are sampled in groups, such as region, over different time periods.
- Stratified random sampling — This technique allows you to sample different segments of the population at different proportions. In some situations, stratified random sampling is helpful in selecting segments of the population that aren't as prevalent as other segments but are equally vital within the model development sample (see the sketch following this post).

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
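As an illustration of that last technique, here is a brief, hypothetical sketch of stratified random sampling with post-sampling weights. The DataFrame, the segment column and the sampling rates are invented, and the weight is simply the inverse of each stratum's sampling rate.

```python
# Minimal sketch: stratified random sampling with weights that restore
# each record's representation of the full population. Names are hypothetical.
import pandas as pd

def stratified_sample(population: pd.DataFrame, rates: dict, seed: int = 42) -> pd.DataFrame:
    pieces = []
    for segment, rate in rates.items():
        stratum = population[population["segment"] == segment]
        drawn = stratum.sample(frac=rate, random_state=seed)  # sample without replacement
        pieces.append(drawn.assign(weight=1.0 / rate))        # inverse-rate weight
    return pd.concat(pieces, ignore_index=True)

# Example: keep every record of a rare but vital segment, 10 percent of the rest.
# sample = stratified_sample(population, {"rare_segment": 1.0, "common_segment": 0.10})
```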
As our society becomes ever more dependent on everything mobile, criminals are continually searching for and exploiting weaknesses in the digital ecosystem, causing significant harm to consumers, businesses and the economy. In fact, according to our 2018 Global Fraud & Identity Report, 72 percent of business executives are more concerned than ever about the impact of fraud. Yet, despite the awareness and concern, 54 percent of businesses are only "somewhat confident" in their ability to detect fraud. That needs to change, and it needs to change right away.

Our industry has thrived by providing products and services that root out bad transactions and detect fraud with minimal consumer friction. We continue to innovate, finding new ways to authenticate consumers and applying new cloud technologies, machine learning, self-service portals and biometrics. Yet the fraud issue still exists. It hasn't gone away. How do we provide effective means to prevent fraud without inconveniencing everyone in the process? That's the conundrum.

Unfortunately, a silver bullet doesn't exist. As much as we would like to build a system that detects all fraud and eliminates all consumer friction, we can't. We're not there yet. As long as money changes hands and there are opportunities to steal, criminals will find the weak points – the soft spots. That said, we are making significant progress. Advances in technology and innovation help us bring new solutions to market more quickly, with more predictive power than ever, and with the ability to help clients turn these services on in days and weeks.

So, what is Experian doing? We've been in the business of fraud detection and identity verification for more than 30 years. We've seen fraud patterns evolve over time, and our product portfolio evolves in lockstep to counter the newest fraud vectors. Synthetic identity fraud, loan stacking, counterfeit, identity theft: the specific fraud attacks may change, but our solution stack counters each of those threats. We are on a continuous innovation path, and we need to be.

Our consumer and small business databases are unmatched in the industry for quality and coverage, and that is an invaluable asset in the fight against fraud. It used to be that knowing something about a person was the same as authenticating that same person. That's just not the case today. But just because I may not be the only person who knows where I live doesn't mean that identity information is obsolete. It is incredibly valuable, just in different ways today. And that's where our scientists come into their own, providing complex predictive solutions that utilize a plethora of data and insight to create the ultimate in predictive performance.

We go beyond traditional fraud detection methods, such as knowledge-based authentication, to offer a custom mix of passive and active authentication solutions that improve security and the customer experience. You want the latest deep learning techniques? We have them. You want custom models scored in milliseconds alongside your existing data requests? We can do that. You want a mix of cloud deployment, dedicated hosted services and on-premise? We can do that too.

We have more than 20 partners across the globe, creating the most comprehensive identity management network anywhere. We also have teams of experts across the world with the know-how to combine Experian and partner expertise to craft a bespoke solution that is unrivaled in detection performance.
The results speak for themselves: Experian analyzes more than a billion credit applications per year for fraud and identity, and we've helped our clients save more than $2 billion in annual fraud losses globally.

CrossCore™, our fraud prevention and identity management platform, leverages the full breadth of Experian data as well as the data assets of our partners. We execute machine learning models on every decision to help improve the accuracy and speed with which decisions are made. We've seen CrossCore machine learning deliver a more than 40 percent improvement in fraud detection compared with rules-based systems. Our certified partner community for CrossCore includes only the most reputable leaders in the fraud industry.

We also understand the need to expand our data to cover those who may not be credit active. We have the largest and most unique sets of alternative credit data among the credit bureaus, including data from our Clarity Services and RentBureau divisions. This rich data helps our clients verify an individual's identity, even if they have a thin credit file. The data also helps us determine a credit applicant's ability to pay, so that consumers are empowered to pursue the opportunities that are right for them. And in the background, our models are constantly checking for signs of fraud, so that consumers and clients feel protected.

Fraud prevention and identity management are built upon a foundation of trust, innovation and keeping the consumer at the heart of every decision. This is where I'm proud to say that Experian stands apart. We realize that criminals will continue to look for new ways to commit fraud, and we are continually striving to stay one step ahead of them. Through our unparalleled scale of data, partnerships and commitment to innovation, we will help businesses become more confident in their ability to recognize good people and transactions, provide great experiences and protect against fraud.
In 2011, data scientists and credit risk managers finally found an appropriate analogy to explain what we do for a living. "You know Moneyball? What Paul DePodesta and Billy Beane did for the Oakland A's, I do for XYZ Bank." You probably remember the story: Oakland had to squeeze the most value out of its limited budget for hiring free agents, so it used analytics — the new baseball "sabermetrics" created by Bill James — to make data-driven decisions that were counterintuitive to the experienced scouts. Michael Lewis told the story in a book that was an incredible bestseller and led to a hit movie. The year after the movie was made, Harvard Business Review declared that data science was "the sexiest job of the 21st century." Coincidence?

The importance of data

Moneyball emphasized the recognition, through sabermetrics, that certain players' abilities had been undervalued. In his bestseller Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak, Travis Sawchik notes that the analysis would not have been possible without the data. Early visionaries, including John Dewan, began collecting baseball data at games all over the country in a volunteer program called Project Scoresheet. Eventually they were collecting a million data points per season. In a similar fashion, credit data pioneers, such as TRW's Simon Ramo, began systematically compiling basic credit information into credit files in the 1960s.

Recognizing that data quality is the key to insights and decision-making and responding to the demand for objective data, Dewan formed two companies — Sports Team Analysis and Tracking Systems (STATS) and Baseball Info Solutions (BIS). It seems quaint now, but those companies collected and cleaned data using a small army of video scouts with stopwatches. Now data is collected in real time using systems from Pitch F/X and the radar tracking system Statcast to provide insights that were never possible before. It's hard to find a news article about Game 1 of this year's World Series that doesn't discuss the launch angle or exit velocity of Eduardo Núñez's home run, but just a couple of years ago, neither statistic was even measured. Teams use proprietary biometric data to keep players healthy for games. Even neurological monitoring promises to provide new insights and may lead to changes in the game.

Similarly, lenders are finding that so-called "nontraditional data" can open up credit to consumers who might have been unable to borrow money in the past. This includes nontraditional Fair Credit Reporting Act (FCRA)–compliant data on recurring payments such as rent and utilities, checking and savings transactions, and payments to alternative lenders like payday and short-term loans. Newer fintech lenders are innovating constantly — using permissioned, behavioral and social data to make it easier for their customers to open accounts and borrow money. Similarly, some modern banks use techniques that go far beyond passwords and even multifactor authentication to verify their customers' identities online. For example, identifying consumers through their mobile device can improve the user experience greatly. Some lenders are even using behavioral biometrics to improve their online and mobile customer service practices.

Continuously improving analytics

Bill James and his colleagues developed a statistic called wins above replacement (WAR) that summarized the value of a player as a single number.
WAR was never intended to be a perfect summary of a player's value, but it's very convenient to have a single number to rank players. Using the same mindset, early credit risk managers developed credit scores that summarized applicants' risk based on their credit history at a single point in time. Just as WAR is only one measure of a player's abilities, good credit managers understand that a traditional credit score is an imperfect summary of a borrower's credit history. Newer scores, such as VantageScore® credit scores, are based on a broader view of applicants' credit history, such as credit attributes that reflect how their financial situation has changed over time. More sophisticated financial institutions, though, don't rely on a single score. They use a variety of data attributes and scores in their lending strategies.

Just a few years ago, simply using data to choose players was a novel idea. Now new measures such as defense-independent pitching statistics drive changes on the field. Sabermetrics, once defined as the application of statistical analysis to evaluate and compare the performance of individual players, has evolved to be much more comprehensive. It now encompasses the statistical study of nearly all in-game baseball activities.

A wide variety of data-driven decisions

Sabermetrics began being used for recruiting players in the 1980s. Today it's used on the field as well as in the back office. Big Data Baseball gives the example of the "Ted Williams shift," a defensive technique that was seldom used between 1950 and 2010. In the world after Moneyball, it has become ubiquitous. Likewise, pitchers alter their arm positions and velocity based on data — not only to throw more strikes, but also to prevent injuries.

Similarly, when credit scores were first introduced, they were used only in originations. Lenders established a credit score cutoff that was appropriate for their risk appetite and used it for approving and declining applications. Now lenders are using Experian's advanced analytics in a variety of ways that the credit scoring pioneers might never have imagined:
- Improving the account opening experience — for example, by reducing friction online
- Detecting identity theft and synthetic identities
- Anticipating bust-out activity and other first-party fraud
- Issuing the right offer to each prescreened customer
- Optimizing interest rates
- Reviewing and adjusting credit lines
- Optimizing collections

Analytics is no substitute for wisdom

Data scientists like those at Experian remind me that in banking, as in baseball, predictive analytics is never perfect. What keeps finance so interesting is the inherent unpredictability of the economy and human behavior. Likewise, the play on the field determines who wins each ball game: anything can happen. Rob Neyer's book Power Ball: Anatomy of a Modern Baseball Game quotes the Houston Astros director of decision sciences: "Sometimes it's just about reminding yourself that you're not so smart."
This is an exciting time to work in big data analytics. Here at Experian, we have more than 2 petabytes of data in the United States alone. In the past few years, because of high data volume, more computing power and the availability of open-source code algorithms, my colleagues and I have watched excitedly as more and more companies are getting into machine learning. We've observed the growth of competition sites like Kaggle, open-source code sharing sites like GitHub and various machine learning (ML) data repositories.

We've noticed that on Kaggle, two algorithms win over and over at supervised learning competitions: If the data is well-structured, teams that use Gradient Boosting Machines (GBM) seem to win. For unstructured data, teams that use neural networks win pretty often.

Modeling is both an art and a science. Those winning teams tend to be good at what the machine learning people call feature generation and what we credit scoring people call attribute generation. We have nearly 1,000 expert data scientists in more than 12 countries, many of whom are experts in traditional consumer risk models — techniques such as linear regression, logistic regression, survival analysis, CART (classification and regression trees) and CHAID analysis. So naturally I've thought about how GBM could apply in our world.

Credit scoring is not quite like a machine learning contest. We have to be sure our decisions are fair and explainable and that any scoring algorithm will generalize to new customer populations and stay stable over time. Increasingly, clients are sending us their data to see what we could do with newer machine learning techniques. We combine their data with our bureau data and even third-party data, we use our world-class attributes and develop custom attributes, and we see what comes out. It's fun — like getting paid to enter a Kaggle competition! For one financial institution, GBM armed with our patented attributes found a nearly 5 percent lift in KS (the Kolmogorov-Smirnov statistic) when compared with traditional statistics.

At Experian, we use the Extreme Gradient Boosting (XGBoost) implementation of GBM that, out of the box, has regularization features we use to prevent overfitting. But it's missing some features that we and our clients count on in risk scoring. Our Experian DataLabs team worked with our Decision Analytics team to figure out how to make it work in the real world. We found answers for a couple of important issues:
- Monotonicity — Risk managers count on the ability to impose what we call monotonicity. In application scoring, applications with better attribute values should score as lower risk than applications with worse values. For example, if consumer Adrienne has fewer delinquent accounts on her credit report than consumer Bill, all other things being equal, Adrienne's machine learning score should indicate lower risk than Bill's score (a brief sketch follows at the end of this post).
- Explainability — We were able to adapt a fairly standard "Adverse Action" methodology from logistic regression to work with GBM.

There has been enough enthusiasm around our results that we've just turned it into a standard benchmarking service. We help clients appreciate the potential for these new machine learning algorithms by evaluating them on their own data. Over time, the acceptance and use of machine learning techniques will become commonplace among model developers as well as internal validation groups and regulators.
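As a rough illustration of the monotonicity idea, here is a minimal sketch using XGBoost's built-in monotone_constraints option. The data and feature ordering are invented, and this is not Experian's production configuration; it simply shows how a modeler might force predicted risk to move in only one direction with respect to each attribute.

```python
# Minimal sketch: imposing monotonicity in XGBoost. Hypothetical data where
# column 0 behaves like "number of delinquent accounts" (risk should rise with it)
# and column 1 behaves like "months on file" (risk should fall with it).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=3,
    reg_lambda=1.0,                  # out-of-the-box L2 regularization against overfitting
    monotone_constraints="(1,-1)",   # +1: risk rises with col 0; -1: risk falls with col 1
)
model.fit(X, y)
```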
Whether you’re a data scientist looking for a cool place to work or a risk manager who wants help evaluating the latest techniques, check out our weekly data science video chats and podcasts.
Machine learning (ML), the newest buzzword, has swept into the lexicon and captured the interest of us all. Its recent, widespread popularity has stemmed mainly from the consumer perspective. Whether it's virtual assistants, self-driving cars or romantic matchmaking, ML has rapidly positioned itself in the mainstream.

Though ML may appear to be a new technology, its use in commercial applications has been around for some time. In fact, many of the data scientists and statisticians at Experian are considered pioneers in the field of ML, going back decades. Our team has developed numerous products and processes leveraging ML, from our world-class consumer fraud and ID protection to credit data products like our Trended 3D™ attributes. In fact, we were just highlighted in the Wall Street Journal for how we're using machine learning to improve our internal IT performance.

ML's ability to consume vast amounts of data to uncover patterns and deliver results that are not humanly possible otherwise is what makes it unique and applicable to so many fields. This predictive power has now sparked interest in the credit risk industry. Unlike fraud detection, where ML is well-established and used extensively, credit risk modeling has until recently taken a cautionary approach to adopting newer ML algorithms. Because of regulatory scrutiny and a perceived lack of transparency, ML hasn't experienced the broad acceptance enjoyed by some of credit risk modeling's more established techniques.

When it comes to credit risk models, delivering the most predictive score is not the only consideration for a model's viability. Modelers must be able to explain and detail the model's logic, or its "thought process," for calculating the final score. This means taking steps to ensure the model's compliance with the Equal Credit Opportunity Act, which forbids discriminatory lending practices. Federal laws also require adverse action responses to be sent by the lender if a consumer's credit application has been declined, which requires that the model be able to highlight the top reasons for a less-than-optimal score. And so, while ML may be able to deliver the best predictive accuracy, its ability to explain how the results are generated has always been a concern. ML has been stigmatized as a "black box," where data mysteriously gets transformed into the final predictions without a clear explanation of how.

However, this is changing. Depending on the ML algorithm applied to credit risk modeling, we've found risk models can offer the same transparency as more traditional methods such as logistic regression. For example, gradient boosting machines (GBMs) are designed as a predictive model built from a sequence of several decision tree submodels. The very nature of GBMs' decision tree design allows statisticians to explain the logic behind the model's predictive behavior. We believe model governance teams and regulators in the United States may become comfortable with this approach more quickly than with deep learning or neural network algorithms, since GBMs are represented as sets of decision trees that can be explained, while neural networks are represented as long sets of cryptic numbers that are much harder to document, manage and understand.

In future blog posts, we'll discuss the GBM algorithm in more detail and how we're using its predictability and transparency to maximize credit risk decisioning for our clients.
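To illustrate why a GBM's tree structure lends itself to inspection, here is a minimal, hypothetical sketch: it fits a small XGBoost model on invented data and prints the split logic of the first trees. It is not the explainability methodology described above, only a simple demonstration that each submodel is a readable decision tree.

```python
# Minimal sketch: a GBM is a sequence of decision trees, and each tree's
# split logic can be printed and reviewed. Data and features are hypothetical.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=50, max_depth=2)
model.fit(X, y)

# Dump the first two trees as human-readable if/else splits.
for i, tree in enumerate(model.get_booster().get_dump()[:2]):
    print(f"--- tree {i} ---")
    print(tree)
```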
Customer Identification Program (CIP) solution through CrossCore®

Every day, I work closely with clients to reduce the negative side effects of fraud prevention. I hear the need for lower false-positive rates; maximum fraud detection in populations; and simple, streamlined verification processes. Lately, more conversations have turned toward ID verification needs for Customer Identification Program (CIP) administration. As it turns out, barriers to growth, high customer friction and high costs dominate the CIP landscape. While the marketplace struggles to manage the impact of fraud prevention, CIP routinely disrupts more than 10 percent of new customer acquisitions. Internally at Experian, we talk about this as the biggest ID problem our customers aren't solving.

Think about this: The fight for business in the CIP space quickly turned to price, and price was defined by unit cost. But what's the real cost? One of the dominant CIP solutions uses a series of hyperlinks to connect identity data. Every click is a new charge. Their website invites users to dig into the data — manually. Users keep digging, and they keep paying. And the challenges don't stop there.

Consider the data sources used for these solutions. The winners of the price fight built CIP solutions around credit bureau header data. What does that do for growth? If the identity wasn't sufficiently verified when a credit report was pulled, does it make sense to go back to the same data source? Keep digging. Cha-ching, cha-ching. Right about now, you might be feeling like there's some sleight of hand going on. The true cost of CIP administration is much more than a single unit price. It's many units, manual effort, recycled data and frustrated customers — and it impacts far more clients than fraud prevention.

CIP needs have moved far beyond the demand for a low-cost solution. We're thrilled to be leading the move toward bringing more robust data and decision capabilities to CIP through CrossCore®. With its open architecture and flexible decision structure, our CrossCore platform enables access to a diverse and robust set of data sources to meet these needs. CrossCore unites Experian data, client data and a growing list of available partner data to deliver an intelligent and cost-conscious approach to managing fraud and identity challenges. The next step will unify CIP administration, fraud analytics and a range of verification treatment options together on the CrossCore platform as well.

Spoiler alert: We've already taken that step.
As more financial institutions express interest in and leverage alternative credit data sources to decision and assess consumers, lenders want to be assured of how they can best utilize this data source and maintain compliance. Experian recently interviewed Philip Bohi, Vice President for Compliance Education at the American Financial Services Association (AFSA), to learn more about his perspective on this topic, as well as to gain insights on what lenders should consider as they dive into the world of alternative credit data.

Alternative data continues to be a hot topic in the financial services space. How have you seen it evolve over the past few years?

It's hard to pinpoint where it began, but it has been interesting to observe how technology firms and people have changed our perceptions of the value and use of data in recent years. Earlier, a company's data was just the information needed to conduct business. It seems like people are waking up to the realization that their business data can be useful internally, as well as to others. And we have come to understand how previously disregarded data can be profoundly valuable. These insights provide a lot of new opportunities, but also new questions. I would also say that the scope of alternative credit data use has changed. A few years ago, alternative credit data was a tool to largely address the thin- and no-file consumer. More recently, we've seen it can provide a lift across the credit spectrum.

We recently conducted a survey with lenders, and 23% of respondents cited "complying with laws and regulations" as the top barrier to utilizing alternative data. Why do you think this is the case? What are the top concerns you hear from lenders as it relates to compliance on this topic?

The consumer finance industry is very focused on compliance, because failure to maintain compliance can kill a business, either directly through fines and expenses, or through reputation damage. Concerns about alternative data come from a lack of familiarity. There is uncertainty about acquiring the data, using the data, safeguarding the data, selling the data, etc. Companies want to feel confident that they know where the limits are in creating, acquiring, using, storing and selling data.

Alternative data is a broad term. When it comes to utilizing it for making a credit decision, what types of alternative data can actually be used?

Currently the scope is somewhat limited. I would describe the alternative data elements as being analogous to traditional credit data. Alternative data includes rent payments, utility payments, cell phone payments, bank deposits and similar records. These provide important insights into whether a given consumer is keeping up with financial obligations. And most importantly, we are seeing that the particular types of obligations reflected in alternative data reflect the spending habits of people whose traditional credit files are thin or nonexistent. This is a good thing, as alternative data captures consumers who are paying their bills consistently earlier than traditional data does. Serving those customers is a great opportunity.

If a lender wants to begin utilizing alternative credit data, what must they know from a compliance standpoint?

I would begin with considering what the lender's goal is and letting that guide how it will explore using alternative data. For some companies, accessing credit scores that include some degree of alternative data along with traditional data elements is enough.
Just doing that provides a good business benefit without introducing a lot of additional risk as compared to using traditional credit score information. If the company wants to start leveraging its own customer data for its own purposes, or making it available to third parties, that becomes complex very quickly. A company can find itself subject to all the regulatory burdens of a credit-reporting agency very quickly. In any case, the entire lifecycle of the data has to be considered, along with how the data will be protected when the data is "at rest," "in use," or "in transit." Alternative data used for credit assessment should additionally be FCRA-compliant.

How do you see alternative credit data evolving in the future?

I cannot predict where it will go, but the unfettered potential is dizzying. Think about how DNA-based genealogy has taken off, telling folks they have family members they did not know and providing information to solve old crimes. I think we need to carefully balance personal privacy and prudent uses of customer data. There is also another issue with wide-ranging uses of new data. I contend it takes time to discern whether an element of data is accurately predictive. Consider for a moment a person's utility bills. If electricity usage in a household goes down when the bills in the neighborhood are going up, what does that tell us? Does it mean the family is under some financial strain and using the air conditioning less? Or does it tell us they had solar panels installed? Or they've been on vacation? Figuring out what a particular piece of data means about someone's circumstances can be difficult.

About Philip Bohi

Philip joined AFSA in 2017 as Vice President, Compliance Education. He is responsible for providing strategic direction and leadership for the Association's compliance activities, including AFSA University, and is the staff liaison to the Operations and Regulatory Compliance Committee and Technology Task Forces. He brings significant consumer finance legal and compliance experience to AFSA, having served as in-house counsel at Toyota Motor Credit Corporation and Fannie Mae. At those companies, Philip worked closely with compliance staff supporting technology projects, legislative tracking, and vendor management. His private practice included work on manufactured housing, residential mortgage compliance, and consumer finance matters at McGlinchey Stafford, PLLC and Lotstein Buckman, LLP. He is a member of the Virginia State Bar and the District of Columbia Bar.

Learn more about the array of alternative credit data sources available to financial institutions.
As I mentioned in my previous blog, model validation is an essential step in evaluating a recently developed predictive model's performance before finalizing and proceeding with implementation. An in-time validation sample is created to set aside a portion of the total model development sample so the predictive accuracy can be measured on a data sample not used to develop the model. However, if few records in the target performance group are available, splitting the total model development sample into the development and in-time validation samples will leave too few records in the target group for use during model development. An alternative approach to generating a validation sample is to use a resampling technique. There are many different types and variations of resampling methods. This blog will address a few common techniques; a brief code sketch illustrating two of them appears at the end of this post.

- Jackknife technique — An iterative process whereby an observation is removed from each subsequent sample generation. So if there are N observations in the data, jackknifing calculates the model estimates on N different samples, each having N - 1 observations. The model then is applied to each sample, and an average of the model predictions across all samples is derived to generate an overall measure of model performance and prediction accuracy. The jackknife technique can be broadened to remove a group of observations from each subsequent sample generation while giving each observation in the data set equal opportunity for inclusion and exclusion.
- K-fold cross-validation — Generates multiple validation data sets from the sample set aside for the model validation exercise, i.e., the holdout data is split into K subsets. The model is then validated iteratively, with each of the K subsets held out in turn as the validation set while the model is fit on the remaining K - 1 subsets. Again, an average of the predictions across the multiple validation samples is used to create an overall measure of model performance and prediction accuracy.
- Bootstrap technique — Generates subsets from the full model development data sample by drawing records with replacement, producing multiple samples generally of equal size. Thus, with a total sample size of N, this technique generates random samples of size N such that a single observation can be present in multiple subsets (or more than once within the same subset) while another observation may not be present in any of the generated subsets. The generated samples are combined into a simulated larger data sample that can then be split into a development and an in-time, or holdout, validation sample.

Before selecting a resampling technique, it's important to check and verify data assumptions for each technique against the data sample selected for your model development, as some resampling techniques are more sensitive than others to violations of data assumptions.

Learn more about how Experian Decision Analytics can help you with your custom model development.
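For readers who want to see the mechanics, here is the brief sketch referenced above, covering two of these schemes with scikit-learn. The data, the logistic regression stand-in model and the parameter choices are all hypothetical and purely illustrative.

```python
# Minimal sketch: K-fold cross-validation and a bootstrap draw on invented data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.utils import resample

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# K-fold: each of the 5 folds is held out once as the validation set,
# and the per-fold results are averaged into one performance measure.
folds = KFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(LogisticRegression(), X, y, cv=folds, scoring="roc_auc")
print("average AUC across folds:", scores.mean())

# Bootstrap: draw a sample of size N with replacement, so some observations
# appear more than once and others not at all.
X_boot, y_boot = resample(X, y, replace=True, n_samples=len(y), random_state=7)
```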
An introduction to the different types of validation samples

Model validation is an essential step in evaluating and verifying a model's performance during development before finalizing the design and proceeding with implementation. More specifically, during a predictive model's development, the objective of a model validation is to measure the model's accuracy in predicting the expected outcome. For a credit risk model, this may be predicting the likelihood of good or bad payment behavior, depending on the predefined outcome. Two general types of data samples can be used to complete a model validation. The first is known as the in-time, or holdout, validation sample and the second is known as the out-of-time validation sample. So, what's the difference between an in-time and an out-of-time validation sample?

An in-time validation sample sets aside part of the total sample made available for the model development. Random partitioning of the total sample is completed upfront, generally separating the data into a portion used for development and the remaining portion used for validation. For instance, the data may be randomly split, with 70 percent used for development and the other 30 percent used for validation. Other common data subset schemes include an 80/20, a 60/40 or even a 50/50 partitioning of the data, depending on the quantity of records available within each segment of your performance definition. Before selecting a data subset scheme to be used for model development, you should evaluate the number of records available in your target performance group, such as number of bad accounts. If you have too few records in your target performance group, a 50/50 split can leave you with insufficient performance data for use during model development. A separate blog post will present a few common options for creating alternative validation samples through a technique known as resampling.

Once the data has been partitioned, the model is created using the development sample. The model is then applied to the holdout validation sample to determine the model's predictive accuracy on data that wasn't used to develop the model. The model's predictive strength and accuracy can be measured in various ways by comparing the known and predefined performance outcome to the model's predicted performance outcome.

The out-of-time validation sample contains data from an entirely different time period or customer campaign than what was used for model development. Validating model performance on a different time period is beneficial to further evaluate the model's robustness. Selecting a data sample from a more recent time period having a fully mature set of performance data allows the modeler to evaluate model performance on a data set that may more closely align with the current environment in which the model will be used. In this case, a more recent time period can be used to establish expectations and set baseline parameters for model performance, such as population stability indices and performance monitoring.

Learn more about how Experian Decision Analytics can help you with your custom model development needs.
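As a simple illustration of the in-time approach described above, here is a hypothetical sketch of a 70/30 partition. The DataFrame and the "bad" performance flag are invented names; stratifying on the flag keeps the good/bad mix consistent across both portions.

```python
# Minimal sketch: a 70/30 in-time (holdout) split of a model development sample.
import pandas as pd
from sklearn.model_selection import train_test_split

def split_development_sample(sample: pd.DataFrame, target: str = "bad"):
    development, holdout = train_test_split(
        sample,
        test_size=0.30,            # 30 percent reserved as the in-time validation sample
        stratify=sample[target],   # preserve the proportion of the target performance group
        random_state=42,
    )
    return development, holdout
```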