Description
Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. Apart from these definitions, as data volume increases, internal consistency becomes paramount regardless of fitness for any external purpose: for example, a person's age and birth date may conflict in different parts of a database.
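Internal consistency of this kind can be checked mechanically. The Python sketch below is illustrative only. It assumes records are dictionaries with hypothetical `id`, `birth_date` and `age` fields, and it flags rows whose stored age disagrees with the age implied by the birth date.

```python
from datetime import date

def age_on(birth: date, as_of: date) -> int:
    """Whole years between birth and as_of."""
    years = as_of.year - birth.year
    # One year fewer if the birthday has not yet come round in the as_of year.
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years

def inconsistent_ages(records, as_of: date):
    """IDs of records whose stored age disagrees with the age implied by birth_date."""
    return [r["id"] for r in records if r["age"] != age_on(r["birth_date"], as_of)]

records = [
    {"id": 1, "birth_date": date(1980, 6, 15), "age": 44},
    {"id": 2, "birth_date": date(1990, 1, 1), "age": 30},
]
print(inconsistent_ages(records, as_of=date(2024, 7, 1)))  # prints [2]
```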
SAS® Sourcing Data Quality
An introduction and overview
A SAS White Paper
Table of Contents
Introduction
Data integration
Extraction, transformation and loading (ETL)
Extraction from any data source
Transformation
Loading
Data cleansing
Profile, monitor and manage the quality of enterprise data
Integrate and standardize data across multiple systems and business units
Easily define data correction rules to reflect organizational changes
Augmentation and enrichment
Commodity classification
The SAS commodity classification engine
Classification Viewer
Match Manager
Match Reviewer
Reports
The commodity classification process
Conclusion
About SAS
Introduction
In all industries and markets, procurement organizations are expected to reduce the cost of purchased goods and services without reducing quality or responsiveness from suppliers. Accomplishing this goal requires both a broad and a detailed understanding of the items you purchase and of who is supplying them. The ability to conduct meaningful analysis at multiple levels of detail, to find ways of increasing negotiation leverage or reducing overall supply risk, can only be achieved with accurate, meaningful data.
SAS® Sourcing Data Quality is a solution designed to ensure the quality of your sourcing data. It is the foundation that supports all other spend and supplier management activities with SAS software. SAS combines world-class tools and extensive experience in data management and data quality to deliver proven, high-quality results that can’t be found anywhere else.
This white paper explores several critical facets of SAS Sourcing Data Quality. We begin with a discussion of data integration: the process of gathering supplier data into one location, cleansing it, augmenting it with additional information and then classifying commodity data for use in supplier and spend analysis. Then we provide an overview of the SAS commodity classification engine, which helps automate the process of preparing commodity data for analysis.
Data integration
SAS data integration tools, which are included with SAS Sourcing Data Quality, aggregate data from across the organization into a single repository using an automated, repeatable process that supports frequent updates. These tools can incorporate data from all relevant source systems, including ERP, legacy systems and purchase cards. Once the data is aggregated, it can be cleansed, normalized, classified and enriched with external or third-party data. Such enhanced, accurate information is ready to support spend analysis and strategic purchasing decisions.
To ensure the accuracy and credibility of the information in spend analysis tools, SAS also provides data quality capabilities that are fully integrated within the data integration and transformation process. Our standard process includes integrated, prebuilt capabilities for advanced data manipulation, data quality and quality verification. Using proprietary logic, SAS tools cleanse and deduplicate data quickly and effectively, transforming it into information that business users can trust.
SAS data integration technology also ensures that the sourcing environment is auditable, repeatable and secure because it builds that environment on a layer of business and technical metadata. The metadata layer allows companies to gather and store data in multiple formats and locations without losing the ability to use it consistently for business intelligence. Through metadata, SAS provides a centralized, easily managed system for consistent enterprise data. Everything in this system is documented, including storage location, data contents, process flows and more.
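To make the idea of a single, documented repository concrete, here is a minimal Python sketch. It is not SAS code: the `SpendRecord` type, the field names and the source-system labels (`erp`, `pcard`) are hypothetical. Each row keeps a note of which system it came from and when it was loaded, a small stand-in for the lineage that a metadata layer records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SpendRecord:
    supplier: str
    description: str
    amount: float
    source_system: str  # lineage metadata: which system supplied the row
    loaded_at: str      # lineage metadata: when this load ran (UTC, ISO 8601)

def aggregate(sources: dict[str, list[dict]]) -> list[SpendRecord]:
    """Combine rows from several source systems into one list, tagging each with lineage."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    repository = []
    for system, rows in sources.items():
        for row in rows:
            repository.append(SpendRecord(
                supplier=row["supplier"],
                description=row["description"],
                amount=float(row["amount"]),  # sources may deliver text or numbers
                source_system=system,
                loaded_at=loaded_at,
            ))
    return repository

sources = {
    "erp": [{"supplier": "ACME", "description": "HEX BOLT", "amount": "12.50"}],
    "pcard": [
        {"supplier": "Beta Office Supply", "description": "TONER", "amount": 89.99},
        {"supplier": "ACME", "description": "WASHER", "amount": 3},
    ],
}
repository = aggregate(sources)
```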
Extraction, transformation and loading (ETL)
The ETL process is often the main barrier to effective spend analysis. SAS solutions address this problem by simplifying the ETL process. In particular, SAS Sourcing Data Quality includes ETL processes to extract, transform and load source data into a standard data mart. Visual design tools also simplify the visualization, navigation and maintenance of ETL processes, so you can quickly build, implement and manage ETL processes from source to destination. Further, SAS Sourcing Data Quality contains ETL processes that import client-specific standard abbreviations and other rules into a classification library. Other processes build custom classification systems (taxonomies) from your organization's own categories or existing taxonomies.
Figure 1: SAS Sourcing Data Quality includes powerful visual design tools for creating and maintaining ETL processes.
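As a rough illustration of importing client-specific abbreviations into a classification library, the sketch below reads a two-column CSV and merges it into a dictionary of defaults. The file layout, the column names and the rule that client entries override defaults are all assumptions made for the example.

```python
import csv
import io

def import_abbreviations(csv_text: str, library: dict[str, str]) -> dict[str, str]:
    """Merge abbreviation,expansion rows into library; client rows override defaults."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        abbreviation = row["abbreviation"].strip().upper()
        library[abbreviation] = row["expansion"].strip().upper()
    return library

default_library = {"ASSY": "ASSEMBLY", "BRG": "BEARING"}
client_file = """abbreviation,expansion
BRG,bracket
SS,stainless steel
"""
# Pass a copy so the shared defaults stay untouched for other clients.
library = import_abbreviations(client_file, dict(default_library))
```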
Extraction from any data source
Data extraction is the first and most significant step in creating any kind of enterprise intelligence. With data residing on numerous platforms and servers in a multitude of formats, gaining efficient and complete access to all relevant organizational data is essential.
One of the core strengths of SAS software is its ability to access data directly from nearly every information source, regardless of complexity, storage medium or operational platform. Through the use of engines native to a given platform and “adapters” that can translate underlying, complex data structures for quick and easy access, SAS completely solves the problem of gathering information for analysis. SAS customers have at their disposal more than 100 native access engines covering databases, operational systems, external data sources, e-sources and more. SAS extraction tools also read relevant metadata and associated information. Because SAS provides a complete environment for automating and managing access to data, additional programming and scheduling resources are not typically required.
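The adapter idea can be sketched in a few lines of Python. Each hypothetical adapter reads one source format (CSV or JSON here) and emits rows in a common shape, so downstream steps never see source-specific field names. The source column names (`VENDOR_NAME`, `merchant` and so on) are invented for the example, and this is a generic pattern rather than a description of how SAS access engines are implemented.

```python
import csv
import io
import json
from typing import Callable, Iterator

# Each adapter maps one source format onto the same shape:
# dicts with "supplier", "description" and "amount" keys.
def csv_adapter(text: str) -> Iterator[dict]:
    for row in csv.DictReader(io.StringIO(text)):
        yield {"supplier": row["VENDOR_NAME"], "description": row["ITEM_DESC"],
               "amount": float(row["AMT"])}

def json_adapter(text: str) -> Iterator[dict]:
    for row in json.loads(text):
        yield {"supplier": row["merchant"], "description": row["memo"],
               "amount": float(row["total"])}

ADAPTERS: dict[str, Callable[[str], Iterator[dict]]] = {
    "csv": csv_adapter,
    "json": json_adapter,
}

def extract(fmt: str, text: str) -> list[dict]:
    """Pick the adapter for a source format and return rows in the common shape."""
    return list(ADAPTERS[fmt](text))

erp_rows = extract("csv", "VENDOR_NAME,ITEM_DESC,AMT\nAcme Corp,STEEL BOLT,12.50\n")
card_rows = extract("json", '[{"merchant": "Beta", "memo": "Pens", "total": "3"}]')
```

Adding a new source then means writing one more adapter and registering it, without touching the transformation or loading steps.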
Transformation
SAS Sourcing Data Quality integrates transformation and data quality routines into the data management process. Built-in transformations include validation and scrubbing, integration and data organization, and analytical procedures — all designed to ensure that data in the data repository conforms to established business rules. All of these built-in transformations are open for customization. SAS also supports a number of transformation and storage standards to facilitate ease of use and maintenance.
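Rule-based validation and scrubbing can be sketched as a table of named checks applied after light cleanup. The rules below (positive amount, non-blank supplier, three-letter currency code) are hypothetical examples of business rules, not rules shipped with SAS Sourcing Data Quality. Rows that fail are kept together with the names of the rules they broke, which makes exceptions easy to review.

```python
import re

# Hypothetical business rules: rule name -> predicate on a scrubbed row.
RULES = {
    "amount_positive": lambda r: r["amount"] > 0,
    "supplier_present": lambda r: bool(r["supplier"]),
    "currency_iso": lambda r: re.fullmatch(r"[A-Z]{3}", r["currency"]) is not None,
}

def scrub(row: dict) -> dict:
    """Trim the supplier name and normalize the currency code before validation."""
    return {**row,
            "supplier": row["supplier"].strip(),
            "currency": row["currency"].strip().upper()}

def validate(rows):
    """Split rows into (valid, rejected); each rejected entry is (row, failed rule names)."""
    valid, rejected = [], []
    for row in map(scrub, rows):
        failed = [name for name, check in RULES.items() if not check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected

valid, rejected = validate([
    {"supplier": " Acme ", "amount": 10.0, "currency": "usd"},
    {"supplier": "", "amount": -5.0, "currency": "US"},
])
```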
Loading
Loading is the process of storing data in a physical repository for further use. Our philosophy for loading is the same as for data extraction — we provide a flexible architecture that accommodates platform-independent storage and distributed processing options for limitless possibilities.
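A load step can be illustrated with Python's built-in `sqlite3` module, used here purely as a stand-in for whatever physical repository a data mart actually uses. The `spend` table and its columns are hypothetical.

```python
import sqlite3

def load(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Create the target table if needed, insert rows and return the table's row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spend (
               supplier TEXT NOT NULL,
               description TEXT,
               amount REAL
           )"""
    )
    # Named placeholders let each dictionary supply its own values.
    conn.executemany(
        "INSERT INTO spend (supplier, description, amount) "
        "VALUES (:supplier, :description, :amount)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM spend").fetchone()[0]

conn = sqlite3.connect(":memory:")
count = load(conn, [{"supplier": "ACME", "description": "HEX BOLT", "amount": 1.5}])
```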
Data cleansing
Another element of SAS Sourcing Data Quality is data cleansing, a process that enhances the quality of all sourcing data. More specifically, SAS helps eliminate inconsistent categorization, duplicated master records and other data anomalies, so your organization can make decisions based on accurate, concise and trustworthy information.
Profile, monitor and manage the quality of enterprise data
SAS Sourcing Data Quality provides the ability to analyze and assess the quality of data across the enterprise. Profiling allows you to discover areas where potential gaps exist and identify what efforts will be required to rectify them. This capability enables organizations to focus on improving key data areas and business processes for better planning and project execution.
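A basic profile can be computed field by field: how often the field is filled in, how many distinct values it holds and which value is most common. The sketch below is a simplified assumption of what profiling measures, using hypothetical field names such as `duns`; low completeness on a field is the kind of gap profiling is meant to reveal.

```python
from collections import Counter

def profile(rows: list[dict]) -> dict[str, dict]:
    """Per-field completeness (share of non-blank values), distinct count and top value."""
    fields = sorted({key for row in rows for key in row})
    report = {}
    for name in fields:
        # A field missing from a row counts as blank for that row.
        filled = [row.get(name) for row in rows if row.get(name) not in (None, "")]
        report[name] = {
            "completeness": len(filled) / len(rows) if rows else 0.0,
            "distinct": len(set(filled)),
            "top": Counter(filled).most_common(1),
        }
    return report

report = profile([
    {"supplier": "ACME", "duns": "123456789"},
    {"supplier": "ACME", "duns": ""},
    {"supplier": "Beta"},
])
```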
Integrate and standardize data across multiple systems and business units
With SAS Sourcing Data Quality, you can incorporate data quality business rules across data sources and platforms. End-to-end audit capabilities highlight system-related quality issues by applying data quality algorithms and analytics. Through the implementation of standard and custom processes, data from different systems can be reconciled and standardized into a unified, accurate view. Standardizing and integrating master data is particularly important. SAS provides automated processes to normalize supplier records and other types of master data. These standardization steps, along with other data cleansing rules, are integrated into the overall data management process.
Figure 2: Easily compare supplier data to cleanse data and consolidate suppliers.
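Supplier consolidation usually starts with a normalized matching key. The Python sketch below upper-cases names, strips punctuation and drops common legal-form suffixes, then groups raw names that share a key. The suffix list is an assumption, and exact-key grouping is deliberately simple; production matching typically also compares addresses and tolerates spelling variation.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes ignored when comparing supplier names.
SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "COMPANY", "LLC", "LTD"}

def normalize_supplier(name: str) -> str:
    """Upper-case, turn punctuation into spaces, drop legal-form suffixes, collapse spaces."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def consolidate(names: list[str]) -> dict[str, list[str]]:
    """Group raw supplier names that normalize to the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_supplier(name)].append(name)
    return dict(groups)

groups = consolidate(["Acme, Inc.", "ACME Corporation", "Acme Inc", "Beta Ltd."])
```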
Easily define data correction rules to reflect organizational changes
Specialized interfaces in SAS Sourcing Data Quality make it easy for business analysts and data stewards to create data standardizations and visualize the impact of business rules and data cleansing. State-of-the-art data quality tools enable both business and technical users to cleanse, standardize, integrate and augment data. These tools can be customized to meet the individual requirements of each organization.
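Correction rules that a data steward can maintain without programming are often stored as data rather than code. In this hypothetical sketch each rule is a (field, pattern, canonical value) row; a value that fully matches a pattern, ignoring case, is replaced by the canonical value. The `preview_impact` helper counts how many rows a rule set would change, a simple way to gauge its effect before applying it.

```python
import re

# Hypothetical correction rules maintained as data: (field, pattern, canonical value).
# A value is replaced only when the whole value matches the pattern (case-insensitive).
CORRECTION_RULES = [
    ("state", r"calif(ornia)?", "CA"),
    ("state", r"n\.?\s?y\.?", "NY"),
    ("uom", r"each|ea\.?", "EA"),
]

def apply_corrections(row: dict, rules=CORRECTION_RULES) -> dict:
    """Return a corrected copy of row; the original dictionary is not modified."""
    fixed = dict(row)
    for field, pattern, canonical in rules:
        value = fixed.get(field)
        if isinstance(value, str) and re.fullmatch(pattern, value.strip(), re.IGNORECASE):
            fixed[field] = canonical
    return fixed

def preview_impact(rows, rules=CORRECTION_RULES) -> int:
    """Count how many rows the rule set would change."""
    return sum(1 for row in rows if apply_corrections(row, rules) != row)

rows = [
    {"state": "Calif", "uom": "each"},
    {"state": "TX", "uom": "EA"},
    {"state": "n.y.", "uom": "ea"},
]
```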
Augmentation and enrichment
When working with sourcing data, it can be beneficial to augment existing cleansed information with external data. Relevant external enrichment information could include supplier parent/child information, supplier risk estimates, benchmark pricing or economic indicators. SAS supports the inclusion of third-party augmentation data from all major data service providers. Comprehensive supplier content could include:
• Business demographic information (standardized name and address).
• Corporate linkage (parent-child) information.
• Business financial information (sales, profit, financial ratios, risk rating).
• Product/service commodity information (in SIC, NAICS and UNSPSC™ code formats).
• Federal Form 294 and 295 supplier reporting categories.
• Socioeconomic classifications/certifications (including certification source and certificate identification number).
• U.S. Federal government debarment lists.
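One reason corporate linkage matters is that spend spread across several subsidiaries only becomes visible when it is rolled up to the parent. The sketch below assumes a hypothetical linkage table keyed by standardized supplier name; suppliers missing from the table are treated as their own parent.

```python
# Hypothetical third-party linkage table: standardized supplier name -> parent company.
PARENT_LINKAGE = {
    "ACME WIDGETS": "ACME HOLDINGS",
    "ACME TOOLS": "ACME HOLDINGS",
}

def enrich(rows, linkage=PARENT_LINKAGE):
    """Add a 'parent' field to each row; suppliers not in the table are their own parent."""
    return [{**row, "parent": linkage.get(row["supplier"], row["supplier"])} for row in rows]

def spend_by_parent(rows, linkage=PARENT_LINKAGE):
    """Total spend per parent company after enrichment."""
    totals = {}
    for row in enrich(rows, linkage):
        totals[row["parent"]] = totals.get(row["parent"], 0.0) + row["amount"]
    return totals

spend = [
    {"supplier": "ACME WIDGETS", "amount": 100.0},
    {"supplier": "ACME TOOLS", "amount": 50.0},
    {"supplier": "GAMMA LOGISTICS", "amount": 10.0},
]
print(spend_by_parent(spend))  # prints {'ACME HOLDINGS': 150.0, 'GAMMA LOGISTICS': 10.0}
```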
Commodity classification
A final step in preparing data for analysis is accurate commodity classification of item descriptions to a standard taxonomy such as UNSPSC. SAS Sourcing Data Quality helps you match materials to the desired classification using predefined and customized rules. Our automated classification engine (described below) achieves both high match rates and a high degree of match accuracy. Match rules can be refined continually to improve accuracy even more.
The SAS commodity classification engine
SAS Sourcing Data Quality includes a commodity classification application: an intuitive, easy-to-use Java application that automatically classifies purchased item descriptions from a variety of sources using either a standard or custom commodity coding taxonomy. The classification engine provides both rule-based and text matching to allow for the most accurate, reliable classification. The application consists of four components: a Classification Viewer, Match Manager, Match Reviewer and various reports.
Classification Viewer
The Classification Viewer enables users to view the commodity classification taxonomy being used for processing, in either a tabular or grid view. The default taxonomies delivered with the solution are UNSPSC and eCl@ss. The Classification Viewer enables users to view all levels of the taxonomy. For UNSPSC, the four levels — segment, family, class and commodity — and each node’s corresponding numeric designation are displayed. SAS Sourcing Data Quality also supports the use of any other standard taxonomy or custom-built structure.
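UNSPSC codes are eight digits long, with two digits per level, so each node's ancestors can be derived from the code itself. The function below splits a code into its four levels, padding higher-level nodes with zeros as UNSPSC listings commonly do. The function name and output format are illustrative.

```python
LEVELS = ("segment", "family", "class", "commodity")

def unspsc_levels(code: str) -> dict[str, str]:
    """Split an 8-digit UNSPSC code into its segment, family, class and commodity nodes.

    Each level adds two digits; ancestor nodes are zero-padded back to 8 digits.
    """
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected 8 digits, got {code!r}")
    return {level: code[: 2 * (i + 1)].ljust(8, "0") for i, level in enumerate(LEVELS)}

levels = unspsc_levels("44121707")
```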
Match Manager
The Match Manager stores and administers all the rules for single-word and multiple-word matches. These rules help drive a text-based description to match either a single commodity code or multiple commodity codes. SAS Sourcing Data Quality includes an extensive library of both single- and multiple-match rules for many of the more common commodity segments. Users with the proper authority can add, delete or modify rules in this library.
In addition to handling single- and multiple-match rules, the Match Manager stores libraries of synonyms, abbreviations and brand names. After the engine processes the single- and multiple-match rules, these libraries are applied to item descriptions to help the classification engine decide what each item is. The libraries can also be extended with client-specific abbreviations, synonyms or files that contain values for any type of rule in the library.
The Match Manager also enables users to select which commodity segments should be included when the classification engine is run. This increases matching accuracy and reduces the runtime of the engine, because segments not represented by the sample data can be excluded from processing. In addition, users can select family, class and commodity-level nodes within a particular segment.
Context fields in the source data can be set in the Match Manager to be used with the item description during matching. Data fields such as category, secondary description and others can provide further context for item descriptions. When hardware permits, multiple sessions of the classification engine can run simultaneously, which greatly improves runtime.
Figure 3: The Match Manager stores libraries of match rules and synonyms, abbreviations and brand names.
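The libraries the Match Manager administers can be pictured as a handful of lookup tables plus a segment filter. The `RuleLibrary` class below is a hypothetical model, not the product's data structure. Restricting rules to the selected two-digit segments shows why excluding unused segments shrinks the work the engine has to do. The codes are placeholders chosen only for their segment prefixes.

```python
from dataclasses import dataclass, field

@dataclass
class RuleLibrary:
    """Hypothetical in-memory model of the libraries a Match Manager administers."""
    single: dict = field(default_factory=dict)         # phrase -> one commodity code
    multiple: dict = field(default_factory=dict)       # phrase -> list of candidate codes
    abbreviations: dict = field(default_factory=dict)  # e.g. "BRG" -> "BEARING"
    synonyms: dict = field(default_factory=dict)
    brands: dict = field(default_factory=dict)
    active_segments: set = field(default_factory=set)  # selected two-digit segment prefixes

    def in_scope(self, code: str) -> bool:
        """A code is in scope if its segment is selected, or if no segments are selected."""
        return not self.active_segments or code[:2] in self.active_segments

    def scoped_single(self) -> dict:
        """Single-match rules whose target code lies in a selected segment."""
        return {p: c for p, c in self.single.items() if self.in_scope(c)}

    def scoped_multiple(self) -> dict:
        """Multiple-match rules with out-of-scope candidates removed; empty rules dropped."""
        scoped = {p: [c for c in codes if self.in_scope(c)] for p, codes in self.multiple.items()}
        return {p: codes for p, codes in scoped.items() if codes}

lib = RuleLibrary(
    single={"STAPLER": "44121615", "HAMMER": "27111602"},
    multiple={"TABLE": ["56101700", "44111900"]},
    active_segments={"44"},
)
```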
Match Reviewer
The Match Reviewer enables users to view the results of the matching process. When the classification engine completes a run, automated e-mail notifications are sent to let reviewers know that item description matches are available for review. A number of filtering options help in the review process. Users can view all matched item descriptions, just the single matches, just the multiple matches, all unmatched item descriptions or all item descriptions. Any source data fields (e.g. supplier, item number, manufacturer part number) can be included in the review to help classify an item. Item descriptions can also be grouped by any source data values, so reviewers can choose to see only those items for which they are responsible. For example, you might group items by buyer codes. To better facilitate confirmation of correct matches, the Match Reviewer provides synchronization of item descriptions and commodities in the Classification Viewer. The Match Reviewer also allows automated Web searching for hard-to-classify descriptions. The application administrator chooses the specific search engine.
The goal of the Match Reviewer is to facilitate the quality assurance process and to confirm all the correct matches while working on exceptions to improve the overall match rate. Best practices dictate an iterative process that focuses on exceptions. The match engine can and should be run a few times. This practice facilitates more meaningful spend analysis.
Figure 4: The Match Reviewer and Classification Viewer let users confirm and review commodity classifications manually to ensure accuracy.
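The filtering and grouping options in the Match Reviewer amount to selecting results by match status and bucketing them by a source field. In this hypothetical sketch, each result is a dictionary whose `status` is `single`, `multiple` or `unmatched`; passing `{"single", "multiple"}` corresponds to viewing all matched items, and grouping on a buyer code gives each reviewer a separate queue.

```python
from collections import defaultdict

def review_queue(results, statuses=None, group_by=None):
    """Filter classification results for review and optionally group them.

    statuses: None for every item, or a set such as {"single", "multiple"}
              (all matched items) or {"unmatched"}.
    group_by: name of a source field to group on, e.g. a buyer code.
    """
    selected = [r for r in results if statuses is None or r["status"] in statuses]
    if group_by is None:
        return selected
    groups = defaultdict(list)
    for r in selected:
        groups[r[group_by]].append(r)
    return dict(groups)

results = [
    {"id": 1, "description": "STAPLER", "status": "single", "buyer": "B01"},
    {"id": 2, "description": "OAK TABLE", "status": "multiple", "buyer": "B02"},
    {"id": 3, "description": "XJ-9 KIT", "status": "unmatched", "buyer": "B01"},
]
```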
Reports
A series of reports is available in SAS Sourcing Data Quality to present results and assist in the iterative quality assurance matching process. These reports include:
• Match engine statistics. Displays match process statistics for each iteration of the match engine.
• Matches by commodity segment. Helps determine which commodity segments are represented in the data and which ones are not.
• Commodity segment frequency. Corresponds to the “Matches by commodity segment” report and displays the frequency of each segment included in the matching process.
• Matches found. Useful for understanding how a match was determined and why it was considered strong enough to keep.
• Unmatched items. Displays the Original Item ID and Original Item Description for all items that did not satisfy any of the match rules.
• Unmatched words. Displays each unique alphabetic word found in the unmatched item descriptions and the number of times the word occurs in item descriptions.
• Unmatched terms. Displays each unique alphanumeric term found in the unmatched item descriptions and the number of times the term occurs in item descriptions that have no proposed matches for the current iteration of the matching process.
• View of commodity segments. Displays all commodity segments included in the match processing for the current matches displayed in the Match Reviewer.
• New match rules by user. Displays match rules added since the last match process was run, as well as the user who added each rule.
• Confirmed matches. Displays the statistics of all confirmed matches, which helps assess the classification review process.
Figure 5: Reports available within SAS Sourcing Data Quality assist in the iterative matching process.
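The Unmatched words and Unmatched terms reports are essentially frequency counts over the descriptions that failed to match. The sketch below assumes that "words" are purely alphabetic tokens and "terms" are tokens containing digits (such as part numbers or sizes); the exact tokenization SAS uses is not described in this paper. Frequent unmatched words are natural candidates for new match rules.

```python
import re
from collections import Counter

# Tokens are runs of letters/digits, optionally joined by "/" or "." (e.g. "3/8", "1.5").
TOKEN = re.compile(r"[A-Z0-9]+(?:[/.][A-Z0-9]+)*")

def unmatched_tokens(descriptions):
    """Return (words, terms): counts of purely alphabetic tokens and of all other tokens."""
    words, terms = Counter(), Counter()
    for text in descriptions:
        for token in TOKEN.findall(text.upper()):
            if token.isalpha():
                words[token] += 1
            else:
                terms[token] += 1
    return words, terms

words, terms = unmatched_tokens(["HEX BOLT M8X20", "Bolt zinc 3/8", "widget"])
print(words.most_common(1))  # prints [('BOLT', 2)]
```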
The commodity classification process
How does commodity classification work within SAS Sourcing Data Quality? Using prepared commodity data, the classification engine processes item descriptions in the following order:
1. The system checks single-match rules. Single-match rules are stored in two sets. The first set contains rules that match to the lowest level in the taxonomy. The second set contains rules that match to a higher level, in most cases the next-to-lowest level. Descriptions are first compared to the lowest-level single-match rules. If no match is found for a description, it is then compared to the higher-level single-match rules. Items that match a single-match rule drop out of the processing, and the match is automatically confirmed. If desired, automatic confirmation can be turned off before the match process is run. Items that have been automatically confirmed are still displayed in the Match Reviewer interface and can be reassigned to a different commodity by the reviewer.
2. If single-match rules fail to locate a match, the system checks multiple-match rules. Multiple-match rules are helpful for descriptions that contain strong words such as “table” along with several descriptive words that could result in extraneous matches; such a description could match more than one commodity code. Items that match multiple-match rules continue being processed and are also compared to commodity descriptions. During match processing, the engine keeps statistics on each match. Once all potential matches are found, the engine uses these statistics to determine which matches to keep and which to discard. Matches that result from multiple-match rules are always considered strong and are kept. This process results in the highest confidence for matches and helps reduce the number of items that need manual review.
3. The system then applies abbreviations, brands and synonyms to the item descriptions. After multiple-match rule matches are found, the engine applies abbreviation rules, brand rules and then synonym rules to any standardized descriptions that did not match a single-match rule.
4. The system reviews the UNSPSC segments for potential matches against available commodity descriptions. The classification engine performs text matching by comparing the descriptions to commodity descriptions in the taxonomy. The engine compares segments or portions of segments one at a time, removing from each description any noise words that do not occur in that particular segment.
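The four steps can be approximated by a short cascade. This is a simplified sketch, not the SAS engine: the rule dictionaries, the single combined replacement map, the overlap score and the `min_score` threshold are all assumptions, and the codes and descriptions are placeholders rather than real UNSPSC assignments. A rule fires when all of its words appear in the description.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Result:
    codes: list = field(default_factory=list)
    method: str = "unmatched"  # "single", "multiple", "text" or "unmatched"
    confirmed: bool = False

def tokens(text):
    return re.findall(r"[A-Z0-9]+", text.upper())

def has_phrase(desc_tokens, phrase):
    """A rule fires when every word of its phrase occurs in the description."""
    return set(tokens(phrase)) <= set(desc_tokens)

def classify(description, lowest, higher, multiple, replacements, taxonomy,
             auto_confirm=True, min_score=0.5):
    toks = tokens(description)
    # Step 1: single-match rules, lowest taxonomy level first, then the higher level.
    for rules in (lowest, higher):
        for phrase, code in rules.items():
            if has_phrase(toks, phrase):
                # The item drops out of further processing.
                return Result([code], "single", confirmed=auto_confirm)
    # Step 2: multiple-match rules; every candidate they propose is kept.
    rule_codes = [c for phrase, cands in multiple.items()
                  if has_phrase(toks, phrase) for c in cands]
    # Step 3: abbreviations, brands and synonyms (folded into one map in this sketch).
    toks = [replacements.get(t, t) for t in toks]
    # Step 4: text match against commodity descriptions, one segment at a time.
    best_code, best_score = None, 0.0
    for seg in sorted({code[:2] for code in taxonomy}):
        items = {c: set(tokens(d)) for c, d in taxonomy.items() if c[:2] == seg}
        vocab = set().union(*items.values())
        kept = {t for t in toks if t in vocab}  # noise words for this segment dropped
        for code, words in items.items():
            score = len(kept & words) / len(kept) if kept else 0.0
            if score > best_score:
                best_code, best_score = code, score
    codes = list(rule_codes)
    if best_code is not None and best_score >= min_score and best_code not in codes:
        codes.append(best_code)
    if rule_codes:
        return Result(codes, "multiple")
    return Result(codes, "text" if codes else "unmatched")

# Placeholder rules and taxonomy entries for illustration only.
LOWEST = {"STAPLER": "44121615"}
HIGHER = {"COPY PAPER": "14111500"}
MULTIPLE = {"TABLE": ["56101700", "56121500"]}
REPLACEMENTS = {"BRG": "BEARING", "ASSY": "ASSEMBLY"}
TAXONOMY = {"31171504": "Ball bearing", "31171505": "Roller bearing", "44121615": "Stapler"}

r = classify("SKF BALL BRG 6204", LOWEST, HIGHER, MULTIPLE, REPLACEMENTS, TAXONOMY)
print(r.codes, r.method)  # prints ['31171504'] text
```

The scoring rule is the main design choice here. Dividing by the number of description words that survive noise removal rewards commodity descriptions that explain most of what is left; the paper says only that the engine keeps statistics on each match, so the real scoring may differ.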
When reviewers have completed the review and confirmation process, an ETL process runs to export the confirmed classifications. A custom process imports the classifications to the appropriate source data. This export of confirmed classifications and import to the source can be executed at any time during the review process or all at once when the decision is made that all or enough of the items in the current cycle have been classified. Once classified items are exported, they are no longer displayed in the Match Reviewer or reflected in reports that are run on request. Best practices dictate an iterative process that involves running the match engine, reviewing exceptions or item descriptions that did not match, building out new match rules to capture and match those unmatched items and finally approving all of the matches. Over time, an extensive library of match rules will be created, thereby increasing first-time match rates.
Conclusion
SAS Sourcing Data Quality provides a complete, powerful set of data management capabilities and an intuitive, robust cleansing and classification engine that enables companies to understand their supplier spend. With the cleansed, standardized and classified data that SAS Sourcing Data Quality provides, you can:
• Align procurement decisions with overall vendor and partner strategies.
• Consolidate suppliers based on a complete view of your relationship with them.
• Understand true volumes per commodity and supplier.
SAS also offers a number of other solutions that extend the value of SAS Sourcing Data Quality and provide even more insight into supplier relationships and the extended supply chain. SAS Supplier Relationship Management is a suite of solutions that includes SAS Sourcing Data Quality and solutions for spend analysis, tracking and measurement of key performance indicators, and supply base optimization. For more information about SAS Supplier Relationship Management, visit http://www.sas.com/solutions/srm.
About SAS
SAS is the market leader in providing a new generation of business intelligence software and services that create true enterprise intelligence. SAS solutions are used at about 40,000 sites — including 96 of the top 100 companies on the FORTUNE Global 500® — to develop more profitable relationships with customers and suppliers; to enable better, more accurate and informed decisions; and to drive organizations forward. SAS is the only vendor that completely integrates leading data warehousing, analytics and traditional BI applications to create intelligence from massive amounts of data. For nearly three decades, SAS has been giving customers around the world The Power to Know®.
World Headquarters and SAS Americas SAS Campus Drive Cary, NC 27513 USA Tel: (1) 919 677 8000 Fax: (1) 919 677 4444 U.S. & Canada sales: (1) 800 727 0025
SAS International PO Box 10 53 40 Neuenheimer Landstr. 28-30 D-69043 Heidelberg, Germany Tel: (49) 6221 4160 Fax: (49) 6221 474850
www.sas.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2005, SAS Institute Inc. All rights reserved. 102301US_355734.1005
doc_521434903.pdf
Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. Furthermore, apart from these definitions, as data volume increases, the question of internal consistency within data becomes paramount, regardless of fitness for use for any external purpose, e.g. a person's age and birth date may conflict within different parts of a database.
SAS® Sourcing Data Quality
An introduction and overview
A SAS White Paper
Table of Contents
Introduction .................................................................................................................... 1 Data integration ............................................................................................................. 1 Extraction, transformation and loading (ETL) .............................................................. 2 Extraction from any data source............................................................................... 2 Transformation ......................................................................................................... 3 Loading..................................................................................................................... 3 Data cleansing ............................................................................................................. 3 Profile, monitor and manage the quality of enterprise data...................................... 3 Integrate and standardize data across multiple systems and business units .......... 4 Easily define data correction rules to reflect organizational changes ...................... 4 Augmentation and enrichment ..................................................................................... 5 Commodity classification.............................................................................................. 5 The SAS commodity classification engine ................................................................. 5 Classification Viewer .................................................................................................... 6 Match Manager ............................................................................................................ 6 Match Reviewer ........................................................................................................... 7 Reports......................................................................................................................... 
8 The commodity classification process ....................................................................... 10 Conclusion ................................................................................................................... 11 About SAS .................................................................................................................... 11
SAS® Sourcing Data Quality
Introduction
In all industries and markets, procurement organizations are expected to reduce the cost of purchased goods and services without reducing quality or responsiveness from suppliers. Accomplishing this goal requires both a broad and detailed understanding of the items you purchase — and who is supplying them. The ability to conduct meaningful analysis at multiple levels of detail — to determine ways of increasing negotiation leverage or reducing overall supply risk — can only be achieved with accurate, meaningful data. SAS® Sourcing Data Quality is a solution designed to ensure the quality of your sourcing data. It is the foundation that supports all other spend and supplier management activities with SAS software. SAS combines world-class tools and extensive experience in data management and data quality to deliver proven, high-quality results that can’t be found anywhere else. This white paper explores several critical facets of SAS Sourcing Data Quality. We begin with a discussion of data integration — the process of gathering supplier data into one location, cleansing it, augmenting it with additional information and then classifying commodity data for use in supplier and spend analysis. Then we provide an overview of the SAS commodity classification engine, which helps automate the process of preparing commodity data for analysis.
Data integration
SAS data integration tools, which are included with SAS Sourcing Data Quality, aggregate data from across the organization into a single repository using an automated, repeatable process for frequent updates. These tools can incorporate data from all relevant source systems, including ERP, legacy systems and purchase cards. Once the data is aggregated, it can be cleansed, normalized, classified and enriched with external or third-party data. Such enhanced, accurate information is ready to support spend analysis and strategic purchasing decisions. To ensure the accuracy and credibility of the information in spend analysis tools, SAS also provides data quality capabilities that are fully integrated within the data integration and transformation process. Our standard process includes integrated, prebuilt capabilities for advanced data manipulation, data quality and quality verification. Using proprietary logic, SAS tools cleanse and de-dupe data quickly and effectively to transform it into information that business users can trust. SAS data integration technology also ensures that the sourcing environment is auditable, repeatable and secure because it bases that environment on a layer of business and technical metadata. The metadata layer allows companies to gather and store data in multiple formats and locations without losing the ability to use it consistently for business intelligence. Through metadata, SAS provides a centralized and easily managed system for consistent enterprise data. Information in this system is clearly documented, with information about storage location, data contents, process flows and more.
1
SAS® Sourcing Data Quality
Extraction, transformation and loading (ETL)
The ETL process is often the main barrier to effective spend analysis. SAS solutions solve this problem by simplifying the ETL process. In particular, SAS Sourcing Data Quality includes ETL processes to extract, transform and load source data into a standard data mart. Visual design tools also simplify the visualization, navigation and maintenance of ETL processes. With them, you can quickly build, implement and manage ETL processes from source to destination. Further, SAS Sourcing Data Quality contains ETL processes that import client-specific standard abbreviations and other rules into a classification library. Other processes create custom classification systems (taxonomies) from your specific categories or taxonomies.
Figure 1: SAS Sourcing Data Quality includes powerful visual design tools for creating and maintaining ETL processes.
Extraction from any data source
Data extraction is the first and most significant step in creating any kind of enterprise intelligence. With data residing on numerous platforms and servers in a multitude of formats, gaining efficient and complete access to all relevant organizational data is essential.
One of the core strengths of SAS software is its ability to access data directly from nearly every information source, regardless of complexity, storage medium or operational platform. Through the use of engines native to a given platform and “adapters” that can translate underlying, complex data structures for quick and easy access, SAS completely solves the problem of gathering information for analysis. SAS customers have at their disposal more than 100 native access engines covering databases, operational systems, external data sources, e-sources and more. SAS extraction tools also read relevant metadata and associated information. Because SAS provides a complete environment for automating and managing access to data, additional programming and scheduling resources are not typically required.
Transformation
SAS Sourcing Data Quality integrates transformation and data quality routines into the data management process. Built-in transformations include validation and scrubbing, integration and data organization, and analytical procedures — all designed to ensure that data in the data repository conforms to established business rules. All of these built-in transformations are open for customization. SAS also supports a number of transformation and storage standards to facilitate ease of use and maintenance.
Loading
Loading is the process of storing data in a physical repository for further use. Our philosophy for loading is the same as for data extraction — we provide a flexible architecture that accommodates platform-independent storage and distributed processing options for limitless possibilities.
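Loading can be sketched with Python's built-in sqlite3 module standing in for the physical repository. This is an assumption for illustration only. The spend_fact table and its columns are hypothetical, and a real data mart could sit on whatever platform an organization chooses.

```python
import sqlite3


def load(conn: sqlite3.Connection, records: list) -> int:
    """Create the hypothetical spend_fact table if needed, bulk-insert records, return row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spend_fact (
               supplier    TEXT NOT NULL,
               description TEXT,
               amount      REAL NOT NULL
           )"""
    )
    with conn:  # commits if every insert succeeds, rolls back otherwise
        conn.executemany(
            "INSERT INTO spend_fact (supplier, description, amount) VALUES (?, ?, ?)",
            [(r["supplier"], r["description"], r["amount"]) for r in records],
        )
    return conn.execute("SELECT COUNT(*) FROM spend_fact").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [
    {"supplier": "Acme Corp", "description": "Steel bolt M8", "amount": 120.5},
    {"supplier": "Office Hub", "description": "Copy paper", "amount": 42.0},
]
print(load(conn, rows))
```

Wrapping the inserts in a transaction means a failed batch leaves the repository unchanged, which keeps repeated loads predictable.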
Data cleansing
Another element of SAS Sourcing Data Quality is data cleansing, a process that enhances the quality of all sourcing data. More specifically, SAS helps eliminate inconsistent categorization, duplicated master records and other data anomalies, so your organization can make decisions based on accurate, concise and trustworthy information.
Profile, monitor and manage the quality of enterprise data
SAS Sourcing Data Quality provides the ability to analyze and assess the quality of data across the enterprise. Profiling allows you to discover areas where potential gaps exist and identify what efforts will be required to rectify them. This capability enables organizations to focus on improving key data areas and business processes for better planning and project execution.
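Profiling usually starts with simple column statistics. The sketch below computes, for one column, the row count, the fill rate, the number of distinct values and the most common character patterns (letters mapped to A, digits to 9). These statistics quickly expose gaps such as missing or inconsistently formatted postal codes. The pattern convention is a common profiling technique and an assumption here, not necessarily how SAS computes its profiles.

```python
from collections import Counter


def pattern(value: str) -> str:
    """Map letters to 'A' and digits to '9', keeping other characters as-is."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def profile(rows, column):
    """Return basic quality statistics for one column of a list of dicts."""
    values = [r.get(column) for r in rows]
    present = [str(v) for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "fill_rate": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "patterns": Counter(pattern(v) for v in present).most_common(3),
    }


suppliers = [
    {"name": "Acme Corp", "zip": "27513"},
    {"name": "Acme Corporation", "zip": "27513-1234"},
    {"name": "Office Hub", "zip": ""},
    {"name": "Widget Co", "zip": "2751"},
]
print(profile(suppliers, "zip"))
```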
Integrate and standardize data across multiple systems and business units
With SAS Sourcing Data Quality, you can incorporate data quality business rules across data sources and platforms. End-to-end audit capabilities highlight system-related quality issues by applying data quality algorithms and analytics. Through the implementation of standard and custom processes, data from different systems can be regulated and standardized into a unified, accurate view. Standardizing and integrating master data is particularly important. SAS provides automated processes to normalize supplier records and other types of master data. These standardization steps, along with other data cleansing rules, are integrated into the overall data management process.
Figure 2: Easily compare supplier data to cleanse data and consolidate suppliers.
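Supplier normalization of the kind described above can be approximated with a comparison key: uppercase the name, replace punctuation with spaces, drop trailing legal-entity suffixes, and group records whose keys collide. The suffix list is a small hypothetical sample. Production matching, including SAS's proprietary logic, typically adds fuzzy comparison and address data.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to ignore when comparing supplier names.
SUFFIXES = {"INC", "INCORPORATED", "CORP", "CORPORATION", "CO", "COMPANY", "LLC", "LTD"}


def normalize_supplier(name: str) -> str:
    """Build a comparison key: uppercase, non-alphanumerics to spaces, trailing suffixes dropped."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()  # drop trailing suffixes such as 'CORP' or 'INC'
    return " ".join(tokens)


def group_duplicates(names):
    """Return only the groups of raw names that share a normalization key."""
    groups = defaultdict(list)
    for n in names:
        groups[normalize_supplier(n)].append(n)
    return {key: members for key, members in groups.items() if len(members) > 1}


raw = ["Acme Corp.", "ACME CORPORATION", "Acme, Inc.", "Office Hub Ltd", "Widget Co"]
print(group_duplicates(raw))
```

A shared key only nominates candidates. "Acme Corp." and "Acme, Inc." could be different legal entities, so a data steward should confirm a consolidation before it is applied.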
Easily define data correction rules to reflect organizational changes
Specialized interfaces in SAS Sourcing Data Quality make it easy for business analysts and data stewards to create data standardizations and visualize the impact of business rules and data cleansing. State-of-the-art data quality tools enable both business and technical users to cleanse, standardize, integrate and augment data. These tools can be customized to meet the individual requirements of each organization.
Augmentation and enrichment
When working with sourcing data, it can be beneficial to augment existing cleansed information with external data. Relevant external enrichment information could include supplier parent/child information, supplier risk estimates, benchmark pricing or economic indicators. SAS supports the inclusion of third-party augmentation data from all major data service providers. Comprehensive supplier content could include:
• Business demographic information (standardized name and address).
• Corporate linkage (parent-child) information.
• Business financial information (sales, profit, financial ratios, risk rating).
• Product/service commodity information (in SIC, NAICS and UNSPSC™ code formats).
• Federal Form 294 and 295 supplier reporting categories.
• Socioeconomic classifications/certifications (including certification source and certificate identification number).
• U.S. Federal government debarment lists.
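Augmentation is essentially a keyed join between cleansed supplier records and a provider's file. The sketch below assumes a hypothetical linkage extract that maps each supplier identifier to its parent identifier. It uses that corporate linkage to roll spend up to the ultimate parent, so subsidiaries of one corporate family are analyzed together. The identifiers and the LINKAGE structure are illustrative, not a real provider format.

```python
from collections import defaultdict

# Hypothetical third-party linkage file: supplier_id -> parent_id (None = top of family).
LINKAGE = {
    "100": "900",  # Acme East -> Acme Holdings
    "200": "900",  # Acme West -> Acme Holdings
    "900": None,   # Acme Holdings is the ultimate parent
    "300": None,   # Office Hub is independent
}


def ultimate_parent(supplier_id: str) -> str:
    """Follow parent links upward; unknown IDs are treated as their own parent."""
    seen = set()
    current = supplier_id
    while LINKAGE.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle in linkage data at {current}")
        seen.add(current)
        current = LINKAGE[current]
    return current


def spend_by_family(spend):
    """Roll (supplier_id, amount) pairs up to their ultimate parent."""
    totals = defaultdict(float)
    for supplier_id, amount in spend:
        totals[ultimate_parent(supplier_id)] += amount
    return dict(totals)


print(spend_by_family([("100", 500.0), ("200", 250.0), ("300", 75.0)]))
```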
Commodity classification
A final step in preparing data for analysis is accurate commodity classification of item descriptions to a standard taxonomy such as UNSPSC. SAS Sourcing Data Quality helps you match materials to the desired classification using predefined and customized rules. Our automated classification engine (described below) achieves both high match rates and a high degree of match accuracy. Match rules can be refined continually to improve accuracy even more.
The SAS commodity classification engine
SAS Sourcing Data Quality includes a commodity classification application: an intuitive, easy-to-use Java application that automatically classifies purchased item descriptions from a variety of sources using either a standard or custom commodity coding taxonomy. The classification engine provides both rule-based and text matching to allow for the most accurate, reliable classification. The application consists of four components: a Classification Viewer, Match Manager, Match Reviewer and various reports.
Classification Viewer
The Classification Viewer enables users to view the commodity classification taxonomy being used for processing, in either a tabular or grid view. The default taxonomies delivered with the solution are UNSPSC and eCl@ss. The Classification Viewer enables users to view all levels of the taxonomy. For UNSPSC, the four levels — segment, family, class and commodity — and each node’s corresponding numeric designation are displayed. SAS Sourcing Data Quality also supports the use of any other standard taxonomy or custom-built structure.
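UNSPSC codes are eight digits, two for each of the four levels, so each node's numeric designation encodes its ancestors. Higher-level nodes are conventionally written zero-filled (a segment as XX000000, for example). The helpers below illustrate that numbering scheme. They are an assumption-free reading of the code structure but are not part of the Classification Viewer itself.

```python
LEVELS = ("segment", "family", "class", "commodity")


def unspsc_levels(code: str) -> dict:
    """Split an 8-digit UNSPSC code into its four levels, zero-filling the lower digits."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return {level: code[: 2 * (i + 1)].ljust(8, "0") for i, level in enumerate(LEVELS)}


def level_of(code: str) -> str:
    """Name the level of a zero-filled code, e.g. '43210000' -> 'family'."""
    pairs = [code[i:i + 2] for i in range(0, 8, 2)]
    significant = [i for i, p in enumerate(pairs) if p != "00"]
    if len(code) != 8 or not code.isdigit() or not significant:
        raise ValueError(f"not a valid UNSPSC node code: {code!r}")
    return LEVELS[significant[-1]]


print(unspsc_levels("43211503"))
print(level_of("43210000"))
```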
Match Manager
The Match Manager stores and administers all the rules for single-word and multiple-word matches. These rules help drive a text-based description to match either a single commodity code or multiple commodity codes. SAS Sourcing Data Quality includes an extensive library of both single- and multiple-match rules for many of the more common commodity segments. Users with the proper authority can easily add, delete or modify rules in this library.
In addition to single- and multiple-match rules, the Match Manager stores libraries of synonyms, abbreviations and brand names. After the engine has processed the single- and multiple-match rules, these libraries are applied to item descriptions to help the classification engine decide what each item is. The libraries can also be modified using client-specific abbreviations and synonyms, or files that contain values for any type of rule in the library.
The Match Manager also enables users to select which commodity segments to include when the classification engine runs. Excluding segments not represented in the sample data increases matching accuracy and reduces the engine's runtime. Within a particular segment, users can also select individual family-, class- and commodity-level nodes.
Context fields in the source data can be set in the Match Manager to be used with the item description during matching. Fields such as category and secondary description provide further context for item descriptions. When hardware permits, multiple sessions of the classification engine can run simultaneously, which greatly reduces runtime.
Figure 3: The Match Manager stores libraries of match rules and synonyms, abbreviations and brand names.
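Two Match Manager settings described above, restricting a run to selected segments and appending context fields to the item description, can be sketched as a small configuration object. MatchConfig, candidate_commodities and matching_text are hypothetical names, and the taxonomy entries are placeholders not checked against the UNSPSC listing. The point is that fewer candidate segments means fewer comparisons, while context fields add words that rules can match on.

```python
from dataclasses import dataclass, field


@dataclass
class MatchConfig:
    # Two-digit segment prefixes to include; an empty set means all segments.
    segments: set = field(default_factory=set)
    # Source fields appended to the item description as extra matching context.
    context_fields: list = field(default_factory=list)


def candidate_commodities(taxonomy: dict, cfg: MatchConfig) -> dict:
    """Keep only taxonomy entries whose code falls in a selected segment."""
    if not cfg.segments:
        return dict(taxonomy)
    return {code: title for code, title in taxonomy.items() if code[:2] in cfg.segments}


def matching_text(item: dict, cfg: MatchConfig) -> str:
    """Combine the item description with any non-empty configured context fields."""
    parts = [item["description"]] + [str(item[f]) for f in cfg.context_fields if item.get(f)]
    return " ".join(parts).upper()


# Placeholder taxonomy entries for illustration only.
taxonomy = {
    "43211503": "NOTEBOOK COMPUTERS",
    "44121704": "BALL POINT PENS",
    "56101519": "FOLDING TABLES",
}
cfg = MatchConfig(segments={"43", "44"}, context_fields=["category"])
print(candidate_commodities(taxonomy, cfg))
print(matching_text({"description": "ball pt pen blk", "category": "Office supplies"}, cfg))
```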
Match Reviewer
The Match Reviewer enables users to view the results of the matching process. When the classification engine completes a run, automated e-mail notifications are sent to let reviewers know that item description matches are available for review. A number of filtering options help in the review process. Users can view all matched item descriptions, just the single matches, just the multiple matches, all unmatched item descriptions or all item descriptions. Any source data fields (e.g. supplier, item number, manufacturer part number) can be included in the review to help classify an item. Item descriptions can also be grouped by any source data values, so reviewers can choose to see only those items for which they are responsible. For example, you might group items by buyer codes. To better facilitate confirmation of correct matches, the Match Reviewer provides synchronization of item descriptions and commodities in the Classification Viewer. The Match Reviewer also allows automated Web searching for hard-to-classify descriptions. The application administrator chooses the specific search engine.
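The review views and grouping described above amount to labeling each engine result by how many commodity codes it proposes, filtering on that label, and bucketing by a chosen source field. A minimal sketch, assuming each result is a dict carrying its proposed codes and source fields such as a hypothetical buyer_code:

```python
from collections import defaultdict


def match_status(result: dict) -> str:
    """Label a result 'unmatched', 'single' or 'multiple' by its number of proposed codes."""
    n = len(result["proposed_codes"])
    return "unmatched" if n == 0 else "single" if n == 1 else "multiple"


# The review views listed in the text, expressed as filters on match status.
VIEWS = {
    "all": lambda status: True,
    "matched": lambda status: status != "unmatched",
    "single": lambda status: status == "single",
    "multiple": lambda status: status == "multiple",
    "unmatched": lambda status: status == "unmatched",
}


def review_queue(results, view="all", group_by="buyer_code"):
    """Filter results by view, then group item IDs by a source field."""
    keep = VIEWS[view]
    groups = defaultdict(list)
    for r in results:
        if keep(match_status(r)):
            groups[r.get(group_by, "(none)")].append(r["item_id"])
    return dict(groups)


results = [
    {"item_id": 1, "buyer_code": "B1", "proposed_codes": ["43211503"]},
    {"item_id": 2, "buyer_code": "B1", "proposed_codes": []},
    {"item_id": 3, "buyer_code": "B2", "proposed_codes": ["56101519", "56121502"]},
    {"item_id": 4, "buyer_code": "B2", "proposed_codes": []},
]
print(review_queue(results, view="unmatched"))
```

Grouping by buyer code lets each reviewer open only the items they are responsible for, as the text suggests.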
The goal of the Match Reviewer is to facilitate the quality assurance process and to confirm all the correct matches while working on exceptions to improve the overall match rate. Best practices dictate an iterative process that focuses on exceptions. The match engine can and should be run a few times. This practice facilitates more meaningful spend analysis.
Figure 4: The Match Reviewer and Classification Viewer let users confirm and review commodity classifications manually to ensure accuracy.
Reports
A series of reports is available in SAS Sourcing Data Quality to present results and assist in the iterative quality assurance matching process. These reports include:
• Match engine statistics. Displays match process statistics for each iteration of the match engine.
• Matches by commodity segment. Helps determine which commodity segments are represented in the data and which are not.
• Commodity segment frequency. Corresponds to the “Matches by Commodity Segment” report and displays the frequency of each segment included in the matching process.
• Matches found. Useful for understanding how a match was determined and why it was considered strong enough to keep.
• Unmatched items. Displays the Original Item ID and Original Item Description for all items that did not satisfy any of the match rules.
• Unmatched words. Displays each unique alphabetic word found in the unmatched item descriptions and the number of times the word occurs in item descriptions.
• Unmatched terms. Displays each unique alphanumeric term found in the unmatched item descriptions and the number of times the term occurs in item descriptions that have no proposed matches for the current iteration of the matching process.
• View of commodity segments. Displays all commodity segments included in the match processing for the current matches displayed in the Match Reviewer.
• New match rules by user. Displays match rules added since the last match process was run, as well as the user who added each rule.
• Confirmed matches. Displays the statistics of all confirmed matches, which helps provide an assessment of the classification review process.
Figure 5: Reports available within SAS Sourcing Data Quality assist in the iterative matching process.
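Two of these reports are easy to approximate: match engine statistics (items processed, items matched, match rate) and the unmatched words and terms tallies. In the sketch below, "word" means a purely alphabetic token and "term" means a token mixing letters and digits, such as a part number. Purely numeric tokens are skipped. That reading of "alphanumeric term" is an assumption, and the result layout and placeholder codes are hypothetical.

```python
import re
from collections import Counter


def match_statistics(results):
    """Count items and matched items for one iteration of the match engine."""
    total = len(results)
    matched = sum(1 for r in results if r["proposed_codes"])
    return {"items": total, "matched": matched, "match_rate": matched / total if total else 0.0}


def unmatched_words_and_terms(results):
    """Tally alphabetic words and mixed letter-digit terms in unmatched descriptions."""
    words, terms = Counter(), Counter()
    for r in results:
        if r["proposed_codes"]:
            continue  # only unmatched descriptions are tallied
        for token in re.findall(r"[A-Z0-9]+", r["description"].upper()):
            if token.isalpha():
                words[token] += 1
            elif not token.isdigit():
                terms[token] += 1  # contains both letters and digits
    return words, terms


results = [
    {"item_id": 1, "description": "Steel bolt M8", "proposed_codes": ["31161501"]},  # placeholder code
    {"item_id": 2, "description": "Gasket kit GK200", "proposed_codes": []},
    {"item_id": 3, "description": "gasket ring 40mm", "proposed_codes": []},
    {"item_id": 4, "description": "Copy paper", "proposed_codes": ["14111507"]},  # placeholder code
]
print(match_statistics(results))
words, terms = unmatched_words_and_terms(results)
print(words.most_common(), terms.most_common())
```

A word that recurs across many unmatched items, such as GASKET here, is a natural candidate for a new match rule in the next iteration.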
The commodity classification process
How does commodity classification work within SAS Sourcing Data Quality? Using prepared commodity data, the classification engine processes item descriptions in the following order:
1. The system checks single-match rules. Single-match rules are stored in two sets. The first set contains rules that match to the lowest level in the taxonomy. The second set contains rules that match to a higher level, in most cases the next-to-lowest level. Descriptions are first compared to the lowest-level single-match rules. If no match is found for a description, it is then compared to the higher-level single-match rules. Items that match a single-match rule drop out of further processing, and the match is automatically confirmed. If desired, automatic confirmation can be turned off before the match process is run. Automatically confirmed items are still displayed in the Match Reviewer and can be reassigned to a different commodity by the reviewer.
2. If single-match rules fail to locate a match, the system checks multiple-match rules. Multiple-match rules are helpful for descriptions that contain a strong word such as “table” along with several descriptive words that could result in extraneous matches; such a description could match more than one commodity code. Items that match multiple-match rules continue to be processed and are also compared to commodity descriptions. During match processing, the engine keeps statistics on each match. Once all potential matches are found, the engine uses these statistics to determine which matches to keep and which to discard. Matches that result from multiple-match rules are always considered strong and are kept. This process yields the highest-confidence matches and helps reduce the number of items that need manual review.
3. The system then applies abbreviations, brands and synonyms to the item descriptions. After the multiple-match rule matches are found, the engine applies abbreviation rules, then brand rules, then synonym rules to any standardized descriptions that did not match a single-match rule. The system then reviews the UNSPSC segments for potential matches against the available commodity descriptions. The classification engine performs text matching by comparing the descriptions to commodity descriptions in the taxonomy. The engine compares segments or portions of segments one at a time, removing from each description any noise words that do not occur in the particular segment.
4. When reviewers have completed the review and confirmation process, an ETL process runs to export the confirmed classifications. A custom process imports the classifications into the appropriate source data. This export and import can be executed at any time during the review process, or all at once when it is decided that all or enough of the items in the current cycle have been classified. Once classified items are exported, they are no longer displayed in the Match Reviewer or reflected in reports that are run on request.
Best practices dictate an iterative process: run the match engine, review exceptions (item descriptions that did not match), build new match rules to capture those unmatched items and finally approve all of the matches. Over time, an extensive library of match rules will be created, increasing first-time match rates.
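The processing order of steps 1 to 3 can be sketched in a few dozen lines of Python. Everything here is a simplified stand-in chosen for illustration. Rules are keyword sets, and the lowest-level and higher-level single-match rules are two dictionaries checked in turn. Abbreviations, brands and synonyms are plain word substitutions applied in that order. Text matching scores each commodity by the overlap between its title and the description after dropping words that do not occur in that segment. The overlap ratio, the rule libraries and the placeholder codes are all assumptions. The paper does not describe SAS's actual statistics or keep/discard thresholds, so they are not reproduced.

```python
from collections import defaultdict

# Illustrative libraries; codes and titles are placeholders, not checked against UNSPSC.
LOWEST_LEVEL_RULES = {frozenset({"NOTEBOOK", "COMPUTER"}): "43211503"}
HIGHER_LEVEL_RULES = {frozenset({"COMPUTER"}): "43211500"}
MULTIPLE_MATCH_RULES = {frozenset({"TABLE"}): ["56101519", "56121502"]}
ABBREVIATIONS = {"PT": "POINT", "BLK": "BLACK"}
BRANDS = {"BIC": "PEN"}  # hypothetical brand-to-generic mapping
SYNONYMS = {"BIRO": "PEN"}
TAXONOMY = {
    "44121704": "BALL POINT PEN",
    "44121705": "MECHANICAL PENCIL",
    "56101519": "FOLDING TABLE",
    "56121502": "CLASSROOM TABLE",
}


def substitute(words, table):
    """Replace words via a lookup table; a replacement may expand to several words."""
    return [out for w in words for out in table.get(w, w).split()]


def text_match(words):
    """Score commodities segment by segment after dropping that segment's noise words."""
    by_segment = defaultdict(dict)
    for code, title in TAXONOMY.items():
        by_segment[code[:2]][code] = set(title.split())
    scores = {}
    for entries in by_segment.values():
        vocabulary = set().union(*entries.values())
        meaningful = set(words) & vocabulary  # words absent from this segment are noise here
        for code, title_words in entries.items():
            scores[code] = len(meaningful & title_words) / len(meaningful | title_words)
    best = max(scores.values(), default=0)
    return [code for code, s in scores.items() if best > 0 and s == best]


def classify(description, auto_confirm=True):
    words = description.upper().replace(",", " ").split()
    word_set = set(words)
    # Step 1: single-match rules, lowest taxonomy level first, then the higher level.
    for rules in (LOWEST_LEVEL_RULES, HIGHER_LEVEL_RULES):
        for rule_words, code in rules.items():
            if rule_words <= word_set:
                return {"codes": [code], "step": "single", "confirmed": auto_confirm}
    # Step 2: multiple-match rules; every candidate they propose is kept.
    codes = [c for rule_words, cands in MULTIPLE_MATCH_RULES.items()
             if rule_words <= word_set for c in cands]
    step = "multiple" if codes else "text"
    # Step 3: abbreviations, then brands, then synonyms; then text matching.
    for table in (ABBREVIATIONS, BRANDS, SYNONYMS):
        words = substitute(words, table)
    codes += [c for c in text_match(words) if c not in codes]
    return {"codes": codes, "step": step if codes else "unmatched", "confirmed": False}


for desc in ["Notebook computer 15in", "BIC ball pt pen blk", "oak table 6ft folding", "stapler"]:
    print(desc, "->", classify(desc))
```

In the sense of the iterative best practice above, one iteration would add a rule for a frequently unmatched word (here, something covering "stapler") and rerun classify over the items that remain unconfirmed.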
Conclusion
SAS Sourcing Data Quality provides a complete, powerful set of data management capabilities and an intuitive, robust cleansing and classification engine that enables companies to understand their supplier spend. With the cleansed, standardized and classified data that SAS Sourcing Data Quality provides, you can:
• Align procurement decisions with overall vendor and partner strategies.
• Consolidate suppliers based on a complete view of your relationship with them.
• Understand true volumes per commodity and supplier.
SAS also offers a number of other solutions that extend the value of SAS Sourcing Data Quality and provide even more insight into supplier relationships and the extended supply chain. SAS Supplier Relationship Management is a suite of solutions that includes SAS Sourcing Data Quality and solutions for spend analysis, tracking and measurement of key performance indicators, and supply base optimization. For more information about SAS Supplier Relationship Management, visit http://www.sas.com/solutions/srm.
About SAS
SAS is the market leader in providing a new generation of business intelligence software and services that create true enterprise intelligence. SAS solutions are used at about 40,000 sites — including 96 of the top 100 companies on the FORTUNE Global 500® — to develop more profitable relationships with customers and suppliers; to enable better, more accurate and informed decisions; and to drive organizations forward. SAS is the only vendor that completely integrates leading data warehousing, analytics and traditional BI applications to create intelligence from massive amounts of data. For nearly three decades, SAS has been giving customers around the world The Power to Know®.
World Headquarters and SAS Americas
SAS Campus Drive
Cary, NC 27513 USA
Tel: (1) 919 677 8000
Fax: (1) 919 677 4444
U.S. & Canada sales: (1) 800 727 0025
SAS International
PO Box 10 53 40
Neuenheimer Landsr. 28-30
D-69043 Heidelberg, Germany
Tel: (49) 6221 4160
Fax: (49) 6221 474850
www.sas.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2005, SAS Institute Inc. All rights reserved. 102301US_355734.1005