In this study, we utilized the dataset provided by Nhu Mai Nong (2021), which focuses on supplier selection criteria in Vietnamese Textile and Apparel (T&A) companies. This dataset, accessible through the Mendeley Data repository, offers comprehensive insights derived from a survey conducted across 362 T&A companies. Specifically, the dataset encompasses responses from 303 garment companies, 41 textile companies, and 18 companies involved in both textile and garment production [18].
The survey targeted experts and practitioners in managerial positions responsible for supplier selection, including sourcing managers, sales managers, deputy directors, and directors. All respondents had at least three years of experience in procurement, which supports the reliability and relevance of the data collected. Because T&A companies in Vietnam are unevenly distributed across regions (62% in southern, 30% in northern, and 8% in central Vietnam), the study employed stratified random sampling, with each region's share of the sample matching its share of the population. This proportional design improves the representativeness of the sample and, in turn, the generalizability of the findings.
This dataset supports the analysis of the critical factors influencing supplier selection decisions in the Vietnamese T&A sector. Its survey items cover Supplier Development, Supplier Selection, and Supplier Management, making it useful for both academic research and practical supply chain management. The dataset contains 64 features (columns) and 362 rows, matching the 362 surveyed companies.
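Proportional allocation, as described above, can be made concrete with a short calculation. The sketch below is illustrative only: it applies the stated regional shares to a sample of 362 and uses largest-remainder rounding so that the stratum sizes sum to the total. The rounding rule and the function name `proportional_allocation` are assumptions; the section does not report the realized per-region sample sizes.

```python
from math import floor


def proportional_allocation(total: int, shares: dict) -> dict:
    """Split `total` across strata in proportion to `shares` (largest-remainder rounding)."""
    raw = {stratum: total * share for stratum, share in shares.items()}
    alloc = {stratum: floor(value) for stratum, value in raw.items()}
    leftover = total - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        alloc[stratum] += 1
    return alloc


# Illustrative: regional shares from the text; realized sample sizes are not reported.
REGION_SHARES = {"South": 0.62, "North": 0.30, "Central": 0.08}
print(proportional_allocation(362, REGION_SHARES))  # {'South': 224, 'North': 109, 'Central': 29}
```

Here 362 × 0.62 = 224.44, 362 × 0.30 = 108.6 and 362 × 0.08 = 28.96; flooring gives 360, and the two leftover units go to the strata with the largest fractional parts (central and northern Vietnam).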
The dataset captures both company and respondent characteristics relevant to supplier selection. Each entry is identified by a unique SbjNum, the respondent's identifier. The Name column records the respondent's name, and Positions_in_company the respondent's job title within the company. Company_name gives the company's name, Province its province, and Region the broader geographic region in which it is located. Business_field describes the company's industry or business sector, and Export indicates whether the company is involved in exporting. Business types are recorded in columns Type_of_business1 to Type_of_business5, Type_of_company classifies the company (for example, as public or private), and FDI indicates its Foreign Direct Investment status. Company_labor and Company_capital record the number of employees and the company's capital, respectively, and Number_of_suppliers gives the number of suppliers the company works with. Respondent-specific details comprise Sex (gender), Level (education or experience level), Your_position (specific position within the company), and Your_experience (years of experience); these provide context for interpreting the responses. Supplier performance is evaluated on the following groups of criteria:
Quality Assurance (QA1 to QA5)
These columns assessed various aspects of the suppliers' quality assurance practices.
Cost (CO1 to CO5)
This criterion measured cost-related factors, reflecting the economic aspect of supplier selection.
Delivery (DE1 to DE5)
Delivery-related criteria evaluated the timeliness and efficiency of the suppliers' delivery processes.
Service (SE1 to SE5)
Service criteria focused on the suppliers' ability to provide consistent and reliable service.
Capability (CA1 to CA6)
This set of criteria assessed the overall capabilities of the suppliers, including technological and operational aspects.
Production (PR1 to PR4)
These criteria evaluated the suppliers' production capabilities and capacities.
Customer Service (CS1 to CS4)
Customer service criteria measured the quality and responsiveness of the suppliers' customer service.
Sustainability (SC1 to SC4)
This criterion assessed the suppliers' adherence to sustainability practices and their environmental impact.
Supplier Development (SD1 to SD4)
These columns evaluated the suppliers' efforts in continuous improvement and development.
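Taken together, the descriptor columns listed above (22, counting Type_of_business1 to Type_of_business5) and the nine criterion groups (42 items) account for the 64 features reported for the dataset. A minimal sketch of that schema as Python data structures follows; it assumes the spreadsheet headers are spelled exactly as in this section, which has not been checked against the Mendeley file, and the helper `missing_columns` is hypothetical.

```python
# Respondent and company descriptors, named as in the text above.
DESCRIPTOR_COLUMNS = [
    "SbjNum", "Name", "Positions_in_company", "Company_name", "Province",
    "Region", "Business_field", "Export",
    *[f"Type_of_business{i}" for i in range(1, 6)],
    "Type_of_company", "FDI", "Company_labor", "Company_capital",
    "Number_of_suppliers", "Sex", "Level", "Your_position", "Your_experience",
]

# Criterion prefix -> number of survey items (QA1..QA5, CA1..CA6, ...).
CRITERIA_ITEMS = {
    "QA": 5, "CO": 5, "DE": 5, "SE": 5, "CA": 6,
    "PR": 4, "CS": 4, "SC": 4, "SD": 4,
}

CRITERIA_COLUMNS = {
    prefix: [f"{prefix}{i}" for i in range(1, n + 1)]
    for prefix, n in CRITERIA_ITEMS.items()
}

EXPECTED_COLUMNS = DESCRIPTOR_COLUMNS + [
    col for cols in CRITERIA_COLUMNS.values() for col in cols
]


def missing_columns(columns) -> list:
    """Hypothetical check: expected columns absent from `columns` (e.g. df.columns)."""
    present = set(columns)
    return [col for col in EXPECTED_COLUMNS if col not in present]


print(len(DESCRIPTOR_COLUMNS), len(EXPECTED_COLUMNS))  # 22 64
```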
To streamline the supplier selection process, we developed a Python script that automates the analysis. First, the dataset was imported from an Excel file into a pandas DataFrame, providing a structured format for the subsequent steps. Next, a composite score was computed for each supplier from its individual performance metrics, with each metric weighted according to its relative importance so that all critical factors contribute to the evaluation.
Subsequently, we used the DistilGPT-2 model from Hugging Face for text generation. The model generated a detailed analysis of each supplier from a structured input derived from the dataset. To keep the generated text readable and coherent, an output-cleaning step removed unwanted characters and corrected formatting issues.
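A minimal pandas sketch of the loading and composite-scoring steps follows. The weights are not reported in this section, so `CRITERIA_WEIGHTS` uses equal weights purely as a placeholder, and the path passed to the hypothetical `load_survey` helper is up to the reader. The composite here is a weighted average of per-criterion item means, which keeps it on the survey's rating scale; this is one reasonable reading of "composite score", not necessarily the exact formula used in the study.

```python
import pandas as pd

# Placeholder weights: the study's actual weights are not reported here.
CRITERIA_WEIGHTS = {
    "QA": 1.0, "CO": 1.0, "DE": 1.0, "SE": 1.0, "CA": 1.0,
    "PR": 1.0, "CS": 1.0, "SC": 1.0, "SD": 1.0,
}


def load_survey(path: str) -> pd.DataFrame:
    """Read the survey spreadsheet (reading .xlsx files requires openpyxl)."""
    return pd.read_excel(path)


def criterion_means(df: pd.DataFrame, prefixes) -> pd.DataFrame:
    """Per-row mean of each criterion's item columns, e.g. QA1..QA5 -> QA."""
    means = {}
    for prefix in prefixes:
        # Keep only columns named <prefix><digits>, so "SE" matches SE1..SE5 but not "Sex".
        items = [c for c in df.columns
                 if str(c).startswith(prefix) and str(c)[len(prefix):].isdigit()]
        means[prefix] = df[items].mean(axis=1)
    return pd.DataFrame(means, index=df.index)


def composite_score(df: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted average of criterion means, normalised by the total weight."""
    means = criterion_means(df, weights.keys())
    w = pd.Series(weights, dtype=float)
    return means.mul(w, axis=1).sum(axis=1) / w.sum()
```

Normalising by the total weight means that rescaling all weights by a constant leaves the scores unchanged.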
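The generation and cleaning steps could look roughly like the sketch below. It uses the Hugging Face `pipeline("text-generation", model="distilgpt2")` interface with `max_length=200` and `truncation=True`, matching the configuration reported later in this section. The prompt layout in `build_prompt` and the cleaning rules in `clean_generated_text` (dropping the echoed prompt, non-printable characters, and repeated whitespace) are assumptions, since the exact prompt and cleaning rules are not given.

```python
import re
from functools import lru_cache


def build_prompt(name: str, scores: dict) -> str:
    """Hypothetical structured prompt: one 'criterion: score' line per category."""
    lines = [f"Supplier: {name}"]
    lines += [f"{criterion}: {score:.2f}" for criterion, score in scores.items()]
    lines.append("Analysis:")
    return "\n".join(lines)


def clean_generated_text(text: str, prompt: str = "") -> str:
    """Drop the echoed prompt, non-printable characters and repeated whitespace."""
    if prompt and text.startswith(prompt):
        text = text[len(prompt):]  # the pipeline returns prompt + continuation by default
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return re.sub(r"\s+", " ", text).strip()


@lru_cache(maxsize=1)
def _generator():
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline
    return pipeline("text-generation", model="distilgpt2")


def generate_analysis(name: str, scores: dict) -> str:
    """Generate and clean a free-text analysis for one supplier."""
    prompt = build_prompt(name, scores)
    output = _generator()(prompt, max_length=200, truncation=True, num_return_sequences=1)
    return clean_generated_text(output[0]["generated_text"], prompt)
```

Caching the pipeline with `lru_cache` avoids reloading the model for each of the 362 rows.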
In the rapidly evolving field of natural language processing (NLP), text generation has become a critical area of research and application. Transformer-based models have been at the forefront of this development, and DistilGPT-2, a compressed version of GPT-2, has gained significant attention for its efficiency and performance. The following paragraphs examine the comparative advantages of DistilGPT-2 for text generation, situating it within the broader landscape of prominent language models such as GPT-2, GPT-3, BERT, and T5.
DistilGPT2 applies knowledge distillation, a model-compression technique, to GPT-2. Sanh et al. (2019) introduced a distillation procedure that combines, among other terms, a loss on the teacher's soft target probabilities and a cosine embedding loss between teacher and student hidden states, transferring knowledge from a large model to a significantly smaller architecture; DistilGPT2 was obtained by applying this procedure to GPT-2. Applied to BERT, the same procedure produced a model that retains approximately 97% of its teacher's language understanding capabilities while being 60% faster and 40% smaller [19].
The efficiency gains of DistilGPT2 are particularly noteworthy when compared with larger models such as GPT-3. Although GPT-3 generates more coherent and contextually rich text, its size and resource demands limit its practical deployment in resource-constrained environments [20]. DistilGPT2's lower computational requirements, in contrast, make it viable for a wider range of applications and deployment scenarios.
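To make the two distillation signals concrete, the NumPy sketch below computes a soft-target cross-entropy between temperature-softened teacher and student distributions and a cosine embedding loss between their hidden states. It is a simplified illustration, not the training code used for DistilGPT2: the language-modeling loss term is omitted, and the temperature and mixing weights (`temperature`, `alpha_ce`, `alpha_cos`) are arbitrary placeholder values.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits (last axis)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution.
    Minimised when the student's softened distribution equals the teacher's."""
    teacher_probs = np.exp(log_softmax(teacher_logits, temperature))
    student_log_probs = log_softmax(student_logits, temperature)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())


def cosine_embedding_loss(student_hidden, teacher_hidden):
    """Mean of 1 - cos(student, teacher); 0 when the hidden states point the same way."""
    s = np.asarray(student_hidden, dtype=float)
    t = np.asarray(teacher_hidden, dtype=float)
    cos = (s * t).sum(axis=-1) / (np.linalg.norm(s, axis=-1) * np.linalg.norm(t, axis=-1))
    return float((1.0 - cos).mean())


def distillation_loss(student_logits, teacher_logits, student_hidden, teacher_hidden,
                      temperature=2.0, alpha_ce=0.5, alpha_cos=0.5):
    """Weighted sum of the two signals (placeholder weights; LM loss omitted)."""
    return (alpha_ce * soft_target_loss(student_logits, teacher_logits, temperature)
            + alpha_cos * cosine_embedding_loss(student_hidden, teacher_hidden))
```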
Empirical evaluations have shown that DistilGPT2 maintains much of the performance of the original GPT-2 model across various language tasks [19]. In the context of text generation, DistilGPT2 demonstrates remarkable proficiency, particularly in shorter text generation tasks, effectively capturing syntactic and semantic nuances. However, it is important to note that larger models like GPT-3 still surpass DistilGPT2 in generating longer, more intricate texts due to their larger model architectures and higher parameter counts, which allow for a deeper understanding of complex contexts [20].
The practical applications of DistilGPT2 are extensive, particularly in scenarios where computational resources are limited or where faster inference times are critical. It has been effectively employed in chatbots, real-time content creation, and automated customer service responses [19]. This versatility is complemented by its cost-effectiveness, as the reduced computational requirements translate to lower operational costs for both training and inference.
In comparison, while models like BERT (Bidirectional Encoder Representations from Transformers) excel in tasks requiring deep contextual understanding and classification, their primary use cases differ from DistilGPT2's focus on efficient text generation [21]. Similarly, while GPT-3 offers superior generative capabilities, its immense computational demands and associated costs make it less feasible for small to medium enterprises [20].
DistilGPT2 represents a significant step forward in making advanced NLP capabilities more accessible and deployable in resource-constrained environments. Its balanced approach to efficiency and performance makes it an attractive choice for many practical applications, particularly where speed, efficiency, and cost are critical considerations. While it may not match the generative prowess of larger models like GPT-3, its impressive efficiency-performance trade-off has inspired further research into model compression techniques for other large language models.
The analyses generated by the DistilGPT-2 model summarized the performance of each supplier across the evaluated dimensions. The composite scores highlighted the strengths and weaknesses of each supplier, supporting a more informed and strategic decision-making process. The textual analyses produced by the LLM offered interpretations of the data that would be laborious to produce manually for all 362 suppliers, and they helped characterize the overall supplier landscape and inform data-driven decisions.
We developed a computational approach to analyze supplier data and generate selection recommendations using natural language processing techniques. The process was implemented in Python, utilizing the pandas and numpy libraries for data manipulation and the transformers library for text generation. We employed the DistilGPT2 model, a lightweight version of GPT-2, for text generation. This model was accessed through the Hugging Face transformers library, configured with a maximum sequence length of 200 tokens and truncation enabled to manage input size. A custom function was created to process individual supplier data. This function calculates mean scores across six key performance categories:
The function builds a prompt from the calculated scores and passes it to the DistilGPT2 model. The model's output is then searched for key phrases indicative of a selection decision, and on that basis each supplier is assigned to one of three groups: selected, not selected, or requiring further evaluation. The resulting decision is stored in the "Analysis" column, and the processed results are shown in Table-1. To limit the disclosure of the suppliers' commercial identities, Table-1 includes only five of the rows whose Analysis value is "This supplier can be selected."
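A minimal sketch of the key-phrase step follows. The two output strings `SELECTED` and `UNDECIDED` are the values reported in Table-2; the wording of `NOT_SELECTED` and both phrase lists are hypothetical, because the section does not state which phrases were searched for.

```python
# Output labels: SELECTED and UNDECIDED as reported in Table-2.
SELECTED = "This supplier can be selected."
UNDECIDED = ("No final decision could be made about this supplier. "
             "Further evaluation may be required.")
NOT_SELECTED = "This supplier should not be selected."  # hypothetical wording

# Hypothetical key phrases; the section does not list the ones actually used.
NEGATIVE_PHRASES = ("not recommend", "should not be selected", "reject")
POSITIVE_PHRASES = ("recommend", "can be selected", "should be selected")


def classify_analysis(generated_text: str) -> str:
    """Map free-text model output to one of the three decision groups."""
    text = generated_text.lower()
    # Check negative phrases first so "not recommend" is not read as "recommend".
    if any(phrase in text for phrase in NEGATIVE_PHRASES):
        return NOT_SELECTED
    if any(phrase in text for phrase in POSITIVE_PHRASES):
        return SELECTED
    return UNDECIDED
```

Substring matching of this kind is brittle, so text that matches no phrase falls through to the "further evaluation" group rather than being forced into a decision.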
Table-1. Results of the proposed approach
Supplier Name | Company Name | Analysis |
Vu Thi Hoa Binh | Ha Noi Industrial Textile Joint Stock Company | This supplier can be selected. |
Pham Hung Cuong | Vietnam Bizways Joint Stock Company | This supplier can be selected. |
Pham Van Bac | HUNG YEN GARMENT JOINT STOCK COMPANY NO II | This supplier can be selected. |
Vu Tien Dung | NAM DINH GARMENT JOINT STOCK COMPANY | This supplier can be selected. |
Dinh Duc Cai | Nam Thanh Textile Factory | This supplier can be selected. |
Table-2 shows how the suppliers in the dataset are distributed across the decision outcomes recorded in the Analysis column.
Table-2. Distribution in Analysis Column
Analysis | Number of suppliers |
No final decision could be made about this supplier. Further evaluation may be required. | 339 |
This supplier can be selected. | 23 |
Table-2 summarizes the decision outcomes produced by the LLM-based approach for the suppliers evaluated in our study. Most suppliers (339 of 362) were classified as requiring further evaluation, indicating that additional data or analysis would be needed to reach a conclusive decision. Only 23 suppliers were deemed suitable for selection under the criteria of our evaluation framework. Because these two categories together account for all 362 suppliers, no supplier was placed in the "not selected" group. This distribution underscores the importance of thorough analysis in supplier selection, so that decisions rest on robust, data-driven evidence. The predominance of suppliers needing further evaluation suggests that, while the initial analysis can identify potential candidates, supplier performance assessment remains complex and multi-faceted.
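A tally like Table-2 can be reproduced with pandas `Series.value_counts`. The sketch below assumes the decisions sit in a column named "Analysis", as stated above; the helper name `decision_distribution` is hypothetical. Applied to the counts in Table-2, the shares come to about 93.6% (339/362) and 6.4% (23/362).

```python
import pandas as pd


def decision_distribution(analysis: pd.Series) -> pd.DataFrame:
    """Count each distinct decision and its share of all suppliers (in percent)."""
    counts = analysis.value_counts()  # sorted from most to least frequent
    shares = (100 * counts / counts.sum()).round(1)
    return pd.DataFrame({"count": counts, "share_pct": shares})


# Usage, given a DataFrame `df` with an "Analysis" column:
# print(decision_distribution(df["Analysis"]))
```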