HSC-Bench
Dataset source, construction pipeline, schema, statistics, task support, download, and citation for HSC+.
HSC+ is a large-scale dataset of AI model services collected from Hugging Face. It provides unified service metadata, refined functional annotations, QoS measurements, generated user requirements, and executable service workflows for service recommendation and service composition.
HSC+ is designed as a shared data resource for three connected research problems: service recommendation, service composition, and QoS analysis.
The dataset is constructed through the following pipeline:

1. Collect publicly accessible Hugging Face model services.
2. Check endpoint accessibility and record the HTTP status and invocation result of each service.
3. Extract the service name, author, description, URL, update time, and input/output parameters.
4. Use large language models to annotate the input type, output type, and functional categories.
5. Resolve ambiguous samples through volunteer and domain-expert validation.
6. Invoke each service repeatedly to collect response time, waiting time, reliability, and successability (a probing sketch follows this list).
7. Generate syntactically valid workflows from input/output transformation constraints (a chaining sketch also follows this list).
8. Use LLMs to generate natural-language user requirements for the verified workflows.
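As a rough illustration of the accessibility check and QoS collection steps, the sketch below probes a single Hugging Face endpoint and derives the `response_time`, `reliability`, and `successability` fields defined in the schema. The URL template, payload shape, and use of the `requests` library are assumptions made for illustration; the actual collection scripts are not part of this document, and `waiting_time` (model loading) is omitted for brevity.

```python
import time

import requests

# Assumed serverless Inference API URL format; the real collection setup may differ.
API_URL = "https://api-inference.huggingface.co/models/{model_id}"


def probe_service(model_id: str, payload, token: str, n_requests: int = 10) -> dict:
    """Repeatedly invoke one service and aggregate simple QoS measurements."""
    headers = {"Authorization": f"Bearer {token}"}
    url = API_URL.format(model_id=model_id)
    response_times, ok_count, valid_count = [], 0, 0

    for _ in range(n_requests):
        start = time.monotonic()
        resp = requests.post(url, headers=headers, json={"inputs": payload}, timeout=60)
        response_times.append(time.monotonic() - start)

        if resp.status_code == 200:  # reliability counts successful HTTP responses
            ok_count += 1
            try:
                # successability additionally requires a usable output; a non-empty
                # JSON body is a crude stand-in for a real validity check here.
                valid_count += int(bool(resp.json()))
            except ValueError:
                pass
        time.sleep(1)  # simple pacing between probes

    return {
        "response_time": sum(response_times) / n_requests,
        "reliability": ok_count / n_requests,
        "successability": valid_count / n_requests,
    }
```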
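Step 7 chains services whose input and output types are compatible. The sketch below shows only that chaining idea with a breadth-first search over made-up type labels; the field names `input_type` and `output_type` are simplifications of the richer parameter annotations in HSC+.

```python
from collections import deque


def find_workflows(services: list[dict], source_type: str, target_type: str,
                   max_length: int = 4) -> list[list[str]]:
    """Enumerate service chains whose input/output types are pairwise compatible."""
    workflows = []
    queue = deque([(source_type, [])])
    while queue:
        current_type, path = queue.popleft()
        if len(path) >= max_length:
            continue
        for svc in services:
            if svc["input_type"] != current_type or svc["service_name"] in path:
                continue
            new_path = path + [svc["service_name"]]
            if svc["output_type"] == target_type:
                workflows.append(new_path)
            else:
                queue.append((svc["output_type"], new_path))
    return workflows


# Illustrative catalog: chain a speech recognizer into a translator.
catalog = [
    {"service_name": "asr-model", "input_type": "audio", "output_type": "text"},
    {"service_name": "mt-model", "input_type": "text", "output_type": "text_fr"},
]
print(find_workflows(catalog, "audio", "text_fr"))  # [['asr-model', 'mt-model']]
```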
| Field | Description |
|---|---|
| service_name | Service or model name. |
| author | Service author or model publisher. |
| function | Functional category that the service can satisfy. |
| description | Service description from the provider or enriched annotation. |
| url | Model or service access link. |
| input_parameter | Input data required by the service. |
| output_parameter | Output returned by the service. |
| downloads | Recent monthly download count. |
| likes | Number of user likes. |
| response_time | Time from sending a request to receiving a response. |
| waiting_time | Model initialization or loading waiting time. |
| reliability | Ratio of successful HTTP responses over all requests. |
| successability | Ratio of valid task completions that return correct, usable outputs. |
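A minimal loading sketch over the schema above, assuming the release ships as JSON Lines under a placeholder file name; the actual file name and format will follow the final download information.

```python
import json

# "hsc_plus.jsonl" is a placeholder; adjust to the released file name and format.
with open("hsc_plus.jsonl", encoding="utf-8") as f:
    services = [json.loads(line) for line in f]

svc = services[0]
print(svc["service_name"], "|", svc["function"], "|", svc["url"])
# reliability counts any successful HTTP response, while successability also
# requires that the returned output actually completes the task.
print(f"reliability={svc['reliability']:.2f}, successability={svc['successability']:.2f}")
```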
Reported dataset statistics include (a small sketch reproducing two of them follows this list):

- Number of models under each Hugging Face task category.
- Popularity distributions showing long-tail service usage patterns.
- Average loading time across model types.
- Average, minimum, and maximum response time by service type.
- Ratios of HTTP 200, 400, 403, 500, 503, and other response status codes.
- Distribution of stable successful invocations and valid task completions.
- Complexity distribution of the generated service workflows.
- Diversity of the natural-language user requirements generated for workflows.
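Two of these statistics, the per-category model counts and the long-tail popularity pattern, can be reproduced from the `function` and `downloads` fields. The sketch below reuses the `services` list from the loading example; the exact figures depend on the released version.

```python
from collections import Counter

# Models per functional category.
category_counts = Counter(svc["function"] for svc in services)
print(category_counts.most_common(5))

# Long-tail popularity: share of all downloads captured by the top 1% of services.
downloads = sorted((svc["downloads"] for svc in services), reverse=True)
top_k = max(1, len(downloads) // 100)
share = sum(downloads[:top_k]) / max(1, sum(downloads))
print(f"Top 1% of services account for {share:.1%} of downloads")
```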
HSC+ supports three evaluation tasks:

- Service recommendation: requirement-to-service and mashup-to-service matching with Top-K evaluation (a Hit@K/Recall@K sketch follows this list).
- Service composition: input/output-compatible workflow generation and QoS-aware optimization.
- QoS analysis: response time, waiting time, reliability, successability, and response status code analysis.
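This excerpt does not fix specific recommendation metrics; Hit@K and Recall@K are shown below only as common choices for Top-K evaluation, with made-up service names.

```python
def hit_at_k(ranked_services: list[str], ground_truth: set[str], k: int) -> float:
    """1.0 if any ground-truth service appears in the top-k recommendations."""
    return float(any(s in ground_truth for s in ranked_services[:k]))


def recall_at_k(ranked_services: list[str], ground_truth: set[str], k: int) -> float:
    """Fraction of ground-truth services recovered within the top-k recommendations."""
    hits = sum(1 for s in ranked_services[:k] if s in ground_truth)
    return hits / max(1, len(ground_truth))


# Illustrative requirement-to-service example.
ranked = ["svc_a", "svc_b", "svc_c", "svc_d"]
truth = {"svc_b", "svc_x"}
print(hit_at_k(ranked, truth, 3), recall_at_k(ranked, truth, 3))  # 1.0 0.5
```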
The public dataset link, version number, checksum, and license should be filled in before release.
```bibtex
@inproceedings{hscbench2026,
  title     = {HSC-Bench: A Comprehensive Benchmark for Unified Service Recommendation and Composition Evaluation},
  author    = {TBD},
  booktitle = {TBD},
  year      = {2026}
}
```