HSC-Bench

HSC+ Dataset

Dataset source, construction pipeline, schema, statistics, task support, download, and citation for HSC+.

Overview

HSC+ is a large-scale AI model service dataset built from Hugging Face model services. It provides unified service metadata, refined functional annotations, QoS measurements, generated user requirements, and executable service workflows for service recommendation and service composition.

Core value

HSC+ is designed as a shared data resource for three connected research problems: service recommendation, service composition, and QoS analysis.

Construction pipeline

1. Hugging Face Model Collection: Collect publicly accessible Hugging Face model services.

2. Endpoint Probing: Check endpoint accessibility and record HTTP status and invocation results.

3. Metadata Extraction: Extract service name, author, description, URL, update time, and input/output parameters.

4. LLM-assisted Annotation: Use large language models to annotate input type, output type, and functional categories.

5. Human Verification: Resolve ambiguous samples through volunteer and domain-expert validation.

6. QoS Invocation: Invoke services repeatedly to collect response time, waiting time, reliability, and successability.

7. Workflow Generation: Generate syntactically valid workflows from input/output transformation constraints.

8. Requirement Generation: Use LLMs to generate natural-language user requirements for verified workflows.
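
The workflow generation step can be illustrated with a minimal sketch. It assumes that a workflow is "syntactically valid" when each service's output type matches the next service's input type; the service records, the type labels, and the `generate_workflows` helper are hypothetical and are not taken from the HSC+ tooling.

```python
from collections import defaultdict

# Hypothetical service records: (service_name, input type, output type).
SERVICES = [
    ("asr-model", "audio", "text"),
    ("translator", "text", "text"),
    ("tts-model", "text", "audio"),
    ("captioner", "image", "text"),
]


def generate_workflows(services, source_type, target_type, max_length=3):
    """Enumerate service chains that turn source_type into target_type.

    A chain is accepted when every service's output type equals the next
    service's input type. A service is used at most once per chain.
    """
    # Index services by the input type they accept.
    by_input = defaultdict(list)
    for name, in_type, out_type in services:
        by_input[in_type].append((name, out_type))

    workflows = []

    def extend(current_type, chain):
        # A non-empty chain that has reached the target type is a workflow.
        if chain and current_type == target_type:
            workflows.append(list(chain))
        if len(chain) == max_length:
            return
        for name, out_type in by_input[current_type]:
            if name not in chain:
                chain.append(name)
                extend(out_type, chain)
                chain.pop()

    extend(source_type, [])
    return workflows


workflows = generate_workflows(SERVICES, "audio", "audio", max_length=3)
for chain in workflows:
    print(" -> ".join(chain))
```

Depth-first enumeration with a length cap keeps the sketch short; a real pipeline would likely add functional and QoS filters before accepting a chain.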

Data schema

Field             Description
service_name      Service or model name.
author            Service author or model publisher.
function          Functional category that the service can satisfy.
description       Service description from the provider or enriched annotation.
url               Model or service access link.
input_parameter   Input data required by the service.
output_parameter  Output returned by the service.
downloads         Recent monthly download count.
likes             Number of user likes.
response_time     Time from sending a request to receiving a response.
waiting_time      Model initialization or loading waiting time.
reliability       Ratio of successful HTTP responses over all requests.
successability    Ratio of valid task completions that return correct usable outputs.
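
As a rough illustration of how the QoS fields could be derived from repeated invocations, the sketch below aggregates a list of probe records. The `Invocation` record layout and the `qos_summary` helper are hypothetical. Two assumptions are made that the schema does not state: a "successful HTTP response" is any 2xx status, and successability uses all requests as its denominator.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """One probe of a service (hypothetical record layout)."""
    status_code: int       # HTTP status returned by the endpoint
    response_time: float   # seconds from sending the request to the response
    output_valid: bool     # True if the output was a correct, usable result


def qos_summary(invocations):
    """Aggregate probe records into reliability, successability, and mean response time."""
    total = len(invocations)
    if total == 0:
        raise ValueError("at least one invocation is required")
    # Assumption: 2xx status codes count as successful HTTP responses.
    successful = sum(1 for i in invocations if 200 <= i.status_code < 300)
    # Assumption: successability is measured over all requests.
    valid = sum(1 for i in invocations if i.output_valid)
    return {
        "reliability": successful / total,
        "successability": valid / total,
        "response_time": sum(i.response_time for i in invocations) / total,
    }


probes = [
    Invocation(200, 0.5, True),
    Invocation(200, 0.7, False),   # HTTP success, but unusable output
    Invocation(503, 1.2, False),   # service unavailable
    Invocation(200, 0.6, True),
]
summary = qos_summary(probes)
print(summary)
```

The second probe shows why the two ratios differ: a request can return HTTP 200 and still fail to produce a usable output, so successability is never higher than reliability under these assumptions.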

Statistics to publish

Service categories

Number of models under each Hugging Face task category.

Downloads and likes

Popularity distributions showing long-tail service usage patterns.

Loading / waiting time

Average loading time across model types.

Response time

Average, minimum, and maximum response time by service type.

Status code distribution

HTTP 200, 400, 403, 500, 503 and other response status ratios.
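
A status code distribution of this kind can be computed from raw probe logs in a few lines. This is a minimal sketch; the `status_code_ratios` helper and the sample codes are illustrative and are not drawn from the dataset.

```python
from collections import Counter


def status_code_ratios(status_codes):
    """Return the share of each HTTP status code, ordered by code."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    return {code: count / total for code, count in sorted(counts.items())}


# Illustrative probe results, not real HSC+ measurements.
observed = [200, 200, 200, 503, 400, 200, 503, 200]
ratios = status_code_ratios(observed)
print(ratios)
```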

Reliability and successability

Distribution of stable successful invocations and valid task completions.

Workflow length

Complexity distribution of generated service workflows.

Requirement length and type

Diversity of natural-language user requirements generated for workflows.

Task support

Recommendation

Requirement-to-service and mashup-to-service matching with Top-K evaluation.
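
The page does not say which Top-K metrics are reported, so the following sketch uses Precision@K and Recall@K as one common choice. The function name, the ranked list, and the ground-truth set are hypothetical.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Compute Precision@K and Recall@K for one requirement.

    ranked   -- service names ordered by predicted relevance
    relevant -- set of ground-truth service names for the requirement
    k        -- cutoff rank (must be positive)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for service in ranked[:k] if service in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical recommendation output and ground truth.
ranked = ["asr-model", "captioner", "translator", "tts-model"]
relevant = {"asr-model", "tts-model"}

p_at_2, r_at_2 = precision_recall_at_k(ranked, relevant, k=2)
p_at_4, r_at_4 = precision_recall_at_k(ranked, relevant, k=4)
print(p_at_2, r_at_2, p_at_4, r_at_4)
```

Averaging these values over all requirements (or mashups) gives the benchmark-level score.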

Composition

Input/output compatible workflow generation and QoS-aware optimization.
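
One simple reading of QoS-aware optimization is to pick, for each step of a workflow, the candidate service that minimizes total response time while keeping end-to-end reliability above a threshold. The sketch below does this by exhaustive search. The candidate table, the QoS values, the `select_services` helper, and the assumption that step reliabilities multiply are all illustrative rather than part of HSC+.

```python
from itertools import product
from math import prod

# Hypothetical candidates per workflow step:
# (service_name, response_time in seconds, reliability).
CANDIDATES = {
    "speech-to-text": [("asr-fast", 0.4, 0.90), ("asr-stable", 0.9, 0.99)],
    "translation": [("mt-small", 0.3, 0.95), ("mt-large", 0.8, 0.99)],
}


def select_services(steps, candidates, min_reliability):
    """Pick one service per step, minimizing total response time.

    Combinations whose end-to-end reliability (product of step
    reliabilities, assuming independent failures) falls below
    min_reliability are discarded. Returns (total_time, names) or None.
    """
    best = None
    for combo in product(*(candidates[step] for step in steps)):
        reliability = prod(service[2] for service in combo)
        if reliability < min_reliability:
            continue
        total_time = sum(service[1] for service in combo)
        if best is None or total_time < best[0]:
            best = (total_time, [service[0] for service in combo])
    return best


steps = ["speech-to-text", "translation"]
strict = select_services(steps, CANDIDATES, min_reliability=0.90)
relaxed = select_services(steps, CANDIDATES, min_reliability=0.85)
print(strict, relaxed)
```

Exhaustive search is only practical for short workflows with few candidates per step; longer workflows would need a heuristic or a dedicated optimizer.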

QoS Analysis

Response time, waiting time, reliability, successability, and status code analysis.

Download and citation

The public dataset link, version number, checksum, and license should be filled in before release.

@inproceedings{hscbench2026,
  title     = {HSC-Bench: A Comprehensive Benchmark for Unified Service Recommendation and Composition Evaluation},
  author    = {TBD},
  booktitle = {TBD},
  year      = {2026}
}