HSC-Bench
Task definition, datasets, models, metrics, and results for unified Top-K service recommendation evaluation.
Service recommendation takes as input a user requirement, a mashup description, the service library, service tags, service descriptions, and optional QoS attributes, and outputs a ranked list of Top-K candidate services. HSC-Bench documents the candidate-set scope, the train/validation/test split strategy, and the recommendation generation logic so that all models are compared under the same protocol.
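The task interface described above can be sketched as follows. This is a minimal illustration, not the HSC-Bench code: the names `Service`, `Query`, `recommend_top_k`, and `split_queries`, the pluggable `score` function, and the 8:1:1 query-level split ratio are all hypothetical assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Service:
    # Hypothetical record for one entry in the service library.
    service_id: str
    description: str
    tags: list[str] = field(default_factory=list)
    qos: Optional[dict[str, float]] = None  # optional QoS attributes


@dataclass
class Query:
    # A user requirement or mashup description to recommend services for.
    query_id: str
    text: str


def recommend_top_k(query: Query, library: list[Service],
                    score: Callable[[Query, Service], float], k: int) -> list[str]:
    """Score every service in the candidate set and return the K best IDs."""
    # sorted() is stable, so tied services keep their library order.
    ranked = sorted(library, key=lambda s: score(query, s), reverse=True)
    return [s.service_id for s in ranked[:k]]


def split_queries(queries: list[Query], ratios=(0.8, 0.1, 0.1), seed: int = 0):
    """Deterministic query-level train/validation/test split (assumed 8:1:1)."""
    shuffled = queries[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Fixing the seed and the candidate set is what lets different models be compared on identical queries; any real protocol may split by mashup, by time, or by another rule.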
| Dataset | Task Support | Domain | Main Fields | Notes |
|---|---|---|---|---|
| ProgrammableWeb | Service Recommendation / Service Composition | Web API | Mashup, API, invocation relations, tags, descriptions | Classic Web API recommendation and composition dataset; platform availability may limit reproducibility. |
| HSC | Service Recommendation / Service Composition | AI Model Service | AI model services, service workflows, QoS, function tags | Hugging Face-based AI service composition dataset. |
| HSC+ | Service Recommendation / Service Composition / QoS | AI Model Service | Function tags, input/output parameters, QoS, requirements, workflows | Core dataset of HSC-Bench for unified service computing evaluation. |
| MovieLens | General Recommendation Baseline | Recommender System | Users, items, ratings | Used to validate generalization of recommendation baselines. |
| Amazon | Cross-domain Recommendation Baseline | E-commerce | Users, products, reviews, interactions | Used for cross-domain recommendation baseline comparison. |
Simple frequency baseline for interpretable and reproducible recommendation.
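A frequency (popularity) baseline can be sketched as below, assuming it ranks services by how many training mashups invoke them and returns the same Top-K list for every query. The function names and the `{mashup_id: [service_id, ...]}` input format are hypothetical.

```python
from collections import Counter


def fit_frequency(train_invocations: dict[str, list[str]]) -> list[str]:
    """Rank services by the number of training mashups that invoke them."""
    counts = Counter()
    for services in train_invocations.values():
        counts.update(set(services))  # count each service once per mashup
    # Descending count, then service ID, so ties are broken reproducibly.
    return [s for s, _ in sorted(counts.items(), key=lambda item: (-item[1], item[0]))]


def recommend_frequency(ranking: list[str], k: int) -> list[str]:
    # The baseline ignores the requirement text entirely.
    return ranking[:k]
```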
Collaborative filtering baseline based on similar mashups or users.
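A mashup-based collaborative filtering baseline might look like the sketch below. It assumes mashup similarity is Jaccard overlap of description terms and that each neighbor's services are scored by the neighbor's similarity; the term sets, `n_neighbors`, and all function names are illustrative assumptions, not the benchmark's configuration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cf_recommend(query_terms: set[str],
                 train_terms: dict[str, set[str]],
                 train_invocations: dict[str, list[str]],
                 k: int, n_neighbors: int = 2) -> list[str]:
    """Recommend services used by the most similar training mashups."""
    sims = sorted(((jaccard(query_terms, terms), m) for m, terms in train_terms.items()),
                  key=lambda item: (-item[0], item[1]))
    scores: dict[str, float] = {}
    for sim, mashup in sims[:n_neighbors]:
        if sim == 0.0:
            continue  # unrelated neighbors contribute nothing
        for service in set(train_invocations[mashup]):
            scores[service] = scores.get(service, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [s for s, _ in ranked[:k]]
```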
Latent factor model for service invocation interactions.
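A latent factor model over the mashup-service invocation matrix can be sketched with plain matrix factorization trained by SGD. This assumes every unobserved pair is a negative (target 0), which is a simplification; the dimensions, learning rate, and function names are hypothetical, and published latent factor baselines may use different losses or sampling.

```python
import random


def train_mf(positives, n_mashups, n_services, dim=2, epochs=300,
             lr=0.05, reg=0.01, seed=0):
    """Fit mashup factors P and service factors Q by SGD on a binary matrix.

    Invoked (mashup, service) pairs have target 1.0; all other cells 0.0.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_mashups)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_services)]
    observed = set(positives)
    cells = [(m, s) for m in range(n_mashups) for s in range(n_services)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for m, s in cells:
            target = 1.0 if (m, s) in observed else 0.0
            err = target - sum(p * q for p, q in zip(P[m], Q[s]))
            for f in range(dim):
                p_mf, q_sf = P[m][f], Q[s][f]
                # Gradient step on squared error with L2 regularization.
                P[m][f] += lr * (err * q_sf - reg * p_mf)
                Q[s][f] += lr * (err * p_mf - reg * q_sf)
    return P, Q


def mf_top_k(P, Q, mashup, k, exclude=()):
    """Rank services by the dot product of mashup and service factors."""
    scored = [(sum(p * q for p, q in zip(P[mashup], Q[s])), s)
              for s in range(len(Q)) if s not in exclude]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, s in scored[:k]]
```

The low rank (`dim=2`) is what makes the model generalize: it cannot memorize every zero, so services used by similar mashups receive higher scores.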
Graph structure learning method for mashup service recommendation.
Text-to-text generation baseline for candidate service generation or reranking.
Pre-trained service representation model for matching and reranking.
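Matching with a pre-trained representation model usually reduces to ranking services by embedding similarity to the requirement. The sketch below assumes a hypothetical `encode` callable standing in for whatever pre-trained encoder is used, and uses cosine similarity as the matching score.

```python
import math
from typing import Callable, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embed_rank(requirement: str, services: dict[str, str],
               encode: Callable[[str], Sequence[float]], k: int) -> list[str]:
    """Rank service descriptions by cosine similarity to the requirement."""
    q = encode(requirement)
    scored = [(cosine(q, encode(desc)), sid) for sid, desc in services.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sid for _, sid in scored[:k]]
```

In practice service embeddings would be computed once and cached rather than re-encoded per query.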
Large language model reranking over retrieved candidate services.
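LLM-based reranking can be sketched as prompt construction plus robust parsing of the model's answer. Here `complete` is a hypothetical callable that sends a prompt to some LLM and returns its text; the prompt wording and the bracketed-ID answer format are assumptions. IDs the model invents are dropped, and candidates it omits fall back to the retriever's order.

```python
import re
from typing import Callable


def build_rerank_prompt(requirement: str, candidates: dict[str, str]) -> str:
    lines = [f"[{sid}] {desc}" for sid, desc in candidates.items()]
    return ("Rank the following services by how well they satisfy the requirement.\n"
            f"Requirement: {requirement}\n" + "\n".join(lines) +
            "\nAnswer with service IDs in brackets, best first.")


def llm_rerank(requirement: str, candidates: dict[str, str],
               complete: Callable[[str], str], k: int) -> list[str]:
    reply = complete(build_rerank_prompt(requirement, candidates))
    ranked: list[str] = []
    for sid in re.findall(r"\[([^\]]+)\]", reply):
        if sid in candidates and sid not in ranked:  # skip hallucinated or repeated IDs
            ranked.append(sid)
    # Keep the retriever's order for candidates the model did not mention.
    ranked += [sid for sid in candidates if sid not in ranked]
    return ranked[:k]
```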
Precision@K: Fraction of the Top-K recommended services that match ground-truth services.
Recall@K: Fraction of the ground-truth services covered by the Top-K list. This matters when a mashup has multiple target services.
F1@K: Harmonic mean of Precision@K and Recall@K, giving a balanced comparison.
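Precision@K, Recall@K, and F1@K for a single query can be computed as below. This sketch assumes binary relevance and a deduplicated ranked list, and divides precision by K even when fewer than K services are returned; per-query values would then be averaged over the test set.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the first k recommendations that are ground-truth services."""
    hits = sum(1 for s in ranked[:k] if s in relevant)
    return hits / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of ground-truth services that appear in the first k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for s in ranked[:k] if s in relevant)
    return hits / len(relevant)


def f1_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)
```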
NDCG@K: Ranking-sensitive metric that rewards correct services appearing earlier in the list.
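A common binary-relevance form of NDCG@K discounts a hit at 1-based rank i by 1/log2(i + 1) and normalizes by the best achievable DCG, which places min(K, number of ground-truth services) hits at the top. The sketch below assumes this form; graded-relevance variants also exist.

```python
import math


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # enumerate() is 0-based, so a hit at 1-based rank i uses log2(i + 1).
    dcg = sum(1.0 / math.log2(i + 2) for i, s in enumerate(ranked[:k]) if s in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```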
MRR: Mean, over queries, of the reciprocal rank of the first correct recommendation, emphasizing early hits.
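MRR averages, over queries, 1/rank of the first correct service, counting 0 when no correct service is returned. A minimal sketch, assuming no rank cutoff (an MRR@K variant would truncate each list to K first):

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    for rank, service in enumerate(ranked, start=1):
        if service in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """runs holds one (ranked list, ground-truth set) pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```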
The leaderboard below is a static table backed by a CSV file. TBD values will be replaced once final experiments are available.
| Model | Dataset | Type | P@5 | P@10 | R@5 | R@10 | NDCG@5 | NDCG@10 | MRR | Code | Official | Unified Split |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRLCF | HSC+ | Benchmark Model | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Yes | Yes |
| MTFM | HSC+ | Neural | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Planned | Yes |
| GSAT | HSC+ | Graph-based | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Planned | Yes |
| GSL-Mash | ProgrammableWeb | Graph-based | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Planned | TBD |
| Frequency | HSC+ | Traditional | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Yes | Yes |
| LLM-based reranking | HSC+ | LLM-based | TBD | TBD | TBD | TBD | TBD | TBD | TBD | Link | Planned | Yes |
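Because the leaderboard is CSV-backed, it can be filtered and sorted offline. The sketch below assumes the CSV uses the same column headers as the table above; the function names are hypothetical, and rows whose metric is still `TBD` are placed after rows with numeric scores.

```python
import csv
import io


def load_leaderboard(text: str) -> list[dict[str, str]]:
    """Parse CSV text (e.g. the contents of the leaderboard file) into rows."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, dataset=None, model_type=None, unified_only=False):
    out = []
    for row in rows:
        if dataset and row["Dataset"] != dataset:
            continue
        if model_type and row["Type"] != model_type:
            continue
        if unified_only and row["Unified Split"] != "Yes":
            continue
        out.append(row)
    return out


def sort_by_metric(rows, metric):
    """Sort rows by a metric column, highest first, with TBD entries last."""
    def key(row):
        value = row[metric]
        return (1, 0.0) if value == "TBD" else (0, -float(value))
    return sorted(rows, key=key)
```

For example, `sort_by_metric(filter_rows(rows, dataset="HSC+", unified_only=True), "NDCG@10")` would list HSC+ models evaluated on the unified split.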