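Journal of Shaanxi Normal University (Natural Science Edition)
Special Topic on Data Mining
Multi-dimensional big data model optimization algorithm based on weighted graph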
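E Haihong*, TIAN Chuan, SONG Meina
(School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China)
E Haihong, female, associate professor, Ph.D. Her main research interests are big data platforms, cloud computing, and microservice architecture. E-mail: ehaihong@bupt.edu.cn.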
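Abstract:
Traditional materialized view selection (MVS) algorithms use a single evaluation index: they evaluate only materialization time and pursue the query hit rate of materialized views excessively. With ultra-high-dimensional data this causes the curse of dimensionality and frequent jitter of the materialized view set. To address these problems, this paper proposes a multi-dimensional big data model optimization (MMO) algorithm based on a weighted graph. It introduces average query latency and expansion rate as evaluation indexes and uses a weighted graph model to find the optimal materialized view set. Experimental results show that the proposed algorithm outperforms particle swarm optimization (PSO) in comprehensive score, average query latency, and expansion rate, resolves the curse of dimensionality for ultra-high-dimensional data, and converges quickly.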
Keywords:
multi-dimensional big data; materialized view selection; view set jitter; weighted graph; expansion rate
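Received:
2020-11-15
CLC number:
TP391
Document code:
A
Article ID:
1672-4291(2021)01-0022-07
Funding:
National Key Research and Development Program of China (2018YFB1403000); Beijing Natural Science Foundation (L191012)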
Multi-dimensional big data model optimization algorithm based on weighted graph
E Haihong*, TIAN Chuan, SONG Meina
(School of Computer, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Abstract:
Traditional materialized view selection (MVS) algorithms rely on a single evaluation index: they evaluate only materialization time and pursue the query hit rate of the materialized views excessively. With ultra-high-dimensional data, this leads to the curse of dimensionality and to frequent jitter of the materialized view set. To address these problems, a multi-dimensional big data model optimization (MMO) algorithm based on a weighted graph is proposed. Two evaluation indexes, average query latency and expansion rate, are introduced, and the optimal materialized view set is found using the weighted graph model. Experimental results show that the proposed algorithm outperforms the particle swarm optimization (PSO) algorithm in terms of comprehensive score, average query latency, and expansion rate, resolves the curse of dimensionality for ultra-high-dimensional data, and converges quickly.
Keywords:
multi-dimensional big data analysis; materialized view selection; jitter of the materialized view set; weighted graph; expansion rate
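
The two evaluation indexes named in the abstract can be illustrated with a small sketch. This is not the MMO algorithm and does not model the weighted graph. It only shows one plausible way to score a candidate materialized view set, under the following assumptions, none of which come from the paper: a view is described by its dimension set and estimated row count; a query is answered from the smallest view whose dimensions cover it (otherwise from the base table); scanned rows stand in for latency; expansion rate is total materialized size divided by base table size; and the two indexes are combined with a hypothetical weight `alpha`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A candidate materialized view (hypothetical structure, not from the paper)."""
    dims: frozenset   # dimensions the view is grouped by
    size_rows: int    # estimated row count once materialized


def query_cost(query_dims, views, base_rows):
    """Rows scanned for one query: smallest covering view, else the base table."""
    covering = [v.size_rows for v in views if query_dims <= v.dims]
    return min(covering, default=base_rows)


def average_query_latency(queries, views, base_rows):
    """Mean scan cost over the workload, used here as a proxy for latency."""
    return sum(query_cost(q, views, base_rows) for q in queries) / len(queries)


def expansion_rate(views, base_rows):
    """Total materialized size relative to the base table size."""
    return sum(v.size_rows for v in views) / base_rows


def score(queries, views, base_rows, alpha=0.5):
    """Lower is better. alpha is an assumed weight between the two indexes."""
    latency = average_query_latency(queries, views, base_rows) / base_rows
    return alpha * latency + (1 - alpha) * expansion_rate(views, base_rows)


# Toy workload: three queries against a 1000-row base table.
BASE_ROWS = 1000
VIEWS = [View(frozenset({"a", "b"}), 100), View(frozenset({"a"}), 10)]
QUERIES = [frozenset({"a"}), frozenset({"b"}), frozenset({"c"})]
```

Calling `score(QUERIES, VIEWS, BASE_ROWS)` on the toy data penalizes both slow queries (the query on `c` falls back to the base table) and storage growth, which is the trade-off that a hit-rate-only criterion ignores.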