Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. Applying workflows to problems in Molecular Biology, notably those related to sequence analysis, is a common approach in Bioinformatics. Because of the nature of the raw data and the in silico environment of Molecular Biology experiments, two practical and closely related problems have been studied apart from the research subject itself: reproducibility and the computational environment. Enhancing the reproducibility of Bioinformatics experiments requires attention to several aspects. The reproducibility requirements comprise data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment.

Aerospike is the company behind the Aerospike open source NoSQL distributed database, which has a horizontally scalable, high-speed, lightweight data layer. The database powers some of the world's leading Web-scale, real-time, big-data-driven platforms.

ArangoDB is a leading open source graph database company that recently raised 27.8 million in Series B funding. This funding round will be used to develop and expand its graph platform, which provides organizations with a comprehensive solution for managing structured and unstructured data across multiple databases, geographies and clouds: one database, one query language, and three data models.
Cloud computing is a booming alternative that can provide this computational environment, hiding technical details and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet this goal, we built three typical Bioinformatics workflows and ran them on one private and two public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM).

With more than 7 million downloads and over 7,000 stargazers on GitHub, ArangoDB is the leading open source native multi-model database. We deployed ArangoDB in a cluster with 640 virtual CPUs and found that it could sustain a write load of 1.1 million JSON documents, the equivalent of 1 GB, per second. Our complete test setup is available on GitHub. We update the results as we receive pull requests and improvements.
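To make the idea of persisting provenance as documents more concrete, here is a minimal sketch of what a provenance record for one workflow step might look like before it is handed to a NoSQL store. It is not the implementation used in the study. The layout is only loosely modeled on the PROV-JSON serialization of PROV-DM, and the `ex:` prefix, the step and program names, the settings, and the `provenance_document` helper are all hypothetical. The record covers the three requirements named above: the trajectory of the data, the program settings, and the computational environment.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Return a hex checksum so a data file can be identified later."""
    return hashlib.sha256(data).hexdigest()


def provenance_document(step, program, settings, input_data, output_data):
    """Build one PROV-style JSON document for a single workflow step.

    All identifiers use a hypothetical "ex:" namespace.
    """
    in_id = f"ex:{step}/input"
    out_id = f"ex:{step}/output"
    act_id = f"ex:{step}"
    agent_id = f"ex:{program}"
    return {
        "prefix": {"ex": "http://example.org/"},
        # Entities: the data consumed and produced by the step.
        "entity": {
            in_id: {"ex:sha256": sha256_of(input_data)},
            out_id: {"ex:sha256": sha256_of(output_data)},
        },
        # Activity: the step itself, with settings and environment.
        "activity": {
            act_id: {
                "ex:settings": settings,
                "ex:environment": {
                    "os": platform.system(),
                    "python": platform.python_version(),
                },
            }
        },
        # Agent: the program that carried out the activity.
        "agent": {agent_id: {"prov:type": "prov:SoftwareAgent"}},
        # Relations linking data, step, and program.
        "used": {"_:u1": {"prov:activity": act_id, "prov:entity": in_id}},
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": out_id, "prov:activity": act_id}
        },
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": act_id, "prov:agent": agent_id}
        },
    }


# Hypothetical step: an alignment program run on a tiny FASTA input.
doc = provenance_document(
    step="alignment",
    program="aligner",
    settings={"threads": 4},
    input_data=b">seq1\nACGT\n",
    output_data=b"aligned:ACGT\n",
)
serialized = json.dumps(doc, indent=2, sort_keys=True)
print(serialized)
```

Because the record is plain JSON, the same `serialized` string could in principle be written to a document, key-value, or graph-oriented NoSQL system; how each store indexes or queries it is outside the scope of this sketch.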