
URL: https://pubmed.ncbi.nlm.nih.gov/35966392/

Big data analytics in Cloud computing: an overview



Abstract

Big Data and Cloud Computing, as two mainstream technologies, are at the center of attention in the IT field. Every day a huge amount of data is produced from different sources. This data is so large that traditional processing tools are unable to deal with it. Besides being big, this data moves fast and comes in a wide variety of forms. Big Data is a concept that deals with storing, processing, and analyzing large amounts of data. Cloud computing, on the other hand, is about offering the infrastructure to enable such processes in a cost-effective and efficient manner. Many sectors, including businesses (small or large), healthcare, and education, are trying to leverage the power of Big Data. In healthcare, for example, Big Data is being used to reduce the costs of treatment, predict outbreaks of pandemics, and prevent diseases. This paper presents an overview of Big Data Analytics as a crucial process in many fields and sectors. We start with a brief introduction to the concept of Big Data, the amount of data generated on a daily basis, and the features and characteristics of Big Data. We then delve into Big Data Analytics, where we discuss issues such as the analytics cycle, analytics benefits, and the movement from the ETL to the ELT paradigm as a result of Big Data analytics in the Cloud. As a case study, we analyze Google's BigQuery, a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. As a Platform as a Service (PaaS), it supports querying using ANSI SQL. We use the tool to perform different experiments, measuring average read, average compute, and average write times on datasets of different sizes.
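The ETL-to-ELT shift mentioned in the abstract is easiest to see side by side. The following is a minimal sketch, not the paper's method: it uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse such as BigQuery, and the table names (`visits_raw`, `visits_etl`, `visits_elt`) and sample rows are hypothetical. In ETL the cleaning step runs in application code before loading; in ELT the raw rows are loaded first and the same transformation is expressed in SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw records from a source system (illustrative data only).
# The visit counts arrive as untrimmed strings and need cleaning.
RAW_ROWS = [
    ("2020-01-01", "clinic_a", "12"),
    ("2020-01-01", "clinic_b", "7"),
    ("2020-01-02", "clinic_a", " 5 "),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the cleaned rows."""
    cleaned = [(day, site, int(count.strip())) for day, site, count in RAW_ROWS]
    conn.execute("CREATE TABLE visits_etl (day TEXT, site TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE visits_raw (day TEXT, site TEXT, visits TEXT)")
    conn.executemany("INSERT INTO visits_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE visits_elt AS
        SELECT day, site, CAST(TRIM(visits) AS INTEGER) AS visits
        FROM visits_raw
        """
    )


def daily_totals(conn: sqlite3.Connection, table: str) -> list[tuple[str, int]]:
    """Sum visits per day. `table` is interpolated, so pass trusted names only."""
    return conn.execute(
        f"SELECT day, SUM(visits) FROM {table} GROUP BY day ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(daily_totals(conn, "visits_etl"))  # [('2020-01-01', 19), ('2020-01-02', 5)]
    print(daily_totals(conn, "visits_elt"))  # same result, transformed in SQL
```

Both paths produce the same totals; the difference is where the transformation runs. The ELT variant also keeps the raw table, so the data can be re-transformed later without re-extracting it from the source, which is the usual argument for ELT when storage and compute in the cloud are cheap and elastic.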

Keywords: Analytics; Big data; BigQuery; Cloud computing.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1. Volume of data/information created, captured, copied, and consumed worldwide from 2010 to 2024 (estimated) [3]
Fig. 2. The 3 V’s of Big Data [6]
Fig. 3. Examples of the velocity of Big Data [9]
Fig. 4. Main categories of data variety in Big Data [9]
Fig. 5. Flow in the processing of Big Data [11]
Fig. 6. Differences between ETL and ELT [15]
Fig. 7. BigQuery interface
Fig. 8. BigQuery execution details
Fig. 9. Adding a table to the created dataset
Fig. 10. Dependence of average compute time on dataset size
Fig. 11. Using Data Studio for data visualization
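Fig. 10 relates average compute time to dataset size. The following minimal sketch illustrates only the general shape of such an experiment (build tables of increasing size, run the same query several times, average the wall-clock time); it does not use BigQuery. It assumes an in-memory SQLite database as a stand-in, and the table `events`, its columns, the query, and the helper names are all hypothetical, so it does not reproduce BigQuery's separate read, compute, and write measurements.

```python
import sqlite3
import statistics
import time


def build_table(conn: sqlite3.Connection, n_rows: int) -> None:
    """(Re)create a hypothetical `events` table with n_rows synthetic rows."""
    conn.execute("DROP TABLE IF EXISTS events")
    conn.execute("CREATE TABLE events (id INTEGER, category INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        ((i, i % 10, float(i)) for i in range(n_rows)),
    )


def average_query_seconds(conn: sqlite3.Connection, repeats: int = 5) -> float:
    """Run one aggregation query `repeats` times and return the mean wall time."""
    query = "SELECT category, AVG(value) FROM events GROUP BY category"
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(query).fetchall()  # fetch so the query fully executes
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def compute_time_by_size(sizes: list[int], repeats: int = 5) -> dict[int, float]:
    """Map each dataset size (row count) to its average query time in seconds."""
    conn = sqlite3.connect(":memory:")
    results = {}
    for n in sizes:
        build_table(conn, n)
        results[n] = average_query_seconds(conn, repeats)
    conn.close()
    return results


if __name__ == "__main__":
    for n, secs in compute_time_by_size([1_000, 10_000, 100_000]).items():
        print(f"{n:>7} rows: {secs * 1000:.2f} ms average")
```

Averaging over several repeats smooths out one-off noise such as cold caches; actual timings depend on the machine, so no specific numbers are implied here.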

References

1. Hilbert M, López P. The world’s technological capacity to store, communicate, and compute information. Science. 2011;332:60–65.
2. Hellerstein J. Gigaom Blog. 2019.
3. Statista. 2020.
4. Reinsel D, Gantz J, Rydning J. Data age 2025: the evolution of data to life-critical. Framingham: International Data Corporation; 2017.
5. Forbes. 2020.
