Big Data, a valuable and popular technology, is appreciated more and more as technology advances. As companies around the world carry out their digital transformation with data, Big Data is increasingly seen as the beginning of a new era.
In fact, we have been using data for years. Databases, queries, and analyses are technologies that developers and companies have long relied on. But with the advancement of technology and the growth of the internet, billions of new data points have emerged from sources such as:
- social media posts,
- use of applications on mobile devices,
- the logs we leave on web pages,
- sensor data generated by the Internet of Things
As a result, we now encounter huge amounts of scientific and non-scientific data. This has led to an ecosystem that we could never have imagined. This is exactly what we call Big Data: not only the size of the data, but also the collecting, storing, and analyzing of it.
Many organizations have created Big Data teams in response to this change, because the strongest player is not the one with the most data but the one that makes the best sense of it and analyzes it best.
Social media has an indispensable place in our lives. Every day, millions of people generate activity on social networking sites that adds up to exabytes, even zettabytes, of data. For example, 481k tweets are posted on Twitter every minute, and Google handles 3.7 million searches per minute. For digital leaders, even storing this data means a high cost. However, the open source distributed file systems in the big data ecosystem make it possible to store this data at a lower cost and to obtain meaningful results from it.
Does every exabyte-scale data set you hold count as Big Data? For your existing data to be included in the Big Data ecosystem, it must have at least one of the following 5 components (also known as the 5V rule).
Variety: The data does not need to be of one particular type. It can be of many types, such as images, text, log files, and audio files. These types must be integrable and convertible.
Velocity: The graph above shows the amount of data generated on social media every minute. The data must arrive very quickly and continuously, just like social media data. Of course, it must also be processed at the same speed.
Volume: The most important requirement for data to count as big data is its size. The size of the data determines the value of the data.
Veracity (verification): When data flows this fast and in such volume, it is necessary to check whether the incoming data is reliable, because storing and later analyzing dirty or corrupted data can waste time and lead to erroneous results.
Value: One of the most important components is value. Once the data has passed through the components above, the analyzed data should provide added value for the company.
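To make the verification step more concrete, here is a minimal Python sketch of filtering dirty or corrupted records out of an incoming stream before they are stored. The record layout (`user_id`, `timestamp`, `payload`) and the function names are assumptions made for illustration; a real pipeline would apply its own rules.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    """A hypothetical incoming record; the field names are assumptions."""
    user_id: str
    timestamp: str  # expected in ISO 8601 form, e.g. "2024-01-01T12:00:00"
    payload: str


def validate(raw: dict) -> Optional[Event]:
    """Return a clean Event, or None if the record is dirty or corrupted."""
    user_id = raw.get("user_id")
    timestamp = raw.get("timestamp")
    payload = raw.get("payload")
    # Reject records with missing, non-string, or blank fields.
    fields = (user_id, timestamp, payload)
    if not all(isinstance(v, str) and v.strip() for v in fields):
        return None
    # Reject records whose timestamp cannot be parsed.
    try:
        datetime.fromisoformat(timestamp)
    except ValueError:
        return None
    return Event(user_id.strip(), timestamp, payload)


def split_stream(records):
    """Separate raw records into clean events and rejected records."""
    clean, rejected = [], []
    for raw in records:
        event = validate(raw)
        if event is None:
            rejected.append(raw)
        else:
            clean.append(event)
    return clean, rejected
```

Keeping the rejected records in a separate list, rather than silently dropping them, makes it possible to inspect later why the data was dirty.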
Who Uses Big Data?
Contrary to popular belief, social media is not the only source of big data. As mentioned, companies also create data sources internally and analyze them. Since examples are much more meaningful, let me go straight to them.
Government Resources: The Digital Transformation Office, which operates under the Presidency of the Republic of Turkey, has decided to establish a big data team. By analyzing the big data held by the Republic of Turkey quickly and on a regular basis, the team will work in areas such as:
- crime prevention, particularly terrorism and cybercrime,
- fight against traffic jams,
- management and services of institutions
In short, they will apply Big Data technologies to a wide range of issues.
E-Commerce Leaders: Many e-commerce companies analyze your on-site search history, the products you view, and the products in your cart, and dynamically tailor their home page to you. AliExpress, one of the world’s largest e-commerce companies, organizes its homepage in this way.
Sports: In Germany and Spain, various teams use big data and Google Glass to compare players’ performance with previous matches. Players are then given instructions based on their performance and the current state of the match.
Shopping Centers: A retail chain combines weather data with customer shopping habits. It discovers that on rainy days, umbrellas placed at the store entrance sell more. By combining information on rainfall, the customer base, and the best-selling products, the store changes its layout automatically.
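The shopping center example can be sketched in a few lines of Python: join sales records with weather data on the date, then see which product sells best under each kind of weather. All dates, weather labels, and unit counts below are made-up toy data, and the function names are hypothetical; the real chain's method is not described in this article.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: the dates, weather labels, and unit counts are made up.
weather_by_day = {
    "2024-03-01": "rain",
    "2024-03-02": "clear",
    "2024-03-03": "rain",
    "2024-03-04": "clear",
}
sales = [
    ("2024-03-01", "umbrella", 40),
    ("2024-03-01", "sunglasses", 2),
    ("2024-03-02", "umbrella", 5),
    ("2024-03-02", "sunglasses", 20),
    ("2024-03-03", "umbrella", 36),
    ("2024-03-04", "umbrella", 7),
    ("2024-03-04", "sunglasses", 18),
]


def average_units_by_weather(sales, weather_by_day):
    """Join sales with weather on the date, then average units per (weather, product)."""
    units = defaultdict(list)
    for day, product, count in sales:
        weather = weather_by_day.get(day)
        if weather is None:
            continue  # skip sales on days with no weather record
        units[(weather, product)].append(count)
    return {key: mean(values) for key, values in units.items()}


def best_seller(averages, weather):
    """Pick the product with the highest average units for the given weather."""
    candidates = {p: avg for (w, p), avg in averages.items() if w == weather}
    return max(candidates, key=candidates.get) if candidates else None


averages = average_units_by_weather(sales, weather_by_day)
print(best_seller(averages, "rain"))
print(best_seller(averages, "clear"))
```

At real scale the same join and aggregation would run on a distributed engine rather than in a single Python process, but the logic stays the same.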
Big Data Technologies
The Big Data ecosystem, which I have been trying to explain since the beginning of this article, is made up of the parts you can see on many websites and in many articles. We are launching a series of articles on analyzing Big Data in more detail and developing projects with it.
In the following weeks, we will examine these technologies on a regular basis and carry out analyses with them. While covering each technology, I will try to explain not only how to code with it, but also what it produces, how it came about, and its architectural logic.
Technologies we will examine:
- Apache Hadoop (HDFS, MapReduce),
- Apache Pig,
- Apache Hive,
- Apache Spark (Core, SQL, ML)