Portál UTB - Browse IS/STAG

Surname	Name	Title	Thesis status	Supervisors	Reviewers	Type of thesis	Date of def.
Hanzlík	Roman	Towards Data Science in Apache Spark		Šenkeřík Roman	-	Master's thesis	06.09.2021

Thesis info Data science v prostředí Apache Spark

Basic data

The document you are accessing is protected by copyright law. Unauthorised use may lead to criminal sanctions.
Name	Hanzlík Roman
Acad. Yr.	2020/2021
Assigning department	AUIUI
Date of defence	Sep 6, 2021
Type of thesis	Master's thesis
Thesis status	Thesis finished and defended successfully (DUO).
Completeness of mandatory entries	- All mandatory fields for this Thesis are filled in.
Main topic	Data science v prostředí Apache Spark
Main topic in English	Towards Data Science in Apache Spark
Title according to student	Data science v prostředí Apache Spark
English title as given by the student	Towards Data Science in Apache Spark
Parallel name	-
Subtitle	-
Thesis supervisor	Šenkeřík Roman, prof. Ing. Ph.D.
Annotation	This master's thesis introduces Data Science as a new phenomenon in computer data processing. Its main objective is to provide an initial insight into Data Science and to briefly introduce its sub-areas, focusing on Big Data and Machine Learning as the two pillars that have played a primary role in recent years, especially in information technology, a field that now fundamentally affects nearly all areas of human activity. The theoretical part first gives an overview of the history of data and information processing and presents the factors that led to the need for a new approach to data processing. A significant part is devoted to data processing methodologies. It also covers the definition of Data Science and its basic components, Big Data including data engineering, and an overview of the options for and types of data analysis. The practical part describes the basic concepts of Apache Spark, including several installation options such as on-premise and in-cloud. It then presents the capabilities of Apache Spark's core components on real use cases with publicly available datasets. The thesis includes a set of examples with working code that demonstrate the use of the technology.
Keywords	Data, Data Science, Data Engineering, Big Data, Machine Learning, Data Mining, Mathematics, Statistics, Analytics, Analysis, DLM, CRISP-DM, DSMM, Apache Spark
Length of the covering note	145 pp. (233,617 characters)
Language	CZ
Research Plan	Prepare a literature review on the given topic. Describe the basic components of Data Science. Describe the Apache Spark environment for distributed computing. Create working sets of demonstration examples for the Apache Spark environment on various datasets. Provide an overall evaluation and conclusion.
Recommended resources	Data science & big data analytics: discovering, analyzing, visualizing and presenting data. Indianapolis: Wiley, [2015], xviii, 410 pp. ISBN 9781118876138. GRUS, Joel. Data science from scratch. Sebastopol: O'Reilly, 2015, xvi, 311 pp. ISBN 9781491901427. OJEDA, Tony, Sean Patrick MURPHY, Benjamin BENGFORT and Abhijit DASGUPTA. Practical data science cookbook: 89 hands-on recipes to help you complete real-world data science projects in R and Python. Birmingham: Packt Publishing, 2014, 380 pp. ISBN 9781783980246. MILES, Matthew B., A. M. HUBERMAN and Johnny SALDAÑA. Qualitative data analysis: a methods sourcebook. Fourth edition. Los Angeles: SAGE, [2020], xxi, 380 pp. ISBN 9781544371856. KARAU, Holden, Andy KONWINSKI, Patrick WENDELL and Matei ZAHARIA. Learning Spark. Sebastopol: O'Reilly, 2015, xvi, 256 pp. ISBN 9781449358624. RYZA, Sandy, Uri LASERSON, Sean OWEN and Josh WILLS. Advanced analytics with Spark. Beijing: O'Reilly, 2015, xii, 260 pp. ISBN 9781491912768. DORSEY, Richard. Data analytics. [CreateSpace Independent Publishing Platform], [2017], 67 pp. ISBN 9781547089291. ANKAM, Venkat. Big data analytics: a handy reference guide for data analysts and data scientists to help to obtain value from big data analytics using Spark on Hadoop clusters. Birmingham: Packt, 2016, xv, 300 pp. ISBN 9781785884696.
Related to practice	No
Enclosed appendices	-
Appendices bound in thesis	-
Taken from the library	No
Full text of the thesis
Appendices
Reviewer's report
Supervisor's report
Defence procedure record file