This post serves to supplement the main thread of the series on Development on Databricks, making a stop in C world (don't panic!) as we handle the situation where you are required to build your own WinUtils executable for use with Spark. It is intended for an audience unfamiliar with building C projects, and as such seasoned C developers will no doubt want to skip some of the 'hand-holding' steps.

Setting up a local Spark environment on a Windows build, whether for developing Spark applications, running CI/CD activities or anything else, brings many benefits for productivity and cost reduction. In order to run Apache Spark locally, you need an element of the Hadoop code base known as 'WinUtils'. It allows management of the POSIX file system permissions that the HDFS file system requires of the local file system. For this to happen, however, you'll need an executable file called winutils.exe. If Spark cannot find this executable, it will throw a warning as below, but will proceed to try to run the Spark shell.

In particular, Spark requires that you have set POSIX-compatible permissions on a temporary directory used by the Hive metastore, which defaults to C:\tmp\hive (this location can be changed as described here).
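Once you have a winutils.exe, the usual setup is to point HADOOP_HOME at its parent folder and grant the Hive scratch directory the permissions Spark expects. A minimal sketch for a Windows Command Prompt follows; the C:\hadoop install path is an assumption, so adjust it to wherever you placed winutils.exe:

```shell
:: Assumed layout: winutils.exe lives under C:\hadoop\bin (adjust to your path)
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin

:: Create the Hive metastore scratch directory and grant it POSIX rwx
:: permissions, which Spark checks when starting a Hive-backed session
mkdir C:\tmp\hive
winutils.exe chmod -R 777 C:\tmp\hive

:: Inspect the permissions winutils now reports for the directory
winutils.exe ls C:\tmp\hive
```

With this in place, launching spark-shell should no longer emit the WinUtils warning, since Spark resolves the executable via HADOOP_HOME.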