Thursday, July 25, 2013

Cloudera Sentry Overview | Cloudera Sentry Basics


Cloudera has introduced Sentry, a new Apache licensed open source project that provides what it calls the first "fine-grained authorization framework" for Hadoop.

Sentry is an independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, providing advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise datasets.
Cloudera says this level of granular control is essential to meet the role-based access control (RBAC) requirements of enterprises in highly regulated industries, such as healthcare, financial services and government.
Sentry alleviates the security concerns that have prevented some organizations from opening Hadoop data systems to a more diverse set of users, extending the power of Hadoop and making it suitable for new industries, organizations and enterprise use cases.
The company says it plans to submit the Sentry security module to the Apache Incubator at the Apache Software Foundation later this year.
For data safeguards to be deemed compliant with standard regulatory requirements, four functional areas of information security must be addressed: perimeter, data, access, and visibility.
• Perimeter: guarding access to the cluster itself through network security, firewalls and, ultimately, authentication to confirm user identities.
• Data: protecting the data in the cluster from unauthorized visibility through masking and encryption, both at rest and in transit.
• Access: defining what authenticated users and applications can do with the data in the cluster through file system ACLs and fine-grained authorization.
• Visibility: reporting on the origins of data and on data usage through centralized auditing and lineage capabilities.
Recent developments by the Hadoop community, as well as integration with solution providers, have addressed the perimeter and data elements through authentication, encryption and masking.
The release of Cloudera Navigator earlier this year brought visibility to Hadoop with centralized auditing for files, records and metadata.
Sentry targets the remaining area, access. As a fine-grained authorization solution for Apache Hadoop, it gives database administrators holistic, granular control over user access that addresses the limitations of previous solutions.
Features of the Sentry security module include:
• Secure authorization: administrators can prevent authenticated users from accessing data and/or holding privileges on data.
• Fine-grained authorization: Hadoop administrators get comprehensive and precise control to specify user access rights to subsets of data within a database.
• Role-based authorization: administrators simplify permissions management by creating and assigning templatized privileges based on functional roles.
• Multi-tenant administration: central administrators can deputize individual administrators to manage security settings for each separate database or schema.
Cloudera has worked closely with the open source community to expand Hadoop’s security capabilities, including the improved security features in a new HiveServer2 release, which supports concurrent client connections and Kerberos-based authentication.

Prerequisites
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
• CDH 4.3.0 or later.
• HiveServer2 with strong authentication (Kerberos or LDAP).
• A secure Hadoop cluster.
These requirements prevent a user from bypassing authorization and gaining direct access to the underlying data.
In addition, make sure that the following are true:
• The Hive warehouse directory (/user/hive/warehouse, or whatever path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user, and its permissions must be set as follows:
– 777 on the directory itself (for example, /user/hive/warehouse)
– 750 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
For example (the quotes keep the local shell from expanding the wildcard, so HDFS expands it instead):
$ sudo -u hdfs hdfs dfs -chmod 777 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chmod 750 '/user/hive/warehouse/*'
Important: These instructions override the recommendations in the Hive section of the CDH4 Installation Guide.
• If you used Cloudera Manager to set up HiveServer2, turn off HiveServer2 impersonation in Cloudera Manager.
Note: You should not need HiveServer2 impersonation because Sentry provides fine-grained access control. If you still want to use HiveServer2 impersonation for some reason, you can enable it manually in the Sentry configuration file, sentry-site.xml:
<property>
  <name>sentry.allow.hive.impersonation</name>
  <value>true</value>
</property>
• The Hive user must be able to submit MapReduce jobs. You can ensure this by setting the minimum user ID for job submission to 0. Set this value in Cloudera Manager under MapReduce Properties or, if you are not using Cloudera Manager, edit the taskcontroller.cfg file and set min.user.id=0.
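To make the two permission modes concrete, here is a minimal Python sketch using the standard stat module. It renders each mode the way ls -l would and checks a {path: mode} listing against the 777/750 rule above. The listing format and the helper names describe and check_warehouse are hypothetical; in practice you would build the listing from hdfs dfs -ls output, which this sketch does not parse.

```python
import stat

# Modes required by the prerequisites: 777 on the warehouse directory itself,
# 750 on every subdirectory beneath it.
WAREHOUSE_MODE = 0o777
SUBDIR_MODE = 0o750


def describe(mode: int) -> str:
    """Render a directory permission mode the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)


def check_warehouse(modes: dict[str, int], root: str = "/user/hive/warehouse") -> list[str]:
    """Return one message per path whose mode differs from the expected one.

    `modes` is a hypothetical {path: mode} listing; paths outside `root` are ignored.
    """
    problems = []
    for path, mode in sorted(modes.items()):
        if path == root:
            expected = WAREHOUSE_MODE
        elif path.startswith(root + "/"):
            expected = SUBDIR_MODE
        else:
            continue  # not part of the warehouse tree
        if mode != expected:
            problems.append(f"{path}: {describe(mode)}, expected {describe(expected)}")
    return problems


if __name__ == "__main__":
    print(describe(WAREHOUSE_MODE))  # drwxrwxrwx
    print(describe(SUBDIR_MODE))     # drwxr-x---
    listing = {
        "/user/hive/warehouse": 0o777,
        "/user/hive/warehouse/sales": 0o750,
        "/user/hive/warehouse/reports": 0o755,
    }
    for problem in check_warehouse(listing):
        print(problem)
```

Run as a script, the listing above reports only the reports directory, whose 755 mode gives read and execute access to users outside the Hive group.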

Roles and Privileges
Sentry uses a role-based privilege model. A role is a collection of rules for accessing a given Hive object. The objects supported in the current release are server, database, table, and URI. Access to each object is governed by privileges: Select, Insert, or All.
Note: All is not supported at the table scope; specify Select and Insert explicitly instead.
For example, a rule for the Select privilege on table customer in database sales would be formulated as follows:
server=server1->db=sales->table=customer->action=Select
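A rule is a flat string, so its structure is easy to check mechanically. The following Python sketch splits a rule on the -> separators into an ordered mapping and rejects segments without an = sign, duplicate keys, and rules that do not start with server or do not end with action. The function name parse_rule and these validation choices are illustrative assumptions, not Sentry's own parser.

```python
def parse_rule(rule: str) -> dict[str, str]:
    """Split 'server=...->db=...->table=...->action=...' into an ordered dict.

    Hypothetical helper for illustration; Sentry's own parser may differ.
    """
    parts: dict[str, str] = {}
    for segment in rule.strip().split("->"):
        key, sep, value = segment.partition("=")
        key, value = key.strip().lower(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed segment {segment!r} in {rule!r}")
        if key in parts:
            raise ValueError(f"duplicate {key!r} in {rule!r}")
        parts[key] = value
    keys = list(parts)
    # The object hierarchy must start at the server and finish with the action.
    if keys[0] != "server" or keys[-1] != "action":
        raise ValueError(f"{rule!r} must start with server= and end with action=")
    return parts


if __name__ == "__main__":
    print(parse_rule("server=server1->db=sales->table=customer->action=Select"))
```

A typo such as > in place of -> is caught, because the last two segments then merge into one and the rule no longer ends with action.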
Each object must be specified as a hierarchy of the containing objects, from server down to table, followed by the privilege granted for that object. A role can contain multiple such rules, separated by commas. For example, a role might contain the Select privilege for the customer and items tables in the sales database, and the Insert privilege for the sales_insights table in the reports database. You would specify this as follows:
sales_reporting = \
    server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert
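The next Python sketch parses a role definition in this form, joining the trailing-backslash continuation lines first, and then answers whether the role permits an action on an object. The matching is a deliberately simplified model and an assumption: a rule applies when its object path is a prefix of the requested path (so a database-level rule would cover every table in that database), and All is treated as covering any action. Sentry's real evaluation, including URI handling and the table-scope restriction on All noted above, is richer than this. The names parse_role and is_allowed are hypothetical.

```python
# Hypothetical simplified model of role evaluation, not Sentry's implementation.
Rule = tuple[tuple[tuple[str, str], ...], str]  # (object path, lowercased action)

OBJECT_ORDER = ("server", "db", "table")
ACTION_ALL = "all"

POLICY_SNIPPET = r"""sales_reporting = \
    server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert"""


def parse_role(definition: str) -> tuple[str, list[Rule]]:
    """Parse 'name = rule, rule, ...' with trailing-backslash continuations."""
    joined = definition.replace("\\\n", " ")  # backslash + newline -> one logical line
    name, sep, body = joined.partition("=")
    if not sep:
        raise ValueError("expected 'role_name = rule, rule, ...'")
    rules: list[Rule] = []
    for raw in body.split(","):
        raw = raw.strip()
        if not raw:
            continue
        segments = [seg.partition("=") for seg in raw.split("->")]
        *objects, (action_key, _, action) = segments
        if action_key.strip().lower() != "action":
            raise ValueError(f"rule {raw!r} does not end with action=...")
        path = tuple((k.strip().lower(), v.strip()) for k, _, v in objects)
        rules.append((path, action.strip().lower()))
    return name.strip(), rules


def is_allowed(rules: list[Rule], action: str, **objects: str) -> bool:
    """True if some rule's path is a prefix of the request and its action matches (or is All)."""
    request = tuple((key, objects[key]) for key in OBJECT_ORDER if key in objects)
    wanted = action.lower()
    for path, granted in rules:
        if request[: len(path)] == path and granted in (wanted, ACTION_ALL):
            return True
    return False


if __name__ == "__main__":
    role, role_rules = parse_role(POLICY_SNIPPET)
    print(role, is_allowed(role_rules, "Select", server="server1", db="sales", table="customer"))
```

With this role, Select on sales.customer is allowed, while Insert on the same table is not, since only reports.sales_insights carries an Insert rule.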

Users and Groups
• A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system supported by HiveServer2.
• A group connects the authentication system with the authorization system. It is a collection of one or more users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured for a group.
• A configured group provider determines a user’s affiliation with a group. The current release supports HDFS-backed groups and locally configured groups. For example:
analyst = sales_reporting, data_export, audit_report
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group, then all the users belonging to the HDFS group analyst can run such queries.
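The group-to-role step can be sketched the same way. The Python below parses "group = role, role" lines like the analyst example and computes the union of roles for a user, given the groups a group provider reported for that user. The helper names are hypothetical, and the user's groups are passed in directly rather than looked up from HDFS or a local file.

```python
def parse_group_mapping(lines: list[str]) -> dict[str, list[str]]:
    """Parse 'group = role, role, ...' lines into {group: [roles]} (hypothetical helper)."""
    mapping: dict[str, list[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group, sep, roles = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'group = role, ...', got {line!r}")
        mapping[group.strip()] = [r.strip() for r in roles.split(",") if r.strip()]
    return mapping


def roles_for_user(user_groups: list[str], group_roles: dict[str, list[str]]) -> list[str]:
    """Union of the roles granted to every group the user belongs to, sorted."""
    roles: set[str] = set()
    for group in user_groups:
        roles.update(group_roles.get(group, ()))
    return sorted(roles)


if __name__ == "__main__":
    mapping = parse_group_mapping(["analyst = sales_reporting, data_export, audit_report"])
    # Groups with no configured roles (here, "staff") contribute nothing.
    print(roles_for_user(["analyst", "staff"], mapping))
```

Taking the union means a user in several groups gets every role any of those groups carries, which matches the description above of members running whatever the group's roles allow.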
User to Group Mapping
You can configure Sentry to use either Hadoop groups or groups defined in the policy file.
Important: You can use either Hadoop groups or local groups, but not both at the same time.
To configure Hadoop groups:
Set the sentry.provider property in sentry-site.xml to org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
OR
To configure local groups:
Define local groups in a [users] section of the Sentry policy file. For example:
[users]
user1 = group1, group2, group3
user2 = group2, group3
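Because the [users] section uses simple key = value lines, it can be read with Python's standard configparser module, as in the sketch below. This assumes the file is plain INI that configparser accepts. Sentry reads its policy file with its own parser, and configparser would not understand the trailing-backslash continuations used in role definitions, so treat this only as a way to inspect the user-to-group mapping. The function name local_groups is hypothetical.

```python
import configparser

POLICY_TEXT = """
[users]
user1 = group1, group2, group3
user2 = group2, group3
"""


def local_groups(policy_text: str) -> dict[str, list[str]]:
    """Return {user: [groups]} from the [users] section, or {} if it is absent."""
    parser = configparser.ConfigParser(interpolation=None)  # no %-interpolation
    parser.optionxform = str  # keep user names as written (default lowercases them)
    parser.read_string(policy_text)
    if not parser.has_section("users"):
        return {}
    return {
        user: [g.strip() for g in groups.split(",") if g.strip()]
        for user, groups in parser.items("users")
    }


if __name__ == "__main__":
    for user, groups in local_groups(POLICY_TEXT).items():
        print(user, groups)
```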

Installing Sentry
1. To download Sentry, go to the Sentry Version and Download Information page.
2. Install Sentry as follows, depending on your operating system:
• On Red Hat and similar systems:
$ sudo yum install sentry
• On SLES systems:
$ sudo zypper install sentry
• On Ubuntu and Debian systems:
$ sudo apt-get update; sudo apt-get install sentry
