Thursday, July 25, 2013

Cloudera Sentry Overview | Cloudera Sentry Basics


Cloudera has introduced Sentry, a new Apache licensed open source project that provides what it calls the first "fine-grained authorization framework" for Hadoop.

Sentry is an independent security module that integrates with open source SQL query engines Apache Hive and Cloudera Impala, providing advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise datasets.
Cloudera says this level of granular control is essential to meet enterprise Role Based Access Control requirements of highly regulated industries, such as healthcare, financial services and government.
Sentry alleviates the security concerns that have prevented some organizations from opening Hadoop data systems to a more diverse set of users, extending the power of Hadoop and making it suitable for new industries, organizations and enterprise use cases.
The company says it plans to submit the Sentry security module to the Apache Incubator at the Apache Software Foundation later this year.
For data safeguards to be deemed compliant with standard regulatory requirements, four functional areas of information security must be addressed: perimeter, data, access, and visibility.
• Perimeter: guarding access to the cluster itself through network security, firewalls and, ultimately, authentication to confirm user identities.
• Data: protecting the data in the cluster from unauthorized visibility through masking and encryption, both at rest and in transit.
• Access: defining what authenticated users and applications can do with the data in the cluster through file system ACLs and fine-grained authorization.
• Visibility: reporting on the origins of data and on data usage through centralized auditing and lineage capabilities.
Recent developments by the Hadoop community, as well as integration with solution providers, have addressed the perimeter and data elements through authentication, encryption and masking.
The release of Cloudera Navigator earlier this year brought Visibility to Hadoop with centralized auditing for files, records and metadata.
As a fine-grained authorization solution for Apache Hadoop, Sentry gives database administrators holistic, granular user access control that addresses the limitations of previous solutions.
Features of the Sentry security module include:
• Secure authorization: enables administrators to control authenticated users' access to data and privileges on data.
• Fine-grained authorization: gives Hadoop administrators comprehensive, precise control over user access rights, down to subsets of data within a database.
• Role-based authorization: simplifies permissions management by allowing administrators to create and assign templatized privileges based on functional roles.
• Multi-tenant administration: empowers central administrators to deputize individual administrators to manage security settings for each separate database or schema.
Cloudera has worked closely with the open source community to expand Hadoop’s security capabilities, including the improved security features in a new HiveServer2 release, which delivers concurrency and Kerberos-based authentication for Hadoop.

Prerequisites
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
• CDH 4.3.0 or later.
• HiveServer2 with strong authentication (Kerberos or LDAP).
• A secure Hadoop cluster.
These requirements prevent users from bypassing authorization and gaining direct access to the underlying data.
In addition, make sure that the following are true:
• The Hive warehouse directory (/user/hive/warehouse, or any path you specify as hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user.
– Permissions on the warehouse directory must be set as follows:
– 777 on the directory itself (for example, /user/hive/warehouse)
– 750 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
For example:
$ sudo -u hdfs hdfs dfs -chmod 777 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chmod 750 /user/hive/warehouse/*
Important: These instructions override the recommendations in the Hive section of the CDH4 Installation Guide.
• If you used Cloudera Manager to set up HiveServer2, turn off HiveServer2 impersonation in Cloudera Manager.
Note: You should not need HiveServer2 impersonation because Sentry provides fine-grained access control. But if you still want to use HiveServer2 impersonation for some reason, you can do so by setting the following property manually in the Sentry Configuration File, sentry-site.xml:
<property>
  <name>sentry.allow.hive.impersonation</name>
  <value>true</value>
</property>
• The Hive user must be able to submit MapReduce jobs. You can ensure this by setting the minimum user ID for job submission to 0. Set this value in Cloudera Manager under MapReduce Properties, or (if you are not using Cloudera Manager) edit the taskcontroller.cfg file and set min.user.id=0, as sketched below.
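For a cluster managed without Cloudera Manager, the change is a single line; a minimal sketch (the file path is a typical location and an assumption, not taken from this guide):

# /etc/hadoop/conf/taskcontroller.cfg -- setting the minimum allowed UID to 0 lets system users such as hive submit jobs
min.user.id=0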

Roles and Privileges
Sentry uses a role-based privilege model. A role is a collection of rules for accessing a given Hive object. The
objects supported in the current release are server, database, table, and URI. Access to each object is governed
by privileges: Select, Insert, or All.
Note: All is not supported explicitly in the table scope; you have to specify Select and Insert
explicitly.
For example, a rule for the Select privilege on table customer from database sales would be formulated as
follows:
server=server1->db=sales->table=customer->action=Select
Each object must be specified as a hierarchy of the containing objects, from server to table, followed by the
privilege granted for that object. A role can contain multiple such rules, separated by commas. For example, a role might contain the Select privilege for the customer and items tables in the sales database, and the Insert privilege for the sales_insights table in the reports database. You would specify this as follows:
sales_reporting = \
    server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert
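A role's rules can also cover URI objects, which govern access to data locations (for example, paths read by LOAD DATA statements). A sketch of such a rule, with a hypothetical HDFS path:

server=server1->uri=hdfs://namenode:8020/landing/zone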

Users and Groups
• A user is an entity that is permitted by the authentication subsystem to access the Hive service. This entity
can be a Kerberos principal, an LDAP userid, or an artifact of some other pluggable authentication system
supported by HiveServer2.
• A group connects the authentication system with the authorization system. It is a collection of one or more
users who have been granted one or more authorization roles. Sentry allows a set of roles to be configured
for a group.
• A configured group provider determines a user’s affiliation with a group. The current release supports
HDFS-backed groups and locally configured groups. For example,
analyst = sales_reporting, data_export, audit_report
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members
of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group,
then all the users belonging to the HDFS group analyst can run such queries.
User to Group Mapping
You can configure Sentry to use either Hadoop groups or groups defined in the policy file.
Important: You can use either Hadoop groups or local groups, but not both at the same time.
To configure Hadoop groups:
Set the sentry.provider property in sentry-site.xml to
org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider.
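In sentry-site.xml, that property takes the standard Hadoop configuration form:
<property>
  <name>sentry.provider</name>
  <value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>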
OR
To configure local groups:
Define local groups in a [users] section of the Sentry Configuration File, sentry-site.xml. For
example:
[users]
user1 = group1, group2, group3

user2 = group2, group3
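Putting the pieces together, here is a minimal end-to-end policy sketch reusing the examples above (the single-file layout shown here is an illustrative assumption, not taken verbatim from the product documentation):

[users]
user1 = analyst

[groups]
analyst = sales_reporting

[roles]
sales_reporting = \
    server=server1->db=sales->table=customer->action=Select, \
    server=server1->db=sales->table=items->action=Select, \
    server=server1->db=reports->table=sales_insights->action=Insert
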
Installing Sentry
1. To download Sentry, go to the Sentry Version and Download Information page.
2. Install Sentry as follows, depending on your operating system:
• On Red Hat and similar systems:
$ sudo yum install sentry
• On SLES systems:
$ sudo zypper install sentry
• On Ubuntu and Debian systems:
$ sudo apt-get update; sudo apt-get install sentry

Friday, July 19, 2013

PATCH : HTTP Method RFC 5789


"In a PUT request, the enclosed entity is considered to be a modified version of the resource stored on the origin server, and the client is requesting that the stored version be replaced. With PATCH, however, the enclosed entity contains a set of instructions describing how a resource currently residing on the origin server should be modified to produce a new version."

The HTTP PATCH method can be used to update a JSON resource efficiently, sending only the changes instead of the whole representation.

JSON Patch

Say you have this JSON resource:

{
  "name": "abc123",
  "colour": "blue",
  "count": 4
}

and you want to update the "count" member's value to 5.
Now, you could just PUT the entire thing back with the updated value, but that requires a recent GET of its state, can get heavyweight (especially for mobile clients), and risks lost updates when several clients modify the resource concurrently.

For these and other reasons, many APIs define their own conventions for POSTing partial updates to resources. E.g.:

POST /widgets/abc123?action=incrementCount

With PATCH and the JSON Patch format (RFC 6902), the same update can be expressed in a standard way:

PATCH /widgets/abc123 HTTP/1.1
Host: api.example.com
Content-Length: ...
Content-Type: application/json-patch+json

[
  {"op": "replace", "path": "/count", "value": 5}
]
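
JSON Patch is not limited to replace: RFC 6902 also defines add, remove, move, copy and test operations, and one request can carry several of them, applied in order as a single atomic change. For example:

[
  {"op": "test", "path": "/colour", "value": "blue"},
  {"op": "replace", "path": "/count", "value": 5},
  {"op": "add", "path": "/tags", "value": ["sale"]}
]

Here the test op makes the patch conditional: if the colour is no longer "blue", the whole patch fails and nothing is applied.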

Easy to understand, and even write by hand. If it succeeds, the response can be as simple as:

HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close


Your patch succeeded. Yay! 
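To try this from the command line, a request against the hypothetical endpoint above could look like the following (assuming the server implements JSON Patch):

curl -X PATCH http://api.example.com/widgets/abc123 \
  -H 'Content-Type: application/json-patch+json' \
  -d '[{"op": "replace", "path": "/count", "value": 5}]'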

Friday, July 05, 2013

OPEN LDAP STEP BY STEP INSTALLATION ON LINUX

PREREQUISITES:
Download Berkeley DB (db-4.8.30.NC.tar.gz) from the Oracle Berkeley DB download page.
$ sudo su
root$ mkdir /usr/local/BerkeleyDB.4.8
root$ cd /usr/local/BerkeleyDB.4.8/
root$ chown -R rjuluri:dba /usr/local/BerkeleyDB.4.8/
root$ tar xvf db-4.8.30.NC.tar.gz
root$ cd db-4.8.30.NC
root$ cd build_unix

INSTALL BERKELEY DB:


$ ../dist/configure
$ make
$ make install
By default, make install places the libraries and headers under /usr/local/BerkeleyDB.4.8, which matches the paths used in the OpenLDAP configure step below.
Exit the root shell.

INSTALLATION OF BERKELEY DB IS COMPLETE; NOW INSTALL OPENLDAP

Get the software
You can obtain a copy of the software by following the instructions on the OpenLDAP download page (http://www.openldap.org/software/download/). It is recommended that new users start with the latest release.

tar xvf openldap*.gz

cd /scratch/rjuluri/openldap-2.4.35/

CPPFLAGS="-I/usr/local/include -I/usr/local/BerkeleyDB.4.8/include" \
LDFLAGS="-L/usr/local/lib -L/usr/local/BerkeleyDB.4.8/lib -R/usr/local/lib -R/usr/local/BerkeleyDB.4.8/lib -R/usr/local/ssl/lib" \
LD_LIBRARY_PATH="/usr/local/BerkeleyDB.4.8/lib" \
./configure --prefix=/scratch/rjuluri/openldap-2.4.35
make depend
make
make test
sudo su (root)
make install
Add these lines to /scratch/rjuluri/openldap-2.4.35/etc/openldap/slapd.conf:
include         /scratch/rjuluri/openldap-2.4.35/etc/openldap/schema/cosine.schema
include         /scratch/rjuluri/openldap-2.4.35/etc/openldap/schema/inetorgperson.schema
include         /scratch/rjuluri/openldap-2.4.35/etc/openldap/schema/nis.schema

Edit the configuration file.
Use your favorite editor to edit the provided slapd.conf(5) example (usually installed as /usr/local/etc/openldap/slapd.conf; with the prefix used above, /scratch/rjuluri/openldap-2.4.35/etc/openldap/slapd.conf) to contain a BDB database definition of the form:
database bdb
suffix "dc=<MY-DOMAIN>,dc=<COM>"
rootdn "cn=Manager,dc=<MY-DOMAIN>,dc=<COM>"
rootpw secret
directory /usr/local/var/openldap-data

Be sure to replace <MY-DOMAIN> and <COM> with the appropriate domain components of your domain name. For example, for example.com, use:
database bdb
suffix "dc=example,dc=com"
rootdn "cn=Manager,dc=example,dc=com"
rootpw secret
directory /usr/local/var/openldap-data


START OPENLDAP:
You are now ready to start the stand-alone LDAP server, slapd(8), by running the command:
su root -c /scratch/rjuluri/openldap-2.4.35/libexec/slapd

To check that the server is running and configured correctly, you can run a search against it with ldapsearch(1). By default, ldapsearch is installed as /scratch/rjuluri/openldap-2.4.35/bin/ldapsearch:
ldapsearch -x -b '' -s base '(objectclass=*)' namingContexts

Note the use of single quotes around command parameters to prevent special characters from being interpreted by the shell. This should return:
dn:
namingContexts: dc=example,dc=com

vi example.ldif

## DEFINE DIT ROOT/BASE/SUFFIX ####
## uses RFC 2377 format
## replace example and com as necessary below
## or for experimentation leave as is

## dcObject is an AUXILIARY objectclass and MUST
## have a STRUCTURAL objectclass (organization in this case)
# this is an ENTRY sequence and is preceded by a BLANK line

dn: dc=example,dc=com
dc: example
description: My wonderful company as much text as you want to place
 in this line up to 32K continuation data for the line above must
 have <CR> or <CR><LF> i.e. ENTER works
 on both Windows and *nix system - new line MUST begin with ONE SPACE
objectClass: dcObject
objectClass: organization
o: Example, Inc.

## FIRST Level hierarchy - people
## uses mixed upper and lower case for objectclass
# this is an ENTRY sequence and is preceded by a BLANK line

dn: ou=people, dc=example,dc=com
ou: people
description: All people in organisation
objectclass: organizationalunit

## SECOND Level hierarchy
## ADD a single entry under FIRST (people) level
# this is an ENTRY sequence and is preceded by a BLANK line
# the ou: Human Resources is the department name

dn: cn=Robert Smith,ou=people,dc=example,dc=com
objectclass: inetOrgPerson
cn: Robert Smith
cn: Robert J Smith
cn: bob smith
sn: smith
uid: rjsmith
userpassword: rJsmitH
carlicense: HISCAR 123
homephone: 555-111-2222
mail: r.smith@example.com
mail: rsmith@example.com
mail: bob.smith@example.com
description: swell guy
ou: Human Resources

#######################################################################

./ldapadd -x -D "cn=Manager,dc=example,dc=com" -W -f example.ldif
./ldapsearch -x -b '' -s base '(objectclass=*)' namingContexts
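
ldapadd's -W flag prompts for the rootdn password (the rootpw value, secret, from slapd.conf above). To confirm the entries were added, search the new subtree directly:

./ldapsearch -x -b 'dc=example,dc=com' '(objectclass=*)'

This should return the three entries defined in example.ldif, starting with the dc=example,dc=com base entry.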
