
Storing Large Data – Yay! Streaming !!!

The information below is written for Postgres 8.3, Spring 2.0 and Hibernate 3.0

Dealing with large data in a database has always been an issue. There are various approaches to storing and fetching large data. The simplest one is storing the data as bytea and loading it whenever it is needed. The problem with this approach is that, when the data is large, loading it into a byte array each time it is needed will cause an OutOfMemoryError sooner or later.
As far as I know there are two ways of avoiding this problem, but both have their own limitations.

The first one is storing large data as bytea in the database and fetching it via a stream. The downside of this method is that a bytea column can hold at most 1 GB of binary data.

The first thing to do is to find a way to read large data via an input stream. With plain JDBC this is easily done, but with Hibernate it is a bit tricky.
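
For comparison, here is roughly what the plain JDBC version looks like. This is only a sketch; JdbcStreamingExample is an illustrative name, and the table and column names just mirror the entity mapping that follows:

package streaming;

import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcStreamingExample {

	// Streams the bytea column into the given OutputStream without ever
	// holding the whole content in memory.
	public static void copyData(Connection conn, String id, OutputStream out) throws Exception {
		PreparedStatement ps = conn.prepareStatement("SELECT data FROM foo WHERE id = ?");
		try {
			ps.setString(1, id);
			ResultSet rs = ps.executeQuery();
			if (rs.next()) {
				InputStream in = rs.getBinaryStream("data");
				byte[] buf = new byte[2048];
				int n;
				while ((n = in.read(buf)) > 0) {
					out.write(buf, 0, n); // consume the stream chunk by chunk
				}
				in.close();
			}
			rs.close();
		} finally {
			ps.close();
		}
	}
}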

Let’s start by defining the entity class and its mapping file.

package streaming.entity;

import streaming.BlobStream;

public class Foo {
	private String id;
	private BlobStream data;
	// getters and setters
}
package streaming.type;

import java.io.InputStream;

public class BlobStream {
	private InputStream stream;
	// length of the data to be read (needed when writing)
	private int length;

	public BlobStream(InputStream stream) { this.stream = stream; }
	public BlobStream(InputStream stream, int length) {
		this.stream = stream;
		this.length = length;
	}
	public InputStream getStream() { return stream; }
	public int getLength() { return length; }
}

Here BlobStream is neither an entity nor a Hibernate type class. It is just a simple class created to hold the InputStream and the length together.

The mapping file for the Foo entity would look like this:

<hibernate-mapping package="streaming.entity">
  <class name="Foo">
     <id column="id" length="36" name="id">
      <generator class="uuid"/>
    </id>
    <property name="data"  type="CustomBlobType">
         <column name="data" sql-type="bytea"></column>
    </property>
  </class>
</hibernate-mapping>

As you may notice, the Hibernate type of the data field is defined as “CustomBlobType”, whereas its sql-type is defined as bytea.
The reason for this is that we want the data column to be created as bytea, but we also want the CustomBlobType class to be responsible for reading from and writing to this column.

FYI: if this sql-type were not defined, Hibernate would create the column as an oid, which is basically a long value referencing the actual Large Object. But Spring’s DefaultLobHandler does not support handling Large Objects in Postgres 8.3: it calls ps.setBinaryStream(paramIndex, binaryStream, contentLength); and the bytea value ends up being inserted into a long-typed column. So this is a trick which forces Hibernate to create the InputStream-typed field as bytea in the database.
Actually, according to the Postgres documentation, in releases prior to 7.2 the setBinaryStream and getBinaryStream methods operated on the oid data type associated with Large Objects, but with the 7.2 release this changed to bytea. This is probably the reason behind the failure of the DefaultLobHandler class when handling Large Objects in Postgres 8.3.

Once this is done, the CustomBlobType class has to be created; it should look like this:

package streaming.type;

import java.io.InputStream;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;

import streaming.type.BlobStream;

import org.springframework.jdbc.support.lob.LobCreator;
import org.springframework.jdbc.support.lob.LobHandler;
import org.springframework.orm.hibernate3.support.AbstractLobType;

public class CustomBlobType extends AbstractLobType {

	//called when the column is to be retrieved from db
	protected Object nullSafeGetInternal(ResultSet rs, String[] names, Object owner, LobHandler lobHandler)
					throws SQLException {

		InputStream is = lobHandler.getBlobAsBinaryStream(rs, names[0]);

		return new BlobStream(is);
	}

	// called when the data is to be stored in the db
	protected void nullSafeSetInternal(PreparedStatement ps, int index, Object value, LobCreator lobCreator)
					throws SQLException {
		if (value != null) {
			BlobStream blob = (BlobStream) value;
			lobCreator.setBlobAsBinaryStream(ps, index, blob.getStream(), blob.getLength());

		} else {
			lobCreator.setBlobAsBytes(ps, index, null);
		}
	}

	/**
	 * Defines what type of object this class returns or takes as a parameter
	 * when the above methods are called.
	 *
	 * @return Class
	 */
	public Class returnedClass() {
		return BlobStream.class;
	}

	/**
	 * Return the SQL type codes for the columns mapped by this type. The codes
	 * are defined on <tt>java.sql.Types</tt>.
	 *
	 * @see java.sql.Types
	 * @return int[] the typecodes
	 */
	public int[] sqlTypes() {
		return new int[] { Types.BLOB };
	}
}

So each time our Foo object is persisted, Hibernate calls the “nullSafeSetInternal” method for the “data” field. Within this method, “lobCreator” writes the InputStream as a binary stream onto the statement object.
If “DefaultLobHandler” is used, then what lobCreator does here basically boils down to this:

	ps.setBinaryStream(paramIndex, binaryStream, contentLength);

DefaultLobHandler is a handler that provides support for handling large objects across different database types. It has to be set on the session factory as the LOB handler, which can be done simply like this:

 <bean id="defaultLobHandler" class="org.springframework.jdbc.support.lob.DefaultLobHandler" />
 <bean id="mySessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean" >
		<property name="lobHandler" ref="deafultLobHandler"/>
		<property name="dataSource" ref="dataSource" />
		<property name="mappingLocations" ref="mappingLocations"/>
		<property name="hibernateProperties">
		...
		</property>
</bean>

One last thing is defining “CustomBlobType” as a Hibernate type, which can be done simply by adding a typedef element at the top of the mapping file:

<typedef name="CustomBlobType" class="streaming.type.CustomBlobType"></typedef>

The second way of storing large data is using Postgres Large Objects. The good thing is that the 1 GB bytea limit no longer applies, because what you store in your table is just a reference (an oid) to the actual data. However, when the data needs to be deleted, deleting the row is not enough; the referenced Large Object has to be removed explicitly, as the sketch below shows. On top of that, implementing Large Object handling for Postgres in Spring requires a new handler, and defining one breaks the database independence of the code, since you end up defining specific LobHandlers for specific database types.
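
For illustration, the extra cleanup could look roughly like this. It is only a sketch; it assumes the unwrapped PGConnection is obtained the same way as in the handler below, and that the oid from the deleted row is at hand (on the SQL side, lo_unlink does the same job):

package streaming;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectCleanup {

	// Deleting the row only removes the oid reference; the Large Object
	// itself has to be unlinked explicitly, inside a transaction block.
	public static void deleteLargeObject(PGConnection pgConn, long oid) throws Exception {
		LargeObjectManager lobj = pgConn.getLargeObjectAPI();
		lobj.unlink(oid); // frees the actual Large Object data
	}
}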

Considering all these issues, if you are still determined to use Large Objects, you can define a Postgres LOB handler like this:

package streaming.handler;

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.commons.dbcp.PoolableConnection;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;
import org.springframework.jdbc.support.lob.AbstractLobHandler;
import org.springframework.jdbc.support.lob.LobCreator;
import org.springframework.jdbc.support.nativejdbc.NativeJdbcExtractor;

public class PostgresLobHandler extends AbstractLobHandler {

	protected final Log logger = LogFactory.getLog(getClass());

	private NativeJdbcExtractor nativeJdbcExtractor;

	/**
	 * @param nativeJdbcExtractor
	 *            the nativeJdbcExtractor to set
	 */
	public void setNativeJdbcExtractor(NativeJdbcExtractor nativeJdbcExtractor) {
		this.nativeJdbcExtractor = nativeJdbcExtractor;
	}

	public InputStream getBlobAsBinaryStream(ResultSet rs, int columnIndex) throws SQLException {

		InputStream is = null;
		try {
			PGConnection pgConn = getPostgresConnection(rs.getStatement());
			if (pgConn != null) {
				// Get the Large Object Manager to perform operations with
				LargeObjectManager lobj = pgConn.getLargeObjectAPI();

				long oid = rs.getLong(columnIndex);
				LargeObject obj = lobj.open(oid, LargeObjectManager.READ);
				is = obj.getInputStream();
			}

		} catch (ClassNotFoundException e) {
			this.logger.error("Error", e);
		}
		return is;
	}
	//other methods

	/**
	 * Retrieve the underlying PGConnection, using a NativeJdbcExtractor if set.
	 */
	protected PGConnection getPostgresConnection(Statement st) throws SQLException, ClassNotFoundException {

		PGConnection pgConn = null;
		Connection conn = (this.nativeJdbcExtractor != null) ? this.nativeJdbcExtractor
						.getNativeConnectionFromStatement(st) : st.getConnection();

		if (conn instanceof PoolableConnection) {
			PoolableConnection poolConn = (PoolableConnection) conn;
			Connection innermostDelegate = poolConn.getInnermostDelegate();
			if (innermostDelegate instanceof PGConnection) {
				pgConn = (PGConnection) innermostDelegate;

			}
		}
		return pgConn;
	}

	protected class PostgresLobCreator implements LobCreator {

		public void setBlobAsBytes(PreparedStatement ps, int paramIndex, byte[] content) throws SQLException {

			ps.setBytes(paramIndex, content);
			if (PostgresLobHandler.this.logger.isDebugEnabled()) {
				PostgresLobHandler.this.logger.debug(content != null ? "Set bytes for BLOB with length "
								+ content.length : "Set BLOB to null");
			}
		}

		public void setBlobAsBinaryStream(PreparedStatement ps, int paramIndex, InputStream binaryStream,
						int contentLength) throws SQLException {

			try {
				// All LargeObject API calls must be within a transaction block
				PGConnection pgConn = getPostgresConnection(ps);
				if (pgConn != null) {

					// Get the Large Object Manager to perform operations with
					LargeObjectManager lobj = pgConn.getLargeObjectAPI();

					// Create a new large object
					long oid = lobj.createLO(LargeObjectManager.READ | LargeObjectManager.WRITE);

					// Open the large object for writing
					LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);

					// Copy the data from the input stream to the large object
					byte buf[] = new byte[2048];
					int s, tl = 0;
					while ((s = binaryStream.read(buf, 0, 2048)) > 0) {
						obj.write(buf, 0, s);
						tl += s;
					}

					// Close the large object
					obj.close();
					ps.setLong(paramIndex, oid);
				}
			} catch (ClassNotFoundException e) {
				PostgresLobHandler.this.logger.error("Error", e);
			} catch (IOException ex) {
				PostgresLobHandler.this.logger.error("Error", ex);
			}

		}

		// other methods

		public void close() {
			PostgresLobHandler.this.logger.debug("Closing DefaultLobCreator");
			// nothing to do here
		}
	}
}

To enable PostgresLobHandler to handle Large Objects, set it as the LOB handler on the session factory:

 <bean id="postgresLobHandler" class="streaming.handler.PostgresLobHandler" />
 <bean id="mySessionFactory" class="org.springframework.orm.hibernate3.LocalSessionFactoryBean" >
		<property name="lobHandler" ref="postgresLobHandler"/>
		<property name="dataSource" ref="dataSource" />
		<property name="mappingLocations" ref="mappingLocations"/>
		<property name="hibernateProperties">
		...
		</property>
</bean>

Now we don’t have to force Hibernate to create the column as bytea, so the mapping file becomes:

<hibernate-mapping package="streaming.entity">
  <class name="Foo">
     <id column="id" length="36" name="id">
      <generator class="uuid"/>
    </id>
    <property name="data"  type="CustomBlobType"/>
  </class>
</hibernate-mapping>

References:
http://turgaykivrak.wordpress.com/tag/spring/
http://www.postgresql.org/docs/7.2/static/jdbc-binary-data.html


Property Lazy Fetching

Property lazy fetching can be really useful, especially when you have a field holding binary data and you don’t want that property fetched every time the object is loaded. Let’s say you have an entity class Foo as below.

package entities;
public class Foo {
	private String id;
	private String filename;
	private byte[] filedata;
	// getters and setters
}

and the mapping file is as follows:

<hibernate-mapping package="entities">
        <class name="Foo">
		<id name="id" column="id" length="36">
			<generator class="uuid" />
		</id>
		<property name="filename" />
		<property name="filedata" lazy="true"/>
	</class>
</hibernate-mapping>

Here, as you may have noticed, the filedata property is marked as lazy="true". However, this attribute is ignored by Hibernate unless the class is bytecode-instrumented, so to enable lazy fetching for properties the classes must be instrumented. Luckily there is an Ant task for this, and if you use Maven for the build process you can invoke it via the maven-antrun-plugin.

<plugin>
	<artifactId>maven-antrun-plugin</artifactId>
	<executions>
		<execution>
			<id>instrument</id>
			<phase>compile</phase>
			<configuration>
				<tasks>
					<echo>Instrumenting classes</echo>
					<taskdef name="instrument" classname="org.hibernate.tool.instrument.cglib.InstrumentTask">
						<classpath>
							<path refid="maven.dependency.classpath" />
							<path refid="maven.plugin.classpath" />
						</classpath>
					</taskdef>
					<instrument verbose="true">
						<fileset dir="${project.build.outputDirectory}">
							<include name="**/entities/Foo.class" />
						</fileset>
					</instrument>
				</tasks>
			</configuration>
			<goals>
				<goal>run</goal>
			</goals>
		</execution>
	</executions>
	<dependencies>
		<dependency>
			<groupId>org.hibernate</groupId>
			<artifactId>hibernate</artifactId>
			<version>3.2.1.ga</version>
		</dependency>
	</dependencies>
</plugin>
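
Once the classes are instrumented, you can verify that the property really is fetched lazily. A minimal sketch, assuming a wired SessionFactory and the usual getters on Foo (LazyFetchCheck is just an illustrative name):

package entities;

import org.hibernate.Hibernate;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class LazyFetchCheck {

	// Loads a Foo and shows that filedata is only selected on first access.
	public static void check(SessionFactory sessionFactory, String id) {
		Session session = sessionFactory.openSession();
		Foo foo = (Foo) session.get(Foo.class, id);

		// only id and filename have been fetched at this point
		System.out.println(Hibernate.isPropertyInitialized(foo, "filedata")); // false

		foo.getFiledata(); // triggers a second select for the lazy property
		System.out.println(Hibernate.isPropertyInitialized(foo, "filedata")); // true

		session.close();
	}
}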
