Optimized processing of your ‘not-so-large’ Large Objects (LOBs) with Value LOBs

Text Size 100%:

What are large objects (LOBs), and why store them in a database?

Looking at the SQL standard, you will find that “.. A large object [character binary ] string is of variable length, up to some implementation-defined maximum that is probably greater than that of other [character|binary] strings. ..”. Short, it’s data that is too large to fit into common data types like VARCHAR2 or RAW that ends up in LOBs. Think of images, PDF documents, movies, or large JSON or XML files. 

So why should I store such large data in a database, not a file system or an object store? There are multiple reasons for this.

  • Metadata. Have you ever tried to find a specific image on your computer just by looking at the file names? There is a high likelihood you haven’t. Large objects always have metadata, such as an employee photo associated with all other employee data. It’s much easier to find someone’s picture (or display it next to the employee’s phone number) than to decipher the employee’s information from a picture name.
  • Security. A lot of the information in a large object can be sensitive. Just like with regular columns, only some people are allowed to see everything, and you want to control the access to this information with the same fine-grained controls (access, DML, encryption, auditing) you have for your other data. Just think about medical records, financial documents, or resumes.
  • Consistency. Storing such information inside a database provides consistency (e.g., you don’t want to delete a photo of an existing employee) and prevents stale data. Also, operational aspects like backup, recovery, or replication are greatly simplified when you store large objects with their metadata.

Working with LOBs

LOBs stored in the database will eventually be read (and potentially updated) by an application. But LOBs aren’t always equal. How large is large? With Oracle, when you have more than 32k bytes, you can no longer store the data in a VARCHAR2. But is 32k plus one byte large? Compare this with the size of a 3-hour-long 4k movie.

Also, there are different ways an application consumes such information; if you are watching your 3-hour movie that streams from a database, you want your application to read this movie in chunks, be able to stop the streaming process by hitting the pause button and resume the video where you stopped it. Or you want to watch the funny scene over and over again. 

You want to display an employee’s thumbnail picture next to the employee’s information. In that case, however, you’ll read the image once as a whole and treat it similarly to the rest of the information displayed.  

Yes, it’s really common

Have you ever visited stackoverflow.com and looked closely at the information presented to you? Without knowing the implementation behind this page, let’s use it as an example for medium-sized objects; let’s call those MOBs.

  • Do you see the profile pictures of the accounts that answered a question? If stored in the database, these little icons could be MOBs and tied to an account’s profile information.
  • Do you see long answers abbreviated and ending with ‘…’? The answers might be MOBs shortened with an SQL SUBSTR() operator for display purposes.

That’s just another example. Look around, and you will see many more of these MOBs.

While Oracle’s LOB interfaces are versatile and support all of these use cases, this blog emphasizes a new and exciting optimization in Oracle Database 23ai specifically for the use case for MOBs, the ‘not-so-large’ large objects: Value LOBs.

What are Value LOBs?

Value LOBs are a new kind of LOB specifically built for the read-and-forget use case, like in our employee picture example before. You store medium-sized objects, about a few megabytes maximum in size, and you just want to read the LOB value in the context of an SQL query. The LOBs are used as large VARCHAR or RAW data types because the idea is to produce a result, consume it, and drop it. In that context, one could say that the LOB is disposable because it is no longer needed after consuming it. 

Value LOBs offer three main advantages over the classical handling of LOBs for extremely large objects through reference pointers (the classical LOB architecture with a LOB Locator and LOB Value is described in the doc).

  1. Better performance. The performance of Value LOBs is measurably better than classical reference LOBs and close to the performance of classical datatypes.
  2. Optimized memory management. A Value LOB’s memory is managed transparently as part of the requesting SQL query.
  3. Easy to use and implement. A Value LOB is defined as part of the table metadata.

Value LOBs have the following properties:

  • Value LOBs are read-only. Write operations are not allowed for value LOBs. Remember the read-and-forget model?
  • Value LOBs only exist in the context of a SQL query for fetch duration. The LOB data read in the previous fetch is “freed” and overwritten, and as soon as the next fetch is made, a cursor is closed or re-executed. You don’t have to worry about memory management or free objects, so you will never be able to make the mistake of memory leaking, not freeing the memory on the database side allocated to a reference LOB. With Value LOBs, you will never keep memory allocated for something you don’t use anymore. Oracle automatically takes care of it.

Using Value LOBs

Let’s return to our fictive stackoverflow example and put value LOBs into action.

If you start with Oracle Database 23ai and want to take advantage of Value LOBs for your profile management at table creation time. It’s as simple as 1-2-3; define the picture column as Value LOB:

create table users (
user_id identity,
username varchar2(64), ...
user_picture clob)
lob (user_picture) query as value;

Is your application migrated to Oracle Database 23ai? No problem either; let’s modify the table and its LOB type. Not that it does not change any data structure on disk. It only changes how you deal with LOBs in your application.

alter table users modify lob(user_picture) query as value;

You could also do the reverse, e.g.

alter table users modify lob(user_picture) query as reference;

To check the actual status of your LOB column, you just check the data dictionary information. You can do this on the USER, ALL, or DBA level:

select table_name, column_name, value_based from user_lobs where table_name = 'USERS';

Your user comments are stored as Value LOBs in table answers, so you use an SQL operator with your Value LOB to shorten the answer snippet for the web display. An SQL operator produces the matching type of LOB as output based on the column definition. 

select substr(answer,1, 30)||’ …’ from answers;

Your column definition is a reference-based LOB. If you cannot change this, for whatever reason, just use the new SQL operator LOB_VALUE() to benefit from value LOBs at runtime:

select LOB_VALUE(substr(answer,1, 30))||' ...' from answers;

If your data is actually of type JSON, then you can use the enhanced JSON_SERIALIZE(). As of today, that’s the only operator with explicitly incorporated syntax for the new value LOB functionality.

select json_serialize(answer returning clob value) from answers;

With JSON being one of the most prominent MOBs we are seeing nowadays in the Oracle database, it was important for us to get this done on Day One of Oracle 23ai. That’s just the beginning. 

How to work with Value LOBs in programming languages

The documentation illustrates how to use the new value LOBs as compared to working with reference LOBs, using JAVA as an example. If you’re curious  explicitly specify the return type in the SQL statements to make the code snippets clear and to show what is what. You don’t need those unless you want to change the return type; the return type is automatically controlled by the underlying data type.

The explicit change between reference LOB and value LOB usage – c.free(), the call to free the memory is often forgotten. With classical reference LOBs, you need to explicitly free the memory. It does not look like a big difference, but it can be, especially when you use reference LOBs and forget it.Value LOBs, as read-and-forget LOBs, just do this for you. The cursor is closed, and you’re done.

Some performance data

This all might sound a bit abstract to you, so let’s quickly look at some performance numbers. There’s obviously always some variability, and it’s highly application and context-dependent. But it’s hopefully enough to give you an idea of the benefits that Value LOBs bring to the table.

The following tests were all using a single-column table with 32k JSON documents of different sizes, from 4k to 128k. The documents were fetched and ‘consumed’ in our application to mimic a real application. The ‘consumption’ in the application was lightweight and just to represent the data such as a JSON_SERIALIZE(). We wanted to measure the fetch-and-forget phase and spend as little time as possible working with the data. 

Comparison binary LOBS – Value versus Reference LOBS

performance graph value lob versus reference lob

Value LOBS outperform reference LOBs significantly for smaller values, mostly due to the measurable overhead of multiple roundtrips. The larger the LOBs become, the smaller the performance gap. The performance numbers for character LOBs are unilaterally slightly worse, due to the necessary character conversion between LOBs and 

Last but not least, one more word about prefetch:

The LOB prefetch parameter reduces the number of server round trips by prefetching part of the data and metadata along with the LOB locator during the fetch. Setting a large LOB prefetch size on the client side avoids round trips to the server and leads to a drastically improved performance for LOBs. For value LOBs, Oracle recommends setting the LOB prefetch size to accommodate at least 80% of your LOB data size. 

An easy way to send all LOB data along with the locator is to use the “oracle.jdbc.sendAllDataForValueLobs” connection property in JDBC or OCI_ATTR_VBL_SEND_ALL_LOB_DATA in OCI. When this property is set to true, all the LOB data will always be sent no matter the LOB prefetch size set.

Conclusion

The use cases for MOBs are manifold and everywhere. Value LOBs are closing an important gap in optimally working with these kinds of objects in any application without hassle or complexity, using SQL like for anything else. With the SQL-style fetch-and-forget and its optimized roundtrips for fetching, it’s worth investing in it. Your application performance will benefit, and your code will become (slightly) simpler and significantly more robust.

Use them to your advantage.

Let us know what you think. We are always looking for comments and feedback.

Hermann Baer

Senior Director Product Management

Hermann is a Senior Director of Product Management in the Oracle Database organization. He and his team focus on Oracle’s core functionality, such as Oracle Partitioning, the Oracle Optimizer, and analytical SQL. His team also looks after semi-structured data processing, such as the relational SQL/JSON capabilities, Oracle Text, and more recently Autonomous JSON Database and the Oracle Database API for Mongo DB.

Hermann has held several positions at Oracle in Sales Consulting and Consulting, dealing with the database for quite some time