Leveraging On-Site Citation and Family Functionality
- 1 Introduction
- 2 Preparation
- 2.1 Sequential Loading
- 2.2 Bulk Loading
- 3 Usage
- 3.1 Citations
- 3.2 Family
- 3.2.1 Family Meta
- 3.2.2 Family Citations
Introduction
With the deployment of the cumulative patch alexandria-sql-patch-alpa-3636-20191101, on-site CLAIMS Direct installations now have the ability to utilize family and citation functionality in-house. What was previously only available using the remote CLAIMS Direct shared API is now possible internally with simple SQL functions. The following post will outline the steps required to prepare the data tables as well as presenting a brief walk-through of the functionality.
Preparation
Note: If your initial load occurred after May 13, 2020, these tables were included in the initial load and you do not need to follow the preparation instructions here. However, to maintain the supplemental tables required for on-site family and citation functionality, you must add --api, e.g., CD_OTHER_OPTIONS='--api' to the apgupd configuration file.
Sequential Loading
In order to leverage the new functionality, the necessary lookup tables need to be populated on-site. The functions available to achieve this are:
cdws.f_update_cited_documents– serving forward and backward citationscdws.f_update_priority_documents– serving simple and extended family
Each function takes as input parameter a modified-load-id. Therefore, to populate each table, each modified-load-id from the xml.t_patent_document_values table needs to be processed. The following pseudo-SQL will serve as an example.
DECLARE v_load_id integer;
BEGIN
FOR v_load_id IN
SELECT modified_load_id
FROM xml.t_patent_document_values
GROUP BY modified_load_id
ORDER by modified_load_id LOOP
PERFORM cdws.f_update_cited_documents( v_load_id );
PERFORM cdws.f_update_priority_documents( v_load_id );
END LOOP;
END;Although the above SQL exemplifies the fundamental logic involved, it isn't the most efficient method. To that end, we are offering a supplemental patch that installs bulk loading functions to populate these tables: alexandria-sql-patch-alpa-3636-x-20191215. In addition to adding bulk loading functions, this patch expands some of the citation and family functionality not included in the original alexandria-sql-patch-alpa-3636-20191101 patch, namely:
FUNCTION
cdws.f_family_citations_backward(v_family_id integer)
Returns backward citations for an entire familyFUNCTION
cdws.f_family_citations_forward(v_family_id integer)
Returns forward citations for an entire familyFUNCTION
cdws.f_family_meta(integer)
Returns bibliographic data for an entire family
For a full enhancement listing, please see the README included in the package.
Bulk Loading
After the supplemental patch is installed, 2 new functions for bulk loading are exposed.
FUNCTION
cdws.f_update_cited_documents_bf()FUNCTION
cdws.f_update_priority_documents_bf()
These new functions bulk-load the auxiliary tables and are called simply as:
SELECT cdws.f_update_cited_documents_bf();
SELECT cdws.f_update_priority_documents_bf();Each function requires at least 24 hours to complete and may take up to 7 days, depending on the environment. They can be run in parallel if desired. Because of the duration required to complete the functions, it's important that the SQL runs uninterrupted by server restarts or terminal disconnects.
Important
In order to expedite the completion of these functions, it is required that apgupd be paused. Optionally upon completion, executing an ANALYZE on the auxiliary tables is recommended:
ANALYZE VERBOSE cdws.t_cited_documents;
ANALYZE VERBOSE cdws.t_applications;
ANALYZE VERBOSE cdws.t_priority_documents;Critical
In order to automatically populate these auxiliary tables every new load-id, you must use the --api flag to apgupd. If this parameter isn't available with your version of Alexandria-Client-Tools, you will need to update to the latest version. In versions 2.5.3 and above, this parameter is set in /etc/alexandria/apgupd.conf, as a CD_OTHER_OPTIONS. Please see Client Tools documentation for configuration details.
To confirm that the citation and family functionality have completed loading, locate the log files (citation.log and family.log). The last line of each log should show an entry in the following format:
INFO: added N new citations records from the work table (time)
INFO: added N new priority records from the work table (time)To check whether the calculations are still running, use one of the two SQL queries below while in the database or using pgAdmin.
SELECT datname, query FROM pg_stat_activity;
SELECT * FROM pg_stat_activity;Usage
Once the auxiliary table loading is complete, a wide variety of citation and family functionality is available directly from the on-site CLAIMS Direct instance.
Citations
Bringing the CLAIMS Direct API methods citations/forward and citations/backward on-site is achieved using the new functions:
FUNCTION
cdws.f_citations_backwardFUNCTION
cdws.f_citations_forward
Each function takes as input a publication_id and returns 2 columns: ucid and source.
SELECT * FROM cdws.f_citations_backward( xml.f_ucid2id('US-5551212-A') );
ucid | source
---------------+--------
US-4905451-A | SEA
US-2043419-A | SEA
US-4631900-A | SEA
DE-8002686-U1 | SEA
EP-0225665-A1 | SEA
US-3187480-A | SEA
US-4828110-A | SEA
DE-3138439-A1 | SEA
EP-0061805-A1 | SEA
DE-3606826-A1 | SEA
US-4546875-A | SEA
US-4534151-A | SEA
EP-0313721-A2 | SEAOf course, GROUP BY and ORDER BY are available, e.g., using substring, group forward citations by country:
SELECT substring(ucid, 1, 2) AS country, count(*) AS ncites
FROM cdws.f_citations_forward( xml.f_ucid2id('US-5000000-A') )
GROUP BY country
ORDER BY ncites DESC;
country | ncites
---------+--------
US | 264
WO | 31
EP | 27
AU | 1
KR | 1
Family
Bringing CLAIMS Direct API methods family/simple and family/extended functionality to the on-site install is accomplished with the functions:
FUNCTION
cdws.f_family_simpleFUNCTION
cdws.f_family_extended
Each of these functions, as with the citation functions above, take a publication_id as input. The return columns are ucid and family-id. The extended function returns additional publication_id and level columns.
SELECT * FROM cdws.f_family_simple( xml.f_ucid2id('US-5000000-A') );
ucid | family_id
------------------+-----------
AR-242634-A1 | 26932266
CA-1335430-C | 26932266
EP-0431047-A1 | 26932266
HU-T60328-A | 26932266
JP-H05502366-A | 26932266
KR-900702042-A | 26932266
WO-1990002193-A1 | 26932266
SELECT * FROM cdws.f_family_extended( xml.f_ucid2id('US-5551212-A') );
publication_id | ucid | level
----------------+----------------+-------
71329085 | US-5551212-A | 0
302369 | AT-96388-T | 1
1220699 | AU-636238-B2 | 1
2293384 | AU-8174591-A | 1
5137694 | CA-2050271-A1 | 1
5137693 | CA-2050271-C | 1
14883151 | DE-4027762-C1 | 1
17156127 | DE-59100535-D1 | 1
93003830 | DK-0477487-T3 | 1
21052692 | EP-0477487-A1 | 1
21052691 | EP-0477487-B1 | 1
25818932 | ES-2046827-T3 | 1
128970015 | JP-H054613-A | 1
131336800 | JP-H0825524-B2 | 1
59117993 | NO-913030-D0 | 1
177975213 | NO-913030-L | 1
62462393 | TR-25401-A | 1
159577685 | ZA-9106183-B | 1
A second optional argument in the cdws.f_family_extended function, v_include_apps Boolean, can be used to include application filing data from the patent office. The default setting of this argument is false in order to maintain consistency with the EPO. In some cases, however, application filing data is not included in the priority claims data. Setting this argument to true will include the application filing data, as in the following example.
SELECT ucid
FROM cdws.f_family_extended(xml.f_ucid2id('AU-2012201497-B2'), true ) as f
WHERE f.ucid = 'WO-2008136115-A1';
ucid
------------------
WO-2008136115-A1
(1 row)Additional Functions
In addition to the above, there are other functions in the supplemental patch worth exploring, namely:
FUNCTION
cdws.f_family_meta( v_family_id integer )
Returns a variety of bibliographic and legal status informationFUNCTION
cdws.f_family_citations_backward(v_family_id integer)
Returns backward citations for an entire familyFUNCTION
cdws.f_family_citations_forward(v_family_id integer)
Returns forward citations for an entire family
Family Meta
The cdws.f_family_meta function brings a variety of information together for a simple family. The following columns are returned:
family-id - DOCDB-assigned simple family integer identifier
ucid – unique character identifier (publication)
published – publication date
anucid – unique character identifier (application)
filed – application filing date
title – title of document
ifi_status – IFI Snapshot document status
ifi_anticipated_expiration - IFI anticipated expiration date
ifi_adjusted_expiration - IFI adjusted expiration date
epo_legal_status_xml – XML representation of EPO legal status
An abbreviated example:
SELECT ucid, published, anucid, filed, title FROM cdws.f_family_meta( 26932266 );
ucid | published | anucid | filed | title
------------------+-----------+----------------+----------+------------------------------------------------------------------------------------------
AR-242634-A1 | 19930430 | AR-31479889-A | 19890830 | ETHANOL PRODUCTION BY GENETICALLY ENGINEERED ESCHERICHIA COLI STRAINS
CA-1335430-C | 19950502 | CA-609829-A | 19890830 | ETHANOL PRODUCTION BY ENGINEERED MICROBES
EP-0431047-A1 | 19910612 | EP-89909966-A | 19890830 | ETHANOL PRODUCTION BY GENETICALLY ENGINEERED ESCHERICHIA COLI STRAINS
HU-T60328-A | 19920828 | HU-577189-A | 19890830 | PROCESS FOR PRODUCING ETHANOL WITH TRANSFORMED MICROORGANISMS
KR-900702042-A | 19901205 | KR-900700901-A | 19890830 | 유전공학처리 미생물에 의한 에탄올 생성
US-5000000-A | 19910319 | US-35206289-A | 19890830 | Ethanol production by Escherichia coli strains co-expressing Zymomonas PDC and ADH genes
WO-1990002193-A1 | 19900308 | US-8903753-W | 19890830 | ETHANOL PRODUCTION BY GENETICALLY ENGINEERED ESCHERICHIA COLI STRAINSFamily Citations
Forward and backward citations for an entire family are also available directly inside CLAIMS Direct.
SELECT * FROM cdws.f_family_citations_backward( 26932266 );
ucid
------------------
EP-0047641-A2
WO-1986001833-A1
WO-1986004357-A1Again, GROUP BY and ORDER BY are available.
SELECT substring(ucid, 1, 2) AS country, count(*) AS ncites
FROM cdws.f_family_citations_forward( 26932266 )
GROUP BY country
ORDER BY ncites DESC;
country | ncites
---------+--------
US | 290
WO | 36
EP | 35
JP | 2
FR | 1
KR | 1
AU | 1