Issue compiling forms

Services appear to be up and everything else on the system appears to be working, but when I try to compile forms, I get an error:

[image: screenshot of the error]

Is there a suggested place to start looking? Specific log files perhaps? Apologies if this should be in the users forum, I assume it's a server / config issue.

Thanks!

Hi Ed,

My apologies! There's a known issue that we're working on fixing that causes Web Apps, App Preview, and a few other features to be disabled for a brief period whenever we install updates to the system. Given the timing of your message, which coincided with such an update on our side, it appears likely that you encountered that issue. It gives me no pleasure to say this, but I think if you tried again now (or a minute after you took that screenshot) you should be able to proceed without issue.

As I mentioned, we are actively working on making this a much rarer occurrence.

Best,
Danny

Hi Danny, apologies if I was a bit vague. I should probably have mentioned that this is my own hosted instance of CommCareHQ in a monolith config. The issue has been present for just over a week. I had something else to deal with and have only now looped back to it; it was already present on a build from a week back.

Which logs are the best to inspect for this issue?

Thanks!
Ed

Hi Ed,

Aha! Well then. I probably should have guessed that from all the questions you've asked previously (and the fact that this is the Developer forum).

The Formplayer service, when it's down, causes the issues I talked about in my initial post (in Web Apps, App Preview, and a few other places in the App Builder), so I would start there.

If you want to just see if it is up, you can use

commcare-cloud production service formplayer status

and if it's down you can use ... start, or if it says it's up but you don't believe it, you can try ... restart and see if that brings it back up. If those simple "turn it off and on again" suggestions don't work, the next place to look is the Formplayer log files. To see where those are hosted, check the output of

commcare-cloud production service formplayer logs

and then SSH into your machine (or if you have multiple, the formplayer machine) where you'll find the logs at the path in the output of that command.

Cheers,
Danny

Thanks Danny, I'll check the formplayer logs. The service is definitely reporting that it's up. The other thing that comes to mind is that we transferred the forms from an instance where the CommCare version is 2 builds earlier, and I guess it's feasible that there's some compatibility issue though I doubt it. I'll downgrade and see if it makes any difference if the logs don't throw hints.
Thanks!
Ed

OK, it seems the issue relates to the TLS certificate I've used. The formplayer error is:

org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://mysite.org/hq/admin/session_details/": sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target; nested exception is javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

The question is: is this related to the certificate itself, or to the Ansible script that deploys the certificate? It is possible that the certificate issuer is not trusted by the JVM.
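When the Formplayer log is long, a small script can pull out just the requests that failed certificate validation. The following is a minimal Python sketch and not part of CommCare or commcare-cloud: the function name `find_pkix_failures` is hypothetical, and it assumes the log lines follow the Spring `I/O error on <METHOD> request for "<url>"` wording seen in the error above.

```python
import re

# Captures the HTTP method and the URL Formplayer tried to reach.
# Assumption: the log uses the Spring wording seen in the error above.
_IO_ERROR = re.compile(
    r'I/O error on (?P<method>[A-Z]+) request for "(?P<url>[^"]+)"'
)


def find_pkix_failures(log_text):
    """Return (method, url) pairs for lines reporting a PKIX path building failure."""
    failures = []
    for line in log_text.splitlines():
        # Skip lines that are not certificate-path validation errors.
        if "PKIX path building failed" not in line:
            continue
        match = _IO_ERROR.search(line)
        if match:
            failures.append((match.group("method"), match.group("url")))
    return failures
```

Feeding it the contents of the Formplayer log should list each URL whose certificate chain the JVM could not validate, which helps confirm whether the failing host is the one you expect.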

OK, something interesting. The certificate is accepted by Chrome, but Firefox says it's an unknown issuer... I think I can safely assume the same is true of the JVM. I'll look into the cert files to ensure everything is present; otherwise I guess we may have to import the file into the JVM trust store as per this:

If you've used commcare-cloud's option to set you up with a TLS certificate using letsencrypt, then that certificate authority should be recognized by formplayer without issue, so long as the certificate has not expired. Following the instructions given in http://dimagi.github.io/commcare-cloud/howto/enable-https.html should, in addition to setting up a valid letsencrypt cert, create a cron job that will update the certificate when needed to keep your TLS cert perpetually valid. If you're using another method to generate your cert, then, like you said, you'll have to dig into the details to figure out what about it formplayer is not accepting. Overall, I can verify that formplayer does need to make requests to your commcare site in order to work. (Even though it can be thought of as part of the commcare site, the request does go through the site's "front door" rather than an internal route.)

I'd also like to highlight that if "https://mysite.org/hq/admin/session_details/" was pasted directly from the output then that is almost certainly the problem (since I'm assuming your site isn't hosted at mysite.org). If you just replaced the name of your site with "mysite.org" to make the message more anonymous and general, then of course you can ignore this comment.
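Because the request goes through the site's "front door", one way to reproduce the problem outside Formplayer is to attempt a verified TLS handshake against the public hostname from the Formplayer machine. The sketch below is only an illustration under stated assumptions: it uses Python 3.7+ and Python's default trust store (the operating system's CA bundle), which is not the same as the JVM's trust store, so a success here does not prove the JVM will accept the certificate. The function names `strict_context` and `probe` are hypothetical.

```python
import socket
import ssl


def strict_context():
    """TLS context that verifies the chain and hostname against the system trust store."""
    return ssl.create_default_context()


def probe(host, port=443, timeout=10.0):
    """Attempt a verified handshake; return (ok, detail)."""
    context = strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                # Handshake succeeded: report the negotiated protocol version.
                return True, tls.version()
    except ssl.SSLCertVerificationError as exc:
        # Chain or hostname validation failed, e.g. an unknown issuer.
        return False, exc.verify_message
    except OSError as exc:
        # DNS failure, refused connection, timeout, or other TLS errors.
        return False, str(exc)
```

Calling `probe("mysite.org")` (with the placeholder replaced by the real hostname) returns `False` and a verification message if the server's chain cannot be validated against the local trust store.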

Cheers,
Danny

I used the client's own certificate signed by their preferred authority: https://www.incommon.org/
I followed instructions here: TLS certificates - #3 by erobinson which I have used before successfully; however, this time it's a different CA for the certificate. I did manage to import the CA's certificate into the Java trust store, which resolved the earlier certificate-related Formplayer error in the log. However, I'm still receiving the error in the browser when compiling the forms (Cannot create menus - unable to validate the forms due to a server error).

Back to the formplayer log file, I've also noticed this error on formplayer startup:

2019-09-13 19:29:37.973 WARN 13554 --- [ main] org.hibernate.orm.url : HHH10000002: File or directory named by URL [file:/home/cchq/www/monolith/releases/2019-09-12_12.09/formplayer_build/formplayer__2019-08-21_13.29/libs/formplayer.jar!/BOOT-INF/classes] could not be found. URL will be ignored
java.io.FileNotFoundException: /home/cchq/www/monolith/releases/2019-09-12_12.09/formplayer_build/formplayer__2019-08-21_13.29/libs/formplayer.jar!/BOOT-INF/classes (No such file or directory)

the file appears to be there, however:
-rw-r--r-- 1 cchq cchq 63402693 Sep 12 12:36 /home/cchq/www/monolith/releases/2019-09-12_12.09/formplayer_build/formplayer__2019-08-21_13.29/libs/formplayer.jar
And within the archive, the /BOOT-INF/classes directory exists as well...
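The check that the directory exists inside the archive can be scripted, since a jar is a zip file. This is a minimal Python sketch with a hypothetical helper name, `jar_has_directory`; it only confirms that entries exist under a given path inside the jar. In the warning above, the `!/` in the URL appears to denote a path inside the jar rather than a path on disk, which would explain why a filesystem lookup reports it as missing.

```python
import zipfile


def jar_has_directory(jar_path, directory):
    """Return True if any entry in the jar lives under the given directory."""
    # Zip entry names have no leading slash, so normalise the prefix.
    prefix = directory.strip("/") + "/"
    with zipfile.ZipFile(jar_path) as jar:
        return any(name.startswith(prefix) for name in jar.namelist())
```

For example, `jar_has_directory(path_to_formplayer_jar, "/BOOT-INF/classes")` should return `True` when the directory is present, where `path_to_formplayer_jar` stands for the jar path shown in the listing above.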

Opening my app in the browser produces these log entries in formplayer's log:

2019-09-13 19:50:34.793 INFO 13554 --- [nio-8181-exec-6] application.Application : Got request URL: http://zdip.itech-zimbabwe.org//delete_application_dbs , response code: 200

2019-09-13 19:50:34.796 WARN 13554 --- [nio-8181-exec-6] io.sentry.dsn.Dsn : *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***

2019-09-13 19:50:34.797 INFO 13554 --- [nio-8181-exec-6] aspects.LockAspect : Obtained lock for username erobinson@projectbalance_com

2019-09-13 19:50:34.798 INFO 13554 --- [nio-8181-exec-6] aspects.LoggingAspect : Request to delete_application_dbs with bean DeleteApplicationDbsRequestBean with appId=dd97d028373e4f5ba989718bcd4399f8, parent Authenticated request bean wih username=erobinson@projectbalance.com, domain=zdip, restoreAs=null

2019-09-13 19:50:34.798 INFO 13554 --- [nio-8181-exec-6] aspects.LoggingAspect : Request to delete_application_dbs returned result NotificationMessage message=Successfully cleared application database for dd97d028373e4f5ba989718bcd4399f8, isError=false

2019-09-13 19:50:34.798 INFO 13554 --- [nio-8181-exec-6] aspects.LockAspect : Relinquished lock for username erobinson@projectbalance_com

Clicking the make new version button produces no log entries in the formplayer log but in the front end this error:
[image: screenshot of the error]

The nginx_access log shows this when clicking the make new version button:

41.150.129.203 - - - - - [13/Sep/2019:20:03:32 +0000] "POST /a/zdip/apps/save/dd97d028373e4f5ba989718bcd4399f8/ HTTP/2.0" 200 63 294 "Log In :: CommCare HQ - CommCare HQ" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"

For reference, I don't believe this error is what's causing the formplayer compile issue as I have seen it on our QA server which has a month old build on it which is compiling fine.

I've tried using different CommCare builds but I'm still receiving this error when making a new version:
[image: screenshot of the error]

I've tried a few commcare builds from build no 459133 - version 2.47.1 to build no 459153 - version 2.47.4 and no go.

Are there any other suggested logs to look at to see what might be causing the error?

I'd definitely check the main web process logs. You can see where those are stored by running

$ cchq production service commcare logs


Thank you Ethan - this is now resolved. I fixed it over the weekend but didn't get an opportunity to update my thread as I have been travelling.
In the main Django logs, there was an entry that alluded to an issue with the certificate we had installed. In short, I was provided with the certificate in PEM and PKCS7 formats. The PEM file contained only the site certificate, but I believe Nginx expects the site certificate, the intermediate, and the CA certificate concatenated into one .pem file. I converted the PKCS7 cert to PEM format and used that as the site certificate, which resolved the issue Formplayer was having interacting with the main app.

It's working well now.
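For anyone hitting the same problem, a quick sanity check is to count the certificates in the PEM file that Nginx serves: a file with a single block holds only the site certificate. The following is a minimal Python sketch, not taken from commcare-cloud. The function names are hypothetical, and it assumes the usual convention that the site certificate comes first in the bundle, followed by the intermediates. It ignores any text between certificate blocks, such as the subject and issuer lines a tool like `openssl pkcs7 -print_certs` emits (the thread does not say which tool was used for the conversion).

```python
import re

# One PEM-encoded certificate, including its BEGIN/END markers.
_CERT_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_certificates(pem_text):
    """Return each certificate block found in the text, in order."""
    return _CERT_BLOCK.findall(pem_text)


def build_chain(site_pem, *issuer_pems):
    """Concatenate the site certificate first, then the issuer certificates."""
    blocks = []
    for pem in (site_pem, *issuer_pems):
        blocks.extend(split_pem_certificates(pem))
    return "\n".join(blocks) + "\n"
```

If `len(split_pem_certificates(text))` is 1 for the file configured in Nginx, clients that do not already know the intermediate will likely fail with an unknown-issuer error.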